RandomSampler and TotalOrderPartitioner

2014-08-28 Thread Keren Ouaknine
Hello, I am running a global sort (on Pigmix input data, size 600 GB) based on TotalOrderPartitioner. The best practice according to the literature points to data sampling using RandomSampler. The query succeeds but takes a very long time (7 hours), and that's because there is only one reducer
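A single reducer defeats the purpose of TotalOrderPartitioner: the job must be given an explicit reducer count, and the sampled partition file must contain the matching N-1 split points. Below is a minimal sketch of the usual wiring (Hadoop 2.x mapreduce API); the paths, reducer count, and sampling parameters are illustrative assumptions, not taken from the thread, and it needs a Hadoop client classpath and cluster to run.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class GlobalSort {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "global-sort");
    job.setJarByClass(GlobalSort.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("/input"));        // assumed path
    FileOutputFormat.setOutputPath(job, new Path("/sorted"));     // assumed path

    // Without this, the default of 1 reducer serializes the whole sort.
    job.setNumReduceTasks(64);
    job.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path("/tmp/partitions"));                             // assumed path

    // Sample ~1% of keys, up to 10000 samples from at most 100 splits,
    // then write the 63 split points the partitioner will read.
    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<>(0.01, 10000, 100);
    InputSampler.writePartitionFile(job, sampler);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```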

Re: Local file system to access hdfs blocks

2014-08-28 Thread Stanley Shi
As far as I know, there's no combination of Hadoop APIs that can do that. You can easily get the location of a block (which DN it is on), but there's no way to get the local path of that block file. On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni nid...@gmail.com wrote: Yehia, No problem at all. I
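What the public API does expose is which datanodes host each block, via `FileSystem#getFileBlockLocations`. A minimal sketch (the path is borrowed from the fsck example later in this digest and is an assumption; this needs a Hadoop client classpath and a running cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/list2.txt"); // illustrative path
    FileStatus st = fs.getFileStatus(p);
    // One BlockLocation per block: offset, length, and hosting datanodes --
    // but no local file-system path to the block replica itself.
    for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
      System.out.println("offset=" + loc.getOffset()
          + " len=" + loc.getLength()
          + " hosts=" + String.join(",", loc.getHosts()));
    }
  }
}
```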

Re: What happens when .....?

2014-08-28 Thread Stanley Shi
Normally an MR job is used for batch processing, so I don't think this is a good use case for MR. Since you need to run the program periodically, you cannot submit a single MapReduce job for this. A possible way is to create a cron job that scans the folder size and submits an MR job if necessary;

Re: Appending to HDFS file

2014-08-28 Thread Stanley Shi
You should not use this method: FSDataOutputStream fp = fs.create(pt, true). Here's the Java doc for this create method: /** * Create an FSDataOutputStream at the indicated Path. * @param f the file to create * @param overwrite if a file with this name already exists, then if

libhdfs result in JVM crash issue, please help me

2014-08-28 Thread Vincent,Wei
All, I am using libhdfs and need some usage like the following; when the JNI call returns, it results in a crash in the JVM. The attachment has the detailed information. Java call -> JNI call -> C lib call -> libhdfs. Crash info: # # A fatal error has

Re: Hadoop 2.5.0 - HDFS browser-based file view

2014-08-28 Thread Stanley Shi
Normally files in HDFS are intended to be quite big, so they are not easy to show in a browser. On Fri, Aug 22, 2014 at 10:56 PM, Brian C. Huffman bhuff...@etinternational.com wrote: All, I noticed that on Hadoop 2.5.0, when browsing the HDFS filesystem on port 50070, you can't

RE: Appending to HDFS file

2014-08-28 Thread Liu, Yi A
Right, please use FileSystem#append From: Stanley Shi [mailto:s...@pivotal.io] Sent: Thursday, August 28, 2014 2:18 PM To: user@hadoop.apache.org Subject: Re: Appending to HDFS file You should not use this method: FSDataOutputStream fp = fs.create(pt, true) Here's the java doc for this create
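The distinction matters because `create(path, true)` truncates any existing file, while `FileSystem#append` reopens it for writing at the end. A minimal sketch of the append pattern (the path is an illustrative assumption; requires a Hadoop client classpath and a cluster, with append enabled, which is the default on 2.x):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path pt = new Path("/tmp/log.txt"); // illustrative path
    // Create the file once if it does not exist (overwrite=false),
    // then append on this and later runs instead of truncating.
    if (!fs.exists(pt)) {
      fs.create(pt, false).close();
    }
    try (FSDataOutputStream out = fs.append(pt)) {
      out.writeBytes("one more line\n");
    }
  }
}
```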

replication factor in hdfs-site.xml

2014-08-28 Thread Satyam Singh
Hi Users, I want to understand the behaviour of a Hadoop cluster when writing/reading in the following replication cases: 1. replication=2 and one datanode in the cluster (3 datanodes + namenode) goes down. 2. replication=2 and 2 datanodes in the cluster (3 datanodes + namenode) go down. BR, Satyam

AW: Running job issues

2014-08-28 Thread Blanca Hernandez
Thanks, it fixed my problem! Von: Arpit Agarwal [mailto:aagar...@hortonworks.com] Gesendet: Donnerstag, 28. August 2014 01:41 An: user@hadoop.apache.org Betreff: Re: Running job issues Susheel is right. I've fixed the typo on the wiki page. On Wed, Aug 27, 2014 at 12:28 AM, Susheel Kumar

RE: replication factor in hdfs-site.xml

2014-08-28 Thread Liu, Yi A
For 1#, since you still have 2 datanodes alive and the replication is 2, the write will succeed. (Reads will succeed.) For 2#, you now have only 1 datanode and the replication is 2, so the initial write will succeed, but at some later point pipeline recovery will fail. Regards, Yi Liu
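For reference, the settings involved are plain hdfs-site.xml properties. A sketch of the relevant fragment (the values shown are illustrative, not taken from the thread); note that `dfs.namenode.replication.min` (the "minReplication" from the error in a later thread) is what gates whether a write can complete with fewer live replicas than the target:

```xml
<!-- hdfs-site.xml fragment: illustrative values -->
<property>
  <name>dfs.replication</name>
  <value>2</value> <!-- target replica count for new files -->
</property>
<property>
  <name>dfs.namenode.replication.min</name>
  <value>1</value> <!-- minimum live replicas for a block write to succeed -->
</property>
```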

RE: Appending to HDFS file

2014-08-28 Thread rab ra
Thank you all, It works now Regards rab On 28 Aug 2014 12:06, Liu, Yi A yi.a@intel.com wrote: Right, please use FileSystem#append *From:* Stanley Shi [mailto:s...@pivotal.io] *Sent:* Thursday, August 28, 2014 2:18 PM *To:* user@hadoop.apache.org *Subject:* Re: Appending to HDFS

Re: libhdfs result in JVM crash issue, please help me

2014-08-28 Thread Vincent,Wei
#0 0x7f1e3872c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt #0 0x7f1e3872c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x7f1e3872fb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x7f1e380a4405 in os::abort(bool) () from

Error could only be replicated to 0 nodes instead of minReplication (=1)

2014-08-28 Thread Jakub Stransky
Hello, we are using Hadoop 2.2.0 (HDP 2.0) and Avro 1.7.4, running on CentOS 6.3. I am facing the following issue when using AvroMultipleOutputs with dynamic output files. My M/R job works fine for a smaller amount of data, or at least the error hasn't appeared there so far. With a bigger amount of data I

Hadoop on Windows 8 with Java 8

2014-08-28 Thread Ruebenacker, Oliver A
Hello, I can't find any information on how possible or difficult it is to install Hadoop as a single node on Windows 8 running Oracle Java 8. The tutorial on Hadoop 2 on Windows (http://wiki.apache.org/hadoop/Hadoop2OnWindows) mentions neither Windows 8 nor Java 8. Is there anything known

RE: Hadoop on Windows 8 with Java 8

2014-08-28 Thread Liu, Yi A
Currently Hadoop doesn't officially support Java 8. Regards, Yi Liu From: Ruebenacker, Oliver A [mailto:oliver.ruebenac...@altisource.com] Sent: Thursday, August 28, 2014 8:46 PM To: user@hadoop.apache.org Subject: Hadoop on Windows 8 with Java 8 Hello, I can't find any information on

Re: What happens when .....?

2014-08-28 Thread Eric Payne
Or, maybe have a look at Apache Falcon: Apache Falcon - Data management and processing platform (falcon.incubator.apache.org)

Re: What happens when .....?

2014-08-28 Thread Mahesh Khandewal
unsubscribe On Thu, Aug 28, 2014 at 6:42 PM, Eric Payne eric.payne1...@yahoo.com wrote: Or, maybe have a look at Apache Falcon: Falcon - Apache Falcon - Data management and processing platform http://falcon.incubator.apache.org/ Falcon - Apache Falcon - Data management and processing

hadoop installation error: localhost: ssh: connect to host localhost port 22: connection refused

2014-08-28 Thread Li Chen
Can anyone please help me with this installation error? After I type start-yarn.sh : starting yarn daemons starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-xx.out localhost: ssh: connect to host localhost port 22: connection refused when I ran jps to check, only Jps and

Re: hadoop installation error: localhost: ssh: connect to host localhost port 22: connection refused

2014-08-28 Thread Ritesh Kumar Singh
try 'ssh localhost' and show the output On Thu, Aug 28, 2014 at 7:55 PM, Li Chen ahli1...@gmail.com wrote: Can anyone please help me with this installation error? After I type start-yarn.sh : starting yarn daemons starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-xx.out

Re: Need some tutorials for Mapreduce written in Python

2014-08-28 Thread Amar Singh
Thank you to everyone who responded to this thread. I got a couple of good leads and some good online courses to explore to get a fundamental understanding of things. Thanks, Amar On Thu, Aug 28, 2014 at 10:15 AM, Sriram Balachander sriram.balachan...@gmail.com wrote: Hadoop The

ApplicationMaster link on cluster web page does not work

2014-08-28 Thread Margusja
Hi

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>

Re: ApplicationMaster link on cluster web page does not work

2014-08-28 Thread Margusja
More information: after I started the resourcemanager [root@vm38 ~]# /etc/init.d/hadoop-yarn-resourcemanager start Starting Hadoop resourcemanager: [ OK ] and opened the cluster web interface, there are some TCP connections to 8088: [root@vm38 ~]# netstat -np | grep 8088 tcp

Re: ApplicationMaster link on cluster web page does not work

2014-08-28 Thread Margusja
Moved the resourcemanager to another server and it works. I guess I have some network misrouting there :) Best regards, Margus (Margusja) Roo +372 51 48 780 http://margus.roo.ee http://ee.linkedin.com/in/margusroo skype: margusja ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314) On

Job is reported as complete on history server while on console it shows as only half way thru

2014-08-28 Thread S.L
Hi All, I am running an MRv1 job on a Hadoop YARN 2.3.0 cluster. The problem is that when I submit this job, YARN creates multiple applications for the submitted job, and the last application running in YARN is marked as complete even though the console reports it as only 58% complete. I have

org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi, I use Hadoop 2.4.1 and I got an "org.apache.hadoop.io.compress.SnappyCodec not found" error: hadoop checknative 14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version 14/08/29 02:54:51 INFO zlib.ZlibFactory: Successfully

Re: Local file system to access hdfs blocks

2014-08-28 Thread Demai Ni
Stanley and all, thanks. I will write a client application to explore this path. A quick question again: using the fsck command, I can retrieve all the necessary info: $ hadoop fsck /tmp/list2.txt -files -blocks -racks . BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8

Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread Tsuyoshi OZAWA
Hi, it looks like a classpath problem on the Spark side. Thanks, - Tsuyoshi On Fri, Aug 29, 2014 at 8:49 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I use Hadoop 2.4.1, I got an "org.apache.hadoop.io.compress.SnappyCodec not found" error: hadoop checknative 14/08/29 02:54:51

How to retrieve cached data in datanode? (Centralized cache management)

2014-08-28 Thread 남윤민
Hello, I have a question about using cached data in memory via centralized cache management. I cached the data that I want to use through the CLI (hdfs cacheadmin -addDirective ...). Then, when I write my MapReduce application, how can I read the cached data in memory? Here is the source
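Centralized cache management is transparent to readers: a plain `FileSystem.open()` / `read()` already benefits when the block replica is cached on the local datanode. For the memory-mapped fast path there is additionally the zero-copy read API on `FSDataInputStream`. A minimal sketch (the path and buffer size are illustrative assumptions; needs a Hadoop client classpath and a cluster):

```java
import java.nio.ByteBuffer;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.ElasticByteBufferPool;

public class CachedRead {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    ByteBufferPool pool = new ElasticByteBufferPool();
    try (FSDataInputStream in = fs.open(new Path("/cached/data"))) { // assumed path
      // Cached replicas are checksum-verified up front, so SKIP_CHECKSUMS
      // lets the client mmap the block data directly (zero-copy).
      ByteBuffer buf = in.read(pool, 4 * 1024 * 1024,
          EnumSet.of(ReadOption.SKIP_CHECKSUMS));
      if (buf != null) {
        // ... process buf ...
        in.releaseBuffer(buf); // return the buffer to the pool
      }
    }
  }
}
```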