RandomSampler and TotalOrderPartitioner
Hello, I am running a global sort (on Pigmix input data, size 600GB) based on TotalOrderPartitioner. The best practice according to the literature points to data sampling using RandomSampler. The query succeeds but takes a very long time (7 hours), and that's because there is only one reducer (which nullifies the point of using the above classes :) ). I am trying to figure out what forces the number of reducers to be one, as I defined them to be 400. I looked into the documentation and in the code of RandomSampler there is a requirement which says: // Set the path to the SequenceFile storing the sorted partition keyset. It must be the case that for R reduces, there are R-1 keys in the SequenceFile. And therefore I sampled as follows: InputSampler.Sampler<Text, Text> sampler = new InputSampler.RandomSampler<Text, Text>(0.9, 399, 444); Looking into my _partition file I can see there is only one partition, which explains the one reducer: SEQ org.apache.hadoop.io.Text!org.apache.hadoop.io.NullWritable I am wondering how come the partition file contains only one sample, though I asked for 399 samples above? Thanks for the help!! Keren -- Keren Ouaknine www.kereno.com
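For reference, a minimal driver sketch of the usual TotalOrderPartitioner setup (new MapReduce API, Hadoop 2.x); the partition-file path is just a placeholder. One likely cause of the behaviour described above is that InputSampler.writePartitionFile picks job.getNumReduceTasks() - 1 split keys, so if the job still has the default single reducer at the moment the partition file is written, the file contains no keys at all and every record lands in one partition, regardless of the numSamples passed to RandomSampler.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "global sort");
job.setNumReduceTasks(400);                         // must be set before writePartitionFile
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setPartitionerClass(TotalOrderPartitioner.class);

Path partitionFile = new Path("/tmp/_partitions");  // placeholder location
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);

InputSampler.Sampler<Text, Text> sampler =
    new InputSampler.RandomSampler<Text, Text>(0.9, 399, 444);
// Writes getNumReduceTasks() - 1 sorted split keys into the partition file.
InputSampler.writePartitionFile(job, sampler);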
Re: Local file system to access hdfs blocks
As far as I know, there's no combination of Hadoop APIs that can do that. You can easily get the location of the block (on which DN), but there's no way to get the local address of that block file. On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni nid...@gmail.com wrote: Yehia, No problem at all. I really appreciate your willingness to help. Yeah, now I am able to get such information through two steps: the first step will be either hadoop fsck or getFileBlockLocations(), and then search the local filesystem; my cluster is using the default from CDH, which is /dfs/dn. I would like to do it programmatically, so I am wondering whether someone has already done it, or maybe, better, a Hadoop API call is already implemented for this exact purpose? Demai On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater y.z.elsha...@gmail.com wrote: Hi Demai, Sorry, I missed that you already tried this out. I think you can construct the block location on the local file system if you have the block pool id and the block id. If you are using the cloudera distribution, the default location is under /dfs/dn (the value of the dfs.data.dir, dfs.datanode.data.dir configuration keys). Thanks Yehia On 27 August 2014 21:20, Yehia Elshater y.z.elsha...@gmail.com wrote: Hi Demai, You can use the fsck utility like the following: hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks This will display all the information you need about the blocks of your file. Hope it helps. Yehia On 27 August 2014 20:18, Demai Ni nid...@gmail.com wrote: Hi, Stanley, Many thanks. Your method works. For now, I can have a two-step approach: 1) getFileBlockLocations to grab the hdfs BlockLocation[] 2) use a local file system call (like the find command) to match the block to files on the local file system. Maybe there is an existing Hadoop API that returns such info already? Demai on the run On Aug 26, 2014, at 9:14 PM, Stanley Shi s...@pivotal.io wrote: I am not sure this is what you want but you can try this shell command: find [DATANODE_DIR] -name [blockname] On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni nid...@gmail.com wrote: Hi, folks, New in this area. Hoping to get a couple of pointers. I am using Centos and have Hadoop set up using cdh5.1 (Hadoop 2.3). I am wondering whether there is an interface to get each HDFS block's information in terms of the local file system. For example, I can use hadoop fsck /tmp/test.txt -files -blocks -racks to get the blockID and its replicas on the nodes, such as: repl=3 [/rack/hdfs01, /rack/hdfs02...] With such info, is there a way to 1) log in to hdfs01 and read the block directly at the local file system level? Thanks Demai on the run -- Regards, Stanley Shi, -- Regards, Stanley Shi,
Re: What happens when .....?
Normally an MR job is used for batch processing, so I don't think this is a good use case for MR. Since you need to run the program periodically, you cannot submit a single mapreduce job for this. A possible way is to create a cron job to scan the folder size and submit an MR job if necessary. On Wed, Aug 27, 2014 at 7:38 PM, Kandoi, Nikhil nikhil.kan...@emc.com wrote: Hi All, I have a system where files are coming into hdfs at regular intervals and I perform an operation every time the directory size goes above a particular point. My question is, when I submit a map reduce job, would it only work on the files present at that point? Regards, Nikhil Kandoi -- Regards, Stanley Shi,
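As a rough illustration of that cron-driven check (a sketch only; the /incoming path and the 10 GB threshold are made-up values), the driver could measure the directory size with FileSystem#getContentSummary and only submit the job once the threshold is crossed. Note that a submitted job only sees the files that exist when its input splits are computed at submission time; files arriving later are not picked up by that run.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// Total bytes under the incoming directory (path and threshold are placeholders).
long used = fs.getContentSummary(new Path("/incoming")).getLength();
if (used > 10L * 1024 * 1024 * 1024) {
    // Build and submit the MapReduce job here; it will only process the files
    // present when its input splits are computed.
}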
Re: Appending to HDFS file
You should not use this method: FSDataOutputStream fp = fs.create(pt, true) Here's the java doc for this create method: /** * Create an FSDataOutputStream at the indicated Path. * @param f the file to create * @param overwrite if a file with this name already exists, then if true, * the file will be overwritten, and if false an exception will be thrown. */ public FSDataOutputStream create(Path f, boolean overwrite) throws IOException { return create(f, overwrite, getConf().getInt("io.file.buffer.size", 4096), getDefaultReplication(f), getDefaultBlockSize(f)); } On Wed, Aug 27, 2014 at 2:12 PM, rab ra rab...@gmail.com wrote: hello Here is the code snippet I use to append def outFile = "${outputFile}.txt" Path pt = new Path("${hdfsName}/${dir}/${outFile}") def fs = org.apache.hadoop.fs.FileSystem.get(configuration); FSDataOutputStream fp = fs.create(pt, true) fp << "${key} ${value}\n" On 27 Aug 2014 09:46, Stanley Shi s...@pivotal.io wrote: would you please paste the code in the loop? On Sat, Aug 23, 2014 at 2:47 PM, rab ra rab...@gmail.com wrote: Hi By default, it is true in hadoop 2.4.1. Nevertheless, I have set it to true explicitly in hdfs-site.xml. Still, I am not able to achieve append. Regards On 23 Aug 2014 11:20, Jagat Singh jagatsi...@gmail.com wrote: What is the value of dfs.support.append in hdfs-site.xml https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml On Sat, Aug 23, 2014 at 1:41 AM, rab ra rab...@gmail.com wrote: Hello, I am currently using Hadoop 2.4.1. I am running an MR job using the hadoop streaming utility. The executable needs to write a large amount of information to a file. However, this write is not done in a single attempt. The file needs to be appended with streams of information generated. In the code, inside a loop, I open a file in hdfs and append some information. This is not working and I see only the last write. How do I accomplish the append operation in hadoop? Can anyone share a pointer? regards Bala -- Regards, Stanley Shi, -- Regards, Stanley Shi,
libhdfs result in JVM crash issue, please help me
All I am using libhdfs, I need some usage like following ,and when the JNI call return, it had result in some Crash in JVM, Attachment is the detail information. JAVA Call JNI Call C LIB Call Libhdfs Crash info # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7f3271ad3159, pid=9880, tid=139854651725568 # # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode linux-amd64 compressed oops) # Problematic frame: # V [libjvm.so+0x7d6159] Monitor::ILock(Thread*)+0x79 # # Core dump written. Default location: /home/haduser/core or core.9880 # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # --- T H R E A D --- Current thread is native thread siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0168 Registers: RAX=0x7f326c1eb201, RBX=0x7f326c011b70, RCX=0x7f326c1eb201, RDX=0x RSP=0x7f3272d60e00, RBP=0x7f3272d60e20, RSI=0x, RDI=0x7f326c011b70 R8 =0x, R9 =0x0001, R10=0x, R11=0x0202 R12=0x, R13=0x7f3272134a60, R14=0x, R15=0x RIP=0x7f3271ad3159, EFLAGS=0x00010202, CSGSFS=0x0033, ERR=0x0004 TRAPNO=0x000e Top of Stack: (sp=0x7f3272d60e00) 0x7f3272d60e00: 7f326c011b70 0x7f3272d60e10: 0004 0x7f3272d60e20: 7f3272d60e40 7f3271ad34cf 0x7f3272d60e30: 7f326c0159e8 7f3272b562c0 0x7f3272d60e40: 7f3272d60e50 7f3271c98409 0x7f3272d60e50: 7f3272d60e70 7f327192c67d 0x7f3272d60e60: 7f326c0159e8 0x7f3272d60e70: 7f326c27f3e0 7f326410ea14 0x7f3272d60e80: 000500010002 7f3272105b48 0x7f3272d60e90: 0030 0x7f3272d60ea0: 7f3272d61a10 7f3272945c83 0x7f3272d60eb0: 7f3272d61700 0x7f3272d60ec0: 7fff7a8c5670 0x7f3272d60ed0: 7f3272d619c0 0x7f3272d60ee0: 0003 7f3272945ea8 0x7f3272d60ef0: 7f3272d61700 0x7f3272d60f00: 0x7f3272d60f10: 0x7f3272d60f20: 0x7f3272d60f30: 0x7f3272d60f40: 0x7f3272d60f50: 0x7f3272d60f60: 0x7f3272d60f70: 0x7f3272d60f80: 6a29d305af3e5c9c 0x7f3272d60f90: 7fff7a8c5670 7f3272d619c0 0x7f3272d60fa0: 0003 0x7f3272d60fb0: 944d36a9b2de5c9c 944d362d13fe5c9c 0x7f3272d60fc0: 0x7f3272d60fd0: 0x7f3272d60fe0: 0x7f3272d60ff0: 7f32722573fd Instructions: (pc=0x7f3271ad3159) 0x7f3271ad3139: 9f c6 40 80 fe 00 74 01 f0 48 0f b1 13 48 39 c1 0x7f3271ad3149: 74 d1 48 89 c1 f6 c1 01 74 d5 4c 89 f6 48 89 df 0x7f3271ad3159: 4d 8b a6 68 01 00 00 e8 5b fc ff ff 85 c0 75 b3 0x7f3271ad3169: 41 c7 44 24 20 00 00 00 00 41 83 7d 00 01 7e 05 Register to memory mapping: RAX=0x7f326c1eb201 is an unknown value RBX=0x7f326c011b70 is an unknown value RCX=0x7f326c1eb201 is an unknown value RDX=0x is an unknown value RSP=0x7f3272d60e00 is an unknown value RBP=0x7f3272d60e20 is an unknown value RSI=0x is an unknown value RDI=0x7f326c011b70 is an unknown value R8 =0x is an unknown value R9 =0x0001 is an unknown value R10=0x is an unknown value R11=0x0202 is an unknown value R12=0x is an unknown value R13=0x7f3272134a60: offset 0xe37a60 in /usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so at 0x7f32712fd000 R14=0x is an unknown value R15=0x is an unknown value Stack: [0x7f3272c61000,0x7f3272d62000], sp=0x7f3272d60e00, free space=1023k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x7d6159] Monitor::ILock(Thread*)+0x79 V [libjvm.so+0x7d64cf] Monitor::lock_without_safepoint_check()+0x2f V [libjvm.so+0x99b409] VM_Exit::wait_if_vm_exited()+0x39 V [libjvm.so+0x62f67d] jni_DetachCurrentThread+0x3d -- BR, Vincent.Wei hs_err_pid9880.log Description: Binary data
Re: Hadoop 2.5.0 - HDFS browser-based file view
Normally files in HDFS are intended to be quite big, so they are not easy to show in the browser. On Fri, Aug 22, 2014 at 10:56 PM, Brian C. Huffman bhuff...@etinternational.com wrote: All, I noticed that on Hadoop 2.5.0, when browsing the HDFS filesystem on port 50070, you can't view a file in the browser. Clicking a file gives a little popup with metadata and a download link. Can HDFS be configured to show plaintext file contents in the browser? Thanks, Brian -- Regards, Stanley Shi,
RE: Appending to HDFS file
Right, please use FileSystem#append From: Stanley Shi [mailto:s...@pivotal.io] Sent: Thursday, August 28, 2014 2:18 PM To: user@hadoop.apache.org Subject: Re: Appending to HDFS file You should not use this method: FSDataOutputStream fp = fs.create(pt, true) Here's the java doc for this create method: /** * Create an FSDataOutputStream at the indicated Path. * @param f the file to create * @param overwrite if a file with this name already exists, then if true, * the file will be overwritten, and if false an exception will be thrown. */ public FSDataOutputStream create(Path f, boolean overwrite) throws IOException { return create(f, overwrite, getConf().getInt("io.file.buffer.size", 4096), getDefaultReplication(f), getDefaultBlockSize(f)); } On Wed, Aug 27, 2014 at 2:12 PM, rab ra rab...@gmail.com wrote: hello Here is the code snippet I use to append def outFile = "${outputFile}.txt" Path pt = new Path("${hdfsName}/${dir}/${outFile}") def fs = org.apache.hadoop.fs.FileSystem.get(configuration); FSDataOutputStream fp = fs.create(pt, true) fp << "${key} ${value}\n" On 27 Aug 2014 09:46, Stanley Shi s...@pivotal.io wrote: would you please paste the code in the loop? On Sat, Aug 23, 2014 at 2:47 PM, rab ra rab...@gmail.com wrote: Hi By default, it is true in hadoop 2.4.1. Nevertheless, I have set it to true explicitly in hdfs-site.xml. Still, I am not able to achieve append. Regards On 23 Aug 2014 11:20, Jagat Singh jagatsi...@gmail.com wrote: What is the value of dfs.support.append in hdfs-site.xml https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml On Sat, Aug 23, 2014 at 1:41 AM, rab ra rab...@gmail.com wrote: Hello, I am currently using Hadoop 2.4.1. I am running an MR job using the hadoop streaming utility. The executable needs to write a large amount of information to a file. However, this write is not done in a single attempt. The file needs to be appended with streams of information generated. In the code, inside a loop, I open a file in hdfs and append some information. This is not working and I see only the last write. How do I accomplish the append operation in hadoop? Can anyone share a pointer? regards Bala -- Regards, Stanley Shi, -- Regards, Stanley Shi,
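For completeness, a minimal sketch of the append-based version (assuming append is enabled on the cluster; the path and the key/value variables are placeholders). The point is that fs.create(pt, true) truncates the file on every call, which is why only the last write survives, whereas append reopens the existing file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path pt = new Path("/user/bala/out.txt");   // placeholder path

FSDataOutputStream out = fs.exists(pt)
    ? fs.append(pt)          // reopen the existing file, keeping its contents
    : fs.create(pt, false);  // first write: create without overwrite
try {
    out.writeBytes(key + " " + value + "\n");
} finally {
    out.close();
}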
replication factor in hdfs-site.xml
Hi Users, I want to know the behaviour of a hadoop cluster for writing/reading in the following replication cases: 1. replication=2 and in the cluster (3 datanodes + namenode) one datanode goes down. 2. replication=2 and in the cluster (3 datanodes + namenode) 2 datanodes go down. BR, Satyam
AW: Running job issues
Thanks, it fixed my problem! From: Arpit Agarwal [mailto:aagar...@hortonworks.com] Sent: Thursday, 28 August 2014 01:41 To: user@hadoop.apache.org Subject: Re: Running job issues Susheel is right. I've fixed the typo on the wiki page. On Wed, Aug 27, 2014 at 12:28 AM, Susheel Kumar Gadalay skgada...@gmail.com wrote: You have to use this command to format: hdfs namenode -format not hdfs dfs -format On 8/27/14, Blanca Hernandez blanca.hernan...@willhaben.at wrote: Hi, thanks for your answers. Sorry, I forgot to add it, I couldn't run the command either: C:\development\tools\hadoop> %HADOOP_PREFIX%\bin\hdfs dfs -format -format: Unknown command C:\development\tools\hadoop> echo %HADOOP_PREFIX% C:\development\tools\hadoop By using the -help command there is no format param. My Hadoop version: 2.4.0 (the latest version supported by mongo-hadoop). Best regards, Blanca From: Arpit Agarwal [mailto:aagar...@hortonworks.com] Sent: Tuesday, 26 August 2014 21:39 To: user@hadoop.apache.org Subject: Re: Running job issues And the namenode does not even start: 14/08/26 12:01:09 WARN namenode.FSNamesystem: Encountered exception loading fsimage java.io.IOException: NameNode is not formatted. Have you formatted HDFS (step 3.4)? On Tue, Aug 26, 2014 at 3:08 AM, Blanca Hernandez blanca.hernan...@willhaben.at wrote: Hi! I have just installed hadoop on my windows x64 machine. I followed carefully the instructions in https://wiki.apache.org/hadoop/Hadoop2OnWindows but in the 3.5 and 3.6 points I have some problems I can not handle. %HADOOP_PREFIX%\sbin\start-dfs.cmd The datanode can not connect: 14/08/26 12:01:30 WARN datanode.DataNode: Problem connecting to server: 0.0.0.0/0.0.0.0:19000 And the namenode does not even start: 14/08/26 12:01:09 WARN namenode.FSNamesystem: Encountered exception loading fsimage java.io.IOException: NameNode is not formatted. Trying to run the mapreduce example indicated in point 3.6, I get the connection exception again. That brings me to the http://wiki.apache.org/hadoop/ConnectionRefused page where the exception is explained. So I guess I have some misunderstandings with the configuration. I tried to find information about that and found the page http://wiki.apache.org/hadoop/HowToConfigure but it is still not very clear to me. I attach my config files, the ones I modified; maybe you can help me out… Many thanks!!
RE: replication factor in hdfs-site.xml
For #1, since you still have 2 datanodes alive and the replication is 2, writing will succeed. (Reads will succeed.) For #2, now you only have 1 datanode and the replication is 2; the initial write will succeed, but at some later point pipeline recovery will fail. Regards, Yi Liu -----Original Message----- From: Satyam Singh [mailto:satyam.si...@ericsson.com] Sent: Thursday, August 28, 2014 3:01 PM To: user@hadoop.apache.org Subject: replication factor in hdfs-site.xml Hi Users, I want to know the behaviour of a hadoop cluster for writing/reading in the following replication cases: 1. replication=2 and in the cluster (3 datanodes + namenode) one datanode goes down. 2. replication=2 and in the cluster (3 datanodes + namenode) 2 datanodes go down. BR, Satyam
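For reference, a small sketch of how the replication factor can also be controlled from the client side (the path is a placeholder); this does not change the failure behaviour described above, it only sets how many replicas are requested.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("dfs.replication", "2");            // default replication for files this client creates
FileSystem fs = FileSystem.get(conf);
fs.setReplication(new Path("/data/file.txt"), (short) 2);  // change an existing file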
RE: Appending to HDFS file
Thank you all, It works now Regards rab On 28 Aug 2014 12:06, Liu, Yi A yi.a@intel.com wrote: Right, please use FileSystem#append From: Stanley Shi [mailto:s...@pivotal.io] Sent: Thursday, August 28, 2014 2:18 PM To: user@hadoop.apache.org Subject: Re: Appending to HDFS file You should not use this method: FSDataOutputStream fp = fs.create(pt, true) Here's the java doc for this create method: /** * Create an FSDataOutputStream at the indicated Path. * @param f the file to create * @param overwrite if a file with this name already exists, then if true, * the file will be overwritten, and if false an exception will be thrown. */ public FSDataOutputStream create(Path f, boolean overwrite) throws IOException { return create(f, overwrite, getConf().getInt("io.file.buffer.size", 4096), getDefaultReplication(f), getDefaultBlockSize(f)); } On Wed, Aug 27, 2014 at 2:12 PM, rab ra rab...@gmail.com wrote: hello Here is the code snippet I use to append def outFile = "${outputFile}.txt" Path pt = new Path("${hdfsName}/${dir}/${outFile}") def fs = org.apache.hadoop.fs.FileSystem.get(configuration); FSDataOutputStream fp = fs.create(pt, true) fp << "${key} ${value}\n" On 27 Aug 2014 09:46, Stanley Shi s...@pivotal.io wrote: would you please paste the code in the loop? On Sat, Aug 23, 2014 at 2:47 PM, rab ra rab...@gmail.com wrote: Hi By default, it is true in hadoop 2.4.1. Nevertheless, I have set it to true explicitly in hdfs-site.xml. Still, I am not able to achieve append. Regards On 23 Aug 2014 11:20, Jagat Singh jagatsi...@gmail.com wrote: What is the value of dfs.support.append in hdfs-site.xml https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml On Sat, Aug 23, 2014 at 1:41 AM, rab ra rab...@gmail.com wrote: Hello, I am currently using Hadoop 2.4.1. I am running an MR job using the hadoop streaming utility. The executable needs to write a large amount of information to a file. However, this write is not done in a single attempt. The file needs to be appended with streams of information generated. In the code, inside a loop, I open a file in hdfs and append some information. This is not working and I see only the last write. How do I accomplish the append operation in hadoop? Can anyone share a pointer? regards Bala -- Regards, Stanley Shi, -- Regards, Stanley Shi,
Re: libhdfs result in JVM crash issue, please help me
#0 0x7f1e3872c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt #0 0x7f1e3872c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x7f1e3872fb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x7f1e380a4405 in os::abort(bool) () from /usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x7f1e38223347 in VMError::report_and_die() () from /usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x7f1e380a8d8f in JVM_handle_linux_signal () from /usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #5 signal handler called #6 0x7f1e38066159 in Monitor::ILock(Thread*) () from /usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #7 0x7f1e380664cf in Monitor::lock_without_safepoint_check() () from /usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #8 0x7f1e3822b409 in VM_Exit::wait_if_vm_exited() () from /usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #9 0x7f1e37ebf67d in jni_DetachCurrentThread () from /usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #10 0x7f1e26544a14 in hdfsThreadDestructor (v=optimized out) at /home/haduser/hadoop-2.2.0-src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:84 #11 0x7f1e38ed8c83 in __nptl_deallocate_tsd () from /lib/x86_64-linux-gnu/libpthread.so.0 #12 0x7f1e38ed8ea8 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #13 0x7f1e387ea3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #14 0x in ?? () 2014-08-28 14:28 GMT+08:00 Vincent,Wei weikun0...@gmail.com: All I am using libhdfs, I need some usage like following ,and when the JNI call return, it had result in some Crash in JVM, Attachment is the detail information. JAVA Call JNI Call C LIB Call Libhdfs Crash info # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7f3271ad3159, pid=9880, tid=139854651725568 # # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode linux-amd64 compressed oops) # Problematic frame: # V [libjvm.so+0x7d6159] Monitor::ILock(Thread*)+0x79 # # Core dump written. 
Default location: /home/haduser/core or core.9880 # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # --- T H R E A D --- Current thread is native thread siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0168 Registers: RAX=0x7f326c1eb201, RBX=0x7f326c011b70, RCX=0x7f326c1eb201, RDX=0x RSP=0x7f3272d60e00, RBP=0x7f3272d60e20, RSI=0x, RDI=0x7f326c011b70 R8 =0x, R9 =0x0001, R10=0x, R11=0x0202 R12=0x, R13=0x7f3272134a60, R14=0x, R15=0x RIP=0x7f3271ad3159, EFLAGS=0x00010202, CSGSFS=0x0033, ERR=0x0004 TRAPNO=0x000e Top of Stack: (sp=0x7f3272d60e00) 0x7f3272d60e00: 7f326c011b70 0x7f3272d60e10: 0004 0x7f3272d60e20: 7f3272d60e40 7f3271ad34cf 0x7f3272d60e30: 7f326c0159e8 7f3272b562c0 0x7f3272d60e40: 7f3272d60e50 7f3271c98409 0x7f3272d60e50: 7f3272d60e70 7f327192c67d 0x7f3272d60e60: 7f326c0159e8 0x7f3272d60e70: 7f326c27f3e0 7f326410ea14 0x7f3272d60e80: 000500010002 7f3272105b48 0x7f3272d60e90: 0030 0x7f3272d60ea0: 7f3272d61a10 7f3272945c83 0x7f3272d60eb0: 7f3272d61700 0x7f3272d60ec0: 7fff7a8c5670 0x7f3272d60ed0: 7f3272d619c0 0x7f3272d60ee0: 0003 7f3272945ea8 0x7f3272d60ef0: 7f3272d61700 0x7f3272d60f00: 0x7f3272d60f10: 0x7f3272d60f20: 0x7f3272d60f30: 0x7f3272d60f40: 0x7f3272d60f50: 0x7f3272d60f60: 0x7f3272d60f70: 0x7f3272d60f80: 6a29d305af3e5c9c 0x7f3272d60f90: 7fff7a8c5670 7f3272d619c0 0x7f3272d60fa0: 0003 0x7f3272d60fb0: 944d36a9b2de5c9c 944d362d13fe5c9c 0x7f3272d60fc0: 0x7f3272d60fd0: 0x7f3272d60fe0: 0x7f3272d60ff0: 7f32722573fd
Error could only be replicated to 0 nodes instead of minReplication (=1)
Hello, we are using Hadoop 2.2.0 (HDP 2.0), avro 1.7.4. running on CentOS 6.3 I am facing a following issue when using a AvroMultipleOutputs with dynamic output files. My M/R job works fine for a smaller amount of data or at least the error hasn't appear there so far. With bigger amount of data I am getting following error back to console: Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /data/pif-dob-categorize/2014/08/26/14/_temporary/1/_temporary/attempt_1409147867302_0090_r_00_0/HISTORY/20140216/64619-r-0.avro could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2503) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047) at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy11.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1231) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514) I checked that: - cluster is no partitioned - network is fine - that HDFS cluster has enough capacity - actually just 8% used - setting for dfs.reserved is set to 1 but 80GB is still free. Our cluster is small for development purposes just to clarify. 
The DataNode log contains a lot of the following errors: 2014-08-28 06:57:22,585 ERROR datanode.DataNode (DataXceiver.java:run(225)) - bd-prg-dev1-dn1.corp.ncr.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /153.86.209.223:47123 dest: /153.86.209.223:50010 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:435) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:693) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:569) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) at java.lang.Thread.run(Thread.java:744) The NameNode log contains plenty of errors of this type: 2014-08-28 06:58:19,904 WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(295)) - Not able to place enough replicas, still in need of 2 to reach 2 For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger 2014-08-28 06:58:19,905 WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(295)) - Not able to place enough replicas, still in need of 2 to reach 2 For more
Hadoop on Windows 8 with Java 8
Hello, I can't find any information on how possible or difficult it is to install Hadoop as a single node on Windows 8 running Oracle Java 8. The tutorial on Hadoop 2 on Windows (http://wiki.apache.org/hadoop/Hadoop2OnWindows) mentions neither Windows 8 nor Java 8. Is there anything known about this? Thanks! Best, Oliver -- Oliver Ruebenacker Solutions Architect at Altisource Labs (http://www.altisourcelabs.com/) Be always grateful, but never satisfied.
RE: Hadoop on Windows 8 with Java 8
Currently Hadoop doesn't officially support Java 8. Regards, Yi Liu From: Ruebenacker, Oliver A [mailto:oliver.ruebenac...@altisource.com] Sent: Thursday, August 28, 2014 8:46 PM To: user@hadoop.apache.org Subject: Hadoop on Windows 8 with Java 8 Hello, I can't find any information on how possible or difficult it is to install Hadoop as a single node on Windows 8 running Oracle Java 8. The tutorial on Hadoop 2 on Windows (http://wiki.apache.org/hadoop/Hadoop2OnWindows) mentions neither Windows 8 nor Java 8. Is there anything known about this? Thanks! Best, Oliver -- Oliver Ruebenacker Solutions Architect at Altisource Labs (http://www.altisourcelabs.com/) Be always grateful, but never satisfied.
Re: What happens when .....?
Or, maybe have a look at Apache Falcon (a data management and processing platform): http://falcon.incubator.apache.org/ From: Stanley Shi s...@pivotal.io To: user@hadoop.apache.org Sent: Thursday, August 28, 2014 1:15 AM Subject: Re: What happens when .....? Normally an MR job is used for batch processing, so I don't think this is a good use case for MR. Since you need to run the program periodically, you cannot submit a single mapreduce job for this. A possible way is to create a cron job to scan the folder size and submit an MR job if necessary. On Wed, Aug 27, 2014 at 7:38 PM, Kandoi, Nikhil nikhil.kan...@emc.com wrote: Hi All, I have a system where files are coming into hdfs at regular intervals and I perform an operation every time the directory size goes above a particular point. My question is, when I submit a map reduce job, would it only work on the files present at that point? Regards, Nikhil Kandoi -- Regards, Stanley Shi,
Re: What happens when .....?
unsubscribe On Thu, Aug 28, 2014 at 6:42 PM, Eric Payne eric.payne1...@yahoo.com wrote: Or, maybe have a look at Apache Falcon (a data management and processing platform): http://falcon.incubator.apache.org/ From: Stanley Shi s...@pivotal.io To: user@hadoop.apache.org Sent: Thursday, August 28, 2014 1:15 AM Subject: Re: What happens when .....? Normally an MR job is used for batch processing, so I don't think this is a good use case for MR. Since you need to run the program periodically, you cannot submit a single mapreduce job for this. A possible way is to create a cron job to scan the folder size and submit an MR job if necessary. On Wed, Aug 27, 2014 at 7:38 PM, Kandoi, Nikhil nikhil.kan...@emc.com wrote: Hi All, I have a system where files are coming into hdfs at regular intervals and I perform an operation every time the directory size goes above a particular point. My question is, when I submit a map reduce job, would it only work on the files present at that point? Regards, Nikhil Kandoi -- Regards, Stanley Shi,
hadoop installation error: localhost: ssh: connect to host localhost port 22: connection refused
Can anyone please help me with this installation error? After I type start-yarn.sh : starting yarn daemons starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-xx.out localhost: ssh: connect to host localhost port 22: connection refused when I ran jps to check, only Jps and ResourceManager were shown - NodeManager, NameNode and DataNode were not shown.
Re: hadoop installation error: localhost: ssh: connect to host localhost port 22: connection refused
try 'ssh localhost' and show the output On Thu, Aug 28, 2014 at 7:55 PM, Li Chen ahli1...@gmail.com wrote: Can anyone please help me with this installation error? After I type start-yarn.sh : starting yarn daemons starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-xx.out localhost: ssh: connect to host localhost port 22: connection refused when I ran jps to check, only Jps and ResourceManager were shown - NodeManager, NameNode and DataNode were not shown.
Re: Need some tutorials for Mapreduce written in Python
Thank you to everyone who responded to this thread. I got a couple of good pointers and some good online courses to explore to get a fundamental understanding of things. Thanks Amar On Thu, Aug 28, 2014 at 10:15 AM, Sriram Balachander sriram.balachan...@gmail.com wrote: Hadoop: The Definitive Guide and Hadoop in Action are good books, and the course on edureka is also good. Regards Sriram On Wed, Aug 27, 2014 at 9:25 PM, thejas prasad thejch...@gmail.com wrote: Are there any books for this as well? On Wed, Aug 27, 2014 at 8:30 PM, Marco Shaw marco.s...@gmail.com wrote: You might want to consider the Hadoop course on udacity.com. I think it provides a decent foundation to Hadoop/MapReduce with a focus on Python (using the streaming API like Sebastiano mentions). Marco On Wed, Aug 27, 2014 at 3:13 PM, Amar Singh amarsingh...@gmail.com wrote: Hi Users, I am new to the big data world and was in the process of reading some material on writing mapreduce using Python. Any links or pointers in that direction will be really helpful.
ApplicationMaster link on cluster web page does not work
Hi

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,/home/hduser/mahout-1.0-snapshot/math/target/*</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>0.0.0.0:8030</value>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>vm38.dbweb.ee:8089</value>
  </property>
</configuration>

When I click the job's ApplicationMaster link (http://vm38.dbweb.ee:8089/proxy/application_1409246753441_0001/) on the cluster's all-applications list I get:

HTTP ERROR 404
Problem accessing /proxy/application_1409246753441_0001/mapreduce. Reason: NOT_FOUND
Powered by Jetty://

I use yarn.web-proxy because without yarn.web-proxy, when I click the ApplicationMaster link I get loads of tcp connections in ESTABLISHED state. In the yarn-proxy log there is nothing interesting. -- Best regards, Margus (Margusja) Roo +372 51 48 780 http://margus.roo.ee http://ee.linkedin.com/in/margusroo skype: margusja ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)
Re: ApplicationMaster link on cluster web page does not work
More information after I started resourcemanager [root@vm38 ~]# /etc/init.d/hadoop-yarn-resourcemanager start Starting Hadoop resourcemanager: [ OK ] and I open cluster web interface there is some tcp connections to 8088: [root@vm38 ~]# netstat -np | grep 8088 tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:61120ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:64412ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:50139ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:64407ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:58817ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:52250ESTABLISHED 15723/java this is ok Now I started new map reduce job and got tracking url http://server:8088/proxy/application_1409250808355_0001/ now there are loads of connections and tracking url does not response: tcp0 0 :::90.190.106.33:46910 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:35984 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:37559 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:44154 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:45417 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:57294 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:47949 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:55330 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp 432 0 :::90.190.106.33:8088 :::90.190.106.33:45467 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:58405 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:55580 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:51992 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:44578 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:38686 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:39242 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:55916 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:48064 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:35148 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:42638 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:50836 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:44789 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:55051 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:39207 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:53781 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:54180 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:56344 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:52707 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:52274 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:45417 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:46627 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:44583 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:40928 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:47014 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:60939 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:52274 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:38391 
:::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:47321 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:36186 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:39160 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:54352
Re: ApplicationMaster link on cluster web page does not work
Moved resourcemanager to another server and it works. I guess I have some network miss routing there :) Best regards, Margus (Margusja) Roo +372 51 48 780 http://margus.roo.ee http://ee.linkedin.com/in/margusroo skype: margusja ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314) On 28/08/14 21:39, Margusja wrote: More information after I started resourcemanager [root@vm38 ~]# /etc/init.d/hadoop-yarn-resourcemanager start Starting Hadoop resourcemanager: [ OK ] and I open cluster web interface there is some tcp connections to 8088: [root@vm38 ~]# netstat -np | grep 8088 tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:61120ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:64412ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:50139ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:64407ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:58817ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::84.50.21.39:52250ESTABLISHED 15723/java this is ok Now I started new map reduce job and got tracking url http://server:8088/proxy/application_1409250808355_0001/ now there are loads of connections and tracking url does not response: tcp0 0 :::90.190.106.33:46910 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:35984 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:37559 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:44154 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:45417 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:57294 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:47949 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:55330 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp 432 0 :::90.190.106.33:8088 :::90.190.106.33:45467 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:58405 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:55580 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:51992 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:44578 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:38686 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:39242 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:55916 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:48064 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:35148 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:42638 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:50836 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:44789 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:55051 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:39207 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:53781 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:54180 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:56344 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:52707 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:52274 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:45417 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:46627 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:44583 ESTABLISHED 15723/java tcp0 0 
:::90.190.106.33:40928 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:47014 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:60939 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:8088 :::90.190.106.33:52274 ESTABLISHED 15723/java tcp0 0 :::90.190.106.33:38391 :::90.190.106.33:8088 ESTABLISHED 15723/java tcp0 0
Job is reported as complete on history server while on console it shows as only half way thru
Hi All, I am running an MRv1 job on a Hadoop YARN 2.3.0 cluster. The problem is that when I submit this job, YARN creates multiple applications for the submitted job, and the last application running in YARN is marked as complete even though on the console it is reported as only 58% complete. I have confirmed that it is also not printing the log statements that it is supposed to print when the job is actually complete. Please see the output from the job submission console below. It just stops at 58%, and the job history server and YARN cluster UI report that this job has already succeeded. 14/08/28 08:36:19 INFO mapreduce.Job: map 54% reduce 0% 14/08/28 08:44:13 INFO mapreduce.Job: map 55% reduce 0% 14/08/28 08:52:16 INFO mapreduce.Job: map 56% reduce 0% 14/08/28 08:59:22 INFO mapreduce.Job: map 57% reduce 0% 14/08/28 09:07:33 INFO mapreduce.Job: map 58% reduce 0% Thanks.
org.apache.hadoop.io.compress.SnappyCodec not found
Hi, I use Hadoop 2.4.1, I got org.apache.hadoop.io.compress.SnappyCodec not found” error: hadoop checknative 14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version 14/08/29 02:54:51 INFO zlib.ZlibFactory: Successfully loaded initialized native-zlib library Native library checking: hadoop: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libhadoop.so zlib: true /lib64/libz.so.1 snappy: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libsnappy.so.1 lz4:true revision:99 bzip2: false (smoke test is ok) bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 30 /tmp/teragenout 14/08/29 07:40:41 INFO mapreduce.Job: Running job: job_1409253811850_0002 14/08/29 07:40:53 INFO mapreduce.Job: Job job_1409253811850_0002 running in uber mode : false 14/08/29 07:40:53 INFO mapreduce.Job: map 0% reduce 0% 14/08/29 07:41:00 INFO mapreduce.Job: map 50% reduce 0% 14/08/29 07:41:01 INFO mapreduce.Job: map 100% reduce 0% 14/08/29 07:41:02 INFO mapreduce.Job: Job job_1409253811850_0002 completed successfully 14/08/29 07:41:02 INFO mapreduce.Job: Counters: 31 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=197312 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=167 HDFS: Number of bytes written=3000 HDFS: Number of read operations=8 HDFS: Number of large read operations=0 HDFS: Number of write operations=4 Job Counters Launched map tasks=2 Other local map tasks=2 Total time spent by all maps in occupied slots (ms)=11925 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=11925 Total vcore-seconds taken by all map tasks=11925 Total megabyte-seconds taken by all map tasks=109900800 Map-Reduce Framework Map input records=30 Map output records=30 Input split bytes=167 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=22 CPU time spent (ms)=1910 Physical memory (bytes) snapshot=357318656 Virtual memory (bytes) snapshot=1691631616 Total committed heap usage (bytes)=401997824 org.apache.hadoop.examples.terasort.TeraGen$Counters CHECKSUM=644086318705578 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=3000 14/08/29 07:41:03 INFO terasort.TeraSort: starting 14/08/29 07:41:03 INFO input.FileInputFormat: Total input paths to process : 2 However I got org.apache.hadoop.io.compress.SnappyCodec not found” when running spark smoke test program: scala inFILE.first() java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:158) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:171) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at 
org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.RDD.take(RDD.scala:983) at org.apache.spark.rdd.RDD.first(RDD.scala:1015) at $iwC$$iwC$$iwC$$iwC.init(console:15) at $iwC$$iwC$$iwC.init(console:20) at $iwC$$iwC.init(console:22) at $iwC.init(console:24) at init(console:26) at .init(console:30) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at
Re: Local file system to access hdfs blocks
Stanley and all, thanks. I will write a client application to explore this path. A quick question again. Using the fsck command, I can retrieve all the necessary info: $ hadoop fsck /tmp/list2.txt -files -blocks -racks . BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8 repl=2 [/default/10.122.195.198:50010, /default/10.122.195.196:50010] However, using getFileBlockLocations(), I can't get the block name/id info, such as BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025; it seems BlockLocation doesn't expose that info publicly: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html Is there another entry point? Something fsck is using? thanks Demai On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi s...@pivotal.io wrote: As far as I know, there's no combination of Hadoop APIs that can do that. You can easily get the location of the block (on which DN), but there's no way to get the local address of that block file. On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni nid...@gmail.com wrote: Yehia, No problem at all. I really appreciate your willingness to help. Yeah, now I am able to get such information through two steps: the first step will be either hadoop fsck or getFileBlockLocations(), and then search the local filesystem; my cluster is using the default from CDH, which is /dfs/dn. I would like to do it programmatically, so I am wondering whether someone has already done it, or maybe, better, a Hadoop API call is already implemented for this exact purpose? Demai On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater y.z.elsha...@gmail.com wrote: Hi Demai, Sorry, I missed that you already tried this out. I think you can construct the block location on the local file system if you have the block pool id and the block id. If you are using the cloudera distribution, the default location is under /dfs/dn (the value of the dfs.data.dir, dfs.datanode.data.dir configuration keys). Thanks Yehia On 27 August 2014 21:20, Yehia Elshater y.z.elsha...@gmail.com wrote: Hi Demai, You can use the fsck utility like the following: hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks This will display all the information you need about the blocks of your file. Hope it helps. Yehia On 27 August 2014 20:18, Demai Ni nid...@gmail.com wrote: Hi, Stanley, Many thanks. Your method works. For now, I can have a two-step approach: 1) getFileBlockLocations to grab the hdfs BlockLocation[] 2) use a local file system call (like the find command) to match the block to files on the local file system. Maybe there is an existing Hadoop API that returns such info already? Demai on the run On Aug 26, 2014, at 9:14 PM, Stanley Shi s...@pivotal.io wrote: I am not sure this is what you want but you can try this shell command: find [DATANODE_DIR] -name [blockname] On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni nid...@gmail.com wrote: Hi, folks, New in this area. Hoping to get a couple of pointers. I am using Centos and have Hadoop set up using cdh5.1 (Hadoop 2.3). I am wondering whether there is an interface to get each HDFS block's information in terms of the local file system. For example, I can use hadoop fsck /tmp/test.txt -files -blocks -racks to get the blockID and its replicas on the nodes, such as: repl=3 [/rack/hdfs01, /rack/hdfs02...] With such info, is there a way to 1) log in to hdfs01 and read the block directly at the local file system level? Thanks Demai on the run -- Regards, Stanley Shi, -- Regards, Stanley Shi,
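For reference, a minimal sketch of step 1 of that two-step approach using the public FileSystem API (Hadoop 2.x; the path is the one from the fsck example above). As noted, BlockLocation only exposes hosts, offsets and lengths, not the block pool/block ID, so matching against the block files under the datanode directory still needs the fsck output or a local find.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/tmp/list2.txt");
FileStatus st = fs.getFileStatus(p);
// One BlockLocation per block: which datanodes hold it and which byte range it covers.
for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
    System.out.println("offset=" + b.getOffset() + " len=" + b.getLength()
        + " hosts=" + Arrays.toString(b.getHosts()));
}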
Re: org.apache.hadoop.io.compress.SnappyCodec not found
Hi, It looks a problem of class path at spark side. Thanks, - Tsuyoshi On Fri, Aug 29, 2014 at 8:49 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I use Hadoop 2.4.1, I got org.apache.hadoop.io.compress.SnappyCodec not found” error: hadoop checknative 14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version 14/08/29 02:54:51 INFO zlib.ZlibFactory: Successfully loaded initialized native-zlib library Native library checking: hadoop: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libhadoop.so zlib: true /lib64/libz.so.1 snappy: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libsnappy.so.1 lz4:true revision:99 bzip2: false (smoke test is ok) bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 30 /tmp/teragenout 14/08/29 07:40:41 INFO mapreduce.Job: Running job: job_1409253811850_0002 14/08/29 07:40:53 INFO mapreduce.Job: Job job_1409253811850_0002 running in uber mode : false 14/08/29 07:40:53 INFO mapreduce.Job: map 0% reduce 0% 14/08/29 07:41:00 INFO mapreduce.Job: map 50% reduce 0% 14/08/29 07:41:01 INFO mapreduce.Job: map 100% reduce 0% 14/08/29 07:41:02 INFO mapreduce.Job: Job job_1409253811850_0002 completed successfully 14/08/29 07:41:02 INFO mapreduce.Job: Counters: 31 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=197312 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=167 HDFS: Number of bytes written=3000 HDFS: Number of read operations=8 HDFS: Number of large read operations=0 HDFS: Number of write operations=4 Job Counters Launched map tasks=2 Other local map tasks=2 Total time spent by all maps in occupied slots (ms)=11925 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=11925 Total vcore-seconds taken by all map tasks=11925 Total megabyte-seconds taken by all map tasks=109900800 Map-Reduce Framework Map input records=30 Map output records=30 Input split bytes=167 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=22 CPU time spent (ms)=1910 Physical memory (bytes) snapshot=357318656 Virtual memory (bytes) snapshot=1691631616 Total committed heap usage (bytes)=401997824 org.apache.hadoop.examples.terasort.TeraGen$Counters CHECKSUM=644086318705578 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=3000 14/08/29 07:41:03 INFO terasort.TeraSort: starting 14/08/29 07:41:03 INFO input.FileInputFormat: Total input paths to process : 2 However I got org.apache.hadoop.io.compress.SnappyCodec not found” when running spark smoke test program: scala inFILE.first() java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:158) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:171) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.RDD.take(RDD.scala:983) at org.apache.spark.rdd.RDD.first(RDD.scala:1015) at $iwC$$iwC$$iwC$$iwC.init(console:15) at $iwC$$iwC$$iwC.init(console:20) at $iwC$$iwC.init(console:22) at $iwC.init(console:24) at init(console:26) at .init(console:30) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796) at
How to retrieve cached data in datanode? (Centralized cache management)
Hello, I have a question about using the cached data in memory via centralized cache management. I cached the data that I want to use through the CLI (hdfs cacheadmin -addDirective ...). Then, when I write my mapreduce application, how can I read the cached data in memory? Here is the source code from my mapreduce application:

System.out.println("Ready for loading customer table from Centralized Cache in DataNode");
System.out.println("Connecting HDFS... at " + hdfsURI.toString());
DFSClient dfs = new DFSClient(hdfsURI, new Configuration());
CacheDirectiveInfo info = new CacheDirectiveInfo.Builder()
    .setPath(new Path("path in HDFS for cached data")).setPool("cache").build();
CacheDirectiveEntry cachedFile = dfs.listCacheDirectives(info).next();
System.out.println("We got cachedFile! ID: " + cachedFile.getInfo().getId()
    + ", Path: " + cachedFile.getInfo().getPath()
    + ", CachedPool: " + cachedFile.getInfo().getPool());
System.out.println("Open DFSInputStream to read cachedFile to ByteBuffer");
DFSInputStream in = dfs.open(cachedFile.getInfo().getPath().toString());
ElasticByteBufferPool bufPool = new ElasticByteBufferPool();
ByteBuffer buf = ByteBuffer.allocate(1);
System.out.println("Generating Off-Heap ByteBuffer! size: " + buf.capacity());
in.read(buf);
buf.flip(); // Flip: ready for reading data after writing data into buffer
System.out.println("Zero-Copying cached file into buffer!");

Is this the right source code for using the centralized cache management feature? Thanks // Yoonmin Nam
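One note on the read path, as a sketch rather than a definitive answer: a client does not need special code to benefit from centralized cache management (an ordinary open/read of a cached, node-local replica already avoids disk), but Hadoop 2.3+ also exposes an explicit zero-copy read through FSDataInputStream and a ByteBufferPool. The path below is a placeholder and error handling is omitted.

import java.nio.ByteBuffer;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ElasticByteBufferPool;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
FSDataInputStream in = fs.open(new Path("/path/in/hdfs/for/cached/data"));
ElasticByteBufferPool pool = new ElasticByteBufferPool();
// Zero-copy read: for a replica that the datanode has cached (mlocked), this can
// mmap the block instead of copying it through a socket; SKIP_CHECKSUMS is allowed
// because cached blocks are verified when they are cached.
ByteBuffer buf = in.read(pool, 4 * 1024 * 1024, EnumSet.of(ReadOption.SKIP_CHECKSUMS));
if (buf != null) {
    try {
        // ... consume buf ...
    } finally {
        in.releaseBuffer(buf);  // hand the buffer back so the stream can unmap or reuse it
    }
}
in.close();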