Re: DataXceiver error processing WRITE_BLOCK operation src: /x.x.x.x:50373 dest: /x.x.x.x:50010
Hi Dhana,

Increase the ulimit for all the datanodes. If you are starting the service as the hadoop user, increase the ulimit value for the hadoop user. Make the change in the following file: /etc/security/limits.conf

Example:
hadoop soft nofile 35000
hadoop hard nofile 35000

Regards, Varun Kumar.P

On Fri, Mar 8, 2013 at 1:15 PM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote: Hi Guys, I am frequently getting this error on my DataNodes. Please guide me on what the exact problem is.

dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50373 dest: /172.16.30.138:50010 java.net.SocketTimeoutException: 7 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.30.138:34280 remote=/172.16.30.140:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:115) at java.io.FilterInputStream.read(FilterInputStream.java:66) at java.io.FilterInputStream.read(FilterInputStream.java:66) at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:160) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:405) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189) at java.lang.Thread.run(Thread.java:662)

dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50531 dest: /172.16.30.138:50010 java.io.EOFException: while trying to read 65563 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:408) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:452) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:511) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:748) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:462) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189) at java.lang.Thread.run(Thread.java:662)

How to resolve this. -Dhanasekaran. Did I learn something today? If not, I wasted it. -- -- Regards, Varun Kumar.P
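A quick way to confirm whether the raised nofile limit is actually in effect for the account that runs the DataNode (a sketch only: the hdfs user name and the PID are placeholders for whatever account and process run your DataNode, and limits.conf changes only apply to sessions started after the edit, so the service needs a restart):

su - hdfs -s /bin/bash -c 'ulimit -n'            # limit a fresh session for the DataNode user would get
grep 'open files' /proc/<datanode-pid>/limits    # limit the already-running DataNode process actually has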
Re: DataXceiver error processing WRITE_BLOCK operation src: /x.x.x.x:50373 dest: /x.x.x.x:50010
Hi Varun,

I believe this is not a ulimit issue. My /etc/security/limits.conf:

# End of file
* - nofile 100
* - nproc 100

Please guide me, guys; I want to fix this. Share your thoughts on the DataXceiver error.

Did I learn something today? If not, I wasted it.

On Fri, Mar 8, 2013 at 3:50 AM, varun kumar varun@gmail.com wrote: Hi Dhana, Increase the ulimit for all the datanodes. If you are starting the service as the hadoop user, increase the ulimit value for the hadoop user. Make the change in the following file: /etc/security/limits.conf Example: hadoop soft nofile 35000, hadoop hard nofile 35000. Regards, Varun Kumar.P

On Fri, Mar 8, 2013 at 1:15 PM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote: Hi Guys, I am frequently getting this error on my DataNodes. Please guide me on what the exact problem is. dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50373 dest: /172.16.30.138:50010 java.net.SocketTimeoutException: 7 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.30.138:34280 remote=/172.16.30.140:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:115) at java.io.FilterInputStream.read(FilterInputStream.java:66) at java.io.FilterInputStream.read(FilterInputStream.java:66) at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:160) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:405) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189) at java.lang.Thread.run(Thread.java:662) dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50531 dest: /172.16.30.138:50010 java.io.EOFException: while trying to read 65563 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:408) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:452) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:511) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:748) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:462) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189) at java.lang.Thread.run(Thread.java:662) How to resolve this. -Dhanasekaran. Did I learn something today? If not, I wasted it. -- -- Regards, Varun Kumar.P
Re: fsimage.ckpt are not deleted - Exception in doCheckpoint
I have met this exception too. The new fsimage produced by the SNN could not be transferred to the NN. My HDFS version is 2.0.0. Does anyone know how to fix it?

@Regards Elmar: The new fsimage has been created successfully, but it could not be transferred to the NN, so the old fsimage.ckpt was not deleted. I have tried the new fsimage: starting up the cluster with the new fsimage and the new edits in progress works, with no data lost.

2013/3/6, Elmar Grote elmar.gr...@optivo.de: Hi, we are writing our fsimage and edits files on the namenode and secondary namenode and additionally on an NFS share. In these folders we found a lot of fsimage.ckpt_0 . files, the oldest is from 9. Aug 2012. As far as I know these files should be deleted after the secondary namenode creates the new fsimage file. I looked in our log files from the namenode and secondary namenode to see what happened at that time. As an example I searched for this file: 20. Feb 04:02 fsimage.ckpt_00726216952

In the namenode log I found this:
2013-02-20 04:02:51,404 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Input/output error
2013-02-20 04:02:51,409 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Input/output error

In the secondary namenode I think this is the relevant part:
2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Image has not changed. Will not download image.
2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?getedit=1startTxId=726172233endTxId=726216952storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file edits_00726172233-00726216952 size 6881797 bytes.
2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to load edits from 1 stream(s).
2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_00726172233-00726216952 expecting start txid #726172233
2013-02-20 04:01:16,987 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_00726172233-00726216952 of size 6881797 edits # 44720 loaded in 0 seconds.
2013-02-20 04:01:18,023 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_00726216952 using no compression
2013-02-20 04:01:18,031 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_00726216952 using no compression
2013-02-20 04:01:40,854 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 22 seconds.
2013-02-20 04:01:50,762 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 32 seconds.
2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid = 726172232 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_00726121750, cpktTxId=00726121750) 2013-02-20 04:01:51,000 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage_00726121750, cpktTxId=00726121750) 2013-02-20 04:01:51,379 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than 725172233 2013-02-20 04:01:51,381 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than 725172233 2013-02-20 04:01:51,400 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?putimage=1txid=726216952port=50090storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4 2013-02-20 04:02:51,411 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at http://s_namenode.domain.local:50070/getimage?putimage=1txid=726216952port=50090storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4 failed with status code 410 Response message: GetImage failed. java.io.IOException: Input/output error at sun.nio.ch.FileChannelImpl.force0(Native Method) at
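The trailing stack frame in the 410 response (sun.nio.ch.FileChannelImpl.force0) suggests the NameNode hit the I/O error while fsyncing the uploaded image into one of its image directories. A quick way to probe each image directory for I/O trouble (a sketch; the path is a placeholder, use the dfs.name.dir entries from your own hdfs-site.xml, including the NFS-backed one):

dd if=/dev/zero of=/path/to/dfs.name.dir/current/.write_test bs=1M count=8 oflag=sync   # forces synced writes, like the image save does
rm /path/to/dfs.name.dir/current/.write_test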
Need info on mapred.child.java.opts, mapred.map.child.java.opts and mapred.reduce.child.java.opts
Hi,

While reading about the important Hadoop configuration properties, I ran into a doubt regarding the Java heap space properties for the child tasks. As I understand it, mapred.child.java.opts is the overall heap size allocated to any task (map or reduce). When we then set mapred.map.child.java.opts and mapred.reduce.child.java.opts separately, do they override mapred.child.java.opts?

For example, if I have the following configuration:

mapred.child.java.opts = -Xmx1g
mapred.map.child.java.opts = -Xmx2g
mapred.reduce.child.java.opts = -Xmx512m

then how exactly is the memory allocation distributed between map and reduce? Does my mapper get more than the overall heap space specified, or is it restricted to 1g? Can someone help me understand this concept? Also, what other heap-space-related properties can we use with the above, and how?

Thanks, Gaurav
Re: Job driver and 3rd party jars
Still doesn't work. Does this work for you? Can you upload some working example so I can verify I didn't miss something?

On Fri, Mar 8, 2013 at 9:15 AM, 刘晓文 lxw1...@qq.com wrote: try: hadoop jar -Dmapreduce.task.classpath.user.precedence=true -libjars your_jar -- Original -- From: Barak Yaish barak.ya...@gmail.com; Date: Fri, Mar 8, 2013 03:06 PM To: user user@hadoop.apache.org; Subject: Re: Job driver and 3rd party jars

Yep, my typo, I'm using the latter. I was also trying export HADOOP_CLASSPATH_USER_FIRST=true and export HADOOP_CLASSPATH=myjar before launching the hadoop jar, but I'm still getting the same exception. I'm running hadoop 1.0.4.

On Mar 8, 2013 2:27 AM, Harsh J ha...@cloudera.com wrote: To be precise, did you use -libjar or -libjars? The latter is the right option.

On Fri, Mar 8, 2013 at 12:18 AM, Barak Yaish barak.ya...@gmail.com wrote: Hi, I'm able to run M/R jobs where the mapper and reducer require 3rd party jars. I'm registering those jars in -libjar while invoking the hadoop jar command. I'm facing a strange problem, though, when the job driver itself (extends Configured implements Tool) needs to run such code (for example to notify some remote service upon start and end). Is there a way to configure the classpath when submitting jobs using hadoop jar? It seems like -libjar doesn't work for this case...

Exception in thread "main" java.lang.NoClassDefFoundError: com/me/context/DefaultContext at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632) at java.lang.ClassLoader.defineClass(ClassLoader.java:616) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) at com.peer39.bigdata.mr.pnm.PnmDataCruncher.run(PnmDataCruncher.java:50) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at com.me.mr.pnm.PnmMR.main(PnmDataCruncher.java:261) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.ClassNotFoundException: com.me.context.DefaultContext at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:248)

-- Harsh J
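For reference, here is a sketch of the combination that usually covers both sides of this split; the jar path and job arguments are placeholders, not taken from the thread. -libjars only feeds the task (mapper/reducer) classpath, while HADOOP_CLASSPATH is picked up by the client JVM that hadoop jar starts, which is where the driver class itself runs:

# make the 3rd-party classes visible to the driver JVM launched by hadoop jar
export HADOOP_CLASSPATH=/path/to/thirdparty.jar
# -libjars (parsed by ToolRunner/GenericOptionsParser) ships the same jar to the map/reduce tasks
hadoop jar myjob.jar com.me.mr.pnm.PnmMR -libjars /path/to/thirdparty.jar <job args>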
Re: OutOfMemory during Plain Java MapReduce
I posted this question to stackoverflow also: http://stackoverflow.com/questions/15292061/how-to-implement-a-java-mapreduce-that-produce-output-values-large-then-the-maxi Best Regards, Christian. 2013/3/8 Christian Schneider cschneiderpub...@gmail.com I had a look to the stacktrace and it says the problem is at the reducer: userSet.add(iterator.next().toString()); Error: Java heap space attempt_201303072200_0016_r_02_0: WARN : mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - slave.host.name is deprecated. Instead, use dfs.datanode.hostname attempt_201303072200_0016_r_02_0: FATAL: org.apache.hadoop.mapred.Child - Error running child : java.lang.OutOfMemoryError: Java heap space attempt_201303072200_0016_r_02_0: at java.util.Arrays.copyOfRange(Arrays.java:3209) attempt_201303072200_0016_r_02_0: at java.lang.String.init(String.java:215) attempt_201303072200_0016_r_02_0: at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542) attempt_201303072200_0016_r_02_0: at java.nio.CharBuffer.toString(CharBuffer.java:1157) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:394) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:371) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.toString(Text.java:273) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer .reduce(RankingReducer.java:1) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268) attempt_201303072200_0016_r_02_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201303072200_0016_r_02_0: at javax.security.auth.Subject.doAs(Subject.java:396) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child.main(Child.java:262) But how to solve this? 2013/3/7 Christian Schneider cschneiderpub...@gmail.com Hi, during the Reduce phase or afterwards (i don't really know how to debug it) I get a heap out of Memory Exception. I guess this is because the value of the reduce task (a Custom Writable) holds a List with a lot of user ids. The Setup is quite simple. 
These are the related classes I used:

//---
// The Reducer
// It just adds all userIds of the Iterable to the UserSetWritable
//---
public class UserToAppReducer extends Reducer<Text, Text, Text, UserSetWritable> {

    @Override
    protected void reduce(final Text appId, final Iterable<Text> userIds, final Context context) throws IOException, InterruptedException {
        final UserSetWritable userSet = new UserSetWritable();
        final Iterator<Text> iterator = userIds.iterator();
        while (iterator.hasNext()) {
            userSet.add(iterator.next().toString());
        }
        context.write(appId, userSet);
    }
}

//---
// The Custom Writable
// Needed to implement an own toString() method to bring the output into the right format. Maybe I can do this with an own OutputFormat class as well.
//---
public class UserSetWritable implements Writable {
    private final Set<String> userIds = new HashSet<String>();

    public void add(final String userId) {
        this.userIds.add(userId);
    }

    @Override
    public void write(final DataOutput out) throws IOException {
        out.writeInt(this.userIds.size());
        for (final String userId : this.userIds) {
            out.writeUTF(userId);
        }
    }

    @Override
    public void readFields(final DataInput in) throws IOException {
        final int size = in.readInt();
        for (int i = 0; i < size; i++) {
            final String readUTF = in.readUTF();
            this.userIds.add(readUTF);
        }
    }

    @Override
    public String toString() {
        String result = "";
        for (final String userId : this.userIds) {
            result += userId + "\t";
        }
        result += this.userIds.size();
        return result;
    }
}

As OutputFormat I used the default TextOutputFormat. A potential problem could be that a reduce is going to write files > 600MB and our mapred.child.java.opts is set to ~380MB. I dug deeper into the
Submit RHadoop job using Oozie in Cloudera Manager
Hi All, I am using Cloudera Manager 4.5. As of now I can submit MR jobs using Oozie. Can we submit RHadoop jobs using Oozie in Cloudera Manager?
Re: Submit RHadoop job using Oozie in Cloudera Manager
Hi, Do you have the rmr and rhdfs packages installed on all nodes? To Hadoop it doesn't matter what type of job it is, as long as the libraries it needs to run are in the cluster. Submitting any job would be fine. Thanks On Fri, Mar 8, 2013 at 9:46 PM, rohit sarewar rohitsare...@gmail.com wrote: Hi All, I am using Cloudera Manager 4.5. As of now I can submit MR jobs using Oozie. Can we submit RHadoop jobs using Oozie in Cloudera Manager?
Re: Submit RHadoop job using Oozie in Cloudera Manager
Hi, I have the R and RHadoop packages installed on all the nodes, and I can submit RMR jobs manually from the terminal. I just want to know how to submit RMR jobs from the Oozie web interface. -Rohit On Fri, Mar 8, 2013 at 4:18 PM, Jagat Singh jagatsi...@gmail.com wrote: Hi, Do you have the rmr and rhdfs packages installed on all nodes? To Hadoop it doesn't matter what type of job it is, as long as the libraries it needs to run are in the cluster. Submitting any job would be fine. Thanks On Fri, Mar 8, 2013 at 9:46 PM, rohit sarewar rohitsare...@gmail.com wrote: Hi All, I am using Cloudera Manager 4.5. As of now I can submit MR jobs using Oozie. Can we submit RHadoop jobs using Oozie in Cloudera Manager?
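The thread does not show a concrete workflow, so treat the following as an assumption-laden sketch rather than a confirmed recipe: one common pattern is an Oozie shell action that invokes Rscript, which only works if R, rmr and rhdfs are installed on every node (as discussed above). The workflow name and the script name my_rmr_job.R are placeholders.

<workflow-app name="rmr-example" xmlns="uri:oozie:workflow:0.2">
  <start to="run-r"/>
  <action name="run-r">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>Rscript</exec>
      <argument>my_rmr_job.R</argument>
      <!-- ship the script from the workflow directory on HDFS to the node running the action -->
      <file>my_rmr_job.R</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>RMR job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>

The R script then launches its map/reduce work through rmr2 exactly as it would from the terminal; Oozie only schedules and monitors the shell step.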
Re: OutOfMemory during Plain Java MapReduce
Hi, When you implement code that starts memory-storing value copies for every record (even if of just a single key), things are going to break in big-data-land. Practically, post-partitioning, the # of values for a given key can be huge given the source data, so you cannot hold it all in and then write in one go. You'd probably need to write out something continuously if you really really want to do this, or use an alternative form of key-value storage where updates can be made incrementally (Apache HBase is such a store, as one example). This has been discussed before IIRC, and if the goal were to store the outputs onto a file then its better to just directly serialize them with a file opened instead of keeping it in a data structure and serializing it at the end. The caveats that'd apply if you were to open your own file from a task are described at http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F. On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider cschneiderpub...@gmail.com wrote: I had a look to the stacktrace and it says the problem is at the reducer: userSet.add(iterator.next().toString()); Error: Java heap space attempt_201303072200_0016_r_02_0: WARN : mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - slave.host.name is deprecated. Instead, use dfs.datanode.hostname attempt_201303072200_0016_r_02_0: FATAL: org.apache.hadoop.mapred.Child - Error running child : java.lang.OutOfMemoryError: Java heap space attempt_201303072200_0016_r_02_0: at java.util.Arrays.copyOfRange(Arrays.java:3209) attempt_201303072200_0016_r_02_0: at java.lang.String.init(String.java:215) attempt_201303072200_0016_r_02_0: at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542) attempt_201303072200_0016_r_02_0: at java.nio.CharBuffer.toString(CharBuffer.java:1157) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:394) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:371) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.toString(Text.java:273) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268) attempt_201303072200_0016_r_02_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201303072200_0016_r_02_0: at javax.security.auth.Subject.doAs(Subject.java:396) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child.main(Child.java:262) But how to solve this? 
2013/3/7 Christian Schneider cschneiderpub...@gmail.com Hi, during the Reduce phase or afterwards (i don't really know how to debug it) I get a heap out of Memory Exception. I guess this is because the value of the reduce task (a Custom Writable) holds a List with a lot of user ids. The Setup is quite simple. This are the related classes I used: //--- // The Reducer // It just add all userIds of the Iterable to the UserSetWriteAble //--- public class UserToAppReducer extends ReducerText, Text, Text, UserSetWritable { @Override protected void reduce(final Text appId, final IterableText userIds, final Context context) throws IOException, InterruptedException { final UserSetWritable userSet = new UserSetWritable(); final IteratorText iterator = userIds.iterator(); while (iterator.hasNext()) { userSet.add(iterator.next().toString()); } context.write(appId, userSet); } } //--- // The Custom Writable // Needed to implement a own toString Method bring the output into the right format. Maybe i can to this also with a own OutputFormat class. //--- public class UserSetWritable implements Writable { private final SetString userIds = new HashSetString(); public void add(final String userId) { this.userIds.add(userId);
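As a concrete illustration of Harsh's suggestion above to "write out something continuously" (a minimal sketch only; the class name is my own and not from the original job), the reducer can emit one record per (app, user) pair as it iterates, so the full user set for a key never has to fit in the heap:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Emits each user id as its own output record instead of collecting the whole
// set in a UserSetWritable, so memory use stays roughly constant per key.
public class UserToAppStreamingReducer extends Reducer<Text, Text, Text, Text> {

    private final Text count = new Text();

    @Override
    protected void reduce(final Text appId, final Iterable<Text> userIds, final Context context)
            throws IOException, InterruptedException {
        long n = 0;
        for (final Text userId : userIds) {
            // written immediately; the reused Text instance is never stored
            context.write(appId, userId);
            n++;
        }
        // the per-app total becomes one extra record rather than the suffix of a giant line
        count.set(Long.toString(n));
        context.write(appId, count);
    }
}

Note that, unlike the HashSet-based UserSetWritable, this does not de-duplicate user ids; if de-duplication matters it has to happen earlier (for example in a preceding job) or in whatever consumes the output. If the final format really must be one tab-separated line per app, a second pass over this output can do the joining without ever holding a whole set in memory.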
Re: Need info on mapred.child.java.opts, mapred.map.child.java.opts and mapred.reduce.child.java.opts
Its easier to understand if you know the history. First there was just mapred.child.java.opts, which controlled java options for both Map and Reduce tasks (i.e. all tasks). Then there came a need for task-specific java opts, so the project introduced mapred.map.child.java.opts and mapred.reduce.child.java.opts, while keeping around mapred.child.java.opts. Hence, if mapred.map.child.java.opts is present, it is preferred over the mapred.child.java.opts, likewise for mapred.reduce.child.java.opts vs. mapred.child.java.opts. If neither of the specifics is present, we look for mapred.child.java.opts. P.s. Please do not cross-post to multiple email lists; it is a bad practice and potentially spawns two different diverging conversation threads on the same topic. This question is apt-enough for user@hadoop.apache.org alone as it is not CDH specific, so I've moved cdh-u...@cloudera.org to bcc. On Fri, Mar 8, 2013 at 3:41 PM, Gaurav Dasgupta gdsay...@gmail.com wrote: Hi, While I was reading about the important Hadoop configuration properties, I came across a state of doubt regarding the Java heap space properties for the child tasks. According to my understanding, mapred.child.java.opts is the overall heap size allocated to any task (map or reduce). Then when we are setting mapred.map.child.java.opts and mapred.reduce.child.java.opts separately, are they overriding the mapred.child.java.opts? For example, if I have the following configuration: mapred.child.java.opts = -Xmx1g mapred.map.child.java.opts = -Xmx2g mapred.reduce.child.java.opts = -Xmx512m Then how exactly the memory allocation is getting distributed between map and reduce? My mapper gets more than the overall heap space as specified or it is restricted to 1g? Can some one help me understand this concept? Also, what are the other heap space related properties which we can use with the above and how? Thanks, Gaurav -- Harsh J
Re: Need info on mapred.child.java.opts, mapred.map.child.java.opts and mapred.reduce.child.java.opts
Thanks for replying, Harsh. So it means that with my configuration, mapred.child.java.opts = -Xmx1g will be ignored completely, mapred.map.child.java.opts = -Xmx2g will be used for map tasks, and mapred.reduce.child.java.opts = -Xmx512m will be used for reduce tasks. Right? Thanks, Gaurav
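For illustration only, here are the example values from this thread written out as mapred-site.xml entries (assuming they are set cluster-wide rather than per job), showing the precedence Harsh describes:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1g</value>      <!-- fallback, used only when no task-specific opts exist -->
</property>
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx2g</value>      <!-- wins for map tasks -->
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx512m</value>    <!-- wins for reduce tasks -->
</property>

The same properties can also be set per job through the job configuration (for example via -D options on a ToolRunner-based driver); job-level values win unless the cluster marks a property final.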
Re: error while running reduce
How can I fix this? When I run the same job with 1GB of input with 1 map and 1 reducer, it works fine.

On Thu, Mar 7, 2013 at 11:14 PM, Jagmohan Chauhan simplefundumn...@gmail.com wrote: Hi, I think the problem is the replication factor. As you are using a replication factor of 1 and you have a single node, the data cannot be replicated anywhere else.

On Thu, Mar 7, 2013 at 4:31 AM, Arindam Choudhury arindamchoudhu...@gmail.com wrote: Hi, I am trying to do a performance analysis of hadoop on a virtual machine. When I try to run terasort with 2GB of input data with 1 map and 1 reduce, the map finishes properly, but the reduce gives an error. I cannot understand why. Any help? I have a single-node hadoop deployment in a virtual machine. The F18 virtual machine has 1 core and 2 GB of memory.

My configuration:

core-site.xml
<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://hadoopa.arindam.com:54310</value>
 </property>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/${user.name}</value>
 </property>
 <property>
  <name>fs.inmemory.size.mb</name>
  <value>20</value>
 </property>
 <property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
 </property>
</configuration>

hdfs-site.xml
<configuration>
 <property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/hadoop-dir/name-dir</value>
 </property>
 <property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/hadoop-dir/data-dir</value>
 </property>
 <property>
  <name>dfs.block.size</name>
  <value>204800</value>
  <final>true</final>
 </property>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
</configuration>

mapred-site.xml
<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>hadoopa.arindam.com:54311</value>
 </property>
 <property>
  <name>mapred.system.dir</name>
  <value>/home/hadoop/hadoop-dir/system-dir</value>
 </property>
 <property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/hadoop-dir/local-dir</value>
 </property>
 <property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1024M</value>
 </property>
 <property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx1024M</value>
 </property>
</configuration>

I created 2GB of data to run terasort.

hadoop dfsadmin -report
Configured Capacity: 21606146048 (20.12 GB)
Present Capacity: 14480427242 (13.49 GB)
DFS Remaining: 12416368640 (11.56 GB)
DFS Used: 2064058602 (1.92 GB)
DFS Used%: 14.25%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.122.32:50010
Decommission Status : Normal
Configured Capacity: 21606146048 (20.12 GB)
DFS Used: 2064058602 (1.92 GB)
Non DFS Used: 7125718806 (6.64 GB)
DFS Remaining: 12416368640 (11.56 GB)
DFS Used%: 9.55%
DFS Remaining%: 57.47%

But when I run the terasort, I am getting the following error:

13/03/04 17:56:16 INFO mapred.JobClient: Task Id : attempt_201303041741_0002_r_00_0, Status : FAILED
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/output/_temporary/_attempt_201303041741_0002_r_00_0/part-0 could only be replicated to 0 nodes, instead of 1

hadoop dfsadmin -report
Configured Capacity: 21606146048 (20.12 GB)
Present Capacity: 10582014209 (9.86 GB)
DFS Remaining: 8517738496 (7.93 GB)
DFS Used: 2064275713 (1.92 GB)
DFS Used%: 19.51%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
-
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.122.32:50010
Decommission Status : Normal
Configured Capacity: 21606146048 (20.12 GB)
DFS Used: 2064275713 (1.92 GB)
Non DFS Used: 11024131839 (10.27 GB)
DFS Remaining: 8517738496 (7.93 GB)
DFS Used%: 9.55%
DFS Remaining%: 39.42%

Thanks,

-- Thanks and Regards Jagmohan Chauhan MSc student, CS Univ.
of Saskatchewan IEEE Graduate Student Member http://homepage.usask.ca/~jac735/
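Two quick checks that may help narrow down a "could only be replicated to 0 nodes" failure on a single-node setup like this (a suggestion of mine, not something already proposed in the thread; the paths are the dfs.data.dir and output directory from the messages above):

df -h /home/hadoop/hadoop-dir/data-dir            # physical space left on the disk the DataNode writes blocks to
hadoop fsck /user/hadoop/output -files -blocks    # shows what, if anything, of the job output actually reached HDFS

For what it's worth, the second dfsadmin report shows Non DFS Used growing from 6.64 GB to 10.27 GB while the job runs, so local, non-HDFS files (for example the map spill in mapred.local.dir) may be filling the same disk the DataNode needs for block storage.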
Re: [Hadoop-Help]About Map-Reduce implementation
Hi Mayur, Take a look here: http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process. = SingleNode. So you can only use the Fully-Distributed mode. JM 2013/3/8 Mayur Patil ram.nath241...@gmail.com: Hello, Thank you sir for your favorable reply. I am going to use 1master and 2 worker nodes ; totally 3 nodes. Thank you !! -- Cheers, Mayur On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mayur, Those 3 modes are 3 differents ways to use Hadoop, however, the only production mode here is the fully distributed one. The 2 others are more for local testing. How many nodes are you expecting to use hadoop on? JM 2013/3/7 Mayur Patil ram.nath241...@gmail.com: Hello, Now I am slowly understanding Hadoop working. As I want to collect the logs from three machines including Master itself . My small query is which mode should I implement for this?? Standalone Operation Pseudo-Distributed Operation Fully-Distributed Operation Seeking for guidance, Thank you !! -- Cheers, Mayur Hi mayur, Flume is used for data collection. Pig is used for data processing. For eg, if you have a bunch of servers that you want to collect the logs from and push to HDFS - you would use flume. Now if you need to run some analysis on that data, you could use pig to do that. Sent from my iPhone On Feb 14, 2013, at 1:39 AM, Mayur Patil ram.nath241...@gmail.com wrote: Hello, I just read about Pig Pig A data flow language and execution environment for exploring very large datasets. Pig runs on HDFS and MapReduce clusters. What the actual difference between Pig and Flume makes in logs clustering?? Thank you !! -- Cheers, Mayur. Hey Mayur, If you are collecting logs from multiple servers then you can use flume for the same. if the contents of the logs are different in format then you can just use textfileinput format to read and write into any other format you want for your processing in later part of your projects first thing you need to learn is how to setup hadoop then you can try writing sample hadoop mapreduce jobs to read from text file and then process them and write the results into another file then you can integrate flume as your log collection mechanism once you get hold on the system then you can decide more on which paths you want to follow based on your requirements for storage, compute time, compute capacity, compression etc -- -- Hi, Please read basics on how hadoop works. Then start your hands on with map reduce coding. The tool which has been made for you is flume , but don't see tool till you complete above two steps. Good luck , keep us posted. Regards, Jagat Singh --- Sent from Mobile , short and crisp. On 06-Feb-2013 8:32 AM, Mayur Patil ram.nath241...@gmail.com wrote: Hello, I am new to Hadoop. I am doing a project in cloud in which I have to use hadoop for Map-reduce. It is such that I am going to collect logs from 2-3 machines having different locations. The logs are also in different formats such as .rtf .log .txt Later, I have to collect and convert them to one format and collect to one location. So I am asking which module of Hadoop that I need to study for this implementation?? Or whole framework should I need to study ?? Seeking for guidance, Thank you !! -- Cheers, Mayur. -- Cheers, Mayur.
Re: fsimage.ckpt are not deleted - Exception in doCheckpoint
Hi Yifan, thank you for the answer. But as far as i understand the SN downloads the fsimage and edits files from NN, build the new fsimage in uploads it to the NN. So here the upload didn't work. The next time the creation starts there is the old fsimage on the NN. But what about the edits files ? Are the old ones still there? Or where they deleted during the not working upload of the fsimage? If they where deleted the are missing and there should be a loss or inconsistence of data. Or am i wrong? When will the edits files be deleted? After a successful upload or before? Regards Elmar _ From: Yifan Du [mailto:duyifa...@gmail.com] To: user@hadoop.apache.org Sent: Fri, 08 Mar 2013 11:08:09 +0100 Subject: Re: fsimage.ckpt are not deleted - Exception in doCheckpoint I have met this exception too. The new fsimage played by SNN could not be transfered to NN. My hdfs version is 2.0.0. did anyone know how to fix it? @Regards Elmar The new fsimage has been created successfully. But it could not be transfered to NN,so the old fsimage.ckpt not deleted. I have tried the new fsimage. Startup the cluster with the new fsimage and new edits in progress. It's successfully and no data lost. 2013/3/6, Elmar Grote elmar.gr...@optivo.de: Hi, we are writing our fsimage and edits file on the namenode and secondary namenode and additional on a nfs share. In these folders we found a a lot of fsimage.ckpt_0 . files, the oldest is from 9. Aug 2012. As far a i know these files should be deleted after the secondary namenodes creates the new fsimage file. I looked in our log files from the namenode and secondary namenode to see what happen at that time. As example i searched for this file: 20. Feb 04:02 fsimage.ckpt_00726216952 In the namenode log i found this: 2013-02-20 04:02:51,404 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Input/output error 2013-02-20 04:02:51,409 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Input/output error In the secondary namenode i think this is the relevant part: 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Image has not changed. Will not download image. 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?getedit=1startTxId=726172233endTxId=726216952storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file edits_00726172233-00726216952 size 6881797 bytes. 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to load edits from 1 stream(s). 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_00726172233-00726216952 expecting start txid #726172233 2013-02-20 04:01:16,987 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_00726172233-00726216952 of size 6881797 edits # 44720 loaded in 0 seconds. 
2013-02-20 04:01:18,023 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_00726216952 using no compression 2013-02-20 04:01:18,031 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_00726216952 using no compression 2013-02-20 04:01:40,854 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 22 seconds. 2013-02-20 04:01:50,762 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 32 seconds. 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid = 726172232 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_00726121750, cpktTxId=00726121750) 2013-02-20 04:01:51,000 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage_00726121750, cpktTxId=00726121750) 2013-02-20 04:01:51,379 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than
Re: reg memory allocation failed
Hi Manoj, It's related to your JVM. Which version are you using? JM 2013/3/8 Manoj Babu manoj...@gmail.com: Team, I am getting this issue when the reducers start executing after the maps have completed. After the job fails with the exception below, it runs fine when I restart it. We started getting this issue after upgrading to MRv1 (CDH4). Any inputs will be helpful; thanks in advance. # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 32 bytes for CHeapObj-new # An error report file with more information is saved as: # /data/9/mapred/local/taskTracker/alum/jobcache/job_201303021416_2999/attempt_201303021416_2999_r_02_0/work/hs_err_pid948.log Cheers! Manoj.
Re: reg memory allocation failed
Hi, I am using version 1.6. Cheers! Manoj. On Fri, Mar 8, 2013 at 7:32 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, It's related to your JVM. Which version are you using? JM 2013/3/8 Manoj Babu manoj...@gmail.com: Team, I am getting this issue when reducer starts executing after map's completed. After job failed with below exception when i restart its running fine. We are getting this issue after upgrading to MRv1(CDH4). Any inputs will be more helpful, Thanks in advance. # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 32 bytes for CHeapObj-new # An error report file with more information is saved as: # /data/9/mapred/local/taskTracker/alum/jobcache/job_201303021416_2999/attempt_201303021416_2999_r_02_0/work/hs_err_pid948.log Cheers! Manoj.
Re: reg memory allocation failed
Hi Manoj, Oracle 1.6? OpenJDK 1.6? Which 1.6 release? The 24? What is java -version giving you? 2013/3/8 Manoj Babu manoj...@gmail.com: Hi, I am using version 1.6. Cheers! Manoj. On Fri, Mar 8, 2013 at 7:32 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, It's related to your JVM. Which version are you using? JM 2013/3/8 Manoj Babu manoj...@gmail.com: Team, I am getting this issue when reducer starts executing after map's completed. After job failed with below exception when i restart its running fine. We are getting this issue after upgrading to MRv1(CDH4). Any inputs will be more helpful, Thanks in advance. # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 32 bytes for CHeapObj-new # An error report file with more information is saved as: # /data/9/mapred/local/taskTracker/alum/jobcache/job_201303021416_2999/attempt_201303021416_2999_r_02_0/work/hs_err_pid948.log Cheers! Manoj.
Re: reg memory allocation failed
Hi Jean, Java(TM) SE Runtime Environment (build pxa6460sr10fp1-20120321_01(SR10 FP1)) IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr10fp1-20120202_101568 (JIT enabled, AOT enabled) J9VM - 20120202_101568 JIT - r9_2007_21307ifx1 GC - 20120202_AA) JCL - 20120320_01 Thanks in advance. Cheers! Manoj. On Fri, Mar 8, 2013 at 7:48 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, Oracle 1.6? OpenJDK 1.6? Which 1.6 release? The 24? What is java -version giving you? 2013/3/8 Manoj Babu manoj...@gmail.com: Hi, I am using version 1.6. Cheers! Manoj. On Fri, Mar 8, 2013 at 7:32 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, It's related to your JVM. Which version are you using? JM 2013/3/8 Manoj Babu manoj...@gmail.com: Team, I am getting this issue when reducer starts executing after map's completed. After job failed with below exception when i restart its running fine. We are getting this issue after upgrading to MRv1(CDH4). Any inputs will be more helpful, Thanks in advance. # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 32 bytes for CHeapObj-new # An error report file with more information is saved as: # /data/9/mapred/local/taskTracker/alum/jobcache/job_201303021416_2999/attempt_201303021416_2999_r_02_0/work/hs_err_pid948.log Cheers! Manoj.
Re: OutOfMemory during Plain Java MapReduce
A potential problem could be, that a reduce is going to write files 600MB and our mapred.child.java.opts is set to ~380MB. Isn't the minimum heap normally 512MB? Why not just increase your child heap size, assuming you have enough memory on the box... On Mar 8, 2013, at 4:57 AM, Harsh J ha...@cloudera.com wrote: Hi, When you implement code that starts memory-storing value copies for every record (even if of just a single key), things are going to break in big-data-land. Practically, post-partitioning, the # of values for a given key can be huge given the source data, so you cannot hold it all in and then write in one go. You'd probably need to write out something continuously if you really really want to do this, or use an alternative form of key-value storage where updates can be made incrementally (Apache HBase is such a store, as one example). This has been discussed before IIRC, and if the goal were to store the outputs onto a file then its better to just directly serialize them with a file opened instead of keeping it in a data structure and serializing it at the end. The caveats that'd apply if you were to open your own file from a task are described at http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F. On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider cschneiderpub...@gmail.com wrote: I had a look to the stacktrace and it says the problem is at the reducer: userSet.add(iterator.next().toString()); Error: Java heap space attempt_201303072200_0016_r_02_0: WARN : mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - slave.host.name is deprecated. 
Instead, use dfs.datanode.hostname attempt_201303072200_0016_r_02_0: FATAL: org.apache.hadoop.mapred.Child - Error running child : java.lang.OutOfMemoryError: Java heap space attempt_201303072200_0016_r_02_0: at java.util.Arrays.copyOfRange(Arrays.java:3209) attempt_201303072200_0016_r_02_0: at java.lang.String.init(String.java:215) attempt_201303072200_0016_r_02_0: at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542) attempt_201303072200_0016_r_02_0: at java.nio.CharBuffer.toString(CharBuffer.java:1157) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:394) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:371) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.toString(Text.java:273) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268) attempt_201303072200_0016_r_02_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201303072200_0016_r_02_0: at javax.security.auth.Subject.doAs(Subject.java:396) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child.main(Child.java:262) But how to solve this? 2013/3/7 Christian Schneider cschneiderpub...@gmail.com Hi, during the Reduce phase or afterwards (i don't really know how to debug it) I get a heap out of Memory Exception. I guess this is because the value of the reduce task (a Custom Writable) holds a List with a lot of user ids. The Setup is quite simple. This are the related classes I used: //--- // The Reducer // It just add all userIds of the Iterable to the UserSetWriteAble //--- public class UserToAppReducer extends ReducerText, Text, Text, UserSetWritable { @Override protected void reduce(final Text appId, final IterableText userIds, final Context context) throws IOException, InterruptedException { final UserSetWritable userSet = new UserSetWritable(); final IteratorText iterator = userIds.iterator(); while (iterator.hasNext()) { userSet.add(iterator.next().toString()); } context.write(appId, userSet); } } //--- // The Custom Writable // Needed to
Re: reg memory allocation failed
Hi Manoj, Do you have the required rights to test with another JVM? Can you test the Oracle JVM Java SE 6 Update 43? JM 2013/3/8 Manoj Babu manoj...@gmail.com: Hi Jean, Java(TM) SE Runtime Environment (build pxa6460sr10fp1-20120321_01(SR10 FP1)) IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr10fp1-20120202_101568 (JIT enabled, AOT enabled) J9VM - 20120202_101568 JIT - r9_2007_21307ifx1 GC - 20120202_AA) JCL - 20120320_01 Thanks in advance. Cheers! Manoj. On Fri, Mar 8, 2013 at 7:48 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, Oracle 1.6? OpenJDK 1.6? Which 1.6 release? The 24? What is java -version giving you? 2013/3/8 Manoj Babu manoj...@gmail.com: Hi, I am using version 1.6. Cheers! Manoj. On Fri, Mar 8, 2013 at 7:32 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, It's related to your JVM. Which version are you using? JM 2013/3/8 Manoj Babu manoj...@gmail.com: Team, I am getting this issue when reducer starts executing after map's completed. After job failed with below exception when i restart its running fine. We are getting this issue after upgrading to MRv1(CDH4). Any inputs will be more helpful, Thanks in advance. # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 32 bytes for CHeapObj-new # An error report file with more information is saved as: # /data/9/mapred/local/taskTracker/alum/jobcache/job_201303021416_2999/attempt_201303021416_2999_r_02_0/work/hs_err_pid948.log Cheers! Manoj.
Re: reg memory allocation failed
Hi Jean, I dont have that rights. Is there any way to find? On 8 Mar 2013 20:13, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, Do you have the required rights to test with another JVM? Can you test the Oracle JVM Java SE 6 Update 43? JM 2013/3/8 Manoj Babu manoj...@gmail.com: Hi Jean, Java(TM) SE Runtime Environment (build pxa6460sr10fp1-20120321_01(SR10 FP1)) IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr10fp1-20120202_101568 (JIT enabled, AOT enabled) J9VM - 20120202_101568 JIT - r9_2007_21307ifx1 GC - 20120202_AA) JCL - 20120320_01 Thanks in advance. Cheers! Manoj. On Fri, Mar 8, 2013 at 7:48 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, Oracle 1.6? OpenJDK 1.6? Which 1.6 release? The 24? What is java -version giving you? 2013/3/8 Manoj Babu manoj...@gmail.com: Hi, I am using version 1.6. Cheers! Manoj. On Fri, Mar 8, 2013 at 7:32 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, It's related to your JVM. Which version are you using? JM 2013/3/8 Manoj Babu manoj...@gmail.com: Team, I am getting this issue when reducer starts executing after map's completed. After job failed with below exception when i restart its running fine. We are getting this issue after upgrading to MRv1(CDH4). Any inputs will be more helpful, Thanks in advance. # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 32 bytes for CHeapObj-new # An error report file with more information is saved as: # /data/9/mapred/local/taskTracker/alum/jobcache/job_201303021416_2999/attempt_201303021416_2999_r_02_0/work/hs_err_pid948.log Cheers! Manoj.
Re: Hadoop cluster hangs on big hive job
Dude I'am not going to read all you log files, but try to run this as a normal map reduce job, it could be memory related, something wrong with some of the zip files, wrong config etc. -Håvard On Thu, Mar 7, 2013 at 8:53 PM, Daning Wang dan...@netseer.com wrote: We have hive query processing zipped csv files. the query was scanning for 10 days(partitioned by date). data for each day around 130G. The problem is not consistent since if you run it again, it might go through. but the problem has never happened on the smaller jobs(like processing only one days data). We don't have space issue. I have attached log file when problem happening. it is stuck like following(just search 19706 of 49964) 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_19_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_32_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_00_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_08_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) Thanks, Daning On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård haavard.kongsga...@gmail.com wrote: hadoop logs? On 6. mars 2013 21:04, Daning Wang dan...@netseer.com wrote: We have 5 nodes cluster(Hadoop 1.0.4), It hung a couple of times while running big jobs. Basically all the nodes are dead, from that trasktracker's log looks it went into some kinds of loop forever. All the log entries like this when problem happened. Any idea how to debug the issue? Thanks in advance. 
2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_12_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_28_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_36_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_16_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_19_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_32_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_00_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_08_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_04_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_43_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,545 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_12_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,569 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_28_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,855 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:26,876 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_36_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:27,159 INFO org.apache.hadoop.mapred.TaskTracker:
Re: DataXceiver error processing WRITE_BLOCK operation src: /x.x.x.x:50373 dest: /x.x.x.x:50010
I am also having this issue and have tried a lot of solutions, but could not solve it.

]# ulimit -n   (running as root and as hdfs, the datanode user)
32768
]# cat /proc/sys/fs/file-nr
2080 0 8047008
]# lsof | wc -l
5157

Sometimes this issue happens from one node to the same node :( I also think this issue is messing with my regionservers, which are crashing all day long!! Thanks, Pablo

On 03/08/2013 06:42 AM, Dhanasekaran Anbalagan wrote: Hi Varun, I believe it is not a ulimit issue.
/etc/security/limits.conf
# End of file
* - nofile 100
* - nproc 100
Please guide me, guys, I want to fix this. Share your thoughts on the DataXceiver error. Did I learn something today? If not, I wasted it.

On Fri, Mar 8, 2013 at 3:50 AM, varun kumar varun@gmail.com wrote: Hi Dhana, Increase the ulimit for all the datanodes. If you are starting the service using hadoop increase the ulimit value for hadoop user. Do the changes in the following file. */etc/security/limits.conf* Example:- *hadoop softnofile 35000* *hadoop hardnofile 35000* Regards, Varun Kumar.P

On Fri, Mar 8, 2013 at 1:15 PM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote: Hi Guys, I am frequently getting this error in my DataNodes. Please guide me on what the exact problem is.

dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50373 dest: /172.16.30.138:50010 java.net.SocketTimeoutException: 7 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.30.138:34280 remote=/172.16.30.140:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:115) at java.io.FilterInputStream.read(FilterInputStream.java:66) at java.io.FilterInputStream.read(FilterInputStream.java:66) at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:160) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:405) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189) at java.lang.Thread.run(Thread.java:662)

dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50531 dest: /172.16.30.138:50010 java.io.EOFException: while trying to read 65563 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:408) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:452) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:511) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:748) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:462) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189) at
java.lang.Thread.run(Thread.java:662) How to resolve this. -Dhanasekaran. Did I learn something today? If not, I wasted it. -- -- Regards, Varun Kumar.P
Re: Submit RHadoop job using Oozie in Cloudera Manager
[moving thread to user@oozie.a.o, BCCing common-user@hadoop.a.o] The Oozie web UI is read only; it does not do job submissions. If you want to do that you should look at Hue. Thx

On Fri, Mar 8, 2013 at 2:53 AM, rohit sarewar rohitsare...@gmail.com wrote: Hi, I have the R and RHadoop packages installed on all the nodes. I can submit RMR jobs manually from the terminal. I just want to know how to submit RMR jobs from the Oozie web interface? -Rohit

On Fri, Mar 8, 2013 at 4:18 PM, Jagat Singh jagatsi...@gmail.com wrote: Hi, Do you have the rmr and rhdfs packages installed on all nodes? For Hadoop it doesn't matter what type of job it is, as long as you have the libraries it needs to run in the cluster. Submitting any job would be fine. Thanks

On Fri, Mar 8, 2013 at 9:46 PM, rohit sarewar rohitsare...@gmail.com wrote: Hi All, I am using Cloudera Manager 4.5. As of now I can submit MR jobs using Oozie. Can we submit RHadoop jobs using Oozie in Cloudera Manager? -- Alejandro
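Since the Oozie web UI is read only, submissions have to go through Hue, the Oozie CLI, or the Oozie Java client API. Below is a minimal sketch using the Java client; it assumes a workflow (for example a shell action that launches the RMR script) is already deployed to HDFS, and every host name, port, and path in it is a placeholder rather than something taken from this thread.

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

public class SubmitRmrWorkflow {
    public static void main(String[] args) throws OozieClientException {
        // Placeholder Oozie server URL; adjust for your cluster.
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties props = oozie.createConfiguration();
        // HDFS directory containing the deployed workflow.xml (placeholder path).
        props.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/rohit/rmr-workflow");
        // Only needed if the workflow.xml references ${jobTracker} and ${nameNode}.
        props.setProperty("jobTracker", "jobtracker-host:8021");
        props.setProperty("nameNode", "hdfs://namenode:8020");

        // Submit and start the workflow; Oozie returns the job id.
        String jobId = oozie.run(props);
        System.out.println("Submitted workflow " + jobId);
    }
}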
Re: [jira] [Commented] (HDFS-4533) start-dfs.sh ignored additional parameters besides -upgrade
Please follow up on the Jenkins failures. It looks like the patch was generated from the wrong directory.

On Thu, Feb 28, 2013 at 1:34 AM, Azuryy Yu azury...@gmail.com wrote: Who can review this JIRA (https://issues.apache.org/jira/browse/HDFS-4533)? It is very simple.

-- Forwarded message -- From: Hadoop QA (JIRA) j...@apache.org Date: Wed, Feb 27, 2013 at 4:53 PM Subject: [jira] [Commented] (HDFS-4533) start-dfs.sh ignored additional parameters besides -upgrade To: azury...@gmail.com [ https://issues.apache.org/jira/browse/HDFS-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588130#comment-13588130 ] Hadoop QA commented on HDFS-4533: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571164/HDFS-4533.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4008//console This message is automatically generated.

start-dfs.sh ignored additional parameters besides -upgrade --- Key: HDFS-4533 URL: https://issues.apache.org/jira/browse/HDFS-4533 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.0.3-alpha Reporter: Fengdong Yu Labels: patch Fix For: 2.0.4-beta Attachments: HDFS-4533.patch

start-dfs.sh only takes the -upgrade option and ignores the others, so if you run the following command, it will ignore the clusterId option: start-dfs.sh -upgrade -clusterId 1234

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira -- http://hortonworks.com/download/
Re: OutOfMemory during Plain Java MapReduce
As always, what Harsh said :) Looking at your reducer code, it appears that you are trying to compute the distinct set of user IDs for a given reduce key. Rather than computing this by holding the set in memory, use a secondary sort of the reduce values, then while iterating over the reduce values, look for changes of user id. Whenever it changes, write out the key and the newly found value. Your output will change from this: key, [value 1, value2, ... valueN] to this: key, value1 key, value2 ... key, valueN Whether this is suitable for your follow-on processing is the next question, but this approach will scale to whatever data you can throw at it. Paul On 8 March 2013 10:57, Harsh J ha...@cloudera.com wrote: Hi, When you implement code that starts memory-storing value copies for every record (even if of just a single key), things are going to break in big-data-land. Practically, post-partitioning, the # of values for a given key can be huge given the source data, so you cannot hold it all in and then write in one go. You'd probably need to write out something continuously if you really really want to do this, or use an alternative form of key-value storage where updates can be made incrementally (Apache HBase is such a store, as one example). This has been discussed before IIRC, and if the goal were to store the outputs onto a file then its better to just directly serialize them with a file opened instead of keeping it in a data structure and serializing it at the end. The caveats that'd apply if you were to open your own file from a task are described at http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F. On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider cschneiderpub...@gmail.com wrote: I had a look to the stacktrace and it says the problem is at the reducer: userSet.add(iterator.next().toString()); Error: Java heap space attempt_201303072200_0016_r_02_0: WARN : mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - slave.host.name is deprecated. 
Instead, use dfs.datanode.hostname attempt_201303072200_0016_r_02_0: FATAL: org.apache.hadoop.mapred.Child - Error running child : java.lang.OutOfMemoryError: Java heap space attempt_201303072200_0016_r_02_0: at java.util.Arrays.copyOfRange(Arrays.java:3209) attempt_201303072200_0016_r_02_0: at java.lang.String.<init>(String.java:215) attempt_201303072200_0016_r_02_0: at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542) attempt_201303072200_0016_r_02_0: at java.nio.CharBuffer.toString(CharBuffer.java:1157) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:394) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:371) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.toString(Text.java:273) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268) attempt_201303072200_0016_r_02_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201303072200_0016_r_02_0: at javax.security.auth.Subject.doAs(Subject.java:396) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child.main(Child.java:262) But how to solve this?

2013/3/7 Christian Schneider cschneiderpub...@gmail.com: Hi, during the reduce phase or afterwards (I don't really know how to debug it) I get a heap Out of Memory exception. I guess this is because the value of the reduce task (a custom Writable) holds a List with a lot of user ids. The setup is quite simple. These are the related classes I used:

//---
// The Reducer
// It just adds all userIds of the Iterable to the UserSetWriteAble
//---
public class UserToAppReducer extends Reducer<Text, Text,
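To make the secondary-sort approach Paul describes concrete, here is a minimal sketch of a reducer that emits distinct user ids without buffering them in a Set. It assumes the job has been configured for secondary sort (composite key plus a custom partitioner and grouping comparator) so that each key's user-id values arrive already sorted; the class name and Text-only signature are illustrative, not the original poster's code.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Assumes the values for each key arrive sorted by user id (secondary sort),
// so a duplicate can be detected by comparing against the previous value only.
public class DistinctUserReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> userIds, Context context)
            throws IOException, InterruptedException {
        String previous = null;
        for (Text userId : userIds) {
            // Copy to a String because Hadoop reuses the Text instance between iterations.
            String current = userId.toString();
            if (previous == null || !previous.equals(current)) {
                // Emit one (key, userId) record per distinct user id.
                context.write(key, userId);
            }
            previous = current;
        }
    }
}

Because only the previous value is kept, memory use stays constant no matter how many values a key has, which is what lets this approach scale where the in-memory Set does not.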
Re: OutOfMemory during Plain Java MapReduce
Paul's way is much more easier than doing the serialization way I mentioned earlier. I didn't pay attention to the logic used but just the implementation, my bad :) On Fri, Mar 8, 2013 at 5:39 PM, Paul Wilkinson pa...@cloudera.com wrote: As always, what Harsh said :) Looking at your reducer code, it appears that you are trying to compute the distinct set of user IDs for a given reduce key. Rather than computing this by holding the set in memory, use a secondary sort of the reduce values, then while iterating over the reduce values, look for changes of user id. Whenever it changes, write out the key and the newly found value. Your output will change from this: key, [value 1, value2, ... valueN] to this: key, value1 key, value2 ... key, valueN Whether this is suitable for your follow-on processing is the next question, but this approach will scale to whatever data you can throw at it. Paul On 8 March 2013 10:57, Harsh J ha...@cloudera.com wrote: Hi, When you implement code that starts memory-storing value copies for every record (even if of just a single key), things are going to break in big-data-land. Practically, post-partitioning, the # of values for a given key can be huge given the source data, so you cannot hold it all in and then write in one go. You'd probably need to write out something continuously if you really really want to do this, or use an alternative form of key-value storage where updates can be made incrementally (Apache HBase is such a store, as one example). This has been discussed before IIRC, and if the goal were to store the outputs onto a file then its better to just directly serialize them with a file opened instead of keeping it in a data structure and serializing it at the end. The caveats that'd apply if you were to open your own file from a task are described at http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F. On Fri, Mar 8, 2013 at 4:35 AM, Christian Schneider cschneiderpub...@gmail.com wrote: I had a look to the stacktrace and it says the problem is at the reducer: userSet.add(iterator.next().toString()); Error: Java heap space attempt_201303072200_0016_r_02_0: WARN : mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id attempt_201303072200_0016_r_02_0: WARN : org.apache.hadoop.conf.Configuration - slave.host.name is deprecated. 
Instead, use dfs.datanode.hostname attempt_201303072200_0016_r_02_0: FATAL: org.apache.hadoop.mapred.Child - Error running child : java.lang.OutOfMemoryError: Java heap space attempt_201303072200_0016_r_02_0: at java.util.Arrays.copyOfRange(Arrays.java:3209) attempt_201303072200_0016_r_02_0: at java.lang.String.init(String.java:215) attempt_201303072200_0016_r_02_0: at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542) attempt_201303072200_0016_r_02_0: at java.nio.CharBuffer.toString(CharBuffer.java:1157) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:394) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.decode(Text.java:371) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.io.Text.toString(Text.java:273) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:21) attempt_201303072200_0016_r_02_0: at com.myCompany.UserToAppReducer.reduce(RankingReducer.java:1) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:268) attempt_201303072200_0016_r_02_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201303072200_0016_r_02_0: at javax.security.auth.Subject.doAs(Subject.java:396) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) attempt_201303072200_0016_r_02_0: at org.apache.hadoop.mapred.Child.main(Child.java:262) But how to solve this? 2013/3/7 Christian Schneider cschneiderpub...@gmail.com Hi, during the Reduce phase or afterwards (i don't really know how to debug it) I get a heap out of Memory Exception. I guess this is because the value of the reduce task (a Custom Writable) holds a List with a lot of user ids. The Setup is quite simple. This are the related classes I
Re: DataXceiver error processing WRITE_BLOCK operation src: /x.x.x.x:50373 dest: /x.x.x.x:50010
Hi, If the open-files limits (for the hbase and hdfs users) are already set to more than 30K, please change dfs.datanode.max.xcievers to something higher than the value below:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2096</value>
  <description>PRIVATE CONFIG VARIABLE</description>
</property>

Try to increase this one and tune it to the HBase usage. Thanks -Abdelrahman

On Fri, Mar 8, 2013 at 9:28 AM, Pablo Musa pa...@psafe.com wrote: I am also having this issue and tried a lot of solutions, but could not solve it. ]# ulimit -n ** running as root and hdfs (datanode user) 32768 ]# cat /proc/sys/fs/file-nr 208008047008 ]# lsof | wc -l 5157 Sometimes this issue happens from one node to the same node :( I also think this issue is messing with my regionservers which are crashing all day long!! Thanks, Pablo

On 03/08/2013 06:42 AM, Dhanasekaran Anbalagan wrote: Hi Varun I believe is not ulimit issue. /etc/security/limits.conf # End of file * - nofile 100 * - nproc 100 please guide me Guys, I want fix this. share your thoughts DataXceiver error. Did I learn something today? If not, I wasted it.

On Fri, Mar 8, 2013 at 3:50 AM, varun kumar varun@gmail.com wrote: Hi Dhana, Increase the ulimit for all the datanodes. If you are starting the service using hadoop increase the ulimit value for hadoop user. Do the changes in the following file. */etc/security/limits.conf* Example:- *hadoop softnofile 35000* *hadoop hardnofile 35000* Regards, Varun Kumar.P

On Fri, Mar 8, 2013 at 1:15 PM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote: Hi Guys I am frequently getting is error in my Data nodes. Please guide what is the exact problem this. dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50373 dest: /172.16.30.138:50010 java.net.SocketTimeoutException: 7 millis timeout while waiting for channel to be ready for read.
ch : java.nio.channels.SocketChannel[connected local=/172.16.30.138:34280 remote=/172.16.30.140:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:115) at java.io.FilterInputStream.read(FilterInputStream.java:66) at java.io.FilterInputStream.read(FilterInputStream.java:66) at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:160) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:405) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189) at java.lang.Thread.run(Thread.java:662) dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50531 dest: /172.16.30.138:50010 java.io.EOFException: while trying to read 65563 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:408) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:452) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:511) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:748) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:462) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189) at java.lang.Thread.run(Thread.java:662) How to resolve this. -Dhanasekaran. Did I learn something today? If not, I wasted it. -- -- Regards, Varun Kumar.P
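After raising the property in hdfs-site.xml and restarting the DataNodes, one quick way to see which value a Hadoop Configuration actually resolves is the standard Configuration API. This is only a sketch: it assumes the hdfs-site.xml at the placeholder path below is the same file the DataNodes load, and the 256 fallback is just an illustrative default, not a guaranteed shipped value.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CheckXcievers {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Placeholder path; point it at the hdfs-site.xml the DataNodes actually use.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        // 256 is just a fallback for this printout, not necessarily the real default.
        int maxXcievers = conf.getInt("dfs.datanode.max.xcievers", 256);
        System.out.println("dfs.datanode.max.xcievers = " + maxXcievers);
    }
}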
Re: Need info on mapred.child.java.opts, mapred.map.child.java.opts and mapred.reduce.child.java.opts
Hi Gaurav, That's correct. If the following were set:

*mapred.child.java.opts = -Xmx1g*
*mapred.map.child.java.opts = -Xmx2g*
*mapred.reduce.child.java.opts = -Xmx512m*

then: 1) -Xmx2g will be used for map tasks, 2) -Xmx512m will be used for reduce tasks, and 3) -Xmx1g will be ignored. Kind Regards, Anthony Rojas

On Fri, Mar 8, 2013 at 3:22 AM, Gaurav Dasgupta gdsay...@gmail.com wrote: Thanks for replying, Harsh. So it means that with my configuration, *mapred.child.java.opts = -Xmx1g* will be ignored completely, *mapred.map.child.java.opts = -Xmx2g* will be used for map tasks, and *mapred.reduce.child.java.opts = -Xmx512m* will be used for reduce tasks. Right? Thanks, Gaurav
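For reference, the same precedence expressed from job-submission code, using the MR1 property names discussed in the thread; this is only a sketch, and the class name and print statements are illustrative.

import org.apache.hadoop.conf.Configuration;

public class ChildJavaOptsDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Generic fallback: applies only to a task type with no specific override.
        conf.set("mapred.child.java.opts", "-Xmx1g");

        // Task-type specific keys take precedence for their task type.
        conf.set("mapred.map.child.java.opts", "-Xmx2g");      // map tasks get 2g
        conf.set("mapred.reduce.child.java.opts", "-Xmx512m"); // reduce tasks get 512m

        // Pass this Configuration to the Job/JobConf when submitting; with the values
        // above, the -Xmx1g setting is never applied to either task type.
        System.out.println("map opts:    " + conf.get("mapred.map.child.java.opts"));
        System.out.println("reduce opts: " + conf.get("mapred.reduce.child.java.opts"));
    }
}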