Equally split an RDD partition into two partitions on the same node
Dear all, I want to divide an RDD partition equally into two partitions: the first half of the elements in the partition forms one new partition, and the second half forms another. The two new partitions must stay on the same node as their parent partition, so that data locality is preserved. Does anyone know how to implement this, or have any hints? Thanks in advance, Fei
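A minimal sketch of one way to do this with a custom RDD; the names SplitInHalfRDD and HalfPartition are illustrative, not from the thread. Each child partition reports its parent partition's preferred hosts, so the scheduler keeps both halves on the parent's node whenever possible:

  import scala.reflect.ClassTag
  import org.apache.spark.{NarrowDependency, Partition, TaskContext}
  import org.apache.spark.rdd.RDD

  // Illustrative partition type: child partition i covers one half of parent partition i / 2.
  case class HalfPartition(index: Int, parentIndex: Int, firstHalf: Boolean) extends Partition

  class SplitInHalfRDD[T: ClassTag](prev: RDD[T])
    extends RDD[T](prev.context, Seq(new NarrowDependency[T](prev) {
      // Each child partition depends on exactly one parent partition.
      override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
    })) {

    override def getPartitions: Array[Partition] =
      firstParent[T].partitions.flatMap { p =>
        Seq[Partition](HalfPartition(2 * p.index, p.index, firstHalf = true),
                       HalfPartition(2 * p.index + 1, p.index, firstHalf = false))
      }

    // Both halves report the parent's preferred hosts, which preserves locality.
    override protected def getPreferredLocations(split: Partition): Seq[String] = {
      val h = split.asInstanceOf[HalfPartition]
      firstParent[T].preferredLocations(firstParent[T].partitions(h.parentIndex))
    }

    override def compute(split: Partition, context: TaskContext): Iterator[T] = {
      val h = split.asInstanceOf[HalfPartition]
      // Materializes the parent partition; cache the parent RDD to avoid computing it twice.
      val elems = firstParent[T].iterator(firstParent[T].partitions(h.parentIndex), context).toArray
      val mid = (elems.length + 1) / 2
      (if (h.firstHalf) elems.take(mid) else elems.drop(mid)).iterator
    }
  }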
Re: RDD Location
It would be much appreciated if you could give more details about why the runJob function cannot be called in getPreferredLocations(). The NewHadoopRDD and HadoopRDD classes get the location information from the InputSplit. But there may be an issue in NewHadoopRDD, because it generates all of the InputSplits on the master node, which means only a single node can be used to generate and filter the InputSplits even if the number of InputSplits is huge. Will it be a performance bottleneck? Thanks, Fei On Fri, Dec 30, 2016 at 10:41 PM, Sun Rui <sunrise_...@163.com> wrote: > You can’t call runJob inside getPreferredLocations(). > You can take a look at the source code of HadoopRDD to help you implement > getPreferredLocations() > appropriately. > > On Dec 31, 2016, at 09:48, Fei Hu <hufe...@gmail.com> wrote: > > That is a good idea. > > I tried add the following code to get getPreferredLocations() function: > > val results: Array[Array[DataChunkPartition]] = context.runJob( > partitionsRDD, (context: TaskContext, partIter: > Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true) > > But it seems to be suspended when executing this function. But if I move > the code to other places, like the main() function, it runs well. > > What is the reason for it? > > Thanks, > Fei > > On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <sunrise_...@163.com> wrote: > >> Maybe you can create your own subclass of RDD and override the >> getPreferredLocations() to implement the logic of dynamic changing of the >> locations. >> > On Dec 30, 2016, at 12:06, Fei Hu <hufe...@gmail.com> wrote: >> > >> > Dear all, >> > >> > Is there any way to change the host location for a certain partition of >> RDD? >> > >> > "protected def getPreferredLocations(split: Partition)" can be used to >> initialize the location, but how to change it after the initialization? >> > >> > >> > Thanks, >> > Fei >> > >> > >> >> >> > >
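A pattern that sidesteps the problem discussed in this thread is to run any such helper job eagerly on the driver, before the custom RDD is handed to the scheduler, and let getPreferredLocations() only read a precomputed map. A minimal sketch, where pickHosts stands in for whatever caller-supplied logic inspects a partition's contents (both names are illustrative):

  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD

  // Run a job once, eagerly, on the driver to derive per-partition host lists.
  // A custom RDD can capture the returned map so that its getPreferredLocations()
  // is just a lookup by split.index, with no runJob call inside the callback.
  def computeHostsByPartition[T](sc: SparkContext, rdd: RDD[T])
                                (pickHosts: Iterator[T] => Seq[String]): Map[Int, Seq[String]] = {
    val perPartition: Array[Seq[String]] = sc.runJob(rdd, pickHosts)
    perPartition.zipWithIndex.map { case (hosts, idx) => idx -> hosts }.toMap
  }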
context.runJob() was suspended in getPreferredLocations() function
Dear all, I tried to implement my own customized RDD. In the getPreferredLocations() function, I used the following code to query another RDD, which was used as an input to initialize this customized RDD: val results: Array[Array[DataChunkPartition]] = context.runJob(partitionsRDD, (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray, partitions, allowLocal = true) The problem is that when this code executes, the task seems to be suspended: the job just stops at this line, with no errors and no output. What is the reason for it? Thanks, Fei
Re: RDD Location
That is a good idea. I tried adding the following code to the getPreferredLocations() function: val results: Array[Array[DataChunkPartition]] = context.runJob(partitionsRDD, (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true) But it seems to be suspended when executing this function. If I move the code to other places, like the main() function, it runs well. What is the reason for it? Thanks, Fei On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <sunrise_...@163.com> wrote: > Maybe you can create your own subclass of RDD and override the > getPreferredLocations() to implement the logic of dynamic changing of the > locations. > > On Dec 30, 2016, at 12:06, Fei Hu <hufe...@gmail.com> wrote: > > > > Dear all, > > > > Is there any way to change the host location for a certain partition of > RDD? > > > > "protected def getPreferredLocations(split: Partition)" can be used to > initialize the location, but how to change it after the initialization? > > > > > > Thanks, > > Fei > > > > > > >
RDD Location
Dear all, Is there any way to change the host location for a certain partition of RDD? "protected def getPreferredLocations(split: Partition)" can be used to initialize the location, but how to change it after the initialization? Thanks, Fei
Kryo on Zeppelin
Hi All, I am running some Spark Scala code in Zeppelin on CDH 5.5.1 (Spark version 1.5.0). I customized the Spark interpreter to use org.apache.spark.serializer.KryoSerializer as spark.serializer, and in the dependencies I added Kryo 3.0.3 as follows: com.esotericsoftware:kryo:3.0.3 When I run the program from the Scala notebook, I get the following errors. But if I compile the same code into jars and run it on the cluster with spark-submit, it works well without errors.
WARN [2016-10-10 23:43:40,801] ({task-result-getter-1} Logging.scala[logWarning]:71) - Lost task 0.0 in stage 3.0 (TID 9, svr-A3-A-U20): java.io.EOFException at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:196) at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:217) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1175) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
There were also some errors when I ran the Zeppelin Tutorial: Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163) at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194) ... 3 more Caused by: java.lang.NullPointerException at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:38) at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192) at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1$$anonfun$apply$mcV$sp$2.apply(ParallelCollectionRDD.scala:80) at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1$$anonfun$apply$mcV$sp$2.apply(ParallelCollectionRDD.scala:80) at org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:142) at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:80) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)
Does anyone know why this happened? Thanks in advance, Fei
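For context: Spark 1.5 already bundles a Kryo version of its own (via Twitter chill), so pulling a separate, newer kryo artifact such as 3.0.3 into the interpreter can leave two incompatible Kryo versions on the notebook classpath; the chill WrappedArraySerializer NullPointerException above is consistent with that, though this is only one plausible reading. A minimal SparkConf-based sketch that stays on the bundled Kryo (the app name and registered classes are illustrative):

  import org.apache.spark.SparkConf
  import org.apache.spark.serializer.KryoSerializer

  // Minimal Kryo setup that relies on the Kryo version bundled with Spark.
  val conf = new SparkConf()
    .setAppName("kryo-example")
    .set("spark.serializer", classOf[KryoSerializer].getName)
    // Optional: register frequently serialized classes to shrink the output.
    .registerKryoClasses(Array(classOf[Array[Double]], classOf[Array[Int]]))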
Spark application Runtime Measurement
Dear all, I have a question about how to measure the runtime of a Spark application. Here is an example:
- On the Spark UI, the total duration is 2.0 minutes = 120 seconds [screenshot: Spark UI application duration].
- However, when I check the jobs launched by the application, their durations add up to 13 s + 0.8 s + 4 s = 17.8 seconds, which is much less than 120 seconds. I am not sure which time I should use to measure the performance of the Spark application [screenshot: job list].
- I also checked the event timeline. There is a big gap between the second job and the third job, and I do not know what happened during that gap [screenshot: event timeline].
Can anyone explain which of these is the right time to use when measuring the performance of a Spark application? Thanks in advance, Fei
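For what it's worth, the gap in the event timeline usually corresponds to driver-side work between jobs (planning the next job, reading metadata, and so on), which is counted in the application's total duration but not in any individual job's duration. If per-job wall-clock time is the quantity of interest, a listener along the following lines can record it; this is a minimal sketch, and the class name and output format are only illustrative:

  import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

  // Records wall-clock time per job on the driver.
  class JobTimingListener extends SparkListener {
    private val starts = scala.collection.mutable.Map[Int, Long]()
    override def onJobStart(jobStart: SparkListenerJobStart): Unit =
      starts(jobStart.jobId) = System.currentTimeMillis()
    override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
      val elapsedMs = System.currentTimeMillis() - starts.getOrElse(jobEnd.jobId, 0L)
      println(s"job ${jobEnd.jobId} took $elapsedMs ms")
    }
  }
  // Attach it before running the workload, given an existing SparkContext sc:
  // sc.addSparkListener(new JobTimingListener())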
Remotely submit a job to Yarn on CDH5.4
Hi, I want to remotely submit a job to YARN on CDH 5.4. Below are the WordCount code and the error report. Does anyone know how to solve it? Thanks in advance, Fei
INFO: Job job_1439867352386_0025 failed with state FAILED due to: Application application_1439867352386_0025 failed 2 times due to AM Container for appattempt_1439867352386_0025_02 exited with exitCode: 1 For more detailed output, check application tracking page: http://compute-04:8088/proxy/application_1439867352386_0025/ Then, click on links to logs of each attempt. Diagnostics: Exception from container-launch. Container id: container_1439867352386_0025_02_01 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 Failing this attempt. Failing the application.
public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  System.setProperty("HADOOP_USER_NAME", "hdfs");
  conf.set("hadoop.job.ugi", "supergroup");
  conf.set("mapreduce.framework.name", "yarn");
  conf.set("fs.defaultFS", "hdfs://compute-04:8020");
  conf.set("mapreduce.map.java.opts", "-Xmx1024M");
  conf.set("mapreduce.reduce.java.opts", "-Xmx1024M");
  conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
  conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
  conf.set("yarn.resourcemanager.address", "199.25.200.134:8032");
  conf.set("yarn.resourcemanager.resource-tracker.address", "199.25.200.134:8031");
  conf.set("yarn.resourcemanager.scheduler.address", "199.25.200.134:8030");
  conf.set("yarn.resourcemanager.admin.address", "199.25.200.134:8033");
  conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
  conf.set("yarn.application.classpath",
      "/etc/hadoop/conf.cloudera.hdfs,"
      + "/etc/hadoop/conf.cloudera.yarn,"
      + "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*,"
      + "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*,"
      + "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*,"
      + "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*,"
      + "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*,"
      + "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*");
  GenericOptionsParser optionParser = new GenericOptionsParser(conf, args);
  String[] remainingArgs = optionParser.getRemainingArgs();
  if ((remainingArgs.length != 2) && (remainingArgs.length != 4)) {
    System.err.println("Usage: wordcount <in> <out> [-skip skipPatternFile]");
    System.exit(2);
  }
  Job job = Job.getInstance(conf, "word count");
  job.setJarByClass(WordCount2.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  List<String> otherArgs = new ArrayList<String>();
  for (int i = 0; i < remainingArgs.length; ++i) {
    if ("-skip".equals(remainingArgs[i])) {
      job.addCacheFile(new Path(remainingArgs[++i]).toUri());
      job.getConfiguration().setBoolean("wordcount.skip.patterns", true);
    } else {
      otherArgs.add(remainingArgs[i]);
    }
  }
  FileInputFormat.addInputPath(job, new Path(otherArgs.get(0)));
  FileOutputFormat.setOutputPath(job, new Path(otherArgs.get(1)));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Re: Container beyond virtual memory limits
Thank you for your help. It is useful. Best, Fei On Mar 23, 2015, at 1:09 AM, Drake민영근 drake@nexr.com wrote: Hi, See 6. Killing of Tasks Due to Virtual Memory Usage in http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/ http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/ Drake 민영근 Ph.D kt NexR On Sun, Mar 22, 2015 at 12:43 PM, Fei Hu hufe...@gmail.com mailto:hufe...@gmail.com wrote: Hi, I just test my yarn installation, and run a Wordcount program. But it always report the following error, who knows how to solve it? Thank you in advance. Container [pid=7954,containerID=container_1426992254950_0002_01_05] is running beyond virtual memory limits. Current usage: 13.6 MB of 1 GB physical memory used; 4.3 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1426992254950_0002_01_05 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 7960 7954 7954 7954 (java) 5 0 4576591872 3199 /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 1638 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1426992254950_0002/container_1426992254950_0002_01_05/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 199.26.254.140 36542 attempt_1426992254950_0002_m_03_0 5 |- 7954 7949 7954 7954 (bash) 0 0 65421312 275 /bin/bash -c /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 1638 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1426992254950_0002/container_1426992254950_0002_01_05/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 199.26.254.140 36542 attempt_1426992254950_0002_m_03_0 5 1/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05/stdout 2/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05/stderr Exception from container-launch. 
Container id: container_1426992254950_0002_01_05 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Thanks, Fei
Re: Container beyond virtual memory limits
Thank you. It works. Best, Fei On Mar 23, 2015, at 11:27 AM, Gaurav Gupta gaurav.gopi...@gmail.com wrote: Increasing vmem:pmem ratio should help you out. Default value is 2:1, change it to 5:1 On Sun, Mar 22, 2015 at 10:09 PM, Drake민영근 drake@nexr.com mailto:drake@nexr.com wrote: Hi, See 6. Killing of Tasks Due to Virtual Memory Usage in http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/ http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/ Drake 민영근 Ph.D kt NexR On Sun, Mar 22, 2015 at 12:43 PM, Fei Hu hufe...@gmail.com mailto:hufe...@gmail.com wrote: Hi, I just test my yarn installation, and run a Wordcount program. But it always report the following error, who knows how to solve it? Thank you in advance. Container [pid=7954,containerID=container_1426992254950_0002_01_05] is running beyond virtual memory limits. Current usage: 13.6 MB of 1 GB physical memory used; 4.3 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1426992254950_0002_01_05 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 7960 7954 7954 7954 (java) 5 0 4576591872 3199 /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 1638 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1426992254950_0002/container_1426992254950_0002_01_05/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 199.26.254.140 36542 attempt_1426992254950_0002_m_03_0 5 |- 7954 7949 7954 7954 (bash) 0 0 65421312 275 /bin/bash -c /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 1638 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1426992254950_0002/container_1426992254950_0002_01_05/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 199.26.254.140 36542 attempt_1426992254950_0002_m_03_0 5 1/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05/stdout 2/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05/stderr Exception from container-launch. 
Container id: container_1426992254950_0002_01_05 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Thanks, Fei
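For reference, the vmem:pmem ratio mentioned above is a NodeManager setting; a yarn-site.xml sketch follows (the value is only an example, and the NodeManagers need a restart). The virtual-memory check can also be switched off entirely, which is the other commonly used workaround.

  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>5</value>
  </property>
  <!-- Alternatively, disable the virtual-memory check altogether. -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>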
Container beyond virtual memory limits
Hi, I just test my yarn installation, and run a Wordcount program. But it always report the following error, who knows how to solve it? Thank you in advance. Container [pid=7954,containerID=container_1426992254950_0002_01_05] is running beyond virtual memory limits. Current usage: 13.6 MB of 1 GB physical memory used; 4.3 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1426992254950_0002_01_05 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 7960 7954 7954 7954 (java) 5 0 4576591872 3199 /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 1638 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1426992254950_0002/container_1426992254950_0002_01_05/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 199.26.254.140 36542 attempt_1426992254950_0002_m_03_0 5 |- 7954 7949 7954 7954 (bash) 0 0 65421312 275 /bin/bash -c /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 1638 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1426992254950_0002/container_1426992254950_0002_01_05/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 199.26.254.140 36542 attempt_1426992254950_0002_m_03_0 5 1/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05/stdout 2/home/hadoop-lzl/hadoop-2.6.0/logs/userlogs/application_1426992254950_0002/container_1426992254950_0002_01_05/stderr Exception from container-launch. Container id: container_1426992254950_0002_01_05 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Thanks, Fei
Re: Prune out data to a specific reduce task
Maybe you could use Partitioner.class to solve your problem. On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail xeonmailingl...@gmail.com wrote: Hi, I have this job that has 3 map tasks and 2 reduce tasks. But I want to exclude the data that would go to reduce task 2. This means that only reducer 1 will produce data, and the other one will be empty, or not even execute. How can I do this in MapReduce? [attachment: ExampleJobExecution.png] Thanks,
Re: Prune out data to a specific reduce task
In the Reducer.class, you could ignore the data that you want to exclude based on the key or value. On Mar 12, 2015, at 12:47 PM, xeonmailinglist-gmail xeonmailingl...@gmail.com wrote: If I use the partitioner, I must be able to tell MapReduce not to execute values from a certain reduce task. The method public int getPartition(K key, V value, int numReduceTasks) must always return a partition; I can’t return -1. Thus, I don’t know how to tell MapReduce not to execute data from a partition. Any suggestion? Forwarded Message Subject: Re: Prune out data to a specific reduce task Date: Thu, 12 Mar 2015 12:40:04 -0400 From: Fei Hu hufe...@gmail.com Reply-To: user@hadoop.apache.org To: user@hadoop.apache.org Maybe you could use Partitioner.class to solve your problem. On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail xeonmailingl...@gmail.com wrote: Hi, I have this job that has 3 map tasks and 2 reduce tasks. But I want to exclude the data that would go to reduce task 2. This means that only reducer 1 will produce data, and the other one will be empty, or not even execute. How can I do this in MapReduce? [attachment: ExampleJobExecution.png] Thanks,
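One concrete way to realize the "ignore it in the reducer" suggestion, sketched in Scala against the Hadoop MapReduce API; the Text/IntWritable types and the word-count-style summing are assumed for illustration. The reduce task whose partition id is 1 (the second of the two reducers) still runs, but it writes nothing, so its output file stays empty.

  import org.apache.hadoop.io.{IntWritable, Text}
  import org.apache.hadoop.mapreduce.Reducer

  class PruningReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
    private var skipAll = false

    override def setup(context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
      // The task id of a reduce task is its partition number.
      skipAll = context.getTaskAttemptID.getTaskID.getId == 1
    }

    override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                        context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
      if (!skipAll) {
        var sum = 0
        val it = values.iterator()
        while (it.hasNext) sum += it.next().get()
        context.write(key, new IntWritable(sum))
      }
    }
  }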
Monitor data transformation
Hi All, I developed a scheduler for data locality. Now I want to test the performance of the scheduler, so I need to monitor how much data is read remotely. Is there any tool for monitoring the volume of data moved around the cluster? Thanks, Fei
Data locality
Hi All, I developed a scheduler for data locality. Now I want to test the performance of the scheduler, so I need to monitor how much data is read remotely. Is there any tool for monitoring the volume of data moved around the cluster? Thanks, Fei
Re: Data Placement Strategy for HDFS
Hi, Thank you for your help. I searched HDFS-7613 by Google and the link https://issues.apache.org/jira/issues/?jql=project%20%3D%20HDFS%20AND%20text%20~%20%227613%22, but I could not find it. Could you email me the link? Thank you very much. Sincerely, Fei On Jan 14, 2015, at 5:43 PM, Ted Yu yuzhih...@gmail.com wrote: Fei: You can watch this issue: HDFS-7613 "Block placement policy for erasure coding groups" - the solution there would be helpful to us. Cheers On Wed, Jan 14, 2015 at 11:04 AM, Fei Hu hufe...@gmail.com wrote: Thank you for your quick response. After reading the materials you recommended, my conclusion is that Hadoop does not provide interface to customize the data placement policy. We need to add some codes to the source package of HDFS. Is that right? Thanks, Fei On Tue Jan 13 2015 at 10:42:27 PM Ted Yu yuzhih...@gmail.com wrote: See this thread: http://search-hadoop.com/m/lAh9i28K7 See also HDFS-7228 Cheers On Tue, Jan 13, 2015 at 7:33 PM, Fei Hu hufe...@gmail.com wrote: Hi, I want to customize the data placement strategy rather than using the default strategy in HDFS. Is there any way to control which datanode the replica is delivered to? Thank you in advance. Best regards, Fei
Re: Data Placement Strategy for HDFS
Thank you very much. Fei On Jan 14, 2015, at 7:36 PM, Ted Yu yuzhih...@gmail.com wrote: https://issues.apache.org/jira/browse/HDFS-7613 Cheers On Wed, Jan 14, 2015 at 4:35 PM, Fei Hu hufe...@gmail.com wrote: Hi, Thank you for your help. I searched HDFS-7613 by Google and the link https://issues.apache.org/jira/issues/?jql=project%20%3D%20HDFS%20AND%20text%20~%20%227613%22, but I could not find it. Could you email me the link? Thank you very much. Sincerely, Fei On Jan 14, 2015, at 5:43 PM, Ted Yu yuzhih...@gmail.com wrote: Fei: You can watch this issue: HDFS-7613 "Block placement policy for erasure coding groups" - the solution there would be helpful to us. Cheers On Wed, Jan 14, 2015 at 11:04 AM, Fei Hu hufe...@gmail.com wrote: Thank you for your quick response. After reading the materials you recommended, my conclusion is that Hadoop does not provide interface to customize the data placement policy. We need to add some codes to the source package of HDFS. Is that right? Thanks, Fei On Tue Jan 13 2015 at 10:42:27 PM Ted Yu yuzhih...@gmail.com wrote: See this thread: http://search-hadoop.com/m/lAh9i28K7 See also HDFS-7228 Cheers On Tue, Jan 13, 2015 at 7:33 PM, Fei Hu hufe...@gmail.com wrote: Hi, I want to customize the data placement strategy rather than using the default strategy in HDFS. Is there any way to control which datanode the replica is delivered to? Thank you in advance. Best regards, Fei
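One note for readers of this thread: as far as the dfs.block.replicator.classname key goes, the block placement policy is pluggable through NameNode configuration, so a custom policy does not necessarily require patching the HDFS source, although the policy class does have to be compiled against HDFS internals. An hdfs-site.xml sketch, where com.example.MyPlacementPolicy is a hypothetical subclass of org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault:

  <property>
    <name>dfs.block.replicator.classname</name>
    <value>com.example.MyPlacementPolicy</value>
  </property>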
Data Placement Strategy for HDFS
Hi, I want to customize the data placement strategy rather than using the default strategy in HDFS. Is there any way to control which datanode the replica is delivered to? Thank you in advance. Best regards, Fei
Re: Hadoop Installation on Multihomed Networks
I solved the problem by changing the hosts file as follows: 10.10.0.10 10.5.0.10 yngcr10nc01 Thanks, Fei On Nov 11, 2014, at 11:58 AM, daemeon reiydelle daeme...@gmail.com wrote: You may want to consider configuring host names that embed the subnet in the host name itself (e.g. foo50, foo40, for foo via each of the 50 and 40 subnets). ssh key file contents etc may have to be fiddled with a bit ... “The race is not to the swift, nor the battle to the strong, but to those who can see it coming and jump aside.” - Hunter Thompson Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872 On Tue, Nov 11, 2014 at 7:47 AM, Fei Hu hufe...@gmail.com wrote: Hi, I am trying to install Hadoop1.0.4 on multihomed networks. I have done it as the link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html. But it still doesn’t work. The datanode could not work. Its log is as the following. In my hdfs-site.xml, I set the ip:10.50.0.10 for datanode, but in the following report, host = yngcr10nc01/10.10.0.10. I think it may be because in /etc/hosts file, I add the pair of ip and hostname:10.10.0.10 YNGCR10NC01 before. The problem now is that I could not add 10.50.0.10 YNGCR10NC01 into hosts file, because 10.10.0.10 YNGCR10NC01 is necessary for another program. Is there any way to solve the problem on multihomed networks? Thanks in advance, Fei Hu 2014-11-11 04:20:28,228 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = yngcr10nc01/10.10.0.10 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.0.4 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012 / 2014-11-11 04:20:28,436 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2014-11-11 04:20:28,446 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 2014-11-11 04:20:28,447 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2014-11-11 04:20:28,447 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started 2014-11-11 04:20:28,572 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered. 2014-11-11 04:20:28,607 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2014-11-11 04:20:29,870 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 0 time(s). 2014-11-11 04:20:30,871 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 1 time(s). 2014-11-11 04:20:31,872 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 2 time(s). 2014-11-11 04:20:32,873 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 3 time(s). 2014-11-11 04:20:33,874 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 4 time(s). 
2014-11-11 04:20:34,875 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 5 time(s). 2014-11-11 04:20:35,876 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 6 time(s). 2014-11-11 04:20:36,877 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 7 time(s). 2014-11-11 04:20:37,877 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 8 time(s). 2014-11-11 04:20:38,879 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 9 time(s). 2014-11-11 04:20:38,884 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to yngcr11hm01/10.50.0.5:9000 failed on local exception: java.net.NoRouteToHostException: No route to host at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107
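A plain sketch of the per-subnet host name idea from daemeon's reply, applied to this box; the second name is purely illustrative. Hadoop's configuration and slaves files would then refer to the HDFS-facing name, while the other program keeps using the original entry:

  # /etc/hosts
  10.10.0.10   yngcr10nc01        # existing entry, used by the other program
  10.50.0.10   yngcr10nc01-hdfs   # illustrative extra name for the HDFS-facing interface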
Hadoop Installation on Multihomed Networks
Hi, I am trying to install Hadoop1.0.4 on multihomed networks. I have done it as the link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html. But it still doesn’t work. The datanode could not work. Its log is as the following. In my hdfs-site.xml, I set the ip:10.50.0.10 for datanode, but in the following report, host = yngcr10nc01/10.10.0.10.I think it may be because in /etc/hosts file, I add the pair of ip and hostname:10.10.0.10 YNGCR10NC01 before. The problem now is that I could not add 10.50.0.10 YNGCR10NC01 into hosts file, because 10.10.0.10 YNGCR10NC01 is necessary for another program. Is there any way to solve the problem on multihomed networks? Thanks in advance, Fei Hu 2014-11-11 04:20:28,228 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = yngcr10nc01/10.10.0.10 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.0.4 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012 / 2014-11-11 04:20:28,436 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2014-11-11 04:20:28,446 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 2014-11-11 04:20:28,447 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2014-11-11 04:20:28,447 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started 2014-11-11 04:20:28,572 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered. 2014-11-11 04:20:28,607 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2014-11-11 04:20:29,870 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 0 time(s). 2014-11-11 04:20:30,871 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 1 time(s). 2014-11-11 04:20:31,872 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 2 time(s). 2014-11-11 04:20:32,873 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 3 time(s). 2014-11-11 04:20:33,874 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 4 time(s). 2014-11-11 04:20:34,875 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 5 time(s). 2014-11-11 04:20:35,876 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 6 time(s). 2014-11-11 04:20:36,877 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 7 time(s). 2014-11-11 04:20:37,877 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 8 time(s). 2014-11-11 04:20:38,879 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yngcr11hm01/10.50.0.5:9000. Already tried 9 time(s). 
2014-11-11 04:20:38,884 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to yngcr11hm01/10.50.0.5:9000 failed on local exception: java.net.NoRouteToHostException: No route to host at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107) at org.apache.hadoop.ipc.Client.call(Client.java:1075) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at com.sun.proxy.$Proxy5.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:299) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539) at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682) Caused by: java.net.NoRouteToHostException: No route to host
Datanode could not work because its IP is not the same as the one specified in hdfs-site.xml
Hi, I am installing Hadoop 1.0.4 on our clusters, and I am running into a problem with the IP setting for the datanode. It is probably related to multihomed networks. I have tried to solve the problem as described at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html, but it still does not work. There are two networks on each computer. For example, the computer whose hostname is YNGCR10NC01 has two IPs: brpub Link encap:Ethernet HWaddr EC:F4:BB:C4:86:28 inet addr:10.10.0.10 Bcast:10.10.255.255 Mask:255.255.0.0 em3 Link encap:Ethernet HWaddr EC:F4:BB:C4:86:2C inet addr:10.50.0.10 Bcast:10.50.0.255 Mask:255.255.255.0 Now I want to use IP 10.50.0.10 for the DataNode. In hdfs-site.xml, I changed some properties as follows:
<property><name>dfs.datanode.address</name><value>10.50.0.10:50010</value></property>
<property><name>dfs.datanode.http.address</name><value>10.50.0.10:50075</value></property>
But because I am using IP 10.10.0.10 for another job, I had already added it to /etc/hosts as follows:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.0.10 YNGCR10NC01 // this is occupied by another program, so I could not add 10.50.0.10 YNGCR10NC01 to the hosts file
10.50.0.5 yngcr11hm01 // this is the master node
Therefore, I could not add 10.50.0.10 YNGCR10NC01 to the hosts file. After I start Hadoop, the datanode log reports the following error: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to yngcr11hm01/10.50.0.5:9000 failed on local exception: java.net.NoRouteToHostException: No route to host The front of the datanode log shows / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = yngcr10nc01/10.10.0.10 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.0.4 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012 / I don't know why the host IP is still 10.10.0.10; I want it to be 10.50.0.10. Maybe it is caused by the hosts file, but I cannot change the hosts file now, because the pair 10.10.0.10 YNGCR10NC01 is being used by another program. Is there any way to solve this problem? Thank you very much in advance! All the best, Fei Hu