Hi Djordje,

Thanks for your help; the method works.
Best Regards,
Fu, Binzhang

On 2012/4/13 6:12, Djordje Jevdjic wrote:
> Hello Fu,
>
> I see that your mapred-site.xml file says that you want four map and two
> reduce processes. Each of these is spawned by the framework as a separate
> process. You also ask for 2GB of heap space per process, which you
> definitely don't have in your little Xen machine. Basically, you need a
> minimum of (number_of_map_processes * heap_size) + the framework overhead
> (less than 1GB) of memory to run this. How many cores does your virtual
> machine have? For the purpose of this benchmark, you should have a hardware
> configuration similar to the following:
>
> number of maps = number of cores you want to run this on
> number of reduce jobs = 1, unless the number of mappers is > 8
> amount of memory = number of mappers * heap size
>
> If you can't afford more than 2GB of memory, I suggest that you change the
> number of mappers and reducers to 1 in the config file and set the heap
> size to 1.5GB. The following parameters are affected (mapred-site.xml):
>
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>1</value>
>   <description>The maximum number of map tasks that will be run
>   simultaneously by a task tracker.</description>
> </property>
>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>1</value>
>   <description>The maximum number of reduce tasks that will be run
>   simultaneously by a task tracker.</description>
> </property>
>
> <property>
>   <name>mapred.map.tasks</name>
>   <value>1</value>
>   <description>The default number of map tasks per job. Ignored when
>   mapred.job.tracker is "local".</description>
> </property>
>
> <property>
>   <name>mapred.reduce.tasks</name>
>   <value>1</value>
>   <description>The default number of reduce tasks per job.</description>
> </property>
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx1536M</value>
> </property>
>
> Also, please check the benchmark documentation tomorrow; I will refresh the
> instructions so that you can run the benchmark with smaller memory
> requirements.
>
> Regards,
> Djordje
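To put numbers on the sizing rule above (the four-mapper and 2GB-per-process figures are the ones Djordje quotes from the original mapred-site.xml, and the under-1GB overhead estimate is his):

    original config:  4 maps * 2GB heap + ~1GB overhead = ~9GB required,
                      far beyond the 2GB Xen guest, so map JVMs are killed
                      for lack of heap space
    suggested config: 1 map * 1.5GB heap + overhead, which a 2GB guest can
                      just about accommodate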
> ________________________________________
> From: Fu Binzhang [[email protected]]
> Sent: Thursday, April 12, 2012 3:27 PM
> To: Djordje Jevdjic
> Subject: Re: RE: A question about "data analytics"
>
> Hello Djordje,
>
> Thanks for your advice; the problem was indeed caused by the tmp directory.
> I think the reason may be that I didn't reformat the namenode after I
> changed the tmp directory. After I reformatted it, the "class cast"
> exception disappeared. Unfortunately, another problem appeared: the job was
> killed every time I tried. I found in tasktracker.log that the error is
> "FATAL org.apache.hadoop.mapred.TaskTracker: Task:
> attempt_201204120549_0001_m_000000_3 - Killed : Java heap space".
>
> Does this mean that the main memory is not enough? I am actually running
> data analytics in a Xen virtual machine with 2GB of memory. Is this memory
> too small, or is there something wrong in my configuration? I have attached
> the configuration files and log files to this email. I would be very
> grateful if you could help me check these files.
>
> BTW, the output is:
> ----------------------------------------
> hadoop@debian-98:~$ $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using HADOOP_HOME=/home/hadoop/hadoop-0.20.2
> No HADOOP_CONF_DIR set, using /home/hadoop/hadoop-0.20.2/conf
> MAHOUT-JOB: /home/hadoop/mahout-distribution-0.6/examples/target/mahout-examples-0.6-job.jar
> 12/04/12 01:40:23 WARN driver.MahoutDriver: No wikipediaDataSetCreator.props found on classpath, will use command-line arguments only
> 12/04/12 01:41:03 INFO bayes.WikipediaDatasetCreatorDriver: Input: wikipedia/chunks Out: wikipediainput Categories: /home/hadoop/mahout-distribution-0.6//examples/temp/categories.txt
> 12/04/12 01:41:04 INFO common.HadoopUtil: Deleting wikipediainput
> 12/04/12 01:41:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 12/04/12 01:41:05 INFO input.FileInputFormat: Total input paths to process : 7
> 12/04/12 01:41:07 INFO mapred.JobClient: Running job: job_201204120140_0001
> 12/04/12 01:41:08 INFO mapred.JobClient:  map 0% reduce 0%
> Killed
>
> Best Regards,
> Fu, Binzhang
>
>> -----Original Message-----
>> From: "Djordje Jevdjic" <[email protected]>
>> Sent: Thursday, April 12, 2012
>> To: "Fu Bin-zhang" <[email protected]>, "[email protected]" <[email protected]>
>> Cc:
>> Subject: RE: A question about "data analytics"
>>
>> Hello Fu Bin-zhang,
>>
>> The error message is very weird, because FileSplit is a class derived from
>> InputSplit and the conversion is legal. However, I've seen this message
>> several times, and the error is very likely related to the location of the
>> hadoop tmp directory. Could you please compress and send me your
>> $HADOOP_HOME/conf folder? No need to broadcast to the list; send it to me
>> directly.
>>
>> Regards,
>> Djordje
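The fix Fu describes above (reformatting the namenode after moving the tmp directory) corresponds to a sequence like the following. This is a sketch rather than commands quoted from the thread: hadoop.tmp.dir, the stop/start scripts, and "hadoop namenode -format" are standard Hadoop 0.20.x pieces, and the format step erases any existing HDFS contents.

    # after changing hadoop.tmp.dir in $HADOOP_HOME/conf/core-site.xml:
    $HADOOP_HOME/bin/stop-all.sh
    $HADOOP_HOME/bin/hadoop namenode -format   # rebuilds HDFS metadata; destroys existing HDFS data
    $HADOOP_HOME/bin/start-all.sh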
>> ________________________________________
>> From: Fu Bin-zhang [[email protected]]
>> Sent: Wednesday, April 11, 2012 4:11 PM
>> To: [email protected]
>> Subject: A question about "data analytics"
>>
>> Hi all,
>>
>> I am trying to run the data analytics benchmark. I followed the
>> instructions on the CloudSuite website. Everything is fine until the 7th
>> step, "create the category-based split of the Wikipedia dataset". The
>> error is "java.lang.ClassCastException:
>> org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to
>> org.apache.hadoop.mapred.InputSplit". I failed to find the answer with
>> Google. Can anybody give a hint?
>>
>> Thanks in advance.
>>
>> The output is:
>> -------------------------------
>> hadoop@debian-98:~$ $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt
>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> Running on hadoop, using HADOOP_HOME=/home/hadoop/hadoop-0.20.2
>> No HADOOP_CONF_DIR set, using /home/hadoop/hadoop-0.20.2/conf
>> MAHOUT-JOB: /home/hadoop/mahout-distribution-0.6/examples/target/mahout-examples-0.6-job.jar
>> 12/04/11 06:55:10 WARN driver.MahoutDriver: No wikipediaDataSetCreator.props found on classpath, will use command-line arguments only
>> 12/04/11 06:55:12 INFO bayes.WikipediaDatasetCreatorDriver: Input: wikipedia/chunks Out: wikipediainput Categories: /home/hadoop/mahout-distribution-0.6/examples/temp/categories.txt
>> 12/04/11 06:55:13 INFO common.HadoopUtil: Deleting wikipediainput
>> 12/04/11 06:55:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 12/04/11 06:55:15 INFO input.FileInputFormat: Total input paths to process : 7
>> 12/04/11 06:55:17 INFO mapred.JobClient: Running job: job_201204110624_0002
>> 12/04/11 06:55:18 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/04/11 06:55:44 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000003_0, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:55:48 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000000_0, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:55:48 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000001_0, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:55:51 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000002_0, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:55:54 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000003_1, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:03 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000001_1, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:03 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000000_1, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:03 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000002_1, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:06 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000003_2, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:15 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000002_2, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:18 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000000_2, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:18 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000001_2, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:24 INFO mapred.JobClient: Job complete: job_201204110624_0002
>> 12/04/11 06:56:24 INFO mapred.JobClient: Counters: 3
>> 12/04/11 06:56:24 INFO mapred.JobClient:   Job Counters
>> 12/04/11 06:56:24 INFO mapred.JobClient:     Launched map tasks=14
>> 12/04/11 06:56:24 INFO mapred.JobClient:     Data-local map tasks=14
>> 12/04/11 06:56:24 INFO mapred.JobClient:     Failed map tasks=1
>> 12/04/11 06:56:24 INFO driver.MahoutDriver: Program took 74439 ms (Minutes: 1.24065)
>>
>> ----------------
>> Fu, Binzhang
>> 2012-04-11
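For reference on the exception in the trace above: the two names involved belong to Hadoop's two parallel MapReduce APIs. As background on the stock Hadoop 0.20.2 classes (this is not stated in the thread itself):

    org.apache.hadoop.mapred.InputSplit              old API (an interface)
    org.apache.hadoop.mapreduce.InputSplit           new API (an abstract class)
    org.apache.hadoop.mapreduce.lib.input.FileSplit  extends the new-API InputSplit only

One plausible reading, consistent with Djordje's tmp-directory diagnosis and with the fix that worked, is that stale job state under the old hadoop.tmp.dir caused the old-API code path in MapTask.runOldMapper to be handed a new-API split object; reformatting the namenode cleared that state.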
--
Best Regards,
Fu, Bin-zhang
Assistant Professor
Institute of Computing Technology, Chinese Academy of Sciences
Tel: +86-010-62601035
