Hi Djordje,

Thanks for your help; the method works.
Best Regards,
Fu, Binzhang

On 2012/4/13 6:12, Djordje Jevdjic wrote:
> Hello Fu,
>
> I see that your mapred-site.xml file says that you want four map and two
> reduce processes. Each of these is spawned by the framework as a separate
> process. You also ask for 2GB of heap space per process, which you
> definitely don't have in your little Xen machine. Basically, you need a
> minimum of (number_of_map_processes * heap_size) + the framework overhead
> (less than 1GB) of memory to run this. How many cores does your virtual
> machine have? For the purpose of this benchmark, you should have a hardware
> configuration similar to the following:
>
> number of maps = number of cores you want to run this on
> number of reduce jobs = 1, unless the number of mappers is > 8
> amount of memory = number of mappers * heap size
>
> If you can't afford more than 2GB of memory, I suggest that you change the
> number of mappers and reducers to 1 in the config file and set the heap
> size to 1.5GB. The following parameters are affected (mapred-site.xml):
>
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>1</value>
>   <description>The maximum number of map tasks that will be run
>   simultaneously by a task tracker.</description>
> </property>
>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>1</value>
>   <description>The maximum number of reduce tasks that will be run
>   simultaneously by a task tracker.</description>
> </property>
>
> <property>
>   <name>mapred.map.tasks</name>
>   <value>1</value>
>   <description>The default number of map tasks per job. Ignored when
>   mapred.job.tracker is "local".</description>
> </property>
>
> <property>
>   <name>mapred.reduce.tasks</name>
>   <value>1</value>
>   <description>The default number of reduce tasks per job.</description>
> </property>
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx1536M</value>
> </property>
>
> Also, please check the benchmark documentation tomorrow; I will refresh the
> instructions so that you can run the benchmark with smaller memory
> requirements.
>
> Regards,
> Djordje
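To put numbers on the sizing rule above (the four-mapper and 2GB-per-process figures are the ones Djordje quotes from the original mapred-site.xml, and the under-1GB overhead estimate is his):

    original config:  4 maps * 2GB heap + ~1GB overhead = ~9GB required,
                      far beyond the 2GB Xen guest, so map JVMs are killed
                      for lack of heap space
    suggested config: 1 map * 1.5GB heap + overhead, which a 2GB guest can
                      just about accommodate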
> ________________________________________
> From: Fu Binzhang [[email protected]]
> Sent: Thursday, April 12, 2012 3:27 PM
> To: Djordje Jevdjic
> Subject: Re: RE: A question about "data analytics"
>
> Hello Djordje,
>
> Thanks for your advice; the problem was indeed caused by the tmp directory.
> I think the reason may be that I didn't reformat the namenode after I
> changed the tmp directory. After I reformatted it, the "class cast"
> exception disappeared. Unfortunately, another problem appeared: the job was
> killed every time I tried. I found in tasktracker.log that the error is
> "FATAL org.apache.hadoop.mapred.TaskTracker: Task:
> attempt_201204120549_0001_m_000000_3 - Killed : Java heap space".
>
> Does this mean that the main memory is not enough? I am actually running
> data analytics in a Xen virtual machine with 2GB of memory. Is this memory
> too small, or is there something wrong in my configuration? I have attached
> the configuration files and log files to this email. I would be very
> grateful if you could help me check these files.
>
> BTW, the output is:
> ----------------------------------------
> hadoop@debian-98:~$ $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using HADOOP_HOME=/home/hadoop/hadoop-0.20.2
> No HADOOP_CONF_DIR set, using /home/hadoop/hadoop-0.20.2/conf
> MAHOUT-JOB: /home/hadoop/mahout-distribution-0.6/examples/target/mahout-examples-0.6-job.jar
> 12/04/12 01:40:23 WARN driver.MahoutDriver: No wikipediaDataSetCreator.props found on classpath, will use command-line arguments only
> 12/04/12 01:41:03 INFO bayes.WikipediaDatasetCreatorDriver: Input: wikipedia/chunks Out: wikipediainput Categories: /home/hadoop/mahout-distribution-0.6//examples/temp/categories.txt
> 12/04/12 01:41:04 INFO common.HadoopUtil: Deleting wikipediainput
> 12/04/12 01:41:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 12/04/12 01:41:05 INFO input.FileInputFormat: Total input paths to process : 7
> 12/04/12 01:41:07 INFO mapred.JobClient: Running job: job_201204120140_0001
> 12/04/12 01:41:08 INFO mapred.JobClient:  map 0% reduce 0%
> Killed
>
> Best Regards,
> Fu, Binzhang
>
>> -----Original Message-----
>> From: "Djordje Jevdjic" <[email protected]>
>> Sent: Thursday, April 12, 2012
>> To: "Fu Bin-zhang" <[email protected]>, "[email protected]" <[email protected]>
>> Cc:
>> Subject: RE: A question about "data analytics"
>>
>> Hello Fu Bin-zhang,
>>
>> The error message is very weird, because FileSplit is a class derived from
>> InputSplit and the conversion is legal. However, I've seen this message
>> several times, and the error is very likely related to the location of the
>> hadoop tmp directory. Could you please compress and send me your
>> $HADOOP_HOME/conf folder? No need to broadcast to the list; send it to me
>> directly.
>>
>> Regards,
>> Djordje
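The fix Fu describes above (reformatting the namenode after moving the tmp directory) corresponds to a sequence like the following. This is a sketch rather than commands quoted from the thread: hadoop.tmp.dir, the stop/start scripts, and "hadoop namenode -format" are standard Hadoop 0.20.x pieces, and the format step erases any existing HDFS contents.

    # after changing hadoop.tmp.dir in $HADOOP_HOME/conf/core-site.xml:
    $HADOOP_HOME/bin/stop-all.sh
    $HADOOP_HOME/bin/hadoop namenode -format   # rebuilds HDFS metadata; destroys existing HDFS data
    $HADOOP_HOME/bin/start-all.sh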
>> ________________________________________
>> From: Fu Bin-zhang [[email protected]]
>> Sent: Wednesday, April 11, 2012 4:11 PM
>> To: [email protected]
>> Subject: A question about "data analytics"
>>
>> Hi all,
>>
>> I am trying to run the data analytics benchmark. I followed the
>> instructions on the CloudSuite website. Everything is fine until the 7th
>> step, "create the category-based split of the Wikipedia dataset". The
>> error is "java.lang.ClassCastException:
>> org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to
>> org.apache.hadoop.mapred.InputSplit". I failed to find the answer with
>> Google. Can anybody give a hint?
>>
>> Thanks in advance.
>>
>> The output is:
>> -------------------------------
>> hadoop@debian-98:~$ $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt
>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> Running on hadoop, using HADOOP_HOME=/home/hadoop/hadoop-0.20.2
>> No HADOOP_CONF_DIR set, using /home/hadoop/hadoop-0.20.2/conf
>> MAHOUT-JOB: /home/hadoop/mahout-distribution-0.6/examples/target/mahout-examples-0.6-job.jar
>> 12/04/11 06:55:10 WARN driver.MahoutDriver: No wikipediaDataSetCreator.props found on classpath, will use command-line arguments only
>> 12/04/11 06:55:12 INFO bayes.WikipediaDatasetCreatorDriver: Input: wikipedia/chunks Out: wikipediainput Categories: /home/hadoop/mahout-distribution-0.6/examples/temp/categories.txt
>> 12/04/11 06:55:13 INFO common.HadoopUtil: Deleting wikipediainput
>> 12/04/11 06:55:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 12/04/11 06:55:15 INFO input.FileInputFormat: Total input paths to process : 7
>> 12/04/11 06:55:17 INFO mapred.JobClient: Running job: job_201204110624_0002
>> 12/04/11 06:55:18 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/04/11 06:55:44 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000003_0, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:55:48 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000000_0, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:55:48 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000001_0, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:55:51 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000002_0, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:55:54 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000003_1, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:03 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000001_1, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:03 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000000_1, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:03 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000002_1, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:06 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000003_2, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:15 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000002_2, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:18 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000000_2, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:18 INFO mapred.JobClient: Task Id : attempt_201204110624_0002_m_000001_2, Status : FAILED
>> java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapred.InputSplit
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:323)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 12/04/11 06:56:24 INFO mapred.JobClient: Job complete: job_201204110624_0002
>> 12/04/11 06:56:24 INFO mapred.JobClient: Counters: 3
>> 12/04/11 06:56:24 INFO mapred.JobClient:   Job Counters
>> 12/04/11 06:56:24 INFO mapred.JobClient:     Launched map tasks=14
>> 12/04/11 06:56:24 INFO mapred.JobClient:     Data-local map tasks=14
>> 12/04/11 06:56:24 INFO mapred.JobClient:     Failed map tasks=1
>> 12/04/11 06:56:24 INFO driver.MahoutDriver: Program took 74439 ms (Minutes: 1.24065)
>>
>> ----------------
>> Fu, Binzhang
>> 2012-04-11
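For reference on the exception in the trace above: the two names involved belong to Hadoop's two parallel MapReduce APIs. As background on the stock Hadoop 0.20.2 classes (this is not stated in the thread itself):

    org.apache.hadoop.mapred.InputSplit              old API (an interface)
    org.apache.hadoop.mapreduce.InputSplit           new API (an abstract class)
    org.apache.hadoop.mapreduce.lib.input.FileSplit  extends the new-API InputSplit only

One plausible reading, consistent with Djordje's tmp-directory diagnosis and with the fix that worked, is that stale job state under the old hadoop.tmp.dir caused the old-API code path in MapTask.runOldMapper to be handed a new-API split object; reformatting the namenode cleared that state.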
--
Best Regards,
Fu, Bin-zhang
Assistant Professor
Institute of Computing Technology, Chinese Academy of Sciences
Tel: +86-010-62601035
