Thanks Djordje. The heap size indicated in mapred-site.xml is set to -Xmx2048M, and my machine has 8GB of DRAM. Based on your reply to Fu (http://www.mail-archive.com/[email protected]/msg00019.html), I'm using 4 mappers and 2 reducers, so I guess my machine is not able to run the benchmark with a 2GB heap size. In that reply, you said:

    number of maps = number of cores you want to run this on
    number of reduce jobs = 1, unless the number of mappers is > 8
    amount of memory = number of mappers * heap size

Thus, running 4 mappers will require 8GB of heap in total, which is not available on my machine because the OS and other processes also need some of that memory. I'm going to reduce the heap size to 1.5GB and try again.

What I'm wondering is whether the reducers also get their own heap. If so, I need to decrease the number of reducers and/or the heap size. Does each reducer require its own heap?
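For concreteness, here is roughly what I plan to put in $HADOOP_HOME/conf/mapred-site.xml. This is a sketch only: mapred.child.java.opts is the parameter you pointed me to, while mapred.map.tasks and mapred.reduce.tasks are my assumption for how to set the task counts, so please correct me if those are wrong:

    <!-- mapred-site.xml: lower the per-task heap from 2GB to 1.5GB -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1536m</value>  <!-- heap for each spawned map/reduce task JVM -->
    </property>
    <property>
      <name>mapred.map.tasks</name>      <!-- assumed property name -->
      <value>4</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>   <!-- assumed property name -->
      <value>2</value>
    </property>

If the reducers do get their own heap as well, the worst case becomes (4 + 2) * 1.5GB = 9GB, which still exceeds my 8GB of DRAM, so I may also have to cut the reducers down to one.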
On Sun, Mar 24, 2013 at 6:05 PM, Djordje Jevdjic <[email protected]> wrote:

> Dear Jinchun,
>
> A timeout of 1200sec is already too generous. Increasing it will not solve
> the problem. I cannot see your logs, but yes, the problem again seems to
> be the indicated heap size and the DRAM capacity your machine has.
>
> Regards,
> Djordje
> ________________________________________
> From: Jinchun Kim [[email protected]]
> Sent: Friday, March 22, 2013 3:04 PM
> To: Djordje Jevdjic
> Cc: [email protected]
> Subject: Re: Question about data analytic
>
> Thanks Djordje :)
> I was able to prepare the input data file, and now I'm trying to create
> category-based splits of the Wikipedia dataset (41GB) and the training
> data set (5GB) using Mahout.
>
> I had no problem with the training data set, but Hadoop showed the
> following messages when I tried to do the same job with the Wikipedia
> dataset:
>
> .........
> 13/03/21 22:31:00 INFO mapred.JobClient: map 27% reduce 1%
> 13/03/21 22:40:31 INFO mapred.JobClient: map 27% reduce 2%
> 13/03/21 22:58:49 INFO mapred.JobClient: map 27% reduce 3%
> 13/03/21 23:22:57 INFO mapred.JobClient: map 27% reduce 4%
> 13/03/21 23:46:32 INFO mapred.JobClient: map 27% reduce 5%
> 13/03/22 00:27:14 INFO mapred.JobClient: map 27% reduce 6%
> 13/03/22 01:06:55 INFO mapred.JobClient: map 27% reduce 7%
> 13/03/22 01:14:06 INFO mapred.JobClient: map 27% reduce 3%
> 13/03/22 01:15:35 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_r_000000_1, Status : FAILED
> Task attempt_201303211339_0002_r_000000_1 failed to report status for 1200
> seconds. Killing!
> 13/03/22 01:20:09 INFO mapred.JobClient: map 27% reduce 4%
> 13/03/22 01:33:35 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_m_000037_1, Status : FAILED
> Task attempt_201303211339_0002_m_000037_1 failed to report status for 1228
> seconds. Killing!
> 13/03/22 01:35:12 INFO mapred.JobClient: map 27% reduce 5%
> 13/03/22 01:40:38 INFO mapred.JobClient: map 27% reduce 6%
> 13/03/22 01:52:28 INFO mapred.JobClient: map 27% reduce 7%
> 13/03/22 02:16:27 INFO mapred.JobClient: map 27% reduce 8%
> 13/03/22 02:19:02 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_m_000018_1, Status : FAILED
> Task attempt_201303211339_0002_m_000018_1 failed to report status for 1204
> seconds. Killing!
> 13/03/22 02:49:03 INFO mapred.JobClient: map 27% reduce 9%
> 13/03/22 02:52:04 INFO mapred.JobClient: map 28% reduce 9%
> ........
>
> The reduce progress falls back to an earlier point, and the process ends
> at map 46%, reduce 2% without completing.
> Is this also relevant to the heap and DRAM size?
> I was wondering if increasing the timeout will help or not.
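[Note for the archives: if anyone does need to adjust that timeout, I believe it is controlled by the following property in $HADOOP_HOME/conf/mapred-site.xml, in milliseconds. A sketch only, assuming mapred.task.timeout is the right property name; and, as Djordje points out above, raising it would not fix the underlying memory problem:

    <property>
      <name>mapred.task.timeout</name>  <!-- assumed property name -->
      <value>1200000</value>            <!-- 1200 seconds, matching the logs -->
    </property>
]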
> On Fri, Mar 22, 2013 at 8:46 AM, Djordje Jevdjic <[email protected]> wrote:
>
> Dear Jinchun,
>
> The warning message that you get is irrelevant. The problem seems to be in
> the amount of memory that is given to the map-reduce tasks. You need to
> increase the heap size (e.g., run with -Xmx2048M) and make sure that you
> have enough DRAM for the heap size you indicate. To change the heap size,
> edit the following file:
>
> $HADOOP_HOME/conf/mapred-site.xml
>
> and specify the heap size by adding/changing the following parameter:
>
> mapred.child.java.opts
>
> If your machine doesn't have enough DRAM, the whole process of preparing
> the data and the model is indeed expected to take a couple of hours.
>
> Regards,
> Djordje
> ________________________________________
> From: Jinchun Kim [[email protected]]
> Sent: Friday, March 22, 2013 1:14 PM
> To: [email protected]
> Subject: Question about data analytic
>
> Hi, All.
>
> I'm trying to run the Data Analytics benchmark on my x86 Ubuntu machine.
> I found that when I divided the 30GB Wikipedia input data into small
> chunks of 64MB, CPU usage was really low (checked with the /usr/bin/time
> command). Most of the execution time was spent idle and waiting; user CPU
> time was only 13% of the total running time.
>
> Is it because I'm running Data Analytics on a single node?
> Or does it have something to do with the following warning message?
>
> WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on
> classpath, will use command-line arguments only
>
> I don't understand why user CPU time is so low while it takes 2.5 hours
> to finish splitting the Wikipedia inputs.
> Thanks!
>
> --
> Jinchun Kim
>
> --
> Jinchun Kim

--
Thanks,
Jinchun Kim
