Java heap space error on PFPGrowth

2010-11-11 Thread Mark
I am trying to run PFPGrowth but I keep receiving this Java heap space 
error at the end of the first step/beginning of second step.


I am using the following parameters:  -method mapreduce -regex [\\t] 
-s 5 -g 55000


Output:

..
10/11/11 08:12:56 INFO mapred.JobClient:  map 100% reduce 85%
10/11/11 08:12:59 INFO mapred.JobClient:  map 100% reduce 90%
10/11/11 08:13:02 INFO mapred.JobClient:  map 100% reduce 94%
10/11/11 08:13:09 INFO mapred.JobClient:  map 100% reduce 100%
10/11/11 08:13:11 INFO mapred.JobClient: Job complete: job_201011101701_0005
10/11/11 08:13:11 INFO mapred.JobClient: Counters: 17
10/11/11 08:13:11 INFO mapred.JobClient:   Job Counters
10/11/11 08:13:11 INFO mapred.JobClient: Launched reduce tasks=1
10/11/11 08:13:11 INFO mapred.JobClient: Launched map tasks=8
10/11/11 08:13:11 INFO mapred.JobClient: Data-local map tasks=8
10/11/11 08:13:11 INFO mapred.JobClient:   FileSystemCounters
10/11/11 08:13:11 INFO mapred.JobClient: FILE_BYTES_READ=146083205
10/11/11 08:13:11 INFO mapred.JobClient: HDFS_BYTES_READ=411751517
10/11/11 08:13:11 INFO mapred.JobClient: FILE_BYTES_WRITTEN=177276794
10/11/11 08:13:11 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=82352630
10/11/11 08:13:11 INFO mapred.JobClient:   Map-Reduce Framework
10/11/11 08:13:11 INFO mapred.JobClient: Reduce input groups=3146378
10/11/11 08:13:11 INFO mapred.JobClient: Combine output records=30759042
10/11/11 08:13:11 INFO mapred.JobClient: Map input records=6049220
10/11/11 08:13:11 INFO mapred.JobClient: Reduce shuffle bytes=26239336
10/11/11 08:13:11 INFO mapred.JobClient: Reduce output records=3146378
10/11/11 08:13:11 INFO mapred.JobClient: Spilled Records=54248354
10/11/11 08:13:11 INFO mapred.JobClient: Map output bytes=743485927
10/11/11 08:13:11 INFO mapred.JobClient: Combine input records=63744687
10/11/11 08:13:11 INFO mapred.JobClient: Map output records=41469874
10/11/11 08:13:11 INFO mapred.JobClient: Reduce input records=8484229
10/11/11 08:13:26 INFO pfpgrowth.PFPGrowth: No of Features: 1087215
10/11/11 08:13:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/11/11 08:13:40 INFO input.FileInputFormat: Total input paths to process : 1
10/11/11 08:13:44 INFO mapred.JobClient: Running job: job_201011101701_0006
10/11/11 08:13:45 INFO mapred.JobClient:  map 0% reduce 0%
10/11/11 08:14:16 INFO mapred.JobClient: Task Id : attempt_201011101701_0006_m_00_0, Status : FAILED

Error: Java heap space


Is there anything I can do to alleviate this problem?

FYI: I am running a 4-node cluster with 12 GB of RAM in each machine.

Thanks


RE: Java heap space error on PFPGrowth

2010-11-11 Thread praveen.peddi
Hi Mark,
I ran into the same error and found that I needed to add the following Hadoop 
parameter to mapred-site.xml (Hadoop 0.20.2). You can try a smaller heap than 
4 GB.


  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
    <description>map heap size for child task</description>
  </property>
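
As a rough way to pick a value below 4 GB, here is a minimal sketch of the arithmetic. It assumes 2 map slots and 2 reduce slots per node and about 2 GB set aside for the OS and the Hadoop daemons; none of those numbers come from Mark's setup, so substitute the real ones. The function name is made up for illustration.

```python
# Rough per-task heap budget for one worker node.
# The slot counts and the reserved memory below are assumptions, not
# values taken from this thread; replace them with your cluster's settings.

def max_child_heap_mb(node_ram_mb, map_slots, reduce_slots, reserved_mb):
    """Largest -Xmx (in MB) that still lets every task slot run at once."""
    slots = map_slots + reduce_slots
    if slots <= 0:
        raise ValueError("need at least one task slot")
    usable_mb = node_ram_mb - reserved_mb
    if usable_mb <= 0:
        raise ValueError("reserved memory exceeds node RAM")
    # Integer division: every slot gets the same share of what is left.
    return usable_mb // slots


node_ram_mb = 12 * 1024  # 12 GB per machine, as stated in the question
heap_mb = max_child_heap_mb(node_ram_mb, map_slots=2, reduce_slots=2,
                            reserved_mb=2048)
print(f"-Xmx{heap_mb}m")  # -Xmx2560m
```

Under those assumptions, four child JVMs at -Xmx4096m could ask for 16 GB of heap on a 12 GB machine if all slots are busy, which is why a smaller value may be safer. A JVM also uses some memory beyond its heap, so treat the result as an upper bound.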


Hope this solves your issue.

Praveen

-Original Message-
From: ext Mark [mailto:static.void@gmail.com] 
Sent: Thursday, November 11, 2010 11:24 AM
To: common-user@hadoop.apache.org; u...@mahout.apache.org
Subject: Java heap space error on PFPGrowth
