Hmm, my guess is that you are actually giving the VM too much memory, so
the machine is constantly swapping, which makes garbage collection take
too long. I believe that is what this exception is telling you.
I have a 2 GB laptop and was able to run the synthetic control example
without a problem, although it looks like you are doing things a
little differently by using a different distance measure. Can you run
the "default" synthetic control K-means?
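One rough way to test the swap theory is to compare how large a JVM started with the same options (for example the HADOOP_OPTS quoted below) may grow against the machine's physical RAM, keeping in mind that pseudo-distributed mode runs several JVMs at once (the Hadoop daemons plus the task JVMs). The class below is a hypothetical diagnostic, not part of Hadoop or Mahout; it relies on com.sun.management.OperatingSystemMXBean, a HotSpot/OpenJDK extension, and the 0.5 threshold is an arbitrary assumption.

```java
import java.lang.management.ManagementFactory;

// Hypothetical diagnostic (not part of Hadoop or Mahout): compares the heap this
// JVM may grow to with the machine's physical RAM.
public class HeapCheck {

    // True if maxHeapBytes is larger than `fraction` of physicalBytes.
    static boolean heapExceedsFraction(long maxHeapBytes, long physicalBytes, double fraction) {
        return maxHeapBytes > fraction * physicalBytes;
    }

    public static void main(String[] args) {
        // Upper bound on heap growth; Long.MAX_VALUE would mean "no limit".
        long maxHeap = Runtime.getRuntime().maxMemory();

        // com.sun.management.OperatingSystemMXBean is a HotSpot/OpenJDK extension,
        // not part of the standard java.lang.management API.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long physical = os.getTotalPhysicalMemorySize();

        System.out.printf("max heap: %d MB, physical RAM: %d MB%n", maxHeap >> 20, physical >> 20);

        // The 0.5 threshold is an arbitrary assumption for illustration.
        if (heapExceedsFraction(maxHeap, physical, 0.5)) {
            System.out.println("This JVM alone may use over half of RAM; a few such JVMs could force swapping.");
        }
    }
}
```

Running it as java $HADOOP_OPTS HeapCheck shows the limit a JVM gets with those flags; multiplying by the number of JVMs running at once gives a crude upper bound on total heap demand.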
On Nov 13, 2008, at 3:41 PM, Philippe Lamarche wrote:
2 GB.
On Thu, Nov 13, 2008 at 2:57 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
How much memory does your laptop have?
On Nov 13, 2008, at 11:53 AM, Philippe Lamarche wrote:
Hi,
I am using KMeans for text clustering and I am running into memory
problems. So far, I have only tried it on a laptop in pseudo-distributed
(master/slave) mode.
This is on Hadoop branch-0.19. The "texttovector.jar" contains a hacked
version of the syntheticcontrol KMeans example; the only difference is
in the first input phase.
Is this memory error "normal"? I am running with:
export HADOOP_OPTS="-server -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:NewSize=1G -XX:MaxNewSize=1G -XX:-UseGCOverheadLimit"
My understanding is that "-XX:-UseGCOverheadLimit" should disable the
GC overhead limit "feature".
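Because the bottom frame of the stack trace below is org.apache.hadoop.mapred.Child.main, the error was raised in a task JVM, so it may be worth confirming that the flags exported in HADOOP_OPTS actually reached that JVM. The class below is a hypothetical diagnostic, not part of Hadoop or Mahout; it uses the standard RuntimeMXBean.getInputArguments() call to list a JVM's options and checks which form of UseGCOverheadLimit was given last (assuming, as with other HotSpot boolean flags, that a later occurrence overrides an earlier one).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic (not a Hadoop API): reports whether the GC overhead
// limit was switched off for the JVM this code runs in.
public class GcFlagCheck {

    // Returns true if the last -XX:+/-UseGCOverheadLimit option in jvmArgs is the "-" form.
    // Assumption: a later occurrence of a HotSpot boolean flag overrides an earlier one.
    static boolean gcOverheadLimitDisabled(List<String> jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-UseGCOverheadLimit")) {
                disabled = true;
            } else if (arg.equals("-XX:+UseGCOverheadLimit")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        // Options passed to this JVM (excludes the main class and its arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM options: " + jvmArgs);
        System.out.println("GC overhead limit disabled: " + gcOverheadLimitDisabled(jvmArgs));
    }
}
```

Running java $HADOOP_OPTS GcFlagCheck shows what the shell passes along; the same two calls could also be logged from a mapper's configure(JobConf) method to inspect the task JVM itself. If the flag is missing there, the task JVM options are probably configured elsewhere (in Hadoop of this era, typically the mapred.child.java.opts property; verify this against branch-0.19).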
Any ideas?
[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MTI830/dist/texttovector.jar \
    org.apache.mahout.clustering.text.kmeans.Job \
    testallmti/vectors/part* testallclusteroutput1 \
    org.apache.mahout.utils.TanimotoDistanceMeasure \
    1.001 .001 .000005 10
08/11/13 11:37:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/11/13 11:37:23 INFO mapred.FileInputFormat: Total input paths to process : 1
08/11/13 11:37:23 INFO mapred.JobClient: Running job: job_200811131133_0007
08/11/13 11:37:24 INFO mapred.JobClient: map 0% reduce 0%
08/11/13 11:37:37 INFO mapred.JobClient: map 31% reduce 0%
08/11/13 11:37:42 INFO mapred.JobClient: map 63% reduce 0%
08/11/13 11:37:45 INFO mapred.JobClient: map 83% reduce 0%
08/11/13 11:37:50 INFO mapred.JobClient: map 100% reduce 0%
08/11/13 11:37:51 INFO mapred.JobClient: Job complete: job_200811131133_0007
08/11/13 11:37:51 INFO mapred.JobClient: Counters: 7
08/11/13 11:37:51 INFO mapred.JobClient: File Systems
08/11/13 11:37:51 INFO mapred.JobClient: HDFS bytes read=118875664
08/11/13 11:37:51 INFO mapred.JobClient: HDFS bytes written=146866785
08/11/13 11:37:51 INFO mapred.JobClient: Job Counters
08/11/13 11:37:51 INFO mapred.JobClient: Launched map tasks=2
08/11/13 11:37:51 INFO mapred.JobClient: Data-local map tasks=2
08/11/13 11:37:51 INFO mapred.JobClient: Map-Reduce Framework
08/11/13 11:37:51 INFO mapred.JobClient: Map input records=1702
08/11/13 11:37:51 INFO mapred.JobClient: Map input bytes=118836254
08/11/13 11:37:51 INFO mapred.JobClient: Map output records=1702
08/11/13 11:37:51 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/11/13 11:37:51 INFO mapred.FileInputFormat: Total input paths to process : 2
08/11/13 11:37:51 INFO mapred.JobClient: Running job: job_200811131133_0008
08/11/13 11:37:52 INFO mapred.JobClient: map 0% reduce 0%
08/11/13 11:38:07 INFO mapred.JobClient: map 4% reduce 0%
08/11/13 11:38:12 INFO mapred.JobClient: map 9% reduce 0%
08/11/13 11:38:17 INFO mapred.JobClient: map 11% reduce 0%
08/11/13 11:38:22 INFO mapred.JobClient: map 13% reduce 0%
08/11/13 11:38:27 INFO mapred.JobClient: map 15% reduce 0%
08/11/13 11:38:32 INFO mapred.JobClient: map 16% reduce 0%
08/11/13 11:38:37 INFO mapred.JobClient: map 18% reduce 0%
08/11/13 11:38:42 INFO mapred.JobClient: map 19% reduce 0%
08/11/13 11:38:47 INFO mapred.JobClient: map 21% reduce 0%
08/11/13 11:38:52 INFO mapred.JobClient: map 22% reduce 0%
08/11/13 11:38:57 INFO mapred.JobClient: map 23% reduce 0%
08/11/13 11:39:01 INFO mapred.JobClient: map 24% reduce 0%
08/11/13 11:39:06 INFO mapred.JobClient: map 25% reduce 0%
08/11/13 11:39:12 INFO mapred.JobClient: map 26% reduce 0%
08/11/13 11:39:17 INFO mapred.JobClient: map 27% reduce 0%
08/11/13 11:39:27 INFO mapred.JobClient: map 28% reduce 0%
08/11/13 11:39:37 INFO mapred.JobClient: map 29% reduce 0%
08/11/13 11:39:47 INFO mapred.JobClient: map 30% reduce 0%
08/11/13 11:39:57 INFO mapred.JobClient: map 31% reduce 0%
08/11/13 11:40:07 INFO mapred.JobClient: map 32% reduce 0%
08/11/13 11:40:17 INFO mapred.JobClient: map 33% reduce 0%
08/11/13 11:40:32 INFO mapred.JobClient: map 34% reduce 0%
08/11/13 11:40:42 INFO mapred.JobClient: map 35% reduce 0%
08/11/13 11:40:52 INFO mapred.JobClient: map 36% reduce 0%
08/11/13 11:41:07 INFO mapred.JobClient: map 37% reduce 0%
08/11/13 11:41:17 INFO mapred.JobClient: map 38% reduce 0%
08/11/13 11:41:33 INFO mapred.JobClient: map 39% reduce 0%
08/11/13 11:41:38 INFO mapred.JobClient: map 40% reduce 0%
08/11/13 11:41:53 INFO mapred.JobClient: map 41% reduce 0%
08/11/13 11:42:03 INFO mapred.JobClient: map 42% reduce 0%
08/11/13 11:42:17 INFO mapred.JobClient: map 43% reduce 0%
08/11/13 11:42:32 INFO mapred.JobClient: map 44% reduce 0%
08/11/13 11:42:42 INFO mapred.JobClient: map 45% reduce 0%
08/11/13 11:42:57 INFO mapred.JobClient: map 46% reduce 0%
08/11/13 11:43:13 INFO mapred.JobClient: map 47% reduce 0%
08/11/13 11:43:33 INFO mapred.JobClient: map 48% reduce 0%
08/11/13 11:43:48 INFO mapred.JobClient: map 49% reduce 0%
08/11/13 11:44:08 INFO mapred.JobClient: map 50% reduce 0%
08/11/13 11:44:28 INFO mapred.JobClient: map 51% reduce 0%
08/11/13 11:44:53 INFO mapred.JobClient: map 52% reduce 0%
08/11/13 11:45:23 INFO mapred.JobClient: map 53% reduce 0%
08/11/13 11:46:03 INFO mapred.JobClient: map 54% reduce 0%
08/11/13 11:46:10 INFO mapred.JobClient: map 28% reduce 0%
08/11/13 11:46:10 INFO mapred.JobClient: Task Id : attempt_200811131133_0008_m_000000_0, Status : FAILED
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.mahout.matrix.DenseVector$Iterator.next(DenseVector.java:184)
	at org.apache.mahout.matrix.DenseVector$Iterator.next(DenseVector.java:172)
	at org.apache.mahout.utils.TanimotoDistanceMeasure.distance(TanimotoDistanceMeasure.java:73)
	at org.apache.mahout.clustering.canopy.Canopy.emitPointToNewCanopies(Canopy.java:181)
	at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:42)
	at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:34)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child.main(Child.java:155)
[EMAIL PROTECTED]:/usr/local/hadoop$
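The innermost frames of the trace show the canopy mapper (CanopyMapper, Canopy.emitPointToNewCanopies) calling TanimotoDistanceMeasure.distance while iterating over a DenseVector. To make that path concrete, here is a minimal sketch of a textbook canopy assignment step using the standard Tanimoto distance, 1 - a.b / (|a|^2 + |b|^2 - a.b), over plain double arrays. It is not Mahout's code: the class, method, and field names are hypothetical, and treating two all-zero vectors as distance 0 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Mahout's implementation): canopy assignment
// with a Tanimoto distance over dense double arrays.
public class CanopySketch {

    // Standard Tanimoto distance: 1 - a.b / (|a|^2 + |b|^2 - a.b).
    static double tanimotoDistance(double[] a, double[] b) {
        double ab = 0, aa = 0, bb = 0;
        for (int i = 0; i < a.length; i++) {
            ab += a[i] * b[i];
            aa += a[i] * a[i];
            bb += b[i] * b[i];
        }
        double denominator = aa + bb - ab;
        // Assumption: two all-zero vectors are treated as identical.
        if (denominator == 0) {
            return 0.0;
        }
        return 1.0 - ab / denominator;
    }

    // A canopy here is just its center (the point that created it) plus the points it covers.
    static final class Canopy {
        final double[] center;
        final List<double[]> members = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
            members.add(center);
        }
    }

    // Adds one point: it joins every canopy whose center is closer than t1, and it
    // starts a new canopy unless some center is closer than t2 (expects t2 < t1).
    // Each call performs one full distance computation per existing canopy.
    static void addPoint(List<Canopy> canopies, double[] point, double t1, double t2) {
        boolean stronglyBound = false;
        for (Canopy c : canopies) {
            double d = tanimotoDistance(c.center, point);
            if (d < t1) {
                c.members.add(point);
            }
            stronglyBound |= d < t2;
        }
        if (!stronglyBound) {
            canopies.add(new Canopy(point));
        }
    }
}
```

In this sketch every point that is not within t2 of an existing center becomes a new canopy, so each incoming point costs one dense distance computation per canopy, and the canopy list keeps growing when t2 is small. For non-negative vectors, such as term weights, the Tanimoto distance always lies between 0 and 1.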
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ