[jira] [Commented] (MAHOUT-958) NullPointerException in RepresentativePointsMapper when running cluster-reuters.sh example with kmeans

Adam J. Baron (JIRA) Fri, 12 Oct 2012 11:35:11 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475225#comment-13475225 ]


Adam J. Baron commented on MAHOUT-958:
--------------------------------------

I had the exact same issue, but what ehgjr said about wildcards in a January 
2012 comment gave me an idea.

The problem in the cluster-reuters.sh script is the 'clusters-*-final' wildcard in the clusterdump call:
  $MAHOUT clusterdump \
    -i ${WORK_DIR}/reuters-kmeans/clusters-*-final \
    -o ${WORK_DIR}/reuters-kmeans/clusterdump \
    -d ${WORK_DIR}/reuters-out-seqdir-sparse-kmeans/dictionary.file-0 \
    -dt sequencefile -b 100 -n 20 --evaluate \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure -sp 0 \
    --pointsDir ${WORK_DIR}/reuters-kmeans/clusteredPoints

For me, clusters-*-final resolved to clusters-2-final. So I re-ran just that 
one clusterdump command outside of the cluster-reuters.sh script, using 
'clusters-2-final' instead, and everything ran fine. This is obviously not a 
fix for cluster-reuters.sh, but it is a workaround that lets you see the 
clusterdump results.
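Since the iteration number in the final directory name can differ from run to run, the directory can be looked up before calling clusterdump instead of hard-coding 'clusters-2-final'. The following is a minimal sketch, assuming the clusters live on HDFS and that `hadoop fs -ls` prints each entry's path in its last column. `find_final_clusters` and `FINAL_DIR` are hypothetical names, not part of Mahout or cluster-reuters.sh:

```shell
#!/usr/bin/env bash
# Sketch: find the final k-means clusters directory explicitly instead of
# relying on the 'clusters-*-final' wildcard.
# find_final_clusters is a hypothetical helper, not part of Mahout.

# Reads one path per line on stdin and prints the first path whose last
# component looks like clusters-<N>-final (e.g. clusters-2-final).
# Prints nothing if no such path is found.
find_final_clusters() {
  grep -E '/clusters-[0-9]+-final$' | head -n 1
}

# Example usage (assumes WORK_DIR and MAHOUT are set and the hadoop CLI is
# on PATH). awk keeps only the last column of each 'hadoop fs -ls' line,
# i.e. the path; the "Found N items" header never matches the pattern.
#
# FINAL_DIR=$(hadoop fs -ls "${WORK_DIR}/reuters-kmeans" \
#   | awk '{print $NF}' | find_final_clusters)
# $MAHOUT clusterdump -i "${FINAL_DIR}" ...
```

Quoting "${FINAL_DIR}" passes the resolved HDFS path through unchanged, so the local shell never gets a chance to interpret a wildcard.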

PS: I'm running this on a 20-node Hadoop cluster, not locally. It seems 
strange that the --input, --dictionary and --pointsDir parameters reference 
HDFS locations while the --output parameter references the local file system 
of your edge node.
                
> NullPointerException in RepresentativePointsMapper when running cluster-reuters.sh example with kmeans
> ------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-958
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-958
>             Project: Mahout
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 0.6
>         Environment: {code}
> > uname -a
> Linux 3.2.1-3.fc16.x86_64 #1 SMP Mon Jan 23 15:36:17 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
> {code}
> {code}
> > java -version
> java version "1.7.0_02"
> Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode)
> {code}
> Hadoop Version: 0.20.203.0, r1099333
>            Reporter: Rares Vernica
>            Assignee: Grant Ingersoll
>
> {code}
> > svn info
> Path: .
> URL: http://svn.apache.org/repos/asf/mahout/trunk
> Repository Root: http://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 1235544
> Node Kind: directory
> Schedule: normal
> Last Changed Author: tdunning
> Last Changed Rev: 1231800
> Last Changed Date: 2012-01-15 16:01:38 -0800 (Sun, 15 Jan 2012)
> {code}
> {code}
> > ./examples/bin/cluster-reuters.sh
> ...
> 1. kmeans clustering
> ...
> Inter-Cluster Density: NaN
> Intra-Cluster Density: 0.0
> CDbw Inter-Cluster Density: 0.0
> CDbw Intra-Cluster Density: NaN
> CDbw Separation: 0.0
> 12/01/24 16:08:47 INFO clustering.ClusterDumper: Wrote 20 clusters
> 12/01/24 16:08:47 INFO driver.MahoutDriver: Program took 126749 ms (Minutes: 2.1124833333333335)
> {code}
> All five "{{Representative Points Driver}}" jobs fail.
> {code}
> 2012-01-24 16:07:11,555 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-01-24 16:07:11,881 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
> 2012-01-24 16:07:11,896 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
> 2012-01-24 16:07:11,896 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
> 2012-01-24 16:07:11,956 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-01-24 16:07:11,979 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2012-01-24 16:07:11,979 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName vernica for UID 1000 from the native implementation
> 2012-01-24 16:07:11,981 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.NullPointerException
>       at org.apache.mahout.clustering.evaluation.RepresentativePointsMapper.mapPoint(RepresentativePointsMapper.java:73)
>       at org.apache.mahout.clustering.evaluation.RepresentativePointsMapper.map(RepresentativePointsMapper.java:60)
>       at org.apache.mahout.clustering.evaluation.RepresentativePointsMapper.map(RepresentativePointsMapper.java:40)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>       at org.apache.hadoop.mapred.Child.main(Child.java:253)
> {code}

