[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132104#comment-13132104
]
Jeff Eastman commented on MAHOUT-524:
-------------------------------------
I've found where the /data is being added to the input path: it's in
SequenceFileInputFormat.listStatus(JobConf), where MapFile.DATA_FILE_NAME is
appended to get the dataFile path. That does not seem to be the source of the
problem, however. Instead, I'm looking in DRM.times(), where it calls
TimesSquaredJob.createTimesJobConf(...). It looks to me like this method is
setting the conf property "DistributedMatrix.times.inputVector" to the
correct file path
(examples/output/calculations/laplacian-25/tmp/<ts>/DistributedMatrix.times.inputVector/<ts>),
but is not setting the job's input paths, since
FileInputFormat.getInputPaths(new JobConf(conf)) returns only
"examples/output/calculations/laplacian-25".
By the time the thread gets to listStatus() after kicking off DRM.times(), the
JobConf input paths contain only
"examples/output/calculations/laplacian-113/tmp" and /data is appended to that.
The whole handling of Configurations and JobConfs is very twisted and difficult
to follow.
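To make the /data behavior concrete, here is a minimal sketch that uses only the
JDK, not Hadoop itself. It assumes that listStatus() treats every directory
found under an input path as a MapFile and substitutes that directory's "data"
file. The class and method names (MapFileResolutionSketch, resolveInputs, demo)
and the part-00000 file are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the directory handling described above.
// This is NOT Hadoop code; it only models the "append data to directories" step.
public class MapFileResolutionSketch {
    // Same literal as Hadoop's MapFile.DATA_FILE_NAME.
    static final String DATA_FILE_NAME = "data";

    // Lists the children of inputDir; any child that is a directory is assumed
    // to be a MapFile, so its "data" file is returned in its place.
    // Paths are returned relative to inputDir, sorted, with '/' separators.
    static List<String> resolveInputs(Path inputDir) {
        List<String> resolved = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(inputDir)) {
            for (Path child : children) {
                Path target = Files.isDirectory(child)
                        ? child.resolve(DATA_FILE_NAME)
                        : child;
                resolved.add(inputDir.relativize(target).toString().replace('\\', '/'));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(resolved);
        return resolved;
    }

    // Builds a hypothetical layout: one regular file plus a tmp directory,
    // mirroring an output directory that also holds a tmp subdirectory.
    static List<String> demo() {
        try {
            Path input = Files.createTempDirectory("laplacian");
            Files.createFile(input.resolve("part-00000"));
            Files.createDirectories(input.resolve("tmp"));
            return resolveInputs(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [part-00000, tmp/data]
    }
}
```

Note that tmp/data is never created in this layout, so the resolved path points
at a file that does not exist. If the real job's input paths end up covering a
directory that is not a MapFile, the same kind of dangling .../tmp/data path
would be expected.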
> DisplaySpectralKMeans example fails
> -----------------------------------
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.4, 0.5
> Reporter: Jeff Eastman
> Assignee: Shannon Quinn
> Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt,
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points. Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.