[
https://issues.apache.org/jira/browse/MAHOUT-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrey Davydov updated MAHOUT-1128:
-----------------------------------
Environment:
I work on Hadoop 1.0.3 cluster deployed on Amazon EC2 virtual computers with
Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
I run my application from separated "client" machine and it submits tasks to
cluster.
was:
I work on Hadoop 1.0.3 cluster deployed on Amazon EC2 virtual computers with
Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
I run my application from separated "clien" machine and it submit tasks to
cluster.
> MAHOUT-999 issue is still present
> ---------------------------------
>
> Key: MAHOUT-1128
> URL: https://issues.apache.org/jira/browse/MAHOUT-1128
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.7
> Environment: I work on Hadoop 1.0.3 cluster deployed on Amazon EC2
> virtual computers with Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
> I run my application from separated "client" machine and it submits tasks to
> cluster.
> Reporter: Andrey Davydov
>
> I'm sorry, my English is not good and I'm a newbie with Mahout, but it seems
> that the MAHOUT-999 issue is still present.
> I use mahout-core 0.7 loaded from maven-central and I get the same failure.
> I've investigated the sources and found the following in the
> org.apache.mahout.clustering.classify.ClusterClassifier class:
> public void writeToSeqFiles(Path path) throws IOException {
>   writePolicy(policy, path);
>   Configuration config = new Configuration();
>   FileSystem fs = FileSystem.get(path.toUri(), config);
>   SequenceFile.Writer writer = null;
>   ClusterWritable cw = new ClusterWritable();
>   for (int i = 0; i < models.size(); i++) {
>     try {
>       ...
>     } finally {
>       Closeables.closeQuietly(writer);
>     }
>   }
> }
>
> public void readFromSeqFiles(Configuration conf, Path path) throws IOException {
>   Configuration config = new Configuration();
>   List<Cluster> clusters = Lists.newArrayList();
>   for (ClusterWritable cw : new SequenceFileDirValueIterable<ClusterWritable>(
>       path, PathType.LIST, PathFilters.logsCRCFilter(), config)) {
>     ...
>   }
>   this.models = clusters;
>   modelClass = models.get(0).getClass().getName();
>   this.policy = readPolicy(path);
> }
> Both methods create a new default Configuration and therefore work against the
> local file system. I.e., KMeansDriver writes the initial clusters to the local
> file system of the "client" machine, and CIMapper then tries to read them from
> the local file system of a cluster node.
> It seems that the current implementation can only work on a pseudo-distributed
> Hadoop system. I think that ClusterClassifier should store intermediate results
> in HDFS, using the Configuration passed in by the user through the API.
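The fix the report asks for is the usual configuration-injection pattern: thread the caller's Configuration through instead of constructing fresh defaults inside the method, so that FileSystem.get resolves HDFS rather than the local file system. A minimal, self-contained sketch of the idea, using java.util.Properties as a stand-in for Hadoop's org.apache.hadoop.conf.Configuration (the class name SeqFileStore and the hard-coded values are hypothetical, for illustration only):

```java
import java.util.Properties;

// Sketch of the configuration-injection pattern, with java.util.Properties
// standing in for org.apache.hadoop.conf.Configuration.
class SeqFileStore {

    // Reported behavior: the method builds its own defaults, so
    // "fs.default.name" falls back to the local file system no matter
    // how the surrounding cluster is configured.
    static String resolveFsWithDefaults() {
        Properties config = new Properties(); // fresh, empty defaults
        return config.getProperty("fs.default.name", "file:///");
    }

    // Proposed behavior: honor the configuration the caller already
    // loaded, which on a real cluster points at the HDFS namenode.
    static String resolveFsWithCallerConfig(Properties conf) {
        return conf.getProperty("fs.default.name", "file:///");
    }

    public static void main(String[] args) {
        Properties clusterConf = new Properties();
        clusterConf.setProperty("fs.default.name", "hdfs://namenode:8020");

        System.out.println(resolveFsWithDefaults());            // file:///
        System.out.println(resolveFsWithCallerConfig(clusterConf)); // hdfs://namenode:8020
    }
}
```

In Hadoop terms, readFromSeqFiles already receives a Configuration parameter but ignores it; passing that conf to FileSystem.get(path.toUri(), conf), and giving writeToSeqFiles a similar parameter, would apply the same pattern.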
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira