[
https://issues.apache.org/jira/browse/MAHOUT-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302710#comment-15302710
]
ASF GitHub Bot commented on MAHOUT-1863:
----------------------------------------
Github user andrewpalumbo commented on the pull request:
https://github.com/apache/mahout/pull/235#issuecomment-221965587
We are actually not doing anything in the way of MapReduce anymore. Since
Mahout 0.9 (MAHOUT-1510), we've been doing all of our new work in what we've
called the Mahout "Samsara" environment:
http://mahout.apache.org/users/sparkbindings/home.html
a.k.a. "Mahout on Spark", "Mahout on Flink", "Mahout on H2O", etc.
Many of the committers responsible for maintaining the MapReduce code are
not currently active, which makes it hard to review patches.
That being said, there may be some interest from the maintainers who are
still around in cleaning up certain MapReduce algorithms, if you have some
obvious bug fixes that you've found. Maybe you could shoot an email to dev@
if you have some in mind?
If you'd like to get more involved in Mahout, you'll find yourself very
welcome! Your time would probably be better spent working on the new
framework. We have a good number of JIRAs started for the next (0.13.0)
release and will be adding more, as we just finished up the milestone 0.12.x
releases.
Thank you again for the patch!
> cluster-syntheticcontrol.sh errors out with "Input path does not exist"
> -----------------------------------------------------------------------
>
> Key: MAHOUT-1863
> URL: https://issues.apache.org/jira/browse/MAHOUT-1863
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Albert Chu
> Priority: Minor
>
> Running cluster-syntheticcontrol.sh on 0.12.0 resulted in this error:
> {noformat}
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://apex156:54310/user/achu/testdata
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
> at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
> at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.run(Job.java:133)
> at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.main(Job.java:62)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> It appears cluster-syntheticcontrol.sh breaks under 0.12.0 due to commit:
> {noformat}
> commit 23267a0bef064f3351fd879274724bcb02333c4a
> {noformat}
> One change in question:
> {noformat}
> - $DFS -mkdir testdata
> + $DFS -mkdir ${WORK_DIR}/testdata
> {noformat}
> now requires that the -p option be passed to -mkdir, since the parent
> ${WORK_DIR} directory may not exist yet. That fix is simple.
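For illustration, the parent-directory behavior can be reproduced with plain mkdir on a local filesystem (the HDFS -mkdir flag behaves the same way); the WORK_DIR path here is just a local stand-in, not the script's real value:

```shell
# Plain mkdir fails when the parent directory does not exist yet,
# which is exactly what happens once testdata moves under ${WORK_DIR}:
WORK_DIR=$(mktemp -d)/mahout-work
mkdir "${WORK_DIR}/testdata" 2>/dev/null \
  && echo "created" \
  || echo "failed: parent missing"

# -p creates intermediate directories as needed, matching the proposed
# fix in the script: $DFS -mkdir -p ${WORK_DIR}/testdata
mkdir -p "${WORK_DIR}/testdata" && echo "created with -p"
```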
> Another change:
> {noformat}
> - $DFS -put ${WORK_DIR}/synthetic_control.data testdata
> + $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
> {noformat}
> appears to break the example because in:
> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java
> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
> the input path is hard coded as just 'testdata'.
> ${WORK_DIR}/testdata needs to be passed in as an option instead.
> Reverting the lines listed above fixes the problem. However, reverting
> presumably reintroduces the original problem reported in MAHOUT-1773.
> I originally attempted to fix this by simply passing the option "--input
> ${WORK_DIR}/testdata" to the command in the script. However, once any one
> option is specified, a number of other options become required as well.
> I considered modifying the above Job.java files to take a minimal number of
> arguments and fall back to defaults for the rest, but that would have also
> required changes to DefaultOptionCreator.java to make required options
> optional, and I didn't want to go down the path of determining which other
> examples depend on those required/non-required settings.
> So to fix this, I just passed every required option into
> cluster-syntheticcontrol.sh, using whatever defaults were hard coded into
> the Job.java files above.
> I'm sure there's a better way to do this, and I'm happy to supply a patch,
> but I thought I'd start with this.
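The workaround described above might look roughly like the command fragment below. This is a hypothetical sketch, not the actual patch: the option names follow the usual DefaultOptionCreator conventions, and the values are the defaults as read from the example Job.java files, so both should be double-checked against the real sources.

```shell
# Hypothetical sketch of cluster-syntheticcontrol.sh with every required
# option spelled out, so nothing falls back to the hard coded 'testdata'
# path. Option names and default values are assumptions, not verbatim.
$DFS -mkdir -p ${WORK_DIR}/testdata
$DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
$MAHOUT org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
  --input ${WORK_DIR}/testdata \
  --output ${WORK_DIR}/output \
  --maxIter 10 \
  --convergenceDelta 0.5 \
  --distanceMeasure org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  --t1 80 --t2 55
```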
> Github pull request to be sent shortly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)