[
https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459672#comment-13459672
]
Brock Noland edited comment on CRUNCH-68 at 9/21/12 2:31 AM:
-------------------------------------------------------------
Alright, here is what I have uncovered:
1) The main() and run() methods receive the class name as an argument because
the jar manifest already specifies a main class (WordCount), so any class name
given on the command line is passed through as an ordinary argument:
{code}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar not.a.class.name wordcount/input wordcount/output-2
12/09/20 10:21:44 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(wordcount/input)+S0+Aggregate.count+GBK+combine+asText+Text(wordcount/output-2)"
{code}
Note that not.a.class.name is only needed because the run() method expects
exactly 3 args.
2) Because of #1, the other examples cannot actually be run: the class named on
the command line is ignored and WordCount runs instead:
{code}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar org.apache.crunch.examples.TotalBytesByIP access_log/input access_log/output
12/09/20 10:20:14 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(access_log/input)+S0+Aggregate.count+GBK+combine+asText+Text(access_log/output)"
{code}
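The dispatch behaviour behind #1 and #2 can be sketched as a simplified model of what `hadoop jar` does: if the manifest names a Main-Class, every remaining command-line argument is handed to that class's main(), and only when it does not is the first argument consumed as the class name. This is a hedged sketch, not Hadoop's RunJar source; the class JarDispatchModel and its nested Dispatch type are hypothetical names, and reading the manifest is replaced by a plain string parameter.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of how `hadoop jar <jar> [args...]` picks the
 * class to run. Not Hadoop's RunJar code; the manifest is modeled as a string.
 */
final class JarDispatchModel {

    /** Result of dispatch: the class that runs and the args its main() receives. */
    static final class Dispatch {
        final String mainClass;
        final String[] mainArgs;

        Dispatch(String mainClass, String[] mainArgs) {
            this.mainClass = mainClass;
            this.mainArgs = mainArgs;
        }
    }

    /**
     * @param manifestMainClass Main-Class from the jar manifest, or null if absent
     * @param cmdArgs           arguments that follow the jar path on the command line
     */
    static Dispatch dispatch(String manifestMainClass, String[] cmdArgs) {
        if (manifestMainClass != null) {
            // Manifest wins: every command-line argument, including one that
            // looks like a class name, is passed through to main() unchanged.
            return new Dispatch(manifestMainClass, cmdArgs.clone());
        }
        if (cmdArgs.length == 0) {
            throw new IllegalArgumentException("no main class in manifest or on command line");
        }
        // No Main-Class in the manifest: the first argument is consumed as the class name.
        return new Dispatch(cmdArgs[0], Arrays.copyOfRange(cmdArgs, 1, cmdArgs.length));
    }

    public static void main(String[] unused) {
        String[] cmd = {"org.apache.crunch.examples.TotalBytesByIP",
                "access_log/input", "access_log/output"};

        // Manifest names WordCount: TotalBytesByIP is ignored and arrives as args[0].
        Dispatch withManifest = dispatch("org.apache.crunch.examples.WordCount", cmd);
        System.out.println(withManifest.mainClass + " " + Arrays.toString(withManifest.mainArgs));

        // No Main-Class in the manifest: the requested example runs with 2 args.
        Dispatch withoutManifest = dispatch(null, cmd);
        System.out.println(withoutManifest.mainClass + " " + Arrays.toString(withoutManifest.mainArgs));
    }
}
```

In this model, removing mainClass from the manifest is what lets the class name on the command line take effect and leaves exactly the two path arguments for the example.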
3) All examples use ToolRunner, which in both 1.x and 2.x already parses the
args with GenericOptionsParser and passes only the remaining args to the run()
method:
https://github.com/apache/hadoop-common/blob/release-1.0.3/src/core/org/apache/hadoop/util/ToolRunner.java#L59
https://github.com/apache/hadoop-common/blob/release-2.0.1-alpha/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java#L64
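What ToolRunner does for the examples can be sketched as a simplified model: leading generic options are consumed first, and run() only ever sees what is left, so it can insist on exactly 2 args. This is a hedged sketch, not GenericOptionsParser; GenericArgsModel, stripGenericOptions and the option set below are assumptions for illustration, and only the space-separated "-opt value" form is handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of ToolRunner.run(): generic options are
 * consumed first, and only the remaining arguments reach the tool's run().
 * It is not GenericOptionsParser; it only understands "-opt value" pairs.
 */
final class GenericArgsModel {

    // Generic options assumed to take one value (space-separated form only).
    private static final Set<String> GENERIC_WITH_VALUE = new HashSet<>(Arrays.asList(
            "-conf", "-D", "-fs", "-jt", "-files", "-libjars", "-archives"));

    /** Records leading generic options in `generic` and returns the args left for run(). */
    static String[] stripGenericOptions(String[] args, Map<String, String> generic) {
        int i = 0;
        // Stop at the first argument that is not a known generic option.
        while (i < args.length && GENERIC_WITH_VALUE.contains(args[i])) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("missing value for " + args[i]);
            }
            generic.put(args[i], args[i + 1]); // a repeated option overwrites the earlier value
            i += 2;
        }
        return Arrays.copyOfRange(args, i, args.length);
    }

    /** What an example's run() sees: never the generic options, so 2 args is a safe check. */
    static int run(String[] remaining) {
        if (remaining.length != 2) {
            System.err.println("Usage: <example> [generic options] input output");
            return 1;
        }
        System.out.println("input=" + remaining[0] + " output=" + remaining[1]);
        return 0;
    }

    public static void main(String[] unused) {
        String[] argv = {"-libjars", "extra.jar", "-D", "mapred.reduce.tasks=4",
                "wordcount/input", "wordcount/output"};
        Map<String, String> generic = new HashMap<>();
        int rc = run(stripGenericOptions(argv, generic));
        System.out.println("exit=" + rc);
    }
}
```

Because the stripping happens before run() is called, the examples do not need to parse generic options themselves; run() only has to validate the two path arguments.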
Points of action:
1) Either generate a separate jar for each example, or remove mainClass from
the jar manifest.
2) All examples should take exactly 2 args (input and output). The class is
specified either in the jar manifest or on the command line, and it is never
passed to the run() method unless it appears both in the manifest and on the
command line.
3) The examples should not use GenericOptionsParser in the run() method.
Let me know if you agree, and I can open JIRAs for these items.
> Crunch examples don't accept generic tool arguments
> ---------------------------------------------------
>
> Key: CRUNCH-68
> URL: https://issues.apache.org/jira/browse/CRUNCH-68
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.3.0
> Reporter: Roman Shaposhnik
> Assignee: Matthias Friedrich
> Fix For: 0.4.0
>
> Attachments: CRUNCH-68-Fix-command-line-parser-for-examples.patch
>
>
> Currently all Crunch examples have the following code:
> {noformat}
> if (args.length != 3) {
>   System.err.println();
>   System.err.println("Usage: " + this.getClass().getName() + " [generic options] input output");
>   System.err.println();
>   GenericOptionsParser.printGenericCommandUsage(System.err);
>   return 1;
> }
> {noformat}
> This is incorrect, since run() sees all arguments, even generic ones, and
> thus you can't predict the value of args.length.
> Unfortunately, this is also a major blocker for using Crunch with Hadoop 2,
> because of MAPREDUCE-4068.
> Essentially, the combination of MAPREDUCE-4068 and the inability to pass
> -libjars makes the Crunch examples DOA on Hadoop 2 clusters.