removing dependency jars from the mahout binary distribution
The Mahout binary distribution currently ships ~56 MB of dependency jars in its /lib directory. I believe most of these are already included in mahout-examples-*-job.jar. The /lib jars are only added to the classpath by /bin/mahout in the binary distribution, so it seems that we can remove them from the distribution (we need to get the size of the distribution down). Any input is appreciated.
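Jars are zip archives, so the claim that most /lib jars are already inside the job jar can be checked by comparing class entries. The sketch below assumes a lib jar counts as "redundant" when every .class entry name it contains also appears in the job jar. It compares entry names only, not versions or bytecode, so a match is a hint rather than proof. The helper names class_entries and redundant_lib_jars are hypothetical.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names inside a jar (a jar is a zip file)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def redundant_lib_jars(job_jar, lib_jars):
    """Return the lib jars whose .class entries all appear in job_jar.

    Assumption: only entry names are compared; class versions and
    contents are not checked.
    """
    job_classes = class_entries(job_jar)
    redundant = []
    for lib in lib_jars:
        classes = class_entries(lib)
        # A jar with no classes (e.g. resources only) is not treated as redundant.
        if classes and classes <= job_classes:
            redundant.append(lib)
    return redundant
```

For example, pass the path of the mahout-examples-*-job.jar from an unpacked distribution as job_jar and sorted(glob.glob("lib/*.jar")) as lib_jars. Those paths are placeholders for whatever layout the release actually has.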
[jira] [Comment Edited] (MAHOUT-1705) Pare down job jar for mahout-examples
[ https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529277#comment-14529277 ]

Andrew Palumbo edited comment on MAHOUT-1705 at 5/5/15 9:14 PM:

I'm wondering if it would make more sense to just keep all of the dependencies in the examples jar, ship that in the release as is (with some tweaks), and remove the {{/lib}} directory. This seems to be what {{/bin/mahout}} is set up for.

was (Author: andrew_palumbo):
I'm wondering if it would make more sense to just keep all of the dependencies in the examples jar and ship that in the release as is and remove the {{/lib}} directory. This seems to be what {{/bin/mahout}} is set up for.

> Pare down job jar for mahout-examples
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.10.0
> Reporter: Andrew Palumbo
> Assignee: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
> mahout-examples-*-job.jar is around 56 MB, and packages redundant libraries and transitive dependencies.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MAHOUT-1705) Pare down job jar for mahout-examples
[ https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529277#comment-14529277 ]

Andrew Palumbo commented on MAHOUT-1705:

I'm wondering if it would make more sense to just keep all of the dependencies in the examples jar and ship that in the release as is and remove the {{/lib}} directory. This seems to be what {{/bin/mahout}} is set up for.
Re: where is IndexHDFS.java
> I don’t think this is from an Apache Project
> I'm not aware of any Apache project with a class named IndexHDFS

Not good news, but at least I am now confident about that. I will have to ask some other third-party projects that use Hadoop.

Thanks,
Regards,
Mahmood
[jira] [Assigned] (MAHOUT-1705) Pare down job jar for mahout-examples
[ https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo reassigned MAHOUT-1705:

Assignee: Andrew Palumbo
[jira] [Assigned] (MAHOUT-1704) Pare down dependency jar for h2o
[ https://issues.apache.org/jira/browse/MAHOUT-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo reassigned MAHOUT-1704:

Assignee: Andrew Palumbo

> Pare down dependency jar for h2o
>
> Key: MAHOUT-1704
> URL: https://issues.apache.org/jira/browse/MAHOUT-1704
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.10.0
> Reporter: Andrew Palumbo
> Assignee: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
> The dependency jar for h2o is very large: ~68 MB. Pare this down to include only the necessary runtime classes.
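Before deciding what to cut from a large dependency jar, it can help to see which packages account for most of its size. A minimal sketch using only Python's standard zipfile module; size_by_package is a hypothetical helper that sums uncompressed entry sizes per package prefix. It says nothing about which classes are actually needed at runtime, which still has to be determined separately.

```python
import zipfile
from collections import Counter


def size_by_package(jar_path, depth=2):
    """Sum uncompressed entry sizes in a jar, grouped by the first `depth`
    directory components (depth=2 groups "org/apache/commons/Foo.class"
    under "org/apache"). Entries at the jar root are grouped as "(root)".
    """
    totals = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")[:-1]  # directory components only
            key = "/".join(parts[:depth]) or "(root)"
            totals[key] += info.file_size  # uncompressed size in bytes
    return totals
```

Calling size_by_package(path).most_common(10) on the h2o dependency jar (path is a placeholder) would list the ten largest package prefixes with their byte counts.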
Re: Just noticed that web sites can be git based
Can you give a pointer to such an icon?

On Tue, May 5, 2015 at 6:16 PM, Pat Ferrel wrote:
> On another project I hit an “edit” icon on their site, which automatically sent me to the page on github, where I was allowed to edit.
Re: Just noticed that web sites can be git based
I asked to sign us up when this was first announced but haven’t heard back. On another project I hit an “edit” icon on their site, which automatically sent me to the page on GitHub, where I was allowed to edit. This automatically created a branch in my repo and a PR to the correct branch of their repo. Very convenient. That way an edit icon could be put on every Mahout CMS page, and users would find it quite easy to request some rewording. Notice that no write access is required, since edits go through a PR.

Not sure if the ASF implementation does this, but it would be nice.

On May 3, 2015, at 9:58 AM, Ted Dunning wrote:
> https://blogs.apache.org/infra/entry/git_based_websites_available
>
> This might be nice to get rid of the svn step in web site updates. It would involve an alternative workflow for updates rather than the CMS process.
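The edit-icon flow Pat describes relies on GitHub's web editor: a link of the form https://github.com/OWNER/REPO/edit/BRANCH/PATH opens the file for editing, and for a user without write access GitHub offers to fork the repo, commit the change to a new branch, and open a pull request. A sketch of generating such links for each page, assuming a hypothetical site repository; whether the ASF git-based website setup exposes pages this way is not confirmed in this thread.

```python
from urllib.parse import quote


def github_edit_url(owner, repo, branch, path):
    """Build the GitHub web-editor URL for one file in a repository.

    owner/repo/branch/path are assumptions supplied by the caller; the
    site repository layout is hypothetical. quote() keeps "/" intact but
    escapes spaces and other unsafe characters.
    """
    return "https://github.com/{}/{}/edit/{}/{}".format(
        quote(owner), quote(repo), quote(branch), quote(path.lstrip("/"))
    )
```

For instance, github_edit_url("apache", "mahout-site", "master", "docs/index.md") would produce the edit link for a hypothetical docs/index.md page; a site template could emit one such link per page as its edit icon.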
Re: where is IndexHDFS.java
Hello Mahmood,

I'm not aware of any Apache project with a class named IndexHDFS. I just did a scan over my local checkouts of the code for many of the Apache projects, and I didn't find anything.

The fact that IndexHDFS is not prefixed with a package name in the stack trace tells me that this is unlikely to be a class from any Apache project. Apache projects will put their classes into packages, usually some form of org.apache... Instead, this is likely to be application code that you ran using "hadoop jar", coming from either your own project or some kind of third-party tool that you're using.

--Chris Nauroth

On 5/5/15, 2:53 AM, "Mahmood N" wrote:
> So my question is which apache project use the file "IndexHDFS.java"?
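Chris's package-name heuristic can be applied mechanically to a stack trace. The sketch below extracts the class name from each "at ..." frame and reports classes in the default package (no dot in the name). It is a heuristic only: a missing package suggests application code but does not prove where the class came from. The function names are hypothetical.

```python
import re

# One stack frame, e.g. "at org.apache.hadoop.util.RunJar.main(RunJar.java:156)".
# Group 1 is the fully qualified class name, group 2 the method name.
FRAME = re.compile(r"\bat\s+([\w$.]+)\.([\w$<>]+)\(")


def frame_classes(stack_trace):
    """Return the fully qualified class name of each frame, in order."""
    return [m.group(1) for m in FRAME.finditer(stack_trace)]


def default_package_classes(stack_trace):
    """Return classes with no package prefix, in first-seen order."""
    seen = []
    for cls in frame_classes(stack_trace):
        if "." not in cls and cls not in seen:
            seen.append(cls)
    return seen
```

Run on the trace from this thread, default_package_classes flags IndexHDFS, while frames such as org.apache.hadoop.util.RunJar carry the org.apache prefix Chris mentions.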
[jira] [Updated] (MAHOUT-1705) Pare down job jar for mahout-examples jar
[ https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1705:

Summary: Pare down job jar for mahout-examples jar (was: Pare down job jar for h2o mahout-examples jar)
[jira] [Updated] (MAHOUT-1705) Pare down job jar for mahout-examples
[ https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1705:

Summary: Pare down job jar for mahout-examples (was: Pare down job jar for mahout-examples jar)
[jira] [Updated] (MAHOUT-1705) Pare down job jar for h2o mahout-examples jar
[ https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1705:

Issue Type: Improvement (was: Bug)
[jira] [Updated] (MAHOUT-1705) Pare down job jar for h2o mahout-examples jar
[ https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1705:

Affects Version/s: 0.10.0
[jira] [Updated] (MAHOUT-1705) Pare down job jar for h2o mahout-examples jar
[ https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1705:

Fix Version/s: 0.11.0, 0.10.1
[jira] [Created] (MAHOUT-1705) Pare down job jar for h2o mahout-examples jar
Andrew Palumbo created MAHOUT-1705:

Summary: Pare down job jar for h2o mahout-examples jar
Key: MAHOUT-1705
URL: https://issues.apache.org/jira/browse/MAHOUT-1705
Project: Mahout
Issue Type: Bug
Reporter: Andrew Palumbo

mahout-examples-*-job.jar is around 56 MB, and packages redundant libraries and transitive dependencies.
where is IndexHDFS.java
Dear Apache Guys,

I am trying to run a Hadoop/Java command which uses a jar file called "IndexData.jar". However, I get an error, and the call stack shows:

Exception in thread "main" java.lang.NullPointerException
    at IndexHDFS.indexData(IndexHDFS.java:92)
    at IndexHDFS.main(IndexHDFS.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

So my question is: which Apache project uses the file "IndexHDFS.java"? If you know, please let me know and save someone's life!

Regards,
Mahmood
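Since the trace came from running IndexData.jar through "hadoop jar", the most direct check is to look for the compiled class inside that jar and any other jars on hand. A class in the default package is stored at the root of a jar as IndexHDFS.class. A minimal sketch, assuming the jars sit under one directory; jars_containing is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, search_dir):
    """Return the jars under search_dir (searched recursively) that contain
    class_name, given with dots, e.g. "IndexHDFS" or
    "org.apache.hadoop.util.RunJar".
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar_path in sorted(Path(search_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    hits.append(jar_path)
        except zipfile.BadZipFile:
            continue  # skip corrupt or non-zip files that happen to end in .jar
    return hits
```

Calling jars_containing("IndexHDFS", some_dir) on the directory holding IndexData.jar (some_dir is a placeholder) would show whether the class ships in that jar, which points back to whoever built it.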