removing dependency jars from the mahout binary distribution

2015-05-05 Thread Andrew Palumbo
The Mahout distribution is currently shipping ~56 MB of dependency jars 
in the /lib directory of the distribution. I believe most of these are 
included in the mahout-examples-*-job.jar. These are only added to the 
classpath by /bin/mahout in the binary distribution. It seems that we 
can remove them from the distribution (we need to get the size of the 
distribution down).
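One way to check whether the /lib jars really are already inside the job jar is to compare jar contents directly. This is a minimal sketch, not part of Mahout: the class name `RedundantLibJars` is hypothetical, and it assumes the job jar either carries dependency classes unpacked or nests whole jars under a `lib/` entry. A lib jar is reported as redundant only if it is nested by file name or every one of its `.class` entries also appears in the job jar.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Mahout): reports which jars in a lib/ directory
// appear to be fully contained in a job jar already.
public class RedundantLibJars {

    // Collects every entry name in a jar: classes, resources and nested jars.
    static Set<String> entryNames(Path jar) throws IOException {
        Set<String> names = new HashSet<>();
        try (JarFile jf = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Keeps only .class entries; manifests and other resources are ignored.
    static Set<String> classesOnly(Set<String> names) {
        Set<String> classes = new HashSet<>();
        for (String name : names) {
            if (name.endsWith(".class")) {
                classes.add(name);
            }
        }
        return classes;
    }

    // Assumption: the job jar either nests dependencies as lib/<name>.jar or
    // carries their classes unpacked. A jar with no classes is never reported
    // as redundant, to stay conservative.
    static boolean isRedundant(String libJarName, Set<String> libClasses, Set<String> jobEntries) {
        if (jobEntries.contains("lib/" + libJarName)) {
            return true;
        }
        return !libClasses.isEmpty() && jobEntries.containsAll(libClasses);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RedundantLibJars <job-jar> <lib-dir>");
            System.exit(1);
        }
        Set<String> jobEntries = entryNames(Paths.get(args[0]));
        try (DirectoryStream<Path> libs = Files.newDirectoryStream(Paths.get(args[1]), "*.jar")) {
            for (Path lib : libs) {
                String name = lib.getFileName().toString();
                boolean redundant = isRedundant(name, classesOnly(entryNames(lib)), jobEntries);
                System.out.println((redundant ? "redundant" : "needed") + "\t" + name);
            }
        }
    }
}
```

It would be run from the unpacked binary distribution as `java RedundantLibJars <path-to-examples-job-jar> lib`. Because only `.class` entries are compared, a jar marked redundant may still contribute resources (for example files under META-INF), so the output is a starting point for review rather than a final answer.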


Any input is appreciated.


[jira] [Comment Edited] (MAHOUT-1705) Pare down job jar for mahout-examples

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529277#comment-14529277]

Andrew Palumbo edited comment on MAHOUT-1705 at 5/5/15 9:14 PM:


I'm wondering if it would make more sense to just keep all of the dependencies 
in the examples jar, ship that in the release as is (with some tweaks), and 
remove the {{/lib}} directory. This seems to be what {{/bin/mahout}} is set up 
for.




was (Author: andrew_palumbo):
I'm wondering if it would make more sense to just keep all of the dependencies 
in the examples jar and ship that in the release as is and remove the {{/lib}} 
directory. This seems to be what {{/bin/mahout}} is set up for.



> Pare down job jar for mahout-examples
> -
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
> transitive dependencies.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1705) Pare down job jar for mahout-examples

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529277#comment-14529277]

Andrew Palumbo commented on MAHOUT-1705:


I'm wondering if it would make more sense to just keep all of the dependencies 
in the examples jar and ship that in the release as is and remove the {{/lib}} 
directory. This seems to be what {{/bin/mahout}} is set up for.



> Pare down job jar for mahout-examples
> -
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
> transitive dependencies.   





Re: where is IndexHDFS.java

2015-05-05 Thread Mahmood N
>I don’t think this is from an Apache Project
>I'm not aware of any Apache project with a class named IndexHDFS

Not good news, but at least I am now confident about that. I will have to ask 
other third-party projects that use Hadoop.
Thanks,
Regards,
Mahmood 



  

[jira] [Assigned] (MAHOUT-1705) Pare down job jar for mahout-examples

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Andrew Palumbo reassigned MAHOUT-1705:
--

Assignee: Andrew Palumbo

> Pare down job jar for mahout-examples
> -
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
> transitive dependencies.   





[jira] [Assigned] (MAHOUT-1704) Pare down dependency jar for h2o

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Andrew Palumbo reassigned MAHOUT-1704:
--

Assignee: Andrew Palumbo

> Pare down dependency jar for h2o
> 
>
> Key: MAHOUT-1704
> URL: https://issues.apache.org/jira/browse/MAHOUT-1704
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> The dependency jar for h2o is very large: ~68 MB. Pare this down to
> include only the necessary runtime classes.
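Before deciding what to drop from a large dependency jar, it can help to see which packages account for its size. This is a sketch under stated assumptions: `JarSizeByPackage` is a hypothetical helper, it sums compressed entry sizes (roughly each package prefix's share of the jar file, ignoring per-entry headers), and it says nothing about which classes are actually needed at runtime.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: totals a jar's compressed entry sizes by package prefix.
public class JarSizeByPackage {

    // Returns the first `depth` directory segments of an entry name, e.g.
    // "org/apache" for "org/apache/commons/Foo.class" at depth 2.
    // Entries at the jar root map to "(root)".
    static String prefixOf(String entryName, int depth) {
        String[] parts = entryName.split("/");
        // The last segment is the file name, so only the directories count.
        int dirs = Math.min(depth, parts.length - 1);
        if (dirs <= 0) {
            return "(root)";
        }
        return String.join("/", Arrays.copyOf(parts, dirs));
    }

    // Sums compressed sizes per prefix; an unknown size (-1) counts as 0.
    static Map<String, Long> sizesByPrefix(JarFile jar, int depth) {
        Map<String, Long> totals = new TreeMap<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            if (entry.isDirectory()) {
                continue;
            }
            long size = Math.max(0L, entry.getCompressedSize());
            totals.merge(prefixOf(entry.getName(), depth), size, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: JarSizeByPackage <jar> [depth]");
            System.exit(1);
        }
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 2;
        try (JarFile jar = new JarFile(args[0])) {
            // Largest prefixes first.
            sizesByPrefix(jar, depth).entrySet().stream()
                    .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                    .forEach(e -> System.out.printf("%10d KB  %s%n", e.getValue() / 1024, e.getKey()));
        }
    }
}
```

Raising the depth argument drills further into a large prefix; the listing only shows where the bytes are, and whether a package can be dropped still has to be checked against what the code loads at runtime.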





Re: Just noticed that web sites can be git based

2015-05-05 Thread Ted Dunning
Can you give a pointer to such an icon?



On Tue, May 5, 2015 at 6:16 PM, Pat Ferrel  wrote:

> I asked to sign us up when this was first announced but haven’t heard
> back. On another project I hit an “edit” icon on their site, which
> automatically sent me to the page on GitHub, where I was allowed to edit.
> This automatically created a branch in my repo and a PR to the correct
> branch of their repo. Very convenient. That way an edit icon can be put on
> every Mahout CMS page and users will find requesting some rewording quite
> easy. Notice that no write access is required since edits go through a PR.
>
> Not sure if the ASF implementation does this, but it would be nice.
>
> On May 3, 2015, at 9:58 AM, Ted Dunning  wrote:
>
> https://blogs.apache.org/infra/entry/git_based_websites_available
>
> This might be nice to get rid of the svn step in web site updates.  It
> would involve an alternative workflow for updates rather than the CMS
> process.
>
>


Re: Just noticed that web sites can be git based

2015-05-05 Thread Pat Ferrel
I asked to sign us up when this was first announced but haven’t heard back. On 
another project I hit an “edit” icon on their site, which automatically sent me 
to the page on GitHub, where I was allowed to edit. This automatically created 
a branch in my repo and a PR to the correct branch of their repo. Very 
convenient. That way an edit icon can be put on every Mahout CMS page and users 
will find requesting some rewording quite easy. Notice that no write access is 
required since edits go through a PR.

Not sure if the ASF implementation does this, but it would be nice.

On May 3, 2015, at 9:58 AM, Ted Dunning  wrote:

https://blogs.apache.org/infra/entry/git_based_websites_available

This might be nice to get rid of the svn step in web site updates.  It
would involve an alternative workflow for updates rather than the CMS
process.



Re: where is IndexHDFS.java

2015-05-05 Thread Chris Nauroth
Hello Mahmood,

I'm not aware of any Apache project with a class named IndexHDFS.  I just
did a scan over my local checkouts of the code for many of the Apache
projects, and I didn't find anything.

The fact that IndexHDFS is not prefixed with a package name in the stack
trace tells me that this is unlikely to be a class from any Apache
project.  Apache projects will put their classes into packages, usually
some form of org.apache...  Instead, this is likely to be application code
that you ran using "hadoop jar", coming from either your own project or
some kind of third-party tool that you're using.
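The package reasoning above can be checked mechanically: a class name with no dot lives in the default package, and a jar's entry list shows whether it carries that class file. This is an illustrative sketch only; `WhichJar` and its methods are hypothetical names, and the `org.apache` test is just the naming heuristic described above, not proof of where a class comes from.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: given a class name taken from a stack trace, report its
// package and which of the supplied jars contain its .class file.
public class WhichJar {

    // Returns the package part of a class name, or "" for the default package.
    static String packageOf(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    // Naming heuristic only: Apache classes usually live under org.apache.
    static boolean looksLikeApacheClass(String className) {
        String pkg = packageOf(className);
        return pkg.equals("org.apache") || pkg.startsWith("org.apache.");
    }

    // Converts "a.b.C" (or "Outer$Inner") to its jar entry name, e.g. "a/b/C.class".
    static String entryNameOf(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Lists, in the given order, the jars that contain the class file.
    static List<Path> jarsContaining(String className, List<Path> jars) throws IOException {
        List<Path> hits = new ArrayList<>();
        String entry = entryNameOf(className);
        for (Path jar : jars) {
            try (JarFile jf = new JarFile(jar.toFile())) {
                if (jf.getEntry(entry) != null) {
                    hits.add(jar);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: WhichJar <ClassName> <jar> [<jar> ...]");
            System.exit(1);
        }
        String className = args[0];
        String pkg = packageOf(className);
        System.out.println("package: " + (pkg.isEmpty() ? "(default package)" : pkg));
        System.out.println("org.apache naming: " + looksLikeApacheClass(className));
        List<Path> jars = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            jars.add(Paths.get(args[i]));
        }
        for (Path hit : jarsContaining(className, jars)) {
            System.out.println("found in: " + hit);
        }
    }
}
```

For the trace in this thread, `java WhichJar IndexHDFS IndexData.jar` should print `package: (default package)` and `org.apache naming: false`, followed by `found in: IndexData.jar` if that jar does carry the class.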

--Chris Nauroth




On 5/5/15, 2:53 AM, "Mahmood N"  wrote:

>Dear Apache Guys,
>I am trying to run a hadoop/java command which uses a jar file called
>"IndexData.jar". However, I get an error and the call stack shows
>
>Exception in thread "main" java.lang.NullPointerException
>at IndexHDFS.indexData(IndexHDFS.java:92)
>at IndexHDFS.main(IndexHDFS.java:72)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:606)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>So my question is: which Apache project uses the file "IndexHDFS.java"? If
>you know, please let me know and save someone's life!
>Regards,
>Mahmood



[jira] [Updated] (MAHOUT-1705) Pare down job jar for mahout-examples jar

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Andrew Palumbo updated MAHOUT-1705:
---
Summary: Pare down job jar for mahout-examples jar  (was: Pare down job jar 
for h2o mahout-examples jar)

> Pare down job jar for mahout-examples jar
> -
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
> transitive dependencies.   





[jira] [Updated] (MAHOUT-1705) Pare down job jar for mahout-examples

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Andrew Palumbo updated MAHOUT-1705:
---
Summary: Pare down job jar for mahout-examples  (was: Pare down job jar for 
mahout-examples jar)

> Pare down job jar for mahout-examples
> -
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
> transitive dependencies.   





[jira] [Updated] (MAHOUT-1705) Pare down job jar for h2o mahout-examples jar

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Andrew Palumbo updated MAHOUT-1705:
---
Issue Type: Improvement  (was: Bug)

> Pare down job jar for h2o mahout-examples jar
> -
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
> transitive dependencies.   





[jira] [Updated] (MAHOUT-1705) Pare down job jar for h2o mahout-examples jar

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Andrew Palumbo updated MAHOUT-1705:
---
Affects Version/s: 0.10.0

> Pare down job jar for h2o mahout-examples jar
> -
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
> transitive dependencies.   





[jira] [Updated] (MAHOUT-1705) Pare down job jar for h2o mahout-examples jar

2015-05-05 Thread Andrew Palumbo (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Andrew Palumbo updated MAHOUT-1705:
---
Fix Version/s: 0.11.0
   0.10.1

> Pare down job jar for h2o mahout-examples jar
> -
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
> Fix For: 0.10.1, 0.11.0
>
>
> mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
> transitive dependencies.   





[jira] [Created] (MAHOUT-1705) Pare down job jar for h2o mahout-examples jar

2015-05-05 Thread Andrew Palumbo (JIRA)
Andrew Palumbo created MAHOUT-1705:
--

 Summary: Pare down job jar for h2o mahout-examples jar
 Key: MAHOUT-1705
 URL: https://issues.apache.org/jira/browse/MAHOUT-1705
 Project: Mahout
  Issue Type: Bug
Reporter: Andrew Palumbo


mahout-examples-*-job.jar is ~56 MB and packages redundant libraries and 
transitive dependencies.   





where is IndexHDFS.java

2015-05-05 Thread Mahmood N
Dear Apache Guys,
I am trying to run a hadoop/java command which uses a jar file called 
"IndexData.jar". However, I get an error and the call stack shows

Exception in thread "main" java.lang.NullPointerException
    at IndexHDFS.indexData(IndexHDFS.java:92)
    at IndexHDFS.main(IndexHDFS.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

So my question is: which Apache project uses the file "IndexHDFS.java"? If you 
know, please let me know and save someone's life!
Regards,
Mahmood