Re: local file input for seqdirectory

2014-03-13 Thread Steven Cullens
Thanks, Suneel.


On Thu, Mar 13, 2014 at 4:17 PM, Suneel Marthi wrote:

> The workaround is to add -xm sequential. An MR version of seqdirectory was
> introduced in 0.8, and hence the default execution mode is MR if none is
> specified.
>
>
>
>
>
>
> On Thursday, March 13, 2014 4:12 PM, Steven Cullens 
> wrote:
>
> Hi,
>
> I have a large number of files, each on the order of kilobytes, on my local
> machine that I want to convert to a sequence file on HDFS.  Whenever I try
> to copy the local files to HDFS, Hadoop complains about bad blocks,
> presumably because each block is 64 MB and there are more files than blocks.
> In Mahout 0.7, I would tell it that the input files are local, like:
>
> mahout seqdirectory -i file:// -o 
>
> But I can't use the same command in Mahout 0.9, where it expects the file
> system to be HDFS.  Is there a workaround for generating the sequence file
> with Mahout 0.9?  Thanks.
>
> Steven
>


Re: local file input for seqdirectory

2014-03-13 Thread Suneel Marthi
The workaround is to add -xm sequential. An MR version of seqdirectory was
introduced in 0.8, and hence the default execution mode is MR if none is
specified.
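Putting the workaround together with the original command, a minimal sketch of the suggested invocation (the input and output paths below are hypothetical placeholders, not from the thread):

```
# Hedged sketch: run seqdirectory in sequential (local) execution mode
# so that a file:// input path works; paths are hypothetical placeholders.
mahout seqdirectory \
  -i file:///home/user/text-docs \
  -o /user/hadoop/text-seqfiles \
  -xm sequential
```

With -xm sequential the tool runs in-process instead of as a MapReduce job, so it can read the local filesystem directly.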






On Thursday, March 13, 2014 4:12 PM, Steven Cullens  wrote:
 
Hi,

I have a large number of files, each on the order of kilobytes, on my local
machine that I want to convert to a sequence file on HDFS.  Whenever I try
to copy the local files to HDFS, Hadoop complains about bad blocks,
presumably because each block is 64 MB and there are more files than blocks.
In Mahout 0.7, I would tell it that the input files are local, like:

mahout seqdirectory -i file:// -o 

But I can't use the same command in Mahout 0.9, where it expects the file
system to be HDFS.  Is there a workaround for generating the sequence file
with Mahout 0.9?  Thanks.

Steven

local file input for seqdirectory

2014-03-13 Thread Steven Cullens
Hi,

I have a large number of files, each on the order of kilobytes, on my local
machine that I want to convert to a sequence file on HDFS.  Whenever I try
to copy the local files to HDFS, Hadoop complains about bad blocks,
presumably because each block is 64 MB and there are more files than blocks.
In Mahout 0.7, I would tell it that the input files are local, like:

mahout seqdirectory -i file:// -o 

But I can't use the same command in Mahout 0.9, where it expects the file
system to be HDFS.  Is there a workaround for generating the sequence file
with Mahout 0.9?  Thanks.

Steven


Re: Website, urgent help needed

2014-03-13 Thread Scott C. Cote
I have created issue https://issues.apache.org/jira/browse/MAHOUT-1461

Will upload shell scripts and suggested replacement text later tonight ….

SCott

On 3/13/14, 10:43 AM, "Sebastian Schelter"  wrote:

>Hi Scott,
>
>Create a jira ticket and attach your scripts and a text version of the
>page there.
>
>Best,
>Sebastian
>
>
>On 03/12/2014 03:27 PM, Scott C. Cote wrote:
>> I took the tour of the text analysis and pushed through despite the
>> problems on the page.  Committers helped me over the hump where others
>> might have just given up (to your point).
>> When I did it, I made shell scripts so that my steps would be repeatable,
>> in anticipation of updating the page.
>>
>> Unfortunately, I gave up on trying to figure out how to update the page
>> (there were links indicating that I could do it), and I didn't want to
>> appear stupid asking how to update the documentation (my bad - not
>> anyone else's).  Now I know that it was not possible unless I was a
>> committer.
>>
>> Who should I send my scripts to, or how should I proceed with a current
>> form of the page?
>>
>> SCott
>>
>> On 3/12/14, 5:02 AM, "Sebastian Schelter"  wrote:
>>
>>> Hi Pavan,
>>>
>>> Awesome that you're willing to help. The documentation is the set of
>>> pages listed under "Clustering" in the navigation bar at
>>> mahout.apache.org
>>>
>>> If you start working on one of the pages listed there (e.g. the k-Means
>>> doc), please create a jira ticket in our issue tracker with a title
>>> along the lines of "Cleaning up the documentation for k-Means on the
>>> website".
>>>
>>> Put a list of errors and corrections into the jira and I (or some other
>>> committer) will make sure to fix the website.
>>>
>>> Thanks,
>>> Sebastian
>>>
>>>
>>> On 03/12/2014 08:48 AM, Pavan Kumar N wrote:
I'll help with the clustering algorithms documentation. Do send me the old
documentation and I will check and remove errors, or, better, let me
know how to proceed.

 Pavan
 On Mar 12, 2014 12:35 PM, "Sebastian Schelter"  wrote:

> Hi,
>
> As you've probably noticed, I've put in a lot of effort over the last
> days to kickstart cleaning up our website. I've thrown out a lot of
> stuff and have been startled by the amount of outdated and incorrect
> information on our website, as well as links pointing to nowhere.
>
> I think our lack of documentation makes it super hard for new people
> to use Mahout. A crucial next step is to clean up the documentation on
> classification and clustering. I cannot do this alone, because I don't
> have the time and I'm not so familiar with the background of the
> algorithms.
>
> I need volunteers to go through all the pages under "Classification"
> and "Clustering" on the website. For the algorithms, the content and
> claims of the articles need to be checked; for the examples, we need
> to make sure that everything still works as described. It would also
> be great to move articles from personal blogs to our website.
>
> Imagine that some developer wants to try out Mahout and takes one hour
> for that in the evening. She will go to our website, download Mahout,
> read the description of an algorithm and try to run an example. In the
> current state of the documentation, I'm afraid that most people will
> walk away frustrated, because the website does not help them as it
> should.
>
> Best,
> Sebastian
>
> PS: I will make my standpoint on whether Mahout should do a 1.0
> release depend on whether we manage to clean up and maintain our
> documentation.
>

>>>
>>
>>
>




Re: bug report

2014-03-13 Thread Mahmood Naderan
OK, I see that. Thanks.


 
Regards,
Mahmood



On Thursday, March 13, 2014 10:25 PM, Andrew Musselman 
 wrote:
 
That's right, thanks



On Thu, Mar 13, 2014 at 11:52 AM, Ted Dunning  wrote:

> You have to be logged in to JIRA to do this.  To log in, you may need to
> create an account.
>
>
>
> On Thu, Mar 13, 2014 at 11:33 AM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> >
> >
> https://issues.apache.org/jira/browse/MAHOUT/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
> >
> > "Create Issue" button at the top of the page.
> >
> >
> > On Thu, Mar 13, 2014 at 11:29 AM, Mahmood Naderan  > >wrote:
> >
> > > Hi
> > > Where can I submit a mahout bug? I am not familiar with JIRA and I see
> > > issues and agile.
> > >
> > >
> > >
> > > Regards,
> > > Mahmood
> >
>

Re: Solving "heap size error"

2014-03-13 Thread Mahmood Naderan
We used the Wikipedia splitter as a benchmark for our simulation on Hadoop 0.2.
I am now trying to run that on the latest Hadoop to be up to date and check
some differences. For now, I have no other choice.


 
Regards,
Mahmood



On Thursday, March 13, 2014 10:12 PM, Andrew Musselman 
 wrote:
 
What's your larger goal here; are you putting Hadoop and Mahout through
paces as an exercise?

If your process is blowing through data quickly up to a certain point there
may be something happening with a common value, which is a "data bug".  I
don't know what this wikipedia splitter class does but if you're interested
in isolating the issue you could find out what is happening data-wise and
see if there is some very large grouping on a pathologically frequent key
for instance.



On Thu, Mar 13, 2014 at 11:31 AM, Mahmood Naderan wrote:

> I am pretty sure that there is something wrong with Hadoop/Mahout/Java.
> With any configuration, it gets stuck at chunk #571. Previous chunks are
> created rapidly, but I see it wait for about 30 minutes on 571, and that is
> the reason for the heap size error.
>
> I will try to submit a bug report.
>
>
> Regards,
> Mahmood
>
>
>
> On Thursday, March 13, 2014 2:31 PM, Mahmood Naderan 
> wrote:
>
> The strange thing is that whether I use -Xmx128m or -Xmx16384m, the process
> stops at chunk #571 (571*64 MB = 36.5 GB).
> I still haven't figured out whether this is a problem with the JVM, Hadoop,
> or Mahout.
>
> I have tested various parameters on 16GB RAM
>
>
> 
> <property>
>   <name>mapred.map.child.java.opts</name>
>   <value>-Xmx2048m</value>
> </property>
> <property>
>   <name>mapred.reduce.child.java.opts</name>
>   <value>-Xmx4096m</value>
> </property>
>
> 
>
> Is there a relation between the parameters and the amount of available
> memory?
> I also see a HADOOP_HEAPSIZE setting in hadoop-env.sh, which is commented
> out by default. What is that?
>
> Regards,
> Mahmood
>
>
>
> On Tuesday, March 11, 2014 11:57 PM, Mahmood Naderan 
> wrote:
>
> As I posted earlier, here is the result of a successful test
>
> A 5.4 GB XML file (which is larger than enwiki-latest-pages-articles10.xml),
> with 4 GB of RAM and -Xmx128m, took 5 minutes to complete.
>
> I didn't find a larger Wikipedia XML file. I need to test 10 GB, 20 GB, and
> 30 GB files.
>
>
>
> Regards,
> Mahmood
>
>
>
>
> On Tuesday, March 11, 2014 11:41 PM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> Can you please try running this on a smaller file first, per Suneel's
> comment a while back:
>
> "Please first try running this on a smaller dataset like
> 'enwiki-latest-pages-articles10.xml' as opposed to running on the entire
> english wikipedia."
>
>
>
> On Tue, Mar 11, 2014 at 12:56 PM, Mahmood Naderan  >wrote:
>
> > Hi,
> > Recently I have faced a heap size error when I run
> >
> >   $MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d
> >
> $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o
> > wikipedia/chunks -c 64
> >
> > Here are the specs:
> > 1- XML file size = 44GB
> > 2- System memory = 54GB (on virtualbox)
> > 3- Heap size = 51GB (-Xmx51000m)
> >
> > At the time of failure, I see that 571 chunks are created (hadoop dfs
> -ls)
> > so 36GB of the original file has been processed. Now here are my
> questions
> >
> > 1- Is there any way to resume the process? As stated before, 571 chunks
> > have been created. So by resuming, it can create the rest of the chunks
> > (572~).
> >
> > 2- Is it possible to parallelize the process? Assume, 100GB of heap is
> > required to process the XML file and my system cannot afford that. Then we
> > can create 20 threads each requires 5GB of heap. Next by feeding the
> first
> > 10 threads we can use the available 50GB of heap and after completion, we
> > can feed the next set of threads.
> >
> >
> > Regards,
> > Mahmood
>

Re: bug report

2014-03-13 Thread Andrew Musselman
That's right, thanks


On Thu, Mar 13, 2014 at 11:52 AM, Ted Dunning  wrote:

> You have to be logged in to JIRA to do this.  To log in, you may need to
> create an account.
>
>
>
> On Thu, Mar 13, 2014 at 11:33 AM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> >
> >
> https://issues.apache.org/jira/browse/MAHOUT/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
> >
> > "Create Issue" button at the top of the page.
> >
> >
> > On Thu, Mar 13, 2014 at 11:29 AM, Mahmood Naderan  > >wrote:
> >
> > > Hi
> > > Where can I submit a mahout bug? I am not familiar with JIRA and I see
> > > issues and agile.
> > >
> > >
> > >
> > > Regards,
> > > Mahmood
> >
>


Re: bug report

2014-03-13 Thread Ted Dunning
You have to be logged in to JIRA to do this.  To log in, you may need to
create an account.



On Thu, Mar 13, 2014 at 11:33 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

>
> https://issues.apache.org/jira/browse/MAHOUT/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
>
> "Create Issue" button at the top of the page.
>
>
> On Thu, Mar 13, 2014 at 11:29 AM, Mahmood Naderan  >wrote:
>
> > Hi
> > Where can I submit a mahout bug? I am not familiar with JIRA and I see
> > issues and agile.
> >
> >
> >
> > Regards,
> > Mahmood
>


Re: Solving "heap size error"

2014-03-13 Thread Andrew Musselman
What's your larger goal here; are you putting Hadoop and Mahout through
paces as an exercise?

If your process is blowing through data quickly up to a certain point there
may be something happening with a common value, which is a "data bug".  I
don't know what this wikipedia splitter class does but if you're interested
in isolating the issue you could find out what is happening data-wise and
see if there is some very large grouping on a pathologically frequent key
for instance.


On Thu, Mar 13, 2014 at 11:31 AM, Mahmood Naderan wrote:

> I am pretty sure that there is something wrong with Hadoop/Mahout/Java.
> With any configuration, it gets stuck at chunk #571. Previous chunks are
> created rapidly, but I see it wait for about 30 minutes on 571, and that is
> the reason for the heap size error.
>
> I will try to submit a bug report.
>
>
> Regards,
> Mahmood
>
>
>
> On Thursday, March 13, 2014 2:31 PM, Mahmood Naderan 
> wrote:
>
> The strange thing is that whether I use -Xmx128m or -Xmx16384m, the process
> stops at chunk #571 (571*64 MB = 36.5 GB).
> I still haven't figured out whether this is a problem with the JVM, Hadoop,
> or Mahout.
>
> I have tested various parameters on 16GB RAM
>
>
> 
> <property>
>   <name>mapred.map.child.java.opts</name>
>   <value>-Xmx2048m</value>
> </property>
> <property>
>   <name>mapred.reduce.child.java.opts</name>
>   <value>-Xmx4096m</value>
> </property>
>
> 
>
> Is there a relation between the parameters and the amount of available
> memory?
> I also see a HADOOP_HEAPSIZE setting in hadoop-env.sh, which is commented
> out by default. What is that?
>
> Regards,
> Mahmood
>
>
>
> On Tuesday, March 11, 2014 11:57 PM, Mahmood Naderan 
> wrote:
>
> As I posted earlier, here is the result of a successful test
>
> A 5.4 GB XML file (which is larger than enwiki-latest-pages-articles10.xml),
> with 4 GB of RAM and -Xmx128m, took 5 minutes to complete.
>
> I didn't find a larger Wikipedia XML file. I need to test 10 GB, 20 GB, and
> 30 GB files.
>
>
>
> Regards,
> Mahmood
>
>
>
>
> On Tuesday, March 11, 2014 11:41 PM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> Can you please try running this on a smaller file first, per Suneel's
> comment a while back:
>
> "Please first try running this on a smaller dataset like
> 'enwiki-latest-pages-articles10.xml' as opposed to running on the entire
> english wikipedia."
>
>
>
> On Tue, Mar 11, 2014 at 12:56 PM, Mahmood Naderan  >wrote:
>
> > Hi,
> > Recently I have faced a heap size error when I run
> >
> >   $MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d
> >
> $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o
> > wikipedia/chunks -c 64
> >
> > Here are the specs:
> > 1- XML file size = 44GB
> > 2- System memory = 54GB (on virtualbox)
> > 3- Heap size = 51GB (-Xmx51000m)
> >
> > At the time of failure, I see that 571 chunks are created (hadoop dfs
> -ls)
> > so 36GB of the original file has been processed. Now here are my
> questions
> >
> > 1- Is there any way to resume the process? As stated before, 571 chunks
> > have been created. So by resuming, it can create the rest of the chunks
> > (572~).
> >
> > 2- Is it possible to parallelize the process? Assume, 100GB of heap is
> > required to process the XML file and my system cannot afford that. Then we
> > can create 20 threads each requires 5GB of heap. Next by feeding the
> first
> > 10 threads we can use the available 50GB of heap and after completion, we
> > can feed the next set of threads.
> >
> >
> > Regards,
> > Mahmood
>


Re: bug report

2014-03-13 Thread Andrew Musselman
https://issues.apache.org/jira/browse/MAHOUT/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

"Create Issue" button at the top of the page.


On Thu, Mar 13, 2014 at 11:29 AM, Mahmood Naderan wrote:

> Hi
> Where can I submit a mahout bug? I am not familiar with JIRA and I see
> issues and agile.
>
>
>
> Regards,
> Mahmood


Re: Solving "heap size error"

2014-03-13 Thread Mahmood Naderan
I am pretty sure that there is something wrong with Hadoop/Mahout/Java. With
any configuration, it gets stuck at chunk #571. Previous chunks are created
rapidly, but I see it wait for about 30 minutes on 571, and that is the
reason for the heap size error.

I will try to submit a bug report.

 
Regards,
Mahmood



On Thursday, March 13, 2014 2:31 PM, Mahmood Naderan  
wrote:
 
The strange thing is that whether I use -Xmx128m or -Xmx16384m, the process
stops at chunk #571 (571*64 MB = 36.5 GB).
I still haven't figured out whether this is a problem with the JVM, Hadoop, or Mahout.

I have tested various parameters on a machine with 16 GB of RAM:



<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>



Is there a relation between these parameters and the amount of available memory?
I also see a HADOOP_HEAPSIZE setting in hadoop-env.sh, which is commented out by
default. What is that?
 
Regards,
Mahmood



On Tuesday, March 11, 2014 11:57 PM, Mahmood Naderan  
wrote:
 
As I posted earlier, here is the result of a successful test

A 5.4 GB XML file (which is larger than enwiki-latest-pages-articles10.xml), with
4 GB of RAM and -Xmx128m, took 5 minutes to complete.

I didn't find a larger Wikipedia XML file. I need to test 10 GB, 20 GB, and
30 GB files.


 
Regards,
Mahmood




On Tuesday, March 11, 2014 11:41 PM, Andrew Musselman 
 wrote:

Can you please try running this on a smaller file first, per Suneel's
comment a while back:

"Please first try running this on a smaller dataset like
'enwiki-latest-pages-articles10.xml' as opposed to running on the entire
english wikipedia."



On Tue, Mar 11, 2014 at 12:56 PM, Mahmood Naderan wrote:

> Hi,
> Recently I have faced a heap size error when I run
>
>   $MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d
>
$MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o
> wikipedia/chunks -c 64
>
> Here are the specs:
> 1- XML file size = 44GB
> 2- System memory = 54GB (on virtualbox)
> 3- Heap size = 51GB (-Xmx51000m)
>
> At the time of failure, I see that 571 chunks are created (hadoop dfs -ls)
> so 36GB of the original file has been processed. Now here are my questions
>
> 1- Is there any way to resume the process? As stated before, 571 chunks
> have been created. So by resuming, it could create the rest of the chunks
> (572~).
>
> 2- Is it possible to parallelize the process? Assume 100 GB of heap is
> required to process the XML file and my system cannot afford that. Then we
> could create 20 threads, each requiring 5 GB of heap. By feeding the first
> 10 threads we could use the available 50 GB of heap and, after completion,
> feed the next set of threads.
>
>
> Regards,
> Mahmood

bug report

2014-03-13 Thread Mahmood Naderan
Hi,
Where can I submit a Mahout bug? I am not familiar with JIRA, and all I see
is "Issues" and "Agile".


 
Regards,
Mahmood

Re: Website, urgent help needed

2014-03-13 Thread Sebastian Schelter

Hi Scott,

Create a jira ticket and attach your scripts and a text version of the 
page there.


Best,
Sebastian


On 03/12/2014 03:27 PM, Scott C. Cote wrote:

I took the tour of the text analysis and pushed through despite the
problems on the page.  Committers helped me over the hump where others
might have just given up (to your point).
When I did it, I made shell scripts so that my steps would be repeatable,
in anticipation of updating the page.

Unfortunately, I gave up on trying to figure out how to update the page
(there were links indicating that I could do it), and I didn't want to
appear stupid asking how to update the documentation (my bad - not
anyone else's).  Now I know that it was not possible unless I was a committer.

Who should I send my scripts to, or how should I proceed with a current
form of the page?

SCott

On 3/12/14, 5:02 AM, "Sebastian Schelter"  wrote:


Hi Pavan,

Awesome that you're willing to help. The documentation is the set of pages
listed under "Clustering" in the navigation bar at mahout.apache.org

If you start working on one of the pages listed there (e.g. the k-Means
doc), please create a jira ticket in our issue tracker with a title along
the lines of "Cleaning up the documentation for k-Means on the website".

Put a list of errors and corrections into the jira and I (or some other
committer) will make sure to fix the website.

Thanks,
Sebastian


On 03/12/2014 08:48 AM, Pavan Kumar N wrote:

I'll help with the clustering algorithms documentation. Do send me the old
documentation and I will check and remove errors, or, better, let me know
how to proceed.

Pavan
On Mar 12, 2014 12:35 PM, "Sebastian Schelter"  wrote:


Hi,

As you've probably noticed, I've put in a lot of effort over the last days
to kickstart cleaning up our website. I've thrown out a lot of stuff and
have been startled by the amount of outdated and incorrect information on
our website, as well as links pointing to nowhere.

I think our lack of documentation makes it super hard for new people to use
Mahout. A crucial next step is to clean up the documentation on
classification and clustering. I cannot do this alone, because I don't have
the time and I'm not so familiar with the background of the algorithms.

I need volunteers to go through all the pages under "Classification" and
"Clustering" on the website. For the algorithms, the content and claims of
the articles need to be checked; for the examples, we need to make sure
that everything still works as described. It would also be great to move
articles from personal blogs to our website.

Imagine that some developer wants to try out Mahout and takes one hour for
that in the evening. She will go to our website, download Mahout, read the
description of an algorithm, and try to run an example. In the current
state of the documentation, I'm afraid that most people will walk away
frustrated, because the website does not help them as it should.

Best,
Sebastian

PS: I will make my standpoint on whether Mahout should do a 1.0 release
depend on whether we manage to clean up and maintain our documentation.












Re: Commons IO version mismatch with CDH 4.6

2014-03-13 Thread Bikash Gupta
My bad!

I was pointing to the wrong jar; sorry for this.


On Thu, Mar 13, 2014 at 4:11 PM, Bikash Gupta wrote:

> Hi,
>
> Running KMeans on a CDH 4.6 cluster, I have a new issue with commons-io
> compatibility:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.commons.io.IOUtils.closeQuietly(Ljava/io/Closeable;)V
> at
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1107)
> at
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:539)
> at
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:738)
> at
> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:782)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at java.io.DataInputStream.readFully(DataInputStream.java:169)
> at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
> at
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
> at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
> at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
> at
> org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:56)
> at
> org.apache.mahout.clustering.iterator.ClusterIterator.isConverged(ClusterIterator.java:207)
> at
> org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:188)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:217)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:140)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
> Any suggestion?
>
> --
> Thanks & Regards
> Bikash Gupta
>



-- 
Thanks & Regards
Bikash Kumar Gupta


Re: Solving "heap size error"

2014-03-13 Thread Mahmood Naderan
The strange thing is that whether I use -Xmx128m or -Xmx16384m, the process
stops at chunk #571 (571*64 MB = 36.5 GB).
I still haven't figured out whether this is a problem with the JVM, Hadoop, or Mahout.

I have tested various parameters on a machine with 16 GB of RAM:



<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>



Is there a relation between these parameters and the amount of available memory?
I also see a HADOOP_HEAPSIZE setting in hadoop-env.sh, which is commented out by
default. What is that?
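For reference, a hedged sketch of where these two knobs live and what they control (the value below is hypothetical, not from the thread): the mapred.*.child.java.opts properties in mapred-site.xml set the heap of the per-task JVMs, while HADOOP_HEAPSIZE in hadoop-env.sh sets the heap of the Hadoop daemon JVMs, so the former is usually what matters for task out-of-memory errors.

```
# hadoop-env.sh -- hypothetical value; the line is commented out by default.
# HADOOP_HEAPSIZE is the maximum heap, in MB, for the Hadoop daemon JVMs
# (NameNode, DataNode, JobTracker, ...). It does NOT change task JVM heaps,
# which come from mapred.map.child.java.opts / mapred.reduce.child.java.opts.
export HADOOP_HEAPSIZE=2000
```

As a rule of thumb, the sum of concurrent task heaps plus daemon heaps should stay below the machine's physical RAM.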
 
Regards,
Mahmood



On Tuesday, March 11, 2014 11:57 PM, Mahmood Naderan  
wrote:
 
As I posted earlier, here is the result of a successful test

A 5.4 GB XML file (which is larger than enwiki-latest-pages-articles10.xml), with
4 GB of RAM and -Xmx128m, took 5 minutes to complete.

I didn't find a larger Wikipedia XML file. I need to test 10 GB, 20 GB, and
30 GB files.


 
Regards,
Mahmood




On Tuesday, March 11, 2014 11:41 PM, Andrew Musselman 
 wrote:

Can you please try running this on a smaller file first, per Suneel's
comment a while back:

"Please first try running this on a smaller dataset like
'enwiki-latest-pages-articles10.xml' as opposed to running on the entire
english wikipedia."



On Tue, Mar 11, 2014 at 12:56 PM, Mahmood Naderan wrote:

> Hi,
> Recently I have faced a heap size error when I run
>
>   $MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d
>
$MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o
> wikipedia/chunks -c 64
>
> Here are the specs:
> 1- XML file size = 44GB
> 2- System memory = 54GB (on virtualbox)
> 3- Heap size = 51GB (-Xmx51000m)
>
> At the time of failure, I see that 571 chunks are created (hadoop dfs -ls)
> so 36GB of the original file has been processed. Now here are my questions
>
> 1- Is there any way to resume the process? As stated before, 571 chunks
> have been created. So by resuming, it can create the rest of the chunks
> (572~).
>
> 2- Is it possible to parallelize the process? Assume 100 GB of heap is
> required to process the XML file and my system cannot afford that. Then we
> could create 20 threads, each requiring 5 GB of heap. By feeding the first
> 10 threads we could use the available 50 GB of heap and, after completion,
> feed the next set of threads.
>
>
> Regards,
> Mahmood

Re: verbose output

2014-03-13 Thread Mahmood Naderan
The hadoop-2.3.0/log directory is empty when I run a Mahout command that uses Hadoop.

 
Regards,
Mahmood



On Thursday, March 13, 2014 12:53 PM, Sebastian Schelter  
wrote:
 
To my knowledge, there is no such flag for Mahout. However, you can check
Hadoop's logs for further information.


On 03/13/2014 10:21 AM, Mahmood Naderan wrote:
> Hi,
> Is there any verbosity flag for the hadoop and mahout commands? I cannot
> find such a thing in the command line.
>
>
> Regards,
> Mahmood
>

Commons IO version mismatch with CDH 4.6

2014-03-13 Thread Bikash Gupta
Hi,

Running KMeans on a CDH 4.6 cluster, I have a new issue with commons-io
compatibility:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.commons.io.IOUtils.closeQuietly(Ljava/io/Closeable;)V
at
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1107)
at
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:539)
at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:738)
at
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:782)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
at
org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
at
org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:56)
at
org.apache.mahout.clustering.iterator.ClusterIterator.isConverged(ClusterIterator.java:207)
at
org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:188)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:217)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:140)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

Any suggestion?

-- 
Thanks & Regards
Bikash Gupta
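As the follow-up upthread shows, this turned out to be a stale jar on the classpath. One general way to diagnose a NoSuchMethodError like this is to ask the JVM where it actually loaded the offending class from. A minimal sketch (not Mahout API; String.class stands in here for org.apache.commons.io.IOUtils, which is what you would query on a real cluster classpath — closeQuietly(Closeable) only exists in commons-io 2.0 and later):

```java
// Hedged sketch: print which jar a class was loaded from, to spot an old
// library version shadowing the expected one on the classpath.
import java.security.CodeSource;

public class WhichJar {
    public static void main(String[] args) {
        // On a real cluster you would use org.apache.commons.io.IOUtils.class;
        // String.class just demonstrates the technique with the JDK alone.
        Class<?> c = String.class;
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap class loader report no code source.
        System.out.println(c.getName() + " -> "
                + (src == null ? "bootstrap class loader" : src.getLocation()));
    }
}
```

Running this with the same classpath the failing job uses shows which commons-io jar wins, which is usually enough to find the conflicting dependency.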


Fwd: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Kevin Moulart
Here is the rest of the conversation between Sebastian and me, since I hit
Reply instead of Reply All at some point.


Kévin Moulart


-- Forwarded message --
From: Sebastian Schelter 
Date: 2014-03-13 10:33 GMT+01:00
Subject: Re: Compiling Mahout with maven in Eclipse
To: Kevin Moulart 


I use IntelliJ IDEA. Its support for Maven projects is very nice. You
should be able to simply import Mahout as a Maven project there, and
everything should work fine.

--sebastian


On 03/13/2014 10:24 AM, Kevin Moulart wrote:

> Actually, I pretty much don't care which IDE I use; if you could share the
> one you use, I can try and make something work. At this point I would only
> update the documentation in several files, so even something like Sublime
> Text would work for me. I just wanted Eclipse for the ease of accessing the
> javadoc and the links between files (I can click a class and get
> directly to its declaration...).
>
> Kévin Moulart
>
>
> 2014-03-13 10:20 GMT+01:00 Sebastian Schelter :
>
>  Hm, I'm not an Eclipse user myself, maybe someone else can help?
>>
>>
>>
>> On 03/13/2014 10:15 AM, Kevin Moulart wrote:
>>
>>  [myCompany@node01 mahout-trunk]$ ls -al
>>> total 180
>>> drwxrwxr-x 15 myCompany myCompany  4096 Mar 13 10:05 .
>>> drwxrwxr-x  6 myCompany myCompany  4096 Mar  7 11:42 ..
>>> drwxrwxr-x  2 myCompany myCompany  4096 Mar  7 11:43 bin
>>> drwxrwxr-x  5 myCompany myCompany  4096 Mar 13 10:05 buildtools
>>> -rw-rw-r--  1 myCompany myCompany 12979 Mar  7 11:43 CHANGELOG
>>> drwxrwxr-x  6 myCompany myCompany  4096 Mar 12 17:00 core
>>> drwxrwxr-x  4 myCompany myCompany  4096 Mar  7 11:46 distribution
>>> -rw-rw-r--  1 myCompany myCompany  2320 Mar  7 11:43 doap_Mahout.rdf
>>> drwxrwxr-x  6 myCompany myCompany  4096 Mar 12 17:11 examples
>>> -rw-rw-r--  1 myCompany myCompany   213 Mar  7 11:43 .gitignore
>>> drwxrwxr-x  7 myCompany myCompany  4096 Mar 12 17:11 integration
>>> -rw-rw-r--  1 myCompany myCompany 39588 Mar  7 11:43 LICENSE.txt
>>> drwxrwxr-x  5 myCompany myCompany  4096 Mar 13 10:05 math
>>> drwxrwxr-x  5 myCompany myCompany  4096 Mar 12 17:12 math-scala
>>> -rw-rw-r--  1 myCompany myCompany  1888 Mar  7 11:43 NOTICE.txt
>>> -rw-rw-r--  1 myCompany myCompany 42747 Mar  7 11:49 pom.xml
>>> -rw-rw-r--  1 myCompany myCompany   375 Mar  7 11:43 .project
>>> -rw-rw-r--  1 myCompany myCompany  1212 Mar  7 11:42 README.txt
>>> drwxrwxr-x  2 myCompany myCompany  4096 Mar  7 11:43 .settings
>>> drwxrwxr-x  6 myCompany myCompany  4096 Mar 12 17:12 spark
>>> drwxrwxr-x  4 myCompany myCompany  4096 Mar  7 11:42 src
>>> drwxrwxr-x  4 myCompany myCompany  4096 Mar  7 11:43 .svn
>>> drwxrwxr-x  3 myCompany myCompany  4096 Mar 13 10:05 target
>>>
>>> Yes, I do think so. But I don't understand how mvn package can work
>>> without generating those files... Even the JUnit tests passed...?
>>>
>>> Kévin Moulart
>>>
>>>
>>> 2014-03-13 10:11 GMT+01:00 Sebastian Schelter :
>>>
>>> Are you executing Maven in the topmost directory?
>>>


 On 03/13/2014 10:09 AM, Kevin Moulart wrote:

   I did, but then it fails because of these missing files :

> https://gist.github.com/kmoulart/9524828
>
> Kévin Moulart
>
>
> 2014-03-13 9:57 GMT+01:00 Sebastian Schelter :
>
>Maven should generate the classes automatically. Have you tried
> running
>
>
>> mvn -DskipTests clean install
>>
>> on the commandline?
>>
>>
>>
>>
>> On 03/13/2014 09:50 AM, Kevin Moulart wrote:
>>
How can I generate them to make these errors go away, then? Or don't I have to?
>>>
>>> Kévin Moulart
>>>
>>>
>>> 2014-03-13 9:17 GMT+01:00 Sebastian Schelter <
>>> ssc.o...@googlemail.com
>>>
 :

>>>
>>> Those are autogenerated.
>>>
>>>
>>>
 On 03/13/2014 09:05 AM, Kevin Moulart wrote:

> Ok, it does compile with maven in Eclipse as well, but still, many
> imports are not recognized in the sources :
>
> - import org.apache.mahout.math.function.IntObjectProcedure;
> - import org.apache.mahout.math.map.OpenIntLongHashMap;
> - import org.apache.mahout.math.map.OpenIntObjectHashMap;
> - import org.apache.mahout.math.set.OpenIntHashSet;
> - import org.apache.mahout.math.list.DoubleArrayList;
> ...
>
> Pretty much all the problems come from the OpenInt... classes that
> it
> doesn't seem to find. Is there a jar or a pom entry I need to add
> here ?
> Or do I have the wrong version of org.apache.mahout.math, because I
> can't
> find those maps/sets/lists in the math package ?
>
> (I have the same problem on both my windows, centos and mac os)
>
> Kévin Moulart
>
>
> 2014-03-12 

Re: verbose output

2014-03-13 Thread Sebastian Schelter
To my knowledge, there is no such flag for Mahout. You can, however, check
Hadoop's logs for further information.
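One related knob can be sketched here (assumption: a log4j-based Hadoop release of that era, whose launcher scripts honor the HADOOP_ROOT_LOGGER environment variable; this is not a `--verbose` flag on the commands themselves):

```shell
# Hadoop's launcher scripts read HADOOP_ROOT_LOGGER to set the log4j root
# logger for the client JVM; the usual default is "INFO,console". Raising
# it to DEBUG makes hadoop (and mahout jobs launched through it) much more
# verbose, for the current shell session only.
export HADOOP_ROOT_LOGGER="DEBUG,console"
echo "root logger: $HADOOP_ROOT_LOGGER"
# hadoop fs -ls /   # would now emit DEBUG-level client logs (not run here)
```

This only raises client-side verbosity; task logs on the cluster are still found through the JobTracker/logs directories.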


On 03/13/2014 10:21 AM, Mahmood Naderan wrote:

Hi,
Is there any verbosity flag for the hadoop and mahout commands? I cannot find
such a thing on the command line.


Regards,
Mahmood





verbose output

2014-03-13 Thread Mahmood Naderan
Hi,
Is there any verbosity flag for the hadoop and mahout commands? I cannot find
such a thing on the command line.

 
Regards,
Mahmood

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter

Are you executing maven in the topmost directory?

On 03/13/2014 10:09 AM, Kevin Moulart wrote:

I did, but then it fails because of these missing files :
https://gist.github.com/kmoulart/9524828

Kévin Moulart


2014-03-13 9:57 GMT+01:00 Sebastian Schelter :


Maven should generate the classes automatically. Have you tried running

mvn -DskipTests clean install

on the commandline?
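The suggestion above can be sketched as a quick check (paths are assumptions based on the standard Maven layout; Mahout's OpenInt* collection classes are produced during Maven's code-generation step, so they only exist after a build):

```shell
# If a module's generated sources are missing, compile errors about
# OpenIntLongHashMap etc. are expected. A full build from the topmost
# directory (the one containing the root pom.xml) creates them, because
# Maven's generate-sources phase runs before compile.
check_generated() {
  # $1: a module's target directory, e.g. "math/target" in the source tree
  if [ -d "$1/generated-sources" ]; then
    echo "generated-sources present under $1"
  else
    echo "generated-sources missing under $1 - run 'mvn -DskipTests clean install' from the topmost directory"
  fi
}
msg=$(check_generated "math/target")
echo "$msg"
```

The "math/target" path is illustrative; run the check from the root of the checkout.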




On 03/13/2014 09:50 AM, Kevin Moulart wrote:


How can I generate them to make these errors go away, then? Or don't I have to?

Kévin Moulart


2014-03-13 9:17 GMT+01:00 Sebastian Schelter :

  Those are autogenerated.



On 03/13/2014 09:05 AM, Kevin Moulart wrote:

Ok it does compile with maven in eclipse as well, but still, many
imports are not recognized in the sources :

- import org.apache.mahout.math.function.IntObjectProcedure;
- import org.apache.mahout.math.map.OpenIntLongHashMap;
- import org.apache.mahout.math.map.OpenIntObjectHashMap;
- import org.apache.mahout.math.set.OpenIntHashSet;
- import org.apache.mahout.math.list.DoubleArrayList;
...

Pretty much all the problems come from the OpenInt... classes that it
doesn't seem to find. Is there a jar or a pom entry I need to add here ?
Or do I have the wrong version of org.apache.mahout.math, because I
can't
find those maps/sets/lists in the math package ?

(I have the same problem on both my windows, centos and mac os)

Kévin Moulart


2014-03-12 17:00 GMT+01:00 Kevin Moulart :

Never mind, I found where the problem lay: I deleted the full content of
.m2, retried it as a non-root user, and it worked. Trying in Eclipse now,
with tests; I'll let you know if it doesn't work.

Kévin Moulart


2014-03-12 16:45 GMT+01:00 Kevin Moulart :

Hi,



I tried to fix all the problems I had configuring Eclipse in order to
compile Mahout in it, using "maven clean package" as the goal.

First I had to make a change in mahout core in the class
GroupTree.java,
line 171 :

   stack = new ArrayDeque();






Then I tried compiling with eclipse (I already had the plugin and all
imported and I'm working on the trunk version).

   From eclipse it runs until it tries compiling the examples :

   [INFO] Building jar:


/home/myCompany/Workspace_eclipse/mahout-trunk/examples/
target/mahout-examples-1.0-SNAPSHOT-job.jar
[INFO]


[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools  SUCCESS [  1.173 s]
[INFO] Apache Mahout . SUCCESS [  0.307 s]
[INFO] Mahout Math ... SUCCESS [  8.041 s]
[INFO] Mahout Core ... SUCCESS [  8.378 s]
[INFO] Mahout Integration  SUCCESS [  1.030 s]
[INFO] Mahout Examples ... FAILURE [  5.325 s]
[INFO] Mahout Release Package  SKIPPED
[INFO] Mahout Math/Scala wrappers  SKIPPED
[INFO] Mahout Spark bindings . SKIPPED
[INFO]


[INFO] BUILD FAILURE
[INFO]


[INFO] Total time: 24.630 s
[INFO] Finished at: 2014-03-12T16:38:08+01:00
[INFO] Final Memory: 101M/1430M
[INFO]


[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-assembly-plugin:2.4:single (job) on
project
mahout-examples: Failed to create assembly: Error creating assembly
archive
job: IOException when zipping com/ibm/icu/ICUConfig.properties:
invalid LOC
header (bad signature) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with
the
-e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug
logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/
MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with
the
command
[ERROR]   mvn  -rf :mahout-examples




It does the exact same thing when I type mvn clean package in a terminal,
but when I try it as root, it works, so it might be an issue with the
permissions; however, I fail to see where (I did a chown -R on my entire
home folder just to be on the safe side, and it still fails).

Has anyone had the same problem? Any idea how to fix it?

Kévin Moulart




















Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Kevin Moulart
I did, but then it fails because of these missing files :
https://gist.github.com/kmoulart/9524828

Kévin Moulart


2014-03-13 9:57 GMT+01:00 Sebastian Schelter :

> Maven should generate the classes automatically. Have you tried running
>
> mvn -DskipTests clean install
>
> on the commandline?
>
>
>
>
> On 03/13/2014 09:50 AM, Kevin Moulart wrote:
>
>> How can I generate them to make these errors go away, then? Or don't I have to?
>>
>> Kévin Moulart
>>
>>
>> 2014-03-13 9:17 GMT+01:00 Sebastian Schelter :
>>
>>  Those are autogenerated.
>>>
>>>
>>> On 03/13/2014 09:05 AM, Kevin Moulart wrote:
>>>
>>>  Ok it does compile with maven in eclipse as well, but still, many imports are not recognized in the sources :

 - import org.apache.mahout.math.function.IntObjectProcedure;
 - import org.apache.mahout.math.map.OpenIntLongHashMap;
 - import org.apache.mahout.math.map.OpenIntObjectHashMap;
 - import org.apache.mahout.math.set.OpenIntHashSet;
 - import org.apache.mahout.math.list.DoubleArrayList;
 ...

 Pretty much all the problems come from the OpenInt... classes that it
 doesn't seem to find. Is there a jar or a pom entry I need to add here ?
 Or do I have the wrong version of org.apache.mahout.math, because I
 can't
 find those maps/sets/lists in the math package ?

 (I have the same problem on both my windows, centos and mac os)

 Kévin Moulart


 2014-03-12 17:00 GMT+01:00 Kevin Moulart :

> Never mind, I found where the problem lay: I deleted the full content of
> .m2, retried it as a non-root user, and it worked. Trying in Eclipse now,
> with tests; I'll let you know if it doesn't work.
>
> Kévin Moulart
>
>
> 2014-03-12 16:45 GMT+01:00 Kevin Moulart :
>
> Hi,
>
>
>> I tried to fix all the problems I had configuring Eclipse in order to
>> compile Mahout in it, using "maven clean package" as the goal.
>>
>> First I had to make a change in mahout core in the class
>> GroupTree.java,
>> line 171 :
>>
>>   stack = new ArrayDeque();
>>
>>>
>>>
>>
>> Then I tried compiling with eclipse (I already had the plugin and all
>> imported and I'm working on the trunk version).
>>
>>   From eclipse it runs until it tries compiling the examples :
>>
>>   [INFO] Building jar:
>>
>>> /home/myCompany/Workspace_eclipse/mahout-trunk/examples/
>>> target/mahout-examples-1.0-SNAPSHOT-job.jar
>>> [INFO]
>>> 
>>> 
>>> [INFO] Reactor Summary:
>>> [INFO]
>>> [INFO] Mahout Build Tools  SUCCESS [  1.173 s]
>>> [INFO] Apache Mahout . SUCCESS [  0.307 s]
>>> [INFO] Mahout Math ... SUCCESS [  8.041 s]
>>> [INFO] Mahout Core ... SUCCESS [  8.378 s]
>>> [INFO] Mahout Integration  SUCCESS [  1.030 s]
>>> [INFO] Mahout Examples ... FAILURE [  5.325 s]
>>> [INFO] Mahout Release Package  SKIPPED
>>> [INFO] Mahout Math/Scala wrappers  SKIPPED
>>> [INFO] Mahout Spark bindings . SKIPPED
>>> [INFO]
>>> 
>>> 
>>> [INFO] BUILD FAILURE
>>> [INFO]
>>> 
>>> 
>>> [INFO] Total time: 24.630 s
>>> [INFO] Finished at: 2014-03-12T16:38:08+01:00
>>> [INFO] Final Memory: 101M/1430M
>>> [INFO]
>>> 
>>> 
>>> [ERROR] Failed to execute goal
>>> org.apache.maven.plugins:maven-assembly-plugin:2.4:single (job) on
>>> project
>>> mahout-examples: Failed to create assembly: Error creating assembly
>>> archive
>>> job: IOException when zipping com/ibm/icu/ICUConfig.properties:
>>> invalid LOC
>>> header (bad signature) -> [Help 1]
>>> [ERROR]
>>> [ERROR] To see the full stack trace of the errors, re-run Maven with
>>> the
>>> -e switch.
>>> [ERROR] Re-run Maven using the -X switch to enable full debug
>>> logging.
>>> [ERROR]
>>> [ERROR] For more information about the errors and possible solutions,
>>> please read the following articles:
>>> [ERROR] [Help 1]
>>> http://cwiki.apache.org/confluence/display/MAVEN/
>>> MojoExecutionException
>>> [ERROR]
>>> [ERROR] After correcting the problems, you can resume the build with
>>> the
>>> command
>>> [ERROR]   mvn  -rf :mahout-examples
>>>
>>>
>>
>>

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter

Maven should generate the classes automatically. Have you tried running

mvn -DskipTests clean install

on the commandline?



On 03/13/2014 09:50 AM, Kevin Moulart wrote:

How can I generate them to make these errors go away, then? Or don't I have to?

Kévin Moulart


2014-03-13 9:17 GMT+01:00 Sebastian Schelter :


Those are autogenerated.


On 03/13/2014 09:05 AM, Kevin Moulart wrote:


Ok it does compile with maven in eclipse as well, but still, many imports
are not recognized in the sources :

- import org.apache.mahout.math.function.IntObjectProcedure;
- import org.apache.mahout.math.map.OpenIntLongHashMap;
- import org.apache.mahout.math.map.OpenIntObjectHashMap;
- import org.apache.mahout.math.set.OpenIntHashSet;
- import org.apache.mahout.math.list.DoubleArrayList;
...

Pretty much all the problems come from the OpenInt... classes that it
doesn't seem to find. Is there a jar or a pom entry I need to add here ?
Or do I have the wrong version of org.apache.mahout.math, because I can't
find those maps/sets/lists in the math package ?

(I have the same problem on both my windows, centos and mac os)

Kévin Moulart


2014-03-12 17:00 GMT+01:00 Kevin Moulart :

Never mind, I found where the problem lay: I deleted the full content of
.m2, retried it as a non-root user, and it worked. Trying in Eclipse now,
with tests; I'll let you know if it doesn't work.

Kévin Moulart


2014-03-12 16:45 GMT+01:00 Kevin Moulart :

Hi,



I tried to fix all the problems I had configuring Eclipse in order to
compile Mahout in it, using "maven clean package" as the goal.

First I had to make a change in mahout core in the class GroupTree.java,
line 171 :

  stack = new ArrayDeque();





Then I tried compiling with eclipse (I already had the plugin and all
imported and I'm working on the trunk version).

  From eclipse it runs until it tries compiling the examples :

  [INFO] Building jar:

/home/myCompany/Workspace_eclipse/mahout-trunk/examples/
target/mahout-examples-1.0-SNAPSHOT-job.jar
[INFO]


[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools  SUCCESS [  1.173 s]
[INFO] Apache Mahout . SUCCESS [  0.307 s]
[INFO] Mahout Math ... SUCCESS [  8.041 s]
[INFO] Mahout Core ... SUCCESS [  8.378 s]
[INFO] Mahout Integration  SUCCESS [  1.030 s]
[INFO] Mahout Examples ... FAILURE [  5.325 s]
[INFO] Mahout Release Package  SKIPPED
[INFO] Mahout Math/Scala wrappers  SKIPPED
[INFO] Mahout Spark bindings . SKIPPED
[INFO]


[INFO] BUILD FAILURE
[INFO]


[INFO] Total time: 24.630 s
[INFO] Finished at: 2014-03-12T16:38:08+01:00
[INFO] Final Memory: 101M/1430M
[INFO]


[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-assembly-plugin:2.4:single (job) on
project
mahout-examples: Failed to create assembly: Error creating assembly
archive
job: IOException when zipping com/ibm/icu/ICUConfig.properties:
invalid LOC
header (bad signature) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with
the
-e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/
MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with
the
command
[ERROR]   mvn  -rf :mahout-examples




It does the exact same thing when I type mvn clean package in a terminal,
but when I try it as root, it works, so it might be an issue with the
permissions; however, I fail to see where (I did a chown -R on my entire
home folder just to be on the safe side, and it still fails).

Has anyone had the same problem? Any idea how to fix it?

Kévin Moulart















Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Kevin Moulart
How can I generate them to make these errors go away, then? Or don't I have to?

Kévin Moulart


2014-03-13 9:17 GMT+01:00 Sebastian Schelter :

> Those are autogenerated.
>
>
> On 03/13/2014 09:05 AM, Kevin Moulart wrote:
>
>> Ok it does compile with maven in eclipse as well, but still, many imports
>> are not recognized in the sources :
>>
>> - import org.apache.mahout.math.function.IntObjectProcedure;
>> - import org.apache.mahout.math.map.OpenIntLongHashMap;
>> - import org.apache.mahout.math.map.OpenIntObjectHashMap;
>> - import org.apache.mahout.math.set.OpenIntHashSet;
>> - import org.apache.mahout.math.list.DoubleArrayList;
>> ...
>>
>> Pretty much all the problems come from the OpenInt... classes that it
>> doesn't seem to find. Is there a jar or a pom entry I need to add here ?
>> Or do I have the wrong version of org.apache.mahout.math, because I can't
>> find those maps/sets/lists in the math package ?
>>
>> (I have the same problem on both my windows, centos and mac os)
>>
>> Kévin Moulart
>>
>>
>> 2014-03-12 17:00 GMT+01:00 Kevin Moulart :
>>
>>> Never mind, I found where the problem lay: I deleted the full content of
>>> .m2, retried it as a non-root user, and it worked. Trying in Eclipse now,
>>> with tests; I'll let you know if it doesn't work.
>>>
>>> Kévin Moulart
>>>
>>>
>>> 2014-03-12 16:45 GMT+01:00 Kevin Moulart :
>>>
>>> Hi,
>>>

I tried to fix all the problems I had configuring Eclipse in order to
compile Mahout in it, using "maven clean package" as the goal.

 First I had to make a change in mahout core in the class GroupTree.java,
 line 171 :

  stack = new ArrayDeque();
>


 Then I tried compiling with eclipse (I already had the plugin and all
 imported and I'm working on the trunk version).

  From eclipse it runs until it tries compiling the examples :

  [INFO] Building jar:
> /home/myCompany/Workspace_eclipse/mahout-trunk/examples/
> target/mahout-examples-1.0-SNAPSHOT-job.jar
> [INFO]
> 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Mahout Build Tools  SUCCESS [  1.173 s]
> [INFO] Apache Mahout . SUCCESS [  0.307 s]
> [INFO] Mahout Math ... SUCCESS [  8.041 s]
> [INFO] Mahout Core ... SUCCESS [  8.378 s]
> [INFO] Mahout Integration  SUCCESS [  1.030 s]
> [INFO] Mahout Examples ... FAILURE [  5.325 s]
> [INFO] Mahout Release Package  SKIPPED
> [INFO] Mahout Math/Scala wrappers  SKIPPED
> [INFO] Mahout Spark bindings . SKIPPED
> [INFO]
> 
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> 
> [INFO] Total time: 24.630 s
> [INFO] Finished at: 2014-03-12T16:38:08+01:00
> [INFO] Final Memory: 101M/1430M
> [INFO]
> 
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-assembly-plugin:2.4:single (job) on
> project
> mahout-examples: Failed to create assembly: Error creating assembly
> archive
> job: IOException when zipping com/ibm/icu/ICUConfig.properties:
> invalid LOC
> header (bad signature) -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with
> the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/
> MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with
> the
> command
> [ERROR]   mvn  -rf :mahout-examples
>


It does the exact same thing when I type mvn clean package in a terminal,
but when I try it as root, it works, so it might be an issue with the
permissions; however, I fail to see where (I did a chown -R on my entire
home folder just to be on the safe side, and it still fails).

Has anyone had the same problem? Any idea how to fix it?

 Kévin Moulart


>>>
>>>
>>
>


Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter

Those are autogenerated.

On 03/13/2014 09:05 AM, Kevin Moulart wrote:

Ok it does compile with maven in eclipse as well, but still, many imports
are not recognized in the sources :

- import org.apache.mahout.math.function.IntObjectProcedure;
- import org.apache.mahout.math.map.OpenIntLongHashMap;
- import org.apache.mahout.math.map.OpenIntObjectHashMap;
- import org.apache.mahout.math.set.OpenIntHashSet;
- import org.apache.mahout.math.list.DoubleArrayList;
...

Pretty much all the problems come from the OpenInt... classes that it
doesn't seem to find. Is there a jar or a pom entry I need to add here ?
Or do I have the wrong version of org.apache.mahout.math, because I can't
find those maps/sets/lists in the math package ?

(I have the same problem on both my windows, centos and mac os)

Kévin Moulart


2014-03-12 17:00 GMT+01:00 Kevin Moulart :


Never mind, I found where the problem lay: I deleted the full content of
.m2, retried it as a non-root user, and it worked. Trying in Eclipse now,
with tests; I'll let you know if it doesn't work.

Kévin Moulart


2014-03-12 16:45 GMT+01:00 Kevin Moulart :

Hi,


I tried to fix all the problems I had configuring Eclipse in order to
compile Mahout in it, using "maven clean package" as the goal.

First I had to make a change in mahout core in the class GroupTree.java,
line 171 :


stack = new ArrayDeque();



Then I tried compiling with eclipse (I already had the plugin and all
imported and I'm working on the trunk version).

 From eclipse it runs until it tries compiling the examples :


[INFO] Building jar:
/home/myCompany/Workspace_eclipse/mahout-trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
[INFO]

[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools  SUCCESS [  1.173 s]
[INFO] Apache Mahout . SUCCESS [  0.307 s]
[INFO] Mahout Math ... SUCCESS [  8.041 s]
[INFO] Mahout Core ... SUCCESS [  8.378 s]
[INFO] Mahout Integration  SUCCESS [  1.030 s]
[INFO] Mahout Examples ... FAILURE [  5.325 s]
[INFO] Mahout Release Package  SKIPPED
[INFO] Mahout Math/Scala wrappers  SKIPPED
[INFO] Mahout Spark bindings . SKIPPED
[INFO]

[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 24.630 s
[INFO] Finished at: 2014-03-12T16:38:08+01:00
[INFO] Final Memory: 101M/1430M
[INFO]

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-assembly-plugin:2.4:single (job) on project
mahout-examples: Failed to create assembly: Error creating assembly archive
job: IOException when zipping com/ibm/icu/ICUConfig.properties: invalid LOC
header (bad signature) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the
-e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR]   mvn  -rf :mahout-examples



It does the exact same thing when I type mvn clean package in a terminal,
but when I try it as root, it works, so it might be an issue with the
permissions; however, I fail to see where (I did a chown -R on my entire
home folder just to be on the safe side, and it still fails).

Has anyone had the same problem? Any idea how to fix it?

Kévin Moulart










Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Kevin Moulart
Ok it does compile with maven in eclipse as well, but still, many imports
are not recognized in the sources :

- import org.apache.mahout.math.function.IntObjectProcedure;
- import org.apache.mahout.math.map.OpenIntLongHashMap;
- import org.apache.mahout.math.map.OpenIntObjectHashMap;
- import org.apache.mahout.math.set.OpenIntHashSet;
- import org.apache.mahout.math.list.DoubleArrayList;
...

Pretty much all the problems come from the OpenInt... classes that it
doesn't seem to find. Is there a jar or a pom entry I need to add here ?
Or do I have the wrong version of org.apache.mahout.math, because I can't
find those maps/sets/lists in the math package ?

(I have the same problem on both my windows, centos and mac os)

Kévin Moulart


2014-03-12 17:00 GMT+01:00 Kevin Moulart :

> Never mind, I found where the problem lay: I deleted the full content of
> .m2, retried it as a non-root user, and it worked. Trying in Eclipse now,
> with tests; I'll let you know if it doesn't work.
>
> Kévin Moulart
>
>
> 2014-03-12 16:45 GMT+01:00 Kevin Moulart :
>
> Hi,
>>
>> I tried to fix all the problems I had configuring Eclipse in order to
>> compile Mahout in it, using "maven clean package" as the goal.
>>
>> First I had to make a change in mahout core in the class GroupTree.java,
>> line 171 :
>>
>>> stack = new ArrayDeque();
>>
>>
>> Then I tried compiling with eclipse (I already had the plugin and all
>> imported and I'm working on the trunk version).
>>
>> From eclipse it runs until it tries compiling the examples :
>>
>>> [INFO] Building jar:
>>> /home/myCompany/Workspace_eclipse/mahout-trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
>>> [INFO]
>>> 
>>> [INFO] Reactor Summary:
>>> [INFO]
>>> [INFO] Mahout Build Tools  SUCCESS [  1.173 s]
>>> [INFO] Apache Mahout . SUCCESS [  0.307 s]
>>> [INFO] Mahout Math ... SUCCESS [  8.041 s]
>>> [INFO] Mahout Core ... SUCCESS [  8.378 s]
>>> [INFO] Mahout Integration  SUCCESS [  1.030 s]
>>> [INFO] Mahout Examples ... FAILURE [  5.325 s]
>>> [INFO] Mahout Release Package  SKIPPED
>>> [INFO] Mahout Math/Scala wrappers  SKIPPED
>>> [INFO] Mahout Spark bindings . SKIPPED
>>> [INFO]
>>> 
>>> [INFO] BUILD FAILURE
>>> [INFO]
>>> 
>>> [INFO] Total time: 24.630 s
>>> [INFO] Finished at: 2014-03-12T16:38:08+01:00
>>> [INFO] Final Memory: 101M/1430M
>>> [INFO]
>>> 
>>> [ERROR] Failed to execute goal
>>> org.apache.maven.plugins:maven-assembly-plugin:2.4:single (job) on project
>>> mahout-examples: Failed to create assembly: Error creating assembly archive
>>> job: IOException when zipping com/ibm/icu/ICUConfig.properties: invalid LOC
>>> header (bad signature) -> [Help 1]
>>> [ERROR]
>>> [ERROR] To see the full stack trace of the errors, re-run Maven with the
>>> -e switch.
>>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>> [ERROR]
>>> [ERROR] For more information about the errors and possible solutions,
>>> please read the following articles:
>>> [ERROR] [Help 1]
>>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>>> [ERROR]
>>> [ERROR] After correcting the problems, you can resume the build with the
>>> command
>>> [ERROR]   mvn  -rf :mahout-examples
>>
>>
>> It does the exact same thing when I type mvn clean package in a terminal,
>> but when I try it as root, it works, so it might be an issue with the
>> permissions; however, I fail to see where (I did a chown -R on my entire
>> home folder just to be on the safe side, and it still fails).
>>
>> Has anyone had the same problem? Any idea how to fix it?
>>
>> Kévin Moulart
>>
>
>
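For the "invalid LOC header (bad signature)" failure in this thread, the usual culprit is a truncated jar in the local Maven repository, which is consistent with deleting ~/.m2 fixing it. A sketch for locating such jars without wiping the whole repository (assumes `zip` and `unzip` are installed; the demo directory and file names below are hypothetical):

```shell
# Scan a directory tree for jars that fail zip integrity checks; Maven
# re-downloads any artifact whose file you delete on the next build.
scan_jars() {
  find "$1" -name '*.jar' | while read -r jar; do
    # 'unzip -t' test-reads every entry; a nonzero exit means a bad archive
    unzip -tq "$jar" >/dev/null 2>&1 || echo "corrupt: $jar"
  done
}

# Demo on a throwaway directory instead of the real ~/.m2/repository:
tmp=$(mktemp -d)
echo hello > "$tmp/f.txt"
(cd "$tmp" && zip -q good.jar f.txt)         # a valid archive
head -c 10 "$tmp/good.jar" > "$tmp/bad.jar"  # a truncated (corrupt) copy
result=$(scan_jars "$tmp")
echo "$result"
# To check the real repository: scan_jars "$HOME/.m2/repository"
```

Deleting only the reported jars (or their parent artifact directories) is gentler than removing all of .m2, since Maven then re-fetches just the broken artifacts.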