Re: local file input for seqdirectory
Thanks, Suneel.

On Thu, Mar 13, 2014 at 4:17 PM, Suneel Marthi wrote:
> The workaround is to add -xm sequential. An MR version of seqdirectory was
> introduced in 0.8, and hence the default execution mode is MR if none is
> specified.
>
> On Thursday, March 13, 2014 4:12 PM, Steven Cullens wrote:
> > […]
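A minimal sketch of the suggested workaround, with hypothetical input and output paths (assumes a Mahout 0.9 install on the PATH):

```shell
# Hypothetical paths: a local directory of small text files in, an HDFS
# sequence-file path out. -xm sequential forces local (non-MapReduce)
# execution, which accepts file:// input as Mahout 0.7 did.
INPUT="file:///tmp/docs"
OUTPUT="/user/me/docs-seq"
if command -v mahout >/dev/null 2>&1; then
  mahout seqdirectory -i "$INPUT" -o "$OUTPUT" -xm sequential
else
  # Mahout is not installed here; just show the command that would run.
  echo "mahout seqdirectory -i $INPUT -o $OUTPUT -xm sequential"
fi
```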
Re: local file input for seqdirectory
The workaround is to add -xm sequential. An MR version of seqdirectory was introduced in 0.8, and hence the default execution mode is MR if none is specified.

On Thursday, March 13, 2014 4:12 PM, Steven Cullens wrote:
> Hi,
>
> I have a large number of files on the order of kilobytes on my local
> machine that I want to convert to a sequence file on HDFS. Whenever I try
> to copy the local files to HDFS, Hadoop complains about bad blocks,
> presumably because each block is 64 MB and there are more files than
> blocks. In Mahout 0.7, I would tell it that the input files are local, like:
>
> mahout seqdirectory -i file:// -o
>
> But I can't use the same command on Mahout 0.9, where it expects the file
> system to be HDFS. Is there a workaround to generating the sequence file
> using Mahout 0.9? Thanks.
>
> Steven
local file input for seqdirectory
Hi,

I have a large number of files on the order of kilobytes on my local machine that I want to convert to a sequence file on HDFS. Whenever I try to copy the local files to HDFS, Hadoop complains about bad blocks, presumably because each block is 64 MB and there are more files than blocks. In Mahout 0.7, I would tell it that the input files are local, like:

mahout seqdirectory -i file:// -o

But I can't use the same command on Mahout 0.9, where it expects the file system to be HDFS. Is there a workaround to generating the sequence file using Mahout 0.9? Thanks.

Steven
Re: Website, urgent help needed
I have created issue https://issues.apache.org/jira/browse/MAHOUT-1461. Will upload shell scripts and suggested replacement text later tonight.

SCott

On 3/13/14, 10:43 AM, "Sebastian Schelter" wrote:
> Hi Scott,
>
> Create a jira ticket and attach your scripts and a text version of the
> page there.
>
> Best,
> Sebastian
>
> On 03/12/2014 03:27 PM, Scott C. Cote wrote:
> > […]
Re: bug report
OK, I see that. Thanks.

Regards,
Mahmood

On Thursday, March 13, 2014 10:25 PM, Andrew Musselman wrote:
> That's right, thanks.
>
> […]
Re: Solving "heap size error"
We used the wikipedia splitter as a benchmark for our simulation on Hadoop 0.2. I am now trying to run that on the latest Hadoop to be up to date and check some differences. For now, I have no other choice.

Regards,
Mahmood

On Thursday, March 13, 2014 10:12 PM, Andrew Musselman wrote:
> What's your larger goal here; are you putting Hadoop and Mahout through
> their paces as an exercise?
>
> If your process is blowing through data quickly up to a certain point,
> there may be something happening with a common value, which is a "data
> bug". I don't know what this wikipedia splitter class does, but if you're
> interested in isolating the issue you could find out what is happening
> data-wise and see if there is some very large grouping on a
> pathologically frequent key, for instance.
>
> […]
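For reference, the mapred.*.child.java.opts properties discussed in this thread set the per-task JVM flags, configured in mapred-site.xml; HADOOP_HEAPSIZE in hadoop-env.sh instead sizes the heap of the Hadoop daemons themselves, not the tasks. A sketch of the configuration (values illustrative, from the thread):

```xml
<!-- mapred-site.xml: per-task JVM heap for map and reduce tasks.
     Illustrative values; the totals must fit in physical RAM
     multiplied across the concurrent task slots on the node. -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>
```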
Re: bug report
That's right, thanks.

On Thu, Mar 13, 2014 at 11:52 AM, Ted Dunning wrote:
> You have to be logged in to JIRA to do this. To log in, you may need to
> create an account.
>
> […]
Re: bug report
You have to be logged in to JIRA to do this. To log in, you may need to create an account.

On Thu, Mar 13, 2014 at 11:33 AM, Andrew Musselman wrote:
> https://issues.apache.org/jira/browse/MAHOUT/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
>
> "Create Issue" button at the top of the page.
>
> […]
Re: Solving "heap size error"
What's your larger goal here; are you putting Hadoop and Mahout through their paces as an exercise?

If your process is blowing through data quickly up to a certain point, there may be something happening with a common value, which is a "data bug". I don't know what this wikipedia splitter class does, but if you're interested in isolating the issue you could find out what is happening data-wise and see if there is some very large grouping on a pathologically frequent key, for instance.

On Thu, Mar 13, 2014 at 11:31 AM, Mahmood Naderan wrote:
> I am pretty sure that there is something wrong with hadoop/mahout/java.
> With any configuration, it gets stuck at chunk #571. Previous chunks are
> created rapidly, but I see it wait for about 30 minutes on 571, and that
> is the reason for the heap size error.
>
> I will try to submit a bug report.
>
> Regards,
> Mahmood
>
> […]
Re: bug report
https://issues.apache.org/jira/browse/MAHOUT/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

"Create Issue" button at the top of the page.

On Thu, Mar 13, 2014 at 11:29 AM, Mahmood Naderan wrote:
> Hi
> Where can I submit a Mahout bug? I am not familiar with JIRA, and I see
> issues and agile.
>
> Regards,
> Mahmood
Re: Solving "heap size error"
I am pretty sure that there is something wrong with hadoop/mahout/java. With any configuration, it gets stuck at chunk #571. Previous chunks are created rapidly, but I see it wait for about 30 minutes on 571, and that is the reason for the heap size error.

I will try to submit a bug report.

Regards,
Mahmood

On Thursday, March 13, 2014 2:31 PM, Mahmood Naderan wrote:
> The strange thing is that if I use either -Xmx128m or -Xmx16384m, the
> process stops at chunk #571 (571*64 = 36.5 GB). Still, I haven't figured
> out whether this is a problem with the JVM, Hadoop, or Mahout.
>
> […]
bug report
Hi,
Where can I submit a Mahout bug? I am not familiar with JIRA, and I see issues and agile.

Regards,
Mahmood
Re: Website, urgent help needed
Hi Scott,

Create a jira ticket and attach your scripts and a text version of the page there.

Best,
Sebastian

On 03/12/2014 03:27 PM, Scott C. Cote wrote:
> I took the tour of the text analysis and pushed through despite the
> problems on the page. Committers helped me over the hump where others
> might have just given up (to your point). When I did it, I made shell
> scripts so that my steps would be repeatable, with an anticipation of
> updating the page.
>
> Unfortunately, I gave up on trying to figure out how to update the page
> (there were links indicating that I could do it), and I didn't want to
> appear stupid asking how to update the documentation (my bad, not anyone
> else). Now I know that it was not possible unless I was a committer.
>
> Who should I send my scripts to, and how should I proceed with a current
> form of the page?
>
> SCott
>
> On 3/12/14, 5:02 AM, "Sebastian Schelter" wrote:
> > Hi Pavan,
> >
> > Awesome that you're willing to help. The documentation is the pages
> > listed under "Clustering" in the navigation bar on mahout.apache.org.
> >
> > If you start working on one of the pages listed there (e.g. the k-Means
> > doc), please create a jira ticket in our issue tracker with a title
> > along the lines of "Cleaning up the documentation for k-Means on the
> > website".
> >
> > Put a list of errors and corrections into the jira and I (or some other
> > committer) will make sure to fix the website.
> >
> > Thanks,
> > Sebastian
> >
> > On 03/12/2014 08:48 AM, Pavan Kumar N wrote:
> > > I'll help with the clustering algorithms documentation. Do send me the
> > > old documentation and I will check it and remove errors, or better,
> > > let me know how to proceed.
> > >
> > > Pavan
> > >
> > > On Mar 12, 2014 12:35 PM, "Sebastian Schelter" wrote:
> > > > Hi,
> > > >
> > > > As you've probably noticed, I've put in a lot of effort over the
> > > > last days to kickstart cleaning up our website. I've thrown out a
> > > > lot of stuff and have been startled by the amount of outdated and
> > > > incorrect information on our website, as well as links pointing to
> > > > nowhere.
> > > >
> > > > I think our lack of documentation makes it super hard for new
> > > > people to use Mahout. A crucial next step is to clean up the
> > > > documentation on classification and clustering. I cannot do this
> > > > alone, because I don't have the time and I'm not so familiar with
> > > > the background of the algorithms.
> > > >
> > > > I need volunteers to go through all the pages under
> > > > "Classification" and "Clustering" on the website. For the
> > > > algorithms, the content and claims of the articles need to be
> > > > checked; for the examples, we need to make sure that everything
> > > > still works as described. It would also be great to move articles
> > > > from personal blogs to our website.
> > > >
> > > > Imagine that some developer wants to try out Mahout and takes one
> > > > hour for that in the evening. She will go to our website, download
> > > > Mahout, read the description of an algorithm and try to run an
> > > > example. In the current state of the documentation, I'm afraid that
> > > > most people will walk away frustrated, because the website does not
> > > > help them as it should.
> > > >
> > > > Best,
> > > > Sebastian
> > > >
> > > > PS: I will make my standpoint on whether Mahout should do a 1.0
> > > > release depend on whether we manage to clean up and maintain our
> > > > documentation.
Re: Commons IO version mismatch with CDH 4.6
My bad! I was pointing to the wrong jar; sorry for this.

On Thu, Mar 13, 2014 at 4:11 PM, Bikash Gupta wrote:
> Hi,
>
> Running KMeans on a CDH 4.6 cluster, I have a new issue with commons-io
> compatibility:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.commons.io.IOUtils.closeQuietly(Ljava/io/Closeable;)V
> […]
>
> Any suggestion?
>
> --
> Thanks & Regards
> Bikash Gupta

--
Thanks & Regards
Bikash Kumar Gupta
Re: Solving "heap size error"
The strange thing is that if I use either -Xmx128m or -Xmx16384m, the process stops at chunk #571 (571*64 = 36.5 GB). Still, I haven't figured out whether this is a problem with the JVM, Hadoop, or Mahout.

I have tested various parameters on 16 GB of RAM:

mapred.map.child.java.opts
-Xmx2048m

mapred.reduce.child.java.opts
-Xmx4096m

Is there a relation between these parameters and the amount of available memory? I also see a HADOOP_HEAPSIZE in hadoop-env.sh which is commented out by default. What is that?

Regards,
Mahmood

On Tuesday, March 11, 2014 11:57 PM, Mahmood Naderan wrote:
> As I posted earlier, here is the result of a successful test: a 5.4 GB XML
> file (which is larger than enwiki-latest-pages-articles10.xml) with 4 GB
> of RAM and -Xmx128m took 5 minutes to complete. I didn't find a larger
> wikipedia XML file. Need to test 10 GB, 20 GB and 30 GB files.
>
> On Tuesday, March 11, 2014 11:41 PM, Andrew Musselman wrote:
> > Can you please try running this on a smaller file first, per Suneel's
> > comment a while back:
> >
> > "Please first try running this on a smaller dataset like
> > 'enwiki-latest-pages-articles10.xml' as opposed to running on the
> > entire english wikipedia."
> >
> > On Tue, Mar 11, 2014 at 12:56 PM, Mahmood Naderan wrote:
> > > Hi,
> > > Recently I have faced a heap size error when I run
> > >
> > > $MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d
> > > $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o
> > > wikipedia/chunks -c 64
> > >
> > > Here are the specs:
> > > 1- XML file size = 44 GB
> > > 2- System memory = 54 GB (on VirtualBox)
> > > 3- Heap size = 51 GB (-Xmx51000m)
> > >
> > > At the time of failure, I see that 571 chunks are created (hadoop dfs
> > > -ls), so 36 GB of the original file has been processed. Now here are
> > > my questions:
> > >
> > > 1- Is there any way to resume the process? As stated before, 571
> > > chunks have been created, so by resuming, it could create the rest of
> > > the chunks (572~).
> > >
> > > 2- Is it possible to parallelize the process? Assume 100 GB of heap
> > > is required to process the XML file and my system cannot afford that.
> > > Then we can create 20 threads, each requiring 5 GB of heap. Next, by
> > > feeding the first 10 threads we can use the available 50 GB of heap,
> > > and after completion we can feed the next set of threads.
> > >
> > > Regards,
> > > Mahmood
Re: verbose output
The hadoop-2.3.0/log directory is empty when I run a mahout command that uses Hadoop.

Regards,
Mahmood

On Thursday, March 13, 2014 12:53 PM, Sebastian Schelter wrote:
> To my knowledge, there is no such flag for mahout. You can check hadoop's
> logs for further information, however.
>
> […]
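One way to get more verbose client-side output from Hadoop-backed commands is to raise the log4j level through the environment; a sketch, assuming the stock Hadoop 1.x/2.x launcher scripts, which read HADOOP_ROOT_LOGGER (the mahout script delegates to hadoop when a Hadoop install is configured):

```shell
# Raise the Hadoop client log level to DEBUG on the console before
# invoking the job; log4j picks this up via the launcher scripts.
export HADOOP_ROOT_LOGGER="DEBUG,console"
echo "HADOOP_ROOT_LOGGER=$HADOOP_ROOT_LOGGER"
```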
Commons IO version mismatch with CDH 4.6
Hi,

Running KMeans on a CDH 4.6 cluster, I have a new issue with commons-io compatibility:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly(Ljava/io/Closeable;)V
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1107)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:539)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:738)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:782)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:56)
        at org.apache.mahout.clustering.iterator.ClusterIterator.isConverged(ClusterIterator.java:207)
        at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:188)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:217)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:140)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

Any suggestion?

--
Thanks & Regards
Bikash Gupta
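A NoSuchMethodError like this usually means an older commons-io (the closeQuietly(Closeable) overload first appeared in commons-io 2.0) is shadowing a newer one on the classpath. One way to see which versions a build resolves, sketched under the assumption that the job is built with Maven:

```shell
# List every commons-io the build pulls in; a duplicate or a pre-2.0
# version would explain the missing closeQuietly(Closeable) method.
CMD="mvn dependency:tree -Dincludes=commons-io:commons-io"
if command -v mvn >/dev/null 2>&1; then
  $CMD
else
  # Maven is not available here; show the command instead.
  echo "$CMD"
fi
```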
Fwd: Compiling Mahout with maven in Eclipse
Here is the rest of the conversation between Sebastian and me, since I hit reply instead of reply-all at some point.

Kévin Moulart

---------- Forwarded message ----------
From: Sebastian Schelter
Date: 2014-03-13 10:33 GMT+01:00
Subject: Re: Compiling Mahout with maven in Eclipse
To: Kevin Moulart

I use IntelliJ IDEA. Its support for maven projects is very nice. You should be able to simply import mahout as a maven project there and everything should work fine.

--sebastian

On 03/13/2014 10:24 AM, Kevin Moulart wrote:
> Actually, I pretty much don't care which IDE I use; if you could share the
> one you use, I can try and make something work. At this point I would
> only update the documentation in several files, so even something like
> Sublime Text would work for me. I just wanted Eclipse for the ease of
> accessing the javadoc and the links between the files (I can click a
> class and get directly to its declaration...).
>
> Kévin Moulart
>
> 2014-03-13 10:20 GMT+01:00 Sebastian Schelter:
> > Hm, I'm not an Eclipse user myself, maybe someone else can help?
> >
> > On 03/13/2014 10:15 AM, Kevin Moulart wrote:
> > > [myCompany@node01 mahout-trunk]$ ls -al
> > > total 180
> > > drwxrwxr-x 15 myCompany myCompany  4096 Mar 13 10:05 .
> > > drwxrwxr-x  6 myCompany myCompany  4096 Mar  7 11:42 ..
> > > drwxrwxr-x  2 myCompany myCompany  4096 Mar  7 11:43 bin
> > > drwxrwxr-x  5 myCompany myCompany  4096 Mar 13 10:05 buildtools
> > > -rw-rw-r--  1 myCompany myCompany 12979 Mar  7 11:43 CHANGELOG
> > > drwxrwxr-x  6 myCompany myCompany  4096 Mar 12 17:00 core
> > > drwxrwxr-x  4 myCompany myCompany  4096 Mar  7 11:46 distribution
> > > -rw-rw-r--  1 myCompany myCompany  2320 Mar  7 11:43 doap_Mahout.rdf
> > > drwxrwxr-x  6 myCompany myCompany  4096 Mar 12 17:11 examples
> > > -rw-rw-r--  1 myCompany myCompany   213 Mar  7 11:43 .gitignore
> > > drwxrwxr-x  7 myCompany myCompany  4096 Mar 12 17:11 integration
> > > -rw-rw-r--  1 myCompany myCompany 39588 Mar  7 11:43 LICENSE.txt
> > > drwxrwxr-x  5 myCompany myCompany  4096 Mar 13 10:05 math
> > > drwxrwxr-x  5 myCompany myCompany  4096 Mar 12 17:12 math-scala
> > > -rw-rw-r--  1 myCompany myCompany  1888 Mar  7 11:43 NOTICE.txt
> > > -rw-rw-r--  1 myCompany myCompany 42747 Mar  7 11:49 pom.xml
> > > -rw-rw-r--  1 myCompany myCompany   375 Mar  7 11:43 .project
> > > -rw-rw-r--  1 myCompany myCompany  1212 Mar  7 11:42 README.txt
> > > drwxrwxr-x  2 myCompany myCompany  4096 Mar  7 11:43 .settings
> > > drwxrwxr-x  6 myCompany myCompany  4096 Mar 12 17:12 spark
> > > drwxrwxr-x  4 myCompany myCompany  4096 Mar  7 11:42 src
> > > drwxrwxr-x  4 myCompany myCompany  4096 Mar  7 11:43 .svn
> > > drwxrwxr-x  3 myCompany myCompany  4096 Mar 13 10:05 target
> > >
> > > Yes, I do think so. But I don't understand how mvn package can work
> > > without generating those files... Even the JUnit tests passed...?
> > >
> > > 2014-03-13 10:11 GMT+01:00 Sebastian Schelter:
> > > > Are you executing maven in the topmost directory?
> > > >
> > > > […]
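The build command Sebastian suggests in this thread, as a guarded sketch (run from the topmost mahout source directory; the OpenInt*/DoubleArrayList collection classes the IDE reports missing are generated during the build):

```shell
# A full offline build generates the autogenerated collection sources
# before installing; run from the top-level mahout checkout.
CMD="mvn -DskipTests clean install"
if command -v mvn >/dev/null 2>&1; then
  $CMD
else
  echo "$CMD"  # Maven not available in this environment
fi
```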
Re: verbose output
To my knowledge, there is no such flag for Mahout. You can check Hadoop's logs for further information, however. On 03/13/2014 10:21 AM, Mahmood Naderan wrote: Hi, Is there any verbosity flag for hadoop and mahout commands? I cannot find such a thing in the command line. Regards, Mahmood
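For Hadoop itself, client-side log verbosity can usually be raised without a dedicated command-line flag. A minimal sketch, assuming the stock hadoop launcher scripts (which read the HADOOP_ROOT_LOGGER environment variable); the mahout driver may pick up the same Hadoop logging configuration when it submits jobs:

```shell
# Raise Hadoop's client-side log level to DEBUG for subsequent commands.
# HADOOP_ROOT_LOGGER is read by the stock hadoop launcher scripts; it is
# an environment setting, not a flag on the command itself.
export HADOOP_ROOT_LOGGER=DEBUG,console

# Then run the command as usual, e.g.:
#   hadoop fs -ls /
#   mahout seqdirectory -i <input> -o <output>
```

Unset the variable (or open a new shell) to return to the default INFO level.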
verbose output
Hi, Is there any verbosity flag for hadoop and mahout commands? I cannot find such a thing in the command line. Regards, Mahmood
Re: Compiling Mahout with maven in Eclipse
Are you executing maven in the topmost directory?

On 03/13/2014 10:09 AM, Kevin Moulart wrote:

I did, but then it fails because of these missing files: https://gist.github.com/kmoulart/9524828

Kévin Moulart

2014-03-13 9:57 GMT+01:00 Sebastian Schelter:

Maven should generate the classes automatically. Have you tried running

mvn -DskipTests clean install

on the commandline?

On 03/13/2014 09:50 AM, Kevin Moulart wrote:

How can I generate them to make these errors go away then? Or don't I have to?

Kévin Moulart

2014-03-13 9:17 GMT+01:00 Sebastian Schelter:

Those are autogenerated.

On 03/13/2014 09:05 AM, Kevin Moulart wrote:

Ok, it does compile with maven in Eclipse as well, but still, many imports are not recognized in the sources:

- import org.apache.mahout.math.function.IntObjectProcedure;
- import org.apache.mahout.math.map.OpenIntLongHashMap;
- import org.apache.mahout.math.map.OpenIntObjectHashMap;
- import org.apache.mahout.math.set.OpenIntHashSet;
- import org.apache.mahout.math.list.DoubleArrayList;
...

Pretty much all the problems come from the OpenInt... classes that it doesn't seem to find. Is there a jar or a pom entry I need to add here? Or do I have the wrong version of org.apache.mahout.math, because I can't find those maps/sets/lists in the math package?

(I have the same problem on Windows, CentOS and Mac OS.)

Kévin Moulart

2014-03-12 17:00 GMT+01:00 Kevin Moulart:

Never mind, I found where the problem lay: I deleted the full content of .m2, retried as a non-root user, and it worked. Trying in Eclipse now, with tests; I'll let you know if it doesn't work.

Kévin Moulart

2014-03-12 16:45 GMT+01:00 Kevin Moulart:

Hi,

I tried to fix all the problems I had configuring Eclipse in order to compile Mahout in it, using "maven clean package" as the goal.

First I had to make a change in mahout-core in the class GroupTree.java, line 171:

stack = new ArrayDeque();

Then I tried compiling with Eclipse (I already had the plugin, everything imported, and I'm working on the trunk version). From Eclipse it runs until it tries compiling the examples:

[INFO] Building jar: /home/myCompany/Workspace_eclipse/mahout-trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
[INFO] Reactor Summary:
[INFO] Mahout Build Tools ................ SUCCESS [  1.173 s]
[INFO] Apache Mahout ..................... SUCCESS [  0.307 s]
[INFO] Mahout Math ....................... SUCCESS [  8.041 s]
[INFO] Mahout Core ....................... SUCCESS [  8.378 s]
[INFO] Mahout Integration ................ SUCCESS [  1.030 s]
[INFO] Mahout Examples ................... FAILURE [  5.325 s]
[INFO] Mahout Release Package ............ SKIPPED
[INFO] Mahout Math/Scala wrappers ........ SKIPPED
[INFO] Mahout Spark bindings ............. SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 24.630 s
[INFO] Finished at: 2014-03-12T16:38:08+01:00
[INFO] Final Memory: 101M/1430M
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (job) on project mahout-examples: Failed to create assembly: Error creating assembly archive job: IOException when zipping com/ibm/icu/ICUConfig.properties: invalid LOC header (bad signature) -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :mahout-examples

It does the exact same thing when I type mvn clean package in a terminal, but when I try it as root, it works, so it might be a permissions issue; however, I fail to see where (I did a chown -R on my entire home folder just to be on the safe side, and it still fails).

Has anyone had the same problem? Any idea how to fix it?

Kévin Moulart
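The "invalid LOC header (bad signature)" message typically means the assembly plugin hit a corrupted jar in the local Maven repository. Wiping ~/.m2 entirely (as was done later in this thread) works; a narrower approach is to locate just the unreadable jars and delete those so Maven re-downloads them. A sketch, assuming `unzip` is installed; `scan_jars` is a hypothetical helper, not part of Maven:

```shell
# Hypothetical helper: list jars under a repository tree that fail a zip
# integrity check ("invalid LOC header" usually points at such a file).
scan_jars() {
  find "$1" -name '*.jar' -print0 2>/dev/null |
    while IFS= read -r -d '' jar; do
      unzip -tqq "$jar" >/dev/null 2>&1 || echo "corrupt: $jar"
    done
}

# Typical use: scan the local Maven repo, then delete the listed files so
# the next build re-downloads them.
#   scan_jars ~/.m2/repository
```

The fact that the build worked as root suggests root's separate /root/.m2 repository simply did not contain the damaged file.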
Re: Compiling Mahout with maven in Eclipse
I did, but then it fails because of these missing files: https://gist.github.com/kmoulart/9524828

Kévin Moulart
Re: Compiling Mahout with maven in Eclipse
Maven should generate the classes automatically. Have you tried running

mvn -DskipTests clean install

on the commandline?
Re: Compiling Mahout with maven in Eclipse
How can I generate them to make these errors go away then? Or don't I have to?

Kévin Moulart
Re: Compiling Mahout with maven in Eclipse
Those are autogenerated.
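Since the OpenInt*/DoubleArrayList classes are autogenerated, they do not exist in a fresh checkout; a full Maven build from the topmost directory has to run before an IDE can resolve them. A minimal sketch; `regen_mahout_sources` is a hypothetical wrapper and the checkout path in the usage line is an assumption:

```shell
# Hypothetical wrapper: run a full Maven build from the top of a Mahout
# checkout so the code-generation step produces the missing classes before
# compilation. Afterwards, refresh the Eclipse projects so the generated
# source folders are picked up.
regen_mahout_sources() {
  src="$1"
  if [ ! -d "$src" ]; then
    echo "no checkout at $src" >&2
    return 1
  fi
  (cd "$src" && mvn -DskipTests clean install)
}

# Typical use (path is an example):
#   regen_mahout_sources ~/Workspace_eclipse/mahout-trunk
```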
Re: Compiling Mahout with maven in Eclipse
Ok, it does compile with maven in Eclipse as well, but still, many imports are not recognized in the sources:

- import org.apache.mahout.math.function.IntObjectProcedure;
- import org.apache.mahout.math.map.OpenIntLongHashMap;
- import org.apache.mahout.math.map.OpenIntObjectHashMap;
- import org.apache.mahout.math.set.OpenIntHashSet;
- import org.apache.mahout.math.list.DoubleArrayList;
...

Pretty much all the problems come from the OpenInt... classes that it doesn't seem to find. Is there a jar or a pom entry I need to add here? Or do I have the wrong version of org.apache.mahout.math, because I can't find those maps/sets/lists in the math package?

(I have the same problem on Windows, CentOS and Mac OS.)

Kévin Moulart