[jira] [Resolved] (MAHOUT-1407) MatrixVectorView allows out of bounds index

2014-01-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1407.
---

Resolution: Duplicate

> MatrixVectorView allows out of bounds index
> ---
>
> Key: MAHOUT-1407
> URL: https://issues.apache.org/jira/browse/MAHOUT-1407
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.9
>Reporter: Ted Dunning
>Priority: Minor
> Fix For: 0.9
>
>
> The MatrixVectorView has a > where it should have a >= in a test for index 
> out of bounds.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (MAHOUT-1404) MatrixVectorView allows out of bounds index

2014-01-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1404.
---

   Resolution: Duplicate
Fix Version/s: 0.9

> MatrixVectorView allows out of bounds index
> ---
>
> Key: MAHOUT-1404
> URL: https://issues.apache.org/jira/browse/MAHOUT-1404
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Minor
> Fix For: 0.9
>
>
> The MatrixVectorView has a > where it should have a >= in a test for index 
> out of bounds.





[jira] [Resolved] (MAHOUT-1405) MatrixVectorView allows out of bounds index

2014-01-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1405.
---

   Resolution: Duplicate
Fix Version/s: 0.9

> MatrixVectorView allows out of bounds index
> ---
>
> Key: MAHOUT-1405
> URL: https://issues.apache.org/jira/browse/MAHOUT-1405
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.9
>Reporter: Ted Dunning
>Priority: Minor
> Fix For: 0.9
>
>
> The MatrixVectorView has a > where it should have a >= in a test for index 
> out of bounds.





[jira] [Resolved] (MAHOUT-1403) MatrixVectorView allows out of bounds index

2014-01-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1403.
---

   Resolution: Duplicate
Fix Version/s: 0.9

> MatrixVectorView allows out of bounds index
> ---
>
> Key: MAHOUT-1403
> URL: https://issues.apache.org/jira/browse/MAHOUT-1403
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Minor
> Fix For: 0.9
>
>
> The MatrixVectorView has a > where it should have a >= in a test for index 
> out of bounds.





[jira] [Resolved] (MAHOUT-1406) MatrixVectorView allows out of bounds index

2014-01-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1406.
---

Resolution: Duplicate

> MatrixVectorView allows out of bounds index
> ---
>
> Key: MAHOUT-1406
> URL: https://issues.apache.org/jira/browse/MAHOUT-1406
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.9
>Reporter: Ted Dunning
>Priority: Minor
> Fix For: 0.9
>
>
> The MatrixVectorView has a > where it should have a >= in a test for index 
> out of bounds.





[jira] [Commented] (MAHOUT-1409) MatrixVectorView has index check error

2014-01-23 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880684#comment-13880684
 ] 

Ted Dunning commented on MAHOUT-1409:
-

JIRA was sick last night when I first tried to post this issue.  I didn't see 
that all of those other issues had actually been created.

All are dupes.

> MatrixVectorView has index check error
> --
>
> Key: MAHOUT-1409
> URL: https://issues.apache.org/jira/browse/MAHOUT-1409
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Ted Dunning
>Assignee: Ted Dunning
> Attachments: MAHOUT-1409.patch
>
>
> There is a > in the test for the correct index where there should be a >=





[jira] [Commented] (MAHOUT-1409) MatrixVectorView has index check error

2014-01-23 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880651#comment-13880651
 ] 

Suneel Marthi commented on MAHOUT-1409:
---

Ted, there are 5 other JIRAs for this one issue: M-1403 through M-1407.  Was 
that in error?

> MatrixVectorView has index check error
> --
>
> Key: MAHOUT-1409
> URL: https://issues.apache.org/jira/browse/MAHOUT-1409
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Ted Dunning
>Assignee: Ted Dunning
> Attachments: MAHOUT-1409.patch
>
>
> There is a > in the test for the correct index where there should be a >=





[jira] [Updated] (MAHOUT-1409) MatrixVectorView has index check error

2014-01-23 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated MAHOUT-1409:


 Assignee: Ted Dunning
Affects Version/s: 0.8
   Status: Patch Available  (was: Open)

> MatrixVectorView has index check error
> --
>
> Key: MAHOUT-1409
> URL: https://issues.apache.org/jira/browse/MAHOUT-1409
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Ted Dunning
>Assignee: Ted Dunning
> Attachments: MAHOUT-1409.patch
>
>
> There is a > in the test for the correct index where there should be a >=





[jira] [Updated] (MAHOUT-1409) MatrixVectorView has index check error

2014-01-23 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated MAHOUT-1409:


Attachment: MAHOUT-1409.patch

Here is a patch that adds a test and the fix.

> MatrixVectorView has index check error
> --
>
> Key: MAHOUT-1409
> URL: https://issues.apache.org/jira/browse/MAHOUT-1409
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Ted Dunning
> Attachments: MAHOUT-1409.patch
>
>
> There is a > in the test for the correct index where there should be a >=





[jira] [Created] (MAHOUT-1409) MatrixVectorView has index check error

2014-01-23 Thread Ted Dunning (JIRA)
Ted Dunning created MAHOUT-1409:
---

 Summary: MatrixVectorView has index check error
 Key: MAHOUT-1409
 URL: https://issues.apache.org/jira/browse/MAHOUT-1409
 Project: Mahout
  Issue Type: Bug
Reporter: Ted Dunning


There is a > in the test for the correct index where there should be a >=
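
The off-by-one described above can be sketched as follows. The report does not show the actual code, so this is a minimal illustration under stated assumptions: `RowView`, `buggyInBounds` and `fixedInBounds` are hypothetical names and do not reproduce Mahout's MatrixVectorView.

```java
// Hypothetical stand-in for a vector view over one row of a matrix.
// This is NOT Mahout's MatrixVectorView; names and layout are illustrative only.
final class RowView {
    private final double[][] data;
    private final int row;

    RowView(double[][] data, int row) {
        this.data = data;
        this.row = row;
    }

    int size() {
        return data[row].length;
    }

    // Buggy check: index == size() slips through because '>' is used.
    boolean buggyInBounds(int index) {
        return !(index < 0 || index > size());
    }

    // Corrected check: valid indices are 0 .. size() - 1, so index >= size() is rejected.
    boolean fixedInBounds(int index) {
        return !(index < 0 || index >= size());
    }

    // Accessor guarded by the corrected comparison.
    double get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " out of range [0, " + size() + ")");
        }
        return data[row][index];
    }
}

final class RowViewDemo {
    public static void main(String[] args) {
        RowView view = new RowView(new double[][] {{1.0, 2.0, 3.0}}, 0);
        System.out.println(view.buggyInBounds(3)); // true: index 3 is accepted for a size-3 view
        System.out.println(view.fixedInBounds(3)); // false
    }
}
```

With `>`, the first invalid index (the one equal to the size) passes the check, so the error only surfaces later in the backing storage, if it surfaces at all.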





[jira] [Resolved] (MAHOUT-1326) Fix broken links to quickstart tutorials

2014-01-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1326.
---

Resolution: Fixed

Marking this as Resolved. Most of the broken links reported here have been 
fixed by Sotiris' recent work on the wiki migration to CMS.

> Fix broken links to quickstart tutorials
> 
>
> Key: MAHOUT-1326
> URL: https://issues.apache.org/jira/browse/MAHOUT-1326
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Ravi Mummulla
> Fix For: 0.9
>
>
> All links are broken in https://cwiki.apache.org/MAHOUT/quickstart.html.





[jira] [Commented] (MAHOUT-1408) Distributed cache file matching bug while running SSVD in broadcast mode

2014-01-23 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880200#comment-13880200
 ] 

Dmitriy Lyubimov commented on MAHOUT-1408:
--

I take it you are trying to use the SSVD solver in some sort of embedded mode, 
not through the pure Mahout CLI? 
Even so, I am not sure why you want to wrest control over MapReduce from the 
SSVD solver in individual MR steps. Additional jars will not get there (nor 
are they needed by SSVD jobs). The Mahout architecture in general, and this 
pipeline in particular, does not assume you get to manipulate individual job 
settings. This pipeline step legitimately expects to find in the cache the 
files that the SSVD pipeline has put into it. 

I would like to place the burden on you to explain why you think the SSVD 
pipeline should expect someone to change its MR settings.

Assuming, however, that your reasons are valid, this (the BtJob MR step) would 
not be the only MR case where the cache is used in the SSVD pipeline, and this 
patch would not be sufficient to handle it throughout. 


> Distributed cache file matching bug while running SSVD in broadcast mode
> 
>
> Key: MAHOUT-1408
> URL: https://issues.apache.org/jira/browse/MAHOUT-1408
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.8
>Reporter: Angad Singh
>Assignee: Dmitriy Lyubimov
>Priority: Minor
> Attachments: BtJob.java.patch
>
>
> The error is:
> java.lang.IllegalArgumentException: Unexpected file name, unable to deduce 
> partition 
> #:file:/data/d1/mapred/local/taskTracker/distcache/434503979705629827_-1822139941_1047712745/nn.red.ua2.inmobi.com/user/rmcuser/oozie-oozi/0034272-140120102756143-oozie-oozi-W/inmobi-ssvd_mahout--java/java-launcher.jar
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:154)
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:1)
>   at java.util.Arrays.mergeSort(Arrays.java:1270)
>   at java.util.Arrays.mergeSort(Arrays.java:1281)
>   at java.util.Arrays.mergeSort(Arrays.java:1281)
>   at java.util.Arrays.sort(Arrays.java:1210)
>   at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.init(SequenceFileDirValueIterator.java:112)
>   at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:94)
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:220)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>   at org.apache.hadoop.mapred.Child.main(Child.java:260)
> The bug is @ 
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java,
>  near line 220.
> and  @ 
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
>  near line 144.
> SSVDHelper's PARTITION_COMPARATOR assumes all files in the distributed cache 
> will have a particular pattern whereas we have jar files in our distributed 
> cache which causes the above exception.





[jira] [Assigned] (MAHOUT-1408) Distributed cache file matching bug while running SSVD in broadcast mode

2014-01-23 Thread Dmitriy Lyubimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Lyubimov reassigned MAHOUT-1408:


Assignee: Dmitriy Lyubimov

> Distributed cache file matching bug while running SSVD in broadcast mode
> 
>
> Key: MAHOUT-1408
> URL: https://issues.apache.org/jira/browse/MAHOUT-1408
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.8
>Reporter: Angad Singh
>Assignee: Dmitriy Lyubimov
>Priority: Minor
> Attachments: BtJob.java.patch
>
>
> The error is:
> java.lang.IllegalArgumentException: Unexpected file name, unable to deduce 
> partition 
> #:file:/data/d1/mapred/local/taskTracker/distcache/434503979705629827_-1822139941_1047712745/nn.red.ua2.inmobi.com/user/rmcuser/oozie-oozi/0034272-140120102756143-oozie-oozi-W/inmobi-ssvd_mahout--java/java-launcher.jar
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:154)
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:1)
>   at java.util.Arrays.mergeSort(Arrays.java:1270)
>   at java.util.Arrays.mergeSort(Arrays.java:1281)
>   at java.util.Arrays.mergeSort(Arrays.java:1281)
>   at java.util.Arrays.sort(Arrays.java:1210)
>   at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.init(SequenceFileDirValueIterator.java:112)
>   at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:94)
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:220)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>   at org.apache.hadoop.mapred.Child.main(Child.java:260)
> The bug is @ 
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java,
>  near line 220.
> and  @ 
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
>  near line 144.
> SSVDHelper's PARTITION_COMPARATOR assumes all files in the distributed cache 
> will have a particular pattern whereas we have jar files in our distributed 
> cache which causes the above exception.





Berlin Buzzwords 2014: CfP is open

2014-01-23 Thread Isabel Drost-Fromm
I'm super happy to announce that the call for submissions for Berlin
Buzzwords 2014 is open. For those who don't know the conference: in
my "absolutely objective opinion", the event is the most exciting
conference on storing, processing and searching large amounts of
digital data for engineers.

The 5th edition of Berlin Buzzwords will take place on May 25-28,
2014 at Kulturbrauerei Berlin.

Berlin Buzzwords is looking for speakers who submit talks on the
following topics:

* Information Retrieval / Search, e.g. Lucene, Solr, Katta, ElasticSearch or
comparable solutions

* NoSQL and SQL, e.g. CouchDB, MongoDB, Jackrabbit, HBase and others

* Large Data Processing, e.g. Hadoop itself, MapReduce, Cascading, Pig,
Spark and friends

Closely related topics not explicitly listed above are welcome as well.

The Call for Submissions will be open until February 9! Be part of
Berlin Buzzwords and submit your session idea. Please register here:
.

Looking forward to lots of interesting proposals - and looking forward to
meeting all of you in Berlin later this year (did I mention that Berlin
rocks in summer?)


Isabel

PS: As always, any help with spreading the word is highly welcome.

PS2: One final hint: even though speakers of course get a complimentary
conference pass, make sure to still check out our ticket page, in
particular if you'd like to bring your children to the conference. We
do provide child day care on a donation basis but need your registration
for capacity planning: http://berlinbuzzwords.de/tickets



RE: MAHOUT 0.9 Release - New URL

2014-01-23 Thread Andrew Palumbo
a), b), c), d) all passed on CentOS for me.

> Date: Thu, 23 Jan 2014 13:43:06 +0200
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: ssvinarc...@hortonworks.com
> To: dev@mahout.apache.org
> 
> I did a), b), c), d) and all steps pass.
> +1
> 
> 
> On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll wrote:
> 
> > +1 from me.
> >
> > On Jan 22, 2014, at 5:55 PM, Suneel Marthi 
> > wrote:
> >
> > > Fixed the issues that were reported this week and restored FP mining
> > into the codebase.
> > >
> > > Here's the URL for the final release in staging:-
> > >
> > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > >
> > > The artifacts have been signed with the following key:
> > > https://people.apache.org/keys/committer/smarthi.asc
> > >
> > >
> > > a) Verify that you can unpack the release (tar or zip)
> > > b) Verify you are able to compile the distro
> > > c) Run through the unit tests: mvn clean test
> > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > through all the different options in each script.
> > >
> > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > > finalized.
> >
> > 
> > Grant Ingersoll | @gsingers
> > http://www.lucidworks.com
> >
> >
> >
> >
> >
> >
> 
> -- 
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to 
> which it is addressed and may contain information that is confidential, 
> privileged and exempt from disclosure under applicable law. If the reader 
> of this message is not the intended recipient, you are hereby notified that 
> any printing, copying, dissemination, distribution, disclosure or 
> forwarding of this communication is strictly prohibited. If you have 
> received this communication in error, please contact the sender immediately 
> and delete it from your system. Thank You.
  

[jira] [Updated] (MAHOUT-1408) Distributed cache file matching bug while running SSVD in broadcast mode

2014-01-23 Thread Angad Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angad Singh updated MAHOUT-1408:


Attachment: BtJob.java.patch

> Distributed cache file matching bug while running SSVD in broadcast mode
> 
>
> Key: MAHOUT-1408
> URL: https://issues.apache.org/jira/browse/MAHOUT-1408
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.8
>Reporter: Angad Singh
>Priority: Minor
> Attachments: BtJob.java.patch
>
>
> The error is:
> java.lang.IllegalArgumentException: Unexpected file name, unable to deduce 
> partition 
> #:file:/data/d1/mapred/local/taskTracker/distcache/434503979705629827_-1822139941_1047712745/nn.red.ua2.inmobi.com/user/rmcuser/oozie-oozi/0034272-140120102756143-oozie-oozi-W/inmobi-ssvd_mahout--java/java-launcher.jar
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:154)
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:1)
>   at java.util.Arrays.mergeSort(Arrays.java:1270)
>   at java.util.Arrays.mergeSort(Arrays.java:1281)
>   at java.util.Arrays.mergeSort(Arrays.java:1281)
>   at java.util.Arrays.sort(Arrays.java:1210)
>   at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.init(SequenceFileDirValueIterator.java:112)
>   at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:94)
>   at 
> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:220)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>   at org.apache.hadoop.mapred.Child.main(Child.java:260)
> The bug is @ 
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java,
>  near line 220.
> and  @ 
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
>  near line 144.
> SSVDHelper's PARTITION_COMPARATOR assumes all files in the distributed cache 
> will have a particular pattern whereas we have jar files in our distributed 
> cache which causes the above exception.





[jira] [Created] (MAHOUT-1408) Distributed cache file matching bug while running SSVD in broadcast mode

2014-01-23 Thread Angad Singh (JIRA)
Angad Singh created MAHOUT-1408:
---

 Summary: Distributed cache file matching bug while running SSVD in 
broadcast mode
 Key: MAHOUT-1408
 URL: https://issues.apache.org/jira/browse/MAHOUT-1408
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 0.8
Reporter: Angad Singh
Priority: Minor


The error is:

java.lang.IllegalArgumentException: Unexpected file name, unable to deduce partition #:file:/data/d1/mapred/local/taskTracker/distcache/434503979705629827_-1822139941_1047712745/nn.red.ua2.inmobi.com/user/rmcuser/oozie-oozi/0034272-140120102756143-oozie-oozi-W/inmobi-ssvd_mahout--java/java-launcher.jar
    at org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:154)
    at org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:1)
    at java.util.Arrays.mergeSort(Arrays.java:1270)
    at java.util.Arrays.mergeSort(Arrays.java:1281)
    at java.util.Arrays.mergeSort(Arrays.java:1281)
    at java.util.Arrays.sort(Arrays.java:1210)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.init(SequenceFileDirValueIterator.java:112)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:94)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:220)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)


The bug is at 
https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java,
near line 220, and at 
https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java,
near line 144.

SSVDHelper's PARTITION_COMPARATOR assumes that every file in the distributed 
cache has a name matching a particular pattern. Our distributed cache also 
contains jar files, which do not match that pattern, and that causes the 
above exception.
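
The failure mode can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in SSVDHelper and not the attached patch: the class `PartitionFiles`, the name pattern and the file names are hypothetical. It shows a comparator that throws on any name it cannot parse, and one possible workaround that filters the file list before sorting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartitionFiles {
    // Assumed naming convention, for illustration only: "<name>-m-<digits>" or
    // "<name>-r-<digits>", optionally followed by an extension.
    // The pattern actually used by SSVDHelper may differ.
    static final Pattern PARTITION_NAME = Pattern.compile(".*-(m|r)-(\\d+)(\\.\\w+)?");

    // Last path component, e.g. "/cache/Bt-m-00002" -> "Bt-m-00002".
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Extracts the partition number, or throws if the name does not match.
    static int partitionOf(String path) {
        Matcher m = PARTITION_NAME.matcher(baseName(path));
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Unexpected file name, unable to deduce partition #:" + path);
        }
        return Integer.parseInt(m.group(2));
    }

    // Strict comparator: any unrelated file in the list makes the sort fail.
    static final Comparator<String> STRICT_COMPARATOR =
        (a, b) -> Integer.compare(partitionOf(a), partitionOf(b));

    // Workaround sketch: drop files that do not look like partitions, then sort.
    static List<String> sortedPartitions(List<String> cacheFiles) {
        List<String> kept = new ArrayList<>();
        for (String f : cacheFiles) {
            if (PARTITION_NAME.matcher(baseName(f)).matches()) {
                kept.add(f);
            }
        }
        kept.sort(STRICT_COMPARATOR);
        return kept;
    }
}
```

Sorting a list that contains something like `java-launcher.jar` with `STRICT_COMPARATOR` throws as in the trace above, while `sortedPartitions` skips the jar and orders the remaining files by partition number. Whether filtering is the right place to handle this is a design decision for the SSVD pipeline as a whole.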





Re: MAHOUT 0.9 Release - New URL

2014-01-23 Thread Sergey Svinarchuk
I did a), b), c), d) and all steps pass.
+1


On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll wrote:

> +1 from me.
>
> On Jan 22, 2014, at 5:55 PM, Suneel Marthi 
> wrote:
>
> > Fixed the issues that were reported this week and restored FP mining
> into the codebase.
> >
> > Here's the URL for the final release in staging:-
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> >
> > The artifacts have been signed with the following key:
> > https://people.apache.org/keys/committer/smarthi.asc
> >
> >
> > a) Verify that you can unpack the release (tar or zip)
> > b) Verify you are able to compile the distro
> > c) Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > through all the different options in each script.
> >
> > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > finalized.
>
> 
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
>
>
>
>
>
>



Re: MAHOUT 0.9 Release - New URL

2014-01-23 Thread Grant Ingersoll
+1 from me.

On Jan 22, 2014, at 5:55 PM, Suneel Marthi  wrote:

> Fixed the issues that were reported this week and restored FP mining into the 
> codebase.
> 
> Here's the URL for the final release in staging:-
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> 
> The artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
> 
> 
> a) Verify that you can unpack the release (tar or zip)
> b) Verify you are able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run 
> through all the different options in each script.
> 
> Committers and PMC, need a minimum of 3 '+1' votes for the release to be 
> finalized.


Grant Ingersoll | @gsingers
http://www.lucidworks.com