[ 
https://issues.apache.org/jira/browse/MAHOUT-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734981#comment-14734981
 ] 

Kai Hui edited comment on MAHOUT-1408 at 9/8/15 3:22 PM:
---------------------------------------------------------

I am also experiencing the same problem at the moment. What I did was simply 
call ToolRunner.run(new SSVDCli(), args) from within my own class; I had to do 
it that way because I can't install Mahout on the server (the Maven version 
there is wrong and I have no sudo access).

And I received exactly the same error. My guess is that the SSVD solver places 
temporary files in the distributed cache for the follow-up stages of its 
pipeline, but the cache also contains other files, such as jars, and those 
fail the pattern check, i.e. they don't follow the {filename}-{p}-{number} 
naming required in SSVDHelper$1.compare(SSVDHelper.java:152), so the whole job 
fails. I would expect a file name filter in front of the pattern matcher to do 
the job; see the sketch below.
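
Purely as an illustration, a filter of that kind might look like the 
following. The PartFileFilter class, its method name, and the regex are my own 
guesses at the {filename}-{p}-{number} layout, not the actual pattern compiled 
inside SSVDHelper:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.Path;

public class PartFileFilter {

    // Rough approximation of the {filename}-{p}-{number} part file naming;
    // not the exact regex used by SSVDHelper's partition comparator.
    private static final Pattern PART_NAME = Pattern.compile(".+-[mrp]-\\d+.*");

    // Keeps only part-like files, so jars and other cache entries are
    // dropped before the names ever reach the partition comparator.
    public static List<Path> partFilesOnly(Path[] cacheFiles) {
        List<Path> parts = new ArrayList<Path>();
        for (Path p : cacheFiles) {
            if (PART_NAME.matcher(p.getName()).matches()) {
                parts.add(p);
            }
        }
        return parts;
    }
}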

For future reference, here is the solution that worked for me: add the jar 
files to the classpath inside your Java code with 
DistributedCache.addArchiveToClassPath instead of passing -libjars on the 
hadoop jar command line:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Adds every jar under libdir (an HDFS directory such as
// /user/<username>/libjars) to the job's classpath via the distributed cache.
public static void addLibJars(Configuration conf, String libdir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    FileStatus[] statusList = fs.listStatus(new Path(libdir));
    if (statusList != null) {
        for (FileStatus status : statusList) {
            String fname = status.getPath().getName();
            if (fname.endsWith(".jar")) {
                DistributedCache.addArchiveToClassPath(status.getPath(), conf, fs);
            }
        }
    }
}
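
As far as I can tell this works because addArchiveToClassPath registers the 
jars in the cache's archive list rather than its file list, so the SSVD part 
file scan in BtJob no longer sees them, whereas jars passed via -libjars land 
in the file list and trip the comparator. After calling addLibJars(conf, 
libdir) you can launch the job as above with ToolRunner.run(new SSVDCli(), 
args).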



> Distributed cache file matching bug while running SSVD in broadcast mode
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-1408
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1408
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.8
>            Reporter: Angad Singh
>            Assignee: Dmitriy Lyubimov
>            Priority: Minor
>             Fix For: 0.10.0
>
>         Attachments: BtJob.java.patch
>
>
> The error is:
> java.lang.IllegalArgumentException: Unexpected file name, unable to deduce 
> partition 
> #:file:/data/d1/mapred/local/taskTracker/distcache/434503979705629827_-1822139941_1047712745/nn.red.ua2.inmobi.com/user/rmcuser/oozie-oozi/0034272-140120102756143-oozie-oozi-W/inmobi-ssvd_mahout--java/java-launcher.jar
>       at 
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:154)
>       at 
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:1)
>       at java.util.Arrays.mergeSort(Arrays.java:1270)
>       at java.util.Arrays.mergeSort(Arrays.java:1281)
>       at java.util.Arrays.mergeSort(Arrays.java:1281)
>       at java.util.Arrays.sort(Arrays.java:1210)
>       at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.init(SequenceFileDirValueIterator.java:112)
>       at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:94)
>       at 
> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:220)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>       at org.apache.hadoop.mapred.Child.main(Child.java:260)
> The bug is @ 
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java,
>  near line 220.
> and  @ 
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
>  near line 144.
> SSVDHelper's PARTITION_COMPARATOR assumes all files in the distributed cache 
> will have a particular pattern whereas we have jar files in our distributed 
> cache which causes the above exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
