[ https://issues.apache.org/jira/browse/MAHOUT-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677185#comment-13677185 ]
Grant Ingersoll commented on MAHOUT-992:
----------------------------------------

[~ssc] or [~robin.a...@gmail.com] I see this in several places:
{code}
Path[] files = DistributedCache.getLocalCacheFiles(conf);
if (files == null) {
  throw new IOException("Cannot read Frequency list from Distributed Cache");
}
if (files.length != 1) {
  throw new IOException("Cannot read Frequency list from Distributed Cache (" + files.length + ')');
}
FileSystem fs = FileSystem.getLocal(conf);
Path fListLocalPath = fs.makeQualified(files[0]);
// Fallback if we are running locally.
if (!fs.exists(fListLocalPath)) {
  URI[] filesURIs = DistributedCache.getCacheFiles(conf);
  if (filesURIs == null) {
    throw new IOException("Cannot read Frequency list from Distributed Cache");
  }
  if (filesURIs.length != 1) {
    throw new IOException("Cannot read Frequency list from Distributed Cache (" + files.length + ')');
  }
  fListLocalPath = new Path(filesURIs[0].getPath());
}
{code}
I don't really follow the "Fallback if running locally" comment. The first part of the code is looking in the local file system. Doesn't (or shouldn't?) Hadoop handle this seamlessly?

> Audit DistributedCache use to support EMR
> -----------------------------------------
>
>                 Key: MAHOUT-992
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-992
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.6
>            Reporter: tom pierce
>            Assignee: Grant Ingersoll
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.8
>
>
> Apparently some of our DistributedCache use is not EMR-safe. It would be
> great if someone could audit our uses of DC, and fix up this problem where it
> exists.
> For an example of problematic usage (and the fix), see MAHOUT-980.
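
Since the lookup-with-fallback snippet quoted in the comment shows up in several places, one option for the audit might be to pull it into a single helper. The following is only a rough sketch of that idea, not existing Mahout code: the class name CacheFileResolver and the method resolveSingle are hypothetical, and it uses java.nio.file.Path and java.net.URI as stand-ins for the Hadoop Path, FileSystem and DistributedCache types so that it runs without Hadoop on the classpath. Routing both checks through one method also means each error message reports the length of the array that was actually checked, whereas the quoted snippet prints files.length in the filesURIs branch.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; plain JDK types stand in for the Hadoop ones. */
public final class CacheFileResolver {

  private CacheFileResolver() { }

  /**
   * Returns the single localized cache file if it exists on the local file
   * system; otherwise falls back to the path component of the single
   * registered cache URI (the "running locally" case in the quoted code).
   *
   * @param localFiles stand-in for DistributedCache.getLocalCacheFiles(conf)
   * @param cacheUris  stand-in for DistributedCache.getCacheFiles(conf)
   * @param label      what is being read, e.g. "Frequency list"
   */
  public static Path resolveSingle(Path[] localFiles, URI[] cacheUris, String label)
      throws IOException {
    Path local = requireSingle(localFiles, label);
    if (Files.exists(local)) {
      return local;
    }
    // Fallback: the localized copy is missing, so use the registered URI's path.
    URI uri = requireSingle(cacheUris, label);
    return Paths.get(uri.getPath());
  }

  /** Fails unless the array holds exactly one entry, mirroring the quoted checks. */
  private static <T> T requireSingle(T[] entries, String label) throws IOException {
    if (entries == null) {
      throw new IOException("Cannot read " + label + " from Distributed Cache");
    }
    if (entries.length != 1) {
      throw new IOException(
          "Cannot read " + label + " from Distributed Cache (" + entries.length + ')');
    }
    return entries[0];
  }
}
```

A caller would pass in the two arrays it already fetches today, along the lines of resolveSingle(localFiles, cacheUris, "Frequency list"). Whether the fallback branch is needed at all on a real cluster or on EMR is exactly the open question above; the sketch only centralizes the current behavior so that it can be changed in one place.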