[
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy reopened PIG-3815:
-------------------------------------
Actually, I see some issues with this patch. Reopening the JIRA.
1) Changing os.close() to IOUtils.closeQuietly(os) is not good. You can close
input streams quietly, but not output streams, especially an HDFS output stream.
If os.close() fails, HDFS can leave behind an empty file with no data, and that
file can still be accessed through the NN without any error. We have been
bitten by this many times. In internal projects, we delete the file and retry
if os.close() fails. So please let the Pig script fail if os.close() fails
rather than causing unexpected behavior.
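The delete-and-retry approach from point 1 can be sketched in plain Java as below. This is an illustration under assumptions, not Pig's or Hadoop's code: StrictClose, Opener, Cleaner and writeWithRetry are hypothetical names, and the two small interfaces stand in for whatever creates and deletes the HDFS file (for example FileSystem.create and FileSystem.delete).

{code:java}
import java.io.IOException;
import java.io.OutputStream;

/** Sketch (hypothetical helper): write output without ever hiding a failed close(). */
final class StrictClose {
    /** Opens the output stream; abstracted so a caller or test can supply any stream. */
    interface Opener { OutputStream open() throws IOException; }

    /** Removes whatever (possibly empty) output a failed attempt left behind. */
    interface Cleaner { void clean() throws IOException; }

    /**
     * Writes data and closes the stream strictly: an IOException from close()
     * propagates instead of being swallowed as IOUtils.closeQuietly(os) would.
     * On failure, the partial output is cleaned up and the write is retried,
     * up to maxAttempts times; the last failure is rethrown to the caller.
     */
    static void writeWithRetry(Opener opener, Cleaner cleaner, byte[] data, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // try-with-resources calls os.close() before the catch block runs,
            // so a close() failure lands in the catch just like a write() failure.
            try (OutputStream os = opener.open()) {
                os.write(data);
                return;
            } catch (IOException e) {
                last = e;
                cleaner.clean(); // delete the bad (possibly empty) file before retrying
            }
        }
        throw last; // all attempts failed: let the caller (e.g. the script) fail
    }
}
{code}

The key design choice is that the method either returns after a clean close() or throws, so the job fails loudly instead of continuing with an empty file.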
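2) addFileToClassPath already calls file.toUri().getPath(), which keeps only the
path component (no scheme, host or port), so the entry it adds to
mapred.job.classpath.files contains no port number. I don't see where the
Hadoop bug is coming from.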
http://svn.apache.org/viewvc/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/filecache/DistributedCache.java?revision=1206848&view=markup
{code}
public static void addFileToClassPath
    (Path file, Configuration conf, FileSystem fs)
    throws IOException {
  // Only the path component is stored, e.g. /tmp/udfs.jar for hdfs://nn:8020/tmp/udfs.jar
  String filepath = file.toUri().getPath();
  String classpath = conf.get("mapred.job.classpath.files");
  conf.set("mapred.job.classpath.files", classpath == null
      ? filepath
      : classpath + System.getProperty("path.separator") + filepath);
  // The fully qualified URI (scheme and authority included) goes to the cache file list
  URI uri = fs.makeQualified(file).toUri();
  addCacheFile(uri, conf);
}
{code}
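To make point 2 concrete, here is a small sketch using java.net.URI (which Hadoop's Path.toUri() returns) in place of a real Hadoop Path. It only shows string behavior: getPath() drops the scheme, host and port, so the stored classpath entry contains no ':', whereas a fully qualified URI with a port splits into extra tokens. ClasspathEntryDemo and its methods are hypothetical, and the ':' separator assumes a Unix path.separator.

{code:java}
import java.net.URI;
import java.util.Arrays;
import java.util.List;

/** Sketch (hypothetical): why storing only the URI path keeps a port's ':' out of the list. */
final class ClasspathEntryDemo {
    /** Mirrors the file.toUri().getPath() step quoted above: scheme, host and port are dropped. */
    static String classpathEntry(URI file) {
        return file.getPath();
    }

    /** Appends an entry the way the quoted code does, assuming path.separator is ":" (Unix). */
    static String append(String classpath, String entry) {
        return classpath == null ? entry : classpath + ":" + entry;
    }

    /** Splits the list back into entries with a plain ':' tokenizer. */
    static List<String> tokenize(String classpath) {
        return Arrays.asList(classpath.split(":"));
    }

    public static void main(String[] args) {
        URI jar = URI.create("hdfs://namenode:8020/tmp/udfs.jar");
        // Path only: one clean token per jar. Prints [/tmp/udfs.jar]
        System.out.println(tokenize(append(null, classpathEntry(jar))));
        // Fully qualified URI with a port: prints [hdfs, //namenode, 8020/tmp/udfs.jar]
        System.out.println(tokenize(append(null, jar.toString())));
    }
}
{code}

The sketch does not reproduce the reported backend failure; it only shows which of the two forms a ':'-based split would break.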
> Hadoop bug causes Pig to fail silently with jar cache
> -----------------------------------------------------
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.13.0
> Reporter: Aniket Mokashi
> Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which puts jars in the
> distributed cache configuration. This API uses ':' to separate the list of
> files to be put on the classpath via the distributed cache. If fs.default.name
> has a port number in it, the tokenization logic in Hadoop fails when
> retrieving the list of cache file names in the backend.
--
This message was sent by Atlassian JIRA
(v6.2#6252)