[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-19 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940867#comment-13940867
 ] 

Aniket Mokashi commented on PIG-3815:
-

Thanks [~cheolsoo], I committed it to trunk.

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.13.0
> Reporter: Aniket Mokashi
> Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815-3.patch, 
> PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which adds jars to the 
> distributed cache configuration. That API uses ':' to separate the list of 
> files to be put on the classpath via the distributed cache. If 
> fs.default.name contains a port number, the tokenization logic in Hadoop 
> fails when retrieving the list of cache filenames in the backend.
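
The failure described above can be reproduced in miniature. The sketch below is not Hadoop's code; it only assumes a reader that splits the stored list on ':'. The class name, host, port, and jar paths are all invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class ClasspathSplitDemo {
    // Mimics a reader that treats ':' as the only list separator.
    static List<String> splitOnColon(String classpathFiles) {
        return Arrays.asList(classpathFiles.split(":"));
    }

    public static void main(String[] args) {
        // Fully qualified entries: scheme and port contain ':' themselves.
        String qualified = "hdfs://namenode:8020/tmp/a.jar"
                + ":" + "hdfs://namenode:8020/tmp/b.jar";
        // Path-only entries: the separator is unambiguous.
        String pathsOnly = "/tmp/a.jar" + ":" + "/tmp/b.jar";

        System.out.println(splitOnColon(qualified));
        System.out.println(splitOnColon(pathsOnly));
    }
}
```

With qualified entries the two jars come back as six fragments, none of which is a usable file name; with path-only entries the split returns the two jars intact.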



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-19 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940600#comment-13940600
 ] 

Cheolsoo Park commented on PIG-3815:


Ha, that looks a lot better to me. +1.



[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940196#comment-13940196
 ] 

Aniket Mokashi commented on PIG-3815:
-

I just realized that there is a better way to refactor this code. Can someone 
review the patch attached?



[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939825#comment-13939825
 ] 

Aniket Mokashi commented on PIG-3815:
-

I have committed PIG-3815-2.patch to trunk! Thanks everyone for your comments.



[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939815#comment-13939815
 ] 

Rohini Palaniswamy commented on PIG-3815:
-

[~julienledem],
The path is qualified only so that it can be passed to addCacheFile(), which 
sets mapred.cache.files, where the qualified form is required. 
conf.set("mapred.job.classpath.files") uses just the file path, with the 
scheme and port removed.
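
A minimal sketch of that distinction, using java.net.URI instead of Hadoop's Path and FileSystem so that it runs without Hadoop on the classpath. The class name, host, port, and jar path are hypothetical.

```java
import java.net.URI;

final class ClasspathEntryDemo {
    // Keeps only the path component, dropping scheme, host, and port.
    static String classpathEntry(URI qualified) {
        return qualified.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical qualified location of a jar on the cluster file system.
        URI qualified = URI.create("hdfs://namenode:8020/tmp/pig/a.jar");

        // The qualified form is what a cache-file entry would carry.
        System.out.println("cache file entry: " + qualified);
        // The path-only form has no ':' left to confuse a ':'-separated list.
        System.out.println("classpath entry:  " + classpathEntry(qualified));
    }
}
```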



[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939749#comment-13939749
 ] 

Julien Le Dem commented on PIG-3815:


[~rohini] in the code you quoted, don't you think it is putting the port back 
in the following line?
{noformat}
URI uri = fs.makeQualified(file).toUri();
{noformat}



[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939750#comment-13939750
 ] 

Rohini Palaniswamy commented on PIG-3815:
-

Yes. It was fixed 3 years ago. I am not sure which version of Hadoop you are 
using that hits this issue. But since we still support 0.20 as well, there is 
no harm in doing .toUri().getPath() in Pig as well. 

+1. Since the issue is not with Hadoop 1.0, please update your comment when 
checking in this patch from "// PIG-3815 In hadoop 1.0, addFileToClassPath 
uses : as separator" to say hadoop 0.20. 



[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939728#comment-13939728
 ] 

Aniket Mokashi commented on PIG-3815:
-

Thanks for your comments, [~rohini]. I was not aware of the limitations on HDFS 
streams. I have attached a patch (PIG-3815-2.patch) to fix those problems.

Hadoop Jira: https://issues.apache.org/jira/browse/MAPREDUCE-2361. It looks 
like this was fixed in this revision: 
http://svn.apache.org/viewvc?view=revision&revision=1077790. 





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939618#comment-13939618
 ] 

Aniket Mokashi commented on PIG-3815:
-

Thanks for the review, [~julienledem] and [~cheolsoo]. I have attached a 
revised patch and committed it to trunk!



[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939606#comment-13939606
 ] 

Julien Le Dem commented on PIG-3815:


Same comment as item 1 from Cheolsoo.
Otherwise, this looks good to me.



[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939354#comment-13939354
 ] 

Cheolsoo Park commented on PIG-3815:


# Can you delete this? It's unused.
{code}
+import org.codehaus.plexus.util.IOUtil;
{code}
# Do you mind fixing JobControlCompiler.java#L1700 too? Looks like we can use 
IOUtils.closeQuietly() here too.
{code}
OutputStream os = fs.create(dst);
try {
    IOUtils.copyBytes(url.openStream(), os, 4096, true);
} finally {
    // IOUtils can not close both the input and the output properly in a finally
    // as we can get an exception in between opening the stream and calling the method
    os.close();
}
{code}
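
For illustration only, one way to close both streams on every path looks roughly like the sketch below. It uses plain java.io rather than Hadoop or commons-io, and the closeQuietly helper here is a hypothetical stand-in written for this example, not either library's method. It is not the code from any of the attached patches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class CopyAndCloseDemo {
    // Hypothetical helper: closes a stream and ignores any failure.
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Intentionally swallowed; only used on the cleanup path.
        }
    }

    static void copy(InputStream in, OutputStream out) throws IOException {
        try {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // Explicit close on the success path so a failed flush is reported.
            out.close();
        } finally {
            // Cleanup path: both streams are closed even if the copy threw.
            closeQuietly(in);
            closeQuietly(out);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(new byte[] {1, 2, 3}), out);
        System.out.println(out.size() + " bytes copied");
    }
}
```

Closing the output explicitly before the finally block matters because a quiet close would otherwise hide a write that failed at close time.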
