[ 
https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246439#comment-14246439
 ] 

Dong Chen commented on HIVE-860:
--------------------------------

These failures are a little strange. The same test cases pass in my local env.

After analyzing the test results, I found that most of them fail in 
{{LocalDistributedCacheManager.setup()}}, and further down the stack in 
{{FSDownload.unpack()}} or {{FSDownload.changePermissions()}}. I then uploaded 
a temporary debug patch that logs the {{mapreduce.job.cache.archives}} 
property from the conf. Its value is the list of URIs of the cache jars, and 
they look correct in the test logs. However, the tests still complain about an 
incorrect jar URI or a missing class file in the unpacked jar.
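
For anyone who wants to sanity-check a logged value by hand, here is a minimal 
sketch in plain Java. It assumes the property value is a comma-separated list 
of URIs. The class name, the method names, and the suffix list are 
hypothetical and are not taken from Hadoop or from the debug patch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical debugging helper; not part of Hadoop, Hive, or any HIVE-860 patch.
final class CacheArchivesCheck {
    // Assumption: suffixes that an unpack step would treat as archives.
    private static final String[] ARCHIVE_SUFFIXES =
        {".jar", ".zip", ".tar.gz", ".tgz", ".tar"};

    private CacheArchivesCheck() {}

    /** Splits a comma-separated property value into URIs, skipping blank entries. */
    static List<URI> parse(String value) {
        List<URI> uris = new ArrayList<>();
        if (value == null) {
            return uris;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                // URI.create throws IllegalArgumentException on a malformed entry.
                uris.add(URI.create(trimmed));
            }
        }
        return uris;
    }

    /** Returns the URIs whose path does not end in one of the assumed archive suffixes. */
    static List<URI> withoutArchiveSuffix(List<URI> uris) {
        List<URI> flagged = new ArrayList<>();
        for (URI uri : uris) {
            String path = uri.getPath();
            String lower = path == null ? "" : path.toLowerCase(Locale.ROOT);
            boolean known = false;
            for (String suffix : ARCHIVE_SUFFIXES) {
                if (lower.endsWith(suffix)) {
                    known = true;
                    break;
                }
            }
            if (!known) {
                flagged.add(uri);
            }
        }
        return flagged;
    }
}
```

Passing the logged string to {{parse()}} and then to {{withoutArchiveSuffix()}} 
lists any entry that is malformed or does not look like an archive. It does 
not check that the file actually exists on the target file system.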

So I think the patch itself might be OK, since it works in local tests.
The jar files might be getting mixed up between test cases in Jenkins CI. I am 
not sure how to check the environment on Jenkins, so this is only a guess...

Any thoughts?

> Persistent distributed cache
> ----------------------------
>
>                 Key: HIVE-860
>                 URL: https://issues.apache.org/jira/browse/HIVE-860
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.12.0
>            Reporter: Zheng Shao
>            Assignee: Dong Chen
>             Fix For: 0.15.0
>
>         Attachments: HIVE-860-debug.4.patch, HIVE-860.1.patch, 
> HIVE-860.2.patch, HIVE-860.2.patch, HIVE-860.3.patch, HIVE-860.4.patch, 
> HIVE-860.4.patch, HIVE-860.4.patch, HIVE-860.4.patch, HIVE-860.4.patch, 
> HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
> HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
> HIVE-860.patch, HIVE-860.patch, HIVE-860.patch
>
>
> DistributedCache is shared across multiple jobs if the HDFS file name is the 
> same.
> We need to make sure Hive puts the same file into the same location every 
> time and does not overwrite it if the file content is the same.
> We can achieve 2 different results:
> A1. Files added with the same name, timestamp, and md5 in the same session 
> will have a single copy in distributed cache.
> A2. Files added with the same name, timestamp, and md5 will have a single 
> copy in distributed cache.
> A2 gives a bigger sharing benefit but raises the question of when Hive 
> should clean the files up in HDFS.
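
The difference between A1 and A2 can be illustrated as a choice of cache key. 
The sketch below is only an illustration under assumed names: the class, the 
methods, and the key layout are hypothetical and do not come from Hive or 
from any attached patch. It uses only the JDK's {{MessageDigest}}.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of the A1/A2 options; not code from Hive.
final class CacheKeySketch {
    private CacheKeySketch() {}

    /** Hex-encoded MD5 digest of the file content. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** A2: the key depends only on name, timestamp, and md5, so any session can share it. */
    static String sharedKey(String name, long timestamp, byte[] content) {
        return name + "_" + timestamp + "_" + md5Hex(content);
    }

    /** A1: the same key, scoped under a session id, so sharing stops at the session. */
    static String sessionKey(String sessionId, String name, long timestamp, byte[] content) {
        return sessionId + "/" + sharedKey(name, timestamp, content);
    }
}
```

With {{sharedKey}}, two sessions adding an identical file map to one location, 
which is where the cleanup question for A2 comes from. With {{sessionKey}}, 
each session gets its own copy that can be removed when the session ends.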



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
