[ 
https://issues.apache.org/jira/browse/DRILL-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037480#comment-15037480
 ] 

Parth Chandra commented on DRILL-4154:
--------------------------------------

[~rkins] After many hours of trying to reproduce this, the only way I can get 
the metadata cache file to look like 'broken-cache.txt' is to create the 
metadata cache file without first running the migration tool on the parquet 
files. The data files you attached do not have the appropriate version number, 
and in that case the parquet code prevents us from reading the stats for 
binary columns. 
There is also an issue with the migration tool: at least on a local file 
system, the timestamp of the directory does not get updated after the parquet 
files are updated. This should be fixed. (Note: I have yet to try this on a dfs.)
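
As a rough illustration of the kind of fix meant here, the sketch below bumps a 
directory's modification time after its files have been rewritten. It assumes a 
local file system and plain java.nio; the class and method names are 
hypothetical and are not taken from the drill-upgrade tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of the drill-upgrade tool.
final class DirectoryTouchSketch {
    private DirectoryTouchSketch() {}

    // Sets the directory's last-modified time to the current time, so that the
    // directory looks newer than any cache file written before the rewrite.
    static void touch(Path tableDirectory) {
        try {
            Files.setLastModifiedTime(tableDirectory, FileTime.from(Instant.now()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A tool along these lines would call touch(dir) once for each directory after 
all parquet files in it have been updated.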

For the second issue, it is likely that when you copied the cache file, the 
directory timestamp was also updated. I have sometimes seen that, in such a 
case, the timestamp of the directory is a few microseconds newer than the 
timestamp of the copied cache file. When that happens, we treat the cache file 
as stale and recreate it. This behaviour is safe. The situation is also 
unlikely to occur, since copying metadata cache files is not a common operation.
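
The staleness check described above can be sketched roughly as follows. This is 
a simplified illustration, not the actual Drill code: the class and method 
names are hypothetical, and it assumes the check is a plain comparison of 
modification times.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical illustration of the check; names are not from the Drill source.
final class CacheStalenessSketch {
    private CacheStalenessSketch() {}

    // The cache is treated as stale when the directory was modified strictly
    // after the cache file, even if the difference is only a few microseconds.
    static boolean isStale(FileTime directoryModified, FileTime cacheModified) {
        return directoryModified.compareTo(cacheModified) > 0;
    }

    // Convenience overload that reads both timestamps from the file system.
    static boolean isStale(Path tableDirectory, Path cacheFile) throws IOException {
        return isStale(Files.getLastModifiedTime(tableDirectory),
                       Files.getLastModifiedTime(cacheFile));
    }
}
```

With a comparison like this, a directory timestamp that is even slightly newer 
than the copied cache file is enough to trigger a rebuild, which matches the 
behaviour seen here.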

> Metadata Caching : Upgrading cache to v2 from v1 corrupts the cache in some 
> scenarios
> -------------------------------------------------------------------------------------
>
>                 Key: DRILL-4154
>                 URL: https://issues.apache.org/jira/browse/DRILL-4154
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: broken-cache.txt, fewtypes_varcharpartition.tar.tgz, 
> old-cache.txt
>
>
> git.commit.id.abbrev=46c47a2
> I copied the data along with the cache file onto maprfs. I then ran the 
> upgrade tool (https://github.com/parthchandra/drill-upgrade), followed by the 
> metadata_caching suite from the functional tests (concurrency 10) without the 
> datagen phase. I see 3 test failures, and when I looked at the cache file it 
> seems to contain wrong information for the varchar column. 
> Sample from the cache:
> {code}
>       {
>         "name" : [ "varchar_col" ]
>       }, {
>         "name" : [ "float_col" ],
>         "mxValue" : 68797.22,
>         "nulls" : 0
>       }
> {code}
> When I followed the same steps but, instead of running the suites, executed 
> the "REFRESH TABLE METADATA" command or any query on that folder, the cache 
> file seems to be created properly.
> I attached the data and cache files required. Let me know if you need anything



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
