[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

ASF GitHub Bot (JIRA) Fri, 16 Jun 2017 15:55:59 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052473#comment-16052473
 ]


ASF GitHub Bot commented on DRILL-3867:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/824#discussion_r122515967
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
    @@ -680,7 +731,7 @@ private boolean tableModified(List<String> directories, 
Path metaFilePath,
       }
     
       public static abstract class ParquetFileMetadata {
    -    @JsonIgnore public abstract String getPath();
    +    @JsonIgnore public abstract ParquetPath getParquetPath();
    --- End diff --
    
    The structure of metadata cache file isn't changed and deserializing works 
properly for new relative paths and for old absolute ones (`new Path(parent, 
child)` in `deserialize()` method). 
    
    In the new approach after deserializing list of paths are checked and 
updated from relative paths to absolute ones.
    Leaving relative paths in metadata may cause to repeated converting of the 
paths and checking in a lot of places the kind of path.
    If old meta cache file is deserialized with absolute paths, nothing is made 
with them and an old mechanism works.


> Store relative paths in metadata file
> -------------------------------------
>
>                 Key: DRILL-3867
>                 URL: https://issues.apache.org/jira/browse/DRILL-3867
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>             Fix For: Future
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1. Create the cache file
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata 
> dfs.`/drill/testdata/metadata_caching/lineitem`;
> +-------+-------------------------------------------------------------------------------------+
> |  ok   |                                       summary                       
>                 |
> +-------+-------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table 
> /drill/testdata/metadata_caching/lineitem.  |
> +-------+-------------------------------------------------------------------------------------+
> 1 row selected (1.558 seconds)
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 
> 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file 
> maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] 
> (state=,code=0)
> {code}
> This is obvious given the fact that we are storing absolute file paths in the 
> cache file



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

Reply via email to