[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053334#comment-16053334
 ] 

ASF GitHub Bot commented on DRILL-3867:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/824#discussion_r122602455
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -179,10 +182,18 @@ private Metadata(FileSystem fs, ParquetFormatConfig formatConfig) {
     
         for (final FileStatus file : fs.listStatus(p, new DrillPathFilter())) {
           if (file.isDirectory()) {
    +        String subdirectoryName = file.getPath().getName();
             ParquetTableMetadata_v3 subTableMetadata = (createMetaFilesRecursively(file.getPath().toString())).getLeft();
    -        metaDataList.addAll(subTableMetadata.files);
    -        directoryList.addAll(subTableMetadata.directories);
    -        directoryList.add(file.getPath().toString());
    +        for (ParquetFileMetadata_v3 pfm_v3 : subTableMetadata.files) {
    +          // Construction of the relative file path by adding subdirectory name and inner relative file path
    +          String relativePath = Joiner.on("/").join(subdirectoryName, pfm_v3.getPath());
    --- End diff --
    
    `Path.mergePaths()`?
    
    We really don't want to work with paths as strings: such code is hard to 
test and maintain.
    
    If we need new Path operations (such as merging relative paths), I suggest 
we create a `PathUtils` class to hold the operations. Then, create unit tests 
to check all the various conditions: empty head, empty tail, neither empty, etc.
    
    Also, in general, we should work with path names as `Path` objects: the job 
of the `Path` class is to properly implement file path operations, just as the 
job of the older `File` and newer `Path` classes in Java is to handle OS paths.
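
A minimal sketch of what such a helper and its checks might look like. The class name `PathUtils` comes from the suggestion above, but the method `joinRelative` is hypothetical. To stay dependency-free the sketch uses `java.nio.file.Path` rather than Hadoop's `org.apache.hadoop.fs.Path`; a Drill version would presumably build on the Hadoop class instead (for example, its `Path(Path parent, Path child)` constructor).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration only; not part of Drill or Hadoop. */
final class PathUtils {

  private PathUtils() {
    // Static helpers only.
  }

  /**
   * Joins a relative tail onto a head path.
   * An empty head yields the tail; an empty tail yields the head.
   * An absolute tail is rejected, since this helper is meant for relative paths.
   */
  static Path joinRelative(Path head, Path tail) {
    if (tail.isAbsolute()) {
      throw new IllegalArgumentException("tail must be relative: " + tail);
    }
    return head.resolve(tail);
  }

  public static void main(String[] args) {
    // Example: subdirectory name plus an inner relative file path.
    System.out.println(joinRelative(Paths.get("2006"), Paths.get("1.parquet")));
  }
}
```

Unit tests would then cover the three cases named above (empty head, empty tail, neither empty) by comparing the result against a `Path` built independently, rather than against a hand-concatenated string.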


> Store relative paths in metadata file
> -------------------------------------
>
>                 Key: DRILL-3867
>                 URL: https://issues.apache.org/jira/browse/DRILL-3867
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>             Fix For: Future
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1. Create the cache file
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
> +-------+-------------------------------------------------------------------------------------+
> |  ok   |                                       summary                                       |
> +-------+-------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
> +-------+-------------------------------------------------------------------------------------+
> 1 row selected (1.558 seconds)
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
> {code}
> This is expected, because we store absolute file paths in the cache file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
