[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

ASF GitHub Bot (JIRA) Wed, 21 Jun 2017 12:37:22 -0700

    [ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058074#comment-16058074 ]


ASF GitHub Bot commented on DRILL-3867:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/824#discussion_r123329475
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -264,15 +275,18 @@ private ParquetTableMetadata_v3 getParquetTableMetadata(List<FileStatus> fileSta
       /**
        * Get a list of file metadata for a list of parquet files
        *
    -   * @param fileStatuses
    -   * @return
    +   * @param parquetTableMetadata_v3 can store column schema info from all the files and row groups
    +   * @param fileStatuses list of the parquet files statuses
    +   * @param absolutePathInMetadata true if result metadata files should contain absolute paths, false for relative paths.
    +   *                               Relative paths in the metadata are only necessary while creating meta cache files.
    +   * @return list of the parquet file metadata (parquet metadata for every file)
        * @throws IOException
        */
    -  private List<ParquetFileMetadata_v3> getParquetFileMetadata_v3(
    -      ParquetTableMetadata_v3 parquetTableMetadata_v3, List<FileStatus> fileStatuses) throws IOException {
    +  private List<ParquetFileMetadata_v3> getParquetFileMetadata_v3(ParquetTableMetadata_v3 parquetTableMetadata_v3,
    +      List<FileStatus> fileStatuses, boolean absolutePathInMetadata) throws IOException {
    --- End diff --
    
    The boolean flag has been removed.
    
    Metadata is now always created and gathered with absolute paths only. Before it is written to the cache file, new metadata with relative paths is created from that absolute-path metadata.
    
    Agreed: it makes sense to check every path while converting it. Done.
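The write-side conversion the reviewer describes can be sketched roughly as follows. It takes absolute paths, checks each one, and turns it into a path relative to the table root. Several assumptions apply:

- It uses `java.nio.file.Path` instead of the Hadoop `Path`/`FileStatus` types that appear in the diff, so it runs standalone.
- The class and method names (`RelativePathSketch`, `toRelative`) and the `0_0_0.parquet` file names are hypothetical. This is not Drill's actual code.
- A file outside the table root is rejected instead of being written as a `../` path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not Drill's Metadata class, and java.nio paths stand in for Hadoop paths.
public class RelativePathSketch {

  /**
   * Converts absolute file paths to paths relative to the table root.
   * Every path is checked first: a file that is not under the root cannot be
   * stored relative to it, so it is rejected with an exception.
   */
  static List<String> toRelative(Path tableRoot, List<Path> absoluteFiles) {
    Path root = tableRoot.normalize();
    List<String> relative = new ArrayList<>();
    for (Path file : absoluteFiles) {
      Path normalized = file.normalize();
      // Path.startsWith compares whole name elements, so /drill/tx is not under /drill/t.
      if (!normalized.startsWith(root)) {
        throw new IllegalArgumentException(
            "File " + file + " is not under table root " + tableRoot);
      }
      relative.add(root.relativize(normalized).toString());
    }
    return relative;
  }

  public static void main(String[] args) {
    Path root = Paths.get("/drill/testdata/metadata_caching/lineitem");
    // File names below are hypothetical examples.
    List<Path> files = List.of(
        Paths.get("/drill/testdata/metadata_caching/lineitem/2006/1/0_0_0.parquet"),
        Paths.get("/drill/testdata/metadata_caching/lineitem/2007/2/0_0_0.parquet"));
    System.out.println(toRelative(root, files));
  }
}
```

On a POSIX file system, `main` prints `[2006/1/0_0_0.parquet, 2007/2/0_0_0.parquet]`.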


> Store relative paths in metadata file
> -------------------------------------
>
>                 Key: DRILL-3867
>                 URL: https://issues.apache.org/jira/browse/DRILL-3867
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>             Fix For: Future
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The following steps reproduce the issue:
> 1. Create the cache file
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
> +-------+-------------------------------------------------------------------------------------+
> |  ok   |                                       summary                                       |
> +-------+-------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
> +-------+-------------------------------------------------------------------------------------+
> 1 row selected (1.558 seconds)
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
> {code}
> This is expected, given that we store absolute file paths in the cache file.
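Storing paths relative to the table root is what would let the cache survive the `hadoop fs -mv` in step 2. At read time, each entry is resolved against the directory the table is currently queried from, not the one it lived in when the cache was written.

Below is a minimal sketch of that read side, with these assumptions:

- It uses `java.nio.file.Path` rather than Hadoop's types.
- `ResolvePathSketch`, `resolve`, and the file names are hypothetical and not part of Drill.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: resolves relative cache entries against the table's current location.
public class ResolvePathSketch {

  /**
   * Rebuilds absolute file paths from relative entries read from a metadata
   * cache file, using the table's current root directory.
   */
  static List<Path> resolve(Path currentTableRoot, List<String> relativeEntries) {
    List<Path> absolute = new ArrayList<>();
    for (String entry : relativeEntries) {
      absolute.add(currentTableRoot.resolve(entry).normalize());
    }
    return absolute;
  }

  public static void main(String[] args) {
    // Entries as they might be stored after REFRESH TABLE METADATA on
    // /drill/testdata/metadata_caching/lineitem (file names are hypothetical).
    List<String> cached = List.of("2006/1/0_0_0.parquet", "2007/2/0_0_0.parquet");
    // After the directory is moved to /drill/lineitem, the same entries resolve under the new root.
    for (Path p : resolve(Paths.get("/drill/lineitem"), cached)) {
      System.out.println(p);
    }
  }
}
```

A stricter version would also reject entries that normalize to a location outside `currentTableRoot`, mirroring the check on the write side.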



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
