[GitHub] drill pull request #824: DRILL-3867: Store relative paths in metadata file

paul-rogers Sun, 18 Jun 2017 15:55:11 -0700

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/824#discussion_r122602595
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -264,15 +275,18 @@ private ParquetTableMetadata_v3 getParquetTableMetadata(List<FileStatus> fileSta
       /**
        * Get a list of file metadata for a list of parquet files
        *
    -   * @param fileStatuses
    -   * @return
    +   * @param parquetTableMetadata_v3 can store column schema info from all the files and row groups
    +   * @param fileStatuses list of the parquet files statuses
    +   * @param absolutePathInMetadata true if result metadata files should contain absolute paths, false for relative paths.
    +   *                               Relative paths in the metadata are only necessary while creating meta cache files.
    +   * @return list of the parquet file metadata (parquet metadata for every file)
        * @throws IOException
        */
    -  private List<ParquetFileMetadata_v3> getParquetFileMetadata_v3(
    -      ParquetTableMetadata_v3 parquetTableMetadata_v3, List<FileStatus> fileStatuses) throws IOException {
    +  private List<ParquetFileMetadata_v3> getParquetFileMetadata_v3(ParquetTableMetadata_v3 parquetTableMetadata_v3,
    +      List<FileStatus> fileStatuses, boolean absolutePathInMetadata) throws IOException {
    --- End diff --
    
    Is this really needed? Or, is it an attempt to answer my earlier concern 
about compatibility?
    
    Only newer Drill instances will create metadata. If we want relative paths, 
then we should always use relative paths. No need to pass along a flag.
    
    On the other hand, if we are saying that the root call is absolute (as seen 
in the code earlier), but subdirectories are relative, then doesn't the 
presence of even one absolute directory name make the whole feature invalid?
    
    Perhaps some more background explanation in the PR comments (or even a 
design spec) might shed some light on what we are trying to accomplish here. 
Very hard to simply reverse engineer a design from code changes...
    
    Also, below, we have a method to convert relative paths to absolute in 
bulk. Should we do the same here? Always gather data in absolute form, then 
convert it to relative just before serializing?
    
    I wasn't sure why we are converting paths from relative to absolute. If we 
are doing that because we use absolute paths internally, then it is OK to 
gather absolute paths here. Convert them to relative just before writing if that 
is easier.
    
    Here, I'm referring to the note about the "proposed alternative solution".
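    
    To make the "gather absolute, convert just before serializing" idea 
concrete, here is a minimal sketch. It is not the code in Metadata.java: the 
class and method names are hypothetical, and it uses java.nio.file.Path so that 
it is self-contained, whereas the real code works with Hadoop paths.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Drill.
class RelativePathSketch {

  // Called just before writing the metadata cache file: strips the table
  // root from each absolute file path so only relative paths are stored.
  static List<String> toRelative(Path tableRoot, List<Path> absoluteFiles) {
    List<String> relative = new ArrayList<>();
    for (Path file : absoluteFiles) {
      relative.add(tableRoot.relativize(file).toString());
    }
    return relative;
  }

  // Called just after reading the metadata cache file: restores absolute
  // paths by resolving each stored relative path against the table root.
  static List<Path> toAbsolute(Path tableRoot, List<String> relativeFiles) {
    List<Path> absolute = new ArrayList<>();
    for (String file : relativeFiles) {
      absolute.add(tableRoot.resolve(file).normalize());
    }
    return absolute;
  }
}
```

    With this shape, everything between read and write sees only absolute 
paths, so no flag such as absolutePathInMetadata has to be passed down through 
the gathering methods.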



