[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053336#comment-16053336 ]
ASF GitHub Bot commented on DRILL-3867:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/824#discussion_r122602595

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -264,15 +275,18 @@ private ParquetTableMetadata_v3 getParquetTableMetadata(List<FileStatus> fileSta
       /**
        * Get a list of file metadata for a list of parquet files
        *
    -   * @param fileStatuses
    -   * @return
    +   * @param parquetTableMetadata_v3 can store column schema info from all the files and row groups
    +   * @param fileStatuses list of the parquet files statuses
    +   * @param absolutePathInMetadata true if result metadata files should contain absolute paths, false for relative paths.
    +   *                               Relative paths in the metadata are only necessary while creating meta cache files.
    +   * @return list of the parquet file metadata (parquet metadata for every file)
        * @throws IOException
        */
    -  private List<ParquetFileMetadata_v3> getParquetFileMetadata_v3(
    -      ParquetTableMetadata_v3 parquetTableMetadata_v3, List<FileStatus> fileStatuses) throws IOException {
    +  private List<ParquetFileMetadata_v3> getParquetFileMetadata_v3(ParquetTableMetadata_v3 parquetTableMetadata_v3,
    +      List<FileStatus> fileStatuses, boolean absolutePathInMetadata) throws IOException {
    --- End diff --

    Is this really needed? Or, is it an attempt to answer my earlier concern about compatibility? Only newer Drill instances will create metadata. If we want relative paths, then we should always use relative paths. No need to pass along a flag.

    On the other hand, if we are saying that the root call is absolute (as seen in the code earlier), but subdirectories are relative, then doesn't the presence of even one absolute directory name make the whole feature invalid?

    Perhaps some more background explanation in the PR comments (or even a design spec) might shed some light on what we are trying to accomplish here.
    Very hard to simply reverse engineer a design from code changes...

    Also, below, we have a method to convert relative paths to absolute in bulk. Should we do the same here? Always gather data in absolute form, then convert it to relative just before serializing?

    I wasn't sure why we are converting paths from relative to absolute. If we are doing that because we use absolute paths internally, then it is OK to gather absolute paths here. Convert them to relative just before writing if that is easier. Here, I'm referring to the note about the "proposed alternative solution".


> Store relative paths in metadata file
> -------------------------------------
>
>                 Key: DRILL-3867
>                 URL: https://issues.apache.org/jira/browse/DRILL-3867
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>             Fix For: Future
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1. Create the cache file
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
> +-------+-------------------------------------------------------------------------------------+
> | ok    | summary                                                                             |
> +-------+-------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
> +-------+-------------------------------------------------------------------------------------+
> 1 row selected (1.558 seconds)
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
> {code}
> This is obvious given the fact that we are storing absolute file paths in the cache file



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
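
For illustration only, the reviewer's suggestion in the comment above (gather paths in absolute form, then convert them to relative just before serializing) might look roughly like the sketch below. It is not code from the pull request. It uses java.nio.file.Path as a stand-in for Hadoop's Path so that it stays self-contained, and the class and method names (RelativePathSketch, toRelative, toAbsolute) are hypothetical and do not appear in Drill's Metadata.java.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Drill.
final class RelativePathSketch {

  // Just before writing the cache file: rewrite each absolute file path
  // as a path relative to the table root directory.
  static List<String> toRelative(String tableRoot, List<String> absolutePaths) {
    Path root = Paths.get(tableRoot);
    List<String> result = new ArrayList<>();
    for (String p : absolutePaths) {
      Path path = Paths.get(p);
      if (!path.startsWith(root)) {
        throw new IllegalArgumentException(p + " is not under " + tableRoot);
      }
      result.add(root.relativize(path).toString());
    }
    return result;
  }

  // Just after reading the cache file: resolve each relative path against
  // wherever the table directory lives now.
  static List<String> toAbsolute(String tableRoot, List<String> relativePaths) {
    Path root = Paths.get(tableRoot);
    List<String> result = new ArrayList<>();
    for (String rel : relativePaths) {
      result.add(root.resolve(rel).toString());
    }
    return result;
  }

  public static void main(String[] args) {
    // Paths taken from the issue description: the cache is written under the
    // old location, then the directory is moved to /drill/lineitem.
    List<String> stored = toRelative(
        "/drill/testdata/metadata_caching/lineitem",
        Arrays.asList("/drill/testdata/metadata_caching/lineitem/2006/1"));
    System.out.println(stored);
    System.out.println(toAbsolute("/drill/lineitem", stored));
  }
}
```

With the paths from the issue description, the stored entry no longer mentions the old parent directory, so resolving it against the moved table root yields a path under /drill/lineitem instead of the stale /drill/testdata/metadata_caching/lineitem location.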