[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052475#comment-16052475 ]
ASF GitHub Bot commented on DRILL-3867:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/824#discussion_r122511871

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -526,6 +534,48 @@ private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, Path p
       }
     
       /**
    +   * Serializer for ParquetPath. Writes the path relative to the root path.
    +   */
    +  private static class ParquetPathSerializer extends StdSerializer<ParquetPath> {
    +
    +    private final String rootPath;
    +
    +    ParquetPathSerializer(String rootPath) {
    +      super(ParquetPath.class);
    +      this.rootPath = rootPath;
    +    }
    +
    +    @Override
    +    public void serialize(ParquetPath parquetPath, JsonGenerator jsonGenerator,
    +        SerializerProvider serializerProvider) throws IOException, JsonGenerationException {
    +      Preconditions.checkState(parquetPath.getFullPath().startsWith(rootPath),
    +          String.format("Path %s is not a subpath of %s", parquetPath.getFullPath(), rootPath));
    +      String relativePath = parquetPath.getFullPath().replaceFirst(rootPath, "");
    --- End diff --
    
    Hadoop `Path` doesn't provide a similar way, but it is possible to use the `relativize()` method from `URI`. Anyway, in the new approach in `Metadata.createMetaFilesRecursively()` I've implemented recursive collecting of the inner subdirectories' names to construct a relative path for every file and directory.

> Store relative paths in metadata file
> -------------------------------------
>
>                 Key: DRILL-3867
>                 URL: https://issues.apache.org/jira/browse/DRILL-3867
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>             Fix For: Future
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23:19:52 UTC
> The below sequence of steps reproduces the issue:
> 1. Create the cache file
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
> +-------+-------------------------------------------------------------------------------------+
> |  ok   |                                       summary                                       |
> +-------+-------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
> +-------+-------------------------------------------------------------------------------------+
> 1 row selected (1.558 seconds)
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
> {code}
> This is obvious given the fact that we are storing absolute file paths in the cache file.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
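For reference, the `URI.relativize()` alternative mentioned in the review comment could look roughly like this. This is a minimal sketch, not code from the Drill patch: the `relativize` helper, class name, and example paths are illustrative. Unlike `String.replaceFirst()`, this approach does not accidentally interpret the root path as a regular expression.

```java
import java.net.URI;

public class RelativePathSketch {

    // Computes the path of fullPath relative to rootPath via java.net.URI.
    // Note: rootPath must end with '/' so that URI.relativize() strips the
    // whole directory prefix; if fullPath is not under rootPath, the
    // original path is returned unchanged.
    static String relativize(String rootPath, String fullPath) {
        URI relative = URI.create(rootPath).relativize(URI.create(fullPath));
        return relative.getPath();
    }

    public static void main(String[] args) {
        String root = "/drill/testdata/metadata_caching/lineitem/";
        String file = "/drill/testdata/metadata_caching/lineitem/2006/1";
        System.out.println(relativize(root, file)); // prints "2006/1"
    }
}
```

This would only cover the serialization side; the patch as described instead collects subdirectory names recursively in `Metadata.createMetaFilesRecursively()` to build relative paths.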