[ https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086673#comment-16086673 ]
ASF GitHub Bot commented on DRILL-5660:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/877#discussion_r127364840
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
.relativize(fullPathWithoutSchemeAndAuthority.toUri()));
if (relativeFilePath.isAbsolute()) {
throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
- basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
+ basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
+ }
+ return relativeFilePath.toUri().getPath();
+ }
+ }
+
+ /**
+ * Used to identify metadata version by the deserialization "metadata_version" first property
+ * from the metadata cache file
+ */
+ public static class MetadataVersion {
+ @JsonProperty("metadata_version")
+ public String textVersion;
+
+ /**
+ * Supported metadata versions.
+ * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
+ */
+ enum Versions {
+ v1(Constants.V1),
+ v2(Constants.V2),
+ v3(Constants.V3),
+ v3_1(Constants.V3_1);
+
+ private final String version;
+
+ Versions(String version) {
+ this.version = version;
+ }
+
+ public String getVersion() {
+ return version;
+ }
+
+ public static Versions fromString(String version) {
+ for (Versions v : Versions.values()) {
+ if (v.version.equalsIgnoreCase(version)) {
+ return v;
+ }
+ }
+ return null;
+ }
+
+ public static class Constants {
+ public static final String V1 = "v1";
+ public static final String V2 = "v2";
+ public static final String V3 = "v3";
+ public static final String V3_1 = "v3_1";
+ }
+ }
+
+ /**
--- End diff --
One very handy thing to do for each version constant is to list what
changed, possibly including the JIRA number for more information:
```
/**
* Version 3.1: Changes the xyz property.
* File names stored as relative paths.
* See DRILL-1234.
*/
```
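Applied to all four constants in the diff, the suggestion might look like the sketch below. The class name `DocumentedVersionConstants` is hypothetical, and the descriptions for versions 1 through 3 are left as placeholders because this thread does not say what changed in them. The 3.1 entry assumes that `v3_1` is the version introduced for this ticket to mark the relative-path format from DRILL-3867.
```java
/**
 * Hypothetical sketch: one documented constant per metadata version.
 * Not the actual Drill source; placeholder text is marked as such.
 */
final class DocumentedVersionConstants {

  /** Version 1: (placeholder, describe the original format here). */
  static final String V1 = "v1";

  /** Version 2: (placeholder, list what changed and the related JIRA). */
  static final String V2 = "v2";

  /** Version 3: (placeholder, list what changed and the related JIRA). */
  static final String V3 = "v3";

  /**
   * Version 3.1: File names stored as relative paths. See DRILL-3867.
   * Assumption: the version was bumped under DRILL-5660 so that older
   * Drillbits can recognize a file they cannot read.
   */
  static final String V3_1 = "v3_1";

  private DocumentedVersionConstants() {
    // Constants holder; not meant to be instantiated.
  }
}
```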
> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> ----------------------------------------------------------------------------
>
> Key: DRILL-5660
> URL: https://issues.apache.org/jira/browse/DRILL-5660
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Assignee: Vitalii Diravka
> Fix For: 1.11.0
>
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw which affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store
> relative paths. All Drill servers after that PR create files with relative
> paths. But the version number of the file is unchanged, so older Drillbits
> cannot tell that they do not understand the file.
> Instead, if an older Drillbit tries to read the file, queries fail left and
> right. The older Drillbit does resolve the paths, but relative to the user's
> HDFS home directory, which is wrong.
> What should have happened is that we should have bumped the parquet metadata
> file version number so that older Drillbits can’t read the file. This ticket
> requests that we do that.
> Now, one could argue that the lack of version number change is fine. Once a
> user upgrades Drill, they won't use an old Drillbit. But, things are not that
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in
> which metadata files have been created by a post-DRILL-3867 build. (This has
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll
> back to Drill 1.10. Doing so will cause queries to fail due to
> seemingly-corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted"
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Have the software refuse to read files with a version newer than the one it
> was designed for.
> Of course, it is highly desirable that newer servers read old files, but that
> is not the issue here.
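The "refuse newer versions" rule described above can be sketched roughly as follows. This is only an illustration, not Drill's implementation: the class `MetadataVersionGate` and its methods `parse` and `canRead` are hypothetical names, and the sketch assumes version strings always have the lowercase shape `v<major>` or `v<major>_<minor>`, as the constants in the diff do.
```java
/**
 * Hypothetical sketch of a version gate for metadata cache files.
 * Assumes version text looks like "v3" or "v3_1" (lowercase only).
 */
final class MetadataVersionGate {

  /** Parses "v3" or "v3_1" into {major, minor}; a missing minor part counts as 0. */
  static int[] parse(String text) {
    if (text == null || !text.matches("v\\d+(_\\d+)?")) {
      throw new IllegalArgumentException("Unrecognized metadata version: " + text);
    }
    String[] parts = text.substring(1).split("_");
    int major = Integer.parseInt(parts[0]);
    int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    return new int[] {major, minor};
  }

  /**
   * Returns true when the file's version is not newer than the newest
   * version this build was designed for. Older files stay readable.
   */
  static boolean canRead(String fileVersion, String newestSupported) {
    int[] file = parse(fileVersion);
    int[] max = parse(newestSupported);
    return file[0] < max[0] || (file[0] == max[0] && file[1] <= max[1]);
  }

  private MetadataVersionGate() {
    // Static helpers only.
  }
}
```
Under these assumptions, a build whose newest supported version is `v3` would reject a `v3_1` file (`canRead("v3_1", "v3")` is false), while a build that supports `v3_1` would still accept `v3` files. A reader that gets false could then ignore the cache file or report a clear error instead of misresolving its paths.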