[
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103595#comment-16103595
]
ASF GitHub Bot commented on DRILL-5660:
---------------------------------------
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/877#discussion_r129834750
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/MetadataVersion.java ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ComparisonChain;
+import com.google.common.collect.Lists;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+
+class MetadataVersion implements Comparable<MetadataVersion> {
+ /**
+ * String version starts from 'v' letter<p>
+ * First group is major metadata version (any number of digits, except a single zero digit)<p>
+ * Next character is optional '.' (if minor version is specified)<p>
+ * Next group is optional, minor metadata version (any number of digits, except a single zero digit)<p>
+ * Examples of correct metadata versions: v1, v10, v4.13
+ */
+ private static final String FORMAT = "v((?!0)\\d+)\\.?((?!0)\\d+)?";
+ private static final Pattern PATTERN = Pattern.compile(FORMAT);
+
+ private final int major;
+ private final int minor;
+
+ public MetadataVersion(int major, int minor) {
+ this.major = major;
+ this.minor = minor;
+ }
+
+ public MetadataVersion(String metadataVersion) {
+ Matcher matcher = PATTERN.matcher(metadataVersion);
+ if (!matcher.matches()) {
+ DrillRuntimeException.format("Could not parse metadata version '%s' using format '%s'", metadataVersion, FORMAT);
+ }
+ this.major = Integer.parseInt(matcher.group(1));
+ this.minor = matcher.group(2) != null ? Integer.parseInt(matcher.group(2)) : 0;
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (this == o) return true;
+ if (!(o instanceof MetadataVersion)) {
+ return false;
+ }
+ MetadataVersion that = (MetadataVersion) o;
+ return this.major == that.major
+ && this.minor == that.minor;
+ }
+
+ @Override
+ public int hashCode() {
+ int result = major;
+ result = 31 * result + minor;
+ return result;
+ }
+
+ @Override
+ public String toString() {
+ return minor == 0 ? String.format("v%s1", major) : String.format("v%s1.%s2", major, minor);
+ }
+
+ @Override
+ public int compareTo(MetadataVersion o) {
+ Preconditions.checkNotNull(o);
+ return ComparisonChain.start()
+ .compare(this.major, o.major)
+ .compare(this.minor, o.minor)
+ .result();
+ }
+
+/**
+ * Supported metadata versions.
+ * <p>
+ * Note: keep them synchronized with {@link Metadata.ParquetTableMetadataBase} versions
+ */
+ public static class Constants {
+ /**
+ * Version 1: Introduces parquet file metadata caching.<br>
+ * See DRILL-2743
+ */
+ public static final String V1 = "v1";
--- End diff --
Makes sense. Also started using an `ImmutableList` here.
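For reference, the version format accepted by the `FORMAT` pattern in the diff above can be checked in isolation. The following is a minimal sketch using only `java.util.regex`; the class name `VersionFormatCheck` and the helper `parse` are hypothetical and not part of the pull request. It reuses the pattern string from the diff unchanged and returns `null` instead of throwing, purely to keep the example small.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: exercises the FORMAT pattern from the diff.
final class VersionFormatCheck {
  // Pattern string copied verbatim from the diff.
  private static final Pattern PATTERN = Pattern.compile("v((?!0)\\d+)\\.?((?!0)\\d+)?");

  /** Returns {major, minor}, or null when the string does not match the pattern. */
  static int[] parse(String version) {
    Matcher matcher = PATTERN.matcher(version);
    if (!matcher.matches()) {
      return null;
    }
    int major = Integer.parseInt(matcher.group(1));
    // A missing minor group is treated as 0, as in the diff's constructor.
    int minor = matcher.group(2) != null ? Integer.parseInt(matcher.group(2)) : 0;
    return new int[] {major, minor};
  }

  public static void main(String[] args) {
    String[] samples = {"v1", "v10", "v4.13", "v1.", "v0", "v01", "v3.0"};
    for (String sample : samples) {
      int[] parsed = parse(sample);
      System.out.println(sample + " -> "
          + (parsed == null ? "rejected" : parsed[0] + "." + parsed[1]));
    }
  }
}
```

Two details of the pattern as written are worth noting: the `(?!0)` lookahead rejects any group that starts with a zero (so `v0`, `v01` and `v3.0` do not parse), and because the dot and the minor group are independently optional, a trailing dot such as `v1.` is accepted.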
> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> ----------------------------------------------------------------------------
>
> Key: DRILL-5660
> URL: https://issues.apache.org/jira/browse/DRILL-5660
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Assignee: Vitalii Diravka
> Labels: doc-impacting
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw which affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store
> relative paths. All Drill servers after that PR create files with relative
> paths. But, the version number of the file is unchanged, so that older
> Drillbits don't know that they can't understand the file.
> Instead, if an older Drill tries to read the file, queries fail left and
> right. Drill will resolve the paths, but does so relative to the user's HDFS
> home directory, which is wrong.
> What should have happened is that we should have bumped the parquet metadata
> file version number so that older Drillbits can’t read the file. This ticket
> requests that we do that.
> Now, one could argue that the lack of version number change is fine. Once a
> user upgrades Drill, they won't use an old Drillbit. But, things are not that
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in
> which metadata files have been created by a post-DRILL-3867 build. (This has
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll
> back to Drill 1.10. Doing so will cause queries to fail due to
> seemingly-corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted"
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Refuse to read files with a version newer than the one the software was
> designed for.
> Of course, it is highly desirable that newer servers read old files, but that
> is not the issue here.
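
The standard practice listed in the issue description (bump the version on a format change, and refuse files newer than the reader supports) can be sketched roughly as follows. This is an illustrative sketch only, not Drill's implementation: the class `MetadataVersionGate`, its methods, and the version numbers used in `main` are all hypothetical.

```java
// Hypothetical sketch of a reader-side version gate; not taken from Drill.
final class MetadataVersionGate {
  /** Orders two (major, minor) pairs: negative, zero, or positive like compareTo. */
  static int compare(int majorA, int minorA, int majorB, int minorB) {
    int byMajor = Integer.compare(majorA, majorB);
    return byMajor != 0 ? byMajor : Integer.compare(minorA, minorB);
  }

  /** A reader accepts files at or below the newest version it was designed for. */
  static boolean canRead(int fileMajor, int fileMinor, int supportedMajor, int supportedMinor) {
    return compare(fileMajor, fileMinor, supportedMajor, supportedMinor) <= 0;
  }

  /** Fails fast with a clear message instead of misinterpreting a newer file. */
  static void checkReadable(int fileMajor, int fileMinor, int supportedMajor, int supportedMinor) {
    if (!canRead(fileMajor, fileMinor, supportedMajor, supportedMinor)) {
      throw new IllegalStateException(String.format(
          "Metadata file version %d.%d is newer than the supported version %d.%d",
          fileMajor, fileMinor, supportedMajor, supportedMinor));
    }
  }

  public static void main(String[] args) {
    // Hypothetical numbers: a reader built for 1.0 meets files written as 1.0 and 2.0.
    System.out.println(canRead(1, 0, 1, 0)); // older reader, same-version file
    System.out.println(canRead(2, 0, 1, 0)); // older reader, newer file
    System.out.println(canRead(1, 0, 2, 0)); // newer reader, older file
  }
}
```

With a gate like this, the rollback and rolling-upgrade scenarios above would produce an explicit "unsupported version" condition on the older server rather than wrongly resolved paths. What the caller does next (for example, ignoring the cache file) is a separate design decision that this sketch does not address.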