[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095004#comment-16095004
 ] 

ASF GitHub Bot commented on DRILL-5660:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/877#discussion_r128502337
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -1851,9 +1860,81 @@ private static String relativize(String baseDir, String childPath) {
               .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
           if (relativeFilePath.isAbsolute()) {
             throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
    -            basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
    +            basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
    +      }
    +      return relativeFilePath.toUri().getPath();
    +    }
    +  }
    +
    +  /**
    +   * Used to identify metadata version by the deserialization "metadata_version" first property
    +   * from the metadata cache file
    +   */
    +  public static class MetadataVersion {
    +    @JsonProperty("metadata_version")
    +    public String textVersion;
    +
    +    /**
    +     * Supported metadata versions.
    +     * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
    +     */
    +    enum Versions {
    +      v1(Constants.V1),
    +      v2(Constants.V2),
    +      v3(Constants.V3),
    +      v3_1(Constants.V3_1);
    +
    +      private final String version;
    +
    +      Versions(String version) {
    +        this.version = version;
    +      }
    +
    +      public String getVersion() {
    +        return version;
    +      }
    +
    +      public static Versions fromString(String version) {
    +        for (Versions v : Versions.values()) {
    +          if (v.version.equalsIgnoreCase(version)) {
    +            return v;
    +          }
    +        }
    +        return null;
    +      }
    +
    +      public static class Constants {
    +        public static final String V1 = "v1";
    +        public static final String V2 = "v2";
    +        public static final String V3 = "v3";
    +        public static final String V3_1 = "v3_1";
    +      }
    +    }
    +
    +    /**
    +     * @param fs current file system
    +     * @param path of metadata cache file
    +     * @return true if metadata version is supported, false otherwise
    +     * @throws IOException if parquet metadata can't be deserialized from the json file
    +     */
    +    public static boolean isVersionSupported(FileSystem fs, Path path) throws IOException {
    --- End diff --
    
    The new deserialization class for reading the `metadata` version has been
    removed.
    For now we try to deserialize the `metadata` file. If any subclass of
    `JsonProcessingException` is thrown (for example `JsonMappingException` or
    `JsonParseException`), the `metadata` will be null and will be ignored (with
    appropriate logging). To avoid reading such a corrupted or unsupported file
    again, that status is stored in the `metadata context`.
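
The behaviour described in the comment above can be illustrated with a minimal sketch. It is not the code from the pull request: `MetadataContextSketch`, `Parser`, and `parseAttempts` are hypothetical names, the parser is a stand-in for the Jackson-based reader, and `IOException` is caught here only because `JsonProcessingException` is a subclass of it in Jackson 2.x.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the "metadata context" in the comment; not Drill's class. */
final class MetadataContextSketch {
  // Paths of metadata cache files already found to be corrupt or unsupported.
  private final Set<String> unreadable = new HashSet<>();
  // Counts real parse attempts, so the caching effect is observable.
  int parseAttempts = 0;

  /** Stand-in for the JSON reader; the real one would use Jackson. */
  interface Parser {
    String parse(String path) throws IOException;
  }

  /** Returns the parsed metadata, or null if the file is (or was) unreadable. */
  String read(String path, Parser parser) {
    if (unreadable.contains(path)) {
      return null; // known bad: do not read the file again
    }
    parseAttempts++;
    try {
      return parser.parse(path);
    } catch (IOException e) {
      // Log, remember the failure, and let the caller fall back to no metadata.
      System.err.println("Ignoring metadata cache file " + path + ": " + e.getMessage());
      unreadable.add(path);
      return null;
    }
  }

  public static void main(String[] args) {
    MetadataContextSketch ctx = new MetadataContextSketch();
    Parser failing = p -> { throw new IOException("unsupported version"); };
    ctx.read("/data/t1/cache.json", failing);
    ctx.read("/data/t1/cache.json", failing); // skipped: status already stored
    System.out.println("parse attempts: " + ctx.parseAttempts);
  }
}
```

A null result here means "plan without the cache"; whether the real implementation keys the stored status by path or by some other handle is not stated in the comment.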


> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-5660
>                 URL: https://issues.apache.org/jira/browse/DRILL-5660
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Vitalii Diravka
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw which affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store 
> relative paths. All Drill servers after that PR create files with relative 
> paths. But, the version number of the file is unchanged, so that older 
> Drillbits don't know that they can't understand the file.
> Instead, if an older Drill tries to read the file, queries fail left and 
> right. Drill will resolve the paths, but does so relative to the user's HDFS 
> home directory, which is wrong.
> What should have happened is that we should have bumped the parquet metadata 
> file version number so that older Drillbits can’t read the file. This ticket 
> requests that we do that.
> Now, one could argue that the lack of version number change is fine. Once a 
> user upgrades Drill, they won't use an old Drillbit. But, things are not that 
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in 
> which metadata files have been created by a post-DRILL-3867 build. (This has 
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll 
> back to Drill 1.10. Doing so will cause queries to fail due to 
> seemingly-corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on 
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted" 
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Have the software refuse to read files with a version newer than the 
> software was designed for.
> Of course, it is highly desirable that newer servers read old files, but that 
> is not the issue here.
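
The "standard practice" in the description can be sketched as a version gate. This is an illustration only, not Drill's implementation: `MetadataVersionGate`, `parse`, and `canRead` are hypothetical names, and the `v<major>[_<minor>]` format is assumed from the version strings `v1`, `v2`, `v3`, and `v3_1` that appear in the diff.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Drill: compares versions such as "v3" and "v3_1". */
final class MetadataVersionGate {
  // Assumed format: "v<major>" with an optional "_<minor>" suffix.
  private static final Pattern FORMAT =
      Pattern.compile("v(\\d+)(?:_(\\d+))?", Pattern.CASE_INSENSITIVE);

  /** Parses a version string into {major, minor}; a missing minor counts as 0. */
  static int[] parse(String text) {
    Matcher m = FORMAT.matcher(text == null ? "" : text.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized metadata version: " + text);
    }
    int major = Integer.parseInt(m.group(1));
    int minor = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
    return new int[] {major, minor};
  }

  /** Returns true only if the file's version is not newer than the reader's. */
  static boolean canRead(String fileVersion, String readerVersion) {
    int[] file = parse(fileVersion);
    int[] reader = parse(readerVersion);
    if (file[0] != reader[0]) {
      return file[0] < reader[0];
    }
    return file[1] <= reader[1];
  }

  public static void main(String[] args) {
    // A newer reader accepts an older file; an older reader refuses a newer one.
    System.out.println(canRead("v3", "v3_1"));
    System.out.println(canRead("v3_1", "v3"));
  }
}
```

With a gate like this, an older Drillbit that meets a file written in a newer format can skip the cache instead of misreading the relative paths in it.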



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
