[ https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095896#comment-16095896 ]

ASF GitHub Bot commented on DRILL-5660:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/877#discussion_r128698529
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -1851,9 +1922,73 @@ private static String relativize(String baseDir, String childPath) {
               .relativize(fullPathWithoutSchemeAndAuthority.toUri()));
           if (relativeFilePath.isAbsolute()) {
             throw new IllegalStateException(String.format("Path %s is not a subpath of %s.",
    -            basePathWithoutSchemeAndAuthority.toUri().toString(), fullPathWithoutSchemeAndAuthority.toUri().toString()));
    +            basePathWithoutSchemeAndAuthority.toUri().getPath(), fullPathWithoutSchemeAndAuthority.toUri().getPath()));
    +      }
    +      return relativeFilePath.toUri().getPath();
    +    }
    +  }
    +
    +  /**
    +   * Supported metadata versions.
    +   * <p>
    +   * Note: keep them synchronized with {@link ParquetTableMetadataBase} versions
    +   */
    +  public static class MetadataVersion {
    +
    +    /**
    +     * Version 1: Introduces parquet file metadata caching.<br>
    +     * See DRILL-2743
    +     */
    +    public static final String V1 = "v1";
    +    /**
    +     * Version 2: Metadata cache file size is reduced.<br>
    +     * See DRILL-4053
    +     */
    +    public static final String V2 = "v2";
    +    /**
    +     * Version 3: Difference between v3 and v2 : min/max, type_length, precision, scale, repetitionLevel, definitionLevel.<br>
    +     * Filter pushdown for Parquet is implemented. <br>
    +     * See DRILL-1950
    +     */
    +    public static final String V3 = "v3";
    +    /**
    +     * Version 3.1: Absolute paths of files and directories are replaced with relative ones.<br>
    +     * See DRILL-3867
    +     */
    +    public static final String V3_1 = "v3.1";
    +
    +
    +    /**
    +     * All historical versions of the Drill metadata cache files
    +     */
    +    public static final List<String> SUPPORTED_VERSIONS = Lists.newArrayList(V1, V2, V3, V3_1);
    +
    +    /**
    +     * @param metadataVersion parquet metadata version
    +     * @return true if metadata version is supported, false otherwise
    +     */
    +    public static boolean isVersionSupported(String metadataVersion) {
    +      return SUPPORTED_VERSIONS.contains(metadataVersion);
    +    }
    +
    +    /**
    +     * Helper compare method similar to {@link java.util.Comparator#compare}
    +     *
    +     * @param metadataVersion1 the first metadata version to be compared
    +     * @param metadataVersion2 the second metadata version to be compared
    +     * @return a negative integer, zero, or a positive integer as the
    +     *         first argument is less than, equal to, or greater than the
    +     *         second.
    +     */
    +    public static int compare(String metadataVersion1, String metadataVersion2) {
    +      if (isVersionSupported(metadataVersion1) && isVersionSupported(metadataVersion2)) {
    +        return Integer.compare(SUPPORTED_VERSIONS.indexOf(metadataVersion1), SUPPORTED_VERSIONS.indexOf(metadataVersion2));
    +      } else {
    +        // this is never reached
    +        throw UserException.validationError()
    --- End diff --
    
    Please replace this with `DrillRuntimeException`: it's not the user's fault if we try to compare against an unsupported version.
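For illustration, a minimal sketch of the suggested change, assuming the same ordered-list approach as the diff: versions are ordered by their position in SUPPORTED_VERSIONS, and comparing an unsupported version raises an internal (non-user) runtime exception. DrillRuntimeException is declared here as a hypothetical local stand-in so the snippet compiles without Drill on the classpath, and the exception message is illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: mirrors the MetadataVersion helper from the diff under review.
final class MetadataVersion {

  /** Hypothetical local stand-in for Drill's DrillRuntimeException. */
  static class DrillRuntimeException extends RuntimeException {
    DrillRuntimeException(String message) {
      super(message);
    }
  }

  static final String V1 = "v1";
  static final String V2 = "v2";
  static final String V3 = "v3";
  static final String V3_1 = "v3.1";

  /** Versions in chronological order; list position defines the ordering. */
  static final List<String> SUPPORTED_VERSIONS =
      Collections.unmodifiableList(Arrays.asList(V1, V2, V3, V3_1));

  private MetadataVersion() {}

  static boolean isVersionSupported(String metadataVersion) {
    return SUPPORTED_VERSIONS.contains(metadataVersion);
  }

  /** Same contract as Comparator#compare, for supported versions only. */
  static int compare(String metadataVersion1, String metadataVersion2) {
    int index1 = SUPPORTED_VERSIONS.indexOf(metadataVersion1);
    int index2 = SUPPORTED_VERSIONS.indexOf(metadataVersion2);
    if (index1 < 0 || index2 < 0) {
      // An internal programming error, not a user error, so no UserException here.
      throw new DrillRuntimeException(String.format(
          "Unsupported metadata version: '%s' cannot be compared with '%s'",
          metadataVersion1, metadataVersion2));
    }
    return Integer.compare(index1, index2);
  }
}
```

Looking up each index once also removes the separate isVersionSupported checks, since indexOf already returns -1 for an unknown version.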


> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-5660
>                 URL: https://issues.apache.org/jira/browse/DRILL-5660
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Vitalii Diravka
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw that affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store
> relative paths. All Drill servers built after that PR create files with
> relative paths. But the version number of the file is unchanged, so older
> Drillbits don't know that they can't understand the file.
> Instead, when an older Drillbit tries to read the file, queries fail left
> and right: Drill resolves the paths, but relative to the user's HDFS home
> directory, which is wrong.
> What we should have done is bump the Parquet metadata file version number
> so that older Drillbits refuse to read the file. This ticket requests that
> we do that.
> Now, one could argue that the lack of a version number change is fine: once
> users upgrade Drill, they won't use an old Drillbit. But things are not that
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in 
> which metadata files have been created by a post-DRILL-3867 build. (This has 
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll
> back to Drill 1.10. Doing so will cause queries to fail due to
> seemingly corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on 
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted" 
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Have the software refuse to read files with a version newer than it was
> designed for.
> Of course, it is highly desirable that newer servers read old files, but that 
> is not the issue here.
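The standard practice described above can be sketched as a reader-side guard. This is a minimal sketch, not Drill's actual fix: the class name, the KNOWN_VERSIONS list (modeling a hypothetical pre-DRILL-3867 build that knows only v1, v2 and v3) and the canRead method are all assumptions. Note that an older build cannot interpret a version string it has never seen, so "newer than the software was designed for" reduces to "not in the list of versions this build knows".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical reader-side guard; not Drill's actual implementation.
final class MetadataCacheVersionGuard {

  // Versions a hypothetical pre-DRILL-3867 build can read, oldest first.
  static final List<String> KNOWN_VERSIONS =
      Collections.unmodifiableList(Arrays.asList("v1", "v2", "v3"));

  private MetadataCacheVersionGuard() {}

  /**
   * Returns true only for versions this build was designed to read. An unknown
   * version string (for example "v3.1", written by a newer Drillbit) is rejected
   * instead of being misinterpreted.
   */
  static boolean canRead(String fileVersion) {
    return KNOWN_VERSIONS.contains(fileVersion);
  }
}
```

What to do when canRead returns false (fail the query, or ignore the cache file and plan without it) is a separate policy decision that this sketch leaves open.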



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
