[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

ASF GitHub Bot (JIRA) Wed, 10 Jan 2018 06:06:30 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320290#comment-16320290
 ]


ASF GitHub Bot commented on DRILL-4185:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160685002
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
    @@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, FileStatus dir) throws IOExcep
         }
     
         boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
    -      Path p = new Path(dir.getPath(), ParquetFileWriter.PARQUET_METADATA_FILE);
           try {
    -        if (fs.exists(p)) {
    -          return true;
    -        } else {
    -
    -          if (metaDataFileExists(fs, dir)) {
    -            return true;
    -          }
    -          List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    -          return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    -        }
    +        // There should be at least one file, which is readable by Drill
    +        List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    +        return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    --- End diff --
    
    I did it on purpose. With the old logic of the isDirReadable() method, a
    directory that is empty apart from Parquet metadata files is processed by
    ParquetGroupScan as a Parquet table. That leads to an exception:
    
    https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java#L878
    
    To process such a table with SchemalessScan, the isReadable method should
    return false in that case. In other words, it shouldn't check for the presence
    of metadata cache files, only for files that Drill can actually read.
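
    To illustrate the intended behaviour, here is a minimal standalone sketch
    that uses java.nio instead of Drill's own classes. Everything in it is an
    assumption made for illustration: the class name DirReadableSketch, the
    extension-based isFileReadable stand-in, the rule that names starting with
    "." or "_" count as hidden metadata files, and returning false on an
    IOException. It is not the code from the pull request.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Drill.
final class DirReadableSketch {

  // Assumption: names starting with "." or "_" are metadata/hidden files
  // and are not returned by the directory listing.
  static boolean isHidden(Path file) {
    String name = file.getFileName().toString();
    return name.startsWith(".") || name.startsWith("_");
  }

  // Stand-in for the format plugin's real readability check.
  // Here a file counts as readable if it has a ".parquet" extension.
  static boolean isFileReadable(Path file) {
    return file.getFileName().toString().endsWith(".parquet");
  }

  // Non-recursive listing of regular, non-hidden files in dir.
  static List<Path> listFiles(Path dir) throws IOException {
    try (Stream<Path> entries = Files.list(dir)) {
      return entries
          .filter(Files::isRegularFile)
          .filter(p -> !isHidden(p))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  // Mirrors the new logic: the directory is readable only if it holds
  // at least one data file and the first such file is readable.
  // The presence of metadata cache files alone is not enough.
  static boolean isDirReadable(Path dir) {
    try {
      List<Path> files = listFiles(dir);
      return !files.isEmpty() && isFileReadable(files.get(0));
    } catch (IOException e) {
      // Assumption: treat a listing failure as "not readable".
      return false;
    }
  }
}
```

    With this sketch, a directory holding only a metadata cache file is
    reported as not readable, so it would fall through to the schemaless
    path, while a directory with at least one data file is still accepted.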


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-4185
>                 URL: https://issues.apache.org/jira/browse/DRILL-4185
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.4.0
>            Reporter: Khurram Faraaz
>            Assignee: Vitalii Diravka
>
> A UNION ALL query that involves an empty directory on either side of the UNION ALL 
> operator results in a FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory: the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
