[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454257#comment-16454257 ]
ASF GitHub Bot commented on DRILL-6331:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r184401600

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
    @@ -158,25 +159,26 @@ public void close() {
           } catch (RuntimeException e) {
             ex = ex == null ? e : ex;
           }
    -      try {
    -        if (fs != null) {
    +
    +      for (DrillFileSystem fs : fileSystems) {
    +        try {
               fs.close();
    -          fs = null;
    -        }
    -      } catch (IOException e) {
    +        } catch (IOException e) {
               throw UserException.resourceError(e)
    -            .addContext("Failed to close the Drill file system for " + getName())
    -            .build(logger);
    +              .addContext("Failed to close the Drill file system for " + getName())
    +              .build(logger);
    +        }
           }
    +
           if (ex != null) {
             throw ex;
           }
         }

         @Override
         public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
    -      Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
    -      fs = new DrillFileSystem(conf, getStats());
    +      DrillFileSystem fs = new DrillFileSystem(conf, getStats());
    --- End diff --

    When the `AbstractParquetScanBatchCreator.getBatch` method is called, it receives one operator context, which previously allowed creating only one file system. It also receives an `AbstractParquetRowGroupScan`, which contains several row groups. Row groups may belong to different files. For Drill parquet files, we create only one fs and use it to create readers for each row group. That's why it was fine that the operator context allowed creating only one fs. But we needed to adjust this for Hive files. For Hive we need to create an fs for each file (since the config for each file system is different and is created using the projection pusher), so I had to change the operator context to allow more than one file system. I have also introduced `AbstractDrillFileSystemManager`, which controls the number of file systems created.
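The two creation policies described above (a single fs per operator context vs. one fs per file, shared by row groups of that file) can be sketched roughly as follows. This is an illustrative standalone sketch, not the actual Drill classes: `FileSystemManager`, `SingleFileSystemManager`, and `PerFileFileSystemManager` are stand-in names for `AbstractDrillFileSystemManager` and its two subclasses, and a plain `Object` stands in for `DrillFileSystem`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Decides how many "file systems" an operator may create.
abstract class FileSystemManager<F> {
  protected final Supplier<F> factory;

  FileSystemManager(Supplier<F> factory) {
    this.factory = factory;
  }

  // Returns the fs to use for reading the given file.
  abstract F get(String filePath);
}

// Drill parquet policy: a single fs shared by all row groups
// (the behavior the operator context enforced before this change).
class SingleFileSystemManager<F> extends FileSystemManager<F> {
  private F fs;

  SingleFileSystemManager(Supplier<F> factory) {
    super(factory);
  }

  @Override
  F get(String filePath) {
    if (fs == null) {
      fs = factory.get();  // created lazily, exactly once
    }
    return fs;
  }
}

// Hive policy: one fs per file, so two row groups from the same
// file share one fs instance, while different files get their own.
class PerFileFileSystemManager<F> extends FileSystemManager<F> {
  private final Map<String, F> byFile = new HashMap<>();

  PerFileFileSystemManager(Supplier<F> factory) {
    super(factory);
  }

  @Override
  F get(String filePath) {
    return byFile.computeIfAbsent(filePath, p -> factory.get());
  }
}

public class FsManagerDemo {
  public static void main(String[] args) {
    PerFileFileSystemManager<Object> hive =
        new PerFileFileSystemManager<>(Object::new);
    Object a1 = hive.get("/data/a.parquet");
    Object a2 = hive.get("/data/a.parquet");
    Object b = hive.get("/data/b.parquet");
    System.out.println(a1 == a2);  // same file -> same fs
    System.out.println(a1 == b);   // different file -> different fs
  }
}
```

Keeping the policy behind one abstract type lets the batch creator stay unaware of how many file systems exist; it just asks the manager for the fs matching a row group's file.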
    `ParquetDrillFileSystemManager` creates only one (as was done before). `HiveDrillNativeParquetDrillFileSystemManager` creates an fs for each file, so when two row groups belong to the same file, they will get the same fs. But I agree that for a tracking fs (i.e. when store.parquet.reader.pagereader.async is set to false) this will create a mess in the calculations. So I suggest the following fix: for Hive we'll always create a non-tracking fs; for Drill it will depend on the store.parquet.reader.pagereader.async option. I'll also add checks in the operator context to disallow creating more than one tracking fs, and to disallow creating a tracking fs at all when non-tracking ones have already been created.

> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
>                 Key: DRILL-6331
>                 URL: https://issues.apache.org/jira/browse/DRILL-6331
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the core difference between them was that HiveDrillNativeParquetScanBatchCreator was creating a ParquetRecordReader instead of a HiveReader. This allowed reading Hive parquet files with the Drill native parquet reader but did not expose Hive data to Drill optimizations, for example filter push down, limit push down, and count-to-direct-scan optimizations. Hive code had to be refactored to use the same interfaces as ParquetGroupScan in order to be exposed to such optimizations.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)