Github user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r184401600
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java
---
@@ -158,25 +159,26 @@ public void close() {
} catch (RuntimeException e) {
ex = ex == null ? e : ex;
}
- try {
- if (fs != null) {
+
+ for (DrillFileSystem fs : fileSystems) {
+ try {
fs.close();
- fs = null;
- }
- } catch (IOException e) {
+ } catch (IOException e) {
throw UserException.resourceError(e)
- .addContext("Failed to close the Drill file system for " +
getName())
- .build(logger);
+ .addContext("Failed to close the Drill file system for " +
getName())
+ .build(logger);
+ }
}
+
if (ex != null) {
throw ex;
}
}
@Override
public DrillFileSystem newFileSystem(Configuration conf) throws
IOException {
- Preconditions.checkState(fs == null, "Tried to create a second
FileSystem. Can only be called once per OperatorContext");
- fs = new DrillFileSystem(conf, getStats());
+ DrillFileSystem fs = new DrillFileSystem(conf, getStats());
--- End diff --
When the `AbstractParquetScanBatchCreator.getBatch` method is called, it
receives a single operator context, which until now allowed only one file
system to be created. It also receives an `AbstractParquetRowGroupScan`, which
contains several row groups. Row groups may belong to different files. For
Drill parquet files, we create only one fs and use it to create the readers
for each row group.
That's why it was fine for the operator context to allow only one fs. But
we needed to adjust this for Hive files. For Hive we need to create an fs for
each file (since the config for each file system is different and is created
using the projection pusher), so I had to change the operator context to allow
more than one file system. I have also introduced
`AbstractDrillFileSystemManager`, which controls the number of file systems
created. `ParquetDrillFileSystemManager` creates only one (as was done before).
`HiveDrillNativeParquetDrillFileSystemManager` creates one fs per file, so
when two row groups belong to the same file, they get the same fs.
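To make the difference between the two managers concrete, here is a rough, self-contained sketch. It is not the code from this PR: `SketchFileSystem`, `SketchFileSystemManager`, `SingleFsManager` and `PerFileFsManager` are hypothetical stand-ins, and configs and paths are plain strings instead of Hadoop `Configuration` and `Path` objects.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for DrillFileSystem; it only remembers its config. */
final class SketchFileSystem {
  final String config;

  SketchFileSystem(String config) {
    this.config = config;
  }
}

/** Hypothetical manager base: decides how many file systems get created. */
abstract class SketchFileSystemManager {
  protected int created = 0;

  /** Returns the file system to use for a row group stored in the given file. */
  abstract SketchFileSystem get(String config, String path);

  int createdCount() {
    return created;
  }
}

/** Drill-style behaviour: one file system shared by all row groups. */
final class SingleFsManager extends SketchFileSystemManager {
  private SketchFileSystem fs;

  @Override
  SketchFileSystem get(String config, String path) {
    if (fs == null) {
      fs = new SketchFileSystem(config);
      created++;
    }
    return fs;
  }
}

/** Hive-style behaviour: one file system per file, reused across its row groups. */
final class PerFileFsManager extends SketchFileSystemManager {
  private final Map<String, SketchFileSystem> byPath = new HashMap<>();

  @Override
  SketchFileSystem get(String config, String path) {
    SketchFileSystem fs = byPath.get(path);
    if (fs == null) {
      fs = new SketchFileSystem(config);
      byPath.put(path, fs);
      created++;
    }
    return fs;
  }
}
```

With `PerFileFsManager`, two row groups from the same file share one instance, while a row group from a different file gets its own instance built from that file's config.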
But I agree that for a tracking fs (i.e. when
store.parquet.reader.pagereader.async is set to false) this will make a mess
of the calculations. So I suggest the following fix: for Hive we'll always
create a non-tracking fs; for Drill it will depend on the
store.parquet.reader.pagereader.async option. I'll also add checks in the
operator context to disallow creating more than one tracking fs, and to
disallow creating a tracking fs at all once one or more non-tracking file
systems have already been created.
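The checks I have in mind would look roughly like the sketch below. It is only an illustration: `SketchOperatorContext`, `createTracking` and `createNonTracking` are hypothetical names, and the real implementation would create `DrillFileSystem` instances instead of updating counters.

```java
/** Hypothetical operator context that only models the proposed checks. */
final class SketchOperatorContext {
  private boolean trackingCreated = false;
  private int nonTrackingCount = 0;

  /** Allowed at most once, and only while no non-tracking fs exists. */
  void createTracking() {
    if (trackingCreated) {
      throw new IllegalStateException(
          "Tried to create a second tracking file system");
    }
    if (nonTrackingCount > 0) {
      throw new IllegalStateException(
          "Cannot create a tracking file system after non-tracking ones");
    }
    trackingCreated = true;
  }

  /** Any number of non-tracking file systems may be created. */
  void createNonTracking() {
    nonTrackingCount++;
  }

  int nonTrackingCount() {
    return nonTrackingCount;
  }
}
```

Both rejected cases throw `IllegalStateException`, the same exception type the old `Preconditions.checkState` call raised, so callers would see the same kind of failure as before.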
---