Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/424#discussion_r55780247
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
---
@@ -547,29 +559,72 @@ public long getRowCount() {
}
- // Create and return a new file selection based on reading the metadata cache file.
- // This function also initializes a few of ParquetGroupScan's fields as appropriate.
+ /**
+ * Create and return a new file selection based on reading the metadata cache file.
+ *
+ * This function also initializes a few of ParquetGroupScan's fields as appropriate.
+ *
+ * @param selection initial file selection
+ * @param metaFilePath metadata cache file path
+ * @return file selection read from cache
+ *
+ * @throws IOException
+ * @throws UserException when the updated selection is empty, this happens if the user selects an empty folder.
+ */
private FileSelection
- initFromMetadataCache(DrillFileSystem fs, FileSelection selection) throws IOException {
-FileStatus metaRootDir = selection.getFirstPath(fs);
-Path metaFilePath = new Path(metaRootDir.getPath(), Metadata.METADATA_FILENAME);
+ initFromMetadataCache(FileSelection selection, Path metaFilePath) throws IOException {
+// get the metadata for the root directory by reading the metadata file
+// parquetTableMetadata contains the metadata for all files in the selection root folder, but we need to make sure
+// we only select the files that are part of selection (by setting fileSet appropriately)
// get (and set internal field) the metadata for the directory by reading the metadata file
this.parquetTableMetadata = Metadata.readBlockMeta(fs, metaFilePath.toString());
List fileNames = Lists.newArrayList();
-for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
- fileNames.add(file.getPath());
+List fileStatuses = selection.getStatuses(fs);
+
+final Path first = fileStatuses.get(0).getPath();
+if (fileStatuses.size() == 1 && selection.getSelectionRoot().equals(first.toString())) {
+ // we are selecting all files from selection root. Expand the file list from the cache
+ for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+fileNames.add(file.getPath());
+ }
+ // we don't need to populate fileSet as all files are selected
+} else {
+ // we need to expand the files from fileStatuses
+ for (FileStatus status : fileStatuses) {
+if (status.isDirectory()) {
+ //TODO read the metadata cache files in parallel
--- End diff --
Could you file a JIRA for this TODO?
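
For context on what that TODO would involve, here is a rough sketch of reading several metadata cache files in parallel. It is only an illustration under stated assumptions, not the change proposed in this PR: `ParallelCacheRead`, `readAll`, and the `reader` function are hypothetical names, and `reader` stands in for whatever call actually loads one directory's cache file (for example something built on `Metadata.readBlockMeta`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Drill.
final class ParallelCacheRead {

  // Applies `reader` to every directory on a fixed-size thread pool and
  // returns the results in the same order as `dirs`.
  static <T> List<T> readAll(List<String> dirs, Function<String, T> reader, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<T>> tasks = new ArrayList<>();
      for (String dir : dirs) {
        tasks.add(() -> reader.apply(dir));
      }
      List<T> results = new ArrayList<>();
      // invokeAll blocks until all tasks finish and keeps the task order.
      for (Future<T> future : pool.invokeAll(tasks)) {
        results.add(future.get()); // rethrows a reader failure as ExecutionException
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in reader: a real one would parse the cache file under each directory.
    List<String> loaded = readAll(Arrays.asList("/t/a", "/t/b"), dir -> dir + "/cache", 2);
    System.out.println(loaded);
  }
}
```

Keeping the results in input order means the caller can still append file names exactly as the sequential loop does today.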