[GitHub] drill pull request: Drill 4484: NPE when querying empty directory

2016-03-19 Thread adeneche
Github user adeneche closed the pull request at:

https://github.com/apache/drill/pull/424


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: Drill 4484: NPE when querying empty directory

2016-03-10 Thread amansinha100
Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/424#issuecomment-195140460
  
LGTM.  +1


[GitHub] drill pull request: Drill 4484: NPE when querying empty directory

2016-03-10 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/424#discussion_r55780247
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -547,29 +559,72 @@ public long getRowCount() {
   }
 
 
-  // Create and return a new file selection based on reading the metadata cache file.
-  // This function also initializes a few of ParquetGroupScan's fields as appropriate.
+  /**
+   * Create and return a new file selection based on reading the metadata cache file.
+   *
+   * This function also initializes a few of ParquetGroupScan's fields as appropriate.
+   *
+   * @param selection initial file selection
+   * @param metaFilePath metadata cache file path
+   * @return file selection read from cache
+   *
+   * @throws IOException
+   * @throws UserException when the updated selection is empty, this happens if the user selects an empty folder.
+   */
   private FileSelection
-  initFromMetadataCache(DrillFileSystem fs, FileSelection selection) throws IOException {
-    FileStatus metaRootDir = selection.getFirstPath(fs);
-    Path metaFilePath = new Path(metaRootDir.getPath(), Metadata.METADATA_FILENAME);
+  initFromMetadataCache(FileSelection selection, Path metaFilePath) throws IOException {
+    // get the metadata for the root directory by reading the metadata file
+    // parquetTableMetadata contains the metadata for all files in the selection root folder, but we need to make sure
+    // we only select the files that are part of selection (by setting fileSet appropriately)
 
     // get (and set internal field) the metadata for the directory by reading the metadata file
     this.parquetTableMetadata = Metadata.readBlockMeta(fs, metaFilePath.toString());
     List<String> fileNames = Lists.newArrayList();
-    for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
-      fileNames.add(file.getPath());
+    List<FileStatus> fileStatuses = selection.getStatuses(fs);
+
+    final Path first = fileStatuses.get(0).getPath();
+    if (fileStatuses.size() == 1 && selection.getSelectionRoot().equals(first.toString())) {
+      // we are selecting all files from selection root. Expand the file list from the cache
+      for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+        fileNames.add(file.getPath());
+      }
+      // we don't need to populate fileSet as all files are selected
+    } else {
+      // we need to expand the files from fileStatuses
+      for (FileStatus status : fileStatuses) {
+        if (status.isDirectory()) {
+          //TODO read the metadata cache files in parallel
--- End diff --

Could you file a JIRA for this TODO?
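
For readers following the diff above, here is a minimal, self-contained sketch of the selection-expansion logic it introduces. It is not the Drill implementation: the class `SelectionSketch`, the `Entry` type (a stand-in for Hadoop's `FileStatus`), and the `cache` map (a stand-in for the per-directory metadata cache files) are all hypothetical names used for illustration. The handling of directories and plain files in the `else` branch is an assumption, since the quoted diff is cut off at the TODO. The empty-selection check mirrors the Javadoc's `@throws UserException` note, with `IllegalStateException` used here in place of Drill's `UserException`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SelectionSketch {

    /** Hypothetical stand-in for a Hadoop FileStatus: a path plus a directory flag. */
    static final class Entry {
        final String path;
        final boolean isDirectory;

        Entry(String path, boolean isDirectory) {
            this.path = path;
            this.isDirectory = isDirectory;
        }
    }

    /**
     * Expands a selection into a flat list of file names.
     *
     * @param selectionRoot root folder of the table
     * @param selected      the entries the user selected
     * @param cache         hypothetical metadata cache: directory path to the files recorded under it
     */
    static List<String> expand(String selectionRoot,
                               List<Entry> selected,
                               Map<String, List<String>> cache) {
        List<String> fileNames = new ArrayList<>();

        if (selected.size() == 1 && selectionRoot.equals(selected.get(0).path)) {
            // The whole root is selected: take every file listed in the root's cache.
            fileNames.addAll(cache.getOrDefault(selectionRoot, Collections.emptyList()));
        } else {
            // Only part of the root is selected: expand each entry on its own.
            // (Assumed behavior; the quoted diff does not show this branch in full.)
            for (Entry entry : selected) {
                if (entry.isDirectory) {
                    fileNames.addAll(cache.getOrDefault(entry.path, Collections.emptyList()));
                } else {
                    fileNames.add(entry.path);
                }
            }
        }

        if (fileNames.isEmpty()) {
            // Fail with a clear error instead of letting later code hit a null or empty selection.
            throw new IllegalStateException("selection is empty: " + selectionRoot);
        }
        return fileNames;
    }
}
```

In this sketch, selecting an empty folder fails fast with a descriptive exception, which is the kind of outcome the Javadoc in the diff describes for the empty-selection case.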


[GitHub] drill pull request: Drill 4484: NPE when querying empty directory

2016-03-10 Thread adeneche
GitHub user adeneche opened a pull request:

https://github.com/apache/drill/pull/424

Drill 4484: NPE when querying empty directory

This PR also includes the fix for DRILL-4376. I will rebase once that fix has been merged into master.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adeneche/incubator-drill DRILL-4484

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #424


commit b8fa596d421eca31ff28a40e681c544c10521078
Author: adeneche 
Date:   2016-03-09T12:44:02Z

DRILL-4376: Wrong results when doing a count(*) on part of directories with metadata cache

commit e8c5dcb64926d0931c48cb0eba3f17dc2b597822
Author: adeneche 
Date:   2016-03-10T09:40:06Z

DRILL-4484: NPE when querying  empty directory



