[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062183#comment-16062183
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/824


> Store relative paths in metadata file
> -
>
> Key: DRILL-3867
> URL: https://issues.apache.org/jira/browse/DRILL-3867
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.2.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1. Create the cache file
> {code}
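> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
> +-------+-------------------------------------------------------------------------------------+
> |  ok   |                                       summary                                       |
> +-------+-------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
> +-------+-------------------------------------------------------------------------------------+
> 1 row selected (1.558 seconds)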
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
> {code}
> This is expected, given that we are storing absolute file paths in the
> cache file.
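
The failure in the steps above can be reproduced in miniature. The following is only a sketch: it uses `java.nio.file.Path` as a stand-in for Hadoop's `Path`, drops the `maprfs://` scheme, and the class and method names (`StalePathDemo`, `isUnderTableRoot`) are hypothetical, not part of Drill. It shows that an absolute entry recorded before the move no longer lies under the table's new root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: java.nio.file.Path stands in for Hadoop's Path,
// and the class and method names are hypothetical (not Drill code).
final class StalePathDemo {

    // True if the cached entry still points somewhere inside the table root.
    static boolean isUnderTableRoot(String cachedEntry, String tableRoot) {
        Path entry = Paths.get(cachedEntry).normalize();
        Path root = Paths.get(tableRoot).normalize();
        return entry.startsWith(root);
    }

    public static void main(String[] args) {
        // Absolute path as it was written into the cache file before the move.
        String cached = "/drill/testdata/metadata_caching/lineitem/2006/1";
        // Before "hadoop fs -mv": the entry is under the table root.
        System.out.println(isUnderTableRoot(cached, "/drill/testdata/metadata_caching/lineitem"));
        // After the move to /drill/lineitem: the entry is stale.
        System.out.println(isUnderTableRoot(cached, "/drill/lineitem"));
    }
}
```

A relative entry such as `2006/1` has no such dependency on the table's location, which is the motivation for the change discussed in this issue.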



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059566#comment-16059566
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123512327
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java ---
@@ -593,4 +610,49 @@ private void convert(List batches) throws SchemaChangeException
       }
     }
   }
+
+  private static String replaceWorkingPathInString(String orig) {
+    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath()));
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException {
+    copyDirectoryIntoTempSpace(resourcesDir, null);
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException {
+    Path destination = destinationSubDir != null ? new Path(getDfsTestTmpSchemaLocation(), destinationSubDir)
+        : new Path(getDfsTestTmpSchemaLocation());
+    fs.copyFromLocalFile(
+        new Path(replaceWorkingPathInString(resourcesDir)),
+        destination);
+  }
+
+  /**
+   * Metadata cache files include full paths to the files that have been scanned.
--- End diff --

I wanted to do this, but found that it would be incorrect, since those files
have `metadata version` `v1` and `v2`. Metadata cache files of these versions
can contain only absolute paths.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059567#comment-16059567
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123512448
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java ---
@@ -593,4 +610,49 @@ private void convert(List batches) throws SchemaChangeException
       }
     }
   }
+
+  private static String replaceWorkingPathInString(String orig) {
+    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath()));
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException {
+    copyDirectoryIntoTempSpace(resourcesDir, null);
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException {
+    Path destination = destinationSubDir != null ? new Path(getDfsTestTmpSchemaLocation(), destinationSubDir)
+        : new Path(getDfsTestTmpSchemaLocation());
+    fs.copyFromLocalFile(
+        new Path(replaceWorkingPathInString(resourcesDir)),
+        destination);
+  }
+
+  /**
+   * Metadata cache files include full paths to the files that have been scanned.
+   *
+   * There is no way to generate a metadata cache file with absolute paths that
+   * will be guaranteed to be available on an arbitrary test machine.
+   *
--- End diff --

`<p>` was added. Thanks.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058538#comment-16058538
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123397635
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) {
       return directories;
     }
 
+    /** If directories list contains relative paths, update it to absolute ones
--- End diff --

Thanks for the explanation.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058540#comment-16058540
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123398393
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java ---
@@ -593,4 +610,49 @@ private void convert(List batches) throws SchemaChangeException
       }
     }
   }
+
+  private static String replaceWorkingPathInString(String orig) {
+    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath()));
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException {
+    copyDirectoryIntoTempSpace(resourcesDir, null);
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException {
+    Path destination = destinationSubDir != null ? new Path(getDfsTestTmpSchemaLocation(), destinationSubDir)
+        : new Path(getDfsTestTmpSchemaLocation());
+    fs.copyFromLocalFile(
+        new Path(replaceWorkingPathInString(resourcesDir)),
+        destination);
+  }
+
+  /**
+   * Metadata cache files include full paths to the files that have been scanned.
+   *
+   * There is no way to generate a metadata cache file with absolute paths that
+   * will be guaranteed to be available on an arbitrary test machine.
+   *
--- End diff --

Very small suggestion: Javadoc is HTML-formatted. Insert a `<p>` between
paragraphs.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058539#comment-16058539
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123398337
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java ---
@@ -593,4 +610,49 @@ private void convert(List batches) throws SchemaChangeException
       }
     }
   }
+
+  private static String replaceWorkingPathInString(String orig) {
+    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath()));
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException {
+    copyDirectoryIntoTempSpace(resourcesDir, null);
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException {
+    Path destination = destinationSubDir != null ? new Path(getDfsTestTmpSchemaLocation(), destinationSubDir)
+        : new Path(getDfsTestTmpSchemaLocation());
+    fs.copyFromLocalFile(
+        new Path(replaceWorkingPathInString(resourcesDir)),
+        destination);
+  }
+
+  /**
+   * Metadata cache files include full paths to the files that have been scanned.
--- End diff --

For older files with the marker, should we just replace the marker to be 
relative and take advantage of this improvement? Can that be done without 
having to edit the old files?




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058072#comment-16058072
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123340606
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -1413,6 +1452,31 @@ public ColumnTypeMetadata_v3 getColumnTypeInfo(String[] name) {
       return directories;
     }
 
+    /** If directories list and file metadata list contain relative paths, update it to absolute ones
+     * @param baseDir base parent directory
+     */
+    @JsonIgnore public void updateRelativePaths(Path baseDir) {
--- End diff --

I combined the common code of these two methods and extracted it into
separate helper methods.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058075#comment-16058075
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123339909
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) {
       return directories;
     }
 
+    /** If directories list contains relative paths, update it to absolute ones
+     * @param baseDir base parent directory
+     */
+    @JsonIgnore public void updateRelativePaths(Path baseDir) {
+      if (!directories.isEmpty()) {
+        // It is enough to check the first path to decide if updating needed
+        if (!new Path(directories.get(0)).isAbsolute()) {
--- End diff --

It is possible to replace `String` with `Path` for the directory paths by
implementing a custom `JsonSerializer` and `JsonDeserializer`. But then it
would be necessary to convert every `Path` in the lists back into a `String`,
because `String` paths are used in a lot of places: `FileSelection`,
`Metadata`, `ParquetGroupScan`, `ReadEntryWithPath`, `FileWork`,
`FormatSelection`, `FormatPlugin`, `PartitionLocation` and so on.

I totally agree with the requirement to replace `String` with `Path`. But it
should be done not only for parquet, and in the context of a separate jira. I
am going to create it.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058071#comment-16058071
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123331277
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) {
       return directories;
     }
 
+    /** If directories list contains relative paths, update it to absolute ones
--- End diff --

Yes, we do; internally we use absolute paths (for the FileSelection,
FileStatus, ReadEntryWithPath).

By the way, it is possible to convert paths to absolute ones just before
retrieving them, but converting immediately after deserializing has
advantages: there is no need to keep the metadata together with the
appropriate `baseDir`, the type of each path is not checked repeatedly, and
paths are not converted again when the data is retrieved several times from
one metadata object.
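
The "convert once, right after deserializing" approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: `java.nio.file.Path` stands in for Hadoop's `Path`, paths are assumed to be Unix-style, and `RelativePathResolver` and `toAbsolute` are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: java.nio.file.Path stands in for Hadoop's Path,
// and the class and method names are hypothetical (not Drill code).
final class RelativePathResolver {

    // Resolves every relative entry against baseDir; absolute entries are kept as-is.
    // Each entry is checked individually, so a mixed list is handled as well.
    static List<String> toAbsolute(List<String> paths, String baseDir) {
        Path base = Paths.get(baseDir);
        List<String> result = new ArrayList<>(paths.size());
        for (String p : paths) {
            Path path = Paths.get(p);
            result.add(path.isAbsolute() ? p : base.resolve(path).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Entries as they might be read from a cache file that stores relative paths.
        List<String> fromCache = Arrays.asList("2006/1", "2007/2");
        // The table root is only known at read time, so it is supplied here.
        System.out.println(toAbsolute(fromCache, "/drill/lineitem"));
    }
}
```

Because the conversion happens once, callers that read the list several times see plain absolute strings and need no knowledge of the base directory.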




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058074#comment-16058074
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123329475
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -264,15 +275,18 @@ private ParquetTableMetadata_v3 getParquetTableMetadata(List fileSta
   /**
    * Get a list of file metadata for a list of parquet files
    *
-   * @param fileStatuses
-   * @return
+   * @param parquetTableMetadata_v3 can store column schema info from all the files and row groups
+   * @param fileStatuses list of the parquet files statuses
+   * @param absolutePathInMetadata true if result metadata files should contain absolute paths, false for relative paths.
+   *                               Relative paths in the metadata are only necessary while creating meta cache files.
+   * @return list of the parquet file metadata (parquet metadata for every file)
    * @throws IOException
    */
-  private List getParquetFileMetadata_v3(
-      ParquetTableMetadata_v3 parquetTableMetadata_v3, List fileStatuses) throws IOException {
+  private List getParquetFileMetadata_v3(ParquetTableMetadata_v3 parquetTableMetadata_v3,
+      List fileStatuses, boolean absolutePathInMetadata) throws IOException {
--- End diff --

The boolean flag has been removed.

For now we create and gather metadata only with absolute paths. But before
writing, new metadata with relative paths is created based on the old
metadata.

Agree. It makes sense to check every path while converting it. Done.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058076#comment-16058076
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123342097
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java ---
@@ -398,6 +398,23 @@ public void testDrill4877() throws Exception {
 
   }
 
+  @Test // DRILL-3867
+  public void testMoveCache() throws Exception {
+    String tableName = "nation_move";
+    String newTableName = "nation_moved";
+    test("use dfs_test.tmp");
+    test("create table `%s/t1` as select * from cp.`tpch/nation.parquet`", tableName);
+    test("create table `%s/t2` as select * from cp.`tpch/nation.parquet`", tableName);
+    test(String.format("refresh table metadata %s", tableName));
+    checkForMetadataFile(tableName);
+    File srcFile = new File(getDfsTestTmpSchemaLocation(), tableName);
+    File dstFile = new File(getDfsTestTmpSchemaLocation(), newTableName);
+    FileUtils.moveDirectory(srcFile, dstFile);
+    Assert.assertFalse("Cache file was not moved successfully", srcFile.exists());
+    int rowCount = testSql(String.format("select * from %s", newTableName));
+    Assert.assertEquals(50, rowCount);
+  }
+
--- End diff --

There is no requirement for them to use absolute paths. After this fix they
can be upgraded to use relative paths (I'm going to open a separate jira for
it). Therefore a new test case for metadata cache files with absolute paths
was added.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058073#comment-16058073
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r123316880
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -179,10 +182,18 @@ private Metadata(FileSystem fs, ParquetFormatConfig formatConfig) {
 
     for (final FileStatus file : fs.listStatus(p, new DrillPathFilter())) {
       if (file.isDirectory()) {
+        String subdirectoryName = file.getPath().getName();
         ParquetTableMetadata_v3 subTableMetadata = (createMetaFilesRecursively(file.getPath().toString())).getLeft();
-        metaDataList.addAll(subTableMetadata.files);
-        directoryList.addAll(subTableMetadata.directories);
-        directoryList.add(file.getPath().toString());
+        for (ParquetFileMetadata_v3 pfm_v3 : subTableMetadata.files) {
+          // Construction of the relative file path by adding subdirectory name and inner relative file path
+          String relativePath = Joiner.on("/").join(subdirectoryName, pfm_v3.getPath());
--- End diff --

Regarding the `paths`, I answered in the general comment.

I decided against merging path names recursively. Instead, I've implemented a
new `MetadataPathUtils.createMetadataWithRelativePaths()` method that
converts absolute paths to relative ones and creates new metadata for the
cache files.



> Store relative paths in metadata file
> -
>
> Key: DRILL-3867
> URL: https://issues.apache.org/jira/browse/DRILL-3867
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.2.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1. Create the cache file
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata 
> dfs.`/drill/testdata/metadata_caching/lineitem`;
> +---+-+
> |  ok   |   summary   
> |
> +---+-+
> | true  | Successfully updated metadata for table 
> /drill/testdata/metadata_caching/lineitem.  |
> +---+-+
> 1 row selected (1.558 seconds)
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 
> 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file 
> maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] 
> (state=,code=0)
> {code}
> This is obvious given the fact that we are storing absolute file paths in the 
> cache file



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053336#comment-16053336
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122602595
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -264,15 +275,18 @@ private ParquetTableMetadata_v3 
getParquetTableMetadata(List fileSta
   /**
* Get a list of file metadata for a list of parquet files
*
-   * @param fileStatuses
-   * @return
+   * @param parquetTableMetadata_v3 can store column schema info from all 
the files and row groups
+   * @param fileStatuses list of the parquet files statuses
+   * @param absolutePathInMetadata true if result metadata files should 
contain absolute paths, false for relative paths.
+   *   Relative paths in the metadata are only 
necessary while creating meta cache files.
+   * @return list of the parquet file metadata (parquet metadata for every 
file)
* @throws IOException
*/
-  private List getParquetFileMetadata_v3(
-  ParquetTableMetadata_v3 parquetTableMetadata_v3, List 
fileStatuses) throws IOException {
+  private List 
getParquetFileMetadata_v3(ParquetTableMetadata_v3 parquetTableMetadata_v3,
+  List fileStatuses, boolean absolutePathInMetadata) 
throws IOException {
--- End diff --

Is this really needed? Or, is it an attempt to answer my earlier concern 
about compatibility?

Only newer Drill instances will create metadata. If we want relative paths, 
then we should always use relative paths. No need to pass along a flag.

On the other hand, if we are saying that the root call is absolute (as seen 
in the code earlier), but subdirectories are relative, then doesn't the 
presence of even one absolute directory name make the whole feature invalid?

Perhaps some more background explanation in the PR comments (or even a 
design spec) might shed some light on what we are trying to accomplish here. 
Very hard to simply reverse engineer a design from code changes...

Also, below, we have a method to convert relative paths to absolute in 
bulk. Should we do the same here? Always gather data in absolute form, then 
convert it to relative just before serializing?

I wasn't sure why we are converting paths from relative to absolute. If we 
are doing that because we use absolute paths internally, then it is OK to 
gather absolute paths here. Convert them to relative just before writing if that 
is easier.

Here, I'm referring to the note about the "proposed alternative solution".




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053337#comment-16053337
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122602623
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List 
directories) {
   return directories;
 }
 
+/** If directories list contains relative paths, update it to absolute 
ones
+ * @param baseDir base parent directory
+ */
+@JsonIgnore public void updateRelativePaths(Path baseDir) {
+  if (!directories.isEmpty()) {
+// It is enough to check the first path to decide if updating 
needed
+if (!new Path(directories.get(0)).isAbsolute()) {
--- End diff --

This is getting a bit silly, converting to/from String and Path. The 
`directories` list should contain Path elements. In general, we should never 
work with paths as strings except when we need to serialize them, as with Jackson.
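
A rough sketch of what a typed `directories` list could look like, for readers following the thread. It is not code from the PR: `DirectoryListSketch` and its methods are hypothetical names, and `java.nio.file.Path` stands in for Hadoop's `Path`. It assumes a POSIX-style file system, so `/drill/lineitem` counts as absolute.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch only: java.nio.file.Path stands in for org.apache.hadoop.fs.Path,
// and the class and method names are hypothetical.
final class DirectoryListSketch {

  // Resolves relative entries against baseDir; absolute entries pass through unchanged.
  static List<Path> toAbsolute(List<Path> directories, Path baseDir) {
    List<Path> result = new ArrayList<>(directories.size());
    for (Path dir : directories) {
      result.add(dir.isAbsolute() ? dir : baseDir.resolve(dir).normalize());
    }
    return result;
  }

  // Strings appear only at the serialization boundary.
  static List<String> toStrings(List<Path> directories) {
    List<String> result = new ArrayList<>(directories.size());
    for (Path dir : directories) {
      result.add(dir.toString());
    }
    return result;
  }

  public static void main(String[] args) {
    List<Path> stored = new ArrayList<>();
    stored.add(Paths.get("2006/1"));
    stored.add(Paths.get("2007"));
    System.out.println(toStrings(toAbsolute(stored, Paths.get("/drill/lineitem"))));
  }
}
```

Unlike the diff under review, this sketch tests every entry with `isAbsolute()` instead of only the first one, which costs little and tolerates a mixed list.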




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053338#comment-16053338
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122602771
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1413,6 +1452,31 @@ public ColumnTypeMetadata_v3 
getColumnTypeInfo(String[] name) {
   return directories;
 }
 
+/** If directories list and file metadata list contain relative paths, 
update it to absolute ones
+ * @param baseDir base parent directory
+ */
+@JsonIgnore public void updateRelativePaths(Path baseDir) {
--- End diff --

How is this method different from the previous one? Looks like they are 
doing a bit of the same thing...




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053334#comment-16053334
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122602455
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -179,10 +182,18 @@ private Metadata(FileSystem fs, ParquetFormatConfig 
formatConfig) {
 
 for (final FileStatus file : fs.listStatus(p, new DrillPathFilter())) {
   if (file.isDirectory()) {
+String subdirectoryName = file.getPath().getName();
 ParquetTableMetadata_v3 subTableMetadata = 
(createMetaFilesRecursively(file.getPath().toString())).getLeft();
-metaDataList.addAll(subTableMetadata.files);
-directoryList.addAll(subTableMetadata.directories);
-directoryList.add(file.getPath().toString());
+for (ParquetFileMetadata_v3 pfm_v3 : subTableMetadata.files) {
+  // Construction of the relative file path by adding subdirectory 
name and inner relative file path
+  String relativePath = Joiner.on("/").join(subdirectoryName, 
pfm_v3.getPath());
--- End diff --

`Path.mergePaths()`?

We really don't want to work with paths as strings: such code is hard to 
test and maintain.

If we need new Path operations (such as merging relative paths), I suggest 
we create a `PathUtils` class to hold the operations. Then, create unit tests 
to check all the various conditions: empty head, empty tail, neither empty, etc.

Also, in general, we would work with path names as `Path` objects: the job 
of the `Path` class is to properly implement file path operations, just as the 
job of the older `File` and newer `Path` classes in Java is to handle OS paths.
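
The `PathUtils` idea above could be sketched roughly as follows. This is an illustration under assumptions, not the class that ended up in Drill: `PathUtilsSketch` and `mergeRelative` are hypothetical names, and `java.nio.file.Path` stands in for Hadoop's `Path`. The `main` method exercises the three cases listed in the comment (empty head, empty tail, neither empty).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; java.nio.file.Path stands in for org.apache.hadoop.fs.Path.
final class PathUtilsSketch {

  // Joins two relative paths, tolerating an empty head or an empty tail.
  static Path mergeRelative(Path head, Path tail) {
    if (head.isAbsolute() || tail.isAbsolute()) {
      throw new IllegalArgumentException("both paths must be relative: " + head + ", " + tail);
    }
    if (isEmpty(head)) {
      return tail;
    }
    if (isEmpty(tail)) {
      return head;
    }
    return head.resolve(tail);
  }

  private static boolean isEmpty(Path path) {
    return path.toString().isEmpty();
  }

  // Minimal checks for the cases listed in the review comment.
  public static void main(String[] args) {
    check(mergeRelative(Paths.get(""), Paths.get("a.parquet")), Paths.get("a.parquet"));
    check(mergeRelative(Paths.get("2006"), Paths.get("")), Paths.get("2006"));
    check(mergeRelative(Paths.get("2006"), Paths.get("1", "a.parquet")),
        Paths.get("2006", "1", "a.parquet"));
  }

  private static void check(Path actual, Path expected) {
    if (!actual.equals(expected)) {
      throw new AssertionError("expected " + expected + " but got " + actual);
    }
  }
}
```

Comparing `Path` objects rather than strings keeps the checks independent of the platform's separator character.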




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053339#comment-16053339
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122602816
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ---
@@ -398,6 +398,23 @@ public void testDrill4877() throws Exception {
 
   }
 
+  @Test // DRILL-3867
+  public void testMoveCache() throws Exception {
+String tableName = "nation_move";
+String newTableName = "nation_moved";
+test("use dfs_test.tmp");
+test("create table `%s/t1` as select * from cp.`tpch/nation.parquet`", 
tableName);
+test("create table `%s/t2` as select * from cp.`tpch/nation.parquet`", 
tableName);
+test(String.format("refresh table metadata %s", tableName));
+checkForMetadataFile(tableName);
+File srcFile = new File(getDfsTestTmpSchemaLocation(), tableName);
+File dstFile = new File(getDfsTestTmpSchemaLocation(), newTableName);
+FileUtils.moveDirectory(srcFile, dstFile);
+Assert.assertFalse("Cache file was not moved successfully", 
srcFile.exists());
+int rowCount = testSql(String.format("select * from %s", 
newTableName));
+Assert.assertEquals(50, rowCount);
+  }
+
--- End diff --

Will they stay absolute? Or, are they rebuilt from time to time?




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053335#comment-16053335
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122602635
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List 
directories) {
   return directories;
 }
 
+/** If directories list contains relative paths, update it to absolute 
ones
--- End diff --

Explanation of why we do this? Do we store data in relative form, but 
convert it to absolute form internally?




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052473#comment-16052473
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122515967
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -680,7 +731,7 @@ private boolean tableModified(List directories, 
Path metaFilePath,
   }
 
   public static abstract class ParquetFileMetadata {
-@JsonIgnore public abstract String getPath();
+@JsonIgnore public abstract ParquetPath getParquetPath();
--- End diff --

The structure of the metadata cache file isn't changed, and deserializing works 
properly both for new relative paths and for old absolute ones (`new Path(parent, 
child)` in the `deserialize()` method). 

In the new approach, after deserializing, the list of paths is checked and 
updated from relative paths to absolute ones.
Leaving relative paths in the metadata could lead to repeated conversion of the 
paths and to checking the kind of path in a lot of places.
If an old meta cache file with absolute paths is deserialized, nothing is done 
to them and the old mechanism works.
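
The backward-compatible read path described here can be illustrated with a small sketch. It is an assumption-laden stand-in, not the PR's `deserialize()` code: `CachePathResolver` and `resolveStoredPath` are hypothetical names, and `java.nio.file.Path.resolve` is used in place of Hadoop's `new Path(parent, child)`. The relevant property is documented for `resolve`: an absolute argument is returned as-is, while a relative one is appended to the parent. A POSIX-style file system is assumed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: java.nio.file.Path stands in for org.apache.hadoop.fs.Path.
final class CachePathResolver {

  // Resolves a path read from a cache file against the directory holding that file.
  // Old cache files (absolute entries) come back unchanged; new ones (relative
  // entries) are anchored at metadataDir.
  static Path resolveStoredPath(Path metadataDir, String storedPath) {
    return metadataDir.resolve(storedPath).normalize();
  }

  public static void main(String[] args) {
    Path dir = Paths.get("/drill/lineitem");
    System.out.println(resolveStoredPath(dir, "2006/1/a.parquet"));
    System.out.println(resolveStoredPath(dir, "/drill/testdata/metadata_caching/lineitem/2006/1/a.parquet"));
  }
}
```

Because a single call covers both cases, the reader needs no version flag to tell old and new cache files apart.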




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052476#comment-16052476
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122529973
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ---
@@ -398,6 +398,23 @@ public void testDrill4877() throws Exception {
 
   }
 
+  @Test // DRILL-3867
+  public void testMoveCache() throws Exception {
+String tableName = "nation_move";
+String newTableName = "nation_moved";
+test("use dfs_test.tmp");
+test("create table `%s/t1` as select * from cp.`tpch/nation.parquet`", 
tableName);
+test("create table `%s/t2` as select * from cp.`tpch/nation.parquet`", 
tableName);
+test(String.format("refresh table metadata %s", tableName));
+checkForMetadataFile(tableName);
+File srcFile = new File(getDfsTestTmpSchemaLocation(), tableName);
+File dstFile = new File(getDfsTestTmpSchemaLocation(), newTableName);
+FileUtils.moveDirectory(srcFile, dstFile);
+Assert.assertFalse("Cache file was not moved successfully", 
srcFile.exists());
+int rowCount = testSql(String.format("select * from %s", 
newTableName));
+Assert.assertEquals(50, rowCount);
+  }
+
--- End diff --

There are metadata files with full absolute paths in other test cases 
(for example in `TestCorruptParquetDateCorrection`).




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052475#comment-16052475
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122511871
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -526,6 +534,48 @@ private void writeFile(ParquetTableMetadataDirs 
parquetTableMetadataDirs, Path p
   }
 
   /**
+   * Serializer for ParquetPath. Writes the path relative to the root path
+   */
+  private static class ParquetPathSerializer extends 
StdSerializer<ParquetPath> {
+private final String rootPath;
+
+ParquetPathSerializer(String rootPath) {
+  super(ParquetPath.class);
+  this.rootPath = rootPath;
+}
+
+@Override
+public void serialize(ParquetPath parquetPath, JsonGenerator 
jsonGenerator, SerializerProvider serializerProvider) throws IOException, 
JsonGenerationException {
+  
Preconditions.checkState(parquetPath.getFullPath().startsWith(rootPath), 
String.format("Path %s is not a subpath of %s", parquetPath.getFullPath(), 
rootPath));
+  String relativePath = 
parquetPath.getFullPath().replaceFirst(rootPath, "");
--- End diff --

Hadoop `Path` doesn't provide a similar method, but it is possible to use the 
`relativize()` method from `URI`.
Anyway, in the new approach in `Metadata.createMetaFilesRecursively()` 
I've implemented recursive collection of the inner subdirectories' names to 
construct a relative path for every file and directory.
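
For reference, the `relativize()` route mentioned above might look like the sketch below. It uses `java.net.URI` directly and is only an illustration: `RelativizeSketch` and its `relativize` helper are hypothetical names, and the inputs are assumed to be valid URI strings (no unescaped spaces). `URI.relativize` returns its argument unchanged when it is not located under the base, which the sketch turns into an exception.

```java
import java.net.URI;

// Sketch only: shows java.net.URI.relativize as an alternative to
// String.replaceFirst, which would treat rootPath as a regular expression.
final class RelativizeSketch {

  // Returns fullPath relative to rootPath, or throws if fullPath is not under rootPath.
  static String relativize(String rootPath, String fullPath) {
    URI root = URI.create(rootPath.endsWith("/") ? rootPath : rootPath + "/");
    URI full = URI.create(fullPath);
    URI relative = root.relativize(full);
    if (relative.equals(full)) {
      // relativize hands back its argument when it is not under the base.
      throw new IllegalArgumentException(
          String.format("Path %s is not a subpath of %s", fullPath, rootPath));
    }
    return relative.getPath();
  }

  public static void main(String[] args) {
    System.out.println(relativize(
        "/drill/testdata/metadata_caching/lineitem",
        "/drill/testdata/metadata_caching/lineitem/2006/1"));
  }
}
```

One difference from a plain string prefix check: a sibling such as `/drill/lineitem2/x` is rejected for the root `/drill/lineitem`, whereas `startsWith` on the raw strings would accept it.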




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052474#comment-16052474
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122516074
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1029,14 +1099,14 @@ public ParquetTableMetadata_v2(String drillVersion) 
{
 }
 
 public ParquetTableMetadata_v2(ParquetTableMetadataBase parquetTable,
-List files, List directories, 
String drillVersion) {
+List files, List directories, 
String drillVersion) {
--- End diff --

New approach is implemented.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052477#comment-16052477
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r122509527
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -526,6 +534,48 @@ private void writeFile(ParquetTableMetadataDirs 
parquetTableMetadataDirs, Path p
   }
 
   /**
+   * Serializer for ParquetPath. Writes the path relative to the root path
+   */
--- End diff --

The new approach with storing relative paths in metadata is implemented. 


> Store relative paths in metadata file
> -
>
> Key: DRILL-3867
> URL: https://issues.apache.org/jira/browse/DRILL-3867
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.2.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1. Create the cache file
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata 
> dfs.`/drill/testdata/metadata_caching/lineitem`;
> +---+-+
> |  ok   |   summary   
> |
> +---+-+
> | true  | Successfully updated metadata for table 
> /drill/testdata/metadata_caching/lineitem.  |
> +---+-+
> 1 row selected (1.558 seconds)
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 
> 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file 
> maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] 
> (state=,code=0)
> {code}
> This is obvious given the fact that we are storing absolute file paths in the 
> cache file





[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-06-16 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052464#comment-16052464
 ] 

Vitalii Diravka commented on DRILL-3867:


The fix stores relative paths when the metadata cache files are created 
(Metadata.createMetaFilesRecursively()) and converts them back to absolute 
paths after the metadata files are deserialized. A Parquet table with existing 
metadata cache files therefore stays accessible after it is moved.
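
A minimal sketch of that round trip, assuming plain POSIX-style path strings 
without a scheme. The class and method names (MovedTableDemo, relativize, 
absolutize) are hypothetical and are not taken from Drill's Metadata class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names exist in Drill.
final class MovedTableDemo {

  // Ensures the root ends with "/" so "/drill/line" is not taken as a prefix of "/drill/lineitem".
  private static String asPrefix(String root) {
    return root.endsWith("/") ? root : root + "/";
  }

  // Write side: keep only the part of each file path below the table root.
  static List<String> relativize(String tableRoot, List<String> absoluteFiles) {
    String prefix = asPrefix(tableRoot);
    List<String> relative = new ArrayList<>();
    for (String file : absoluteFiles) {
      if (!file.startsWith(prefix)) {
        throw new IllegalArgumentException(file + " is not under " + tableRoot);
      }
      relative.add(file.substring(prefix.length()));
    }
    return relative;
  }

  // Read side: re-anchor each cached entry on the table's current location.
  static List<String> absolutize(String currentRoot, List<String> relativeFiles) {
    String prefix = asPrefix(currentRoot);
    List<String> absolute = new ArrayList<>();
    for (String file : relativeFiles) {
      absolute.add(prefix + file);
    }
    return absolute;
  }

  public static void main(String[] args) {
    List<String> cached = relativize(
        "/drill/testdata/metadata_caching/lineitem",
        Arrays.asList("/drill/testdata/metadata_caching/lineitem/2006/1"));
    System.out.println(cached);                                // [2006/1]
    System.out.println(absolutize("/drill/lineitem", cached)); // [/drill/lineitem/2006/1]
  }
}
```

With the paths from the issue description, the cached entry no longer mentions 
the old parent directory, so the query against the moved table resolves to a 
file that exists.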



[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999144#comment-15999144
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r115104954
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -526,6 +534,48 @@ private void writeFile(ParquetTableMetadataDirs 
parquetTableMetadataDirs, Path p
   }
 
   /**
+   * Serializer for ParquetPath. Writes the path relative to the root path
+   */
--- End diff --

Why compute the relative path during serialization? A more common approach is 
to store the relative paths in our internal data structures and serialize that 
relative value.

Doing things this way is awkward: we need to define a serializer and 
deserializer unnecessarily.

Also, see comment later on backward compatibility.




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999143#comment-15999143
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r115105784
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ---
@@ -398,6 +398,23 @@ public void testDrill4877() throws Exception {
 
   }
 
+  @Test // DRILL-3867
+  public void testMoveCache() throws Exception {
+    String tableName = "nation_move";
+    String newTableName = "nation_moved";
+    test("use dfs_test.tmp");
+    test("create table `%s/t1` as select * from cp.`tpch/nation.parquet`", tableName);
+    test("create table `%s/t2` as select * from cp.`tpch/nation.parquet`", tableName);
+    test(String.format("refresh table metadata %s", tableName));
+    checkForMetadataFile(tableName);
+    File srcFile = new File(getDfsTestTmpSchemaLocation(), tableName);
+    File dstFile = new File(getDfsTestTmpSchemaLocation(), newTableName);
+    FileUtils.moveDirectory(srcFile, dstFile);
+    Assert.assertFalse("Cache file was not moved successfully", srcFile.exists());
+    int rowCount = testSql(String.format("select * from %s", newTableName));
+    Assert.assertEquals(50, rowCount);
+  }
+
--- End diff --

The tests here don't verify opening a version 1.10 metadata file with this 
1.11 change. What happens? Can we read old files?




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999146#comment-15999146
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r115105413
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -526,6 +534,48 @@ private void writeFile(ParquetTableMetadataDirs 
parquetTableMetadataDirs, Path p
   }
 
   /**
+   * Serializer for ParquetPath. Writes the path relative to the root path
+   */
+  private static class ParquetPathSerializer extends StdSerializer<ParquetPath> {
+    private final String rootPath;
+
+    ParquetPathSerializer(String rootPath) {
+      super(ParquetPath.class);
+      this.rootPath = rootPath;
+    }
+
+    @Override
+    public void serialize(ParquetPath parquetPath, JsonGenerator jsonGenerator, SerializerProvider serializerProvider) throws IOException, JsonGenerationException {
+      Preconditions.checkState(parquetPath.getFullPath().startsWith(rootPath), String.format("Path %s is not a subpath of %s", parquetPath.getFullPath(), rootPath));
+      String relativePath = parquetPath.getFullPath().replaceFirst(rootPath, "");
--- End diff --

Java defines a Path abstraction that will compute relative paths.

An [old-fashioned way](http://stackoverflow.com/questions/204784/how-to-construct-a-relative-path-in-java-from-two-absolute-paths-or-urls):
```
import java.io.File;

// base and path are absolute path strings; the result is path expressed relative to base.
String relative = new File(base).toURI().relativize(new File(path).toURI()).getPath();
```

From the same post, the Java 7 way:
```
import java.nio.file.Path;
import java.nio.file.Paths;

Path pathAbsolute = Paths.get("/var/data/stuff/xyz.dat");
Path pathBase = Paths.get("/var/data");
Path pathRelative = pathBase.relativize(pathAbsolute);
```

Hadoop's path may have something similar.
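
One practical difference between the string approach in the diff and a path 
abstraction, sketched here for plain local-style paths: `String.replaceFirst` 
compiles its first argument as a regular expression, so a root path containing 
characters such as `+` is not matched literally. The class and method names 
below are hypothetical, and `Path` is `java.nio.file.Path`, not Hadoop's `Path`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical comparison; not code from the pull request.
final class RelativizeVsReplaceFirst {

  // String approach: rootPath is interpreted as a regex by replaceFirst.
  static String viaReplaceFirst(String rootPath, String fullPath) {
    return fullPath.replaceFirst(rootPath, "");
  }

  // Path approach: rootPath is compared element by element, as a literal path.
  static String viaRelativize(String rootPath, String fullPath) {
    Path root = Paths.get(rootPath);
    return root.relativize(Paths.get(fullPath)).toString();
  }

  public static void main(String[] args) {
    String root = "/data/a+b";
    String file = "/data/a+b/part.parquet";
    System.out.println(viaReplaceFirst(root, file)); // unchanged: "a+" is read as a regex quantifier
    System.out.println(viaRelativize(root, file));   // part.parquet
  }
}
```

If the string approach is kept, wrapping the root in `Pattern.quote(...)` would 
make the match literal. The result would still keep a leading `/` unless the 
root ends with one.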




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999147#comment-15999147
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/824#discussion_r115105622
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -680,7 +731,7 @@ private boolean tableModified(List directories, 
Path metaFilePath,
   }
 
   public static abstract class ParquetFileMetadata {
-@JsonIgnore public abstract String getPath();
+@JsonIgnore public abstract ParquetPath getParquetPath();
--- End diff --

Doing this changes the on-disk format for the metadata file, doesn't it? If 
we do that, we need to introduce a new file version. Since metadata is 
expensive, we'd have to be able to read the existing file format. I don't see 
code for any of that.

To address the issue, we can instead leave the field as a string. Treat the 
string as either relative or absolute. This should be easy to detect: 
"/this/is/absolute", "but/this/is/relative".

Then, create a method to set the paths. When setting paths, convert them to 
relative. When retrieving them, give a base directory. Convert relative (new) 
paths to absolute, leave (old) absolute paths unchanged. This is exactly how 
browsers handle URLs, operating systems handle paths, and so on. Classic 
approach.
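
A minimal sketch of that suggestion, assuming the field stays a plain string 
and that paths carry no scheme prefix (a value such as `maprfs:///...` would 
need extra handling). `CachedPathResolver`, `toStored` and `resolve` are 
hypothetical names, not part of Drill.

```java
// Hypothetical helper illustrating the suggestion; not Drill's implementation.
final class CachedPathResolver {

  // Ensures the directory ends with "/" so prefix checks stop at a path boundary.
  private static String asPrefix(String dir) {
    return dir.endsWith("/") ? dir : dir + "/";
  }

  // Setter side: store paths under baseDir in relative form; leave anything else untouched.
  static String toStored(String baseDir, String path) {
    String prefix = asPrefix(baseDir);
    return path.startsWith(prefix) ? path.substring(prefix.length()) : path;
  }

  // Getter side: old-format (absolute) entries pass through unchanged,
  // new-format (relative) entries are resolved against baseDir.
  static String resolve(String baseDir, String stored) {
    return stored.startsWith("/") ? stored : asPrefix(baseDir) + stored;
  }

  public static void main(String[] args) {
    System.out.println(resolve("/drill/lineitem", "2006/1"));                 // /drill/lineitem/2006/1
    System.out.println(resolve("/drill/lineitem", "/this/is/absolute/file")); // /this/is/absolute/file
  }
}
```

Because old absolute entries are returned as-is, an existing metadata file 
would keep working for a table that has not been moved, without a format 
version change.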




[jira] [Commented] (DRILL-3867) Store relative paths in metadata file

2017-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995033#comment-15995033
 ] 

ASF GitHub Bot commented on DRILL-3867:
---

GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/824

DRILL-3867: Store relative paths in metadata file

For easier review this PR consists of two commits, which can be squashed 
into one.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-3867

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/824.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #824


commit c2341e21e00ec595cab66d078c37ffcaf8e2481e
Author: Steven Phillips 
Date:   2015-10-02T03:36:15Z

DRILL-3867: Store relative paths in metadata file

commit 6ff28a8c94e7f1a9a79d75132695f8335f315cdd
Author: Vitalii Diravka 
Date:   2017-05-03T15:52:08Z

DRILL-3867: Store relative paths in metadata file
- resolving conflicts;
- refactoring according to the new changes in the Metadata and other 
classes,
  that were made after this original fix.



