[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1348: HUDI-597 Enable incremental pulling from defined partitions

2020-02-22 Thread GitBox
vinothchandar commented on a change in pull request #1348: HUDI-597 Enable 
incremental pulling from defined partitions
URL: https://github.com/apache/incubator-hudi/pull/1348#discussion_r382966594
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
 ##
 @@ -84,7 +85,7 @@ class IncrementalRelation(val sqlContext: SQLContext,
 
   val filters = {
 if (optParams.contains(DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY)) {
-  val filterStr = optParams.get(DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY).getOrElse("")
+  val filterStr = optParams.getOrElse(DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY, "")
 
 Review comment:
   Can we move the `""` default to DataSourceOptions, to keep it consistent 
with how the other options are defined?
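
A minimal sketch of the suggestion, not the actual Hudi code: `ReadOptionsSketch` is a simplified stand-in for the real options object, and the key string and the constant name `DEFAULT_PUSH_DOWN_FILTERS_OPT_VAL` are hypothetical, chosen only for illustration.

```scala
// Simplified stand-in for the options object; the key string and the
// DEFAULT_* constant name are illustrative assumptions, not Hudi's real ones.
object ReadOptionsSketch {
  val PUSH_DOWN_INCR_FILTERS_OPT_KEY = "example.incr.filters"
  val DEFAULT_PUSH_DOWN_FILTERS_OPT_VAL = ""
}

object DefaultsDemo {
  import ReadOptionsSketch._

  // Look up the option, falling back to the named default instead of an inline "".
  def filterStr(optParams: Map[String, String]): String =
    optParams.getOrElse(PUSH_DOWN_INCR_FILTERS_OPT_KEY, DEFAULT_PUSH_DOWN_FILTERS_OPT_VAL)
}
```

With the default defined next to the key, call sites no longer need to know what the fallback value is.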


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1348: HUDI-597 Enable incremental pulling from defined partitions

2020-02-22 Thread GitBox
vinothchandar commented on a change in pull request #1348: HUDI-597 Enable 
incremental pulling from defined partitions
URL: https://github.com/apache/incubator-hudi/pull/1348#discussion_r382966967
 
 

 ##
 File path: hudi-spark/src/test/scala/TestDataSource.scala
 ##
 @@ -135,6 +136,14 @@ class TestDataSource extends AssertionsForJUnit {
 countsPerCommit = hoodieIncViewDF2.groupBy("_hoodie_commit_time").count().collect();
 assertEquals(1, countsPerCommit.length)
 assertEquals(commitInstantTime2, countsPerCommit(0).get(0))
+
+// pull the latest commit within certain partitions
+val hoodieIncViewDF3 = spark.read.format("org.apache.hudi")
+  .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
+  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, commitInstantTime1)
+  .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "/2016/*/*/*")
 
 Review comment:
   Is the leading `/` necessary? Could we make it (if not already) such that 
the matching works with or without it?
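
One way this could be handled, sketched under assumptions: the helper name `withLeadingSlash` is hypothetical and not part of the PR. It normalizes the user-supplied glob so that `2016/*/*/*` and `/2016/*/*/*` end up as the same pattern before matching.

```scala
object GlobNormalization {
  // Hypothetical helper: make a non-empty glob start with exactly one "/".
  // An empty glob is returned unchanged so it can still mean "no filtering".
  def withLeadingSlash(glob: String): String =
    if (glob.isEmpty) glob else "/" + glob.dropWhile(_ == '/')
}
```

Normalizing once, where the option is read, keeps the matching code itself unaware of the two spellings.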




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1348: HUDI-597 Enable incremental pulling from defined partitions

2020-02-22 Thread GitBox
vinothchandar commented on a change in pull request #1348: HUDI-597 Enable 
incremental pulling from defined partitions
URL: https://github.com/apache/incubator-hudi/pull/1348#discussion_r38294
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
 ##
 @@ -100,17 +101,22 @@ class IncrementalRelation(val sqlContext: SQLContext,
 .get, classOf[HoodieCommitMetadata])
   fileIdToFullPath ++= metadata.getFileIdAndFullPaths(basePath).toMap
 }
+val pathGlobPattern = optParams.getOrElse(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "")
+val filteredFullPath = if(!pathGlobPattern.equals("")) {
 
 Review comment:
   Here we can compare against the default-value constant instead of the literal `""`.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1348: HUDI-597 Enable incremental pulling from defined partitions

2020-02-22 Thread GitBox
vinothchandar commented on a change in pull request #1348: HUDI-597 Enable 
incremental pulling from defined partitions
URL: https://github.com/apache/incubator-hudi/pull/1348#discussion_r382966852
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
 ##
 @@ -100,17 +101,22 @@ class IncrementalRelation(val sqlContext: SQLContext,
 .get, classOf[HoodieCommitMetadata])
   fileIdToFullPath ++= metadata.getFileIdAndFullPaths(basePath).toMap
 }
+val pathGlobPattern = optParams.getOrElse(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "")
+val filteredFullPath = if(!pathGlobPattern.equals("")) {
+  val globMatcher = new GlobPattern("*" + pathGlobPattern)
 
 Review comment:
   Should we leave the `*` to the user, i.e. let the user pass in `*` if needed? 
Or is it needed for the matching?
   
   I am not familiar with this class per se.
   
   Also, http://hadoop.apache.org/docs/r2.8.0/api/allclasses-noframe.html does 
not seem to list `GlobPattern`. Is this class still around? I was a bit confused 
by that.
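
To illustrate why the `*` prefix may matter, here is a self-contained sketch that does not use Hadoop's `GlobPattern`. It assumes a simplified glob in which `*` matches any run of characters (including `/`) and the whole string must match; whether `GlobPattern` behaves exactly this way is an assumption, not something verified here. The file path is made up for the example.

```scala
import java.util.regex.Pattern

object GlobPrefixDemo {
  // Simplified glob: "*" matches any run of characters, everything else is literal.
  def globToRegex(glob: String): Pattern = {
    val body = glob.split("\\*", -1).map(s => Pattern.quote(s)).mkString(".*")
    Pattern.compile(body)
  }

  // The whole path must match the pattern, not just a substring of it.
  def matches(glob: String, path: String): Boolean =
    globToRegex(glob).matcher(path).matches()

  // Made-up full path: a base path followed by the partition path and a file name.
  val fullPath = "/tmp/hudi/2016/03/15/file.parquet"

  val withoutPrefix = matches("/2016/*/*/*", fullPath)      // pattern is anchored at the start of the base path
  val withPrefix = matches("*" + "/2016/*/*/*", fullPath)   // leading "*" absorbs the base path
}
```

Under these assumptions, a pattern written relative to the partition path only matches a full path once something absorbs the base path, which is the role the prepended `*` appears to play in the diff.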




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1348: HUDI-597 Enable incremental pulling from defined partitions

2020-02-22 Thread GitBox
vinothchandar commented on a change in pull request #1348: HUDI-597 Enable 
incremental pulling from defined partitions
URL: https://github.com/apache/incubator-hudi/pull/1348#discussion_r382966705
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
 ##
 @@ -100,17 +101,22 @@ class IncrementalRelation(val sqlContext: SQLContext,
 .get, classOf[HoodieCommitMetadata])
   fileIdToFullPath ++= metadata.getFileIdAndFullPaths(basePath).toMap
 }
+val pathGlobPattern = optParams.getOrElse(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "")
+val filteredFullPath = if(!pathGlobPattern.equals("")) {
+  val globMatcher = new GlobPattern("*" + pathGlobPattern)
+  fileIdToFullPath.filter(p => globMatcher.matches(p._2))
+} else fileIdToFullPath
 
 Review comment:
   Please enclose the `else` branch in braces for readability.

