[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex

2020-01-06 Thread GitBox
cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] 
Remove PrunedInMemoryFileIndex and merge its functionality into 
InMemoryFileIndex
URL: https://github.com/apache/spark/pull/26850#discussion_r363620212
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
 ##
 @@ -68,8 +70,11 @@ class InMemoryFileIndex(
   refresh0()
 
   override def partitionSpec(): PartitionSpec = {
+if (userSpecifiedPartitionSpec.isDefined) {
+  cachedPartitionSpec = userSpecifiedPartitionSpec.get
+}
 if (cachedPartitionSpec == null) {
-  cachedPartitionSpec = inferPartitioning()
+cachedPartitionSpec = inferPartitioning()
 
 Review comment:
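   Why not put the `if (userSpecifiedPartitionSpec.isDefined)` check inside this `if (cachedPartitionSpec == null)` block? As written, the user-specified spec is reassigned on every call.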
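
   A minimal sketch of the nesting suggested above. It is self-contained and does not use Spark: `CacheSketch`, `FileIndexSketch`, the simplified `PartitionSpec`, and the `inferCalls` counter are hypothetical stand-ins introduced for illustration only. The idea is that the user-specified spec is consulted only when nothing is cached yet, so the assignment happens once.

```scala
object CacheSketch {
  // Hypothetical stand-in for Spark's PartitionSpec; kept tiny so the sketch is self-contained.
  final case class PartitionSpec(partitions: Seq[String])

  class FileIndexSketch(userSpecifiedPartitionSpec: Option[PartitionSpec]) {
    @volatile private var cachedPartitionSpec: PartitionSpec = null

    // Counts inference runs so the caching behaviour can be observed (illustration only).
    var inferCalls: Int = 0

    // Placeholder for the real (expensive) partition inference.
    private def inferPartitioning(): PartitionSpec = {
      inferCalls += 1
      PartitionSpec(Seq("inferred"))
    }

    def partitionSpec(): PartitionSpec = {
      if (cachedPartitionSpec == null) {
        // Only decide between the user-specified and the inferred spec on the first call.
        if (userSpecifiedPartitionSpec.isDefined) {
          cachedPartitionSpec = userSpecifiedPartitionSpec.get
        } else {
          cachedPartitionSpec = inferPartitioning()
        }
      }
      cachedPartitionSpec
    }
  }
}
```

   With this shape, a user-specified spec never triggers inference, and an inferred spec is computed at most once per instance.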


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex

2020-01-05 Thread GitBox
URL: https://github.com/apache/spark/pull/26850#discussion_r363178201
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
 ##
 @@ -68,6 +70,9 @@ class InMemoryFileIndex(
   refresh0()
 
   override def partitionSpec(): PartitionSpec = {
+if (userSpecifiedPartitionSpec.isDefined) {
+  return userSpecifiedPartitionSpec.get
 
 Review comment:
   How about `cachedPartitionSpec = userSpecifiedPartitionSpec.get`? Then the code flow is more consistent, and we will log the partition spec as well.
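
   A rough sketch of why assigning to the cache differs from an early `return`. It does not use Spark: `LogSketch`, `LoggingIndexSketch`, the simplified `PartitionSpec`, and the `logged` buffer (a stand-in for a real logger) are hypothetical, and the assumption that the method logs the spec before returning it is mine rather than something shown in the hunk.

```scala
import scala.collection.mutable.ListBuffer

object LogSketch {
  // Hypothetical stand-in for Spark's PartitionSpec.
  final case class PartitionSpec(partitions: Seq[String])

  class LoggingIndexSketch(userSpecifiedPartitionSpec: Option[PartitionSpec]) {
    private var cachedPartitionSpec: PartitionSpec = null

    // Stand-in for a logger, so the effect of the single exit path can be inspected.
    val logged: ListBuffer[String] = ListBuffer.empty

    // Placeholder for the real partition inference.
    private def inferPartitioning(): PartitionSpec = PartitionSpec(Seq("inferred"))

    def partitionSpec(): PartitionSpec = {
      // Assign instead of returning early, so both cases fall through to the log line below.
      if (userSpecifiedPartitionSpec.isDefined) {
        cachedPartitionSpec = userSpecifiedPartitionSpec.get
      }
      if (cachedPartitionSpec == null) {
        cachedPartitionSpec = inferPartitioning()
      }
      logged += s"Partition spec: $cachedPartitionSpec"
      cachedPartitionSpec
    }
  }
}
```

   With an early `return`, the user-specified case would skip the log line entirely; with the assignment, both cases share one exit path.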





[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex

2020-01-05 Thread GitBox
URL: https://github.com/apache/spark/pull/26850#discussion_r363163140
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
 ##
 @@ -67,7 +69,18 @@ class InMemoryFileIndex(
 
   refresh0()
 
+//  override def metadataOpsTimeNs: Option[Long] = {
 
 Review comment:
   Can we remove this commented-out code?





[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex

2020-01-01 Thread GitBox
URL: https://github.com/apache/spark/pull/26850#discussion_r362383631
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala
 ##
 @@ -81,8 +81,9 @@ class CatalogFileIndex(
   }
   val partitionSpec = PartitionSpec(partitionSchema, partitions)
   val timeNs = System.nanoTime() - startTime
-  new PrunedInMemoryFileIndex(
-    sparkSession, new Path(baseLocation.get), fileStatusCache, partitionSpec, Option(timeNs))
+  new InMemoryFileIndex(sparkSession, partitionSpec.partitions.map(_.path), Map.empty,
 
 Review comment:
   nit: can we use named parameters?
   ```
   new InMemoryFileIndex(
     sparkSession,
     rootPathsSpecified = partitionSpec.partitions.map(_.path),
     ...
   ```
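
   To illustrate the general point about named arguments, here is a small standalone sketch. `NamedArgsDemo`, `IndexSketch`, and the simplified `Path` are hypothetical and do not reproduce the real `InMemoryFileIndex` signature; the argument values are made up for the example.

```scala
object NamedArgsDemo {
  // Hypothetical stand-in for a filesystem path type.
  final case class Path(value: String)

  // Hypothetical class with several same-shaped parameters, where positional calls get hard to read.
  class IndexSketch(
      val rootPathsSpecified: Seq[Path],
      val parameters: Map[String, String],
      val userSpecifiedSchema: Option[String],
      val metadataOpsTimeNs: Option[Long] = None)

  // Named arguments make it clear which value feeds which parameter,
  // in particular for literals such as Map.empty and None.
  val index = new IndexSketch(
    rootPathsSpecified = Seq(Path("/data/p=1")),
    parameters = Map.empty,
    userSpecifiedSchema = None,
    metadataOpsTimeNs = Some(42L))
}
```

   Named arguments also keep call sites stable if defaulted parameters are later added or reordered.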





[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex

2020-01-01 Thread GitBox
URL: https://github.com/apache/spark/pull/26850#discussion_r362383376
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
 ##
 @@ -50,7 +50,9 @@ class InMemoryFileIndex(
 rootPathsSpecified: Seq[Path],
 parameters: Map[String, String],
 userSpecifiedSchema: Option[StructType],
-fileStatusCache: FileStatusCache = NoopCache)
+fileStatusCache: FileStatusCache = NoopCache,
+userSpecifiedPartitionSpec: Option[PartitionSpec] = None,
+_metadataOpsTimeNs: Option[Long] = None)
 
 Review comment:
   We can put `metadataOpsTimeNs` directly here, since `super.metadataOpsTimeNs` just returns `None`.
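
   One way to read this suggestion, sketched without Spark: a constructor parameter declared as `override val` can replace the separate `_metadataOpsTimeNs` field. `OverrideSketch`, `FileIndexLike`, and `InMemoryIndexSketch` are hypothetical names; the only assumption taken from the comment is that the parent's `metadataOpsTimeNs` returns `None` by default.

```scala
object OverrideSketch {
  // Hypothetical parent whose default matches the behaviour described in the comment.
  trait FileIndexLike {
    def metadataOpsTimeNs: Option[Long] = None
  }

  // The constructor parameter overrides the parent's def directly,
  // so no underscore-prefixed field or forwarding method is needed.
  class InMemoryIndexSketch(
      override val metadataOpsTimeNs: Option[Long] = None) extends FileIndexLike
}
```

   Because the default is `None`, callers that do not pass a value see the same result as the parent's implementation.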

