[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
URL: https://github.com/apache/spark/pull/26850#discussion_r363620212

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ##

@@ -68,8 +70,11 @@ class InMemoryFileIndex(
 refresh0()
 override def partitionSpec(): PartitionSpec = {
+if (userSpecifiedPartitionSpec.isDefined) {
+ cachedPartitionSpec = userSpecifiedPartitionSpec.get
+}
 if (cachedPartitionSpec == null) {
- cachedPartitionSpec = inferPartitioning()
+cachedPartitionSpec = inferPartitioning()

Review comment:
why not put the `if (userSpecifiedPartitionSpec.isDefined)` inside this if?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
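The restructuring asked about in this comment can be sketched as follows. This is a minimal, self-contained model and not Spark's actual `InMemoryFileIndex`: `SketchPartitionSpec`, `SketchFileIndex`, and the `inferCalls` counter are hypothetical names introduced only for illustration. The user-specified spec is consulted only while the cache is still empty, so the field is assigned once instead of on every call.

```scala
// Hypothetical stand-in for Spark's PartitionSpec (the real class is richer).
final case class SketchPartitionSpec(partitionColumns: Seq[String])

// Simplified, hypothetical model of the caching logic under review.
class SketchFileIndex(userSpecifiedPartitionSpec: Option[SketchPartitionSpec]) {
  // Counts how often the (assumed expensive) inference step runs.
  var inferCalls: Int = 0

  private var cachedPartitionSpec: SketchPartitionSpec = null

  private def inferPartitioning(): SketchPartitionSpec = {
    inferCalls += 1
    SketchPartitionSpec(Seq("inferred"))
  }

  def partitionSpec(): SketchPartitionSpec = {
    if (cachedPartitionSpec == null) {
      // The user-specified spec wins; inference runs only when none was given.
      cachedPartitionSpec = userSpecifiedPartitionSpec.getOrElse(inferPartitioning())
    }
    cachedPartitionSpec
  }
}
```

Because `getOrElse` takes its argument by name, `inferPartitioning()` is not evaluated when a user-specified spec is present.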
[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
URL: https://github.com/apache/spark/pull/26850#discussion_r363178201

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ##

@@ -68,6 +70,9 @@ class InMemoryFileIndex(
 refresh0()
 override def partitionSpec(): PartitionSpec = {
+if (userSpecifiedPartitionSpec.isDefined) {
+ return userSpecifiedPartitionSpec.get

Review comment:
how about `cachedPartitionSpec = userSpecifiedPartitionSpec.get`? then the code flow is more consistent and we will log the partition spec as well.
[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
URL: https://github.com/apache/spark/pull/26850#discussion_r363163140

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ##

@@ -67,7 +69,18 @@ class InMemoryFileIndex(
 refresh0()
+// override def metadataOpsTimeNs: Option[Long] = {

Review comment:
can we remove it?
[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
URL: https://github.com/apache/spark/pull/26850#discussion_r362383631

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala ##

@@ -81,8 +81,9 @@ class CatalogFileIndex(
 }
 val partitionSpec = PartitionSpec(partitionSchema, partitions)
 val timeNs = System.nanoTime() - startTime
- new PrunedInMemoryFileIndex(
-sparkSession, new Path(baseLocation.get), fileStatusCache, partitionSpec, Option(timeNs))
+ new InMemoryFileIndex(sparkSession, partitionSpec.partitions.map(_.path), Map.empty,

Review comment:
nit: can we use named parameter?
```
new InMemoryFileIndex(
  sparkSession,
  rootPathsSpecified = partitionSpec.partitions.map(_.path),
  ...
```
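A small sketch of why named arguments help at a call site like this one. `SketchIndexArgs` is a hypothetical, simplified stand-in whose parameter names mirror those visible in this thread; the `String` types, the example path, and the omission of `sparkSession` and `fileStatusCache` are assumptions made to keep the example self-contained.

```scala
// Hypothetical, simplified constructor; not the real InMemoryFileIndex signature.
final case class SketchIndexArgs(
    rootPathsSpecified: Seq[String],
    parameters: Map[String, String],
    userSpecifiedSchema: Option[String],
    userSpecifiedPartitionSpec: Option[String] = None,
    metadataOpsTimeNs: Option[Long] = None)

object NamedArgsSketch {
  // Positional call: the reader must count arguments to see what each None means.
  val positional: SketchIndexArgs =
    SketchIndexArgs(Seq("/data/p=1"), Map.empty, None, None, Some(42L))

  // Named call: each argument is labelled, and defaulted parameters can be skipped.
  val named: SketchIndexArgs = SketchIndexArgs(
    rootPathsSpecified = Seq("/data/p=1"),
    parameters = Map.empty,
    userSpecifiedSchema = None,
    metadataOpsTimeNs = Some(42L))
}
```

Both calls build the same value; the named form simply makes the intent of `Map.empty` and the `None` arguments visible without looking up the constructor.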
[GitHub] [spark] cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
cloud-fan commented on a change in pull request #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
URL: https://github.com/apache/spark/pull/26850#discussion_r362383376

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ##

@@ -50,7 +50,9 @@ class InMemoryFileIndex(
 rootPathsSpecified: Seq[Path],
 parameters: Map[String, String],
 userSpecifiedSchema: Option[StructType],
-fileStatusCache: FileStatusCache = NoopCache)
+fileStatusCache: FileStatusCache = NoopCache,
+userSpecifiedPartitionSpec: Option[PartitionSpec] = None,
+_metadataOpsTimeNs: Option[Long] = None)

Review comment:
We can put `metadataOpsTimeNs` directly here. `super.metadataOpsTimeNs` just returns none.
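One plausible reading of this suggestion is to let the constructor parameter itself override the inherited member, rather than taking an underscore-prefixed parameter and forwarding it. The sketch below assumes a hypothetical base trait `SketchIndexBase` whose default returns `None`, as the comment describes; `ForwardingIndex` and `DirectIndex` are illustrative names, not classes from Spark.

```scala
// Hypothetical base trait; per the review comment, the default returns None.
trait SketchIndexBase {
  def metadataOpsTimeNs: Option[Long] = None
}

// Assumed shape of the indirect pattern: a renamed parameter plus a forwarding override.
class ForwardingIndex(_metadataOpsTimeNs: Option[Long] = None) extends SketchIndexBase {
  override def metadataOpsTimeNs: Option[Long] = _metadataOpsTimeNs
}

// Direct pattern: the constructor parameter overrides the member itself.
class DirectIndex(override val metadataOpsTimeNs: Option[Long] = None)
  extends SketchIndexBase
```

Both classes behave the same for callers; the direct form drops the extra name and the forwarding method.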