[GitHub] [spark] AmplabJenkins removed a comment on issue #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
AmplabJenkins removed a comment on issue #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#issuecomment-594176426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23996/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27752: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize
SparkQA commented on issue #27752: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize URL: https://github.com/apache/spark/pull/27752#issuecomment-594176778 **[Test build #119236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119236/testReport)** for PR 27752 at commit [`654d9d3`](https://github.com/apache/spark/commit/654d9d33bbffdaee7d818f938dd6a1f271208c0d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-594176386 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-594176398 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23995/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-594176386 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
AmplabJenkins commented on issue #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#issuecomment-594176426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23996/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
AmplabJenkins commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594176363 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
AmplabJenkins removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594176377 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23993/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper
AmplabJenkins commented on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper URL: https://github.com/apache/spark/pull/27593#issuecomment-594176297 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper
AmplabJenkins removed a comment on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper URL: https://github.com/apache/spark/pull/27593#issuecomment-594176309 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23994/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-594176398 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23995/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
AmplabJenkins removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594176363 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper
AmplabJenkins removed a comment on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper URL: https://github.com/apache/spark/pull/27593#issuecomment-594176297 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables
AmplabJenkins removed a comment on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables URL: https://github.com/apache/spark/pull/27776#issuecomment-594170070 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
AmplabJenkins commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594176377 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23993/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
AmplabJenkins commented on issue #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#issuecomment-594176417 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper
AmplabJenkins commented on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper URL: https://github.com/apache/spark/pull/27593#issuecomment-594176309 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23994/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
SparkQA commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-594175789 **[Test build #119254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119254/testReport)** for PR 27571 at commit [`4c5b2a5`](https://github.com/apache/spark/commit/4c5b2a59574f59927b962f9657f82837f88db74b).
[GitHub] [spark] SparkQA commented on issue #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
SparkQA commented on issue #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#issuecomment-594175786 **[Test build #119255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119255/testReport)** for PR 27570 at commit [`2156bed`](https://github.com/apache/spark/commit/2156bed223ec28279fbaa18e2bc0f8c47ade7d0d).
[GitHub] [spark] SparkQA commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
SparkQA commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594175748 **[Test build #119253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119253/testReport)** for PR 27728 at commit [`77ea177`](https://github.com/apache/spark/commit/77ea177985516e05bf89e3c05a9c87050583).
[GitHub] [spark] zero323 commented on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper
zero323 commented on issue #27593: [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper URL: https://github.com/apache/spark/pull/27593#issuecomment-594175502 Retest this please.
[GitHub] [spark] dongjoon-hyun commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
dongjoon-hyun commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594175144 @dbtsai . You can address the above comments in the spin-off PR. - https://github.com/apache/spark/pull/27778
[GitHub] [spark] zero323 commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
zero323 commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-594175279 Retest this please.
[GitHub] [spark] dongjoon-hyun commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
dongjoon-hyun commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594175454 Retest this please.
[GitHub] [spark] dongjoon-hyun edited a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
dongjoon-hyun edited a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594175144 @dbtsai . You can address the above two comments in the spin-off PR. - https://github.com/apache/spark/pull/27778
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
dongjoon-hyun commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#discussion_r387299506

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ##
@@ -437,61 +437,74 @@ object DataSourceStrategy {
     }
   }

+  /**
+   * Find the column name of an expression that can be pushed down.
+   */
+  private[sql] def pushDownColName(e: Expression): Option[String] = {
+    import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper
+    def helper(e: Expression): Option[Seq[String]] = e match {
+      case a: Attribute => Some(Seq(a.name))
+      case s: GetStructField => helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+      case _ => None
+    }
+    helper(e).map(_.quoted)
+  }
+
   private def translateLeafNodeFilter(predicate: Expression): Option[Filter] = predicate match {
-    case expressions.EqualTo(a: Attribute, Literal(v, t)) =>
-      Some(sources.EqualTo(a.name, convertToScala(v, t)))
-    case expressions.EqualTo(Literal(v, t), a: Attribute) =>
-      Some(sources.EqualTo(a.name, convertToScala(v, t)))
-
-    case expressions.EqualNullSafe(a: Attribute, Literal(v, t)) =>
-      Some(sources.EqualNullSafe(a.name, convertToScala(v, t)))
-    case expressions.EqualNullSafe(Literal(v, t), a: Attribute) =>
-      Some(sources.EqualNullSafe(a.name, convertToScala(v, t)))
-
-    case expressions.GreaterThan(a: Attribute, Literal(v, t)) =>
-      Some(sources.GreaterThan(a.name, convertToScala(v, t)))
-    case expressions.GreaterThan(Literal(v, t), a: Attribute) =>
-      Some(sources.LessThan(a.name, convertToScala(v, t)))
-
-    case expressions.LessThan(a: Attribute, Literal(v, t)) =>
-      Some(sources.LessThan(a.name, convertToScala(v, t)))
-    case expressions.LessThan(Literal(v, t), a: Attribute) =>
-      Some(sources.GreaterThan(a.name, convertToScala(v, t)))
-
-    case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, t)) =>
-      Some(sources.GreaterThanOrEqual(a.name, convertToScala(v, t)))
-    case expressions.GreaterThanOrEqual(Literal(v, t), a: Attribute) =>
-      Some(sources.LessThanOrEqual(a.name, convertToScala(v, t)))
-
-    case expressions.LessThanOrEqual(a: Attribute, Literal(v, t)) =>
-      Some(sources.LessThanOrEqual(a.name, convertToScala(v, t)))
-    case expressions.LessThanOrEqual(Literal(v, t), a: Attribute) =>
-      Some(sources.GreaterThanOrEqual(a.name, convertToScala(v, t)))
-
-    case expressions.InSet(a: Attribute, set) =>
-      val toScala = CatalystTypeConverters.createToScalaConverter(a.dataType)
-      Some(sources.In(a.name, set.toArray.map(toScala)))
+    case expressions.EqualTo(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.EqualTo(_, convertToScala(v, t)))
+    case expressions.EqualTo(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.EqualTo(_, convertToScala(v, t)))
+
+    case expressions.EqualNullSafe(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.EqualNullSafe(_, convertToScala(v, t)))
+    case expressions.EqualNullSafe(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.EqualNullSafe(_, convertToScala(v, t)))
+
+    case expressions.GreaterThan(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.GreaterThan(_, convertToScala(v, t)))
+    case expressions.GreaterThan(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.LessThan(_, convertToScala(v, t)))
+
+    case expressions.LessThan(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.LessThan(_, convertToScala(v, t)))
+    case expressions.LessThan(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.GreaterThan(_, convertToScala(v, t)))
+
+    case expressions.GreaterThanOrEqual(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.GreaterThanOrEqual(_, convertToScala(v, t)))
+    case expressions.GreaterThanOrEqual(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.LessThanOrEqual(_, convertToScala(v, t)))
+
+    case expressions.LessThanOrEqual(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.LessThanOrEqual(_, convertToScala(v, t)))
+    case expressions.LessThanOrEqual(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.GreaterThanOrEqual(_, convertToScala(v, t)))
+
+    case expressions.InSet(e: Expression, set) =>
+      val toScala = CatalystTypeConverters.createToScalaConverter(e.dataType)
+      pushDownColName(e).map(sources.In(_, set.toArray.map(toScala)))

     // Because we only convert In to InSet in Optimizer when there are more than certain
     // items. So it is possible we still get an In expression here that needs to be pushed
     // down.
-    case
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
dongjoon-hyun commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#discussion_r387298806

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ##
@@ -437,61 +437,74 @@ object DataSourceStrategy {
     }
   }

+  /**
+   * Find the column name of an expression that can be pushed down.
+   */
+  private[sql] def pushDownColName(e: Expression): Option[String] = {
+    import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper
+    def helper(e: Expression): Option[Seq[String]] = e match {
+      case a: Attribute => Some(Seq(a.name))
+      case s: GetStructField => helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+      case _ => None
+    }
+    helper(e).map(_.quoted)
+  }
+
   private def translateLeafNodeFilter(predicate: Expression): Option[Filter] = predicate match {
-    case expressions.EqualTo(a: Attribute, Literal(v, t)) =>
-      Some(sources.EqualTo(a.name, convertToScala(v, t)))
-    case expressions.EqualTo(Literal(v, t), a: Attribute) =>
-      Some(sources.EqualTo(a.name, convertToScala(v, t)))
-
-    case expressions.EqualNullSafe(a: Attribute, Literal(v, t)) =>
-      Some(sources.EqualNullSafe(a.name, convertToScala(v, t)))
-    case expressions.EqualNullSafe(Literal(v, t), a: Attribute) =>
-      Some(sources.EqualNullSafe(a.name, convertToScala(v, t)))
-
-    case expressions.GreaterThan(a: Attribute, Literal(v, t)) =>
-      Some(sources.GreaterThan(a.name, convertToScala(v, t)))
-    case expressions.GreaterThan(Literal(v, t), a: Attribute) =>
-      Some(sources.LessThan(a.name, convertToScala(v, t)))
-
-    case expressions.LessThan(a: Attribute, Literal(v, t)) =>
-      Some(sources.LessThan(a.name, convertToScala(v, t)))
-    case expressions.LessThan(Literal(v, t), a: Attribute) =>
-      Some(sources.GreaterThan(a.name, convertToScala(v, t)))
-
-    case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, t)) =>
-      Some(sources.GreaterThanOrEqual(a.name, convertToScala(v, t)))
-    case expressions.GreaterThanOrEqual(Literal(v, t), a: Attribute) =>
-      Some(sources.LessThanOrEqual(a.name, convertToScala(v, t)))
-
-    case expressions.LessThanOrEqual(a: Attribute, Literal(v, t)) =>
-      Some(sources.LessThanOrEqual(a.name, convertToScala(v, t)))
-    case expressions.LessThanOrEqual(Literal(v, t), a: Attribute) =>
-      Some(sources.GreaterThanOrEqual(a.name, convertToScala(v, t)))
-
-    case expressions.InSet(a: Attribute, set) =>
-      val toScala = CatalystTypeConverters.createToScalaConverter(a.dataType)
-      Some(sources.In(a.name, set.toArray.map(toScala)))
+    case expressions.EqualTo(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.EqualTo(_, convertToScala(v, t)))
+    case expressions.EqualTo(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.EqualTo(_, convertToScala(v, t)))
+
+    case expressions.EqualNullSafe(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.EqualNullSafe(_, convertToScala(v, t)))
+    case expressions.EqualNullSafe(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.EqualNullSafe(_, convertToScala(v, t)))
+
+    case expressions.GreaterThan(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.GreaterThan(_, convertToScala(v, t)))
+    case expressions.GreaterThan(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.LessThan(_, convertToScala(v, t)))
+
+    case expressions.LessThan(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.LessThan(_, convertToScala(v, t)))
+    case expressions.LessThan(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.GreaterThan(_, convertToScala(v, t)))
+
+    case expressions.GreaterThanOrEqual(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.GreaterThanOrEqual(_, convertToScala(v, t)))
+    case expressions.GreaterThanOrEqual(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.LessThanOrEqual(_, convertToScala(v, t)))
+
+    case expressions.LessThanOrEqual(e: Expression, Literal(v, t)) =>
+      pushDownColName(e).map(sources.LessThanOrEqual(_, convertToScala(v, t)))
+    case expressions.LessThanOrEqual(Literal(v, t), e: Expression) =>
+      pushDownColName(e).map(sources.GreaterThanOrEqual(_, convertToScala(v, t)))
+
+    case expressions.InSet(e: Expression, set) =>
+      val toScala = CatalystTypeConverters.createToScalaConverter(e.dataType)
+      pushDownColName(e).map(sources.In(_, set.toArray.map(toScala)))

Review comment:
If you don't mind, can we rewrite this like the following to prevent a potential minor regression? The above new code executes `CatalystTypeConverters.createToScalaConverter` for all expressions while
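The reviewer's suggested rewrite is cut off in this archive, so the following is not that proposal. It is only a minimal, self-contained sketch of one plausible reading of the concern: resolve the (possibly nested) column name first, and build the value converter only when a name was actually found. `Expr`, `Attr`, `GetField`, `Lit`, `PushDownSketch`, `buildConverter`, and the quoting rule are hypothetical stand-ins invented for illustration; none of them are Spark classes or Spark's actual quoting logic.

```scala
// Toy stand-ins for Catalyst expressions (hypothetical, NOT Spark's classes).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class GetField(child: Expr, fieldName: String) extends Expr
final case class Lit(value: Any) extends Expr

object PushDownSketch {
  // Assumed quoting rule: wrap a name part in backticks only if it contains
  // a dot or a backtick, doubling any embedded backticks.
  private def quote(part: String): String =
    if (part.contains(".") || part.contains("`")) "`" + part.replace("`", "``") + "`"
    else part

  // Walk a chain of field accesses down to a top-level attribute and
  // join the collected name parts with dots; anything else is not pushable.
  def pushDownColName(e: Expr): Option[String] = {
    def helper(e: Expr): Option[Seq[String]] = e match {
      case Attr(name)                 => Some(Seq(name))
      case GetField(child, fieldName) => helper(child).map(_ :+ fieldName)
      case _                          => None
    }
    helper(e).map(_.map(quote).mkString("."))
  }

  // Counts how often the (pretend-expensive) converter is built.
  var converterBuilds: Int = 0
  def buildConverter(): Any => Any = {
    converterBuilds += 1
    (v: Any) => v
  }

  // Eager variant: the converter is built even when no column name resolves.
  def eagerIn(e: Expr, values: Seq[Any]): Option[(String, Seq[Any])] = {
    val conv = buildConverter()
    pushDownColName(e).map(name => (name, values.map(conv)))
  }

  // Lazy variant: the converter is built only inside the successful branch.
  def lazyIn(e: Expr, values: Seq[Any]): Option[(String, Seq[Any])] =
    pushDownColName(e).map { name =>
      val conv = buildConverter()
      (name, values.map(conv))
    }
}
```

In this toy model, `PushDownSketch.lazyIn(Lit(1), Seq(1, 2))` returns `None` without touching `buildConverter`, whereas `eagerIn` builds a converter first and then discards it.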
[GitHub] [spark] AmplabJenkins commented on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables
AmplabJenkins commented on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables URL: https://github.com/apache/spark/pull/27776#issuecomment-594170082 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119249/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables
AmplabJenkins commented on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables URL: https://github.com/apache/spark/pull/27776#issuecomment-594170070 Merged build finished. Test FAILed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
dongjoon-hyun commented on a change in pull request #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
URL: https://github.com/apache/spark/pull/27778#discussion_r387294541

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ##

@@ -437,61 +437,72 @@ object DataSourceStrategy {
     }
   }
 
+  /**
+   * Find the column name of an expression that can be pushed down.
+   */
+  private[sql] def pushDownColName(e: Expression): Option[String] = {
+    def helper(e: Expression): Option[Seq[String]] = e match {
+      case a: Attribute => Some(Seq(a.name))
+      case _ => None
+    }
+    helper(e).flatMap(_.headOption)

Review comment:
Although I know the background, shall we write it in the following simpler way in this PR?
```scala
def helper(e: Expression): Option[String] = e match {
  case a: Attribute => Some(a.name)
  case _ => None
}
helper(e)
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
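To make the shape of the proposed helper concrete, here is a minimal sketch of how a `pushDownColName`-style function can gate filter translation, including the operator flip when the literal sits on the left. It does not use Spark: `Expr`, `Attr`, `Lit`, `Add`, `GreaterThan`, `GreaterThanFilter`, `LessThanFilter` and `translate` are hypothetical stand-ins invented for illustration, not the Catalyst or `sources` classes from the PR.

```scala
// Hypothetical stand-ins for Catalyst expressions (assumption: not Spark's classes).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class GreaterThan(left: Expr, right: Expr) extends Expr

// Hypothetical stand-ins for data source filters.
sealed trait Filter
final case class GreaterThanFilter(column: String, value: Any) extends Filter
final case class LessThanFilter(column: String, value: Any) extends Filter

object PushDownSketch {
  // Only a plain attribute yields a column name; anything else is not pushed down.
  def pushDownColName(e: Expr): Option[String] = e match {
    case Attr(name) => Some(name)
    case _ => None
  }

  def translate(predicate: Expr): Option[Filter] = predicate match {
    // column > literal keeps the operator.
    case GreaterThan(e, Lit(v)) =>
      pushDownColName(e).map(GreaterThanFilter(_, v))
    // literal > column is the same as column < literal, so the operator flips.
    case GreaterThan(Lit(v), e) =>
      pushDownColName(e).map(LessThanFilter(_, v))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(translate(GreaterThan(Attr("a"), Lit(5))))
    println(translate(GreaterThan(Lit(5), Attr("a"))))
    println(translate(GreaterThan(Add(Attr("a"), Lit(1)), Lit(5))))
  }
}
```

Returning `Option[String]` from the helper lets each case stay a one-line `map`: an expression with no column name simply yields `None` and no filter is produced.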
[GitHub] [spark] SparkQA removed a comment on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables
SparkQA removed a comment on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables URL: https://github.com/apache/spark/pull/27776#issuecomment-594116491 **[Test build #119249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119249/testReport)** for PR 27776 at commit [`e588663`](https://github.com/apache/spark/commit/e5886637eed1166f5b9abbbe669709573ce289ce).
[GitHub] [spark] SparkQA commented on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables
SparkQA commented on issue #27776: [SPARK-31024][SQL] Allow specifying session catalog name `spark_catalog` in qualified column names for v1 tables URL: https://github.com/apache/spark/pull/27776#issuecomment-594169684 **[Test build #119249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119249/testReport)** for PR 27776 at commit [`e588663`](https://github.com/apache/spark/commit/e5886637eed1166f5b9abbbe669709573ce289ce). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] tgravescs commented on issue #27773: [SPARK-29154][CORE] Update Spark scheduler for stage level scheduling
tgravescs commented on issue #27773: [SPARK-29154][CORE] Update Spark scheduler for stage level scheduling URL: https://github.com/apache/spark/pull/27773#issuecomment-594169369 @mridulm @squito if either of you have time to review
[GitHub] [spark] AmplabJenkins removed a comment on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv`
AmplabJenkins removed a comment on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv` URL: https://github.com/apache/spark/pull/27771#issuecomment-594164395 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119239/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
AmplabJenkins removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594164391 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119248/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
AmplabJenkins removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594164378 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
SparkQA removed a comment on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594112894 **[Test build #119248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119248/testReport)** for PR 27728 at commit [`77ea177`](https://github.com/apache/spark/commit/77ea177985516e05bf89e3c05a9c87050583).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv`
AmplabJenkins removed a comment on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv` URL: https://github.com/apache/spark/pull/27771#issuecomment-594164383 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv`
AmplabJenkins commented on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv` URL: https://github.com/apache/spark/pull/27771#issuecomment-594164383 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
AmplabJenkins commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594164391 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119248/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv`
AmplabJenkins commented on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv` URL: https://github.com/apache/spark/pull/27771#issuecomment-594164395 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119239/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
AmplabJenkins commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594164378 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv`
SparkQA removed a comment on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv` URL: https://github.com/apache/spark/pull/27771#issuecomment-594045806 **[Test build #119239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119239/testReport)** for PR 27771 at commit [`a72568e`](https://github.com/apache/spark/commit/a72568ef7560e0996a012235873cc5bc395ec364).
[GitHub] [spark] SparkQA commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
SparkQA commented on issue #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#issuecomment-594164190 **[Test build #119248 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119248/testReport)** for PR 27728 at commit [`77ea177`](https://github.com/apache/spark/commit/77ea177985516e05bf89e3c05a9c87050583). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv`
SparkQA commented on issue #27771: [SPARK-31020][SQL] Support foldable schemas by `from_csv` URL: https://github.com/apache/spark/pull/27771#issuecomment-594164032 **[Test build #119239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119239/testReport)** for PR 27771 at commit [`a72568e`](https://github.com/apache/spark/commit/a72568ef7560e0996a012235873cc5bc395ec364). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594162321 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594162329 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23992/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594162329 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23992/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594162321 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
SparkQA commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594161759 **[Test build #119252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119252/testReport)** for PR 27778 at commit [`ea2d1f6`](https://github.com/apache/spark/commit/ea2d1f6bbe6e57424097cc3b5c80fe0a6e90afe2).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
dongjoon-hyun commented on a change in pull request #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
URL: https://github.com/apache/spark/pull/27778#discussion_r387278397

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala ##

@@ -22,68 +22,82 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.sources
 import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
 
 class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
+  val attrInts = Seq(
+    'cint.int,
+  ).zip(Seq(
+    "cint",
+  ))
 
-  test("translate simple expression") {
-    val attrInt = 'cint.int
-    val attrStr = 'cstr.string
+  val attrStrs = Seq(
+    'cstr.int,
+  ).zip(Seq(
+    "cstr",
+  ))
+
+  test("translate simple expression") { attrInts.zip(attrStrs)

Review comment:
~Indentation?~ Never mind.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
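For readers unfamiliar with the fixture pattern in the diff above, the following sketch shows what zipping attributes with their expected column names produces, and what a second `zip` across the two fixture lists yields. `Attr` and `ZipFixtures` are hypothetical stand-ins for illustration; the real suite uses Catalyst attributes built with the `'cint.int` DSL.

```scala
// Hypothetical stand-in for a Catalyst attribute (assumption: not Spark's class).
final case class Attr(name: String)

object ZipFixtures {
  // Each fixture pairs an attribute with the column name a filter is expected to carry.
  val attrInts: Seq[(Attr, String)] = Seq(Attr("cint")).zip(Seq("cint"))
  val attrStrs: Seq[(Attr, String)] = Seq(Attr("cstr")).zip(Seq("cstr"))

  // Zipping the two lists gives one test case per (int fixture, string fixture) pair.
  val cases: Seq[((Attr, String), (Attr, String))] = attrInts.zip(attrStrs)

  def main(args: Array[String]): Unit =
    cases.foreach { case ((attrInt, intColName), (attrStr, strColName)) =>
      println(s"$attrInt -> $intColName, $attrStr -> $strColName")
    }
}
```

Adding another attribute and expected name to each list would extend the same test body to the new case without further changes, which is presumably the point of the refactoring.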
[GitHub] [spark] AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594156029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119251/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594156019 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
SparkQA removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594154950 **[Test build #119251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119251/testReport)** for PR 27778 at commit [`56c56a0`](https://github.com/apache/spark/commit/56c56a0e36fd8e610d9cb525e2cd9f8f08ba99ca).
[GitHub] [spark] SparkQA commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
SparkQA commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594156000 **[Test build #119251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119251/testReport)** for PR 27778 at commit [`56c56a0`](https://github.com/apache/spark/commit/56c56a0e36fd8e610d9cb525e2cd9f8f08ba99ca). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594156029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119251/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594156019 Merged build finished. Test FAILed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
dongjoon-hyun commented on a change in pull request #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
URL: https://github.com/apache/spark/pull/27778#discussion_r387278397

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala ##

@@ -22,68 +22,82 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.sources
 import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
 
 class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
+  val attrInts = Seq(
+    'cint.int,
+  ).zip(Seq(
+    "cint",
+  ))
 
-  test("translate simple expression") {
-    val attrInt = 'cint.int
-    val attrStr = 'cstr.string
+  val attrStrs = Seq(
+    'cstr.int,
+  ).zip(Seq(
+    "cstr",
+  ))
+
+  test("translate simple expression") { attrInts.zip(attrStrs)

Review comment:
Indentation?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594151900 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
AmplabJenkins removed a comment on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594151910 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23991/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
SparkQA commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594154950 **[Test build #119251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119251/testReport)** for PR 27778 at commit [`56c56a0`](https://github.com/apache/spark/commit/56c56a0e36fd8e610d9cb525e2cd9f8f08ba99ca).
[GitHub] [spark] dbtsai commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable
dbtsai commented on issue #27778: [SPARK-31027] [SQL] Refactor DataSourceStrategy to be more extendable URL: https://github.com/apache/spark/pull/27778#issuecomment-594153195 cc @dongjoon-hyun @gengliangwang @cloud-fan @rdblue
[GitHub] [spark] dongjoon-hyun commented on issue #27769: [SPARK-30998][SQL][2.4] ClassCastException when a generator having nested inner generators
dongjoon-hyun commented on issue #27769: [SPARK-30998][SQL][2.4] ClassCastException when a generator having nested inner generators URL: https://github.com/apache/spark/pull/27769#issuecomment-594152999 Thank you, @maropu and @cloud-fan .
[GitHub] [spark] AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor `DataSourceStrategy.scala` to minimize the changes to support nested predicate pushdown
AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor `DataSourceStrategy.scala` to minimize the changes to support nested predicate pushdown URL: https://github.com/apache/spark/pull/27778#issuecomment-594151900 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor `DataSourceStrategy.scala` to minimize the changes to support nested predicate pushdown
AmplabJenkins commented on issue #27778: [SPARK-31027] [SQL] Refactor `DataSourceStrategy.scala` to minimize the changes to support nested predicate pushdown URL: https://github.com/apache/spark/pull/27778#issuecomment-594151910 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23991/ Test PASSed.
[GitHub] [spark] dbtsai opened a new pull request #27778: [SPARK-31027] [SQL] Refactor `DataSourceStrategy.scala` to minimize the changes to support nested predicate pushdown
dbtsai opened a new pull request #27778: [SPARK-31027] [SQL] Refactor `DataSourceStrategy.scala` to minimize the changes to support nested predicate pushdown URL: https://github.com/apache/spark/pull/27778 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce any user-facing change? ### How was this patch tested?
[GitHub] [spark] dongjoon-hyun closed pull request #27749: [SPARK-30997][SQL] Fix an analysis failure in generators with aggregate functions
dongjoon-hyun closed pull request #27749: [SPARK-30997][SQL] Fix an analysis failure in generators with aggregate functions URL: https://github.com/apache/spark/pull/27749
[GitHub] [spark] MaxGekk commented on a change in pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference from an example
MaxGekk commented on a change in pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference from an example
URL: https://github.com/apache/spark/pull/22666#discussion_r387269929

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala ##
@@ -19,14 +19,39 @@ package org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
-import org.apache.spark.sql.types.{MapType, StringType, StructType}
+import org.apache.spark.sql.types.{DataType, MapType, StringType, StructType}
+import org.apache.spark.unsafe.types.UTF8String
 object ExprUtils {
-  def evalSchemaExpr(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => StructType.fromDDL(s.toString)
+  def evalSchemaExpr(exp: Expression): StructType = {
+    // Use `DataType.fromDDL` since the type string can be struct<...>.
+    val dataType = exp match {
+      case Literal(s, StringType) =>
+        DataType.fromDDL(s.toString)
+      case e @ SchemaOfCsv(_: Literal, _) =>
+        val ddlSchema = e.eval(EmptyRow).asInstanceOf[UTF8String]
+        DataType.fromDDL(ddlSchema.toString)
+      case e => throw new AnalysisException(
+        "Schema should be specified in DDL format as a string literal or output of " +
+          s"the schema_of_csv function instead of ${e.sql}")
+    }
+
+    if (!dataType.isInstanceOf[StructType]) {
+      throw new AnalysisException(
+        s"Schema should be struct type but got ${dataType.sql}.")
+    }
+    dataType.asInstanceOf[StructType]
+  }
+
+  def evalTypeExpr(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) => DataType.fromDDL(s.toString)

Review comment:
For example, a column with a CSV string may be the result of string functions, so you could just invoke those functions with particular inputs. Currently, we force people to materialize an example and copy-paste it into `schema_of_csv()`. That could cause maintainability issues, because users must keep the example in `schema_of_csv()` in sync with the code that forms the CSV column. I prepared the PR https://github.com/apache/spark/pull/2 to avoid this restriction, which is not necessary from my point of view.
[GitHub] [spark] viirya commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
viirya commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#discussion_r387264452

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ##
@@ -49,15 +49,34 @@ class ParquetFilters(
     pushDownInFilterThreshold: Int,
     caseSensitive: Boolean) {
   // A map which contains parquet field name and data type, if predicate push down applies.
-  private val nameToParquetField : Map[String, ParquetField] = {
-    // Here we don't flatten the fields in the nested schema but just look up through
-    // root fields. Currently, accessing to nested fields does not push down filters
-    // and it does not support to create filters for them.
-    val primitiveFields =
-      schema.getFields.asScala.filter(_.isPrimitive).map(_.asPrimitiveType()).map { f =>
-        f.getName -> ParquetField(f.getName,
-          ParquetSchemaType(f.getOriginalType,
-            f.getPrimitiveTypeName, f.getTypeLength, f.getDecimalMetadata))
+  // The keys are the column names. For nested column, `dot` will be used as a separator.
+  // For column name that contains `dot`, backquote will be used.
+  // See `org.apache.spark.sql.connector.catalog.quote` for implementation details.
+  private val nameToParquetField : Map[String, ParquetPrimitiveField] = {
+    // Recursively traverse the parquet schema to get primitive fields that can be pushed-down.
+    // `parentFieldNames` is used to keep track of the current nested level when traversing.
+    def getPrimitiveFields(
+        fields: Seq[Type],
+        parentFieldNames: Array[String] = Array.empty): Seq[ParquetPrimitiveField] = {
+      fields.flatMap {
+        case p: PrimitiveType =>
+          Some(ParquetPrimitiveField(fieldNames = parentFieldNames :+ p.getName,
+            fieldType = ParquetSchemaType(p.getOriginalType,
+              p.getPrimitiveTypeName, p.getTypeLength, p.getDecimalMetadata)))
+        // Note that when g is a `Struct`, `g.getOriginalType` is `null`.
+        // When g is a `Map`, `g.getOriginalType` is `MAP`.
+        // When g is a `List`, `g.getOriginalType` is `LIST`.
+        case g: GroupType if g.getOriginalType == null =>
+          getPrimitiveFields(g.getFields.asScala, parentFieldNames :+ g.getName)
+        // Parquet only supports push-down for primitive types; as a result, Map and List types
+        // are removed.
+        case _ => None
+      }
+    }
+
+    val primitiveFields = getPrimitiveFields(schema.getFields.asScala).map { field =>
+      import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper

Review comment:
nit: move `import` outside `map`?
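Editorial sketch of the traversal discussed in the diff above, assuming a much-simplified schema model. `Field`, `Primitive`, `Group`, `primitivePaths`, `quoteIfNeeded`, and `key` are hypothetical names for illustration only; they are not the Parquet or Spark classes, and the quoting rule is a reduced version of what the diff attributes to `org.apache.spark.sql.connector.catalog.quote`. The idea shown is only the recursion: descend into struct-like groups while accumulating parent names, keep primitive leaves, and drop map/list groups.

```scala
object NestedFieldSketch {
  // Hypothetical, simplified stand-ins for a Parquet-like schema tree.
  sealed trait Field { def name: String }
  final case class Primitive(name: String) extends Field
  // isStruct = false models MAP and LIST groups, which are not traversed.
  final case class Group(name: String, isStruct: Boolean, children: Seq[Field]) extends Field

  // Collect the name path of every primitive leaf reachable through structs only.
  def primitivePaths(
      fields: Seq[Field],
      parents: Vector[String] = Vector.empty): Seq[Vector[String]] =
    fields.flatMap {
      case Primitive(name) => Seq(parents :+ name)
      case Group(name, true, children) => primitivePaths(children, parents :+ name)
      case _: Group => Seq.empty // map/list groups are skipped
    }

  // Simplified quoting (an assumption): backquote a name part only if it contains a dot.
  def quoteIfNeeded(part: String): String =
    if (part.contains(".")) "`" + part + "`" else part

  // Join the parts with dots to form the lookup key.
  def key(path: Seq[String]): String = path.map(quoteIfNeeded).mkString(".")
}
```

With a schema holding a top-level `id`, a struct `address` containing `city` and `zip.code`, and a list-like group `tags`, the keys produced this way would be `id`, `address.city`, and ``address.`zip.code` ``; nothing under `tags` is kept.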
[GitHub] [spark] viirya commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
viirya commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#discussion_r387261748

## File path: sql/core/src/main/java/org/apache/parquet/filter2/predicate/NestedFilterApi.java ##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.filter2.predicate;
+
+import org.apache.parquet.hadoop.metadata.ColumnPath;

Review comment:
nit: import order?
[GitHub] [spark] viirya commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
viirya commented on a change in pull request #27728: [SPARK-25556][SPARK-17636][SPARK-31026][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#discussion_r387266696

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ##
@@ -75,13 +94,13 @@ class ParquetFilters(
   }
   /**
-   * Holds a single field information stored in the underlying parquet file.
+   * Holds a single primitive field information stored in the underlying parquet file.
    *
-   * @param fieldName field name in parquet file
+   * @param fieldNames field names in parquet file

Review comment:
I think it still indicates a single field name? Though it could contain multiple identifiers.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-594145847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119234/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-594145833 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-594145847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119234/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-594145833 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-593991498 **[Test build #119234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119234/testReport)** for PR 26918 at commit [`6476b62`](https://github.com/apache/spark/commit/6476b62667b2a38cabf44c2fad447c4bab9005d5).
[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-594144781 **[Test build #119234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119234/testReport)** for PR 26918 at commit [`6476b62`](https://github.com/apache/spark/commit/6476b62667b2a38cabf44c2fad447c4bab9005d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27777: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv`
AmplabJenkins removed a comment on issue #2: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv` URL: https://github.com/apache/spark/pull/2#issuecomment-594138770 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27777: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv`
SparkQA commented on issue #2: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv` URL: https://github.com/apache/spark/pull/2#issuecomment-594141546 **[Test build #119250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119250/testReport)** for PR 2 at commit [`d4da235`](https://github.com/apache/spark/commit/d4da2352f90e683c02e12b4f9e161284f0146734).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27777: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv`
AmplabJenkins removed a comment on issue #2: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv` URL: https://github.com/apache/spark/pull/2#issuecomment-594138782 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23990/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27749: [SPARK-30997][SQL] An analysis failure in generators with aggregate functions
dongjoon-hyun commented on a change in pull request #27749: [SPARK-30997][SQL] An analysis failure in generators with aggregate functions
URL: https://github.com/apache/spark/pull/27749#discussion_r387260348

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ##
@@ -433,6 +433,13 @@ class AnalysisErrorSuite extends AnalysisTest {
       :: Nil
   )
+  errorTest(
+    "generator nested in expressions for aggregates",
+    testRelation.select(Explode(CreateArray(min($"a") :: max($"a") :: Nil)) + 1),
+    "Generators are not supported when it's nested in expressions, but got: " +
+      "(explode(array(min(a), max(a))) + 1)" :: Nil
+  )
+

Review comment:
Thanks for adding. It looks fine.
[GitHub] [spark] AmplabJenkins commented on issue #27777: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv`
AmplabJenkins commented on issue #2: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv` URL: https://github.com/apache/spark/pull/2#issuecomment-594138770 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27777: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv`
AmplabJenkins commented on issue #2: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv` URL: https://github.com/apache/spark/pull/2#issuecomment-594138782 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23990/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels
AmplabJenkins commented on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels URL: https://github.com/apache/spark/pull/27679#issuecomment-594138522 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels
AmplabJenkins removed a comment on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels URL: https://github.com/apache/spark/pull/27679#issuecomment-594138522 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels
AmplabJenkins commented on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels URL: https://github.com/apache/spark/pull/27679#issuecomment-594138532 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119247/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels
AmplabJenkins removed a comment on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels URL: https://github.com/apache/spark/pull/27679#issuecomment-594138532 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119247/ Test PASSed.
[GitHub] [spark] MaxGekk opened a new pull request #27777: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv`
MaxGekk opened a new pull request #2: [SPARK-31025][SQL] Support foldable CSV strings by `schema_of_csv`
URL: https://github.com/apache/spark/pull/2

### What changes were proposed in this pull request?
In the PR, I propose to change the check of the input parameter in the `SchemaOfCsv` expression and to allow a foldable `child` expression.

### Why are the changes needed?
To improve user experience with Spark SQL. Currently, only string literals are accepted as CSV examples by `schema_of_csv`:
```sql
spark-sql> select schema_of_csv(concat_ws(',', 0.1, 1));
Error in query: cannot resolve 'schema_of_csv(concat_ws(',', CAST(0.1BD AS STRING), CAST(1 AS STRING)))' due to data type mismatch: The input csv should be a string literal and not null; however, got concat_ws(',', CAST(0.1BD AS STRING), CAST(1 AS STRING)).; line 1 pos 7;
'Project [unresolvedalias(schema_of_csv(concat_ws(,, cast(0.1 as string), cast(1 as string))), None)]
+- OneRowRelation
```

### Does this PR introduce any user-facing change?
Yes. After the change, `schema_of_csv` accepts foldable expressions, for example:
```sql
```

### How was this patch tested?
- By existing test suites `CsvFunctionsSuite` and `CsvExpressionsSuite`.
- Added a new test to `CsvFunctionsSuite` to check foldable input.
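Editorial sketch of the difference between a literal-only check and a foldable check, as described in the pull request above. This is a toy expression tree, not Spark's Catalyst: `Expr`, `Lit`, `ConcatWs`, `ColumnRef`, `csvExampleLiteralOnly`, `csvExampleFoldable`, and the error strings are all hypothetical. It only illustrates why `concat_ws(',', 0.1, 1)` fails a "must be a literal" check yet can be accepted once any expression that evaluates without input rows is allowed.

```scala
object FoldableSketch {
  // Hypothetical miniature expression tree; an expression is "foldable"
  // when it can be evaluated to a constant without reading any row.
  sealed trait Expr {
    def foldable: Boolean
    def eval(): String
  }
  final case class Lit(value: String) extends Expr {
    val foldable = true
    def eval(): String = value
  }
  final case class ConcatWs(sep: String, children: Seq[Expr]) extends Expr {
    // Foldable only if every child is foldable.
    def foldable: Boolean = children.forall(_.foldable)
    def eval(): String = children.map(_.eval()).mkString(sep)
  }
  final case class ColumnRef(name: String) extends Expr {
    val foldable = false
    def eval(): String =
      throw new UnsupportedOperationException(s"column $name needs an input row")
  }

  // Stricter check: only a string literal is accepted as the CSV example.
  def csvExampleLiteralOnly(e: Expr): Either[String, String] = e match {
    case Lit(v) => Right(v)
    case other => Left(s"expected a string literal, got $other")
  }

  // Relaxed check: any foldable expression is evaluated up front.
  def csvExampleFoldable(e: Expr): Either[String, String] =
    if (e.foldable) Right(e.eval())
    else Left(s"expected a foldable expression, got $e")
}
```

In this model, `ConcatWs(",", Seq(Lit("0.1"), Lit("1")))` is rejected by `csvExampleLiteralOnly` but accepted by `csvExampleFoldable`, while an expression that references a column is rejected by both.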
[GitHub] [spark] SparkQA removed a comment on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels
SparkQA removed a comment on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels URL: https://github.com/apache/spark/pull/27679#issuecomment-594105637 **[Test build #119247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119247/testReport)** for PR 27679 at commit [`4584465`](https://github.com/apache/spark/commit/4584465a7681b5199f9cc31c755e7e96ee36bb1d).
[GitHub] [spark] SparkQA commented on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels
SparkQA commented on issue #27679: [SPARK-30776][ML] Support FValueSelector for continuous features and continuous labels URL: https://github.com/apache/spark/pull/27679#issuecomment-594138027 **[Test build #119247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119247/testReport)** for PR 27679 at commit [`4584465`](https://github.com/apache/spark/commit/4584465a7681b5199f9cc31c755e7e96ee36bb1d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys
viirya commented on a change in pull request #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys
URL: https://github.com/apache/spark/pull/27772#discussion_r387252151

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##
@@ -2435,17 +2450,6 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
-  val LEGACY_ALLOW_DUPLICATED_MAP_KEY =

Review comment:
We need to update the sql-migration-guide doc too. We already documented `spark.sql.legacy.allowDuplicatedMapKeys` there.
[GitHub] [spark] dbtsai commented on a change in pull request #27728: [SPARK-17636][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
dbtsai commented on a change in pull request #27728: [SPARK-17636][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#discussion_r387248516
## File path: sql/core/v2.3/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ##
@@ -64,9 +64,11 @@ private[sql] object OrcFilters extends OrcFiltersBase {
    * Create ORC filter as a SearchArgument instance.
    */
   def createFilter(schema: StructType, filters: Seq[Filter]): Option[SearchArgument] = {
-    val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap
+    val dataTypeMap = schema.map(f => quoteAttributeNameIfNeeded(f.name) -> f.dataType).toMap
     // Combines all convertible filters using `And` to produce a single conjunction
-    val conjunctionOptional = buildTree(convertibleFilters(schema, dataTypeMap, filters))
+    // TODO: ORC doesn't support predicate pushdown for nested field yet, so they are removed.
Review comment: Done. thanks.
[GitHub] [spark] dbtsai commented on a change in pull request #27728: [SPARK-17636][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
dbtsai commented on a change in pull request #27728: [SPARK-17636][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#discussion_r387245545
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala ##
@@ -63,13 +63,38 @@ private[sql] trait ParquetTest extends FileBasedDataSourceTest {
       (f: String => Unit): Unit = withDataSourceFile(data)(f)
   /**
-   * Writes `data` to a Parquet file and reads it back as a [[DataFrame]],
+   * Writes `data` objects to a Parquet file and reads it back as a [[DataFrame]],
    * which is then passed to `f`. The Parquet file will be deleted after `f` returns.
    */
-  protected def withParquetDataFrame[T <: Product: ClassTag: TypeTag]
+  protected def withParquetDFfromObjs[T <: Product: ClassTag: TypeTag]
       (data: Seq[T], testVectorized: Boolean = true)
       (f: DataFrame => Unit): Unit = withDataSourceDataFrame(data, testVectorized)(f)
+  /**
+   * Writes `df` dataframe to a Parquet file and reads it back as a [[DataFrame]],
+   * which is then passed to `f`. The Parquet file will be deleted after `f` returns.
+   */
+  protected def withParquetDFfromDF[T <: Product: ClassTag: TypeTag]
+      (df: DataFrame, testVectorized: Boolean = true)
+      (f: DataFrame => Unit): Unit = {
+    withTempPath { file =>
+      df.write.format(dataSourceName).save(file.getCanonicalPath)
+      readFile(file.getCanonicalPath, testVectorized)(f)
+    }
+  }
+
+  /**
+   * Writes `df` to a Parquet file and reads it back as a [[DataFrame]],
+   * which is then passed to `f`. The Parquet file will be deleted after `f` returns.
+   */
+  protected def toParquetDataFrame(df: DataFrame, testVectorized: Boolean = true)
Review comment: They are used in a couple of places. I can submit another PR for this.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys
AmplabJenkins removed a comment on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys URL: https://github.com/apache/spark/pull/27772#issuecomment-594127264 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119232/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys
AmplabJenkins commented on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys URL: https://github.com/apache/spark/pull/27772#issuecomment-594127264 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119232/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys
AmplabJenkins removed a comment on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys URL: https://github.com/apache/spark/pull/27772#issuecomment-594127250 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys
AmplabJenkins commented on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys URL: https://github.com/apache/spark/pull/27772#issuecomment-594127250 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys
SparkQA removed a comment on issue #27772: [SPARK-31019][SQL] make it clear that people can deduplicate map keys URL: https://github.com/apache/spark/pull/27772#issuecomment-593967450 **[Test build #119232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119232/testReport)** for PR 27772 at commit [`80c7450`](https://github.com/apache/spark/commit/80c74509ab1a86cc001887060b34fd3c29ec5a81).
[GitHub] [spark] dbtsai commented on a change in pull request #27728: [SPARK-17636][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet
dbtsai commented on a change in pull request #27728: [SPARK-17636][SQL][test-hive1.2] Nested Column Predicate Pushdown for Parquet URL: https://github.com/apache/spark/pull/27728#discussion_r387244858
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala ##
@@ -63,13 +63,38 @@ private[sql] trait ParquetTest extends FileBasedDataSourceTest {
       (f: String => Unit): Unit = withDataSourceFile(data)(f)
   /**
-   * Writes `data` to a Parquet file and reads it back as a [[DataFrame]],
+   * Writes `data` objects to a Parquet file and reads it back as a [[DataFrame]],
    * which is then passed to `f`. The Parquet file will be deleted after `f` returns.
    */
-  protected def withParquetDataFrame[T <: Product: ClassTag: TypeTag]
+  protected def withParquetDFfromObjs[T <: Product: ClassTag: TypeTag]
Review comment: Sounds like a good idea.