[GitHub] [spark] HyukjinKwon commented on a change in pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
HyukjinKwon commented on a change in pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#discussion_r365109561 ## File path: python/pyspark/sql/pandas/group_ops.py ## @@ -28,6 +28,7 @@ class PandasGroupedOpsMixin(object): can use this class. """ +@since(2.3) Review comment: I piggyback this change while I'm here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
HyukjinKwon commented on a change in pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#discussion_r365109478 ## File path: python/pyspark/sql/tests/test_pandas_cogrouped_map.py ## @@ -180,59 +176,26 @@ def left_assign_key(key, l, _): assert_frame_equal(expected, result, check_column_type=_check_column_type) def test_wrong_return_type(self): -with QuietTest(self.sc): -with self.assertRaisesRegexp( -NotImplementedError, -'Invalid returnType.*cogrouped map Pandas UDF.*MapType'): -pandas_udf( -lambda l, r: l, -'id long, v map', -PandasUDFType.COGROUPED_MAP) - -def test_wrong_args(self): # Test that we get a sensible exception invalid values passed to apply left = self.data1 right = self.data2 with QuietTest(self.sc): -# Function rather than a udf Review comment: `groupby.cogroup.applyInPandas` now always sets the right eval type internally. So we don't have to worry about mis-typed UDF.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories
dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/27130#discussion_r365109226 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala ## @@ -170,4 +170,156 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton { test("SPARK-11412 read and merge orc schemas in parallel") { testMergeSchemasInParallel(OrcFileOperator.readOrcSchemasInParallel) } + + test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") { +Seq(true, false).foreach { convertMetastore => + withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> s"$convertMetastore") { +withTempDir { dir => + try { +sql("USE default") +sql( + """ +|CREATE EXTERNAL TABLE hive_orc( Review comment: I'm a little confused here, @kevinyu98. Do you want to get a table created by Hive here? Usually, we use the table name, `hive_orc`, for that table. Please see https://github.com/apache/spark/pull/27130/files#diff-a8c26a35def87a13e6b59db19d9fb8a1R68 . Also, you are still using `hiveClient.runSqlHive` at line 192. I'm wondering what the test target of this PR is.
[GitHub] [spark] beliefer commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of t
beliefer commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#issuecomment-572917240

> Ah, also, can you put a simple explain example (about how to convert a plan with distinct aggregates) in the PR description? better to put how-to-fix in this pr there.

OK
[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression
beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#discussion_r365107970 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] { }.asInstanceOf[NamedExpression] } Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate) +} else if (distinctAggGroups.size == 1) { + val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct) + if (distinctAggExpressions.exists(_.filter.isDefined)) { +val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable)) +val regularFunChildren = regularAggExprs + .flatMap(_.aggregateFunction.children.filter(!_.foldable)) +val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes) +val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct +val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair) +val regularAggChildAttrLookup = regularAggChildAttrMap.toMap +val regularOperatorMap = regularAggExprs.map { + case ae @ AggregateExpression(af, _, _, filter, _) => +val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c)) +val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction] +val filterOpt = filter.map(_.transform { + case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a) +}) +val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt) +(ae, aggExpr) +} +val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable)) +val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map { + case (ae @ AggregateExpression(af, _, _, filter, _), i) => +// Why do we need to construct the phantom id ? 
+// First, In order to reduce costs, it is better to handle the filter clause locally. +// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate expression +// If(id > 1) 'a else null first, and use the result as output. +// Second, If more than one DISTINCT aggregate expression uses the same column, +// We need to construct the phantom attributes so as the output not lost. +// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) will output +// attribute 'phantom1-a and attribute 'phantom2-a instead of two 'a. +// Note: We just need to illusion the expression with filter clause. +// The illusionary mechanism may result in multiple distinct aggregations uses +// different column, so we still need to call `rewrite`. +val phantomId = i + 1 +val unfoldableChildren = af.children.filter(!_.foldable) +val exprAttrs = unfoldableChildren.map { e => + (e, AttributeReference(s"phantom$phantomId-${e.sql}", e.dataType, nullable = true)()) Review comment: OK. Could I use `_gen_phantom_${exprId.id}` ?
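The code comment quoted in the diff above relies on an equivalence: evaluating the FILTER predicate per row and projecting non-matching rows to NULL gives the same result as filtering before the DISTINCT aggregate, because DISTINCT aggregates ignore NULLs. The following is a minimal plain-Python sketch of that equivalence, not Catalyst code. The row layout and all helper names (`count_distinct_filtered`, `project_phantom`, `count_distinct`, `sum_distinct`) are hypothetical and chosen only for illustration.

```python
# Illustrative only: plain Python, not Spark. Rows are (id, a) tuples.
rows = [(1, 10), (2, 10), (3, 20), (4, 20), (5, None), (0, 30)]


def count_distinct_filtered(rows, predicate):
    """COUNT(DISTINCT a) FILTER (WHERE predicate(id)): filter rows, then count."""
    return len({a for (id_, a) in rows if predicate(id_) and a is not None})


def project_phantom(rows, predicate):
    """Evaluate `if predicate(id) then a else null` per row (a 'phantom' column)."""
    return [a if predicate(id_) else None for (id_, a) in rows]


def count_distinct(values):
    """COUNT(DISTINCT x): NULLs (None) are ignored."""
    return len({v for v in values if v is not None})


def sum_distinct(values):
    """SUM(DISTINCT x): NULLs (None) are ignored."""
    return sum({v for v in values if v is not None})


def gt1(id_):
    return id_ > 1


# Two aggregates over the same column get two separate projected columns,
# so the unfiltered SUM(DISTINCT a) is not affected by the other's filter.
phantom1 = [a for (_, a) in rows]        # feeds SUM(DISTINCT a), no filter
phantom2 = project_phantom(rows, gt1)    # feeds COUNT(DISTINCT a) FILTER (WHERE id > 1)

filtered_count = count_distinct(phantom2)
unfiltered_sum = sum_distinct(phantom1)
```

Keeping one projected column per aggregate is what the comment means by the output "not being lost": if both aggregates shared a single column `a`, the filter of one would leak into the other.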
[GitHub] [spark] HyukjinKwon commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
HyukjinKwon commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#issuecomment-572917051 cc @rxin, @zero323, @cloud-fan, @mengxr, @viirya, @ueshin, @BryanCutler, @icexelloss, @dongjoon-hyun FYI
[GitHub] [spark] AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572916501 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572916510 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116450/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener
AmplabJenkins removed a comment on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener URL: https://github.com/apache/spark/pull/27030#issuecomment-572916406 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572916501 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon opened a new pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
HyukjinKwon opened a new pull request #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165

### What changes were proposed in this pull request?

This PR proposes to redesign pandas UDFs as described in [the proposal](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing).
Note that this PR also addresses one of the future improvements described [here](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit#heading=h.h3ncjpk6ujqu), "A couple of less-intuitive pandas UDF types" (by @zero323).

In short,

- New way with type hints as an alternative and experimental way.

  ```python
  @pandas_udf(schema='...')
  def func(c1: Series, c2: Series) -> DataFrame:
      pass
  ```

- Remove the three types below from UDF and make them separate standalone APIs. So, `pandas_udf` is now consistent with regular `udf`s and other expressions.

  `df.mapInPandas(udf)` -> `df.mapInPandas(func, schema)`
  `df.groupby.apply(udf)` -> `df.groupby.applyInPandas(func, schema)`
  `df.groupby.cogroup.apply(udf)` -> `df.groupby.cogroup.applyInPandas(func, schema)`

- No deprecation for the existing ways for now.

  ```python
  @pandas_udf(schema='...', functionType=PandasUDFType.SCALAR)
  def func(c1, c2):
      pass
  ```

  If users are happy with this, I plan to deprecate the existing way and declare that using type hints is no longer experimental.

One design goal of this PR was to avoid touching the internals (since the old ways are not deprecated for now) while supporting type hints with minimal changes at the interface only.

- Once we deprecate or remove the old ways, I think the internals will need another refactoring. At the very least, we should rename the internal pandas evaluation types.
- If users find the experimental type hints unhelpful, we should simply revert the changes at the interface level.

### Why are the changes needed?

In order to address old design issues. Please see [the proposal](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing).

### Does this PR introduce any user-facing change?

For behaviour changes, no.

It adds new ways to use pandas UDFs by using type hints. See below.

**SCALAR**:

```python
@pandas_udf(schema='...')
def func(c1: Series, c2: DataFrame) -> Series:
    pass  # DataFrame represents a struct column
```

**SCALAR_ITER**:

```python
@pandas_udf(schema='...')
def func(iter: Iterator[Tuple[Series, DataFrame, ...]]) -> Iterator[Series]:
    pass  # Same as SCALAR but wrapped by Iterator
```

**GROUPED_AGG**:

```python
@pandas_udf(schema='...')
def func(c1: Series, c2: DataFrame) -> int:
    pass  # DataFrame represents a struct column
```

**GROUPED_MAP**:

This was added in Spark 2.3 as of SPARK-20396. As described above, it keeps the existing behaviour. Instead, we have a new alias `groupby.applyInPandas` for `groupby.apply`. See the example below:

```python
def func(pdf):
    return pdf

df.groupby("...").applyInPandas(func, schema=df.schema)
```

**MAP_ITER**:

This was added in Spark 3.0 as of SPARK-28198; and this PR replaces the usages. See the example below:

```python
def func(iter):
    for df in iter:
        yield df

df.mapInPandas(func, df.schema)
```

**COGROUPED_MAP**:

This was added in Spark 3.0 as of SPARK-27463; and this PR replaces the usages. See the example below:

```python
def asof_join(left, right):
    return pd.merge_asof(left, right, on="...", by="...")

df1.groupby("...").cogroup(df2.groupby("...")).applyInPandas(asof_join, schema="...")
```

### How was this patch tested?

Unit tests were added and run against Python 2.7, 3.6 and 3.7.
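The SCALAR, SCALAR_ITER and GROUPED_AGG examples in the PR description differ only in their annotations, which suggests the UDF kind can be derived from the function signature. The sketch below shows one way such inference could work using only the standard library. It is an assumption-laden illustration, not PySpark's implementation: `Series` and `DataFrame` are empty stand-ins for the pandas classes, and `infer_udf_kind` and its rules are hypothetical.

```python
import collections.abc
import inspect
import typing
from typing import Iterator


class Series:      # hypothetical stand-in for pandas.Series
    pass


class DataFrame:   # hypothetical stand-in for pandas.DataFrame
    pass


def _is_iterator(tp):
    """True for annotations such as Iterator[Series]."""
    return typing.get_origin(tp) is collections.abc.Iterator


def infer_udf_kind(func):
    """Guess a pandas UDF kind from annotations (illustrative rules only)."""
    hints = typing.get_type_hints(func)
    ret = hints.pop("return", None)
    # Every parameter and the return value must be annotated.
    if ret is None or len(hints) != len(inspect.signature(func).parameters):
        raise ValueError("all parameters and the return value need type hints")
    params = list(hints.values())

    if len(params) == 1 and _is_iterator(params[0]) and _is_iterator(ret):
        return "SCALAR_ITER"
    if params and all(p in (Series, DataFrame) for p in params):
        if ret in (Series, DataFrame):
            return "SCALAR"
        return "GROUPED_AGG"  # Series/DataFrame in, a scalar such as int out
    raise NotImplementedError("unsupported signature: %r" % (hints,))


def scalar(c1: Series, c2: DataFrame) -> Series:
    pass


def scalar_iter(it: Iterator[Series]) -> Iterator[Series]:
    pass


def grouped_agg(c1: Series, c2: DataFrame) -> int:
    pass


kinds = [infer_udf_kind(f) for f in (scalar, scalar_iter, grouped_agg)]
```

The grouped map, map iterator and cogrouped map cases are deliberately absent: per the description, they move to standalone `applyInPandas` / `mapInPandas` APIs rather than being inferred from hints.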
[GitHub] [spark] AmplabJenkins removed a comment on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener
AmplabJenkins removed a comment on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener URL: https://github.com/apache/spark/pull/27030#issuecomment-572916416 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21251/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572916510 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116450/ Test PASSed.
[GitHub] [spark] viirya commented on a change in pull request #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table
viirya commented on a change in pull request #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table URL: https://github.com/apache/spark/pull/26956#discussion_r365108284 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ## @@ -1981,6 +1982,60 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("SPARK-30312: truncate table - keep acl/permission") { +import testImplicits._ +val ignorePermissionAcl = Seq(true, false) + +ignorePermissionAcl.foreach { ignore => + withSQLConf( +"fs.file.impl" -> classOf[FakeLocalFsFileSystem].getName, +"fs.file.impl.disable.cache" -> "true", +SQLConf.TRUNCATE_TABLE_IGNORE_PERMISSION_ACL.key -> ignore.toString) { +withTable("tab1") { + sql("CREATE TABLE tab1 (col INT) USING parquet") + sql("INSERT INTO tab1 SELECT 1") + checkAnswer(spark.table("tab1"), Row(1)) + + val tablePath = new Path(spark.sessionState.catalog +.getTableMetadata(TableIdentifier("tab1")).storage.locationUri.get) + + val hadoopConf = spark.sessionState.newHadoopConf() + val fs = tablePath.getFileSystem(hadoopConf) + val fileStatus = fs.getFileStatus(tablePath); + + fs.setPermission(tablePath, new FsPermission("777")) + assert(fileStatus.getPermission().toString() == "rwxrwxrwx") + + // Set ACL to table path. + val customAcl = new java.util.ArrayList[AclEntry]() + customAcl.add(new AclEntry.Builder() +.setType(AclEntryType.USER) +.setScope(AclEntryScope.ACCESS) +.setPermission(FsAction.READ).build()) + fs.setAcl(tablePath, customAcl) + assert(fs.getAclStatus(tablePath).getEntries().get(0) == customAcl.get(0)) + + sql("TRUNCATE TABLE tab1") + assert(spark.table("tab1").collect().isEmpty) + + val fileStatus2 = fs.getFileStatus(tablePath) + if (ignore) { +assert(fileStatus2.getPermission().toString() == "rwxr-xr-x") Review comment: Good point! Let me update it. This is an automated message from the Apache Git Service. 
[GitHub] [spark] AmplabJenkins commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener
AmplabJenkins commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener URL: https://github.com/apache/spark/pull/27030#issuecomment-572916416 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21251/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener
AmplabJenkins commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener URL: https://github.com/apache/spark/pull/27030#issuecomment-572916406 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
SparkQA removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572878167 **[Test build #116450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116450/testReport)** for PR 27159 at commit [`b77f9d8`](https://github.com/apache/spark/commit/b77f9d8759eddfc5c56c8de14037cdcf5dc0c989).
[GitHub] [spark] HeartSaVioR commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
HeartSaVioR commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572916229 3 test runs so far: 1 failed because of another flaky test, 2 passed. The fix looks like it is working so far. Closing.
[GitHub] [spark] HeartSaVioR closed pull request #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
HeartSaVioR closed pull request #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159
[GitHub] [spark] SparkQA commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener
SparkQA commented on issue #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener URL: https://github.com/apache/spark/pull/27030#issuecomment-572915998 **[Test build #116464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116464/testReport)** for PR 27030 at commit [`3cb79fe`](https://github.com/apache/spark/commit/3cb79fe63a14ecba62bfc9b449ecf9b4a28b6e10).
[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression
beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#discussion_r365107970 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] { }.asInstanceOf[NamedExpression] } Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate) +} else if (distinctAggGroups.size == 1) { + val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct) + if (distinctAggExpressions.exists(_.filter.isDefined)) { +val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable)) +val regularFunChildren = regularAggExprs + .flatMap(_.aggregateFunction.children.filter(!_.foldable)) +val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes) +val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct +val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair) +val regularAggChildAttrLookup = regularAggChildAttrMap.toMap +val regularOperatorMap = regularAggExprs.map { + case ae @ AggregateExpression(af, _, _, filter, _) => +val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c)) +val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction] +val filterOpt = filter.map(_.transform { + case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a) +}) +val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt) +(ae, aggExpr) +} +val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable)) +val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map { + case (ae @ AggregateExpression(af, _, _, filter, _), i) => +// Why do we need to construct the phantom id ? 
+// First, In order to reduce costs, it is better to handle the filter clause locally. +// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate expression +// If(id > 1) 'a else null first, and use the result as output. +// Second, If more than one DISTINCT aggregate expression uses the same column, +// We need to construct the phantom attributes so as the output not lost. +// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) will output +// attribute 'phantom1-a and attribute 'phantom2-a instead of two 'a. +// Note: We just need to illusion the expression with filter clause. +// The illusionary mechanism may result in multiple distinct aggregations uses +// different column, so we still need to call `rewrite`. +val phantomId = i + 1 +val unfoldableChildren = af.children.filter(!_.foldable) +val exprAttrs = unfoldableChildren.map { e => + (e, AttributeReference(s"phantom$phantomId-${e.sql}", e.dataType, nullable = true)()) Review comment: OK.
[GitHub] [spark] SparkQA commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
SparkQA commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572915778 **[Test build #116450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116450/testReport)** for PR 27159 at commit [`b77f9d8`](https://github.com/apache/spark/commit/b77f9d8759eddfc5c56c8de14037cdcf5dc0c989).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories
dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/27130#discussion_r365107735 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala ## @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends ParquetPartitioningTest { assert(df4.columns === Array("str", "max_int")) } } + + test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") { +Seq("true", "false").foreach { parquetConversion => + withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> parquetConversion) { +withTempPath { path => + withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") { +val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")). + toDF("c1", "c2", "c3").repartition(1) +val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")). + toDF("c1", "c2", "c3").repartition(1) +val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")). + toDF("c1", "c2", "c3").repartition(1) +someDF1.write.parquet(s"${path.getCanonicalPath}/l1/") +someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/") +someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/") + +val topDirStatement = + s""" + |CREATE EXTERNAL TABLE tbl1( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin +sql(topDirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl1"), Nil) +} else { + val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show() + }.getMessage +assert(msg.contains("Not a file:")) +} + +val l1DirStatement = + s""" + |CREATE EXTERNAL TABLE tbl2( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}/l1/"}'""".stripMargin +sql(l1DirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl2"), +(1 to 2).map(i => Row(i, i, s"parq$i"))) +} else { + val msg = intercept[IOException] { 
+sql("SELECT * FROM tbl2").show() + }.getMessage + assert(msg.contains("Not a file:")) +} + +val l2DirStatement = + s""" + |CREATE EXTERNAL TABLE tbl3( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}/l1/l2/"}'""".stripMargin +sql(l2DirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl3"), +(3 to 4).map(i => Row(i, i, s"parq$i"))) +} else { + val msg = intercept[IOException] {sql("SELECT * FROM tbl3").show() Review comment: `sql("SELECT * FROM tbl3").show()`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories
dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/27130#discussion_r365107848 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala ## @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends ParquetPartitioningTest { assert(df4.columns === Array("str", "max_int")) } } + + test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") { +Seq("true", "false").foreach { parquetConversion => + withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> parquetConversion) { +withTempPath { path => + withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") { +val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")). + toDF("c1", "c2", "c3").repartition(1) +val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")). + toDF("c1", "c2", "c3").repartition(1) +val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")). + toDF("c1", "c2", "c3").repartition(1) +someDF1.write.parquet(s"${path.getCanonicalPath}/l1/") +someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/") +someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/") + +val topDirStatement = + s""" + |CREATE EXTERNAL TABLE tbl1( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin +sql(topDirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl1"), Nil) +} else { + val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show() + }.getMessage +assert(msg.contains("Not a file:")) +} + +val l1DirStatement = + s""" + |CREATE EXTERNAL TABLE tbl2( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}/l1/"}'""".stripMargin +sql(l1DirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl2"), +(1 to 2).map(i => Row(i, i, s"parq$i"))) +} else { + val msg = intercept[IOException] { 
+sql("SELECT * FROM tbl2").show() + }.getMessage + assert(msg.contains("Not a file:")) +} + +val l2DirStatement = + s""" + |CREATE EXTERNAL TABLE tbl3( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}/l1/l2/"}'""".stripMargin +sql(l2DirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl3"), +(3 to 4).map(i => Row(i, i, s"parq$i"))) +} else { + val msg = intercept[IOException] {sql("SELECT * FROM tbl3").show() + }.getMessage + assert(msg.contains("Not a file:")) +} + +val wildcardTopDirStatement = + s""" + |CREATE EXTERNAL TABLE tbl4( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${new File(s"${path}/*").toURI}'""".stripMargin +sql(wildcardTopDirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl4"), +(1 to 2).map(i => Row(i, i, s"parq$i"))) +} else { + val msg = intercept[IOException] { +sql("SELECT * FROM tbl4").show() + }.getMessage + assert(msg.contains("Not a file:")) +} + +val wildcardL1DirStatement = + s""" + |CREATE EXTERNAL TABLE tbl5( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${new File(s"${path}/l1/*").toURI}'""".stripMargin +sql(wildcardL1DirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl5"), +(1 to 4).map(i => Row(i, i, s"parq$i"))) +} else { + val msg = intercept[IOException] {sql("SELECT * FROM tbl5").show() Review comment: ditto. This is an automated message from the Apache Git Service. To respond to the message, please
[GitHub] [spark] beliefer commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of t
beliefer commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#issuecomment-572915374

> btw, we need a different approach for supporting multiple distinct groups (SPARK-30396)? Why did you split the distinct support into two?

This PR will support:

`select a, sum(distinct b) filter (where ...) from t group by a;`
There is only one DISTINCT aggregate expression, so only one column is aggregated.

`select a, sum(distinct b) filter (where ...), count(distinct b) filter (where ...) from t group by a;`
There are two DISTINCT aggregate expressions, but both operate on the same column.

SPARK-30396 will support:

`select a, sum(distinct b) filter (where ...), count(distinct c) filter (where ...) from t group by a;`
There are two DISTINCT aggregate expressions, and they operate on different columns.
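To make the query shapes in this comment concrete, here is a minimal plain-Scala sketch of what `agg(distinct col) filter (where ...)` computes per group: the FILTER predicate is applied to the group's rows first, and duplicates of the aggregated column are dropped afterwards. It does not use Spark. The `Row` case class, the sample data, and the helper names `sumDistinct` and `countDistinct` are all hypothetical. One simplification: an empty filtered group sums to 0 here, whereas SQL `sum` would return NULL.

```scala
object DistinctFilterSketch {
  // Hypothetical stand-in for a row of table t(a, b, c).
  case class Row(a: String, b: Int, c: Int)

  // Made-up sample data, only for illustration.
  val t: Seq[Row] = Seq(
    Row("x", 1, 10), Row("x", 1, 20), Row("x", 2, 20), Row("y", 3, 30))

  // select a, sum(distinct col) filter (where pred) from t group by a
  def sumDistinct(rows: Seq[Row])(col: Row => Int)(pred: Row => Boolean): Map[String, Int] =
    rows.groupBy(_.a).map { case (a, g) => a -> g.filter(pred).map(col).distinct.sum }

  // select a, count(distinct col) filter (where pred) from t group by a
  def countDistinct(rows: Seq[Row])(col: Row => Int)(pred: Row => Boolean): Map[String, Int] =
    rows.groupBy(_.a).map { case (a, g) => a -> g.filter(pred).map(col).distinct.size }

  def main(args: Array[String]): Unit = {
    // Both DISTINCT aggregates read column b: the shape this PR targets.
    println(sumDistinct(t)(_.b)(_.c >= 20))
    println(countDistinct(t)(_.b)(_.c >= 20))
    // A DISTINCT aggregate over a different column (c): the shape left to SPARK-30396.
    println(countDistinct(t)(_.c)(_ => true))
  }
}
```

In the sketch the two cases look alike because each aggregate is evaluated independently. The split into two PRs concerns how the planner rewrites several DISTINCT aggregates together, which this sketch does not model.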
[GitHub] [spark] viirya commented on issue #26978: [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project
viirya commented on issue #26978: [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project URL: https://github.com/apache/spark/pull/26978#issuecomment-572915404 @dongjoon-hyun Yes, I do think so too. Let's see if we can have more details from @cloud-fan.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories
dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/27130#discussion_r365107663 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala ## @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends ParquetPartitioningTest { assert(df4.columns === Array("str", "max_int")) } } + + test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") { +Seq("true", "false").foreach { parquetConversion => + withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> parquetConversion) { +withTempPath { path => + withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") { +val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")). + toDF("c1", "c2", "c3").repartition(1) +val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")). + toDF("c1", "c2", "c3").repartition(1) +val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")). + toDF("c1", "c2", "c3").repartition(1) +someDF1.write.parquet(s"${path.getCanonicalPath}/l1/") +someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/") +someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/") + +val topDirStatement = + s""" + |CREATE EXTERNAL TABLE tbl1( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin +sql(topDirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl1"), Nil) +} else { + val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show() + }.getMessage +assert(msg.contains("Not a file:")) +} + +val l1DirStatement = + s""" + |CREATE EXTERNAL TABLE tbl2( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}/l1/"}'""".stripMargin +sql(l1DirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl2"), +(1 to 2).map(i => Row(i, i, s"parq$i"))) Review comment: Can we merge 269 and 270 into one line 
here?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories
dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/27130#discussion_r365107292 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala ## @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends ParquetPartitioningTest { assert(df4.columns === Array("str", "max_int")) } } + + test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") { +Seq("true", "false").foreach { parquetConversion => + withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> parquetConversion) { +withTempPath { path => + withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") { +val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")). + toDF("c1", "c2", "c3").repartition(1) +val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")). + toDF("c1", "c2", "c3").repartition(1) +val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")). + toDF("c1", "c2", "c3").repartition(1) +someDF1.write.parquet(s"${path.getCanonicalPath}/l1/") +someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/") +someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/") + +val topDirStatement = + s""" + |CREATE EXTERNAL TABLE tbl1( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin +sql(topDirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl1"), Nil) +} else { + val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show() Review comment: `sql("SELECT * FROM tbl1").show()` seems to need to be in the next line. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories
dongjoon-hyun commented on a change in pull request #27130: [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/27130#discussion_r365107448 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala ## @@ -222,4 +223,127 @@ class HiveParquetSourceSuite extends ParquetPartitioningTest { assert(df4.columns === Array("str", "max_int")) } } + + test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") { +Seq("true", "false").foreach { parquetConversion => + withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> parquetConversion) { +withTempPath { path => + withTable("tbl1", "tbl2", "tbl3", "tbl4", "tbl5", "tbl6") { +val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")). + toDF("c1", "c2", "c3").repartition(1) +val someDF2 = Seq((3, 3, "parq3"), (4, 4, "parq4")). + toDF("c1", "c2", "c3").repartition(1) +val someDF3 = Seq((5, 5, "parq5"), (6, 6, "parq6")). + toDF("c1", "c2", "c3").repartition(1) +someDF1.write.parquet(s"${path.getCanonicalPath}/l1/") +someDF2.write.parquet(s"${path.getCanonicalPath}/l1/l2/") +someDF3.write.parquet(s"${path.getCanonicalPath}/l1/l2/l3/") + +val topDirStatement = + s""" + |CREATE EXTERNAL TABLE tbl1( + | c1 int, + | c2 int, + | c3 string) + |STORED AS parquet + |LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin +sql(topDirStatement) +if (parquetConversion == "true") { + checkAnswer(sql("SELECT * FROM tbl1"), Nil) +} else { + val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show() + }.getMessage +assert(msg.contains("Not a file:")) Review comment: indentation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] cloud-fan commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode
cloud-fan commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode URL: https://github.com/apache/spark/pull/26881#issuecomment-572914440 also cc @viirya @maropu
[GitHub] [spark] peter-toth commented on a change in pull request #27119: [SPARK-30447][SQL] Constant propagation nullability issue
peter-toth commented on a change in pull request #27119: [SPARK-30447][SQL] Constant propagation nullability issue
URL: https://github.com/apache/spark/pull/27119#discussion_r365106443
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ##
@@ -132,11 +142,14 @@ object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
 (newSelf, Seq.empty)
 case n: Not =>
 // Ignore the EqualityPredicates from children since they are only propagated through And.
-val (newChild, _) = traverse(n.child, replaceChildren = true)
+val (newChild, _) = traverse(n.child, replaceChildren = true, nullIsFalse = false)
 (newChild.map(Not), Seq.empty)
 case _ => (None, Seq.empty)
 }
+ private def safeToReplace(ar: AttributeReference, nullIsFalse: Boolean) =

Review comment: Ok, added. Let me know if it needs rewording.
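For readers wondering why a `nullIsFalse` flag matters under `Not`, here is a small plain-Scala sketch of SQL three-valued logic. It is not Catalyst code, and the predicate `c = 1 AND c + 1 = 1` is an assumed example chosen to illustrate the kind of problem the PR title describes. `None` stands for NULL, and all helper names are hypothetical.

```scala
object NullabilitySketch {
  // SQL three-valued logic: None stands for NULL.
  type Tri = Option[Boolean]

  def and(l: Tri, r: Tri): Tri = (l, r) match {
    case (Some(false), _) | (_, Some(false)) => Some(false) // FALSE dominates NULL
    case (Some(true), Some(true)) => Some(true)
    case _ => None
  }

  def not(v: Tri): Tri = v.map(!_) // NOT NULL is still NULL

  // Equality yields NULL when either side is NULL.
  def equalTo(l: Option[Int], r: Option[Int]): Tri =
    for (a <- l; b <- r) yield a == b

  // Assumed original predicate: c = 1 AND c + 1 = 1
  def original(c: Option[Int]): Tri =
    and(equalTo(c, Some(1)), equalTo(c.map(_ + 1), Some(1)))

  // After propagating c = 1 into the second conjunct: c = 1 AND 1 + 1 = 1
  def propagated(c: Option[Int]): Tri =
    and(equalTo(c, Some(1)), equalTo(Some(1 + 1), Some(1)))

  // A filter keeps a row only when the predicate is TRUE.
  def keeps(v: Tri): Boolean = v.contains(true)

  def main(args: Array[String]): Unit = {
    val nullC: Option[Int] = None
    // Directly in a filter, NULL and FALSE both drop the row, so the rewrite is harmless.
    println(keeps(original(nullC)) == keeps(propagated(nullC)))
    // Under NOT the two diverge: NOT NULL drops the row, NOT FALSE keeps it.
    println(keeps(not(original(nullC))) == keeps(not(propagated(nullC))))
  }
}
```

With a NULL `c`, the original predicate evaluates to NULL while the rewritten one evaluates to FALSE. A filter treats both as "drop the row", but wrapping them in `Not` turns FALSE into TRUE and leaves NULL as NULL, which is why the traversal passes `nullIsFalse = false` once it descends into a `Not`.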
[GitHub] [spark] AmplabJenkins commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events
AmplabJenkins commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-572913804 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue
AmplabJenkins commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue URL: https://github.com/apache/spark/pull/27119#issuecomment-572913885 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572913895 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue
AmplabJenkins commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue URL: https://github.com/apache/spark/pull/27119#issuecomment-572913893 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21249/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events
AmplabJenkins removed a comment on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-572913804 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572913902 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21250/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events
AmplabJenkins removed a comment on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-572913812 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21248/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events
AmplabJenkins commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-572913812 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21248/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue
AmplabJenkins removed a comment on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue URL: https://github.com/apache/spark/pull/27119#issuecomment-572913893 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21249/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue
AmplabJenkins removed a comment on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue URL: https://github.com/apache/spark/pull/27119#issuecomment-572913885 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #26978: [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project
dongjoon-hyun commented on issue #26978: [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project URL: https://github.com/apache/spark/pull/26978#issuecomment-572913802 Shall we hold off on this PR for a bit until the bug in https://github.com/apache/spark/pull/24637 is identified and resolved?
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572913895 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572913902 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21250/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events
SparkQA commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-572913400 **[Test build #116461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116461/testReport)** for PR 27164 at commit [`c72fd1f`](https://github.com/apache/spark/commit/c72fd1facd509a49401f1dc492227a5c2fac41ae).
[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572913427 **[Test build #116463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116463/testReport)** for PR 26918 at commit [`10e6bcd`](https://github.com/apache/spark/commit/10e6bcd2a685d86c3f3a1316f43ae08de934f54c).
[GitHub] [spark] SparkQA commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue
SparkQA commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue URL: https://github.com/apache/spark/pull/27119#issuecomment-572913429 **[Test build #116462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116462/testReport)** for PR 27119 at commit [`c8a21cc`](https://github.com/apache/spark/commit/c8a21cc29682800e79213ef371f51871e9c44551).
[GitHub] [spark] maropu commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue
maropu commented on issue #27119: [SPARK-30447][SQL] Constant propagation nullability issue URL: https://github.com/apache/spark/pull/27119#issuecomment-572912992 LGTM. Pending, Jenkins.
[GitHub] [spark] ayudovin commented on a change in pull request #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener
ayudovin commented on a change in pull request #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener URL: https://github.com/apache/spark/pull/27030#discussion_r365105197 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/events.scala ## @@ -204,61 +204,64 @@ case class RenameFunctionEvent( newName: String) extends FunctionEvent -trait PartitionEvent extends DatabaseEvent { - /** - * Name of the table that was touched. - */ - val name: String -} - +/** + * Event fired when a partition is created, dropped, altered or renamed. + */ +trait PartitionEvent extends TableEvent /** * Event fired before a partition is created. */ -case class CreatePartitionPreEvent(database: String, name: String, - parts: Seq[CatalogTablePartition]) extends PartitionEvent +case class CreatePartitionPreEvent( +database: String, +name: String) extends PartitionEvent Review comment: > shall we put `extends PartitionEvent` into a new line yes, I'll put it into a new line.
[GitHub] [spark] yaooqinn edited a comment on issue #27163: Revert "[SPARK-29393][SQL] Add `make_interval` function"
yaooqinn edited a comment on issue #27163: Revert "[SPARK-29393][SQL] Add `make_interval` function" URL: https://github.com/apache/spark/pull/27163#issuecomment-572912259 Sorry for my rudeness and thoughtlessness. I didn't mean to hurt anybody.
[GitHub] [spark] maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression al
maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#discussion_r365104343 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] { }.asInstanceOf[NamedExpression] } Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate) +} else if (distinctAggGroups.size == 1) { + val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct) + if (distinctAggExpressions.exists(_.filter.isDefined)) { +val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable)) +val regularFunChildren = regularAggExprs + .flatMap(_.aggregateFunction.children.filter(!_.foldable)) +val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes) +val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct +val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair) +val regularAggChildAttrLookup = regularAggChildAttrMap.toMap +val regularOperatorMap = regularAggExprs.map { + case ae @ AggregateExpression(af, _, _, filter, _) => +val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c)) +val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction] +val filterOpt = filter.map(_.transform { + case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a) +}) +val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt) +(ae, aggExpr) +} +val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable)) +val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map { + case (ae @ AggregateExpression(af, _, _, filter, _), i) => +// Why do we need to construct the phantom id ? 
+// First, In order to reduce costs, it is better to handle the filter clause locally. +// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate expression +// If(id > 1) 'a else null first, and use the result as output. +// Second, If more than one DISTINCT aggregate expression uses the same column, +// We need to construct the phantom attributes so as the output not lost. +// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) will output +// attribute 'phantom1-a and attribute 'phantom2-a instead of two 'a. +// Note: We just need to illusion the expression with filter clause. +// The illusionary mechanism may result in multiple distinct aggregations uses +// different column, so we still need to call `rewrite`. +val phantomId = i + 1 +val unfoldableChildren = af.children.filter(!_.foldable) +val exprAttrs = unfoldableChildren.map { e => + (e, AttributeReference(s"phantom$phantomId-${e.sql}", e.dataType, nullable = true)()) Review comment: Can we use expr Ids (e.g., `_gen_distinct_group_${exprId.id}`) instead? Just like; https://github.com/apache/spark/blob/afd70a0f6fc1b44164e41a57dfc4fd8a5df642e1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L126
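The two points made in the quoted code comment can be illustrated outside Spark. The sketch below is plain Python, not Catalyst code: the helper names (`fresh_name`, `project`, `distinct_non_null`), the counter standing in for expression IDs, and the sample rows are all hypothetical. It assumes that a `FILTER` clause on a DISTINCT aggregate can be folded into a projection that yields NULL for rows failing the predicate, and that two aggregates over the same column need separately named projected columns so neither output is lost.

```python
from itertools import count

# Hypothetical stand-in for a source of unique expression IDs.
_next_id = count(1)


def fresh_name(prefix="_gen_distinct_group_"):
    """Return a column name that is unique within this process."""
    return f"{prefix}{next(_next_id)}"


def project(rows, column, predicate=None):
    """Project `column` under a fresh name; rows failing `predicate` yield None.

    This mirrors evaluating `if (predicate) column else null` before the
    distinct aggregation runs.
    """
    name = fresh_name()
    values = [
        row[column] if predicate is None or predicate(row) else None
        for row in rows
    ]
    return name, values


def distinct_non_null(values):
    """Drop NULLs (None) and duplicates, as a DISTINCT aggregate would."""
    return {v for v in values if v is not None}


rows = [
    {"id": 1, "a": 10},
    {"id": 2, "a": 10},
    {"id": 3, "a": 20},
    {"id": 0, "a": 30},
]

# SUM(DISTINCT a)
sum_col, sum_values = project(rows, "a")
# COUNT(DISTINCT a) FILTER (WHERE id > 1)
count_col, count_values = project(rows, "a", lambda r: r["id"] > 1)

# Both aggregates read column `a`, but each has its own projected column.
result = {
    sum_col: sum(distinct_non_null(sum_values)),
    count_col: len(distinct_non_null(count_values)),
}
print(result)
```

Generating the name from a unique ID rather than from the expression's SQL text keeps the two projected columns apart even when both aggregates reference exactly the same expression.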
[GitHub] [spark] yaooqinn commented on issue #27163: Revert "[SPARK-29393][SQL] Add `make_interval` function"
yaooqinn commented on issue #27163: Revert "[SPARK-29393][SQL] Add `make_interval` function" URL: https://github.com/apache/spark/pull/27163#issuecomment-572912259 Sorry for my rudeness and thoughtlessness. I didn't mean to hurt somebody.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911727 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116460/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911706 **[Test build #116460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116460/testReport)** for PR 26918 at commit [`7a33558`](https://github.com/apache/spark/commit/7a335588ef2d5a526f21635e13fb59fd5ffd6824). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911720 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911095 **[Test build #116460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116460/testReport)** for PR 26918 at commit [`7a33558`](https://github.com/apache/spark/commit/7a335588ef2d5a526f21635e13fb59fd5ffd6824).
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911720 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911727 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116460/ Test FAILed.
[GitHub] [spark] HeartSaVioR closed pull request #27086: [WIP][SPARK-29779][SQL] Compact old event log files and cleanup - part 2
HeartSaVioR closed pull request #27086: [WIP][SPARK-29779][SQL] Compact old event log files and cleanup - part 2 URL: https://github.com/apache/spark/pull/27086
[GitHub] [spark] HeartSaVioR commented on issue #27086: [WIP][SPARK-29779][SQL] Compact old event log files and cleanup - part 2
HeartSaVioR commented on issue #27086: [WIP][SPARK-29779][SQL] Compact old event log files and cleanup - part 2 URL: https://github.com/apache/spark/pull/27086#issuecomment-572911816 replaced by #27164
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911502 Merged build finished. Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression al
maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#discussion_r365104343 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] { }.asInstanceOf[NamedExpression] } Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate) +} else if (distinctAggGroups.size == 1) { + val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct) + if (distinctAggExpressions.exists(_.filter.isDefined)) { +val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable)) +val regularFunChildren = regularAggExprs + .flatMap(_.aggregateFunction.children.filter(!_.foldable)) +val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes) +val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct +val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair) +val regularAggChildAttrLookup = regularAggChildAttrMap.toMap +val regularOperatorMap = regularAggExprs.map { + case ae @ AggregateExpression(af, _, _, filter, _) => +val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c)) +val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction] +val filterOpt = filter.map(_.transform { + case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a) +}) +val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt) +(ae, aggExpr) +} +val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable)) +val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map { + case (ae @ AggregateExpression(af, _, _, filter, _), i) => +// Why do we need to construct the phantom id ? 
+// First, In order to reduce costs, it is better to handle the filter clause locally. +// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate expression +// If(id > 1) 'a else null first, and use the result as output. +// Second, If more than one DISTINCT aggregate expression uses the same column, +// We need to construct the phantom attributes so as the output not lost. +// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) will output +// attribute 'phantom1-a and attribute 'phantom2-a instead of two 'a. +// Note: We just need to illusion the expression with filter clause. +// The illusionary mechanism may result in multiple distinct aggregations uses +// different column, so we still need to call `rewrite`. +val phantomId = i + 1 +val unfoldableChildren = af.children.filter(!_.foldable) +val exprAttrs = unfoldableChildren.map { e => + (e, AttributeReference(s"phantom$phantomId-${e.sql}", e.dataType, nullable = true)()) Review comment: Can we use expr Ids (e.g., `_distinct_group_${exprId.id}`) instead? Just like; https://github.com/apache/spark/blob/afd70a0f6fc1b44164e41a57dfc4fd8a5df642e1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L126
[GitHub] [spark] AmplabJenkins commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
AmplabJenkins commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#issuecomment-572911518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21246/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572911413 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911508 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21247/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
AmplabJenkins removed a comment on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#issuecomment-572911518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21246/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572911425 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21245/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
AmplabJenkins removed a comment on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#issuecomment-572911505 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911502 Merged build finished. Test PASSed.
[GitHub] [spark] HeartSaVioR commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events
HeartSaVioR commented on issue #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-572911604 Will rebase once SPARK-29779 is merged. For now, the only effective commit is the last, c72fd1f.
[GitHub] [spark] AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572911413 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572911425 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21245/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
AmplabJenkins commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#issuecomment-572911505 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911508 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21247/ Test PASSed.
[GitHub] [spark] HeartSaVioR opened a new pull request #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events
HeartSaVioR opened a new pull request #27164: [WIP][SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164 ### What changes were proposed in this pull request? This patch adds an event filter to handle SQL-related events. It is the next task after SPARK-29779 (#27085); please refer to the description of PR #27085 for the overall rationale of this patch. The following functionality will be addressed in later parts: integrating compaction into FsHistoryProvider, and documentation about the new configuration. ### Why are the changes needed? One of the major goals of SPARK-28594 is to prevent event logs from becoming too large, and SPARK-29779 achieves that goal. We had a prior approach, but it required the models in both KVStore and live entities to guarantee compatibility, which they are not designed to do. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added UTs.
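As a rough illustration of what an event filter for SQL events might do during compaction, the sketch below is plain Python and is not the code in this pull request. The event dictionaries, the type names `SQLExecutionStart` and `SQLExecutionEnd`, and the functions `live_executions` and `compact` are hypothetical. It assumes that compaction may drop events belonging to SQL executions that have already ended, while keeping events of executions still in progress and all non-SQL events.

```python
def live_executions(events):
    """Return the IDs of SQL executions that have started but not yet ended."""
    live = set()
    for event in events:
        if event["type"] == "SQLExecutionStart":
            live.add(event["execution_id"])
        elif event["type"] == "SQLExecutionEnd":
            live.discard(event["execution_id"])
    return live


def compact(events):
    """Keep non-SQL events and events that belong to live SQL executions."""
    live = live_executions(events)
    return [
        event
        for event in events
        if "execution_id" not in event or event["execution_id"] in live
    ]


# Hypothetical event log: execution 0 has finished, execution 1 is still running.
events = [
    {"type": "ApplicationStart"},
    {"type": "SQLExecutionStart", "execution_id": 0},
    {"type": "SQLExecutionStart", "execution_id": 1},
    {"type": "SQLExecutionEnd", "execution_id": 0},
]

compacted = compact(events)
print([event["type"] for event in compacted])
```

The two-pass shape (first find what is still live, then filter) is only one way to structure this; it keeps the filter decision independent of the order in which events are visited in the second pass.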
[GitHub] [spark] SparkQA commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
SparkQA commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#issuecomment-572911070 **[Test build #116458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116458/testReport)** for PR 27019 at commit [`e449d3c`](https://github.com/apache/spark/commit/e449d3c8ca24d4ec1da2080ebb37ca2307e2ca76).
[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-572911095 **[Test build #116460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116460/testReport)** for PR 26918 at commit [`7a33558`](https://github.com/apache/spark/commit/7a335588ef2d5a526f21635e13fb59fd5ffd6824).
[GitHub] [spark] SparkQA commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
SparkQA commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572911071 **[Test build #116459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116459/testReport)** for PR 27129 at commit [`2ff9960`](https://github.com/apache/spark/commit/2ff9960aeda3784c951b0d100d09d7081f012645).
[GitHub] [spark] maropu commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
maropu commented on issue #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#issuecomment-572910240 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572909274 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116451/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
AmplabJenkins commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields URL: https://github.com/apache/spark/pull/27155#issuecomment-572909353 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21243/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields URL: https://github.com/apache/spark/pull/27155#issuecomment-572909353 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21243/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
AmplabJenkins removed a comment on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/26997#issuecomment-572909328 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
AmplabJenkins commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/26997#issuecomment-572909328 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields URL: https://github.com/apache/spark/pull/27155#issuecomment-572909346 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
AmplabJenkins removed a comment on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/26997#issuecomment-572909333 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21244/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
AmplabJenkins commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/26997#issuecomment-572909333 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21244/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
AmplabJenkins removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572909264 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
AmplabJenkins commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields URL: https://github.com/apache/spark/pull/27155#issuecomment-572909346 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572909264 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
AmplabJenkins commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572909274 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116451/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
SparkQA commented on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572908996 **[Test build #116451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116451/testReport)** for PR 27159 at commit [`b77f9d8`](https://github.com/apache/spark/commit/b77f9d8759eddfc5c56c8de14037cdcf5dc0c989). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit
SparkQA removed a comment on issue #27159: [DO-NOT-MERGE] Testing fix for pyspark test test_memory_limit URL: https://github.com/apache/spark/pull/27159#issuecomment-572879942 **[Test build #116451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116451/testReport)** for PR 27159 at commit [`b77f9d8`](https://github.com/apache/spark/commit/b77f9d8759eddfc5c56c8de14037cdcf5dc0c989).
[GitHub] [spark] SparkQA commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
SparkQA commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/26997#issuecomment-572908922 **[Test build #116457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116457/testReport)** for PR 26997 at commit [`f4eccd0`](https://github.com/apache/spark/commit/f4eccd04d9833448c71f6820c3852ce0c635e3b2).
[GitHub] [spark] SparkQA commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
SparkQA commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields URL: https://github.com/apache/spark/pull/27155#issuecomment-572908937 **[Test build #116456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116456/testReport)** for PR 27155 at commit [`3e78632`](https://github.com/apache/spark/commit/3e78632789afa643bc92ad425ac31de9d5afb29e).
[GitHub] [spark] dongjoon-hyun commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
dongjoon-hyun commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields URL: https://github.com/apache/spark/pull/27155#issuecomment-572907049 ok to test
[GitHub] [spark] AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
AmplabJenkins removed a comment on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields URL: https://github.com/apache/spark/pull/27155#issuecomment-572713017 Can one of the admins verify this patch?
[GitHub] [spark] maropu commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
maropu commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/26997#issuecomment-572907233 cc: @dongjoon-hyun
[GitHub] [spark] maropu commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
maropu commented on issue #26997: [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/26997#issuecomment-572907195 retest this please