[jira] [Assigned] (SPARK-35244) invoke should throw the original exception
[ https://issues.apache.org/jira/browse/SPARK-35244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-35244: --- Assignee: Wenchen Fan (was: Apache Spark) > invoke should throw the original exception > -- > > Key: SPARK-35244 > URL: https://issues.apache.org/jira/browse/SPARK-35244 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.3, 3.1.2, 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35378) Eagerly execute non-root Command
[ https://issues.apache.org/jira/browse/SPARK-35378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-35378: --- Summary: Eagerly execute non-root Command (was: Eagerly execute non-root Command so that query command with CTE) > Eagerly execute non-root Command > > > Key: SPARK-35378 > URL: https://issues.apache.org/jira/browse/SPARK-35378 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark doesn't support LeafRunnableCommand as a subquery, > because LeafRunnableCommand always outputs GenericInternalRow while some > nodes (e.g. SortExec, AdaptiveSparkPlanExec, WholeStageCodegenExec) cast > GenericInternalRow to UnsafeRow. This causes an error as follows: > {code:java} > java.lang.ClassCastException > org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast > to org.apache.spark.sql.catalyst.expressions.UnsafeRow > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
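A hedged PySpark sketch of the scenario this sub-task targets; the CTE-wrapped command and the `tableName` output column are illustrative assumptions rather than the exact query used in the ticket or its tests:

{code:python}
# Sketch only: a runnable command (SHOW TABLES) used as a non-root node of a
# query. Its output rows are GenericInternalRow, while operators such as
# SortExec expect UnsafeRow, which is the cast error quoted above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("CREATE TABLE IF NOT EXISTS demo_t (id INT) USING parquet")

# Before the fix, wrapping the command in a query that forces a sort could fail
# with the GenericInternalRow -> UnsafeRow ClassCastException.
spark.sql("WITH t AS (SHOW TABLES) SELECT * FROM t ORDER BY tableName").show()
{code}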
[jira] [Resolved] (SPARK-35427) Check the EXCEPTION rebase mode for Avro/Parquet
[ https://issues.apache.org/jira/browse/SPARK-35427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35427. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32574 [https://github.com/apache/spark/pull/32574] > Check the EXCEPTION rebase mode for Avro/Parquet > > > Key: SPARK-35427 > URL: https://issues.apache.org/jira/browse/SPARK-35427 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Add tests to check the SparkUpgradeException exception in the EXCEPTION > rebase mode for the Avro and Parquet datasources. Currently, the mode is checked > implicitly, and not for columns of all data types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
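A hedged sketch of the behavior the new tests are meant to pin down, assuming a SparkSession named `spark`; the config key follows the Spark 3.1 naming and the output path is arbitrary, both assumptions rather than details taken from the ticket:

{code:python}
# Sketch only: under EXCEPTION rebase mode, writing an ancient date to Parquet
# is expected to raise SparkUpgradeException instead of silently rebasing it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "EXCEPTION")  # assumed key

# Dates before the Gregorian cutover (1582-10-15) differ between the legacy
# hybrid calendar and the Proleptic Gregorian calendar used by Spark 3.x.
spark.sql("SELECT DATE'1001-01-01' AS d").write.mode("overwrite").parquet("/tmp/rebase_demo")
{code}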
[jira] [Assigned] (SPARK-35063) Group exception messages in sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-35063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35063: --- Assignee: jiaan.geng > Group exception messages in sql/catalyst > > > Key: SPARK-35063 > URL: https://issues.apache.org/jira/browse/SPARK-35063 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: jiaan.geng >Priority: Major > Fix For: 3.2.0 > > > Group all errors in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst > module. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35063) Group exception messages in sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-35063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35063. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32478 [https://github.com/apache/spark/pull/32478] > Group exception messages in sql/catalyst > > > Key: SPARK-35063 > URL: https://issues.apache.org/jira/browse/SPARK-35063 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > Fix For: 3.2.0 > > > Group all errors in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst > module. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35479) Format PartitionFilters IN strings in scan nodes
[ https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35479. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32615 [https://github.com/apache/spark/pull/32615] > Format PartitionFilters IN strings in scan nodes > > > Key: SPARK-35479 > URL: https://issues.apache.org/jira/browse/SPARK-35479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Major > Fix For: 3.2.0 > > > This ticket proposes to format strings correctly for `PushedFilters`. For > example, `explain()` for a query below prints `v in (array('a'))` as > `PushedFilters: [In(v, [WrappedArray(a)])]`; > {code} > scala> sql("create table t (v array) using parquet") > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], > ReadSchema: struct> > {code} > This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; > {code} > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: > struct> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35479) Format PartitionFilters IN strings in scan nodes
[ https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35479: --- Assignee: Takeshi Yamamuro > Format PartitionFilters IN strings in scan nodes > > > Key: SPARK-35479 > URL: https://issues.apache.org/jira/browse/SPARK-35479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Major > > This ticket proposes to format strings correctly for `PushedFilters`. For > example, `explain()` for a query below prints `v in (array('a'))` as > `PushedFilters: [In(v, [WrappedArray(a)])]`; > {code} > scala> sql("create table t (v array) using parquet") > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], > ReadSchema: struct> > {code} > This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; > {code} > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: > struct> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35445) Reduce the execution time of DeduplicateRelations
[ https://issues.apache.org/jira/browse/SPARK-35445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35445: -- Assignee: wuyi > Reduce the execution time of DeduplicateRelations > - > > Key: SPARK-35445 > URL: https://issues.apache.org/jira/browse/SPARK-35445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35445) Reduce the execution time of DeduplicateRelations
[ https://issues.apache.org/jira/browse/SPARK-35445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35445. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32590 [https://github.com/apache/spark/pull/32590] > Reduce the execution time of DeduplicateRelations > - > > Key: SPARK-35445 > URL: https://issues.apache.org/jira/browse/SPARK-35445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35456) Show invalid value in config entry check error message
[ https://issues.apache.org/jira/browse/SPARK-35456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35456. -- Fix Version/s: 3.2.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/32600 > Show invalid value in config entry check error message > -- > > Key: SPARK-35456 > URL: https://issues.apache.org/jira/browse/SPARK-35456 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35456) Show invalid value in config entry check error message
[ https://issues.apache.org/jira/browse/SPARK-35456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35456: Assignee: Kent Yao > Show invalid value in config entry check error message > -- > > Key: SPARK-35456 > URL: https://issues.apache.org/jira/browse/SPARK-35456 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35481) Create more robust link for Data Source Options
[ https://issues.apache.org/jira/browse/SPARK-35481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35481: Description: Now the link for the Data Source Options uses /latest/, but it may break when we cut branch-3.2. For example, [Data Source Option for Avro|https://spark.apache.org/docs/latest/sql-data-sources-avro.html#data-source-option] It should point to the 3.2 document only in branch-3.2, so it's better to use a relative link instead of /latest/. was: Now the link for the Data Source Options uses /latest/, but it may break when we cut branch-3.2. It should point to the 3.2 document only in branch-3.2, so it's better to use a relative link instead of /latest/. > Create more robust link for Data Source Options > --- > > Key: SPARK-35481 > URL: https://issues.apache.org/jira/browse/SPARK-35481 > Project: Spark > Issue Type: Documentation > Components: docs >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > Now the link for the Data Source Options uses /latest/, but it may break > when we cut branch-3.2. > For example, [Data Source Option for > Avro|https://spark.apache.org/docs/latest/sql-data-sources-avro.html#data-source-option] > It should point to the 3.2 document only in branch-3.2, so it's better to use a > relative link instead of /latest/. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35481) Create more robust link for Data Source Options
[ https://issues.apache.org/jira/browse/SPARK-35481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-35481: - Description: Now the link for the Data Source Options uses /latest/, but it may break when we cut branch-3.2. It should point to the 3.2 document only in branch-3.2, so it's better to use a relative link instead of /latest/. was: Now the link for the Data Source Options uses /latest/, but it may break when we cut branch-3.2 on July 1st. It should point to the 3.2 document only in branch-3.2, so it's better to use a relative link instead of /latest/. > Create more robust link for Data Source Options > --- > > Key: SPARK-35481 > URL: https://issues.apache.org/jira/browse/SPARK-35481 > Project: Spark > Issue Type: Documentation > Components: docs >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > Now the link for the Data Source Options uses /latest/, but it may break > when we cut branch-3.2. > It should point to the 3.2 document only in branch-3.2, so it's better to use a > relative link instead of /latest/. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35481) Create more robust link for Data Source Options
[ https://issues.apache.org/jira/browse/SPARK-35481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35481: Description: Now the link for the Data Source Options uses /latest/, but it may break when we cut branch-3.2 on July 1st. It should point to the 3.2 document only in branch-3.2, so it's better to use a relative link instead of /latest/. was: Now the link for the Data Source Options uses /latest/, but it may break when we cut branch-3.2 on July 1st. It should point to the 3.2 document only in branch-3.2. So, we should use a relative link instead of /latest/. > Create more robust link for Data Source Options > --- > > Key: SPARK-35481 > URL: https://issues.apache.org/jira/browse/SPARK-35481 > Project: Spark > Issue Type: Documentation > Components: docs >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > Now the link for the Data Source Options uses /latest/, but it may break > when we cut branch-3.2 on July 1st. > It should point to the 3.2 document only in branch-3.2, so it's better to use a > relative link instead of /latest/. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35481) Create more robust link for Data Source Options
Haejoon Lee created SPARK-35481: --- Summary: Create more robust link for Data Source Options Key: SPARK-35481 URL: https://issues.apache.org/jira/browse/SPARK-35481 Project: Spark Issue Type: Documentation Components: docs Affects Versions: 3.2.0 Reporter: Haejoon Lee Now the link for the Data Source Options uses /latest/, but it may break when we cut branch-3.2 on July 1st. It should point to the 3.2 document only in branch-3.2. So, we should use a relative link instead of /latest/. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35480) percentile_approx function doesn't work with pivot
[ https://issues.apache.org/jira/browse/SPARK-35480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Bryant updated SPARK-35480: --- Description: The percentile_approx PySpark function does not appear to treat the "accuracy" parameter correctly when pivoting on a column, causing the query below to fail (this also fails if the accuracy parameter is left unspecified): {{import pyspark.sql.functions as F}} {{df = sc.parallelize([}} {{ ["a", -1.0],}} {{ ["a", 5.5],}} {{ ["a", 2.5],}} {{ ["b", 3.0],}} {{ ["b", 5.2]}} {{]).toDF(["type", "value"])}} {{ .groupBy()}} {{ .pivot("type", ["a", "b"])}} {{ .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}} Error message: {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage provided must be a constant literal; 'Aggregate [percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b as string))) 1 else cast(null as int), 0, 0) AS b#253|#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b as string))) 1 else cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243|#242, value#243], false}} was: The percentile_approx PySpark function does not appear to treat the "accuracy" parameter correctly when pivoting on a column, causing the query below to fail (this also fails if the accuracy parameter is left unspecified): {{import pyspark.sql.functions as F}} {{df = sc.parallelize([}} {{ ["a", -1.0],}} {{ ["a", 5.5],}} {{ ["a", 2.5],}} {{ ["b", 3.0],}} {{ ["b", 5]}} {{]).toDF(["type", "value"])}} {{ .groupBy()}} {{ .pivot("type", ["a", "b"])}} {{ .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}} Error message: {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage provided must be a constant literal; 'Aggregate [percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b as string))) 1 else cast(null as int), 0, 0) AS b#253|#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> 
cast(a as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b as string))) 1 else cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243|#242, value#243], false}} > percentile_approx function doesn't work with pivot > -- > > Key: SPARK-35480 > URL: https://issues.apache.org/jira/browse/SPARK-35480 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.1.1 >Reporter: Christopher Bryant >Priority: Major > > The percentile_approx PySpark function does not appear to treat the > "accuracy" parameter correctly when pivoting on a column, causing the query > below to fail (this also fails if the accuracy parameter is left unspecified): > > {{import pyspark.sql.functions as F}} > {{df = sc.parallelize([}} > {{ ["a", -1.0],}} > {{ ["a",
[jira] [Updated] (SPARK-35480) percentile_approx function doesn't work with pivot
[ https://issues.apache.org/jira/browse/SPARK-35480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Bryant updated SPARK-35480: --- Description: The percentile_approx PySpark function does not appear to treat the "accuracy" parameter correctly when pivoting on a column, causing the query below to fail (this also fails if the accuracy parameter is left unspecified): {{import pyspark.sql.functions as F}} {{df = sc.parallelize([}} {{ ["a", -1.0],}} {{ ["a", 5.5],}} {{ ["a", 2.5],}} {{ ["b", 3.0],}} {{ ["b", 5]}} {{]).toDF(["type", "value"])}} {{ .groupBy()}} {{ .pivot("type", ["a", "b"])}} {{ .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}} Error message: {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage provided must be a constant literal; 'Aggregate [percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b as string))) 1 else cast(null as int), 0, 0) AS b#253|#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b as string))) 1 else cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243|#242, value#243], false}} was: The percentile_approx PySpark function does not appear to treat the "accuracy" parameter correctly when pivoting on a column, causing the query below to fail (this also fails if the accuracy parameter is left unspecified): import pyspark.sql.functions as F {{df = sc.parallelize([}} {{ ["a", -1.0],}} {{ ["a", 5.5],}} {{ ["a", 2.5],}} {{ ["b", 3.0],}} {{ ["b", 5]}} {{]).toDF(["type", "value"]) \}} {{ .groupBy() \}} {{ .pivot("type", ["a", "b"]) \}} {{ .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}} Error message: {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage provided must be a constant literal; 'Aggregate [percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b as string))) 1 else cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243], false}} > percentile_approx function doesn't 
work with pivot > -- > > Key: SPARK-35480 > URL: https://issues.apache.org/jira/browse/SPARK-35480 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.1.1 >Reporter: Christopher Bryant >Priority: Major > > The percentile_approx PySpark function does not appear to treat the > "accuracy" parameter correctly when pivoting on a column, causing the query > below to fail (this also fails if the accuracy parameter is left unspecified): > > {{import pyspark.sql.functions as F}} > {{df = sc.parallelize([}} > {{ ["a", -1.0],}} > {{ ["a", 5.5],}} > {{ ["a", 2.5],}} > {{ ["b", 3.0],}} > {{ ["b", 5]}} > {{]).toDF(["type", "value"])}} > {{ .groupBy()}} > {{ .pivot("type", ["a", "b"])}} > {{ .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}} > > Error message: > {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> > CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> > CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS > STRI
[jira] [Created] (SPARK-35480) percentile_approx function doesn't work with pivot
Christopher Bryant created SPARK-35480: -- Summary: percentile_approx function doesn't work with pivot Key: SPARK-35480 URL: https://issues.apache.org/jira/browse/SPARK-35480 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 3.1.1 Reporter: Christopher Bryant The percentile_approx PySpark function does not appear to treat the "accuracy" parameter correctly when pivoting on a column, causing the query below to fail (this also fails if the accuracy parameter is left unspecified): import pyspark.sql.functions as F {{df = sc.parallelize([}} {{ ["a", -1.0],}} {{ ["a", 5.5],}} {{ ["a", 2.5],}} {{ ["b", 3.0],}} {{ ["b", 5]}} {{]).toDF(["type", "value"]) \}} {{ .groupBy() \}} {{ .pivot("type", ["a", "b"]) \}} {{ .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}} Error message: {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage provided must be a constant literal; 'Aggregate [percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b as string))) 1 else cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243], false}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
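For reference, a possible workaround sketch rather than a fix, assuming the `df` built in the snippet above and an active SparkSession: aggregating per group first and pivoting the already-computed result keeps the percentage and accuracy arguments as plain literals, which the analyzer accepts.

{code:python}
# Workaround sketch (not the fix tracked by this issue): run percentile_approx
# before the pivot so its percentage/accuracy arguments stay constant literals.
import pyspark.sql.functions as F

per_type = df.groupBy("type").agg(
    F.percentile_approx("value", [0.5], 10000).alias("percentiles"))

# Pivot the pre-aggregated rows; F.first just carries the computed value over.
pivoted = per_type.groupBy().pivot("type", ["a", "b"]).agg(F.first("percentiles"))
pivoted.show()
{code}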
[jira] [Assigned] (SPARK-35454) Ambiguous self-join doesn't fail after transforming the dataset to dataframe
[ https://issues.apache.org/jira/browse/SPARK-35454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35454: Assignee: Apache Spark > Ambiguous self-join doesn't fail after transforming the dataset to dataframe > > > Key: SPARK-35454 > URL: https://issues.apache.org/jira/browse/SPARK-35454 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > {code:java} > test("SPARK-28344: fail ambiguous self join - Dataset.colRegex as column > ref") { > val df1 = spark.range(3) > val df2 = df1.filter($"id" > 0) > withSQLConf( > SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true", > SQLConf.CROSS_JOINS_ENABLED.key -> "true") { > assertAmbiguousSelfJoin(df1.join(df2, df1.colRegex("id") > > df2.colRegex("id"))) > } > } > {code} > For this unit test, if we append `.toDF()` to both df1 and df2, the query > won't fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35454) Ambiguous self-join doesn't fail after transforming the dataset to dataframe
[ https://issues.apache.org/jira/browse/SPARK-35454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35454: Assignee: (was: Apache Spark) > Ambiguous self-join doesn't fail after transforming the dataset to dataframe > > > Key: SPARK-35454 > URL: https://issues.apache.org/jira/browse/SPARK-35454 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1 >Reporter: wuyi >Priority: Major > > {code:java} > test("SPARK-28344: fail ambiguous self join - Dataset.colRegex as column > ref") { > val df1 = spark.range(3) > val df2 = df1.filter($"id" > 0) > withSQLConf( > SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true", > SQLConf.CROSS_JOINS_ENABLED.key -> "true") { > assertAmbiguousSelfJoin(df1.join(df2, df1.colRegex("id") > > df2.colRegex("id"))) > } > } > {code} > For this unit test, if we append `.toDF()` to both df1 and df2, the query > won't fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35454) Ambiguous self-join doesn't fail after transforming the dataset to dataframe
[ https://issues.apache.org/jira/browse/SPARK-35454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348935#comment-17348935 ] Apache Spark commented on SPARK-35454: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/32616 > Ambiguous self-join doesn't fail after transforming the dataset to dataframe > > > Key: SPARK-35454 > URL: https://issues.apache.org/jira/browse/SPARK-35454 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1 >Reporter: wuyi >Priority: Major > > {code:java} > test("SPARK-28344: fail ambiguous self join - Dataset.colRegex as column > ref") { > val df1 = spark.range(3) > val df2 = df1.filter($"id" > 0) > withSQLConf( > SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true", > SQLConf.CROSS_JOINS_ENABLED.key -> "true") { > assertAmbiguousSelfJoin(df1.join(df2, df1.colRegex("id") > > df2.colRegex("id"))) > } > } > {code} > For this unit test, if we append `.toDF()` to both df1 and df2, the query > won't fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
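A hedged PySpark analogue of the Scala unit test quoted above, assuming an active SparkSession; the config key strings (for SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED and SQLConf.CROSS_JOINS_ENABLED) and the exact colRegex usage are assumptions, not taken from the ticket:

{code:python}
# Sketch only: reproduce the reported behavior from PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "true")  # assumed key
spark.conf.set("spark.sql.crossJoin.enabled", "true")               # assumed key

# With plain Datasets the join below raises an AnalysisException for an
# ambiguous self-join. The ticket reports that converting both sides with
# toDF() first lets the same ambiguous join pass the check silently.
df1 = spark.range(3).toDF("id")
df2 = df1.filter("id > 0")
df1.join(df2, df1.colRegex("`id`") > df2.colRegex("`id`")).show()
{code}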
[jira] [Commented] (SPARK-35479) Format PartitionFilters IN strings in scan nodes
[ https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348923#comment-17348923 ] Apache Spark commented on SPARK-35479: -- User 'maropu' has created a pull request for this issue: https://github.com/apache/spark/pull/32615 > Format PartitionFilters IN strings in scan nodes > > > Key: SPARK-35479 > URL: https://issues.apache.org/jira/browse/SPARK-35479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Priority: Major > > This ticket proposes to format strings correctly for `PushedFilters`. For > example, `explain()` for a query below prints `v in (array('a'))` as > `PushedFilters: [In(v, [WrappedArray(a)])]`; > {code} > scala> sql("create table t (v array) using parquet") > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], > ReadSchema: struct> > {code} > This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; > {code} > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: > struct> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35479) Format PartitionFilters IN strings in scan nodes
[ https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35479: Assignee: Apache Spark > Format PartitionFilters IN strings in scan nodes > > > Key: SPARK-35479 > URL: https://issues.apache.org/jira/browse/SPARK-35479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Assignee: Apache Spark >Priority: Major > > This ticket proposes to format strings correctly for `PushedFilters`. For > example, `explain()` for a query below prints `v in (array('a'))` as > `PushedFilters: [In(v, [WrappedArray(a)])]`; > {code} > scala> sql("create table t (v array) using parquet") > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], > ReadSchema: struct> > {code} > This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; > {code} > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: > struct> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35479) Format PartitionFilters IN strings in scan nodes
[ https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35479: Assignee: (was: Apache Spark) > Format PartitionFilters IN strings in scan nodes > > > Key: SPARK-35479 > URL: https://issues.apache.org/jira/browse/SPARK-35479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Priority: Major > > This ticket proposes to format strings correctly for `PushedFilters`. For > example, `explain()` for a query below prints `v in (array('a'))` as > `PushedFilters: [In(v, [WrappedArray(a)])]`; > {code} > scala> sql("create table t (v array) using parquet") > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], > ReadSchema: struct> > {code} > This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; > {code} > scala> sql("select * from t where v in (array('a'), null)").explain() > == Physical Plan == > *(1) Filter v#4 IN ([a],null) > +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN > ([a],null)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], > PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: > struct> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35479) Format PartitionFilters IN strings in scan nodes
Takeshi Yamamuro created SPARK-35479: Summary: Format PartitionFilters IN strings in scan nodes Key: SPARK-35479 URL: https://issues.apache.org/jira/browse/SPARK-35479 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Takeshi Yamamuro This ticket proposes to format strings correctly for `PushedFilters`. For example, `explain()` for a query below prints `v in (array('a'))` as `PushedFilters: [In(v, [WrappedArray(a)])]`; {code} scala> sql("create table t (v array) using parquet") scala> sql("select * from t where v in (array('a'), null)").explain() == Physical Plan == *(1) Filter v#4 IN ([a],null) +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN ([a],null)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], ReadSchema: struct> {code} This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; {code} scala> sql("select * from t where v in (array('a'), null)").explain() == Physical Plan == *(1) Filter v#4 IN ([a],null) +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN ([a],null)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: struct> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35465) Enable disallow_untyped_defs mypy check except for major modules.
[ https://issues.apache.org/jira/browse/SPARK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35465: Assignee: Apache Spark > Enable disallow_untyped_defs mypy check except for major modules. > - > > Key: SPARK-35465 > URL: https://issues.apache.org/jira/browse/SPARK-35465 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > Set up the mypy configuration and add type annotations except for major > modules. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35465) Enable disallow_untyped_defs mypy check except for major modules.
[ https://issues.apache.org/jira/browse/SPARK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348832#comment-17348832 ] Apache Spark commented on SPARK-35465: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/32614 > Enable disallow_untyped_defs mypy check except for major modules. > - > > Key: SPARK-35465 > URL: https://issues.apache.org/jira/browse/SPARK-35465 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > Set up the mypy configuration and add type annotations except for major > modules. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35465) Enable disallow_untyped_defs mypy check except for major modules.
[ https://issues.apache.org/jira/browse/SPARK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35465: Assignee: (was: Apache Spark) > Enable disallow_untyped_defs mypy check except for major modules. > - > > Key: SPARK-35465 > URL: https://issues.apache.org/jira/browse/SPARK-35465 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > Set up the mypy configuration and add type annotations except for major > modules. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35477) Enable disallow_untyped_defs mypy check for pyspark.pandas.utils.
Takuya Ueshin created SPARK-35477: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.utils. Key: SPARK-35477 URL: https://issues.apache.org/jira/browse/SPARK-35477 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35478) Enable disallow_untyped_defs mypy check for pyspark.pandas.window.
Takuya Ueshin created SPARK-35478: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.window. Key: SPARK-35478 URL: https://issues.apache.org/jira/browse/SPARK-35478 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35476) Enable disallow_untyped_defs mypy check for pyspark.pandas.series.
Takuya Ueshin created SPARK-35476: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.series. Key: SPARK-35476 URL: https://issues.apache.org/jira/browse/SPARK-35476 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35475) Enable disallow_untyped_defs mypy check for pyspark.pandas.namespace.
Takuya Ueshin created SPARK-35475: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.namespace. Key: SPARK-35475 URL: https://issues.apache.org/jira/browse/SPARK-35475 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35474) Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.
Takuya Ueshin created SPARK-35474: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing. Key: SPARK-35474 URL: https://issues.apache.org/jira/browse/SPARK-35474 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
Takuya Ueshin created SPARK-35472: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. Key: SPARK-35472 URL: https://issues.apache.org/jira/browse/SPARK-35472 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35473) Enable disallow_untyped_defs mypy check for pyspark.pandas.groupby.
Takuya Ueshin created SPARK-35473: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.groupby. Key: SPARK-35473 URL: https://issues.apache.org/jira/browse/SPARK-35473 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35470) Enable disallow_untyped_defs mypy check for pyspark.pandas.base.
Takuya Ueshin created SPARK-35470: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.base. Key: SPARK-35470 URL: https://issues.apache.org/jira/browse/SPARK-35470 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35471) Enable disallow_untyped_defs mypy check for pyspark.pandas.frame.
Takuya Ueshin created SPARK-35471: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.frame. Key: SPARK-35471 URL: https://issues.apache.org/jira/browse/SPARK-35471 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35468) Enable disallow_untyped_defs mypy check for pyspark.pandas.typedef.typehints.
Takuya Ueshin created SPARK-35468: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.typedef.typehints. Key: SPARK-35468 URL: https://issues.apache.org/jira/browse/SPARK-35468 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35469) Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors.
Takuya Ueshin created SPARK-35469: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors. Key: SPARK-35469 URL: https://issues.apache.org/jira/browse/SPARK-35469 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
[ https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-35466: -- Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops. (was: Enable disallow_untyped_defs for pyspark.pandas.data_type_ops.) > Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops. > - > > Key: SPARK-35466 > URL: https://issues.apache.org/jira/browse/SPARK-35466 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35467) Enable disallow_untyped_defs mypy check for pyspark.pandas.spark.accessors.
Takuya Ueshin created SPARK-35467: - Summary: Enable disallow_untyped_defs mypy check for pyspark.pandas.spark.accessors. Key: SPARK-35467 URL: https://issues.apache.org/jira/browse/SPARK-35467 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35466) Enable disallow_untyped_defs for pyspark.pandas.data_type_ops.
Takuya Ueshin created SPARK-35466: - Summary: Enable disallow_untyped_defs for pyspark.pandas.data_type_ops. Key: SPARK-35466 URL: https://issues.apache.org/jira/browse/SPARK-35466 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35465) Enable disallow_untyped_defs mypy check except for major modules.
Takuya Ueshin created SPARK-35465: - Summary: Enable disallow_untyped_defs mypy check except for major modules. Key: SPARK-35465 URL: https://issues.apache.org/jira/browse/SPARK-35465 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin Set up the mypy configuration and add type annotations except for major modules. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35464) pandas APIs on Spark: Enable mypy check "disallow_untyped_defs" for main codes.
Takuya Ueshin created SPARK-35464: - Summary: pandas APIs on Spark: Enable mypy check "disallow_untyped_defs" for main codes. Key: SPARK-35464 URL: https://issues.apache.org/jira/browse/SPARK-35464 Project: Spark Issue Type: Umbrella Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin Currently, many functions in the main codebase are still missing type annotations, and the {{mypy}} check "disallow_untyped_defs" is disabled for them. We should add more type annotations and enable the {{mypy}} check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
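To make the scope concrete, a small illustrative example of what the {{disallow_untyped_defs}} check enforces once enabled for a module; how the Spark build wires this into its mypy configuration is an assumption here, not a detail from the ticket:

{code:python}
# Illustrative only: with disallow_untyped_defs enabled, mypy rejects function
# definitions that lack annotations and accepts fully annotated ones.
from typing import List


def untyped_mean(values):  # error: function is missing a type annotation
    return sum(values) / len(values)


def typed_mean(values: List[float]) -> float:  # passes the check
    return sum(values) / len(values)
{code}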
[jira] [Resolved] (SPARK-35364) Renaming the existing Koalas related codes.
[ https://issues.apache.org/jira/browse/SPARK-35364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-35364. --- Fix Version/s: 3.2.0 Assignee: Haejoon Lee Resolution: Fixed Issue resolved by pull request 32516 https://github.com/apache/spark/pull/32516 > Renaming the existing Koalas related codes. > --- > > Key: SPARK-35364 > URL: https://issues.apache.org/jira/browse/SPARK-35364 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.2.0 > > > We should rename several Koalas-related names in the pandas APIs on > Spark: > * kdf -> psdf > * kser -> psser > * kidx -> psidx > * kmidx -> psmidx > * sdf.to_koalas() -> sdf.to_pandas_on_spark() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
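A small hedged sketch of the renamed entry point and variable conventions listed above, assuming a Spark 3.2 build where the ported Koalas code ships as {{pyspark.pandas}} and an active SparkSession named `spark`:

{code:python}
# Sketch only: the new pandas-on-Spark naming conventions.
import pyspark.pandas as ps

sdf = spark.range(3)
psdf = sdf.to_pandas_on_spark()   # formerly sdf.to_koalas(); `psdf` replaces `kdf`
psser = psdf["id"]                # `psser` replaces `kser`
psidx = ps.Index([1, 2, 3])       # `psidx` replaces `kidx`
{code}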
[jira] [Resolved] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
[ https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35463. --- Fix Version/s: 3.0.3 3.1.2 3.2.0 Resolution: Fixed Issue resolved by pull request 32613 [https://github.com/apache/spark/pull/32613] > Skip checking checksum on a system doesn't have `shasum` > > > Key: SPARK-35463 > URL: https://issues.apache.org/jira/browse/SPARK-35463 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.2.0, 3.1.2, 3.0.3 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
[ https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35463: - Assignee: Dongjoon Hyun > Skip checking checksum on a system doesn't have `shasum` > > > Key: SPARK-35463 > URL: https://issues.apache.org/jira/browse/SPARK-35463 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
[ https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35462. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32612 [https://github.com/apache/spark/pull/32612] > Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models > - > > Key: SPARK-35462 > URL: https://issues.apache.org/jira/browse/SPARK-35462 > Project: Spark > Issue Type: Improvement > Components: Build, Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
[ https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35462: - Assignee: Dongjoon Hyun > Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models > - > > Key: SPARK-35462 > URL: https://issues.apache.org/jira/browse/SPARK-35462 > Project: Spark > Issue Type: Improvement > Components: Build, Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
[ https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348799#comment-17348799 ] Apache Spark commented on SPARK-35463: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/32613 > Skip checking checksum on a system doesn't have `shasum` > > > Key: SPARK-35463 > URL: https://issues.apache.org/jira/browse/SPARK-35463 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
[ https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35463: Assignee: (was: Apache Spark) > Skip checking checksum on a system doesn't have `shasum` > > > Key: SPARK-35463 > URL: https://issues.apache.org/jira/browse/SPARK-35463 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
[ https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35463: Assignee: Apache Spark > Skip checking checksum on a system doesn't have `shasum` > > > Key: SPARK-35463 > URL: https://issues.apache.org/jira/browse/SPARK-35463 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
[ https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348800#comment-17348800 ] Apache Spark commented on SPARK-35463: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/32613 > Skip checking checksum on a system doesn't have `shasum` > > > Key: SPARK-35463 > URL: https://issues.apache.org/jira/browse/SPARK-35463 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35458) ARM CI failed: failed to validate maven sha512
[ https://issues.apache.org/jira/browse/SPARK-35458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-35458: - Priority: Minor (was: Major) > ARM CI failed: failed to validate maven sha512 > -- > > Key: SPARK-35458 > URL: https://issues.apache.org/jira/browse/SPARK-35458 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Minor > Fix For: 3.0.3, 3.1.2, 3.2.0 > > > Log: > > Veryfing checksum from > /home/jenkins/workspace/spark-master-test-maven-arm/build/apache-maven-3.6.3-bin.tar.gz.sha512 > *Unknown option: q* > *Type shasum -h for help* > Bad checksum from > [https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz.sha512] > > Looks like shasum validation had some wrong change in: > [https://github.com/apache/spark/commit/6c5fcac6b787d01ebf3d9f53410db2c894ab9abd#diff-590845f9441f6be1f05f517fd1caf31d64d0b5126ea9a2a13d79c74f761417ce] > > [1] [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/] > [2] > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
[ https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35463: -- Target Version/s: 3.0.3, 3.1.2, 3.2.0 > Skip checking checksum on a system doesn't have `shasum` > > > Key: SPARK-35463 > URL: https://issues.apache.org/jira/browse/SPARK-35463 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35458) ARM CI failed: failed to validate maven sha512
[ https://issues.apache.org/jira/browse/SPARK-35458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-35458. -- Fix Version/s: 3.0.3 3.1.2 3.2.0 Resolution: Fixed Issue resolved by pull request 32604 [https://github.com/apache/spark/pull/32604] > ARM CI failed: failed to validate maven sha512 > -- > > Key: SPARK-35458 > URL: https://issues.apache.org/jira/browse/SPARK-35458 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.2.0, 3.1.2, 3.0.3 > > > Log: > > Veryfing checksum from > /home/jenkins/workspace/spark-master-test-maven-arm/build/apache-maven-3.6.3-bin.tar.gz.sha512 > *Unknown option: q* > *Type shasum -h for help* > Bad checksum from > [https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz.sha512] > > Looks like shasum validation had some wrong change in: > [https://github.com/apache/spark/commit/6c5fcac6b787d01ebf3d9f53410db2c894ab9abd#diff-590845f9441f6be1f05f517fd1caf31d64d0b5126ea9a2a13d79c74f761417ce] > > [1] [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/] > [2] > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
[ https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35463: -- Priority: Blocker (was: Major) > Skip checking checksum on a system doesn't have `shasum` > > > Key: SPARK-35463 > URL: https://issues.apache.org/jira/browse/SPARK-35463 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35458) ARM CI failed: failed to validate maven sha512
[ https://issues.apache.org/jira/browse/SPARK-35458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-35458: Assignee: Yikun Jiang > ARM CI failed: failed to validate maven sha512 > -- > > Key: SPARK-35458 > URL: https://issues.apache.org/jira/browse/SPARK-35458 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > > Log: > > Veryfing checksum from > /home/jenkins/workspace/spark-master-test-maven-arm/build/apache-maven-3.6.3-bin.tar.gz.sha512 > *Unknown option: q* > *Type shasum -h for help* > Bad checksum from > [https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz.sha512] > > Looks like shasum validation had some wrong change in: > [https://github.com/apache/spark/commit/6c5fcac6b787d01ebf3d9f53410db2c894ab9abd#diff-590845f9441f6be1f05f517fd1caf31d64d0b5126ea9a2a13d79c74f761417ce] > > [1] [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/] > [2] > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`
Dongjoon Hyun created SPARK-35463: - Summary: Skip checking checksum on a system doesn't have `shasum` Key: SPARK-35463 URL: https://issues.apache.org/jira/browse/SPARK-35463 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.0.3, 3.1.2, 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18683) REST APIs for standalone Master, Workers and Applications
[ https://issues.apache.org/jira/browse/SPARK-18683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348708#comment-17348708 ] Mayank Asthana commented on SPARK-18683: Looking through the code, I found that there is a `/json` endpoint on the master UI which returns the JSON representation of everything on that page. However, I don't think this is documented anywhere. > REST APIs for standalone Master, Workers and Applications > > > Key: SPARK-18683 > URL: https://issues.apache.org/jira/browse/SPARK-18683 > Project: Spark > Issue Type: Improvement >Reporter: Shixiong Zhu >Priority: Major > Labels: bulk-closed > > It would be great to have some REST APIs to access Master, Workers and > Applications information. Right now the only way to get them is using the Web > UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
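For anyone who wants to try the undocumented endpoint mentioned in the comment above, a minimal sketch follows; the master host and port 8080 are illustrative defaults for the standalone master web UI, not values taken from this thread.
{code:scala}
// Fetch the standalone master's status as JSON via the undocumented /json endpoint.
// "master-host" is a placeholder; 8080 is the default port of the master web UI.
import scala.io.Source

object MasterJsonStatus {
  def main(args: Array[String]): Unit = {
    val source = Source.fromURL("http://master-host:8080/json")
    try {
      // The payload mirrors the master UI page: workers, running and completed apps, etc.
      println(source.mkString)
    } finally {
      source.close()
    }
  }
}
{code}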
[jira] [Assigned] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
[ https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35462: Assignee: (was: Apache Spark) > Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models > - > > Key: SPARK-35462 > URL: https://issues.apache.org/jira/browse/SPARK-35462 > Project: Spark > Issue Type: Improvement > Components: Build, Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
[ https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348681#comment-17348681 ] Apache Spark commented on SPARK-35462: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/32612 > Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models > - > > Key: SPARK-35462 > URL: https://issues.apache.org/jira/browse/SPARK-35462 > Project: Spark > Issue Type: Improvement > Components: Build, Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
[ https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35462: Assignee: Apache Spark > Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models > - > > Key: SPARK-35462 > URL: https://issues.apache.org/jira/browse/SPARK-35462 > Project: Spark > Issue Type: Improvement > Components: Build, Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
Dongjoon Hyun created SPARK-35462: - Summary: Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models Key: SPARK-35462 URL: https://issues.apache.org/jira/browse/SPARK-35462 Project: Spark Issue Type: Improvement Components: Build, Kubernetes Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint
[ https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35461: -- Issue Type: Improvement (was: Bug) > Error when reading dictionary-encoded Parquet int column when read schema is > bigint > --- > > Key: SPARK-35461 > URL: https://issues.apache.org/jira/browse/SPARK-35461 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Chao Sun >Priority: Major > > When reading a dictionary-encoded integer column from a Parquet file, and > users specify read schema to be bigint, Spark currently will fail with the > following exception: > {code} > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344) > {code} > To reproduce: > {code} > val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, > i.toString)) > withParquetFile(data) { path => > val readSchema = StructType(Seq(StructField("_1", LongType))) > spark.read.schema(readSchema).parquet(path).first() > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint
[ https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348678#comment-17348678 ] Dongjoon Hyun commented on SPARK-35461: --- For a record, Apache Spark file-based data sources have different capabilities like we don't expect much capability at TEXT data sources. Parquet data source has been having this limitation for a long time. > Error when reading dictionary-encoded Parquet int column when read schema is > bigint > --- > > Key: SPARK-35461 > URL: https://issues.apache.org/jira/browse/SPARK-35461 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Chao Sun >Priority: Major > > When reading a dictionary-encoded integer column from a Parquet file, and > users specify read schema to be bigint, Spark currently will fail with the > following exception: > {code} > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344) > {code} > To reproduce: > {code} > val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, > i.toString)) > withParquetFile(data) { path => > val readSchema = StructType(Seq(StructField("_1", LongType))) > spark.read.schema(readSchema).parquet(path).first() > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint
[ https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35461: -- Affects Version/s: (was: 3.1.1) (was: 3.0.2) 3.2.0 > Error when reading dictionary-encoded Parquet int column when read schema is > bigint > --- > > Key: SPARK-35461 > URL: https://issues.apache.org/jira/browse/SPARK-35461 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Priority: Major > > When reading a dictionary-encoded integer column from a Parquet file, and > users specify read schema to be bigint, Spark currently will fail with the > following exception: > {code} > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344) > {code} > To reproduce: > {code} > val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, > i.toString)) > withParquetFile(data) { path => > val readSchema = StructType(Seq(StructField("_1", LongType))) > spark.read.schema(readSchema).parquet(path).first() > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint
[ https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348677#comment-17348677 ] Dongjoon Hyun commented on SPARK-35461: --- This is a well-known limitation after we build a test coverage via SPARK-23007 at Apache Spark 2.4.0. > Error when reading dictionary-encoded Parquet int column when read schema is > bigint > --- > > Key: SPARK-35461 > URL: https://issues.apache.org/jira/browse/SPARK-35461 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Chao Sun >Priority: Major > > When reading a dictionary-encoded integer column from a Parquet file, and > users specify read schema to be bigint, Spark currently will fail with the > following exception: > {code} > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344) > {code} > To reproduce: > {code} > val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, > i.toString)) > withParquetFile(data) { path => > val readSchema = StructType(Seq(StructField("_1", LongType))) > spark.read.schema(readSchema).parquet(path).first() > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348675#comment-17348675 ] Apache Spark commented on SPARK-35314: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/32611 > Support arithmetic operations against bool IndexOpsMixin > > > Key: SPARK-35314 > URL: https://issues.apache.org/jira/browse/SPARK-35314 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Existing binary operations of bool Series in Koalas do not match pandas’ > behaviors. > pandas take True as 1, False as 0 when dealing with numeric values, numeric > collections, and numeric Series; whereas Koalas raises an AnalysisException > no matter what the binary operation is. > We aim to match pandas' behaviors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35314: Assignee: Apache Spark > Support arithmetic operations against bool IndexOpsMixin > > > Key: SPARK-35314 > URL: https://issues.apache.org/jira/browse/SPARK-35314 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Existing binary operations of bool Series in Koalas do not match pandas’ > behaviors. > pandas take True as 1, False as 0 when dealing with numeric values, numeric > collections, and numeric Series; whereas Koalas raises an AnalysisException > no matter what the binary operation is. > We aim to match pandas' behaviors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35314: Assignee: (was: Apache Spark) > Support arithmetic operations against bool IndexOpsMixin > > > Key: SPARK-35314 > URL: https://issues.apache.org/jira/browse/SPARK-35314 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Existing binary operations of bool Series in Koalas do not match pandas’ > behaviors. > pandas take True as 1, False as 0 when dealing with numeric values, numeric > collections, and numeric Series; whereas Koalas raises an AnalysisException > no matter what the binary operation is. > We aim to match pandas' behaviors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348672#comment-17348672 ] Apache Spark commented on SPARK-35314: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/32611 > Support arithmetic operations against bool IndexOpsMixin > > > Key: SPARK-35314 > URL: https://issues.apache.org/jira/browse/SPARK-35314 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Existing binary operations of bool Series in Koalas do not match pandas’ > behaviors. > pandas take True as 1, False as 0 when dealing with numeric values, numeric > collections, and numeric Series; whereas Koalas raises an AnalysisException > no matter what the binary operation is. > We aim to match pandas' behaviors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35314: - Summary: Support arithmetic operations against bool IndexOpsMixin (was: Support arithmetic operations against bool Series) > Support arithmetic operations against bool IndexOpsMixin > > > Key: SPARK-35314 > URL: https://issues.apache.org/jira/browse/SPARK-35314 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Existing binary operations of bool Series in Koalas do not match pandas’ > behaviors. > pandas take True as 1, False as 0 when dealing with numeric values, numeric > collections, and numeric Series; whereas Koalas raises an AnalysisException > no matter what the binary operation is. > We aim to match pandas' behaviors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint
[ https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348667#comment-17348667 ] Chao Sun commented on SPARK-35461: -- Actually this also fails when turning off the vectorized reader: {code} Caused by: java.lang.ClassCastException: class org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to class org.apache.spark.sql.catalyst.expressions.MutableInt (org.apache.spark.sql.catalyst.expressions.MutableLong and org.apache.spark.sql.catalyst.expressions.MutableInt are in unnamed module of loader 'app') at org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.setInt(SpecificInternalRow.scala:253) at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$RowUpdater.setInt(ParquetRowConverter.scala:178) at org.apache.spark.sql.execution.datasources.parquet.ParquetPrimitiveConverter.addInt(ParquetRowConverter.scala:88) at org.apache.parquet.column.impl.ColumnReaderBase$2$3.writeValue(ColumnReaderBase.java:297) at org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:440) at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30) at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406) at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:229) {code} In this case parquet-mr is able to return the value but Spark won't be able to handle it. > Error when reading dictionary-encoded Parquet int column when read schema is > bigint > --- > > Key: SPARK-35461 > URL: https://issues.apache.org/jira/browse/SPARK-35461 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Chao Sun >Priority: Major > > When reading a dictionary-encoded integer column from a Parquet file, and > users specify read schema to be bigint, Spark currently will fail with the > following exception: > {code} > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344) > {code} > To reproduce: > {code} > val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, > i.toString)) > withParquetFile(data) { path => > val readSchema = StructType(Seq(StructField("_1", LongType))) > spark.read.schema(readSchema).parquet(path).first() > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint
Chao Sun created SPARK-35461: Summary: Error when reading dictionary-encoded Parquet int column when read schema is bigint Key: SPARK-35461 URL: https://issues.apache.org/jira/browse/SPARK-35461 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1, 3.0.2 Reporter: Chao Sun When reading a dictionary-encoded integer column from a Parquet file, and users specify read schema to be bigint, Spark currently will fail with the following exception: {code} java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50) at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344) {code} To reproduce: {code} val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, i.toString)) withParquetFile(data) { path => val readSchema = StructType(Seq(StructField("_1", LongType))) spark.read.schema(readSchema).parquet(path).first() } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
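Until the upcast is supported natively, a possible user-side workaround (a sketch only, not the fix discussed in this issue) is to let Spark read the column with its physical Parquet type and widen it with an explicit cast:
{code:scala}
// Workaround sketch: skip the bigint read schema, read "_1" as int
// (its physical Parquet type), then cast it to bigint afterwards.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("int-to-bigint-workaround").getOrCreate()

val widened = spark.read
  .parquet("/path/to/parquet")                       // placeholder path
  .withColumn("_1", col("_1").cast("bigint"))

widened.printSchema()                                // "_1" is now of LongType
{code}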
[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
[ https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xu updated SPARK-33743: -- Description: *datetime v/s datetime2* Spark datetime type is [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. This supports a microsecond resolution. Sql supports 2 date time types o *datetime* can support only milli seconds resolution (0 to 999). o *datetime2* is extension of datetime , is compatible with datetime and supports 0 to 999 sub second resolution. Currently [MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0] maps timestamptype to datetime. This implies results in errors when writing *+Current+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. } {code} *+Proposal+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP)).. } {code} was: *datetime v/s datetime2* Spark datetime type is [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. This supports a microsecond resolution. Sql supports 2 date time types o *datetime* can support only milli seconds resolution (0 to 999). o *datetime2* is extension of datetime , is compatible with datetime and supports 0 to 999 sub second resolution. Currently [MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0] maps timestamptype to datetime. 
This implies results in errors when writing *+Current+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. } {code} *+Proposal+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. } {code} > Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2 > --- > > Key: SPARK-33743 > URL: https://issues.apache.org/jira/browse/SPARK-33743 > Project: Spark > Issue Type: Request > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: Lu Xu >Priority: Major > > *datetime v/s datetime2* > Spark datetime type is > [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. > This supports a microsecond resolution. > > Sql supports 2 date time types > o *datetime* can support only milli seconds resolu
[jira] [Updated] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
[ https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35460: -- Affects Version/s: (was: 3.1.1) 3.2.0 > invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang > - > > Key: SPARK-35460 > URL: https://issues.apache.org/jira/browse/SPARK-35460 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Major > > {code:java} > 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: > https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. > Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=metadata.name, > message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must > consist of lower case alphanumeric characters, '-' or '.', and must start and > end with an alphanumeric character (e.g. 'example.com', regex used for > validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > reason=FieldValueInvalid, additionalProperties={}), > StatusCause(field=spec.hostname, message=Invalid value: > "spark_exec-exec-688": a DNS-1123 label must consist of lower case > alphanumeric characters or '-', and must start and end with an alphanumeric > character (e.g. 'my-name', or '123-abc', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, > retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, > message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], > metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}). 
> at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86) > {code} > When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, > the driver will continuously fail to request executors from k8s master, which > causes the app to hang with the above message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
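For illustration only (not part of this ticket's fix), a prefix that satisfies the DNS-1123 rules quoted in the error above can be set as sketched below; "spark-exec" is a hypothetical value, while the "spark_exec" from the log is rejected because of the underscore.
{code:scala}
// Sketch: configure a DNS-1123-compliant executor pod name prefix
// (lowercase alphanumerics and '-', starting and ending with an alphanumeric).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("k8s-pod-name-prefix-example")
  .config("spark.kubernetes.executor.podNamePrefix", "spark-exec")  // valid; "spark_exec" is not
  .getOrCreate()
{code}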
[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
[ https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xu updated SPARK-33743: -- Description: *datetime v/s datetime2* Spark datetime type is [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. This supports a microsecond resolution. Sql supports 2 date time types o *datetime* can support only milli seconds resolution (0 to 999). o *datetime2* is extension of datetime , is compatible with datetime and supports 0 to 999 sub second resolution. Currently [MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0] maps timestamptype to datetime. This implies results in errors when writing *+Current+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. } {code} *+Proposal+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. } {code} was: *datetime v/s datetime2* Spark datetime type is [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. This supports a microsecond resolution. Sql supports 2 date time types o *datetime* can support only milli seconds resolution (0 to 999). o *datetime2* is extension of datetime , is compatible with datetime and supports 0 to 999 sub second resolution. Currently [MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0] maps timestamptype to datetime. 
This implies results in errors when writing *+Current+* {code:java} {code} *override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. }* *+Proposal+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. } {code} > Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2 > --- > > Key: SPARK-33743 > URL: https://issues.apache.org/jira/browse/SPARK-33743 > Project: Spark > Issue Type: Request > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: Lu Xu >Priority: Major > > *datetime v/s datetime2* > Spark datetime type is > [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. > This supports a microsecond resolution. > > Sql supports 2 date time types > o *datetime* can support only milli secon
[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
[ https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xu updated SPARK-33743: -- Description: *datetime v/s datetime2* Spark datetime type is [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. This supports a microsecond resolution. Sql supports 2 date time types o *datetime* can support only milli seconds resolution (0 to 999). o *datetime2* is extension of datetime , is compatible with datetime and supports 0 to 999 sub second resolution. Currently [MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0] maps timestamptype to datetime. This implies results in errors when writing *+Current+* {code:java} {code} *override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. }* *+Proposal+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. } {code} was: *datetime v/s datetime2* Spark datetime type is [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. This supports a microsecond resolution. Sql supports 2 date time types o *datetime* can support only milli seconds resolution (0 to 999). o *datetime2* is extension of datetime , is compatible with datetime and supports 0 to 999 sub second resolution. Currently [MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0] maps timestamptype to datetime. 
This implies results in errors when writing *+Current+* |override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ *case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))* .. }| | *+Proposal+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. } {code} > Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2 > --- > > Key: SPARK-33743 > URL: https://issues.apache.org/jira/browse/SPARK-33743 > Project: Spark > Issue Type: Request > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: Lu Xu >Priority: Major > > *datetime v/s datetime2* > Spark datetime type is > [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. > This supports a microsecond resolution. > > Sql supports 2 date time types > o *datetime* can support only milli seconds resolutio
[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
[ https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xu updated SPARK-33743: -- Description: *datetime v/s datetime2* Spark datetime type is [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. This supports a microsecond resolution. Sql supports 2 date time types o *datetime* can support only milli seconds resolution (0 to 999). o *datetime2* is extension of datetime , is compatible with datetime and supports 0 to 999 sub second resolution. Currently [MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0] maps timestamptype to datetime. This implies results in errors when writing *+Current+* |override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ *case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))* .. }| | *+Proposal+* {code:java} override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. } {code} was: *datetime v/s datetime2* Spark datetime type is [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. This supports a microsecond resolution. Sql supports 2 date time types o *datetime* can support only milli seconds resolution (0 to 999). o *datetime2* is extension of datetime , is compatible with datetime and supports 0 to 999 sub second resolution. Currently [MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0] maps timestamptype to datetime. 
This implies results in errors when writing *+Current+* |override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))* .. }| | *+Proposal+* override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *_case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))}_* .. }| | > Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2 > --- > > Key: SPARK-33743 > URL: https://issues.apache.org/jira/browse/SPARK-33743 > Project: Spark > Issue Type: Request > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: Lu Xu >Priority: Major > > *datetime v/s datetime2* > Spark datetime type is > [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0]. > This supports a microsecond resolution. > > Sql supports 2 date time types > o *datetime* can support only milli seconds resolution (0 to 999). > o *datetime2* is
[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
[ https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xu updated SPARK-33743: -- Description: *datetime v/s datetime2*
Spark's timestamp type is [TimestampType|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/TimestampType.html], which supports microsecond resolution.
SQL Server supports two date-time types:
* *datetime* supports only millisecond resolution (0 to 999).
* *datetime2* is an extension of datetime, is compatible with datetime, and supports finer sub-second resolution (up to 7 fractional digits).
Currently [MsSqlServerDialect|https://github.com/apache/spark/blob/bfb257f078854ad587a9e2bfe548cdb7bf8786d4/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala] maps TimestampType to DATETIME, which results in errors when writing.
*+Current+*
{code:scala}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))
  ..
}
{code}
*+Proposal+*
{code:scala}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  case TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))
  ..
}
{code}

was: *datetime v/s datetime2*
Spark's timestamp type is [TimestampType|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/TimestampType.html], which supports microsecond resolution.
SQL Server supports two date-time types:
* *datetime* supports only millisecond resolution (0 to 999).
* *datetime2* is an extension of datetime, is compatible with datetime, and supports finer sub-second resolution (up to 7 fractional digits).
Currently [MsSqlServerDialect|https://github.com/apache/spark/blob/bfb257f078854ad587a9e2bfe548cdb7bf8786d4/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala] maps TimestampType to DATETIME, which results in errors when writing.
*+Current+*
{code:scala}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))
  ..
}
{code}
*+Proposal+*
{code:scala}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  case TimestampType =>
    if (oldDateTime) Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))
    else Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))
  ..
}
{code}

> Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2 > --- > > Key: SPARK-33743 > URL: https://issues.apache.org/jira/browse/SPARK-33743 > Project: Spark > Issue Type: Request > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: Lu Xu >Priority: Major > > *datetime v/s datetime2* > Spark's timestamp type is > [TimestampType|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/TimestampType.html], > which supports microsecond resolution. > > SQL Server supports two date-time types: > o *datetime* supports only millisecond resolution (0 to 999).
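For reference, a minimal self-contained sketch of the proposed dialect mapping (illustrative only, not the committed implementation; the {{oldDateTime}} flag is a hypothetical switch for keeping the legacy behaviour):

{code:scala}
import java.sql.Types
import org.apache.spark.sql.jdbc.JdbcType
import org.apache.spark.sql.types.{DataType, TimestampType}

// Hypothetical flag: true keeps the legacy DATETIME mapping, false uses DATETIME2,
// which can hold Spark's microsecond timestamps without truncation.
val oldDateTime = false

def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  case TimestampType if oldDateTime => Some(JdbcType("DATETIME", Types.TIMESTAMP))
  case TimestampType                => Some(JdbcType("DATETIME2", Types.TIMESTAMP))
  case _                            => None
}
{code}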
[jira] [Commented] (SPARK-35256) Subexpression elimination leading to a performance regression
[ https://issues.apache.org/jira/browse/SPARK-35256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348576#comment-17348576 ] Ondrej Kokes commented on SPARK-35256: -- [~Kimahriman] I think you're right - I've just built the PR linked in 35410 and it brought the runtime to less than half of what it was under 3.1.1 (15.45s vs 35s) and it's also faster than 2.4.x, which is nice. So if merged, I'll close this as a dupe - but for now I'll subscribe to that issue and PR and wait for its resolution. Strangely enough, my original pipeline (which was simplified into the repro linked in this issue) is only 10% faster than under 3.1.1 (so way way slower than 2.4.x), so there are more things at play. I'll investigate more once this is merged. Thanks for the links! > Subexpression elimination leading to a performance regression > - > > Key: SPARK-35256 > URL: https://issues.apache.org/jira/browse/SPARK-35256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Ondrej Kokes >Priority: Minor > Attachments: bisect_log.txt, bisect_timing.csv > > > I'm seeing almost double the runtime between 3.0.1 and 3.1.1 in my pipeline > that does mostly str_to_map, split and a few other operations - all > projections, no joins or aggregations (it's here only to trigger the > pipeline). I cut it down to the simplest reproducible example I could - > anything I remove from this changes the runtime difference quite > dramatically. (even moving those two expressions from f.when to standalone > columns makes the difference disappear) > {code:java} > import time > import os > import pyspark > from pyspark.sql import SparkSession > import pyspark.sql.functions as f > if __name__ == '__main__': > print(pyspark.__version__) > spark = SparkSession.builder.getOrCreate() > filename = 'regression.csv' > if not os.path.isfile(filename): > with open(filename, 'wt') as fw: > fw.write('foo\n') > for _ in range(10_000_000): > fw.write('foo=bar&baz=bak&bar=f,o,1:2:3\n') > df = spark.read.option('header', True).csv(filename) > t = time.time() > dd = (df > .withColumn('my_map', f.expr('str_to_map(foo, "&", "=")')) > .withColumn('extracted', > # without this top level split it is only 50% > slower, with it > # the runtime almost doubles > f.split(f.split(f.col("my_map")["bar"], ",")[2], > ":")[0] >) > .select( > f.when( > f.col("extracted").startswith("foo"), f.col("extracted") > ).otherwise( > f.concat(f.lit("foo"), f.col("extracted")) > ).alias("foo") > ) > ) > # dd.explain(True) > _ = dd.groupby("foo").count().count() > print("elapsed", time.time() - t) > {code} > Running this in 3.0.1 and 3.1.1 respectively (both installed from PyPI, on my > local macOS) > {code:java} > 3.0.1 > elapsed 21.262351036071777 > 3.1.1 > elapsed 40.26582884788513 > {code} > (Meaning the transformation took 21 seconds in 3.0.1 and 40 seconds in 3.1.1) > Feel free to make the CSV smaller to get a quicker feedback loop - it scales > linearly (I developed this with 2M rows). > It might be related to my previous issue - SPARK-32989 - there are similar > operations, nesting etc. (splitting on the original column, not on a map, > makes the difference disappear) > I tried dissecting the queries in SparkUI and via explain, but both 3.0.1 and > 3.1.1 produced identical plans. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
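One quick way to check whether subexpression elimination is implicated is to rerun the repro with the feature switched off and compare timings; a sketch, assuming the internal {{spark.sql.subexpressionElimination.enabled}} flag is settable on the session:

{code:scala}
// Rerun the repro with subexpression elimination disabled and compare the elapsed time.
spark.conf.set("spark.sql.subexpressionElimination.enabled", "false")
{code}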
[jira] [Commented] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
[ https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348544#comment-17348544 ] Apache Spark commented on SPARK-35460: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/32610 > invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang > - > > Key: SPARK-35460 > URL: https://issues.apache.org/jira/browse/SPARK-35460 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.1 >Reporter: Kent Yao >Priority: Major > > {code:java} > 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: > https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. > Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=metadata.name, > message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must > consist of lower case alphanumeric characters, '-' or '.', and must start and > end with an alphanumeric character (e.g. 'example.com', regex used for > validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > reason=FieldValueInvalid, additionalProperties={}), > StatusCause(field=spec.hostname, message=Invalid value: > "spark_exec-exec-688": a DNS-1123 label must consist of lower case > alphanumeric characters or '-', and must start and end with an alphanumeric > character (e.g. 'my-name', or '123-abc', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, > retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, > message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], > metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}). 
> at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86) > {code} > When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, > the driver will continuously fail to request executors from k8s master, which > causes the app to hang with the above message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
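As a side note, the DNS-1123 label rule quoted in the error can be checked locally before submitting the app; a minimal sketch, using the regex taken from the log message above:

{code:scala}
// The executor pod name prefix must be a valid DNS-1123 label:
// lower-case alphanumerics or '-', starting and ending with an alphanumeric character.
def isValidPodNamePrefix(prefix: String): Boolean =
  prefix.matches("[a-z0-9]([-a-z0-9]*[a-z0-9])?")

isValidPodNamePrefix("spark_exec")  // false: '_' is rejected, as in the log above
isValidPodNamePrefix("spark-exec")  // true
{code}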
[jira] [Assigned] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
[ https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35460: Assignee: (was: Apache Spark) > invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang > - > > Key: SPARK-35460 > URL: https://issues.apache.org/jira/browse/SPARK-35460 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.1 >Reporter: Kent Yao >Priority: Major > > {code:java} > 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: > https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. > Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=metadata.name, > message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must > consist of lower case alphanumeric characters, '-' or '.', and must start and > end with an alphanumeric character (e.g. 'example.com', regex used for > validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > reason=FieldValueInvalid, additionalProperties={}), > StatusCause(field=spec.hostname, message=Invalid value: > "spark_exec-exec-688": a DNS-1123 label must consist of lower case > alphanumeric characters or '-', and must start and end with an alphanumeric > character (e.g. 'my-name', or '123-abc', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, > retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, > message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], > metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}). 
> at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86) > {code} > When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, > the driver will continuously fail to request executors from k8s master, which > causes the app to hang with the above message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
[ https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348543#comment-17348543 ] Apache Spark commented on SPARK-35460: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/32610 > invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang > - > > Key: SPARK-35460 > URL: https://issues.apache.org/jira/browse/SPARK-35460 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.1 >Reporter: Kent Yao >Priority: Major > > {code:java} > 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: > https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. > Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=metadata.name, > message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must > consist of lower case alphanumeric characters, '-' or '.', and must start and > end with an alphanumeric character (e.g. 'example.com', regex used for > validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > reason=FieldValueInvalid, additionalProperties={}), > StatusCause(field=spec.hostname, message=Invalid value: > "spark_exec-exec-688": a DNS-1123 label must consist of lower case > alphanumeric characters or '-', and must start and end with an alphanumeric > character (e.g. 'my-name', or '123-abc', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, > retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, > message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], > metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}). 
> at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86) > {code} > When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, > the driver will continuously fail to request executors from k8s master, which > causes the app to hang with the above message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
[ https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35460: Assignee: Apache Spark > invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang > - > > Key: SPARK-35460 > URL: https://issues.apache.org/jira/browse/SPARK-35460 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.1 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > {code:java} > 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: > https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. > Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=metadata.name, > message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must > consist of lower case alphanumeric characters, '-' or '.', and must start and > end with an alphanumeric character (e.g. 'example.com', regex used for > validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > reason=FieldValueInvalid, additionalProperties={}), > StatusCause(field=spec.hostname, message=Invalid value: > "spark_exec-exec-688": a DNS-1123 label must consist of lower case > alphanumeric characters or '-', and must start and end with an alphanumeric > character (e.g. 'my-name', or '123-abc', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, > retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, > message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: > "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case > alphanumeric characters, '-' or '.', and must start and end with an > alphanumeric character (e.g. 'example.com', regex used for validation is > '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), > spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must > consist of lower case alphanumeric characters or '-', and must start and end > with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for > validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], > metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}). 
> at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448) > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86) > {code} > When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, > the driver will continuously fail to request executors from k8s master, which > causes the app to hang with the above message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
Kent Yao created SPARK-35460: Summary: invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang Key: SPARK-35460 URL: https://issues.apache.org/jira/browse/SPARK-35460 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.1.1 Reporter: Kent Yao {code:java} 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying snapshot subscriber. io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=metadata.name, message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), reason=FieldValueInvalid, additionalProperties={}), StatusCause(field=spec.hostname, message=Invalid value: "spark_exec-exec-688": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}). 
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86) {code} When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, the driver will continuously fail to request executors from k8s master, which causes the app to hang with the above message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29223) Kafka source: offset by timestamp - allow specifying timestamp for "all partitions"
[ https://issues.apache.org/jira/browse/SPARK-29223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348508#comment-17348508 ] Apache Spark commented on SPARK-29223: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/32609 > Kafka source: offset by timestamp - allow specifying timestamp for "all > partitions" > --- > > Key: SPARK-29223 > URL: https://issues.apache.org/jira/browse/SPARK-29223 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.1.0 >Reporter: Jungtaek Lim >Priority: Minor > > This issue is a follow-up of SPARK-26848. > In SPARK-26848, we decided to open the possibility of letting end users set > an individual timestamp per partition. But in many cases, specifying a > timestamp represents the intention to go back to a specific timestamp and > reprocess records, and that should apply to all topics and partitions. > Given the format of > `startingOffsetsByTimestamp`/`endingOffsetsByTimestamp`, while it's not > intuitive to provide an option to set a global timestamp across topics, it is > still intuitive to provide an option to set a global timestamp across > partitions in a topic. > This issue tracks the effort to support this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
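For context, the existing per-partition form looks like the sketch below (assuming the documented {{startingOffsetsByTimestamp}} JSON format with epoch-millisecond timestamps); the proposal is to additionally accept a single timestamp that applies to all partitions of the subscribed topics:

{code:scala}
// Current API: one timestamp per topic/partition, even when they are all the same.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .option("startingOffsetsByTimestamp",
    """{"topic1": {"0": 1621468800000, "1": 1621468800000}}""")
  .load()
{code}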
[jira] [Commented] (SPARK-33867) java.time.Instant and java.time.LocalDate not handled in org.apache.spark.sql.jdbc.JdbcDialect#compileValue
[ https://issues.apache.org/jira/browse/SPARK-33867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348323#comment-17348323 ] Takeshi Yamamuro commented on SPARK-33867: -- Please see the "Fix Version/s" in this jira and that includes 3.1.x, too. > java.time.Instant and java.time.LocalDate not handled in > org.apache.spark.sql.jdbc.JdbcDialect#compileValue > --- > > Key: SPARK-33867 > URL: https://issues.apache.org/jira/browse/SPARK-33867 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Cristi >Assignee: Cristi >Priority: Major > Fix For: 3.0.2, 3.1.1, 3.2.0 > > > When using the new java time API (spark.sql.datetime.java8API.enabled=true) > LocalDate and Instant aren't handled in > org.apache.spark.sql.jdbc.JdbcDialect#compileValue so exceptions are thrown > when they are used in filters since a filter condition would be translated to > something like this: "valid_from" > 2020-12-21T11:40:24.413681Z. > To reproduce you can write a simple filter like where dataset is backed by a > DB table (in my case PostgreSQL): > dataset.filter(current_timestamp().gt(col(VALID_FROM))) > The error and stacktrace: > Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near > "T11"Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or > near "T11" Position: 285 at > org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103) > at > org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836) > at > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at > org.apache.spark.scheduler.Task.run(Task.scala:127) at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
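A rough sketch of the kind of handling the fix adds (illustrative only; the merged change lives in JdbcDialects.scala on the branches listed under Fix Version/s):

{code:scala}
import java.sql.{Date, Timestamp}
import java.time.{Instant, LocalDate}

// Render java.time values as quoted JDBC literals instead of bare ISO-8601 text,
// so a pushed-down filter becomes e.g. "valid_from" > '2020-12-21 11:40:24.413681'.
def compileValue(value: Any): Any = value match {
  case instant: Instant     => s"'${Timestamp.from(instant)}'"
  case localDate: LocalDate => s"'${Date.valueOf(localDate)}'"
  case other                => other
}
{code}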
[jira] [Commented] (SPARK-35373) Verify checksums of downloaded artifacts in build/mvn
[ https://issues.apache.org/jira/browse/SPARK-35373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348318#comment-17348318 ] Apache Spark commented on SPARK-35373: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/32608 > Verify checksums of downloaded artifacts in build/mvn > - > > Key: SPARK-35373 > URL: https://issues.apache.org/jira/browse/SPARK-35373 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.7, 3.0.2, 3.1.1 >Reporter: Sean R. Owen >Assignee: Apache Spark >Priority: Minor > Fix For: 3.0.3, 3.1.2, 3.2.0 > > > build/mvn is a convenience script that will automatically download Maven (and > Scala) if not already present. While it downloads from official ASF mirrors, > it does not check the checksum of the artifact, which is available as a > .sha512 file from ASF servers. > The risk of a supply chain attack is a bit less theoretical here than usual, > because artifacts are downloaded from any of several mirrors worldwide, and > injecting a malicious copy of Maven in any one of them might be simpler and > less noticeable than injecting it into ASF servers. > (Note, Scala's download site does not seem to provide a checksum. They do all > come from Lightbend, at least, not N mirrors. Not much we can do there.) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
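The check itself is simple; since build/mvn is a shell script, the Scala sketch below is illustrative only of the idea — compare the downloaded artifact's SHA-512 against the published {{.sha512}} value before using it:

{code:scala}
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

// Returns true when the downloaded artifact's SHA-512 matches the expected hex digest.
def sha512Matches(artifactPath: String, expectedHex: String): Boolean = {
  val bytes  = Files.readAllBytes(Paths.get(artifactPath))
  val digest = MessageDigest.getInstance("SHA-512").digest(bytes)
  digest.map("%02x".format(_)).mkString.equalsIgnoreCase(expectedHex.trim)
}
{code}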
[jira] [Commented] (SPARK-33867) java.time.Instant and java.time.LocalDate not handled in org.apache.spark.sql.jdbc.JdbcDialect#compileValue
[ https://issues.apache.org/jira/browse/SPARK-33867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348298#comment-17348298 ] Cristi commented on SPARK-33867: looks like it: https://github.com/apache/spark/blob/branch-3.1/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala > java.time.Instant and java.time.LocalDate not handled in > org.apache.spark.sql.jdbc.JdbcDialect#compileValue > --- > > Key: SPARK-33867 > URL: https://issues.apache.org/jira/browse/SPARK-33867 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Cristi >Assignee: Cristi >Priority: Major > Fix For: 3.0.2, 3.1.1, 3.2.0 > > > When using the new java time API (spark.sql.datetime.java8API.enabled=true) > LocalDate and Instant aren't handled in > org.apache.spark.sql.jdbc.JdbcDialect#compileValue so exceptions are thrown > when they are used in filters since a filter condition would be translated to > something like this: "valid_from" > 2020-12-21T11:40:24.413681Z. > To reproduce you can write a simple filter like where dataset is backed by a > DB table (in my case PostgreSQL): > dataset.filter(current_timestamp().gt(col(VALID_FROM))) > The error and stacktrace: > Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near > "T11"Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or > near "T11" Position: 285 at > org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103) > at > org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836) > at > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at > org.apache.spark.scheduler.Task.run(Task.scala:127) at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35459) Move AvroRowReaderSuite to a separate file
[ https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35459. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32607 [https://github.com/apache/spark/pull/32607] > Move AvroRowReaderSuite to a separate file > -- > > Key: SPARK-35459 > URL: https://issues.apache.org/jira/browse/SPARK-35459 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Move AvroRowReaderSuite from AvroSuite.scala and place it to > AvroRowReaderSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33867) java.time.Instant and java.time.LocalDate not handled in org.apache.spark.sql.jdbc.JdbcDialect#compileValue
[ https://issues.apache.org/jira/browse/SPARK-33867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348294#comment-17348294 ] LiaoHanwen commented on SPARK-33867: Is this fixed on branch-3.1? > java.time.Instant and java.time.LocalDate not handled in > org.apache.spark.sql.jdbc.JdbcDialect#compileValue > --- > > Key: SPARK-33867 > URL: https://issues.apache.org/jira/browse/SPARK-33867 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Cristi >Assignee: Cristi >Priority: Major > Fix For: 3.0.2, 3.1.1, 3.2.0 > > > When using the new java time API (spark.sql.datetime.java8API.enabled=true) > LocalDate and Instant aren't handled in > org.apache.spark.sql.jdbc.JdbcDialect#compileValue so exceptions are thrown > when they are used in filters since a filter condition would be translated to > something like this: "valid_from" > 2020-12-21T11:40:24.413681Z. > To reproduce you can write a simple filter like where dataset is backed by a > DB table (in my case PostgreSQL): > dataset.filter(current_timestamp().gt(col(VALID_FROM))) > The error and stacktrace: > Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near > "T11"Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or > near "T11" Position: 285 at > org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103) > at > org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836) > at > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at > org.apache.spark.scheduler.Task.run(Task.scala:127) at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35424) Remove some useless code in ExternalBlockHandler
[ https://issues.apache.org/jira/browse/SPARK-35424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35424. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32571 [https://github.com/apache/spark/pull/32571] > Remove some useless code in ExternalBlockHandler > > > Key: SPARK-35424 > URL: https://issues.apache.org/jira/browse/SPARK-35424 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: weixiuli >Assignee: weixiuli >Priority: Major > Fix For: 3.2.0 > > > There is some useless code in the ExternalBlockHandler, so we may remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35424) Remove some useless code in ExternalBlockHandler
[ https://issues.apache.org/jira/browse/SPARK-35424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35424: Assignee: weixiuli > Remove some useless code in ExternalBlockHandler > > > Key: SPARK-35424 > URL: https://issues.apache.org/jira/browse/SPARK-35424 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: weixiuli >Assignee: weixiuli >Priority: Major > > There is some useless code in the ExternalBlockHandler, so we may remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35459) Move AvroRowReaderSuite to a separate file
[ https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35459: Assignee: Max Gekk (was: Apache Spark) > Move AvroRowReaderSuite to a separate file > -- > > Key: SPARK-35459 > URL: https://issues.apache.org/jira/browse/SPARK-35459 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Move AvroRowReaderSuite from AvroSuite.scala and place it to > AvroRowReaderSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35459) Move AvroRowReaderSuite to a separate file
[ https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348195#comment-17348195 ] Apache Spark commented on SPARK-35459: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/32607 > Move AvroRowReaderSuite to a separate file > -- > > Key: SPARK-35459 > URL: https://issues.apache.org/jira/browse/SPARK-35459 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Move AvroRowReaderSuite from AvroSuite.scala and place it to > AvroRowReaderSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35459) Move AvroRowReaderSuite to a separate file
[ https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348194#comment-17348194 ] Apache Spark commented on SPARK-35459: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/32607 > Move AvroRowReaderSuite to a separate file > -- > > Key: SPARK-35459 > URL: https://issues.apache.org/jira/browse/SPARK-35459 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Move AvroRowReaderSuite from AvroSuite.scala and place it to > AvroRowReaderSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35459) Move AvroRowReaderSuite to a separate file
[ https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35459: Assignee: Apache Spark (was: Max Gekk) > Move AvroRowReaderSuite to a separate file > -- > > Key: SPARK-35459 > URL: https://issues.apache.org/jira/browse/SPARK-35459 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Move AvroRowReaderSuite from AvroSuite.scala and place it to > AvroRowReaderSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35378) Eagerly execute non-root Command so that query command with CTE
[ https://issues.apache.org/jira/browse/SPARK-35378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-35378: --- Summary: Eagerly execute non-root Command so that query command with CTE (was: Eagerly execute Command so that query command with CTE) > Eagerly execute non-root Command so that query command with CTE > --- > > Key: SPARK-35378 > URL: https://issues.apache.org/jira/browse/SPARK-35378 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark doesn't support LeafRunnableCommand as sub query. > Because the LeafRunnableCommand always output GenericInternalRow and some > node(e.g. SortExec, AdaptiveExecutionExec, WholeCodegenExec) will convert > GenericInternalRow to UnsafeRow. So will causes error as follows: > {code:java} > java.lang.ClassCastException > org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast > to org.apache.spark.sql.catalyst.expressions.UnsafeRow > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org