[jira] [Assigned] (SPARK-35244) invoke should throw the original exception

2021-05-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-35244:
---

Assignee: Wenchen Fan  (was: Apache Spark)

> invoke should throw the original exception
> --
>
> Key: SPARK-35244
> URL: https://issues.apache.org/jira/browse/SPARK-35244
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1, 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.3, 3.1.2, 3.2.0
>
>







[jira] [Updated] (SPARK-35378) Eagerly execute non-root Command

2021-05-20 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-35378:
---
Summary: Eagerly execute non-root Command  (was: Eagerly execute non-root 
Command so that query command with CTE)

> Eagerly execute non-root Command
> 
>
> Key: SPARK-35378
> URL: https://issues.apache.org/jira/browse/SPARK-35378
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark doesn't support a LeafRunnableCommand as a subquery,
> because a LeafRunnableCommand always outputs GenericInternalRow, while some 
> nodes (e.g. SortExec, AdaptiveSparkPlanExec, WholeStageCodegenExec) cast their 
> input rows to UnsafeRow. This causes errors such as:
> {code:java}
> java.lang.ClassCastException
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast 
> to org.apache.spark.sql.catalyst.expressions.UnsafeRow
> {code}
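> A minimal sketch of the kind of query shape that can hit this cast (a hypothetical 
> reproduction; the exact failing plan depends on how the command is planned):
> {code:scala}
> // SHOW TABLES is planned as a runnable command whose result rows are
> // GenericInternalRow; orderBy adds a SortExec on top, and SortExec assumes
> // its input rows are UnsafeRow, so the cast above can fail.
> val tables = spark.sql("SHOW TABLES")
> tables.orderBy("tableName").show()
> {code}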






[jira] [Resolved] (SPARK-35427) Check the EXCEPTION rebase mode for Avro/Parquet

2021-05-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35427.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32574
[https://github.com/apache/spark/pull/32574]

> Check the EXCEPTION rebase mode for Avro/Parquet
> 
>
> Key: SPARK-35427
> URL: https://issues.apache.org/jira/browse/SPARK-35427
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Add tests to check that SparkUpgradeException is thrown in the EXCEPTION 
> rebase mode for the Avro and Parquet datasources. Currently, the mode is 
> checked implicitly, and not for columns of all data types.
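> A rough sketch of such an explicit check (the config key is the Spark 3.x legacy 
> key for Parquet reads; the file path and the assertion style are illustrative):
> {code:scala}
> // With the rebase mode set to EXCEPTION, reading Parquet data that contains
> // dates/timestamps written in the legacy (Julian) calendar is expected to
> // fail with SparkUpgradeException instead of being rebased silently.
> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "EXCEPTION")
> spark.read.parquet("/tmp/ancient_dates.parquet").collect()
> {code}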






[jira] [Assigned] (SPARK-35063) Group exception messages in sql/catalyst

2021-05-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35063:
---

Assignee: jiaan.geng

> Group exception messages in sql/catalyst
> 
>
> Key: SPARK-35063
> URL: https://issues.apache.org/jira/browse/SPARK-35063
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.2.0
>
>
> Group all error messages in the 
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst module.
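> A hedged sketch of the grouping pattern (Spark keeps such objects under 
> org.apache.spark.sql.errors; this particular method name is illustrative):
> {code:scala}
> import org.apache.spark.sql.AnalysisException
>
> // Before: the error text is assembled inline at each call site.
> def lookupBefore(name: String): Nothing =
>   throw new AnalysisException(s"Undefined function: '$name'")
>
> // After: messages live in a central errors object, so wording stays
> // consistent and can be audited in one place.
> object QueryCompilationErrors {
>   def undefinedFunctionError(name: String): Throwable =
>     new AnalysisException(s"Undefined function: '$name'")
> }
>
> def lookupAfter(name: String): Nothing =
>   throw QueryCompilationErrors.undefinedFunctionError(name)
> {code}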






[jira] [Resolved] (SPARK-35063) Group exception messages in sql/catalyst

2021-05-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35063.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32478
[https://github.com/apache/spark/pull/32478]

> Group exception messages in sql/catalyst
> 
>
> Key: SPARK-35063
> URL: https://issues.apache.org/jira/browse/SPARK-35063
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Group all error messages in the 
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst module.






[jira] [Resolved] (SPARK-35479) Format PartitionFilters IN strings in scan nodes

2021-05-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35479.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32615
[https://github.com/apache/spark/pull/32615]

> Format PartitionFilters IN strings in scan nodes
> 
>
> Key: SPARK-35479
> URL: https://issues.apache.org/jira/browse/SPARK-35479
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.2.0
>
>
> This ticket proposes to format strings correctly for `PushedFilters`. For 
> example, `explain()` for a query below prints `v in (array('a'))` as 
> `PushedFilters: [In(v, [WrappedArray(a)])]`;
> {code}
> scala> sql("create table t (v array<string>) using parquet")
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], 
> ReadSchema: struct<v:array<string>>
> {code}
> This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; 
> {code}
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: 
> struct<v:array<string>>
> {code}






[jira] [Assigned] (SPARK-35479) Format PartitionFilters IN strings in scan nodes

2021-05-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35479:
---

Assignee: Takeshi Yamamuro

> Format PartitionFilters IN strings in scan nodes
> 
>
> Key: SPARK-35479
> URL: https://issues.apache.org/jira/browse/SPARK-35479
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
>
> This ticket proposes to format strings correctly for `PushedFilters`. For 
> example, `explain()` for a query below prints `v in (array('a'))` as 
> `PushedFilters: [In(v, [WrappedArray(a)])]`;
> {code}
> scala> sql("create table t (v array<string>) using parquet")
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], 
> ReadSchema: struct<v:array<string>>
> {code}
> This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; 
> {code}
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: 
> struct<v:array<string>>
> {code}






[jira] [Assigned] (SPARK-35445) Reduce the execution time of DeduplicateRelations

2021-05-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-35445:
--

Assignee: wuyi

> Reduce the execution time of DeduplicateRelations
> -
>
> Key: SPARK-35445
> URL: https://issues.apache.org/jira/browse/SPARK-35445
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>







[jira] [Resolved] (SPARK-35445) Reduce the execution time of DeduplicateRelations

2021-05-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35445.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32590
[https://github.com/apache/spark/pull/32590]

> Reduce the execution time of DeduplicateRelations
> -
>
> Key: SPARK-35445
> URL: https://issues.apache.org/jira/browse/SPARK-35445
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Resolved] (SPARK-35456) Show invalid value in config entry check error message

2021-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35456.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/32600

> Show invalid value in config entry check error message
> --
>
> Key: SPARK-35456
> URL: https://issues.apache.org/jira/browse/SPARK-35456
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Priority: Minor
> Fix For: 3.2.0
>
>







[jira] [Assigned] (SPARK-35456) Show invalid value in config entry check error message

2021-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35456:


Assignee: Kent Yao

> Show invalid value in config entry check error message
> --
>
> Key: SPARK-35456
> URL: https://issues.apache.org/jira/browse/SPARK-35456
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.2.0
>
>







[jira] [Updated] (SPARK-35481) Create more robust link for Data Source Options

2021-05-20 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-35481:

Description: 
Currently the link for the Data Source Options uses /latest/, but it could 
break when we cut branch-3.2.

For example, [Data Source Option for 
Avro|https://spark.apache.org/docs/latest/sql-data-sources-avro.html#data-source-option]

It should point only to the 3.2 documentation in branch-3.2, so it's better to 
use a relative link instead of /latest/.

  was:
Currently the link for the Data Source Options uses /latest/, but it could 
break when we cut branch-3.2.

It should point only to the 3.2 documentation in branch-3.2, so it's better to 
use a relative link instead of /latest/.


> Create more robust link for Data Source Options
> ---
>
> Key: SPARK-35481
> URL: https://issues.apache.org/jira/browse/SPARK-35481
> Project: Spark
>  Issue Type: Documentation
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently the link for the Data Source Options uses /latest/, but it could 
> break when we cut branch-3.2.
> For example, [Data Source Option for 
> Avro|https://spark.apache.org/docs/latest/sql-data-sources-avro.html#data-source-option]
> It should point only to the 3.2 documentation in branch-3.2, so it's better 
> to use a relative link instead of /latest/.






[jira] [Updated] (SPARK-35481) Create more robust link for Data Source Options

2021-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-35481:
-
Description: 
Currently the link for the Data Source Options uses /latest/, but it could 
break when we cut branch-3.2.

It should point only to the 3.2 documentation in branch-3.2, so it's better to 
use a relative link instead of /latest/.

  was:
Currently the link for the Data Source Options uses /latest/, but it could 
break when we cut branch-3.2 on July 1st.

It should point only to the 3.2 documentation in branch-3.2, so it's better to 
use a relative link instead of /latest/.


> Create more robust link for Data Source Options
> ---
>
> Key: SPARK-35481
> URL: https://issues.apache.org/jira/browse/SPARK-35481
> Project: Spark
>  Issue Type: Documentation
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently the link for the Data Source Options uses /latest/, but it could 
> break when we cut branch-3.2.
> It should point only to the 3.2 documentation in branch-3.2, so it's better 
> to use a relative link instead of /latest/.






[jira] [Updated] (SPARK-35481) Create more robust link for Data Source Options

2021-05-20 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-35481:

Description: 
Currently the link for the Data Source Options uses /latest/, but it could 
break when we cut branch-3.2 on July 1st.

It should point only to the 3.2 documentation in branch-3.2, so it's better to 
use a relative link instead of /latest/.

  was:
Currently the link for the Data Source Options uses /latest/, but it could 
break when we cut branch-3.2 on July 1st. It should point only to the 3.2 
documentation in branch-3.2.

So we should use a relative link instead of /latest/.


> Create more robust link for Data Source Options
> ---
>
> Key: SPARK-35481
> URL: https://issues.apache.org/jira/browse/SPARK-35481
> Project: Spark
>  Issue Type: Documentation
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently the link for the Data Source Options uses /latest/, but it could 
> break when we cut branch-3.2 on July 1st.
> It should point only to the 3.2 documentation in branch-3.2, so it's better 
> to use a relative link instead of /latest/.






[jira] [Created] (SPARK-35481) Create more robust link for Data Source Options

2021-05-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-35481:
---

 Summary: Create more robust link for Data Source Options
 Key: SPARK-35481
 URL: https://issues.apache.org/jira/browse/SPARK-35481
 Project: Spark
  Issue Type: Documentation
  Components: docs
Affects Versions: 3.2.0
Reporter: Haejoon Lee


Currently the link for the Data Source Options uses /latest/, but it could 
break when we cut branch-3.2 on July 1st. It should point only to the 3.2 
documentation in branch-3.2.

So we should use a relative link instead of /latest/.






[jira] [Updated] (SPARK-35480) percentile_approx function doesn't work with pivot

2021-05-20 Thread Christopher Bryant (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Bryant updated SPARK-35480:
---
Description: 
The percentile_approx PySpark function does not appear to treat the "accuracy" 
parameter correctly when pivoting on a column, causing the query below to fail 
(this also fails if the accuracy parameter is left unspecified):

{{import pyspark.sql.functions as F}}

{{df = sc.parallelize([}}
 {{    ["a", -1.0],}}
 {{    ["a", 5.5],}}
 {{    ["a", 2.5],}}
 {{    ["b", 3.0],}}
 {{    ["b", 5.2]}}
 {{]).toDF(["type", "value"])}}
 {{    .groupBy()}}
 {{    .pivot("type", ["a", "b"])}}
 {{    .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}}

Error message: 

{{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' 
AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS 
STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, 
CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage 
provided must be a constant literal; 'Aggregate [percentile_approx(if 
((type#242 <=> cast(a as string))) value#243 else cast(null as double), if 
((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), 
if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS 
a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else 
cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else 
cast(null as array), if ((type#242 <=> cast(b as string))) 1 else 
cast(null as int), 0, 0) AS b#253|#242 <=> cast(a as string))) value#243 else 
cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else 
cast(null as array), if ((type#242 <=> cast(a as string))) 1 else 
cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b 
as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as 
string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b 
as string))) 1 else cast(null as int), 0, 0) AS b#253] +- LogicalRDD 
[type#242, value#243|#242, value#243], false}}

 

  was:
The percentile_approx PySpark function does not appear to treat the "accuracy" 
parameter correctly when pivoting on a column, causing the query below to fail 
(this also fails if the accuracy parameter is left unspecified):

{{import pyspark.sql.functions as F}}

{{df = sc.parallelize([}}
 {{    ["a", -1.0],}}
 {{    ["a", 5.5],}}
 {{    ["a", 2.5],}}
 {{    ["b", 3.0],}}
 {{    ["b", 5]}}
{{]).toDF(["type", "value"])}}
{{    .groupBy()}}
{{    .pivot("type", ["a", "b"])}}
 {{    .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}}

Error message: 

{{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' 
AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS 
STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, 
CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage 
provided must be a constant literal; 'Aggregate [percentile_approx(if 
((type#242 <=> cast(a as string))) value#243 else cast(null as double), if 
((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), 
if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS 
a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else 
cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else 
cast(null as array), if ((type#242 <=> cast(b as string))) 1 else 
cast(null as int), 0, 0) AS b#253|#242 <=> cast(a as string))) value#243 else 
cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else 
cast(null as array), if ((type#242 <=> cast(a as string))) 1 else 
cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b 
as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as 
string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b 
as string))) 1 else cast(null as int), 0, 0) AS b#253] +- LogicalRDD 
[type#242, value#243|#242, value#243], false}}

 


> percentile_approx function doesn't work with pivot
> --
>
> Key: SPARK-35480
> URL: https://issues.apache.org/jira/browse/SPARK-35480
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.1.1
>Reporter: Christopher Bryant
>Priority: Major
>
> The percentile_approx PySpark function does not appear to treat the 
> "accuracy" parameter correctly when pivoting on a column, causing the query 
> below to fail (this also fails if the accuracy parameter is left unspecified):
> 
> {{import pyspark.sql.functions as F}}
> {{df = sc.parallelize([}}
>  {{    ["a", -1.0],}}
>  {{    ["a",

[jira] [Updated] (SPARK-35480) percentile_approx function doesn't work with pivot

2021-05-20 Thread Christopher Bryant (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Bryant updated SPARK-35480:
---
Description: 
The percentile_approx PySpark function does not appear to treat the "accuracy" 
parameter correctly when pivoting on a column, causing the query below to fail 
(this also fails if the accuracy parameter is left unspecified):

{{import pyspark.sql.functions as F}}

{{df = sc.parallelize([}}
 {{    ["a", -1.0],}}
 {{    ["a", 5.5],}}
 {{    ["a", 2.5],}}
 {{    ["b", 3.0],}}
 {{    ["b", 5]}}
{{]).toDF(["type", "value"])}}
{{    .groupBy()}}
{{    .pivot("type", ["a", "b"])}}
 {{    .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}}

Error message: 

{{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' 
AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS 
STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, 
CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage 
provided must be a constant literal; 'Aggregate [percentile_approx(if 
((type#242 <=> cast(a as string))) value#243 else cast(null as double), if 
((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), 
if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS 
a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else 
cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else 
cast(null as array), if ((type#242 <=> cast(b as string))) 1 else 
cast(null as int), 0, 0) AS b#253|#242 <=> cast(a as string))) value#243 else 
cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else 
cast(null as array), if ((type#242 <=> cast(a as string))) 1 else 
cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b 
as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as 
string))) array(0.5) else cast(null as array), if ((type#242 <=> cast(b 
as string))) 1 else cast(null as int), 0, 0) AS b#253] +- LogicalRDD 
[type#242, value#243|#242, value#243], false}}

 

  was:
The percentile_approx PySpark function does not appear to treat the "accuracy" 
parameter correctly when pivoting on a column, causing the query below to fail 
(this also fails if the accuracy parameter is left unspecified):

import pyspark.sql.functions as F

{{df = sc.parallelize([}}
{{    ["a", -1.0],}}
{{    ["a", 5.5],}}
{{    ["a", 2.5],}}
{{    ["b", 3.0],}}
{{    ["b", 5]}}
{{]).toDF(["type", "value"]) \}}
{{    .groupBy() \}}
{{    .pivot("type", ["a", "b"]) \}}
{{    .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}}

Error message: 

{{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' 
AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS 
STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, 
CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage 
provided must be a constant literal; 'Aggregate [percentile_approx(if 
((type#242 <=> cast(a as string))) value#243 else cast(null as double), if 
((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), 
if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS 
a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else 
cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else 
cast(null as array), if ((type#242 <=> cast(b as string))) 1 else 
cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243], false}}

 


> percentile_approx function doesn't work with pivot
> --
>
> Key: SPARK-35480
> URL: https://issues.apache.org/jira/browse/SPARK-35480
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.1.1
>Reporter: Christopher Bryant
>Priority: Major
>
> The percentile_approx PySpark function does not appear to treat the 
> "accuracy" parameter correctly when pivoting on a column, causing the query 
> below to fail (this also fails if the accuracy parameter is left unspecified):
> 
> {{import pyspark.sql.functions as F}}
> {{df = sc.parallelize([}}
>  {{    ["a", -1.0],}}
>  {{    ["a", 5.5],}}
>  {{    ["a", 2.5],}}
>  {{    ["b", 3.0],}}
>  {{    ["b", 5]}}
> {{]).toDF(["type", "value"])}}
> {{    .groupBy()}}
> {{    .pivot("type", ["a", "b"])}}
>  {{    .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}}
> 
> Error message: 
> {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> 
> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> 
> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS 
> STRI

[jira] [Created] (SPARK-35480) percentile_approx function doesn't work with pivot

2021-05-20 Thread Christopher Bryant (Jira)
Christopher Bryant created SPARK-35480:
--

 Summary: percentile_approx function doesn't work with pivot
 Key: SPARK-35480
 URL: https://issues.apache.org/jira/browse/SPARK-35480
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 3.1.1
Reporter: Christopher Bryant


The percentile_approx PySpark function does not appear to treat the "accuracy" 
parameter correctly when pivoting on a column, causing the query below to fail 
(this also fails if the accuracy parameter is left unspecified):

import pyspark.sql.functions as F

{{df = sc.parallelize([}}
{{    ["a", -1.0],}}
{{    ["a", 5.5],}}
{{    ["a", 2.5],}}
{{    ["b", 3.0],}}
{{    ["b", 5]}}
{{]).toDF(["type", "value"]) \}}
{{    .groupBy() \}}
{{    .pivot("type", ["a", "b"]) \}}
{{    .agg(F.percentile_approx("value", [0.5], 1).alias("percentiles"))}}

Error message: 

{{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' 
AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS 
STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 1, 
CAST(NULL AS INT' due to data type mismatch: The accuracy or percentage 
provided must be a constant literal; 'Aggregate [percentile_approx(if 
((type#242 <=> cast(a as string))) value#243 else cast(null as double), if 
((type#242 <=> cast(a as string))) array(0.5) else cast(null as array), 
if ((type#242 <=> cast(a as string))) 1 else cast(null as int), 0, 0) AS 
a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else 
cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else 
cast(null as array), if ((type#242 <=> cast(b as string))) 1 else 
cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243], false}}

 






[jira] [Assigned] (SPARK-35454) Ambiguous self-join doesn't fail after transforming the dataset to dataframe

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35454:


Assignee: Apache Spark

> Ambiguous self-join doesn't fail after transforming the dataset to dataframe
> 
>
> Key: SPARK-35454
> URL: https://issues.apache.org/jira/browse/SPARK-35454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> test("SPARK-28344: fail ambiguous self join - Dataset.colRegex as column 
> ref") {
>   val df1 = spark.range(3)
>   val df2 = df1.filter($"id" > 0)
>   withSQLConf(
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true",
> SQLConf.CROSS_JOINS_ENABLED.key -> "true") {
> assertAmbiguousSelfJoin(df1.join(df2, df1.colRegex("id") > 
> df2.colRegex("id")))
>   }
> }
> {code}
> For this unit test, if we append `.toDF()` to both df1 and df2, the query 
> won't fail. 
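> A sketch of the variant described above (same settings as in the test); once both 
> sides go through .toDF(), the ambiguous self-join is no longer detected:
> {code:scala}
> import spark.implicits._
>
> val df1 = spark.range(3).toDF()
> val df2 = df1.filter($"id" > 0).toDF()
>
> // With the plain Datasets this join is rejected as an ambiguous self-join;
> // after .toDF() it runs without the check firing.
> df1.join(df2, df1.colRegex("id") > df2.colRegex("id")).show()
> {code}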






[jira] [Assigned] (SPARK-35454) Ambiguous self-join doesn't fail after transforming the dataset to dataframe

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35454:


Assignee: (was: Apache Spark)

> Ambiguous self-join doesn't fail after transforming the dataset to dataframe
> 
>
> Key: SPARK-35454
> URL: https://issues.apache.org/jira/browse/SPARK-35454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1
>Reporter: wuyi
>Priority: Major
>
> {code:java}
> test("SPARK-28344: fail ambiguous self join - Dataset.colRegex as column 
> ref") {
>   val df1 = spark.range(3)
>   val df2 = df1.filter($"id" > 0)
>   withSQLConf(
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true",
> SQLConf.CROSS_JOINS_ENABLED.key -> "true") {
> assertAmbiguousSelfJoin(df1.join(df2, df1.colRegex("id") > 
> df2.colRegex("id")))
>   }
> }
> {code}
> For this unit test, if we append `.toDF()` to both df1 and df2, the query 
> won't fail. 






[jira] [Commented] (SPARK-35454) Ambiguous self-join doesn't fail after transforming the dataset to dataframe

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348935#comment-17348935
 ] 

Apache Spark commented on SPARK-35454:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/32616

> Ambiguous self-join doesn't fail after transforming the dataset to dataframe
> 
>
> Key: SPARK-35454
> URL: https://issues.apache.org/jira/browse/SPARK-35454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1
>Reporter: wuyi
>Priority: Major
>
> {code:java}
> test("SPARK-28344: fail ambiguous self join - Dataset.colRegex as column 
> ref") {
>   val df1 = spark.range(3)
>   val df2 = df1.filter($"id" > 0)
>   withSQLConf(
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true",
> SQLConf.CROSS_JOINS_ENABLED.key -> "true") {
> assertAmbiguousSelfJoin(df1.join(df2, df1.colRegex("id") > 
> df2.colRegex("id")))
>   }
> }
> {code}
> For this unit test, if we append `.toDF()` to both df1 and df2, the query 
> won't fail. 






[jira] [Commented] (SPARK-35479) Format PartitionFilters IN strings in scan nodes

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348923#comment-17348923
 ] 

Apache Spark commented on SPARK-35479:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/32615

> Format PartitionFilters IN strings in scan nodes
> 
>
> Key: SPARK-35479
> URL: https://issues.apache.org/jira/browse/SPARK-35479
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket proposes to format strings correctly for `PushedFilters`. For 
> example, `explain()` for a query below prints `v in (array('a'))` as 
> `PushedFilters: [In(v, [WrappedArray(a)])]`;
> {code}
> scala> sql("create table t (v array<string>) using parquet")
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], 
> ReadSchema: struct<v:array<string>>
> {code}
> This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; 
> {code}
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: 
> struct<v:array<string>>
> {code}






[jira] [Assigned] (SPARK-35479) Format PartitionFilters IN strings in scan nodes

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35479:


Assignee: Apache Spark

> Format PartitionFilters IN strings in scan nodes
> 
>
> Key: SPARK-35479
> URL: https://issues.apache.org/jira/browse/SPARK-35479
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Major
>
> This ticket proposes to format strings correctly for `PushedFilters`. For 
> example, `explain()` for a query below prints `v in (array('a'))` as 
> `PushedFilters: [In(v, [WrappedArray(a)])]`;
> {code}
> scala> sql("create table t (v array<string>) using parquet")
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], 
> ReadSchema: struct<v:array<string>>
> {code}
> This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; 
> {code}
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: 
> struct<v:array<string>>
> {code}






[jira] [Assigned] (SPARK-35479) Format PartitionFilters IN strings in scan nodes

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35479:


Assignee: (was: Apache Spark)

> Format PartitionFilters IN strings in scan nodes
> 
>
> Key: SPARK-35479
> URL: https://issues.apache.org/jira/browse/SPARK-35479
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket proposes to format strings correctly for `PushedFilters`. For 
> example, `explain()` for a query below prints `v in (array('a'))` as 
> `PushedFilters: [In(v, [WrappedArray(a)])]`;
> {code}
> scala> sql("create table t (v array<string>) using parquet")
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], 
> ReadSchema: struct<v:array<string>>
> {code}
> This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; 
> {code}
> scala> sql("select * from t where v in (array('a'), null)").explain()
> == Physical Plan ==
> *(1) Filter v#4 IN ([a],null)
> +- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
> ([a],null)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
>  PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: 
> struct<v:array<string>>
> {code}






[jira] [Created] (SPARK-35479) Format PartitionFilters IN strings in scan nodes

2021-05-20 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-35479:


 Summary: Format PartitionFilters IN strings in scan nodes
 Key: SPARK-35479
 URL: https://issues.apache.org/jira/browse/SPARK-35479
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Takeshi Yamamuro


This ticket proposes to format strings correctly for `PushedFilters`. For 
example, `explain()` for a query below prints `v in (array('a'))` as 
`PushedFilters: [In(v, [WrappedArray(a)])]`;
{code}
scala> sql("create table t (v array<string>) using parquet")
scala> sql("select * from t where v in (array('a'), null)").explain()
== Physical Plan ==
*(1) Filter v#4 IN ([a],null)
+- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
([a],null)], Format: Parquet, Location: 
InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
 PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], 
ReadSchema: struct<v:array<string>>
{code}
This PR makes `explain()` print it as `PushedFilters: [In(v, [[a]])]`; 
{code}
scala> sql("select * from t where v in (array('a'), null)").explain()
== Physical Plan ==
*(1) Filter v#4 IN ([a],null)
+- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN 
([a],null)], Format: Parquet, Location: 
InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t],
 PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: 
struct<v:array<string>>
{code}






[jira] [Assigned] (SPARK-35465) Enable disallow_untyped_defs mypy check except for major modules.

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35465:


Assignee: Apache Spark

> Enable disallow_untyped_defs mypy check except for major modules.
> -
>
> Key: SPARK-35465
> URL: https://issues.apache.org/jira/browse/SPARK-35465
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Set up the mypy configuration and add type annotations except for major 
> modules.






[jira] [Commented] (SPARK-35465) Enable disallow_untyped_defs mypy check except for major modules.

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348832#comment-17348832
 ] 

Apache Spark commented on SPARK-35465:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/32614

> Enable disallow_untyped_defs mypy check except for major modules.
> -
>
> Key: SPARK-35465
> URL: https://issues.apache.org/jira/browse/SPARK-35465
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Set up the mypy configuration and add type annotations except for major 
> modules.






[jira] [Assigned] (SPARK-35465) Enable disallow_untyped_defs mypy check except for major modules.

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35465:


Assignee: (was: Apache Spark)

> Enable disallow_untyped_defs mypy check except for major modules.
> -
>
> Key: SPARK-35465
> URL: https://issues.apache.org/jira/browse/SPARK-35465
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Set up the mypy configuration and add type annotations except for major 
> modules.






[jira] [Created] (SPARK-35477) Enable disallow_untyped_defs mypy check for pyspark.pandas.utils.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35477:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.utils.
 Key: SPARK-35477
 URL: https://issues.apache.org/jira/browse/SPARK-35477
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35478) Enable disallow_untyped_defs mypy check for pyspark.pandas.window.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35478:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.window.
 Key: SPARK-35478
 URL: https://issues.apache.org/jira/browse/SPARK-35478
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35476) Enable disallow_untyped_defs mypy check for pyspark.pandas.series.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35476:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.series.
 Key: SPARK-35476
 URL: https://issues.apache.org/jira/browse/SPARK-35476
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35475) Enable disallow_untyped_defs mypy check for pyspark.pandas.namespace.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35475:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.namespace.
 Key: SPARK-35475
 URL: https://issues.apache.org/jira/browse/SPARK-35475
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35474) Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35474:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.indexing.
 Key: SPARK-35474
 URL: https://issues.apache.org/jira/browse/SPARK-35474
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35472:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.generic.
 Key: SPARK-35472
 URL: https://issues.apache.org/jira/browse/SPARK-35472
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35473) Enable disallow_untyped_defs mypy check for pyspark.pandas.groupby.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35473:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.groupby.
 Key: SPARK-35473
 URL: https://issues.apache.org/jira/browse/SPARK-35473
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35470) Enable disallow_untyped_defs mypy check for pyspark.pandas.base.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35470:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.base.
 Key: SPARK-35470
 URL: https://issues.apache.org/jira/browse/SPARK-35470
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35471) Enable disallow_untyped_defs mypy check for pyspark.pandas.frame.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35471:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.frame.
 Key: SPARK-35471
 URL: https://issues.apache.org/jira/browse/SPARK-35471
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35468) Enable disallow_untyped_defs mypy check for pyspark.pandas.typedef.typehints.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35468:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.typedef.typehints.
 Key: SPARK-35468
 URL: https://issues.apache.org/jira/browse/SPARK-35468
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35469) Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35469:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.accessors.
 Key: SPARK-35469
 URL: https://issues.apache.org/jira/browse/SPARK-35469
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Updated] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.

2021-05-20 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-35466:
--
Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.data_type_ops.  (was: Enable disallow_untyped_defs for 
pyspark.pandas.data_type_ops.)

> Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
> -
>
> Key: SPARK-35466
> URL: https://issues.apache.org/jira/browse/SPARK-35466
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Created] (SPARK-35467) Enable disallow_untyped_defs mypy check for pyspark.pandas.spark.accessors.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35467:
-

 Summary: Enable disallow_untyped_defs mypy check for 
pyspark.pandas.spark.accessors.
 Key: SPARK-35467
 URL: https://issues.apache.org/jira/browse/SPARK-35467
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35466) Enable disallow_untyped_defs for pyspark.pandas.data_type_ops.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35466:
-

 Summary: Enable disallow_untyped_defs for 
pyspark.pandas.data_type_ops.
 Key: SPARK-35466
 URL: https://issues.apache.org/jira/browse/SPARK-35466
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-35465) Enable disallow_untyped_defs mypy check except for major modules.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35465:
-

 Summary: Enable disallow_untyped_defs mypy check except for major 
modules.
 Key: SPARK-35465
 URL: https://issues.apache.org/jira/browse/SPARK-35465
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin


Set up the mypy configuration and add type annotations except for major modules.






[jira] [Created] (SPARK-35464) pandas APIs on Spark: Enable mypy check "disallow_untyped_defs" for main codes.

2021-05-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35464:
-

 Summary: pandas APIs on Spark: Enable mypy check 
"disallow_untyped_defs" for main codes.
 Key: SPARK-35464
 URL: https://issues.apache.org/jira/browse/SPARK-35464
 Project: Spark
  Issue Type: Umbrella
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin


Currently, many functions in the main code are still missing type annotations, 
and the {{mypy}} check "disallow_untyped_defs" is disabled for them.

We should add more type annotations and enable the {{mypy}} check.






[jira] [Resolved] (SPARK-35364) Renaming the existing Koalas related codes.

2021-05-20 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-35364.
---
Fix Version/s: 3.2.0
 Assignee: Haejoon Lee
   Resolution: Fixed

Issue resolved by pull request 32516
https://github.com/apache/spark/pull/32516

> Renaming the existing Koalas related codes.
> ---
>
> Key: SPARK-35364
> URL: https://issues.apache.org/jira/browse/SPARK-35364
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.2.0
>
>
> We should rename several Koalas-related identifiers in the pandas APIs on 
> Spark:
>  * kdf -> psdf
>  * kser -> psser
>  * kidx -> psidx
>  * kmidx -> psmidx
>  * sdf.to_koalas() -> sdf.to_pandas_on_spark()






[jira] [Resolved] (SPARK-35463) Skip checking checksum on a system that doesn't have `shasum`

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35463.
---
Fix Version/s: 3.0.3
   3.1.2
   3.2.0
   Resolution: Fixed

Issue resolved by pull request 32613
[https://github.com/apache/spark/pull/32613]

> Skip checking checksum on a system that doesn't have `shasum`
> 
>
> Key: SPARK-35463
> URL: https://issues.apache.org/jira/browse/SPARK-35463
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.2.0, 3.1.2, 3.0.3
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35463:
-

Assignee: Dongjoon Hyun

> Skip checking checksum on a system doesn't have `shasum`
> 
>
> Key: SPARK-35463
> URL: https://issues.apache.org/jira/browse/SPARK-35463
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35462.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32612
[https://github.com/apache/spark/pull/32612]

> Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
> -
>
> Key: SPARK-35462
> URL: https://issues.apache.org/jira/browse/SPARK-35462
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35462:
-

Assignee: Dongjoon Hyun

> Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
> -
>
> Key: SPARK-35462
> URL: https://issues.apache.org/jira/browse/SPARK-35462
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348799#comment-17348799
 ] 

Apache Spark commented on SPARK-35463:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32613

> Skip checking checksum on a system doesn't have `shasum`
> 
>
> Key: SPARK-35463
> URL: https://issues.apache.org/jira/browse/SPARK-35463
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35463:


Assignee: (was: Apache Spark)

> Skip checking checksum on a system doesn't have `shasum`
> 
>
> Key: SPARK-35463
> URL: https://issues.apache.org/jira/browse/SPARK-35463
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35463:


Assignee: Apache Spark

> Skip checking checksum on a system doesn't have `shasum`
> 
>
> Key: SPARK-35463
> URL: https://issues.apache.org/jira/browse/SPARK-35463
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348800#comment-17348800
 ] 

Apache Spark commented on SPARK-35463:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32613

> Skip checking checksum on a system doesn't have `shasum`
> 
>
> Key: SPARK-35463
> URL: https://issues.apache.org/jira/browse/SPARK-35463
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35458) ARM CI failed: failed to validate maven sha512

2021-05-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-35458:
-
Priority: Minor  (was: Major)

> ARM CI failed: failed to validate maven sha512
> --
>
> Key: SPARK-35458
> URL: https://issues.apache.org/jira/browse/SPARK-35458
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Minor
> Fix For: 3.0.3, 3.1.2, 3.2.0
>
>
> Log:
>  
> Veryfing checksum from 
> /home/jenkins/workspace/spark-master-test-maven-arm/build/apache-maven-3.6.3-bin.tar.gz.sha512
>  *Unknown option: q*
>  *Type shasum -h for help*
>  Bad checksum from 
> [https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz.sha512]
>  
> Looks like shasum validation had some wrong change in:
> [https://github.com/apache/spark/commit/6c5fcac6b787d01ebf3d9f53410db2c894ab9abd#diff-590845f9441f6be1f05f517fd1caf31d64d0b5126ea9a2a13d79c74f761417ce]
>  
> [1] [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/]
> [2] 
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35463:
--
Target Version/s: 3.0.3, 3.1.2, 3.2.0

> Skip checking checksum on a system doesn't have `shasum`
> 
>
> Key: SPARK-35463
> URL: https://issues.apache.org/jira/browse/SPARK-35463
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35458) ARM CI failed: failed to validate maven sha512

2021-05-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-35458.
--
Fix Version/s: 3.0.3
   3.1.2
   3.2.0
   Resolution: Fixed

Issue resolved by pull request 32604
[https://github.com/apache/spark/pull/32604]

> ARM CI failed: failed to validate maven sha512
> --
>
> Key: SPARK-35458
> URL: https://issues.apache.org/jira/browse/SPARK-35458
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.2.0, 3.1.2, 3.0.3
>
>
> Log:
>  
> Veryfing checksum from 
> /home/jenkins/workspace/spark-master-test-maven-arm/build/apache-maven-3.6.3-bin.tar.gz.sha512
>  *Unknown option: q*
>  *Type shasum -h for help*
>  Bad checksum from 
> [https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz.sha512]
>  
> Looks like shasum validation had some wrong change in:
> [https://github.com/apache/spark/commit/6c5fcac6b787d01ebf3d9f53410db2c894ab9abd#diff-590845f9441f6be1f05f517fd1caf31d64d0b5126ea9a2a13d79c74f761417ce]
>  
> [1] [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/]
> [2] 
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35463:
--
Priority: Blocker  (was: Major)

> Skip checking checksum on a system doesn't have `shasum`
> 
>
> Key: SPARK-35463
> URL: https://issues.apache.org/jira/browse/SPARK-35463
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35458) ARM CI failed: failed to validate maven sha512

2021-05-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-35458:


Assignee: Yikun Jiang

> ARM CI failed: failed to validate maven sha512
> --
>
> Key: SPARK-35458
> URL: https://issues.apache.org/jira/browse/SPARK-35458
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> Log:
>  
> Veryfing checksum from 
> /home/jenkins/workspace/spark-master-test-maven-arm/build/apache-maven-3.6.3-bin.tar.gz.sha512
>  *Unknown option: q*
>  *Type shasum -h for help*
>  Bad checksum from 
> [https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz.sha512]
>  
> Looks like shasum validation had some wrong change in:
> [https://github.com/apache/spark/commit/6c5fcac6b787d01ebf3d9f53410db2c894ab9abd#diff-590845f9441f6be1f05f517fd1caf31d64d0b5126ea9a2a13d79c74f761417ce]
>  
> [1] [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/]
> [2] 
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35463) Skip checking checksum on a system doesn't have `shasum`

2021-05-20 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-35463:
-

 Summary: Skip checking checksum on a system doesn't have `shasum`
 Key: SPARK-35463
 URL: https://issues.apache.org/jira/browse/SPARK-35463
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.3, 3.1.2, 3.2.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18683) REST APIs for standalone Master、Workers and Applications

2021-05-20 Thread Mayank Asthana (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348708#comment-17348708
 ] 

Mayank Asthana commented on SPARK-18683:


Looking through the code, I found that there is a `/json` endpoint on the 
master UI which returns a JSON representation of everything on that page. 
However, I don't think this is documented anywhere.
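
For example, a quick way to look at what the endpoint returns (assuming a standalone 
master whose web UI listens on the default port 8080; the host below is illustrative):

{code:python}
import json
from urllib.request import urlopen

with urlopen("http://localhost:8080/json") as resp:
    info = json.load(resp)

# Top-level keys mirror the sections of the master UI page
# (workers, applications, and so on).
print(sorted(info))
{code}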

> REST APIs for standalone Master、Workers and Applications
> 
>
> Key: SPARK-18683
> URL: https://issues.apache.org/jira/browse/SPARK-18683
> Project: Spark
>  Issue Type: Improvement
>Reporter: Shixiong Zhu
>Priority: Major
>  Labels: bulk-closed
>
> It would be great that we have some REST APIs to access Master、Workers and 
> Applications information. Right now the only way to get them is using the Web 
> UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35462:


Assignee: (was: Apache Spark)

> Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
> -
>
> Key: SPARK-35462
> URL: https://issues.apache.org/jira/browse/SPARK-35462
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348681#comment-17348681
 ] 

Apache Spark commented on SPARK-35462:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32612

> Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
> -
>
> Key: SPARK-35462
> URL: https://issues.apache.org/jira/browse/SPARK-35462
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35462:


Assignee: Apache Spark

> Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models
> -
>
> Key: SPARK-35462
> URL: https://issues.apache.org/jira/browse/SPARK-35462
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35462) Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 models

2021-05-20 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-35462:
-

 Summary: Upgrade Kubernetes-client to 5.4.0 to support K8s 1.21 
models
 Key: SPARK-35462
 URL: https://issues.apache.org/jira/browse/SPARK-35462
 Project: Spark
  Issue Type: Improvement
  Components: Build, Kubernetes
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35461:
--
Issue Type: Improvement  (was: Bug)

> Error when reading dictionary-encoded Parquet int column when read schema is 
> bigint
> ---
>
> Key: SPARK-35461
> URL: https://issues.apache.org/jira/browse/SPARK-35461
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Chao Sun
>Priority: Major
>
> When reading a dictionary-encoded integer column from a Parquet file, and 
> users specify read schema to be bigint, Spark currently will fail with the 
> following exception:
> {code}
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
> {code}
> To reproduce:
> {code}
> val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, 
> i.toString))
> withParquetFile(data) { path =>
>   val readSchema = StructType(Seq(StructField("_1", LongType)))
>   spark.read.schema(readSchema).parquet(path).first()
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint

2021-05-20 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348678#comment-17348678
 ] 

Dongjoon Hyun commented on SPARK-35461:
---

For the record, Apache Spark's file-based data sources have different 
capabilities; for example, we don't expect much capability from the TEXT data 
source. The Parquet data source has had this limitation for a long time.

> Error when reading dictionary-encoded Parquet int column when read schema is 
> bigint
> ---
>
> Key: SPARK-35461
> URL: https://issues.apache.org/jira/browse/SPARK-35461
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Chao Sun
>Priority: Major
>
> When reading a dictionary-encoded integer column from a Parquet file, and 
> users specify read schema to be bigint, Spark currently will fail with the 
> following exception:
> {code}
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
> {code}
> To reproduce:
> {code}
> val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, 
> i.toString))
> withParquetFile(data) { path =>
>   val readSchema = StructType(Seq(StructField("_1", LongType)))
>   spark.read.schema(readSchema).parquet(path).first()
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35461:
--
Affects Version/s: (was: 3.1.1)
   (was: 3.0.2)
   3.2.0

> Error when reading dictionary-encoded Parquet int column when read schema is 
> bigint
> ---
>
> Key: SPARK-35461
> URL: https://issues.apache.org/jira/browse/SPARK-35461
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Priority: Major
>
> When reading a dictionary-encoded integer column from a Parquet file, and 
> users specify read schema to be bigint, Spark currently will fail with the 
> following exception:
> {code}
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
> {code}
> To reproduce:
> {code}
> val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, 
> i.toString))
> withParquetFile(data) { path =>
>   val readSchema = StructType(Seq(StructField("_1", LongType)))
>   spark.read.schema(readSchema).parquet(path).first()
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint

2021-05-20 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348677#comment-17348677
 ] 

Dongjoon Hyun commented on SPARK-35461:
---

This is a well-known limitation since we built test coverage for it via SPARK-23007 
in Apache Spark 2.4.0.

> Error when reading dictionary-encoded Parquet int column when read schema is 
> bigint
> ---
>
> Key: SPARK-35461
> URL: https://issues.apache.org/jira/browse/SPARK-35461
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Chao Sun
>Priority: Major
>
> When reading a dictionary-encoded integer column from a Parquet file, and 
> users specify read schema to be bigint, Spark currently will fail with the 
> following exception:
> {code}
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
> {code}
> To reproduce:
> {code}
> val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, 
> i.toString))
> withParquetFile(data) { path =>
>   val readSchema = StructType(Seq(StructField("_1", LongType)))
>   spark.read.schema(readSchema).parquet(path).first()
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348675#comment-17348675
 ] 

Apache Spark commented on SPARK-35314:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/32611

> Support arithmetic operations against bool IndexOpsMixin
> 
>
> Key: SPARK-35314
> URL: https://issues.apache.org/jira/browse/SPARK-35314
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Existing binary operations of bool Series in Koalas do not match pandas’ 
> behaviors.
> pandas takes True as 1 and False as 0 when dealing with numeric values, numeric 
> collections, and numeric Series; whereas Koalas raises an AnalysisException 
> no matter what the binary operation is.
> We aim to match pandas' behaviors.
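
For reference, a minimal pandas snippet showing the behavior being matched (True is 
treated as 1 and False as 0 in arithmetic):

{code:python}
import pandas as pd

s = pd.Series([True, False, True])
print(s + 1)   # 2, 1, 2 -- booleans are coerced to 1/0
print(s * 10)  # 10, 0, 10
{code}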



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35314:


Assignee: Apache Spark

> Support arithmetic operations against bool IndexOpsMixin
> 
>
> Key: SPARK-35314
> URL: https://issues.apache.org/jira/browse/SPARK-35314
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Existing binary operations of bool Series in Koalas do not match pandas’ 
> behaviors.
> pandas takes True as 1 and False as 0 when dealing with numeric values, numeric 
> collections, and numeric Series; whereas Koalas raises an AnalysisException 
> no matter what the binary operation is.
> We aim to match pandas' behaviors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35314:


Assignee: (was: Apache Spark)

> Support arithmetic operations against bool IndexOpsMixin
> 
>
> Key: SPARK-35314
> URL: https://issues.apache.org/jira/browse/SPARK-35314
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Existing binary operations of bool Series in Koalas do not match pandas’ 
> behaviors.
> pandas takes True as 1 and False as 0 when dealing with numeric values, numeric 
> collections, and numeric Series; whereas Koalas raises an AnalysisException 
> no matter what the binary operation is.
> We aim to match pandas' behaviors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348672#comment-17348672
 ] 

Apache Spark commented on SPARK-35314:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/32611

> Support arithmetic operations against bool IndexOpsMixin
> 
>
> Key: SPARK-35314
> URL: https://issues.apache.org/jira/browse/SPARK-35314
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Existing binary operations of bool Series in Koalas do not match pandas’ 
> behaviors.
> pandas takes True as 1 and False as 0 when dealing with numeric values, numeric 
> collections, and numeric Series; whereas Koalas raises an AnalysisException 
> no matter what the binary operation is.
> We aim to match pandas' behaviors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35314) Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35314:
-
Summary: Support arithmetic operations against bool IndexOpsMixin  (was: 
Support arithmetic operations against bool Series)

> Support arithmetic operations against bool IndexOpsMixin
> 
>
> Key: SPARK-35314
> URL: https://issues.apache.org/jira/browse/SPARK-35314
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Existing binary operations of bool Series in Koalas do not match pandas’ 
> behaviors.
> pandas takes True as 1 and False as 0 when dealing with numeric values, numeric 
> collections, and numeric Series; whereas Koalas raises an AnalysisException 
> no matter what the binary operation is.
> We aim to match pandas' behaviors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint

2021-05-20 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348667#comment-17348667
 ] 

Chao Sun commented on SPARK-35461:
--

Actually this also fails when turning off the vectorized reader:
{code}
Caused by: java.lang.ClassCastException: class 
org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to class 
org.apache.spark.sql.catalyst.expressions.MutableInt 
(org.apache.spark.sql.catalyst.expressions.MutableLong and 
org.apache.spark.sql.catalyst.expressions.MutableInt are in unnamed module of 
loader 'app')
at 
org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.setInt(SpecificInternalRow.scala:253)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$RowUpdater.setInt(ParquetRowConverter.scala:178)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetPrimitiveConverter.addInt(ParquetRowConverter.scala:88)
at 
org.apache.parquet.column.impl.ColumnReaderBase$2$3.writeValue(ColumnReaderBase.java:297)
at 
org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:440)
at 
org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30)
at 
org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:229)
{code}
In this case parquet-mr is able to return the value but Spark won't be able to 
handle it.

> Error when reading dictionary-encoded Parquet int column when read schema is 
> bigint
> ---
>
> Key: SPARK-35461
> URL: https://issues.apache.org/jira/browse/SPARK-35461
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Chao Sun
>Priority: Major
>
> When reading a dictionary-encoded integer column from a Parquet file, and 
> users specify read schema to be bigint, Spark currently will fail with the 
> following exception:
> {code}
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
> {code}
> To reproduce:
> {code}
> val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, 
> i.toString))
> withParquetFile(data) { path =>
>   val readSchema = StructType(Seq(StructField("_1", LongType)))
>   spark.read.schema(readSchema).parquet(path).first()
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35461) Error when reading dictionary-encoded Parquet int column when read schema is bigint

2021-05-20 Thread Chao Sun (Jira)
Chao Sun created SPARK-35461:


 Summary: Error when reading dictionary-encoded Parquet int column 
when read schema is bigint
 Key: SPARK-35461
 URL: https://issues.apache.org/jira/browse/SPARK-35461
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1, 3.0.2
Reporter: Chao Sun


When reading a dictionary-encoded integer column from a Parquet file, and users 
specify read schema to be bigint, Spark currently will fail with the following 
exception:
{code}
java.lang.UnsupportedOperationException: 
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
at 
org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
{code}

To reproduce:
{code}
val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, i.toString))
withParquetFile(data) { path =>
  val readSchema = StructType(Seq(StructField("_1", LongType)))
  spark.read.schema(readSchema).parquet(path).first()
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2

2021-05-20 Thread Lu Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xu updated SPARK-33743:
--
Description: 
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. }

{code}
 

*+Proposal+*  
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case 
TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP)).. }
{code}
 

  was:
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. }

{code}
 

*+Proposal+*  
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case 
TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. }
{code}
 


> Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
> ---
>
> Key: SPARK-33743
> URL: https://issues.apache.org/jira/browse/SPARK-33743
> Project: Spark
>  Issue Type: Request
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Lu Xu
>Priority: Major
>
> *datetime v/s datetime2*
> Spark datetime type is 
> [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
>  This supports a microsecond resolution.
>  
> Sql supports 2 date time types
> o *datetime* can support only milli seconds resolu

[jira] [Updated] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang

2021-05-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35460:
--
Affects Version/s: (was: 3.1.1)
   3.2.0

>  invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
> -
>
> Key: SPARK-35460
> URL: https://issues.apache.org/jira/browse/SPARK-35460
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: 
> https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. 
> Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=metadata.name, 
> message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must 
> consist of lower case alphanumeric characters, '-' or '.', and must start and 
> end with an alphanumeric character (e.g. 'example.com', regex used for 
> validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> reason=FieldValueInvalid, additionalProperties={}), 
> StatusCause(field=spec.hostname, message=Invalid value: 
> "spark_exec-exec-688": a DNS-1123 label must consist of lower case 
> alphanumeric characters or '-', and must start and end with an alphanumeric 
> character (e.g. 'my-name',  or '123-abc', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, 
> additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], 
> metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=Invalid, status=Failure, additionalProperties={}).
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
> {code}
> When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, 
> the driver will continuously fail to request executors from k8s master, which 
> causes the app to hang with the above message.
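
As an illustration only (not the actual fix), a small check against the DNS-1123 label 
pattern quoted in the error above would catch such a prefix before pods are requested:

{code:python}
import re

# DNS-1123 label pattern, as quoted in the Kubernetes error message above.
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")

def is_valid_pod_name_prefix(prefix: str) -> bool:
    return DNS1123_LABEL.fullmatch(prefix) is not None

print(is_valid_pod_name_prefix("spark_exec"))  # False: '_' is not allowed
print(is_valid_pod_name_prefix("spark-exec"))  # True
{code}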



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2

2021-05-20 Thread Lu Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xu updated SPARK-33743:
--
Description: 
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. }

{code}
 

*+Proposal+*  
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case 
TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. }
{code}
 

  was:
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*

 
{code:java}

{code}
 *override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. }* 

*+Proposal+*  
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case 
TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. }
{code}
 


> Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
> ---
>
> Key: SPARK-33743
> URL: https://issues.apache.org/jira/browse/SPARK-33743
> Project: Spark
>  Issue Type: Request
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Lu Xu
>Priority: Major
>
> *datetime v/s datetime2*
> Spark datetime type is 
> [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
>  This supports a microsecond resolution.
>  
> Sql supports 2 date time types
> o *datetime* can support only milli secon

[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2

2021-05-20 Thread Lu Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xu updated SPARK-33743:
--
Description: 
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*

 
{code:java}

{code}
 *override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP)) .. }* 

*+Proposal+*  
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case 
TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. }
{code}
 

  was:
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*
|override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ *case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))* .. }| |

*+Proposal+*  

 
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case 
TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. }
{code}
 


> Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
> ---
>
> Key: SPARK-33743
> URL: https://issues.apache.org/jira/browse/SPARK-33743
> Project: Spark
>  Issue Type: Request
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Lu Xu
>Priority: Major
>
> *datetime v/s datetime2*
> Spark datetime type is 
> [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
>  This supports a microsecond resolution.
>  
> Sql supports 2 date time types
> o *datetime* can support only milli seconds resolutio

[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2

2021-05-20 Thread Lu Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xu updated SPARK-33743:
--
Description: 
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*
|override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ *case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))* .. }| |

*+Proposal+*  

 
{code:java}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case 
TimestampType => Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))* .. }
{code}
 

  was:
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*
|override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))* .. }| |

*+Proposal+*  
 override def getJDBCType(dt: DataType): Option[JdbcType] = dt match

{ *_case TimestampType => Some(JdbcType("DATETIME2", 
java.sql.Types.TIMESTAMP))}_*
 ..
 }| |


> Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
> ---
>
> Key: SPARK-33743
> URL: https://issues.apache.org/jira/browse/SPARK-33743
> Project: Spark
>  Issue Type: Request
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Lu Xu
>Priority: Major
>
> *datetime v/s datetime2*
> Spark datetime type is 
> [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
>  This supports a microsecond resolution.
>  
> Sql supports 2 date time types
> o *datetime* can support only milli seconds resolution (0 to 999).
> o *datetime2* is 

[jira] [Updated] (SPARK-33743) Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2

2021-05-20 Thread Lu Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xu updated SPARK-33743:
--
Description: 
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*
|override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { *case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))* .. }| |

*+Proposal+*  
 override def getJDBCType(dt: DataType): Option[JdbcType] = dt match

{ *_case TimestampType => Some(JdbcType("DATETIME2", 
java.sql.Types.TIMESTAMP))}_*
 ..
 }| |

  was:
*datetime v/s datetime2*

Spark datetime type is 
[timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
 This supports a microsecond resolution.

 

Sql supports 2 date time types

o *datetime* can support only milli seconds resolution (0 to 999).

o *datetime2* is extension of datetime , is compatible with datetime and 
supports 0 to 999 sub second resolution.

Currently 
[MsSQLServerDialect|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fblob%2Fbfb257f078854ad587a9e2bfe548cdb7bf8786d4%2Fsql%2Fcore%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2Fsql%2Fjdbc%2FMsSqlServerDialect.scala&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986197428%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PMT9rA08NJRN0kwHy2ERaloOaDRB6ZsBBd70MZXl%2Bv4%3D&reserved=0]
 maps timestamptype to datetime. This implies results in errors when writing

*+Current+*
|override def getJDBCType(dt: DataType): Option[JdbcType] = dt match \{ *case 
TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))* .. }| |

*+Proposal+*  
 override def getJDBCType(dt: DataType): Option[JdbcType] = dt match

{ *_case TimestampType => if(oldDateTime)

{Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))}

else \{Some(JdbcType("DATETIME2", java.sql.Types.TIMESTAMP))}_*
 ..
 }| |


> Change datatype mapping in JDBC mssqldialect: DATETIME to DATETIME2
> ---
>
> Key: SPARK-33743
> URL: https://issues.apache.org/jira/browse/SPARK-33743
> Project: Spark
>  Issue Type: Request
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Lu Xu
>Priority: Major
>
> *datetime v/s datetime2*
> Spark datetime type is 
> [timestamptype|https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fspark.apache.org%2Fdocs%2Flatest%2Fapi%2Fjava%2Forg%2Fapache%2Fspark%2Fsql%2Ftypes%2FTimestampType.html&data=04%7C01%7Cluxu1%40microsoft.com%7C39803a0f635646dadd6b08d89010896a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637417747986187437%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qPPJve%2FGAPeIp%2BI2hjB%2BqoGGN%2FcJQe6CIDjlEdUyASo%3D&reserved=0].
>  This supports a microsecond resolution.
>  
> Sql supports 2 date time types
> o *datetime* can supp

[jira] [Commented] (SPARK-35256) Subexpression elimination leading to a performance regression

2021-05-20 Thread Ondrej Kokes (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348576#comment-17348576
 ] 

Ondrej Kokes commented on SPARK-35256:
--

[~Kimahriman] I think you're right - I've just built the PR linked in 35410 and 
it brought the runtime to less than half of what it was under 3.1.1 (15.45s vs 
35s) and it's also faster than 2.4.x, which is nice. So if merged, I'll close 
this as a dupe - but for now I'll subscribe to that issue and PR and wait for 
its resolution.

Strangely enough, my original pipeline (which was simplified into the repro 
linked in this issue) is only 10% faster than under 3.1.1 (so way way slower 
than 2.4.x), so there are more things at play. I'll investigate more once this 
is merged.

Thanks for the links!
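
(For anyone else hitting this before that fix is merged: one rough way to check 
whether subexpression elimination is the culprit is to rerun the repro with it 
turned off. A sketch, assuming an active SparkSession named spark and that the 
config covers the code path exercised here:)
{code:java}
// Diagnostic only, not a fix: compare runtimes of the repro with and without
// subexpression elimination (the config defaults to true).
spark.conf.set("spark.sql.subexpressionElimination.enabled", "false")
{code}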

> Subexpression elimination leading to a performance regression
> -
>
> Key: SPARK-35256
> URL: https://issues.apache.org/jira/browse/SPARK-35256
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Ondrej Kokes
>Priority: Minor
> Attachments: bisect_log.txt, bisect_timing.csv
>
>
> I'm seeing almost double the runtime between 3.0.1 and 3.1.1 in my pipeline 
> that does mostly str_to_map, split and a few other operations - all 
> projections, no joins or aggregations (it's here only to trigger the 
> pipeline). I cut it down to the simplest reproducible example I could - 
> anything I remove from this changes the runtime difference quite 
> dramatically. (even moving those two expressions from f.when to standalone 
> columns makes the difference disappear)
> {code:java}
> import time
> import os
> import pyspark  
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as f
> if __name__ == '__main__':
> print(pyspark.__version__)
> spark = SparkSession.builder.getOrCreate()
> filename = 'regression.csv'
> if not os.path.isfile(filename):
> with open(filename, 'wt') as fw:
> fw.write('foo\n')
> for _ in range(10_000_000):
> fw.write('foo=bar&baz=bak&bar=f,o,1:2:3\n')
> df = spark.read.option('header', True).csv(filename)
> t = time.time()
> dd = (df
> .withColumn('my_map', f.expr('str_to_map(foo, "&", "=")'))
> .withColumn('extracted',
> # without this top level split it is only 50% 
> slower, with it
> # the runtime almost doubles
> f.split(f.split(f.col("my_map")["bar"], ",")[2], 
> ":")[0]
>)
> .select(
> f.when(
> f.col("extracted").startswith("foo"), f.col("extracted")
> ).otherwise(
> f.concat(f.lit("foo"), f.col("extracted"))
> ).alias("foo")
> )
> )
> # dd.explain(True)
> _ = dd.groupby("foo").count().count()
> print("elapsed", time.time() - t)
> {code}
> Running this in 3.0.1 and 3.1.1 respectively (both installed from PyPI, on my 
> local macOS)
> {code:java}
> 3.0.1
> elapsed 21.262351036071777
> 3.1.1
> elapsed 40.26582884788513
> {code}
> (Meaning the transformation took 21 seconds in 3.0.1 and 40 seconds in 3.1.1)
> Feel free to make the CSV smaller to get a quicker feedback loop - it scales 
> linearly (I developed this with 2M rows).
> It might be related to my previous issue - SPARK-32989 - there are similar 
> operations, nesting etc. (splitting on the original column, not on a map, 
> makes the difference disappear)
> I tried dissecting the queries in SparkUI and via explain, but both 3.0.1 and 
> 3.1.1 produced identical plans.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348544#comment-17348544
 ] 

Apache Spark commented on SPARK-35460:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/32610

>  invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
> -
>
> Key: SPARK-35460
> URL: https://issues.apache.org/jira/browse/SPARK-35460
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.1
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: 
> https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. 
> Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=metadata.name, 
> message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must 
> consist of lower case alphanumeric characters, '-' or '.', and must start and 
> end with an alphanumeric character (e.g. 'example.com', regex used for 
> validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> reason=FieldValueInvalid, additionalProperties={}), 
> StatusCause(field=spec.hostname, message=Invalid value: 
> "spark_exec-exec-688": a DNS-1123 label must consist of lower case 
> alphanumeric characters or '-', and must start and end with an alphanumeric 
> character (e.g. 'my-name',  or '123-abc', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, 
> additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], 
> metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=Invalid, status=Failure, additionalProperties={}).
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
> {code}
> When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, 
> the driver will continuously fail to request executors from k8s master, which 
> causes the app to hang with the above message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35460:


Assignee: (was: Apache Spark)

>  invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
> -
>
> Key: SPARK-35460
> URL: https://issues.apache.org/jira/browse/SPARK-35460
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.1
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: 
> https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. 
> Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=metadata.name, 
> message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must 
> consist of lower case alphanumeric characters, '-' or '.', and must start and 
> end with an alphanumeric character (e.g. 'example.com', regex used for 
> validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> reason=FieldValueInvalid, additionalProperties={}), 
> StatusCause(field=spec.hostname, message=Invalid value: 
> "spark_exec-exec-688": a DNS-1123 label must consist of lower case 
> alphanumeric characters or '-', and must start and end with an alphanumeric 
> character (e.g. 'my-name',  or '123-abc', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, 
> additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], 
> metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=Invalid, status=Failure, additionalProperties={}).
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
> {code}
> When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, 
> the driver will continuously fail to request executors from k8s master, which 
> causes the app to hang with the above message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348543#comment-17348543
 ] 

Apache Spark commented on SPARK-35460:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/32610

>  invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
> -
>
> Key: SPARK-35460
> URL: https://issues.apache.org/jira/browse/SPARK-35460
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.1
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: 
> https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. 
> Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=metadata.name, 
> message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must 
> consist of lower case alphanumeric characters, '-' or '.', and must start and 
> end with an alphanumeric character (e.g. 'example.com', regex used for 
> validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> reason=FieldValueInvalid, additionalProperties={}), 
> StatusCause(field=spec.hostname, message=Invalid value: 
> "spark_exec-exec-688": a DNS-1123 label must consist of lower case 
> alphanumeric characters or '-', and must start and end with an alphanumeric 
> character (e.g. 'my-name',  or '123-abc', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, 
> additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], 
> metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=Invalid, status=Failure, additionalProperties={}).
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
> {code}
> When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, 
> the driver will continuously fail to request executors from k8s master, which 
> causes the app to hang with the above message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35460:


Assignee: Apache Spark

>  invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang
> -
>
> Key: SPARK-35460
> URL: https://issues.apache.org/jira/browse/SPARK-35460
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.1
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> 21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: 
> https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. 
> Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=metadata.name, 
> message=Invalid value: "spark_exec-exec-688": a DNS-1123 subdomain must 
> consist of lower case alphanumeric characters, '-' or '.', and must start and 
> end with an alphanumeric character (e.g. 'example.com', regex used for 
> validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> reason=FieldValueInvalid, additionalProperties={}), 
> StatusCause(field=spec.hostname, message=Invalid value: 
> "spark_exec-exec-688": a DNS-1123 label must consist of lower case 
> alphanumeric characters or '-', and must start and end with an alphanumeric 
> character (e.g. 'my-name',  or '123-abc', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, 
> additionalProperties={})], group=null, kind=Pod, name=spark_exec-exec-688, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
> "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
> alphanumeric characters, '-' or '.', and must start and end with an 
> alphanumeric character (e.g. 'example.com', regex used for validation is 
> '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
> spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
> consist of lower case alphanumeric characters or '-', and must start and end 
> with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
> validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], 
> metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=Invalid, status=Failure, additionalProperties={}).
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
> {code}
> When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, 
> the driver will continuously fail to request executors from k8s master, which 
> causes the app to hang with the above message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35460) invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang

2021-05-20 Thread Kent Yao (Jira)
Kent Yao created SPARK-35460:


 Summary:  invalid `spark.kubernetes.executor.podNamePrefix` causes 
app to hang
 Key: SPARK-35460
 URL: https://issues.apache.org/jira/browse/SPARK-35460
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.1.1
Reporter: Kent Yao



{code:java}
21/05/20 21:41:21 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying 
snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://kubernetes.docker.internal:6443/api/v1/namespaces/default/pods. 
Message: Pod "spark_exec-exec-688" is invalid: [metadata.name: Invalid value: 
"spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
alphanumeric characters, '-' or '.', and must start and end with an 
alphanumeric character (e.g. 'example.com', regex used for validation is 
'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
consist of lower case alphanumeric characters or '-', and must start and end 
with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]. Received status: 
Status(apiVersion=v1, code=422, 
details=StatusDetails(causes=[StatusCause(field=metadata.name, message=Invalid 
value: "spark_exec-exec-688": a DNS-1123 subdomain must consist of lower case 
alphanumeric characters, '-' or '.', and must start and end with an 
alphanumeric character (e.g. 'example.com', regex used for validation is 
'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
reason=FieldValueInvalid, additionalProperties={}), 
StatusCause(field=spec.hostname, message=Invalid value: "spark_exec-exec-688": 
a DNS-1123 label must consist of lower case alphanumeric characters or '-', and 
must start and end with an alphanumeric character (e.g. 'my-name',  or 
'123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), 
reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, 
name=spark_exec-exec-688, retryAfterSeconds=null, uid=null, 
additionalProperties={}), kind=Status, message=Pod "spark_exec-exec-688" is 
invalid: [metadata.name: Invalid value: "spark_exec-exec-688": a DNS-1123 
subdomain must consist of lower case alphanumeric characters, '-' or '.', and 
must start and end with an alphanumeric character (e.g. 'example.com', regex 
used for validation is 
'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), 
spec.hostname: Invalid value: "spark_exec-exec-688": a DNS-1123 label must 
consist of lower case alphanumeric characters or '-', and must start and end 
with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for 
validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')], 
metadata=ListMeta(_continue=null, remainingItemCount=null, 
resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, 
status=Failure, additionalProperties={}).
at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583)
at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:522)
at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487)
at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448)
at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
{code}

When `spark.kubernetes.executor.podNamePrefix` contains invalid characters, the 
driver will continuously fail to request executors from k8s master, which 
causes the app to hang with the above message.
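
A minimal sketch of the fail-fast validation this asks for; the helper name and 
the reserved-suffix length are assumptions, and the pattern is the DNS-1123 label 
regex quoted in the error above:
{code:java}
// Illustrative check only, not Spark's implementation: reject a bad executor
// pod name prefix up front instead of letting the API server refuse every pod.
object PodNamePrefixCheck {
  // DNS-1123 label: lower-case alphanumerics and '-', alphanumeric at both ends.
  private val Dns1123Label = "[a-z0-9]([-a-z0-9]*[a-z0-9])?".r

  def isValidPrefix(prefix: String): Boolean =
    prefix.nonEmpty &&
      prefix.length <= 47 && // assumption: leave room for an "-exec-NNN" suffix under 63 chars
      Dns1123Label.pattern.matcher(prefix).matches()
}

// PodNamePrefixCheck.isValidPrefix("spark_exec") == false (underscore not allowed)
// PodNamePrefixCheck.isValidPrefix("spark-exec") == true
{code}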





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29223) Kafka source: offset by timestamp - allow specifying timestamp for "all partitions"

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348508#comment-17348508
 ] 

Apache Spark commented on SPARK-29223:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/32609

> Kafka source: offset by timestamp - allow specifying timestamp for "all 
> partitions"
> ---
>
> Key: SPARK-29223
> URL: https://issues.apache.org/jira/browse/SPARK-29223
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Priority: Minor
>
> This issue is a follow-up of SPARK-26848.
> In SPARK-26848, we decided to open possibility to let end users set 
> individual timestamp per partition. But in many cases, specifying timestamp 
> represents the intention that we would want to go back to specific timestamp 
> and reprocess records, which should be applied to all topics and partitions.
> According to the format of 
> `startingOffsetsByTimestamp`/`endingOffsetsByTimestamp`, while it's not 
> intuitive to provide an option to set a global timestamp across topic, it's 
> still intuitive to provide an option to set a global timestamp across 
> partitions in a topic.
> This issue tracks the efforts to deal with this.
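
For reference, the per-partition form this would simplify looks like the sketch 
below (made-up broker, topic, and epoch-millis values; assumes an active 
SparkSession named spark):
{code:java}
// Current API (Spark 3.x Kafka source): the same timestamp has to be repeated
// for every partition of every topic inside the JSON option.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "topicA")
  .option("startingOffsetsByTimestamp",
    """{"topicA": {"0": 1621468800000, "1": 1621468800000}}""")
  .load()
{code}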



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33867) java.time.Instant and java.time.LocalDate not handled in org.apache.spark.sql.jdbc.JdbcDialect#compileValue

2021-05-20 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348323#comment-17348323
 ] 

Takeshi Yamamuro commented on SPARK-33867:
--

Please see the "Fix Version/s" field in this jira; it includes 3.1.x, too.

> java.time.Instant and java.time.LocalDate not handled in 
> org.apache.spark.sql.jdbc.JdbcDialect#compileValue
> ---
>
> Key: SPARK-33867
> URL: https://issues.apache.org/jira/browse/SPARK-33867
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Cristi
>Assignee: Cristi
>Priority: Major
> Fix For: 3.0.2, 3.1.1, 3.2.0
>
>
> When using the new java time API (spark.sql.datetime.java8API.enabled=true) 
> LocalDate and Instant aren't handled in 
> org.apache.spark.sql.jdbc.JdbcDialect#compileValue so exceptions are thrown 
> when they are used in filters since a filter condition would be translated to 
> something like this: "valid_from" > 2020-12-21T11:40:24.413681Z.
> To reproduce you can write a simple filter like where dataset is backed by a 
> DB table (in my case PostgreSQL): 
> dataset.filter(current_timestamp().gt(col(VALID_FROM)))
> The error and stacktrace:
> Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near 
> "T11"Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or 
> near "T11"  Position: 285 at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103)
>  at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
>  at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) 
> at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512)
>  at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
>  at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at 
> org.apache.spark.scheduler.Task.run(Task.scala:127) at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:834)
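
The handling asked for here amounts to rendering java.time values as quoted SQL 
literals instead of their raw toString output. A rough sketch of the idea (not 
the exact patch that went into the listed fix versions):
{code:java}
import java.sql.{Date, Timestamp}
import java.time.{Instant, LocalDate}

// Sketch only: quote java.time filter values so the pushed-down WHERE clause
// stays valid SQL ('2020-12-21 11:40:24.413681' rather than the bare
// 2020-12-21T11:40:24.413681Z that triggers the syntax error above).
def compileJavaTimeValue(value: Any): Any = value match {
  case i: Instant   => s"'${Timestamp.from(i)}'"
  case d: LocalDate => s"'${Date.valueOf(d)}'"
  case other        => other
}
{code}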



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35373) Verify checksums of downloaded artifacts in build/mvn

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348318#comment-17348318
 ] 

Apache Spark commented on SPARK-35373:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32608

> Verify checksums of downloaded artifacts in build/mvn
> -
>
> Key: SPARK-35373
> URL: https://issues.apache.org/jira/browse/SPARK-35373
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.7, 3.0.2, 3.1.1
>Reporter: Sean R. Owen
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.0.3, 3.1.2, 3.2.0
>
>
> build/mvn is a convenience script that will automatically download Maven (and 
> Scala) if not already present. While it downloads from official ASF mirrors, 
> it does not check the checksum of the artifact, which is available as a 
> .sha512 file from ASF servers.
> The risk of a supply chain attack is a bit less theoretical here than usual, 
> because artifacts are downloaded from any of several mirrors worldwide, and 
> injecting a malicious copy of Maven in any one of them might be simpler and 
> less noticeable than injecting it into ASF servers.
> (Note, Scala's download site does not seem to provide a checksum. They do all 
> come from Lightbend, at least, not N mirrors. Not much we can do there.)
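
The check itself is small. A sketch of the comparison build/mvn would need to do, 
written here as Scala just to illustrate the idea (the real script is shell); the 
artifact path and expected digest are assumed inputs:
{code:java}
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

// Illustration of the verification step: hash the downloaded tarball and compare
// it to the .sha512 value published alongside the release on ASF servers.
def sha512Matches(artifactPath: String, expectedHex: String): Boolean = {
  val bytes  = Files.readAllBytes(Paths.get(artifactPath))
  val actual = MessageDigest.getInstance("SHA-512").digest(bytes)
    .map("%02x".format(_)).mkString
  // .sha512 files may be "<hex>" or "<hex>  <filename>"; keep only the hex part.
  actual == expectedHex.trim.toLowerCase.split("\\s+")(0)
}
{code}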



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33867) java.time.Instant and java.time.LocalDate not handled in org.apache.spark.sql.jdbc.JdbcDialect#compileValue

2021-05-20 Thread Cristi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348298#comment-17348298
 ] 

Cristi commented on SPARK-33867:


looks like it: 
https://github.com/apache/spark/blob/branch-3.1/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala

> java.time.Instant and java.time.LocalDate not handled in 
> org.apache.spark.sql.jdbc.JdbcDialect#compileValue
> ---
>
> Key: SPARK-33867
> URL: https://issues.apache.org/jira/browse/SPARK-33867
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Cristi
>Assignee: Cristi
>Priority: Major
> Fix For: 3.0.2, 3.1.1, 3.2.0
>
>
> When using the new java time API (spark.sql.datetime.java8API.enabled=true) 
> LocalDate and Instant aren't handled in 
> org.apache.spark.sql.jdbc.JdbcDialect#compileValue so exceptions are thrown 
> when they are used in filters since a filter condition would be translated to 
> something like this: "valid_from" > 2020-12-21T11:40:24.413681Z.
> To reproduce you can write a simple filter like where dataset is backed by a 
> DB table (in my case PostgreSQL): 
> dataset.filter(current_timestamp().gt(col(VALID_FROM)))
> The error and stacktrace:
> Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near 
> "T11"Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or 
> near "T11"  Position: 285 at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103)
>  at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
>  at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) 
> at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512)
>  at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
>  at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at 
> org.apache.spark.scheduler.Task.run(Task.scala:127) at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35459) Move AvroRowReaderSuite to a separate file

2021-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35459.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32607
[https://github.com/apache/spark/pull/32607]

> Move AvroRowReaderSuite to a separate file
> --
>
> Key: SPARK-35459
> URL: https://issues.apache.org/jira/browse/SPARK-35459
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Move AvroRowReaderSuite from AvroSuite.scala and place it to 
> AvroRowReaderSuite.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33867) java.time.Instant and java.time.LocalDate not handled in org.apache.spark.sql.jdbc.JdbcDialect#compileValue

2021-05-20 Thread LiaoHanwen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348294#comment-17348294
 ] 

LiaoHanwen commented on SPARK-33867:


Is this fixed on branch-3.1?

> java.time.Instant and java.time.LocalDate not handled in 
> org.apache.spark.sql.jdbc.JdbcDialect#compileValue
> ---
>
> Key: SPARK-33867
> URL: https://issues.apache.org/jira/browse/SPARK-33867
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Cristi
>Assignee: Cristi
>Priority: Major
> Fix For: 3.0.2, 3.1.1, 3.2.0
>
>
> When using the new java time API (spark.sql.datetime.java8API.enabled=true) 
> LocalDate and Instant aren't handled in 
> org.apache.spark.sql.jdbc.JdbcDialect#compileValue so exceptions are thrown 
> when they are used in filters since a filter condition would be translated to 
> something like this: "valid_from" > 2020-12-21T11:40:24.413681Z.
> To reproduce you can write a simple filter like where dataset is backed by a 
> DB table (in my case PostgreSQL): 
> dataset.filter(current_timestamp().gt(col(VALID_FROM)))
> The error and stacktrace:
> Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near 
> "T11"Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or 
> near "T11"  Position: 285 at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103)
>  at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
>  at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) 
> at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512)
>  at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
>  at 
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at 
> org.apache.spark.scheduler.Task.run(Task.scala:127) at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35424) Remove some useless code in ExternalBlockHandler

2021-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35424.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32571
[https://github.com/apache/spark/pull/32571]

> Remove some useless code in ExternalBlockHandler
> 
>
> Key: SPARK-35424
> URL: https://issues.apache.org/jira/browse/SPARK-35424
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.0.2, 3.1.1, 3.2.0
>Reporter: weixiuli
>Assignee: weixiuli
>Priority: Major
> Fix For: 3.2.0
>
>
> There is some useless code in the ExternalBlockHandler, so we may remove it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35424) Remove some useless code in ExternalBlockHandler

2021-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35424:


Assignee: weixiuli

> Remove some useless code in ExternalBlockHandler
> 
>
> Key: SPARK-35424
> URL: https://issues.apache.org/jira/browse/SPARK-35424
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.0.2, 3.1.1, 3.2.0
>Reporter: weixiuli
>Assignee: weixiuli
>Priority: Major
>
> There is some useless code in the ExternalBlockHandler, so we may remove it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35459) Move AvroRowReaderSuite to a separate file

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35459:


Assignee: Max Gekk  (was: Apache Spark)

> Move AvroRowReaderSuite to a separate file
> --
>
> Key: SPARK-35459
> URL: https://issues.apache.org/jira/browse/SPARK-35459
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Move AvroRowReaderSuite from AvroSuite.scala and place it to 
> AvroRowReaderSuite.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35459) Move AvroRowReaderSuite to a separate file

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348195#comment-17348195
 ] 

Apache Spark commented on SPARK-35459:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/32607

> Move AvroRowReaderSuite to a separate file
> --
>
> Key: SPARK-35459
> URL: https://issues.apache.org/jira/browse/SPARK-35459
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Move AvroRowReaderSuite from AvroSuite.scala and place it to 
> AvroRowReaderSuite.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35459) Move AvroRowReaderSuite to a separate file

2021-05-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348194#comment-17348194
 ] 

Apache Spark commented on SPARK-35459:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/32607

> Move AvroRowReaderSuite to a separate file
> --
>
> Key: SPARK-35459
> URL: https://issues.apache.org/jira/browse/SPARK-35459
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Move AvroRowReaderSuite from AvroSuite.scala and place it to 
> AvroRowReaderSuite.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35459) Move AvroRowReaderSuite to a separate file

2021-05-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35459:


Assignee: Apache Spark  (was: Max Gekk)

> Move AvroRowReaderSuite to a separate file
> --
>
> Key: SPARK-35459
> URL: https://issues.apache.org/jira/browse/SPARK-35459
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Move AvroRowReaderSuite from AvroSuite.scala and place it to 
> AvroRowReaderSuite.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35378) Eagerly execute non-root Command so that query command with CTE

2021-05-20 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-35378:
---
Summary: Eagerly execute non-root Command so that query command with CTE  
(was: Eagerly execute Command so that query command with CTE)

> Eagerly execute non-root Command so that query command with CTE
> ---
>
> Key: SPARK-35378
> URL: https://issues.apache.org/jira/browse/SPARK-35378
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark doesn't support LeafRunnableCommand as sub query.
> Because the LeafRunnableCommand always output GenericInternalRow and some 
> node(e.g. SortExec, AdaptiveExecutionExec, WholeCodegenExec) will convert 
> GenericInternalRow to UnsafeRow. So will causes error as follows:
> {code:java}
> java.lang.ClassCastException
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast 
> to org.apache.spark.sql.catalyst.expressions.UnsafeRow
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


