[jira] [Assigned] (SPARK-37904) Improve RebalancePartitions in rules of Optimizer

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37904:


Assignee: Apache Spark

> Improve RebalancePartitions in rules of Optimizer
> -
>
> Key: SPARK-37904
> URL: https://issues.apache.org/jira/browse/SPARK-37904
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> After SPARK-37267, we support optimizing rebalance partitions anywhere in the 
> plan rather than only at the root node. So it should make sense to also let 
> `RebalancePartitions` work in all rules of the Optimizer, as `Repartition` 
> and `RepartitionByExpression` already do.






[jira] [Commented] (SPARK-37904) Improve RebalancePartitions in rules of Optimizer

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476006#comment-17476006
 ] 

Apache Spark commented on SPARK-37904:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/35208

> Improve RebalancePartitions in rules of Optimizer
> -
>
> Key: SPARK-37904
> URL: https://issues.apache.org/jira/browse/SPARK-37904
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> After SPARK-37267, we support optimizing rebalance partitions anywhere in the 
> plan rather than only at the root node. So it should make sense to also let 
> `RebalancePartitions` work in all rules of the Optimizer, as `Repartition` 
> and `RepartitionByExpression` already do.






[jira] [Assigned] (SPARK-37904) Improve RebalancePartitions in rules of Optimizer

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37904:


Assignee: (was: Apache Spark)

> Improve RebalancePartitions in rules of Optimizer
> -
>
> Key: SPARK-37904
> URL: https://issues.apache.org/jira/browse/SPARK-37904
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> After SPARK-37267, we support optimizing rebalance partitions anywhere in the 
> plan rather than only at the root node. So it should make sense to also let 
> `RebalancePartitions` work in all rules of the Optimizer, as `Repartition` 
> and `RepartitionByExpression` already do.






[jira] [Assigned] (SPARK-36967) Report accurate shuffle block size if it's skewed

2022-01-13 Thread Attila Zsolt Piros (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros reassigned SPARK-36967:
--

Assignee: Wan Kun  (was: Apache Spark)

> Report accurate shuffle block size if it's skewed
> 
>
> Key: SPARK-36967
> URL: https://issues.apache.org/jira/browse/SPARK-36967
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Assignee: Wan Kun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: map_status.png, map_status2.png
>
>
> Currently a map task reports an accurate shuffle block size only if the block 
> size is greater than "spark.shuffle.accurateBlockThreshold" (100M by default). 
> But if there are a large number of map tasks and the shuffle block sizes of 
> these tasks are all below "spark.shuffle.accurateBlockThreshold", data skew 
> may go unrecognized.
> For example, there are 1 map task and 1 reduce task, and each map task creates 
> a 50M shuffle block for reduce 0 and 10K shuffle blocks for the remaining 
> reduce tasks; reduce 0 is data-skewed, but the statistics of this plan do not 
> capture this.
>     !map_status2.png!
> I think we need to judge at runtime whether a shuffle block is huge and needs 
> to be reported accurately.
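
A minimal sketch of tuning the threshold in question (assumptions: a local 
session; "spark.shuffle.accurateBlockThreshold" is the config key named in the 
description, with a 100m default; the 10m value is purely illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession

// Lowering the threshold makes more shuffle blocks report their exact size
// instead of the average, so skew like the 50M block above stays visible.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("accurate-block-threshold-demo")
  .config("spark.shuffle.accurateBlockThreshold", "10m")
  .getOrCreate()
{code}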






[jira] [Resolved] (SPARK-36967) Report accurate shuffle block size if it's skewed

2022-01-13 Thread Attila Zsolt Piros (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros resolved SPARK-36967.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34234
[https://github.com/apache/spark/pull/34234]

> Report accurate shuffle block size if it's skewed
> 
>
> Key: SPARK-36967
> URL: https://issues.apache.org/jira/browse/SPARK-36967
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: map_status.png, map_status2.png
>
>
> Currently a map task reports an accurate shuffle block size only if the block 
> size is greater than "spark.shuffle.accurateBlockThreshold" (100M by default). 
> But if there are a large number of map tasks and the shuffle block sizes of 
> these tasks are all below "spark.shuffle.accurateBlockThreshold", data skew 
> may go unrecognized.
> For example, there are 1 map task and 1 reduce task, and each map task creates 
> a 50M shuffle block for reduce 0 and 10K shuffle blocks for the remaining 
> reduce tasks; reduce 0 is data-skewed, but the statistics of this plan do not 
> capture this.
>     !map_status2.png!
> I think we need to judge at runtime whether a shuffle block is huge and needs 
> to be reported accurately.






[jira] [Assigned] (SPARK-37907) StaticInvoke should support ConstantFolding

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37907:


Assignee: (was: Apache Spark)

> StaticInvoke should support ConstantFolding
> ---
>
> Key: SPARK-37907
> URL: https://issues.apache.org/jira/browse/SPARK-37907
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> StaticInvoke does not implement foldable; we should support it.






[jira] [Assigned] (SPARK-37907) StaticInvoke should support ConstantFolding

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37907:


Assignee: Apache Spark

> StaticInvoke should support ConstantFolding
> ---
>
> Key: SPARK-37907
> URL: https://issues.apache.org/jira/browse/SPARK-37907
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> StaticInvoke does not implement foldable; we should support it.






[jira] [Commented] (SPARK-37907) StaticInvoke should support ConstantFolding

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475994#comment-17475994
 ] 

Apache Spark commented on SPARK-37907:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35207

> StaticInvoke should support ConstantFolding
> ---
>
> Key: SPARK-37907
> URL: https://issues.apache.org/jira/browse/SPARK-37907
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> StaticInvoke does not implement foldable; we should support it.






[jira] [Updated] (SPARK-37873) SQL Syntax links are broken

2022-01-13 Thread Alex Ott (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Ott updated SPARK-37873:
-
Attachment: Screenshot 2022-01-14 at 08.07.24.png

> SQL Syntax links are broken
> ---
>
> Key: SPARK-37873
> URL: https://issues.apache.org/jira/browse/SPARK-37873
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Alex Ott
>Priority: Major
> Attachments: Screenshot 2022-01-14 at 08.07.24.png
>
>
> SQL Syntax links at [https://spark.apache.org/docs/latest/sql-ref.html] are 
> broken






[jira] [Commented] (SPARK-37873) SQL Syntax links are broken

2022-01-13 Thread Alex Ott (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475993#comment-17475993
 ] 

Alex Ott commented on SPARK-37873:
--

If you click on any of:
 * [DDL Statements|https://spark.apache.org/docs/latest/sql-ref-syntax-ddl.html]
 * [DML Statements|https://spark.apache.org/docs/latest/sql-ref-syntax-dml.html]
 * [Data Retrieval 
Statements|https://spark.apache.org/docs/latest/sql-ref-syntax-qry.html]
 * [Auxiliary 
Statements|https://spark.apache.org/docs/latest/sql-ref-syntax-aux.html]

it will show a "file not found" error (see the image below)

!Screenshot 2022-01-14 at 08.07.24.png!

> SQL Syntax links are broken
> ---
>
> Key: SPARK-37873
> URL: https://issues.apache.org/jira/browse/SPARK-37873
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Alex Ott
>Priority: Major
> Attachments: Screenshot 2022-01-14 at 08.07.24.png
>
>
> SQL Syntax links at [https://spark.apache.org/docs/latest/sql-ref.html] are 
> broken






[jira] [Commented] (SPARK-37907) StaticInvoke should support ConstantFolding

2022-01-13 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475980#comment-17475980
 ] 

angerszhu commented on SPARK-37907:
---

Will raise a PR soon.

> StaticInvoke should support ConstantFolding
> ---
>
> Key: SPARK-37907
> URL: https://issues.apache.org/jira/browse/SPARK-37907
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> StaticInvoke does not implement foldable; we should support it.






[jira] [Created] (SPARK-37907) StaticInvoke should support ConstantFolding

2022-01-13 Thread angerszhu (Jira)
angerszhu created SPARK-37907:
-

 Summary: StaticInvoke should support ConstantFolding
 Key: SPARK-37907
 URL: https://issues.apache.org/jira/browse/SPARK-37907
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu


StaticInvoke does not implement foldable; we should support it.
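
For context, a minimal sketch of what ConstantFolding already does for foldable 
expressions (Catalyst internals; illustrative only, not the proposed patch — 
once StaticInvoke reports foldable, it becomes eligible for the same rewrite):

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Add, Alias, Literal}
import org.apache.spark.sql.catalyst.optimizer.ConstantFolding
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}

// Add(1, 2) is foldable because both children are literals, so the rule
// evaluates it once at optimization time and replaces it with Literal(3).
val expr = Add(Literal(1), Literal(2))
val plan = Project(Seq(Alias(expr, "three")()), LocalRelation(Nil))
val folded = ConstantFolding(plan) // Project [3 AS three], LocalRelation
{code}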






[jira] [Commented] (SPARK-37906) spark-sql should not pass last simple comment to backend

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475974#comment-17475974
 ] 

Apache Spark commented on SPARK-37906:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35206

> spark-sql should not pass last simple comment to backend 
> -
>
> Key: SPARK-37906
> URL: https://issues.apache.org/jira/browse/SPARK-37906
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> spark-sql should not pass the last simple comment to the backend, e.g.:
> ```
> SELECT 1; -- comment
> ```






[jira] [Assigned] (SPARK-37906) spark-sql should not pass last simple comment to backend

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37906:


Assignee: Apache Spark

> spark-sql should not pass last simple comment to backend 
> -
>
> Key: SPARK-37906
> URL: https://issues.apache.org/jira/browse/SPARK-37906
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> spark-sql should not pass the last simple comment to the backend, e.g.:
> ```
> SELECT 1; -- comment
> ```






[jira] [Assigned] (SPARK-37906) spark-sql should not pass last simple comment to backend

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37906:


Assignee: (was: Apache Spark)

> spark-sql should not pass last simple comment to backend 
> -
>
> Key: SPARK-37906
> URL: https://issues.apache.org/jira/browse/SPARK-37906
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> spark-sql should not pass the last simple comment to the backend, e.g.:
> ```
> SELECT 1; -- comment
> ```






[jira] [Commented] (SPARK-37873) SQL Syntax links are broken

2022-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475969#comment-17475969
 ] 

Hyukjin Kwon commented on SPARK-37873:
--

[~alexott] which syntax is broken?

> SQL Syntax links are broken
> ---
>
> Key: SPARK-37873
> URL: https://issues.apache.org/jira/browse/SPARK-37873
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Alex Ott
>Priority: Major
>
> SQL Syntax links at [https://spark.apache.org/docs/latest/sql-ref.html] are 
> broken






[jira] [Commented] (SPARK-37872) [SQL] Some classes are moved from org.codehaus.janino:janino to org.codehaus.janino:commons-compiler after version 3.1.x

2022-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475971#comment-17475971
 ] 

Hyukjin Kwon commented on SPARK-37872:
--

Spark 2.4 is EOL. Is it still valid for Spark 3+?

> [SQL] Some classes are moved from org.codehaus.janino:janino to 
> org.codehaus.janino:commons-compiler after version 3.1.x 
> ---
>
> Key: SPARK-37872
> URL: https://issues.apache.org/jira/browse/SPARK-37872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Jin Shen
>Priority: Major
>
> Here is the code:
>  
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L32]
>  
> ByteArrayClassLoader and InternalCompilerException were moved to 
> org.codehaus.janino:commons-compiler
>  
> [https://github.com/janino-compiler/janino/blob/3.1.6/commons-compiler/src/main/java/org/codehaus/commons/compiler/util/reflect/ByteArrayClassLoader.java]
>  
> [https://github.com/janino-compiler/janino/blob/3.1.6/commons-compiler/src/main/java/org/codehaus/commons/compiler/InternalCompilerException.java]
>  
> The last working version of janino is 3.0.16, but it is out of date.
> Can we change this and upgrade to a new version of janino and 
> commons-compiler?
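
At the import level, the move described above would look roughly like this 
(old locations as imported by Spark's CodeGenerator.scala on janino 3.0.x; new 
locations taken from the 3.1.6 paths linked above):

{code:scala}
// janino 3.0.x (classes live in the janino artifact):
//   import org.codehaus.janino.ByteArrayClassLoader
//   import org.codehaus.janino.InternalCompilerException
// janino 3.1.x (moved to the commons-compiler artifact):
//   import org.codehaus.commons.compiler.util.reflect.ByteArrayClassLoader
//   import org.codehaus.commons.compiler.InternalCompilerException
{code}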






[jira] [Resolved] (SPARK-37874) Link to Pandas UDF documentation is broken

2022-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37874.
--
Resolution: Fixed

> Link to Pandas UDF documentation is broken
> --
>
> Key: SPARK-37874
> URL: https://issues.apache.org/jira/browse/SPARK-37874
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Alex Ott
>Priority: Major
>
> Link at 
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
>  is broken






[jira] [Commented] (SPARK-37874) Link to Pandas UDF documentation is broken

2022-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475968#comment-17475968
 ] 

Hyukjin Kwon commented on SPARK-37874:
--

Fixed in https://github.com/apache/spark/pull/34475

> Link to Pandas UDF documentation is broken
> --
>
> Key: SPARK-37874
> URL: https://issues.apache.org/jira/browse/SPARK-37874
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Alex Ott
>Priority: Major
>
> Link at 
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
>  is broken






[jira] [Updated] (SPARK-37874) Link to Pandas UDF documentation is broken

2022-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37874:
-
Fix Version/s: 3.2.1
   3.3.0

> Link to Pandas UDF documentation is broken
> --
>
> Key: SPARK-37874
> URL: https://issues.apache.org/jira/browse/SPARK-37874
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Alex Ott
>Priority: Major
> Fix For: 3.2.1, 3.3.0
>
>
> Link at 
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
>  is broken






[jira] [Commented] (SPARK-37882) pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values

2022-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475967#comment-17475967
 ] 

Hyukjin Kwon commented on SPARK-37882:
--

[~mattvan83] mind providing a self-contained reproducer?

> pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values
> -
>
> Key: SPARK-37882
> URL: https://issues.apache.org/jira/browse/SPARK-37882
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: Ubuntu 18.04
>Reporter: Matthieu Vanhoutte
>Priority: Major
>
> Hello,
> When trying to convert a pandas dataframe 
> {code:java}
> ss_corpus_dataframe{code}
>  (containing one column with a two-dimensional numpy array) into a 
> pandas-on-spark dataframe with the following code:
> {code:java}
> df = ps.from_pandas(ss_corpus_dataframe){code}
> I got the following error:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py",
>  line 375, in run_asgi
>     result = await app(self.scope, self.receive, self.send)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py",
>  line 75, in __call__
>     return await self.app(scope, receive, send)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py",
>  line 82, in __call__
>     raise exc from None
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py",
>  line 78, in __call__
>     await self.app(scope, inner_receive, inner_send)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/applications.py",
>  line 208, in __call__
>     await super().__call__(scope, receive, send)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/applications.py",
>  line 112, in __call__
>     await self.middleware_stack(scope, receive, send)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py",
>  line 181, in __call__
>     raise exc
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py",
>  line 159, in __call__
>     await self.app(scope, receive, _send)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py",
>  line 82, in __call__
>     raise exc
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py",
>  line 71, in __call__
>     await self.app(scope, receive, sender)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py",
>  line 656, in __call__
>     await route.handle(scope, receive, send)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py",
>  line 259, in handle
>     await self.app(scope, receive, send)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py",
>  line 61, in app
>     response = await func(request)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py",
>  line 226, in app
>     raw_response = await run_endpoint_function(
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py",
>  line 159, in run_endpoint_function
>     return await dependant.call(**values)
>   File "./app/routers/semantic_searches.py", line 60, in 
> create_semantic_search
>     date_time_sem_search, clean_query, output_dict, error_code = await 
> apply_semantic_search_async(query=query, 
> api_sent_embed_url=settings.api_sent_embed_address, 
> ss_corpus_dataframe=ss_corpus_dataframe.dataframe, id_matrices=id_matrices, 
> top_k=75, similarity_score_thresh=0.5)
>   File "./app/backend/semantic_search/sts_tf_semantic_search.py", line 134, 
> in apply_semantic_search_async
>     df = ps.from_pandas(ss_corpus_dataframe)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/namespace.py",
>  line 143, in from_pandas
>     return DataFrame(pobj)
>   File 
> "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/frame.py",
>  line 520, in __init__
>     internal = InternalFrame.from_pandas(pdf)
>   File 
> 

[jira] [Resolved] (SPARK-37883) log4j update to 2.17.1 in spark-core 3.2

2022-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37883.
--
Resolution: Won't Fix

> log4j update to 2.17.1 in spark-core 3.2
> 
>
> Key: SPARK-37883
> URL: https://issues.apache.org/jira/browse/SPARK-37883
> Project: Spark
>  Issue Type: Bug
>  Components: Security, Spark Core
>Affects Versions: 3.2.0
>Reporter: Setu Agrawal
>Priority: Major
>  Labels: 
> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.2.0
>
> We are using the spark-core jar file, as below:
> libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0"
> As per the Maven repository it uses an older log4j version, which needs to be 
> updated to the latest (2.17.1) to fix the security vulnerability. Please help 
> us: how can we get an updated version of spark-core that uses the latest log4j?
> Thanks
>  
>  
>  






[jira] [Commented] (SPARK-37883) log4j update to 2.17.1 in spark-core 3.2

2022-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475966#comment-17475966
 ] 

Hyukjin Kwon commented on SPARK-37883:
--

You should upgrade to the latest Spark 3.3 when it's released.

> log4j update to 2.17.1 in spark-core 3.2
> 
>
> Key: SPARK-37883
> URL: https://issues.apache.org/jira/browse/SPARK-37883
> Project: Spark
>  Issue Type: Bug
>  Components: Security, Spark Core
>Affects Versions: 3.2.0
>Reporter: Setu Agrawal
>Priority: Major
>  Labels: 
> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.2.0
>
> We are using the spark-core jar file, as below:
> libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0"
> As per the Maven repository it uses an older log4j version, which needs to be 
> updated to the latest (2.17.1) to fix the security vulnerability. Please help 
> us: how can we get an updated version of spark-core that uses the latest log4j?
> Thanks
>  
>  
>  






[jira] [Commented] (SPARK-37883) log4j update to 2.17.1 in spark-core 3.2

2022-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475965#comment-17475965
 ] 

Hyukjin Kwon commented on SPARK-37883:
--

We upgraded log4j in the latest master. Spark 3.2 uses log4j 1, which by 
default is virtually unaffected by the security issue.

> log4j update to 2.17.1 in spark-core 3.2
> 
>
> Key: SPARK-37883
> URL: https://issues.apache.org/jira/browse/SPARK-37883
> Project: Spark
>  Issue Type: Bug
>  Components: Security, Spark Core
>Affects Versions: 3.2.0
>Reporter: Setu Agrawal
>Priority: Major
>  Labels: 
> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.2.0
>
> We are using the spark-core jar file, as below:
> libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0"
> As per the Maven repository it uses an older log4j version, which needs to be 
> updated to the latest (2.17.1) to fix the security vulnerability. Please help 
> us: how can we get an updated version of spark-core that uses the latest log4j?
> Thanks
>  
>  
>  






[jira] [Updated] (SPARK-37883) log4j update to 2.17.1 in spark-core 3.2

2022-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37883:
-
Priority: Major  (was: Blocker)

> log4j update to 2.17.1 in spark-core 3.2
> 
>
> Key: SPARK-37883
> URL: https://issues.apache.org/jira/browse/SPARK-37883
> Project: Spark
>  Issue Type: Bug
>  Components: Security, Spark Core
>Affects Versions: 3.2.0
>Reporter: Setu Agrawal
>Priority: Major
>  Labels: 
> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.2.0
>
> We are using the spark-core jar file, as below:
> libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0"
> As per the Maven repository it uses an older log4j version, which needs to be 
> updated to the latest (2.17.1) to fix the security vulnerability. Please help 
> us: how can we get an updated version of spark-core that uses the latest log4j?
> Thanks
>  
>  
>  






[jira] [Created] (SPARK-37906) spark-sql should not pass last simple comment to backend

2022-01-13 Thread angerszhu (Jira)
angerszhu created SPARK-37906:
-

 Summary: spark-sql should not pass last simple comment to backend 
 Key: SPARK-37906
 URL: https://issues.apache.org/jira/browse/SPARK-37906
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu


spark-sql should not pass the last simple comment to the backend, e.g.:

```
SELECT 1; -- comment
```
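
A minimal sketch of the intended behavior (the helper below is hypothetical, 
not the actual CLI code): a fragment left over after the last semicolon that is 
only a line comment should not be sent to the backend as a statement.

{code:scala}
// Hypothetical helper: treat a trailing fragment that is empty or a bare
// "--" line comment as nothing to execute.
def statementToRun(trailingFragment: String): Option[String] = {
  val t = trailingFragment.trim
  if (t.isEmpty || t.startsWith("--")) None else Some(t)
}

assert(statementToRun(" -- comment").isEmpty)           // dropped, not sent
assert(statementToRun("SELECT 1").contains("SELECT 1")) // still executed
{code}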






[jira] [Assigned] (SPARK-37905) Make `merge_spark_pr.py` set primary author from the first commit in case of ties

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-37905:
-

Assignee: Dongjoon Hyun

> Make `merge_spark_pr.py` set primary author from the first commit in case of 
> ties
> -
>
> Key: SPARK-37905
> URL: https://issues.apache.org/jira/browse/SPARK-37905
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-37905) Make `merge_spark_pr.py` set primary author from the first commit in case of ties

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37905.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35205
[https://github.com/apache/spark/pull/35205]

> Make `merge_spark_pr.py` set primary author from the first commit in case of 
> ties
> -
>
> Key: SPARK-37905
> URL: https://issues.apache.org/jira/browse/SPARK-37905
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Assigned] (SPARK-37905) Make `merge_spark_pr.py` set primary author from the first commit in case of ties

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37905:


Assignee: (was: Apache Spark)

> Make `merge_spark_pr.py` set primary author from the first commit in case of 
> ties
> -
>
> Key: SPARK-37905
> URL: https://issues.apache.org/jira/browse/SPARK-37905
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-37905) Make `merge_spark_pr.py` set primary author from the first commit in case of ties

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37905:


Assignee: Apache Spark

> Make `merge_spark_pr.py` set primary author from the first commit in case of 
> ties
> -
>
> Key: SPARK-37905
> URL: https://issues.apache.org/jira/browse/SPARK-37905
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-37905) Make `merge_spark_pr.py` set primary author from the first commit in case of ties

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475923#comment-17475923
 ] 

Apache Spark commented on SPARK-37905:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35205

> Make `merge_spark_pr.py` set primary author from the first commit in case of 
> ties
> -
>
> Key: SPARK-37905
> URL: https://issues.apache.org/jira/browse/SPARK-37905
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-37905) Make `merge_spark_pr.py` set primary author from the first commit in case of ties

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37905:
--
Summary: Make `merge_spark_pr.py` set primary author from the first commit 
in case of ties  (was: Fix `merge_spark_pr.py` to consider the first commit 
author as the primary author in case of ties)

> Make `merge_spark_pr.py` set primary author from the first commit in case of 
> ties
> -
>
> Key: SPARK-37905
> URL: https://issues.apache.org/jira/browse/SPARK-37905
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-37905) Fix `merge_spark_pr.py` to consider the first commit author as the primary author in case of ties

2022-01-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-37905:
-

 Summary: Fix `merge_spark_pr.py` to consider the first commit 
author as the primary author in case of ties
 Key: SPARK-37905
 URL: https://issues.apache.org/jira/browse/SPARK-37905
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun
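
The issue has no description, but the title implies a tie-breaking rule that 
can be sketched as follows (illustrative Scala; the actual script is Python and 
the helper name is hypothetical): among authors with the maximal commit count, 
pick the one whose commit appears first.

{code:scala}
// commitAuthors is in commit order; ties on count break toward the first commit.
def primaryAuthor(commitAuthors: Seq[String]): Option[String] = {
  if (commitAuthors.isEmpty) None
  else {
    val counts = commitAuthors.groupBy(identity).map { case (a, cs) => a -> cs.size }
    val max = counts.values.max
    commitAuthors.find(a => counts(a) == max)
  }
}

assert(primaryAuthor(Seq("alice", "bob", "alice", "bob")).contains("alice"))
{code}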









[jira] [Assigned] (SPARK-37880) Upgrade Scala to 2.13.8

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-37880:
-

Assignee: Yang Jie

> Upgrade Scala to 2.13.8
> ---
>
> Key: SPARK-37880
> URL: https://issues.apache.org/jira/browse/SPARK-37880
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> Scala 2.13.8 is already tagged:
> [https://github.com/scala/scala/releases/tag/v2.13.8]
>  
> https://contributors.scala-lang.org/t/scala-2-13-8-release-planning/5487






[jira] [Resolved] (SPARK-37880) Upgrade Scala to 2.13.8

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37880.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35181
[https://github.com/apache/spark/pull/35181]

> Upgrade Scala to 2.13.8
> ---
>
> Key: SPARK-37880
> URL: https://issues.apache.org/jira/browse/SPARK-37880
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.3.0
>
>
> Scala 2.13.8 is already tagged:
> [https://github.com/scala/scala/releases/tag/v2.13.8]
>  
> https://contributors.scala-lang.org/t/scala-2-13-8-release-planning/5487






[jira] [Assigned] (SPARK-37878) Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37878:


Assignee: (was: Apache Spark)

> Migrate SHOW CREATE TABLE to use v2 command by default
> --
>
> Key: SPARK-37878
> URL: https://issues.apache.org/jira/browse/SPARK-37878
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Priority: Major
> Fix For: 3.3.0
>
>
> Migrate SHOW CREATE TABLE  to use v2 command by default






[jira] [Commented] (SPARK-37878) Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475911#comment-17475911
 ] 

Apache Spark commented on SPARK-37878:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/35204

> Migrate SHOW CREATE TABLE to use v2 command by default
> --
>
> Key: SPARK-37878
> URL: https://issues.apache.org/jira/browse/SPARK-37878
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Priority: Major
> Fix For: 3.3.0
>
>
> Migrate SHOW CREATE TABLE  to use v2 command by default






[jira] [Assigned] (SPARK-37878) Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37878:


Assignee: Apache Spark

> Migrate SHOW CREATE TABLE to use v2 command by default
> --
>
> Key: SPARK-37878
> URL: https://issues.apache.org/jira/browse/SPARK-37878
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> Migrate SHOW CREATE TABLE  to use v2 command by default






[jira] [Resolved] (SPARK-37893) Fix flaky test: AdaptiveQueryExecSuite with Scala 2.13

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37893.
---
Fix Version/s: 3.3.0
 Assignee: Yang Jie
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/35190

> Fix flaky test: AdaptiveQueryExecSuite with Scala 2.13
> --
>
> Key: SPARK-37893
> URL: https://issues.apache.org/jira/browse/SPARK-37893
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.3.0
>
>
> When running `AdaptiveQueryExecSuite` under Maven with Scala 2.13, the 
> following exception occurs with a very small probability:
> {code:java}
> AdaptiveQueryExecSuite
> - Logging plan changes for AQE *** FAILED ***
>   java.util.ConcurrentModificationException: mutation occurred during 
> iteration
>   at 
> scala.collection.mutable.MutationTracker$.checkMutations(MutationTracker.scala:43)
>   at 
> scala.collection.mutable.CheckedIndexedSeqView$CheckedIterator.hasNext(CheckedIndexedSeqView.scala:47)
>   at 
> scala.collection.StrictOptimizedIterableOps.filterImpl(StrictOptimizedIterableOps.scala:225)
>   at 
> scala.collection.StrictOptimizedIterableOps.filterImpl$(StrictOptimizedIterableOps.scala:222)
>   at scala.collection.mutable.ArrayBuffer.filterImpl(ArrayBuffer.scala:43)
>   at 
> scala.collection.StrictOptimizedIterableOps.filterNot(StrictOptimizedIterableOps.scala:220)
>   at 
> scala.collection.StrictOptimizedIterableOps.filterNot$(StrictOptimizedIterableOps.scala:220)
>   at scala.collection.mutable.ArrayBuffer.filterNot(ArrayBuffer.scala:43)
>   at 
> org.apache.spark.SparkFunSuite$LogAppender.loggingEvents(SparkFunSuite.scala:288)
>   at 
> org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite.$anonfun$new$152(AdaptiveQueryExecSuite.scal{code}
>  
>  
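
The failure mode is reproducible in plain Scala 2.13 (a minimal sketch; in the 
flaky test the mutation comes from the logging thread appending events while 
the test filters them):

{code:scala}
import scala.collection.mutable.ArrayBuffer

// Scala 2.13 ArrayBuffer iterators are mutation-checked: mutating the buffer
// while an iterator over it is live makes hasNext throw
// java.util.ConcurrentModificationException, as in the stack trace above.
val events = ArrayBuffer("e1", "e2")
val it = events.iterator
events += "e3"  // mutation while the iterator is still live
// it.hasNext   // would throw ConcurrentModificationException
{code}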






[jira] [Assigned] (SPARK-37859) SQL tables created with JDBC with Spark 3.1 are not readable with 3.2

2022-01-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37859:
---

Assignee: Karen Feng

> SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
> -
>
> Key: SPARK-37859
> URL: https://issues.apache.org/jira/browse/SPARK-37859
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Assignee: Karen Feng
>Priority: Major
>
> In 
> https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L312,
>  a new metadata field is added during reading. As we do a full comparison of 
> the user-provided schema and the actual schema in 
> https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L356,
>  resolution fails if a table created with Spark 3.1 is read with Spark 3.2.
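
A minimal sketch of why the extra metadata breaks resolution (illustrative; 
the metadata key below is made up, not the one JdbcUtils actually adds): 
StructField equality includes metadata, so a strict schema comparison fails 
even when names and types match.

{code:scala}
import org.apache.spark.sql.types._

val meta = new MetadataBuilder().putLong("someJdbcKey", 1L).build() // hypothetical key
val userSchema = StructType(Seq(StructField("id", LongType)))
val actualSchema = StructType(Seq(StructField("id", LongType, nullable = true, meta)))

println(userSchema == actualSchema) // false: metadata participates in equality
{code}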






[jira] [Resolved] (SPARK-37859) SQL tables created with JDBC with Spark 3.1 are not readable with 3.2

2022-01-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37859.
-
Fix Version/s: 3.3.0
   3.2.1
   Resolution: Fixed

Issue resolved by pull request 35158
[https://github.com/apache/spark/pull/35158]

> SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
> -
>
> Key: SPARK-37859
> URL: https://issues.apache.org/jira/browse/SPARK-37859
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Assignee: Karen Feng
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
>
> In 
> https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L312,
>  a new metadata field is added during reading. As we do a full comparison of 
> the user-provided schema and the actual schema in 
> https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L356,
>  resolution fails if a table created with Spark 3.1 is read with Spark 3.2.






[jira] [Created] (SPARK-37904) Improve RebalancePartitions in rules of Optimizer

2022-01-13 Thread XiDuo You (Jira)
XiDuo You created SPARK-37904:
-

 Summary: Improve RebalancePartitions in rules of Optimizer
 Key: SPARK-37904
 URL: https://issues.apache.org/jira/browse/SPARK-37904
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: XiDuo You


After SPARK-37267, we support optimizing rebalance partitions anywhere in the 
plan rather than only at the root node. So it should make sense to also let 
`RebalancePartitions` work in all rules of the Optimizer, as `Repartition` and 
`RepartitionByExpression` already do.
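
A rough sketch of the kind of rule extension this implies (illustrative only, 
not the actual patch): rules that already match `Repartition` and 
`RepartitionByExpression` would additionally match `RebalancePartitions`, e.g. 
collapsing a rebalance that is immediately shadowed by a repartition.

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, RebalancePartitions, Repartition}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical simplified rule: an inner rebalance is redundant when a
// repartition directly follows it, so keep only the outer shuffle.
object CollapseShadowedRebalance extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
    case r @ Repartition(_, _, rebalance: RebalancePartitions) =>
      r.copy(child = rebalance.child)
  }
}
{code}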







[jira] [Commented] (SPARK-37886) Use ComparisonTestBase to reduce redundant test code

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475902#comment-17475902
 ] 

Apache Spark commented on SPARK-37886:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35203

> Use ComparisonTestBase to reduce redundant test code
> 
>
> Key: SPARK-37886
> URL: https://issues.apache.org/jira/browse/SPARK-37886
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>
> We have many test cases that use the same logic to convert a pdf to a psdf; 
> we can use ComparisonTestBase as the parent class to reduce redundancy.






[jira] [Commented] (SPARK-27442) ParquetFileFormat fails to read column named with invalid characters

2022-01-13 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475894#comment-17475894
 ] 

Wenchen Fan commented on SPARK-27442:
-

I think we should fix this. It's OK for Spark to forbid special chars in 
column names, but when we read existing parquet files, there is no point in 
forbidding them on the Spark side.

[~angerszhuuu] can you take a look? Thanks!

> ParquetFileFormat fails to read column named with invalid characters
> 
>
> Key: SPARK-27442
> URL: https://issues.apache.org/jira/browse/SPARK-27442
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.0.0, 2.4.1
>Reporter: Jan Vršovský
>Priority: Minor
>
> When reading a parquet file that contains column names with characters 
> considered invalid, the reader fails with an exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among " 
> ,;{}()\n\t=". Please use alias to rename it.
> Spark should not be able to write such files, but it should be able to read 
> them (and allow the user to correct them). However, possible workarounds 
> (such as using an alias to rename the column, or forcing another schema) do 
> not work, since the check is done on the input.
> (Possible fix: remove the superfluous 
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} from 
> {{buildReaderWithPartitionValues}}?)
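
A sketch of the failure mode (the offending file must come from another 
writer, since Spark refuses to produce such names; the path and the column 
name "a b" are placeholders):

{code:scala}
// Reading triggers the attribute-name check even though no write happens:
// spark.read.parquet("/data/written_by_other_engine.parquet")
//   => org.apache.spark.sql.AnalysisException: Attribute name "a b" contains
//      invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.
// Renaming cannot help, because the check fires before any projection runs:
// spark.read.parquet("/data/...").withColumnRenamed("a b", "a_b") // same error
{code}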






[jira] [Commented] (SPARK-37479) Migrate DROP NAMESPACE to use V2 command by default

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475889#comment-17475889
 ] 

Apache Spark commented on SPARK-37479:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/35202

> Migrate DROP NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37479
> URL: https://issues.apache.org/jira/browse/SPARK-37479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Assigned] (SPARK-37479) Migrate DROP NAMESPACE to use V2 command by default

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37479:


Assignee: Apache Spark

> Migrate DROP NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37479
> URL: https://issues.apache.org/jira/browse/SPARK-37479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-37479) Migrate DROP NAMESPACE to use V2 command by default

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37479:


Assignee: (was: Apache Spark)

> Migrate DROP NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37479
> URL: https://issues.apache.org/jira/browse/SPARK-37479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Commented] (SPARK-37479) Migrate DROP NAMESPACE to use V2 command by default

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475888#comment-17475888
 ] 

Apache Spark commented on SPARK-37479:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/35202

> Migrate DROP NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37479
> URL: https://issues.apache.org/jira/browse/SPARK-37479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Commented] (SPARK-36885) Inline type hints for python/pyspark/sql/dataframe.py

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475883#comment-17475883
 ] 

Apache Spark commented on SPARK-36885:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35201

> Inline type hints for python/pyspark/sql/dataframe.py
> -
>
> Key: SPARK-36885
> URL: https://issues.apache.org/jira/browse/SPARK-36885
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> Inline type hints into python/pyspark/sql/dataframe.py from the stub file 
> python/pyspark/sql/dataframe.pyi.






[jira] [Assigned] (SPARK-37903) Replace string_typehints with get_type_hints.

2022-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37903:


Assignee: Takuya Ueshin

> Replace string_typehints with get_type_hints.
> -
>
> Key: SPARK-37903
> URL: https://issues.apache.org/jira/browse/SPARK-37903
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>
> Currently we have a hacky way to resolve type hints written as strings, but 
> we can use {{get_type_hints}} instead.






[jira] [Resolved] (SPARK-37903) Replace string_typehints with get_type_hints.

2022-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37903.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35200
[https://github.com/apache/spark/pull/35200]

> Replace string_typehints with get_type_hints.
> -
>
> Key: SPARK-37903
> URL: https://issues.apache.org/jira/browse/SPARK-37903
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently we have a hacky way to resolve type hints written as strings, but 
> we can use {{get_type_hints}} instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37154) Inline type hints for python/pyspark/rdd.py

2022-01-13 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460243#comment-17460243
 ] 

Byron Hsu edited comment on SPARK-37154 at 1/14/22, 12:41 AM:
--

I am looking into this one


was (Author: byronhsu):
I am looking into this one

> Inline type hints for python/pyspark/rdd.py
> ---
>
> Key: SPARK-37154
> URL: https://issues.apache.org/jira/browse/SPARK-37154
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-37154) Inline type hints for python/pyspark/rdd.py

2022-01-13 Thread Byron Hsu (Jira)


[ https://issues.apache.org/jira/browse/SPARK-37154 ]


Byron Hsu deleted comment on SPARK-37154:
---

was (Author: byronhsu):
I am looking into this one

> Inline type hints for python/pyspark/rdd.py
> ---
>
> Key: SPARK-37154
> URL: https://issues.apache.org/jira/browse/SPARK-37154
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37903) Replace string_typehints with get_type_hints.

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475858#comment-17475858
 ] 

Apache Spark commented on SPARK-37903:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/35200

> Replace string_typehints with get_type_hints.
> -
>
> Key: SPARK-37903
> URL: https://issues.apache.org/jira/browse/SPARK-37903
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently we have a hacky way to resolve type hints written as strings, but 
> we can use {{get_type_hints}} instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37903) Replace string_typehints with get_type_hints.

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475857#comment-17475857
 ] 

Apache Spark commented on SPARK-37903:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/35200

> Replace string_typehints with get_type_hints.
> -
>
> Key: SPARK-37903
> URL: https://issues.apache.org/jira/browse/SPARK-37903
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently we have a hacky way to resolve type hints written as strings, but 
> we can use {{get_type_hints}} instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37903) Replace string_typehints with get_type_hints.

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37903:


Assignee: Apache Spark

> Replace string_typehints with get_type_hints.
> -
>
> Key: SPARK-37903
> URL: https://issues.apache.org/jira/browse/SPARK-37903
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Currently we have a hacky way to resolve type hints written as strings, but 
> we can use {{get_type_hints}} instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37903) Replace string_typehints with get_type_hints.

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37903:


Assignee: (was: Apache Spark)

> Replace string_typehints with get_type_hints.
> -
>
> Key: SPARK-37903
> URL: https://issues.apache.org/jira/browse/SPARK-37903
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently we have a hacky way to resolve type hints written as strings, but 
> we can use {{get_type_hints}} instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37879) Show test report in GitHub Actions builds from PRs

2022-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37879.
--
Resolution: Fixed

Issue resolved by pull request 35193
[https://github.com/apache/spark/pull/35193]

> Show test report in GitHub Actions builds from PRs
> --
>
> Key: SPARK-37879
> URL: https://issues.apache.org/jira/browse/SPARK-37879
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, a test report such as 
> https://github.com/apache/spark/runs/4788468586 is not directly shown in 
> the workflow run linked from PRs, e.g. 
> https://github.com/yaooqinn/spark/actions/runs/1687326379
> We should make the test report visible.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37879) Show test report in GitHub Actions builds from PRs

2022-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37879:


Assignee: Hyukjin Kwon

> Show test report in GitHub Actions builds from PRs
> --
>
> Key: SPARK-37879
> URL: https://issues.apache.org/jira/browse/SPARK-37879
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, a test report such as 
> https://github.com/apache/spark/runs/4788468586 is not directly shown in 
> the workflow run linked from PRs, e.g. 
> https://github.com/yaooqinn/spark/actions/runs/1687326379
> We should make the test report visible.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37903) Replace string_typehints with get_type_hints.

2022-01-13 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37903:
-

 Summary: Replace string_typehints with get_type_hints.
 Key: SPARK-37903
 URL: https://issues.apache.org/jira/browse/SPARK-37903
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


Currently we have a hacky way to resolve type hints written as strings, but we 
can use {{get_type_hints}} instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37887) PySpark shell sets log level to INFO by default

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37887.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35198
[https://github.com/apache/spark/pull/35198]

> PySpark shell sets log level to INFO by default 
> 
>
> Key: SPARK-37887
> URL: https://issues.apache.org/jira/browse/SPARK-37887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Spark Shell
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> {code}
> ./bin/pyspark
> {code}
> {code}
> Python 3.9.5 (default, May 18 2021, 12:31:01)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> 22/01/13 10:28:15 INFO HiveConf: Found configuration file null
> 22/01/13 10:28:15 INFO SparkContext: Running Spark version 3.3.0-SNAPSHOT
> ...
> >>> spark.range(10)
> 22/01/13 10:31:48 INFO SharedState: Setting hive.metastore.warehouse.dir 
> ('null') to the value of spark.sql.warehouse.dir.
> ...
> {code}
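
Until the fix lands, a user-side workaround (assuming an interactive pyspark
shell where {{spark}} is already defined) is to lower the log level explicitly:

{code:python}
# setLogLevel overrides the default level for the running SparkContext,
# silencing the INFO chatter shown above.
spark.sparkContext.setLogLevel("WARN")
spark.range(10)  # no INFO output from SharedState etc. anymore
{code}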



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37900) Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37900.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35195
[https://github.com/apache/spark/pull/35195]

> Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager
> 
>
> Key: SPARK-37900
> URL: https://issues.apache.org/jira/browse/SPARK-37900
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37900) Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-37900:
-

Assignee: Dongjoon Hyun

> Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager
> 
>
> Key: SPARK-37900
> URL: https://issues.apache.org/jira/browse/SPARK-37900
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37902) Update annotations to resolve issues detected with mypy==0.931

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37902:


Assignee: (was: Apache Spark)

> Update annotations to resolve issues detected with mypy==0.931
> --
>
> Key: SPARK-37902
> URL: https://issues.apache.org/jira/browse/SPARK-37902
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> The following new issues are detected when type checking with {{mypy==0.931}}:
> {code}
> python/pyspark/pandas/base.py:879: error: "Sequence[Any]" has no attribute 
> "tolist"  [attr-defined]
> python/pyspark/sql/tests/test_pandas_udf_typehints_with_future_annotations.py:268:
>  error: Incompatible return value type (got "floating[Any]", expected 
> "float")  [return-value]
> python/pyspark/sql/tests/test_pandas_udf_typehints.py:265: error: 
> Incompatible return value type (got "floating[Any]", expected "float")  
> [return-value]
> {code}
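
For illustration, the two {{return-value}} errors are the usual
numpy-scalar-versus-float mismatch; a minimal sketch of the kind of change
that resolves them (not the actual patch):

{code:python}
import numpy as np

def mean_before(values: np.ndarray) -> float:
    # mypy 0.931 infers "floating[Any]" here, which no longer satisfies
    # the declared "float" return type:
    return values.mean()

def mean_after(values: np.ndarray) -> float:
    # an explicit conversion satisfies the annotation:
    return float(values.mean())
{code}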



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37902) Update annotations to resolve issues detected with mypy==0.931

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37902:


Assignee: Apache Spark

> Update annotations to resolve issues detected with mypy==0.931
> --
>
> Key: SPARK-37902
> URL: https://issues.apache.org/jira/browse/SPARK-37902
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> The following new issues are detected when type checking with {{mypy==0.931}}:
> {code}
> python/pyspark/pandas/base.py:879: error: "Sequence[Any]" has no attribute 
> "tolist"  [attr-defined]
> python/pyspark/sql/tests/test_pandas_udf_typehints_with_future_annotations.py:268:
>  error: Incompatible return value type (got "floating[Any]", expected 
> "float")  [return-value]
> python/pyspark/sql/tests/test_pandas_udf_typehints.py:265: error: 
> Incompatible return value type (got "floating[Any]", expected "float")  
> [return-value]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37902) Update annotations to resolve issues detected with mypy==0.931

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475767#comment-17475767
 ] 

Apache Spark commented on SPARK-37902:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/35199

> Update annotations to resolve issues detected with mypy==0.931
> --
>
> Key: SPARK-37902
> URL: https://issues.apache.org/jira/browse/SPARK-37902
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> The following new issues are detected when type checking with {{mypy==0.931}}:
> {code}
> python/pyspark/pandas/base.py:879: error: "Sequence[Any]" has no attribute 
> "tolist"  [attr-defined]
> python/pyspark/sql/tests/test_pandas_udf_typehints_with_future_annotations.py:268:
>  error: Incompatible return value type (got "floating[Any]", expected 
> "float")  [return-value]
> python/pyspark/sql/tests/test_pandas_udf_typehints.py:265: error: 
> Incompatible return value type (got "floating[Any]", expected "float")  
> [return-value]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36404) Support nested columns in ORC vectorized reader for data source v2

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36404:
--
Labels: releasenotes  (was: )

> Support nested columns in ORC vectorized reader for data source v2
> --
>
> Key: SPARK-36404
> URL: https://issues.apache.org/jira/browse/SPARK-36404
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
>  Labels: releasenotes
> Fix For: 3.3.0
>
>
> We added support for nested columns in the ORC vectorized reader for data 
> source v1. Data sources v1 and v2 share the same underlying vectorized reader 
> implementation (OrcColumnVector), so we can support data source v2 as well.
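
For illustration, a minimal PySpark sketch of exercising the v2 path with a
nested column; both config names below are assumptions based on the related
v1 work, not taken from this ticket:

{code:python}
from pyspark.sql import Row, SparkSession

spark = (SparkSession.builder
         # an empty v1 list routes ORC through data source v2 (assumed config)
         .config("spark.sql.sources.useV1SourceList", "")
         # enable the nested-column vectorized reader (assumed config)
         .config("spark.sql.orc.enableNestedColumnVectorizedReader", "true")
         .getOrCreate())

# Write a struct (nested) column and read one of its fields back.
df = spark.createDataFrame([Row(id=1, s=Row(a=1, b="x"))])
df.write.mode("overwrite").orc("/tmp/orc-nested")
spark.read.orc("/tmp/orc-nested").select("s.a", "s.b").show()
{code}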



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36649) Support Trigger.AvailableNow on Kafka data source

2022-01-13 Thread Boyang Jerry Peng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475761#comment-17475761
 ] 

Boyang Jerry Peng commented on SPARK-36649:
---

I'm working on it.

> Support Trigger.AvailableNow on Kafka data source
> -
>
> Key: SPARK-36649
> URL: https://issues.apache.org/jira/browse/SPARK-36649
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> SPARK-36533 introduces a new trigger, Trigger.AvailableNow, but only wires 
> the new functionality into the file stream source. Given that the Kafka data 
> source is one of the major data sources used in streaming queries, we should 
> make the Kafka data source support it as well.
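
For illustration, once the Kafka source supports it, a query could use the
trigger as in this PySpark sketch ({{spark}}, the topic, and the servers are
placeholders):

{code:python}
# Trigger.AvailableNow processes everything available at query start, then
# stops (PySpark spelling: trigger(availableNow=True)).
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "host1:9092")  # placeholder
          .option("subscribe", "events")                    # placeholder topic
          .load())

query = (stream.writeStream
         .format("parquet")
         .option("path", "/tmp/kafka-sink")
         .option("checkpointLocation", "/tmp/kafka-ckpt")
         .trigger(availableNow=True)
         .start())
query.awaitTermination()
{code}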



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37902) Update annotations to resolve issues detected with mypy==0.931

2022-01-13 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-37902:
--

 Summary: Update annotations to resolve issues detected with 
mypy==0.931
 Key: SPARK-37902
 URL: https://issues.apache.org/jira/browse/SPARK-37902
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Maciej Szymkiewicz


The following new issues are detected when type checking with {{mypy==0.931}}:


{code}
python/pyspark/pandas/base.py:879: error: "Sequence[Any]" has no attribute 
"tolist"  [attr-defined]
python/pyspark/sql/tests/test_pandas_udf_typehints_with_future_annotations.py:268:
 error: Incompatible return value type (got "floating[Any]", expected "float")  
[return-value]
python/pyspark/sql/tests/test_pandas_udf_typehints.py:265: error: Incompatible 
return value type (got "floating[Any]", expected "float")  [return-value]

{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37887) PySpark shell sets log level to INFO by default

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37887:


Assignee: Apache Spark  (was: L. C. Hsieh)

> PySpark shell sets log level to INFO by default 
> 
>
> Key: SPARK-37887
> URL: https://issues.apache.org/jira/browse/SPARK-37887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Spark Shell
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> ./bin/pyspark
> {code}
> {code}
> Python 3.9.5 (default, May 18 2021, 12:31:01)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> 22/01/13 10:28:15 INFO HiveConf: Found configuration file null
> 22/01/13 10:28:15 INFO SparkContext: Running Spark version 3.3.0-SNAPSHOT
> ...
> >>> spark.range(10)
> 22/01/13 10:31:48 INFO SharedState: Setting hive.metastore.warehouse.dir 
> ('null') to the value of spark.sql.warehouse.dir.
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37887) PySpark shell sets log level to INFO by default

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37887:


Assignee: L. C. Hsieh  (was: Apache Spark)

> PySpark shell sets log level to INFO by default 
> 
>
> Key: SPARK-37887
> URL: https://issues.apache.org/jira/browse/SPARK-37887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Spark Shell
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: L. C. Hsieh
>Priority: Major
>
> {code}
> ./bin/pyspark
> {code}
> {code}
> Python 3.9.5 (default, May 18 2021, 12:31:01)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> 22/01/13 10:28:15 INFO HiveConf: Found configuration file null
> 22/01/13 10:28:15 INFO SparkContext: Running Spark version 3.3.0-SNAPSHOT
> ...
> >>> spark.range(10)
> 22/01/13 10:31:48 INFO SharedState: Setting hive.metastore.warehouse.dir 
> ('null') to the value of spark.sql.warehouse.dir.
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37887) PySpark shell sets log level to INFO by default

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475740#comment-17475740
 ] 

Apache Spark commented on SPARK-37887:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/35198

> PySpark shell sets log level to INFO by default 
> 
>
> Key: SPARK-37887
> URL: https://issues.apache.org/jira/browse/SPARK-37887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Spark Shell
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: L. C. Hsieh
>Priority: Major
>
> {code}
> ./bin/pyspark
> {code}
> {code}
> Python 3.9.5 (default, May 18 2021, 12:31:01)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> 22/01/13 10:28:15 INFO HiveConf: Found configuration file null
> 22/01/13 10:28:15 INFO SparkContext: Running Spark version 3.3.0-SNAPSHOT
> ...
> >>> spark.range(10)
> 22/01/13 10:31:48 INFO SharedState: Setting hive.metastore.warehouse.dir 
> ('null') to the value of spark.sql.warehouse.dir.
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37900) Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager

2022-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37900:
--
Component/s: Spark Core
 (was: Kubernetes)

> Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager
> 
>
> Key: SPARK-37900
> URL: https://issues.apache.org/jira/browse/SPARK-37900
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37887) PySpark shell sets log level to INFO by default

2022-01-13 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475713#comment-17475713
 ] 

L. C. Hsieh commented on SPARK-37887:
-

I know the root cause. I will submit a PR later.

> PySpark shell sets log level to INFO by default 
> 
>
> Key: SPARK-37887
> URL: https://issues.apache.org/jira/browse/SPARK-37887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Spark Shell
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> ./bin/pyspark
> {code}
> {code}
> Python 3.9.5 (default, May 18 2021, 12:31:01)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> 22/01/13 10:28:15 INFO HiveConf: Found configuration file null
> 22/01/13 10:28:15 INFO SparkContext: Running Spark version 3.3.0-SNAPSHOT
> ...
> >>> spark.range(10)
> 22/01/13 10:31:48 INFO SharedState: Setting hive.metastore.warehouse.dir 
> ('null') to the value of spark.sql.warehouse.dir.
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37887) PySpark shell sets log level to INFO by default

2022-01-13 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-37887:
---

Assignee: L. C. Hsieh

> PySpark shell sets log level to INFO by default 
> 
>
> Key: SPARK-37887
> URL: https://issues.apache.org/jira/browse/SPARK-37887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Spark Shell
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: L. C. Hsieh
>Priority: Major
>
> {code}
> ./bin/pyspark
> {code}
> {code}
> Python 3.9.5 (default, May 18 2021, 12:31:01)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> 22/01/13 10:28:15 INFO HiveConf: Found configuration file null
> 22/01/13 10:28:15 INFO SparkContext: Running Spark version 3.3.0-SNAPSHOT
> ...
> >>> spark.range(10)
> 22/01/13 10:31:48 INFO SharedState: Setting hive.metastore.warehouse.dir 
> ('null') to the value of spark.sql.warehouse.dir.
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37901) Upgrade Netty from 4.1.72 to 4.1.73

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37901:


Assignee: (was: Apache Spark)

> Upgrade Netty from 4.1.72 to 4.1.73
> ---
>
> Key: SPARK-37901
> URL: https://issues.apache.org/jira/browse/SPARK-37901
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: David Christle
>Priority: Minor
>
> Netty has a new release that upgrades log4j to 2.17.1. Although I didn't find 
> an obvious dependence on log4j via Netty in my search of Spark's codebase, it 
> would be good to pick up this specific version. The version Spark currently 
> depends on is 4.1.72, which depends on log4j 2.15. Several CVEs have been 
> fixed in log4j between 2.15 and 2.17.1.
> Besides this dependency update, several minor bugfixes have been made in this 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37901) Upgrade Netty from 4.1.72 to 4.1.73

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475672#comment-17475672
 ] 

Apache Spark commented on SPARK-37901:
--

User 'dchristle' has created a pull request for this issue:
https://github.com/apache/spark/pull/35196

> Upgrade Netty from 4.1.72 to 4.1.73
> ---
>
> Key: SPARK-37901
> URL: https://issues.apache.org/jira/browse/SPARK-37901
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: David Christle
>Priority: Minor
>
> Netty has a new release that upgrades log4j to 2.17.1. Although I didn't find 
> an obvious dependence on log4j via Netty in my search of Spark's codebase, it 
> would be good to pick up this specific version. The version Spark currently 
> depends on is 4.1.72, which depends on log4j 2.15. Several CVEs have been 
> fixed in log4j between 2.15 and 2.17.1.
> Besides this dependency update, several minor bugfixes have been made in this 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37901) Upgrade Netty from 4.1.72 to 4.1.73

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37901:


Assignee: Apache Spark

> Upgrade Netty from 4.1.72 to 4.1.73
> ---
>
> Key: SPARK-37901
> URL: https://issues.apache.org/jira/browse/SPARK-37901
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: David Christle
>Assignee: Apache Spark
>Priority: Minor
>
> Netty has a new release that upgrades log4j to 2.17.1. Although I didn't find 
> an obvious dependence on log4j via Netty in my search of Spark's codebase, it 
> would be good to pick up this specific version. The version Spark currently 
> depends on is 4.1.72, which depends on log4j 2.15. Several CVEs have been 
> fixed in log4j between 2.15 and 2.17.1.
> Besides this dependency update, several minor bugfixes have been made in this 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37901) Upgrade Netty from 4.1.72 to 4.1.73

2022-01-13 Thread David Christle (Jira)
David Christle created SPARK-37901:
--

 Summary: Upgrade Netty from 4.1.72 to 4.1.73
 Key: SPARK-37901
 URL: https://issues.apache.org/jira/browse/SPARK-37901
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.3.0
Reporter: David Christle


Netty has a new release that upgrades log4j to 2.17.1. Although I didn't find 
an obvious dependence on log4j via Netty in my search of Spark's codebase, it 
would be good to pick up this specific version. The version Spark currently 
depends on is 4.1.72, which depends on log4j 2.15. Several CVEs have been 
fixed in log4j between 2.15 and 2.17.1.

Besides this dependency update, several minor bugfixes have been made in this 
release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37900) Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475650#comment-17475650
 ] 

Apache Spark commented on SPARK-37900:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35195

> Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager
> 
>
> Key: SPARK-37900
> URL: https://issues.apache.org/jira/browse/SPARK-37900
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37900) Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475643#comment-17475643
 ] 

Apache Spark commented on SPARK-37900:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35195

> Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager
> 
>
> Key: SPARK-37900
> URL: https://issues.apache.org/jira/browse/SPARK-37900
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37900) Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37900:


Assignee: (was: Apache Spark)

> Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager
> 
>
> Key: SPARK-37900
> URL: https://issues.apache.org/jira/browse/SPARK-37900
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37900) Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37900:


Assignee: Apache Spark

> Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager
> 
>
> Key: SPARK-37900
> URL: https://issues.apache.org/jira/browse/SPARK-37900
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37900) Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager

2022-01-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-37900:
-

 Summary: Use SparkMasterRegex.KUBERNETES_REGEX in SecurityManager
 Key: SPARK-37900
 URL: https://issues.apache.org/jira/browse/SPARK-37900
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37864) Support Parquet v2 data page RLE encoding (for Boolean Values) for the vectorized path

2022-01-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-37864.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35163
[https://github.com/apache/spark/pull/35163]

> Support Parquet v2 data page RLE encoding (for Boolean Values) for the 
> vectorized path
> --
>
> Key: SPARK-37864
> URL: https://issues.apache.org/jira/browse/SPARK-37864
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.3.0
>
>
> Parquet v2 data pages write Boolean values using RLE encoding; when reading 
> v2 Boolean values, Spark currently throws an exception like the following:
>  
> {code:java}
> Caused by: java.lang.UnsupportedOperationException: Unsupported encoding: RLE
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.getValuesReader(VectorizedColumnReader.java:305)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.initDataReader(VectorizedColumnReader.java:277)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPageV2(VectorizedColumnReader.java:344)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.access$100(VectorizedColumnReader.java:48)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader$1.visit(VectorizedColumnReader.java:250)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader$1.visit(VectorizedColumnReader.java:237)
>  ~[classes/:?]
>     at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:192) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPage(VectorizedColumnReader.java:237)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:173)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:311)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:209)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:298)
>  ~[classes/:?]
>     ... 19 more {code}
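
For illustration, a minimal PySpark repro sketch (assuming an active {{spark}}
session); setting {{parquet.writer.version}} on the Hadoop configuration is an
assumption about how to force v2 data pages, not taken from this ticket:

{code:python}
# Force Parquet v2 data pages so Booleans are RLE-encoded (assumed knob).
spark.sparkContext._jsc.hadoopConfiguration().set("parquet.writer.version", "v2")

df = spark.range(10).selectExpr("id % 2 = 0 AS flag")  # a Boolean column
df.write.mode("overwrite").parquet("/tmp/parquet-v2-bool")

# Before this fix, the vectorized read below failed with
# java.lang.UnsupportedOperationException: Unsupported encoding: RLE
spark.read.parquet("/tmp/parquet-v2-bool").show()
{code}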



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37864) Support Parquet v2 data page RLE encoding (for Boolean Values) for the vectorized path

2022-01-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-37864:


Assignee: Yang Jie

> Support Parquet v2 data page RLE encoding (for Boolean Values) for the 
> vectorized path
> --
>
> Key: SPARK-37864
> URL: https://issues.apache.org/jira/browse/SPARK-37864
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> Parquet v2 data pages write Boolean values using RLE encoding; when reading 
> v2 Boolean values, Spark currently throws an exception like the following:
>  
> {code:java}
> Caused by: java.lang.UnsupportedOperationException: Unsupported encoding: RLE
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.getValuesReader(VectorizedColumnReader.java:305)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.initDataReader(VectorizedColumnReader.java:277)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPageV2(VectorizedColumnReader.java:344)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.access$100(VectorizedColumnReader.java:48)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader$1.visit(VectorizedColumnReader.java:250)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader$1.visit(VectorizedColumnReader.java:237)
>  ~[classes/:?]
>     at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:192) 
> ~[parquet-column-1.12.2.jar:1.12.2]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPage(VectorizedColumnReader.java:237)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:173)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:311)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:209)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
>  ~[classes/:?]
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:298)
>  ~[classes/:?]
>     ... 19 more {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37479) Migrate DROP NAMESPACE to use V2 command by default

2022-01-13 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475542#comment-17475542
 ] 

Terry Kim commented on SPARK-37479:
---

OK, thanks!

> Migrate DROP NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37479
> URL: https://issues.apache.org/jira/browse/SPARK-37479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37899) EliminateInnerJoin to support convert inner join to left semi join

2022-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475525#comment-17475525
 ] 

Apache Spark commented on SPARK-37899:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/35194

> EliminateInnerJoin to support convert inner join to left semi join
> --
>
> Key: SPARK-37899
> URL: https://issues.apache.org/jira/browse/SPARK-37899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37899) EliminateInnerJoin to support convert inner join to left semi join

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37899:


Assignee: (was: Apache Spark)

> EliminateInnerJoin to support convert inner join to left semi join
> --
>
> Key: SPARK-37899
> URL: https://issues.apache.org/jira/browse/SPARK-37899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37899) EliminateInnerJoin to support convert inner join to left semi join

2022-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37899:


Assignee: Apache Spark

> EliminateInnerJoin to support convert inner join to left semi join
> --
>
> Key: SPARK-37899
> URL: https://issues.apache.org/jira/browse/SPARK-37899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37899) EliminateInnerJoin to support convert inner join to left semi join

2022-01-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-37899:

Summary: EliminateInnerJoin to support convert inner join to left semi join 
 (was: EliminateInnerJoin)

> EliminateInnerJoin to support convert inner join to left semi join
> --
>
> Key: SPARK-37899
> URL: https://issues.apache.org/jira/browse/SPARK-37899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37899) EliminateInnerJoin

2022-01-13 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-37899:
---

 Summary: EliminateInnerJoin
 Key: SPARK-37899
 URL: https://issues.apache.org/jira/browse/SPARK-37899
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37895) Error while joining two tables with non-english field names

2022-01-13 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475361#comment-17475361
 ] 

Wenchen Fan commented on SPARK-37895:
-

[~beliefer]  can you help to fix it? also cc [~huaxingao] 

> Error while joining two tables with non-english field names
> ---
>
> Key: SPARK-37895
> URL: https://issues.apache.org/jira/browse/SPARK-37895
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Marina Krasilnikova
>Priority: Minor
>
> While trying to join two tables with non-English field names in PostgreSQL 
> with a query like
> "select view1.`Имя1` , view1.`Имя2`, view2.`Имя3` from view1 left join view2 
> on view1.`Имя2`=view2.`Имя4`"
> we get an error saying that there is no field "`Имя4`" (the field name is 
> surrounded by backticks).
> It appears that, to fetch the data from the second table, Spark constructs a 
> query like
> SELECT "Имя3","Имя4" FROM "public"."tab2" WHERE ("`Имя4`" IS NOT NULL)
> and these backticks are redundant in the WHERE clause.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37895) Error while joining two tables with non-english field names

2022-01-13 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475360#comment-17475360
 ] 

Wenchen Fan commented on SPARK-37895:
-

This bug is only in JDBC v2. In the v2 code path, we always enable nested 
columns in filter pushdown, and the column name in the predicate follows SQL 
style, so it may contain quotes.

In the long term, this problem can be fixed by using v2 filters, which have 
native support for nested columns, so we don't need to encode a nested 
column into a single string and introduce quotes. For now, I think we should 
fix the v1 filter pushdown code path in JDBC v2, which is 
`JDBCScanBuilder.pushFilters`.
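
For illustration only, the intended normalization sketched in Python (the real
fix would live in the Scala pushdown path; {{quote_identifier}} and
{{normalize_pushed_column}} are hypothetical helpers, not Spark APIs):

{code:python}
def quote_identifier(name: str) -> str:
    # Mimics a PostgreSQL-style dialect: wrap the bare name in double quotes.
    return '"' + name.replace('"', '""') + '"'

def normalize_pushed_column(name: str) -> str:
    # A v1 filter may carry a SQL-style backtick-quoted name such as `Имя4`;
    # strip Spark's backticks before applying the JDBC dialect's own quoting.
    if name.startswith("`") and name.endswith("`"):
        name = name[1:-1].replace("``", "`")
    return name

print(quote_identifier(normalize_pushed_column("`Имя4`")))  # -> "Имя4"
{code}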

> Error while joining two tables with non-english field names
> ---
>
> Key: SPARK-37895
> URL: https://issues.apache.org/jira/browse/SPARK-37895
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Marina Krasilnikova
>Priority: Minor
>
> While trying to join two tables with non-English field names in PostgreSQL 
> with a query like
> "select view1.`Имя1` , view1.`Имя2`, view2.`Имя3` from view1 left join view2 
> on view1.`Имя2`=view2.`Имя4`"
> we get an error saying that there is no field "`Имя4`" (the field name is 
> surrounded by backticks).
> It appears that, to fetch the data from the second table, Spark constructs a 
> query like
> SELECT "Имя3","Имя4" FROM "public"."tab2" WHERE ("`Имя4`" IS NOT NULL)
> and these backticks are redundant in the WHERE clause.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37898) Error reading old dates when AQE is enabled in Spark 3.1. Works when AQE is disabled

2022-01-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-37898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaspar Muñoz  updated SPARK-37898:
--
Description: 
Hi guys, 
 
While testing a Spark job that fails, I encountered behavior that is not 
consistent across different Spark versions. I reduced my code so it can be 
replicated easily in a simple spark-shell. Note: the code logic probably does 
not make sense :)
 
The following snippet:
 
 - Works with Spark 3.1.2 and 3.1.3-rc  when AQE disabled
 - Fails with Spark 3.1.2 and 3.1.3-rc  when AQE enabled
 - Works with Spark 3.2.0 always
{code:java}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window


spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

val dataset = spark.read.parquet("/tmp/parquet-output")

val window = Window.orderBy(dataset.col("date").desc)
val resultDataset = dataset.withColumn("rankedFilterOverPartition", 
dense_rank().over(window)).filter("rankedFilterOverPartition = 
1").drop("rankedFilterOverPartition")
println(resultDataset.rdd.getNumPartitions){code}
 
Previously, I used the following snippet with Spark 2.2 to write the data to 
the path /tmp/parquet-output.
 
{code:java}
import spark.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions._

case class MyCustomClass(id_col: Int, date: String, timestamp_col: 
java.sql.Timestamp)

val dataset = Seq(MyCustomClass(1, "0001-01-01", Timestamp.valueOf("1000-01-01 
10:00:00")), MyCustomClass(2, "0001-01-01", Timestamp.valueOf("1000-01-01 
10:00:00"))).toDF
 
dataset.select($"id_col", $"date".cast("date"), 
$"timestamp_col").write.mode("overwrite").parquet("/tmp/parquet-output"){code}
 
The error is:
{code:java}
scala> println(resultDataset.rdd.getNumPartitions) 22/01/13 13:45:16 WARN 
WindowExec: No Partition Defined for Window operation! Moving all data to a 
single partition, this can cause serious performance degradation.
22/01/13 13:45:16 WARN WindowExec: No Partition Defined for Window operation! 
Moving all data to a single partition, this can cause serious performance 
degradation.
22/01/13 13:45:17 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkUpgradeException: You may get a different result due to 
the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps 
before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files 
may be written by Spark 2.x or legacy versions of Hive, which uses a legacy 
hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian 
calendar. See more details in SPARK-31404. You can set 
spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the 
datetime values w.r.t. the calendar difference during reading. Or set 
spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'CORRECTED' to read the 
datetime values as it is.
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInRead(DataSourceUtils.scala:147)
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils.newRebaseExceptionInRead(DataSourceUtils.scala){code}
 

Is it possible to fix this for the 3.1 branch? 

 
Regards

  was:
Hi guys, 
 
While testing a Spark job that fails, I encountered behavior that is not 
consistent across different Spark versions. I reduced my code so it can be 
replicated easily in a simple spark-shell. Note: the code logic probably does 
not make sense :)
 
The following snippet:
 
 - Works with Spark 3.1.2 and 3.1.3-rc  when AQE disabled
 - Fails with Spark 3.1.2 and 3.1.3-rc  when AQE enabled
 - Works with Spark 3.2.0 always
{code:java}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window


spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

val dataset = spark.read.parquet("/tmp/parquet-output")

val window = Window.orderBy(dataset.col("date").desc)
val resultDataset = dataset.withColumn("rankedFilterOverPartition", 
dense_rank().over(window)).filter("rankedFilterOverPartition = 
1").drop("rankedFilterOverPartition")
println(resultDataset.rdd.getNumPartitions){code}
 
Previously, I used the following snippet with Spark 2.2 to write the data to 
the path /tmp/parquet-output.
 
{code:java}
import spark.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions._

case class MyCustomClass(id_col: Int, date: String, timestamp_col: 
java.sql.Timestamp)

val dataset = Seq(MyCustomClass(1, "0001-01-01", Timestamp.valueOf("1000-01-01 
10:00:00")), MyCustomClass(2, "0001-01-01", Timestamp.valueOf("1000-01-01 
10:00:00"))).toDF
 
dataset.select($"id_col", $"date".cast("date"), 
$"timestamp_col").write.mode("overwrite").parquet("/tmp/parquet-output"){code}
 
The error is:
{code:java}
scala> 

[jira] [Updated] (SPARK-37898) Error reading old dates when AQE is enabled in Spark 3.1. Works when AQE is disabled

2022-01-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-37898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaspar Muñoz  updated SPARK-37898:
--
Description: 
Hi guys, 
 
While testing a Spark job that fails, I encountered behavior that is not 
consistent across different Spark versions. I reduced my code so it can be 
replicated easily in a simple spark-shell. Note: the code logic probably does 
not make sense :)
 
The following snippet:
 
 - Works with Spark 3.1.2 and 3.1.3-rc  when AQE disabled
 - Fails with Spark 3.1.2 and 3.1.3-rc  when AQE enabled
 - Works with Spark 3.2.0 always
{code:java}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window


spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

val dataset = spark.read.parquet("/tmp/parquet-output")

val window = Window.orderBy(dataset.col("date").desc)
val resultDataset = dataset.withColumn("rankedFilterOverPartition", 
dense_rank().over(window)).filter("rankedFilterOverPartition = 
1").drop("rankedFilterOverPartition")
println(resultDataset.rdd.getNumPartitions){code}
 
Previously, I used the following snippet with Spark 2.2 to write the data to 
the path /tmp/parquet-output.
 
{code:java}
import spark.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions._

case class MyCustomClass(id_col: Int, date: String, timestamp_col: java.sql.Timestamp)

val dataset = Seq(
  MyCustomClass(1, "0001-01-01", Timestamp.valueOf("1000-01-01 10:00:00")),
  MyCustomClass(2, "0001-01-01", Timestamp.valueOf("1000-01-01 10:00:00"))
).toDF

dataset.select($"id_col", $"date".cast("date"), $"timestamp_col")
  .write.mode("overwrite").parquet("/tmp/parquet-output"){code}
 
The error is:
{code:java}
scala> println(resultDataset.rdd.getNumPartitions) 22/01/13 13:45:16 WARN 
WindowExec: No Partition Defined for Window operation! Moving all data to a 
single partition, this can cause serious performance degradation.
22/01/13 13:45:16 WARN WindowExec: No Partition Defined for Window operation! 
Moving all data to a single partition, this can cause serious performance 
degradation.
22/01/13 13:45:17 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkUpgradeException: You may get a different result due to 
the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps 
before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files 
may be written by Spark 2.x or legacy versions of Hive, which uses a legacy 
hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian 
calendar. See more details in SPARK-31404. You can set 
spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the 
datetime values w.r.t. the calendar difference during reading. Or set 
spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'CORRECTED' to read the 
datetime values as it is.
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInRead(DataSourceUtils.scala:147)
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils.newRebaseExceptionInRead(DataSourceUtils.scala){code}
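 
Until this is fixed, a minimal workaround sketch based on the observation 
above (an assumption on my side, not a verified fix): since the snippet works 
on 3.1.x when AQE is disabled, turn AQE off around this one query while 
keeping the LEGACY rebase configs.
{code:java}
// Workaround sketch (assumption): disable AQE just for this query, since
// the failure only reproduces on 3.1.x with AQE enabled.
spark.conf.set("spark.sql.adaptive.enabled", "false")

val dataset = spark.read.parquet("/tmp/parquet-output")
val window = Window.orderBy(dataset.col("date").desc)
val resultDataset = dataset
  .withColumn("rankedFilterOverPartition", dense_rank().over(window))
  .filter("rankedFilterOverPartition = 1")
  .drop("rankedFilterOverPartition")
println(resultDataset.rdd.getNumPartitions)

// Re-enable AQE afterwards if the rest of the job relies on it.
spark.conf.set("spark.sql.adaptive.enabled", "true"){code}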
 
 
Regards

  was:
Hi guys, 
 
I was testing a Spark job that fails, and I ran into behavior that is not 
consistent across Spark versions. I reduced my code so it can be replicated 
easily in a plain spark-shell.
 
The following snippet:
 
 - Works with Spark 3.1.2 and 3.1.3-rc when AQE is disabled
 - Fails with Spark 3.1.2 and 3.1.3-rc when AQE is enabled
 - Always works with Spark 3.2.0


{code:java}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window


spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

val dataset = spark.read.parquet("/tmp/parquet-output")

val window = Window.orderBy(dataset.col("date").desc)
val resultDataset = dataset
  .withColumn("rankedFilterOverPartition", dense_rank().over(window))
  .filter("rankedFilterOverPartition = 1")
  .drop("rankedFilterOverPartition")
println(resultDataset.rdd.getNumPartitions){code}


 
Previously, I used this snippet with Spark 2.2 to write the data to the path 
/tmp/parquet-output.
 
{code:java}
import spark.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions._

case class MyCustomClass(id_col: Int, date: String, timestamp_col: java.sql.Timestamp)

val dataset = Seq(
  MyCustomClass(1, "0001-01-01", Timestamp.valueOf("1000-01-01 10:00:00")),
  MyCustomClass(2, "0001-01-01", Timestamp.valueOf("1000-01-01 10:00:00"))
).toDF

dataset.select($"id_col", $"date".cast("date"), $"timestamp_col")
  .write.mode("overwrite").parquet("/tmp/parquet-output"){code}
 
The error is:
{code:java}
scala> println(resultDataset.rdd.getNumPartitions) 22/01/13 13:45:16 WARN 
WindowExec: No Partition Defined for Window 

[jira] [Created] (SPARK-37898) Error reading old dates when AQE is enabled in Spark 3.1. Works when AQE is disabled

2022-01-13 Thread Jira
Gaspar Muñoz  created SPARK-37898:
-

 Summary: Error reading old dates when AQE is enabled in Spark 3.1. 
Works when AQE is disabled
 Key: SPARK-37898
 URL: https://issues.apache.org/jira/browse/SPARK-37898
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: Gaspar Muñoz 


Hi guys, 
 
I was testing a Spark job that fails, and I ran into behavior that is not 
consistent across Spark versions. I reduced my code so it can be replicated 
easily in a plain spark-shell.
 
The following snippet:
 
 - Works with Spark 3.1.2 and 3.1.3-rc when AQE is disabled
 - Fails with Spark 3.1.2 and 3.1.3-rc when AQE is enabled
 - Always works with Spark 3.2.0


{code:java}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window


spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

val dataset = spark.read.parquet("/tmp/parquet-output")

val window = Window.orderBy(dataset.col("date").desc)
val resultDataset = dataset
  .withColumn("rankedFilterOverPartition", dense_rank().over(window))
  .filter("rankedFilterOverPartition = 1")
  .drop("rankedFilterOverPartition")
println(resultDataset.rdd.getNumPartitions){code}


 
Previously, I used this snippet with Spark 2.2 to write the data to the path 
/tmp/parquet-output.
 
{code:java}
import spark.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions._

case class MyCustomClass(id_col: Int, date: String, timestamp_col: java.sql.Timestamp)

val dataset = Seq(
  MyCustomClass(1, "0001-01-01", Timestamp.valueOf("1000-01-01 10:00:00")),
  MyCustomClass(2, "0001-01-01", Timestamp.valueOf("1000-01-01 10:00:00"))
).toDF

dataset.select($"id_col", $"date".cast("date"), $"timestamp_col")
  .write.mode("overwrite").parquet("/tmp/parquet-output"){code}
 
The error is:
{code:java}
scala> println(resultDataset.rdd.getNumPartitions) 22/01/13 13:45:16 WARN 
WindowExec: No Partition Defined for Window operation! Moving all data to a 
single partition, this can cause serious performance degradation.
22/01/13 13:45:16 WARN WindowExec: No Partition Defined for Window operation! 
Moving all data to a single partition, this can cause serious performance 
degradation.
22/01/13 13:45:17 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkUpgradeException: You may get a different result due to 
the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps 
before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files 
may be written by Spark 2.x or legacy versions of Hive, which uses a legacy 
hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian 
calendar. See more details in SPARK-31404. You can set 
spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the 
datetime values w.r.t. the calendar difference during reading. Or set 
spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'CORRECTED' to read the 
datetime values as it is.
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInRead(DataSourceUtils.scala:147)
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils.newRebaseExceptionInRead(DataSourceUtils.scala){code}
 
 
Regards



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37897) Filter with subexpression elimination may cause query failed

2022-01-13 Thread hujiahua (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hujiahua updated SPARK-37897:
-
Description: 
 

The following test fails; the root cause is that the execution order of the 
filter predicates changes after subexpression elimination. So I think we 
should preserve the predicates' execution order after subexpression 
elimination.
{code:java}
test("filter with subexpression elimination may cause query failed.") {
  withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")) {
val df = Seq(-1, 1, 2).toDF("c1")

//register `plusOne` udf, and the function will failed if input was not a 
positive number.
spark.sqlContext.udf.register("plusOne",
  (n: Int) => { if (n >= 0) n + 1 else throw new SparkException("Must be 
positive number.") })

val result = df.filter("c1 >= 0 and plusOne(c1) > 1 and plusOne(c1) < 
3").collect()
assert(result.size === 1)
  }
} 

Caused by: org.apache.spark.SparkException: Must be positive number.
    at 
org.apache.spark.sql.DataFrameSuite.$anonfun$new$3(DataFrameSuite.scala:67)
    at scala.runtime.java8.JFunction1$mcII$sp.apply(JFunction1$mcII$sp.java:23)
    ... 20 more{code}
 

https://github.com/apache/spark/blob/0e186e8a19926f91810f3eaf174611b71e598de6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratePredicate.scala#L63

!image-2022-01-13-20-22-09-055.png!
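 
To make the ordering problem concrete, here is a plain-Scala sketch (the 
names are illustrative, not the actual Spark internals) of why hoisting the 
common `plusOne(c1)` ahead of the `c1 >= 0` guard defeats short-circuit 
evaluation:
{code:java}
// plusOne mirrors the udf above.
def plusOne(n: Int): Int =
  if (n >= 0) n + 1 else throw new RuntimeException("Must be positive number.")

// Predicate order as written: the guard short-circuits, so plusOne(-1) never runs.
def asWritten(c1: Int): Boolean =
  c1 >= 0 && plusOne(c1) > 1 && plusOne(c1) < 3

// After subexpression elimination hoists the common plusOne(c1), it is
// evaluated before the guard can short-circuit, and throws for c1 = -1.
def afterElimination(c1: Int): Boolean = {
  val common = plusOne(c1)
  c1 >= 0 && common > 1 && common < 3
}

Seq(-1, 1, 2).filter(asWritten)         // OK: List(1)
Seq(-1, 1, 2).filter(afterElimination)  // throws "Must be positive number."{code}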

 

 

  was:
 

The following test fails; the root cause is that the execution order of the 
filter predicates changes after subexpression elimination. So I think we 
should preserve the predicates' execution order after subexpression 
elimination.
{code:java}
test("filter with subexpression elimination may cause query failed.") {
  withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")) {
val df = Seq(-1, 1, 2).toDF("c1")

//register `plusOne` udf, and the function will failed if input was not a 
positive number.
spark.sqlContext.udf.register("plusOne",
  (n: Int) => { if (n >= 0) n + 1 else throw new SparkException("Must be 
positive number.") })

val result = df.filter("c1 >= 0 and plusOne(c1) > 1 and plusOne(c1) < 
3").collect()
assert(result.size === 1)
  }
} 

Caused by: org.apache.spark.SparkException: Must be positive number.
    at 
org.apache.spark.sql.DataFrameSuite.$anonfun$new$3(DataFrameSuite.scala:67)
    at scala.runtime.java8.JFunction1$mcII$sp.apply(JFunction1$mcII$sp.java:23)
    ... 20 more{code}
 

 


> Filter with subexpression elimination may cause query failed
> 
>
> Key: SPARK-37897
> URL: https://issues.apache.org/jira/browse/SPARK-37897
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: hujiahua
>Priority: Major
> Attachments: image-2022-01-13-20-22-09-055.png
>
>
>  
> The following test fails; the root cause is that the execution order of the 
> filter predicates changes after subexpression elimination. So I think we 
> should preserve the predicates' execution order after subexpression 
> elimination.
> {code:java}
> test("filter with subexpression elimination may cause query failed.") {
>   withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")) {
>     val df = Seq(-1, 1, 2).toDF("c1")
>     // register the `plusOne` udf; the function fails if the input is not a positive number
>     spark.sqlContext.udf.register("plusOne",
>       (n: Int) => { if (n >= 0) n + 1 else throw new SparkException("Must be positive number.") })
>     val result = df.filter("c1 >= 0 and plusOne(c1) > 1 and plusOne(c1) < 3").collect()
>     assert(result.size === 1)
>   }
> } 
> Caused by: org.apache.spark.SparkException: Must be positive number.
>     at 
> org.apache.spark.sql.DataFrameSuite.$anonfun$new$3(DataFrameSuite.scala:67)
>     at 
> scala.runtime.java8.JFunction1$mcII$sp.apply(JFunction1$mcII$sp.java:23)
>     ... 20 more{code}
>  
> https://github.com/apache/spark/blob/0e186e8a19926f91810f3eaf174611b71e598de6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratePredicate.scala#L63
> !image-2022-01-13-20-22-09-055.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37897) Filter with subexpression elimination may cause query failed

2022-01-13 Thread hujiahua (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hujiahua updated SPARK-37897:
-
Attachment: image-2022-01-13-20-22-09-055.png

> Filter with subexpression elimination may cause query failed
> 
>
> Key: SPARK-37897
> URL: https://issues.apache.org/jira/browse/SPARK-37897
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: hujiahua
>Priority: Major
> Attachments: image-2022-01-13-20-22-09-055.png
>
>
>  
> The following test fails; the root cause is that the execution order of the 
> filter predicates changes after subexpression elimination. So I think we 
> should preserve the predicates' execution order after subexpression 
> elimination.
> {code:java}
> test("filter with subexpression elimination may cause query failed.") {
>   withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")) {
>     val df = Seq(-1, 1, 2).toDF("c1")
>     // register the `plusOne` udf; the function fails if the input is not a positive number
>     spark.sqlContext.udf.register("plusOne",
>       (n: Int) => { if (n >= 0) n + 1 else throw new SparkException("Must be positive number.") })
>     val result = df.filter("c1 >= 0 and plusOne(c1) > 1 and plusOne(c1) < 3").collect()
>     assert(result.size === 1)
>   }
> } 
> Caused by: org.apache.spark.SparkException: Must be positive number.
>     at 
> org.apache.spark.sql.DataFrameSuite.$anonfun$new$3(DataFrameSuite.scala:67)
>     at 
> scala.runtime.java8.JFunction1$mcII$sp.apply(JFunction1$mcII$sp.java:23)
>     ... 20 more{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37897) Filter with subexpression elimination may cause query failed

2022-01-13 Thread hujiahua (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hujiahua updated SPARK-37897:
-
Description: 
 

The following test fails; the root cause is that the execution order of the 
filter predicates changes after subexpression elimination. So I think we 
should preserve the predicates' execution order after subexpression 
elimination.
{code:java}
test("filter with subexpression elimination may cause query failed.") {
  withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")) {
val df = Seq(-1, 1, 2).toDF("c1")

//register `plusOne` udf, and the function will failed if input was not a 
positive number.
spark.sqlContext.udf.register("plusOne",
  (n: Int) => { if (n >= 0) n + 1 else throw new SparkException("Must be 
positive number.") })

val result = df.filter("c1 >= 0 and plusOne(c1) > 1 and plusOne(c1) < 
3").collect()
assert(result.size === 1)
  }
} 

Caused by: org.apache.spark.SparkException: Must be positive number.
    at 
org.apache.spark.sql.DataFrameSuite.$anonfun$new$3(DataFrameSuite.scala:67)
    at scala.runtime.java8.JFunction1$mcII$sp.apply(JFunction1$mcII$sp.java:23)
    ... 20 more{code}
 

 

  was:
 

The following test fails; the root cause is that the execution order of the 
filter predicates changes after subexpression elimination. So I think we 
should preserve the predicates' execution order after subexpression 
elimination.
{code:java}
test("filter with subexpression elimination may cause query failed.") {
  withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")) {
val df = Seq(-1, 1, 2).toDF("c1")

//register `plusOne` udf, and the function will failed if input was not a 
positive number.
spark.sqlContext.udf.register("plusOne",
  (n: Int) => { if (n >= 0) n + 1 else throw new SparkException("Must be 
positive number.") })

val result = df.filter("c1 >= 0 and plusOne(c1) > 1 and plusOne(c1) < 
3").collect()
assert(result.size === 1)
  }
} {code}
 

 


> Filter with subexpression elimination may cause query failed
> 
>
> Key: SPARK-37897
> URL: https://issues.apache.org/jira/browse/SPARK-37897
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: hujiahua
>Priority: Major
>
>  
> The following test fails; the root cause is that the execution order of the 
> filter predicates changes after subexpression elimination. So I think we 
> should preserve the predicates' execution order after subexpression 
> elimination.
> {code:java}
> test("filter with subexpression elimination may cause query failed.") {
>   withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")) {
>     val df = Seq(-1, 1, 2).toDF("c1")
>     // register the `plusOne` udf; the function fails if the input is not a positive number
>     spark.sqlContext.udf.register("plusOne",
>       (n: Int) => { if (n >= 0) n + 1 else throw new SparkException("Must be positive number.") })
>     val result = df.filter("c1 >= 0 and plusOne(c1) > 1 and plusOne(c1) < 3").collect()
>     assert(result.size === 1)
>   }
> } 
> Caused by: org.apache.spark.SparkException: Must be positive number.
>     at 
> org.apache.spark.sql.DataFrameSuite.$anonfun$new$3(DataFrameSuite.scala:67)
>     at 
> scala.runtime.java8.JFunction1$mcII$sp.apply(JFunction1$mcII$sp.java:23)
>     ... 20 more{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


