[jira] [Assigned] (SPARK-40157) Make pyspark.files examples self-contained

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40157:


Assignee: Apache Spark  (was: Ruifeng Zheng)

> Make pyspark.files examples self-contained
> --
>
> Key: SPARK-40157
> URL: https://issues.apache.org/jira/browse/SPARK-40157
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-40157) Make pyspark.files examples self-contained

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40157:


Assignee: Ruifeng Zheng  (was: Apache Spark)

> Make pyspark.files examples self-contained
> --
>
> Key: SPARK-40157
> URL: https://issues.apache.org/jira/browse/SPARK-40157
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-40157) Make pyspark.files examples self-contained

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582738#comment-17582738
 ] 

Apache Spark commented on SPARK-40157:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37607
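
For reference, a minimal sketch of what "self-contained" means for these docstring examples (the exact doctests are in the linked PR; this is only an illustration using the public SparkFiles API):

{code:python}
import os
import tempfile

from pyspark import SparkFiles
from pyspark.sql import SparkSession

# Self-contained: the example creates its own session and its own input
# file, so it can run as a doctest with no external setup.
spark = SparkSession.builder.master("local[1]").getOrCreate()
sc = spark.sparkContext

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "test.txt")
    with open(path, "w") as f:
        f.write("100")
    sc.addFile(path)

    def read_file(_):
        # SparkFiles.get resolves the file on whichever node runs the task.
        with open(SparkFiles.get("test.txt")) as f:
            return int(f.read())

    print(sc.parallelize([1]).map(read_file).collect())  # [100]
{code}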

> Make pyspark.files examples self-contained
> --
>
> Key: SPARK-40157
> URL: https://issues.apache.org/jira/browse/SPARK-40157
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-40173) Make pyspark.taskcontext examples self-contained

2022-08-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582735#comment-17582735
 ] 

Hyukjin Kwon commented on SPARK-40173:
--

I'm working on this.
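
For context, a self-contained pyspark.taskcontext example might look like this sketch (an illustration only, using the public TaskContext API):

{code:python}
from pyspark import TaskContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()
sc = spark.sparkContext

# Each task inspects its own context; no external state is needed, so the
# snippet can run verbatim as a doctest.
ids = sc.parallelize(range(4), 2).map(
    lambda _: TaskContext.get().partitionId()
).distinct().collect()
print(sorted(ids))  # [0, 1]
{code}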

> Make pyspark.taskcontext examples self-contained
> 
>
> Key: SPARK-40173
> URL: https://issues.apache.org/jira/browse/SPARK-40173
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark, Spark Core
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Created] (SPARK-40173) Make pyspark.taskcontext examples self-contained

2022-08-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-40173:


 Summary: Make pyspark.taskcontext examples self-contained
 Key: SPARK-40173
 URL: https://issues.apache.org/jira/browse/SPARK-40173
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark, Spark Core
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Assigned] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40172:


Assignee: Gengliang Wang  (was: Apache Spark)

> Temporarily disable flaky test cases in ImageFileFormatSuite
> 
>
> Key: SPARK-40172
> URL: https://issues.apache.org/jira/browse/SPARK-40172
> Project: Spark
>  Issue Type: Test
>  Components: ML, Tests
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> Three test cases in ImageFileFormatSuite became flaky in the GitHub action tests:
> [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true]
> Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I
> suggest disabling them in OSS.






[jira] [Commented] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582722#comment-17582722
 ] 

Apache Spark commented on SPARK-40172:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37605

> Temporarily disable flaky test cases in ImageFileFormatSuite
> 
>
> Key: SPARK-40172
> URL: https://issues.apache.org/jira/browse/SPARK-40172
> Project: Spark
>  Issue Type: Test
>  Components: ML, Tests
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> Three test cases in ImageFileFormatSuite became flaky in the GitHub action tests:
> [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true]
> Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I
> suggest disabling them in OSS.









[jira] [Assigned] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40172:


Assignee: Apache Spark  (was: Gengliang Wang)

> Temporarily disable flaky test cases in ImageFileFormatSuite
> 
>
> Key: SPARK-40172
> URL: https://issues.apache.org/jira/browse/SPARK-40172
> Project: Spark
>  Issue Type: Test
>  Components: ML, Tests
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
> Three test cases in ImageFileFormatSuite became flaky in the GitHub action tests:
> [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true]
> Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I
> suggest disabling them in OSS.






[jira] [Commented] (SPARK-40171) Fix flaky tests in ImageFileFormatSuite

2022-08-21 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582721#comment-17582721
 ] 

Gengliang Wang commented on SPARK-40171:


cc [~weichenxu123] 

> Fix flaky tests in ImageFileFormatSuite
> ---
>
> Key: SPARK-40171
> URL: https://issues.apache.org/jira/browse/SPARK-40171
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>
> There are three test cases that became flaky in the GitHub action tests:
> [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true]
> We should fix them.






[jira] [Created] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite

2022-08-21 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40172:
--

 Summary: Temporarily disable flaky test cases in 
ImageFileFormatSuite
 Key: SPARK-40172
 URL: https://issues.apache.org/jira/browse/SPARK-40172
 Project: Spark
  Issue Type: Test
  Components: ML, Tests
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Three test cases in ImageFileFormatSuite became flaky in the GitHub action tests:

[https://github.com/apache/spark/runs/7941765326?check_suite_focus=true]

Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I
suggest disabling them in OSS.






[jira] [Created] (SPARK-40171) Fix flaky tests in ImageFileFormatSuite

2022-08-21 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40171:
--

 Summary: Fix flaky tests in ImageFileFormatSuite
 Key: SPARK-40171
 URL: https://issues.apache.org/jira/browse/SPARK-40171
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 3.4.0
Reporter: Gengliang Wang


There are three test cases that became flaky in the GitHub action tests:

[https://github.com/apache/spark/runs/7941765326?check_suite_focus=true]

We should fix them.






[jira] [Commented] (SPARK-40149) Star expansion after outer join asymmetrically includes joining key

2022-08-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582709#comment-17582709
 ] 

Hyukjin Kwon commented on SPARK-40149:
--

[~karenfeng] FYI

> Star expansion after outer join asymmetrically includes joining key
> ---
>
> Key: SPARK-40149
> URL: https://issues.apache.org/jira/browse/SPARK-40149
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Otakar Truněček
>Priority: Major
>
> When star expansion is used on the left side of a join, the result includes the
> joining key, while on the right side of the join it doesn't. I would expect the
> behaviour to be symmetric (either included on both sides or on neither).
> Example:
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as f
> spark = SparkSession.builder.getOrCreate()
> df_left = spark.range(5).withColumn('val', f.lit('left'))
> df_right = spark.range(3, 7).withColumn('val', f.lit('right'))
> df_merged = (
> df_left
> .alias('left')
> .join(df_right.alias('right'), on='id', how='full_outer')
> .withColumn('left_all', f.struct('left.*'))
> .withColumn('right_all', f.struct('right.*'))
> )
> df_merged.show()
> {code}
> result:
> {code:java}
> +---+----+-----+------------+---------+
> | id| val|  val|    left_all|right_all|
> +---+----+-----+------------+---------+
> |  0|left| null|   {0, left}|   {null}|
> |  1|left| null|   {1, left}|   {null}|
> |  2|left| null|   {2, left}|   {null}|
> |  3|left|right|   {3, left}|  {right}|
> |  4|left|right|   {4, left}|  {right}|
> |  5|null|right|{null, null}|  {right}|
> |  6|null|right|{null, null}|  {right}|
> +---+----+-----+------------+---------+
> {code}
> This behaviour started with release 3.2.0. Previously the key was not 
> included on either side. 
> Result from Spark 3.1.3
> {code:java}
> +---+----+-----+--------+---------+
> | id| val|  val|left_all|right_all|
> +---+----+-----+--------+---------+
> |  0|left| null|  {left}|   {null}|
> |  6|null|right|  {null}|  {right}|
> |  5|null|right|  {null}|  {right}|
> |  1|left| null|  {left}|   {null}|
> |  3|left|right|  {left}|  {right}|
> |  2|left| null|  {left}|   {null}|
> |  4|left|right|  {left}|  {right}|
> +---+----+-----+--------+---------+ {code}
> I have a gut feeling this is related to these issues:
> https://issues.apache.org/jira/browse/SPARK-39376
> https://issues.apache.org/jira/browse/SPARK-34527
> https://issues.apache.org/jira/browse/SPARK-38603
>  






[jira] [Resolved] (SPARK-40140) REST API for SQL level information does not show information on running queries

2022-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40140.
--
Resolution: Cannot Reproduce

> REST API for SQL level information does not show information on running 
> queries
> ---
>
> Key: SPARK-40140
> URL: https://issues.apache.org/jira/browse/SPARK-40140
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yeachan Park
>Priority: Minor
> Attachments: running.png
>
>
> Hi All,
> We noticed that the SQL information REST API implemented in
> https://issues.apache.org/jira/browse/SPARK-27142 does not return SQL
> queries which are currently running. We can only see queries which are
> completed/failed.
> As far as I can see, this should be supported since one of the fields in the 
> returned JSON is "runningJobIds". 
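
For anyone trying to reproduce this, a hedged sketch of polling that endpoint against a locally running application (host, port, and application id are placeholders; field names per the SPARK-27142 API):

{code:python}
import requests

base = "http://localhost:4040/api/v1"
app_id = requests.get(f"{base}/applications").json()[0]["id"]
for q in requests.get(f"{base}/applications/{app_id}/sql").json():
    # Per this report, only COMPLETED/FAILED executions come back while a
    # query is still running, even though each entry carries "runningJobIds".
    print(q["id"], q["status"], q["runningJobIds"])
{code}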






[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-21 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582707#comment-17582707
 ] 

XiDuo You commented on SPARK-39915:
---

We may need a stricter mechanism to ensure the output partition number of
repartition

> Dataset.repartition(N) may not create N partitions
> --
>
> Key: SPARK-39915
> URL: https://issues.apache.org/jira/browse/SPARK-39915
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> Looks like there is a behavior change in Dataset.repartition in 3.3.0. For 
> example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 
> in Spark 3.2.0, but 0 in Spark 3.3.0.
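
The report's Scala one-liner translates to this PySpark repro sketch (version-specific numbers are taken from the report, not re-verified here):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()

# range(10, 0) is empty (start > end), so AQE can collapse the plan to an
# empty relation and drop the requested partitioning.
df = spark.range(10, 0).repartition(5)
print(df.rdd.getNumPartitions())  # 5 on Spark 3.2.0, 0 on 3.3.0 per the report
{code}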






[jira] [Commented] (SPARK-40170) StringCoding UTF8 decode slowly

2022-08-21 Thread caican (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582708#comment-17582708
 ] 

caican commented on SPARK-40170:


gently ping [~sowen]  [~r...@databricks.com] 

> StringCoding UTF8 decode slowly
> ---
>
> Key: SPARK-40170
> URL: https://issues.apache.org/jira/browse/SPARK-40170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: caican
>Priority: Major
> Attachments: image-2022-08-22-10-56-54-768.png, 
> image-2022-08-22-10-57-11-744.png
>
>
> When `UnsafeRow` is converted to `Row` in
> `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
> the UTF8String decoding and copyMemory steps are very slow.
> Does anyone have any ideas for optimization?
> !image-2022-08-22-10-56-54-768.png!
>  
> !image-2022-08-22-10-57-11-744.png!






[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-21 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582703#comment-17582703
 ] 

XiDuo You commented on SPARK-39915:
---

Thank you [~yumwang] for ping me. I see this issue.

 

This is not only about the empty-relation optimization but also about any other
unary node sitting on top of the repartition, e.g.:
{code:java}
val df1 = spark.range(1).selectExpr("id as c1")
val df2 = spark.range(1).selectExpr("id as c2")
df1.join(df2, col("c1") === col("c2")).repartition(200, 
col("c1")).rdd.getNumPartitions 

-- output
1{code}
Calling `.rdd` on a Dataset injects a unary node `DeserializeToObject`, so the
protection that current AQE has for repartition does not work; see `AQEUtils`. And
the protection does not retain the `RoundRobinPartitioning`, which makes this
issue more complex.

 

> Dataset.repartition(N) may not create N partitions
> --
>
> Key: SPARK-39915
> URL: https://issues.apache.org/jira/browse/SPARK-39915
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> Looks like there is a behavior change in Dataset.repartition in 3.3.0. For 
> example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 
> in Spark 3.2.0, but 0 in Spark 3.3.0.






[jira] [Updated] (SPARK-40170) StringCoding UTF8 decode slowly

2022-08-21 Thread caican (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caican updated SPARK-40170:
---
Description: 
When `UnsafeRow` is converted to `Row` in
`org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
the UTF8String decoding and copyMemory steps are very slow.

Does anyone have any ideas for optimization?

!image-2022-08-22-10-56-54-768.png!

 

!image-2022-08-22-10-57-11-744.png!

  was:
When `UnsafeRow` is converted to `Row` at  
`org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow
 `,  UTF8String decoding and copyMemory  process are very slow.

!image-2022-08-22-10-56-54-768.png!

 

!image-2022-08-22-10-57-11-744.png!


> StringCoding UTF8 decode slowly
> ---
>
> Key: SPARK-40170
> URL: https://issues.apache.org/jira/browse/SPARK-40170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: caican
>Priority: Major
> Attachments: image-2022-08-22-10-56-54-768.png, 
> image-2022-08-22-10-57-11-744.png
>
>
> When `UnsafeRow` is converted to `Row` in
> `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
> the UTF8String decoding and copyMemory steps are very slow.
> Does anyone have any ideas for optimization?
> !image-2022-08-22-10-56-54-768.png!
>  
> !image-2022-08-22-10-57-11-744.png!






[jira] [Updated] (SPARK-40170) StringCoding UTF8 decode slowly

2022-08-21 Thread caican (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caican updated SPARK-40170:
---
Attachment: image-2022-08-22-10-57-11-744.png

> StringCoding UTF8 decode slowly
> ---
>
> Key: SPARK-40170
> URL: https://issues.apache.org/jira/browse/SPARK-40170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: caican
>Priority: Major
> Attachments: image-2022-08-22-10-56-54-768.png, 
> image-2022-08-22-10-57-11-744.png
>
>
> When `UnsafeRow` is converted to `Row` in
> `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
> the UTF8String decoding and copyMemory steps are very slow.
> !image-2022-08-22-10-51-07-542.png!
>  
> !image-2022-08-22-10-56-04-574.png!






[jira] [Updated] (SPARK-40170) StringCoding UTF8 decode slowly

2022-08-21 Thread caican (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caican updated SPARK-40170:
---
Description: 
When `UnsafeRow` is converted to `Row` in
`org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
the UTF8String decoding and copyMemory steps are very slow.

!image-2022-08-22-10-56-54-768.png!

 

!image-2022-08-22-10-57-11-744.png!

  was:
When `UnsafeRow` is converted to `Row` at  
`org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow
 `,  UTF8String decoding and copyMemory  process are very slow.

!image-2022-08-22-10-51-07-542.png!

 

!image-2022-08-22-10-56-04-574.png!


> StringCoding UTF8 decode slowly
> ---
>
> Key: SPARK-40170
> URL: https://issues.apache.org/jira/browse/SPARK-40170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: caican
>Priority: Major
> Attachments: image-2022-08-22-10-56-54-768.png, 
> image-2022-08-22-10-57-11-744.png
>
>
> When `UnsafeRow` is converted to `Row` in
> `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
> the UTF8String decoding and copyMemory steps are very slow.
> !image-2022-08-22-10-56-54-768.png!
>  
> !image-2022-08-22-10-57-11-744.png!






[jira] [Updated] (SPARK-40170) StringCoding UTF8 decode slowly

2022-08-21 Thread caican (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caican updated SPARK-40170:
---
Attachment: image-2022-08-22-10-56-54-768.png

> StringCoding UTF8 decode slowly
> ---
>
> Key: SPARK-40170
> URL: https://issues.apache.org/jira/browse/SPARK-40170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: caican
>Priority: Major
> Attachments: image-2022-08-22-10-56-54-768.png
>
>
> When `UnsafeRow` is converted to `Row` in
> `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
> the UTF8String decoding and copyMemory steps are very slow.
> !image-2022-08-22-10-51-07-542.png!
>  
> !image-2022-08-22-10-56-04-574.png!






[jira] [Created] (SPARK-40170) StringCoding UTF8 decode slowly

2022-08-21 Thread caican (Jira)
caican created SPARK-40170:
--

 Summary: StringCoding UTF8 decode slowly
 Key: SPARK-40170
 URL: https://issues.apache.org/jira/browse/SPARK-40170
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: caican
 Attachments: image-2022-08-22-10-56-54-768.png

When `UnsafeRow` is converted to `Row` in
`org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
the UTF8String decoding and copyMemory steps are very slow.

!image-2022-08-22-10-51-07-542.png!

 

!image-2022-08-22-10-56-04-574.png!






[jira] [Updated] (SPARK-40074) Error while creating dataset in Java spark-3.x using Encoders bean with Dense Vector. (Issue arises when updating spark from 2.4 to 3.x)

2022-08-21 Thread Anuj Gargava (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Gargava updated SPARK-40074:
-
Affects Version/s: 3.3.0

> Error while creating dataset in Java spark-3.x using Encoders bean with Dense 
> Vector. (Issue arises when updating spark from 2.4 to 3.x)
> 
>
> Key: SPARK-40074
> URL: https://issues.apache.org/jira/browse/SPARK-40074
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, ML, SQL
>Affects Versions: 3.1.2, 3.3.0, 3.2.2
> Environment: Scala 2.12
> Spark 3.x
>Reporter: Anuj Gargava
>Priority: Major
>
> Encountered a compatibility issue while upgrading Spark from 2.4 to 3.x (Scala
> is also upgraded from 2.11 to 2.12).
> The Java code below used to work with Spark 2.4, but when migrated to 3.x it
> gives the error mentioned below. I have done my own research but couldn't
> find a solution or any related information.
>  
>  
> {code:java|title=Code.java|borderStyle=solid}
> public void test() {
> final SparkSession spark = SparkSession.builder()
> .appName("Test")
> .getOrCreate();
> DenseClass denseFactor1 = new DenseClass( new DenseVector( new double[]{0.13, 
> 0.24}));
> DenseClass denseFactor2 = new DenseClass( new DenseVector( new double[]{0.24, 
> 0.32}));
> final List<DenseClass> inputsNew = Arrays.asList(denseFactor1, denseFactor2);
> final Dataset<DenseClass> denseVectorDf = spark.createDataset(inputsNew,
> Encoders.bean(DenseClass.class));
> denseVectorDf.printSchema();
> }
> public static class DenseClass implements Serializable
> { private org.apache.spark.ml.linalg.DenseVector denseVector; }{code}
> The error occurs while creating the dataset *denseVectorDf* .
> Error
>  
> {noformat}
> org.apache.spark.sql.AnalysisException: Cannot up cast `denseVector` from
> struct<> to
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
> The type path of the target object is:
>  - field (class: "org.apache.spark.ml.linalg.DenseVector", name:
> "denseVector")
> You can either add an explicit cast to the input data or choose a higher
> precision type of the field in the target object
> {noformat}
> I have tried to use _double_ instead of dense vector and it works just fine, 
> but fails on using the dense vector with encoders bean.
>  
> StackOverflow link for the issue: 
> [https://stackoverflow.com/questions/73313660/error-while-creating-dataset-in-java-spark-3-x-using-encoders-bean-with-dense-ve]
>  






[jira] [Updated] (SPARK-40074) Error while creating dataset in Java spark-3.x using Encoders bean with Dense Vector. (Issue arises when updating spark from 2.4 to 3.x)

2022-08-21 Thread Anuj Gargava (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Gargava updated SPARK-40074:
-
Affects Version/s: 3.2.2

> Error while creating dataset in Java spark-3.x using Encoders bean with Dense 
> Vector. (Issue arises when updating spark from 2.4 to 3.x)
> 
>
> Key: SPARK-40074
> URL: https://issues.apache.org/jira/browse/SPARK-40074
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, ML, SQL
>Affects Versions: 3.1.2, 3.2.2
> Environment: Scala 2.12
> Spark 3.x
>Reporter: Anuj Gargava
>Priority: Major
>
> Encountered a compatibility issue while upgrading Spark from 2.4 to 3.x (Scala
> is also upgraded from 2.11 to 2.12).
> The Java code below used to work with Spark 2.4, but when migrated to 3.x it
> gives the error mentioned below. I have done my own research but couldn't
> find a solution or any related information.
>  
>  
> {code:java|title=Code.java|borderStyle=solid}
> public void test() {
> final SparkSession spark = SparkSession.builder()
> .appName("Test")
> .getOrCreate();
> DenseClass denseFactor1 = new DenseClass( new DenseVector( new double[]{0.13, 
> 0.24}));
> DenseClass denseFactor2 = new DenseClass( new DenseVector( new double[]{0.24, 
> 0.32}));
> final List<DenseClass> inputsNew = Arrays.asList(denseFactor1, denseFactor2);
> final Dataset<DenseClass> denseVectorDf = spark.createDataset(inputsNew,
> Encoders.bean(DenseClass.class));
> denseVectorDf.printSchema();
> }
> public static class DenseClass implements Serializable
> { private org.apache.spark.ml.linalg.DenseVector denseVector; }{code}
> The error occurs while creating the dataset *denseVectorDf* .
> Error
>  
> {noformat}
> org.apache.spark.sql.AnalysisException: Cannot up cast `denseVector` from
> struct<> to
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
> The type path of the target object is:
>  - field (class: "org.apache.spark.ml.linalg.DenseVector", name:
> "denseVector")
> You can either add an explicit cast to the input data or choose a higher
> precision type of the field in the target object
> {noformat}
> I have tried to use _double_ instead of dense vector and it works just fine, 
> but fails on using the dense vector with encoders bean.
>  
> StackOverflow link for the issue: 
> [https://stackoverflow.com/questions/73313660/error-while-creating-dataset-in-java-spark-3-x-using-encoders-bean-with-dense-ve]
>  






[jira] [Commented] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-08-21 Thread Ivan Sadikov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582664#comment-17582664
 ] 

Ivan Sadikov commented on SPARK-40169:
--

I would like to work on it, as it was my responsibility to come up with a proper
fix for the original issue :). I will sync with [~chaosun] offline and we will
come up with a strategy to address the problem properly.

> Fix the issue with Parquet column index and predicate pushdown in Data source 
> V1
> 
>
> Key: SPARK-40169
> URL: https://issues.apache.org/jira/browse/SPARK-40169
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Ivan Sadikov
>Priority: Major
>
> This is a follow-up for SPARK-39833. In
> [https://github.com/apache/spark/pull/37419] we disabled the column index for
> Parquet due to correctness issues that we found when filtering data on a
> partition column overlapping with the data schema.
>  
> This ticket is for a permanent and thorough fix for the issue and re-enablement
> of the column index. See more details in the PR linked above.
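
Until the permanent fix lands, a hedged sketch of the interim mitigation described above, using Parquet's own switch propagated through the spark.hadoop. prefix (treat the exact key as an assumption against your parquet-mr version):

{code:python}
from pyspark.sql import SparkSession

# "parquet.filter.columnindex.enabled" is a parquet-mr setting, not a Spark
# SQL conf; the spark.hadoop. prefix forwards it to the Hadoop configuration.
spark = (
    SparkSession.builder
    .config("spark.hadoop.parquet.filter.columnindex.enabled", "false")
    .getOrCreate()
)
{code}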






[jira] [Updated] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-08-21 Thread Ivan Sadikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Sadikov updated SPARK-40169:
-
Description: 
This is a follow-up for SPARK-39833. In
[https://github.com/apache/spark/pull/37419] we disabled the column index for
Parquet due to correctness issues that we found when filtering data on a
partition column overlapping with the data schema.

This ticket is for a permanent and thorough fix for the issue and re-enablement
of the column index. See more details in the PR linked above.

  was:
This is a follow for SPARK-39833.

 

We disabled 


> Fix the issue with Parquet column index and predicate pushdown in Data source 
> V1
> 
>
> Key: SPARK-40169
> URL: https://issues.apache.org/jira/browse/SPARK-40169
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Ivan Sadikov
>Priority: Major
>
> This is a follow-up for SPARK-39833. In
> [https://github.com/apache/spark/pull/37419] we disabled the column index for
> Parquet due to correctness issues that we found when filtering data on a
> partition column overlapping with the data schema.
>  
> This ticket is for a permanent and thorough fix for the issue and re-enablement
> of the column index. See more details in the PR linked above.






[jira] [Created] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-08-21 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40169:


 Summary: Fix the issue with Parquet column index and predicate 
pushdown in Data source V1
 Key: SPARK-40169
 URL: https://issues.apache.org/jira/browse/SPARK-40169
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0, 3.3.1, 3.2.3
Reporter: Ivan Sadikov


This is a follow for SPARK-39833.

 

We disabled 






[jira] [Commented] (SPARK-40156) url_decode() exposes a Java error

2022-08-21 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582643#comment-17582643
 ] 

Serge Rielau commented on SPARK-40156:
--

+ [~maxgekk] 

For new functions, we should be using the new error framework:
[https://github.com/apache/spark/blob/master/core/src/main/resources/error/error-classes.json]
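
A minimal repro of the raw error as reported (assumes a build where url_decode exists, i.e. 3.4.0 per this ticket):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "%2s" is a malformed percent-escape; today it surfaces as a bare
# java.lang.IllegalArgumentException instead of a Spark error class.
spark.sql("SELECT url_decode('http%3A%2F%2spark.apache.org')").show()
{code}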

> url_decode() exposes a Java error
> -
>
> Key: SPARK-40156
> URL: https://issues.apache.org/jira/browse/SPARK-40156
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> Given a badly encoded string, Spark returns a raw Java error.
> It should instead return an ERROR_CLASS.
> spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org');
> 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT 
> url_decode('http%3A%2F%2spark.apache.org')]
> java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in 
> escape (%) pattern - Error at index 1 in: "2s"
>  at java.base/java.net.URLDecoder.decode(URLDecoder.java:232)
>  at java.base/java.net.URLDecoder.decode(URLDecoder.java:142)
>  at 
> org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113)
>  at 
> org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)






[jira] [Assigned] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40168:


Assignee: Apache Spark

> Handle FileNotFoundException when shuffle file deleted in decommissioner
> 
>
> Key: SPARK-40168
> URL: https://issues.apache.org/jira/browse/SPARK-40168
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Assignee: Apache Spark
>Priority: Major
>
> When shuffle files are not found, the decommissioner handles IOException, but
> the real exception is as below:
> {code:java}
> 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during 
> migrating migrate_shuffle_1_356
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>     at 
> org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111)
>     at scala.collection.immutable.List.foreach(List.scala:431)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to 
> /10.240.2.65:43481: java.io.FileNotFoundException: 
> /tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index 
> (No such file or directory)
>     at 
> org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392)
>     at 
> org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
>     at 
> io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
>     at 
> io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
>     at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>     at 
> io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
>     at 
> io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723)
>     at 
> io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308)
>     at 
> io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933)
>     at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895)
>     at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
>     at 
> io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
>     at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
>     at 
> 

[jira] [Assigned] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40168:


Assignee: (was: Apache Spark)

> Handle FileNotFoundException when shuffle file deleted in decommissioner
> 
>
> Key: SPARK-40168
> URL: https://issues.apache.org/jira/browse/SPARK-40168
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Priority: Major
>
> When shuffle files are not found, the decommissioner handles IOException, but
> the real exception is as below:
> {code:java}
> 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during 
> migrating migrate_shuffle_1_356
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>     at 
> org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111)
>     at scala.collection.immutable.List.foreach(List.scala:431)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to 
> /10.240.2.65:43481: java.io.FileNotFoundException: 
> /tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index 
> (No such file or directory)
>     at 
> org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392)
>     at 
> org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
>     at 
> io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
>     at 
> io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
>     at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>     at 
> io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
>     at 
> io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723)
>     at 
> io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308)
>     at 
> io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933)
>     at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895)
>     at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
>     at 
> io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
>     at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
>     at 
> 

[jira] [Commented] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582634#comment-17582634
 ] 

Apache Spark commented on SPARK-40168:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/37603

> Handle FileNotFoundException when shuffle file deleted in decommissioner
> 
>
> Key: SPARK-40168
> URL: https://issues.apache.org/jira/browse/SPARK-40168
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Priority: Major
>
> When shuffle files are not found, the decommissioner handles IOException, but
> the real exception is as below:
> {code:java}
> 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during 
> migrating migrate_shuffle_1_356
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>     at 
> org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111)
>     at scala.collection.immutable.List.foreach(List.scala:431)
>     at 
> org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to 
> /10.240.2.65:43481: java.io.FileNotFoundException: 
> /tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index 
> (No such file or directory)
>     at 
> org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392)
>     at 
> org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
>     at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
>     at 
> io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
>     at 
> io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
>     at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>     at 
> io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
>     at 
> io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723)
>     at 
> io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308)
>     at 
> io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933)
>     at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
>     at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895)
>     at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
>     at 
> io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
>     at 
> 

[jira] [Updated] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner

2022-08-21 Thread Zhongwei Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongwei Zhu updated SPARK-40168:
-
Description: 
When shuffle files are not found, the decommissioner handles IOException, but the
real exception is as below:
{code:java}
22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during 
migrating migrate_shuffle_1_356
org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
    at 
org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to 
/10.240.2.65:43481: java.io.FileNotFoundException: 
/tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index (No 
such file or directory)
    at 
org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392)
    at 
org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
    at 
io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
    at 
io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
    at 
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at 
io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at 
io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723)
    at 
io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308)
    at 
io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933)
    at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895)
    at 
io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
    at 
io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
    at 
io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
    at 
io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 more
Caused by: java.io.FileNotFoundException: 

[jira] [Updated] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner

2022-08-21 Thread Zhongwei Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongwei Zhu updated SPARK-40168:
-
Description: 

When shuffle files are not found, the decommissioner handles IOException, but the
real exception is as below:

{code:java}

22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during 
migrating migrate_shuffle_1_356
org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
    at 
org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to 
/10.240.2.65:43481: java.io.FileNotFoundException: 
/tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index (No 
such file or directory)
    at 
org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392)
    at 
org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
    at 
io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
    at 
io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
    at 
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at 
io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at 
io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723)
    at 
io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308)
    at 
io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933)
    at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895)
    at 
io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
    at 
io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
    at 
io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
    at 
io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 
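
For illustration, a minimal Scala sketch (an assumption about the shape of a 
fix, not the actual SPARK-40168 patch) of classifying such a failure. Note 
that the remote FileNotFoundException may survive only as text inside the 
wrapping IOException message, as in the trace above, so the check inspects 
both the cause chain and the message:

{code:java}
import java.io.{FileNotFoundException, IOException}
import scala.annotation.tailrec

object ShuffleMigrationErrors {
  // Hypothetical helper: true when a migration failure is rooted in a
  // deleted shuffle file, either directly, via the cause chain, or as
  // text inside an RPC failure message.
  @tailrec
  def isMissingShuffleFile(t: Throwable): Boolean = t match {
    case null => false
    case _: FileNotFoundException => true
    case e: IOException if Option(e.getMessage)
        .exists(_.contains("java.io.FileNotFoundException")) => true
    case e => isMissingShuffleFile(e.getCause)
  }
}
{code}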

[jira] [Created] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner

2022-08-21 Thread Zhongwei Zhu (Jira)
Zhongwei Zhu created SPARK-40168:


 Summary: Handle FileNotFoundException when shuffle file deleted in 
decommissioner
 Key: SPARK-40168
 URL: https://issues.apache.org/jira/browse/SPARK-40168
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Zhongwei Zhu


When shuffle files are not found, the decommissioner handles the IOException, 
but the real underlying exception is as below:

```

22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during 
migrating migrate_shuffle_1_356
org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
    at 
org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to 
/10.240.2.65:43481: java.io.FileNotFoundException: 
/tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index (No 
such file or directory)
    at 
org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392)
    at 
org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
    at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
    at 
io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
    at 
io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
    at 
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at 
io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at 
io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723)
    at 
io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308)
    at 
io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933)
    at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
    at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895)
    at 
io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
    at 
io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
    at 
io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
    at 
io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at 

[jira] [Resolved] (SPARK-40152) Codegen compilation error when using split_part

2022-08-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40152.
--
Fix Version/s: 3.4.0
   3.3.1
 Assignee: Yuming Wang
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37589

> Codegen compilation error when using split_part
> ---
>
> Key: SPARK-40152
> URL: https://issues.apache.org/jira/browse/SPARK-40152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bruce Robbins
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
>
> The following query throws an error:
> {noformat}
> create or replace temp view v1 as
> select * from values
> ('11.12.13', '.', 3)
> as v1(col1, col2, col3);
> cache table v1;
> SELECT split_part(col1, col2, col3)
> from v1;
> {noformat}
> The error is:
> {noformat}
> 22/08/19 14:25:14 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 42, Column 1: Expression "project_isNull_0 = false" is not a type
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 42, Column 1: Expression "project_isNull_0 = false" is not a type
>   at 
> org.codehaus.janino.Java$Atom.toTypeOrCompileException(Java.java:3934)
>   at org.codehaus.janino.Parser.parseBlockStatement(Parser.java:1887)
>   at org.codehaus.janino.Parser.parseBlockStatements(Parser.java:1811)
>   at org.codehaus.janino.Parser.parseBlock(Parser.java:1792)
>   at 
> {noformat}
> In the end, {{split_part}} does successfully execute, although in interpreted 
> mode.
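
As a reproduction aid, a sketch of surfacing the failure instead of the silent 
fallback ({{spark.sql.codegen.fallback}} is the conf that permits the 
interpreted retry; disabling it is for debugging only):

{code:java}
// Make the codegen compile error fail the query instead of being masked
// by the interpreted-mode fallback.
spark.conf.set("spark.sql.codegen.fallback", "false")
spark.sql("SELECT split_part(col1, col2, col3) FROM v1").show()
{code}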



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-08-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40163.
--
Fix Version/s: 3.4.0
 Assignee: seunggabi
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37478

> [SPARK][SQL] feat: SparkSession.confing(Map)
> 
>
> Key: SPARK-40163
> URL: https://issues.apache.org/jira/browse/SPARK-40163
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: seunggabi
>Assignee: seunggabi
>Priority: Trivial
> Fix For: 3.4.0
>
>
> [https://github.com/apache/spark/pull/37478] 
> - as-is
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     var b = builder
>     map.keys.forEach {
>         val k = it
>         val v = map[k]
>         b = when (v) {
>             is Long -> b.config(k, v)
>             is String -> b.config(k, v)
>             is Double -> b.config(k, v)
>             is Boolean -> b.config(k, v)
>             else -> b
>         }
>     }
>     return b
> }
> {code}
> - to-be
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     return builder.config(map)
> }
> {code}
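
For reference, a short Scala sketch of what the merged overload enables 
(assuming the {{config(Map)}} variant on {{SparkSession.Builder}} added by the 
PR above; the keys and values are illustrative):

{code:java}
import org.apache.spark.sql.SparkSession

// Pass a whole map of settings at once instead of one config(k, v) per key.
val spark = SparkSession.builder()
  .config(Map(
    "spark.sql.shuffle.partitions" -> "200",
    "spark.executor.memory" -> "2g"))
  .getOrCreate()
{code}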



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40167) Add array_sort(column, comparator) to SparkR

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40167:


Assignee: (was: Apache Spark)

> Add array_sort(column, comparator) to SparkR
> 
>
> Key: SPARK-40167
> URL: https://issues.apache.org/jira/browse/SPARK-40167
> Project: Spark
>  Issue Type: Improvement
>  Components: R, SQL
>Affects Versions: 3.4.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> SPARK-39925 exposed array_sort(column, comparator) on the JVM. It should be 
> available in R as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40167) Add array_sort(column, comparator) to SparkR

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40167:


Assignee: Apache Spark

> Add array_sort(column, comparator) to SparkR
> 
>
> Key: SPARK-40167
> URL: https://issues.apache.org/jira/browse/SPARK-40167
> Project: Spark
>  Issue Type: Improvement
>  Components: R, SQL
>Affects Versions: 3.4.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-39925 exposed array_sort(column, comparator) on the JVM. It should be 
> available in R as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40167) Add array_sort(column, comparator) to SparkR

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582570#comment-17582570
 ] 

Apache Spark commented on SPARK-40167:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/37600

> Add array_sort(column, comparator) to SparkR
> 
>
> Key: SPARK-40167
> URL: https://issues.apache.org/jira/browse/SPARK-40167
> Project: Spark
>  Issue Type: Improvement
>  Components: R, SQL
>Affects Versions: 3.4.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> SPARK-39925 exposed array_sort(column, comparator) on the JVM. It should be 
> available in R as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40167) Add array_sort(column, comparator) to SparkR

2022-08-21 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-40167:
--

 Summary: Add array_sort(column, comparator) to SparkR
 Key: SPARK-40167
 URL: https://issues.apache.org/jira/browse/SPARK-40167
 Project: Spark
  Issue Type: Improvement
  Components: R, SQL
Affects Versions: 3.4.0
Reporter: Maciej Szymkiewicz


SPARK-39925 exposed array_sort(column, comparator) on the JVM. It should be 
available in R as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40164) The partitionSpec should be distinct keys after filter one row of row_number

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582568#comment-17582568
 ] 

Apache Spark commented on SPARK-40164:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/37602

> The partitionSpec should be distinct keys after filter one row of row_number
> 
>
> Key: SPARK-40164
> URL: https://issues.apache.org/jira/browse/SPARK-40164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Priority: Minor
>
> For the query
> {code:sql}
> SELECT *
>   FROM (
> SELECT *, row_number() over(partition by key order by value) rn
> FROM testData t
>   ) t1
>   WHERE rn=1
> {code}
> the column *key* will be distinct, since the rn=1 filter keeps at most one row per key.
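
The same pattern in the DataFrame API, as a sketch ({{testData}} stands in for 
any DataFrame with {{key}} and {{value}} columns):

{code:java}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val w = Window.partitionBy(col("key")).orderBy(col("value"))
val deduped = testData
  .withColumn("rn", row_number().over(w))
  .filter(col("rn") === 1)
// After the filter, at most one row remains per `key`, so `key` is
// effectively a distinct key the optimizer could exploit.
{code}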



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40164) The partitionSpec should be distinct keys after filter one row of row_number

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40164:


Assignee: (was: Apache Spark)

> The partitionSpec should be distinct keys after filter one row of row_number
> 
>
> Key: SPARK-40164
> URL: https://issues.apache.org/jira/browse/SPARK-40164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Priority: Minor
>
> For the query
> {code:sql}
> SELECT *
>   FROM (
> SELECT *, row_number() over(partition by key order by value) rn
> FROM testData t
>   ) t1
>   WHERE rn=1
> {code}
> the column *key* will be distinct, since the rn=1 filter keeps at most one row per key.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40164) The partitionSpec should be distinct keys after filter one row of row_number

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40164:


Assignee: Apache Spark

> The partitionSpec should be distinct keys after filter one row of row_number
> 
>
> Key: SPARK-40164
> URL: https://issues.apache.org/jira/browse/SPARK-40164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Assignee: Apache Spark
>Priority: Minor
>
> For the query
> {code:sql}
> SELECT *
>   FROM (
> SELECT *, row_number() over(partition by key order by value) rn
> FROM testData t
>   ) t1
>   WHERE rn=1
> {code}
> the column *key* will be distinct, since the rn=1 filter keeps at most one row per key.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40150) Dynamically merge File Splits

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582564#comment-17582564
 ] 

Apache Spark commented on SPARK-40150:
--

User 'jackylee-ch' has created a pull request for this issue:
https://github.com/apache/spark/pull/37601

> Dynamically merge File Splits
> -
>
> Key: SPARK-40150
> URL: https://issues.apache.org/jira/browse/SPARK-40150
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jackey Lee
>Priority: Major
>
> We currently use maxPartitionBytes and minPartitionNum to split files and use 
> openCostInBytes to merge file splits. But these are static configurations, 
> and the same configuration does not work in all scenarios.
> This PR attempts to dynamically merge file splits, taking concurrency into 
> account while processing more data in one task.
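
For context, these are the static knobs the description refers to; a sketch of 
setting them explicitly (the values are illustrative, not recommendations):

{code:java}
// Static file-split configurations in current Spark.
spark.conf.set("spark.sql.files.maxPartitionBytes", "128MB") // max bytes packed into one split
spark.conf.set("spark.sql.files.openCostInBytes", "4MB")     // estimated cost of opening a file
spark.conf.set("spark.sql.files.minPartitionNum", "200")     // suggested lower bound on split count
{code}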



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40150) Dynamically merge File Splits

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40150:


Assignee: (was: Apache Spark)

> Dynamically merge File Splits
> -
>
> Key: SPARK-40150
> URL: https://issues.apache.org/jira/browse/SPARK-40150
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jackey Lee
>Priority: Major
>
> We currently use maxPartitionBytes and minPartitionNum to split files and use 
> openCostInBytes to merge file splits. But these are static configurations, 
> and the same configuration does not work in all scenarios.
> This PR attempts to dynamically merge file splits, taking concurrency into 
> account while processing more data in one task.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40150) Dynamically merge File Splits

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40150:


Assignee: Apache Spark

> Dynamically merge File Splits
> -
>
> Key: SPARK-40150
> URL: https://issues.apache.org/jira/browse/SPARK-40150
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jackey Lee
>Assignee: Apache Spark
>Priority: Major
>
> We currently use maxPartitionBytes and minPartitionNum to split files and use 
> openCostInBytes to merge file splits. But these are static configurations, 
> and the same configuration does not work in all scenarios.
> This PR attempts to dynamically merge file splits, taking concurrency into 
> account while processing more data in one task.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31

2022-08-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40162.
--
  Assignee: BingKun Pan
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37597

> Upgrade RoaringBitmap from 0.9.30 to 0.9.31
> ---
>
> Key: SPARK-40162
> URL: https://issues.apache.org/jira/browse/SPARK-40162
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31
> [simplify BatchIterators, fix bug in advanceIfNeeded|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c] 
> ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40165) Update test plugins to latest versions

2022-08-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40165.
--
  Assignee: BingKun Pan
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37598

> Update test plugins to latest versions
> --
>
> Key: SPARK-40165
> URL: https://issues.apache.org/jira/browse/SPARK-40165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
> Fix For: 3.4.0
>
>
> Include:
>  * scalacheck (from 1.15.4 to 1.16.0)
>  * maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7)
>  * maven-dependency-plugin (from 3.1.1 to 3.3.0)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40165) Update test plugins to latest versions

2022-08-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-40165:
-
Priority: Trivial  (was: Minor)

> Update test plugins to latest versions
> --
>
> Key: SPARK-40165
> URL: https://issues.apache.org/jira/browse/SPARK-40165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Trivial
> Fix For: 3.4.0
>
>
> Include:
>  * scalacheck (from 1.15.4 to 1.16.0)
>  * maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7)
>  * maven-dependency-plugin (from 3.1.1 to 3.3.0)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40163:


Assignee: (was: Apache Spark)

> [SPARK][SQL] feat: SparkSession.confing(Map)
> 
>
> Key: SPARK-40163
> URL: https://issues.apache.org/jira/browse/SPARK-40163
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: seunggabi
>Priority: Trivial
>
> [https://github.com/apache/spark/pull/37478] 
> - as-is
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     var b = builder
>     map.keys.forEach {
>         val k = it
>         val v = map[k]
>         b = when (v) {
>             is Long -> b.config(k, v)
>             is String -> b.config(k, v)
>             is Double -> b.config(k, v)
>             is Boolean -> b.config(k, v)
>             else -> b
>         }
>     }
>     return b
> }
> {code}
> - to-be
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     return builder.config(map)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40163:


Assignee: Apache Spark

> [SPARK][SQL] feat: SparkSession.confing(Map)
> 
>
> Key: SPARK-40163
> URL: https://issues.apache.org/jira/browse/SPARK-40163
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: seunggabi
>Assignee: Apache Spark
>Priority: Trivial
>
> [https://github.com/apache/spark/pull/37478] 
> - as-is
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     var b = builder
>     map.keys.forEach {
>         val k = it
>         val v = map[k]
>         b = when (v) {
>             is Long -> b.config(k, v)
>             is String -> b.config(k, v)
>             is Double -> b.config(k, v)
>             is Boolean -> b.config(k, v)
>             else -> b
>         }
>     }
>     return b
> }
> {code}
> - to-be
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     return builder.config(map)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582549#comment-17582549
 ] 

Apache Spark commented on SPARK-40163:
--

User 'seunggabi' has created a pull request for this issue:
https://github.com/apache/spark/pull/37478

> [SPARK][SQL] feat: SparkSession.confing(Map)
> 
>
> Key: SPARK-40163
> URL: https://issues.apache.org/jira/browse/SPARK-40163
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: seunggabi
>Priority: Trivial
>
> [https://github.com/apache/spark/pull/37478] 
> - as-is
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     var b = builder
>     map.keys.forEach {
>         val k = it
>         val v = map[k]
>         b = when (v) {
>             is Long -> b.config(k, v)
>             is String -> b.config(k, v)
>             is Double -> b.config(k, v)
>             is Boolean -> b.config(k, v)
>             else -> b
>         }
>     }
>     return b
> }
> {code}
> - to-be
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     return builder.config(map)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40166) Add array_sort(column, comparator) to PySpark

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40166:


Assignee: (was: Apache Spark)

> Add array_sort(column, comparator) to PySpark
> -
>
> Key: SPARK-40166
> URL: https://issues.apache.org/jira/browse/SPARK-40166
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> SPARK-39925 exposed array_sort(column, comparator) on the JVM. It should be 
> available in Python as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40166) Add array_sort(column, comparator) to PySpark

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582544#comment-17582544
 ] 

Apache Spark commented on SPARK-40166:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/37600

> Add array_sort(column, comparator) to PySpark
> -
>
> Key: SPARK-40166
> URL: https://issues.apache.org/jira/browse/SPARK-40166
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> SPARK-39925 exposed array_sort(column, comparator) on the JVM. It should be 
> available in Python as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40166) Add array_sort(column, comparator) to PySpark

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40166:


Assignee: Apache Spark

> Add array_sort(column, comparator) to PySpark
> -
>
> Key: SPARK-40166
> URL: https://issues.apache.org/jira/browse/SPARK-40166
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-39925 exposed array_sort(column, comparator) on the JVM. It should be 
> available in Python as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40166) Add array_sort(column, comparator) to PySpark

2022-08-21 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-40166:
--

 Summary: Add array_sort(column, comparator) to PySpark
 Key: SPARK-40166
 URL: https://issues.apache.org/jira/browse/SPARK-40166
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 3.4.0
Reporter: Maciej Szymkiewicz


SPARK-39925 exposed array_sort(column, comparator) on the JVM. It should be 
available in Python as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40148) Make pyspark.sql.window examples self-contained

2022-08-21 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582535#comment-17582535
 ] 

Qian Sun commented on SPARK-40148:
--

[~hyukjin.kwon] OK, I'll create a follow-up PR to do these :)

> Make pyspark.sql.window examples self-contained
> ---
>
> Key: SPARK-40148
> URL: https://issues.apache.org/jira/browse/SPARK-40148
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40165) Update test plugins to latest versions

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582531#comment-17582531
 ] 

Apache Spark commented on SPARK-40165:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37598

> Update test plugins to latest versions
> --
>
> Key: SPARK-40165
> URL: https://issues.apache.org/jira/browse/SPARK-40165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40165) Update test plugins to latest versions

2022-08-21 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-40165:

Description: 
Include:
 * scalacheck (from 1.15.4 to 1.16.0)
 * maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7)
 * maven-dependency-plugin (from 3.1.1 to 3.3.0)

 

> Update test plugins to latest versions
> --
>
> Key: SPARK-40165
> URL: https://issues.apache.org/jira/browse/SPARK-40165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> Include:
>  * scalacheck (from 1.15.4 to 1.16.0)
>  * maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7)
>  * maven-dependency-plugin (from 3.1.1 to 3.3.0)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40165) Update test plugins to latest versions

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40165:


Assignee: (was: Apache Spark)

> Update test plugins to latest versions
> --
>
> Key: SPARK-40165
> URL: https://issues.apache.org/jira/browse/SPARK-40165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40165) Update test plugins to latest versions

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40165:


Assignee: Apache Spark

> Update test plugins to latest versions
> --
>
> Key: SPARK-40165
> URL: https://issues.apache.org/jira/browse/SPARK-40165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40165) Update test plugins to latest versions

2022-08-21 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-40165:
---

 Summary: Update test plugins to latest versions
 Key: SPARK-40165
 URL: https://issues.apache.org/jira/browse/SPARK-40165
 Project: Spark
  Issue Type: Improvement
  Components: Build, Tests
Affects Versions: 3.4.0
Reporter: BingKun Pan
 Fix For: 3.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39833.
--
Fix Version/s: 3.3.1
   3.2.3
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 37419
[https://github.com/apache/spark/pull/37419]

> Filtered parquet data frame count() and show() produce inconsistent results 
> when spark.sql.parquet.filterPushdown is true
> -
>
> Key: SPARK-39833
> URL: https://issues.apache.org/jira/browse/SPARK-39833
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Michael Allman
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: correctness
> Fix For: 3.3.1, 3.2.3, 3.4.0
>
>
> One of our data scientists discovered a problem wherein a data frame 
> `.show()` call printed non-empty results, but `.count()` printed 0. I've 
> narrowed the issue to a small, reproducible test case which exhibits this 
> aberrant behavior. In pyspark, run the following code:
> {code:python}
> from pyspark.sql.types import *
> parquet_pushdown_bug_df = spark.createDataFrame([{"COL0": int(0)}], 
> schema=StructType(fields=[StructField("COL0",IntegerType(),True)]))
> parquet_pushdown_bug_df.repartition(1).write.mode("overwrite").parquet("parquet_pushdown_bug/col0=0/parquet_pushdown_bug.parquet")
> reread_parquet_pushdown_bug_df = spark.read.parquet("parquet_pushdown_bug")
> reread_parquet_pushdown_bug_df.filter("col0 = 0").show()
> print(reread_parquet_pushdown_bug_df.filter("col0 = 0").count())
> {code}
> In my usage, this prints a data frame with 1 row and a count of 0. However, 
> disabling `spark.sql.parquet.filterPushdown` produces consistent results:
> {code:python}
> spark.conf.set("spark.sql.parquet.filterPushdown", False)
> reread_parquet_pushdown_bug_df.filter("col0 = 0").show()
> reread_parquet_pushdown_bug_df.filter("col0 = 0").count()
> {code}
> This will print the same data frame; however, it will print a count of 1. The 
> key to triggering this bug is not just enabling 
> `spark.sql.parquet.filterPushdown` (which is enabled by default). The case of 
> the column in the data frame (before writing) must differ from the case of 
> the partition column in the file path, i.e. COL0 versus col0 or col0 versus 
> COL0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39833:


Assignee: Ivan Sadikov

> Filtered parquet data frame count() and show() produce inconsistent results 
> when spark.sql.parquet.filterPushdown is true
> -
>
> Key: SPARK-39833
> URL: https://issues.apache.org/jira/browse/SPARK-39833
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Michael Allman
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: correctness
>
> One of our data scientists discovered a problem wherein a data frame 
> `.show()` call printed non-empty results, but `.count()` printed 0. I've 
> narrowed the issue to a small, reproducible test case which exhibits this 
> aberrant behavior. In pyspark, run the following code:
> {code:python}
> from pyspark.sql.types import *
> parquet_pushdown_bug_df = spark.createDataFrame([{"COL0": int(0)}], 
> schema=StructType(fields=[StructField("COL0",IntegerType(),True)]))
> parquet_pushdown_bug_df.repartition(1).write.mode("overwrite").parquet("parquet_pushdown_bug/col0=0/parquet_pushdown_bug.parquet")
> reread_parquet_pushdown_bug_df = spark.read.parquet("parquet_pushdown_bug")
> reread_parquet_pushdown_bug_df.filter("col0 = 0").show()
> print(reread_parquet_pushdown_bug_df.filter("col0 = 0").count())
> {code}
> In my usage, this prints a data frame with 1 row and a count of 0. However, 
> disabling `spark.sql.parquet.filterPushdown` produces consistent results:
> {code:python}
> spark.conf.set("spark.sql.parquet.filterPushdown", False)
> reread_parquet_pushdown_bug_df.filter("col0 = 0").show()
> reread_parquet_pushdown_bug_df.filter("col0 = 0").count()
> {code}
> This will print the same data frame; however, it will print a count of 1. The 
> key to triggering this bug is not just enabling 
> `spark.sql.parquet.filterPushdown` (which is enabled by default). The case of 
> the column in the data frame (before writing) must differ from the case of 
> the partition column in the file path, i.e. COL0 versus col0 or col0 versus 
> COL0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40164) The partitionSpec should be distinct keys after filter one row of row_number

2022-08-21 Thread Wan Kun (Jira)
Wan Kun created SPARK-40164:
---

 Summary: The partitionSpec should be distinct keys after filter 
one row of row_number
 Key: SPARK-40164
 URL: https://issues.apache.org/jira/browse/SPARK-40164
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Wan Kun


For the query
{code:sql}
SELECT *
  FROM (
SELECT *, row_number() over(partition by key order by value) rn
FROM testData t
  ) t1
  WHERE rn=1
{code}
the column *key* will be distinct, since the rn=1 filter keeps at most one row per key.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40148) Make pyspark.sql.window examples self-contained

2022-08-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582517#comment-17582517
 ] 

Hyukjin Kwon commented on SPARK-40148:
--

Oops, my bad. Resolving.

I just noticed that we don't have examples for several APIs such as rowsBetween. 
It would be good to have them. Feel free to create a follow-up.

> Make pyspark.sql.window examples self-contained
> ---
>
> Key: SPARK-40148
> URL: https://issues.apache.org/jira/browse/SPARK-40148
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40148) Make pyspark.sql.window examples self-contained

2022-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40148.
--
Resolution: Duplicate

> Make pyspark.sql.window examples self-contained
> ---
>
> Key: SPARK-40148
> URL: https://issues.apache.org/jira/browse/SPARK-40148
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40161) Make Series.mode apply PandasMode

2022-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40161:


Assignee: Ruifeng Zheng

> Make Series.mode apply PandasMode
> -
>
> Key: SPARK-40161
> URL: https://issues.apache.org/jira/browse/SPARK-40161
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40161) Make Series.mode apply PandasMode

2022-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40161.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37596
[https://github.com/apache/spark/pull/37596]

> Make Series.mode apply PandasMode
> -
>
> Key: SPARK-40161
> URL: https://issues.apache.org/jira/browse/SPARK-40161
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39925) Add array_sort(column, comparator) overload to DataFrame operations

2022-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39925.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37361
[https://github.com/apache/spark/pull/37361]

> Add array_sort(column, comparator) overload to DataFrame operations
> ---
>
> Key: SPARK-39925
> URL: https://issues.apache.org/jira/browse/SPARK-39925
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Brandon Dahler
>Assignee: Brandon Dahler
>Priority: Minor
> Fix For: 3.4.0
>
>
> The ability to use {{array_sort}} with a comparator was added in SPARK-29020; 
> however, the new signature wasn't made available to the DataFrame operations 
> API.
>  
> Proposed new signature:
> {code:java}
> package org.apache.spark.sql
> object functions {
>   ...
>   def array_sort(e: Column, comparator: (Column, Column) => Column): Column
>   ...
> } {code}
>  
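
A usage sketch against the proposed signature (assumes an active SparkSession 
named {{spark}}; the data is illustrative):

{code:java}
import org.apache.spark.sql.functions.{array_sort, col, length}
import spark.implicits._

// Sort by string length; the comparator returns a negative, zero, or
// positive integer column, in the spirit of java.util.Comparator.
val df = Seq(Seq("banana", "fig", "apple")).toDF("fruits")
df.select(array_sort(col("fruits"), (l, r) => length(l) - length(r))).show()
{code}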



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39925) Add array_sort(column, comparator) overload to DataFrame operations

2022-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39925:


Assignee: Brandon Dahler

> Add array_sort(column, comparator) overload to DataFrame operations
> ---
>
> Key: SPARK-39925
> URL: https://issues.apache.org/jira/browse/SPARK-39925
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Brandon Dahler
>Assignee: Brandon Dahler
>Priority: Minor
>
> The ability to use {{array_sort}} with a comparator was added in SPARK-29020; 
> however, the new signature wasn't made available to the DataFrame operations 
> API.
>  
> Proposed new signature:
> {code:java}
> package org.apache.spark.sql
> object functions {
>   ...
>   def array_sort(e: Column, comparator: (Column, Column) => Column): Column
>   ...
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-08-21 Thread seunggabi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

seunggabi updated SPARK-40163:
--
Affects Version/s: 3.3.0
   (was: 3.2.2)

> [SPARK][SQL] feat: SparkSession.confing(Map)
> 
>
> Key: SPARK-40163
> URL: https://issues.apache.org/jira/browse/SPARK-40163
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: seunggabi
>Priority: Trivial
>
> [https://github.com/apache/spark/pull/37478] 
> - as-is
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     var b = builder
>     map.keys.forEach {
>         val k = it
>         val v = map[k]
>         b = when (v) {
>             is Long -> b.config(k, v)
>             is String -> b.config(k, v)
>             is Double -> b.config(k, v)
>             is Boolean -> b.config(k, v)
>             else -> b
>         }
>     }
>     return b
> }
> {code}
> - to-be
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     return builder.config(map)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-08-21 Thread seunggabi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

seunggabi updated SPARK-40163:
--
Description: 
[https://github.com/apache/spark/pull/37478] 

- as-is
{code:java}
private fun config(builder: SparkSession.Builder): SparkSession.Builder {
    val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)

    var b = builder
    map.keys.forEach {
        val k = it
        val v = map[k]

        b = when (v) {
            is Long -> b.config(k, v)
            is String -> b.config(k, v)
            is Double -> b.config(k, v)
            is Boolean -> b.config(k, v)
            else -> b
        }
    }

    return b
}
{code}
- to-be
{code:java}
private fun config(builder: SparkSession.Builder): SparkSession.Builder {
    val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)

    return builder.config(map)
}
{code}

  was:https://github.com/apache/spark/pull/37478 
!image-2022-08-21-17-45-36-461.png!


> [SPARK][SQL] feat: SparkSession.confing(Map)
> 
>
> Key: SPARK-40163
> URL: https://issues.apache.org/jira/browse/SPARK-40163
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: seunggabi
>Priority: Trivial
>
> [https://github.com/apache/spark/pull/37478] 
> - as-is
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     var b = builder
>     map.keys.forEach {
>         val k = it
>         val v = map[k]
>         b = when (v) {
>             is Long -> b.config(k, v)
>             is String -> b.config(k, v)
>             is Double -> b.config(k, v)
>             is Boolean -> b.config(k, v)
>             else -> b
>         }
>     }
>     return b
> }
> {code}
> - to-be
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
>     val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
>     return builder.config(map)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-08-21 Thread seunggabi (Jira)
seunggabi created SPARK-40163:
-

 Summary: [SPARK][SQL] feat: SparkSession.confing(Map)
 Key: SPARK-40163
 URL: https://issues.apache.org/jira/browse/SPARK-40163
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.2
Reporter: seunggabi


https://github.com/apache/spark/pull/37478 !image-2022-08-21-17-45-36-461.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582489#comment-17582489
 ] 

Apache Spark commented on SPARK-40162:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37597

> Upgrade RoaringBitmap from 0.9.30 to 0.9.31
> ---
>
> Key: SPARK-40162
> URL: https://issues.apache.org/jira/browse/SPARK-40162
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31
> [simplify BatchIterators, fix bug in advanceIfNeeded|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c] 
> ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40162:


Assignee: Apache Spark

> Upgrade RoaringBitmap from 0.9.30 to 0.9.31
> ---
>
> Key: SPARK-40162
> URL: https://issues.apache.org/jira/browse/SPARK-40162
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31
> simplify BatchIterators, fix bug in advanceIfNeeded ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573], [commit|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40162:


Assignee: (was: Apache Spark)

> Upgrade RoaringBitmap from 0.9.30 to 0.9.31
> ---
>
> Key: SPARK-40162
> URL: https://issues.apache.org/jira/browse/SPARK-40162
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31
> simplify BatchIterators, fix bug in advanceIfNeeded ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573], [commit|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582488#comment-17582488
 ] 

Apache Spark commented on SPARK-40162:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37597

> Upgrade RoaringBitmap from 0.9.30 to 0.9.31
> ---
>
> Key: SPARK-40162
> URL: https://issues.apache.org/jira/browse/SPARK-40162
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31
> simplify BatchIterators, fix bug in advanceIfNeeded ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573], [commit|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31

2022-08-21 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-40162:
---

 Summary: Upgrade RoaringBitmap from 0.9.30 to 0.9.31
 Key: SPARK-40162
 URL: https://issues.apache.org/jira/browse/SPARK-40162
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: BingKun Pan
 Fix For: 3.4.0


https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31

simplify BatchIterators, fix bug in advanceIfNeeded ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573], [commit|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40161) Make Series.mode apply PandasMode

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40161:


Assignee: Apache Spark

> Make Series.mode apply PandasMode
> -
>
> Key: SPARK-40161
> URL: https://issues.apache.org/jira/browse/SPARK-40161
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40151) Fix return type for new median(interval) function

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582482#comment-17582482
 ] 

Apache Spark commented on SPARK-40151:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37595

> Fix return type for new median(interval) function 
> --
>
> Key: SPARK-40151
> URL: https://issues.apache.org/jira/browse/SPARK-40151
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Critical
>
> median() currently returns an interval of the same type as the input.
> We should instead match mean() and avg(), where the result type is derived from the argument type:
> - year-month interval: The result is an `INTERVAL YEAR TO MONTH`.
> - day-time interval: The result is an `INTERVAL DAY TO SECOND`.
> - In all other cases the result is a DOUBLE.
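
To illustrate the intended typing rule, a small sketch (assuming a build where the new median() aggregate is available, i.e. a 3.4.0 snapshot); the expected schemas follow avg():

{code:java}
import org.apache.spark.sql.SparkSession

fun main() {
    val spark = SparkSession.builder().master("local[1]").appName("median-types").getOrCreate()

    // Year-month interval input: expected result type INTERVAL YEAR TO MONTH.
    spark.sql("SELECT median(col) FROM VALUES (INTERVAL '1' YEAR), (INTERVAL '3' YEAR) AS t(col)")
        .printSchema()

    // Integral input: expected result type DOUBLE (matching avg), not the input type.
    spark.sql("SELECT median(col) FROM VALUES (1), (2), (4) AS t(col)").printSchema()

    spark.stop()
}
{code}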



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40161) Make Series.mode apply PandasMode

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40161:


Assignee: (was: Apache Spark)

> Make Series.mode apply PandasMode
> -
>
> Key: SPARK-40161
> URL: https://issues.apache.org/jira/browse/SPARK-40161
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40161) Make Series.mode apply PandasMode

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582483#comment-17582483
 ] 

Apache Spark commented on SPARK-40161:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37596

> Make Series.mode apply PandasMode
> -
>
> Key: SPARK-40161
> URL: https://issues.apache.org/jira/browse/SPARK-40161
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40151) Fix return type for new median(interval) function

2022-08-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582481#comment-17582481
 ] 

Apache Spark commented on SPARK-40151:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37595

> Fix return type for new median(interval) function 
> --
>
> Key: SPARK-40151
> URL: https://issues.apache.org/jira/browse/SPARK-40151
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Critical
>
> median() currently returns an interval of the same type as the input.
> We should instead match mean() and avg(), where the result type is derived from the argument type:
> - year-month interval: The result is an `INTERVAL YEAR TO MONTH`.
> - day-time interval: The result is an `INTERVAL DAY TO SECOND`.
> - In all other cases the result is a DOUBLE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40151) Fix return type for new median(interval) function

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40151:


Assignee: (was: Apache Spark)

> Fix return type for new median(interval) function 
> --
>
> Key: SPARK-40151
> URL: https://issues.apache.org/jira/browse/SPARK-40151
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Critical
>
> median() currently returns an interval of the same type as the input.
> We should instead match mean() and avg(), where the result type is derived from the argument type:
> - year-month interval: The result is an `INTERVAL YEAR TO MONTH`.
> - day-time interval: The result is an `INTERVAL DAY TO SECOND`.
> - In all other cases the result is a DOUBLE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40151) Fix return type for new median(interval) function

2022-08-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40151:


Assignee: Apache Spark

> Fix return type for new median(interval) function 
> --
>
> Key: SPARK-40151
> URL: https://issues.apache.org/jira/browse/SPARK-40151
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Apache Spark
>Priority: Critical
>
> median() currently returns an interval of the same type as the input.
> We should instead match mean() and avg(), where the result type is derived from the argument type:
> - year-month interval: The result is an `INTERVAL YEAR TO MONTH`.
> - day-time interval: The result is an `INTERVAL DAY TO SECOND`.
> - In all other cases the result is a DOUBLE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40161) Make Series.mode apply PandasMode

2022-08-21 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40161:
-

 Summary: Make Series.mode apply PandasMode
 Key: SPARK-40161
 URL: https://issues.apache.org/jira/browse/SPARK-40161
 Project: Spark
  Issue Type: Improvement
  Components: ps
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org