[jira] [Created] (SPARK-43446) Upgrade Apache Arrow to 12.0.0

2023-05-10 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-43446:
-

 Summary: Upgrade Apache Arrow to 12.0.0
 Key: SPARK-43446
 URL: https://issues.apache.org/jira/browse/SPARK-43446
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-43424) Support vanilla JDBC CHAR/VARCHAR through STS

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-43424.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41102
[https://github.com/apache/spark/pull/41102]

> Support vanilla JDBC CHAR/VARCHAR through STS 
> -
>
> Key: SPARK-43424
> URL: https://issues.apache.org/jira/browse/SPARK-43424
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43424) Support vanilla JDBC CHAR/VARCHAR through STS

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-43424:
-

Assignee: Kent Yao

> Support vanilla JDBC CHAR/VARCHAR through STS 
> -
>
> Key: SPARK-43424
> URL: https://issues.apache.org/jira/browse/SPARK-43424
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-43445) Enable GroupBySlowTests.test_split_apply_combine_on_series for pandas 2.0.0.

2023-05-10 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43445:
---

 Summary: Enable 
GroupBySlowTests.test_split_apply_combine_on_series for pandas 2.0.0.
 Key: SPARK-43445
 URL: https://issues.apache.org/jira/browse/SPARK-43445
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


Enable GroupBySlowTests.test_split_apply_combine_on_series for pandas 2.0.0.






[jira] [Created] (SPARK-43444) Enable GroupBySlowTests.test_value_counts for pandas 2.0.0.

2023-05-10 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43444:
---

 Summary: Enable GroupBySlowTests.test_value_counts for pandas 
2.0.0.
 Key: SPARK-43444
 URL: https://issues.apache.org/jira/browse/SPARK-43444
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


Enable GroupBySlowTests.test_value_counts for pandas 2.0.0.






[jira] [Resolved] (SPARK-43441) makeDotNode should not fail when DeterministicLevel is absent

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-43441.
---
Fix Version/s: 3.4.1
   Resolution: Fixed

Issue resolved by pull request 41124
[https://github.com/apache/spark/pull/41124]

> makeDotNode should not fail when DeterministicLevel is absent
> -
>
> Key: SPARK-43441
> URL: https://issues.apache.org/jira/browse/SPARK-43441
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Qi Tan
>Assignee: Qi Tan
>Priority: Minor
> Fix For: 3.4.1
>
>







[jira] [Created] (SPARK-43443) Add benchmark for Timestamp type inference when use invalid value

2023-05-10 Thread Jia Fan (Jira)
Jia Fan created SPARK-43443:
---

 Summary: Add benchmark for Timestamp type inference when use 
invalid value
 Key: SPARK-43443
 URL: https://issues.apache.org/jira/browse/SPARK-43443
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Jia Fan


We need a benchmark to measure whether our optimization of Timestamp type 
inference is useful. We currently have a benchmark for valid Timestamp values, 
but none for invalid Timestamp values that go through Timestamp type inference.
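
A rough PySpark sketch of the missing measurement (assumptions: a local session
and the JSON `inferTimestamp` option as the inference path under test; the real
benchmark would live in the Scala benchmark suites):

{code:python}
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("ts-infer-bench").getOrCreate()

# Values that look like timestamps but are invalid, so schema inference has
# to attempt and then reject TimestampType for every row.
rows = ['{"ts": "2023-13-99 99:99:99"}'] * 100_000
rdd = spark.sparkContext.parallelize(rows)

start = time.time()
schema = spark.read.option("inferTimestamp", "true").json(rdd).schema
print(schema, f"inference took {time.time() - start:.2f}s")
{code}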






[jira] [Commented] (SPARK-43442) Split test module `pyspark_pandas_connect`

2023-05-10 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721609#comment-17721609
 ] 

Snoot.io commented on SPARK-43442:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/41127

> Split test module `pyspark_pandas_connect`
> --
>
> Key: SPARK-43442
> URL: https://issues.apache.org/jira/browse/SPARK-43442
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark, Tests
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Commented] (SPARK-43442) Split test module `pyspark_pandas_connect`

2023-05-10 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721610#comment-17721610
 ] 

Snoot.io commented on SPARK-43442:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/41127

> Split test module `pyspark_pandas_connect`
> --
>
> Key: SPARK-43442
> URL: https://issues.apache.org/jira/browse/SPARK-43442
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark, Tests
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Commented] (SPARK-43403) GET /history//1/jobs/ failed: java.lang.IllegalStateException: DB is closed

2023-05-10 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721605#comment-17721605
 ] 

Snoot.io commented on SPARK-43403:
--

User 'zhouyifan279' has created a pull request for this issue:
https://github.com/apache/spark/pull/41105

> GET /history//1/jobs/ failed: java.lang.IllegalStateException: DB is 
> closed
> --
>
> Key: SPARK-43403
> URL: https://issues.apache.org/jira/browse/SPARK-43403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Zhou Yifan
>Priority: Major
> Attachments: image-2023-05-08-11-33-13-634.png
>
>
> !image-2023-05-08-11-33-13-634.png!






[jira] [Created] (SPARK-43442) Split test module `pyspark_pandas_connect`

2023-05-10 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-43442:
-

 Summary: Split test module `pyspark_pandas_connect`
 Key: SPARK-43442
 URL: https://issues.apache.org/jira/browse/SPARK-43442
 Project: Spark
  Issue Type: Test
  Components: Connect, PySpark, Tests
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-43425) Add TimestampNTZType to ColumnarBatchRow

2023-05-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-43425:
-
Fix Version/s: 3.4.1

> Add TimestampNTZType to ColumnarBatchRow
> 
>
> Key: SPARK-43425
> URL: https://issues.apache.org/jira/browse/SPARK-43425
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 3.4.1, 3.5.0
>
>







[jira] [Updated] (SPARK-43440) Support registration of an Arrow-optimized Python UDF

2023-05-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-43440:
-
Description: 
Currently, when users register an Arrow-optimized Python UDF, it is registered 
as a pickled Python UDF and thus executed without Arrow optimization.
We should support registration of Arrow-optimized Python UDFs and execute them 
with Arrow optimization.
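
A minimal sketch of the registration path in question, assuming an active
`spark` session and the `useArrow` flag of `pyspark.sql.functions.udf`:

{code:python}
from pyspark.sql.functions import udf

# Arrow-optimized Python UDF (useArrow flag assumed available).
plus_one = udf(lambda x: x + 1, "int", useArrow=True)

# Per this ticket, registration should preserve the Arrow optimization
# rather than silently re-registering the function as a pickled Python UDF.
spark.udf.register("plus_one", plus_one)
spark.sql("SELECT plus_one(id) FROM range(3)").show()
{code}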

  was:Support registration of an Arrow-optimized Python UDF


> Support registration of an Arrow-optimized Python UDF 
> --
>
> Key: SPARK-43440
> URL: https://issues.apache.org/jira/browse/SPARK-43440
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Currently, when users register an Arrow-optimized Python UDF, it is 
> registered as a pickled Python UDF and thus executed without Arrow 
> optimization.
> We should support registration of Arrow-optimized Python UDFs and execute 
> them with Arrow optimization.






[jira] [Assigned] (SPARK-43441) makeDotNode should not fail when DeterministicLevel is absent

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-43441:
-

Assignee: Qi Tan

> makeDotNode should not fail when DeterministicLevel is absent
> -
>
> Key: SPARK-43441
> URL: https://issues.apache.org/jira/browse/SPARK-43441
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Qi Tan
>Assignee: Qi Tan
>Priority: Minor
>







[jira] [Created] (SPARK-43441) makeDotNode should not fail when DeterministicLevel is absent

2023-05-10 Thread Qi Tan (Jira)
Qi Tan created SPARK-43441:
--

 Summary: makeDotNode should not fail when DeterministicLevel is 
absent
 Key: SPARK-43441
 URL: https://issues.apache.org/jira/browse/SPARK-43441
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Qi Tan









[jira] [Created] (SPARK-43440) Support registration of an Arrow-optimized Python UDF

2023-05-10 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-43440:


 Summary: Support registration of an Arrow-optimized Python UDF 
 Key: SPARK-43440
 URL: https://issues.apache.org/jira/browse/SPARK-43440
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Xinrong Meng


Support registration of an Arrow-optimized Python UDF






[jira] [Commented] (SPARK-42523) Apache Spark 3.4 release

2023-05-10 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721549#comment-17721549
 ] 

Xinrong Meng commented on SPARK-42523:
--

I am wondering whether we should keep the ticket open for minor releases such 
as the upcoming 3.4.1.

> Apache Spark 3.4 release
> 
>
> Key: SPARK-42523
> URL: https://issues.apache.org/jira/browse/SPARK-42523
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> An umbrella for Apache Spark 3.4 release






[jira] [Resolved] (SPARK-43412) Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs

2023-05-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-43412.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41053
[https://github.com/apache/spark/pull/41053]

> Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs
> --
>
> Key: SPARK-43412
> URL: https://issues.apache.org/jira/browse/SPARK-43412
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0
>
>
> We are about to improve nested non-atomic input/output support of an 
> Arrow-optimized Python UDF.
> However, it currently shares the same EvalType as a pickled Python UDF, but 
> the same implementation as a Pandas UDF.
> Introducing a dedicated EvalType isolates the changes to Arrow-optimized 
> Python UDFs.
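
A sketch of how the dedicated EvalType would surface, assuming the `useArrow`
flag and that the returned wrapper exposes `evalType` as other UDF variants do:

{code:python}
from pyspark.sql.functions import udf

# With its own EvalType, an Arrow-optimized UDF is distinguishable from a
# pickled one at definition time instead of sharing SQL_BATCHED_UDF.
plus_one = udf(lambda x: x + 1, "int", useArrow=True)
print(plus_one.evalType)  # expected: PythonEvalType.SQL_ARROW_BATCHED_UDF
{code}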






[jira] [Resolved] (SPARK-43430) ExecutePlanRequest should have the ability to set request options.

2023-05-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-43430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-43430.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> ExecutePlanRequest should have the ability to set request options.
> --
>
> Key: SPARK-43430
> URL: https://issues.apache.org/jira/browse/SPARK-43430
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43430) ExecutePlanRequest should have the ability to set request options.

2023-05-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-43430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-43430:
-

Assignee: Martin Grund

> ExecutePlanRequest should have the ability to set request options.
> --
>
> Key: SPARK-43430
> URL: https://issues.apache.org/jira/browse/SPARK-43430
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>







[jira] [Updated] (SPARK-43439) Drop does not work when passed a string with an alias

2023-05-10 Thread Frederik Paradis (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frederik Paradis updated SPARK-43439:
-
Affects Version/s: 3.4.0
   (was: 3.3.2)

> Drop does not work when passed a string with an alias
> -
>
> Key: SPARK-43439
> URL: https://issues.apache.org/jira/browse/SPARK-43439
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Frederik Paradis
>Priority: Major
>
> When a string containing an alias is passed to the drop method, the column 
> is not dropped. However, passing a column object with the same name and 
> alias works.
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as F
> spark = 
> SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
> df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
> j = df.drop("a.hour")
> print(j)  # DataFrame[any: bigint, hour: bigint]
> jj = df.drop(F.col("a.hour"))
> print(jj)  # DataFrame[any: bigint]
> {code}
>  
> Related issues:
> https://issues.apache.org/jira/browse/SPARK-31123
> https://issues.apache.org/jira/browse/SPARK-14759
>  






[jira] [Created] (SPARK-43439) Drop does not work when passed a string with an alias

2023-05-10 Thread Frederik Paradis (Jira)
Frederik Paradis created SPARK-43439:


 Summary: Drop does not work when passed a string with an alias
 Key: SPARK-43439
 URL: https://issues.apache.org/jira/browse/SPARK-43439
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.3.2
Reporter: Frederik Paradis


When a string containing an alias is passed to the drop method, the column is 
not dropped. However, passing a column object with the same name and alias 
works.

{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = 
SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
df1 = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
j = df1.drop("a.hour")
print(j)  # DataFrame[any: bigint, hour: bigint]

jj = df1.drop(F.col("a.hour"))
print(jj)  # DataFrame[any: bigint]
{code}
 

Related issues:

https://issues.apache.org/jira/browse/SPARK-31123

https://issues.apache.org/jira/browse/SPARK-14759

 






[jira] [Updated] (SPARK-43439) Drop does not work when passed a string with an alias

2023-05-10 Thread Frederik Paradis (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frederik Paradis updated SPARK-43439:
-
Description: 
When a string containing an alias is passed to the drop method, the column is 
not dropped. However, passing a column object with the same name and alias 
works.
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = 
SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
j = df.drop("a.hour")
print(j)  # DataFrame[any: bigint, hour: bigint]

jj = df.drop(F.col("a.hour"))
print(jj)  # DataFrame[any: bigint]
{code}
 

Related issues:

https://issues.apache.org/jira/browse/SPARK-31123

https://issues.apache.org/jira/browse/SPARK-14759

 

  was:
When a string containing an alias is passed to the drop method, the column is 
not dropped. However, passing a column object with the same name and alias 
works.

{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = 
SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
df1 = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
j = df1.drop("a.hour")
print(j)  # DataFrame[any: bigint, hour: bigint]

jj = df1.drop(F.col("a.hour"))
print(jj)  # DataFrame[any: bigint]
{code}
 

Related issues:

https://issues.apache.org/jira/browse/SPARK-31123

https://issues.apache.org/jira/browse/SPARK-14759

 


> Drop does not work when passed a string with an alias
> -
>
> Key: SPARK-43439
> URL: https://issues.apache.org/jira/browse/SPARK-43439
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.2
>Reporter: Frederik Paradis
>Priority: Major
>
> When a string containing an alias is passed to the drop method, the column 
> is not dropped. However, passing a column object with the same name and 
> alias works.
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as F
> spark = 
> SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
> df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
> j = df.drop("a.hour")
> print(j)  # DataFrame[any: bigint, hour: bigint]
> jj = df.drop(F.col("a.hour"))
> print(jj)  # DataFrame[any: bigint]
> {code}
>  
> Related issues:
> https://issues.apache.org/jira/browse/SPARK-31123
> https://issues.apache.org/jira/browse/SPARK-14759
>  






[jira] [Updated] (SPARK-43439) Drop does not work when passed a string with an alias

2023-05-10 Thread Frederik Paradis (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frederik Paradis updated SPARK-43439:
-
Description: 
When a string containing an alias is passed to the drop method, the column is 
not dropped. However, passing a column object with the same name and alias 
works.
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = 
SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()

df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
j = df.drop("a.hour")
print(j)  # DataFrame[any: bigint, hour: bigint]

jj = df.drop(F.col("a.hour"))
print(jj)  # DataFrame[any: bigint]
{code}
 

Related issues:

https://issues.apache.org/jira/browse/SPARK-31123

https://issues.apache.org/jira/browse/SPARK-14759

 

  was:
When a string containing an alias is passed to the drop method, the column is 
not dropped. However, passing a column object with the same name and alias 
works.
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = 
SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
j = df.drop("a.hour")
print(j)  # DataFrame[any: bigint, hour: bigint]

jj = df.drop(F.col("a.hour"))
print(jj)  # DataFrame[any: bigint]
{code}
 

Related issues:

https://issues.apache.org/jira/browse/SPARK-31123

https://issues.apache.org/jira/browse/SPARK-14759

 


> Drop does not work when passed a string with an alias
> -
>
> Key: SPARK-43439
> URL: https://issues.apache.org/jira/browse/SPARK-43439
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.2
>Reporter: Frederik Paradis
>Priority: Major
>
> When a string containing an alias is passed to the drop method, the column 
> is not dropped. However, passing a column object with the same name and 
> alias works.
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as F
> spark = 
> SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
> df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
> j = df.drop("a.hour")
> print(j)  # DataFrame[any: bigint, hour: bigint]
> jj = df.drop(F.col("a.hour"))
> print(jj)  # DataFrame[any: bigint]
> {code}
>  
> Related issues:
> https://issues.apache.org/jira/browse/SPARK-31123
> https://issues.apache.org/jira/browse/SPARK-14759
>  






[jira] [Updated] (SPARK-43439) Drop does not work when passed a string with an alias

2023-05-10 Thread Frederik Paradis (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frederik Paradis updated SPARK-43439:
-
Description: 
When a string containing an alias is passed to the drop method, the column is 
not dropped. However, passing a column object with the same name and alias 
works.
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = 
SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()

df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")

j = df.drop("a.hour")
print(j)  # DataFrame[any: bigint, hour: bigint]

jj = df.drop(F.col("a.hour"))
print(jj)  # DataFrame[any: bigint]
{code}
 

Related issues:

https://issues.apache.org/jira/browse/SPARK-31123

https://issues.apache.org/jira/browse/SPARK-14759

 

  was:
When a string containing an alias is passed to the drop method, the column is 
not dropped. However, passing a column object with the same name and alias 
works.
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = 
SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()

df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
j = df.drop("a.hour")
print(j)  # DataFrame[any: bigint, hour: bigint]

jj = df.drop(F.col("a.hour"))
print(jj)  # DataFrame[any: bigint]
{code}
 

Related issues:

https://issues.apache.org/jira/browse/SPARK-31123

https://issues.apache.org/jira/browse/SPARK-14759

 


> Drop does not work when passed a string with an alias
> -
>
> Key: SPARK-43439
> URL: https://issues.apache.org/jira/browse/SPARK-43439
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.2
>Reporter: Frederik Paradis
>Priority: Major
>
> When a string containing an alias is passed to the drop method, the column 
> is not dropped. However, passing a column object with the same name and 
> alias works.
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as F
> spark = 
> SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
> df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a")
> j = df.drop("a.hour")
> print(j)  # DataFrame[any: bigint, hour: bigint]
> jj = df.drop(F.col("a.hour"))
> print(jj)  # DataFrame[any: bigint]
> {code}
>  
> Related issues:
> https://issues.apache.org/jira/browse/SPARK-31123
> https://issues.apache.org/jira/browse/SPARK-14759
>  






[jira] [Created] (SPARK-43438) Fix mismatched column list error on INSERT

2023-05-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-43438:


 Summary: Fix mismatched column list error on INSERT
 Key: SPARK-43438
 URL: https://issues.apache.org/jira/browse/SPARK-43438
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


This error message is pretty bad, and common:
"_LEGACY_ERROR_TEMP_1038" : {
"message" : [
"Cannot write to table due to mismatched user specified column 
size() and data column size()."
]
},

It can perhaps be merged with this one, after giving it an ERROR_CLASS:

"_LEGACY_ERROR_TEMP_1168" : {
"message" : [
" requires that the data to be inserted have the same number of 
columns as the target table: target table has  column(s) but the 
inserted data has  column(s), including  
partition column(s) having constant value(s)."
]
},



Repro:

CREATE TABLE tabtest(c1 INT, c2 INT);


INSERT INTO tabtest SELECT 1;

`spark_catalog`.`default`.`tabtest` requires that the data to be inserted have 
the same number of columns as the target table: target table has 2 column(s) 
but the inserted data has 1 column(s), including 0 partition column(s) having 
constant value(s).

INSERT INTO tabtest(c1) SELECT 1, 2, 3;
Cannot write to table due to mismatched user specified column size(1) and data 
column size(3).; line 1 pos 24


 






[jira] [Commented] (SPARK-35198) Add support for calling debugCodegen from Python & Java

2023-05-10 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721475#comment-17721475
 ] 

Ignite TC Bot commented on SPARK-35198:
---

User 'juanvisoler' has created a pull request for this issue:
https://github.com/apache/spark/pull/40608

> Add support for calling debugCodegen from Python & Java
> ---
>
> Key: SPARK-35198
> URL: https://issues.apache.org/jira/browse/SPARK-35198
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.2.0
>Reporter: Holden Karau
>Priority: Minor
>  Labels: starter
>
> Because it is implemented with an implicit conversion, it is a bit 
> complicated to call; we should add a direct method to get debug state for 
> Java & Python users of DataFrames.
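
For reference, the reach-through PySpark users fall back on today; a sketch
that relies on the private `_jdf` handle and an active `spark` session, not a
public API:

{code:python}
df = spark.range(10).selectExpr("id * 2 AS doubled")

# QueryExecution exposes a `debug` object on the JVM side; py4j lets us call
# it from Python, printing the generated Java code for each codegen subtree.
df._jdf.queryExecution().debug().codegen()
{code}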






[jira] [Created] (SPARK-43437) Upgrade Arrow to 12.0.0

2023-05-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-43437:


 Summary: Upgrade Arrow to 12.0.0
 Key: SPARK-43437
 URL: https://issues.apache.org/jira/browse/SPARK-43437
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-43014) Support spark.kubernetes.setSubmitTimeInDriver

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-43014:
--
Affects Version/s: 3.5.0
   (was: 3.4.0)
   (was: 3.3.2)

> Support spark.kubernetes.setSubmitTimeInDriver
> --
>
> Key: SPARK-43014
> URL: https://issues.apache.org/jira/browse/SPARK-43014
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Zhou Yifan
>Assignee: Zhou Yifan
>Priority: Minor
> Fix For: 3.5.0
>
>
> If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be 
> overwritten when the driver starts.
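
Hypothetical usage of the flag named in the summary; the semantics (when false,
the driver keeps the submit-side `spark.app.submitTime` instead of overwriting
it) are inferred from the title and description, not confirmed in this thread:

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Assumed semantics of the new flag from this ticket.
    .config("spark.kubernetes.setSubmitTimeInDriver", "false")
    .getOrCreate()
)
print(spark.conf.get("spark.app.submitTime"))
{code}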






[jira] [Updated] (SPARK-43014) Support spark.kubernetes.setSubmitTimeInDriver

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-43014:
--
Summary: Support spark.kubernetes.setSubmitTimeInDriver  (was: 
spark.app.submitTime is not right in k8s cluster mode)

> Support spark.kubernetes.setSubmitTimeInDriver
> --
>
> Key: SPARK-43014
> URL: https://issues.apache.org/jira/browse/SPARK-43014
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Zhou Yifan
>Assignee: Zhou Yifan
>Priority: Major
> Fix For: 3.5.0
>
>
> If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be 
> overwritten when the driver starts.






[jira] [Updated] (SPARK-43014) Support spark.kubernetes.setSubmitTimeInDriver

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-43014:
--
Priority: Minor  (was: Major)

> Support spark.kubernetes.setSubmitTimeInDriver
> --
>
> Key: SPARK-43014
> URL: https://issues.apache.org/jira/browse/SPARK-43014
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Zhou Yifan
>Assignee: Zhou Yifan
>Priority: Minor
> Fix For: 3.5.0
>
>
> If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be 
> overwritten when the driver starts.






[jira] [Updated] (SPARK-43014) Support spark.kubernetes.setSubmitTimeInDriver

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-43014:
--
Issue Type: Improvement  (was: Bug)

> Support spark.kubernetes.setSubmitTimeInDriver
> --
>
> Key: SPARK-43014
> URL: https://issues.apache.org/jira/browse/SPARK-43014
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Zhou Yifan
>Assignee: Zhou Yifan
>Priority: Major
> Fix For: 3.5.0
>
>
> If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be 
> overwritten when the driver starts.






[jira] [Resolved] (SPARK-43014) spark.app.submitTime is not right in k8s cluster mode

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-43014.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40645
[https://github.com/apache/spark/pull/40645]

> spark.app.submitTime is not right in k8s cluster mode
> -
>
> Key: SPARK-43014
> URL: https://issues.apache.org/jira/browse/SPARK-43014
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Zhou Yifan
>Assignee: Zhou Yifan
>Priority: Major
> Fix For: 3.5.0
>
>
> If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be 
> overwritten when the driver starts.






[jira] [Assigned] (SPARK-43014) spark.app.submitTime is not right in k8s cluster mode

2023-05-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-43014:
-

Assignee: Zhou Yifan

> spark.app.submitTime is not right in k8s cluster mode
> -
>
> Key: SPARK-43014
> URL: https://issues.apache.org/jira/browse/SPARK-43014
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Zhou Yifan
>Assignee: Zhou Yifan
>Priority: Major
>
> If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be 
> overwritten when the driver starts.






[jira] [Created] (SPARK-43436) Upgrade rocksdbjni to 8.1.1.1

2023-05-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-43436:


 Summary: Upgrade rocksdbjni to 8.1.1.1
 Key: SPARK-43436
 URL: https://issues.apache.org/jira/browse/SPARK-43436
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: Yang Jie


https://github.com/facebook/rocksdb/releases/tag/v8.1.1






[jira] [Assigned] (SPARK-43405) Remove useless code in `ScriptInputOutputSchema`

2023-05-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-43405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-43405:
-

Assignee: Jia Fan

> Remove useless code in `ScriptInputOutputSchema`
> 
>
> Key: SPARK-43405
> URL: https://issues.apache.org/jira/browse/SPARK-43405
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Minor
>
> In the case class `ScriptInputOutputSchema`, some methods like 
> `getRowFormatSQL` are never used, so we can remove them.






[jira] [Resolved] (SPARK-43405) Remove useless code in `ScriptInputOutputSchema`

2023-05-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-43405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-43405.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Remove useless code in `ScriptInputOutputSchema`
> 
>
> Key: SPARK-43405
> URL: https://issues.apache.org/jira/browse/SPARK-43405
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Minor
> Fix For: 3.5.0
>
>
> In the case class `ScriptInputOutputSchema`, some methods like 
> `getRowFormatSQL` are never used, so we can remove them.






[jira] [Assigned] (SPARK-40912) Overhead of Exceptions in DeserializationStream

2023-05-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40912:


Assignee: Emil Ejbyfeldt

> Overhead of Exceptions in DeserializationStream 
> 
>
> Key: SPARK-40912
> URL: https://issues.apache.org/jira/browse/SPARK-40912
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Minor
>
> The interface of DeserializationStream forces implementations to raise 
> EOFException to indicate that there is no more data. For 
> KryoDeserializationStream it is even worse: since the kryo library does not 
> raise EOFException, we pay the price of two exceptions for each stream. For 
> large shuffles with lots of small streams this is quite a large overhead (a 
> couple % of cpu time has been seen). It is also less safe to depend on 
> exceptions, as they might be raised for different reasons, such as corrupt 
> data, which currently causes data loss.
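
A Spark-free Python sketch of the cost pattern described above: end-of-stream
is signalled by an exception, and the raise/catch cost is paid once per stream
across many small streams:

{code:python}
import io
import pickle
import time

# One small serialized stream of ten values.
buf = io.BytesIO()
for i in range(10):
    pickle.dump(i, buf)
payload = buf.getvalue()

def read_all(data):
    stream, out = io.BytesIO(data), []
    while True:
        try:
            out.append(pickle.load(stream))
        except EOFError:  # end of stream is signalled via an exception
            break
    return out

start = time.time()
for _ in range(100_000):  # many small streams, as in a large shuffle
    read_all(payload)
print(f"{time.time() - start:.2f}s")
{code}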






[jira] [Resolved] (SPARK-40912) Overhead of Exceptions in DeserializationStream

2023-05-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40912.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 38428
[https://github.com/apache/spark/pull/38428]

> Overhead of Exceptions in DeserializationStream 
> 
>
> Key: SPARK-40912
> URL: https://issues.apache.org/jira/browse/SPARK-40912
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Minor
> Fix For: 3.5.0
>
>
> The interface of DeserializationStream forces implementations to raise 
> EOFException to indicate that there is no more data. For 
> KryoDeserializationStream it is even worse: since the kryo library does not 
> raise EOFException, we pay the price of two exceptions for each stream. For 
> large shuffles with lots of small streams this is quite a large overhead (a 
> couple % of cpu time has been seen). It is also less safe to depend on 
> exceptions, as they might be raised for different reasons, such as corrupt 
> data, which currently causes data loss.






[jira] [Assigned] (SPARK-43434) Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`

2023-05-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43434:


Assignee: Ruifeng Zheng

> Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`
> ---
>
> Key: SPARK-43434
> URL: https://issues.apache.org/jira/browse/SPARK-43434
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>







[jira] [Resolved] (SPARK-43434) Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`

2023-05-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43434.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41114
[https://github.com/apache/spark/pull/41114]

> Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`
> ---
>
> Key: SPARK-43434
> URL: https://issues.apache.org/jira/browse/SPARK-43434
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-37942) Use error classes in the compilation errors of properties

2023-05-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-37942:


Assignee: Narek Karapetian

> Use error classes in the compilation errors of properties
> -
>
> Key: SPARK-37942
> URL: https://issues.apache.org/jira/browse/SPARK-37942
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Narek Karapetian
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * cannotReadCorruptedTablePropertyError
> * cannotCreateJDBCNamespaceWithPropertyError
> * cannotSetJDBCNamespaceWithPropertyError
> * cannotUnsetJDBCNamespaceWithPropertyError
> * alterTableSerDePropertiesNotSupportedForV2TablesError
> * unsetNonExistentPropertyError
> onto error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.






[jira] [Resolved] (SPARK-37942) Use error classes in the compilation errors of properties

2023-05-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-37942.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41018
[https://github.com/apache/spark/pull/41018]

> Use error classes in the compilation errors of properties
> -
>
> Key: SPARK-37942
> URL: https://issues.apache.org/jira/browse/SPARK-37942
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Narek Karapetian
>Priority: Major
> Fix For: 3.5.0
>
>
> Migrate the following errors in QueryCompilationErrors:
> * cannotReadCorruptedTablePropertyError
> * cannotCreateJDBCNamespaceWithPropertyError
> * cannotSetJDBCNamespaceWithPropertyError
> * cannotUnsetJDBCNamespaceWithPropertyError
> * alterTableSerDePropertiesNotSupportedForV2TablesError
> * unsetNonExistentPropertyError
> onto error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.






[jira] [Commented] (SPARK-43386) Improve list of suggested column/attributes in `UNRESOLVED_COLUMN.WITH_SUGGESTION` error class

2023-05-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721264#comment-17721264
 ] 

ASF GitHub Bot commented on SPARK-43386:


User 'vitaliili-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41038

> Improve list of suggested column/attributes in 
> `UNRESOLVED_COLUMN.WITH_SUGGESTION` error class
> --
>
> Key: SPARK-43386
> URL: https://issues.apache.org/jira/browse/SPARK-43386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Vitalii Li
>Priority: Major
>
> Match the style of the unresolved column/attribute when sorting the list of 
> suggested columns. If an unresolved column name is a single-part identifier, 
> use the same style for the suggested columns.
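
An illustrative trigger for this error class, assuming an active `spark`
session (output abridged):

{code:python}
df = spark.createDataFrame([(1, 2)], ["id", "value"])

# The unresolved name is a single-part identifier, so the suggestions should
# be rendered in single-part style as well, e.g. [`id`, `value`].
df.select("vlaue")
# pyspark.errors.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] ...
{code}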






[jira] [Commented] (SPARK-43386) Improve list of suggested column/attributes in `UNRESOLVED_COLUMN.WITH_SUGGESTION` error class

2023-05-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721263#comment-17721263
 ] 

ASF GitHub Bot commented on SPARK-43386:


User 'vitaliili-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41038

> Improve list of suggested column/attributes in 
> `UNRESOLVED_COLUMN.WITH_SUGGESTION` error class
> --
>
> Key: SPARK-43386
> URL: https://issues.apache.org/jira/browse/SPARK-43386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Vitalii Li
>Priority: Major
>
> Match the style of the unresolved column/attribute when sorting the list of 
> suggested columns. If an unresolved column name is a single-part identifier, 
> use the same style for the suggested columns.






[jira] [Assigned] (SPARK-43422) Tags are lost on LogicalRelation when adding _metadata

2023-05-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43422:


Assignee: Jan-Ole Sasse

> Tags are lost on LogicalRelation when adding _metadata
> --
>
> Key: SPARK-43422
> URL: https://issues.apache.org/jira/browse/SPARK-43422
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.4.0
>Reporter: Jan-Ole Sasse
>Assignee: Jan-Ole Sasse
>Priority: Minor
>
> The AddMetadataColumns rule does not copy tags for the LogicalRelation when 
> adding metadata output in addMetadataCol.






[jira] [Resolved] (SPARK-43422) Tags are lost on LogicalRelation when adding _metadata

2023-05-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43422.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41104
[https://github.com/apache/spark/pull/41104]

> Tags are lost on LogicalRelation when adding _metadata
> --
>
> Key: SPARK-43422
> URL: https://issues.apache.org/jira/browse/SPARK-43422
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.4.0
>Reporter: Jan-Ole Sasse
>Assignee: Jan-Ole Sasse
>Priority: Minor
> Fix For: 3.5.0
>
>
> The AddMetadataColumns rule does not copy tags for the LogicalRelation when 
> adding metadata output in addMetadataCol.






[jira] [Created] (SPARK-43435) re-enable doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`

2023-05-10 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-43435:
-

 Summary: re-enable doctest 
`pyspark.sql.connect.dataframe.DataFrame.writeStream`
 Key: SPARK-43435
 URL: https://issues.apache.org/jira/browse/SPARK-43435
 Project: Spark
  Issue Type: Test
  Components: Connect, Tests
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-43434) Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`

2023-05-10 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-43434:
-

 Summary: Disable flaky doctest 
`pyspark.sql.connect.dataframe.DataFrame.writeStream`
 Key: SPARK-43434
 URL: https://issues.apache.org/jira/browse/SPARK-43434
 Project: Spark
  Issue Type: Test
  Components: Connect, Tests
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-43427) Unsigned integer types are deserialized as signed numeric equivalents

2023-05-10 Thread Parth Upadhyay (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Upadhyay updated SPARK-43427:
---
Description: 
I'm not sure if "bug" is the correct tag for this Jira, but I've tagged it like 
that for now since the behavior seems odd; happy to update to "improvement" or 
something else based on the conversation!

h2. Issue

Protobuf supports unsigned integer types, including `uint32` and `uint64`. When 
deserializing protobuf values with fields of these types, uint32 is converted 
to `IntegerType` and uint64 is converted to `LongType` in the resulting spark 
struct. `IntegerType` and `LongType` are 
[signed|https://spark.apache.org/docs/latest/sql-ref-datatypes.html] integer 
types, so this can lead to confusing results.

Namely, if a uint32 value in a stored proto is above 2^31 or a uint64 value is 
above 2^63, their representation in binary will contain a 1 in the highest bit, 
which when interpreted as a signed integer will come out as negative (i.e. 
overflow).

I propose that we deserialize unsigned integer types into a type that can 
contain them correctly, e.g.
uint32 => `LongType`
uint64 => `Decimal(20, 0)`
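
A quick illustration of the overflow outside Spark: the same four bytes read
back as unsigned and as signed:

{code:python}
import struct

raw = struct.pack("<I", 3_000_000_000)  # a uint32 value above 2**31
print(struct.unpack("<i", raw)[0])      # -1294967296 when read as int32
{code}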

h2. Backwards Compatibility / Default Behavior
Should we maintain backwards compatibility and add an option that allows 
deserializing these types differently? Or should we change the default 
behavior (with an option to go back to the old way)? 

I think by default it makes more sense to deserialize them as the larger types 
so that it's semantically more correct. However, there may be existing users of 
this library that would be affected by this behavior change. Though, maybe we 
can justify the change since the function is tagged as `Experimental` (and 
spark 3.4.0 was only released very recently).

h2. Precedent
I believe that unsigned integer types in parquet are deserialized in a similar 
manner, i.e. put into a larger type so that the unsigned representation 
natively fits. https://issues.apache.org/jira/browse/SPARK-34817 and 
https://github.com/apache/spark/pull/31921

  was:
h2. Issue

Protobuf supports unsigned integer types, including `uint32` and `uint64`. When 
deserializing protobuf values with fields of these types, uint32 is converted 
to `IntegerType` and uint64 is converted to `LongType` in the resulting spark 
struct. `IntegerType` and `LongType` are 
[signed|https://spark.apache.org/docs/latest/sql-ref-datatypes.html] integer 
types, so this can lead to confusing results.

Namely, if a uint32 value in a stored proto is above 2^31 or a uint64 value is 
above 2^63, their representation in binary will contain a 1 in the highest bit, 
which when interpreted as a signed integer will come out as negative (i.e. 
overflow).

I propose that we deserialize unsigned integer types into a type that can 
contain them correctly, e.g.
uint32 => `LongType`
uint64 => `Decimal(20, 0)`

h2. Backwards Compatibility / Default Behavior
Should we maintain backwards compatibility and add an option that allows 
deserializing these types differently? Or should we change the default 
behavior (with an option to go back to the old way)? 

I think by default it makes more sense to deserialize them as the larger types 
so that it's semantically more correct. However, there may be existing users of 
this library that would be affected by this behavior change. Though, maybe we 
can justify the change since the function is tagged as `Experimental` (and 
spark 3.4.0 was only released very recently).

h2. Precedent
I believe that unsigned integer types in parquet are deserialized in a similar 
manner, i.e. put into a larger type so that the unsigned representation 
natively fits. https://issues.apache.org/jira/browse/SPARK-34817 and 
https://github.com/apache/spark/pull/31921


> Unsigned integer types are deserialized as signed numeric equivalents
> -
>
> Key: SPARK-43427
> URL: https://issues.apache.org/jira/browse/SPARK-43427
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Parth Upadhyay
>Priority: Major
>
> I'm not sure if "bug" is the correct tag for this Jira, but I've tagged it 
> like that for now since the behavior seems odd; happy to update to 
> "improvement" or something else based on the conversation!
> h2. Issue
> Protobuf supports unsigned integer types, including `uint32` and `uint64`. 
> When deserializing protobuf values with fields of these types, uint32 is 
> converted to `IntegerType` and uint64 is converted to `LongType` in the 
> resulting spark struct. `IntegerType` and `LongType` are 
> [signed|https://spark.apache.org/docs/latest/sql-ref-datatypes.html] integer 
> types, so this can lead to confusing results.
> Namely, if a uint32 value in a stored proto is above 2^31 or a uint64 value 
> is above 2^63, their representation in binary will contain a 1 in the highest 
> bit, which when interpreted as a signed integer will come out as negative 
> (i.e. overflow).

[jira] [Commented] (SPARK-43357) Spark AWS Glue date partition push down broken

2023-05-10 Thread Stijn De Haes (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721219#comment-17721219
 ] 

Stijn De Haes commented on SPARK-43357:
---

Any chance we could backport this to older versions? 3.1, 3.2, and 3.4 all have 
the same issue. I don't know which versions are actively supported.
I am willing to make new PRs for these older versions if needed.

> Spark AWS Glue date partition push down broken
> --
>
> Key: SPARK-43357
> URL: https://issues.apache.org/jira/browse/SPARK-43357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 
> 3.3.1, 3.2.3, 3.2.4, 3.3.2
>Reporter: Stijn De Haes
>Assignee: Stijn De Haes
>Priority: Major
> Fix For: 3.5.0
>
>
> When using the following project: 
> [https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore]
> To have Glue supported as a Hive metastore for Spark, there is an issue 
> when reading a date-partitioned data set. Writing is fine.
> You get the following error: 
> {quote}org.apache.hadoop.hive.metastore.api.InvalidObjectException: 
> Unsupported expression '2023 - 05 - 03' (Service: AWSGlue; Status Code: 400; 
> Error Code: InvalidInputException; Request ID: 
> beed68c6-b228-442e-8783-52c25b9d2243; Proxy: null)
> {quote}
>  
> A fix for this is making sure the date passed to Glue is quoted.






[jira] [Updated] (SPARK-43425) Add TimestampNTZType to ColumnarBatchRow

2023-05-10 Thread Fokko Driesprong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong updated SPARK-43425:
-
Issue Type: Bug  (was: Improvement)

> Add TimestampNTZType to ColumnarBatchRow
> 
>
> Key: SPARK-43425
> URL: https://issues.apache.org/jira/browse/SPARK-43425
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-43400) Add Primary Key syntax support

2023-05-10 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43400:
--
Summary: Add Primary Key syntax support  (was: create table support the 
PRIMARY KEY keyword)

> Add Primary Key syntax support
> --
>
> Key: SPARK-43400
> URL: https://issues.apache.org/jira/browse/SPARK-43400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
Apache Paimon and Hudi support primary key definitions. It is necessary to 
support the primary key definition syntax:
> https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint#constraint-properties
> [~gurwls223] 
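
The kind of statement this would enable; the syntax below is hypothetical and
not accepted by Spark today (assumes a session with a Paimon catalog
configured):

{code:python}
spark.sql("""
    CREATE TABLE orders (
        order_id BIGINT PRIMARY KEY,
        amount   DECIMAL(10, 2)
    ) USING paimon
""")
{code}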


