[jira] [Commented] (SPARK-44280) Add convertJavaTimestampToTimestamp in JDBCDialect API

2023-07-05 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740393#comment-17740393
 ] 

Snoot.io commented on SPARK-44280:
--

User 'mingkangli-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41843

> Add convertJavaTimestampToTimestamp in JDBCDialect API
> --
>
> Key: SPARK-44280
> URL: https://issues.apache.org/jira/browse/SPARK-44280
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Mingkang Li
>Priority: Major
>
> A new method, {{convertJavaTimestampToTimestamp}}, is introduced to the 
> JDBCDialects API, giving JDBC dialects the ability to override the default 
> Java timestamp conversion behavior. This enhancement is particularly 
> beneficial for databases such as PostgreSQL, which use special values to 
> represent positive- and negative-infinity timestamps. 
> The pre-existing default conversion can overflow on these special values 
> (i.e., the executor crashes if you select a column that contains infinity 
> timestamps in PostgreSQL). By integrating this new function, we can mitigate 
> such issues, enabling more versatile and robust timestamp conversions across 
> JDBC-based connectors.
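
As a rough illustration, a dialect could override the new hook like this (a minimal sketch: the method signature is inferred from the description above, and the clamp values are illustrative, not the PR's actual choice):

{code:scala}
import java.sql.Timestamp

import org.apache.spark.sql.jdbc.JdbcDialect

// Sketch only: the hook name comes from this ticket; the signature and the
// clamping strategy are assumptions, not the merged implementation.
case object PostgresInfinityAwareDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")

  // The PostgreSQL JDBC driver maps 'infinity'/'-infinity' to Long.MaxValue /
  // Long.MinValue millis, which overflow Spark's microsecond TimestampType.
  override def convertJavaTimestampToTimestamp(t: Timestamp): Timestamp =
    t.getTime match {
      case Long.MaxValue => Timestamp.valueOf("9999-12-31 23:59:59")
      case Long.MinValue => Timestamp.valueOf("0001-01-01 00:00:00")
      case _ => t
    }
}
{code}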






[jira] [Commented] (SPARK-44317) Define the computing logic through PartitionEvaluator API and use it in ShuffledHashJoinExec

2023-07-05 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740392#comment-17740392
 ] 

Snoot.io commented on SPARK-44317:
--

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/41875

> Define the computing logic through PartitionEvaluator API and use it in 
> ShuffledHashJoinExec
> 
>
> Key: SPARK-44317
> URL: https://issues.apache.org/jira/browse/SPARK-44317
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> ShuffledHashJoinExec









[jira] [Created] (SPARK-44317) Define the computing logic through PartitionEvaluator API and use it in ShuffledHashJoinExec

2023-07-05 Thread Vinod KC (Jira)
Vinod KC created SPARK-44317:


 Summary: Define the computing logic through PartitionEvaluator API 
and use it in ShuffledHashJoinExec
 Key: SPARK-44317
 URL: https://issues.apache.org/jira/browse/SPARK-44317
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC


Define the computing logic through PartitionEvaluator API and use it in 
ShuffledHashJoinExec
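
For context, this is the general shape of the PartitionEvaluator pattern (a hypothetical evaluator for illustration; the actual ShuffledHashJoinExec factory in the PR will differ):

{code:scala}
import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}

// Hypothetical illustration: the computing logic lives in an evaluator that
// is created on the executor, rather than in a closure over the plan node.
class UpperCaseEvaluatorFactory extends PartitionEvaluatorFactory[String, String] {
  override def createEvaluator(): PartitionEvaluator[String, String] =
    new PartitionEvaluator[String, String] {
      override def eval(partitionIndex: Int,
          inputs: Iterator[String]*): Iterator[String] =
        inputs(0).map(_.toUpperCase)
    }
}

// Usage: rdd.mapPartitionsWithEvaluator(new UpperCaseEvaluatorFactory)
// For a binary operator like ShuffledHashJoinExec, the streamed and build
// RDDs would instead be combined with zipPartitionsWithEvaluator, so both
// sides arrive through `inputs`.
{code}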






[jira] [Commented] (SPARK-44268) Add tests to ensure error-classes.json and docs are in sync

2023-07-05 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740388#comment-17740388
 ] 

Snoot.io commented on SPARK-44268:
--

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/41865

> Add tests to ensure error-classes.json and docs are in sync
> ---
>
> Key: SPARK-44268
> URL: https://issues.apache.org/jira/browse/SPARK-44268
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
> Fix For: 3.5.0
>
>
> We should add tests to ensure error-classes.json and the docs are in sync, 
> so that both are always up to date before a PR is merged.
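
Roughly, such a test could diff the two sources of truth (a hedged sketch: the file paths and the doc-heading regex are assumptions; the real test would reuse Spark's error-class reader):

{code:scala}
import scala.io.Source

// Hedged sketch: assert every error class in error-classes.json has a
// section in the generated docs page. Paths and regexes are illustrative.
object ErrorClassDocSyncCheck {
  def main(args: Array[String]): Unit = {
    val json = Source.fromFile("core/src/main/resources/error/error-classes.json").mkString
    val docs = Source.fromFile("docs/sql-error-conditions.md").mkString
    val inJson = "\"([A-Z][A-Z_0-9]+)\" :".r.findAllMatchIn(json).map(_.group(1)).toSet
    val inDocs = "### ([A-Z][A-Z_0-9]+)".r.findAllMatchIn(docs).map(_.group(1)).toSet
    val missing = inJson.diff(inDocs)
    assert(missing.isEmpty, s"Error classes missing from docs: $missing")
  }
}
{code}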






[jira] [Commented] (SPARK-44314) Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`

2023-07-05 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740386#comment-17740386
 ] 

Snoot.io commented on SPARK-44314:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/41872

> Add a new checkstyle rule to prohibit the use of `@Test(expected = 
> SomeException.class)`
> 
>
> Key: SPARK-44314
> URL: https://issues.apache.org/jira/browse/SPARK-44314
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> [https://github.com/junit-team/junit4/wiki/Exception-testing]
>  
> {code:java}
> The expected parameter should be used with care. The above test will pass if 
> any code in the method throws IndexOutOfBoundsException. Using the method you 
> also cannot test the value of the message in the exception, or the state of a 
> domain object after the exception has been thrown. For these reasons, the 
> previous approaches are recommended. {code}






[jira] [Created] (SPARK-44316) Upgrade Jersey to 2.40

2023-07-05 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44316:
---

 Summary: Upgrade Jersey to 2.40
 Key: SPARK-44316
 URL: https://issues.apache.org/jira/browse/SPARK-44316
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-44252) Add error class for the case when loading state from DFS fails

2023-07-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740381#comment-17740381
 ] 

Hudson commented on SPARK-44252:


User 'lucyyao-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41705

> Add error class for the case when loading state from DFS fails
> --
>
> Key: SPARK-44252
> URL: https://issues.apache.org/jira/browse/SPARK-44252
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Lucy Yao
>Priority: Major
>
> This is part of [https://github.com/apache/spark/pull/41705].
> Wrap the exception thrown while loading state, so that a proper error class 
> is assigned. Classifying these errors helps us determine which failures 
> customers struggle with most. 
> StateStoreProvider.getStore() & StateStoreProvider.getReadStore() are the 
> entry points.
> This ticket also covers failedToReadDeltaFileError and 
> failedToReadSnapshotFileError from 
> [https://issues.apache.org/jira/browse/SPARK-36305].
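
The wrapping idea, sketched (all names here are hypothetical, including the error class; the real classes come from the PR):

{code:scala}
import org.apache.spark.SparkException

// Hypothetical sketch: wrap a state-store load failure in an exception that
// carries an error class, so failures can be classified and counted.
def loadStore[T](version: Long)(load: Long => T): T =
  try {
    load(version)
  } catch {
    case e: Exception =>
      throw new SparkException(
        errorClass = "CANNOT_LOAD_STATE_STORE.UNCLASSIFIED", // hypothetical name
        messageParameters = Map("version" -> version.toString),
        cause = e)
  }
{code}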






[jira] [Created] (SPARK-44315) Move DefinedByConstructorParams to sql/api

2023-07-05 Thread Rui Wang (Jira)
Rui Wang created SPARK-44315:


 Summary: Move DefinedByConstructorParams to sql/api
 Key: SPARK-44315
 URL: https://issues.apache.org/jira/browse/SPARK-44315
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Commented] (SPARK-44303) Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]

2023-07-05 Thread Mike K (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740380#comment-17740380
 ] 

Mike K commented on SPARK-44303:


User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41863

> Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
> --
>
> Key: SPARK-44303
> URL: https://issues.apache.org/jira/browse/SPARK-44303
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Created] (SPARK-44314) Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`

2023-07-05 Thread Yang Jie (Jira)
Yang Jie created SPARK-44314:


 Summary: Add a new checkstyle rule to prohibit the use of 
`@Test(expected = SomeException.class)`
 Key: SPARK-44314
 URL: https://issues.apache.org/jira/browse/SPARK-44314
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: Yang Jie


[https://github.com/junit-team/junit4/wiki/Exception-testing]

 
{code:java}
The expected parameter should be used with care. The above test will pass if 
any code in the method throws IndexOutOfBoundsException. Using the method you 
also cannot test the value of the message in the exception, or the state of a 
domain object after the exception has been thrown. For these reasons, the 
previous approaches are recommended. {code}
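
For reference, the pattern the rule would steer tests toward (a Scala sketch using JUnit 4.13's Assert.assertThrows; Spark's Java tests would do the equivalent in Java):

{code:scala}
import org.junit.Assert.{assertThrows, assertTrue}
import org.junit.Test

class IndexSuite {
  @Test
  def indexOutOfBounds(): Unit = {
    // Unlike @Test(expected = ...), the exception is pinned to one statement,
    // and its message and any resulting state can be inspected afterwards.
    val e = assertThrows(classOf[IndexOutOfBoundsException],
      () => new java.util.ArrayList[String]().get(0))
    assertTrue(e.getMessage.contains("Index"))
  }
}
{code}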






[jira] [Resolved] (SPARK-44313) Generated column expression validation fails if there is a char/varchar column anywhere in the schema

2023-07-05 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-44313.
--
Fix Version/s: 3.5.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 41868
[https://github.com/apache/spark/pull/41868]

> Generated column expression validation fails if there is a char/varchar 
> column anywhere in the schema
> -
>
> Key: SPARK-44313
> URL: https://issues.apache.org/jira/browse/SPARK-44313
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Allison Portis
>Assignee: Allison Portis
>Priority: Major
> Fix For: 3.5.0, 3.4.2
>
>
> When validating generated column expressions, the call to checkAnalysis at 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala#L123
> fails when there are char or varchar columns anywhere in the schema.
>  
> For example, this query will fail
> {code:java}
> CREATE TABLE default.example (
> name VARCHAR(64),
> tstamp TIMESTAMP,
> tstamp_date DATE GENERATED ALWAYS AS (CAST(tstamp as DATE))
> ){code}
>  






[jira] [Assigned] (SPARK-44313) Generated column expression validation fails if there is a char/varchar column anywhere in the schema

2023-07-05 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-44313:


Assignee: Allison Portis

> Generated column expression validation fails if there is a char/varchar 
> column anywhere in the schema
> -
>
> Key: SPARK-44313
> URL: https://issues.apache.org/jira/browse/SPARK-44313
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Allison Portis
>Assignee: Allison Portis
>Priority: Major
>
> When validating generated column expressions, the call to checkAnalysis at 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala#L123
> fails when there are char or varchar columns anywhere in the schema.
>  
> For example, this query will fail
> {code:java}
> CREATE TABLE default.example (
> name VARCHAR(64),
> tstamp TIMESTAMP,
> tstamp_date DATE GENERATED ALWAYS AS (CAST(tstamp as DATE))
> ){code}
>  






[jira] [Commented] (SPARK-44215) Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks

2023-07-05 Thread Mridul Muralidharan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740368#comment-17740368
 ] 

Mridul Muralidharan commented on SPARK-44215:
-

Issue resolved by pull request 41762
https://github.com/apache/spark/pull/41762

> Client receives zero number of chunks in merge meta response which doesn't 
> trigger fallback to unmerged blocks
> --
>
> Key: SPARK-44215
> URL: https://issues.apache.org/jira/browse/SPARK-44215
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.3.3, 3.5.0, 3.4.2
>
>
> We still see instances of the server returning 0 {{numChunks}} in 
> {{mergedMetaResponse}} which causes the executor to fail with 
> {{ArithmeticException}}. 
> {code}
> java.lang.ArithmeticException: / by zero
>   at 
> org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> {code}
> Here the executor doesn't fall back to fetching unmerged blocks, and this 
> also doesn't result in a {{FetchFailure}}, so the application fails.
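
A hedged sketch of the guard the fix implies (names hypothetical; the real change is around {{createChunkBlockInfosFromMetaResponse}} and the fallback path):

{code:scala}
// Hypothetical illustration: a merged-meta response reporting zero chunks is
// unusable, so route to the unmerged-block fallback instead of reaching the
// per-chunk arithmetic that currently divides by numChunks.
def handleMergedMeta(numChunks: Int, totalBlocks: Int)
    (fallbackToOriginalBlocks: () => Unit): Unit = {
  if (numChunks <= 0) {
    fallbackToOriginalBlocks() // fetch the original unmerged shuffle blocks
  } else {
    val approxBlocksPerChunk = math.max(1, totalBlocks / numChunks) // safe now
    println(s"fetching $numChunks chunks (~$approxBlocksPerChunk blocks each)")
  }
}
{code}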






[jira] [Updated] (SPARK-44215) Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks

2023-07-05 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-44215:

Fix Version/s: 3.3.3

> Client receives zero number of chunks in merge meta response which doesn't 
> trigger fallback to unmerged blocks
> --
>
> Key: SPARK-44215
> URL: https://issues.apache.org/jira/browse/SPARK-44215
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.3.3, 3.5.0, 3.4.2
>
>
> We still see instances of the server returning 0 {{numChunks}} in 
> {{mergedMetaResponse}} which causes the executor to fail with 
> {{ArithmeticException}}. 
> {code}
> java.lang.ArithmeticException: / by zero
>   at 
> org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> {code}
> Here the executor doesn't fall back to fetching unmerged blocks, and this 
> also doesn't result in a {{FetchFailure}}, so the application fails.






[jira] [Resolved] (SPARK-44154) Bitmap functions

2023-07-05 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44154.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41623
[https://github.com/apache/spark/pull/41623]

> Bitmap functions
> 
>
> Key: SPARK-44154
> URL: https://issues.apache.org/jira/browse/SPARK-44154
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Entong Shen
>Priority: Major
> Fix For: 3.5.0
>
>
> Implemented bitmap functions. The functions are:
>  * {{bitmap_bucket_number()}}: returns the bucket number for a given input 
> number
>  * {{bitmap_bit_position()}}: returns the bit position for a given input 
> number
>  * {{bitmap_count()}}: returns the number of set bits in an input bitmap
>  * {{bitmap_construct_agg()}}: aggregate function that collects input bit 
> positions into a bitmap
>  * {{bitmap_or_agg()}}: aggregate function that performs a bitwise OR over 
> all the input bitmaps
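
A quick usage sketch through the SQL API (assumes Spark 3.5+ with these functions available; the duplicate 3 maps to the same bit, so it is counted once):

{code:scala}
spark.sql(
  """SELECT bitmap_count(bitmap_construct_agg(bitmap_bit_position(col))) AS cnt
    |FROM VALUES (1), (2), (3), (3) AS tab(col)""".stripMargin
).show()
// +---+
// |cnt|
// +---+
// |  3|
// +---+
{code}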






[jira] [Assigned] (SPARK-44310) The Connect Server startup log should display the hostname and port

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44310:


Assignee: BingKun Pan

> The Connect Server startup log should display the hostname and port
> ---
>
> Key: SPARK-44310
> URL: https://issues.apache.org/jira/browse/SPARK-44310
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-44310) The Connect Server startup log should display the hostname and port

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44310.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41862
[https://github.com/apache/spark/pull/41862]

> The Connect Server startup log should display the hostname and port
> ---
>
> Key: SPARK-44310
> URL: https://issues.apache.org/jira/browse/SPARK-44310
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44313) Generated column expression validation fails if there is a char/varchar column anywhere in the schema

2023-07-05 Thread Allison Portis (Jira)
Allison Portis created SPARK-44313:
--

 Summary: Generated column expression validation fails if there is 
a char/varchar column anywhere in the schema
 Key: SPARK-44313
 URL: https://issues.apache.org/jira/browse/SPARK-44313
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1, 3.4.0
Reporter: Allison Portis


When validating generated column expressions, the call to checkAnalysis at 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala#L123
fails when there are char or varchar columns anywhere in the schema.

 

For example, this query will fail
{code:java}
CREATE TABLE default.example (
name VARCHAR(64),
tstamp TIMESTAMP,
tstamp_date DATE GENERATED ALWAYS AS (CAST(tstamp as DATE))
){code}
 






[jira] [Resolved] (SPARK-44281) Move QueryCompilation error that used by DataType to sql/api as DataTypeErrors

2023-07-05 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang resolved SPARK-44281.
--
Resolution: Fixed

> Move QueryCompilation error that used by DataType to sql/api as DataTypeErrors
> --
>
> Key: SPARK-44281
> URL: https://issues.apache.org/jira/browse/SPARK-44281
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Created] (SPARK-44312) [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent

2023-07-05 Thread Robert Dillitz (Jira)
Robert Dillitz created SPARK-44312:
--

 Summary: [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT 
environment variable for the user agent
 Key: SPARK-44312
 URL: https://issues.apache.org/jira/browse/SPARK-44312
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.1
Reporter: Robert Dillitz


Allow setting the Spark Connect user agent via an environment variable: 
*SPARK_CONNECT_USER_AGENT*






[jira] [Resolved] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples fields names are different from server

2023-07-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-43416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-43416.
---
Fix Version/s: 3.5.0
 Assignee: Niranjan Jayakar
   Resolution: Fixed

> Fix the bug where the ProduceEncoder#tuples fields names are different from 
> server
> --
>
> Key: SPARK-43416
> URL: https://issues.apache.org/jira/browse/SPARK-43416
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Assignee: Niranjan Jayakar
>Priority: Major
> Fix For: 3.5.0
>
>
> The fields are named _1, _2, etc. However, on the server side they can be 
> given nicer names in agg operations, such as key and value. Fix this if possible.
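
An assumed repro of where the mismatch shows up (the exact server-side names depend on the aggregation):

{code:scala}
// In spark-shell (implicits in scope): a typed aggregation yields a
// Dataset[(Long, Long)]. The server-side schema names the columns `key` and
// `count(1)`, while the client's tuple encoder labels the fields `_1`/`_2`.
val ds = spark.range(10).groupByKey(_ % 3).count()
ds.printSchema()
// root
//  |-- key: long (nullable = false)
//  |-- count(1): long (nullable = false)
{code}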






[jira] [Resolved] (SPARK-44282) Split of DataType parsing for Connect

2023-07-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44282.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Split of DataType parsing for Connect
> -
>
> Key: SPARK-44282
> URL: https://issues.apache.org/jira/browse/SPARK-44282
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Commented] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples fields names are different from server

2023-07-05 Thread Zhen Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740312#comment-17740312
 ] 

Zhen Li commented on SPARK-43416:
-

[~hvanhovell] Yes. Fixed by https://github.com/apache/spark/pull/41846

> Fix the bug where the ProduceEncoder#tuples fields names are different from 
> server
> --
>
> Key: SPARK-43416
> URL: https://issues.apache.org/jira/browse/SPARK-43416
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
>
> The fields are named _1, _2, etc. However, on the server side they can be 
> given nicer names in agg operations, such as key and value. Fix this if possible.






[jira] [Commented] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples fields names are different from server

2023-07-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-43416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740302#comment-17740302
 ] 

Herman van Hövell commented on SPARK-43416:
---

[~zhenli] has this been fixed?

> Fix the bug where the ProduceEncoder#tuples fields names are different from 
> server
> --
>
> Key: SPARK-43416
> URL: https://issues.apache.org/jira/browse/SPARK-43416
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
>
> The fields are named _1, _2, etc. However, on the server side they can be 
> given nicer names in agg operations, such as key and value. Fix this if possible.






[jira] [Resolved] (SPARK-44291) [CONNECT][SCALA] range query returns incorrect schema

2023-07-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44291.
---
Fix Version/s: 3.5.0
 Assignee: Niranjan Jayakar
   Resolution: Fixed

> [CONNECT][SCALA] range query returns incorrect schema
> -
>
> Key: SPARK-44291
> URL: https://issues.apache.org/jira/browse/SPARK-44291
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
> Fix For: 3.5.0
>
>
> The following code on Spark Connect produces the following output
> Code:
>  
> {code:java}
> val df = spark.range(3)
> df.show()
> df.printSchema(){code}
>  
> Output:
> {code:java}
> +---+
> | id|
> +---+
> |  0|
> |  1|
> |  2|
> +---+
> root
>  |-- value: long (nullable = true) {code}
> The mismatch is that one shows the column as "id" while the other shows this 
> as "value".






[jira] [Created] (SPARK-44311) UDF should support function taking value classes

2023-07-05 Thread Emil Ejbyfeldt (Jira)
Emil Ejbyfeldt created SPARK-44311:
--

 Summary: UDF should support function taking value classes
 Key: SPARK-44311
 URL: https://issues.apache.org/jira/browse/SPARK-44311
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1
Reporter: Emil Ejbyfeldt


Running the following code in a Spark shell
```
final case class ValueClass(a: Int) extends AnyVal
final case class Wrapper(v: ValueClass)

val f = udf((a: ValueClass) => a.a > 0)

spark.createDataset(Seq(Wrapper(ValueClass(1)))).filter(f(col("v"))).show()
```

fails with
```
java.lang.ClassCastException: class org.apache.spark.sql.types.IntegerType$ 
cannot be cast to class org.apache.spark.sql.types.StructType 
(org.apache.spark.sql.types.IntegerType$ and 
org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$220(Analyzer.scala:3241)
  at scala.Option.map(Option.scala:242)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$219(Analyzer.scala:3239)
  at scala.collection.immutable.List.map(List.scala:246)
  at scala.collection.immutable.List.map(List.scala:79)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3237)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3234)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:566)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:566)
```






[jira] [Updated] (SPARK-42554) Spark Connect Scala Client

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42554:
-
Fix Version/s: (was: 3.5.0)

> Spark Connect Scala Client
> --
>
> Key: SPARK-42554
> URL: https://issues.apache.org/jira/browse/SPARK-42554
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> This is the EPIC to track all the work for the Spark Connect Scala Client.






[jira] [Reopened] (SPARK-42554) Spark Connect Scala Client

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-42554:
--

> Spark Connect Scala Client
> --
>
> Key: SPARK-42554
> URL: https://issues.apache.org/jira/browse/SPARK-42554
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>
> This is the EPIC to track all the work for the Spark Connect Scala Client.






[jira] [Resolved] (SPARK-44193) Implement GRPC exceptions interception for conversion

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44193.
--
  Assignee: Yihong He
Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/41743

> Implement GRPC exceptions interception for conversion
> -
>
> Key: SPARK-44193
> URL: https://issues.apache.org/jira/browse/SPARK-44193
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Major
>







[jira] [Created] (SPARK-44310) The Connect Server startup log should display the hostname and port

2023-07-05 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44310:
---

 Summary: The Connect Server startup log should display the 
hostname and port
 Key: SPARK-44310
 URL: https://issues.apache.org/jira/browse/SPARK-44310
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-44299) Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]

2023-07-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740126#comment-17740126
 ] 

ASF GitHub Bot commented on SPARK-44299:


User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41858

> Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
> -
>
> Key: SPARK-44299
> URL: https://issues.apache.org/jira/browse/SPARK-44299
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-44193) Implement GRPC exceptions interception for conversion

2023-07-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740123#comment-17740123
 ] 

ASF GitHub Bot commented on SPARK-44193:


User 'heyihong' has created a pull request for this issue:
https://github.com/apache/spark/pull/41743

> Implement GRPC exceptions interception for conversion
> -
>
> Key: SPARK-44193
> URL: https://issues.apache.org/jira/browse/SPARK-44193
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yihong He
>Priority: Major
>







[jira] [Resolved] (SPARK-42554) Spark Connect Scala Client

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42554.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41743
[https://github.com/apache/spark/pull/41743]

> Spark Connect Scala Client
> --
>
> Key: SPARK-42554
> URL: https://issues.apache.org/jira/browse/SPARK-42554
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>
> This is the EPIC to track all the work for the Spark Connect Scala Client.






[jira] [Created] (SPARK-44309) Display Add/Remove Time of Executors on ExecutorsTab

2023-07-05 Thread Kent Yao (Jira)
Kent Yao created SPARK-44309:


 Summary: Display Add/Remove Time of Executors on ExecutorsTab
 Key: SPARK-44309
 URL: https://issues.apache.org/jira/browse/SPARK-44309
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.5.0
Reporter: Kent Yao









[jira] [Assigned] (SPARK-44294) HeapHistogram column shows unexpectedly w/ select-all-box

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44294:


Assignee: Kent Yao

> HeapHistogram column shows unexpectedly w/ select-all-box
> -
>
> Key: SPARK-44294
> URL: https://issues.apache.org/jira/browse/SPARK-44294
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-44294) HeapHistogram column shows unexpectedly w/ select-all-box

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44294.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41847
[https://github.com/apache/spark/pull/41847]

> HeapHistogram column shows unexpectedly w/ select-all-box
> -
>
> Key: SPARK-44294
> URL: https://issues.apache.org/jira/browse/SPARK-44294
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Comment Edited] (SPARK-44305) Broadcast operation is not required when no parameters are specified

2023-07-05 Thread 7mming7 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740018#comment-17740018
 ] 

7mming7 edited comment on SPARK-44305 at 7/5/23 7:35 AM:
-

cc [~yuming]  [~r...@databricks.com] 


was (Author: 7mming7):
cc [~yuming]  

> Broadcast operation is not required when no parameters are specified
> 
>
> Key: SPARK-44305
> URL: https://issues.apache.org/jira/browse/SPARK-44305
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: 7mming7
>Priority: Minor
> Attachments: image-2023-07-05-11-51-41-708.png
>
>
> SPARK-14912 introduced the ability to broadcast data source options to read 
> and write operations. However, even if the user does not specify any 
> options, the broadcast is still performed, which has a significant 
> performance impact. We should avoid broadcasting the full Hadoop 
> configuration when the user has not specified any options.
>  
> !image-2023-07-05-11-51-41-708.png!






[jira] [Commented] (SPARK-43438) Fix mismatched column list error on INSERT

2023-07-05 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740048#comment-17740048
 ] 

Max Gekk commented on SPARK-43438:
--

Current behaviour on the recent OSS master:

{code:sql}
spark-sql (default)> CREATE TABLE tabtest(c1 INT, c2 INT);
spark-sql (default)> INSERT INTO tabtest SELECT 1;
spark-sql (default)> select * from tabtest;
1   NULL
spark-sql (default)> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
[INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS] Cannot write to 
`spark_catalog`.`default`.`tabtest`, the reason is too many data columns:
Table columns: `c1`.
Data columns: `1`, `2`, `3`.
{code}

[~srielau] Are you ok with such behaviour?

> Fix mismatched column list error on INSERT
> --
>
> Key: SPARK-43438
> URL: https://issues.apache.org/jira/browse/SPARK-43438
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> This error message is pretty bad, and common
> "_LEGACY_ERROR_TEMP_1038" : {
> "message" : [
> "Cannot write to table due to mismatched user specified column 
> size() and data column size()."
> ]
> },
> It can perhaps be merged with this one - after giving it an ERROR_CLASS
> "_LEGACY_ERROR_TEMP_1168" : {
> "message" : [
> " requires that the data to be inserted have the same number of 
> columns as the target table: target table has  column(s) but 
> the inserted data has  column(s), including  
> partition column(s) having constant value(s)."
> ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted 
> have the same number of columns as the target table: target table has 2 
> column(s) but the inserted data has 1 column(s), including 0 partition 
> column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and 
> data column size(3).; line 1 pos 24
>  






[jira] [Assigned] (SPARK-44277) Upgrade Avro to version 1.11.2

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44277:


Assignee: Ismaël Mejía

> Upgrade Avro to version 1.11.2
> --
>
> Key: SPARK-44277
> URL: https://issues.apache.org/jira/browse/SPARK-44277
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>







[jira] [Resolved] (SPARK-44277) Upgrade Avro to version 1.11.2

2023-07-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44277.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41830
[https://github.com/apache/spark/pull/41830]

> Upgrade Avro to version 1.11.2
> --
>
> Key: SPARK-44277
> URL: https://issues.apache.org/jira/browse/SPARK-44277
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44308) Spark 3.0.1 functions.scala -> posexplode_outer API not flattening data

2023-07-05 Thread Chirag Sanghvi (Jira)
Chirag Sanghvi created SPARK-44308:
--

 Summary: Spark 3.0.1 functions.scala -> posexplode_outer API not 
flattening data
 Key: SPARK-44308
 URL: https://issues.apache.org/jira/browse/SPARK-44308
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 3.0.1
Reporter: Chirag Sanghvi


The Spark 3.x API functions.scala -> posexplode_outer, used to flatten an array 
column, doesn't work as expected when the table is created with 
"collection.delim" set to a non-default value.

This used to work as expected in Spark 2.4.5

 

Use the below DDL to create a hive table 

CREATE EXTERNAL TABLE `testnorm2`(
`enquiryuid` string,
`rulestriggered` array<string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'collection.delim'=';',
'field.delim'='|',
'line.delim'='\n',
'serialization.format'='|')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

 

And fill the table with array values.

The statements below insert sample data.

 

INSERT INTO testnorm2 SELECT 'A', array('a','b');
INSERT INTO testnorm2 SELECT 'B', array('e','f','g','h');
INSERT INTO testnorm2 SELECT 'C', array();
INSERT INTO testnorm2 SELECT 'D', array('');
INSERT INTO testnorm2 SELECT 'E', array('','');
INSERT INTO testnorm2 SELECT 'F', array('','1','2');
INSERT INTO testnorm2 SELECT 'G', array(null);
INSERT INTO testnorm2 SELECT 'H', array(null,'');
INSERT INTO testnorm2 SELECT 'I', array(null,'4','5','6');
INSERT INTO testnorm2 SELECT 'G', array("");

 


Open the Spark shell (in Spark 3.0.1) and run the Scala statements below:


val df = spark.sql("select * from testnorm2");

 

df.show() gives this output in both cases (Spark 2.4 and Spark 3.0.1):

+--+--+
|enquiryuid|data          |
+--+--+
|         I|   [, 4, 5, 6]|
|         F|      [, 1, 2]|
|         B|  [e, f, g, h]|
|         A|        [a, b]|
|         H|          [, ]|
|         E|          [, ]|
|         G|          null|
|         G|            []|
|         D|            []|
|         C|            []|
+--+--+


val explodeDF = df.select($"enquiryuid", posexplode_outer($"data"));

 

Running this shows a difference in output between Spark 2.4 and Spark 3.0.1.

On 2.4.x the output is:

+--+++
|enquiryuid| pos| col|
+--+++
|         I|   0|null|
|         I|   1|   4|
|         I|   2|   5|
|         I|   3|   6|
|         F|   0|    |
|         F|   1|   1|
|         F|   2|   2|
|         B|   0|   e|
|         B|   1|   f|
|         B|   2|   g|
|         B|   3|   h|
|         A|   0|   a|
|         A|   1|   b|
|         H|   0|null|
|         H|   1|    |
|         E|   0|    |
|         E|   1|    |
|         G|null|null|
|         G|null|null|
|         D|null|null|
+--+++

Whereas in 3.x the output is:
+--+++
|enquiryuid| pos|     col|
+--+++
|         I|   0|\N,4,5,6|
|         F|   0|    ,1,2|
|         1|   0|     a,b|
|         C|null|    null|
|         G|null|    null|
|         B|   0| e,f,g,h|
|         H|   0|     \N,|
|         G|null|    null|
|         E|   0|       ,|
|         D|null|    null|
+--+++


The array in column 2 is not flattened in Spark 3.0.1, whereas in Spark 2.4.5 
it is flattened as expected.


