[jira] [Assigned] (SPARK-40358) Migrate collection type check failures onto error classes

2022-10-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-40358:


Assignee: Shaokang Lv

> Migrate collection type check failures onto error classes
> -
>
> Key: SPARK-40358
> URL: https://issues.apache.org/jira/browse/SPARK-40358
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Shaokang Lv
>Priority: Major
>
> Replace TypeCheckFailure with DataTypeMismatch in the type checks of the 
> following collection expressions (a sketch of the change follows the list):
> 1. BinaryArrayExpressionWithImplicitCast (1): 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69]
> 2. MapContainsKey (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237
> 3. MapConcat (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663
> 4. MapFromEntries (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801
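
A minimal sketch of the intended shape of the change (not the actual Spark patch); the error sub-class name and message-parameter keys below are illustrative assumptions:

{code:scala}
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{DataTypeMismatch, TypeCheckSuccess}
import org.apache.spark.sql.types.{DataType, MapType}

// Before: an unstructured failure message, e.g.
//   TypeCheckFailure(s"Input to map_contains_key should be a map, got ${dt.sql}")
// After: a structured DataTypeMismatch carrying an error sub-class and parameters.
def checkMapInput(dt: DataType): TypeCheckResult = dt match {
  case _: MapType => TypeCheckSuccess
  case other => DataTypeMismatch(
    errorSubClass = "UNEXPECTED_INPUT_TYPE",   // assumed sub-class name
    messageParameters = Map(
      "paramIndex" -> "1",
      "requiredType" -> "\"MAP\"",
      "inputType" -> other.sql))
}
{code}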






[jira] [Resolved] (SPARK-40358) Migrate collection type check failures onto error classes

2022-10-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40358.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38197
[https://github.com/apache/spark/pull/38197]

> Migrate collection type check failures onto error classes
> -
>
> Key: SPARK-40358
> URL: https://issues.apache.org/jira/browse/SPARK-40358
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Shaokang Lv
>Priority: Major
> Fix For: 3.4.0
>
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in collection 
> expressions:
> 1. BinaryArrayExpressionWithImplicitCast (1): 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69]
> 2. MapContainsKey (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237
> 3. MapConcat (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663
> 4. MapFromEntries (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801






[jira] [Commented] (SPARK-37945) Use error classes in the execution errors of arithmetic ops

2022-10-10 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615459#comment-17615459
 ] 

Max Gekk commented on SPARK-37945:
--

[~khalidmammad...@gmail.com] Sure, go ahead.

> Use error classes in the execution errors of arithmetic ops
> ---
>
> Key: SPARK-37945
> URL: https://issues.apache.org/jira/browse/SPARK-37945
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> onto error classes: throw an implementation of SparkThrowable, and write a 
> test for every error in QueryExecutionErrorsSuite (a rough sketch follows).
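
A rough sketch of the shape of such a migration; the class and error-class names below are illustrative assumptions, not the exact Spark code:

{code:scala}
import org.apache.spark.SparkThrowable

// Illustrative exception type: an ArithmeticException that also exposes an
// error class through the SparkThrowable interface.
class ExampleArithmeticOverflowException(message: String)
  extends ArithmeticException(message) with SparkThrowable {
  override def getErrorClass(): String = "ARITHMETIC_OVERFLOW"  // assumed error class name
}

// A QueryExecutionErrors helper would then look roughly like this:
def arithmeticOverflowError(message: String, hint: String = ""): ArithmeticException = {
  val suggestion =
    if (hint.nonEmpty) s""" Use "$hint" to tolerate overflow and return NULL instead.""" else ""
  new ExampleArithmeticOverflowException(s"$message.$suggestion")
}

// And QueryExecutionErrorsSuite would assert on the error class, roughly:
//   assert(intercept[SparkThrowable](...).getErrorClass === "ARITHMETIC_OVERFLOW")
{code}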






[jira] [Assigned] (SPARK-40707) Add groupby to connect DSL and test more than one grouping expressions

2022-10-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40707:
---

Assignee: Rui Wang

> Add groupby to connect DSL and test more than one grouping expressions
> --
>
> Key: SPARK-40707
> URL: https://issues.apache.org/jira/browse/SPARK-40707
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-40707) Add groupby to connect DSL and test more than one grouping expressions

2022-10-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40707.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38155
[https://github.com/apache/spark/pull/38155]

> Add groupby to connect DSL and test more than one grouping expressions
> --
>
> Key: SPARK-40707
> URL: https://issues.apache.org/jira/browse/SPARK-40707
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-40722) How to set the BlockManager hostname to an IP address

2022-10-10 Thread Chen Xia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615450#comment-17615450
 ] 

Chen Xia commented on SPARK-40722:
--

I know that spark.driver.host can be used to set this configuration; a minimal example follows.
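
A minimal sketch under that assumption (addresses are illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession

// Advertise an IP address instead of the pod's canonical host name, so the
// driver's BlockManagerId is registered with the IP.
val spark = SparkSession.builder()
  .appName("driver-host-example")
  .config("spark.driver.host", "10.10.10.10")      // address advertised to executors
  .config("spark.driver.bindAddress", "0.0.0.0")   // local bind address, if it differs
  .getOrCreate()
{code}

The same settings can also be passed via --conf on spark-submit.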

 

> How to set the BlockManager hostname to an IP address
> --
>
> Key: SPARK-40722
> URL: https://issues.apache.org/jira/browse/SPARK-40722
> Project: Spark
>  Issue Type: Question
>  Components: Block Manager
>Affects Versions: 2.4.3
>Reporter: Chen Xia
>Priority: Major
>
>  
> {code:java}
> 2022-10-09 17:22:42.517 [INFO ] [YARN application state monitor          ] 
> o.a.s.u.SparkUI (54) [logInfo] - Stopped Spark web UI at 
> http://linkis-demo-cg-engineconnmanager-76778ff4b5-sf9xz.linkis-demo-cg-engineconnmanager-headless.linkis.svc.cluster.local:4040
> 2022-10-09 17:46:09.854 [INFO ] [main                                    ] 
> o.a.s.s.BlockManager (54) [logInfo] - Initialized BlockManager: 
> BlockManagerId(driver, 
> linkis-demo-cg-engineconnmanager-76778ff4b5-sf9xz.linkis-demo-cg-engineconnmanager-headless.linkis.svc.cluster.local,
>  38798, None) 
> {code}
> I want to replace the canonicalHostName 
> (linkis-demo-cg-engineconnmanager-76778ff4b5-sf9xz.linkis-demo-cg-engineconnmanager-headless.linkis.svc.cluster.local)
>  with an IP address such as 10.10.10.10.
>  
>  






[jira] [Commented] (SPARK-40658) Protobuf v2 & v3 support

2022-10-10 Thread Mohan Parthasarathy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615440#comment-17615440
 ] 

Mohan Parthasarathy commented on SPARK-40658:
-

[~sanysand...@gmail.com] 
 # I wanted to write about "required" fields (which exist only in proto2) and 
wrongly mentioned optional fields. When deserializing, we already check for 
"required" fields. When serializing and the row is null, the current code is a 
bit complex for me to understand as to what happens when we encounter a 
"required" field. Also, there is one more place in the current code where we 
check for a "required" field, in "structFieldFor", which sets nullable. We need 
some test cases with proto2 messages.
 # Custom default values should not affect the current logic?
 # How does this affect the current logic? It should be transparent to us, 
right?
 # Currently we assume UTF8 in the code, right? Would that fail if we receive a 
proto2 message?
 # Yes, I ran some basic tests by converting the current tests to proto2 
messages, and they pass.

 If we can get away without specifying V2, V3, or ANY, that would be the 
simplest.

> Protobuf v2 & v3 support
> 
>
> Key: SPARK-40658
> URL: https://issues.apache.org/jira/browse/SPARK-40658
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Raghu Angadi
>Priority: Major
>
> We want to ensure Protobuf functions support both Protobuf version 2 and 
> version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3).
>  






[jira] [Commented] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615437#comment-17615437
 ] 

Apache Spark commented on SPARK-40739:
--

User 'philwalk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38167

> "sbt packageBin" fails in cygwin or other windows bash session
> --
>
> Key: SPARK-40739
> URL: https://issues.apache.org/jira/browse/SPARK-40739
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Windows
>Affects Versions: 3.3.0
> Environment: The problem occurs in Windows if *_sbt_* is started from 
> a (non-WSL) bash session.
> See the spark PR link for detailed symptoms.
>Reporter: Phil Walker
>Priority: Major
>  Labels: bash, cygwin, mingw, msys2, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
>  In a Windows _*SHELL*_ environment, such as _*cygwin*_ or _*msys2/mingw64*_, 
> _*Core.settings*_ in _*project/SparkBuild.scala*_ calls the wrong _*bash.exe*_ 
> if WSL bash is present (typically at _*C:\Windows*_), causing a build failure. 
> This occurs even though the proper *bash.exe* is on the _*PATH*_ ahead of the 
> _*WSL*_ bash.exe.
> This is fixed by [spark PR 38167|https://github.com/apache/spark/pull/38167].
> There are 3 parts to the fix, implemented in _*project/SparkBuild.scala*_ (see 
> the sketch after this list):
>  * determine the absolute path of the first bash.exe to use in the command line
>  * determine the build environment (e.g., Linux, Darwin, CYGWIN, MSYS2, etc.)
>  * for Windows SHELL environments, change the first argument to the spawned 
> Process from "bash" to that absolute path
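
A rough sketch of that approach (standard library only, not the PR code itself):

{code:scala}
import java.io.File

// Resolve the first bash.exe reachable on the PATH so the spawned Process does
// not accidentally pick up WSL's C:\Windows\...\bash.exe.
def firstBashOnPath(): Option[String] = {
  val exe = if (sys.props("os.name").toLowerCase.startsWith("windows")) "bash.exe" else "bash"
  sys.env.getOrElse("PATH", "")
    .split(File.pathSeparator)
    .iterator
    .map(dir => new File(dir, exe))
    .find(_.isFile)
    .map(_.getAbsolutePath)
}

// When building the command, replace the bare "bash" with the resolved path.
def bashCommand(script: String, args: Seq[String]): Seq[String] =
  firstBashOnPath().getOrElse("bash") +: script +: args
{code}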






[jira] [Assigned] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40738:


Assignee: (was: Apache Spark)

> spark-shell fails with "bad array subscript" in cygwin or msys bash session
> ---
>
> Key: SPARK-40738
> URL: https://issues.apache.org/jira/browse/SPARK-40738
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Windows
>Affects Versions: 3.3.0
> Environment: The problem occurs in Windows if *_spark-shell_* is 
> called from a bash session.
> NOTE: the fix also applies to _*spark-submit*_ and and {_}*beeline*{_}, since 
> they call spark-shell.
>Reporter: Phil Walker
>Priority: Major
>  Labels: bash, cygwin, mingw, msys2, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> [Spark PR 38167|https://github.com/apache/spark/pull/38167] fixes this issue 
> and also fixes a related build error in _*cygwin*_ and *msys/mingw* bash *sbt* 
> sessions.
> If a Windows user tries to start a *_spark-shell_* session by calling the 
> bash script (rather than the *_spark-shell.cmd_* script), it fails with a 
> confusing error message. The _*spark-class*_ script calls 
> _*launcher/src/main/java/org/apache/spark/launcher/Main.java*_ to generate 
> command line arguments, but the launcher produces a format appropriate to the 
> *_.cmd_* version of the script rather than the _*bash*_ version.
> The launcher Main method, when called for environments other than Windows, 
> interleaves NULL characters between the command line arguments. It should 
> also do so in Windows when called from the bash script. It incorrectly 
> assumes that if the OS is Windows, it is being called by the .cmd version of 
> the script.
> The resulting error message is unhelpful:
>  
> {code:java}
> [lots of ugly stuff omitted]
> /opt/spark/bin/spark-class: line 100: CMD: bad array subscript
> {code}
> The key to _*launcher/Main*_ knowing that a request comes from a _*bash*_ 
> session is that the _*SHELL*_ environment variable is set (see the sketch 
> below). It will normally be set in any of the various Windows shell 
> environments (_*cygwin*_, _*mingw64*_, _*msys2*_, etc.) and will not normally 
> be set in native Windows environments. In the _*spark-class.cmd*_ script, 
> _*SHELL*_ is intentionally unset to avoid problems and to permit bash users 
> to call the _*.cmd*_ scripts if they prefer (they will still work as before).
>  
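
A sketch of that decision (the real code is the Java launcher Main; this Scala snippet is only illustrative):

{code:scala}
// Emit NULL-delimited arguments whenever the caller is a bash session:
// on non-Windows OSes, or on Windows when SHELL is set (cygwin/msys2/mingw64).
def printLaunchCommand(cmd: Seq[String]): Unit = {
  val isWindows = sys.props("os.name").toLowerCase.startsWith("windows")
  val calledFromBash = sys.env.contains("SHELL")
  if (!isWindows || calledFromBash) {
    // spark-class reads this with something like:
    //   while IFS= read -d '' -r ARG; do CMD+=("$ARG"); done
    cmd.foreach(arg => print(arg + '\u0000'))
  } else {
    // the .cmd script expects a single space-separated line
    println(cmd.mkString(" "))
  }
}
{code}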






[jira] [Assigned] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40738:


Assignee: Apache Spark

> spark-shell fails with "bad array subscript" in cygwin or msys bash session
> ---
>
> Key: SPARK-40738
> URL: https://issues.apache.org/jira/browse/SPARK-40738
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Windows
>Affects Versions: 3.3.0
> Environment: The problem occurs in Windows if *_spark-shell_* is 
> called from a bash session.
> NOTE: the fix also applies to _*spark-submit*_ and and {_}*beeline*{_}, since 
> they call spark-shell.
>Reporter: Phil Walker
>Assignee: Apache Spark
>Priority: Major
>  Labels: bash, cygwin, mingw, msys2,, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> A spark pull request [spark PR|https://github.com/apache/spark/pull/38167] 
> fixes this issue, and also fixes a build error that is also related to 
> _*cygwin*_  and *msys/mingw* bash *sbt* sessions.
> If a Windows user tries to start a *_spark-shell_* session by calling the 
> bash script (rather than the *_spark-shell.cmd_* script), it fails with a 
> confusing error message.  Script _*spark-class*_ calls 
> _*launcher/src/main/java/org/apache/spark/launcher/Main.java* to_ generate 
> command line arguments, but the launcher produces a format appropriate to the 
> *_.cmd_* version of the script rather than the _*bash*_ version.
> The launcher Main method, when called for environments other than Windows, 
> interleaves NULL characters between the command line arguments.   It should 
> also do so in Windows when called from the bash script.  It incorrectly 
> assumes that if the OS is Windows, that it is being called by the .cmd 
> version of the script.
> The resulting error message is unhelpful:
>  
> {code:java}
> [lots of ugly stuff omitted]
> /opt/spark/bin/spark-class: line 100: CMD: bad array subscript
> {code}
> The key to _*launcher/Main*_ knowing that a request is from a _*bash*_ 
> session is that the _*SHELL*_ environment variable is set.   This will 
> normally be set in any of the various Windows shell environments 
> ({_}*cygwin*{_}, {_}*mingw64*{_}, {_}*msys2*{_}, etc) and will not normally 
> be set in Windows environments.   In the _*spark-class.cmd*_ script, 
> _*SHELL*_ is intentionally unset to avoid problems, and to permit bash users 
> to call the _*.cmd*_ scripts if they prefer (it will still work as before).
>  






[jira] [Commented] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615439#comment-17615439
 ] 

Apache Spark commented on SPARK-40738:
--

User 'philwalk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38167

> spark-shell fails with "bad array subscript" in cygwin or msys bash session
> ---
>
> Key: SPARK-40738
> URL: https://issues.apache.org/jira/browse/SPARK-40738
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Windows
>Affects Versions: 3.3.0
> Environment: The problem occurs in Windows if *_spark-shell_* is 
> called from a bash session.
> NOTE: the fix also applies to _*spark-submit*_ and and {_}*beeline*{_}, since 
> they call spark-shell.
>Reporter: Phil Walker
>Priority: Major
>  Labels: bash, cygwin, mingw, msys2,, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> A spark pull request [spark PR|https://github.com/apache/spark/pull/38167] 
> fixes this issue, and also fixes a build error that is also related to 
> _*cygwin*_  and *msys/mingw* bash *sbt* sessions.
> If a Windows user tries to start a *_spark-shell_* session by calling the 
> bash script (rather than the *_spark-shell.cmd_* script), it fails with a 
> confusing error message.  Script _*spark-class*_ calls 
> _*launcher/src/main/java/org/apache/spark/launcher/Main.java* to_ generate 
> command line arguments, but the launcher produces a format appropriate to the 
> *_.cmd_* version of the script rather than the _*bash*_ version.
> The launcher Main method, when called for environments other than Windows, 
> interleaves NULL characters between the command line arguments.   It should 
> also do so in Windows when called from the bash script.  It incorrectly 
> assumes that if the OS is Windows, that it is being called by the .cmd 
> version of the script.
> The resulting error message is unhelpful:
>  
> {code:java}
> [lots of ugly stuff omitted]
> /opt/spark/bin/spark-class: line 100: CMD: bad array subscript
> {code}
> The key to _*launcher/Main*_ knowing that a request is from a _*bash*_ 
> session is that the _*SHELL*_ environment variable is set.   This will 
> normally be set in any of the various Windows shell environments 
> ({_}*cygwin*{_}, {_}*mingw64*{_}, {_}*msys2*{_}, etc) and will not normally 
> be set in Windows environments.   In the _*spark-class.cmd*_ script, 
> _*SHELL*_ is intentionally unset to avoid problems, and to permit bash users 
> to call the _*.cmd*_ scripts if they prefer (it will still work as before).
>  






[jira] [Commented] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615438#comment-17615438
 ] 

Apache Spark commented on SPARK-40738:
--

User 'philwalk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38167

> spark-shell fails with "bad array subscript" in cygwin or msys bash session
> ---
>
> Key: SPARK-40738
> URL: https://issues.apache.org/jira/browse/SPARK-40738
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Windows
>Affects Versions: 3.3.0
> Environment: The problem occurs in Windows if *_spark-shell_* is 
> called from a bash session.
> NOTE: the fix also applies to _*spark-submit*_ and and {_}*beeline*{_}, since 
> they call spark-shell.
>Reporter: Phil Walker
>Priority: Major
>  Labels: bash, cygwin, mingw, msys2,, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> A spark pull request [spark PR|https://github.com/apache/spark/pull/38167] 
> fixes this issue, and also fixes a build error that is also related to 
> _*cygwin*_  and *msys/mingw* bash *sbt* sessions.
> If a Windows user tries to start a *_spark-shell_* session by calling the 
> bash script (rather than the *_spark-shell.cmd_* script), it fails with a 
> confusing error message.  Script _*spark-class*_ calls 
> _*launcher/src/main/java/org/apache/spark/launcher/Main.java* to_ generate 
> command line arguments, but the launcher produces a format appropriate to the 
> *_.cmd_* version of the script rather than the _*bash*_ version.
> The launcher Main method, when called for environments other than Windows, 
> interleaves NULL characters between the command line arguments.   It should 
> also do so in Windows when called from the bash script.  It incorrectly 
> assumes that if the OS is Windows, that it is being called by the .cmd 
> version of the script.
> The resulting error message is unhelpful:
>  
> {code:java}
> [lots of ugly stuff omitted]
> /opt/spark/bin/spark-class: line 100: CMD: bad array subscript
> {code}
> The key to _*launcher/Main*_ knowing that a request is from a _*bash*_ 
> session is that the _*SHELL*_ environment variable is set.   This will 
> normally be set in any of the various Windows shell environments 
> ({_}*cygwin*{_}, {_}*mingw64*{_}, {_}*msys2*{_}, etc) and will not normally 
> be set in Windows environments.   In the _*spark-class.cmd*_ script, 
> _*SHELL*_ is intentionally unset to avoid problems, and to permit bash users 
> to call the _*.cmd*_ scripts if they prefer (it will still work as before).
>  






[jira] [Assigned] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40739:


Assignee: (was: Apache Spark)

> "sbt packageBin" fails in cygwin or other windows bash session
> --
>
> Key: SPARK-40739
> URL: https://issues.apache.org/jira/browse/SPARK-40739
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Windows
>Affects Versions: 3.3.0
> Environment: The problem occurs in Windows if *_sbt_* is started from 
> a (non-WSL) bash session.
> See the spark PR link for detailed symptoms.
>Reporter: Phil Walker
>Priority: Major
>  Labels: bash, cygwin, mingw, msys2,, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
>  In a Windows _*SHELL*_ environment, such as _*cygwin*_ or 
> {_}*msys2/mingw64*{_}, etc,  _*Core.settings*_ in 
> _*project/SparkBuild.scala*_ calls the wrong _*bash.exe*_ if WSL bash is 
> present (typically at {_}*C:\Windows*{_}), causing a build failure.  This 
> occurs even though the proper *bash.exe* is in the _*PATH*_ ahead of _*WSL*_ 
> bash.exe.
> This is fixed by [spark PR 38167|https://github.com/apache/spark/pull/38167]
> There are 3 parts to the fix, implemented in _*project/SparkBuild.scala*_ :
>  * determine the absolute path of the first bash.exe in the command line. 
>  * determine the build environment (e.g., Linux, Darwin, CYGWIN, MSYS2, etc.)
>  * For Windows SHELL environments, the first argument to the spawned Process 
> is changed from "bash" to the absolute path.






[jira] [Assigned] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40739:


Assignee: Apache Spark

> "sbt packageBin" fails in cygwin or other windows bash session
> --
>
> Key: SPARK-40739
> URL: https://issues.apache.org/jira/browse/SPARK-40739
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Windows
>Affects Versions: 3.3.0
> Environment: The problem occurs in Windows if *_sbt_* is started from 
> a (non-WSL) bash session.
> See the spark PR link for detailed symptoms.
>Reporter: Phil Walker
>Assignee: Apache Spark
>Priority: Major
>  Labels: bash, cygwin, mingw, msys2,, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
>  In a Windows _*SHELL*_ environment, such as _*cygwin*_ or 
> {_}*msys2/mingw64*{_}, etc,  _*Core.settings*_ in 
> _*project/SparkBuild.scala*_ calls the wrong _*bash.exe*_ if WSL bash is 
> present (typically at {_}*C:\Windows*{_}), causing a build failure.  This 
> occurs even though the proper *bash.exe* is in the _*PATH*_ ahead of _*WSL*_ 
> bash.exe.
> This is fixed by [spark PR 38167|https://github.com/apache/spark/pull/38167]
> There are 3 parts to the fix, implemented in _*project/SparkBuild.scala*_ :
>  * determine the absolute path of the first bash.exe in the command line. 
>  * determine the build environment (e.g., Linux, Darwin, CYGWIN, MSYS2, etc.)
>  * For Windows SHELL environments, the first argument to the spawned Process 
> is changed from "bash" to the absolute path.






[jira] [Commented] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615436#comment-17615436
 ] 

Apache Spark commented on SPARK-40739:
--

User 'philwalk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38167

> "sbt packageBin" fails in cygwin or other windows bash session
> --
>
> Key: SPARK-40739
> URL: https://issues.apache.org/jira/browse/SPARK-40739
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Windows
>Affects Versions: 3.3.0
> Environment: The problem occurs in Windows if *_sbt_* is started from 
> a (non-WSL) bash session.
> See the spark PR link for detailed symptoms.
>Reporter: Phil Walker
>Priority: Major
>  Labels: bash, cygwin, mingw, msys2,, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
>  In a Windows _*SHELL*_ environment, such as _*cygwin*_ or 
> {_}*msys2/mingw64*{_}, etc,  _*Core.settings*_ in 
> _*project/SparkBuild.scala*_ calls the wrong _*bash.exe*_ if WSL bash is 
> present (typically at {_}*C:\Windows*{_}), causing a build failure.  This 
> occurs even though the proper *bash.exe* is in the _*PATH*_ ahead of _*WSL*_ 
> bash.exe.
> This is fixed by [spark PR 38167|https://github.com/apache/spark/pull/38167]
> There are 3 parts to the fix, implemented in _*project/SparkBuild.scala*_ :
>  * determine the absolute path of the first bash.exe in the command line. 
>  * determine the build environment (e.g., Linux, Darwin, CYGWIN, MSYS2, etc.)
>  * For Windows SHELL environments, the first argument to the spawned Process 
> is changed from "bash" to the absolute path.






[jira] [Commented] (SPARK-40742) Java compilation warnings related to generic type

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615430#comment-17615430
 ] 

Apache Spark commented on SPARK-40742:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38198

> Java compilation warnings related to generic type
> -
>
> Key: SPARK-40742
> URL: https://issues.apache.org/jira/browse/SPARK-40742
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> {code:java}
> 2022-10-08T01:43:33.6487078Z 
> /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54:
>  warning: [rawtypes] found raw type: HashMap
> 2022-10-08T01:43:33.6487456Z     return new HashMap();
> 2022-10-08T01:43:33.6487682Z                ^
> 2022-10-08T01:43:33.6487957Z   missing type arguments for generic class 
> HashMap
> 2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
> 2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
> 2022-10-08T01:43:33.6489211Z     V extends Object declared in class 
> HashMap2022-10-08T01:50:21.5951932Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55:
>  warning: [rawtypes] found raw type: Map
> 2022-10-08T01:50:21.593Z       createPartitions(new InternalRow[]{ident}, 
> new Map[]{properties});
> 2022-10-08T01:50:21.6000343Z                                                  
>     ^
> 2022-10-08T01:50:21.6000642Z   missing type arguments for generic class 
> Map
> 2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
> 2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
> 2022-10-08T01:50:21.6002109Z     V extends Object declared in interface 
> Map2022-10-08T01:50:21.6006655Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216:
>  warning: [rawtypes] found raw type: Literal
> 2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) 
> {
> 2022-10-08T01:50:21.6007395Z                                 ^
> 2022-10-08T01:50:21.6007673Z   missing type arguments for generic class 
> Literal
> 2022-10-08T01:50:21.6008032Z   where T is a type-variable:
> 2022-10-08T01:50:21.6008324Z     T extends Object declared in interface 
> Literal2022-10-08T01:50:21.6008785Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56:
>  warning: [rawtypes] found raw type: Comparable
> 2022-10-08T01:50:21.6009223Z   public static class Coord implements 
> Comparable {
> 2022-10-08T01:50:21.6009503Z                                        ^
> 2022-10-08T01:50:21.6009791Z   missing type arguments for generic class 
> Comparable
> 2022-10-08T01:50:21.6010137Z   where T is a type-variable:
> 2022-10-08T01:50:21.6010433Z     T extends Object declared in interface 
> Comparable
> 2022-10-08T01:50:21.6010976Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191:
>  warning: [unchecked] unchecked method invocation: method sort in class 
> Collections is applied to given types
> 2022-10-08T01:50:21.6011474Z       Collections.sort(tmp_bins);
> 2022-10-08T01:50:21.6011714Z                       ^
> 2022-10-08T01:50:21.6012050Z   required: List
> 2022-10-08T01:50:21.6012296Z   found: ArrayList
> 2022-10-08T01:50:21.6012604Z   where T is a type-variable:
> 2022-10-08T01:50:21.6012926Z     T extends Comparable declared in 
> method sort(List)2022-10-08T02:13:38.0769617Z 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85:
>  warning: [rawtypes] found raw type: AbstractWriterAppender
> 2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new 
> LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
> 2022-10-08T02:13:38.0770645Z     ^
> 2022-10-08T02:13:38.0770947Z   missing type arguments for generic class 
> AbstractWriterAppender
> 2022-10-08T02:13:38.0771330Z   where M is a type-variable:
> 2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class 
> AbstractWriterAppender2022-10-08T02:13:38.0774487Z 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268:
>  warning: [rawtypes] found raw type: Layout
> 2022-10-08T02:13:38.0774940Z         Layout l = ap.getLayout();
> 2022-10-08T02:13:38.0775173Z         ^
> 2022-10-08T02:13:38.0775441Z   missing type arguments for generic class 
> Layout
> 2022-10-08T02:13:38.0775849Z   where T is a type-variable:
> 

[jira] [Assigned] (SPARK-40742) Java compilation warnings related to generic type

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40742:


Assignee: (was: Apache Spark)

> Java compilation warnings related to generic type
> -
>
> Key: SPARK-40742
> URL: https://issues.apache.org/jira/browse/SPARK-40742
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> {code:java}
> 2022-10-08T01:43:33.6487078Z 
> /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54:
>  warning: [rawtypes] found raw type: HashMap
> 2022-10-08T01:43:33.6487456Z     return new HashMap();
> 2022-10-08T01:43:33.6487682Z                ^
> 2022-10-08T01:43:33.6487957Z   missing type arguments for generic class 
> HashMap
> 2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
> 2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
> 2022-10-08T01:43:33.6489211Z     V extends Object declared in class 
> HashMap2022-10-08T01:50:21.5951932Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55:
>  warning: [rawtypes] found raw type: Map
> 2022-10-08T01:50:21.593Z       createPartitions(new InternalRow[]{ident}, 
> new Map[]{properties});
> 2022-10-08T01:50:21.6000343Z                                                  
>     ^
> 2022-10-08T01:50:21.6000642Z   missing type arguments for generic class 
> Map
> 2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
> 2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
> 2022-10-08T01:50:21.6002109Z     V extends Object declared in interface 
> Map2022-10-08T01:50:21.6006655Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216:
>  warning: [rawtypes] found raw type: Literal
> 2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) 
> {
> 2022-10-08T01:50:21.6007395Z                                 ^
> 2022-10-08T01:50:21.6007673Z   missing type arguments for generic class 
> Literal
> 2022-10-08T01:50:21.6008032Z   where T is a type-variable:
> 2022-10-08T01:50:21.6008324Z     T extends Object declared in interface 
> Literal2022-10-08T01:50:21.6008785Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56:
>  warning: [rawtypes] found raw type: Comparable
> 2022-10-08T01:50:21.6009223Z   public static class Coord implements 
> Comparable {
> 2022-10-08T01:50:21.6009503Z                                        ^
> 2022-10-08T01:50:21.6009791Z   missing type arguments for generic class 
> Comparable
> 2022-10-08T01:50:21.6010137Z   where T is a type-variable:
> 2022-10-08T01:50:21.6010433Z     T extends Object declared in interface 
> Comparable
> 2022-10-08T01:50:21.6010976Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191:
>  warning: [unchecked] unchecked method invocation: method sort in class 
> Collections is applied to given types
> 2022-10-08T01:50:21.6011474Z       Collections.sort(tmp_bins);
> 2022-10-08T01:50:21.6011714Z                       ^
> 2022-10-08T01:50:21.6012050Z   required: List
> 2022-10-08T01:50:21.6012296Z   found: ArrayList
> 2022-10-08T01:50:21.6012604Z   where T is a type-variable:
> 2022-10-08T01:50:21.6012926Z     T extends Comparable declared in 
> method sort(List)2022-10-08T02:13:38.0769617Z 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85:
>  warning: [rawtypes] found raw type: AbstractWriterAppender
> 2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new 
> LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
> 2022-10-08T02:13:38.0770645Z     ^
> 2022-10-08T02:13:38.0770947Z   missing type arguments for generic class 
> AbstractWriterAppender
> 2022-10-08T02:13:38.0771330Z   where M is a type-variable:
> 2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class 
> AbstractWriterAppender2022-10-08T02:13:38.0774487Z 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268:
>  warning: [rawtypes] found raw type: Layout
> 2022-10-08T02:13:38.0774940Z         Layout l = ap.getLayout();
> 2022-10-08T02:13:38.0775173Z         ^
> 2022-10-08T02:13:38.0775441Z   missing type arguments for generic class 
> Layout
> 2022-10-08T02:13:38.0775849Z   where T is a type-variable:
> 2022-10-08T02:13:38.0776359Z     T extends Serializable declared in interface 
> 

[jira] [Commented] (SPARK-40742) Java compilation warnings related to generic type

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615428#comment-17615428
 ] 

Apache Spark commented on SPARK-40742:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38198

> Java compilation warnings related to generic type
> -
>
> Key: SPARK-40742
> URL: https://issues.apache.org/jira/browse/SPARK-40742
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> {code:java}
> 2022-10-08T01:43:33.6487078Z 
> /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54:
>  warning: [rawtypes] found raw type: HashMap
> 2022-10-08T01:43:33.6487456Z     return new HashMap();
> 2022-10-08T01:43:33.6487682Z                ^
> 2022-10-08T01:43:33.6487957Z   missing type arguments for generic class 
> HashMap
> 2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
> 2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
> 2022-10-08T01:43:33.6489211Z     V extends Object declared in class 
> HashMap2022-10-08T01:50:21.5951932Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55:
>  warning: [rawtypes] found raw type: Map
> 2022-10-08T01:50:21.593Z       createPartitions(new InternalRow[]{ident}, 
> new Map[]{properties});
> 2022-10-08T01:50:21.6000343Z                                                  
>     ^
> 2022-10-08T01:50:21.6000642Z   missing type arguments for generic class 
> Map
> 2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
> 2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
> 2022-10-08T01:50:21.6002109Z     V extends Object declared in interface 
> Map2022-10-08T01:50:21.6006655Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216:
>  warning: [rawtypes] found raw type: Literal
> 2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) 
> {
> 2022-10-08T01:50:21.6007395Z                                 ^
> 2022-10-08T01:50:21.6007673Z   missing type arguments for generic class 
> Literal
> 2022-10-08T01:50:21.6008032Z   where T is a type-variable:
> 2022-10-08T01:50:21.6008324Z     T extends Object declared in interface 
> Literal2022-10-08T01:50:21.6008785Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56:
>  warning: [rawtypes] found raw type: Comparable
> 2022-10-08T01:50:21.6009223Z   public static class Coord implements 
> Comparable {
> 2022-10-08T01:50:21.6009503Z                                        ^
> 2022-10-08T01:50:21.6009791Z   missing type arguments for generic class 
> Comparable
> 2022-10-08T01:50:21.6010137Z   where T is a type-variable:
> 2022-10-08T01:50:21.6010433Z     T extends Object declared in interface 
> Comparable
> 2022-10-08T01:50:21.6010976Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191:
>  warning: [unchecked] unchecked method invocation: method sort in class 
> Collections is applied to given types
> 2022-10-08T01:50:21.6011474Z       Collections.sort(tmp_bins);
> 2022-10-08T01:50:21.6011714Z                       ^
> 2022-10-08T01:50:21.6012050Z   required: List
> 2022-10-08T01:50:21.6012296Z   found: ArrayList
> 2022-10-08T01:50:21.6012604Z   where T is a type-variable:
> 2022-10-08T01:50:21.6012926Z     T extends Comparable declared in 
> method sort(List)2022-10-08T02:13:38.0769617Z 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85:
>  warning: [rawtypes] found raw type: AbstractWriterAppender
> 2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new 
> LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
> 2022-10-08T02:13:38.0770645Z     ^
> 2022-10-08T02:13:38.0770947Z   missing type arguments for generic class 
> AbstractWriterAppender
> 2022-10-08T02:13:38.0771330Z   where M is a type-variable:
> 2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class 
> AbstractWriterAppender2022-10-08T02:13:38.0774487Z 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268:
>  warning: [rawtypes] found raw type: Layout
> 2022-10-08T02:13:38.0774940Z         Layout l = ap.getLayout();
> 2022-10-08T02:13:38.0775173Z         ^
> 2022-10-08T02:13:38.0775441Z   missing type arguments for generic class 
> Layout
> 2022-10-08T02:13:38.0775849Z   where T is a type-variable:
> 

[jira] [Assigned] (SPARK-40742) Java compilation warnings related to generic type

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40742:


Assignee: Apache Spark

> Java compilation warnings related to generic type
> -
>
> Key: SPARK-40742
> URL: https://issues.apache.org/jira/browse/SPARK-40742
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> {code:java}
> 2022-10-08T01:43:33.6487078Z 
> /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54:
>  warning: [rawtypes] found raw type: HashMap
> 2022-10-08T01:43:33.6487456Z     return new HashMap();
> 2022-10-08T01:43:33.6487682Z                ^
> 2022-10-08T01:43:33.6487957Z   missing type arguments for generic class 
> HashMap
> 2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
> 2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
> 2022-10-08T01:43:33.6489211Z     V extends Object declared in class 
> HashMap2022-10-08T01:50:21.5951932Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55:
>  warning: [rawtypes] found raw type: Map
> 2022-10-08T01:50:21.593Z       createPartitions(new InternalRow[]{ident}, 
> new Map[]{properties});
> 2022-10-08T01:50:21.6000343Z                                                  
>     ^
> 2022-10-08T01:50:21.6000642Z   missing type arguments for generic class 
> Map
> 2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
> 2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
> 2022-10-08T01:50:21.6002109Z     V extends Object declared in interface 
> Map2022-10-08T01:50:21.6006655Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216:
>  warning: [rawtypes] found raw type: Literal
> 2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) 
> {
> 2022-10-08T01:50:21.6007395Z                                 ^
> 2022-10-08T01:50:21.6007673Z   missing type arguments for generic class 
> Literal
> 2022-10-08T01:50:21.6008032Z   where T is a type-variable:
> 2022-10-08T01:50:21.6008324Z     T extends Object declared in interface 
> Literal2022-10-08T01:50:21.6008785Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56:
>  warning: [rawtypes] found raw type: Comparable
> 2022-10-08T01:50:21.6009223Z   public static class Coord implements 
> Comparable {
> 2022-10-08T01:50:21.6009503Z                                        ^
> 2022-10-08T01:50:21.6009791Z   missing type arguments for generic class 
> Comparable
> 2022-10-08T01:50:21.6010137Z   where T is a type-variable:
> 2022-10-08T01:50:21.6010433Z     T extends Object declared in interface 
> Comparable
> 2022-10-08T01:50:21.6010976Z 
> /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191:
>  warning: [unchecked] unchecked method invocation: method sort in class 
> Collections is applied to given types
> 2022-10-08T01:50:21.6011474Z       Collections.sort(tmp_bins);
> 2022-10-08T01:50:21.6011714Z                       ^
> 2022-10-08T01:50:21.6012050Z   required: List
> 2022-10-08T01:50:21.6012296Z   found: ArrayList
> 2022-10-08T01:50:21.6012604Z   where T is a type-variable:
> 2022-10-08T01:50:21.6012926Z     T extends Comparable declared in 
> method sort(List)2022-10-08T02:13:38.0769617Z 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85:
>  warning: [rawtypes] found raw type: AbstractWriterAppender
> 2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new 
> LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
> 2022-10-08T02:13:38.0770645Z     ^
> 2022-10-08T02:13:38.0770947Z   missing type arguments for generic class 
> AbstractWriterAppender
> 2022-10-08T02:13:38.0771330Z   where M is a type-variable:
> 2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class 
> AbstractWriterAppender2022-10-08T02:13:38.0774487Z 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268:
>  warning: [rawtypes] found raw type: Layout
> 2022-10-08T02:13:38.0774940Z         Layout l = ap.getLayout();
> 2022-10-08T02:13:38.0775173Z         ^
> 2022-10-08T02:13:38.0775441Z   missing type arguments for generic class 
> Layout
> 2022-10-08T02:13:38.0775849Z   where T is a type-variable:
> 2022-10-08T02:13:38.0776359Z     T extends Serializable declared in interface 
> 

[jira] [Created] (SPARK-40742) Java compilation warnings related to generic type

2022-10-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-40742:


 Summary: Java compilation warnings related to generic type
 Key: SPARK-40742
 URL: https://issues.apache.org/jira/browse/SPARK-40742
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 3.4.0
Reporter: Yang Jie


{code:java}
2022-10-08T01:43:33.6487078Z /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54: warning: [rawtypes] found raw type: HashMap
2022-10-08T01:43:33.6487456Z     return new HashMap();
2022-10-08T01:43:33.6487682Z                ^
2022-10-08T01:43:33.6487957Z   missing type arguments for generic class HashMap
2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
2022-10-08T01:43:33.6489211Z     V extends Object declared in class HashMap
2022-10-08T01:50:21.5951932Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55: warning: [rawtypes] found raw type: Map
2022-10-08T01:50:21.593Z       createPartitions(new InternalRow[]{ident}, new Map[]{properties});
2022-10-08T01:50:21.6000343Z                                                      ^
2022-10-08T01:50:21.6000642Z   missing type arguments for generic class Map
2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
2022-10-08T01:50:21.6002109Z     V extends Object declared in interface Map
2022-10-08T01:50:21.6006655Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216: warning: [rawtypes] found raw type: Literal
2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) {
2022-10-08T01:50:21.6007395Z                                 ^
2022-10-08T01:50:21.6007673Z   missing type arguments for generic class Literal
2022-10-08T01:50:21.6008032Z   where T is a type-variable:
2022-10-08T01:50:21.6008324Z     T extends Object declared in interface Literal
2022-10-08T01:50:21.6008785Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56: warning: [rawtypes] found raw type: Comparable
2022-10-08T01:50:21.6009223Z   public static class Coord implements Comparable {
2022-10-08T01:50:21.6009503Z                                        ^
2022-10-08T01:50:21.6009791Z   missing type arguments for generic class Comparable
2022-10-08T01:50:21.6010137Z   where T is a type-variable:
2022-10-08T01:50:21.6010433Z     T extends Object declared in interface Comparable
2022-10-08T01:50:21.6010976Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191: warning: [unchecked] unchecked method invocation: method sort in class Collections is applied to given types
2022-10-08T01:50:21.6011474Z       Collections.sort(tmp_bins);
2022-10-08T01:50:21.6011714Z                       ^
2022-10-08T01:50:21.6012050Z   required: List
2022-10-08T01:50:21.6012296Z   found: ArrayList
2022-10-08T01:50:21.6012604Z   where T is a type-variable:
2022-10-08T01:50:21.6012926Z     T extends Comparable declared in method sort(List)
2022-10-08T02:13:38.0769617Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85: warning: [rawtypes] found raw type: AbstractWriterAppender
2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
2022-10-08T02:13:38.0770645Z     ^
2022-10-08T02:13:38.0770947Z   missing type arguments for generic class AbstractWriterAppender
2022-10-08T02:13:38.0771330Z   where M is a type-variable:
2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class AbstractWriterAppender
2022-10-08T02:13:38.0774487Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268: warning: [rawtypes] found raw type: Layout
2022-10-08T02:13:38.0774940Z         Layout l = ap.getLayout();
2022-10-08T02:13:38.0775173Z         ^
2022-10-08T02:13:38.0775441Z   missing type arguments for generic class Layout
2022-10-08T02:13:38.0775849Z   where T is a type-variable:
2022-10-08T02:13:38.0776359Z     T extends Serializable declared in interface Layout
2022-10-08T02:19:55.0035795Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:17:  [rawtypes] found raw type: SparkAvroKeyRecordWriter
2022-10-08T02:19:55.0037287Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:13:  [unchecked] unchecked call to 

[jira] [Resolved] (SPARK-40516) Add official image dockerfile for Spark v3.3.0

2022-10-10 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-40516.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 2
[https://github.com/apache/spark-docker/pull/2]

> Add official image dockerfile for Spark v3.3.0
> --
>
> Key: SPARK-40516
> URL: https://issues.apache.org/jira/browse/SPARK-40516
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, PySpark, SparkR
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
> Example: [https://github.com/Yikun/spark-docker/tree/master/3.3.0]
> Test: 
> https://github.com/Yikun/spark-docker/blob/master/.github/workflows/build_3.3.0.yaml
>  






[jira] [Assigned] (SPARK-40516) Add official image dockerfile for Spark v3.3.0

2022-10-10 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang reassigned SPARK-40516:
---

Assignee: Yikun Jiang

> Add official image dockerfile for Spark v3.3.0
> --
>
> Key: SPARK-40516
> URL: https://issues.apache.org/jira/browse/SPARK-40516
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, PySpark, SparkR
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> Example: [https://github.com/Yikun/spark-docker/tree/master/3.3.0]
> Test: 
> https://github.com/Yikun/spark-docker/blob/master/.github/workflows/build_3.3.0.yaml
>  






[jira] [Resolved] (SPARK-40698) Improve the precision of `product` for integral inputs

2022-10-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40698.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38148
[https://github.com/apache/spark/pull/38148]

> Improve the precision of `product` for integral inputs
> ---
>
> Key: SPARK-40698
> URL: https://issues.apache.org/jira/browse/SPARK-40698
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40698) Improve the precision of `product` for integral inputs

2022-10-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40698:


Assignee: Ruifeng Zheng

> Improve the precision of `product` for integral inputs
> ---
>
> Key: SPARK-40698
> URL: https://issues.apache.org/jira/browse/SPARK-40698
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40741) Spark bin/beeline handles distribute by ... sort by statements incorrectly and returns wrong results

2022-10-10 Thread kaiqingli (Jira)
kaiqingli created SPARK-40741:
-

 Summary: Spark bin/beeline handles distribute by ... sort by statements incorrectly and returns wrong results
 Key: SPARK-40741
 URL: https://issues.apache.org/jira/browse/SPARK-40741
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
 Environment: spark 3.1

hive 3.0
Reporter: kaiqingli


When a query uses distribute by ... sort by ..., the result returned through 
spark/bin/beeline is wrong, while hive/beeline returns the correct result. The 
concrete scenario: the array data is first split with posexplode, the rows are 
then sorted by the exploded index with sort by, and finally reassembled with 
collect_list; the result no longer matches the original array. The SQL is as 
follows:

select id,
       samplingtimesec,
       array_data = new_array_data flag,
       array_data,
       new_array_data
from (
    select id,
           samplingtimesec,
           array_data,
           concat('[', concat_ws(',', collect_list(cell_voltage)), ']') new_array_data
    from (
        select id, samplingtimesec, array_data, cell_index, cell_voltage
        from (
            select id,
                   samplingtimesec,
                   array_data, -- format: [1,2,3,4,5]
                   row_number() over (partition by id, samplingtimesec order by samplingtimesec) r -- deduplication
            from table
            WHERE dt = '20221007'
            and samplingtimesec <= 166507920
        ) tmp
        lateral view posexplode(split(replace(replace(array_data, '[', ''), ']', ''), ',')) v0 as cell_index, cell_voltage
        where r = 1
        distribute by id, samplingtimesec sort by cell_index
    ) tmp
    group by id, samplingtimesec, array_data
) tmp
where array_data != new_array_data;

For the SQL above, hive/beeline returns 0 rows;

spark/beeline returns a non-zero number of rows.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40358) Migrate collection type check failures onto error classes

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40358:


Assignee: (was: Apache Spark)

> Migrate collection type check failures onto error classes
> -
>
> Key: SPARK-40358
> URL: https://issues.apache.org/jira/browse/SPARK-40358
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in collection 
> expressions:
> 1. BinaryArrayExpressionWithImplicitCast (1): 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69]
> 2. MapContainsKey (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237
> 3. MapConcat (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663
> 4. MapFromEntries (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801
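
For readers new to this migration: a minimal before/after sketch of what moving 
one of the listed checks (e.g. MapContainsKey) from TypeCheckFailure onto 
DataTypeMismatch typically looks like, assuming DataTypeMismatch carries an 
error sub-class plus a map of message parameters; the sub-class name and 
parameter keys below are illustrative, not necessarily the ones used in the 
actual PR.

{code:scala}
// Illustrative sketch only; the error sub-class and message parameter
// names are assumptions for the example.
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{DataTypeMismatch, TypeCheckSuccess}
import org.apache.spark.sql.types.{DataType, MapType}

object MapContainsKeyCheckSketch {
  // Before: a free-form failure string.
  def checkOld(mapArg: DataType): TypeCheckResult = mapArg match {
    case _: MapType => TypeCheckSuccess
    case other => TypeCheckResult.TypeCheckFailure(
      s"Input to map_contains_key should be a map, got ${other.catalogString}")
  }

  // After: a structured mismatch with an error sub-class and named
  // parameters that the error framework can render consistently.
  def checkNew(mapArg: DataType): TypeCheckResult = mapArg match {
    case _: MapType => TypeCheckSuccess
    case other => DataTypeMismatch(
      errorSubClass = "UNEXPECTED_INPUT_TYPE",
      messageParameters = Map(
        "paramIndex" -> "1",
        "requiredType" -> "MAP",
        "inputType" -> other.catalogString))
  }
}
{code}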



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40358) Migrate collection type check failures onto error classes

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615412#comment-17615412
 ] 

Apache Spark commented on SPARK-40358:
--

User 'lvshaokang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38197

> Migrate collection type check failures onto error classes
> -
>
> Key: SPARK-40358
> URL: https://issues.apache.org/jira/browse/SPARK-40358
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in collection 
> expressions:
> 1. BinaryArrayExpressionWithImplicitCast (1): 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69]
> 2. MapContainsKey (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237
> 3. MapConcat (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663
> 4. MapFromEntries (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40358) Migrate collection type check failures onto error classes

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40358:


Assignee: Apache Spark

> Migrate collection type check failures onto error classes
> -
>
> Key: SPARK-40358
> URL: https://issues.apache.org/jira/browse/SPARK-40358
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in collection 
> expressions:
> 1. BinaryArrayExpressionWithImplicitCast (1): 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69]
> 2. MapContainsKey (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237
> 3. MapConcat (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663
> 4. MapFromEntries (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40596) Populate ExecutorDecommission with more informative messages

2022-10-10 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi resolved SPARK-40596.
--
  Assignee: Bo Zhang
Resolution: Fixed

Issue resolved by https://github.com/apache/spark/pull/38030

> Populate ExecutorDecommission with more informative messages
> 
>
> Key: SPARK-40596
> URL: https://issues.apache.org/jira/browse/SPARK-40596
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
>
> Currently the message in {{ExecutorDecommission}} is a fixed value 
> {{{}"Executor decommission."{}}}, and it is the same for all cases, including 
> spot instance interruptions and auto-scaling down. We should put a detailed 
> message in {{ExecutorDecommission}} to better differentiate those cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40534) Extend support for Join Relation

2022-10-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40534:
---

Assignee: Rui Wang

> Extend support for Join Relation
> 
>
> Key: SPARK-40534
> URL: https://issues.apache.org/jira/browse/SPARK-40534
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Extend support for the `Join` relation with additional variants.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40534) Extend support for Join Relation

2022-10-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40534.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38157
[https://github.com/apache/spark/pull/38157]

> Extend support for Join Relation
> 
>
> Key: SPARK-40534
> URL: https://issues.apache.org/jira/browse/SPARK-40534
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>
> Extend support for the `Join` relation with additional variants.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40703:


Assignee: (was: Apache Spark)

> Performance regression for joins in Spark 3.3 vs Spark 3.2
> --
>
> Key: SPARK-40703
> URL: https://issues.apache.org/jira/browse/SPARK-40703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bryan Keller
>Priority: Major
> Attachments: spark32-plan.txt, spark33-plan.txt, test.py
>
>
> When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a 
> performance regression vs Spark 3.2 was discovered. More specifically, it 
> appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no 
> longer enforces a minimum number of partitions for a join distribution in 
> some cases. This impacts DSv2 datasources, because if a scan has only a 
> single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() 
> returns a _SinglePartition_ instance. The _SinglePartition_ creates a 
> {_}SinglePartitionShuffleSpec{_}, and 
> {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true.
> Because {_}canCreatePartitioning{_}() returns true in this case, 
> {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce 
> minimum parallelism and also will favor the single partition when considering 
> the best distribution candidate. Ultimately this results in a single 
> partition being selected for the join distribution, even if the other side of 
> the join is a large table with many partitions. This can seriously impact 
> performance of the join.
> Spark 3.2 enforces minimum parallelism differently in 
> {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this 
> issue. It will shuffle both sides of the join to enforce parallelism.
> In the TPC-DS benchmark, some queries affected include 14a and 14b. This can 
> also be demonstrated using a simple query, for example:
> {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = 
> ics.i_item_sk}}
> ...where _item_ is a small table that is read into one partition, and 
> _catalog_sales_ is a large table. These tables are part of the TPC-DS but you 
> can create your own. Also, to demonstrate the issue, you may need to turn off 
> broadcast joins, though that is not required for this issue to occur; it 
> happens when running the TPC-DS with the broadcast setting at default.
> Attached is the plan for this query in Spark 3.2 and in Spark 3.3. The plan 
> shows how in Spark 3.2, the join parallelism of 200 is reached by inserting 
> an exchange after the item table scan. In Spark 3.3, no such exchange is 
> inserted and the join parallelism is 1.
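
A minimal spark-shell sketch of the kind of repro described above, assuming two 
DSv2 tables where `small_item` is read as a single partition and `big_sales` is 
large; the table and column names and the broadcast threshold setting are 
placeholders, not taken from the attached test.py.

{code:scala}
// Rough repro sketch; table/column names are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("join-parallelism-repro").getOrCreate()

// Disable broadcast joins so a shuffle-based join gets planned.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

val sales = spark.table("big_sales")   // many input partitions
val item  = spark.table("small_item")  // reported as a single partition

val joined = sales
  .join(item, sales("cs_item_sk") === item("i_item_sk"))
  .select(item("i_item_sk"))

// On 3.3 the plan may show no exchange after the small scan and a join
// parallelism of 1; on 3.2 an exchange brings it up to
// spark.sql.shuffle.partitions.
joined.explain()
println(joined.rdd.getNumPartitions)
{code}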



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40703:


Assignee: Apache Spark

> Performance regression for joins in Spark 3.3 vs Spark 3.2
> --
>
> Key: SPARK-40703
> URL: https://issues.apache.org/jira/browse/SPARK-40703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bryan Keller
>Assignee: Apache Spark
>Priority: Major
> Attachments: spark32-plan.txt, spark33-plan.txt, test.py
>
>
> When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a 
> performance regression vs Spark 3.2 was discovered. More specifically, it 
> appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no 
> longer enforces a minimum number of partitions for a join distribution in 
> some cases. This impacts DSv2 datasources, because if a scan has only a 
> single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() 
> returns a _SinglePartition_ instance. The _SinglePartition_ creates a 
> {_}SinglePartitionShuffleSpec{_}, and 
> {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true.
> Because {_}canCreatePartitioning{_}() returns true in this case, 
> {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce 
> minimum parallelism and also will favor the single partition when considering 
> the best distribution candidate. Ultimately this results in a single 
> partition being selected for the join distribution, even if the other side of 
> the join is a large table with many partitions. This can seriously impact 
> performance of the join.
> Spark 3.2 enforces minimum parallelism differently in 
> {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this 
> issue. It will shuffle both sides of the join to enforce parallelism.
> In the TPC-DS benchmark, some queries affected include 14a and 14b. This can 
> also be demonstrated using a simple query, for example:
> {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = 
> ics.i_item_sk}}
> ...where _item_ is a small table that is read into one partition, and 
> _catalog_sales_ is a large table. These tables are part of the TPC-DS but you 
> can create your own. Also, to demonstrate the issue, you may need to turn off 
> broadcast joins, though that is not required for this issue to occur; it 
> happens when running the TPC-DS with the broadcast setting at default.
> Attached is the plan for this query in Spark 3.2 and in Spark 3.3. The plan 
> shows how in Spark 3.2, the join parallelism of 200 is reached by inserting 
> an exchange after the item table scan. In Spark 3.3, no such exchange is 
> inserted and the join parallelism is 1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615379#comment-17615379
 ] 

Apache Spark commented on SPARK-40703:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/38196

> Performance regression for joins in Spark 3.3 vs Spark 3.2
> --
>
> Key: SPARK-40703
> URL: https://issues.apache.org/jira/browse/SPARK-40703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bryan Keller
>Priority: Major
> Attachments: spark32-plan.txt, spark33-plan.txt, test.py
>
>
> When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a 
> performance regression vs Spark 3.2 was discovered. More specifically, it 
> appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no 
> longer enforces a minimum number of partitions for a join distribution in 
> some cases. This impacts DSv2 datasources, because if a scan has only a 
> single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() 
> returns a _SinglePartition_ instance. The _SinglePartition_ creates a 
> {_}SinglePartitionShuffleSpec{_}, and 
> {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true.
> Because {_}canCreatePartitioning{_}() returns true in this case, 
> {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce 
> minimum parallelism and also will favor the single partition when considering 
> the best distribution candidate. Ultimately this results in a single 
> partition being selected for the join distribution, even if the other side of 
> the join is a large table with many partitions. This can seriously impact 
> performance of the join.
> Spark 3.2 enforces minimum parallelism differently in 
> {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this 
> issue. It will shuffle both sides of the join to enforce parallelism.
> In the TPC-DS benchmark, some queries affected include 14a and 14b. This can 
> also be demonstrated using a simple query, for example:
> {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = 
> ics.i_item_sk}}
> ...where _item_ is a small table that is read into one partition, and 
> _catalog_sales_ is a large table. These tables are part of the TPC-DS but you 
> can create your own. Also, to demonstrate the issue, you may need to turn off 
> broadcast joins, though that is not required for this issue to occur; it 
> happens when running the TPC-DS with the broadcast setting at default.
> Attached is the plan for this query in Spark 3.2 and in Spark 3.3. The plan 
> shows how in Spark 3.2, the join parallelism of 200 is reached by inserting 
> an exchange after the item table scan. In Spark 3.3, no such exchange is 
> inserted and the join parallelism is 1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40725) Add mypy-protobuf to requirements

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615373#comment-17615373
 ] 

Apache Spark commented on SPARK-40725:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38195

> Add mypy-protobuf to requirements
> -
>
> Key: SPARK-40725
> URL: https://issues.apache.org/jira/browse/SPARK-40725
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40725) Add mypy-protobuf to requirements

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615372#comment-17615372
 ] 

Apache Spark commented on SPARK-40725:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38195

> Add mypy-protobuf to requirements
> -
>
> Key: SPARK-40725
> URL: https://issues.apache.org/jira/browse/SPARK-40725
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40740) Improve listFunctions in SessionCatalog

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615370#comment-17615370
 ] 

Apache Spark commented on SPARK-40740:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/38194

> Improve listFunctions in SessionCatalog
> ---
>
> Key: SPARK-40740
> URL: https://issues.apache.org/jira/browse/SPARK-40740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Minor
>
> Currently `listFunctions` gets all external functions and registered 
> functions (built-in, temporary, and persistent functions with a specific 
> database name).  It is not necessary to get persistent functions that match a 
> specific database name again since we already fetched them from 
> `externalCatalog.listFunctions`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40740) Improve listFunctions in SessionCatalog

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615369#comment-17615369
 ] 

Apache Spark commented on SPARK-40740:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/38194

> Improve listFunctions in SessionCatalog
> ---
>
> Key: SPARK-40740
> URL: https://issues.apache.org/jira/browse/SPARK-40740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Minor
>
> Currently `listFunctions` gets all external functions and registered 
> functions (built-in, temporary, and persistent functions with a specific 
> database name).  It is not necessary to get persistent functions that match a 
> specific database name again since we already fetched them from 
> `externalCatalog.listFunctions`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40740) Improve listFunctions in SessionCatalog

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40740:


Assignee: (was: Apache Spark)

> Improve listFunctions in SessionCatalog
> ---
>
> Key: SPARK-40740
> URL: https://issues.apache.org/jira/browse/SPARK-40740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Minor
>
> Currently `listFunctions` gets all external functions and registered 
> functions (built-in, temporary, and persistent functions with a specific 
> database name).  It is not necessary to get persistent functions that match a 
> specific database name again since we already fetched them from 
> `externalCatalog.listFunctions`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40740) Improve listFunctions in SessionCatalog

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40740:


Assignee: Apache Spark

> Improve listFunctions in SessionCatalog
> ---
>
> Key: SPARK-40740
> URL: https://issues.apache.org/jira/browse/SPARK-40740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Minor
>
> Currently `listFunctions` gets all external functions and registered 
> functions (built-in, temporary, and persistent functions with a specific 
> database name).  It is not necessary to get persistent functions that match a 
> specific database name again since we already fetched them from 
> `externalCatalog.listFunctions`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40740) Improve listFunctions in SessionCatalog

2022-10-10 Thread Allison Wang (Jira)
Allison Wang created SPARK-40740:


 Summary: Improve listFunctions in SessionCatalog
 Key: SPARK-40740
 URL: https://issues.apache.org/jira/browse/SPARK-40740
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Allison Wang


Currently `listFunctions` gets all external functions and registered functions 
(built-in, temporary, and persistent functions with a specific database name).  
It is not necessary to get persistent functions that match a specific database 
name again since we already fetched them from `externalCatalog.listFunctions`.
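
A rough sketch of the shape of that optimization, with hypothetical helper and 
parameter names; the real SessionCatalog code is structured differently.

{code:scala}
// Illustrative sketch only; the real SessionCatalog.listFunctions differs.
import org.apache.spark.sql.catalyst.FunctionIdentifier

def listFunctionsSketch(
    externalFunctions: Seq[FunctionIdentifier],   // already fetched via externalCatalog.listFunctions(db, pattern)
    registeredFunctions: Seq[FunctionIdentifier]  // built-in, temporary and persistent entries in the registry
  ): Seq[(FunctionIdentifier, String)] = {
  // Persistent functions for the database were already returned by the
  // external catalog, so keep only database-less (built-in / temporary)
  // registry entries instead of matching the database name a second time.
  val builtinOrTemporary = registeredFunctions.filter(_.database.isEmpty)
  (externalFunctions.map(f => (f, "USER")) ++
    builtinOrTemporary.map(f => (f, "SYSTEM"))).distinct
}
{code}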



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37945) Use error classes in the execution errors of arithmetic ops

2022-10-10 Thread Khalid Mammadov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615361#comment-17615361
 ] 

Khalid Mammadov commented on SPARK-37945:
-

[~maxgekk] I see you have already fixed most of these. I can pick up the ones 
below (and have already started) if that's OK?

unscaledValueTooLargeForPrecisionError
decimalPrecisionExceedsMaxPrecisionError
outOfDecimalTypeRangeError
integerOverflowError

PS: Looks fairly straightforward and shouldn't take long.

> Use error classes in the execution errors of arithmetic ops
> ---
>
> Key: SPARK-37945
> URL: https://issues.apache.org/jira/browse/SPARK-37945
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> onto use error classes. Throw an implementation of SparkThrowable. Also write 
> a test per every error in QueryExecutionErrorsSuite.
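
For context, a toy sketch of the general pattern behind such migrations: 
replace a plain ArithmeticException with a throwable that exposes an error 
class and named message parameters, which is what SparkThrowable 
implementations do. The class below is not the real Spark API; the error class 
name and parameter keys are made up for the example.

{code:scala}
// Toy sketch only; NOT the real Spark exception hierarchy.
class ArithmeticOverflowSketch(
    val errorClass: String,
    val messageParameters: Map[String, String])
  extends ArithmeticException(
    s"[$errorClass] ${messageParameters.map { case (k, v) => s"$k=$v" }.mkString(", ")}")

def integerOverflowErrorSketch(message: String): ArithmeticException =
  new ArithmeticOverflowSketch(
    errorClass = "ARITHMETIC_OVERFLOW",
    messageParameters = Map(
      "message" -> message,
      "config" -> "spark.sql.ansi.enabled"))
{code}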



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615358#comment-17615358
 ] 

Apache Spark commented on SPARK-39375:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38193

> SPIP: Spark Connect - A client and server interface for Apache Spark
> 
>
> Key: SPARK-39375
> URL: https://issues.apache.org/jira/browse/SPARK-39375
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Critical
>  Labels: SPIP
>
> Please find the full document for discussion here: [Spark Connect 
> SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
>  Below, we have just referenced the introduction.
> h2. What are you trying to do?
> While Spark is used extensively, it was designed nearly a decade ago, which, 
> in the age of serverless computing and ubiquitous programming language use, 
> poses a number of limitations. Most of the limitations stem from the tightly 
> coupled Spark driver architecture and fact that clusters are typically shared 
> across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark 
> driver runs both the client application and scheduler, which results in a 
> heavyweight architecture that requires proximity to the cluster. There is no 
> built-in capability to  remotely connect to a Spark cluster in languages 
> other than SQL and users therefore rely on external solutions such as the 
> inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich 
> developer experience{*}: The current architecture and APIs do not cater for 
> interactive data exploration (as done with Notebooks), or allow for building 
> out rich developer experience common in modern code editors. (3) 
> {*}Stability{*}: with the current shared driver architecture, users causing 
> critical exceptions (e.g. OOM) bring the whole cluster down for all users. 
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs 
> (e.g. first and third-party dependencies in the classpath) does not allow for 
> seamless upgrades between Spark versions (and with that, hinders new feature 
> adoption).
>  
> We propose to overcome these challenges by building on the DataFrame API and 
> the underlying unresolved logical plans. The DataFrame API is widely used and 
> makes it very easy to iteratively express complex logic. We will introduce 
> {_}Spark Connect{_}, a remote option of the DataFrame API that separates the 
> client from the Spark server. With Spark Connect, Spark will become 
> decoupled, allowing for built-in remote connectivity: The decoupled client 
> SDK can be used to run interactive data exploration and connect to the server 
> for DataFrame operations. 
>  
> Spark Connect will benefit Spark developers in different ways: The decoupled 
> architecture will result in improved stability, as clients are separated from 
> the driver. From the Spark Connect client perspective, Spark will be (almost) 
> versionless, and thus enable seamless upgradability, as server APIs can 
> evolve without affecting the client API. The decoupled client-server 
> architecture can be leveraged to build close integrations with local 
> developer tooling. Finally, separating the client process from the Spark 
> server process will improve Spark’s overall security posture by avoiding the 
> tight coupling of the client inside the Spark runtime environment.
>  
> Spark Connect will strengthen Spark’s position as the modern unified engine 
> for large-scale data analytics and expand applicability to use cases and 
> developers we could not reach with the current setup: Spark will become 
> ubiquitously usable as the DataFrame API can be used with (almost) any 
> programming language.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39375:


Assignee: Apache Spark

> SPIP: Spark Connect - A client and server interface for Apache Spark
> 
>
> Key: SPARK-39375
> URL: https://issues.apache.org/jira/browse/SPARK-39375
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Critical
>  Labels: SPIP
>
> Please find the full document for discussion here: [Spark Connect 
> SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
>  Below, we have just referenced the introduction.
> h2. What are you trying to do?
> While Spark is used extensively, it was designed nearly a decade ago, which, 
> in the age of serverless computing and ubiquitous programming language use, 
> poses a number of limitations. Most of the limitations stem from the tightly 
> coupled Spark driver architecture and fact that clusters are typically shared 
> across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark 
> driver runs both the client application and scheduler, which results in a 
> heavyweight architecture that requires proximity to the cluster. There is no 
> built-in capability to  remotely connect to a Spark cluster in languages 
> other than SQL and users therefore rely on external solutions such as the 
> inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich 
> developer experience{*}: The current architecture and APIs do not cater for 
> interactive data exploration (as done with Notebooks), or allow for building 
> out rich developer experience common in modern code editors. (3) 
> {*}Stability{*}: with the current shared driver architecture, users causing 
> critical exceptions (e.g. OOM) bring the whole cluster down for all users. 
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs 
> (e.g. first and third-party dependencies in the classpath) does not allow for 
> seamless upgrades between Spark versions (and with that, hinders new feature 
> adoption).
>  
> We propose to overcome these challenges by building on the DataFrame API and 
> the underlying unresolved logical plans. The DataFrame API is widely used and 
> makes it very easy to iteratively express complex logic. We will introduce 
> {_}Spark Connect{_}, a remote option of the DataFrame API that separates the 
> client from the Spark server. With Spark Connect, Spark will become 
> decoupled, allowing for built-in remote connectivity: The decoupled client 
> SDK can be used to run interactive data exploration and connect to the server 
> for DataFrame operations. 
>  
> Spark Connect will benefit Spark developers in different ways: The decoupled 
> architecture will result in improved stability, as clients are separated from 
> the driver. From the Spark Connect client perspective, Spark will be (almost) 
> versionless, and thus enable seamless upgradability, as server APIs can 
> evolve without affecting the client API. The decoupled client-server 
> architecture can be leveraged to build close integrations with local 
> developer tooling. Finally, separating the client process from the Spark 
> server process will improve Spark’s overall security posture by avoiding the 
> tight coupling of the client inside the Spark runtime environment.
>  
> Spark Connect will strengthen Spark’s position as the modern unified engine 
> for large-scale data analytics and expand applicability to use cases and 
> developers we could not reach with the current setup: Spark will become 
> ubiquitously usable as the DataFrame API can be used with (almost) any 
> programming language.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39375:


Assignee: (was: Apache Spark)

> SPIP: Spark Connect - A client and server interface for Apache Spark
> 
>
> Key: SPARK-39375
> URL: https://issues.apache.org/jira/browse/SPARK-39375
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Critical
>  Labels: SPIP
>
> Please find the full document for discussion here: [Spark Connect 
> SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
>  Below, we have just referenced the introduction.
> h2. What are you trying to do?
> While Spark is used extensively, it was designed nearly a decade ago, which, 
> in the age of serverless computing and ubiquitous programming language use, 
> poses a number of limitations. Most of the limitations stem from the tightly 
> coupled Spark driver architecture and fact that clusters are typically shared 
> across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark 
> driver runs both the client application and scheduler, which results in a 
> heavyweight architecture that requires proximity to the cluster. There is no 
> built-in capability to  remotely connect to a Spark cluster in languages 
> other than SQL and users therefore rely on external solutions such as the 
> inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich 
> developer experience{*}: The current architecture and APIs do not cater for 
> interactive data exploration (as done with Notebooks), or allow for building 
> out rich developer experience common in modern code editors. (3) 
> {*}Stability{*}: with the current shared driver architecture, users causing 
> critical exceptions (e.g. OOM) bring the whole cluster down for all users. 
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs 
> (e.g. first and third-party dependencies in the classpath) does not allow for 
> seamless upgrades between Spark versions (and with that, hinders new feature 
> adoption).
>  
> We propose to overcome these challenges by building on the DataFrame API and 
> the underlying unresolved logical plans. The DataFrame API is widely used and 
> makes it very easy to iteratively express complex logic. We will introduce 
> {_}Spark Connect{_}, a remote option of the DataFrame API that separates the 
> client from the Spark server. With Spark Connect, Spark will become 
> decoupled, allowing for built-in remote connectivity: The decoupled client 
> SDK can be used to run interactive data exploration and connect to the server 
> for DataFrame operations. 
>  
> Spark Connect will benefit Spark developers in different ways: The decoupled 
> architecture will result in improved stability, as clients are separated from 
> the driver. From the Spark Connect client perspective, Spark will be (almost) 
> versionless, and thus enable seamless upgradability, as server APIs can 
> evolve without affecting the client API. The decoupled client-server 
> architecture can be leveraged to build close integrations with local 
> developer tooling. Finally, separating the client process from the Spark 
> server process will improve Spark’s overall security posture by avoiding the 
> tight coupling of the client inside the Spark runtime environment.
>  
> Spark Connect will strengthen Spark’s position as the modern unified engine 
> for large-scale data analytics and expand applicability to use cases and 
> developers we could not reach with the current setup: Spark will become 
> ubiquitously usable as the DataFrame API can be used with (almost) any 
> programming language.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615357#comment-17615357
 ] 

Apache Spark commented on SPARK-39375:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38193

> SPIP: Spark Connect - A client and server interface for Apache Spark
> 
>
> Key: SPARK-39375
> URL: https://issues.apache.org/jira/browse/SPARK-39375
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Critical
>  Labels: SPIP
>
> Please find the full document for discussion here: [Spark Connect 
> SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
>  Below, we have just referenced the introduction.
> h2. What are you trying to do?
> While Spark is used extensively, it was designed nearly a decade ago, which, 
> in the age of serverless computing and ubiquitous programming language use, 
> poses a number of limitations. Most of the limitations stem from the tightly 
> coupled Spark driver architecture and fact that clusters are typically shared 
> across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark 
> driver runs both the client application and scheduler, which results in a 
> heavyweight architecture that requires proximity to the cluster. There is no 
> built-in capability to  remotely connect to a Spark cluster in languages 
> other than SQL and users therefore rely on external solutions such as the 
> inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich 
> developer experience{*}: The current architecture and APIs do not cater for 
> interactive data exploration (as done with Notebooks), or allow for building 
> out rich developer experience common in modern code editors. (3) 
> {*}Stability{*}: with the current shared driver architecture, users causing 
> critical exceptions (e.g. OOM) bring the whole cluster down for all users. 
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs 
> (e.g. first and third-party dependencies in the classpath) does not allow for 
> seamless upgrades between Spark versions (and with that, hinders new feature 
> adoption).
>  
> We propose to overcome these challenges by building on the DataFrame API and 
> the underlying unresolved logical plans. The DataFrame API is widely used and 
> makes it very easy to iteratively express complex logic. We will introduce 
> {_}Spark Connect{_}, a remote option of the DataFrame API that separates the 
> client from the Spark server. With Spark Connect, Spark will become 
> decoupled, allowing for built-in remote connectivity: The decoupled client 
> SDK can be used to run interactive data exploration and connect to the server 
> for DataFrame operations. 
>  
> Spark Connect will benefit Spark developers in different ways: The decoupled 
> architecture will result in improved stability, as clients are separated from 
> the driver. From the Spark Connect client perspective, Spark will be (almost) 
> versionless, and thus enable seamless upgradability, as server APIs can 
> evolve without affecting the client API. The decoupled client-server 
> architecture can be leveraged to build close integrations with local 
> developer tooling. Finally, separating the client process from the Spark 
> server process will improve Spark’s overall security posture by avoiding the 
> tight coupling of the client inside the Spark runtime environment.
>  
> Spark Connect will strengthen Spark’s position as the modern unified engine 
> for large-scale data analytics and expand applicability to use cases and 
> developers we could not reach with the current setup: Spark will become 
> ubiquitously usable as the DataFrame API can be used with (almost) any 
> programming language.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39199) Implement pandas API missing parameters

2022-10-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-39199.
--
Resolution: Resolved

> Implement pandas API missing parameters
> ---
>
> Key: SPARK-39199
> URL: https://issues.apache.org/jira/browse/SPARK-39199
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.3.0, 3.4.0, 3.3.1
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> pandas API on Spark aims to make pandas code work on Spark clusters without 
> any changes. So full API coverage has been one of our major goals. Currently, 
> most pandas functions are implemented, whereas some of them have incomplete 
> parameter support.
> There are some common parameters missing (resolved):
> * How to handle NAs   
>  * Filter data types    
>  * Control result length    
>  * Reindex result   
> There are remaining missing parameters to implement (see doc below).
> See the design and the current status at 
> [https://docs.google.com/document/d/1H6RXL6oc-v8qLJbwKl6OEqBjRuMZaXcTYmrZb9yNm5I/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session

2022-10-10 Thread Phil Walker (Jira)
Phil Walker created SPARK-40739:
---

 Summary: "sbt packageBin" fails in cygwin or other windows bash 
session
 Key: SPARK-40739
 URL: https://issues.apache.org/jira/browse/SPARK-40739
 Project: Spark
  Issue Type: Bug
  Components: Build, Windows
Affects Versions: 3.3.0
 Environment: The problem occurs in Windows if *_sbt_* is started from 
a (non-WSL) bash session.

See the spark PR link for detailed symptoms.
Reporter: Phil Walker


 In a Windows _*SHELL*_ environment, such as _*cygwin*_ or 
{_}*msys2/mingw64*{_}, etc,  _*Core.settings*_ in _*project/SparkBuild.scala*_ 
calls the wrong _*bash.exe*_ if WSL bash is present (typically at 
{_}*C:\Windows*{_}), causing a build failure.  This occurs even though the 
proper *bash.exe* is in the _*PATH*_ ahead of _*WSL*_ bash.exe.

This is fixed by [spark PR 38167|https://github.com/apache/spark/pull/38167]

There are 3 parts to the fix, implemented in _*project/SparkBuild.scala*_ :
 * determine the absolute path of the first bash.exe in the command line. 
 * determine the build environment (e.g., Linux, Darwin, CYGWIN, MSYS2, etc.)
 * For Windows SHELL environments, the first argument to the spawned Process is 
changed from "bash" to the absolute path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session

2022-10-10 Thread Phil Walker (Jira)
Phil Walker created SPARK-40738:
---

 Summary: spark-shell fails with "bad array subscript" in cygwin or 
msys bash session
 Key: SPARK-40738
 URL: https://issues.apache.org/jira/browse/SPARK-40738
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell, Windows
Affects Versions: 3.3.0
 Environment: The problem occurs in Windows if *_spark-shell_* is 
called from a bash session.

NOTE: the fix also applies to _*spark-submit*_ and {_}*beeline*{_}, since 
they call spark-shell.
Reporter: Phil Walker


A spark pull request [spark PR|https://github.com/apache/spark/pull/38167] 
fixes this issue, and also fixes a build error that is also related to 
_*cygwin*_  and *msys/mingw* bash *sbt* sessions.

If a Windows user tries to start a *_spark-shell_* session by calling the bash 
script (rather than the *_spark-shell.cmd_* script), it fails with a confusing 
error message.  Script _*spark-class*_ calls 
_*launcher/src/main/java/org/apache/spark/launcher/Main.java*_ to generate 
command line arguments, but the launcher produces a format appropriate to the 
*_.cmd_* version of the script rather than the _*bash*_ version.

The launcher Main method, when called for environments other than Windows, 
interleaves NULL characters between the command line arguments.   It should 
also do so on Windows when called from the bash script.  It incorrectly assumes 
that if the OS is Windows, it is being called by the .cmd version of the 
script.

The resulting error message is unhelpful:

 
{code:java}
[lots of ugly stuff omitted]
/opt/spark/bin/spark-class: line 100: CMD: bad array subscript
{code}
The key to _*launcher/Main*_ knowing that a request is from a _*bash*_ session 
is that the _*SHELL*_ environment variable is set.   This will normally be set 
in any of the various Windows shell environments ({_}*cygwin*{_}, 
{_}*mingw64*{_}, {_}*msys2*{_}, etc.) and will not normally be set in a native 
Windows (cmd) environment.   In the _*spark-class.cmd*_ script, _*SHELL*_ is intentionally 
unset to avoid problems, and to permit bash users to call the _*.cmd*_ scripts 
if they prefer (it will still work as before).
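
A small sketch of the decision the report is pointing at: emit NUL-separated 
arguments whenever the caller is a bash session (SHELL is set), not only when 
the OS is non-Windows. The real launcher is Java and structured differently; 
this is only to illustrate the condition.

{code:scala}
// Illustrative sketch of the intended launcher behaviour.
def printLaunchCommand(cmd: Seq[String]): Unit = {
  val isWindows = sys.props.getOrElse("os.name", "").toLowerCase.contains("windows")
  val calledFromBash = sys.env.contains("SHELL")
  if (!isWindows || calledFromBash) {
    // The bash script can read this with: while IFS= read -d '' -r ARG; do ... done
    cmd.foreach(arg => print(arg + "\u0000"))
  } else {
    // cmd-style output: a single line (quoting elided in this sketch)
    println(cmd.mkString(" "))
  }
}
{code}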

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-10 Thread Bryan Keller (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615264#comment-17615264
 ] 

Bryan Keller commented on SPARK-40703:
--

Sounds good, thanks.

> Performance regression for joins in Spark 3.3 vs Spark 3.2
> --
>
> Key: SPARK-40703
> URL: https://issues.apache.org/jira/browse/SPARK-40703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bryan Keller
>Priority: Major
> Attachments: spark32-plan.txt, spark33-plan.txt, test.py
>
>
> When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a 
> performance regression vs Spark 3.2 was discovered. More specifically, it 
> appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no 
> longer enforces a minimum number of partitions for a join distribution in 
> some cases. This impacts DSv2 datasources, because if a scan has only a 
> single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() 
> returns a _SinglePartition_ instance. The _SinglePartition_ creates a 
> {_}SinglePartitionShuffleSpec{_}, and 
> {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true.
> Because {_}canCreatePartitioning{_}() returns true in this case, 
> {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce 
> minimum parallelism and also will favor the single partition when considering 
> the best distribution candidate. Ultimately this results in a single 
> partition being selected for the join distribution, even if the other side of 
> the join is a large table with many partitions. This can seriously impact 
> performance of the join.
> Spark 3.2 enforces minimum parallelism differently in 
> {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this 
> issue. It will shuffle both sides of the join to enforce parallelism.
> In the TPC-DS benchmark, some queries affected include 14a and 14b. This can 
> also be demonstrated using a simple query, for example:
> {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = 
> ics.i_item_sk}}
> ...where _item_ is a small table that is read into one partition, and 
> _catalog_sales_ is a large table. These tables are part of the TPC-DS but you 
> can create your own. Also, to demonstrate the issue, you may need to turn off 
> broadcast joins, though that is not required for this issue to occur; it 
> happens when running the TPC-DS with the broadcast setting at default.
> Attached is the plan for this query in Spark 3.2 and in Spark 3.3. The plan 
> shows how in Spark 3.2, the join parallelism of 200 is reached by inserting 
> an exchange after the item table scan. In Spark 3.3, no such exchange is 
> inserted and the join parallelism is 1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-10 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615255#comment-17615255
 ] 

Chao Sun commented on SPARK-40703:
--

Thanks [~bryanck] . Now I see where the issue is.

In your pyspark example, one side reports {{UnknownPartitioning}} while the other 
side reports {{{}SinglePartition{}}}. Later on, Spark will insert a shuffle for 
{{UnknownPartitioning}} so it becomes {{{}HashPartitioning{}}}. In this 
particular case, when Spark is deciding which side to insert the shuffle on, it'll 
pick the {{HashPartitioning}} again and convert it into the same 
{{HashPartitioning}} but with {{{}numPartitions = 1{}}}.

Before:
{code}
 ShuffleExchange(HashPartition(200))  <-->  SinglePartition
{code}
(suppose {{spark.sql.shuffle.partitions}} is 200)

After:
{code}
 ShuffleExchange(HashPartition(1))  <-->  SinglePartition
{code}
 
The reason Spark chooses to do it this way is that there is a trade-off 
between shuffle cost and parallelism. At the moment, when Spark sees that one 
side of the join has a {{ShuffleExchange}} (meaning it needs to be shuffled 
anyway) and the other side doesn't, it'll try to avoid shuffling the other 
side.

This makes more sense if we have:
{code}
ShuffleExchange(HashPartition(200)) <-> HashPartition(150)
{code}

as in this case, Spark will avoid shuffling the right-hand side and instead just 
change the number of shuffle partitions on the left:
{code}
ShuffleExchange(HashPartition(150)) <-> HashPartition(150)
{code}

I feel we can treat the {{SinglePartition}} as a special case here. Let me see 
if I can come up with a PR.

> Performance regression for joins in Spark 3.3 vs Spark 3.2
> --
>
> Key: SPARK-40703
> URL: https://issues.apache.org/jira/browse/SPARK-40703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bryan Keller
>Priority: Major
> Attachments: spark32-plan.txt, spark33-plan.txt, test.py
>
>
> When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a 
> performance regression vs Spark 3.2 was discovered. More specifically, it 
> appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no 
> longer enforces a minimum number of partitions for a join distribution in 
> some cases. This impacts DSv2 datasources, because if a scan has only a 
> single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() 
> returns a _SinglePartition_ instance. The _SinglePartition_ creates a 
> {_}SinglePartitionShuffleSpec{_}, and 
> {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true.
> Because {_}canCreatePartitioning{_}() returns true in this case, 
> {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce 
> minimum parallelism and also will favor the single partition when considering 
> the best distribution candidate. Ultimately this results in a single 
> partition being selected for the join distribution, even if the other side of 
> the join is a large table with many partitions. This can seriously impact 
> performance of the join.
> Spark 3.2 enforces minimum parallelism differently in 
> {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this 
> issue. It will shuffle both sides of the join to enforce parallelism.
> In the TPC-DS benchmark, some queries affected include 14a and 14b. This can 
> also be demonstrated using a simple query, for example:
> {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = 
> ics.i_item_sk}}
> ...where _item_ is a small table that is read into one partition, and 
> _catalog_sales_ is a large table. These tables are part of the TPC-DS but you 
> can create your own. Also, to demonstrate the issue you may need to turn off 
> broadcast joins, though that is not required for the issue to occur; it 
> happens when running the TPC-DS with the broadcast setting at its default.
> Attached is the plan for this query in Spark 3.2 and in Spark 3.3. The plan 
> shows how in Spark 3.2, the join parallelism of 200 is reached by inserting 
> an exchange after the item table scan. In Spark 3.3, no such exchange is 
> inserted and the join parallelism is 1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40737) Add basic support for DataFrameWriter

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40737:


Assignee: Apache Spark

> Add basic support for DataFrameWriter
> -
>
> Key: SPARK-40737
> URL: https://issues.apache.org/jira/browse/SPARK-40737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>
> A key element of using Spark Connect is going to be the ability to write data 
> from a logical plan. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40737) Add basic support for DataFrameWriter

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615251#comment-17615251
 ] 

Apache Spark commented on SPARK-40737:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/38192

> Add basic support for DataFrameWriter
> -
>
> Key: SPARK-40737
> URL: https://issues.apache.org/jira/browse/SPARK-40737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> A key element of using Spark Connect is going to be the ability to write data 
> from a logical plan. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40737) Add basic support for DataFrameWriter

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40737:


Assignee: Apache Spark

> Add basic support for DataFrameWriter
> -
>
> Key: SPARK-40737
> URL: https://issues.apache.org/jira/browse/SPARK-40737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>
> A key element of using Spark Connect is going to be the ability to write data 
> from a logical plan. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40737) Add basic support for DataFrameWriter

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40737:


Assignee: (was: Apache Spark)

> Add basic support for DataFrameWriter
> -
>
> Key: SPARK-40737
> URL: https://issues.apache.org/jira/browse/SPARK-40737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> A key element of using Spark Connect is going to be the ability to write data 
> from a logical plan. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40737) Add basic support for DataFrameWriter

2022-10-10 Thread Martin Grund (Jira)
Martin Grund created SPARK-40737:


 Summary: Add basic support for DataFrameWriter
 Key: SPARK-40737
 URL: https://issues.apache.org/jira/browse/SPARK-40737
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


A key element of using Spark Connect is going to be the ability to write data 
from a logical plan. 
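
As a rough illustration (not taken from the ticket), this is the kind of 
DataFrameWriter call that Spark Connect would eventually have to translate into 
a write of the underlying logical plan; the format and output path are 
placeholders.
{code:scala}
// Write a small dataset through the classic DataFrameWriter API; Spark Connect
// needs an equivalent of this on top of its logical plans.
spark.range(10)
  .write
  .mode("overwrite")
  .format("parquet")
  .save("/tmp/connect-write-example")   // placeholder path
{code}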



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40736) Spark 3.3.0 doesn't work with Hive 3.1.2

2022-10-10 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated SPARK-40736:
--
Fix Version/s: 3.3.1

> Spark 3.3.0 doesn't work with Hive 3.1.2
> -
>
> Key: SPARK-40736
> URL: https://issues.apache.org/jira/browse/SPARK-40736
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Pratik Malani
>Priority: Major
>  Labels: Hive, spark, spark3.0
> Fix For: 3.3.1
>
>
> Hive 2.3.9 is impacted by CVE-2021-34538, so we are trying to use Hive 3.1.2.
> Using Spark 3.3.0 with Hadoop 3.3.4 and Hive 3.1.2, we get the below error 
> when starting the Thriftserver:
>  
> {noformat}
> Exception in thread "main" java.lang.IllegalAccessError: tried to access 
> class org.apache.hive.service.server.HiveServer2$ServerOptionsProcessor from 
> class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:92)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){noformat}
> We use the below command to start the Thriftserver:
>  
> *spark-class org.apache.spark.deploy.SparkSubmit --class 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal*
>  
> SPARK_HOME is set correctly.
>  
> The same works well with Hive 2.3.9, but fails when we upgrade to Hive 3.1.2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40736) Spark 3.3.0 doesn't work with Hive 3.1.2

2022-10-10 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated SPARK-40736:
--
Labels: Hive spark spark3.0  (was: )

> Spark 3.3.0 doesn't work with Hive 3.1.2
> -
>
> Key: SPARK-40736
> URL: https://issues.apache.org/jira/browse/SPARK-40736
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Pratik Malani
>Priority: Major
>  Labels: Hive, spark, spark3.0
>
> Hive 2.3.9 is impacted by CVE-2021-34538, so we are trying to use Hive 3.1.2.
> Using Spark 3.3.0 with Hadoop 3.3.4 and Hive 3.1.2, we get the below error 
> when starting the Thriftserver:
>  
> {noformat}
> Exception in thread "main" java.lang.IllegalAccessError: tried to access 
> class org.apache.hive.service.server.HiveServer2$ServerOptionsProcessor from 
> class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:92)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){noformat}
> We use the below command to start the Thriftserver:
>  
> *spark-class org.apache.spark.deploy.SparkSubmit --class 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal*
>  
> SPARK_HOME is set correctly.
>  
> The same works well with Hive 2.3.9, but fails when we upgrade to Hive 3.1.2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40736) Spark 3.3.0 doesn't work with Hive 3.1.2

2022-10-10 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated SPARK-40736:
--
Description: 
Hive 2.3.9 is impacted by CVE-2021-34538, so we are trying to use Hive 3.1.2.

Using Spark 3.3.0 with Hadoop 3.3.4 and Hive 3.1.2, we get the below error when 
starting the Thriftserver:

 
{noformat}
Exception in thread "main" java.lang.IllegalAccessError: tried to access class 
org.apache.hive.service.server.HiveServer2$ServerOptionsProcessor from class 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:92)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){noformat}
We use the below command to start the Thriftserver:

 

*spark-class org.apache.spark.deploy.SparkSubmit --class 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal*

 

SPARK_HOME is set correctly.

 

The same works well with Hive 2.3.9, but fails when we upgrade to Hive 3.1.2.

> Spark 3.3.0 doesn't work with Hive 3.1.2
> -
>
> Key: SPARK-40736
> URL: https://issues.apache.org/jira/browse/SPARK-40736
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Pratik Malani
>Priority: Major
>
> Hive 2.3.9 is impacted by CVE-2021-34538, so we are trying to use Hive 3.1.2.
> Using Spark 3.3.0 with Hadoop 3.3.4 and Hive 3.1.2, we get the below error 
> when starting the Thriftserver:
>  
> {noformat}
> Exception in thread "main" java.lang.IllegalAccessError: tried to access 
> class org.apache.hive.service.server.HiveServer2$ServerOptionsProcessor from 
> class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:92)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){noformat}
> We use the below command to start the Thriftserver:
>  
> *spark-class org.apache.spark.deploy.SparkSubmit --class 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal*
>  
> SPARK_HOME is set correctly.
>  
> The same works well with Hive 2.3.9, but fails when we upgrade to Hive 3.1.2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40736) Spark 3.3.0 doesn't work with Hive 3.1.2

2022-10-10 Thread Pratik Malani (Jira)
Pratik Malani created SPARK-40736:
-

 Summary: Spark 3.3.0 doesn't work with Hive 3.1.2
 Key: SPARK-40736
 URL: https://issues.apache.org/jira/browse/SPARK-40736
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Pratik Malani






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-40706) IllegalStateException when querying array values inside a nested struct

2022-10-10 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614550#comment-17614550
 ] 

Bruce Robbins edited comment on SPARK-40706 at 10/10/22 5:01 PM:
-

Same as SPARK-39854?

At the very least, the suggested workaround also worked for your case:
{noformat}
spark-sql> set spark.sql.optimizer.nestedSchemaPruning.enabled=false;
spark.sql.optimizer.nestedSchemaPruning.enabled false
Time taken: 0.224 seconds, Fetched 1 row(s)
spark-sql> set spark.sql.optimizer.expression.nestedPruning.enabled=false;
spark.sql.optimizer.expression.nestedPruning.enabledfalse
Time taken: 0.016 seconds, Fetched 1 row(s)
spark-sql> SELECT 
response.message as message,
response.timestamp as timestamp,
score as risk_score,
model.value as model_type
FROM tbl
  LATERAL VIEW OUTER explode(response.data.items.attempt)   
  AS Attempt
  LATERAL VIEW OUTER explode(response.data.items.attempt.risk)  
  AS RiskModels
  LATERAL VIEW OUTER explode(RiskModels)
  AS RiskModel
  LATERAL VIEW OUTER explode(RiskModel.indicator)   
  AS Model
  LATERAL VIEW OUTER explode(RiskModel.Score)   
  AS Score;

 >  >  >  >  >  >  >
  >  >  > 
m1  09/07/2022  1   abc
m1  09/07/2022  2   abc
m1  09/07/2022  3   abc
m1  09/07/2022  1   def
m1  09/07/2022  2   def
m1  09/07/2022  3   def
Time taken: 1.213 seconds, Fetched 6 row(s)
spark-sql>  > 
{noformat}
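
The same workaround can also be applied from code; a minimal sketch, assuming a 
spark-shell session where the SparkSession is available as {{spark}}:
{code:scala}
// Disable the nested-column pruning optimizations before running the query from
// the ticket; this mirrors the two spark-sql SET statements above.
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "false")
spark.conf.set("spark.sql.optimizer.expression.nestedPruning.enabled", "false")
{code}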
 


was (Author: bersprockets):
Same as SPARK-39854?

At the very least, the suggest workaround also worked for your case:
{noformat}
spark-sql> set spark.sql.optimizer.nestedSchemaPruning.enabled=false;
spark.sql.optimizer.nestedSchemaPruning.enabled false
Time taken: 0.224 seconds, Fetched 1 row(s)
spark-sql> set spark.sql.optimizer.expression.nestedPruning.enabled=false;
spark.sql.optimizer.expression.nestedPruning.enabledfalse
Time taken: 0.016 seconds, Fetched 1 row(s)
spark-sql> SELECT 
response.message as message,
response.timestamp as timestamp,
score as risk_score,
model.value as model_type
FROM tbl
  LATERAL VIEW OUTER explode(response.data.items.attempt)   
  AS Attempt
  LATERAL VIEW OUTER explode(response.data.items.attempt.risk)  
  AS RiskModels
  LATERAL VIEW OUTER explode(RiskModels)
  AS RiskModel
  LATERAL VIEW OUTER explode(RiskModel.indicator)   
  AS Model
  LATERAL VIEW OUTER explode(RiskModel.Score)   
  AS Score;

 >  >  >  >  >  >  >
  >  >  > 
m1  09/07/2022  1   abc
m1  09/07/2022  2   abc
m1  09/07/2022  3   abc
m1  09/07/2022  1   def
m1  09/07/2022  2   def
m1  09/07/2022  3   def
Time taken: 1.213 seconds, Fetched 6 row(s)
spark-sql>  > 
{noformat}
 

> IllegalStateException when querying array values inside a nested struct
> ---
>
> Key: SPARK-40706
> URL: https://issues.apache.org/jira/browse/SPARK-40706
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Rohan Barman
>Priority: Major
>
> We are in the process of migrating our PySpark applications from Spark 
> version 3.1.2 to Spark version 3.2.0. 
> This bug is present in version 3.2.0. We do not see this issue in version 
> 3.1.2.
>  
> *Minimal example to reproduce bug*
> Below is a minimal example that generates hardcoded data and queries. The 
> data has several nested structs and arrays.
> Our real use case reads data from avro files and has more complex queries, 
> but this is sufficient to reproduce the error.
>  
> {code:java}
> # Generate data
> data = [
>   ('1',{
>   'timestamp': '09/07/2022',
>   'message': 'm1',
>   'data':{
> 'items': {
>   'id':1,
>   'attempt':[
> {'risk':[
>   {'score':[1,2,3]},
>   {'indicator':[
> {'code':'c1','value':'abc'},
> {'code':'c2','value':'def'}
>   ]}
> ]}
>   ]
> }
>   }
>   })
> ]
> from pyspark.sql.types import *
> schema = StructType([
> StructField('id', StringType(), True),
> StructField('response', 

[jira] [Resolved] (SPARK-40714) Remove PartitionAlreadyExistsException

2022-10-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40714.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38161
[https://github.com/apache/spark/pull/38161]

> Remove PartitionAlreadyExistsException
> --
>
> Key: SPARK-40714
> URL: https://issues.apache.org/jira/browse/SPARK-40714
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Remove PartitionAlreadyExistsException and use 
> PartitionsAlreadyExistException instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36681) Fail to load Snappy codec

2022-10-10 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615201#comment-17615201
 ] 

L. C. Hsieh commented on SPARK-36681:
-

This is fixed in 3.3.0 and later, yes, by upgrading to Hadoop 3.3.2.
As discussed above, there is no workaround in 3.2 for this issue.
If you stick with 3.2, the only way is to upgrade to Hadoop 3.3.2 in the Spark 
3.2 source.
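
For reference, a minimal sketch of the kind of write that hits this on Spark 3.2, 
assuming a spark-shell session; the output path is a placeholder:
{code:scala}
import org.apache.hadoop.io.compress.SnappyCodec

// Writing a SequenceFile with SnappyCodec goes through the Hadoop client's
// snappy compressor; with the shaded (relocated) Hadoop client on Spark 3.2
// this is where the UnsatisfiedLinkError from the description shows up.
sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))
  .saveAsSequenceFile("/tmp/snappy-seq-example", Some(classOf[SnappyCodec]))
{code}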


> Fail to load Snappy codec
> -
>
> Key: SPARK-36681
> URL: https://issues.apache.org/jira/browse/SPARK-36681
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> snappy-java as a native library should not be relocated in Hadoop shaded 
> client libraries. Currently we use Hadoop shaded client libraries in Spark. 
> If trying to use SnappyCodec to write a sequence file, we will encounter the 
> following error:
> {code}
> [info]   Cause: java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I
> [info]   at 
> org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native 
> Method)   
>   
> [info]   at 
> org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151)   
>   
>
> [info]   at 
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282)
> [info]   at 
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210)
> [info]   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> [info]   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589)
>  
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605)
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629)
>  
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable

2022-10-10 Thread xiaoping.huang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaoping.huang updated SPARK-40735:
---
Component/s: Connect
 Kubernetes
 R
 Spark Core
 SQL
 (was: Deploy)

> Consistently invoke bash with /usr/bin/env bash in scripts to make code more 
> portable
> -
>
> Key: SPARK-40735
> URL: https://issues.apache.org/jira/browse/SPARK-40735
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Kubernetes, R, Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: xiaoping.huang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40735:


Assignee: Apache Spark

> Consistently invoke bash with /usr/bin/env bash in scripts to make code more 
> portable
> -
>
> Key: SPARK-40735
> URL: https://issues.apache.org/jira/browse/SPARK-40735
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 3.4.0
>Reporter: xiaoping.huang
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40735:


Assignee: (was: Apache Spark)

> Consistently invoke bash with /usr/bin/env bash in scripts to make code more 
> portable
> -
>
> Key: SPARK-40735
> URL: https://issues.apache.org/jira/browse/SPARK-40735
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 3.4.0
>Reporter: xiaoping.huang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615190#comment-17615190
 ] 

Apache Spark commented on SPARK-40735:
--

User 'huangxiaopingRD' has created a pull request for this issue:
https://github.com/apache/spark/pull/38191

> Consistently invoke bash with /usr/bin/env bash in scripts to make code more 
> portable
> -
>
> Key: SPARK-40735
> URL: https://issues.apache.org/jira/browse/SPARK-40735
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 3.4.0
>Reporter: xiaoping.huang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable

2022-10-10 Thread xiaoping.huang (Jira)
xiaoping.huang created SPARK-40735:
--

 Summary: Consistently invoke bash with /usr/bin/env bash in 
scripts to make code more portable
 Key: SPARK-40735
 URL: https://issues.apache.org/jira/browse/SPARK-40735
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 3.4.0
Reporter: xiaoping.huang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40705) Issue with spark converting Row to Json using Scala 2.13

2022-10-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40705.
--
Fix Version/s: 3.3.1
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 38154
[https://github.com/apache/spark/pull/38154]

> Issue with spark converting Row to Json using Scala 2.13
> 
>
> Key: SPARK-40705
> URL: https://issues.apache.org/jira/browse/SPARK-40705
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Amrane Ait Zeouay
>Assignee: Amrane Ait Zeouay
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: image-2022-10-07-19-43-42-232.png, 
> image-2022-10-07-19-43-51-892.png, image-2022-10-07-19-44-00-332.png, 
> image-2022-10-07-19-44-09-972.png
>
>
> h2. *Note: This issue can be reproduced only using Scala 2.13*
> When I'm trying to convert the Row to a json to publish it, i'm getting this 
> following error
> !image-2022-10-07-19-43-42-232.png!
> I tried to investigate and I found that the issue is in the matching.
> !image-2022-10-07-19-43-51-892.png!
> The type `ArraySeq` is not matched in `Row` class.
> !image-2022-10-07-19-44-00-332.png!
> This is the definition of my field
> !image-2022-10-07-19-44-09-972.png!
> And an example of it 
>  
> {code:json}
> {
> ...
> Codes: ["Test", "Spark", "Json"]
> ...
> }{code}
> The Scala version I'm using is `2.13.9`
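
Since the screenshots do not come through in the archive, a hypothetical minimal 
reproduction may help; this is a sketch under the assumption that the array 
column is collected back as a Scala 2.13 ArraySeq, not code taken from the 
ticket (it assumes a spark-shell session):
{code:scala}
import spark.implicits._

// Build a DataFrame with an array column and pull a Row back to the driver.
val df = Seq((1, Seq("Test", "Spark", "Json"))).toDF("id", "Codes")
val row = df.collect().head   // the row carries a schema, so row.json is usable

// On Scala 2.12 the array value comes back as a WrappedArray and row.json
// works; on Scala 2.13 it can surface as an ArraySeq, which Row's JSON
// conversion did not match before the fix, producing the reported error.
row.json
{code}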



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40705) Issue with spark converting Row to Json using Scala 2.13

2022-10-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40705:


Assignee: Amrane Ait Zeouay

> Issue with spark converting Row to Json using Scala 2.13
> 
>
> Key: SPARK-40705
> URL: https://issues.apache.org/jira/browse/SPARK-40705
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Amrane Ait Zeouay
>Assignee: Amrane Ait Zeouay
>Priority: Major
> Attachments: image-2022-10-07-19-43-42-232.png, 
> image-2022-10-07-19-43-51-892.png, image-2022-10-07-19-44-00-332.png, 
> image-2022-10-07-19-44-09-972.png
>
>
> h2. *Note: This issue can be reproduced only using Scala 2.13*
> When I'm trying to convert the Row to a json to publish it, i'm getting this 
> following error
> !image-2022-10-07-19-43-42-232.png!
> I tried to investigate and I found that the issue is in the matching.
> !image-2022-10-07-19-43-51-892.png!
> The type `ArraySeq` is not matched in `Row` class.
> !image-2022-10-07-19-44-00-332.png!
> This is the definition of my field
> !image-2022-10-07-19-44-09-972.png!
> And an example of it 
>  
> {code:json}
> {
> ...
> Codes: ["Test", "Spark", "Json"]
> ...
> }{code}
> The Scala version I'm using is `2.13.9`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40726) Supplement undocumented orc configurations in documentation

2022-10-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40726.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38188
[https://github.com/apache/spark/pull/38188]

> Supplement undocumented orc configurations in documentation
> ---
>
> Key: SPARK-40726
> URL: https://issues.apache.org/jira/browse/SPARK-40726
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40726) Supplement undocumented orc configurations in documentation

2022-10-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40726:


Assignee: Qian Sun

> Supplement undocumented orc configurations in documentation
> ---
>
> Key: SPARK-40726
> URL: https://issues.apache.org/jira/browse/SPARK-40726
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40734) KafkaMicroBatchSourceSuite failed

2022-10-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-40734:


 Summary: KafkaMicroBatchSourceSuite failed
 Key: SPARK-40734
 URL: https://issues.apache.org/jira/browse/SPARK-40734
 Project: Spark
  Issue Type: Sub-task
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Yang Jie


"ensure stream-stream self-join generates only one offset in log and correct 
metrics" failed

Failure reason to be supplemented



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40733) ShowCreateTableSuite test failed

2022-10-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-40733:


 Summary: ShowCreateTableSuite test failed
 Key: SPARK-40733
 URL: https://issues.apache.org/jira/browse/SPARK-40733
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yang Jie


* SHOW CREATE TABLE using Hive V1 catalog V1 command: hive table with serde info *** FAILED ***
* SHOW CREATE TABLE using Hive V1 catalog V2 command: hive table with serde info *** FAILED ***

Failure reason to be supplemented

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40732) Floating point precision changes

2022-10-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-40732:


 Summary: Floating point precision changes
 Key: SPARK-40732
 URL: https://issues.apache.org/jira/browse/SPARK-40732
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yang Jie


Some cases in SQLQueryTestSuite (sql/core) and 
ThriftServerQueryTestSuite (sql/hive-thriftserver) failed for this reason:

for example:

 
{code:java}
SQLQueryTestSuite
- try_aggregates.sql *** FAILED ***
  try_aggregates.sql
  Expected "4.61168601842738[79]E18", but got "4.61168601842738[8]E18" Result 
did not match for query #20
  SELECT try_avg(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col) 
(SQLQueryTestSuite.scala:495) 

{code}
{code:java}
ThriftServerQueryTestSuite- try_aggregates.sql *** FAILED ***
  Expected "4.61168601842738[79]E18", but got "4.61168601842738[8]E18" Result 
did not match for query #20
  SELECT try_avg(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col) 
(ThriftServerQueryTestSuite.scala:222)- try_arithmetic.sql *** FAILED ***
  Expected "-4.65661287307739[26]E-10", but got "-4.65661287307739[3]E-10" 
Result did not match for query #26
  SELECT try_divide(1, (2147483647 + 1)) 
(ThriftServerQueryTestSuite.scala:222)- datetime-formatting.sql *** FAILED ***
  Expected "...-05-31 19:40:35.123  [3
  1969-12-31 15:00:00 3
  1970-12-31 04:59:59.999 3
  1996-03-31 07:03:33.123 3
  2018-11-17 05:33:33.123 3
  2019-12-31 09:33:33.123 3]
  2100-01-01 01:33:33...", but got "...-05-31 19:40:35.123  [5
  1969-12-31 15:00:00 5
  1970-12-31 04:59:59.999 5
  1996-03-31 07:03:33.123 5
  2018-11-17 05:33:33.123 3
  2019-12-31 09:33:33.123 5]
  2100-01-01 01:33:33..." Result did not match for query #8
  select col, date_format(col, 'F') from v 
(ThriftServerQueryTestSuite.scala:222) {code}
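
A sketch of the arithmetic behind query #20; linking the change to JDK 19's new 
Double.toString behaviour is my assumption, not something stated in the failure 
output above:
{code:scala}
// try_avg of (9223372036854775807L, 1L): the exact average is 2^62, which is
// exactly representable as a Double.
val avg = ((BigDecimal(9223372036854775807L) + 1) / 2).toDouble

// Older JDKs format this value as "4.6116860184273879E18"; JDK 19's shortest
// round-trippable Double.toString (JDK-4511638) prints "4.611686018427388E18",
// which is why golden files comparing formatted doubles start to differ.
println(avg)
{code}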



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40731) Sealed class cannot be mocked by Mockito

2022-10-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-40731:


 Summary: Sealed class cannot be mocked by Mockito
 Key: SPARK-40731
 URL: https://issues.apache.org/jira/browse/SPARK-40731
 Project: Spark
  Issue Type: Sub-task
  Components: DStreams
Affects Versions: 3.4.0
Reporter: Yang Jie


3 test cases in WriteAheadLogSuite failed for this reason.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40730) Java 19 related issues

2022-10-10 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-40730:
-
Description: 
I ran the Maven tests with Java 19 and some tests failed. Whether or not each one 
can be solved, recording them here will be helpful for upgrading to the next LTS 
(Java 21).

 

> Java 19 related issues
> --
>
> Key: SPARK-40730
> URL: https://issues.apache.org/jira/browse/SPARK-40730
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> I ran the Maven tests with Java 19 and some tests failed. Whether or not each 
> one can be solved, recording them here will be helpful for upgrading to the 
> next LTS (Java 21).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40729) Spark-shell run failed with Java 19

2022-10-10 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-40729:
-
Parent: SPARK-40730
Issue Type: Sub-task  (was: Improvement)

> Spark-shell run failed with Java 19
> ---
>
> Key: SPARK-40729
> URL: https://issues.apache.org/jira/browse/SPARK-40729
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Shell
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
> Attempting port 4041.
> Spark context Web UI available at http://localhost:4041
> Spark context available as 'sc' (master = local, app id = 
> local-1665401880396).
> Spark session available as 'spark'.
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
>       /_/
>          
> Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> var array = new Array[Int](5)
> val broadcastArray = sc.broadcast(array)
> sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
> array(0) = 5
> sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
> // Exiting paste mode, now interpreting.
> java.lang.InternalError: java.lang.IllegalAccessException: final field has no 
> write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class 
> java.lang.Object (module java.base)
>   at 
> java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
>   at 
> java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
>   at 
> java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
>   at 
> java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
>   at java.base/java.lang.reflect.Field.set(Field.java:820)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2491)
>   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:413)
>   ... 43 elided
> Caused by: java.lang.IllegalAccessException: final field has no write access: 
> $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object 
> (module java.base)
>   at 
> java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
>   at 
> java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
>   at 
> java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
>   at 
> java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
>   at 
> java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
>   ... 55 more
> scala>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40730) Java 19 related issues

2022-10-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-40730:


 Summary: Java 19 related issues
 Key: SPARK-40730
 URL: https://issues.apache.org/jira/browse/SPARK-40730
 Project: Spark
  Issue Type: Umbrella
  Components: Spark Core, SQL
Affects Versions: 3.4.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40729) Spark-shell run failed with Java 19

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615093#comment-17615093
 ] 

Apache Spark commented on SPARK-40729:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38190

> Spark-shell run failed with Java 19
> ---
>
> Key: SPARK-40729
> URL: https://issues.apache.org/jira/browse/SPARK-40729
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
> Attempting port 4041.
> Spark context Web UI available at http://localhost:4041
> Spark context available as 'sc' (master = local, app id = 
> local-1665401880396).
> Spark session available as 'spark'.
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
>       /_/
>          
> Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> var array = new Array[Int](5)
> val broadcastArray = sc.broadcast(array)
> sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
> array(0) = 5
> sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
> // Exiting paste mode, now interpreting.
> java.lang.InternalError: java.lang.IllegalAccessException: final field has no 
> write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class 
> java.lang.Object (module java.base)
>   at 
> java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
>   at 
> java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
>   at 
> java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
>   at 
> java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
>   at java.base/java.lang.reflect.Field.set(Field.java:820)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2491)
>   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:413)
>   ... 43 elided
> Caused by: java.lang.IllegalAccessException: final field has no write access: 
> $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object 
> (module java.base)
>   at 
> java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
>   at 
> java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
>   at 
> java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
>   at 
> java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
>   at 
> java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
>   ... 55 more
> scala>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40729) Spark-shell run failed with Java 19

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40729:


Assignee: Apache Spark

> Spark-shell run failed with Java 19
> ---
>
> Key: SPARK-40729
> URL: https://issues.apache.org/jira/browse/SPARK-40729
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
> Attempting port 4041.
> Spark context Web UI available at http://localhost:4041
> Spark context available as 'sc' (master = local, app id = 
> local-1665401880396).
> Spark session available as 'spark'.
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
>       /_/
>          
> Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> var array = new Array[Int](5)
> val broadcastArray = sc.broadcast(array)
> sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
> array(0) = 5
> sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
> // Exiting paste mode, now interpreting.
> java.lang.InternalError: java.lang.IllegalAccessException: final field has no 
> write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class 
> java.lang.Object (module java.base)
>   at 
> java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
>   at 
> java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
>   at 
> java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
>   at 
> java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
>   at java.base/java.lang.reflect.Field.set(Field.java:820)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2491)
>   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:413)
>   ... 43 elided
> Caused by: java.lang.IllegalAccessException: final field has no write access: 
> $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object 
> (module java.base)
>   at 
> java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
>   at 
> java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
>   at 
> java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
>   at 
> java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
>   at 
> java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
>   ... 55 more
> scala>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40729) Spark-shell run failed with Java 19

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40729:


Assignee: (was: Apache Spark)

> Spark-shell run failed with Java 19
> ---
>
> Key: SPARK-40729
> URL: https://issues.apache.org/jira/browse/SPARK-40729
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
> Attempting port 4041.
> Spark context Web UI available at http://localhost:4041
> Spark context available as 'sc' (master = local, app id = 
> local-1665401880396).
> Spark session available as 'spark'.
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
>       /_/
>          
> Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> var array = new Array[Int](5)
> val broadcastArray = sc.broadcast(array)
> sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
> array(0) = 5
> sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
> // Exiting paste mode, now interpreting.
> java.lang.InternalError: java.lang.IllegalAccessException: final field has no 
> write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class 
> java.lang.Object (module java.base)
>   at 
> java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
>   at 
> java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
>   at 
> java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
>   at 
> java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
>   at java.base/java.lang.reflect.Field.set(Field.java:820)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2491)
>   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:413)
>   ... 43 elided
> Caused by: java.lang.IllegalAccessException: final field has no write access: 
> $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object 
> (module java.base)
>   at 
> java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
>   at 
> java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
>   at 
> java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
>   at 
> java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
>   at 
> java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
>   ... 55 more
> scala>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36681) Fail to load Snappy codec

2022-10-10 Thread icyjhl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615087#comment-17615087
 ] 

icyjhl commented on SPARK-36681:


Hi [~viirya], so this is only fixed in 3.3.0 and later?
Is there any workaround in 3.2?

Many Thanks!


> Fail to load Snappy codec
> -
>
> Key: SPARK-36681
> URL: https://issues.apache.org/jira/browse/SPARK-36681
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> snappy-java as a native library should not be relocated in Hadoop shaded 
> client libraries. Currently we use Hadoop shaded client libraries in Spark. 
> If trying to use SnappyCodec to write a sequence file, we will encounter the 
> following error:
> {code}
> [info]   Cause: java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I
> [info]   at 
> org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native 
> Method)   
>   
> [info]   at 
> org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151)   
>   
>
> [info]   at 
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282)
> [info]   at 
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210)
> [info]   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> [info]   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589)
>  
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605)
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629)
>  
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit

2022-10-10 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-40727.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 1
[https://github.com/apache/spark-docker/pull/1]

> Add merge_spark_docker_pr.py to help merge commit
> -
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit

2022-10-10 Thread Yikun Jiang (Jira)


[ https://issues.apache.org/jira/browse/SPARK-40727 ]


Yikun Jiang deleted comment on SPARK-40727:
-

was (Author: yikunkero):
Issue resolved by pull request 1
[https://github.com/apache/spark-docker/pull/1]

> Add merge_spark_docker_pr.py to help merge commit
> -
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit

2022-10-10 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang updated SPARK-40727:

Fix Version/s: (was: 3.4.0)

> Add merge_spark_docker_pr.py to help merge commit
> -
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit

2022-10-10 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang reopened SPARK-40727:
-

> Add merge_spark_docker_pr.py to help merge commit
> -
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit

2022-10-10 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-40727.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 1
[https://github.com/apache/spark-docker/pull/1]

> Add merge_spark_docker_pr.py to help merge commit
> -
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40729) Spark-shell run failed with Java 19

2022-10-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-40729:


 Summary: Spark-shell run failed with Java 19
 Key: SPARK-40729
 URL: https://issues.apache.org/jira/browse/SPARK-40729
 Project: Spark
  Issue Type: Improvement
  Components: Spark Shell
Affects Versions: 3.4.0
Reporter: Yang Jie


{code:java}
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
Attempting port 4041.
Spark context Web UI available at http://localhost:4041
Spark context available as 'sc' (master = local, app id = local-1665401880396).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/
         
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19)
Type in expressions to have them evaluated.
Type :help for more information.


scala> :paste
// Entering paste mode (ctrl-D to finish)


var array = new Array[Int](5)
val broadcastArray = sc.broadcast(array)
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
array(0) = 5
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()


// Exiting paste mode, now interpreting.


java.lang.InternalError: java.lang.IllegalAccessException: final field has no 
write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class 
java.lang.Object (module java.base)
  at 
java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
  at 
java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
  at 
java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
  at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
  at java.base/java.lang.reflect.Field.set(Field.java:820)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2491)
  at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
  at org.apache.spark.rdd.RDD.map(RDD.scala:413)
  ... 43 elided
Caused by: java.lang.IllegalAccessException: final field has no write access: 
$Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object 
(module java.base)
  at 
java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
  at 
java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
  at 
java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
  at 
java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
  at 
java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
  ... 55 more


scala>  {code}
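A minimal sketch of the operation that fails here (my own illustration, not code 
from this report; it assumes a Scala 2.12-style lambda compiled through 
LambdaMetaFactory, whose captured values live in synthetic final fields named 
arg$N):

{code:java}
object FinalLambdaFieldWrite {
  def main(args: Array[String]): Unit = {
    val captured = Array(1, 2, 3, 4, 5)
    val closure: Int => Int = x => captured(x) // compiles to a $Lambda$ proxy class

    // Locate the synthetic field holding the captured array (arg$1 or similar).
    val field = closure.getClass.getDeclaredFields
      .find(_.getName.startsWith("arg$"))
      .getOrElse(sys.error(s"no captured field on ${closure.getClass.getName}"))

    field.setAccessible(true)
    // ClosureCleaner.clean performs this kind of reflective write on captured
    // references; per the stack trace above, Java 19 rejects it with
    // "IllegalAccessException: final field has no write access".
    field.set(closure, Array(5, 4, 3, 2, 1))
  }
}
{code}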



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40728) Upgrade ASM to 9.4

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40728:


Assignee: Apache Spark

> Upgrade ASM to 9.4
> --
>
> Key: SPARK-40728
> URL: https://issues.apache.org/jira/browse/SPARK-40728
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40728) Upgrade ASM to 9.4

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40728:


Assignee: (was: Apache Spark)

> Upgrade ASM to 9.4
> --
>
> Key: SPARK-40728
> URL: https://issues.apache.org/jira/browse/SPARK-40728
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40728) Upgrade ASM to 9.4

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615076#comment-17615076
 ] 

Apache Spark commented on SPARK-40728:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38189

> Upgrade ASM to 9.4
> --
>
> Key: SPARK-40728
> URL: https://issues.apache.org/jira/browse/SPARK-40728
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40728) Upgrade ASM to 9.4

2022-10-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-40728:


 Summary: Upgrade ASM to 9.4
 Key: SPARK-40728
 URL: https://issues.apache.org/jira/browse/SPARK-40728
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit

2022-10-10 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-40727:
---

 Summary: Add merge_spark_docker_pr.py to help merge commit
 Key: SPARK-40727
 URL: https://issues.apache.org/jira/browse/SPARK-40727
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Yikun Jiang
Assignee: Yikun Jiang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40726) Supplement undocumented orc configurations in documentation

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40726:


Assignee: (was: Apache Spark)

> Supplement undocumented orc configurations in documentation
> ---
>
> Key: SPARK-40726
> URL: https://issues.apache.org/jira/browse/SPARK-40726
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40726) Supplement undocumented orc configurations in documentation

2022-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40726:


Assignee: Apache Spark

> Supplement undocumented orc configurations in documentation
> ---
>
> Key: SPARK-40726
> URL: https://issues.apache.org/jira/browse/SPARK-40726
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40726) Supplement undocumented orc configurations in documentation

2022-10-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615003#comment-17615003
 ] 

Apache Spark commented on SPARK-40726:
--

User 'dcoliversun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38188

> Supplement undocumented orc configurations in documentation
> ---
>
> Key: SPARK-40726
> URL: https://issues.apache.org/jira/browse/SPARK-40726
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40725) Add mypy-protobuf to requirements

2022-10-10 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang reassigned SPARK-40725:
---

Assignee: Ruifeng Zheng

> Add mypy-protobuf to requirements
> -
>
> Key: SPARK-40725
> URL: https://issues.apache.org/jira/browse/SPARK-40725
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40725) Add mypy-protobuf to requirements

2022-10-10 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-40725.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38186
[https://github.com/apache/spark/pull/38186]

> Add mypy-protobuf to requirements
> -
>
> Key: SPARK-40725
> URL: https://issues.apache.org/jira/browse/SPARK-40725
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40724) Simplify `corr` with method `inline`

2022-10-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40724:
-

Assignee: Ruifeng Zheng

> Simplify `corr` with method `inline`
> 
>
> Key: SPARK-40724
> URL: https://issues.apache.org/jira/browse/SPARK-40724
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


