[jira] [Assigned] (SPARK-40358) Migrate collection type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-40358:
--------------------------------

    Assignee: Shaokang Lv

> Migrate collection type check failures onto error classes
> ----------------------------------------------------------
>
>                 Key: SPARK-40358
>                 URL: https://issues.apache.org/jira/browse/SPARK-40358
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Max Gekk
>            Assignee: Shaokang Lv
>            Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in collection expressions:
> 1. BinaryArrayExpressionWithImplicitCast (1):
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69]
> 2. MapContainsKey (2):
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237
> 3. MapConcat (1):
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663
> 4. MapFromEntries (1):
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40358) Migrate collection type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-40358.
------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 38197
https://github.com/apache/spark/pull/38197

> Migrate collection type check failures onto error classes
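The migration described in SPARK-40358 replaces free-form TypeCheckFailure messages with structured DataTypeMismatch errors keyed by an error class with named message parameters. Spark's actual implementation is Scala; the Java sketch below only illustrates the shape of the change, and every class, method, and error-class name in it is illustrative rather than Spark's real API.

```java
import java.util.Map;

// Illustrative stand-in for a structured type-check error: instead of a
// pre-formatted string (the old TypeCheckFailure style), it carries an
// error-class identifier plus named message parameters.
class DataTypeMismatchSketch extends RuntimeException {
    private final String errorClass;
    private final Map<String, String> messageParameters;

    DataTypeMismatchSketch(String errorClass, Map<String, String> params) {
        super(errorClass + " " + params);
        this.errorClass = errorClass;
        this.messageParameters = params;
    }

    String getErrorClass() { return errorClass; }
    Map<String, String> getMessageParameters() { return messageParameters; }
}

public class TypeCheckDemo {
    // Before: return TypeCheckFailure("input should be int, got string").
    // After: report a machine-readable error class with named parameters.
    static void checkMapKeyType(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new DataTypeMismatchSketch(
                "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
                Map.of("requiredType", expected, "inputType", actual));
        }
    }

    public static void main(String[] args) {
        try {
            checkMapKeyType("INT", "STRING");
        } catch (DataTypeMismatchSketch e) {
            System.out.println(e.getErrorClass());
        }
    }
}
```

The point of the pattern is that callers and tests can match on the error class and parameters instead of parsing message strings.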
[jira] [Commented] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615459#comment-17615459 ]

Max Gekk commented on SPARK-37945:
----------------------------------

[~khalidmammad...@gmail.com] Sure, go ahead.

> Use error classes in the execution errors of arithmetic ops
> ------------------------------------------------------------
>
>                 Key: SPARK-37945
>                 URL: https://issues.apache.org/jira/browse/SPARK-37945
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> to use error classes. Throw an implementation of SparkThrowable. Also write a test per every error in QueryExecutionErrorsSuite.
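The errors listed in SPARK-37945 all originate from exact integer or decimal arithmetic overflowing. A minimal JDK-only sketch of the kind of failure being migrated: `Math.addExact` throws `ArithmeticException` on overflow, and the migration wraps such failures with an error class. The error-class tag in the message below is illustrative, not Spark's actual error text.

```java
public class OverflowDemo {
    // Math.addExact throws ArithmeticException on int overflow; the Spark
    // migration rethrows such failures as a SparkThrowable carrying an
    // error class (the "[ARITHMETIC_OVERFLOW]" tag here is illustrative).
    static int checkedAdd(int a, int b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException e) {
            throw new ArithmeticException(
                "[ARITHMETIC_OVERFLOW] " + a + " + " + b + " caused overflow.");
        }
    }

    public static void main(String[] args) {
        System.out.println(checkedAdd(1, 2)); // 3
        try {
            checkedAdd(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```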
[jira] [Assigned] (SPARK-40707) Add groupby to connect DSL and test more than one grouping expressions
[ https://issues.apache.org/jira/browse/SPARK-40707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-40707:
-----------------------------------
    Assignee: Rui Wang

> Add groupby to connect DSL and test more than one grouping expressions
> -----------------------------------------------------------------------
>
>                 Key: SPARK-40707
>                 URL: https://issues.apache.org/jira/browse/SPARK-40707
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
[jira] [Resolved] (SPARK-40707) Add groupby to connect DSL and test more than one grouping expressions
[ https://issues.apache.org/jira/browse/SPARK-40707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-40707.
---------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 38155
https://github.com/apache/spark/pull/38155

> Add groupby to connect DSL and test more than one grouping expressions
[jira] [Commented] (SPARK-40722) How to set BlockManager info of hostname as ipaddress
[ https://issues.apache.org/jira/browse/SPARK-40722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615450#comment-17615450 ]

Chen Xia commented on SPARK-40722:
----------------------------------

I know spark.driver.host can be used to set this configuration.

> How to set BlockManager info of hostname as IP address
> -------------------------------------------------------
>
>                 Key: SPARK-40722
>                 URL: https://issues.apache.org/jira/browse/SPARK-40722
>             Project: Spark
>          Issue Type: Question
>          Components: Block Manager
>    Affects Versions: 2.4.3
>            Reporter: Chen Xia
>            Priority: Major
>
> {code:java}
> 2022-10-09 17:22:42.517 [INFO ] [YARN application state monitor] o.a.s.u.SparkUI (54) [logInfo] - Stopped Spark web UI at http://linkis-demo-cg-engineconnmanager-76778ff4b5-sf9xz.linkis-demo-cg-engineconnmanager-headless.linkis.svc.cluster.local:4040
> 2022-10-09 17:46:09.854 [INFO ] [main] o.a.s.s.BlockManager (54) [logInfo] - Initialized BlockManager: BlockManagerId(driver, linkis-demo-cg-engineconnmanager-76778ff4b5-sf9xz.linkis-demo-cg-engineconnmanager-headless.linkis.svc.cluster.local, 38798, None)
> {code}
> I want to replace the canonical host name (linkis-demo-cg-engineconnmanager-76778ff4b5-sf9xz.linkis-demo-cg-engineconnmanager-headless.linkis.svc.cluster.local) with an IP address such as 10.10.10.10.
[jira] [Commented] (SPARK-40658) Protobuf v2 & v3 support
[ https://issues.apache.org/jira/browse/SPARK-40658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615440#comment-17615440 ]

Mohan Parthasarathy commented on SPARK-40658:
---------------------------------------------

[~sanysand...@gmail.com]
1. I wanted to write about "required" fields (which exist only in proto2) and wrongly mentioned optional fields. When deserializing, we already check for "required" fields. When serializing and the row is null, the current code is a bit complex for me to understand as to what happens when we encounter a "required" field. There is also one more place in the current code where we check for a "required" field, in "structFieldFor", which sets nullable. We need some test cases with proto2 messages.
2. Custom default values should not affect the current logic?
3. How does this affect the current logic? It should be transparent to us, right?
4. Currently we assume UTF8 in the code, right? Would that fail if we receive a proto2 message?
5. Yes, I ran some basic tests by converting the current tests to proto2 messages, and the tests pass.

If we can get away without specifying V2 or V3 or ANY, that would be the simplest.

> Protobuf v2 & v3 support
> -------------------------
>
>                 Key: SPARK-40658
>                 URL: https://issues.apache.org/jira/browse/SPARK-40658
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.3.0
>            Reporter: Raghu Angadi
>            Priority: Major
>
> We want to ensure Protobuf functions support both Protobuf version 2 and version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3).
[jira] [Commented] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session
[ https://issues.apache.org/jira/browse/SPARK-40739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615437#comment-17615437 ]

Apache Spark commented on SPARK-40739:
--------------------------------------

User 'philwalk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38167

> "sbt packageBin" fails in cygwin or other windows bash session
> ---------------------------------------------------------------
>
>                 Key: SPARK-40739
>                 URL: https://issues.apache.org/jira/browse/SPARK-40739
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Windows
>    Affects Versions: 3.3.0
>         Environment: The problem occurs in Windows if *sbt* is started from a (non-WSL) bash session.
> See the spark PR link for detailed symptoms.
>            Reporter: Phil Walker
>            Priority: Major
>              Labels: bash, cygwin, mingw, msys2, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> In a Windows *SHELL* environment, such as *cygwin* or *msys2/mingw64*, etc., *Core.settings* in *project/SparkBuild.scala* calls the wrong *bash.exe* if WSL bash is present (typically at *C:\Windows*), causing a build failure. This occurs even though the proper bash.exe is in the *PATH* ahead of the WSL bash.exe.
> This is fixed by [spark PR 38167|https://github.com/apache/spark/pull/38167]
> There are 3 parts to the fix, implemented in *project/SparkBuild.scala*:
> * determine the absolute path of the first bash.exe in the command line.
> * determine the build environment (e.g., Linux, Darwin, CYGWIN, MSYS2, etc.)
> * for Windows SHELL environments, change the first argument of the spawned Process from "bash" to the absolute path.
[jira] [Assigned] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session
[ https://issues.apache.org/jira/browse/SPARK-40738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40738:
------------------------------------
    Assignee: (was: Apache Spark)

> spark-shell fails with "bad array subscript" in cygwin or msys bash session
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-40738
>                 URL: https://issues.apache.org/jira/browse/SPARK-40738
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, Windows
>    Affects Versions: 3.3.0
>         Environment: The problem occurs in Windows if *spark-shell* is called from a bash session.
> NOTE: the fix also applies to *spark-submit* and *beeline*, since they call spark-shell.
>            Reporter: Phil Walker
>            Priority: Major
>              Labels: bash, cygwin, mingw, msys2, windows
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> A spark pull request [spark PR|https://github.com/apache/spark/pull/38167] fixes this issue, and also fixes a build error that is related to *cygwin* and *msys/mingw* bash *sbt* sessions.
> If a Windows user tries to start a *spark-shell* session by calling the bash script (rather than the *spark-shell.cmd* script), it fails with a confusing error message. Script *spark-class* calls *launcher/src/main/java/org/apache/spark/launcher/Main.java* to generate command line arguments, but the launcher produces a format appropriate to the *.cmd* version of the script rather than the *bash* version.
> The launcher Main method, when called for environments other than Windows, interleaves NULL characters between the command line arguments. It should also do so in Windows when called from the bash script. It incorrectly assumes that if the OS is Windows, it is being called by the .cmd version of the script.
> The resulting error message is unhelpful:
> {code:java}
> [lots of ugly stuff omitted]
> /opt/spark/bin/spark-class: line 100: CMD: bad array subscript
> {code}
> The key to *launcher/Main* knowing that a request is from a *bash* session is that the *SHELL* environment variable is set. This will normally be set in any of the various Windows shell environments (*cygwin*, *mingw64*, *msys2*, etc.) and will not normally be set in plain Windows environments. In the *spark-class.cmd* script, *SHELL* is intentionally unset to avoid problems, and to permit bash users to call the *.cmd* scripts if they prefer (they will still work as before).
[jira] [Assigned] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session
[ https://issues.apache.org/jira/browse/SPARK-40738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40738:
------------------------------------
    Assignee: Apache Spark

> spark-shell fails with "bad array subscript" in cygwin or msys bash session
[jira] [Commented] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session
[ https://issues.apache.org/jira/browse/SPARK-40738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615439#comment-17615439 ]

Apache Spark commented on SPARK-40738:
--------------------------------------

User 'philwalk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38167

> spark-shell fails with "bad array subscript" in cygwin or msys bash session
[jira] [Commented] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session
[ https://issues.apache.org/jira/browse/SPARK-40738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615438#comment-17615438 ]

Apache Spark commented on SPARK-40738:
--------------------------------------

User 'philwalk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38167

> spark-shell fails with "bad array subscript" in cygwin or msys bash session
[jira] [Assigned] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session
[ https://issues.apache.org/jira/browse/SPARK-40739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40739:
------------------------------------
    Assignee: (was: Apache Spark)

> "sbt packageBin" fails in cygwin or other windows bash session
[jira] [Assigned] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session
[ https://issues.apache.org/jira/browse/SPARK-40739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40739:
------------------------------------
    Assignee: Apache Spark

> "sbt packageBin" fails in cygwin or other windows bash session
[jira] [Commented] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session
[ https://issues.apache.org/jira/browse/SPARK-40739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615436#comment-17615436 ]

Apache Spark commented on SPARK-40739:
--------------------------------------

User 'philwalk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38167

> "sbt packageBin" fails in cygwin or other windows bash session
[jira] [Commented] (SPARK-40742) Java compilation warnings related to generic type
[ https://issues.apache.org/jira/browse/SPARK-40742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615430#comment-17615430 ]

Apache Spark commented on SPARK-40742:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38198

> Java compilation warnings related to generic type
> --------------------------------------------------
>
>                 Key: SPARK-40742
>                 URL: https://issues.apache.org/jira/browse/SPARK-40742
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> {code:java}
> 2022-10-08T01:43:33.6487078Z /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54: warning: [rawtypes] found raw type: HashMap
> 2022-10-08T01:43:33.6487456Z     return new HashMap();
> 2022-10-08T01:43:33.6487682Z            ^
> 2022-10-08T01:43:33.6487957Z   missing type arguments for generic class HashMap
> 2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
> 2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
> 2022-10-08T01:43:33.6489211Z     V extends Object declared in class HashMap
> 2022-10-08T01:50:21.5951932Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55: warning: [rawtypes] found raw type: Map
> 2022-10-08T01:50:21.593Z     createPartitions(new InternalRow[]{ident}, new Map[]{properties});
> 2022-10-08T01:50:21.6000343Z                                            ^
> 2022-10-08T01:50:21.6000642Z   missing type arguments for generic class Map
> 2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
> 2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
> 2022-10-08T01:50:21.6002109Z     V extends Object declared in interface Map
> 2022-10-08T01:50:21.6006655Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216: warning: [rawtypes] found raw type: Literal
> 2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) {
> 2022-10-08T01:50:21.6007395Z                                 ^
> 2022-10-08T01:50:21.6007673Z   missing type arguments for generic class Literal
> 2022-10-08T01:50:21.6008032Z   where T is a type-variable:
> 2022-10-08T01:50:21.6008324Z     T extends Object declared in interface Literal
> 2022-10-08T01:50:21.6008785Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56: warning: [rawtypes] found raw type: Comparable
> 2022-10-08T01:50:21.6009223Z   public static class Coord implements Comparable {
> 2022-10-08T01:50:21.6009503Z                                        ^
> 2022-10-08T01:50:21.6009791Z   missing type arguments for generic class Comparable
> 2022-10-08T01:50:21.6010137Z   where T is a type-variable:
> 2022-10-08T01:50:21.6010433Z     T extends Object declared in interface Comparable
> 2022-10-08T01:50:21.6010976Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191: warning: [unchecked] unchecked method invocation: method sort in class Collections is applied to given types
> 2022-10-08T01:50:21.6011474Z     Collections.sort(tmp_bins);
> 2022-10-08T01:50:21.6011714Z                     ^
> 2022-10-08T01:50:21.6012050Z   required: List
> 2022-10-08T01:50:21.6012296Z   found: ArrayList
> 2022-10-08T01:50:21.6012604Z   where T is a type-variable:
> 2022-10-08T01:50:21.6012926Z     T extends Comparable declared in method sort(List)
> 2022-10-08T02:13:38.0769617Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85: warning: [rawtypes] found raw type: AbstractWriterAppender
> 2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
> 2022-10-08T02:13:38.0770645Z     ^
> 2022-10-08T02:13:38.0770947Z   missing type arguments for generic class AbstractWriterAppender
> 2022-10-08T02:13:38.0771330Z   where M is a type-variable:
> 2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class AbstractWriterAppender
> 2022-10-08T02:13:38.0774487Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268: warning: [rawtypes] found raw type: Layout
> 2022-10-08T02:13:38.0774940Z     Layout l = ap.getLayout();
> 2022-10-08T02:13:38.0775173Z     ^
> 2022-10-08T02:13:38.0775441Z   missing type arguments for generic class Layout
> 2022-10-08T02:13:38.0775849Z   where T is a type-variable:
[jira] [Assigned] (SPARK-40742) Java compilation warnings related to generic type
[ https://issues.apache.org/jira/browse/SPARK-40742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40742: Assignee: (was: Apache Spark) > Java compilation warnings related to generic type > - > > Key: SPARK-40742 > URL: https://issues.apache.org/jira/browse/SPARK-40742 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL > Affects Versions: 3.4.0 > Reporter: Yang Jie > Priority: Minor
[jira] [Commented] (SPARK-40742) Java compilation warnings related to generic type
[ https://issues.apache.org/jira/browse/SPARK-40742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615428#comment-17615428 ] Apache Spark commented on SPARK-40742: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38198 > Java compilation warnings related to generic type > - > > Key: SPARK-40742 > URL: https://issues.apache.org/jira/browse/SPARK-40742 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL > Affects Versions: 3.4.0 > Reporter: Yang Jie > Priority: Minor
[jira] [Assigned] (SPARK-40742) Java compilation warnings related to generic type
[ https://issues.apache.org/jira/browse/SPARK-40742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40742: Assignee: Apache Spark > Java compilation warnings related to generic type > - > > Key: SPARK-40742 > URL: https://issues.apache.org/jira/browse/SPARK-40742 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL > Affects Versions: 3.4.0 > Reporter: Yang Jie > Assignee: Apache Spark > Priority: Minor
[jira] [Created] (SPARK-40742) Java compilation warnings related to generic type
Yang Jie created SPARK-40742: - Summary: Java compilation warnings related to generic type Key: SPARK-40742 URL: https://issues.apache.org/jira/browse/SPARK-40742 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: Yang Jie

{code:java}
2022-10-08T01:43:33.6487078Z /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54: warning: [rawtypes] found raw type: HashMap
2022-10-08T01:43:33.6487456Z     return new HashMap();
2022-10-08T01:43:33.6487682Z            ^
2022-10-08T01:43:33.6487957Z   missing type arguments for generic class HashMap
2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
2022-10-08T01:43:33.6489211Z     V extends Object declared in class HashMap
2022-10-08T01:50:21.5951932Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55: warning: [rawtypes] found raw type: Map
2022-10-08T01:50:21.593Z     createPartitions(new InternalRow[]{ident}, new Map[]{properties});
2022-10-08T01:50:21.6000343Z                                                ^
2022-10-08T01:50:21.6000642Z   missing type arguments for generic class Map
2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
2022-10-08T01:50:21.6002109Z     V extends Object declared in interface Map
2022-10-08T01:50:21.6006655Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216: warning: [rawtypes] found raw type: Literal
2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) {
2022-10-08T01:50:21.6007395Z                                 ^
2022-10-08T01:50:21.6007673Z   missing type arguments for generic class Literal
2022-10-08T01:50:21.6008032Z   where T is a type-variable:
2022-10-08T01:50:21.6008324Z     T extends Object declared in interface Literal
2022-10-08T01:50:21.6008785Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56: warning: [rawtypes] found raw type: Comparable
2022-10-08T01:50:21.6009223Z   public static class Coord implements Comparable {
2022-10-08T01:50:21.6009503Z                                        ^
2022-10-08T01:50:21.6009791Z   missing type arguments for generic class Comparable
2022-10-08T01:50:21.6010137Z   where T is a type-variable:
2022-10-08T01:50:21.6010433Z     T extends Object declared in interface Comparable
2022-10-08T01:50:21.6010976Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191: warning: [unchecked] unchecked method invocation: method sort in class Collections is applied to given types
2022-10-08T01:50:21.6011474Z     Collections.sort(tmp_bins);
2022-10-08T01:50:21.6011714Z                     ^
2022-10-08T01:50:21.6012050Z   required: List
2022-10-08T01:50:21.6012296Z   found: ArrayList
2022-10-08T01:50:21.6012604Z   where T is a type-variable:
2022-10-08T01:50:21.6012926Z     T extends Comparable declared in method sort(List)
2022-10-08T02:13:38.0769617Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85: warning: [rawtypes] found raw type: AbstractWriterAppender
2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
2022-10-08T02:13:38.0770645Z     ^
2022-10-08T02:13:38.0770947Z   missing type arguments for generic class AbstractWriterAppender
2022-10-08T02:13:38.0771330Z   where M is a type-variable:
2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class AbstractWriterAppender
2022-10-08T02:13:38.0774487Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268: warning: [rawtypes] found raw type: Layout
2022-10-08T02:13:38.0774940Z     Layout l = ap.getLayout();
2022-10-08T02:13:38.0775173Z     ^
2022-10-08T02:13:38.0775441Z   missing type arguments for generic class Layout
2022-10-08T02:13:38.0775849Z   where T is a type-variable:
2022-10-08T02:13:38.0776359Z     T extends Serializable declared in interface Layout
2022-10-08T02:19:55.0035795Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:17: [rawtypes] found raw type: SparkAvroKeyRecordWriter
2022-10-08T02:19:55.0037287Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:13: [unchecked] unchecked call to
{code}
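Every [rawtypes] and [unchecked] entry in the log above has the same shape of fix: supply the missing type arguments. A minimal, self-contained sketch of the pattern (hypothetical class names, not Spark's actual sources; the Coord class here only imitates NumericHistogram.Coord):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericsFix {

    // Raw type, as flagged in SparkThrowable.java:  return new HashMap();
    // Parameterized fix: declare type arguments on the reference; the
    // diamond operator lets the compiler infer them for the constructor.
    static Map<String, String> emptyProps() {
        return new HashMap<>();
    }

    // Raw "implements Comparable", as flagged for NumericHistogram.Coord.
    // Parameterizing it as Comparable<Coord> also removes the [unchecked]
    // warning on the Collections.sort call below.
    static class Coord implements Comparable<Coord> {
        final double x;
        Coord(double x) { this.x = x; }
        @Override public int compareTo(Coord other) { return Double.compare(x, other.x); }
    }

    public static void main(String[] args) {
        List<Coord> bins = new ArrayList<>();
        bins.add(new Coord(2.0));
        bins.add(new Coord(1.0));
        Collections.sort(bins); // clean: List<Coord> satisfies sort's bound
        if (bins.get(0).x != 1.0) throw new AssertionError("sort order wrong");
        if (!emptyProps().isEmpty()) throw new AssertionError("expected empty map");
        System.out.println("ok");
    }
}
```

Compiling this version with `-Xlint:rawtypes,unchecked` (the checks the CI log shows enabled) reports nothing, whereas the raw-type variants in the comments reproduce the quoted warnings.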
[jira] [Resolved] (SPARK-40516) Add official image dockerfile for Spark v3.3.0
[ https://issues.apache.org/jira/browse/SPARK-40516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang resolved SPARK-40516. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 2 [https://github.com/apache/spark-docker/pull/2] > Add official image dockerfile for Spark v3.3.0 > -- > > Key: SPARK-40516 > URL: https://issues.apache.org/jira/browse/SPARK-40516 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, PySpark, SparkR >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.4.0 > > > Example: [https://github.com/Yikun/spark-docker/tree/master/3.3.0] > Test: > https://github.com/Yikun/spark-docker/blob/master/.github/workflows/build_3.3.0.yaml > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40516) Add official image dockerfile for Spark v3.3.0
[ https://issues.apache.org/jira/browse/SPARK-40516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang reassigned SPARK-40516: --- Assignee: Yikun Jiang > Add official image dockerfile for Spark v3.3.0 > -- > > Key: SPARK-40516 > URL: https://issues.apache.org/jira/browse/SPARK-40516 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, PySpark, SparkR >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > > Example: [https://github.com/Yikun/spark-docker/tree/master/3.3.0] > Test: > https://github.com/Yikun/spark-docker/blob/master/.github/workflows/build_3.3.0.yaml > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40698) Improve the precision of `product` for integral inputs
[ https://issues.apache.org/jira/browse/SPARK-40698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40698. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38148 [https://github.com/apache/spark/pull/38148] > Improve the precision of `product` for integral inputs > --- > > Key: SPARK-40698 > URL: https://issues.apache.org/jira/browse/SPARK-40698 > Project: Spark > Issue Type: Sub-task > Components: ps, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40698) Improve the precision of `product` for integral inputs
[ https://issues.apache.org/jira/browse/SPARK-40698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40698: Assignee: Ruifeng Zheng > Improve the precision of `product` for integral inputs > --- > > Key: SPARK-40698 > URL: https://issues.apache.org/jira/browse/SPARK-40698 > Project: Spark > Issue Type: Sub-task > Components: ps, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40741) Spark's bin/beeline mishandles "distribute by ... sort by" statements and produces wrong results
kaiqingli created SPARK-40741: - Summary: Spark's bin/beeline mishandles "distribute by ... sort by" statements and produces wrong results Key: SPARK-40741 URL: https://issues.apache.org/jira/browse/SPARK-40741 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Environment: spark 3.1 hive 3.0 Reporter: kaiqingli

When a SQL statement uses "distribute by ... sort by ...", running it through spark/bin/beeline produces wrong results, while hive/beeline produces the correct output. The concrete scenario: the array data is first split with posexplode, the rows are then ordered by the exploded index with "sort by", and the values are re-assembled with collect_list; the re-assembled result no longer matches the original array. The SQL is as follows:

select id,
       samplingtimesec,
       array_data = new_array_data flag,
       array_data,
       new_array_data
from (
    select id,
           samplingtimesec,
           array_data,
           concat('[', concat_ws(',', collect_list(cell_voltage)), ']') new_array_data
    from (
        select id, samplingtimesec, array_data, cell_index, cell_voltage
        from (
            select id,
                   samplingtimesec,
                   array_data, -- format: [1,2,3,4,5]
                   row_number() over (partition by id, samplingtimesec order by samplingtimesec) r -- deduplicate
            from table
            WHERE dt = '20221007'
              and samplingtimesec <= 166507920
        ) tmp
        lateral view posexplode(split(replace(replace(array_data, '[', ''), ']', ''), ',')) v0 as cell_index, cell_voltage
        where r = 1
        distribute by id, samplingtimesec
        sort by cell_index
    ) tmp
    group by id, samplingtimesec, array_data
) tmp
where array_data != new_array_data;

For the SQL above, hive/beeline returns 0 rows; spark/beeline returns a non-zero number of rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40358) Migrate collection type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40358: Assignee: (was: Apache Spark) > Migrate collection type check failures onto error classes > - > > Key: SPARK-40358 > URL: https://issues.apache.org/jira/browse/SPARK-40358 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in collection > expressions: > 1. BinaryArrayExpressionWithImplicitCast (1): > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69] > 2. MapContainsKey (2): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237 > 3. MapConcat (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663 > 4. MapFromEntries (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40358) Migrate collection type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615412#comment-17615412 ] Apache Spark commented on SPARK-40358: -- User 'lvshaokang' has created a pull request for this issue: https://github.com/apache/spark/pull/38197 > Migrate collection type check failures onto error classes > - > > Key: SPARK-40358 > URL: https://issues.apache.org/jira/browse/SPARK-40358 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in collection > expressions: > 1. BinaryArrayExpressionWithImplicitCast (1): > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69] > 2. MapContainsKey (2): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237 > 3. MapConcat (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663 > 4. MapFromEntries (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40358) Migrate collection type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40358: Assignee: Apache Spark > Migrate collection type check failures onto error classes > - > > Key: SPARK-40358 > URL: https://issues.apache.org/jira/browse/SPARK-40358 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in collection > expressions: > 1. BinaryArrayExpressionWithImplicitCast (1): > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69] > 2. MapContainsKey (2): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237 > 3. MapConcat (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663 > 4. MapFromEntries (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40596) Populate ExecutorDecommission with more informative messages
[ https://issues.apache.org/jira/browse/SPARK-40596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi resolved SPARK-40596. -- Assignee: Bo Zhang Resolution: Fixed Issue resolved by https://github.com/apache/spark/pull/38030 > Populate ExecutorDecommission with more informative messages > > > Key: SPARK-40596 > URL: https://issues.apache.org/jira/browse/SPARK-40596 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > > Currently the message in {{ExecutorDecommission}} is a fixed value > {{"Executor decommission."}}, and it is the same for all cases, including > spot instance interruptions and auto-scaling down. We should put a detailed > message in {{ExecutorDecommission}} to better differentiate those cases. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40534) Extend support for Join Relation
[ https://issues.apache.org/jira/browse/SPARK-40534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-40534: --- Assignee: Rui Wang > Extend support for Join Relation > > > Key: SPARK-40534 > URL: https://issues.apache.org/jira/browse/SPARK-40534 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > > Extend support for the `Join` relation with additional variants. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40534) Extend support for Join Relation
[ https://issues.apache.org/jira/browse/SPARK-40534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40534. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38157 [https://github.com/apache/spark/pull/38157] > Extend support for Join Relation > > > Key: SPARK-40534 > URL: https://issues.apache.org/jira/browse/SPARK-40534 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > Fix For: 3.4.0 > > > Extend support for the `Join` relation with additional variants. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40703: Assignee: (was: Apache Spark) > Performance regression for joins in Spark 3.3 vs Spark 3.2 > -- > > Key: SPARK-40703 > URL: https://issues.apache.org/jira/browse/SPARK-40703 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bryan Keller >Priority: Major > Attachments: spark32-plan.txt, spark33-plan.txt, test.py > > > When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a > performance regression vs Spark 3.2 was discovered. More specifically, it > appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no > longer enforces a minimum number of partitions for a join distribution in > some cases. This impacts DSv2 datasources, because if a scan has only a > single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() > returns a _SinglePartition_ instance. The _SinglePartition_ creates a > {_}SinglePartitionShuffleSpec{_}, and > {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true. > Because {_}canCreatePartitioning{_}() returns true in this case, > {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce > minimum parallelism and also will favor the single partition when considering > the best distribution candidate. Ultimately this results in a single > partition being selected for the join distribution, even if the other side of > the join is a large table with many partitions. This can seriously impact > performance of the join. > Spark 3.2 enforces minimum parallelism differently in > {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this > issue. It will shuffle both sides of the join to enforce parallelism. > In the TPC-DS benchmark, some queries affected include 14a and 14b. 
This can > also be demonstrated using a simple query, for example: > {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = > ics.i_item_sk}} > ...where _item_ is a small table that is read into one partition, and > _catalog_sales_ is a large table. These tables are part of the TPC-DS, but you > can create your own. Also, to demonstrate the issue you may need to turn off > broadcast joins, though that is not required for the issue to occur; it > happens when running the TPC-DS with the broadcast setting at its default. > Attached are the plans for this query in Spark 3.2 and in Spark 3.3. The plans > show how in Spark 3.2, the join parallelism of 200 is reached by inserting > an exchange after the item table scan. In Spark 3.3, no such exchange is > inserted and the join parallelism is 1.
[jira] [Assigned] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40703: Assignee: Apache Spark > Performance regression for joins in Spark 3.3 vs Spark 3.2 > -- > > Key: SPARK-40703 > URL: https://issues.apache.org/jira/browse/SPARK-40703 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bryan Keller >Assignee: Apache Spark >Priority: Major > Attachments: spark32-plan.txt, spark33-plan.txt, test.py > > > When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a > performance regression vs Spark 3.2 was discovered. More specifically, it > appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no > longer enforces a minimum number of partitions for a join distribution in > some cases. This impacts DSv2 datasources, because if a scan has only a > single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() > returns a _SinglePartition_ instance. The _SinglePartition_ creates a > {_}SinglePartitionShuffleSpec{_}, and > {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true. > Because {_}canCreatePartitioning{_}() returns true in this case, > {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce > minimum parallelism and also will favor the single partition when considering > the best distribution candidate. Ultimately this results in a single > partition being selected for the join distribution, even if the other side of > the join is a large table with many partitions. This can seriously impact > performance of the join. > Spark 3.2 enforces minimum parallelism differently in > {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this > issue. It will shuffle both sides of the join to enforce parallelism. > In the TPC-DS benchmark, some queries affected include 14a and 14b. 
This can > also be demonstrated using a simple query, for example: > {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = > ics.i_item_sk}} > ...where _item_ is a small table that is read into one partition, and > _catalog_sales_ is a large table. These tables are part of the TPC-DS, but you > can create your own. Also, to demonstrate the issue you may need to turn off > broadcast joins, though that is not required for the issue to occur; it > happens when running the TPC-DS with the broadcast setting at its default. > Attached are the plans for this query in Spark 3.2 and in Spark 3.3. The plans > show how in Spark 3.2, the join parallelism of 200 is reached by inserting > an exchange after the item table scan. In Spark 3.3, no such exchange is > inserted and the join parallelism is 1.
[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615379#comment-17615379 ] Apache Spark commented on SPARK-40703: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/38196 > Performance regression for joins in Spark 3.3 vs Spark 3.2 > -- > > Key: SPARK-40703 > URL: https://issues.apache.org/jira/browse/SPARK-40703 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bryan Keller >Priority: Major > Attachments: spark32-plan.txt, spark33-plan.txt, test.py > > > When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a > performance regression vs Spark 3.2 was discovered. More specifically, it > appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no > longer enforces a minimum number of partitions for a join distribution in > some cases. This impacts DSv2 datasources, because if a scan has only a > single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() > returns a _SinglePartition_ instance. The _SinglePartition_ creates a > {_}SinglePartitionShuffleSpec{_}, and > {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true. > Because {_}canCreatePartitioning{_}() returns true in this case, > {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce > minimum parallelism and also will favor the single partition when considering > the best distribution candidate. Ultimately this results in a single > partition being selected for the join distribution, even if the other side of > the join is a large table with many partitions. This can seriously impact > performance of the join. > Spark 3.2 enforces minimum parallelism differently in > {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this > issue. It will shuffle both sides of the join to enforce parallelism. 
> In the TPC-DS benchmark, some queries affected include 14a and 14b. This can > also be demonstrated using a simple query, for example: > {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = > ics.i_item_sk}} > ...where _item_ is a small table that is read into one partition, and > _catalog_sales_ is a large table. These tables are part of the TPC-DS, but you > can create your own. Also, to demonstrate the issue you may need to turn off > broadcast joins, though that is not required for the issue to occur; it > happens when running the TPC-DS with the broadcast setting at its default. > Attached are the plans for this query in Spark 3.2 and in Spark 3.3. The plans > show how in Spark 3.2, the join parallelism of 200 is reached by inserting > an exchange after the item table scan. In Spark 3.3, no such exchange is > inserted and the join parallelism is 1.
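The partition-count collapse described in this report can be sketched as a toy model. This is not Spark's actual EnsureRequirements code; the function and parameter names, and the 200-partition default, are illustrative only, standing in for the SinglePartitionShuffleSpec.canCreatePartitioning() behavior and spark.sql.shuffle.partitions:

```python
# Toy model (NOT Spark's real logic): when one side of a join reports a single
# partition and its shuffle spec may "create" the join partitioning, the single
# partition wins and no exchange is inserted, so parallelism collapses to 1.

def choose_join_parallelism(left_parts, right_parts, min_parallelism=200,
                            allow_single_partition_spec=True):
    """Pick the number of partitions both join sides are shuffled to."""
    candidates = [left_parts, right_parts]
    if allow_single_partition_spec and 1 in candidates:
        # Spark 3.3-style behavior described in the report: the
        # SinglePartition side is favored as the distribution candidate.
        return 1
    # Spark 3.2-style behavior: shuffle both sides and enforce a minimum
    # parallelism (modeled here on spark.sql.shuffle.partitions = 200).
    return max(max(candidates), min_parallelism)

# item scan -> 1 partition, catalog_sales scan -> 500 partitions
print(choose_join_parallelism(1, 500, allow_single_partition_spec=True))   # 1
print(choose_join_parallelism(1, 500, allow_single_partition_spec=False))  # 500
```

Under this model the attached plans match the two branches: the Spark 3.2 plan corresponds to the minimum-parallelism branch, the Spark 3.3 plan to the single-partition branch.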
[jira] [Commented] (SPARK-40725) Add mypy-protobuf to requirements
[ https://issues.apache.org/jira/browse/SPARK-40725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615373#comment-17615373 ] Apache Spark commented on SPARK-40725: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38195 > Add mypy-protobuf to requirements > - > > Key: SPARK-40725 > URL: https://issues.apache.org/jira/browse/SPARK-40725 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.4.0 > >
[jira] [Commented] (SPARK-40740) Improve listFunctions in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-40740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615370#comment-17615370 ] Apache Spark commented on SPARK-40740: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/38194 > Improve listFunctions in SessionCatalog > --- > > Key: SPARK-40740 > URL: https://issues.apache.org/jira/browse/SPARK-40740 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Priority: Minor > > Currently `listFunctions` gets all external functions and registered > functions (built-in, temporary, and persistent functions with a specific > database name). It is not necessary to get persistent functions that match a > specific database name again since we already fetched them from > `externalCatalog.listFunctions`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40740) Improve listFunctions in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-40740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40740: Assignee: (was: Apache Spark) > Improve listFunctions in SessionCatalog > --- > > Key: SPARK-40740 > URL: https://issues.apache.org/jira/browse/SPARK-40740 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Priority: Minor > > Currently `listFunctions` gets all external functions and registered > functions (built-in, temporary, and persistent functions with a specific > database name). It is not necessary to get persistent functions that match a > specific database name again since we already fetched them from > `externalCatalog.listFunctions`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40740) Improve listFunctions in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-40740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40740: Assignee: Apache Spark > Improve listFunctions in SessionCatalog > --- > > Key: SPARK-40740 > URL: https://issues.apache.org/jira/browse/SPARK-40740 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Minor > > Currently `listFunctions` gets all external functions and registered > functions (built-in, temporary, and persistent functions with a specific > database name). It is not necessary to get persistent functions that match a > specific database name again since we already fetched them from > `externalCatalog.listFunctions`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40740) Improve listFunctions in SessionCatalog
Allison Wang created SPARK-40740: Summary: Improve listFunctions in SessionCatalog Key: SPARK-40740 URL: https://issues.apache.org/jira/browse/SPARK-40740 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Allison Wang Currently `listFunctions` gets all external functions and registered functions (built-in, temporary, and persistent functions with a specific database name). It is not necessary to get persistent functions that match a specific database name again since we already fetched them from `externalCatalog.listFunctions`.
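The redundancy described in this ticket can be sketched as follows. This is an illustrative Python model, not the actual Scala SessionCatalog code; the dictionary-based "catalog" and the dot-qualified naming convention are assumptions made for the example:

```python
# Sketch: once persistent functions for a database have been fetched from the
# external catalog, listing registered functions should only contribute
# built-in and temporary names -- not re-match persistent ones for the same db.

def list_functions(external_catalog, registry, db):
    # One fetch from the external catalog (models externalCatalog.listFunctions).
    persistent = set(external_catalog[db])
    # Registered functions without a database qualifier: built-in / temporary.
    builtin_and_temp = {f for f in registry if "." not in f}
    # The improvement: no second pass filtering the registry for `db`-qualified
    # persistent functions, since `persistent` already contains them.
    return sorted(persistent | builtin_and_temp)

catalog = {"default": ["default.my_udf", "default.agg2"]}
registry = ["abs", "concat", "temp_fn", "default.my_udf"]
print(list_functions(catalog, registry, "default"))
```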
[jira] [Commented] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615361#comment-17615361 ] Khalid Mammadov commented on SPARK-37945: - [~maxgekk] I see you have already fixed most of these; I can pick up (and have already started on) the ones below, if that's OK? unscaledValueTooLargeForPrecisionError decimalPrecisionExceedsMaxPrecisionError outOfDecimalTypeRangeError integerOverflowError Ps: Looks fairly straightforward and shouldn't take long > Use error classes in the execution errors of arithmetic ops > --- > > Key: SPARK-37945 > URL: https://issues.apache.org/jira/browse/SPARK-37945 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryExecutionErrors: > * overflowInSumOfDecimalError > * overflowInIntegralDivideError > * arithmeticOverflowError > * unaryMinusCauseOverflowError > * binaryArithmeticCauseOverflowError > * unscaledValueTooLargeForPrecisionError > * decimalPrecisionExceedsMaxPrecisionError > * outOfDecimalTypeRangeError > * integerOverflowError > to use error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite.
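The migration pattern this ticket asks for, replacing ad-hoc error messages with exceptions carrying a stable error class plus message parameters, can be sketched in Python. The class and parameter names below are illustrative stand-ins, not Spark's actual SparkThrowable API:

```python
# Sketch of an error-class-carrying exception, loosely modeled on the
# SparkThrowable pattern. Names here are hypothetical, not Spark's API.

class SparkArithmeticException(ArithmeticError):
    def __init__(self, error_class, params):
        self.error_class = error_class    # stable, machine-readable identifier
        self.params = params              # message parameters, not a baked string
        msg = f"[{error_class}] " + ", ".join(f"{k}={v}" for k, v in params.items())
        super().__init__(msg)

def integer_overflow_error(value):
    # Before migration this would be a plain exception with a free-form message;
    # after migration the error class and parameters are structured.
    return SparkArithmeticException(
        "ARITHMETIC_OVERFLOW",
        {"message": f"{value} caused overflow", "suggestion": "use try_add"})

err = integer_overflow_error(2**31)
print(err.error_class)  # ARITHMETIC_OVERFLOW
```

The structured fields are what make the per-error tests in QueryExecutionErrorsSuite possible: a test asserts on the error class and parameters rather than on a brittle message string.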
[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615358#comment-17615358 ] Apache Spark commented on SPARK-39375: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38193 > SPIP: Spark Connect - A client and server interface for Apache Spark > > > Key: SPARK-39375 > URL: https://issues.apache.org/jira/browse/SPARK-39375 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Critical > Labels: SPIP > > Please find the full document for discussion here: [Spark Connect > SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj] > Below, we have just referenced the introduction. > h2. What are you trying to do? > While Spark is used extensively, it was designed nearly a decade ago, which, > in the age of serverless computing and ubiquitous programming language use, > poses a number of limitations. Most of the limitations stem from the tightly > coupled Spark driver architecture and fact that clusters are typically shared > across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark > driver runs both the client application and scheduler, which results in a > heavyweight architecture that requires proximity to the cluster. There is no > built-in capability to remotely connect to a Spark cluster in languages > other than SQL and users therefore rely on external solutions such as the > inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich > developer experience{*}: The current architecture and APIs do not cater for > interactive data exploration (as done with Notebooks), or allow for building > out rich developer experience common in modern code editors. (3) > {*}Stability{*}: with the current shared driver architecture, users causing > critical exceptions (e.g. 
OOM) bring the whole cluster down for all users. > (4) {*}Upgradability{*}: the current entangling of platform and client APIs > (e.g. first and third-party dependencies in the classpath) does not allow for > seamless upgrades between Spark versions (and with that, hinders new feature > adoption). > > We propose to overcome these challenges by building on the DataFrame API and > the underlying unresolved logical plans. The DataFrame API is widely used and > makes it very easy to iteratively express complex logic. We will introduce > {_}Spark Connect{_}, a remote option of the DataFrame API that separates the > client from the Spark server. With Spark Connect, Spark will become > decoupled, allowing for built-in remote connectivity: The decoupled client > SDK can be used to run interactive data exploration and connect to the server > for DataFrame operations. > > Spark Connect will benefit Spark developers in different ways: The decoupled > architecture will result in improved stability, as clients are separated from > the driver. From the Spark Connect client perspective, Spark will be (almost) > versionless, and thus enable seamless upgradability, as server APIs can > evolve without affecting the client API. The decoupled client-server > architecture can be leveraged to build close integrations with local > developer tooling. Finally, separating the client process from the Spark > server process will improve Spark’s overall security posture by avoiding the > tight coupling of the client inside the Spark runtime environment. > > Spark Connect will strengthen Spark’s position as the modern unified engine > for large-scale data analytics and expand applicability to use cases and > developers we could not reach with the current setup: Spark will become > ubiquitously usable as the DataFrame API can be used with (almost) any > programming language. 
[jira] [Assigned] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39375: Assignee: Apache Spark > SPIP: Spark Connect - A client and server interface for Apache Spark > > > Key: SPARK-39375 > URL: https://issues.apache.org/jira/browse/SPARK-39375 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Critical > Labels: SPIP > > Please find the full document for discussion here: [Spark Connect > SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj] > Below, we have just referenced the introduction. > h2. What are you trying to do? > While Spark is used extensively, it was designed nearly a decade ago, which, > in the age of serverless computing and ubiquitous programming language use, > poses a number of limitations. Most of the limitations stem from the tightly > coupled Spark driver architecture and fact that clusters are typically shared > across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark > driver runs both the client application and scheduler, which results in a > heavyweight architecture that requires proximity to the cluster. There is no > built-in capability to remotely connect to a Spark cluster in languages > other than SQL and users therefore rely on external solutions such as the > inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich > developer experience{*}: The current architecture and APIs do not cater for > interactive data exploration (as done with Notebooks), or allow for building > out rich developer experience common in modern code editors. (3) > {*}Stability{*}: with the current shared driver architecture, users causing > critical exceptions (e.g. OOM) bring the whole cluster down for all users. 
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs > (e.g. first and third-party dependencies in the classpath) does not allow for > seamless upgrades between Spark versions (and with that, hinders new feature > adoption). > > We propose to overcome these challenges by building on the DataFrame API and > the underlying unresolved logical plans. The DataFrame API is widely used and > makes it very easy to iteratively express complex logic. We will introduce > {_}Spark Connect{_}, a remote option of the DataFrame API that separates the > client from the Spark server. With Spark Connect, Spark will become > decoupled, allowing for built-in remote connectivity: The decoupled client > SDK can be used to run interactive data exploration and connect to the server > for DataFrame operations. > > Spark Connect will benefit Spark developers in different ways: The decoupled > architecture will result in improved stability, as clients are separated from > the driver. From the Spark Connect client perspective, Spark will be (almost) > versionless, and thus enable seamless upgradability, as server APIs can > evolve without affecting the client API. The decoupled client-server > architecture can be leveraged to build close integrations with local > developer tooling. Finally, separating the client process from the Spark > server process will improve Spark’s overall security posture by avoiding the > tight coupling of the client inside the Spark runtime environment. > > Spark Connect will strengthen Spark’s position as the modern unified engine > for large-scale data analytics and expand applicability to use cases and > developers we could not reach with the current setup: Spark will become > ubiquitously usable as the DataFrame API can be used with (almost) any > programming language. 
[jira] [Assigned] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39375: Assignee: (was: Apache Spark) > SPIP: Spark Connect - A client and server interface for Apache Spark > > > Key: SPARK-39375 > URL: https://issues.apache.org/jira/browse/SPARK-39375 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Critical > Labels: SPIP > > Please find the full document for discussion here: [Spark Connect > SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj] > Below, we have just referenced the introduction. > h2. What are you trying to do? > While Spark is used extensively, it was designed nearly a decade ago, which, > in the age of serverless computing and ubiquitous programming language use, > poses a number of limitations. Most of the limitations stem from the tightly > coupled Spark driver architecture and fact that clusters are typically shared > across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark > driver runs both the client application and scheduler, which results in a > heavyweight architecture that requires proximity to the cluster. There is no > built-in capability to remotely connect to a Spark cluster in languages > other than SQL and users therefore rely on external solutions such as the > inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich > developer experience{*}: The current architecture and APIs do not cater for > interactive data exploration (as done with Notebooks), or allow for building > out rich developer experience common in modern code editors. (3) > {*}Stability{*}: with the current shared driver architecture, users causing > critical exceptions (e.g. OOM) bring the whole cluster down for all users. > (4) {*}Upgradability{*}: the current entangling of platform and client APIs > (e.g. 
first and third-party dependencies in the classpath) does not allow for > seamless upgrades between Spark versions (and with that, hinders new feature > adoption). > > We propose to overcome these challenges by building on the DataFrame API and > the underlying unresolved logical plans. The DataFrame API is widely used and > makes it very easy to iteratively express complex logic. We will introduce > {_}Spark Connect{_}, a remote option of the DataFrame API that separates the > client from the Spark server. With Spark Connect, Spark will become > decoupled, allowing for built-in remote connectivity: The decoupled client > SDK can be used to run interactive data exploration and connect to the server > for DataFrame operations. > > Spark Connect will benefit Spark developers in different ways: The decoupled > architecture will result in improved stability, as clients are separated from > the driver. From the Spark Connect client perspective, Spark will be (almost) > versionless, and thus enable seamless upgradability, as server APIs can > evolve without affecting the client API. The decoupled client-server > architecture can be leveraged to build close integrations with local > developer tooling. Finally, separating the client process from the Spark > server process will improve Spark’s overall security posture by avoiding the > tight coupling of the client inside the Spark runtime environment. > > Spark Connect will strengthen Spark’s position as the modern unified engine > for large-scale data analytics and expand applicability to use cases and > developers we could not reach with the current setup: Spark will become > ubiquitously usable as the DataFrame API can be used with (almost) any > programming language. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39199) Implement pandas API missing parameters
[ https://issues.apache.org/jira/browse/SPARK-39199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-39199. -- Resolution: Resolved > Implement pandas API missing parameters > --- > > Key: SPARK-39199 > URL: https://issues.apache.org/jira/browse/SPARK-39199 > Project: Spark > Issue Type: Umbrella > Components: Pandas API on Spark, PySpark >Affects Versions: 3.3.0, 3.4.0, 3.3.1 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > pandas API on Spark aims to make pandas code work on Spark clusters without > any changes. So full API coverage has been one of our major goals. Currently, > most pandas functions are implemented, whereas some of them have > incomplete parameter support. > There are some common parameters missing (resolved): > * How to handle NAs > * Filter data types > * Control result length > * Reindex result > There are remaining missing parameters to implement (see doc below). > See the design and the current status at > [https://docs.google.com/document/d/1H6RXL6oc-v8qLJbwKl6OEqBjRuMZaXcTYmrZb9yNm5I/edit?usp=sharing].
[jira] [Created] (SPARK-40739) "sbt packageBin" fails in cygwin or other windows bash session
Phil Walker created SPARK-40739: --- Summary: "sbt packageBin" fails in cygwin or other windows bash session Key: SPARK-40739 URL: https://issues.apache.org/jira/browse/SPARK-40739 Project: Spark Issue Type: Bug Components: Build, Windows Affects Versions: 3.3.0 Environment: The problem occurs in Windows if *_sbt_* is started from a (non-WSL) bash session. See the spark PR link for detailed symptoms. Reporter: Phil Walker In a Windows _*SHELL*_ environment, such as _*cygwin*_ or {_}*msys2/mingw64*{_}, etc, _*Core.settings*_ in _*project/SparkBuild.scala*_ calls the wrong _*bash.exe*_ if WSL bash is present (typically at {_}*C:\Windows*{_}), causing a build failure. This occurs even though the proper *bash.exe* is in the _*PATH*_ ahead of _*WSL*_ bash.exe. This is fixed by [spark PR 38167|https://github.com/apache/spark/pull/38167] There are 3 parts to the fix, implemented in _*project/SparkBuild.scala*_ : * determine the absolute path of the first bash.exe in the command line. * determine the build environment (e.g., Linux, Darwin, CYGWIN, MSYS2, etc.) * For Windows SHELL environments, the first argument to the spawned Process is changed from "bash" to the absolute path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
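As a rough illustration of the first two fix steps, here is a hypothetical Python stand-in (the actual fix lives in project/SparkBuild.scala, in Scala):

```python
import shutil

def first_bash_on_path():
    # Absolute path of the first bash found on the PATH, or None if absent.
    return shutil.which("bash")

def is_windows_shell_env(uname: str) -> bool:
    # cygwin/msys2/mingw sessions report uname values like
    # CYGWIN_NT-10.0 or MSYS_NT-10.0; plain Linux/macOS do not.
    return uname.upper().startswith(("CYGWIN", "MSYS", "MINGW"))

# In a Windows shell environment, the spawned process should be given the
# resolved absolute path instead of the bare name "bash", so that WSL's
# C:\Windows bash.exe is never picked up by accident.
assert is_windows_shell_env("CYGWIN_NT-10.0")
assert not is_windows_shell_env("Linux")
```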
[jira] [Created] (SPARK-40738) spark-shell fails with "bad array subscript" in cygwin or msys bash session
Phil Walker created SPARK-40738: --- Summary: spark-shell fails with "bad array subscript" in cygwin or msys bash session Key: SPARK-40738 URL: https://issues.apache.org/jira/browse/SPARK-40738 Project: Spark Issue Type: Bug Components: Spark Shell, Windows Affects Versions: 3.3.0 Environment: The problem occurs in Windows if *_spark-shell_* is called from a bash session. NOTE: the fix also applies to _*spark-submit*_ and {_}*beeline*{_}, since they call spark-shell. Reporter: Phil Walker A spark pull request [spark PR|https://github.com/apache/spark/pull/38167] fixes this issue, and also fixes a build error related to _*cygwin*_ and *msys/mingw* bash *sbt* sessions. If a Windows user tries to start a *_spark-shell_* session by calling the bash script (rather than the *_spark-shell.cmd_* script), it fails with a confusing error message. Script _*spark-class*_ calls _*launcher/src/main/java/org/apache/spark/launcher/Main.java*_ to generate command line arguments, but the launcher produces a format appropriate to the *_.cmd_* version of the script rather than the _*bash*_ version. The launcher Main method, when called for environments other than Windows, interleaves NUL characters between the command line arguments. It should also do so on Windows when called from the bash script; it incorrectly assumes that if the OS is Windows, it is being called by the .cmd version of the script. The resulting error message is unhelpful: {code:java} [lots of ugly stuff omitted] /opt/spark/bin/spark-class: line 100: CMD: bad array subscript {code} The key to _*launcher/Main*_ knowing that a request is from a _*bash*_ session is that the _*SHELL*_ environment variable is set. This will normally be set in any of the various Windows shell environments ({_}*cygwin*{_}, {_}*mingw64*{_}, {_}*msys2*{_}, etc.) and will not normally be set in a plain Windows (cmd.exe) environment.
In the _*spark-class.cmd*_ script, _*SHELL*_ is intentionally unset to avoid problems, and to permit bash users to call the _*.cmd*_ scripts if they prefer (they will still work as before). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
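The launcher handshake described above can be modeled in a few lines of Python; this is a hypothetical illustration of the failure mode, not Spark's actual code:

```python
def emit_args(args, windows_cmd_format=False):
    # Stand-in for launcher/Main: the bash format appends a NUL byte after
    # each argument; the .cmd format emits a plain space-separated line.
    if windows_cmd_format:
        return " ".join(args).encode()
    return b"".join(a.encode() + b"\0" for a in args)

def parse_args_like_bash(stream: bytes):
    # Stand-in for spark-class's `while IFS= read -d '' -r ARG` loop:
    # only NUL-terminated records complete a loop iteration, so a stream
    # with no NUL bytes yields an empty CMD array -- and indexing an empty
    # bash array is exactly the "bad array subscript" failure.
    records = stream.split(b"\0")
    return [r.decode() for r in records[:-1]]

args = ["java", "-cp", "/opt/spark/jars/*", "org.apache.spark.repl.Main"]
assert parse_args_like_bash(emit_args(args)) == args
assert parse_args_like_bash(emit_args(args, windows_cmd_format=True)) == []
```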
[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615264#comment-17615264 ] Bryan Keller commented on SPARK-40703: -- Sounds good, thanks. > Performance regression for joins in Spark 3.3 vs Spark 3.2 > -- > > Key: SPARK-40703 > URL: https://issues.apache.org/jira/browse/SPARK-40703 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bryan Keller >Priority: Major > Attachments: spark32-plan.txt, spark33-plan.txt, test.py > > > When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a > performance regression vs Spark 3.2 was discovered. More specifically, it > appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no > longer enforces a minimum number of partitions for a join distribution in > some cases. This impacts DSv2 datasources, because if a scan has only a > single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() > returns a _SinglePartition_ instance. The _SinglePartition_ creates a > {_}SinglePartitionShuffleSpec{_}, and > {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true. > Because {_}canCreatePartitioning{_}() returns true in this case, > {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce > minimum parallelism and also will favor the single partition when considering > the best distribution candidate. Ultimately this results in a single > partition being selected for the join distribution, even if the other side of > the join is a large table with many partitions. This can seriously impact > performance of the join. > Spark 3.2 enforces minimum parallelism differently in > {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this > issue. It will shuffle both sides of the join to enforce parallelism. > In the TPC-DS benchmark, some queries affected include 14a and 14b. 
This can > also be demonstrated using a simple query, for example: > {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = > ics.i_item_sk}} > ...where _item_ is a small table that is read into one partition, and > _catalog_sales_ is a large table. These tables are part of the TPC-DS but you > can create your own. Also, to demonstrate the issue, you may need to turn off > broadcast joins though that is not required for this issue to occur, it > happens when running the TPC-DS with broadcast setting at default. > Attached is the plan for this query in Spark 3.2 and in Spark 3.3. The plan > shows how in Spark 3.2, the join parallelism of 200 is reached by inserting > an exchange after the item table scan. In Spark 3.3, no such exchange is > inserted and the join parallelism is 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615255#comment-17615255 ] Chao Sun commented on SPARK-40703: -- Thanks [~bryanck] . Now I see where the issue is. In your pyspark example, one side reports {{UnknownPartitioning}} while another side reports {{{}SinglePartition{}}}. Later on, Spark will insert shuffle for {{UnknownPartitioning}} so it becomes {{{}HashPartitioning{}}}. In this particular case, when Spark is deciding which side to insert shuffle, it'll pick the {{HashPartitioning}} again and convert it into the same {{HashPartitioning}} but with {{{}numPartitions = 1{}}}. Before: {code} ShuffleExchange(HashPartition(200)) <--> SinglePartition {code} (suppose {{spark.sql.shuffle.partitions}} is 200) After: {code} ShuffleExchange(HashPartition(1)) <--> SinglePartition {code} The reason Spark chooses to do in this way is because there is a trade-off between shuffle cost and parallelism. At the moment, when Spark sees that one side of the join has {{ShuffleExchange}} (meaning it needs to be shuffled anyways), and the other side doesn't, it'll try to avoid shuffling the other side. This makes more sense if we have: {code} ShuffleExchange(HashPartition(200)) <-> HashPartition(150) {code} as in this case, Spark will avoid shuffle the right hand side and instead just change the number of shuffle partitions on the left: {code} ShuffleExchange(HashPartition(150) <-> HashPartition(150) {code} I feel we can treat the {{SinglePartition}} as a special case here. Let me see if I can come up with a PR. 
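The trade-off described in this comment can be captured in a toy model; the function below is a hypothetical sketch of the side-selection decision, not EnsureRequirements' actual code:

```python
def joined_num_partitions(shuffle_side_n, other_side_n, other_can_create_partitioning):
    # If the non-shuffled side's spec can create a partitioning, reuse its
    # partition count to avoid shuffling that side; otherwise shuffle both.
    if other_can_create_partitioning:
        return other_side_n
    return shuffle_side_n

# HashPartitioning(150) on the other side: reasonable, saves one shuffle.
assert joined_num_partitions(200, 150, True) == 150

# SinglePartition also reports canCreatePartitioning == true, so the whole
# join collapses to one partition -- the regression reported in this issue.
assert joined_num_partitions(200, 1, True) == 1
```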
> Performance regression for joins in Spark 3.3 vs Spark 3.2 > -- > > Key: SPARK-40703 > URL: https://issues.apache.org/jira/browse/SPARK-40703 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bryan Keller >Priority: Major > Attachments: spark32-plan.txt, spark33-plan.txt, test.py > > > When running the TPC-DS benchmarks using a DSv2 datasource in Spark 3.3, a > performance regression vs Spark 3.2 was discovered. More specifically, it > appears as if {_}EnsureRequirements.ensureDistributionAndOrdering{_}() no > longer enforces a minimum number of partitions for a join distribution in > some cases. This impacts DSv2 datasources, because if a scan has only a > single read partition {_}DataSourceV2ScanExecBase.outputPartitioning{_}() > returns a _SinglePartition_ instance. The _SinglePartition_ creates a > {_}SinglePartitionShuffleSpec{_}, and > {_}SinglePartitionShuffleSpec.canCreatePartitioning{_}() returns true. > Because {_}canCreatePartitioning{_}() returns true in this case, > {_}EnsureRequirements.ensureDistributionAndOrdering{_}() won't enforce > minimum parallelism and also will favor the single partition when considering > the best distribution candidate. Ultimately this results in a single > partition being selected for the join distribution, even if the other side of > the join is a large table with many partitions. This can seriously impact > performance of the join. > Spark 3.2 enforces minimum parallelism differently in > {_}ensureDistributionAndOrdering{_}() and thus does not suffer from this > issue. It will shuffle both sides of the join to enforce parallelism. > In the TPC-DS benchmark, some queries affected include 14a and 14b. This can > also be demonstrated using a simple query, for example: > {{select ics.i_item_sk from catalog_sales cs join item ics on cs.cs_item_sk = > ics.i_item_sk}} > ...where _item_ is a small table that is read into one partition, and > _catalog_sales_ is a large table. 
These tables are part of the TPC-DS but you > can create your own. Also, to demonstrate the issue, you may need to turn off > broadcast joins though that is not required for this issue to occur, it > happens when running the TPC-DS with broadcast setting at default. > Attached is the plan for this query in Spark 3.2 and in Spark 3.3. The plan > shows how in Spark 3.2, the join parallelism of 200 is reached by inserting > an exchange after the item table scan. In Spark 3.3, no such exchange is > inserted and the join parallelism is 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40737) Add basic support for DataFrameWriter
[ https://issues.apache.org/jira/browse/SPARK-40737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40737: Assignee: Apache Spark > Add basic support for DataFrameWriter > - > > Key: SPARK-40737 > URL: https://issues.apache.org/jira/browse/SPARK-40737 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Major > > A key element of using Spark Connect is going to be to be able to write data > from a logical plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40737) Add basic support for DataFrameWriter
[ https://issues.apache.org/jira/browse/SPARK-40737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615251#comment-17615251 ] Apache Spark commented on SPARK-40737: -- User 'grundprinzip' has created a pull request for this issue: https://github.com/apache/spark/pull/38192 > Add basic support for DataFrameWriter > - > > Key: SPARK-40737 > URL: https://issues.apache.org/jira/browse/SPARK-40737 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > A key element of using Spark Connect is going to be to be able to write data > from a logical plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40737) Add basic support for DataFrameWriter
[ https://issues.apache.org/jira/browse/SPARK-40737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40737: Assignee: (was: Apache Spark) > Add basic support for DataFrameWriter > - > > Key: SPARK-40737 > URL: https://issues.apache.org/jira/browse/SPARK-40737 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > A key element of using Spark Connect is going to be to be able to write data > from a logical plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40737) Add basic support for DataFrameWriter
Martin Grund created SPARK-40737: Summary: Add basic support for DataFrameWriter Key: SPARK-40737 URL: https://issues.apache.org/jira/browse/SPARK-40737 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund A key element of using Spark Connect is going to be to be able to write data from a logical plan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40736) Spark 3.3.0 doesn't work with Hive 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-40736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratik Malani updated SPARK-40736: -- Fix Version/s: 3.3.1 > Spark 3.3.0 doesn't works with Hive 3.1.2 > - > > Key: SPARK-40736 > URL: https://issues.apache.org/jira/browse/SPARK-40736 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Pratik Malani >Priority: Major > Labels: Hive, spark, spark3.0 > Fix For: 3.3.1 > > > Hive 2.3.9 is impacted with CVE-2021-34538, so trying to use the Hive 3.1.2. > Using Spark 3.3.0 with Hadoop 3.3.4 and Hive 3.1.2, getting below error when > starting the Thriftserver > > {noformat} > Exception in thread "main" java.lang.IllegalAccessError: tried to access > class org.apache.hive.service.server.HiveServer2$ServerOptionsProcessor from > class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$ > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:92) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) > at > 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){noformat} > Using below command to start the Thriftserver > > *spark-class org.apache.spark.deploy.SparkSubmit --class > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal* > > Have set the SPARK_HOME correctly. > > The same works well with Hive 2.3.9, but fails when we upgrade to Hive 3.1.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40736) Spark 3.3.0 doesn't work with Hive 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-40736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratik Malani updated SPARK-40736: -- Labels: Hive spark spark3.0 (was: ) > Spark 3.3.0 doesn't works with Hive 3.1.2 > - > > Key: SPARK-40736 > URL: https://issues.apache.org/jira/browse/SPARK-40736 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Pratik Malani >Priority: Major > Labels: Hive, spark, spark3.0 > > Hive 2.3.9 is impacted with CVE-2021-34538, so trying to use the Hive 3.1.2. > Using Spark 3.3.0 with Hadoop 3.3.4 and Hive 3.1.2, getting below error when > starting the Thriftserver > > {noformat} > Exception in thread "main" java.lang.IllegalAccessError: tried to access > class org.apache.hive.service.server.HiveServer2$ServerOptionsProcessor from > class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$ > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:92) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) > at > 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){noformat} > Using below command to start the Thriftserver > > *spark-class org.apache.spark.deploy.SparkSubmit --class > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal* > > Have set the SPARK_HOME correctly. > > The same works well with Hive 2.3.9, but fails when we upgrade to Hive 3.1.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40736) Spark 3.3.0 doesn't work with Hive 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-40736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratik Malani updated SPARK-40736: -- Description: Hive 2.3.9 is impacted with CVE-2021-34538, so trying to use the Hive 3.1.2. Using Spark 3.3.0 with Hadoop 3.3.4 and Hive 3.1.2, getting below error when starting the Thriftserver {noformat} Exception in thread "main" java.lang.IllegalAccessError: tried to access class org.apache.hive.service.server.HiveServer2$ServerOptionsProcessor from class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$ at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:92) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){noformat} Using below command to start the Thriftserver *spark-class org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal* Have set the SPARK_HOME correctly. The same works well with Hive 2.3.9, but fails when we upgrade to Hive 3.1.2. 
> Spark 3.3.0 doesn't works with Hive 3.1.2 > - > > Key: SPARK-40736 > URL: https://issues.apache.org/jira/browse/SPARK-40736 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Pratik Malani >Priority: Major > > Hive 2.3.9 is impacted with CVE-2021-34538, so trying to use the Hive 3.1.2. > Using Spark 3.3.0 with Hadoop 3.3.4 and Hive 3.1.2, getting below error when > starting the Thriftserver > > {noformat} > Exception in thread "main" java.lang.IllegalAccessError: tried to access > class org.apache.hive.service.server.HiveServer2$ServerOptionsProcessor from > class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$ > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:92) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) > at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){noformat} > Using below command to start the Thriftserver > > *spark-class org.apache.spark.deploy.SparkSubmit --class > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal* > 
> Have set the SPARK_HOME correctly. > > The same works well with Hive 2.3.9, but fails when we upgrade to Hive 3.1.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40736) Spark 3.3.0 doesn't work with Hive 3.1.2
Pratik Malani created SPARK-40736: - Summary: Spark 3.3.0 doesn't work with Hive 3.1.2 Key: SPARK-40736 URL: https://issues.apache.org/jira/browse/SPARK-40736 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.3.0 Reporter: Pratik Malani
[jira] [Comment Edited] (SPARK-40706) IllegalStateException when querying array values inside a nested struct
[ https://issues.apache.org/jira/browse/SPARK-40706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614550#comment-17614550 ] Bruce Robbins edited comment on SPARK-40706 at 10/10/22 5:01 PM: - Same as SPARK-39854? At the very least, the suggested workaround also worked for your case: {noformat} spark-sql> set spark.sql.optimizer.nestedSchemaPruning.enabled=false; spark.sql.optimizer.nestedSchemaPruning.enabled false Time taken: 0.224 seconds, Fetched 1 row(s) spark-sql> set spark.sql.optimizer.expression.nestedPruning.enabled=false; spark.sql.optimizer.expression.nestedPruning.enabledfalse Time taken: 0.016 seconds, Fetched 1 row(s) spark-sql> SELECT response.message as message, response.timestamp as timestamp, score as risk_score, model.value as model_type FROM tbl LATERAL VIEW OUTER explode(response.data.items.attempt) AS Attempt LATERAL VIEW OUTER explode(response.data.items.attempt.risk) AS RiskModels LATERAL VIEW OUTER explode(RiskModels) AS RiskModel LATERAL VIEW OUTER explode(RiskModel.indicator) AS Model LATERAL VIEW OUTER explode(RiskModel.Score) AS Score; > > > > > > > > > > m1 09/07/2022 1 abc m1 09/07/2022 2 abc m1 09/07/2022 3 abc m1 09/07/2022 1 def m1 09/07/2022 2 def m1 09/07/2022 3 def Time taken: 1.213 seconds, Fetched 6 row(s) spark-sql> > {noformat} was (Author: bersprockets): Same as SPARK-39854? 
At the very least, the suggest workaround also worked for your case: {noformat} spark-sql> set spark.sql.optimizer.nestedSchemaPruning.enabled=false; spark.sql.optimizer.nestedSchemaPruning.enabled false Time taken: 0.224 seconds, Fetched 1 row(s) spark-sql> set spark.sql.optimizer.expression.nestedPruning.enabled=false; spark.sql.optimizer.expression.nestedPruning.enabledfalse Time taken: 0.016 seconds, Fetched 1 row(s) spark-sql> SELECT response.message as message, response.timestamp as timestamp, score as risk_score, model.value as model_type FROM tbl LATERAL VIEW OUTER explode(response.data.items.attempt) AS Attempt LATERAL VIEW OUTER explode(response.data.items.attempt.risk) AS RiskModels LATERAL VIEW OUTER explode(RiskModels) AS RiskModel LATERAL VIEW OUTER explode(RiskModel.indicator) AS Model LATERAL VIEW OUTER explode(RiskModel.Score) AS Score; > > > > > > > > > > m1 09/07/2022 1 abc m1 09/07/2022 2 abc m1 09/07/2022 3 abc m1 09/07/2022 1 def m1 09/07/2022 2 def m1 09/07/2022 3 def Time taken: 1.213 seconds, Fetched 6 row(s) spark-sql> > {noformat} > IllegalStateException when querying array values inside a nested struct > --- > > Key: SPARK-40706 > URL: https://issues.apache.org/jira/browse/SPARK-40706 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Rohan Barman >Priority: Major > > We are in the process of migrating our PySpark applications from Spark > version 3.1.2 to Spark version 3.2.0. > This bug is present in version 3.2.0. We do not see this issue in version > 3.1.2. > > *Minimal example to reproduce bug* > Below is a minimal example that generates hardcoded data and queries. The > data has several nested structs and arrays. > Our real use case reads data from avro files and has more complex queries, > but this is sufficient to reproduce the error. 
> > {code:java} > # Generate data > data = [ > ('1',{ > 'timestamp': '09/07/2022', > 'message': 'm1', > 'data':{ > 'items': { > 'id':1, > 'attempt':[ > {'risk':[ > {'score':[1,2,3]}, > {'indicator':[ > {'code':'c1','value':'abc'}, > {'code':'c2','value':'def'} > ]} > ]} > ] > } > } > }) > ] > from pyspark.sql.types import * > schema = StructType([ > StructField('id', StringType(), True), > StructField('response',
[jira] [Resolved] (SPARK-40714) Remove PartitionAlreadyExistsException
[ https://issues.apache.org/jira/browse/SPARK-40714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40714. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38161 [https://github.com/apache/spark/pull/38161] > Remove PartitionAlreadyExistsException > -- > > Key: SPARK-40714 > URL: https://issues.apache.org/jira/browse/SPARK-40714 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Remove PartitionAlreadyExistsException and use > PartitionsAlreadyExistException instead.
[jira] [Commented] (SPARK-36681) Fail to load Snappy codec
[ https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615201#comment-17615201 ] L. C. Hsieh commented on SPARK-36681: - This is fixed in 3.3.0 and later, yes, by upgrading to Hadoop 3.3.2. As discussed above, there is no workaround in 3.2 for this issue. If you are stuck on 3.2, the only way is to upgrade to Hadoop 3.3.2 in the Spark 3.2 source. > Fail to load Snappy codec > - > > Key: SPARK-36681 > URL: https://issues.apache.org/jira/browse/SPARK-36681 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.3.0 > > > snappy-java as a native library should not be relocated in Hadoop shaded > client libraries. Currently we use Hadoop shaded client libraries in Spark. > If trying to use SnappyCodec to write a sequence file, we will encounter the > following error: > {code} > [info] Cause: java.lang.UnsatisfiedLinkError: > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native > Method) > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151) > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282) > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605) > [info] at >
org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629) > {code}
[jira] [Updated] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable
[ https://issues.apache.org/jira/browse/SPARK-40735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaoping.huang updated SPARK-40735: --- Component/s: Connect Kubernetes R Spark Core SQL (was: Deploy) > Consistently invoke bash with /usr/bin/env bash in scripts to make code more > portable > - > > Key: SPARK-40735 > URL: https://issues.apache.org/jira/browse/SPARK-40735 > Project: Spark > Issue Type: Improvement > Components: Connect, Kubernetes, R, Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: xiaoping.huang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable
[ https://issues.apache.org/jira/browse/SPARK-40735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40735: Assignee: Apache Spark > Consistently invoke bash with /usr/bin/env bash in scripts to make code more > portable > - > > Key: SPARK-40735 > URL: https://issues.apache.org/jira/browse/SPARK-40735 > Project: Spark > Issue Type: Improvement > Components: Deploy >Affects Versions: 3.4.0 >Reporter: xiaoping.huang >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable
[ https://issues.apache.org/jira/browse/SPARK-40735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40735: Assignee: (was: Apache Spark) > Consistently invoke bash with /usr/bin/env bash in scripts to make code more > portable > - > > Key: SPARK-40735 > URL: https://issues.apache.org/jira/browse/SPARK-40735 > Project: Spark > Issue Type: Improvement > Components: Deploy >Affects Versions: 3.4.0 >Reporter: xiaoping.huang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable
[ https://issues.apache.org/jira/browse/SPARK-40735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615190#comment-17615190 ] Apache Spark commented on SPARK-40735: -- User 'huangxiaopingRD' has created a pull request for this issue: https://github.com/apache/spark/pull/38191 > Consistently invoke bash with /usr/bin/env bash in scripts to make code more > portable > - > > Key: SPARK-40735 > URL: https://issues.apache.org/jira/browse/SPARK-40735 > Project: Spark > Issue Type: Improvement > Components: Deploy >Affects Versions: 3.4.0 >Reporter: xiaoping.huang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40735) Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable
xiaoping.huang created SPARK-40735: -- Summary: Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable Key: SPARK-40735 URL: https://issues.apache.org/jira/browse/SPARK-40735 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 3.4.0 Reporter: xiaoping.huang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
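The change proposed in SPARK-40735 amounts to using a PATH lookup in every script's shebang line rather than a hard-coded interpreter path. A minimal illustration (the echoed path is whatever bash the environment resolves):

```shell
#!/usr/bin/env bash
# Resolves bash via PATH instead of assuming /bin/bash, so the script also
# runs on systems where bash lives elsewhere (e.g. /usr/local/bin/bash on
# the BSDs, or a Nix store path).
set -euo pipefail
echo "bash resolved to: $(command -v bash)"
```

The trade-off is that `/usr/bin/env bash` picks up whichever bash is first on PATH, which is exactly what makes the scripts portable across environments.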
[jira] [Resolved] (SPARK-40705) Issue with spark converting Row to Json using Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-40705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40705. -- Fix Version/s: 3.3.1 3.4.0 Resolution: Fixed Issue resolved by pull request 38154 [https://github.com/apache/spark/pull/38154] > Issue with spark converting Row to Json using Scala 2.13 > > > Key: SPARK-40705 > URL: https://issues.apache.org/jira/browse/SPARK-40705 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Amrane Ait Zeouay >Assignee: Amrane Ait Zeouay >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: image-2022-10-07-19-43-42-232.png, > image-2022-10-07-19-43-51-892.png, image-2022-10-07-19-44-00-332.png, > image-2022-10-07-19-44-09-972.png > > > h2. *Note: This issue can be reproduced only using Scala 2.13* > When I try to convert a Row to JSON in order to publish it, I get the > following error > !image-2022-10-07-19-43-42-232.png! > I investigated and found that the issue is in the pattern matching. > !image-2022-10-07-19-43-51-892.png! > The type `ArraySeq` is not matched in the `Row` class. > !image-2022-10-07-19-44-00-332.png! > This is the definition of my field > !image-2022-10-07-19-44-09-972.png! > And an example of it > > {code:json} > { > ... > Codes: ["Test", "Spark", "Json"] > ... > }{code} > The Scala version I'm using is `2.13.9` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40705) Issue with spark converting Row to Json using Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-40705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-40705: Assignee: Amrane Ait Zeouay > Issue with spark converting Row to Json using Scala 2.13 > > > Key: SPARK-40705 > URL: https://issues.apache.org/jira/browse/SPARK-40705 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Amrane Ait Zeouay >Assignee: Amrane Ait Zeouay >Priority: Major > Attachments: image-2022-10-07-19-43-42-232.png, > image-2022-10-07-19-43-51-892.png, image-2022-10-07-19-44-00-332.png, > image-2022-10-07-19-44-09-972.png > > > h2. *Note: This issue can be reproduced only using Scala 2.13* > When I try to convert a Row to JSON in order to publish it, I get the > following error > !image-2022-10-07-19-43-42-232.png! > I investigated and found that the issue is in the pattern matching. > !image-2022-10-07-19-43-51-892.png! > The type `ArraySeq` is not matched in the `Row` class. > !image-2022-10-07-19-44-00-332.png! > This is the definition of my field > !image-2022-10-07-19-44-09-972.png! > And an example of it > > {code:json} > { > ... > Codes: ["Test", "Spark", "Json"] > ... > }{code} > The Scala version I'm using is `2.13.9` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40726) Supplement undocumented orc configurations in documentation
[ https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40726. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38188 [https://github.com/apache/spark/pull/38188] > Supplement undocumented orc configurations in documentation > --- > > Key: SPARK-40726 > URL: https://issues.apache.org/jira/browse/SPARK-40726 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40726) Supplement undocumented orc configurations in documentation
[ https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-40726: Assignee: Qian Sun > Supplement undocumented orc configurations in documentation > --- > > Key: SPARK-40726 > URL: https://issues.apache.org/jira/browse/SPARK-40726 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40734) KafkaMicroBatchSourceSuite failed
Yang Jie created SPARK-40734: Summary: KafkaMicroBatchSourceSuite failed Key: SPARK-40734 URL: https://issues.apache.org/jira/browse/SPARK-40734 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 3.4.0 Reporter: Yang Jie "ensure stream-stream self-join generates only one offset in log and correct metrics" failed. Failure reason to be supplemented -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40733) ShowCreateTableSuite test failed
Yang Jie created SPARK-40733: Summary: ShowCreateTableSuite test failed Key: SPARK-40733 URL: https://issues.apache.org/jira/browse/SPARK-40733 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Yang Jie * SHOW CREATE TABLE using Hive V1 catalog V1 command: hive table with serde info *** FAILED *** * - SHOW CREATE TABLE using Hive V1 catalog V2 command: hive table with serde info *** FAILED *** Failure reason to be supplemented -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40732) Floating point precision changes
Yang Jie created SPARK-40732: Summary: Floating point precision changes Key: SPARK-40732 URL: https://issues.apache.org/jira/browse/SPARK-40732 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Yang Jie Some cases in SQLQueryTestSuite (sql/core) and ThriftServerQueryTestSuite (sql/hive-thriftserver) failed for this reason; for example: {code:java} SQLQueryTestSuite- try_aggregates.sql *** FAILED *** try_aggregates.sql Expected "4.61168601842738[79]E18", but got "4.61168601842738[8]E18" Result did not match for query #20 SELECT try_avg(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col) (SQLQueryTestSuite.scala:495) {code} {code:java} ThriftServerQueryTestSuite- try_aggregates.sql *** FAILED *** Expected "4.61168601842738[79]E18", but got "4.61168601842738[8]E18" Result did not match for query #20 SELECT try_avg(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col) (ThriftServerQueryTestSuite.scala:222)- try_arithmetic.sql *** FAILED *** Expected "-4.65661287307739[26]E-10", but got "-4.65661287307739[3]E-10" Result did not match for query #26 SELECT try_divide(1, (2147483647 + 1)) (ThriftServerQueryTestSuite.scala:222)- datetime-formatting.sql *** FAILED *** Expected "...-05-31 19:40:35.123 [3 1969-12-31 15:00:00 3 1970-12-31 04:59:59.999 3 1996-03-31 07:03:33.123 3 2018-11-17 05:33:33.123 3 2019-12-31 09:33:33.123 3] 2100-01-01 01:33:33...", but got "...-05-31 19:40:35.123 [5 1969-12-31 15:00:00 5 1970-12-31 04:59:59.999 5 1996-03-31 07:03:33.123 5 2018-11-17 05:33:33.123 3 2019-12-31 09:33:33.123 5] 2100-01-01 01:33:33..." Result did not match for query #8 select col, date_format(col, 'F') from v (ThriftServerQueryTestSuite.scala:222) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
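The extra trailing digits in the `try_aggregates.sql` failure are consistent with JDK 19 adopting the shortest round-trip decimal representation for `Double.toString` (JDK-4511638); the ticket does not confirm the cause, so treat this as a hypothesis. A minimal sketch using the same values as query #20:

```java
public class ShortestRepr {
    public static void main(String[] args) {
        // Average of Long.MAX_VALUE and 1L as a double: the sum rounds to
        // 2^63, so the average is exactly 2^62 -- representable as a double.
        double avg = ((double) Long.MAX_VALUE + 1.0) / 2.0;
        String s = Double.toString(avg);
        // The golden file expects 4.6116860184273879E18 (older JDKs); the
        // failure shows the shorter 4.611686018427388E18. Both strings parse
        // back to the same double, so the change is cosmetic -- but it breaks
        // string comparison against golden files.
        System.out.println(s);
        System.out.println(Double.parseDouble(s) == avg);
    }
}
```

If this is indeed the cause, the fix is either to regenerate the golden files per JDK or to compare parsed doubles rather than rendered strings.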
[jira] [Created] (SPARK-40731) Sealed class cannot be mock by mockito
Yang Jie created SPARK-40731: Summary: Sealed class cannot be mock by mockito Key: SPARK-40731 URL: https://issues.apache.org/jira/browse/SPARK-40731 Project: Spark Issue Type: Sub-task Components: DStreams Affects Versions: 3.4.0 Reporter: Yang Jie 3 test cases in WriteAheadLogSuite failed for this reason -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40730) Java 19 related issues
[ https://issues.apache.org/jira/browse/SPARK-40730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-40730: - Description: I ran the Maven tests with Java 19 and some tests failed. Whether or not each failure can be fixed, recording them here will be helpful for upgrading to the next LTS (Java 21) > Java 19 related issues > -- > > Key: SPARK-40730 > URL: https://issues.apache.org/jira/browse/SPARK-40730 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > I ran the Maven tests with Java 19 and some tests failed. Whether or not each > failure can be fixed, recording them here will be helpful for upgrading to the > next LTS (Java 21) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40729) Spark-shell run failed with Java 19
[ https://issues.apache.org/jira/browse/SPARK-40729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-40729: - Parent: SPARK-40730 Issue Type: Sub-task (was: Improvement) > Spark-shell run failed with Java 19 > --- > > Key: SPARK-40729 > URL: https://issues.apache.org/jira/browse/SPARK-40729 > Project: Spark > Issue Type: Sub-task > Components: Spark Shell >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. > Attempting port 4041. > Spark context Web UI available at http://localhost:4041 > Spark context available as 'sc' (master = local, app id = > local-1665401880396). > Spark session available as 'spark'. > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.3.0 > /_/ > > Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19) > Type in expressions to have them evaluated. > Type :help for more information. > scala> :paste > // Entering paste mode (ctrl-D to finish) > var array = new Array[Int](5) > val broadcastArray = sc.broadcast(array) > sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect() > array(0) = 5 > sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect() > // Exiting paste mode, now interpreting. 
> java.lang.InternalError: java.lang.IllegalAccessException: final field has no > write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class > java.lang.Object (module java.base) > at > java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167) > at > java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145) > at > java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184) > at > java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153) > at java.base/java.lang.reflect.Field.set(Field.java:820) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2491) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:406) > at org.apache.spark.rdd.RDD.map(RDD.scala:413) > ... 43 elided > Caused by: java.lang.IllegalAccessException: final field has no write access: > $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object > (module java.base) > at > java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955) > at > java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511) > at > java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502) > at > java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630) > at > java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145) > ... 
55 more > scala> {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
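The `IllegalAccessException` in the trace above comes from `ClosureCleaner`'s reflective write into a lambda's captured field (`Field.set` at ClosureCleaner.scala:406). A plausible explanation, not confirmed in the ticket: lambdas are hidden classes since JDK 15, and JDK 18+ core reflection (JEP 416, method-handle based) refuses writes to their final fields even after `setAccessible(true)`. A standalone probe that attempts the same write and reports the outcome (which varies by JDK, so it prints rather than asserting a specific result):

```java
import java.lang.reflect.Field;

public class FinalCaptureWriteProbe {
    public static void main(String[] args) {
        int[] captured = new int[1];
        Runnable r = () -> captured[0]++; // lambda capturing `captured`
        String outcome = "no captured field found";
        for (Field f : r.getClass().getDeclaredFields()) {
            if (!f.getName().startsWith("arg$")) continue; // captured values
            try {
                f.setAccessible(true);
                f.set(r, null); // the write ClosureCleaner performs
                outcome = "write succeeded";
            } catch (ReflectiveOperationException | RuntimeException | Error e) {
                // On JDK 19 this surfaces as the InternalError wrapping
                // IllegalAccessException shown in the stack trace above.
                outcome = "rejected: " + e.getClass().getSimpleName();
            }
            break;
        }
        System.out.println(outcome);
    }
}
```

Under this reading, the shell example fails only on newer JDKs because older reflection implementations still permitted the final-field write.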
[jira] [Created] (SPARK-40730) Java 19 related issues
Yang Jie created SPARK-40730: Summary: Java 19 related issues Key: SPARK-40730 URL: https://issues.apache.org/jira/browse/SPARK-40730 Project: Spark Issue Type: Umbrella Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40729) Spark-shell run failed with Java 19
[ https://issues.apache.org/jira/browse/SPARK-40729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615093#comment-17615093 ] Apache Spark commented on SPARK-40729: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38190 > Spark-shell run failed with Java 19 > --- > > Key: SPARK-40729 > URL: https://issues.apache.org/jira/browse/SPARK-40729 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. > Attempting port 4041. > Spark context Web UI available at http://localhost:4041 > Spark context available as 'sc' (master = local, app id = > local-1665401880396). > Spark session available as 'spark'. > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.3.0 > /_/ > > Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19) > Type in expressions to have them evaluated. > Type :help for more information. > scala> :paste > // Entering paste mode (ctrl-D to finish) > var array = new Array[Int](5) > val broadcastArray = sc.broadcast(array) > sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect() > array(0) = 5 > sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect() > // Exiting paste mode, now interpreting. 
> java.lang.InternalError: java.lang.IllegalAccessException: final field has no > write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class > java.lang.Object (module java.base) > at > java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167) > at > java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145) > at > java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184) > at > java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153) > at java.base/java.lang.reflect.Field.set(Field.java:820) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2491) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:406) > at org.apache.spark.rdd.RDD.map(RDD.scala:413) > ... 43 elided > Caused by: java.lang.IllegalAccessException: final field has no write access: > $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object > (module java.base) > at > java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955) > at > java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511) > at > java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502) > at > java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630) > at > java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145) > ... 
55 more > scala> {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40729) Spark-shell run failed with Java 19
[ https://issues.apache.org/jira/browse/SPARK-40729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40729: Assignee: Apache Spark > Spark-shell run failed with Java 19 > --- > > Key: SPARK-40729 > URL: https://issues.apache.org/jira/browse/SPARK-40729 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > {code:java} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. > Attempting port 4041. > Spark context Web UI available at http://localhost:4041 > Spark context available as 'sc' (master = local, app id = > local-1665401880396). > Spark session available as 'spark'. > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.3.0 > /_/ > > Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19) > Type in expressions to have them evaluated. > Type :help for more information. > scala> :paste > // Entering paste mode (ctrl-D to finish) > var array = new Array[Int](5) > val broadcastArray = sc.broadcast(array) > sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect() > array(0) = 5 > sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect() > // Exiting paste mode, now interpreting. 
> java.lang.InternalError: java.lang.IllegalAccessException: final field has no > write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class > java.lang.Object (module java.base) > at > java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167) > at > java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145) > at > java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184) > at > java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153) > at java.base/java.lang.reflect.Field.set(Field.java:820) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2491) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:406) > at org.apache.spark.rdd.RDD.map(RDD.scala:413) > ... 43 elided > Caused by: java.lang.IllegalAccessException: final field has no write access: > $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object > (module java.base) > at > java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955) > at > java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511) > at > java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502) > at > java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630) > at > java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145) > ... 
55 more > scala> {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40729) Spark-shell run failed with Java 19
[ https://issues.apache.org/jira/browse/SPARK-40729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40729: Assignee: (was: Apache Spark) > Spark-shell run failed with Java 19 > --- > > Key: SPARK-40729 > URL: https://issues.apache.org/jira/browse/SPARK-40729 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. > Attempting port 4041. > Spark context Web UI available at http://localhost:4041 > Spark context available as 'sc' (master = local, app id = > local-1665401880396). > Spark session available as 'spark'. > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.3.0 > /_/ > > Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19) > Type in expressions to have them evaluated. > Type :help for more information. > scala> :paste > // Entering paste mode (ctrl-D to finish) > var array = new Array[Int](5) > val broadcastArray = sc.broadcast(array) > sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect() > array(0) = 5 > sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect() > // Exiting paste mode, now interpreting. 
> java.lang.InternalError: java.lang.IllegalAccessException: final field has no > write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class > java.lang.Object (module java.base) > at > java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167) > at > java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145) > at > java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184) > at > java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153) > at java.base/java.lang.reflect.Field.set(Field.java:820) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2491) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:406) > at org.apache.spark.rdd.RDD.map(RDD.scala:413) > ... 43 elided > Caused by: java.lang.IllegalAccessException: final field has no write access: > $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object > (module java.base) > at > java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955) > at > java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511) > at > java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502) > at > java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630) > at > java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145) > ... 
55 more > scala> {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36681) Fail to load Snappy codec
[ https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615087#comment-17615087 ] icyjhl commented on SPARK-36681: Hi [~viirya], So this is only fixed in 3.3.0 and after? Any workaround in 3.2? Many Thanks! > Fail to load Snappy codec > - > > Key: SPARK-36681 > URL: https://issues.apache.org/jira/browse/SPARK-36681 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.3.0 > > > snappy-java as a native library should not be relocated in Hadoop shaded > client libraries. Currently we use Hadoop shaded client libraries in Spark. > If trying to use SnappyCodec to write sequence file, we will encounter the > following error: > {code} > [info] Cause: java.lang.UnsatisfiedLinkError: > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native > Method) > > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151) > > > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282) > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589) > > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629) > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit
[ https://issues.apache.org/jira/browse/SPARK-40727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang resolved SPARK-40727. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 1 [https://github.com/apache/spark-docker/pull/1] > Add merge_spark_docker_pr.py to help merge commit > - > > Key: SPARK-40727 > URL: https://issues.apache.org/jira/browse/SPARK-40727 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit
[ https://issues.apache.org/jira/browse/SPARK-40727 ]

Yikun Jiang deleted comment on SPARK-40727:

was (Author: yikunkero): Issue resolved by pull request 1
[https://github.com/apache/spark-docker/pull/1]

> Add merge_spark_docker_pr.py to help merge commit
> -------------------------------------------------
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Assignee: Yikun Jiang
> Priority: Major
> Fix For: 3.4.0
[jira] [Updated] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit
[ https://issues.apache.org/jira/browse/SPARK-40727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yikun Jiang updated SPARK-40727:
Fix Version/s: (was: 3.4.0)

> Add merge_spark_docker_pr.py to help merge commit
> -------------------------------------------------
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Assignee: Yikun Jiang
> Priority: Major
[jira] [Reopened] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit
[ https://issues.apache.org/jira/browse/SPARK-40727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yikun Jiang reopened SPARK-40727:

> Add merge_spark_docker_pr.py to help merge commit
> -------------------------------------------------
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Assignee: Yikun Jiang
> Priority: Major
> Fix For: 3.4.0
[jira] [Resolved] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit
[ https://issues.apache.org/jira/browse/SPARK-40727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yikun Jiang resolved SPARK-40727.
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 1
[https://github.com/apache/spark-docker/pull/1]

> Add merge_spark_docker_pr.py to help merge commit
> -------------------------------------------------
>
> Key: SPARK-40727
> URL: https://issues.apache.org/jira/browse/SPARK-40727
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Yikun Jiang
> Assignee: Yikun Jiang
> Priority: Major
> Fix For: 3.4.0
[jira] [Created] (SPARK-40729) Spark-shell run failed with Java 19
Yang Jie created SPARK-40729:

Summary: Spark-shell run failed with Java 19
Key: SPARK-40729
URL: https://issues.apache.org/jira/browse/SPARK-40729
Project: Spark
Issue Type: Improvement
Components: Spark Shell
Affects Versions: 3.4.0
Reporter: Yang Jie

{code:java}
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/10 19:37:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/10 19:38:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://localhost:4041
Spark context available as 'sc' (master = local, app id = local-1665401880396).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 19)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

var array = new Array[Int](5)
val broadcastArray = sc.broadcast(array)
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
array(0) = 5
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()

// Exiting paste mode, now interpreting.

java.lang.InternalError: java.lang.IllegalAccessException: final field has no write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object (module java.base)
  at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
  at java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
  at java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
  at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
  at java.base/java.lang.reflect.Field.set(Field.java:820)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2491)
  at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
  at org.apache.spark.rdd.RDD.map(RDD.scala:413)
  ... 43 elided
Caused by: java.lang.IllegalAccessException: final field has no write access: $Lambda$2365/0x00080199eef0.arg$1/putField, from class java.lang.Object (module java.base)
  at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
  at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
  at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
  at java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
  at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
  ... 55 more

scala> {code}
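What breaks in this stack trace is Spark's ClosureCleaner, which nulls out unneeded references captured by a closure via `Field.set` before serializing it. Since JDK 18 (JEP 416), core reflection is reimplemented on top of method handles, and those refuse write access to final fields of the hidden classes that back lambdas, so the old reflective write now surfaces as the `InternalError` shown above. A pseudocode-level sketch of the pattern that no longer works (this is not Spark's actual code, and the field name `arg$1` is a JVM implementation detail, shown only for illustration):

```scala
// Simplified sketch of the reflective write that Java 19 rejects:
// clearing a reference captured by a lambda-backed closure.
val closure: () => Int = { val captured = Array(1, 2, 3); () => captured.length }
val field = closure.getClass.getDeclaredField("arg$1") // hypothetical field name
field.setAccessible(true)
field.set(closure, null) // JDK 18+: IllegalAccessException, wrapped in InternalError
```

On older JDKs the Unsafe-based field accessors permitted such writes after `setAccessible(true)`, which is why the same repro works through Java 17.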
[jira] [Assigned] (SPARK-40728) Upgrade ASM to 9.4
[ https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40728:
Assignee: Apache Spark

> Upgrade ASM to 9.4
> ------------------
>
> Key: SPARK-40728
> URL: https://issues.apache.org/jira/browse/SPARK-40728
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-40728) Upgrade ASM to 9.4
[ https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40728:
Assignee: (was: Apache Spark)

> Upgrade ASM to 9.4
> ------------------
>
> Key: SPARK-40728
> URL: https://issues.apache.org/jira/browse/SPARK-40728
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Priority: Major
[jira] [Commented] (SPARK-40728) Upgrade ASM to 9.4
[ https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615076#comment-17615076 ]

Apache Spark commented on SPARK-40728:

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38189

> Upgrade ASM to 9.4
> ------------------
>
> Key: SPARK-40728
> URL: https://issues.apache.org/jira/browse/SPARK-40728
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Priority: Major
[jira] [Created] (SPARK-40728) Upgrade ASM to 9.4
Yang Jie created SPARK-40728:

Summary: Upgrade ASM to 9.4
Key: SPARK-40728
URL: https://issues.apache.org/jira/browse/SPARK-40728
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie
[jira] [Created] (SPARK-40727) Add merge_spark_docker_pr.py to help merge commit
Yikun Jiang created SPARK-40727:

Summary: Add merge_spark_docker_pr.py to help merge commit
Key: SPARK-40727
URL: https://issues.apache.org/jira/browse/SPARK-40727
Project: Spark
Issue Type: Sub-task
Components: Project Infra
Affects Versions: 3.4.0
Reporter: Yikun Jiang
Assignee: Yikun Jiang
[jira] [Assigned] (SPARK-40726) Supplement undocumented orc configurations in documentation
[ https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40726:
Assignee: (was: Apache Spark)

> Supplement undocumented orc configurations in documentation
> -----------------------------------------------------------
>
> Key: SPARK-40726
> URL: https://issues.apache.org/jira/browse/SPARK-40726
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.3.0
> Reporter: Qian Sun
> Priority: Minor
[jira] [Assigned] (SPARK-40726) Supplement undocumented orc configurations in documentation
[ https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40726:
Assignee: Apache Spark

> Supplement undocumented orc configurations in documentation
> -----------------------------------------------------------
>
> Key: SPARK-40726
> URL: https://issues.apache.org/jira/browse/SPARK-40726
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.3.0
> Reporter: Qian Sun
> Assignee: Apache Spark
> Priority: Minor
[jira] [Commented] (SPARK-40726) Supplement undocumented orc configurations in documentation
[ https://issues.apache.org/jira/browse/SPARK-40726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615003#comment-17615003 ]

Apache Spark commented on SPARK-40726:

User 'dcoliversun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38188

> Supplement undocumented orc configurations in documentation
> -----------------------------------------------------------
>
> Key: SPARK-40726
> URL: https://issues.apache.org/jira/browse/SPARK-40726
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.3.0
> Reporter: Qian Sun
> Priority: Minor
[jira] [Assigned] (SPARK-40725) Add mypy-protobuf to requirements
[ https://issues.apache.org/jira/browse/SPARK-40725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yikun Jiang reassigned SPARK-40725:
Assignee: Ruifeng Zheng

> Add mypy-protobuf to requirements
> ---------------------------------
>
> Key: SPARK-40725
> URL: https://issues.apache.org/jira/browse/SPARK-40725
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Minor
[jira] [Resolved] (SPARK-40725) Add mypy-protobuf to requirements
[ https://issues.apache.org/jira/browse/SPARK-40725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yikun Jiang resolved SPARK-40725.
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 38186
[https://github.com/apache/spark/pull/38186]

> Add mypy-protobuf to requirements
> ---------------------------------
>
> Key: SPARK-40725
> URL: https://issues.apache.org/jira/browse/SPARK-40725
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Minor
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-40724) Simplify `corr` with method `inline`
[ https://issues.apache.org/jira/browse/SPARK-40724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-40724:
Assignee: Ruifeng Zheng

> Simplify `corr` with method `inline`
> ------------------------------------
>
> Key: SPARK-40724
> URL: https://issues.apache.org/jira/browse/SPARK-40724
> Project: Spark
> Issue Type: Improvement
> Components: ps
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Minor