[jira] [Resolved] (SPARK-37635) SHOW TBLPROPERTIES should print the fully qualified table name
[ https://issues.apache.org/jira/browse/SPARK-37635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37635. Fix Version/s: 3.3.0 Assignee: Wenchen Fan Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34890 > SHOW TBLPROPERTIES should print the fully qualified table name > -- > > Key: SPARK-37635 > URL: https://issues.apache.org/jira/browse/SPARK-37635 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37310) Migrate ALTER NAMESPACE ... SET PROPERTIES to use v2 command by default
[ https://issues.apache.org/jira/browse/SPARK-37310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37310. Fix Version/s: 3.3.0 Assignee: Terry Kim (was: Apache Spark) Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34891 > Migrate ALTER NAMESPACE ... SET PROPERTIES to use v2 command by default > --- > > Key: SPARK-37310 > URL: https://issues.apache.org/jira/browse/SPARK-37310 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.3.0 > > > Migrate ALTER NAMESPACE ... SET PROPERTIES to use v2 command by default -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36038) Basic speculation metrics at stage level
[ https://issues.apache.org/jira/browse/SPARK-36038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36038. Assignee: Thejdeep Gudivada Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34607 > Basic speculation metrics at stage level > > > Key: SPARK-36038 > URL: https://issues.apache.org/jira/browse/SPARK-36038 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Venkata krishnan Sowrirajan >Assignee: Thejdeep Gudivada >Priority: Major > Fix For: 3.3.0 > > > Currently there are no speculation metrics available either at the application > level or at the stage level. Within our platform, we have added speculation > metrics at the stage level as a summary, similar to the stage-level metrics, > tracking numTotalSpeculated, numCompleted (successful), numFailed, numKilled, > etc. This enables us to effectively understand the speculative execution > feature at an application level and helps in further tuning the speculation > configs. > cc [~ron8hu] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37586) Add cipher mode option and set default cipher mode for aes_encrypt and aes_decrypt
[ https://issues.apache.org/jira/browse/SPARK-37586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37586. Fix Version/s: 3.3.0 Assignee: Max Gekk Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34837 > Add cipher mode option and set default cipher mode for aes_encrypt and > aes_decrypt > -- > > Key: SPARK-37586 > URL: https://issues.apache.org/jira/browse/SPARK-37586 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > https://github.com/apache/spark/pull/32801 added aes_encrypt/aes_decrypt > functions to Spark. However, they rely on the JVM's configuration as to which > cipher mode to use, which is problematic because it is not fixed across > versions and systems. > Let's hardcode a default cipher mode and also allow users to set a cipher > mode as an argument to the function. > In the future, we can support other modes like GCM and CBC that have already > been supported by other systems: > # Snowflake: > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html > # BigQuery: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aead-encryption-concepts#block_cipher_modes -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
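[Editor's note] The portability problem behind SPARK-37586 can be sketched with the JDK's own crypto API: `Cipher.getInstance("AES")` without an explicit mode falls back to a provider-chosen transformation, so pinning a full transformation string such as "AES/GCM/NoPadding" is what "hardcode a default cipher mode" amounts to. The class and helper names below are illustrative assumptions, not Spark's internal implementation.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class AesModes {
    // Pin the full transformation instead of the ambiguous "AES", whose
    // default mode/padding is provider-dependent across JVMs and systems.
    private static final String DEFAULT_MODE = "AES/GCM/NoPadding";

    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance(DEFAULT_MODE);
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, iv)); // 128-bit GCM auth tag
        return c.doFinal(plain);
    }

    static byte[] decrypt(byte[] key, byte[] iv, byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance(DEFAULT_MODE);
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, iv));
        return c.doFinal(cipherText);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // 16 bytes -> AES-128
        byte[] iv = new byte[12]; // fixed all-zero nonce for the demo only; never reuse nonces in practice
        byte[] ct = encrypt(key, iv, "Spark".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, iv, ct), StandardCharsets.UTF_8)); // Spark
    }
}
```

Because the transformation string is fixed, the ciphertext format no longer depends on which JCE provider happens to be installed, which is the cross-version/cross-system variance the issue describes.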
[jira] [Commented] (SPARK-37568) Support 2-arguments by the convert_timezone() function
[ https://issues.apache.org/jira/browse/SPARK-37568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454960#comment-17454960 ] Kousuke Saruta commented on SPARK-37568: [~yoda-mon] OK, please go ahead. > Support 2-arguments by the convert_timezone() function > -- > > Key: SPARK-37568 > URL: https://issues.apache.org/jira/browse/SPARK-37568 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > # If sourceTs is a timestamp_ntz, take the sourceTz from the session time > zone, see the SQL config spark.sql.session.timeZone > # If sourceTs is a timestamp_ltz, convert it to a timestamp_ntz using the > targetTz -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
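[Editor's note] The two conversion rules quoted in SPARK-37568 can be sketched with java.time, modeling timestamp_ntz as LocalDateTime and timestamp_ltz as Instant. The helper names and the explicit session-zone parameter are assumptions for illustration, not Spark's actual API.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class ConvertTimezone {
    // Rule 1: for a timestamp_ntz source, the source zone is taken from the
    // session time zone (spark.sql.session.timeZone), then the wall-clock
    // value is converted to the target zone.
    static LocalDateTime convertNtz(LocalDateTime sourceTs, ZoneId sessionTz, ZoneId targetTz) {
        return sourceTs.atZone(sessionTz).withZoneSameInstant(targetTz).toLocalDateTime();
    }

    // Rule 2: for a timestamp_ltz source (a fixed instant), the result is the
    // timestamp_ntz wall-clock value of that instant in the target zone.
    static LocalDateTime convertLtz(Instant sourceTs, ZoneId targetTz) {
        return sourceTs.atZone(targetTz).toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId session = ZoneId.of("UTC");
        ZoneId target = ZoneId.of("Asia/Tokyo"); // UTC+9, no DST
        System.out.println(convertNtz(LocalDateTime.of(2021, 12, 6, 12, 0), session, target)); // 2021-12-06T21:00
        System.out.println(convertLtz(Instant.parse("2021-12-06T12:00:00Z"), target)); // 2021-12-06T21:00
    }
}
```

With a UTC session zone and Asia/Tokyo as the target, both rules map noon UTC to 21:00 local time; they differ only in where the source zone comes from.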
[jira] [Commented] (SPARK-37568) Support 2-arguments by the convert_timezone() function
[ https://issues.apache.org/jira/browse/SPARK-37568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454947#comment-17454947 ] Kousuke Saruta commented on SPARK-37568: cc: [~yoda-mon] [~YActs] Do you want to work on this? > Support 2-arguments by the convert_timezone() function > -- > > Key: SPARK-37568 > URL: https://issues.apache.org/jira/browse/SPARK-37568 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > # If sourceTs is a timestamp_ntz, take the sourceTz from the session time > zone, see the SQL config spark.sql.session.timeZone > # If sourceTs is a timestamp_ltz, convert it to a timestamp_ntz using the > targetTz -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37469. Fix Version/s: 3.3.0 Assignee: Yazhi Wang Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34720 > Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI > --- > > Key: SPARK-37469 > URL: https://issues.apache.org/jira/browse/SPARK-37469 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Yazhi Wang >Assignee: Yazhi Wang >Priority: Minor > Fix For: 3.3.0 > > Attachments: executor-page.png, sql-page.png > > > Metrics in the Executor/Task page are shown as > "Shuffle Read Block Time", while the SQL page shows them as "fetch wait time", which > is confusing. !executor-page.png! > !sql-page.png! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37529) Support K8s integration tests for Java 17
Kousuke Saruta created SPARK-37529: -- Summary: Support K8s integration tests for Java 17 Key: SPARK-37529 URL: https://issues.apache.org/jira/browse/SPARK-37529 Project: Spark Issue Type: Sub-task Components: Kubernetes, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Now that we can build a container image for Java 17, let's support K8s integration tests for Java 17. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451376#comment-17451376 ] Kousuke Saruta commented on SPARK-37487: [~tanelk] Thank you for pinging me. I think a sampling job for the global sort performs the extra CollectMetrics (operations before the sort are performed twice). Please let me look into it more. > CollectMetrics is executed twice if it is followed by a sort > > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > Labels: correctness > > It is best exemplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-37487: get observable metrics with sort by callback") { > val df = spark.range(100) > .observe( > name = "my_event", > min($"id").as("min_val"), > max($"id").as("max_val"), > // Test unresolved alias > sum($"id"), > count(when($"id" % 2 === 0, 1)).as("num_even")) > .observe( > name = "other_event", > avg($"id").cast("int").as("avg_val")) > .sort($"id".desc) > validateObservedMetrics(df) > } > {code} > The count and sum aggregate report twice the number of rows: > {code} > [info] - SPARK-37487: get observable metrics with sort by callback *** FAILED > *** (169 milliseconds) > [info] [0,99,9900,100] did not equal [0,99,4950,50] > (DataFrameCallbackSuite.scala:342) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) > [info] at > 
org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > {code} > I could not figure out how this happens. Hopefully the UT can help with > debugging. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
[ https://issues.apache.org/jira/browse/SPARK-37468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37468: --- Description: Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which UnionEstimation can compute stats for. (was: Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which it UnionEstimation can compute stats for.) > Support ANSI intervals and TimestampNTZ for UnionEstimation > --- > > Key: SPARK-37468 > URL: https://issues.apache.org/jira/browse/SPARK-37468 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. > But I think it can support those types because their underlying types are > integer or long, which UnionEstimation can compute stats for. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
Kousuke Saruta created SPARK-37468: -- Summary: Support ANSI intervals and TimestampNTZ for UnionEstimation Key: SPARK-37468 URL: https://issues.apache.org/jira/browse/SPARK-37468 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which it UnionEstimation can compute stats for. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37459) Upgrade commons-cli to 1.5.0
Kousuke Saruta created SPARK-37459: -- Summary: Upgrade commons-cli to 1.5.0 Key: SPARK-37459 URL: https://issues.apache.org/jira/browse/SPARK-37459 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The currently used commons-cli is too old and contains an issue that affects the behavior of bin/spark-sql {code} bin/spark-sql -e 'SELECT "Spark"' ... Error in query: no viable alternative at input 'SELECT "'(line 1, pos 7) == SQL == SELECT "Spark ---^^^ {code} The root cause of this issue seems to be resolved in CLI-185. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37354) Make the Java version installed on the container image used by the K8s integration tests with SBT configurable
[ https://issues.apache.org/jira/browse/SPARK-37354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37354. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34628 > Make the Java version installed on the container image used by the K8s > integration tests with SBT configurable > -- > > Key: SPARK-37354 > URL: https://issues.apache.org/jira/browse/SPARK-37354 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.3.0 > > > I noticed that the default Java version installed on the container image used > by the K8s integration tests is different depending on how the tests are run. > If the tests are launched by Maven, Java 8 is installed. > On the other hand, if the tests are launched by SBT, the Java version is 11. > Further, we have no way to change the version. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37354) Make the Java version installed on the container image used by the K8s integration tests with SBT configurable
Kousuke Saruta created SPARK-37354: -- Summary: Make the Java version installed on the container image used by the K8s integration tests with SBT configurable Key: SPARK-37354 URL: https://issues.apache.org/jira/browse/SPARK-37354 Project: Spark Issue Type: Bug Components: Kubernetes, Tests Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta I noticed that the default Java version installed on the container image used by the K8s integration tests is different depending on how the tests are run. If the tests are launched by Maven, Java 8 is installed. On the other hand, if the tests are launched by SBT, the Java version is 11. Further, we have no way to change the version. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37319) Support K8s image building with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37319. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34586 > Support K8s image building with Java 17 > --- > > Key: SPARK-37319 > URL: https://issues.apache.org/jira/browse/SPARK-37319 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37320) Delete py_container_checks.zip after the test in DepsTestsSuite finishes
[ https://issues.apache.org/jira/browse/SPARK-37320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37320: --- Description: When K8s integration tests run, py_container_checks.zip still remains in resource-managers/kubernetes/integration-tests/tests/. It was created in the test "Launcher python client dependencies using a zip file" in DepsTestsSuite. was: When K8s integration tests run, py_container_checks.zip is still remaining in resource-managers/kubernetes/integration-tests/tests/. It's is created in the test "Launcher python client dependencies using a zip file" in DepsTestsSuite. > Delete py_container_checks.zip after the test in DepsTestsSuite finishes > > > Key: SPARK-37320 > URL: https://issues.apache.org/jira/browse/SPARK-37320 > Project: Spark > Issue Type: Bug > Components: k8, Tests >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > When K8s integration tests run, py_container_checks.zip still remains in > resource-managers/kubernetes/integration-tests/tests/. > It was created in the test "Launcher python client dependencies using a zip > file" in DepsTestsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37320) Delete py_container_checks.zip after the test in DepsTestsSuite finishes
Kousuke Saruta created SPARK-37320: -- Summary: Delete py_container_checks.zip after the test in DepsTestsSuite finishes Key: SPARK-37320 URL: https://issues.apache.org/jira/browse/SPARK-37320 Project: Spark Issue Type: Bug Components: k8, Tests Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta When K8s integration tests run, py_container_checks.zip is still remaining in resource-managers/kubernetes/integration-tests/tests/. It's is created in the test "Launcher python client dependencies using a zip file" in DepsTestsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37315) Mitigate ConcurrentModificationException thrown from a test in MLEventSuite
[ https://issues.apache.org/jira/browse/SPARK-37315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37315: --- Summary: Mitigate ConcurrentModificationException thrown from a test in MLEventSuite (was: Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite) > Mitigate ConcurrentModificationException thrown from a test in MLEventSuite > --- > > Key: SPARK-37315 > URL: https://issues.apache.org/jira/browse/SPARK-37315 > Project: Spark > Issue Type: Bug > Components: ML, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > Recently, I noticed ConcurrentModificationException is sometimes thrown from > the following part of the test "pipeline read/write events" in MLEventSuite > when Scala 2.13 is used. > {code} > events.map(JsonProtocol.sparkEventToJson).foreach { event => > assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) > } > {code} > I think the root cause is the ArrayBuffer (events) is updated asynchronously > by the following part. > {code} > private val listener: SparkListener = new SparkListener { > override def onOtherEvent(event: SparkListenerEvent): Unit = event match { > case e: MLEvent => events.append(e) > case _ => > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37315) Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite
[ https://issues.apache.org/jira/browse/SPARK-37315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37315: --- Description: Recently, I noticed ConcurrentModificationException is sometimes thrown from the following part of the test "pipeline read/write events" in MLEventSuite when Scala 2.13 is used. {code} events.map(JsonProtocol.sparkEventToJson).foreach { event => assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) } {code} I think the root cause is the ArrayBuffer (events) is updated asynchronously by the following part. {code} private val listener: SparkListener = new SparkListener { override def onOtherEvent(event: SparkListenerEvent): Unit = event match { case e: MLEvent => events.append(e) case _ => } } {code} was: Recently, I notice ConcurrentModificationException is thrown from the following part of the test "pipeline read/write events" in MLEventSuite when Scala 2.13 is used. {code} events.map(JsonProtocol.sparkEventToJson).foreach { event => assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) } {code} I think the root cause is the ArrayBuffer (events) is updated asynchronously by the following part. {code} private val listener: SparkListener = new SparkListener { override def onOtherEvent(event: SparkListenerEvent): Unit = event match { case e: MLEvent => events.append(e) case _ => } } {code} > Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite > - > > Key: SPARK-37315 > URL: https://issues.apache.org/jira/browse/SPARK-37315 > Project: Spark > Issue Type: Bug > Components: ML, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Recently, I noticed ConcurrentModificationException is sometimes thrown from > the following part of the test "pipeline read/write events" in MLEventSuite > when Scala 2.13 is used. 
> {code} > events.map(JsonProtocol.sparkEventToJson).foreach { event => > assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) > } > {code} > I think the root cause is the ArrayBuffer (events) is updated asynchronously > by the following part. > {code} > private val listener: SparkListener = new SparkListener { > override def onOtherEvent(event: SparkListenerEvent): Unit = event match { > case e: MLEvent => events.append(e) > case _ => > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37315) Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite
Kousuke Saruta created SPARK-37315: -- Summary: Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite Key: SPARK-37315 URL: https://issues.apache.org/jira/browse/SPARK-37315 Project: Spark Issue Type: Bug Components: ML, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Recently, I notice ConcurrentModificationException is thrown from the following part of the test "pipeline read/write events" in MLEventSuite when Scala 2.13 is used. {code} events.map(JsonProtocol.sparkEventToJson).foreach { event => assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) } {code} I think the root cause is the ArrayBuffer (events) is updated asynchronously by the following part. {code} private val listener: SparkListener = new SparkListener { override def onOtherEvent(event: SparkListenerEvent): Unit = event match { case e: MLEvent => events.append(e) case _ => } } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
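[Editor's note] The suspected race in SPARK-37315, a listener appending to the events buffer while the test iterates over it, is the classic fail-fast iterator failure on the JVM. Below is a minimal sketch of the failure mode and a snapshot-based mitigation, using Java's ArrayList in place of Scala's ArrayBuffer; it reproduces the exception deterministically in one thread, whereas the real bug is a cross-thread timing issue.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    // Failure mode: appending to the list while a traversal of it is in
    // flight makes the fail-fast iterator throw on the next element.
    static boolean failsWhenMutatedDuringIteration() {
        List<Integer> events = new ArrayList<>(List.of(1, 2, 3));
        try {
            for (Integer e : events) {
                events.add(e); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Mitigation sketch: iterate over an immutable snapshot taken at one
    // point in time, so appends that land afterwards (e.g. from a listener
    // thread) cannot invalidate the traversal.
    static int sumSnapshot(List<Integer> events) {
        List<Integer> snapshot = List.copyOf(events);
        int sum = 0;
        for (Integer e : snapshot) {
            events.add(0); // simulated late append; the snapshot is unaffected
            sum += e;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(failsWhenMutatedDuringIteration()); // true
        System.out.println(sumSnapshot(new ArrayList<>(List.of(1, 2, 3)))); // 6
    }
}
```

In the real test the snapshot would additionally need to be taken under the same synchronization the listener uses, since the append happens on a different thread.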
[jira] [Resolved] (SPARK-37312) Add `.java-version` to `.gitignore` and `.rat-excludes`
[ https://issues.apache.org/jira/browse/SPARK-37312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37312. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34577 > Add `.java-version` to `.gitignore` and `.rat-excludes` > --- > > Key: SPARK-37312 > URL: https://issues.apache.org/jira/browse/SPARK-37312 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37314) Upgrade kubernetes-client to 5.10.1
[ https://issues.apache.org/jira/browse/SPARK-37314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37314: --- Description: kubernetes-client 5.10.0 and 5.10.1 were released, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 was: kubernetes-client 5.10.0 and 5.10.1 were relased, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 > Upgrade kubernetes-client to 5.10.1 > --- > > Key: SPARK-37314 > URL: https://issues.apache.org/jira/browse/SPARK-37314 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > kubernetes-client 5.10.0 and 5.10.1 were released, which include some bug > fixes. > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 > Especially, the connection leak issue would affect Spark. > https://github.com/fabric8io/kubernetes-client/issues/3561 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37314) Upgrade kubernetes-client to 5.10.1
[ https://issues.apache.org/jira/browse/SPARK-37314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37314: --- Description: kubernetes-client 5.10.0 and 5.10.1 were relased, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 was: A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 > Upgrade kubernetes-client to 5.10.1 > --- > > Key: SPARK-37314 > URL: https://issues.apache.org/jira/browse/SPARK-37314 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > kubernetes-client 5.10.0 and 5.10.1 were relased, which include some bug > fixes. > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 > Especially, the connection leak issue would affect Spark. > https://github.com/fabric8io/kubernetes-client/issues/3561 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37314) Upgrade kubernetes-client to 5.10.1
[ https://issues.apache.org/jira/browse/SPARK-37314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37314: --- Description: A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 was: A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which include some bug fixes. Especially, the connection leak issue would affect Spark. > Upgrade kubernetes-client to 5.10.1 > --- > > Key: SPARK-37314 > URL: https://issues.apache.org/jira/browse/SPARK-37314 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which > include some bug fixes. > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 > Especially, the connection leak issue would affect Spark. > https://github.com/fabric8io/kubernetes-client/issues/3561 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37314) Upgrade kubernetes-client to 5.10.1
Kousuke Saruta created SPARK-37314: -- Summary: Upgrade kubernetes-client to 5.10.1 Key: SPARK-37314 URL: https://issues.apache.org/jira/browse/SPARK-37314 Project: Spark Issue Type: Bug Components: Build, Kubernetes Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which include some bug fixes. Especially, the connection leak issue would affect Spark. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37302) Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-37302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37302: --- Description: dev/run-tests.py fails if Scala 2.13 is used and guava or jetty-io is not in both the Maven and Coursier local repositories. {code:java} $ rm -rf ~/.m2/repository/* $ # For Linux $ rm -rf ~/.cache/coursier/v1/* $ # For macOS $ rm -rf ~/Library/Caches/Coursier/v1/* $ dev/change-scala-version.sh 2.13 $ dev/test-dependencies.sh $ build/sbt -Pscala-2.13 clean compile ... [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:24:1: error: package com.google.common.primitives does not exist [error] import com.google.common.primitives.Ints; [error]^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:30:1: error: package com.google.common.annotations does not exist [error] import com.google.common.annotations.VisibleForTesting; [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:31:1: error: package com.google.common.base does not exist [error] import com.google.common.base.Preconditions; ... {code} {code:java} [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:25: Class org.eclipse.jetty.io.ByteBufferPool not found - continuing with a stub. 
[error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:21: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.HttpConnectionFactory) [error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:207:13: Class org.eclipse.jetty.io.ClientConnectionFactory not found - continuing with a stub. 
[error] new HttpClient(new HttpClientTransportOverHTTP(numSelectors), null) [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:287:25: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.ConnectionFactory) [error] val connector = new ServerConnector( {code} The reason is that the exec-maven-plugin invocation in test-dependencies.sh downloads the poms of guava and jetty-io but not the corresponding jars, and the script skips dependency testing if Scala 2.13 is used (if dependency testing ran, Maven would download those jars). {code} if [[ "$SCALA_BINARY_VERSION" != "2.12" ]]; then # TODO(SPARK-36168) Support Scala 2.13 in dev/test-dependencies.sh echo "Skip dependency testing on $SCALA_BINARY_VERSION" exit 0 fi {code}
[jira] [Updated] (SPARK-37302) Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-37302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37302: --- Description: dev/run-tests.py fails if Scala 2.13 is used and guava or jetty-io is not in both the Maven and Coursier local repositories. {code:java} $ rm -rf ~/.m2/repository/* $ # For Linux $ rm -rf ~/.cache/coursier/v1/* $ # For macOS $ rm -rf ~/Library/Caches/Coursier/v1/* $ dev/change-scala-version.sh 2.13 $ dev/test-dependencies.sh $ build/sbt -Pscala-2.13 clean compile ... [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:24:1: error: package com.google.common.primitives does not exist [error] import com.google.common.primitives.Ints; [error]^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:30:1: error: package com.google.common.annotations does not exist [error] import com.google.common.annotations.VisibleForTesting; [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:31:1: error: package com.google.common.base does not exist [error] import com.google.common.base.Preconditions; ... {code} {code:java} [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:25: Class org.eclipse.jetty.io.ByteBufferPool not found - continuing with a stub. 
[error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:21: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.HttpConnectionFactory) [error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:207:13: Class org.eclipse.jetty.io.ClientConnectionFactory not found - continuing with a stub. 
[error] new HttpClient(new HttpClientTransportOverHTTP(numSelectors), null) [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:287:25: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.ConnectionFactory) [error] val connector = new ServerConnector( {code} The reason is that the exec-maven-plugin invocation in `test-dependencies.sh` downloads the poms of guava and jetty-io but not the corresponding jars. {code:java} $ find ~/.m2 -name "guava*" ... /home/kou/.m2/repository/com/google/guava/guava/14.0.1/guava-14.0.1.pom /home/kou/.m2/repository/com/google/guava/guava/14.0.1/guava-14.0.1.pom.sha1 ... /home/kou/.m2/repository/com/google/guava/guava-parent/14.0.1/guava-parent-14.0.1.pom /home/kou/.m2/repository/com/google/guava/gu
[jira] [Created] (SPARK-37302) Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh
Kousuke Saruta created SPARK-37302: -- Summary: Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh Key: SPARK-37302 URL: https://issues.apache.org/jira/browse/SPARK-37302 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta dev/run-tests.py fails if Scala 2.13 is used and guava or jetty-io is not in both the Maven and Coursier local repositories. {code} $ rm -rf ~/.m2/repository/* $ # For Linux $ rm -rf ~/.cache/coursier/v1/* $ # For macOS $ rm -rf ~/Library/Caches/Coursier/v1/* $ dev/change-scala-version.sh 2.13 $ dev/test-dependencies.sh $ build/sbt -Pscala-2.13 clean compile ... [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:24:1: error: package com.google.common.primitives does not exist [error] import com.google.common.primitives.Ints; [error]^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:30:1: error: package com.google.common.annotations does not exist [error] import com.google.common.annotations.VisibleForTesting; [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:31:1: error: package com.google.common.base does not exist [error] import com.google.common.base.Preconditions; ... {code} {code} [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:25: Class org.eclipse.jetty.io.ByteBufferPool not found - continuing with a stub. 
[error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:21: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.HttpConnectionFactory) [error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:207:13: Class org.eclipse.jetty.io.ClientConnectionFactory not found - continuing with a stub. 
[error] new HttpClient(new HttpClientTransportOverHTTP(numSelectors), null) [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:287:25: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.ConnectionFactory) [error] val connector = new ServerConnector( {code} The reason is that the exec-maven-plugin invocation in `test-dependencies.sh` downloads the poms of guava and jetty-io but not the corresponding jars. {code} $ find ~/.m2 -name "guava*" ... /home/kou/.m2/repository/com/google/g
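[Editorial note] The remedy the issue title describes is to fetch the missing jars explicitly. A hedged sketch of that idea, not the actual patch to test-dependencies.sh: `mvn dependency:get` downloads the jar itself (not just the pom) into the local repository. The guava version matches the `find ~/.m2` output above; the jetty-io version is a placeholder and should be read from Spark's pom.xml.

```shell
# Build a "group:artifact:version" coordinate for mvn dependency:get.
coordinate() {
  echo "$1:$2:$3"
}

# guava 14.0.1 appears in the pom-only listing above; the jetty-io version
# here is a placeholder -- take the real one from pom.xml.
COORDS="$(coordinate com.google.guava guava 14.0.1)
$(coordinate org.eclipse.jetty jetty-io 9.4.43.v20210629)"

# Set RUN_FETCH=1 to actually invoke Maven; off by default so the sketch
# can run without a Maven installation or network access.
if [ "${RUN_FETCH:-0}" = "1" ]; then
  echo "$COORDS" | while read -r coord; do
    # dependency:get downloads the jar itself (not just the pom) into ~/.m2.
    mvn -q dependency:get -Dartifact="$coord"
  done
fi
echo "$COORDS"
```

Fetching the jars up front would make the later `build/sbt -Pscala-2.13 clean compile` independent of whether dependency testing ran and populated the cache as a side effect.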
[jira] [Created] (SPARK-37284) Upgrade Jekyll to 4.2.1
Kousuke Saruta created SPARK-37284: -- Summary: Upgrade Jekyll to 4.2.1 Key: SPARK-37284 URL: https://issues.apache.org/jira/browse/SPARK-37284 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Jekyll 4.2.1 was released in September, which includes the fix of a regression bug. https://github.com/jekyll/jekyll/releases/tag/v4.2.1 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
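[Editorial note] A docs toolchain bump like this is typically a one-line pin. A sketch only, assuming the Jekyll dependency is declared in a Gemfile under docs/ (the file location is an assumption; check the repository):

```
# docs/Gemfile (sketch): pin Jekyll to the 4.2.1 patch release, which fixes
# a regression introduced in 4.2.0 (see the release notes linked above).
source "https://rubygems.org"
gem "jekyll", "4.2.1"
```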
[jira] [Updated] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format
[ https://issues.apache.org/jira/browse/SPARK-37283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37283: --- Description: If a table being created contains a column of an ANSI interval type and the underlying file format has a corresponding Hive SerDe (e.g. Parquet), `HiveExternalCatalog` tries to store the table in a Hive-compatible format. But, as Spark's ANSI interval types and Hive's interval types are not compatible (Hive only supports interval_year_month and interval_day_time), the following warning with a stack trace is logged. {code} spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet; 21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist `default`.`tbl1` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format. org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Error: type expected at the position 0 of 'interval year to month' but 'interval year to month' is found. 
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551) at org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499) at org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376) at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:
[jira] [Created] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format
Kousuke Saruta created SPARK-37283: -- Summary: Don't try to store a V1 table which contains ANSI intervals in Hive compatible format Key: SPARK-37283 URL: https://issues.apache.org/jira/browse/SPARK-37283 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta If a table being created contains a column of an ANSI interval type and the underlying file format has a corresponding Hive SerDe (e.g. Parquet), `HiveExternalCatalog` tries to store the table in a Hive-compatible format. But, as Spark's ANSI interval types and Hive's interval types are not compatible (Hive only supports interval_year_month and interval_day_time), the following warning with a stack trace is logged. {code} spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet; 21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist `default`.`tbl1` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format. org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Error: type expected at the position 0 of 'interval year to month' but 'interval year to month' is found. 
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551) at org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499) at org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376) at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHel
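For context, the fallback described above can be sketched as the following hypothetical spark-sql session (a sketch only, not output from the patch): the CREATE TABLE triggers the warning, but the table is still persisted, just in Spark SQL specific format rather than a Hive-compatible one.

```sql
-- Hypothetical session illustrating the behavior described above.
CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet;  -- warning fires here
INSERT INTO tbl1 VALUES (INTERVAL '2-6' YEAR TO MONTH);
SELECT * FROM tbl1;  -- still readable/writable from Spark; Hive itself cannot read the table
```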
[jira] [Resolved] (SPARK-37264) [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37264. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34541 > [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from > orc-core > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > Like hadoop-common and hadoop-hdfs, this PR proposes to exclude > hadoop-client-api transitive dependency from orc-core. > Why are the changes needed? > Since Apache Hadoop 2.7 doesn't work on Java 17, Apache ORC has a dependency > on Hadoop 3.3.1. > This causes test-dependencies.sh failure on Java 17. As a result, > run-tests.py also fails. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
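The kind of dependency exclusion this change describes would typically look like the following pom.xml fragment (a sketch; the module placement and exact coordinates in Spark's build are assumptions, not copied from the PR):

```xml
<!-- Sketch of excluding the hadoop-client-api transitive dependency
     from orc-core; coordinates/version are assumptions. -->
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```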
[jira] [Updated] (SPARK-37264) [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Description: Like hadoop-common and hadoop-hdfs, this PR proposes to exclude hadoop-client-api transitive dependency from orc-core. Why are the changes needed? Since Apache Hadoop 2.7 doesn't work on Java 17, Apache ORC has a dependency on Hadoop 3.3.1. This causes test-dependencies.sh failure on Java 17. As a result, run-tests.py also fails. was: In the current master, `run-tests.py` fails on Java 17 due to `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Hadoop 2.7 doesn't support Java 17 so let's > [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from > orc-core > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Like hadoop-common and hadoop-hdfs, this PR proposes to exclude > hadoop-client-api transitive dependency from orc-core. > Why are the changes needed? > Since Apache Hadoop 2.7 doesn't work on Java 17, Apache ORC has a dependency > on Hadoop 3.3.1. > This causes test-dependencies.sh failure on Java 17. As a result, > run-tests.py also fails. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37264) Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Description: In the current master, `run-tests.py` fails on Java 17 due to `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Hadoop 2.7 doesn't support Java 17 so let's was: In the current master, `run-tests.py` fails on Java 17 due to `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Currently, we don't maintain the dependency manifests for Java 17 yet so let's skip it temporarily. > Cut the transitive dependency on hadoop-client-api which orc-shims depends on > only for Java 17 with hadoop-2.7 > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, `run-tests.py` fails on Java 17 due to > `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile > dependency on hadoop-client-api:3.3.1 only for Java 17. > Hadoop 2.7 doesn't support Java 17 so let's -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37264) [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Summary: [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core (was: Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7) > [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from > orc-core > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, `run-tests.py` fails on Java 17 due to > `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile > dependency on hadoop-client-api:3.3.1 only for Java 17. > Hadoop 2.7 doesn't support Java 17 so let's -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37264) Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Summary: Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7 (was: Skip dependency testing on Java 17 temporarily) > Cut the transitive dependency on hadoop-client-api which orc-shims depends on > only for Java 17 with hadoop-2.7 > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, `run-tests.py` fails on Java 17 due to > `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile > dependency on hadoop-client-api:3.3.1 only for Java 17. > Currently, we don't maintain the dependency manifests for Java 17 yet so > let's skip it temporarily. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37264) Skip dependency testing on Java 17 temporarily
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Description: In the current master, `run-tests.py` fails on Java 17 due to `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Currently, we don't maintain the dependency manifests for Java 17 yet so let's skip it temporarily. was: In the current master, test-dependencies.sh fails on Java 17 because orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Currently, we don't maintain the dependency manifests for Java 17 yet so let's skip it temporarily. > Skip dependency testing on Java 17 temporarily > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > In the current master, `run-tests.py` fails on Java 17 due to > `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile > dependency on hadoop-client-api:3.3.1 only for Java 17. > Currently, we don't maintain the dependency manifests for Java 17 yet so > let's skip it temporarily. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37265) Support Java 17 in `dev/test-dependencies.sh`
Kousuke Saruta created SPARK-37265: -- Summary: Support Java 17 in `dev/test-dependencies.sh` Key: SPARK-37265 URL: https://issues.apache.org/jira/browse/SPARK-37265 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37264) Skip dependency testing on Java 17 temporarily
Kousuke Saruta created SPARK-37264: -- Summary: Skip dependency testing on Java 17 temporarily Key: SPARK-37264 URL: https://issues.apache.org/jira/browse/SPARK-37264 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In the current master, test-dependencies.sh fails on Java 17 because orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Currently, we don't maintain the dependency manifests for Java 17 yet so let's skip it temporarily. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-36895) Add Create Index syntax support
[ https://issues.apache.org/jira/browse/SPARK-36895 ] Kousuke Saruta deleted comment on SPARK-36895: was (Author: sarutak): The change in https://github.com/apache/spark/pull/34148 was reverted and resolved again in https://github.com/apache/spark/pull/34523 > Add Create Index syntax support > --- > > Key: SPARK-36895 > URL: https://issues.apache.org/jira/browse/SPARK-36895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36895) Add Create Index syntax support
[ https://issues.apache.org/jira/browse/SPARK-36895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440696#comment-17440696 ] Kousuke Saruta commented on SPARK-36895: The change in https://github.com/apache/spark/pull/34148 was reverted and resolved again in https://github.com/apache/spark/pull/34523 > Add Create Index syntax support > --- > > Key: SPARK-36895 > URL: https://issues.apache.org/jira/browse/SPARK-36895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37240) Cannot read partitioned parquet files with ANSI interval partition values
[ https://issues.apache.org/jira/browse/SPARK-37240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37240. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34517 > Cannot read partitioned parquet files with ANSI interval partition values > - > > Key: SPARK-37240 > URL: https://issues.apache.org/jira/browse/SPARK-37240 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > The code below demonstrates the issue: > {code:scala} > scala> sql("SELECT INTERVAL '1' YEAR AS i, 0 as > id").write.partitionBy("i").parquet("/Users/maximgekk/tmp/ansi_interval_parquet") > scala> spark.read.schema("i INTERVAL YEAR, id > INT").parquet("/Users/maximgekk/tmp/ansi_interval_parquet").show(false) > 21/11/08 10:56:36 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.RuntimeException: DataType INTERVAL YEAR is not supported in column > vectorized reader. > at > org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:100) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:243) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
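Since the error above is raised specifically by the column-vectorized reader path, one conceivable mitigation on versions without the fix would be to fall back to the non-vectorized Parquet reader (a sketch: `spark.sql.parquet.enableVectorizedReader` is an existing Spark conf, but whether this avoided the error before the fix is an assumption, not verified here):

```scala
// Sketch only: disabling the vectorized reader is a guess at a workaround,
// not the fix that was merged in the PR above.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.read.schema("i INTERVAL YEAR, id INT")
  .parquet("/path/to/ansi_interval_parquet")
  .show(false)
```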
[jira] [Reopened] (SPARK-36038) Basic speculation metrics at stage level
[ https://issues.apache.org/jira/browse/SPARK-36038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta reopened SPARK-36038: Assignee: (was: Venkata krishnan Sowrirajan) The change was reverted. https://github.com/apache/spark/pull/34518 So I re-open this. > Basic speculation metrics at stage level > > > Key: SPARK-36038 > URL: https://issues.apache.org/jira/browse/SPARK-36038 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > Fix For: 3.3.0 > > > Currently there are no speculation metrics available either at application > level or at stage level. Within our platform, we have added speculation > metrics at stage level as a summary, similar to the stage level metrics, > tracking numTotalSpeculated, numCompleted (successful), numFailed, numKilled, > etc. This enables us to effectively understand the speculative execution feature > at an application level and helps in further tuning the speculation configs. > cc [~ron8hu] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37158) Add doc about spark not supported hive built-in function
[ https://issues.apache.org/jira/browse/SPARK-37158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37158. Resolution: Won't Fix See the discussion. https://github.com/apache/spark/pull/34434#issuecomment-954545315 > Add doc about spark not supported hive built-in function > > > Key: SPARK-37158 > URL: https://issues.apache.org/jira/browse/SPARK-37158 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Add doc about spark not supported hive built-in function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37238) Upgrade ORC to 1.6.12
[ https://issues.apache.org/jira/browse/SPARK-37238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37238. Fix Version/s: 3.2.1 Assignee: Dongjoon Hyun Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34512 > Upgrade ORC to 1.6.12 > - > > Key: SPARK-37238 > URL: https://issues.apache.org/jira/browse/SPARK-37238 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.1 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-37211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37211. Fix Version/s: 3.3.0 Assignee: Yuto Akutsu Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34487 > More descriptions and adding an image to the failure message about enabling > GitHub Actions > -- > > Key: SPARK-37211 > URL: https://issues.apache.org/jira/browse/SPARK-37211 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Yuto Akutsu >Assignee: Yuto Akutsu >Priority: Minor > Fix For: 3.3.0 > > > I've seen and experienced that the build-and-test workflow of first-time PRs > fails and it was caused by developers forgetting to enable Github Actions on > their own repositories. > I think developers will be able to notice the cause quicker by adding more > descriptions and an image to the test-failure message. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37231) Dynamic writes/reads of ANSI interval partitions
[ https://issues.apache.org/jira/browse/SPARK-37231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37231. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34506 > Dynamic writes/reads of ANSI interval partitions > > > Key: SPARK-37231 > URL: https://issues.apache.org/jira/browse/SPARK-37231 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > Check and fix if it's needed dynamic partitions writes of ANSI intervals. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7
[ https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438986#comment-17438986 ] Kousuke Saruta commented on SPARK-35496: [~dongjoon] Thank you for letting me know. That's great. > Upgrade Scala 2.13 to 2.13.7 > > > Key: SPARK-35496 > URL: https://issues.apache.org/jira/browse/SPARK-35496 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > This issue aims to upgrade to Scala 2.13.7. > Scala 2.13.6 released(https://github.com/scala/scala/releases/tag/v2.13.6). > However, we skip 2.13.6 because there is a breaking behavior change at 2.13.6 > which is different from both Scala 2.13.5 and Scala 3. > - https://github.com/scala/bug/issues/12403 > {code} > scala3-3.0.0:$ bin/scala > scala> Array.empty[Double].intersect(Array(0.0)) > val res0: Array[Double] = Array() > scala-2.13.6:$ bin/scala > Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292). > Type in expressions for evaluation. Or try :help. > scala> Array.empty[Double].intersect(Array(0.0)) > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D > ... 32 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7
[ https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438535#comment-17438535 ] Kousuke Saruta commented on SPARK-35496: [~LuciferYang] Scala 2.13.7 was released a few days ago. https://github.com/scala/scala/releases/tag/v2.13.7 Would you like to continue to work on this? > Upgrade Scala 2.13 to 2.13.7 > > > Key: SPARK-35496 > URL: https://issues.apache.org/jira/browse/SPARK-35496 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > This issue aims to upgrade to Scala 2.13.7. > Scala 2.13.6 released(https://github.com/scala/scala/releases/tag/v2.13.6). > However, we skip 2.13.6 because there is a breaking behavior change at 2.13.6 > which is different from both Scala 2.13.5 and Scala 3. > - https://github.com/scala/bug/issues/12403 > {code} > scala3-3.0.0:$ bin/scala > scala> Array.empty[Double].intersect(Array(0.0)) > val res0: Array[Double] = Array() > scala-2.13.6:$ bin/scala > Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292). > Type in expressions for evaluation. Or try :help. > scala> Array.empty[Double].intersect(Array(0.0)) > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D > ... 32 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37206) Upgrade Avro to 1.11.0
Kousuke Saruta created SPARK-37206: -- Summary: Upgrade Avro to 1.11.0 Key: SPARK-37206 URL: https://issues.apache.org/jira/browse/SPARK-37206 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Recently, Avro 1.11.0 was released, which includes a bunch of bug fixes. https://issues.apache.org/jira/issues/?jql=project%3DAVRO%20AND%20fixVersion%3D1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37108) Expose make_date expression in R
[ https://issues.apache.org/jira/browse/SPARK-37108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37108. Fix Version/s: 3.3.0 Assignee: Leona Yoda Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34480 > Expose make_date expression in R > > > Key: SPARK-37108 > URL: https://issues.apache.org/jira/browse/SPARK-37108 > Project: Spark > Issue Type: Improvement > Components: R >Affects Versions: 3.3.0 >Reporter: Leona Yoda >Assignee: Leona Yoda >Priority: Minor > Fix For: 3.3.0 > > > Expose make_date API on SparkR. > > (cf. https://issues.apache.org/jira/browse/SPARK-36554) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37159. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34425 > Change HiveExternalCatalogVersionsSuite to be able to test with Java 17 > --- > > Key: SPARK-37159 > URL: https://issues.apache.org/jira/browse/SPARK-37159 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > SPARK-37105 seems to have fixed most of tests in `sql/hive` for Java 17 but > `HiveExternalCatalogVersionsSuite`. > {code} > [info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED > *** (42 seconds, 526 milliseconds) > [info] spark-submit returned with exit code 1. > [info] Command line: > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test-spark-d86af275-0c40-4b47-9cab-defa92a5ffa7/spark-3.2.0/bin/spark-submit' > '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' > 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' > 'spark.sql.hive.metastore.version=2.3' '--conf' > 'spark.sql.hive.metastore.jars=maven' '--conf' > 'spark.sql.warehouse.dir=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' > '-Dderby.system.home=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test15166225869206697603.py' > [info] > [info] 2021-10-28 06:07:18.486 - stderr> Using Spark's default 
log4j > profile: org/apache/spark/log4j-defaults.properties > [info] 2021-10-28 06:07:18.49 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Running Spark version 3.2.0 > [info] 2021-10-28 06:07:18.537 - stderr> 21/10/28 22:07:18 WARN > NativeCodeLoader: Unable to load native-hadoop library for your platform... > using builtin-java classes where applicable > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: No custom resources configured for spark.driver. > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.617 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Submitted application: prepare testing tables > [info] 2021-10-28 06:07:18.632 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Default ResourceProfile created, executor resources: > Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: > memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: > 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Limiting resource is cpu > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfileManager: Added ResourceProfile id: 0 > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls to: kou > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls to: kou > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: SecurityManager: authentication disabled; ui acls disabled; > users 
with view permissions: Set(kou); groups with view permissions: Set(); > users with modify permissions: Set(kou); groups with modify permissions: > Set() > [info] 2021-10-28 06:07:18.886 - stderr> 21/10/28 22:07:18 INFO Utils: > Successfully started service 'sparkDriver' on port 35867. &
[jira] [Resolved] (SPARK-36554) Error message while trying to use spark sql functions directly on dataframe columns without using select expression
[ https://issues.apache.org/jira/browse/SPARK-36554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36554. Fix Version/s: 3.3.0 Assignee: Nicolas Azrak Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34356 > Error message while trying to use spark sql functions directly on dataframe > columns without using select expression > --- > > Key: SPARK-36554 > URL: https://issues.apache.org/jira/browse/SPARK-36554 > Project: Spark > Issue Type: Bug > Components: Documentation, Examples, PySpark >Affects Versions: 3.1.1 >Reporter: Lekshmi Ramachandran >Assignee: Nicolas Azrak >Priority: Minor > Labels: documentation, features, functions, spark-sql > Fix For: 3.3.0 > > Attachments: Screen Shot .png > > Original Estimate: 24h > Remaining Estimate: 24h > > The below code generates a dataframe successfully . Here make_date function > is used inside a select expression > > from pyspark.sql.functions import expr, make_date > df = spark.createDataFrame([(2020, 6, 26), (1000, 2, 29), (-44, 1, 1)],['Y', > 'M', 'D']) > df.select("*",expr("make_date(Y,M,D) as lk")).show() > > The below code fails with a message "cannot import name 'make_date' from > 'pyspark.sql.functions'" . Here the make_date function is directly called on > dataframe columns without select expression > > from pyspark.sql.functions import make_date > df = spark.createDataFrame([(2020, 6, 26), (1000, 2, 29), (-44, 1, 1)],['Y', > 'M', 'D']) > df.select(make_date(df.Y,df.M,df.D).alias("datefield")).show() > > The error message generated is misleading when it says "cannot import > make_date from pyspark.sql.functions" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37170) Pin PySpark version installed in the Binder environment for tagged commit
[ https://issues.apache.org/jira/browse/SPARK-37170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37170: --- Summary: Pin PySpark version installed in the Binder environment for tagged commit (was: Pin PySpark version for Binder) > Pin PySpark version installed in the Binder environment for tagged commit > - > > Key: SPARK-37170 > URL: https://issues.apache.org/jira/browse/SPARK-37170 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > I noticed that the PySpark 3.1.2 is installed in the live notebook > environment even though the notebook is for PySpark 3.2.0. > http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html > I guess someone accessed to Binder and built the container image with v3.2.0 > before we published the pyspark package to PyPi. > https://mybinder.org/ > I think it's difficult to rebuild the image manually. > To avoid such accident, I'll propose to pin the version of PySpark in > binder/postBuild > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37170) Pin PySpark version for Binder
[ https://issues.apache.org/jira/browse/SPARK-37170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37170: --- Description: I noticed that the PySpark 3.1.2 is installed in the live notebook environment even though the notebook is for PySpark 3.2.0. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild was: I noticed that the PySpark 3.1.2 is installed in the live notebook environment even though the notebook is for PySpark 3.2. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild > Pin PySpark version for Binder > -- > > Key: SPARK-37170 > URL: https://issues.apache.org/jira/browse/SPARK-37170 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > I noticed that the PySpark 3.1.2 is installed in the live notebook > environment even though the notebook is for PySpark 3.2.0. > http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html > I guess someone accessed to Binder and built the container image with v3.2.0 > before we published the pyspark package to PyPi. > https://mybinder.org/ > I think it's difficult to rebuild the image manually. 
> To avoid such accident, I'll propose to pin the version of PySpark in > binder/postBuild > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37170) Pin PySpark version for Binder
[ https://issues.apache.org/jira/browse/SPARK-37170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37170: --- Description: I noticed that the PySpark 3.1.2 is installed in the live notebook environment even though the notebook is for PySpark 3.2. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild was: I noticed that the PySpark 3.1.2 is installed in the environment of live notebook even though the notebook is for PySpark 3.2. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild > Pin PySpark version for Binder > -- > > Key: SPARK-37170 > URL: https://issues.apache.org/jira/browse/SPARK-37170 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > I noticed that the PySpark 3.1.2 is installed in the live notebook > environment even though the notebook is for PySpark 3.2. > http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html > I guess someone accessed to Binder and built the container image with v3.2.0 > before we published the pyspark package to PyPi. > https://mybinder.org/ > I think it's difficult to rebuild the image manually. 
> To avoid such accident, I'll propose to pin the version of PySpark in > binder/postBuild > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37170) Pin PySpark version for Binder
Kousuke Saruta created SPARK-37170: -- Summary: Pin PySpark version for Binder Key: SPARK-37170 URL: https://issues.apache.org/jira/browse/SPARK-37170 Project: Spark Issue Type: Bug Components: docs, PySpark Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta I noticed that the PySpark 3.1.2 is installed in the environment of live notebook even though the notebook is for PySpark 3.2. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
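The proposal above can be sketched as follows. This is an illustration only: binder/postBuild is a shell script run while Binder builds the image, the exact version string would come from the tagged commit being built, and the helper name pin_line is invented here.

```python
def pin_line(version: str) -> str:
    # Pin the exact release so a Binder image built from a tagged commit
    # cannot silently install whatever PySpark happens to be newest on PyPI.
    return f"pip install pyspark=={version}"

print(pin_line("3.2.0"))   # pip install pyspark==3.2.0
```

An exact pin (==) rather than a lower bound is the point here: the stale-image accident described above happened precisely because the installed version could drift from the documented one.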
[jira] [Created] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
Kousuke Saruta created SPARK-37159: -- Summary: Change HiveExternalCatalogVersionsSuite to be able to test with Java 17 Key: SPARK-37159 URL: https://issues.apache.org/jira/browse/SPARK-37159 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta SPARK-37105 seems to have fixed most of tests in `sql/hive` for Java 17 but `HiveExternalCatalogVersionsSuite`. {code} [info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** (42 seconds, 526 milliseconds) [info] spark-submit returned with exit code 1. [info] Command line: '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test-spark-d86af275-0c40-4b47-9cab-defa92a5ffa7/spark-3.2.0/bin/spark-submit' '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' 'spark.sql.hive.metastore.version=2.3' '--conf' 'spark.sql.hive.metastore.jars=maven' '--conf' 'spark.sql.warehouse.dir=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' '-Dderby.system.home=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test15166225869206697603.py' [info] [info] 2021-10-28 06:07:18.486 - stderr> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties [info] 2021-10-28 06:07:18.49 - stderr> 21/10/28 22:07:18 INFO SparkContext: Running Spark version 3.2.0 [info] 2021-10-28 06:07:18.537 - stderr> 21/10/28 22:07:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: == [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: No custom resources configured for spark.driver. [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: == [info] 2021-10-28 06:07:18.617 - stderr> 21/10/28 22:07:18 INFO SparkContext: Submitted application: prepare testing tables [info] 2021-10-28 06:07:18.632 - stderr> 21/10/28 22:07:18 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO ResourceProfile: Limiting resource is cpu [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO ResourceProfileManager: Added ResourceProfile id: 0 [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing view acls to: kou [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing modify acls to: kou [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing view acls groups to: [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing modify acls groups to: [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kou); groups with view permissions: Set(); users with modify permissions: Set(kou); groups with modify permissions: Set() [info] 2021-10-28 06:07:18.886 - stderr> 21/10/28 22:07:18 INFO Utils: Successfully started service 'sparkDriver' on port 35867. 
[info] 2021-10-28 06:07:18.906 - stderr> 21/10/28 22:07:18 INFO SparkEnv: Registering MapOutputTracker [info] 2021-10-28 06:07:18.93 - stderr> 21/10/28 22:07:18 INFO SparkEnv: Registering BlockManagerMaster [info] 2021-10-28 06:07:18.943 - stderr> 21/10/28 22:07:18 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information [info] 2021-10-28 06:07:18.944 - stderr> 21/10/28 22:07:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up [info] 2021-10-28 06:07:18.945 - stdout> Traceback (most recent call last): [info] 2021-10-28 06:07:18
[jira] [Created] (SPARK-37112) Fix MiMa failure with Scala 2.13
Kousuke Saruta created SPARK-37112: -- Summary: Fix MiMa failure with Scala 2.13 Key: SPARK-37112 URL: https://issues.apache.org/jira/browse/SPARK-37112 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta SPARK-36151 re-enabled MiMa for Scala 2.13 but it always fails in the scheduled build. https://github.com/apache/spark/runs/3992588994?check_suite_focus=true#step:9:2303
[jira] [Updated] (SPARK-37103) Switch from Maven to SBT to build Spark on AppVeyor
[ https://issues.apache.org/jira/browse/SPARK-37103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37103: --- Description: Recently, building Spark on AppVeyor almost always fails due to StackOverflowError at compile time. We can't identify the reason so far but one workaround would be building with SBT. was: Recently, building Spark on AppVeyor almost always fails due to StackOverflowError at compile time. We can't identify the reason so far but one workaround would be building with SBT. > Switch from Maven to SBT to build Spark on AppVeyor > --- > > Key: SPARK-37103 > URL: https://issues.apache.org/jira/browse/SPARK-37103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > Recently, building Spark on AppVeyor almost always fails due to > StackOverflowError at compile time. > We can't identify the reason so far but one workaround would be building with > SBT. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37103) Switch from Maven to SBT to build Spark on AppVeyor
Kousuke Saruta created SPARK-37103: -- Summary: Switch from Maven to SBT to build Spark on AppVeyor Key: SPARK-37103 URL: https://issues.apache.org/jira/browse/SPARK-37103 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Recently, building Spark on AppVeyor almost always fails due to a StackOverflowError at compile time. We haven't identified the cause so far, but one workaround is to build with SBT.
[jira] [Updated] (SPARK-37086) Fix the R test of FPGrowthModel for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37086: --- Description: Similar to the issue filed in SPARK-37059, the R test of FPGrowthModel assumes that the result records returned by FPGrowthModel.freqItemsets are sorted by a certain kind of order but it's wrong. As a result, the test fails with Scala 2.13. {code} ══ Failed ══ ── 1. Failure (test_mllib_fpm.R:42:3): spark.fpGrowth ── `expected_itemsets` not equivalent to `itemsets`. Component “items”: Component 1: Component 1: 1 string mismatch Component “items”: Component 2: Length mismatch: comparison on first 1 components Component “items”: Component 2: Component 1: 1 string mismatch Component “items”: Component 3: Length mismatch: comparison on first 1 components Component “items”: Component 4: Length mismatch: comparison on first 1 components Component “items”: Component 4: Component 1: 1 string mismatch Component “items”: Component 5: Length mismatch: comparison on first 1 components Component “items”: Component 5: Component 1: 1 string mismatch Component “freq”: Mean relative difference: 0.5454545 {code} was: Similar to the issue filed in SPARK-37059, an R test of FPGrowthModel assumes that the result records returned by FPGrowthModel.freqItemsets are sorted by a certain kind of order but it's wrong. As a result, such tests fail with Scala 2.13. {code} ══ Failed ══ ── 1. Failure (test_mllib_fpm.R:42:3): spark.fpGrowth ── `expected_itemsets` not equivalent to `itemsets`. 
Component “items”: Component 1: Component 1: 1 string mismatch Component “items”: Component 2: Length mismatch: comparison on first 1 components Component “items”: Component 2: Component 1: 1 string mismatch Component “items”: Component 3: Length mismatch: comparison on first 1 components Component “items”: Component 4: Length mismatch: comparison on first 1 components Component “items”: Component 4: Component 1: 1 string mismatch Component “items”: Component 5: Length mismatch: comparison on first 1 components Component “items”: Component 5: Component 1: 1 string mismatch Component “freq”: Mean relative difference: 0.5454545 {code} > Fix the R test of FPGrowthModel for Scala 2.13 > -- > > Key: SPARK-37086 > URL: https://issues.apache.org/jira/browse/SPARK-37086 > Project: Spark > Issue Type: Bug > Components: ML, R, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > Similar to the issue filed in SPARK-37059, the R test of FPGrowthModel > assumes that the result records returned by FPGrowthModel.freqItemsets are > sorted by a certain kind of order but it's wrong. > As a result, the test fails with Scala 2.13. > {code} > ══ Failed > ══ > ── 1. Failure (test_mllib_fpm.R:42:3): spark.fpGrowth > ── > `expected_itemsets` not equivalent to `itemsets`. 
> Component “items”: Component 1: Component 1: 1 string mismatch > Component “items”: Component 2: Length mismatch: comparison on first 1 > components > Component “items”: Component 2: Component 1: 1 string mismatch > Component “items”: Component 3: Length mismatch: comparison on first 1 > components > Component “items”: Component 4: Length mismatch: comparison on first 1 > components > Component “items”: Component 4: Component 1: 1 string mismatch > Component “items”: Component 5: Length mismatch: comparison on first 1 > components > Component “items”: Component 5: Component 1: 1 string mismatch > Component “freq”: Mean relative difference: 0.5454545 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37086) Fix the R test of FPGrowthModel for Scala 2.13
Kousuke Saruta created SPARK-37086: -- Summary: Fix the R test of FPGrowthModel for Scala 2.13 Key: SPARK-37086 URL: https://issues.apache.org/jira/browse/SPARK-37086 Project: Spark Issue Type: Bug Components: ML, R, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Similar to the issue filed in SPARK-37059, an R test of FPGrowthModel assumes that the result records returned by FPGrowthModel.freqItemsets are sorted by a certain kind of order but it's wrong. As a result, such tests fail with Scala 2.13. {code} ══ Failed ══ ── 1. Failure (test_mllib_fpm.R:42:3): spark.fpGrowth ── `expected_itemsets` not equivalent to `itemsets`. Component “items”: Component 1: Component 1: 1 string mismatch Component “items”: Component 2: Length mismatch: comparison on first 1 components Component “items”: Component 2: Component 1: 1 string mismatch Component “items”: Component 3: Length mismatch: comparison on first 1 components Component “items”: Component 4: Length mismatch: comparison on first 1 components Component “items”: Component 4: Component 1: 1 string mismatch Component “items”: Component 5: Length mismatch: comparison on first 1 components Component “items”: Component 5: Component 1: 1 string mismatch Component “freq”: Mean relative difference: 0.5454545 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
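The fix pattern behind the failing R test above, sketched in Python for illustration (the actual fix is in the R test suite): freqItemsets guarantees neither row order nor item order, so a test should compare a canonicalized form rather than positional output. The helper name canonical is invented for this sketch.

```python
def canonical(itemsets):
    """Sort items within each itemset, then sort the itemsets themselves,
    so two runs that differ only in ordering compare equal."""
    return sorted(tuple(sorted(s)) for s in itemsets)

run_a = [["b", "a"], ["c"]]   # row/item order as one Scala version might return it
run_b = [["c"], ["a", "b"]]   # the same frequent itemsets in a different order
assert canonical(run_a) == canonical(run_b)
```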
[jira] [Created] (SPARK-37081) Upgrade the version of RDBMS and corresponding JDBC drivers used by docker-integration-tests
Kousuke Saruta created SPARK-37081: -- Summary: Upgrade the version of RDBMS and corresponding JDBC drivers used by docker-integration-tests Key: SPARK-37081 URL: https://issues.apache.org/jira/browse/SPARK-37081 Project: Spark Issue Type: Improvement Components: SQL, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Let's upgrade the versions of the RDBMSs and the corresponding JDBC drivers. In particular, PostgreSQL 14 was released recently, so it's worth ensuring that the JDBC source for PostgreSQL works with PostgreSQL 14.
[jira] [Updated] (SPARK-37076) Implement StructType.toString explicitly for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37076: --- Summary: Implement StructType.toString explicitly for Scala 2.13 (was: Implements StructType.toString explicitly for Scala 2.13) > Implement StructType.toString explicitly for Scala 2.13 > --- > > Key: SPARK-37076 > URL: https://issues.apache.org/jira/browse/SPARK-37076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 > Environment: > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The string returned by StructType.toString is different between Scala 2.12 > and 2.13. > * Scala 2.12 > {code} > val st = StructType(StructField("a", IntegerType) :: Nil) > st.toString > res0: String = StructType(StructField(a,IntegerType,true) > {code} > * Scala 2.13 > {code} > val st = StructType(StructField("a", IntegerType) :: Nil) > st.toString > val res0: String = Seq(StructField(a,IntegerType,true)) > {code} > It's because the logic to make the prefix of the string was changed from > Scala 2.13. > Scala 2.12: > https://github.com/scala/scala/blob/v2.12.15/src/library/scala/collection/TraversableLike.scala#L804 > Scala > 2:13:https://github.com/scala/scala/blob/v2.13.5/src/library/scala/collection/Seq.scala#L46 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37076) Implements StructType.toString explicitly for Scala 2.13
Kousuke Saruta created SPARK-37076: -- Summary: Implements StructType.toString explicitly for Scala 2.13 Key: SPARK-37076 URL: https://issues.apache.org/jira/browse/SPARK-37076 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Environment: Reporter: Kousuke Saruta Assignee: Kousuke Saruta The string returned by StructType.toString is different between Scala 2.12 and 2.13. * Scala 2.12 {code} val st = StructType(StructField("a", IntegerType) :: Nil) st.toString res0: String = StructType(StructField(a,IntegerType,true)) {code} * Scala 2.13 {code} val st = StructType(StructField("a", IntegerType) :: Nil) st.toString val res0: String = Seq(StructField(a,IntegerType,true)) {code} This is because the logic that builds the prefix of the string changed in Scala 2.13. Scala 2.12: https://github.com/scala/scala/blob/v2.12.15/src/library/scala/collection/TraversableLike.scala#L804 Scala 2.13: https://github.com/scala/scala/blob/v2.13.5/src/library/scala/collection/Seq.scala#L46
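The direction of the fix described above, sketched in Python for illustration (the real change is in Scala): give the type its own explicit string rendering instead of inheriting the collection's, so the output no longer depends on how the standard library renders sequences. These class definitions are simplified stand-ins, not Spark's actual API.

```python
class StructField:
    def __init__(self, name, dtype, nullable=True):
        self.name, self.dtype, self.nullable = name, dtype, nullable

    def __repr__(self):
        return f"StructField({self.name},{self.dtype},{str(self.nullable).lower()})"

class StructType(list):
    # Explicit __repr__: relying on list's inherited rendering is the Python
    # analogue of the Seq.toString prefix that changed in Scala 2.13.
    def __repr__(self):
        return f"StructType({','.join(map(repr, self))})"

st = StructType([StructField("a", "IntegerType")])
print(st)   # StructType(StructField(a,IntegerType,true))
```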
[jira] [Resolved] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37059. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34330 > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrowthModel.freqItemsets also doesn't ensure the sort order > of the result rows. > Nevertheless, their PySpark doctests assume a certain kind of sort order, > causing such doctests to fail with Scala 2.13.
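The fix pattern here, sketched in Python (an illustration, not the actual patch): wrap the unordered value in sorted() inside the doctest, so the printed output is deterministic whatever order the engine happens to return.

```python
def show_set(values):
    """Render a set-valued result (e.g. one collect_set row) in a fixed
    order so a doctest can safely assert on the printed output."""
    return sorted(values)

# Whatever order the engine produced, the doctest output is stable:
assert show_set(["b", "c", "a"]) == ["a", "b", "c"]
assert show_set(["c", "a", "b"]) == ["a", "b", "c"]
```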
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Description: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, their PySpark doctests assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. was: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, their doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order > of the result rows. > Nevertheless, their PySpark doctests assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Description: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. was: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order > of the result rows. > Nevertheless, thier doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Description: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, their doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. was: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order > of the result rows. > Nevertheless, their doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Description: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. was: The collect_set builtin function doesn't ensure the sort order of its result. FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrouthModel.freqItemsets also doesn' ensure the sort order of > the result rows. > Nevertheless, thier doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Component/s: Tests > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result. > FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result > rows. > Nevertheless, thier doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Summary: Ensure the sort order of the output in the PySpark doctests (was: Ensure the sort order of the output in the PySpark examples) > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result. > FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result > rows. > Nevertheless, thier doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37059) Ensure the sort order of the output in the PySpark examples
Kousuke Saruta created SPARK-37059: -- Summary: Ensure the sort order of the output in the PySpark examples Key: SPARK-37059 URL: https://issues.apache.org/jira/browse/SPARK-37059 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The collect_set builtin function doesn't ensure the sort order of its result. FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Component/s: Build > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: Build, ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Summary: Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 (was: Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13) > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Summary: Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13 (was: ResolvedRFormula.toString throws ClassCastException with Scala 2.13) > Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Description: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. (was: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.collection.immutable.Seq[scala.collection.imutable.Seq[String]] but scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed.) > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 > --- > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
Kousuke Saruta created SPARK-37026: -- Summary: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 Key: SPARK-37026 URL: https://issues.apache.org/jira/browse/SPARK-37026 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.collection.immutable.Seq[scala.collection.immutable.Seq[String]] but scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Issue Type: Bug (was: Improvement) > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 > --- > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is > scala.collection.immutable.Seq[scala.collection.immutable.Seq[String]] but > scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will > be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36922) The SIGN/SIGNUM functions should support ANSI intervals
[ https://issues.apache.org/jira/browse/SPARK-36922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36922. Fix Version/s: 3.3.0 Assignee: PengLei Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34256 > The SIGN/SIGNUM functions should support ANSI intervals > --- > > Key: SPARK-36922 > URL: https://issues.apache.org/jira/browse/SPARK-36922 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: PengLei >Priority: Major > Fix For: 3.3.0 > > > Extend the *sign/signum* functions to support ANSI intervals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value
[ https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36993: --- Summary: Fix json_tuple throw NPE if fields exist no foldable null value (was: Fix json_tupe throw NPE if fields exist no foldable null value) > Fix json_tuple throw NPE if fields exist no foldable null value > --- > > Key: SPARK-36993 > URL: https://issues.apache.org/jira/browse/SPARK-36993 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: XiDuo You >Priority: Major > > If json_tuple has a non-foldable null field, Spark throws an NPE while > evaluating field.toString. > e.g. the query will fail with: > {code:java} > SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS > c1 ); > {code} > {code:java} > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435) > at > org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
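The crash comes from stringifying a field expression that evaluated to null at runtime (a non-foldable null, so it cannot be eliminated at analysis time). The fix pattern can be sketched in plain Python rather than the actual Scala JsonTuple code; parse_row_fields is an illustrative helper, not Spark's implementation:

```python
def parse_row_fields(fields):
    # Analogue of JsonTuple.parseRow mapping each requested field
    # through field.toString: the None guard skips null fields, which
    # mirrors null-checking the field before calling toString so a
    # runtime null no longer crashes the whole row.
    return [str(f) for f in fields if f is not None]
```

Without the guard, the null field reaches the string conversion only at evaluation time, which is exactly why the bug escapes constant folding and surfaces as an NPE mid-query.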
[jira] [Resolved] (SPARK-36981) Upgrade joda-time to 2.10.12
[ https://issues.apache.org/jira/browse/SPARK-36981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36981. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34253 > Upgrade joda-time to 2.10.12 > > > Key: SPARK-36981 > URL: https://issues.apache.org/jira/browse/SPARK-36981 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > joda-time 2.10.12 seems to support an updated TZDB. > https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 > https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36972) Add max_by/min_by API to PySpark
[ https://issues.apache.org/jira/browse/SPARK-36972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36972. Fix Version/s: 3.3.0 Assignee: Leona Yoda Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34240 > Add max_by/min_by API to PySpark > > > Key: SPARK-36972 > URL: https://issues.apache.org/jira/browse/SPARK-36972 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Leona Yoda >Assignee: Leona Yoda >Priority: Minor > Fix For: 3.3.0 > > > Related issues > - https://issues.apache.org/jira/browse/SPARK-27653 > * https://issues.apache.org/jira/browse/SPARK-36963 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36981) Upgrade joda-time to 2.10.12
[ https://issues.apache.org/jira/browse/SPARK-36981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36981: --- Description: joda-time 2.10.12 seems to support an updated TZDB. https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 was: joda-time 2.10.12 seems to support the updated TZDB. https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 > Upgrade joda-time to 2.10.12 > > > Key: SPARK-36981 > URL: https://issues.apache.org/jira/browse/SPARK-36981 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > joda-time 2.10.12 seems to support an updated TZDB. > https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 > https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36981) Upgrade joda-time to 2.10.12
Kousuke Saruta created SPARK-36981: -- Summary: Upgrade joda-time to 2.10.12 Key: SPARK-36981 URL: https://issues.apache.org/jira/browse/SPARK-36981 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta joda-time 2.10.12 seems to support the updated TZDB. https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36960) Pushdown filters with ANSI interval values to ORC
Kousuke Saruta created SPARK-36960: -- Summary: Pushdown filters with ANSI interval values to ORC Key: SPARK-36960 URL: https://issues.apache.org/jira/browse/SPARK-36960 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Now that V1 and V2 ORC datasources support ANSI intervals, it would be great to also be able to push down filters with ANSI interval values. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36937) Change OrcSourceSuite to test both V1 and V2 sources.
[ https://issues.apache.org/jira/browse/SPARK-36937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36937: --- Summary: Change OrcSourceSuite to test both V1 and V2 sources. (was: Re-structure OrcSourceSuite to test both V1 and V2 sources.) > Change OrcSourceSuite to test both V1 and V2 sources. > - > > Key: SPARK-36937 > URL: https://issues.apache.org/jira/browse/SPARK-36937 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > There is no V2 test for the ORC source which implements > CommonFileDataSourceSuite while the corresponding ones exist for all other > built-in file-based datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36937) Re-structure OrcSourceSuite to test both V1 and V2 sources.
Kousuke Saruta created SPARK-36937: -- Summary: Re-structure OrcSourceSuite to test both V1 and V2 sources. Key: SPARK-36937 URL: https://issues.apache.org/jira/browse/SPARK-36937 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta There is no V2 test for the ORC source which implements CommonFileDataSourceSuite while the corresponding ones exist for all other built-in file-based datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36937) Re-structure OrcSourceSuite to test both V1 and V2 sources.
[ https://issues.apache.org/jira/browse/SPARK-36937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36937: --- Issue Type: Improvement (was: Bug) > Re-structure OrcSourceSuite to test both V1 and V2 sources. > --- > > Key: SPARK-36937 > URL: https://issues.apache.org/jira/browse/SPARK-36937 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > There is no V2 test for the ORC source which implements > CommonFileDataSourceSuite while the corresponding ones exist for all other > built-in file-based datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36931) Read/write dataframes with ANSI intervals from/to ORC files
[ https://issues.apache.org/jira/browse/SPARK-36931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36931: --- Summary: Read/write dataframes with ANSI intervals from/to ORC files (was: Read/write dataframes with ANSI intervals from/to parquet files) > Read/write dataframes with ANSI intervals from/to ORC files > --- > > Key: SPARK-36931 > URL: https://issues.apache.org/jira/browse/SPARK-36931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Priority: Major > > Implement writing and reading ANSI intervals (year-month and day-time > intervals) columns in dataframes to ORC datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36931) Read/write dataframes with ANSI intervals from/to parquet files
Kousuke Saruta created SPARK-36931: -- Summary: Read/write dataframes with ANSI intervals from/to parquet files Key: SPARK-36931 URL: https://issues.apache.org/jira/browse/SPARK-36931 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Kousuke Saruta Implement writing and reading ANSI intervals (year-month and day-time intervals) columns in dataframes to ORC datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36038) Basic speculation metrics at stage level
[ https://issues.apache.org/jira/browse/SPARK-36038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36038. Fix Version/s: 3.3.0 Assignee: Venkata krishnan Sowrirajan Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/33253 > Basic speculation metrics at stage level > > > Key: SPARK-36038 > URL: https://issues.apache.org/jira/browse/SPARK-36038 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Venkata krishnan Sowrirajan >Assignee: Venkata krishnan Sowrirajan >Priority: Major > Fix For: 3.3.0 > > > Currently there are no speculation metrics available at either the application > level or the stage level. Within our platform, we have added speculation > metrics at the stage level as a summary, similar to the other stage-level metrics, > tracking numTotalSpeculated, numCompleted (successful), numFailed, numKilled, > etc. This enables us to effectively understand the speculative execution feature > at an application level and helps in further tuning the speculation configs. > cc [~ron8hu] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
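The counters the report describes can be sketched as a tiny per-stage summary. The class and field names below are hypothetical, patterned on the counters named in the issue, and are not Spark's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class StageSpeculationSummary:
    # Per-stage counters for speculatively launched task copies.
    num_total_speculated: int = 0
    num_completed: int = 0  # speculative copies that finished successfully
    num_failed: int = 0
    num_killed: int = 0     # copies killed, e.g. when the original task won

    def record(self, outcome: str) -> None:
        # Every recorded copy counts toward the total; the outcome
        # buckets it into one of the terminal states.
        self.num_total_speculated += 1
        if outcome == "completed":
            self.num_completed += 1
        elif outcome == "failed":
            self.num_failed += 1
        elif outcome == "killed":
            self.num_killed += 1
```

Aggregating these per stage, rather than only per application, is what lets the ratio of killed to completed speculative copies guide tuning of the speculation configs.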
[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422557#comment-17422557 ] Kousuke Saruta commented on SPARK-36861: Hmm, if a "T" follows the date part but it's not a valid ISO 8601 format, shouldn't casting the string to date fail? In PostgreSQL, parsing fails in such a case. > Partition columns are overly eagerly parsed as dates > > > Key: SPARK-36861 > URL: https://issues.apache.org/jira/browse/SPARK-36861 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Blocker > > I have an input directory with subdirs: > * hour=2021-01-01T00 > * hour=2021-01-01T01 > * hour=2021-01-01T02 > * ... > in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it > is parsed as date type and the hour part is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
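The stricter behavior the comment argues for can be sketched in plain Python: only infer a date type when the entire partition value parses as a date, so a trailing "T00" keeps the column a string. infer_partition_type is an illustrative helper, not Spark's partition-inference code:

```python
from datetime import datetime

def infer_partition_type(value: str) -> str:
    # Strict inference: strptime rejects any trailing characters
    # ("unconverted data remains"), so "2021-01-01T00" stays a string
    # instead of being truncated to the date 2021-01-01.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "string"
```

With this rule the 'hour' partition values from the report keep their hour component, matching the Spark 3.1 behavior the reporter expects.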
[jira] [Resolved] (SPARK-36899) Support ILIKE API on R
[ https://issues.apache.org/jira/browse/SPARK-36899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36899. Fix Version/s: 3.3.0 Assignee: Leona Yoda Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34152 > Support ILIKE API on R > -- > > Key: SPARK-36899 > URL: https://issues.apache.org/jira/browse/SPARK-36899 > Project: Spark > Issue Type: Sub-task > Components: R >Affects Versions: 3.3.0 >Reporter: Leona Yoda >Assignee: Leona Yoda >Priority: Major > Fix For: 3.3.0 > > > Support ILIKE (case-insensitive LIKE) API on R -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
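ILIKE is LIKE with case folding. Its semantics can be sketched in plain Python by translating the SQL pattern to a regex and matching case-insensitively; ilike here is an illustrative helper, not the SparkR API:

```python
import re

def ilike(value: str, pattern: str) -> bool:
    # SQL LIKE wildcards: '%' matches any run of characters, '_' matches
    # exactly one; everything else is matched literally. re.IGNORECASE
    # supplies the case folding that distinguishes ILIKE from LIKE.
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c)
        for c in pattern
    )
    return re.fullmatch(regex, value, re.IGNORECASE) is not None
```

So `ilike("Spark", "sp%")` holds even though `LIKE 'sp%'` would not match "Spark", which is the whole point of the API.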
[jira] [Updated] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files
[ https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36830: --- Description: Implement writing and reading ANSI intervals (year-month and day-time intervals) columns in dataframes to JSON datasources. (was: Implement writing and reading ANSI intervals (year-month and day-time intervals) columns in dataframes to Parquet datasources.) > Read/write dataframes with ANSI intervals from/to JSON files > > > Key: SPARK-36830 > URL: https://issues.apache.org/jira/browse/SPARK-36830 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Implement writing and reading ANSI intervals (year-month and day-time > intervals) columns in dataframes to JSON datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files
[ https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422483#comment-17422483 ] Kousuke Saruta commented on SPARK-36830: Thank you, will do. > Read/write dataframes with ANSI intervals from/to JSON files > > > Key: SPARK-36830 > URL: https://issues.apache.org/jira/browse/SPARK-36830 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Implement writing and reading ANSI intervals (year-month and day-time > intervals) columns in dataframes to Parquet datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files
[ https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422108#comment-17422108 ] Kousuke Saruta commented on SPARK-36831: Thank you. I'll open a PR. > Read/write dataframes with ANSI intervals from/to CSV files > --- > > Key: SPARK-36831 > URL: https://issues.apache.org/jira/browse/SPARK-36831 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Implement writing and reading ANSI intervals (year-month and day-time > intervals) columns in dataframes to CSV datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org