[jira] [Resolved] (SPARK-37635) SHOW TBLPROPERTIES should print the fully qualified table name
[ https://issues.apache.org/jira/browse/SPARK-37635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37635. Fix Version/s: 3.3.0 Assignee: Wenchen Fan Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34890 > SHOW TBLPROPERTIES should print the fully qualified table name > -- > > Key: SPARK-37635 > URL: https://issues.apache.org/jira/browse/SPARK-37635 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37310) Migrate ALTER NAMESPACE ... SET PROPERTIES to use v2 command by default
[ https://issues.apache.org/jira/browse/SPARK-37310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37310. Fix Version/s: 3.3.0 Assignee: Terry Kim (was: Apache Spark) Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34891 > Migrate ALTER NAMESPACE ... SET PROPERTIES to use v2 command by default > --- > > Key: SPARK-37310 > URL: https://issues.apache.org/jira/browse/SPARK-37310 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.3.0 > > > Migrate ALTER NAMESPACE ... SET PROPERTIES to use v2 command by default -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36038) Basic speculation metrics at stage level
[ https://issues.apache.org/jira/browse/SPARK-36038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36038. Assignee: Thejdeep Gudivada Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34607 > Basic speculation metrics at stage level > > > Key: SPARK-36038 > URL: https://issues.apache.org/jira/browse/SPARK-36038 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Venkata krishnan Sowrirajan >Assignee: Thejdeep Gudivada >Priority: Major > Fix For: 3.3.0 > > > Currently there are no speculation metrics available either at the application > level or at the stage level. Within our platform, we have added speculation > metrics at the stage level as a summary, similar to the stage-level metrics, > tracking numTotalSpeculated, numCompleted (successful), numFailed, numKilled, > etc. This enables us to effectively understand the speculative execution > feature at an application level and helps in further tuning the speculation > configs. > cc [~ron8hu] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37586) Add cipher mode option and set default cipher mode for aes_encrypt and aes_decrypt
[ https://issues.apache.org/jira/browse/SPARK-37586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37586. Fix Version/s: 3.3.0 Assignee: Max Gekk Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34837 > Add cipher mode option and set default cipher mode for aes_encrypt and > aes_decrypt > -- > > Key: SPARK-37586 > URL: https://issues.apache.org/jira/browse/SPARK-37586 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > https://github.com/apache/spark/pull/32801 added aes_encrypt/aes_decrypt > functions to Spark. However, they rely on the JVM's configuration as to which > cipher mode to use, which is problematic because it is not fixed across > versions and systems. > Let's hardcode a default cipher mode and also allow users to set a cipher > mode as an argument to the function. > In the future, we can support other modes like GCM and CBC that have already > been supported by other systems: > # Snowflake: > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html > # BigQuery: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aead-encryption-concepts#block_cipher_modes -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
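[Editor's note] The portability problem behind SPARK-37586 can be sketched with the JDK's own crypto API: `Cipher.getInstance("AES")` without an explicit mode falls back to a provider-chosen transformation, so pinning a full transformation string such as "AES/GCM/NoPadding" is what "hardcode a default cipher mode" amounts to. The class and helper names below are illustrative assumptions, not Spark's internal implementation.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class AesModes {
    // Pin the full transformation instead of the ambiguous "AES", whose
    // default mode/padding is provider-dependent across JVMs and systems.
    private static final String DEFAULT_MODE = "AES/GCM/NoPadding";

    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance(DEFAULT_MODE);
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, iv)); // 128-bit GCM auth tag
        return c.doFinal(plain);
    }

    static byte[] decrypt(byte[] key, byte[] iv, byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance(DEFAULT_MODE);
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, iv));
        return c.doFinal(cipherText);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // 16 bytes -> AES-128
        byte[] iv = new byte[12]; // fixed all-zero nonce for the demo only; never reuse nonces in practice
        byte[] ct = encrypt(key, iv, "Spark".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, iv, ct), StandardCharsets.UTF_8)); // Spark
    }
}
```

Because the transformation string is fixed, the ciphertext format no longer depends on which JCE provider happens to be installed, which is the cross-version/cross-system variance the issue describes.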
[jira] [Commented] (SPARK-37568) Support 2-arguments by the convert_timezone() function
[ https://issues.apache.org/jira/browse/SPARK-37568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454960#comment-17454960 ] Kousuke Saruta commented on SPARK-37568: [~yoda-mon] OK, please go ahead. > Support 2-arguments by the convert_timezone() function > -- > > Key: SPARK-37568 > URL: https://issues.apache.org/jira/browse/SPARK-37568 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > # If sourceTs is a timestamp_ntz, take the sourceTz from the session time > zone, see the SQL config spark.sql.session.timeZone > # If sourceTs is a timestamp_ltz, convert it to a timestamp_ntz using the > targetTz -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
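[Editor's note] The two conversion rules quoted in SPARK-37568 can be sketched with java.time, modeling timestamp_ntz as LocalDateTime and timestamp_ltz as Instant. The helper names and the explicit session-zone parameter are assumptions for illustration, not Spark's actual API.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class ConvertTimezone {
    // Rule 1: for a timestamp_ntz source, the source zone is taken from the
    // session time zone (spark.sql.session.timeZone), then the wall-clock
    // value is converted to the target zone.
    static LocalDateTime convertNtz(LocalDateTime sourceTs, ZoneId sessionTz, ZoneId targetTz) {
        return sourceTs.atZone(sessionTz).withZoneSameInstant(targetTz).toLocalDateTime();
    }

    // Rule 2: for a timestamp_ltz source (a fixed instant), the result is the
    // timestamp_ntz wall-clock value of that instant in the target zone.
    static LocalDateTime convertLtz(Instant sourceTs, ZoneId targetTz) {
        return sourceTs.atZone(targetTz).toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId session = ZoneId.of("UTC");
        ZoneId target = ZoneId.of("Asia/Tokyo"); // UTC+9, no DST
        System.out.println(convertNtz(LocalDateTime.of(2021, 12, 6, 12, 0), session, target)); // 2021-12-06T21:00
        System.out.println(convertLtz(Instant.parse("2021-12-06T12:00:00Z"), target)); // 2021-12-06T21:00
    }
}
```

With a UTC session zone and Asia/Tokyo as the target, both rules map noon UTC to 21:00 local time; they differ only in where the source zone comes from.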
[jira] [Commented] (SPARK-37568) Support 2-arguments by the convert_timezone() function
[ https://issues.apache.org/jira/browse/SPARK-37568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454947#comment-17454947 ] Kousuke Saruta commented on SPARK-37568: cc: [~yoda-mon] [~YActs] Do you want to work on this? > Support 2-arguments by the convert_timezone() function > -- > > Key: SPARK-37568 > URL: https://issues.apache.org/jira/browse/SPARK-37568 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > # If sourceTs is a timestamp_ntz, take the sourceTz from the session time > zone, see the SQL config spark.sql.session.timeZone > # If sourceTs is a timestamp_ltz, convert it to a timestamp_ntz using the > targetTz -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37469. Fix Version/s: 3.3.0 Assignee: Yazhi Wang Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34720 > Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI > --- > > Key: SPARK-37469 > URL: https://issues.apache.org/jira/browse/SPARK-37469 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Yazhi Wang >Assignee: Yazhi Wang >Priority: Minor > Fix For: 3.3.0 > > Attachments: executor-page.png, sql-page.png > > > Metrics in the Executor/Task page are shown as > "Shuffle Read Block Time", while the SQL page shows them as "fetch wait time", which > is confusing. !executor-page.png! > !sql-page.png! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37529) Support K8s integration tests for Java 17
Kousuke Saruta created SPARK-37529: -- Summary: Support K8s integration tests for Java 17 Key: SPARK-37529 URL: https://issues.apache.org/jira/browse/SPARK-37529 Project: Spark Issue Type: Sub-task Components: Kubernetes, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Now that we can build a container image for Java 17, let's support K8s integration tests for Java 17. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451376#comment-17451376 ] Kousuke Saruta commented on SPARK-37487: [~tanelk] Thank you for pinging me. I think a sampling job for the global sort performs the extra CollectMetrics (operations before the sort are performed twice). Please let me look into it more. > CollectMetrics is executed twice if it is followed by a sort > > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > Labels: correctness > > It is best exemplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-37487: get observable metrics with sort by callback") { > val df = spark.range(100) > .observe( > name = "my_event", > min($"id").as("min_val"), > max($"id").as("max_val"), > // Test unresolved alias > sum($"id"), > count(when($"id" % 2 === 0, 1)).as("num_even")) > .observe( > name = "other_event", > avg($"id").cast("int").as("avg_val")) > .sort($"id".desc) > validateObservedMetrics(df) > } > {code} > The count and sum aggregate report twice the number of rows: > {code} > [info] - SPARK-37487: get observable metrics with sort by callback *** FAILED > *** (169 milliseconds) > [info] [0,99,9900,100] did not equal [0,99,4950,50] > (DataFrameCallbackSuite.scala:342) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) > [info] at > 
org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > {code} > I could not figure out how this happens. Hopefully the UT can help with > debugging. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
[ https://issues.apache.org/jira/browse/SPARK-37468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37468: --- Description: Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which UnionEstimation can compute stats for. (was: Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which it UnionEstimation can compute stats for.) > Support ANSI intervals and TimestampNTZ for UnionEstimation > --- > > Key: SPARK-37468 > URL: https://issues.apache.org/jira/browse/SPARK-37468 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. > But I think it can support those types because their underlying types are > integer or long, which UnionEstimation can compute stats for. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
Kousuke Saruta created SPARK-37468: -- Summary: Support ANSI intervals and TimestampNTZ for UnionEstimation Key: SPARK-37468 URL: https://issues.apache.org/jira/browse/SPARK-37468 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which it UnionEstimation can compute stats for. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37459) Upgrade commons-cli to 1.5.0
Kousuke Saruta created SPARK-37459: -- Summary: Upgrade commons-cli to 1.5.0 Key: SPARK-37459 URL: https://issues.apache.org/jira/browse/SPARK-37459 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The currently used commons-cli is too old and contains an issue that affects the behavior of bin/spark-sql {code} bin/spark-sql -e 'SELECT "Spark"' ... Error in query: no viable alternative at input 'SELECT "'(line 1, pos 7) == SQL == SELECT "Spark ---^^^ {code} The root cause of this issue seems to be resolved in CLI-185. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37354) Make the Java version installed on the container image used by the K8s integration tests with SBT configurable
[ https://issues.apache.org/jira/browse/SPARK-37354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37354. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34628 > Make the Java version installed on the container image used by the K8s > integration tests with SBT configurable > -- > > Key: SPARK-37354 > URL: https://issues.apache.org/jira/browse/SPARK-37354 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.3.0 > > > I noticed that the default Java version installed on the container image used > by the K8s integration tests is different depending on how the tests are run. > If the tests are launched by Maven, Java 8 is installed. > On the other hand, if the tests are launched by SBT, the Java version is 11. > Further, we have no way to change the version. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37354) Make the Java version installed on the container image used by the K8s integration tests with SBT configurable
Kousuke Saruta created SPARK-37354: -- Summary: Make the Java version installed on the container image used by the K8s integration tests with SBT configurable Key: SPARK-37354 URL: https://issues.apache.org/jira/browse/SPARK-37354 Project: Spark Issue Type: Bug Components: Kubernetes, Tests Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta I noticed that the default Java version installed on the container image used by the K8s integration tests is different depending on how the tests are run. If the tests are launched by Maven, Java 8 is installed. On the other hand, if the tests are launched by SBT, the Java version is 11. Further, we have no way to change the version. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37319) Support K8s image building with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37319. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34586 > Support K8s image building with Java 17 > --- > > Key: SPARK-37319 > URL: https://issues.apache.org/jira/browse/SPARK-37319 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37320) Delete py_container_checks.zip after the test in DepsTestsSuite finishes
[ https://issues.apache.org/jira/browse/SPARK-37320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37320: --- Description: When K8s integration tests run, py_container_checks.zip still remains in resource-managers/kubernetes/integration-tests/tests/. It was created in the test "Launcher python client dependencies using a zip file" in DepsTestsSuite. was: When K8s integration tests run, py_container_checks.zip is still remaining in resource-managers/kubernetes/integration-tests/tests/. It's is created in the test "Launcher python client dependencies using a zip file" in DepsTestsSuite. > Delete py_container_checks.zip after the test in DepsTestsSuite finishes > > > Key: SPARK-37320 > URL: https://issues.apache.org/jira/browse/SPARK-37320 > Project: Spark > Issue Type: Bug > Components: k8, Tests >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > When K8s integration tests run, py_container_checks.zip still remains in > resource-managers/kubernetes/integration-tests/tests/. > It was created in the test "Launcher python client dependencies using a zip > file" in DepsTestsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37320) Delete py_container_checks.zip after the test in DepsTestsSuite finishes
Kousuke Saruta created SPARK-37320: -- Summary: Delete py_container_checks.zip after the test in DepsTestsSuite finishes Key: SPARK-37320 URL: https://issues.apache.org/jira/browse/SPARK-37320 Project: Spark Issue Type: Bug Components: k8, Tests Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta When K8s integration tests run, py_container_checks.zip is still remaining in resource-managers/kubernetes/integration-tests/tests/. It's is created in the test "Launcher python client dependencies using a zip file" in DepsTestsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37315) Mitigate ConcurrentModificationException thrown from a test in MLEventSuite
[ https://issues.apache.org/jira/browse/SPARK-37315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37315: --- Summary: Mitigate ConcurrentModificationException thrown from a test in MLEventSuite (was: Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite) > Mitigate ConcurrentModificationException thrown from a test in MLEventSuite > --- > > Key: SPARK-37315 > URL: https://issues.apache.org/jira/browse/SPARK-37315 > Project: Spark > Issue Type: Bug > Components: ML, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > Recently, I noticed ConcurrentModificationException is sometimes thrown from > the following part of the test "pipeline read/write events" in MLEventSuite > when Scala 2.13 is used. > {code} > events.map(JsonProtocol.sparkEventToJson).foreach { event => > assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) > } > {code} > I think the root cause is the ArrayBuffer (events) is updated asynchronously > by the following part. > {code} > private val listener: SparkListener = new SparkListener { > override def onOtherEvent(event: SparkListenerEvent): Unit = event match { > case e: MLEvent => events.append(e) > case _ => > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37315) Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite
[ https://issues.apache.org/jira/browse/SPARK-37315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37315: --- Description: Recently, I noticed ConcurrentModificationException is sometimes thrown from the following part of the test "pipeline read/write events" in MLEventSuite when Scala 2.13 is used. {code} events.map(JsonProtocol.sparkEventToJson).foreach { event => assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) } {code} I think the root cause is the ArrayBuffer (events) is updated asynchronously by the following part. {code} private val listener: SparkListener = new SparkListener { override def onOtherEvent(event: SparkListenerEvent): Unit = event match { case e: MLEvent => events.append(e) case _ => } } {code} was: Recently, I notice ConcurrentModificationException is thrown from the following part of the test "pipeline read/write events" in MLEventSuite when Scala 2.13 is used. {code} events.map(JsonProtocol.sparkEventToJson).foreach { event => assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) } {code} I think the root cause is the ArrayBuffer (events) is updated asynchronously by the following part. {code} private val listener: SparkListener = new SparkListener { override def onOtherEvent(event: SparkListenerEvent): Unit = event match { case e: MLEvent => events.append(e) case _ => } } {code} > Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite > - > > Key: SPARK-37315 > URL: https://issues.apache.org/jira/browse/SPARK-37315 > Project: Spark > Issue Type: Bug > Components: ML, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Recently, I noticed ConcurrentModificationException is sometimes thrown from > the following part of the test "pipeline read/write events" in MLEventSuite > when Scala 2.13 is used. 
> {code} > events.map(JsonProtocol.sparkEventToJson).foreach { event => > assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) > } > {code} > I think the root cause is the ArrayBuffer (events) is updated asynchronously > by the following part. > {code} > private val listener: SparkListener = new SparkListener { > override def onOtherEvent(event: SparkListenerEvent): Unit = event match { > case e: MLEvent => events.append(e) > case _ => > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37315) Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite
Kousuke Saruta created SPARK-37315: -- Summary: Mitigate a ConcurrentModificationException thrown from a test in MLEventSuite Key: SPARK-37315 URL: https://issues.apache.org/jira/browse/SPARK-37315 Project: Spark Issue Type: Bug Components: ML, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Recently, I notice ConcurrentModificationException is thrown from the following part of the test "pipeline read/write events" in MLEventSuite when Scala 2.13 is used. {code} events.map(JsonProtocol.sparkEventToJson).foreach { event => assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent]) } {code} I think the root cause is the ArrayBuffer (events) is updated asynchronously by the following part. {code} private val listener: SparkListener = new SparkListener { override def onOtherEvent(event: SparkListenerEvent): Unit = event match { case e: MLEvent => events.append(e) case _ => } } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
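[Editor's note] The suspected race in SPARK-37315, a listener appending to the events buffer while the test iterates over it, is the classic fail-fast iterator failure on the JVM. Below is a minimal sketch of the failure mode and a snapshot-based mitigation, using Java's ArrayList in place of Scala's ArrayBuffer; it reproduces the exception deterministically in one thread, whereas the real bug is a cross-thread timing issue.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    // Failure mode: appending to the list while a traversal of it is in
    // flight makes the fail-fast iterator throw on the next element.
    static boolean failsWhenMutatedDuringIteration() {
        List<Integer> events = new ArrayList<>(List.of(1, 2, 3));
        try {
            for (Integer e : events) {
                events.add(e); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Mitigation sketch: iterate over an immutable snapshot taken at one
    // point in time, so appends that land afterwards (e.g. from a listener
    // thread) cannot invalidate the traversal.
    static int sumSnapshot(List<Integer> events) {
        List<Integer> snapshot = List.copyOf(events);
        int sum = 0;
        for (Integer e : snapshot) {
            events.add(0); // simulated late append; the snapshot is unaffected
            sum += e;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(failsWhenMutatedDuringIteration()); // true
        System.out.println(sumSnapshot(new ArrayList<>(List.of(1, 2, 3)))); // 6
    }
}
```

In the real test the snapshot would additionally need to be taken under the same synchronization the listener uses, since the append happens on a different thread.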
[jira] [Resolved] (SPARK-37312) Add `.java-version` to `.gitignore` and `.rat-excludes`
[ https://issues.apache.org/jira/browse/SPARK-37312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37312. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34577 > Add `.java-version` to `.gitignore` and `.rat-excludes` > --- > > Key: SPARK-37312 > URL: https://issues.apache.org/jira/browse/SPARK-37312 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37314) Upgrade kubernetes-client to 5.10.1
[ https://issues.apache.org/jira/browse/SPARK-37314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37314: --- Description: kubernetes-client 5.10.0 and 5.10.1 were released, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 was: kubernetes-client 5.10.0 and 5.10.1 were relased, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 > Upgrade kubernetes-client to 5.10.1 > --- > > Key: SPARK-37314 > URL: https://issues.apache.org/jira/browse/SPARK-37314 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > kubernetes-client 5.10.0 and 5.10.1 were released, which include some bug > fixes. > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 > Especially, the connection leak issue would affect Spark. > https://github.com/fabric8io/kubernetes-client/issues/3561 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37314) Upgrade kubernetes-client to 5.10.1
[ https://issues.apache.org/jira/browse/SPARK-37314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37314: --- Description: kubernetes-client 5.10.0 and 5.10.1 were relased, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 was: A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 > Upgrade kubernetes-client to 5.10.1 > --- > > Key: SPARK-37314 > URL: https://issues.apache.org/jira/browse/SPARK-37314 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > kubernetes-client 5.10.0 and 5.10.1 were relased, which include some bug > fixes. > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 > Especially, the connection leak issue would affect Spark. > https://github.com/fabric8io/kubernetes-client/issues/3561 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37314) Upgrade kubernetes-client to 5.10.1
[ https://issues.apache.org/jira/browse/SPARK-37314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37314: --- Description: A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which include some bug fixes. https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 Especially, the connection leak issue would affect Spark. https://github.com/fabric8io/kubernetes-client/issues/3561 was: A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which include some bug fixes. Especially, the connection leak issue would affect Spark. > Upgrade kubernetes-client to 5.10.1 > --- > > Key: SPARK-37314 > URL: https://issues.apache.org/jira/browse/SPARK-37314 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which > include some bug fixes. > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 > https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.1 > Especially, the connection leak issue would affect Spark. > https://github.com/fabric8io/kubernetes-client/issues/3561 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37314) Upgrade kubernetes-client to 5.10.1
Kousuke Saruta created SPARK-37314: -- Summary: Upgrade kubernetes-client to 5.10.1 Key: SPARK-37314 URL: https://issues.apache.org/jira/browse/SPARK-37314 Project: Spark Issue Type: Bug Components: Build, Kubernetes Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta A few days ago, kubernetes-client 5.10.0 and 5.10.1 are relased, which include some bug fixes. Especially, the connection leak issue would affect Spark. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37302) Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-37302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37302: --- Description: dev/run-tests.py fails if Scala 2.13 is used and guava or jetty-io is not in both the Maven and Coursier local repositories. {code:java} $ rm -rf ~/.m2/repository/* $ # For Linux $ rm -rf ~/.cache/coursier/v1/* $ # For macOS $ rm -rf ~/Library/Caches/Coursier/v1/* $ dev/change-scala-version.sh 2.13 $ dev/test-dependencies.sh $ build/sbt -Pscala-2.13 clean compile ... [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:24:1: error: package com.google.common.primitives does not exist [error] import com.google.common.primitives.Ints; [error]^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:30:1: error: package com.google.common.annotations does not exist [error] import com.google.common.annotations.VisibleForTesting; [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:31:1: error: package com.google.common.base does not exist [error] import com.google.common.base.Preconditions; ... {code} {code:java} [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:25: Class org.eclipse.jetty.io.ByteBufferPool not found - continuing with a stub. 
[error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:21: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.HttpConnectionFactory) [error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:207:13: Class org.eclipse.jetty.io.ClientConnectionFactory not found - continuing with a stub. 
[error] new HttpClient(new HttpClientTransportOverHTTP(numSelectors), null) [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:287:25: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.ConnectionFactory) [error] val connector = new ServerConnector( {code} The reason is that the exec-maven-plugin invocation in test-dependencies.sh downloads the poms of guava and jetty-io but not the corresponding jars, and the script skips dependency testing if Scala 2.13 is used (if dependency testing ran, Maven would download those jars). {code} if [[ "$SCALA_BINARY_VERSION" != "2.12" ]]; then # TODO(SPARK-36168) Support Scala 2.13 in dev/test-dependencies.sh echo "Skip dependency testing on $SCALA_BINARY_VERSION" exit 0 fi {code}
[jira] [Updated] (SPARK-37302) Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-37302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37302: --- Description: dev/run-tests.py fails if Scala 2.13 is used and guava or jetty-io is not in both the Maven and Coursier local repositories. {code:java} $ rm -rf ~/.m2/repository/* $ # For Linux $ rm -rf ~/.cache/coursier/v1/* $ # For macOS $ rm -rf ~/Library/Caches/Coursier/v1/* $ dev/change-scala-version.sh 2.13 $ dev/test-dependencies.sh $ build/sbt -Pscala-2.13 clean compile ... [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:24:1: error: package com.google.common.primitives does not exist [error] import com.google.common.primitives.Ints; [error]^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:30:1: error: package com.google.common.annotations does not exist [error] import com.google.common.annotations.VisibleForTesting; [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:31:1: error: package com.google.common.base does not exist [error] import com.google.common.base.Preconditions; ... {code} {code:java} [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:25: Class org.eclipse.jetty.io.ByteBufferPool not found - continuing with a stub. 
[error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:21: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.HttpConnectionFactory) [error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:207:13: Class org.eclipse.jetty.io.ClientConnectionFactory not found - continuing with a stub. 
[error] new HttpClient(new HttpClientTransportOverHTTP(numSelectors), null) [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:287:25: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.ConnectionFactory) [error] val connector = new ServerConnector( {code} The reason is that the exec-maven-plugin invocation in `test-dependencies.sh` downloads the poms of guava and jetty-io but not the corresponding jars. {code:java} $ find ~/.m2 -name "guava*" ... /home/kou/.m2/repository/com/google/guava/guava/14.0.1/guava-14.0.1.pom /home/kou/.m2/repository/com/google/guava/guava/14.0.1/guava-14.0.1.pom.sha1 ... /home/kou/.m2/repository/com/google/guava/guava-parent/14.0.1/guava-parent-14.0.1.pom /home/kou/.m2/repository/com/google/guava/gu
[jira] [Created] (SPARK-37302) Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh
Kousuke Saruta created SPARK-37302: -- Summary: Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh Key: SPARK-37302 URL: https://issues.apache.org/jira/browse/SPARK-37302 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta dev/run-tests.py fails if Scala 2.13 is used and guava or jetty-io is not in both the Maven and Coursier local repositories. {code} $ rm -rf ~/.m2/repository/* $ # For Linux $ rm -rf ~/.cache/coursier/v1/* $ # For macOS $ rm -rf ~/Library/Caches/Coursier/v1/* $ dev/change-scala-version.sh 2.13 $ dev/test-dependencies.sh $ build/sbt -Pscala-2.13 clean compile ... [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:24:1: error: package com.google.common.primitives does not exist [error] import com.google.common.primitives.Ints; [error]^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:30:1: error: package com.google.common.annotations does not exist [error] import com.google.common.annotations.VisibleForTesting; [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:31:1: error: package com.google.common.base does not exist [error] import com.google.common.base.Preconditions; ... {code} {code} [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:25: Class org.eclipse.jetty.io.ByteBufferPool not found - continuing with a stub. 
[error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:21: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.HttpConnectionFactory) [error] val connector = new ServerConnector( [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:207:13: Class org.eclipse.jetty.io.ClientConnectionFactory not found - continuing with a stub. 
[error] new HttpClient(new HttpClientTransportOverHTTP(numSelectors), null) [error] ^ [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:287:25: multiple constructors for ServerConnector with alternatives: [error] (x$1: org.eclipse.jetty.server.Server,x$2: java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.util.ssl.SslContextFactory,x$3: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector [error] cannot be invoked with (org.eclipse.jetty.server.Server, Null, org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, org.eclipse.jetty.server.ConnectionFactory) [error] val connector = new ServerConnector( {code} The reason is that the exec-maven-plugin invocation in `test-dependencies.sh` downloads the poms of guava and jetty-io but not the corresponding jars. {code} $ find ~/.m2 -name "guava*" ... /home/kou/.m2/repository/com/google/g
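[Editorial note] The remedy the issue title describes is to fetch the missing jars explicitly. A hedged sketch of that idea, not the actual patch to test-dependencies.sh: `mvn dependency:get` downloads the jar itself (not just the pom) into the local repository. The guava version matches the `find ~/.m2` output above; the jetty-io version is a placeholder and should be read from Spark's pom.xml.

```shell
# Build a "group:artifact:version" coordinate for mvn dependency:get.
coordinate() {
  echo "$1:$2:$3"
}

# guava 14.0.1 appears in the pom-only listing above; the jetty-io version
# here is a placeholder -- take the real one from pom.xml.
COORDS="$(coordinate com.google.guava guava 14.0.1)
$(coordinate org.eclipse.jetty jetty-io 9.4.43.v20210629)"

# Set RUN_FETCH=1 to actually invoke Maven; off by default so the sketch
# can run without a Maven installation or network access.
if [ "${RUN_FETCH:-0}" = "1" ]; then
  echo "$COORDS" | while read -r coord; do
    # dependency:get downloads the jar itself (not just the pom) into ~/.m2.
    mvn -q dependency:get -Dartifact="$coord"
  done
fi
echo "$COORDS"
```

Fetching the jars up front would make the later `build/sbt -Pscala-2.13 clean compile` independent of whether dependency testing ran and populated the cache as a side effect.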
[jira] [Created] (SPARK-37284) Upgrade Jekyll to 4.2.1
Kousuke Saruta created SPARK-37284: -- Summary: Upgrade Jekyll to 4.2.1 Key: SPARK-37284 URL: https://issues.apache.org/jira/browse/SPARK-37284 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Jekyll 4.2.1 was released in September, which includes the fix of a regression bug. https://github.com/jekyll/jekyll/releases/tag/v4.2.1 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
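[Editorial note] A docs toolchain bump like this is typically a one-line pin. A sketch only, assuming the Jekyll dependency is declared in a Gemfile under docs/ (the file location is an assumption; check the repository):

```
# docs/Gemfile (sketch): pin Jekyll to the 4.2.1 patch release, which fixes
# a regression introduced in 4.2.0 (see the release notes linked above).
source "https://rubygems.org"
gem "jekyll", "4.2.1"
```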
[jira] [Updated] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format
[ https://issues.apache.org/jira/browse/SPARK-37283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37283: --- Description: If a table being created contains a column of an ANSI interval type and the underlying file format has a corresponding Hive SerDe (e.g. Parquet), `HiveExternalCatalog` tries to store the table in a Hive-compatible format. But, as Spark's ANSI interval types and Hive's interval types are not compatible (Hive only supports interval_year_month and interval_day_time), the following warning with a stack trace is logged. {code} spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet; 21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist `default`.`tbl1` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format. org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Error: type expected at the position 0 of 'interval year to month' but 'interval year to month' is found. 
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551) at org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499) at org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376) at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:
[jira] [Created] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format
Kousuke Saruta created SPARK-37283: -- Summary: Don't try to store a V1 table which contains ANSI intervals in Hive compatible format Key: SPARK-37283 URL: https://issues.apache.org/jira/browse/SPARK-37283 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta If a table being created contains a column of an ANSI interval type and the underlying file format has a corresponding Hive SerDe (e.g. Parquet), `HiveExternalCatalog` tries to store the table in a Hive-compatible format. But, as Spark's ANSI interval types and Hive's interval types are not compatible (Hive only supports interval_year_month and interval_day_time), the following warning with a stack trace is logged. {code} spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet; 21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist `default`.`tbl1` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format. org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Error: type expected at the position 0 of 'interval year to month' but 'interval year to month' is found. 
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551) at org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499) at org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376) at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHel
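For context, the fallback described above can be sketched as the following hypothetical spark-sql session (a sketch only, not output from the patch): the CREATE TABLE triggers the warning, but the table is still persisted, just in Spark SQL specific format rather than a Hive-compatible one.

```sql
-- Hypothetical session illustrating the behavior described above.
CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet;  -- warning fires here
INSERT INTO tbl1 VALUES (INTERVAL '2-6' YEAR TO MONTH);
SELECT * FROM tbl1;  -- still readable/writable from Spark; Hive itself cannot read the table
```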
[jira] [Resolved] (SPARK-37264) [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37264. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34541 > [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from > orc-core > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > Like hadoop-common and hadoop-hdfs, this PR proposes to exclude > hadoop-client-api transitive dependency from orc-core. > Why are the changes needed? > Since Apache Hadoop 2.7 doesn't work on Java 17, Apache ORC has a dependency > on Hadoop 3.3.1. > This causes test-dependencies.sh failure on Java 17. As a result, > run-tests.py also fails. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
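The kind of dependency exclusion this change describes would typically look like the following pom.xml fragment (a sketch; the module placement and exact coordinates in Spark's build are assumptions, not copied from the PR):

```xml
<!-- Sketch of excluding the hadoop-client-api transitive dependency
     from orc-core; coordinates/version are assumptions. -->
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```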
[jira] [Updated] (SPARK-37264) [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Description: Like hadoop-common and hadoop-hdfs, this PR proposes to exclude hadoop-client-api transitive dependency from orc-core. Why are the changes needed? Since Apache Hadoop 2.7 doesn't work on Java 17, Apache ORC has a dependency on Hadoop 3.3.1. This causes test-dependencies.sh failure on Java 17. As a result, run-tests.py also fails. was: In the current master, `run-tests.py` fails on Java 17 due to `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Hadoop 2.7 doesn't support Java 17 so let's > [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from > orc-core > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Like hadoop-common and hadoop-hdfs, this PR proposes to exclude > hadoop-client-api transitive dependency from orc-core. > Why are the changes needed? > Since Apache Hadoop 2.7 doesn't work on Java 17, Apache ORC has a dependency > on Hadoop 3.3.1. > This causes test-dependencies.sh failure on Java 17. As a result, > run-tests.py also fails. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37264) Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Description: In the current master, `run-tests.py` fails on Java 17 due to `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Hadoop 2.7 doesn't support Java 17 so let's was: In the current master, `run-tests.py` fails on Java 17 due to `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Currently, we don't maintain the dependency manifests for Java 17 yet so let's skip it temporarily. > Cut the transitive dependency on hadoop-client-api which orc-shims depends on > only for Java 17 with hadoop-2.7 > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, `run-tests.py` fails on Java 17 due to > `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile > dependency on hadoop-client-api:3.3.1 only for Java 17. > Hadoop 2.7 doesn't support Java 17 so let's -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37264) [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Summary: [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core (was: Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7) > [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from > orc-core > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, `run-tests.py` fails on Java 17 due to > `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile > dependency on hadoop-client-api:3.3.1 only for Java 17. > Hadoop 2.7 doesn't support Java 17 so let's -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37264) Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Summary: Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7 (was: Skip dependency testing on Java 17 temporarily) > Cut the transitive dependency on hadoop-client-api which orc-shims depends on > only for Java 17 with hadoop-2.7 > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, `run-tests.py` fails on Java 17 due to > `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile > dependency on hadoop-client-api:3.3.1 only for Java 17. > Currently, we don't maintain the dependency manifests for Java 17 yet so > let's skip it temporarily. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37264) Skip dependency testing on Java 17 temporarily
[ https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37264: --- Description: In the current master, `run-tests.py` fails on Java 17 due to `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Currently, we don't maintain the dependency manifests for Java 17 yet so let's skip it temporarily. was: In the current master, test-dependencies.sh fails on Java 17 because orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Currently, we don't maintain the dependency manifests for Java 17 yet so let's skip it temporarily. > Skip dependency testing on Java 17 temporarily > -- > > Key: SPARK-37264 > URL: https://issues.apache.org/jira/browse/SPARK-37264 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > In the current master, `run-tests.py` fails on Java 17 due to > `test-dependencies.sh` fails. The cause is orc-shims:1.7.1 has a compile > dependency on hadoop-client-api:3.3.1 only for Java 17. > Currently, we don't maintain the dependency manifests for Java 17 yet so > let's skip it temporarily. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37265) Support Java 17 in `dev/test-dependencies.sh`
Kousuke Saruta created SPARK-37265: -- Summary: Support Java 17 in `dev/test-dependencies.sh` Key: SPARK-37265 URL: https://issues.apache.org/jira/browse/SPARK-37265 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37264) Skip dependency testing on Java 17 temporarily
Kousuke Saruta created SPARK-37264: -- Summary: Skip dependency testing on Java 17 temporarily Key: SPARK-37264 URL: https://issues.apache.org/jira/browse/SPARK-37264 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In the current master, test-dependencies.sh fails on Java 17 because orc-shims:1.7.1 has a compile dependency on hadoop-client-api:3.3.1 only for Java 17. Currently, we don't maintain the dependency manifests for Java 17 yet so let's skip it temporarily. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-36895) Add Create Index syntax support
[ https://issues.apache.org/jira/browse/SPARK-36895 ] Kousuke Saruta deleted comment on SPARK-36895: was (Author: sarutak): The change in https://github.com/apache/spark/pull/34148 was reverted and resolved again in https://github.com/apache/spark/pull/34523 > Add Create Index syntax support > --- > > Key: SPARK-36895 > URL: https://issues.apache.org/jira/browse/SPARK-36895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36895) Add Create Index syntax support
[ https://issues.apache.org/jira/browse/SPARK-36895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440696#comment-17440696 ] Kousuke Saruta commented on SPARK-36895: The change in https://github.com/apache/spark/pull/34148 was reverted and resolved again in https://github.com/apache/spark/pull/34523 > Add Create Index syntax support > --- > > Key: SPARK-36895 > URL: https://issues.apache.org/jira/browse/SPARK-36895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37240) Cannot read partitioned parquet files with ANSI interval partition values
[ https://issues.apache.org/jira/browse/SPARK-37240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37240. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34517 > Cannot read partitioned parquet files with ANSI interval partition values > - > > Key: SPARK-37240 > URL: https://issues.apache.org/jira/browse/SPARK-37240 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > The code below demonstrates the issue: > {code:scala} > scala> sql("SELECT INTERVAL '1' YEAR AS i, 0 as > id").write.partitionBy("i").parquet("/Users/maximgekk/tmp/ansi_interval_parquet") > scala> spark.read.schema("i INTERVAL YEAR, id > INT").parquet("/Users/maximgekk/tmp/ansi_interval_parquet").show(false) > 21/11/08 10:56:36 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.RuntimeException: DataType INTERVAL YEAR is not supported in column > vectorized reader. > at > org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:100) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:243) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
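Since the error above is raised specifically by the column-vectorized reader path, one conceivable mitigation on versions without the fix would be to fall back to the non-vectorized Parquet reader (a sketch: `spark.sql.parquet.enableVectorizedReader` is an existing Spark conf, but whether this avoided the error before the fix is an assumption, not verified here):

```scala
// Sketch only: disabling the vectorized reader is a guess at a workaround,
// not the fix that was merged in the PR above.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.read.schema("i INTERVAL YEAR, id INT")
  .parquet("/path/to/ansi_interval_parquet")
  .show(false)
```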
[jira] [Reopened] (SPARK-36038) Basic speculation metrics at stage level
[ https://issues.apache.org/jira/browse/SPARK-36038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta reopened SPARK-36038: Assignee: (was: Venkata krishnan Sowrirajan) The change was reverted. https://github.com/apache/spark/pull/34518 So I re-open this. > Basic speculation metrics at stage level > > > Key: SPARK-36038 > URL: https://issues.apache.org/jira/browse/SPARK-36038 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > Fix For: 3.3.0 > > > Currently there are no speculation metrics available either at application > level or at stage level. Within our platform, we have added speculation > metrics at stage level as a summary, similar to the stage level metrics, > tracking numTotalSpeculated, numCompleted (successful), numFailed, numKilled, > etc. This enables us to effectively understand the speculative execution feature > at an application level and helps in further tuning the speculation configs. > cc [~ron8hu] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37158) Add doc about spark not supported hive built-in function
[ https://issues.apache.org/jira/browse/SPARK-37158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37158. Resolution: Won't Fix See the discussion. https://github.com/apache/spark/pull/34434#issuecomment-954545315 > Add doc about spark not supported hive built-in function > > > Key: SPARK-37158 > URL: https://issues.apache.org/jira/browse/SPARK-37158 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Add doc about spark not supported hive built-in function -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37238) Upgrade ORC to 1.6.12
[ https://issues.apache.org/jira/browse/SPARK-37238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37238. Fix Version/s: 3.2.1 Assignee: Dongjoon Hyun Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34512 > Upgrade ORC to 1.6.12 > - > > Key: SPARK-37238 > URL: https://issues.apache.org/jira/browse/SPARK-37238 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.1 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-37211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37211. Fix Version/s: 3.3.0 Assignee: Yuto Akutsu Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34487 > More descriptions and adding an image to the failure message about enabling > GitHub Actions > -- > > Key: SPARK-37211 > URL: https://issues.apache.org/jira/browse/SPARK-37211 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Yuto Akutsu >Assignee: Yuto Akutsu >Priority: Minor > Fix For: 3.3.0 > > > I've seen and experienced that the build-and-test workflow of first-time PRs > fails and it was caused by developers forgetting to enable Github Actions on > their own repositories. > I think developers will be able to notice the cause quicker by adding more > descriptions and an image to the test-failure message. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37231) Dynamic writes/reads of ANSI interval partitions
[ https://issues.apache.org/jira/browse/SPARK-37231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37231. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34506 > Dynamic writes/reads of ANSI interval partitions > > > Key: SPARK-37231 > URL: https://issues.apache.org/jira/browse/SPARK-37231 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > Check and fix if it's needed dynamic partitions writes of ANSI intervals. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7
[ https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438986#comment-17438986 ] Kousuke Saruta commented on SPARK-35496: [~dongjoon] Thank you for letting me know. That's great. > Upgrade Scala 2.13 to 2.13.7 > > > Key: SPARK-35496 > URL: https://issues.apache.org/jira/browse/SPARK-35496 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > This issue aims to upgrade to Scala 2.13.7. > Scala 2.13.6 released(https://github.com/scala/scala/releases/tag/v2.13.6). > However, we skip 2.13.6 because there is a breaking behavior change at 2.13.6 > which is different from both Scala 2.13.5 and Scala 3. > - https://github.com/scala/bug/issues/12403 > {code} > scala3-3.0.0:$ bin/scala > scala> Array.empty[Double].intersect(Array(0.0)) > val res0: Array[Double] = Array() > scala-2.13.6:$ bin/scala > Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292). > Type in expressions for evaluation. Or try :help. > scala> Array.empty[Double].intersect(Array(0.0)) > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D > ... 32 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7
[ https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438535#comment-17438535 ] Kousuke Saruta commented on SPARK-35496: [~LuciferYang] Scala 2.13.7 was released a few days ago. https://github.com/scala/scala/releases/tag/v2.13.7 Would you like to continue to work on this? > Upgrade Scala 2.13 to 2.13.7 > > > Key: SPARK-35496 > URL: https://issues.apache.org/jira/browse/SPARK-35496 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > This issue aims to upgrade to Scala 2.13.7. > Scala 2.13.6 released(https://github.com/scala/scala/releases/tag/v2.13.6). > However, we skip 2.13.6 because there is a breaking behavior change at 2.13.6 > which is different from both Scala 2.13.5 and Scala 3. > - https://github.com/scala/bug/issues/12403 > {code} > scala3-3.0.0:$ bin/scala > scala> Array.empty[Double].intersect(Array(0.0)) > val res0: Array[Double] = Array() > scala-2.13.6:$ bin/scala > Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292). > Type in expressions for evaluation. Or try :help. > scala> Array.empty[Double].intersect(Array(0.0)) > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D > ... 32 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37206) Upgrade Avro to 1.11.0
Kousuke Saruta created SPARK-37206: -- Summary: Upgrade Avro to 1.11.0 Key: SPARK-37206 URL: https://issues.apache.org/jira/browse/SPARK-37206 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Recently, Avro 1.11.0 was released, which includes a bunch of bug fixes. https://issues.apache.org/jira/issues/?jql=project%3DAVRO%20AND%20fixVersion%3D1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37108) Expose make_date expression in R
[ https://issues.apache.org/jira/browse/SPARK-37108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37108. Fix Version/s: 3.3.0 Assignee: Leona Yoda Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34480 > Expose make_date expression in R > > > Key: SPARK-37108 > URL: https://issues.apache.org/jira/browse/SPARK-37108 > Project: Spark > Issue Type: Improvement > Components: R >Affects Versions: 3.3.0 >Reporter: Leona Yoda >Assignee: Leona Yoda >Priority: Minor > Fix For: 3.3.0 > > > Expose make_date API on SparkR. > > (cf. https://issues.apache.org/jira/browse/SPARK-36554) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37159. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34425 > Change HiveExternalCatalogVersionsSuite to be able to test with Java 17 > --- > > Key: SPARK-37159 > URL: https://issues.apache.org/jira/browse/SPARK-37159 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > SPARK-37105 seems to have fixed most of tests in `sql/hive` for Java 17 but > `HiveExternalCatalogVersionsSuite`. > {code} > [info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED > *** (42 seconds, 526 milliseconds) > [info] spark-submit returned with exit code 1. > [info] Command line: > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test-spark-d86af275-0c40-4b47-9cab-defa92a5ffa7/spark-3.2.0/bin/spark-submit' > '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' > 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' > 'spark.sql.hive.metastore.version=2.3' '--conf' > 'spark.sql.hive.metastore.jars=maven' '--conf' > 'spark.sql.warehouse.dir=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' > '-Dderby.system.home=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test15166225869206697603.py' > [info] > [info] 2021-10-28 06:07:18.486 - stderr> Using Spark's default 
log4j > profile: org/apache/spark/log4j-defaults.properties > [info] 2021-10-28 06:07:18.49 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Running Spark version 3.2.0 > [info] 2021-10-28 06:07:18.537 - stderr> 21/10/28 22:07:18 WARN > NativeCodeLoader: Unable to load native-hadoop library for your platform... > using builtin-java classes where applicable > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: No custom resources configured for spark.driver. > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.617 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Submitted application: prepare testing tables > [info] 2021-10-28 06:07:18.632 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Default ResourceProfile created, executor resources: > Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: > memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: > 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Limiting resource is cpu > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfileManager: Added ResourceProfile id: 0 > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls to: kou > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls to: kou > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: SecurityManager: authentication disabled; ui acls disabled; > users 
with view permissions: Set(kou); groups with view permissions: Set(); > users with modify permissions: Set(kou); groups with modify permissions: > Set() > [info] 2021-10-28 06:07:18.886 - stderr> 21/10/28 22:07:18 INFO Utils: > Successfully started service 'sparkDriver' on port 35867. &
[jira] [Resolved] (SPARK-36554) Error message while trying to use spark sql functions directly on dataframe columns without using select expression
[ https://issues.apache.org/jira/browse/SPARK-36554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36554. Fix Version/s: 3.3.0 Assignee: Nicolas Azrak Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34356 > Error message while trying to use spark sql functions directly on dataframe > columns without using select expression > --- > > Key: SPARK-36554 > URL: https://issues.apache.org/jira/browse/SPARK-36554 > Project: Spark > Issue Type: Bug > Components: Documentation, Examples, PySpark >Affects Versions: 3.1.1 >Reporter: Lekshmi Ramachandran >Assignee: Nicolas Azrak >Priority: Minor > Labels: documentation, features, functions, spark-sql > Fix For: 3.3.0 > > Attachments: Screen Shot .png > > Original Estimate: 24h > Remaining Estimate: 24h > > The below code generates a dataframe successfully . Here make_date function > is used inside a select expression > > from pyspark.sql.functions import expr, make_date > df = spark.createDataFrame([(2020, 6, 26), (1000, 2, 29), (-44, 1, 1)],['Y', > 'M', 'D']) > df.select("*",expr("make_date(Y,M,D) as lk")).show() > > The below code fails with a message "cannot import name 'make_date' from > 'pyspark.sql.functions'" . Here the make_date function is directly called on > dataframe columns without select expression > > from pyspark.sql.functions import make_date > df = spark.createDataFrame([(2020, 6, 26), (1000, 2, 29), (-44, 1, 1)],['Y', > 'M', 'D']) > df.select(make_date(df.Y,df.M,df.D).alias("datefield")).show() > > The error message generated is misleading when it says "cannot import > make_date from pyspark.sql.functions" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37170) Pin PySpark version installed in the Binder environment for tagged commit
[ https://issues.apache.org/jira/browse/SPARK-37170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37170: --- Summary: Pin PySpark version installed in the Binder environment for tagged commit (was: Pin PySpark version for Binder) > Pin PySpark version installed in the Binder environment for tagged commit > - > > Key: SPARK-37170 > URL: https://issues.apache.org/jira/browse/SPARK-37170 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > I noticed that the PySpark 3.1.2 is installed in the live notebook > environment even though the notebook is for PySpark 3.2.0. > http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html > I guess someone accessed to Binder and built the container image with v3.2.0 > before we published the pyspark package to PyPi. > https://mybinder.org/ > I think it's difficult to rebuild the image manually. > To avoid such accident, I'll propose to pin the version of PySpark in > binder/postBuild > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37170) Pin PySpark version for Binder
[ https://issues.apache.org/jira/browse/SPARK-37170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37170: --- Description: I noticed that the PySpark 3.1.2 is installed in the live notebook environment even though the notebook is for PySpark 3.2.0. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild was: I noticed that the PySpark 3.1.2 is installed in the live notebook environment even though the notebook is for PySpark 3.2. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild > Pin PySpark version for Binder > -- > > Key: SPARK-37170 > URL: https://issues.apache.org/jira/browse/SPARK-37170 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > I noticed that the PySpark 3.1.2 is installed in the live notebook > environment even though the notebook is for PySpark 3.2.0. > http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html > I guess someone accessed to Binder and built the container image with v3.2.0 > before we published the pyspark package to PyPi. > https://mybinder.org/ > I think it's difficult to rebuild the image manually. 
> To avoid such accident, I'll propose to pin the version of PySpark in > binder/postBuild > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37170) Pin PySpark version for Binder
[ https://issues.apache.org/jira/browse/SPARK-37170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37170: --- Description: I noticed that the PySpark 3.1.2 is installed in the live notebook environment even though the notebook is for PySpark 3.2. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild was: I noticed that the PySpark 3.1.2 is installed in the environment of live notebook even though the notebook is for PySpark 3.2. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild > Pin PySpark version for Binder > -- > > Key: SPARK-37170 > URL: https://issues.apache.org/jira/browse/SPARK-37170 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > I noticed that the PySpark 3.1.2 is installed in the live notebook > environment even though the notebook is for PySpark 3.2. > http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html > I guess someone accessed to Binder and built the container image with v3.2.0 > before we published the pyspark package to PyPi. > https://mybinder.org/ > I think it's difficult to rebuild the image manually. 
> To avoid such accident, I'll propose to pin the version of PySpark in > binder/postBuild > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37170) Pin PySpark version for Binder
Kousuke Saruta created SPARK-37170: -- Summary: Pin PySpark version for Binder Key: SPARK-37170 URL: https://issues.apache.org/jira/browse/SPARK-37170 Project: Spark Issue Type: Bug Components: docs, PySpark Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta I noticed that the PySpark 3.1.2 is installed in the environment of live notebook even though the notebook is for PySpark 3.2. http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html I guess someone accessed to Binder and built the container image with v3.2.0 before we published the pyspark package to PyPi. https://mybinder.org/ I think it's difficult to rebuild the image manually. To avoid such accident, I'll propose to pin the version of PySpark in binder/postBuild -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
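The proposal above can be sketched as follows. This is an illustration only: binder/postBuild is a shell script run while Binder builds the image, the exact version string would come from the tagged commit being built, and the helper name pin_line is invented here.

```python
def pin_line(version: str) -> str:
    # Pin the exact release so a Binder image built from a tagged commit
    # cannot silently install whatever PySpark happens to be newest on PyPI.
    return f"pip install pyspark=={version}"

print(pin_line("3.2.0"))   # pip install pyspark==3.2.0
```

An exact pin (==) rather than a lower bound is the point here: the stale-image accident described above happened precisely because the installed version could drift from the documented one.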
[jira] [Created] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
Kousuke Saruta created SPARK-37159: -- Summary: Change HiveExternalCatalogVersionsSuite to be able to test with Java 17 Key: SPARK-37159 URL: https://issues.apache.org/jira/browse/SPARK-37159 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta SPARK-37105 seems to have fixed most of tests in `sql/hive` for Java 17 but `HiveExternalCatalogVersionsSuite`. {code} [info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** (42 seconds, 526 milliseconds) [info] spark-submit returned with exit code 1. [info] Command line: '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test-spark-d86af275-0c40-4b47-9cab-defa92a5ffa7/spark-3.2.0/bin/spark-submit' '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' 'spark.sql.hive.metastore.version=2.3' '--conf' 'spark.sql.hive.metastore.jars=maven' '--conf' 'spark.sql.warehouse.dir=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' '-Dderby.system.home=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test15166225869206697603.py' [info] [info] 2021-10-28 06:07:18.486 - stderr> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties [info] 2021-10-28 06:07:18.49 - stderr> 21/10/28 22:07:18 INFO SparkContext: Running Spark version 3.2.0 [info] 2021-10-28 06:07:18.537 - stderr> 21/10/28 22:07:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: == [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: No custom resources configured for spark.driver. [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: == [info] 2021-10-28 06:07:18.617 - stderr> 21/10/28 22:07:18 INFO SparkContext: Submitted application: prepare testing tables [info] 2021-10-28 06:07:18.632 - stderr> 21/10/28 22:07:18 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO ResourceProfile: Limiting resource is cpu [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO ResourceProfileManager: Added ResourceProfile id: 0 [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing view acls to: kou [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing modify acls to: kou [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing view acls groups to: [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing modify acls groups to: [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kou); groups with view permissions: Set(); users with modify permissions: Set(kou); groups with modify permissions: Set() [info] 2021-10-28 06:07:18.886 - stderr> 21/10/28 22:07:18 INFO Utils: Successfully started service 'sparkDriver' on port 35867. 
[info] 2021-10-28 06:07:18.906 - stderr> 21/10/28 22:07:18 INFO SparkEnv: Registering MapOutputTracker [info] 2021-10-28 06:07:18.93 - stderr> 21/10/28 22:07:18 INFO SparkEnv: Registering BlockManagerMaster [info] 2021-10-28 06:07:18.943 - stderr> 21/10/28 22:07:18 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information [info] 2021-10-28 06:07:18.944 - stderr> 21/10/28 22:07:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up [info] 2021-10-28 06:07:18.945 - stdout> Traceback (most recent call last): [info] 2021-10-28 06:07:18
[jira] [Created] (SPARK-37112) Fix MiMa failure with Scala 2.13
Kousuke Saruta created SPARK-37112: -- Summary: Fix MiMa failure with Scala 2.13 Key: SPARK-37112 URL: https://issues.apache.org/jira/browse/SPARK-37112 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta SPARK-36151 re-enabled MiMa for Scala 2.13 but it always fails in the scheduled build. https://github.com/apache/spark/runs/3992588994?check_suite_focus=true#step:9:2303
[jira] [Updated] (SPARK-37103) Switch from Maven to SBT to build Spark on AppVeyor
[ https://issues.apache.org/jira/browse/SPARK-37103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37103: --- Description: Recently, building Spark on AppVeyor almost always fails due to StackOverflowError at compile time. We can't identify the reason so far but one workaround would be building with SBT. was: Recently, building Spark on AppVeyor almost always fails due to StackOverflowError at compile time. We can't identify the reason so far but one workaround would be building with SBT. > Switch from Maven to SBT to build Spark on AppVeyor > --- > > Key: SPARK-37103 > URL: https://issues.apache.org/jira/browse/SPARK-37103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > Recently, building Spark on AppVeyor almost always fails due to > StackOverflowError at compile time. > We can't identify the reason so far but one workaround would be building with > SBT. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37103) Switch from Maven to SBT to build Spark on AppVeyor
Kousuke Saruta created SPARK-37103: -- Summary: Switch from Maven to SBT to build Spark on AppVeyor Key: SPARK-37103 URL: https://issues.apache.org/jira/browse/SPARK-37103 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Recently, building Spark on AppVeyor almost always fails due to a StackOverflowError at compile time. We haven't identified the cause so far, but one workaround is to build with SBT.
[jira] [Updated] (SPARK-37086) Fix the R test of FPGrowthModel for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37086: --- Description: Similar to the issue filed in SPARK-37059, the R test of FPGrowthModel assumes that the result records returned by FPGrowthModel.freqItemsets are sorted by a certain kind of order but it's wrong. As a result, the test fails with Scala 2.13. {code} ══ Failed ══ ── 1. Failure (test_mllib_fpm.R:42:3): spark.fpGrowth ── `expected_itemsets` not equivalent to `itemsets`. Component “items”: Component 1: Component 1: 1 string mismatch Component “items”: Component 2: Length mismatch: comparison on first 1 components Component “items”: Component 2: Component 1: 1 string mismatch Component “items”: Component 3: Length mismatch: comparison on first 1 components Component “items”: Component 4: Length mismatch: comparison on first 1 components Component “items”: Component 4: Component 1: 1 string mismatch Component “items”: Component 5: Length mismatch: comparison on first 1 components Component “items”: Component 5: Component 1: 1 string mismatch Component “freq”: Mean relative difference: 0.5454545 {code} was: Similar to the issue filed in SPARK-37059, an R test of FPGrowthModel assumes that the result records returned by FPGrowthModel.freqItemsets are sorted by a certain kind of order but it's wrong. As a result, such tests fail with Scala 2.13. {code} ══ Failed ══ ── 1. Failure (test_mllib_fpm.R:42:3): spark.fpGrowth ── `expected_itemsets` not equivalent to `itemsets`. 
Component “items”: Component 1: Component 1: 1 string mismatch Component “items”: Component 2: Length mismatch: comparison on first 1 components Component “items”: Component 2: Component 1: 1 string mismatch Component “items”: Component 3: Length mismatch: comparison on first 1 components Component “items”: Component 4: Length mismatch: comparison on first 1 components Component “items”: Component 4: Component 1: 1 string mismatch Component “items”: Component 5: Length mismatch: comparison on first 1 components Component “items”: Component 5: Component 1: 1 string mismatch Component “freq”: Mean relative difference: 0.5454545 {code} > Fix the R test of FPGrowthModel for Scala 2.13 > -- > > Key: SPARK-37086 > URL: https://issues.apache.org/jira/browse/SPARK-37086 > Project: Spark > Issue Type: Bug > Components: ML, R, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > Similar to the issue filed in SPARK-37059, the R test of FPGrowthModel > assumes that the result records returned by FPGrowthModel.freqItemsets are > sorted by a certain kind of order but it's wrong. > As a result, the test fails with Scala 2.13. > {code} > ══ Failed > ══ > ── 1. Failure (test_mllib_fpm.R:42:3): spark.fpGrowth > ── > `expected_itemsets` not equivalent to `itemsets`. 
> Component “items”: Component 1: Component 1: 1 string mismatch > Component “items”: Component 2: Length mismatch: comparison on first 1 > components > Component “items”: Component 2: Component 1: 1 string mismatch > Component “items”: Component 3: Length mismatch: comparison on first 1 > components > Component “items”: Component 4: Length mismatch: comparison on first 1 > components > Component “items”: Component 4: Component 1: 1 string mismatch > Component “items”: Component 5: Length mismatch: comparison on first 1 > components > Component “items”: Component 5: Component 1: 1 string mismatch > Component “freq”: Mean relative difference: 0.5454545 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37086) Fix the R test of FPGrowthModel for Scala 2.13
Kousuke Saruta created SPARK-37086: -- Summary: Fix the R test of FPGrowthModel for Scala 2.13 Key: SPARK-37086 URL: https://issues.apache.org/jira/browse/SPARK-37086 Project: Spark Issue Type: Bug Components: ML, R, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Similar to the issue filed in SPARK-37059, an R test of FPGrowthModel assumes that the result records returned by FPGrowthModel.freqItemsets are sorted by a certain kind of order but it's wrong. As a result, such tests fail with Scala 2.13. {code} ══ Failed ══ ── 1. Failure (test_mllib_fpm.R:42:3): spark.fpGrowth ── `expected_itemsets` not equivalent to `itemsets`. Component “items”: Component 1: Component 1: 1 string mismatch Component “items”: Component 2: Length mismatch: comparison on first 1 components Component “items”: Component 2: Component 1: 1 string mismatch Component “items”: Component 3: Length mismatch: comparison on first 1 components Component “items”: Component 4: Length mismatch: comparison on first 1 components Component “items”: Component 4: Component 1: 1 string mismatch Component “items”: Component 5: Length mismatch: comparison on first 1 components Component “items”: Component 5: Component 1: 1 string mismatch Component “freq”: Mean relative difference: 0.5454545 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
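The fix pattern behind the failing R test above, sketched in Python for illustration (the actual fix is in the R test suite): freqItemsets guarantees neither row order nor item order, so a test should compare a canonicalized form rather than positional output. The helper name canonical is invented for this sketch.

```python
def canonical(itemsets):
    """Sort items within each itemset, then sort the itemsets themselves,
    so two runs that differ only in ordering compare equal."""
    return sorted(tuple(sorted(s)) for s in itemsets)

run_a = [["b", "a"], ["c"]]   # row/item order as one Scala version might return it
run_b = [["c"], ["a", "b"]]   # the same frequent itemsets in a different order
assert canonical(run_a) == canonical(run_b)
```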
[jira] [Created] (SPARK-37081) Upgrade the version of RDBMS and corresponding JDBC drivers used by docker-integration-tests
Kousuke Saruta created SPARK-37081: -- Summary: Upgrade the version of RDBMS and corresponding JDBC drivers used by docker-integration-tests Key: SPARK-37081 URL: https://issues.apache.org/jira/browse/SPARK-37081 Project: Spark Issue Type: Improvement Components: SQL, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Let's upgrade the versions of the RDBMSs and the corresponding JDBC drivers. In particular, PostgreSQL 14 was released recently, so it's worth ensuring that the JDBC source for PostgreSQL works with PostgreSQL 14.
[jira] [Updated] (SPARK-37076) Implement StructType.toString explicitly for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37076: --- Summary: Implement StructType.toString explicitly for Scala 2.13 (was: Implements StructType.toString explicitly for Scala 2.13) > Implement StructType.toString explicitly for Scala 2.13 > --- > > Key: SPARK-37076 > URL: https://issues.apache.org/jira/browse/SPARK-37076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 > Environment: > Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The string returned by StructType.toString is different between Scala 2.12 > and 2.13. > * Scala 2.12 > {code} > val st = StructType(StructField("a", IntegerType) :: Nil) > st.toString > res0: String = StructType(StructField(a,IntegerType,true) > {code} > * Scala 2.13 > {code} > val st = StructType(StructField("a", IntegerType) :: Nil) > st.toString > val res0: String = Seq(StructField(a,IntegerType,true)) > {code} > It's because the logic to make the prefix of the string was changed from > Scala 2.13. > Scala 2.12: > https://github.com/scala/scala/blob/v2.12.15/src/library/scala/collection/TraversableLike.scala#L804 > Scala > 2:13:https://github.com/scala/scala/blob/v2.13.5/src/library/scala/collection/Seq.scala#L46 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37076) Implements StructType.toString explicitly for Scala 2.13
Kousuke Saruta created SPARK-37076: -- Summary: Implements StructType.toString explicitly for Scala 2.13 Key: SPARK-37076 URL: https://issues.apache.org/jira/browse/SPARK-37076 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Environment: Reporter: Kousuke Saruta Assignee: Kousuke Saruta The string returned by StructType.toString is different between Scala 2.12 and 2.13. * Scala 2.12 {code} val st = StructType(StructField("a", IntegerType) :: Nil) st.toString res0: String = StructType(StructField(a,IntegerType,true)) {code} * Scala 2.13 {code} val st = StructType(StructField("a", IntegerType) :: Nil) st.toString val res0: String = Seq(StructField(a,IntegerType,true)) {code} This is because the logic that builds the prefix of the string changed in Scala 2.13. Scala 2.12: https://github.com/scala/scala/blob/v2.12.15/src/library/scala/collection/TraversableLike.scala#L804 Scala 2.13: https://github.com/scala/scala/blob/v2.13.5/src/library/scala/collection/Seq.scala#L46
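The direction of the fix described above, sketched in Python for illustration (the real change is in Scala): give the type its own explicit string rendering instead of inheriting the collection's, so the output no longer depends on how the standard library renders sequences. These class definitions are simplified stand-ins, not Spark's actual API.

```python
class StructField:
    def __init__(self, name, dtype, nullable=True):
        self.name, self.dtype, self.nullable = name, dtype, nullable

    def __repr__(self):
        return f"StructField({self.name},{self.dtype},{str(self.nullable).lower()})"

class StructType(list):
    # Explicit __repr__: relying on list's inherited rendering is the Python
    # analogue of the Seq.toString prefix that changed in Scala 2.13.
    def __repr__(self):
        return f"StructType({','.join(map(repr, self))})"

st = StructType([StructField("a", "IntegerType")])
print(st)   # StructType(StructField(a,IntegerType,true))
```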
[jira] [Resolved] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37059. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34330 > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrowthModel.freqItemsets also doesn't ensure the sort order > of the result rows. > Nevertheless, their PySpark doctests assume a certain kind of sort order, > causing such doctests to fail with Scala 2.13.
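The fix pattern here, sketched in Python (an illustration, not the actual patch): wrap the unordered value in sorted() inside the doctest, so the printed output is deterministic whatever order the engine happens to return.

```python
def show_set(values):
    """Render a set-valued result (e.g. one collect_set row) in a fixed
    order so a doctest can safely assert on the printed output."""
    return sorted(values)

# Whatever order the engine produced, the doctest output is stable:
assert show_set(["b", "c", "a"]) == ["a", "b", "c"]
assert show_set(["c", "a", "b"]) == ["a", "b", "c"]
```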
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Description: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, their PySpark doctests assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. was: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, their doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order > of the result rows. > Nevertheless, their PySpark doctests assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Description: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. was: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order > of the result rows. > Nevertheless, thier doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Description: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, their doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. was: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrouthModel.freqItemsets also doesn't ensure the sort order > of the result rows. > Nevertheless, their doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Description: The collect_set builtin function doesn't ensure the sort order of its result for each row. FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. was: The collect_set builtin function doesn't ensure the sort order of its result. FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result > for each row. FPGrouthModel.freqItemsets also doesn' ensure the sort order of > the result rows. > Nevertheless, thier doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Component/s: Tests > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result. > FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result > rows. > Nevertheless, thier doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37059) Ensure the sort order of the output in the PySpark doctests
[ https://issues.apache.org/jira/browse/SPARK-37059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37059: --- Summary: Ensure the sort order of the output in the PySpark doctests (was: Ensure the sort order of the output in the PySpark examples) > Ensure the sort order of the output in the PySpark doctests > --- > > Key: SPARK-37059 > URL: https://issues.apache.org/jira/browse/SPARK-37059 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > The collect_set builtin function doesn't ensure the sort order of its result. > FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result > rows. > Nevertheless, thier doctests for PySpark assume a certain kind of sort order, > causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37059) Ensure the sort order of the output in the PySpark examples
Kousuke Saruta created SPARK-37059: -- Summary: Ensure the sort order of the output in the PySpark examples Key: SPARK-37059 URL: https://issues.apache.org/jira/browse/SPARK-37059 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The collect_set builtin function doesn't ensure the sort order of its result. FPGrouthModel.freqItemsets also doesn' ensure the sort order of the result rows. Nevertheless, thier doctests for PySpark assume a certain kind of sort order, causing that such doctests fail with Scala 2.13. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Component/s: Build > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: Build, ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Summary: Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 (was: Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13) > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Summary: Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13 (was: ResolvedRFormula.toString throws ClassCastException with Scala 2.13) > Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Description: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. (was: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.collection.immutable.Seq[scala.collection.imutable.Seq[String]] but scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed.) > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 > --- > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
Kousuke Saruta created SPARK-37026: -- Summary: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 Key: SPARK-37026 URL: https://issues.apache.org/jira/browse/SPARK-37026 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.collection.immutable.Seq[scala.collection.immutable.Seq[String]] but scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Issue Type: Bug (was: Improvement) > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 > --- > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is > scala.collection.immutable.Seq[scala.collection.immutable.Seq[String]] but > scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will > be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36922) The SIGN/SIGNUM functions should support ANSI intervals
[ https://issues.apache.org/jira/browse/SPARK-36922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36922. Fix Version/s: 3.3.0 Assignee: PengLei Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34256 > The SIGN/SIGNUM functions should support ANSI intervals > --- > > Key: SPARK-36922 > URL: https://issues.apache.org/jira/browse/SPARK-36922 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: PengLei >Priority: Major > Fix For: 3.3.0 > > > Extend the *sign/signum* functions to support ANSI intervals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value
[ https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36993: --- Summary: Fix json_tuple throw NPE if fields exist no foldable null value (was: Fix json_tupe throw NPE if fields exist no foldable null value) > Fix json_tuple throw NPE if fields exist no foldable null value > --- > > Key: SPARK-36993 > URL: https://issues.apache.org/jira/browse/SPARK-36993 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: XiDuo You >Priority: Major > > If json_tuple has a non-foldable null field, Spark throws an NPE while > evaluating field.toString. > e.g. the query will fail with: > {code:java} > SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS > c1 ); > {code} > {code:java} > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435) > at > org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
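The crash comes from stringifying a field expression that evaluated to null at runtime (a non-foldable null, so it cannot be eliminated at analysis time). The fix pattern can be sketched in plain Python rather than the actual Scala JsonTuple code; parse_row_fields is an illustrative helper, not Spark's implementation:

```python
def parse_row_fields(fields):
    # Analogue of JsonTuple.parseRow mapping each requested field
    # through field.toString: the None guard skips null fields, which
    # mirrors null-checking the field before calling toString so a
    # runtime null no longer crashes the whole row.
    return [str(f) for f in fields if f is not None]
```

Without the guard, the null field reaches the string conversion only at evaluation time, which is exactly why the bug escapes constant folding and surfaces as an NPE mid-query.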
[jira] [Resolved] (SPARK-36981) Upgrade joda-time to 2.10.12
[ https://issues.apache.org/jira/browse/SPARK-36981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36981. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34253 > Upgrade joda-time to 2.10.12 > > > Key: SPARK-36981 > URL: https://issues.apache.org/jira/browse/SPARK-36981 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > joda-time 2.10.12 seems to support an updated TZDB. > https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 > https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36972) Add max_by/min_by API to PySpark
[ https://issues.apache.org/jira/browse/SPARK-36972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36972. Fix Version/s: 3.3.0 Assignee: Leona Yoda Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34240 > Add max_by/min_by API to PySpark > > > Key: SPARK-36972 > URL: https://issues.apache.org/jira/browse/SPARK-36972 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Leona Yoda >Assignee: Leona Yoda >Priority: Minor > Fix For: 3.3.0 > > > Related issues > - https://issues.apache.org/jira/browse/SPARK-27653 > * https://issues.apache.org/jira/browse/SPARK-36963 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36981) Upgrade joda-time to 2.10.12
[ https://issues.apache.org/jira/browse/SPARK-36981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36981: --- Description: joda-time 2.10.12 seems to support an updated TZDB. https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 was: joda-time 2.10.12 seems to support the updated TZDB. https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 > Upgrade joda-time to 2.10.12 > > > Key: SPARK-36981 > URL: https://issues.apache.org/jira/browse/SPARK-36981 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Minor > > joda-time 2.10.12 seems to support an updated TZDB. > https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 > https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36981) Upgrade joda-time to 2.10.12
Kousuke Saruta created SPARK-36981: -- Summary: Upgrade joda-time to 2.10.12 Key: SPARK-36981 URL: https://issues.apache.org/jira/browse/SPARK-36981 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta joda-time 2.10.12 seems to support the updated TZDB. https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12 https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36960) Pushdown filters with ANSI interval values to ORC
Kousuke Saruta created SPARK-36960: -- Summary: Pushdown filters with ANSI interval values to ORC Key: SPARK-36960 URL: https://issues.apache.org/jira/browse/SPARK-36960 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Now that V1 and V2 ORC datasources support ANSI intervals, it would be great to also be able to push down filters with ANSI interval values. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36937) Change OrcSourceSuite to test both V1 and V2 sources.
[ https://issues.apache.org/jira/browse/SPARK-36937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36937: --- Summary: Change OrcSourceSuite to test both V1 and V2 sources. (was: Re-structure OrcSourceSuite to test both V1 and V2 sources.) > Change OrcSourceSuite to test both V1 and V2 sources. > - > > Key: SPARK-36937 > URL: https://issues.apache.org/jira/browse/SPARK-36937 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > There is no V2 test for the ORC source which implements > CommonFileDataSourceSuite while the corresponding ones exist for all other > built-in file-based datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36937) Re-structure OrcSourceSuite to test both V1 and V2 sources.
Kousuke Saruta created SPARK-36937: -- Summary: Re-structure OrcSourceSuite to test both V1 and V2 sources. Key: SPARK-36937 URL: https://issues.apache.org/jira/browse/SPARK-36937 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta There is no V2 test for the ORC source which implements CommonFileDataSourceSuite while the corresponding ones exist for all other built-in file-based datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36937) Re-structure OrcSourceSuite to test both V1 and V2 sources.
[ https://issues.apache.org/jira/browse/SPARK-36937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36937: --- Issue Type: Improvement (was: Bug) > Re-structure OrcSourceSuite to test both V1 and V2 sources. > --- > > Key: SPARK-36937 > URL: https://issues.apache.org/jira/browse/SPARK-36937 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta >Priority: Major > > There is no V2 test for the ORC source which implements > CommonFileDataSourceSuite while the corresponding ones exist for all other > built-in file-based datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36931) Read/write dataframes with ANSI intervals from/to ORC files
[ https://issues.apache.org/jira/browse/SPARK-36931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36931: --- Summary: Read/write dataframes with ANSI intervals from/to ORC files (was: Read/write dataframes with ANSI intervals from/to parquet files) > Read/write dataframes with ANSI intervals from/to ORC files > --- > > Key: SPARK-36931 > URL: https://issues.apache.org/jira/browse/SPARK-36931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 > Reporter: Kousuke Saruta >Priority: Major > > Implement writing and reading ANSI intervals (year-month and day-time > intervals) columns in dataframes to ORC datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36931) Read/write dataframes with ANSI intervals from/to parquet files
Kousuke Saruta created SPARK-36931: -- Summary: Read/write dataframes with ANSI intervals from/to parquet files Key: SPARK-36931 URL: https://issues.apache.org/jira/browse/SPARK-36931 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Kousuke Saruta Implement writing and reading ANSI intervals (year-month and day-time intervals) columns in dataframes to ORC datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36038) Basic speculation metrics at stage level
[ https://issues.apache.org/jira/browse/SPARK-36038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36038. Fix Version/s: 3.3.0 Assignee: Venkata krishnan Sowrirajan Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/33253 > Basic speculation metrics at stage level > > > Key: SPARK-36038 > URL: https://issues.apache.org/jira/browse/SPARK-36038 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Venkata krishnan Sowrirajan >Assignee: Venkata krishnan Sowrirajan >Priority: Major > Fix For: 3.3.0 > > > Currently there are no speculation metrics available at either the application > level or the stage level. Within our platform, we have added speculation > metrics at the stage level as a summary, similar to the other stage-level metrics, > tracking numTotalSpeculated, numCompleted (successful), numFailed, numKilled, > etc. This enables us to effectively understand the speculative execution feature > at an application level and helps in further tuning the speculation configs. > cc [~ron8hu] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
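The counters the report describes can be sketched as a tiny per-stage summary. The class and field names below are hypothetical, patterned on the counters named in the issue, and are not Spark's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class StageSpeculationSummary:
    # Per-stage counters for speculatively launched task copies.
    num_total_speculated: int = 0
    num_completed: int = 0  # speculative copies that finished successfully
    num_failed: int = 0
    num_killed: int = 0     # copies killed, e.g. when the original task won

    def record(self, outcome: str) -> None:
        # Every recorded copy counts toward the total; the outcome
        # buckets it into one of the terminal states.
        self.num_total_speculated += 1
        if outcome == "completed":
            self.num_completed += 1
        elif outcome == "failed":
            self.num_failed += 1
        elif outcome == "killed":
            self.num_killed += 1
```

Aggregating these per stage, rather than only per application, is what lets the ratio of killed to completed speculative copies guide tuning of the speculation configs.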
[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422557#comment-17422557 ] Kousuke Saruta commented on SPARK-36861: Hmm, if a "T" follows the date part but it's not a valid ISO 8601 format, shouldn't casting the string to date fail? In PostgreSQL, parsing fails in such a case. > Partition columns are overly eagerly parsed as dates > > > Key: SPARK-36861 > URL: https://issues.apache.org/jira/browse/SPARK-36861 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Blocker > > I have an input directory with subdirs: > * hour=2021-01-01T00 > * hour=2021-01-01T01 > * hour=2021-01-01T02 > * ... > in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it > is parsed as date type and the hour part is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
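The stricter behavior the comment argues for can be sketched in plain Python: only infer a date type when the entire partition value parses as a date, so a trailing "T00" keeps the column a string. infer_partition_type is an illustrative helper, not Spark's partition-inference code:

```python
from datetime import datetime

def infer_partition_type(value: str) -> str:
    # Strict inference: strptime rejects any trailing characters
    # ("unconverted data remains"), so "2021-01-01T00" stays a string
    # instead of being truncated to the date 2021-01-01.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "string"
```

With this rule the 'hour' partition values from the report keep their hour component, matching the Spark 3.1 behavior the reporter expects.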
[jira] [Resolved] (SPARK-36899) Support ILIKE API on R
[ https://issues.apache.org/jira/browse/SPARK-36899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36899. Fix Version/s: 3.3.0 Assignee: Leona Yoda Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/34152 > Support ILIKE API on R > -- > > Key: SPARK-36899 > URL: https://issues.apache.org/jira/browse/SPARK-36899 > Project: Spark > Issue Type: Sub-task > Components: R >Affects Versions: 3.3.0 >Reporter: Leona Yoda >Assignee: Leona Yoda >Priority: Major > Fix For: 3.3.0 > > > Support ILIKE (case-insensitive LIKE) API on R -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
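ILIKE is LIKE with case folding. Its semantics can be sketched in plain Python by translating the SQL pattern to a regex and matching case-insensitively; ilike here is an illustrative helper, not the SparkR API:

```python
import re

def ilike(value: str, pattern: str) -> bool:
    # SQL LIKE wildcards: '%' matches any run of characters, '_' matches
    # exactly one; everything else is matched literally. re.IGNORECASE
    # supplies the case folding that distinguishes ILIKE from LIKE.
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c)
        for c in pattern
    )
    return re.fullmatch(regex, value, re.IGNORECASE) is not None
```

So `ilike("Spark", "sp%")` holds even though `LIKE 'sp%'` would not match "Spark", which is the whole point of the API.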
[jira] [Updated] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files
[ https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36830: --- Description: Implement writing and reading ANSI intervals (year-month and day-time intervals) columns in dataframes to JSON datasources. (was: Implement writing and reading ANSI intervals (year-month and day-time intervals) columns in dataframes to Parquet datasources.) > Read/write dataframes with ANSI intervals from/to JSON files > > > Key: SPARK-36830 > URL: https://issues.apache.org/jira/browse/SPARK-36830 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Implement writing and reading ANSI intervals (year-month and day-time > intervals) columns in dataframes to JSON datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files
[ https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422483#comment-17422483 ] Kousuke Saruta commented on SPARK-36830: Thank you, will do. > Read/write dataframes with ANSI intervals from/to JSON files > > > Key: SPARK-36830 > URL: https://issues.apache.org/jira/browse/SPARK-36830 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Implement writing and reading ANSI intervals (year-month and day-time > intervals) columns in dataframes to Parquet datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files
[ https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422108#comment-17422108 ] Kousuke Saruta commented on SPARK-36831: Thank you. I'll open a PR. > Read/write dataframes with ANSI intervals from/to CSV files > --- > > Key: SPARK-36831 > URL: https://issues.apache.org/jira/browse/SPARK-36831 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Implement writing and reading ANSI intervals (year-month and day-time > intervals) columns in dataframes to CSV datasources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org