[jira] [Commented] (SPARK-40422) Upgrade hive to 4.0.0
[ https://issues.apache.org/jira/browse/SPARK-40422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620774#comment-17620774 ] Bilna commented on SPARK-40422: --- [~srowen] In the mvn dependency tree I can see google-gson is coming through Apache Hive. That is the reason I requested the Hive version upgrade. Can you please tell me which JIRA fixed the GSON version? > Upgrade hive to 4.0.0 > - > > Key: SPARK-40422 > URL: https://issues.apache.org/jira/browse/SPARK-40422 > Project: Spark > Issue Type: Dependency upgrade > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bilna >Priority: Major > > Upgrade hive to 4.0.0 to avoid security vulnerability CVE-2022-25647 through > google-gson:2.2.4. In hive:4.0.0, google-gson is upgraded to 2.8.9, for which > no CVE has been reported yet. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40457) upgrade jackson data mapper to latest
[ https://issues.apache.org/jira/browse/SPARK-40457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620766#comment-17620766 ] Bilna commented on SPARK-40457: --- [~hyukjin.kwon] Understood. So I think I can mark this as a false positive. Thanks for the link. > upgrade jackson data mapper to latest > -- > > Key: SPARK-40457 > URL: https://issues.apache.org/jira/browse/SPARK-40457 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bilna >Priority: Major > > Upgrade jackson-mapper-asl to the latest to resolve CVE-2019-10172
[jira] [Commented] (SPARK-40758) Upgrade Apache zookeeper to get rid of CVE-2020-10663
[ https://issues.apache.org/jira/browse/SPARK-40758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620765#comment-17620765 ] Bilna commented on SPARK-40758: --- https://issues.apache.org/jira/browse/ZOOKEEPER-3933 This link says the reported CVE is a false positive. So I think we can close this. > Upgrade Apache zookeeper to get rid of CVE-2020-10663 > - > > Key: SPARK-40758 > URL: https://issues.apache.org/jira/browse/SPARK-40758 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bilna >Priority: Major > > In order to resolve security vulnerability CVE-2020-10663, upgrade Apache > zookeeper to 3.8.0
[jira] [Assigned] (SPARK-40852) Implement `DataFrame.summary`
[ https://issues.apache.org/jira/browse/SPARK-40852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-40852: - Assignee: Ruifeng Zheng > Implement `DataFrame.summary` > - > > Key: SPARK-40852 > URL: https://issues.apache.org/jira/browse/SPARK-40852 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major >
[jira] [Created] (SPARK-40852) Implement `DataFrame.summary`
Ruifeng Zheng created SPARK-40852: - Summary: Implement `DataFrame.summary` Key: SPARK-40852 URL: https://issues.apache.org/jira/browse/SPARK-40852 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-40768) Migrate type check failures of bloom_filter_agg() onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620741#comment-17620741 ] Apache Spark commented on SPARK-40768: -- User 'lvshaokang' has created a pull request for this issue: https://github.com/apache/spark/pull/38315 > Migrate type check failures of bloom_filter_agg() onto error classes > > > Key: SPARK-40768 > URL: https://issues.apache.org/jira/browse/SPARK-40768 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in > bloom_filter_agg(): > https://github.com/apache/spark/blob/1f4e4c812a9dc6d7e35631c1663c1ba6f6d9b721/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L66-L76
[jira] [Assigned] (SPARK-40768) Migrate type check failures of bloom_filter_agg() onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40768: Assignee: Apache Spark > Migrate type check failures of bloom_filter_agg() onto error classes > > > Key: SPARK-40768 > URL: https://issues.apache.org/jira/browse/SPARK-40768 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in > bloom_filter_agg(): > https://github.com/apache/spark/blob/1f4e4c812a9dc6d7e35631c1663c1ba6f6d9b721/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L66-L76
[jira] [Assigned] (SPARK-40768) Migrate type check failures of bloom_filter_agg() onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40768: Assignee: (was: Apache Spark) > Migrate type check failures of bloom_filter_agg() onto error classes > > > Key: SPARK-40768 > URL: https://issues.apache.org/jira/browse/SPARK-40768 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in > bloom_filter_agg(): > https://github.com/apache/spark/blob/1f4e4c812a9dc6d7e35631c1663c1ba6f6d9b721/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L66-L76
[jira] [Commented] (SPARK-40768) Migrate type check failures of bloom_filter_agg() onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620739#comment-17620739 ] Apache Spark commented on SPARK-40768: -- User 'lvshaokang' has created a pull request for this issue: https://github.com/apache/spark/pull/38315 > Migrate type check failures of bloom_filter_agg() onto error classes > > > Key: SPARK-40768 > URL: https://issues.apache.org/jira/browse/SPARK-40768 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in > bloom_filter_agg(): > https://github.com/apache/spark/blob/1f4e4c812a9dc6d7e35631c1663c1ba6f6d9b721/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L66-L76
[jira] [Commented] (SPARK-40813) Add limit and offset to Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620712#comment-17620712 ] Apache Spark commented on SPARK-40813: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38314 > Add limit and offset to Connect DSL > --- > > Key: SPARK-40813 > URL: https://issues.apache.org/jira/browse/SPARK-40813 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > >
[jira] [Commented] (SPARK-40813) Add limit and offset to Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620711#comment-17620711 ] Apache Spark commented on SPARK-40813: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38314 > Add limit and offset to Connect DSL > --- > > Key: SPARK-40813 > URL: https://issues.apache.org/jira/browse/SPARK-40813 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > >
[jira] [Created] (SPARK-40851) TimestampFormatter behavior changed when using the latest Java
Yang Jie created SPARK-40851: Summary: TimestampFormatter behavior changed when using the latest Java Key: SPARK-40851 URL: https://issues.apache.org/jira/browse/SPARK-40851 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Yang Jie {code:java} [info] *** 12 TESTS FAILED *** [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 [error] Failed tests: [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} We can reproduce this issue using Java 8u352/11.0.17/17.0.5, the test errors are similar to the following: run {code:java} build/sbt clean "catalyst/testOnly *CastWithAnsiOffSuite" {code} with 8u352: {code:java} [info] - SPARK-35711: cast timestamp without time zone to timestamp with local time zone *** FAILED *** (190 milliseconds) [info] Incorrect evaluation (codegen off): cast(0001-01-01 00:00:00 as timestamp), actual: -6213561782000, expected: -621355968 (ExpressionEvalHelper.scala:209) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) [info] at org.scalatest.Assertions.fail(Assertions.scala:933) [info] at org.scalatest.Assertions.fail$(Assertions.scala:929) [info] at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564) [info] at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen(ExpressionEvalHelper.scala:209) [info] at 
org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen$(ExpressionEvalHelper.scala:199) [info] at org.apache.spark.sql.catalyst.expressions.CastSuiteBase.checkEvaluationWithoutCodegen(CastSuiteBase.scala:49) [info] at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:87) [info] at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82) [info] at org.apache.spark.sql.catalyst.expressions.CastSuiteBase.checkEvaluation(CastSuiteBase.scala:49) [info] at org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$198(CastSuiteBase.scala:893) [info] at org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$198$adapted(CastSuiteBase.scala:890) [info] at scala.collection.immutable.List.foreach(List.scala:431) [info] at org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$197(CastSuiteBase.scala:890) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) [info] at org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$196(CastSuiteBase.scala:890) [info] at org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$196$adapted(CastSuiteBase.scala:888) [info] at scala.collection.immutable.List.foreach(List.scala:431) [info] at org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$195(CastSuiteBase.scala:888) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) [info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66) [info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) [info] at org.scalatest.B
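As context for the failure above (an illustration, not part of the report): the truncated `expected` value is consistent with the offset of 0001-01-01 00:00:00 from the Unix epoch in the proleptic Gregorian calendar used by java.time, which can be checked independently in plain Python:

```python
from datetime import datetime

# Python's datetime, like java.time, uses the proleptic Gregorian calendar,
# so the distance from 0001-01-01 to the Unix epoch can be computed directly.
epoch = datetime(1970, 1, 1)
seconds = int((datetime(1, 1, 1) - epoch).total_seconds())
print(seconds)  # -62135596800 (719162 days before the epoch)
```

In microseconds that is -62135596800000000, matching the truncated `expected` value in the log; the `actual` value differs by an amount on the order of a time-zone offset, which would fit a tzdata change in the newer JDK builds rather than a calendar change.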
[jira] [Updated] (SPARK-40851) TimestampFormatter behavior changed when using the latest Java 8/11/17
[ https://issues.apache.org/jira/browse/SPARK-40851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-40851: - Summary: TimestampFormatter behavior changed when using the latest Java 8/11/17 (was: TimestampFormatter behavior changed when using the latest Java) > TimestampFormatter behavior changed when using the latest Java 8/11/17 > -- > > Key: SPARK-40851 > URL: https://issues.apache.org/jira/browse/SPARK-40851 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Blocker > > {code:java} > [info] *** 12 TESTS FAILED *** > [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 > [error] Failed tests: > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite > [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite > [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite > [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} > We can reproduce this issue using Java 8u352/11.0.17/17.0.5, the test errors > are similar to the following: > run > {code:java} > build/sbt clean "catalyst/testOnly *CastWithAnsiOffSuite" {code} > with 8u352: > {code:java} > [info] - SPARK-35711: cast timestamp without time zone to timestamp with > local time zone *** FAILED *** (190 milliseconds) > [info] Incorrect evaluation (codegen off): cast(0001-01-01 00:00:00 as > timestamp), actual: -6213561782000, expected: -621355968 > (ExpressionEvalHelper.scala:209) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) > [info] at org.scalatest.Assertions.fail(Assertions.scala:933) > 
[info] at org.scalatest.Assertions.fail$(Assertions.scala:929) > [info] at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen(ExpressionEvalHelper.scala:209) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen$(ExpressionEvalHelper.scala:199) > [info] at > org.apache.spark.sql.catalyst.expressions.CastSuiteBase.checkEvaluationWithoutCodegen(CastSuiteBase.scala:49) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:87) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82) > [info] at > org.apache.spark.sql.catalyst.expressions.CastSuiteBase.checkEvaluation(CastSuiteBase.scala:49) > [info] at > org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$198(CastSuiteBase.scala:893) > [info] at > org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$198$adapted(CastSuiteBase.scala:890) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at > org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$197(CastSuiteBase.scala:890) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) > [info] at > org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$196(CastSuiteBase.scala:890) > [info] at > org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$196$adapted(CastSuiteBase.scala:888) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at > org.apache.spark.sql.catalyst.expressions.CastSuiteBase.$anonfun$new$195(CastSuiteBase.scala:888) > [info] at > 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSu
[jira] [Created] (SPARK-40850) Tests for Spark SQL Interpreted Queries may execute Codegen
Holden Karau created SPARK-40850: - Summary: Tests for Spark SQL Interpreted Queries may execute Codegen Key: SPARK-40850 URL: https://issues.apache.org/jira/browse/SPARK-40850 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.3.0, 3.3.1 Reporter: Holden Karau We also need to set SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false" in PlanTest
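For illustration only (`PlanTest` itself is a Scala test helper, so this is a hypothetical PySpark sketch of the same switch): `SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key` resolves to the SQL conf key `spark.sql.codegen.wholeStage`, and disabling it forces plans through the interpreted path.

```python
# Hypothetical sketch, not the ticket's actual fix: build a session with
# whole-stage codegen disabled so queries execute in interpreted mode.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[1]")
    .config("spark.sql.codegen.wholeStage", "false")  # SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key
    .getOrCreate()
)
```

The ticket proposes setting the equivalent conf pair inside `PlanTest` so every planner test inherits it.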
[jira] [Resolved] (SPARK-40847) SPARK: Load Data from Dataframe or RDD to DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-40847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40847. -- Resolution: Invalid Let's ask questions on the Spark user mailing list before filing an issue. You would be able to get a better answer there. > SPARK: Load Data from Dataframe or RDD to DynamoDB > --- > > Key: SPARK-40847 > URL: https://issues.apache.org/jira/browse/SPARK-40847 > Project: Spark > Issue Type: Question > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Vivek Garg >Priority: Major > Labels: spark > > I am using Spark 2.1 on EMR and I have a DataFrame like this: > ClientNum | Value_1 | Value_2 | Value_3 | Value_4 > 14 | A | B | C | null > 19 | X | Y | null | null > 21 | R | null | null | null > I want to load data into a DynamoDB table with ClientNum as the key, following: > Analyze Your Data on Amazon DynamoDB with Apache Spark > Using Spark SQL for ETL > Here is the code I tried: > var jobConf = new JobConf(sc.hadoopConfiguration) > jobConf.set("dynamodb.servicename", "dynamodb") > jobConf.set("dynamodb.input.tableName", "table_name") > jobConf.set("dynamodb.output.tableName", "table_name") > jobConf.set("dynamodb.endpoint", "dynamodb.eu-west-1.amazonaws.com") > jobConf.set("dynamodb.regionid", "eu-west-1") > jobConf.set("dynamodb.throughput.read", "1") > jobConf.set("dynamodb.throughput.read.percent", "1") > jobConf.set("dynamodb.throughput.write", "1") > jobConf.set("dynamodb.throughput.write.percent", "1") > jobConf.set("mapred.output.format.class", > "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat") > jobConf.set("mapred.input.format.class", > "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat") > #Import Data > val df = sqlContext.read.format("com.databricks.spark.csv").option("header", > "true").option("inferSchema", "true").load(path) > I performed a transformation to have an RDD that matches the types that the > DynamoDB custom output format knows how to write.
The custom output format > expects a tuple containing the Text and DynamoDBItemWritable types. > Create a new RDD with those types in it, in the following map call: > #Convert the dataframe to rdd > val df_rdd = df.rdd > > df_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = > > MapPartitionsRDD[10] at rdd at :41 > #Print first rdd > df_rdd.take(1) > > res12: Array[org.apache.spark.sql.Row] = Array([14,A,B,C,null]) > var ddbInsertFormattedRDD = df_rdd.map(a => > { var ddbMap = new HashMap[String, AttributeValue]() var ClientNum = new > AttributeValue() ClientNum.setN(a.get(0).toString) ddbMap.put("ClientNum", > ClientNum) var Value_1 = new AttributeValue() Value_1.setS(a.get(1).toString) > ddbMap.put("Value_1", Value_1) var Value_2 = new AttributeValue() > Value_2.setS(a.get(2).toString) ddbMap.put("Value_2", Value_2) var Value_3 = > new AttributeValue() Value_3.setS(a.get(3).toString) ddbMap.put("Value_3", > Value_3) var Value_4 = new AttributeValue() Value_4.setS(a.get(4).toString) > ddbMap.put("Value_4", Value_4) var item = new DynamoDBItemWritable() > item.setItem(ddbMap) (new Text(""), item) } > ) > This last call uses the job configuration that defines the EMR-DDB connector > to write out the new RDD you created in the expected format: > ddbInsertFormattedRDD.saveAsHadoopDataset(jobConf) > fails with the following error: > Caused by: java.lang.NullPointerException > Null values caused the error; if I try with only ClientNum and Value_1, it works and > data is correctly inserted into the DynamoDB table. > Thank you.
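The NullPointerException comes from calling `setS` on a null column value (rows 19 and 21 have nulls). A minimal illustration of the guard, sketched in plain Python rather than the Scala above (the function name and the `{"S": ...}` attribute shape are illustrative, not the connector's API): skip null columns when building the item map, since DynamoDB attributes may be absent but must not be null.

```python
def to_dynamodb_item(columns, row):
    """Build a DynamoDB-style attribute map, skipping null (None) values.

    Calling setS(null) on an AttributeValue is what raises the NPE in the
    Scala snippet above; omitting absent attributes avoids it.
    """
    item = {}
    for name, value in zip(columns, row):
        if value is None:  # guard: drop the attribute instead of writing null
            continue
        item[name] = {"S": str(value)}
    return item

# Row 14 from the example: a row with no nulls keeps all five attributes,
# while Value_4 = None on other rows would simply be omitted.
print(to_dynamodb_item(
    ["ClientNum", "Value_1", "Value_2", "Value_3", "Value_4"],
    [14, "A", "B", "C", None]))
```

The same pattern in the Scala map call would wrap each `setS` in a null check (or `Option`) before `ddbMap.put`.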
[jira] [Assigned] (SPARK-40539) PySpark readwriter API parity for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-40539: --- Assignee: Rui Wang > PySpark readwriter API parity for Spark Connect > --- > > Key: SPARK-40539 > URL: https://issues.apache.org/jira/browse/SPARK-40539 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > > Spark Connect / PySpark ReadWriter parity.
[jira] [Resolved] (SPARK-39590) Python API Parity in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-39590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-39590. -- Resolution: Duplicate Closed as a duplicate. > Python API Parity in Structure Streaming > > > Key: SPARK-39590 > URL: https://issues.apache.org/jira/browse/SPARK-39590 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Boyang Jerry Peng >Priority: Major > > New APIs in Structured Streaming tend to get added to Java/Scala first. This > creates a situation where the Python APIs have fallen behind. For example, > map/flatMapGroupsWithState is not supported in PySpark. We need the PySpark > API to catch up with the Java/Scala APIs and, where necessary, provide > tighter integrations with native Python data processing frameworks such as > Pandas.
[jira] [Resolved] (SPARK-40539) PySpark readwriter API parity for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40539. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38086 [https://github.com/apache/spark/pull/38086] > PySpark readwriter API parity for Spark Connect > --- > > Key: SPARK-40539 > URL: https://issues.apache.org/jira/browse/SPARK-40539 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > Fix For: 3.4.0 > > > Spark Connect / PySpark ReadWriter parity.
[jira] [Updated] (SPARK-40025) Project Lightspeed: Faster and Simpler Stream Processing with Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-40025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-40025: - Description: Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming: * Improving the latency and ensuring it is predictable * Enhancing functionality for processing data with new operators and APIs Umbrella Jira to track all tickets under Project Lightspeed SPARK-39585 - Multiple Stateful Operators in Structured Streaming SPARK-39586 - Advanced Windowing in Structured Streaming SPARK-39587 - Schema Evolution for Stateful Pipelines SPARK-39589 - Asynchronous I/O support SPARK-40431 - Python API for Arbitrary Stateful Processing SPARK-39591 - Offset Management Improvements SPARK-40849 - Async log purge SPARK-39592 - Asynchronous State Checkpointing SPARK-39593 - Configurable State Checkpointing Frequency was: Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming: * Improving the latency and ensuring it is predictable * Enhancing functionality for processing data with new operators and APIs Umbrella Jira to track all tickets under Project Lightspeed SPARK-39585 - Multiple Stateful Operators in Structured Streaming SPARK-39586 - Advanced Windowing in Structured Streaming SPARK-39587 - Schema Evolution for Stateful Pipelines SPARK-39589 - Asynchronous I/O support SPARK-39590 - Python API for Arbitrary Stateful Processing SPARK-39591 - Offset Management Improvements SPARK-40849 - Async log purge SPARK-39592 - Asynchronous State Checkpointing SPARK-39593 - Configurable State Checkpointing Frequency > Project Lightspeed: Faster and Simpler Stream Processing with Apache Spark > -- > > Key: SPARK-40025 > URL: https://issues.apache.org/jira/browse/SPARK-40025 > Project: Spark > Issue Type: Umbrella > Components: Structured Streaming >Affects Versions: 3.2.2 >Reporter: Boyang Jerry Peng >Priority: Major > > Project Lightspeed 
is an umbrella project aimed at improving a couple of key > aspects of Spark Streaming: > * Improving the latency and ensuring it is predictable > * Enhancing functionality for processing data with new operators and APIs > > Umbrella Jira to track all tickets under Project Lightspeed > SPARK-39585 - Multiple Stateful Operators in Structured Streaming > SPARK-39586 - Advanced Windowing in Structured Streaming > SPARK-39587 - Schema Evolution for Stateful Pipelines > SPARK-39589 - Asynchronous I/O support > SPARK-40431 - Python API for Arbitrary Stateful Processing > SPARK-39591 - Offset Management Improvements > SPARK-40849 - Async log purge > SPARK-39592 - Asynchronous State Checkpointing > SPARK-39593 - Configurable State Checkpointing Frequency
[jira] [Resolved] (SPARK-40656) Schema-registry support for Protobuf format
[ https://issues.apache.org/jira/browse/SPARK-40656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi resolved SPARK-40656. -- Resolution: Won't Do > Schema-registry support for Protobuf format > --- > > Key: SPARK-40656 > URL: https://issues.apache.org/jira/browse/SPARK-40656 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Add support for reading protobuf schema (definition) from Confluent > schema-registry.
[jira] [Commented] (SPARK-40659) Schema evolution for protobuf (and Avro too?)
[ https://issues.apache.org/jira/browse/SPARK-40659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620661#comment-17620661 ] Raghu Angadi commented on SPARK-40659: -- Right, I don't think that is the reason. Databricks might port Avro schema-registry support to open-source later. Maybe the team that added it didn't get around to open-sourcing it. > Schema evolution for protobuf (and Avro too?) > - > > Key: SPARK-40659 > URL: https://issues.apache.org/jira/browse/SPARK-40659 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Protobuf & Avro should support schema evolution in streaming. We need to > throw a specific error message when we detect a newer version of the schema > in the schema registry. > A couple of options for detecting a version change at runtime: > * How do we detect a newer version from the schema registry? It is contacted only > during planning currently. > * We could detect version ids in incoming messages. > ** What if the id in the incoming message is newer than what our > schema-registry reports after the restart? > *** This indicates delayed syncs between the customer's schema-registry servers > (should be rare). We can keep erroring out until it is fixed. > *** Make sure we log the schema id used during planning.
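The runtime check the ticket sketches can be illustrated as follows (hypothetical names; the actual detection mechanism is still an open question in the discussion above):

```python
def check_message_schema_id(planned_id: int, message_id: int) -> None:
    """Raise when an incoming message carries a newer schema id than the one
    resolved from the schema registry at planning time.

    Assumes registry ids are monotonically increasing versions; messages
    written with the planned or an older schema pass through unchanged.
    """
    if message_id > planned_id:
        raise RuntimeError(
            f"Schema evolved: message written with schema id {message_id}, "
            f"but id {planned_id} was resolved at planning time; "
            "restart the query to pick up the new schema"
        )

check_message_schema_id(7, 7)  # same version: fine
check_message_schema_id(7, 5)  # older message: fine
```

Logging the planned id (the ticket's last point) is what makes the error above diagnosable when registry replicas are out of sync.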
[jira] [Commented] (SPARK-40658) Protobuf v2 & v3 support
[ https://issues.apache.org/jira/browse/SPARK-40658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620659#comment-17620659 ] Raghu Angadi commented on SPARK-40658: -- That's awesome! Let's look at both and merge them. > Protobuf v2 & v3 support > > > Key: SPARK-40658 > URL: https://issues.apache.org/jira/browse/SPARK-40658 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > We want to ensure Protobuf functions support both Protobuf version 2 and > version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40658) Protobuf v2 & v3 support
[ https://issues.apache.org/jira/browse/SPARK-40658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620657#comment-17620657 ] Mohan Parthasarathy commented on SPARK-40658: - [~rangadi] I did base it off your latest PR. I am essentially running most of the protobuf functions suite for v2 and v3. I also added a test case for defaultValues. I will issue a PR once merged. > Protobuf v2 & v3 support > > > Key: SPARK-40658 > URL: https://issues.apache.org/jira/browse/SPARK-40658 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > We want to ensure Protobuf functions support both Protobuf version 2 and > version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40658) Protobuf v2 & v3 support
[ https://issues.apache.org/jira/browse/SPARK-40658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620655#comment-17620655 ] Raghu Angadi commented on SPARK-40658: -- [~mposdev21] More tests are good. Let's add them. In my branch on top of the [Java support PR|https://github.com/apache/spark/pull/38286], I am running pretty much all of the current tests with V2 and V3 protobufs (both with Java classes & descriptor files). I will send that PR soon after the Java support PR merges. I am able to get the Maven build to generate V2 and V3 classes and descriptor sets. I haven't figured out how to do the same with SBT. > Protobuf v2 & v3 support > > > Key: SPARK-40658 > URL: https://issues.apache.org/jira/browse/SPARK-40658 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > We want to ensure Protobuf functions support both Protobuf version 2 and > version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40846) GA test failed with Java 8u352
[ https://issues.apache.org/jira/browse/SPARK-40846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40846: Assignee: Yang Jie > GA test failed with Java 8u352 > -- > > Key: SPARK-40846 > URL: https://issues.apache.org/jira/browse/SPARK-40846 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > catalyst test failed > {code:java} > [info] *** 12 TESTS FAILED *** > [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 > [error] Failed tests: > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite > [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite > [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite > [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} > run TimestampFormatterSuite with 8u352 locally: > > {code:java} > [info] - SPARK-31557: rebasing in legacy formatters/parsers *** FAILED *** > (21 milliseconds) > [info] zoneId = Antarctica/Vostok 1000-01-01T06:52:23 did not equal > 1000-01-01T01:02:03 (TimestampFormatterSuite.scala:281) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$33(TimestampFormatterSuite.scala:281) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > 
[info] at scala.collection.IterableLike.foreach(IterableLike.scala:74) > [info] at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > [info] at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$31(TimestampFormatterSuite.scala:280) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.Assertions.withClue(Assertions.scala:1065) > [info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052) > [info] at > org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$30(TimestampFormatterSuite.scala:271) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$29(TimestampFormatterSuite.scala:271) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28(TimestampFormatterSuite.scala:270) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28$adapted(TimestampFormatterSuite.scala:268) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$27(TimestampFormatterSuite.scala:268) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > 
org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$26(TimestampFormatterSuite.scala:268) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:2
[jira] [Resolved] (SPARK-40846) GA test failed with Java 8u352
[ https://issues.apache.org/jira/browse/SPARK-40846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40846. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38311 [https://github.com/apache/spark/pull/38311] > GA test failed with Java 8u352 > -- > > Key: SPARK-40846 > URL: https://issues.apache.org/jira/browse/SPARK-40846 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > catalyst test failed > {code:java} > [info] *** 12 TESTS FAILED *** > [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 > [error] Failed tests: > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite > [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite > [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite > [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} > run TimestampFormatterSuite with 8u352 locally: > > {code:java} > [info] - SPARK-31557: rebasing in legacy formatters/parsers *** FAILED *** > (21 milliseconds) > [info] zoneId = Antarctica/Vostok 1000-01-01T06:52:23 did not equal > 1000-01-01T01:02:03 (TimestampFormatterSuite.scala:281) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$33(TimestampFormatterSuite.scala:281) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at 
scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at scala.collection.IterableLike.foreach(IterableLike.scala:74) > [info] at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > [info] at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$31(TimestampFormatterSuite.scala:280) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.Assertions.withClue(Assertions.scala:1065) > [info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052) > [info] at > org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$30(TimestampFormatterSuite.scala:271) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$29(TimestampFormatterSuite.scala:271) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28(TimestampFormatterSuite.scala:270) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28$adapted(TimestampFormatterSuite.scala:268) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$27(TimestampFormatterSuite.scala:268) > [info] at > 
org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$26(TimestampFormatterSuite.scala:268) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:10
[jira] [Commented] (SPARK-40658) Protobuf v2 & v3 support
[ https://issues.apache.org/jira/browse/SPARK-40658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620651#comment-17620651 ] Mohan Parthasarathy commented on SPARK-40658: - [~rangadi] I started adding some test cases recently, and they are mostly working except for one. We can discuss the test cases further if you are interested. > Protobuf v2 & v3 support > > > Key: SPARK-40658 > URL: https://issues.apache.org/jira/browse/SPARK-40658 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > We want to ensure Protobuf functions support both Protobuf version 2 and > version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40849) Async log purge
[ https://issues.apache.org/jira/browse/SPARK-40849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620641#comment-17620641 ] Apache Spark commented on SPARK-40849: -- User 'jerrypeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38313 > Async log purge > --- > > Key: SPARK-40849 > URL: https://issues.apache.org/jira/browse/SPARK-40849 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Boyang Jerry Peng >Priority: Major > > Purging old entries in both the offset log and commit log will be done > asynchronously. > > For every micro-batch, older entries in both offset log and commit log are > deleted. This is done so that the offset log and commit log do not > continually grow. Please reference logic here > > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L539] > > > The time spent performing these log purges is grouped with the “walCommit” > execution time in the StreamingProgressListener metrics. Around two thirds > of the “walCommit” execution time is spent performing these purge operations, > so making them asynchronous will also reduce latency. Also, we do > not necessarily need to perform the purges every micro-batch. When these > purges are executed asynchronously, they do not need to block micro-batch > execution and we don’t need to start another purge until the current one is > finished. The purges can happen essentially in the background. We will just > have to synchronize the purges with the offset WAL commits and completion > commits so that we don’t have concurrent modifications of the offset log and > commit log. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
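The ticket description above proposes three things: move purges off the micro-batch path, skip starting a new purge while one is in flight, and synchronize purges with WAL/completion commits so the logs are never modified concurrently. A minimal language-agnostic sketch of that scheme (Python here for brevity; the class and method names are hypothetical, not Spark's actual implementation, which is Scala):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class AsyncLogPurger:
    """Sketch: purge old offset/commit log entries in the background.

    A single lock serializes purges with commits, so the two never
    mutate the logs concurrently; an event flag ensures at most one
    purge is in flight at a time.
    """
    def __init__(self, offset_log, commit_log):
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._log_lock = threading.Lock()       # guards both logs
        self._purge_running = threading.Event()
        self.offset_log = offset_log            # batch_id -> offsets
        self.commit_log = commit_log            # batch_id -> completion marker

    def commit(self, batch_id):
        # Commits take the same lock as purges, so no concurrent mutation.
        with self._log_lock:
            self.offset_log[batch_id] = "offsets"
            self.commit_log[batch_id] = "done"

    def maybe_purge(self, threshold):
        # Don't queue another purge if one is already running.
        if self._purge_running.is_set():
            return
        self._purge_running.set()
        self._executor.submit(self._purge, threshold)

    def _purge(self, threshold):
        try:
            with self._log_lock:
                for log in (self.offset_log, self.commit_log):
                    for batch_id in [b for b in log if b < threshold]:
                        del log[batch_id]
        finally:
            self._purge_running.clear()

purger = AsyncLogPurger({}, {})
for b in range(5):
    purger.commit(b)
purger.maybe_purge(threshold=3)       # runs in the background
purger._executor.shutdown(wait=True)  # for the demo, wait for it to finish
assert sorted(purger.offset_log) == [3, 4]
```

The micro-batch loop only pays the cost of the `is_set()` check and the submit; the deletes happen on the background thread, which is the latency win the ticket describes.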
[jira] [Assigned] (SPARK-40849) Async log purge
[ https://issues.apache.org/jira/browse/SPARK-40849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40849: Assignee: Apache Spark > Async log purge > --- > > Key: SPARK-40849 > URL: https://issues.apache.org/jira/browse/SPARK-40849 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Boyang Jerry Peng >Assignee: Apache Spark >Priority: Major > > Purging old entries in both the offset log and commit log will be done > asynchronously. > > For every micro-batch, older entries in both offset log and commit log are > deleted. This is done so that the offset log and commit log do not > continually grow. Please reference logic here > > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L539] > > > The time spent performing these log purges is grouped with the “walCommit” > execution time in the StreamingProgressListener metrics. Around two thirds > of the “walCommit” execution time is performing these purge operations thus > making these operations asynchronous will also reduce latency. Also, we do > not necessarily need to perform the purges every micro-batch. When these > purges are executed asynchronously, they do not need to block micro-batch > execution and we don’t need to start another purge until the current one is > finished. The purges can happen essentially in the background. We will just > have to synchronize the purges with the offset WAL commits and completion > commits so that we don’t have concurrent modifications of the offset log and > commit log. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40849) Async log purge
[ https://issues.apache.org/jira/browse/SPARK-40849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40849: Assignee: (was: Apache Spark) > Async log purge > --- > > Key: SPARK-40849 > URL: https://issues.apache.org/jira/browse/SPARK-40849 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Boyang Jerry Peng >Priority: Major > > Purging old entries in both the offset log and commit log will be done > asynchronously. > > For every micro-batch, older entries in both offset log and commit log are > deleted. This is done so that the offset log and commit log do not > continually grow. Please reference logic here > > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L539] > > > The time spent performing these log purges is grouped with the “walCommit” > execution time in the StreamingProgressListener metrics. Around two thirds > of the “walCommit” execution time is performing these purge operations thus > making these operations asynchronous will also reduce latency. Also, we do > not necessarily need to perform the purges every micro-batch. When these > purges are executed asynchronously, they do not need to block micro-batch > execution and we don’t need to start another purge until the current one is > finished. The purges can happen essentially in the background. We will just > have to synchronize the purges with the offset WAL commits and completion > commits so that we don’t have concurrent modifications of the offset log and > commit log. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40849) Async log purge
[ https://issues.apache.org/jira/browse/SPARK-40849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Jerry Peng updated SPARK-40849: -- Description: Purging old entries in both the offset log and commit log will be done asynchronously. For every micro-batch, older entries in both offset log and commit log are deleted. This is done so that the offset log and commit log do not continually grow. Please reference logic here [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L539] The time spent performing these log purges is grouped with the “walCommit” execution time in the StreamingProgressListener metrics. Around two thirds of the “walCommit” execution time is performing these purge operations thus making these operations asynchronous will also reduce latency. Also, we do not necessarily need to perform the purges every micro-batch. When these purges are executed asynchronously, they do not need to block micro-batch execution and we don’t need to start another purge until the current one is finished. The purges can happen essentially in the background. We will just have to synchronize the purges with the offset WAL commits and completion commits so that we don’t have concurrent modifications of the offset log and commit log. was:Purging old entries in both the offset log and commit log will be done asynchronously > Async log purge > --- > > Key: SPARK-40849 > URL: https://issues.apache.org/jira/browse/SPARK-40849 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Boyang Jerry Peng >Priority: Major > > Purging old entries in both the offset log and commit log will be done > asynchronously. > > For every micro-batch, older entries in both offset log and commit log are > deleted. This is done so that the offset log and commit log do not > continually grow. 
Please reference logic here > > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L539] > > > The time spent performing these log purges is grouped with the “walCommit” > execution time in the StreamingProgressListener metrics. Around two thirds > of the “walCommit” execution time is performing these purge operations thus > making these operations asynchronous will also reduce latency. Also, we do > not necessarily need to perform the purges every micro-batch. When these > purges are executed asynchronously, they do not need to block micro-batch > execution and we don’t need to start another purge until the current one is > finished. The purges can happen essentially in the background. We will just > have to synchronize the purges with the offset WAL commits and completion > commits so that we don’t have concurrent modifications of the offset log and > commit log. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40849) Async log purge
[ https://issues.apache.org/jira/browse/SPARK-40849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Jerry Peng updated SPARK-40849: -- Description: Purging old entries in both the offset log and commit log will be done asynchronously > Async log purge > --- > > Key: SPARK-40849 > URL: https://issues.apache.org/jira/browse/SPARK-40849 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Boyang Jerry Peng >Priority: Major > > Purging old entries in both the offset log and commit log will be done > asynchronously -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40025) Project Lightspeed: Faster and Simpler Stream Processing with Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-40025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Jerry Peng updated SPARK-40025: -- Description: Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming: * Improving the latency and ensuring it is predictable * Enhancing functionality for processing data with new operators and APIs Umbrella Jira to track all tickets under Project Lightspeed SPARK-39585 - Multiple Stateful Operators in Structured Streaming SPARK-39586 - Advanced Windowing in Structured Streaming SPARK-39587 - Schema Evolution for Stateful Pipelines SPARK-39589 - Asynchronous I/O support SPARK-39590 - Python API for Arbitrary Stateful Processing SPARK-39591 - Offset Management Improvements SPARK-40849 SPARK-39592 - Asynchronous State Checkpointing SPARK-39593 - Configurable State Checkpointing Frequency was: Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming: * Improving the latency and ensuring it is predictable * Enhancing functionality for processing data with new operators and APIs Umbrella Jira to track all tickets under Project Lightspeed SPARK-39585 - Multiple Stateful Operators in Structured Streaming SPARK-39586 - Advanced Windowing in Structured Streaming SPARK-39587 - Schema Evolution for Stateful Pipelines SPARK-39589 - Asynchronous I/O support SPARK-39590 - Python API for Arbitrary Stateful Processing SPARK-39591 - Offset Management Improvements SPARK-39592 - Asynchronous State Checkpointing SPARK-39593 - Configurable State Checkpointing Frequency > Project Lightspeed: Faster and Simpler Stream Processing with Apache Spark > -- > > Key: SPARK-40025 > URL: https://issues.apache.org/jira/browse/SPARK-40025 > Project: Spark > Issue Type: Umbrella > Components: Structured Streaming >Affects Versions: 3.2.2 >Reporter: Boyang Jerry Peng >Priority: Major > > Project Lightspeed is an umbrella project aimed at improving 
a couple of key > aspects of Spark Streaming: > * Improving the latency and ensuring it is predictable > * Enhancing functionality for processing data with new operators and APIs > > Umbrella Jira to track all tickets under Project Lightspeed > SPARK-39585 - Multiple Stateful Operators in Structured Streaming > SPARK-39586 - Advanced Windowing in Structured Streaming > SPARK-39587 - Schema Evolution for Stateful Pipelines > SPARK-39589 - Asynchronous I/O support > SPARK-39590 - Python API for Arbitrary Stateful Processing > SPARK-39591 - Offset Management Improvements > SPARK-40849 > > SPARK-39592 - Asynchronous State Checkpointing > SPARK-39593 - Configurable State Checkpointing Frequency -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40025) Project Lightspeed: Faster and Simpler Stream Processing with Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-40025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Jerry Peng updated SPARK-40025: -- Description: Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming: * Improving the latency and ensuring it is predictable * Enhancing functionality for processing data with new operators and APIs Umbrella Jira to track all tickets under Project Lightspeed SPARK-39585 - Multiple Stateful Operators in Structured Streaming SPARK-39586 - Advanced Windowing in Structured Streaming SPARK-39587 - Schema Evolution for Stateful Pipelines SPARK-39589 - Asynchronous I/O support SPARK-39590 - Python API for Arbitrary Stateful Processing SPARK-39591 - Offset Management Improvements SPARK-40849 - Async log purge SPARK-39592 - Asynchronous State Checkpointing SPARK-39593 - Configurable State Checkpointing Frequency was: Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming: * Improving the latency and ensuring it is predictable * Enhancing functionality for processing data with new operators and APIs Umbrella Jira to track all tickets under Project Lightspeed SPARK-39585 - Multiple Stateful Operators in Structured Streaming SPARK-39586 - Advanced Windowing in Structured Streaming SPARK-39587 - Schema Evolution for Stateful Pipelines SPARK-39589 - Asynchronous I/O support SPARK-39590 - Python API for Arbitrary Stateful Processing SPARK-39591 - Offset Management Improvements SPARK-40849 SPARK-39592 - Asynchronous State Checkpointing SPARK-39593 - Configurable State Checkpointing Frequency > Project Lightspeed: Faster and Simpler Stream Processing with Apache Spark > -- > > Key: SPARK-40025 > URL: https://issues.apache.org/jira/browse/SPARK-40025 > Project: Spark > Issue Type: Umbrella > Components: Structured Streaming >Affects Versions: 3.2.2 >Reporter: Boyang Jerry Peng >Priority: Major > > Project Lightspeed is an 
umbrella project aimed at improving a couple of key > aspects of Spark Streaming: > * Improving the latency and ensuring it is predictable > * Enhancing functionality for processing data with new operators and APIs > > Umbrella Jira to track all tickets under Project Lightspeed > SPARK-39585 - Multiple Stateful Operators in Structured Streaming > SPARK-39586 - Advanced Windowing in Structured Streaming > SPARK-39587 - Schema Evolution for Stateful Pipelines > SPARK-39589 - Asynchronous I/O support > SPARK-39590 - Python API for Arbitrary Stateful Processing > SPARK-39591 - Offset Management Improvements > SPARK-40849 - Async log purge > SPARK-39592 - Asynchronous State Checkpointing > SPARK-39593 - Configurable State Checkpointing Frequency -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40849) Async log purge
Boyang Jerry Peng created SPARK-40849: - Summary: Async log purge Key: SPARK-40849 URL: https://issues.apache.org/jira/browse/SPARK-40849 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 3.4.0 Reporter: Boyang Jerry Peng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40025) Project Lightspeed: Faster and Simpler Stream Processing with Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-40025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Jerry Peng updated SPARK-40025: -- Description: Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming: * Improving the latency and ensuring it is predictable * Enhancing functionality for processing data with new operators and APIs Umbrella Jira to track all tickets under Project Lightspeed SPARK-39585 - Multiple Stateful Operators in Structured Streaming SPARK-39586 - Advanced Windowing in Structured Streaming SPARK-39587 - Schema Evolution for Stateful Pipelines SPARK-39589 - Asynchronous I/O support SPARK-39590 - Python API for Arbitrary Stateful Processing SPARK-39591 - Offset Management Improvements SPARK-39592 - Asynchronous State Checkpointing SPARK-39593 - Configurable State Checkpointing Frequency was: Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming: * Improving the latency and ensuring it is predictable * Enhancing functionality for processing data with new operators and APIs Umbrella Jira to track all tickets under Project Lightspeed SPARK-39585 - Multiple Stateful Operators in Structured Streaming SPARK-39586 - Advanced Windowing in Structured Streaming SPARK-39587 - Schema Evolution for Stateful Pipelines SPARK-39589 - Asynchronous I/O support SPARK-39590 - Python API for Arbitrary Stateful Processing SPARK-39591 - Offset Management Improvements SPARK-39592 - Asynchronous State Checkpointing SPARK-39593 - Configurable State Checkpointing Frequency > Project Lightspeed: Faster and Simpler Stream Processing with Apache Spark > -- > > Key: SPARK-40025 > URL: https://issues.apache.org/jira/browse/SPARK-40025 > Project: Spark > Issue Type: Umbrella > Components: Structured Streaming >Affects Versions: 3.2.2 >Reporter: Boyang Jerry Peng >Priority: Major > > Project Lightspeed is an umbrella project aimed at improving a couple of 
key > aspects of Spark Streaming: > * Improving the latency and ensuring it is predictable > * Enhancing functionality for processing data with new operators and APIs > > Umbrella Jira to track all tickets under Project Lightspeed > SPARK-39585 - Multiple Stateful Operators in Structured Streaming > SPARK-39586 - Advanced Windowing in Structured Streaming > SPARK-39587 - Schema Evolution for Stateful Pipelines > SPARK-39589 - Asynchronous I/O support > SPARK-39590 - Python API for Arbitrary Stateful Processing > SPARK-39591 - Offset Management Improvements > SPARK-39592 - Asynchronous State Checkpointing > SPARK-39593 - Configurable State Checkpointing Frequency -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40659) Schema evolution for protobuf (and Avro too?)
[ https://issues.apache.org/jira/browse/SPARK-40659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620581#comment-17620581 ] Sandish Kumar HN commented on SPARK-40659: -- [~rangadi] is the reason that the Confluent schema registry is not open-sourced? I see that Apache Flink uses the Confluent schema registry [https://github.com/apache/flink/blob/master/flink-formats/flink-avro-confluent-registry/pom.xml#L39] > Schema evolution for protobuf (and Avro too?) > - > > Key: SPARK-40659 > URL: https://issues.apache.org/jira/browse/SPARK-40659 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Protobuf & Avro should support schema evolution in streaming. We need to > throw a specific error message when we detect a newer version of the schema > in the schema registry. > A couple of options for detecting version change at runtime: > * How do we detect a newer version from the schema registry? It is contacted only > during planning currently. > * We could detect the version id in incoming messages. > ** What if the id in the incoming message is newer than what our > schema-registry reports after the restart? > *** This indicates delayed syncs between customers' schema-registry servers > (should be rare). We can keep erroring out until it is fixed. > *** Make sure we log the schema id used during planning. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
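One option for the "detect the version id in incoming messages" idea discussed above: messages produced with Confluent's serializers carry the schema id in a documented wire format (a magic byte 0x0, then the schema id as a 4-byte big-endian integer, then the serialized payload). A minimal sketch of extracting that id, independent of Spark and of any registry client (the class and method names here are illustrative, not from the Spark codebase):

```java
import java.nio.ByteBuffer;

public class SchemaIdSniffer {
    // Confluent wire format: byte 0 is the magic byte (0x0), bytes 1-4 are
    // the schema id as a big-endian int, and the serialized record follows.
    static int schemaId(byte[] message) {
        if (message.length < 5 || message[0] != 0x0) {
            throw new IllegalArgumentException("not in Confluent wire format");
        }
        return ByteBuffer.wrap(message, 1, 4).getInt();
    }

    public static void main(String[] args) {
        // Magic byte, schema id 42, then a one-byte dummy payload.
        byte[] msg = {0x0, 0x00, 0x00, 0x00, 0x2A, 0x01};
        System.out.println(schemaId(msg)); // 42
    }
}
```

Comparing this id against the one recorded at planning time would flag a schema change at runtime, which is the detection path sketched in the ticket.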
[jira] [Updated] (SPARK-40844) Flip the default value of Kafka offset fetching config
[ https://issues.apache.org/jira/browse/SPARK-40844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-40844: - Labels: release-notes (was: ) > Flip the default value of Kafka offset fetching config > -- > > Key: SPARK-40844 > URL: https://issues.apache.org/jira/browse/SPARK-40844 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Labels: release-notes > Fix For: 3.4.0 > > > Discussion thread: > [https://lists.apache.org/thread/spkco94gw33sj8355mhlxz1vl7gl1g5c] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40844) Flip the default value of Kafka offset fetching config
[ https://issues.apache.org/jira/browse/SPARK-40844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-40844: Assignee: Jungtaek Lim > Flip the default value of Kafka offset fetching config > -- > > Key: SPARK-40844 > URL: https://issues.apache.org/jira/browse/SPARK-40844 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > Discussion thread: > [https://lists.apache.org/thread/spkco94gw33sj8355mhlxz1vl7gl1g5c] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40844) Flip the default value of Kafka offset fetching config
[ https://issues.apache.org/jira/browse/SPARK-40844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-40844. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38306 [https://github.com/apache/spark/pull/38306] > Flip the default value of Kafka offset fetching config > -- > > Key: SPARK-40844 > URL: https://issues.apache.org/jira/browse/SPARK-40844 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.4.0 > > > Discussion thread: > [https://lists.apache.org/thread/spkco94gw33sj8355mhlxz1vl7gl1g5c] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
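The config whose default is flipped here is, per the Structured Streaming + Kafka integration guide, `spark.sql.streaming.kafka.useDeprecatedOffsetFetching`: from 3.4.0 it defaults to false, so offsets are fetched with the Kafka AdminClient instead of a driver-side consumer. A hedged sketch of restoring the pre-3.4 behaviour (verify the config name against your Spark version's documentation):

```
# Pre-3.4 behaviour: fetch offsets with a KafkaConsumer on the driver.
spark.sql.streaming.kafka.useDeprecatedOffsetFetching  true
```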
[jira] [Commented] (SPARK-40659) Schema evolution for protobuf (and Avro too?)
[ https://issues.apache.org/jira/browse/SPARK-40659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620557#comment-17620557 ] Raghu Angadi commented on SPARK-40659: -- Schema-support for Avro is not in open-source Spark. When we implement automatic schema evolution, we will do that for both Avro and Protobuf. I think we should close this for now. WDYT? When we backport schema-registry support, we will do it for both Avro and Protobuf together. > Schema evolution for protobuf (and Avro too?) > - > > Key: SPARK-40659 > URL: https://issues.apache.org/jira/browse/SPARK-40659 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Protobuf & Avro should support schema evolution in streaming. We need to > throw a specific error message when we detect a newer version of the schema > in the schema registry. > A couple of options for detecting version change at runtime: > * How do we detect a newer version from the schema registry? It is contacted only > during planning currently. > * We could detect the version id in incoming messages. > ** What if the id in the incoming message is newer than what our > schema-registry reports after the restart? > *** This indicates delayed syncs between customers' schema-registry servers > (should be rare). We can keep erroring out until it is fixed. > *** Make sure we log the schema id used during planning. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40656) Schema-registry support for Protobuf format
[ https://issues.apache.org/jira/browse/SPARK-40656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620556#comment-17620556 ] Raghu Angadi commented on SPARK-40656: -- Schema-support for Avro is not here. I will close this. When we back port schema-registry support, we will do it for both Avro and Protobuf together. Committers, please close this as "Won't Do". > Schema-registry support for Protobuf format > --- > > Key: SPARK-40656 > URL: https://issues.apache.org/jira/browse/SPARK-40656 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Add support for reading protobuf schema (definition) from Confluent > schema-registry. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40777) Use error classes for Protobuf exceptions
[ https://issues.apache.org/jira/browse/SPARK-40777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620553#comment-17620553 ] Raghu Angadi commented on SPARK-40777: -- I filed a separate ticket SPARK-40848 for generating descriptor files. I will do that there; we don't need to do it here. > Use error classes for Protobuf exceptions > - > > Key: SPARK-40777 > URL: https://issues.apache.org/jira/browse/SPARK-40777 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > We should use error classes for all the exceptions. > A follow-up from the Protobuf PR [https://github.com/apache/spark/pull/37972] > > cc: [~sanysand...@gmail.com] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40658) Protobuf v2 & v3 support
[ https://issues.apache.org/jira/browse/SPARK-40658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620551#comment-17620551 ] Raghu Angadi commented on SPARK-40658: -- [~sanysand...@gmail.com] , [~mposdev21] I am working on this. I think we can support both without any issue. > Protobuf v2 & v3 support > > > Key: SPARK-40658 > URL: https://issues.apache.org/jira/browse/SPARK-40658 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > We want to ensure Protobuf functions support both Protobuf version 2 and > version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40848) Protobuf: Generate descriptor files at build time
Raghu Angadi created SPARK-40848: Summary: Protobuf: Generate descriptor files at build time Key: SPARK-40848 URL: https://issues.apache.org/jira/browse/SPARK-40848 Project: Spark Issue Type: Improvement Components: Protobuf Affects Versions: 3.3.0 Reporter: Raghu Angadi Generate descriptor files during the build rather than pre-creating them. [~rangadi] will do this. cc: [~sanysand...@gmail.com] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620511#comment-17620511 ] Pankaj Nagla commented on SPARK-34827: -- > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40791) The semantics of `F` in `DateTimeFormatter` have changed
[ https://issues.apache.org/jira/browse/SPARK-40791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620487#comment-17620487 ] Dongjoon Hyun edited comment on SPARK-40791 at 10/19/22 6:12 PM: - According to the PR comment, Java 11 has the same issue, [~LuciferYang]? bq. hmm... the latest 11(11.0.17) and 17(17.0.5) have the same issue ... was (Author: dongjoon): According to the PR comment, Java 11 has the same issue, [~LuciferYang]? > hmm... the latest 11(11.0.17) and 17(17.0.5) have the same issue ... > The semantics of `F` in `DateTimeFormatter` have changed > > > Key: SPARK-40791 > URL: https://issues.apache.org/jira/browse/SPARK-40791 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > val createSql = > """ > |create temporary view v as select col from values > | (timestamp '1582-06-01 11:33:33.123UTC+08'), > | (timestamp '1970-01-01 00:00:00.000Europe/Paris'), > | (timestamp '1970-12-31 23:59:59.999Asia/Srednekolymsk'), > | (timestamp '1996-04-01 00:33:33.123Australia/Darwin'), > | (timestamp '2018-11-17 13:33:33.123Z'), > | (timestamp '2020-01-01 01:33:33.123Asia/Shanghai'), > | (timestamp '2100-01-01 01:33:33.123America/Los_Angeles') t(col) > | """.stripMargin > sql(createSql) > withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) { > val rows = sql("select col, date_format(col, 'F') from v").collect() > // scalastyle:off > rows.foreach(println) > } {code} > > Before Java 19, the result is > > {code:java} > [1582-05-31 19:40:35.123,3] > [1969-12-31 15:00:00.0,3] > [1970-12-31 04:59:59.999,3] > [1996-03-31 07:03:33.123,3] > [2018-11-17 05:33:33.123,3] > [2019-12-31 09:33:33.123,3] > [2100-01-01 01:33:33.123,1] {code} > Java 19 > > {code:java} > [1582-05-31 19:40:35.123,5] > [1969-12-31 15:00:00.0,5] > [1970-12-31 04:59:59.999,5] > [1996-03-31 07:03:33.123,5] > [2018-11-17 05:33:33.123,3] > [2019-12-31 09:33:33.123,5] > 
[2100-01-01 01:33:33.123,1] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40791) The semantics of `F` in `DateTimeFormatter` have changed
[ https://issues.apache.org/jira/browse/SPARK-40791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620487#comment-17620487 ] Dongjoon Hyun commented on SPARK-40791: --- According to the PR comment, Java 11 has the same issue, [~LuciferYang]? > hmm... the latest 11(11.0.17) and 17(17.0.5) have the same issue ... > The semantics of `F` in `DateTimeFormatter` have changed > > > Key: SPARK-40791 > URL: https://issues.apache.org/jira/browse/SPARK-40791 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > val createSql = > """ > |create temporary view v as select col from values > | (timestamp '1582-06-01 11:33:33.123UTC+08'), > | (timestamp '1970-01-01 00:00:00.000Europe/Paris'), > | (timestamp '1970-12-31 23:59:59.999Asia/Srednekolymsk'), > | (timestamp '1996-04-01 00:33:33.123Australia/Darwin'), > | (timestamp '2018-11-17 13:33:33.123Z'), > | (timestamp '2020-01-01 01:33:33.123Asia/Shanghai'), > | (timestamp '2100-01-01 01:33:33.123America/Los_Angeles') t(col) > | """.stripMargin > sql(createSql) > withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) { > val rows = sql("select col, date_format(col, 'F') from v").collect() > // scalastyle:off > rows.foreach(println) > } {code} > > Before Java 19, the result is > > {code:java} > [1582-05-31 19:40:35.123,3] > [1969-12-31 15:00:00.0,3] > [1970-12-31 04:59:59.999,3] > [1996-03-31 07:03:33.123,3] > [2018-11-17 05:33:33.123,3] > [2019-12-31 09:33:33.123,3] > [2100-01-01 01:33:33.123,1] {code} > Java 19 > > {code:java} > [1582-05-31 19:40:35.123,5] > [1969-12-31 15:00:00.0,5] > [1970-12-31 04:59:59.999,5] > [1996-03-31 07:03:33.123,5] > [2018-11-17 05:33:33.123,3] > [2019-12-31 09:33:33.123,5] > [2100-01-01 01:33:33.123,1] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
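The two result sets in the ticket are consistent with a JDK change, shipped in Java 19 and (per the comment above) backported to 11.0.17 and 17.0.5, that remapped pattern letter `F` from the aligned day-of-week-in-month field to the documented week-of-month field. A minimal Java reproduction without Spark; for 2018-11-17 both fields happen to equal 3, which is why that row is identical in the ticket's "before" and "after" lists:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class FPatternDemo {
    public static void main(String[] args) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("F");
        // Same on old and new JDKs: both candidate fields evaluate to 3
        // for 17 Nov 2018.
        System.out.println(LocalDate.of(2018, 11, 17).format(f)); // 3
        // Differs: 3 on pre-fix JDKs, 5 on fixed ones, matching the
        // ticket's before/after output for 2019-12-31.
        System.out.println(LocalDate.of(2019, 12, 31).format(f));
    }
}
```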
[jira] [Updated] (SPARK-40847) SPARK: Load Data from Dataframe or RDD to DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-40847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vivek Garg updated SPARK-40847: --- Description: I am using Spark 2.1 on EMR and I have a dataframe like this: ClientNum | Value_1 | Value_2 | Value_3 | Value_4 14 | A | B | C | null 19 | X | Y | null | null 21 | R | null | null | null I want to load data into a DynamoDB table with ClientNum as the key, following "Analyze Your Data on Amazon DynamoDB with Apache Spark" and "Using Spark SQL for ETL". Here is the code I tried: var jobConf = new JobConf(sc.hadoopConfiguration) jobConf.set("dynamodb.servicename", "dynamodb") jobConf.set("dynamodb.input.tableName", "table_name") jobConf.set("dynamodb.output.tableName", "table_name") jobConf.set("dynamodb.endpoint", "dynamodb.eu-west-1.amazonaws.com") jobConf.set("dynamodb.regionid", "eu-west-1") jobConf.set("dynamodb.throughput.read", "1") jobConf.set("dynamodb.throughput.read.percent", "1") jobConf.set("dynamodb.throughput.write", "1") jobConf.set("dynamodb.throughput.write.percent", "1") jobConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat") jobConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat") #Import Data val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(path) I performed a transformation to have an RDD that matches the types that the DynamoDB custom output format knows how to write. The custom output format expects a tuple containing the Text and DynamoDBItemWritable types. 
Create a new RDD with those types in it, in the following map call: #Convert the dataframe to rdd val df_rdd = df.rdd > df_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = > MapPartitionsRDD[10] at rdd at :41 #Print first rdd df_rdd.take(1) > res12: Array[org.apache.spark.sql.Row] = Array([14,A,B,C,null]) var ddbInsertFormattedRDD = df_rdd.map(a => { var ddbMap = new HashMap[String, AttributeValue]() var ClientNum = new AttributeValue() ClientNum.setN(a.get(0).toString) ddbMap.put("ClientNum", ClientNum) var Value_1 = new AttributeValue() Value_1.setS(a.get(1).toString) ddbMap.put("Value_1", Value_1) var Value_2 = new AttributeValue() Value_2.setS(a.get(2).toString) ddbMap.put("Value_2", Value_2) var Value_3 = new AttributeValue() Value_3.setS(a.get(3).toString) ddbMap.put("Value_3", Value_3) var Value_4 = new AttributeValue() Value_4.setS(a.get(4).toString) ddbMap.put("Value_4", Value_4) var item = new DynamoDBItemWritable() item.setItem(ddbMap) (new Text(""), item) } ) This last call uses the job configuration that defines the EMR-DDB connector to write out the new RDD you created in the expected format: ddbInsertFormattedRDD.saveAsHadoopDataset(jobConf) fails with the following error: Caused by: java.lang.NullPointerException Null values caused the error; if I try with only ClientNum and Value_1 it works, and data is correctly inserted into the DynamoDB table. Thank you. 
[jira] [Created] (SPARK-40847) SPARK: Load Data from Dataframe or RDD to DynamoDB
Vivek Garg created SPARK-40847: -- Summary: SPARK: Load Data from Dataframe or RDD to DynamoDB Key: SPARK-40847 URL: https://issues.apache.org/jira/browse/SPARK-40847 Project: Spark Issue Type: Question Components: Deploy Affects Versions: 2.1.1 Reporter: Vivek Garg I am using Spark 2.1 on EMR and I have a dataframe like this: ClientNum | Value_1 | Value_2 | Value_3 | Value_4 14 | A | B | C | null 19 | X | Y | null | null 21 | R | null | null | null I want to load data into a DynamoDB table with ClientNum as the key, following "Analyze Your Data on Amazon DynamoDB with Apache Spark" and "Using Spark SQL for ETL". Here is the code I tried: var jobConf = new JobConf(sc.hadoopConfiguration) jobConf.set("dynamodb.servicename", "dynamodb") jobConf.set("dynamodb.input.tableName", "table_name") jobConf.set("dynamodb.output.tableName", "table_name") jobConf.set("dynamodb.endpoint", "dynamodb.eu-west-1.amazonaws.com") jobConf.set("dynamodb.regionid", "eu-west-1") jobConf.set("dynamodb.throughput.read", "1") jobConf.set("dynamodb.throughput.read.percent", "1") jobConf.set("dynamodb.throughput.write", "1") jobConf.set("dynamodb.throughput.write.percent", "1") jobConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat") jobConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat") #Import Data val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(path) I performed a transformation to have an RDD that matches the types that the DynamoDB custom output format knows how to write. The custom output format expects a tuple containing the Text and DynamoDBItemWritable types. 
Create a new RDD with those types in it, in the following map call: #Convert the dataframe to rdd val df_rdd = df.rdd > df_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = > MapPartitionsRDD[10] at rdd at :41 #Print first rdd df_rdd.take(1) > res12: Array[org.apache.spark.sql.Row] = Array([14,A,B,C,null]) var ddbInsertFormattedRDD = df_rdd.map(a => { var ddbMap = new HashMap[String, AttributeValue]() var ClientNum = new AttributeValue() ClientNum.setN(a.get(0).toString) ddbMap.put("ClientNum", ClientNum) var Value_1 = new AttributeValue() Value_1.setS(a.get(1).toString) ddbMap.put("Value_1", Value_1) var Value_2 = new AttributeValue() Value_2.setS(a.get(2).toString) ddbMap.put("Value_2", Value_2) var Value_3 = new AttributeValue() Value_3.setS(a.get(3).toString) ddbMap.put("Value_3", Value_3) var Value_4 = new AttributeValue() Value_4.setS(a.get(4).toString) ddbMap.put("Value_4", Value_4) var item = new DynamoDBItemWritable() item.setItem(ddbMap) (new Text(""), item) } ) This last call uses the job configuration that defines the EMR-DDB connector to write out the new RDD you created in the expected format: ddbInsertFormattedRDD.saveAsHadoopDataset(jobConf) fails with the following error: Caused by: java.lang.NullPointerException Null values caused the error; if I try with only ClientNum and Value_1 it works, and data is correctly inserted into the DynamoDB table. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
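The NullPointerException comes from the map above: `a.get(i).toString` is called unconditionally, so any null column (Value_4 in the first row) blows up before anything reaches DynamoDB. The usual fix is to skip null columns when building the item. A sketch of that guard in plain Java, with a Map standing in for the HashMap[String, AttributeValue] and Row types from the ticket (the helper name is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class NullGuardDemo {
    // Add an attribute only when the column value is non-null, mirroring
    // the check needed before AttributeValue.setS(...) in the ticket's code.
    static void putIfPresent(Map<String, String> item, String name, Object value) {
        if (value != null) {
            item.put(name, value.toString());
        }
    }

    public static void main(String[] args) {
        // Stand-in for the Row [14,A,B,C,null] shown above.
        Object[] row = {14, "A", "B", "C", null};
        String[] names = {"ClientNum", "Value_1", "Value_2", "Value_3", "Value_4"};
        Map<String, String> item = new HashMap<>();
        for (int i = 0; i < row.length; i++) {
            putIfPresent(item, names[i], row[i]);
        }
        System.out.println(item); // Value_4 is skipped instead of throwing
    }
}
```

In the Spark job itself the same guard can be expressed with `!a.isNullAt(i)` on the Row before constructing each AttributeValue.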
[jira] [Assigned] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40819: Assignee: (was: Apache Spark) > Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type > instead of automatically converting to LongType > > > Key: SPARK-40819 > URL: https://issues.apache.org/jira/browse/SPARK-40819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2 >Reporter: Alfred Davidson >Priority: Critical > > Since 3.2 parquet files containing attributes with type "INT64 > (TIMESTAMP(NANOS, true))" are no longer readable and attempting to read > throws: > > {code:java} > Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: > INT64 (TIMESTAMP(NANOS,true)) > at > org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1284) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:105) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:174) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:90) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:72) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at 
scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:66) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:63) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:548) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:548) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:528) > at scala.collection.immutable.Stream.map(Stream.scala:418) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1(ParquetFileFormat.scala:528) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1$adapted(ParquetFileFormat.scala:521) > at > org.apache.spark.sql.execution.datasources.SchemaMergeUtils$.$anonfun$mergeSchemasInParallel$2(SchemaMergeUtils.scala:76) > {code} > Prior to 3.2 successfully reads the parquet automatically converting to a > LongType. 
> I believe work done as part of https://issues.apache.org/jira/browse/SPARK-34661 > introduced the change in behaviour, more specifically here: > [https://github.com/apache/spark/pull/31776/files#diff-3730a913c4b95edf09fb78f8739c538bae53f7269555b6226efe7ccee1901b39R154] > which throws the QueryCompilationErrors.illegalParquetTypeError -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40819: Assignee: Apache Spark > Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type > instead of automatically converting to LongType > > > Key: SPARK-40819 > URL: https://issues.apache.org/jira/browse/SPARK-40819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2 >Reporter: Alfred Davidson >Assignee: Apache Spark >Priority: Critical > > Since 3.2 parquet files containing attributes with type "INT64 > (TIMESTAMP(NANOS, true))" are no longer readable and attempting to read > throws: > > {code:java} > Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: > INT64 (TIMESTAMP(NANOS,true)) > at > org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1284) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:105) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:174) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:90) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:72) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at 
scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:66) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:63) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:548) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:548) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:528) > at scala.collection.immutable.Stream.map(Stream.scala:418) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1(ParquetFileFormat.scala:528) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1$adapted(ParquetFileFormat.scala:521) > at > org.apache.spark.sql.execution.datasources.SchemaMergeUtils$.$anonfun$mergeSchemasInParallel$2(SchemaMergeUtils.scala:76) > {code} > Prior to 3.2 successfully reads the parquet automatically converting to a > LongType. 
> I believe the work done as part of https://issues.apache.org/jira/browse/SPARK-34661 > introduced the change in behaviour, more specifically here: > [https://github.com/apache/spark/pull/31776/files#diff-3730a913c4b95edf09fb78f8739c538bae53f7269555b6226efe7ccee1901b39R154] > which throws QueryCompilationErrors.illegalParquetTypeError
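For context on the LongType fallback described above: a Parquet INT64 (TIMESTAMP(NANOS,true)) column physically stores nanoseconds since the Unix epoch, so reading it as a plain long simply exposed the raw stored value. A minimal sketch of that semantics in plain Python (no Spark or Parquet libraries involved; the helper name is made up for illustration):

```python
from datetime import datetime, timezone

def timestamp_to_nanos_long(dt: datetime) -> int:
    # The INT64 value a nanosecond-precision Parquet writer stores:
    # whole seconds since the epoch in nanos, plus sub-second nanos.
    return int(dt.timestamp()) * 1_000_000_000 + dt.microsecond * 1_000

ts = datetime(2021, 1, 1, tzinfo=timezone.utc)
print(timestamp_to_nanos_long(ts))  # 1609459200000000000
```

This is only meant to show why the pre-3.2 LongType fallback was lossless at the value level: the long and the nanosecond timestamp are the same 64-bit quantity.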
[jira] [Commented] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620298#comment-17620298 ] Apache Spark commented on SPARK-40819: -- User 'awdavidson' has created a pull request for this issue: https://github.com/apache/spark/pull/38312 > Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type > instead of automatically converting to LongType > > > Key: SPARK-40819 > URL: https://issues.apache.org/jira/browse/SPARK-40819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2 >Reporter: Alfred Davidson >Priority: Critical > > Since 3.2 parquet files containing attributes with type "INT64 > (TIMESTAMP(NANOS, true))" are no longer readable and attempting to read > throws: > > {code:java} > Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: > INT64 (TIMESTAMP(NANOS,true)) > at > org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1284) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:105) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:174) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:90) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:72) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at 
scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:66) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:63) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:548) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:548) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:528) > at scala.collection.immutable.Stream.map(Stream.scala:418) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1(ParquetFileFormat.scala:528) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1$adapted(ParquetFileFormat.scala:521) > at > org.apache.spark.sql.execution.datasources.SchemaMergeUtils$.$anonfun$mergeSchemasInParallel$2(SchemaMergeUtils.scala:76) > {code} > Prior to 3.2 successfully reads the parquet automatically converting to a > LongType. 
> I believe the work done as part of https://issues.apache.org/jira/browse/SPARK-34661 > introduced the change in behaviour, more specifically here: > [https://github.com/apache/spark/pull/31776/files#diff-3730a913c4b95edf09fb78f8739c538bae53f7269555b6226efe7ccee1901b39R154] > which throws QueryCompilationErrors.illegalParquetTypeError
[jira] [Commented] (SPARK-40846) GA test failed with Java 8u352
[ https://issues.apache.org/jira/browse/SPARK-40846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620272#comment-17620272 ] Apache Spark commented on SPARK-40846: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38311 > GA test failed with Java 8u352 > -- > > Key: SPARK-40846 > URL: https://issues.apache.org/jira/browse/SPARK-40846 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > catalyst test failed > {code:java} > [info] *** 12 TESTS FAILED *** > [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 > [error] Failed tests: > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite > [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite > [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite > [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} > run TimestampFormatterSuite with 8u352 locally: > > {code:java} > [info] - SPARK-31557: rebasing in legacy formatters/parsers *** FAILED *** > (21 milliseconds) > [info] zoneId = Antarctica/Vostok 1000-01-01T06:52:23 did not equal > 1000-01-01T01:02:03 (TimestampFormatterSuite.scala:281) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$33(TimestampFormatterSuite.scala:281) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at 
scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at scala.collection.IterableLike.foreach(IterableLike.scala:74) > [info] at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > [info] at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$31(TimestampFormatterSuite.scala:280) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.Assertions.withClue(Assertions.scala:1065) > [info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052) > [info] at > org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$30(TimestampFormatterSuite.scala:271) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$29(TimestampFormatterSuite.scala:271) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28(TimestampFormatterSuite.scala:270) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28$adapted(TimestampFormatterSuite.scala:268) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$27(TimestampFormatterSuite.scala:268) > [info] at > 
org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$26(TimestampFormatterSuite.scala:268) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Tran
[jira] [Assigned] (SPARK-40846) GA test failed with Java 8u352
[ https://issues.apache.org/jira/browse/SPARK-40846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40846: Assignee: (was: Apache Spark) > GA test failed with Java 8u352 > -- > > Key: SPARK-40846 > URL: https://issues.apache.org/jira/browse/SPARK-40846 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > catalyst test failed > {code:java} > [info] *** 12 TESTS FAILED *** > [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 > [error] Failed tests: > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite > [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite > [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite > [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} > run TimestampFormatterSuite with 8u352 locally: > > {code:java} > [info] - SPARK-31557: rebasing in legacy formatters/parsers *** FAILED *** > (21 milliseconds) > [info] zoneId = Antarctica/Vostok 1000-01-01T06:52:23 did not equal > 1000-01-01T01:02:03 (TimestampFormatterSuite.scala:281) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$33(TimestampFormatterSuite.scala:281) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at 
scala.collection.IterableLike.foreach(IterableLike.scala:74) > [info] at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > [info] at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$31(TimestampFormatterSuite.scala:280) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.Assertions.withClue(Assertions.scala:1065) > [info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052) > [info] at > org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$30(TimestampFormatterSuite.scala:271) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$29(TimestampFormatterSuite.scala:271) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28(TimestampFormatterSuite.scala:270) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28$adapted(TimestampFormatterSuite.scala:268) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$27(TimestampFormatterSuite.scala:268) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > 
org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$26(TimestampFormatterSuite.scala:268) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at
[jira] [Assigned] (SPARK-40846) GA test failed with Java 8u352
[ https://issues.apache.org/jira/browse/SPARK-40846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40846: Assignee: Apache Spark > GA test failed with Java 8u352 > -- > > Key: SPARK-40846 > URL: https://issues.apache.org/jira/browse/SPARK-40846 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > catalyst test failed > {code:java} > [info] *** 12 TESTS FAILED *** > [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 > [error] Failed tests: > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite > [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite > [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite > [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} > run TimestampFormatterSuite with 8u352 locally: > > {code:java} > [info] - SPARK-31557: rebasing in legacy formatters/parsers *** FAILED *** > (21 milliseconds) > [info] zoneId = Antarctica/Vostok 1000-01-01T06:52:23 did not equal > 1000-01-01T01:02:03 (TimestampFormatterSuite.scala:281) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$33(TimestampFormatterSuite.scala:281) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at scala.collection.IterableLike.foreach(IterableLike.scala:74) > [info] at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > [info] at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$31(TimestampFormatterSuite.scala:280) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.Assertions.withClue(Assertions.scala:1065) > [info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052) > [info] at > org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$30(TimestampFormatterSuite.scala:271) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$29(TimestampFormatterSuite.scala:271) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28(TimestampFormatterSuite.scala:270) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28$adapted(TimestampFormatterSuite.scala:268) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$27(TimestampFormatterSuite.scala:268) > [info] at > 
org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$26(TimestampFormatterSuite.scala:268) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer
[jira] [Commented] (SPARK-40846) GA test failed with Java 8u352
[ https://issues.apache.org/jira/browse/SPARK-40846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620260#comment-17620260 ] Apache Spark commented on SPARK-40846: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38311 > GA test failed with Java 8u352 > -- > > Key: SPARK-40846 > URL: https://issues.apache.org/jira/browse/SPARK-40846 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > catalyst test failed > {code:java} > [info] *** 12 TESTS FAILED *** > [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 > [error] Failed tests: > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite > [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite > [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite > [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite > [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} > run TimestampFormatterSuite with 8u352 locally: > > {code:java} > [info] - SPARK-31557: rebasing in legacy formatters/parsers *** FAILED *** > (21 milliseconds) > [info] zoneId = Antarctica/Vostok 1000-01-01T06:52:23 did not equal > 1000-01-01T01:02:03 (TimestampFormatterSuite.scala:281) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$33(TimestampFormatterSuite.scala:281) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at 
scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at scala.collection.IterableLike.foreach(IterableLike.scala:74) > [info] at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > [info] at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$31(TimestampFormatterSuite.scala:280) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.Assertions.withClue(Assertions.scala:1065) > [info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052) > [info] at > org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$30(TimestampFormatterSuite.scala:271) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$29(TimestampFormatterSuite.scala:271) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28(TimestampFormatterSuite.scala:270) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28$adapted(TimestampFormatterSuite.scala:268) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$27(TimestampFormatterSuite.scala:268) > [info] at > 
org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > [info] at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) > [info] at > org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$26(TimestampFormatterSuite.scala:268) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Tran
[jira] [Created] (SPARK-40846) GA test failed with Java 8u352
Yang Jie created SPARK-40846: Summary: GA test failed with Java 8u352 Key: SPARK-40846 URL: https://issues.apache.org/jira/browse/SPARK-40846 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.4.0 Reporter: Yang Jie catalyst test failed {code:java} [info] *** 12 TESTS FAILED *** [error] Failed: Total 6746, Failed 12, Errors 0, Passed 6734, Ignored 5 [error] Failed tests: [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOffSuite [error] org.apache.spark.sql.catalyst.util.TimestampFormatterSuite [error] org.apache.spark.sql.catalyst.expressions.CastWithAnsiOnSuite [error] org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite [error] org.apache.spark.sql.catalyst.expressions.TryCastSuite {code} run TimestampFormatterSuite with 8u352 locally: {code:java} [info] - SPARK-31557: rebasing in legacy formatters/parsers *** FAILED *** (21 milliseconds) [info] zoneId = Antarctica/Vostok 1000-01-01T06:52:23 did not equal 1000-01-01T01:02:03 (TimestampFormatterSuite.scala:281) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$33(TimestampFormatterSuite.scala:281) [info] at scala.collection.Iterator.foreach(Iterator.scala:943) [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) [info] at scala.collection.IterableLike.foreach(IterableLike.scala:74) [info] at scala.collection.IterableLike.foreach$(IterableLike.scala:73) [info] at scala.collection.AbstractIterable.foreach(Iterable.scala:56) [info] at 
org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$31(TimestampFormatterSuite.scala:280) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.Assertions.withClue(Assertions.scala:1065) [info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052) [info] at org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$30(TimestampFormatterSuite.scala:271) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.apache.spark.sql.catalyst.util.DateTimeTestUtils$.withDefaultTimeZone(DateTimeTestUtils.scala:61) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$29(TimestampFormatterSuite.scala:271) [info] at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) [info] at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28(TimestampFormatterSuite.scala:270) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$28$adapted(TimestampFormatterSuite.scala:268) [info] at scala.collection.immutable.List.foreach(List.scala:431) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$27(TimestampFormatterSuite.scala:268) [info] at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) [info] at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.withSQLConf(TimestampFormatterSuite.scala:31) [info] at org.apache.spark.sql.catalyst.util.TimestampFormatterSuite.$anonfun$new$26(TimestampFormatterSuite.scala:268) [info] at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) [info] at org.scalatest.Super
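The mismatch quoted above ("1000-01-01T06:52:23 did not equal 1000-01-01T01:02:03" for Antarctica/Vostok) is the kind of drift that shows up when a JDK update ships a newer tzdata: for dates far before a zone's first recorded transition, the resolved offset comes from the earliest tzdata entry, and those entries can change between tzdata releases. A quick way to inspect what the locally installed tzdata says, using Python's stdlib zoneinfo as a stand-in for the JDK lookup (the printed offset depends entirely on which tzdata release is installed, so no fixed value is claimed):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; reads the system/installed tzdata

# For a year-1000 date the offset is taken from the zone's earliest rule,
# which is exactly the data that shifted with the 8u352 tzdata update.
vostok = ZoneInfo("Antarctica/Vostok")
dt = datetime(1000, 1, 1, 1, 2, 3, tzinfo=vostok)
print(dt.utcoffset())  # value depends on the installed tzdata release
```

Running the same lookup under two different tzdata versions is a simple way to confirm whether a rebasing-test failure like this one is environmental rather than a Spark regression.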
[jira] [Commented] (SPARK-40734) KafkaMicroBatchV2SourceWithAdminSuite failed
[ https://issues.apache.org/jira/browse/SPARK-40734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620167#comment-17620167 ] Yang Jie commented on SPARK-40734: -- It looks like a flaky test; re-running succeeds, but the cause of the failure has not been found yet {code:java} - ensure stream-stream self-join generates only one offset in log and correct metrics *** FAILED *** Timed out waiting for stream: The code passed to failAfter did not complete within 30 seconds. java.base/java.lang.Thread.getStackTrace(Thread.java:2550) org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:277) org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) org.apache.spark.sql.kafka010.KafkaSourceTest.failAfter(KafkaMicroBatchSourceSuite.scala:53) org.apache.spark.sql.streaming.StreamTest.$anonfun$testStream$7(StreamTest.scala:479) org.apache.spark.sql.streaming.StreamTest.$anonfun$testStream$7$adapted(StreamTest.scala:478) scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) Caused by: null java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1766) org.apache.spark.sql.execution.streaming.StreamExecution.awaitOffset(StreamExecution.scala:465) org.apache.spark.sql.streaming.StreamTest.$anonfun$testStream$8(StreamTest.scala:480) scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) 
org.apache.spark.sql.kafka010.KafkaSourceTest.failAfter(KafkaMicroBatchSourceSuite.scala:53) org.apache.spark.sql.streaming.StreamTest.$anonfun$testStream$7(StreamTest.scala:479) == Progress == AssertOnQuery(, ) AddKafkaData(topics = Set(topic-51), data = WrappedArray(1, 2), message = ) => CheckAnswer: [1,1,1],[2,2,2] AddKafkaData(topics = Set(topic-51), data = WrappedArray(6, 3), message = ) CheckAnswer: [1,1,1],[2,2,2],[3,3,3],[1,6,1],[1,1,6],[1,6,6] AssertOnQuery(, ) == Stream == Output Mode: Append Stream state: {KafkaV2[Subscribe[topic-51]]: {"topic-51":{"1":0,"0":1}}} Thread state: alive Thread stack trace: java.base/java.lang.ProcessImpl.forkAndExec(Native Method) java.base/java.lang.ProcessImpl.(ProcessImpl.java:319) java.base/java.lang.ProcessImpl.start(ProcessImpl.java:249) java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) org.apache.hadoop.util.Shell.runCommand(Shell.java:937) org.apache.hadoop.util.Shell.run(Shell.java:900) org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212) org.apache.hadoop.util.Shell.execCommand(Shell.java:1306) org.apache.hadoop.util.Shell.execCommand(Shell.java:1288) org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978) org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:324) org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:294) org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439) org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428) org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459) org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1305) org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:102) org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:360) 
org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:400) org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:626) org.apache.hadoop.fs.FileContext$3.next(FileContext.java:701) org.apache.hadoop.fs.FileContext$3.next(FileContext.java:697) org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) org.apache.hadoop.fs.FileContext.create(FileContext.java:703) org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:359) org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:140) org.apache.spa
[jira] [Updated] (SPARK-40734) KafkaMicroBatchV2SourceWithAdminSuite failed
[ https://issues.apache.org/jira/browse/SPARK-40734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-40734: - Summary: KafkaMicroBatchV2SourceWithAdminSuite failed (was: KafkaMicroBatchSourceSuite failed) > KafkaMicroBatchV2SourceWithAdminSuite failed > > > Key: SPARK-40734 > URL: https://issues.apache.org/jira/browse/SPARK-40734 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > "ensure stream-stream self-join generates only one offset in log and correct > metrics" failed > Failure reason to be supplemented -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40734) KafkaMicroBatchV2SourceWithAdminSuite failed
[ https://issues.apache.org/jira/browse/SPARK-40734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-40734: - Description: - ensure stream-stream self-join generates only one offset in log and correct metrics *** FAILED *** - read Kafka transactional messages: read_committed *** FAILED *** Failure reason to be supplemented was: "ensure stream-stream self-join generates only one offset in log and correct metrics" failed Failure reason to be supplemented > KafkaMicroBatchV2SourceWithAdminSuite failed > > > Key: SPARK-40734 > URL: https://issues.apache.org/jira/browse/SPARK-40734 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > - ensure stream-stream self-join generates only one offset in log and correct > metrics *** FAILED *** > - read Kafka transactional messages: read_committed *** FAILED *** > Failure reason to be supplemented
[jira] [Assigned] (SPARK-40753) Fix bug in test case for catalog directory operation
[ https://issues.apache.org/jira/browse/SPARK-40753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-40753: --- Assignee: xiaoping.huang > Fix bug in test case for catalog directory operation > > > Key: SPARK-40753 > URL: https://issues.apache.org/jira/browse/SPARK-40753 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: xiaoping.huang >Assignee: xiaoping.huang >Priority: Minor > > The implementation class of ExternalCatalog performs folder operations when executing operations such as create/drop database/table/partition. The test case creates the folder in advance when obtaining the DB/Partition path URI, so the result of the test case is not convincing enough.
[jira] [Resolved] (SPARK-40753) Fix bug in test case for catalog directory operation
[ https://issues.apache.org/jira/browse/SPARK-40753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40753. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38206 [https://github.com/apache/spark/pull/38206] > Fix bug in test case for catalog directory operation > > > Key: SPARK-40753 > URL: https://issues.apache.org/jira/browse/SPARK-40753 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: xiaoping.huang >Assignee: xiaoping.huang >Priority: Minor > Fix For: 3.4.0 > > > The implementation class of ExternalCatalog performs folder operations when executing operations such as create/drop database/table/partition. The test case creates the folder in advance when obtaining the DB/Partition path URI, so the result of the test case is not convincing enough.
[jira] [Commented] (SPARK-40820) Creating StructType from Json
[ https://issues.apache.org/jira/browse/SPARK-40820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620099#comment-17620099 ] Anthony Wainer Cachay Guivin commented on SPARK-40820: -- Here is an example: many DataFrames are created from a schema, and that schema is created from JSON. The entry point for creating the schema is StructType.fromJson(json), which internally uses StructField.fromJson(). The issue is that when StructField parses the JSON, it forces the nullable and metadata attributes to be defined inside. [https://user-images.githubusercontent.com/7476964/196637396-d437278c-f462-41dd-8323-3d613c05214b.png] It is understandable that name and type are mandatory, but the others should be optional; the current parsing does not allow this. If more than 1000 fields are defined, this becomes a headache and adds unnecessary metadata. > Creating StructType from Json > - > > Key: SPARK-40820 > URL: https://issues.apache.org/jira/browse/SPARK-40820 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Anthony Wainer Cachay Guivin >Priority: Minor > > When creating a StructType from a Python dictionary, you use the > [StructType.fromJson|https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L569-L571] > method. > A schema can be created as in the code below, but it requires nullable and metadata to be put inside the JSON; this is inconsistent because within the DataType classes these have defaults.
> {code:python}
> json = {
>     "name": "name",
>     "type": "string"
> }
> StructField.fromJson(json)
> {code}
> Error:
> {code:python}
> from pyspark.sql.types import StructField
> json = {
>     "name": "name",
>     "type": "string"
> }
> StructField.fromJson(json)
> >>
> Traceback (most recent call last):
>   File "code.py", line 90, in runcode
>     exec(code, self.locals)
>   File "", line 1, in
>   File "pyspark/sql/types.py", line 583, in fromJson
>     json["nullable"],
> KeyError: 'nullable'
> {code}
>
> Proposed coding solution:
> Instead of indexing into the dictionary, it would be better to use .get with a default:
> {code:python}
> def fromJson(cls, json: Dict[str, Any]) -> "StructField":
>     return StructField(
>         json["name"],
>         _parse_datatype_json_value(json["type"]),
>         json.get("nullable", True),
>         json.get("metadata"),
>     )
> {code}
>
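Until such a parser change lands, the KeyError can be worked around by normalizing field dicts before parsing. The sketch below is a hypothetical helper, not part of PySpark: `with_field_defaults` fills in the defaults that StructField's constructor would apply (nullable=True, empty metadata), so minimal JSON fields parse cleanly.

```python
# Hypothetical workaround helper (not part of PySpark): fill in the optional
# attributes that StructField.fromJson currently requires. The defaults mirror
# StructField's constructor: nullable=True and empty metadata.
from typing import Any, Dict


def with_field_defaults(field: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a field dict with 'nullable' and 'metadata' filled in."""
    out = dict(field)                  # don't mutate the caller's dict
    out.setdefault("nullable", True)   # Spark's default for StructField
    out.setdefault("metadata", {})     # empty metadata, as Spark serializes it
    return out


fields = [
    {"name": "name", "type": "string"},                  # minimal field
    {"name": "age", "type": "long", "nullable": False},  # explicit nullable
]
normalized = [with_field_defaults(f) for f in fields]

# With PySpark on the path, these now parse without a KeyError:
# from pyspark.sql.types import StructField, StructType
# schema = StructType([StructField.fromJson(f) for f in normalized])
```

This keeps hand-written schema files minimal while staying compatible with the current strict parser.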
[jira] [Assigned] (SPARK-40813) Add limit and offset to Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40813: Assignee: Rui Wang > Add limit and offset to Connect DSL > --- > > Key: SPARK-40813 > URL: https://issues.apache.org/jira/browse/SPARK-40813 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major >
[jira] [Resolved] (SPARK-40813) Add limit and offset to Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40813. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38275 [https://github.com/apache/spark/pull/38275] > Add limit and offset to Connect DSL > --- > > Key: SPARK-40813 > URL: https://issues.apache.org/jira/browse/SPARK-40813 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > >
[jira] [Commented] (SPARK-40839) [Python] Implement `DataFrame.sample`
[ https://issues.apache.org/jira/browse/SPARK-40839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620086#comment-17620086 ] Apache Spark commented on SPARK-40839: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38310 > [Python] Implement `DataFrame.sample` > - > > Key: SPARK-40839 > URL: https://issues.apache.org/jira/browse/SPARK-40839 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major >
[jira] [Assigned] (SPARK-40839) [Python] Implement `DataFrame.sample`
[ https://issues.apache.org/jira/browse/SPARK-40839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40839: Assignee: Apache Spark (was: Ruifeng Zheng) > [Python] Implement `DataFrame.sample` > - > > Key: SPARK-40839 > URL: https://issues.apache.org/jira/browse/SPARK-40839 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-40839) [Python] Implement `DataFrame.sample`
[ https://issues.apache.org/jira/browse/SPARK-40839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40839: Assignee: Ruifeng Zheng (was: Apache Spark) > [Python] Implement `DataFrame.sample` > - > > Key: SPARK-40839 > URL: https://issues.apache.org/jira/browse/SPARK-40839 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major >
[jira] [Assigned] (SPARK-40779) Fix `corrwith` to work properly with different anchor.
[ https://issues.apache.org/jira/browse/SPARK-40779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40779: Assignee: Haejoon Lee > Fix `corrwith` to work properly with different anchor. > -- > > Key: SPARK-40779 > URL: https://issues.apache.org/jira/browse/SPARK-40779 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > DataFrame.corrwith is not working properly with a different anchor in pandas > 1.5.0 >
[jira] [Resolved] (SPARK-40779) Fix `corrwith` to work properly with different anchor.
[ https://issues.apache.org/jira/browse/SPARK-40779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40779. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38292 [https://github.com/apache/spark/pull/38292] > Fix `corrwith` to work properly with different anchor. > -- > > Key: SPARK-40779 > URL: https://issues.apache.org/jira/browse/SPARK-40779 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > DataFrame.corrwith is not working properly with a different anchor in pandas > 1.5.0 >
[jira] [Created] (SPARK-40845) Add template support for SPARK_GPG_KEY
Yikun Jiang created SPARK-40845: --- Summary: Add template support for SPARK_GPG_KEY Key: SPARK-40845 URL: https://issues.apache.org/jira/browse/SPARK-40845 Project: Spark Issue Type: Sub-task Components: Spark Docker Affects Versions: 3.4.0 Reporter: Yikun Jiang
[jira] [Commented] (SPARK-40823) Connect Proto should carry unparsed identifiers
[ https://issues.apache.org/jira/browse/SPARK-40823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620072#comment-17620072 ] Apache Spark commented on SPARK-40823: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/38309 > Connect Proto should carry unparsed identifiers > --- > > Key: SPARK-40823 > URL: https://issues.apache.org/jira/browse/SPARK-40823 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > >
[jira] [Commented] (SPARK-40823) Connect Proto should carry unparsed identifiers
[ https://issues.apache.org/jira/browse/SPARK-40823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620067#comment-17620067 ] Apache Spark commented on SPARK-40823: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38308 > Connect Proto should carry unparsed identifiers > --- > > Key: SPARK-40823 > URL: https://issues.apache.org/jira/browse/SPARK-40823 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-40844) Flip the default value of Kafka offset fetching config
[ https://issues.apache.org/jira/browse/SPARK-40844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40844: Assignee: (was: Apache Spark) > Flip the default value of Kafka offset fetching config > -- > > Key: SPARK-40844 > URL: https://issues.apache.org/jira/browse/SPARK-40844 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Priority: Major > > Discussion thread: > [https://lists.apache.org/thread/spkco94gw33sj8355mhlxz1vl7gl1g5c] >
[jira] [Commented] (SPARK-40844) Flip the default value of Kafka offset fetching config
[ https://issues.apache.org/jira/browse/SPARK-40844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620051#comment-17620051 ] Apache Spark commented on SPARK-40844: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/38306 > Flip the default value of Kafka offset fetching config > -- > > Key: SPARK-40844 > URL: https://issues.apache.org/jira/browse/SPARK-40844 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Priority: Major > > Discussion thread: > [https://lists.apache.org/thread/spkco94gw33sj8355mhlxz1vl7gl1g5c] >
[jira] [Assigned] (SPARK-40844) Flip the default value of Kafka offset fetching config
[ https://issues.apache.org/jira/browse/SPARK-40844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40844: Assignee: Apache Spark > Flip the default value of Kafka offset fetching config > -- > > Key: SPARK-40844 > URL: https://issues.apache.org/jira/browse/SPARK-40844 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > Discussion thread: > [https://lists.apache.org/thread/spkco94gw33sj8355mhlxz1vl7gl1g5c] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
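For readers tracking SPARK-40844: the flag being flipped is, assuming the config name from the linked discussion and pull request, spark.sql.streaming.kafka.useDeprecatedOffsetFetching (consumer-based offset fetching when true, AdminClient-based when false). A sketch of pinning it explicitly so that deployments are unaffected by the default change; verify the name against the Structured Streaming + Kafka integration guide for your Spark version:

{code}
# spark-defaults.conf (sketch): pin offset fetching explicitly so the
# default flip in SPARK-40844 does not change runtime behavior on upgrade.
spark.sql.streaming.kafka.useDeprecatedOffsetFetching  false
{code}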