[jira] [Updated] (SPARK-45283) Make StatusTrackerSuite less fragile

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45283:
---
Labels: pull-request-available  (was: )

> Make StatusTrackerSuite less fragile
> 
>
> Key: SPARK-45283
> URL: https://issues.apache.org/jira/browse/SPARK-45283
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It's discovered from [GitHub 
> Actions|https://github.com/xiongbo-sjtu/spark/actions/runs/6270601155/job/17028788767]
>  that StatusTrackerSuite can run into random failures because 
> FutureAction.jobIds is not a sorted sequence (by design), as shown in the 
> following stack trace (highlighted in red).  The proposed fix is to update 
> the unit test to remove its dependence on this nondeterministic ordering.
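> As a hedged sketch (not the merged patch; {{sc}}, the tag name, and the 
> expected ids are placeholders), an order-insensitive assertion could look 
> like this; the full failure log follows below:
> {code:java}
> import org.scalatest.concurrent.Eventually._
> import org.scalatest.matchers.should.Matchers._
> import org.scalatest.time.SpanSugar._
> 
> // FutureAction.jobIds is unordered by design, so never take .head and
> // assume it is the earliest job; compare the ids as a Set instead.
> eventually(timeout(10.seconds)) {
>   sc.statusTracker.getJobIdsForTag("myTag").toSet should be (Set(0, 1))
> }
> {code}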
> {quote}[info] StatusTrackerSuite:
> [info] - basic status API usage (99 milliseconds)
> [info] - getJobIdsForGroup() (56 milliseconds)
> [info] - getJobIdsForGroup() with takeAsync() (48 milliseconds)
> [info] - getJobIdsForGroup() with takeAsync() across multiple partitions (58 
> milliseconds)
> [info] - getJobIdsForTag() *** FAILED *** (10 seconds, 77 milliseconds)
> {color:red}[info] The code passed to eventually never returned normally. 
> Attempted 651 times over 10.00505994401 seconds. Last failure message: 
> Set(3, 2, 1) was not equal to Set(1, 2). (StatusTrackerSuite.scala:148){color}
> [info] org.scalatest.exceptions.TestFailedDueToTimeoutException:
> [info] at 
> org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:219)
> [info] at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
> [info] at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:348)
> [info] at 
> org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:347)
> [info] at 
> org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457)
> [info] at 
> org.apache.spark.StatusTrackerSuite.$anonfun$new$21(StatusTrackerSuite.scala:148)
> [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> [info] at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> [info] at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> [info] at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> [info] at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> [info] at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info] at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info] at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info] at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69)
> [info] at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info] at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info] at scala.collection.immutable.List.foreach(List.scala:333)
> [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info] at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info] at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info] at org.scalatest.Suite.run(Suite.scala:1114)
> [info] at org.scalatest.Suite.run$(Suite.scala:1096)
> [info] at 
> 

[jira] [Updated] (SPARK-45387) Partition key filter cannot be pushed down when using cast

2023-09-30 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45387:

Target Version/s:   (was: 3.1.1, 3.3.0)

> Partition key filter cannot be pushed down when using cast
> --
>
> Key: SPARK-45387
> URL: https://issues.apache.org/jira/browse/SPARK-45387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0
>Reporter: TianyiMa
>Priority: Critical
>
> Suppose we have a partitioned table `table_pt` with a StringType partition 
> column `dt`, and the table metadata is managed by the Hive Metastore. If we 
> filter partitions by dt = '123', the filter can be pushed down to the data 
> source; but if the literal is a number, e.g. dt = 123, it cannot be pushed 
> down, causing Spark to pull all of the table's partition metadata to the 
> client. That performs poorly when the table has thousands of partitions and 
> increases the risk of a Hive Metastore OOM.
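> A hedged illustration of the pushdown-friendly form (the table and column 
> come from the description above; a running SparkSession is assumed):
> {code:java}
> // Sketch only: with a StringType partition column, keep the literal a
> // string so no implicit cast blocks partition pruning.
> spark.sql("SELECT * FROM table_pt WHERE dt = 123")    // cast inserted, filter not pushed down
> spark.sql("SELECT * FROM table_pt WHERE dt = '123'")  // pushed down as a partition filter
> {code}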



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45390) Remove `distutils` usage

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45390:
--
Description: [PEP-632|https://peps.python.org/pep-0632] deprecated the 
{{distutils}} module in {{3.10}} and dropped it in {{3.12}} in favor of the 
{{packaging}} package.

> Remove `distutils` usage
> 
>
> Key: SPARK-45390
> URL: https://issues.apache.org/jira/browse/SPARK-45390
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> [PEP-632|https://peps.python.org/pep-0632] deprecated the {{distutils}} module 
> in {{3.10}} and dropped it in {{3.12}} in favor of the {{packaging}} package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45390) Remove `distutils` usage

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45390:
--
Description: [PEP-632|https://peps.python.org/pep-0632] deprecated the 
{{distutils}} module in Python {{3.10}} and dropped it in Python {{3.12}} in 
favor of the {{packaging}} package.  (was: 
[PEP-632|https://peps.python.org/pep-0632] deprecated the {{distutils}} module 
in {{3.10}} and dropped it in {{3.12}} in favor of the {{packaging}} package.)

> Remove `distutils` usage
> 
>
> Key: SPARK-45390
> URL: https://issues.apache.org/jira/browse/SPARK-45390
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> [PEP-632|https://peps.python.org/pep-0632] deprecated the {{distutils}} module 
> in Python {{3.10}} and dropped it in Python {{3.12}} in favor of the 
> {{packaging}} package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45390) Remove `distutils` usage

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45390:
--
Summary: Remove `distutils` usage  (was: Remove distutils usage)

> Remove `distutils` usage
> 
>
> Key: SPARK-45390
> URL: https://issues.apache.org/jira/browse/SPARK-45390
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45390) Remove distutils usage

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45390:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Improvement)

> Remove distutils usage
> --
>
> Key: SPARK-45390
> URL: https://issues.apache.org/jira/browse/SPARK-45390
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45390) Remove distutils usage

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45390:
-

Assignee: Dongjoon Hyun

> Remove distutils usage
> --
>
> Key: SPARK-45390
> URL: https://issues.apache.org/jira/browse/SPARK-45390
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45390) Remove distutils usage

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45390:
---
Labels: pull-request-available  (was: )

> Remove distutils usage
> --
>
> Key: SPARK-45390
> URL: https://issues.apache.org/jira/browse/SPARK-45390
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45390) Remove distutils usage

2023-09-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45390:
-

 Summary: Remove distutils usage
 Key: SPARK-45390
 URL: https://issues.apache.org/jira/browse/SPARK-45390
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45389) Correct MetaException matching rule on getting partition metadata

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45389:
---
Labels: pull-request-available  (was: )

> Correct MetaException matching rule on getting partition metadata
> -
>
> Key: SPARK-45389
> URL: https://issues.apache.org/jira/browse/SPARK-45389
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45389) Correct MetaException matching rule on getting partition metadata

2023-09-30 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-45389:
-

 Summary: Correct MetaException matching rule on getting partition 
metadata
 Key: SPARK-45389
 URL: https://issues.apache.org/jira/browse/SPARK-45389
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.3
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36321) Do not fail application in kubernetes if name is too long

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36321.
---
Fix Version/s: 3.3.1
   Resolution: Duplicate

> Do not fail application in kubernetes if name is too long
> -
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>
> If we have a long Spark app name and start with a k8s master, we will get the 
> following exception.
> {code:java}
> java.lang.IllegalArgumentException: 
> 'a-89fe2f7ae71c3570' in 
> spark.kubernetes.executor.podNamePrefix is invalid. must conform 
> https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names
>  and the value length <= 47
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>   at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>   at 
> org.apache.spark.deploy.k8s.KubernetesExecutorConf.(KubernetesConf.scala:147)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name prefix is Spark-internal 
> behavior, and it should not cause the application to fail.
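> As a hedged user-side sketch (not the Spark-side fix), the derived prefix can 
> be overridden with a short, DNS-label-safe value:
> {code:java}
> import org.apache.spark.sql.SparkSession
> 
> // Illustrative workaround: pin the executor pod name prefix explicitly so
> // a long app name never reaches the podNamePrefix validator.
> val spark = SparkSession.builder()
>   .appName("a-very-long-application-name-that-would-otherwise-fail-validation")
>   .config("spark.kubernetes.executor.podNamePrefix", "myapp-exec")
>   .getOrCreate()
> {code}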



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-36321) Do not fail application in kubernetes if name is too long

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-36321.
-

> Do not fail application in kubernetes if name is too long
> -
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>
> If we have a long Spark app name and start with a k8s master, we will get the 
> following exception.
> {code:java}
> java.lang.IllegalArgumentException: 
> 'a-89fe2f7ae71c3570' in 
> spark.kubernetes.executor.podNamePrefix is invalid. must conform 
> https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names
>  and the value length <= 47
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>   at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>   at 
> org.apache.spark.deploy.k8s.KubernetesExecutorConf.(KubernetesConf.scala:147)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name prefix is Spark-internal 
> behavior, and it should not cause the application to fail.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36321) Do not fail application in kubernetes if name is too long

2023-09-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770752#comment-17770752
 ] 

Dongjoon Hyun commented on SPARK-36321:
---

Yes, [~wypoon] 

> Do not fail application in kubernetes if name is too long
> -
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>  Labels: pull-request-available
>
> If we have a long Spark app name and start with a k8s master, we will get the 
> following exception.
> {code:java}
> java.lang.IllegalArgumentException: 
> 'a-89fe2f7ae71c3570' in 
> spark.kubernetes.executor.podNamePrefix is invalid. must conform 
> https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names
>  and the value length <= 47
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>   at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>   at 
> org.apache.spark.deploy.k8s.KubernetesExecutorConf.(KubernetesConf.scala:147)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name prefix is Spark-internal 
> behavior, and it should not cause the application to fail.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck

2023-09-30 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-45227:

Fix Version/s: 3.3.4

> Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an 
> executor process randomly gets stuck
> 
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Assignee: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, pull-request-available, 
> race-condition, stuck, threadsafe
> Fix For: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{df.write}}. Looking at 
> the Spark UI, we saw that an executor process hung for over 1 hour. After we 
> manually killed the executor process, the app succeeded. Note that the same 
> EMR cluster with two worker nodes was able to run the same app without any 
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
> {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver
> {quote} * Another task that's sent to the executor but didn't get launched 
> since the single-threaded dispatcher was stuck (presumably in an "infinite 
> loop" as explained later).
> {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched
> {quote}* Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> 
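> The thread dump above is truncated in this digest. As a hedged sketch of the 
> hazard it points at (not the actual patch): {{scala.collection.mutable.HashMap}} 
> is not thread-safe, and a racing insert can corrupt a bucket chain into a 
> cycle, leaving a later {{put}} spinning forever inside {{findEntry0}}. A 
> conventional remedy is a concurrent map:
> {code:java}
> import java.util.concurrent.ConcurrentHashMap
> 
> // Illustrative only: a map written from more than one thread (the name
> // here is hypothetical) should be a ConcurrentHashMap rather than an
> // unsynchronized scala.collection.mutable.HashMap.
> val taskResources = new ConcurrentHashMap[Long, String]()
> taskResources.put(923L, "NODE_LOCAL")
> {code}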

[jira] [Updated] (SPARK-45388) Eliminate unnecessary reflection invocation in Hive shim classes

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45388:
---
Labels: pull-request-available  (was: )

> Eliminate unnecessary reflection invocation in Hive shim classes
> 
>
> Key: SPARK-45388
> URL: https://issues.apache.org/jira/browse/SPARK-45388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45388) Eliminate unnecessary reflection invocation in Hive shim classes

2023-09-30 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-45388:
-

 Summary: Eliminate unnecessary reflection invocation in Hive shim 
classes
 Key: SPARK-45388
 URL: https://issues.apache.org/jira/browse/SPARK-45388
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44762) Add more documentation and examples for using job tags for interrupt

2023-09-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44762:


Assignee: Juliusz Sompolski

> Add more documentation and examples for using job tags for interrupt
> 
>
> Key: SPARK-44762
> URL: https://issues.apache.org/jira/browse/SPARK-44762
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>
> Add documentation for spark.addTag, with examples and an explanation similar 
> to those for SparkContext.setJobGroup.
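> A hedged sketch of the kind of example such documentation might carry (tag 
> names are illustrative; API as exposed by the Spark 3.5 Connect session):
> {code:java}
> // Tag jobs started by this thread, then interrupt them by tag elsewhere.
> spark.addTag("nightly-etl")
> try {
>   spark.range(1000000000L).count()   // long-running action carries the tag
> } finally {
>   spark.removeTag("nightly-etl")
> }
> // From another thread:
> spark.interruptTag("nightly-etl")    // interrupts every job with this tag
> {code}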



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44762) Add more documentation and examples for using job tags for interrupt

2023-09-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44762.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43182
[https://github.com/apache/spark/pull/43182]

> Add more documentation and examples for using job tags for interrupt
> 
>
> Key: SPARK-44762
> URL: https://issues.apache.org/jira/browse/SPARK-44762
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add documentation for spark.addTag, with examples and an explanation similar 
> to those for SparkContext.setJobGroup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45331) Upgrade Scala to 2.13.12

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45331.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43185
[https://github.com/apache/spark/pull/43185]

> Upgrade Scala to 2.13.12
> 
>
> Key: SPARK-45331
> URL: https://issues.apache.org/jira/browse/SPARK-45331
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Wait for SPARK-45330.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45331) Upgrade Scala to 2.13.12

2023-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45331:
-

Assignee: Yang Jie

> Upgrade Scala to 2.13.12
> 
>
> Key: SPARK-45331
> URL: https://issues.apache.org/jira/browse/SPARK-45331
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> Wait for SPARK-45330.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45359) DataFrame.{columns, colRegex, explain} should raise exceptions when plan is invalid

2023-09-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45359.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43150
[https://github.com/apache/spark/pull/43150]

> DataFrame.{columns, colRegex, explain} should raise exceptions when plan is 
> invalid
> ---
>
> Key: SPARK-45359
> URL: https://issues.apache.org/jira/browse/SPARK-45359
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45359) DataFrame.{columns, colRegex, explain} should raise exceptions when plan is invalid

2023-09-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45359:


Assignee: Ruifeng Zheng

> DataFrame.{columns, colRegex, explain} should raise exceptions when plan is 
> invalid
> ---
>
> Key: SPARK-45359
> URL: https://issues.apache.org/jira/browse/SPARK-45359
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45358) Remove shim classes for Hive prior 2.0.0

2023-09-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45358.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43151
[https://github.com/apache/spark/pull/43151]

> Remove shim classes for Hive prior 2.0.0
> 
>
> Key: SPARK-45358
> URL: https://issues.apache.org/jira/browse/SPARK-45358
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45358) Remove shim classes for Hive prior 2.0.0

2023-09-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45358:


Assignee: Cheng Pan

> Remove shim classes for Hive prior 2.0.0
> 
>
> Key: SPARK-45358
> URL: https://issues.apache.org/jira/browse/SPARK-45358
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45387) Partition key filter cannot be pushed down when using cast

2023-09-30 Thread TianyiMa (Jira)
TianyiMa created SPARK-45387:


 Summary: Partition key filter cannot be pushed down when using cast
 Key: SPARK-45387
 URL: https://issues.apache.org/jira/browse/SPARK-45387
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0, 3.3.0, 3.1.2, 3.1.1
Reporter: TianyiMa


Suppose we have a partitioned table `table_pt` with a StringType partition 
column `dt`, and the table metadata is managed by the Hive Metastore. If we 
filter partitions by dt = '123', the filter can be pushed down to the data 
source; but if the literal is a number, e.g. dt = 123, it cannot be pushed 
down, causing Spark to pull all of the table's partition metadata to the 
client. That performs poorly when the table has thousands of partitions and 
increases the risk of a Hive Metastore OOM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45386:
---
Labels: pull-request-available  (was: )

> Correctness issue when persisting using StorageLevel.NONE
> -
>
> Key: SPARK-45386
> URL: https://issues.apache.org/jira/browse/SPARK-45386
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Emil Ejbyfeldt
>Priority: Major
>  Labels: pull-request-available
>
> When using Spark 3.5.0, this code
> {code:java}
> import org.apache.spark.storage.StorageLevel
> spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code}
> incorrectly returns 0.
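> For contrast, a hedged statement of the expected behavior 
> ({{StorageLevel.NONE}} acting as a no-op persist):
> {code:java}
> import org.apache.spark.storage.StorageLevel
> import spark.implicits._
> 
> // Expected semantics: NONE stores nothing, so the action should behave as
> // if persist had never been called and return 3, not 0.
> assert(spark.createDataset(Seq(1, 2, 3)).persist(StorageLevel.NONE).count() == 3L)
> {code}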



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE

2023-09-30 Thread Emil Ejbyfeldt (Jira)
Emil Ejbyfeldt created SPARK-45386:
--

 Summary: Correctness issue when persisting using StorageLevel.NONE
 Key: SPARK-45386
 URL: https://issues.apache.org/jira/browse/SPARK-45386
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0, 4.0.0
Reporter: Emil Ejbyfeldt


When using Spark 3.5.0, this code
{code:java}
import org.apache.spark.storage.StorageLevel
spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code}
incorrectly returns 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45385:
--

Assignee: Apache Spark  (was: Max Gekk)

> Deprecate spark.sql.parser.escapedStringLiterals
> 
>
> Key: SPARK-45385
> URL: https://issues.apache.org/jira/browse/SPARK-45385
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The config allows switching to the legacy behaviour of Spark 1.6, which is 
> old and no longer maintained. Deprecating and eventually removing the config 
> should improve code maintenance. There is also an alternative: raw string 
> literals, in which Spark does not specially handle escaped character 
> sequences.
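> A hedged illustration of the raw-literal alternative (raw string syntax as 
> documented for Spark SQL; output elided):
> {code:java}
> // Sketch: instead of flipping spark.sql.parser.escapedStringLiterals,
> // use a raw string literal, whose escape sequences are kept verbatim.
> spark.sql("""SELECT r'\d+' AS pattern""").show()
> {code}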



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45385:
--

Assignee: Max Gekk  (was: Apache Spark)

> Deprecate spark.sql.parser.escapedStringLiterals
> 
>
> Key: SPARK-45385
> URL: https://issues.apache.org/jira/browse/SPARK-45385
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The config allows switching to the legacy behaviour of Spark 1.6, which is 
> old and no longer maintained. Deprecating and eventually removing the config 
> should improve code maintenance. There is also an alternative: raw string 
> literals, in which Spark does not specially handle escaped character 
> sequences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals

2023-09-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770674#comment-17770674
 ] 

ASF GitHub Bot commented on SPARK-45385:


User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/43187

> Deprecate spark.sql.parser.escapedStringLiterals
> 
>
> Key: SPARK-45385
> URL: https://issues.apache.org/jira/browse/SPARK-45385
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The config allows switching to the legacy behaviour of Spark 1.6, which is 
> old and no longer maintained. Deprecating and eventually removing the config 
> should improve code maintenance. There is also an alternative: raw string 
> literals, in which Spark does not specially handle escaped character 
> sequences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45385:
--

Assignee: Max Gekk  (was: Apache Spark)

> Deprecate spark.sql.parser.escapedStringLiterals
> 
>
> Key: SPARK-45385
> URL: https://issues.apache.org/jira/browse/SPARK-45385
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The config allows switching to the legacy behaviour of Spark 1.6, which is 
> old and no longer maintained. Deprecating and eventually removing the config 
> should improve code maintenance. There is also an alternative: raw string 
> literals, in which Spark does not specially handle escaped character 
> sequences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals

2023-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45385:
--

Assignee: Apache Spark  (was: Max Gekk)

> Deprecate spark.sql.parser.escapedStringLiterals
> 
>
> Key: SPARK-45385
> URL: https://issues.apache.org/jira/browse/SPARK-45385
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The config allows switching to the legacy behaviour of Spark 1.6, which is 
> old and no longer maintained. Deprecating and eventually removing the config 
> should improve code maintenance. There is also an alternative: raw string 
> literals, in which Spark does not specially handle escaped character 
> sequences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals

2023-09-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770673#comment-17770673
 ] 

ASF GitHub Bot commented on SPARK-45385:


User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/43187

> Deprecate spark.sql.parser.escapedStringLiterals
> 
>
> Key: SPARK-45385
> URL: https://issues.apache.org/jira/browse/SPARK-45385
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The config allows switching to the legacy behaviour of Spark 1.6, which is 
> old and no longer maintained. Deprecating and eventually removing the config 
> should improve code maintenance. There is also an alternative: raw string 
> literals, in which Spark does not specially handle escaped character 
> sequences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals

2023-09-30 Thread Max Gekk (Jira)
Max Gekk created SPARK-45385:


 Summary: Deprecate spark.sql.parser.escapedStringLiterals
 Key: SPARK-45385
 URL: https://issues.apache.org/jira/browse/SPARK-45385
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk


The config allows switching to the legacy behaviour of Spark 1.6, which is 
old and no longer maintained. Deprecating and eventually removing the config 
should improve code maintenance. There is also an alternative: raw string 
literals, in which Spark does not specially handle escaped character 
sequences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24042) High-order function: zip_with_index

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770659#comment-17770659
 ] 

ZygD commented on SPARK-24042:
--

[~Tagar] has incorrectly linked another issue to this one. Even though this 
one is resolved, the other one is not. Can we "unlink" the issue SPARK-23074?

> High-order function: zip_with_index
> ---
>
> Key: SPARK-24042
> URL: https://issues.apache.org/jira/browse/SPARK-24042
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Marek Novotny
>Priority: Major
>
> Implement function {{zip_with_index(array[, indexFirst])}} that transforms 
> the input array by pairing each element with an index indicating its 
> position in the array.
> Examples:
> {{zip_with_index(array("d", "a", null, "b")) => 
> [("d",0),("a",1),(null,2),("b",3)]}}
> {{zip_with_index(array("d", "a", null, "b"), true) => 
> [(0,"d"),(1,"a"),(2,null),(3,"b")]}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:58 AM:
---

[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. [The linked 
closed issue|https://issues.apache.org/jira/browse/SPARK-24042] is about 
arrays, while this is not. 


was (Author: JIRAUSER286869):
[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. [The linked 
closed|https://issues.apache.org/jira/browse/SPARK-24042] issue is about 
arrays, while this is not. 

> Dataframe-ified zipwithindex
> 
>
> Key: SPARK-23074
> URL: https://issues.apache.org/jira/browse/SPARK-23074
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Ruslan Dautkhanov
>Priority: Minor
>  Labels: bulk-closed, dataframe, rdd
>
> It would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
> def dfZipWithIndex(
>   df: DataFrame,
>   offset: Int = 1,
>   colName: String = "id",
>   inFront: Boolean = true
> ) : DataFrame = {
>   df.sqlContext.createDataFrame(
> df.rdd.zipWithIndex.map(ln =>
>   Row.fromSeq(
> (if (inFront) Seq(ln._2 + offset) else Seq())
>   ++ ln._1.toSeq ++
> (if (inFront) Seq() else Seq(ln._2 + offset))
>   )
> ),
> StructType(
>   (if (inFront) Array(StructField(colName,LongType,false)) else 
> Array[StructField]()) 
> ++ df.schema.fields ++ 
>   (if (inFront) Array[StructField]() else 
> Array(StructField(colName,LongType,false)))
> )
>   ) 
> }
> {code}
> credits: 
> [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]
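> A hedged usage sketch of the snippet above (input and output values are 
> illustrative):
> {code:java}
> val df = spark.range(3).toDF("value")   // any input DataFrame
> dfZipWithIndex(df).show()
> // +---+-----+
> // | id|value|
> // +---+-----+
> // |  1|    0|
> // |  2|    1|
> // |  3|    2|
> // +---+-----+
> {code}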



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:57 AM:
---

[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. [The linked 
closed|https://issues.apache.org/jira/browse/SPARK-24042] issue is about 
arrays, while this is not. 


was (Author: JIRAUSER286869):
[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. The linked closed 
issue is about arrays, while this is not. 

> Dataframe-ified zipwithindex
> 
>
> Key: SPARK-23074
> URL: https://issues.apache.org/jira/browse/SPARK-23074
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Ruslan Dautkhanov
>Priority: Minor
>  Labels: bulk-closed, dataframe, rdd
>
> It would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
> def dfZipWithIndex(
>   df: DataFrame,
>   offset: Int = 1,
>   colName: String = "id",
>   inFront: Boolean = true
> ) : DataFrame = {
>   df.sqlContext.createDataFrame(
> df.rdd.zipWithIndex.map(ln =>
>   Row.fromSeq(
> (if (inFront) Seq(ln._2 + offset) else Seq())
>   ++ ln._1.toSeq ++
> (if (inFront) Seq() else Seq(ln._2 + offset))
>   )
> ),
> StructType(
>   (if (inFront) Array(StructField(colName,LongType,false)) else 
> Array[StructField]()) 
> ++ df.schema.fields ++ 
>   (if (inFront) Array[StructField]() else 
> Array(StructField(colName,LongType,false)))
> )
>   ) 
> }
> {code}
> credits: 
> [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:54 AM:
---

[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. The linked closed 
issue is about arrays, while this is not. 


was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. [The linked closed 
issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and 
this is not. 

> Dataframe-ified zipwithindex
> 
>
> Key: SPARK-23074
> URL: https://issues.apache.org/jira/browse/SPARK-23074
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Ruslan Dautkhanov
>Priority: Minor
>  Labels: bulk-closed, dataframe, rdd
>
> It would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
> def dfZipWithIndex(
>   df: DataFrame,
>   offset: Int = 1,
>   colName: String = "id",
>   inFront: Boolean = true
> ) : DataFrame = {
>   df.sqlContext.createDataFrame(
> df.rdd.zipWithIndex.map(ln =>
>   Row.fromSeq(
> (if (inFront) Seq(ln._2 + offset) else Seq())
>   ++ ln._1.toSeq ++
> (if (inFront) Seq() else Seq(ln._2 + offset))
>   )
> ),
> StructType(
>   (if (inFront) Array(StructField(colName,LongType,false)) else 
> Array[StructField]()) 
> ++ df.schema.fields ++ 
>   (if (inFront) Array[StructField]() else 
> Array(StructField(colName,LongType,false)))
> )
>   ) 
> }
> {code}
> credits: 
> [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:51 AM:
---

The problem is not solved! This was incorrectly closed. The linked closed issue 
is about arrays, and this is not. 


was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. [The linked 
issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and 
this is not. 

> Dataframe-ified zipwithindex
> 
>
> Key: SPARK-23074
> URL: https://issues.apache.org/jira/browse/SPARK-23074
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Ruslan Dautkhanov
>Priority: Minor
>  Labels: bulk-closed, dataframe, rdd
>
> It would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
> def dfZipWithIndex(
>   df: DataFrame,
>   offset: Int = 1,
>   colName: String = "id",
>   inFront: Boolean = true
> ) : DataFrame = {
>   df.sqlContext.createDataFrame(
> df.rdd.zipWithIndex.map(ln =>
>   Row.fromSeq(
> (if (inFront) Seq(ln._2 + offset) else Seq())
>   ++ ln._1.toSeq ++
> (if (inFront) Seq() else Seq(ln._2 + offset))
>   )
> ),
> StructType(
>   (if (inFront) Array(StructField(colName,LongType,false)) else 
> Array[StructField]()) 
> ++ df.schema.fields ++ 
>   (if (inFront) Array[StructField]() else 
> Array(StructField(colName,LongType,false)))
> )
>   ) 
> }
> {code}
> credits: 
> [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:51 AM:
---

The problem is not solved! This was incorrectly closed. [The linked closed 
issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and 
this is not. 


was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. The linked closed issue 
is about arrays, and this is not. 

> Dataframe-ified zipwithindex
> 
>
> Key: SPARK-23074
> URL: https://issues.apache.org/jira/browse/SPARK-23074
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Ruslan Dautkhanov
>Priority: Minor
>  Labels: bulk-closed, dataframe, rdd
>
> It would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
> def dfZipWithIndex(
>   df: DataFrame,
>   offset: Int = 1,
>   colName: String = "id",
>   inFront: Boolean = true
> ) : DataFrame = {
>   df.sqlContext.createDataFrame(
> df.rdd.zipWithIndex.map(ln =>
>   Row.fromSeq(
> (if (inFront) Seq(ln._2 + offset) else Seq())
>   ++ ln._1.toSeq ++
> (if (inFront) Seq() else Seq(ln._2 + offset))
>   )
> ),
> StructType(
>   (if (inFront) Array(StructField(colName,LongType,false)) else 
> Array[StructField]()) 
> ++ df.schema.fields ++ 
>   (if (inFront) Array[StructField]() else 
> Array(StructField(colName,LongType,false)))
> )
>   ) 
> }
> {code}
> credits: 
> [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD commented on SPARK-23074:
--

The problem is not solved! This was incorrectly closed. [The linked 
issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and 
this is not. 

> Dataframe-ified zipwithindex
> 
>
> Key: SPARK-23074
> URL: https://issues.apache.org/jira/browse/SPARK-23074
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Ruslan Dautkhanov
>Priority: Minor
>  Labels: bulk-closed, dataframe, rdd
>
> It would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
> def dfZipWithIndex(
>   df: DataFrame,
>   offset: Int = 1,
>   colName: String = "id",
>   inFront: Boolean = true
> ) : DataFrame = {
>   df.sqlContext.createDataFrame(
> df.rdd.zipWithIndex.map(ln =>
>   Row.fromSeq(
> (if (inFront) Seq(ln._2 + offset) else Seq())
>   ++ ln._1.toSeq ++
> (if (inFront) Seq() else Seq(ln._2 + offset))
>   )
> ),
> StructType(
>   (if (inFront) Array(StructField(colName,LongType,false)) else 
> Array[StructField]()) 
> ++ df.schema.fields ++ 
>   (if (inFront) Array[StructField]() else 
> Array(StructField(colName,LongType,false)))
> )
>   ) 
> }
> {code}
> credits: 
> [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org