[jira] [Updated] (SPARK-45283) Make StatusTrackerSuite less fragile
[ https://issues.apache.org/jira/browse/SPARK-45283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45283: --- Labels: pull-request-available (was: ) > Make StatusTrackerSuite less fragile > > > Key: SPARK-45283 > URL: https://issues.apache.org/jira/browse/SPARK-45283 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Minor > Labels: pull-request-available > Original Estimate: 1h > Remaining Estimate: 1h > > It's discovered from [Github > Actions|https://github.com/xiongbo-sjtu/spark/actions/runs/6270601155/job/17028788767] > that StatusTrackerSuite can run into random failures because > FutureAction.jobIds is not a sorted sequence (by design), as shown in the > following stack trace (highlighted in red). The proposed fix is to update > the unit test to remove the nondeterministic behavior. > {quote}[info] StatusTrackerSuite: > [info] - basic status API usage (99 milliseconds) > [info] - getJobIdsForGroup() (56 milliseconds) > [info] - getJobIdsForGroup() with takeAsync() (48 milliseconds) > [info] - getJobIdsForGroup() with takeAsync() across multiple partitions (58 > milliseconds) > [info] - getJobIdsForTag() *** FAILED *** (10 seconds, 77 milliseconds) > {color:#FF}[info] The code passed to eventually never returned normally. > Attempted 651 times over 10.00505994401 seconds. Last failure message: > Set(3, 2, 1) was not equal to Set(1, 2). 
(StatusTrackerSuite.scala:148){color} > [info] org.scalatest.exceptions.TestFailedDueToTimeoutException: > [info] at > org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:219) > [info] at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226) > [info] at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:348) > [info] at > org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:347) > [info] at > org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457) > [info] at > org.apache.spark.StatusTrackerSuite.$anonfun$new$21(StatusTrackerSuite.scala:148) > [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > [info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > [info] at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > [info] at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > [info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:333) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at >
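The direction of the proposed fix can be sketched outside of Spark: because FutureAction.jobIds makes no ordering guarantee, any assertion on the returned job IDs should be order-insensitive. A minimal Python illustration of the idea (the helper name is hypothetical, not the actual patch):

```python
# Hypothetical sketch: an assertion that assumes job IDs arrive in a fixed
# order is flaky, because FutureAction.jobIds is unordered by design.
# Comparing unordered collections removes that source of nondeterminism.
def job_ids_match(actual_ids, expected_ids):
    # set() ignores arrival order, so any interleaving of job submission passes
    return set(actual_ids) == set(expected_ids)

assert job_ids_match([3, 2, 1], [1, 2, 3])   # order no longer matters
assert not job_ids_match([3, 2, 1], [1, 2])  # a genuine mismatch still fails
```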
[jira] [Updated] (SPARK-45387) Partition key filter cannot be pushed down when using cast
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-45387: Target Version/s: (was: 3.1.1, 3.3.0) > Partition key filter cannot be pushed down when using cast > -- > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0 >Reporter: TianyiMa >Priority: Critical > > Suppose we have a partitioned table `table_pt` with a partition column `dt` > which is StringType, and the table metadata is managed by the Hive Metastore. If > we filter partitions by dt = '123', this filter can be pushed down to the data > source; but if the filter value is a number, e.g. dt = 123, it cannot be pushed > down, causing Spark to pull all of the table's partition metadata to the > client. This performs poorly if the table has thousands of partitions and > increases the risk of a Hive Metastore OOM. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
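The gap described above can be illustrated with a toy model (plain Python, a hedged sketch rather than Spark's actual pruning logic): a string literal compares directly against the string-typed partition value and can be evaluated on partition metadata alone, while a numeric literal wraps the partition column in a cast, so the predicate is no longer recognized as prunable and all partition metadata must be fetched.

```python
# Toy model of partition pruning over string-typed partition values.
# The partition set and prune() are illustrative assumptions, not Spark code.
partitions = {"dt=121", "dt=122", "dt=123"}

def prune(parts, literal):
    if isinstance(literal, str):
        # dt = '123': the literal matches the column type, so the filter can
        # be evaluated against metadata alone and pushed to the metastore.
        return {p for p in parts if p.split("=")[1] == literal}
    # dt = 123: the comparison becomes cast(dt as int) = 123; the cast wraps
    # the partition column, the filter is not pushed down, and every
    # partition's metadata is pulled to the client.
    return parts

assert prune(partitions, "123") == {"dt=123"}   # pruned to one partition
assert prune(partitions, 123) == partitions     # no pruning: full fetch
```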
[jira] [Updated] (SPARK-45390) Remove `distutils` usage
[ https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45390: -- Description: [PEP-632|https://peps.python.org/pep-0632] deprecated {{distutils}} module in {{3.10}} and dropped in {{3.12}} in favor of {{packaging}} package. > Remove `distutils` usage > > > Key: SPARK-45390 > URL: https://issues.apache.org/jira/browse/SPARK-45390 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > [PEP-632|https://peps.python.org/pep-0632] deprecated {{distutils}} module in > {{3.10}} and dropped in {{3.12}} in favor of {{packaging}} package.
[jira] [Updated] (SPARK-45390) Remove `distutils` usage
[ https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45390: -- Description: [PEP-632|https://peps.python.org/pep-0632] deprecated {{distutils}} module in Python {{3.10}} and dropped in Python {{3.12}} in favor of {{packaging}} package. (was: [PEP-632|https://peps.python.org/pep-0632] deprecated {{distutils}} module in {{3.10}} and dropped in {{3.12}} in favor of {{packaging}} package.) > Remove `distutils` usage > > > Key: SPARK-45390 > URL: https://issues.apache.org/jira/browse/SPARK-45390 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > [PEP-632|https://peps.python.org/pep-0632] deprecated {{distutils}} module in > Python {{3.10}} and dropped in Python {{3.12}} in favor of {{packaging}} > package.
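For simple dotted version comparisons, migrating off distutils.version.LooseVersion can be as small as the stdlib-only helper sketched below (a hypothetical illustration, not Spark's code; PEP 632 recommends the `packaging` package for full version-specifier semantics):

```python
# Hypothetical stdlib-only replacement for distutils.version.LooseVersion,
# sufficient only for purely numeric dotted versions like "3.10.2".
def version_tuple(v: str) -> tuple:
    # "3.12.1" -> (3, 12, 1); tuples compare element-wise, so numeric
    # ordering is preserved where naive string comparison would fail.
    return tuple(int(part) for part in v.split("."))

assert version_tuple("3.10") < version_tuple("3.12")
# String comparison gets this wrong ("3.10.0" < "3.9.2"); tuples do not:
assert version_tuple("3.9.2") < version_tuple("3.10.0")
```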
[jira] [Updated] (SPARK-45390) Remove `distutils` usage
[ https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45390: -- Summary: Remove `distutils` usage (was: Remove distutils usage) > Remove `distutils` usage > > > Key: SPARK-45390 > URL: https://issues.apache.org/jira/browse/SPARK-45390 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-45390) Remove distutils usage
[ https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45390: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Remove distutils usage > -- > > Key: SPARK-45390 > URL: https://issues.apache.org/jira/browse/SPARK-45390 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-45390) Remove distutils usage
[ https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45390: - Assignee: Dongjoon Hyun > Remove distutils usage > -- > > Key: SPARK-45390 > URL: https://issues.apache.org/jira/browse/SPARK-45390 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-45390) Remove distutils usage
[ https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45390: --- Labels: pull-request-available (was: ) > Remove distutils usage > -- > > Key: SPARK-45390 > URL: https://issues.apache.org/jira/browse/SPARK-45390 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45390) Remove distutils usage
Dongjoon Hyun created SPARK-45390: - Summary: Remove distutils usage Key: SPARK-45390 URL: https://issues.apache.org/jira/browse/SPARK-45390 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-45389) Correct MetaException matching rule on getting partition metadata
[ https://issues.apache.org/jira/browse/SPARK-45389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45389: --- Labels: pull-request-available (was: ) > Correct MetaException matching rule on getting partition metadata > - > > Key: SPARK-45389 > URL: https://issues.apache.org/jira/browse/SPARK-45389 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45389) Correct MetaException matching rule on getting partition metadata
Cheng Pan created SPARK-45389: - Summary: Correct MetaException matching rule on getting partition metadata Key: SPARK-45389 URL: https://issues.apache.org/jira/browse/SPARK-45389 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.3 Reporter: Cheng Pan
[jira] [Resolved] (SPARK-36321) Do not fail application in kubernetes if name is too long
[ https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36321. --- Fix Version/s: 3.3.1 Resolution: Duplicate > Do not fail application in kubernetes if name is too long > - > > Key: SPARK-36321 > URL: https://issues.apache.org/jira/browse/SPARK-36321 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1 > > > If we have a long Spark app name and start with a k8s master, we will get the > following exception. > {code:java} > java.lang.IllegalArgumentException: > 'a-89fe2f7ae71c3570' in > spark.kubernetes.executor.podNamePrefix is invalid. must conform > https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names > and the value length <= 47 > at > org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108) > at > org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101) > at scala.Option.map(Option.scala:230) > at > org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239) > at > org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214) > at org.apache.spark.SparkConf.get(SparkConf.scala:261) > at > org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67) > at > org.apache.spark.deploy.k8s.KubernetesExecutorConf.(KubernetesConf.scala:147) > at > org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367) > {code} > Using the app name in the executor pod name is Spark-internal behavior, and it > should not cause the application to fail.
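A sketch of the behavior the title asks for: rather than failing the application, derive a conforming executor pod name prefix from the app name. The function, constant, and regex below are illustrative assumptions based on the 47-character DNS-label limit quoted in the exception, not the actual Spark patch:

```python
import re

# Hypothetical sketch: sanitize and truncate an app name into a valid
# spark.kubernetes.executor.podNamePrefix instead of rejecting it.
MAX_PREFIX_LEN = 47  # limit cited in the IllegalArgumentException above

def safe_pod_name_prefix(app_name: str) -> str:
    # Lowercase, replace characters outside [a-z0-9-] with '-', trim to the
    # limit, then strip leading/trailing '-' so the result is a valid
    # DNS-1123 label prefix.
    cleaned = re.sub(r"[^a-z0-9-]", "-", app_name.lower())
    return cleaned[:MAX_PREFIX_LEN].strip("-")

name = safe_pod_name_prefix("My Very Long Spark Application Name " + "x" * 60)
assert len(name) <= MAX_PREFIX_LEN
assert re.fullmatch(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?", name)
```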
[jira] [Closed] (SPARK-36321) Do not fail application in kubernetes if name is too long
[ https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-36321. - > Do not fail application in kubernetes if name is too long > - > > Key: SPARK-36321 > URL: https://issues.apache.org/jira/browse/SPARK-36321 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1 > > > If we have a long Spark app name and start with a k8s master, we will get the > following exception. > {code:java} > java.lang.IllegalArgumentException: > 'a-89fe2f7ae71c3570' in > spark.kubernetes.executor.podNamePrefix is invalid. must conform > https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names > and the value length <= 47 > at > org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108) > at > org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101) > at scala.Option.map(Option.scala:230) > at > org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239) > at > org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214) > at org.apache.spark.SparkConf.get(SparkConf.scala:261) > at > org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67) > at > org.apache.spark.deploy.k8s.KubernetesExecutorConf.(KubernetesConf.scala:147) > at > org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367) > {code} > Using the app name in the executor pod name is Spark-internal behavior, and it > should not cause the application to fail.
[jira] [Commented] (SPARK-36321) Do not fail application in kubernetes if name is too long
[ https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770752#comment-17770752 ] Dongjoon Hyun commented on SPARK-36321: --- Yes, [~wypoon] > Do not fail application in kubernetes if name is too long > - > > Key: SPARK-36321 > URL: https://issues.apache.org/jira/browse/SPARK-36321 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > Labels: pull-request-available > > If we have a long Spark app name and start with a k8s master, we will get the > following exception. > {code:java} > java.lang.IllegalArgumentException: > 'a-89fe2f7ae71c3570' in > spark.kubernetes.executor.podNamePrefix is invalid. must conform > https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names > and the value length <= 47 > at > org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108) > at > org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101) > at scala.Option.map(Option.scala:230) > at > org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239) > at > org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214) > at org.apache.spark.SparkConf.get(SparkConf.scala:261) > at > org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67) > at > org.apache.spark.deploy.k8s.KubernetesExecutorConf.(KubernetesConf.scala:147) > at > org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367) > {code} > Using the app name in the executor pod name is Spark-internal behavior, and it > should not cause the application to fail.
[jira] [Updated] (SPARK-45227) Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated SPARK-45227: Fix Version/s: 3.3.4 > Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an > executor process randomly gets stuck > > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, pull-request-available, > race-condition, stuck, threadsafe > Fix For: 3.4.2, 4.0.0, 3.5.1, 3.3.4 > > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very > last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking > at Spark UI, we saw that an executor process hung over 1 hour. After we > manually killed the executor process, the app succeeded. Note that the same > EMR cluster with two worker nodes was able to run the same app without any > issue before and after the incident. > h2. Observations > Below is what's observed from relevant container logs and thread dump. > * A regular task that's sent to the executor, which also reported back to > the driver upon the task completion. 
> {quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID > 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID > 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) > $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 > $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) > 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). > 4495 bytes result sent to driver}} > {quote} * Another task that's sent to the executor but didn't get launched > since the single-threaded dispatcher was stuck (presumably in an "infinite > loop" as explained later). > {quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID > 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 > $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz > >> note that the above command has no matching result, indicating that task > >> 153.0 in stage 23.0 (TID 924) was never launched}} > {quote}* Thread dump shows that the dispatcher-Executor thread has the > following stack trace. 
> {quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 > tid=0x98e37800 nid=0x1aff runnable [0x73bba000] > java.lang.Thread.State: RUNNABLE > at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) > at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) > at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) > at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) > at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) > at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) > at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) > at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) > at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) > at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.put(HashMap.scala:126) > at scala.collection.mutable.HashMap.update(HashMap.scala:131) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) > at > org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown > Source) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at >
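The underlying bug class, a non-thread-safe mutable HashMap mutated from multiple threads, which can leave HashTable.findEntry0 chasing a corrupted bucket chain indefinitely, is conventionally fixed by serializing access or switching to a concurrent map. A hedged Python sketch of the locking pattern (Python's GIL masks the actual corruption, so this only illustrates the structure, not a reproduction):

```python
import threading

# Illustrative sketch (not Spark code): guard a shared, non-thread-safe map
# with a lock so concurrent structural updates cannot interleave.
class TaskResources:
    def __init__(self):
        self._lock = threading.Lock()
        self._by_task = {}

    def put(self, task_id, resources):
        # Without mutual exclusion, two threads mutating the map at once can
        # corrupt its internal structure; that is the failure mode visible in
        # the dispatcher-Executor thread dump above.
        with self._lock:
            self._by_task[task_id] = resources

    def get(self, task_id):
        with self._lock:
            return self._by_task.get(task_id)

registry = TaskResources()
workers = [threading.Thread(target=registry.put, args=(i, {})) for i in range(8)]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert all(registry.get(i) == {} for i in range(8))
```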
[jira] [Updated] (SPARK-45388) Eliminate unnecessary reflection invocation in Hive shim classes
[ https://issues.apache.org/jira/browse/SPARK-45388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45388: --- Labels: pull-request-available (was: ) > Eliminate unnecessary reflection invocation in Hive shim classes > > > Key: SPARK-45388 > URL: https://issues.apache.org/jira/browse/SPARK-45388 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45388) Eliminate unnecessary reflection invocation in Hive shim classes
Cheng Pan created SPARK-45388: - Summary: Eliminate unnecessary reflection invocation in Hive shim classes Key: SPARK-45388 URL: https://issues.apache.org/jira/browse/SPARK-45388 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Cheng Pan
[jira] [Assigned] (SPARK-44762) Add more documentation and examples for using job tags for interrupt
[ https://issues.apache.org/jira/browse/SPARK-44762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44762: Assignee: Juliusz Sompolski > Add more documentation and examples for using job tags for interrupt > > > Key: SPARK-44762 > URL: https://issues.apache.org/jira/browse/SPARK-44762 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > > Add documentation to spark.addJob tag with similar examples and explanation > like SparkContext.setJobGroup
[jira] [Resolved] (SPARK-44762) Add more documentation and examples for using job tags for interrupt
[ https://issues.apache.org/jira/browse/SPARK-44762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44762. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43182 [https://github.com/apache/spark/pull/43182] > Add more documentation and examples for using job tags for interrupt > > > Key: SPARK-44762 > URL: https://issues.apache.org/jira/browse/SPARK-44762 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add documentation to spark.addJob tag with similar examples and explanation > like SparkContext.setJobGroup
[jira] [Resolved] (SPARK-45331) Upgrade Scala to 2.13.12
[ https://issues.apache.org/jira/browse/SPARK-45331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45331. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43185 [https://github.com/apache/spark/pull/43185] > Upgrade Scala to 2.13.12 > > > Key: SPARK-45331 > URL: https://issues.apache.org/jira/browse/SPARK-45331 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > wait SPARK-45330
[jira] [Assigned] (SPARK-45331) Upgrade Scala to 2.13.12
[ https://issues.apache.org/jira/browse/SPARK-45331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45331: - Assignee: Yang Jie > Upgrade Scala to 2.13.12 > > > Key: SPARK-45331 > URL: https://issues.apache.org/jira/browse/SPARK-45331 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > wait SPARK-45330
[jira] [Resolved] (SPARK-45359) DataFrame.{columns, colRegex, explain} should raise exceptions when plan is invalid
[ https://issues.apache.org/jira/browse/SPARK-45359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45359. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43150 [https://github.com/apache/spark/pull/43150] > DataFrame.{columns, colRegex, explain} should raise exceptions when plan is > invalid > --- > > Key: SPARK-45359 > URL: https://issues.apache.org/jira/browse/SPARK-45359 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-45359) DataFrame.{columns, colRegex, explain} should raise exceptions when plan is invalid
[ https://issues.apache.org/jira/browse/SPARK-45359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45359: Assignee: Ruifeng Zheng > DataFrame.{columns, colRegex, explain} should raise exceptions when plan is > invalid > --- > > Key: SPARK-45359 > URL: https://issues.apache.org/jira/browse/SPARK-45359 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-45358) Remove shim classes for Hive prior 2.0.0
[ https://issues.apache.org/jira/browse/SPARK-45358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45358. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43151 [https://github.com/apache/spark/pull/43151] > Remove shim classes for Hive prior 2.0.0 > > > Key: SPARK-45358 > URL: https://issues.apache.org/jira/browse/SPARK-45358 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-45358) Remove shim classes for Hive prior 2.0.0
[ https://issues.apache.org/jira/browse/SPARK-45358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45358: Assignee: Cheng Pan > Remove shim classes for Hive prior 2.0.0 > > > Key: SPARK-45358 > URL: https://issues.apache.org/jira/browse/SPARK-45358 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45387) Partition key filter cannot be pushed down when using cast
TianyiMa created SPARK-45387: Summary: Partition key filter cannot be pushed down when using cast Key: SPARK-45387 URL: https://issues.apache.org/jira/browse/SPARK-45387 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0, 3.3.0, 3.1.2, 3.1.1 Reporter: TianyiMa Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', the filter can be pushed down to the data source; but if the filter value is a number, e.g. dt = 123, it cannot be pushed down. Spark then pulls all of the table's partition metadata to the client, which performs poorly when the table has thousands of partitions and increases the risk of a Hive Metastore OOM. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
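The pushdown gap described in the report can be illustrated outside Spark with a small Python sketch (the partition values are hypothetical, and this only models the pruning logic, not Spark's optimizer): a string-equality filter maps directly onto partition names, while a numeric filter forces a cast of every partition value first, which is why it cannot be handed to the metastore as a plain string predicate.

```python
partitions = ["123", "0123", "456"]  # hypothetical dt partition values (strings)

# dt = '123': string equality can be evaluated on partition names directly,
# so the filter is safe to push down as-is.
pruned = [p for p in partitions if p == "123"]

# dt = 123: the comparison is numeric, so every partition value must be cast
# first; "0123" now matches too, which shows why plain string matching on
# partition names would be incorrect and the filter cannot be pushed down.
cast_matches = [p for p in partitions if int(p) == 123]
```

Note that the numeric filter matches a strict superset of the string filter here, so pruning by name alone would silently drop the "0123" partition.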
[jira] [Updated] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE
[ https://issues.apache.org/jira/browse/SPARK-45386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45386: --- Labels: pull-request-available (was: ) > Correctness issue when persisting using StorageLevel.NONE > - > > Key: SPARK-45386 > URL: https://issues.apache.org/jira/browse/SPARK-45386 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0, 4.0.0 >Reporter: Emil Ejbyfeldt >Priority: Major > Labels: pull-request-available > > When using spark 3.5.0 this code > {code:java} > import org.apache.spark.storage.StorageLevel > spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code} > incorrectly returns 0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE
Emil Ejbyfeldt created SPARK-45386: -- Summary: Correctness issue when persisting using StorageLevel.NONE Key: SPARK-45386 URL: https://issues.apache.org/jira/browse/SPARK-45386 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0, 4.0.0 Reporter: Emil Ejbyfeldt When using spark 3.5.0 this code {code:java} import org.apache.spark.storage.StorageLevel spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code} incorrectly returns 0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals
[ https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45385: -- Assignee: Apache Spark (was: Max Gekk) > Deprecate spark.sql.parser.escapedStringLiterals > > > Key: SPARK-45385 > URL: https://issues.apache.org/jira/browse/SPARK-45385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > The config allows to switch to legacy behaviour of Spark 1.6 which is pretty > old and not maintained anymore. Deprecation and removing of the config in the > future versions should improve code maintenance. Also there is an alternative > approach by using RAW string literals in which Spark doesn't especially > handle escaped character sequences. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals
[ https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45385: -- Assignee: Max Gekk (was: Apache Spark) > Deprecate spark.sql.parser.escapedStringLiterals > > > Key: SPARK-45385 > URL: https://issues.apache.org/jira/browse/SPARK-45385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > The config allows to switch to legacy behaviour of Spark 1.6 which is pretty > old and not maintained anymore. Deprecation and removing of the config in the > future versions should improve code maintenance. Also there is an alternative > approach by using RAW string literals in which Spark doesn't especially > handle escaped character sequences. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals
[ https://issues.apache.org/jira/browse/SPARK-45385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770674#comment-17770674 ] ASF GitHub Bot commented on SPARK-45385: User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/43187 > Deprecate spark.sql.parser.escapedStringLiterals > > > Key: SPARK-45385 > URL: https://issues.apache.org/jira/browse/SPARK-45385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > The config allows to switch to legacy behaviour of Spark 1.6 which is pretty > old and not maintained anymore. Deprecation and removing of the config in the > future versions should improve code maintenance. Also there is an alternative > approach by using RAW string literals in which Spark doesn't especially > handle escaped character sequences. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45385) Deprecate spark.sql.parser.escapedStringLiterals
Max Gekk created SPARK-45385: Summary: Deprecate spark.sql.parser.escapedStringLiterals Key: SPARK-45385 URL: https://issues.apache.org/jira/browse/SPARK-45385 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk The config allows switching to the legacy behaviour of Spark 1.6, which is quite old and no longer maintained. Deprecating and eventually removing the config should improve code maintenance. There is also an alternative: RAW string literals, in which Spark does not specially handle escaped character sequences. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
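The RAW-literal alternative mentioned in the issue behaves like raw string literals in most languages. As a rough Python analogy (this is Python syntax, not Spark SQL):

```python
escaped = "a\tb"   # backslash-t is interpreted as a single tab character
raw = r"a\tb"      # the backslash and 't' are kept literally

# the escaped literal holds 3 characters, the raw one holds 4
```

The point of the deprecation is that users who want literal backslashes can reach for raw literals instead of flipping a global parser config.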
[jira] [Commented] (SPARK-24042) High-order function: zip_with_index
[ https://issues.apache.org/jira/browse/SPARK-24042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770659#comment-17770659 ] ZygD commented on SPARK-24042: -- [~Tagar] has incorrectly linked another issue to this one. Even though this is resolved, the other one is not. Can we "unlink" the issue SPARK-23074? > High-order function: zip_with_index > --- > > Key: SPARK-24042 > URL: https://issues.apache.org/jira/browse/SPARK-24042 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Implement function {{zip_with_index(array[, indexFirst])}} that transforms > the input array by encapsulating elements into pairs with indexes indicating > the order. > Examples: > {{zip_with_index(array("d", "a", null, "b")) => > [("d",0),("a",1),(null,2),("b",3)]}} > {{zip_with_index(array("d", "a", null, "b"), true) => > [(0,"d"),(1,"a"),(2,null),(3,"b")]}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
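The semantics proposed in SPARK-24042 can be sketched in plain Python (the function name and indexFirst flag mirror the proposal's examples; this is an illustrative sketch, not Spark's API):

```python
def zip_with_index(arr, index_first=False):
    """Pair each element with its position, index first or last."""
    return [(i, x) if index_first else (x, i) for i, x in enumerate(arr)]

# mirrors the examples in the issue:
# zip_with_index(["d", "a", None, "b"])       -> [("d",0),("a",1),(None,2),("b",3)]
# zip_with_index(["d", "a", None, "b"], True) -> [(0,"d"),(1,"a"),(2,None),(3,"b")]
```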
[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658 ] ZygD edited comment on SPARK-23074 at 9/30/23 7:58 AM: --- [~gurwls223] [~Tagar] The problem is {*}not solved{*}! This was incorrectly closed. [The linked closed issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, while this is not. was (Author: JIRAUSER286869): [~gurwls223] [~Tagar] The problem is {*}not solved{*}! This was incorrectly closed. [The linked closed|https://issues.apache.org/jira/browse/SPARK-24042] issue is about arrays, while this is not. > Dataframe-ified zipwithindex > > > Key: SPARK-23074 > URL: https://issues.apache.org/jira/browse/SPARK-23074 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Ruslan Dautkhanov >Priority: Minor > Labels: bulk-closed, dataframe, rdd > > Would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex(): > {code:java} > import org.apache.spark.sql.DataFrame > import org.apache.spark.sql.types.{LongType, StructField, StructType} > import org.apache.spark.sql.Row > def dfZipWithIndex( > df: DataFrame, > offset: Int = 1, > colName: String = "id", > inFront: Boolean = true > ) : DataFrame = { > df.sqlContext.createDataFrame( > df.rdd.zipWithIndex.map(ln => > Row.fromSeq( > (if (inFront) Seq(ln._2 + offset) else Seq()) > ++ ln._1.toSeq ++ > (if (inFront) Seq() else Seq(ln._2 + offset)) > ) > ), > StructType( > (if (inFront) Array(StructField(colName,LongType,false)) else > Array[StructField]()) > ++ df.schema.fields ++ > (if (inFront) Array[StructField]() else > Array(StructField(colName,LongType,false))) > ) > ) > } > {code} > credits: > [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658 ] ZygD commented on SPARK-23074: -- The problem is not solved! This was incorrectly closed. [The linked issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and this is not. > Dataframe-ified zipwithindex > > > Key: SPARK-23074 > URL: https://issues.apache.org/jira/browse/SPARK-23074 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Ruslan Dautkhanov >Priority: Minor > Labels: bulk-closed, dataframe, rdd > > Would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex(): > {code:java} > import org.apache.spark.sql.DataFrame > import org.apache.spark.sql.types.{LongType, StructField, StructType} > import org.apache.spark.sql.Row > def dfZipWithIndex( > df: DataFrame, > offset: Int = 1, > colName: String = "id", > inFront: Boolean = true > ) : DataFrame = { > df.sqlContext.createDataFrame( > df.rdd.zipWithIndex.map(ln => > Row.fromSeq( > (if (inFront) Seq(ln._2 + offset) else Seq()) > ++ ln._1.toSeq ++ > (if (inFront) Seq() else Seq(ln._2 + offset)) > ) > ), > StructType( > (if (inFront) Array(StructField(colName,LongType,false)) else > Array[StructField]()) > ++ df.schema.fields ++ > (if (inFront) Array[StructField]() else > Array(StructField(colName,LongType,false))) > ) > ) > } > {code} > credits: > [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
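The intended column placement of the Scala `dfZipWithIndex` helper quoted above (id column in front or appended, with a configurable offset) can be mimicked on plain row tuples. A minimal Python sketch, with no Spark involved and the function name chosen here for illustration:

```python
def df_zip_with_index(rows, offset=1, in_front=True):
    """Attach a sequential id to each row tuple, in front or at the end,
    mirroring the offset/inFront behavior of the dfZipWithIndex helper."""
    return [
        ((i + offset,) + tuple(r)) if in_front else (tuple(r) + (i + offset,))
        for i, r in enumerate(rows)
    ]

# df_zip_with_index([("a",), ("b",)]) -> [(1, "a"), (2, "b")]
```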