[jira] [Updated] (SPARK-46954) XML: Perf optimizations
[ https://issues.apache.org/jira/browse/SPARK-46954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46954: --- Labels: pull-request-available (was: ) > XML: Perf optimizations > --- > > Key: SPARK-46954 > URL: https://issues.apache.org/jira/browse/SPARK-46954 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46957) Migrated shuffle data files from the decommissioned node should be removed when job completed
Yu-Jhe Li created SPARK-46957: - Summary: Migrated shuffle data files from the decommissioned node should be removed when job completed Key: SPARK-46957 URL: https://issues.apache.org/jira/browse/SPARK-46957 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Yu-Jhe Li Hi, we have a long-lived Spark application running on a standalone cluster on GCP and we are using spot instances. To reduce the impact of preempted instances, we have enabled node decommission to let the preempted node migrate its shuffle data to other instances before it is deleted by GCP. However, we found that the migrated shuffle data from the decommissioned node is never removed. (same behavior on spark-3.5) *Steps to reproduce:* 1. Start spark-shell with 3 executors and enable decommission on both driver/worker {code:java} start-worker.sh[3331]: Spark Command: /usr/lib/jvm/java-17-openjdk-amd64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Dspark.worker.cleanup.appDataTtl=1800 -Dspark.decommission.enabled=true -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master-01.com:7077 {code} {code:java} /opt/spark/bin/spark-shell --master spark://master-01.spark.com:7077 \ --total-executor-cores 12 \ --conf spark.decommission.enabled=true \ --conf spark.storage.decommission.enabled=true \ --conf spark.storage.decommission.shuffleBlocks.enabled=true \ --conf spark.storage.decommission.rddBlocks.enabled=true{code} 2. Manually stop 1 worker during execution {code:java} (1 to 10).foreach { i => println(s"start iter $i ...") val longString = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer eget tortor id libero ultricies faucibus nec ac neque. Vivamus ac risus vitae mi efficitur lacinia. Quisque dignissim quam vel tellus placerat, non laoreet elit rhoncus. Nam et magna id dui tempor sagittis. Aliquam erat volutpat. Integer tristique purus ac eros bibendum, at varius velit viverra. Sed eleifend luctus massa, ac accumsan leo feugiat ac. 
Sed id nisl et enim tristique auctor. Sed vel ante nec leo placerat tincidunt. Ut varius, risus nec sodales tempor, odio augue euismod ipsum, nec tristique e" val df = (1 to 1 * i).map(j => (j, s"${j}_${longString}")).toDF("id", "mystr") df.repartition(6).count() System.gc() println(s"finished iter $i, wait 15s for next round") Thread.sleep(15*1000) } System.gc() start iter 1 ... finished iter 1, wait 15s for next round ... {code} 3. Check the migrated shuffle data files on the remaining workers {*}decommissioned node{*}: migrated shuffle file successfully {code:java} less /mnt/spark_work/app-20240202084807-0003/1/stdout | grep 'Migrated ' 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_4_41 to BlockManagerId(2, 10.67.5.139, 35949, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_4_38 to BlockManagerId(0, 10.67.5.134, 36175, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_4_47 to BlockManagerId(0, 10.67.5.134, 36175, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_4_44 to BlockManagerId(2, 10.67.5.139, 35949, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_5_52 to BlockManagerId(0, 10.67.5.134, 36175, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_5_55 to BlockManagerId(2, 10.67.5.139, 35949, None) {code} {*}remaining shuffle data files on the other workers{*}: the migrated shuffle files are never removed {code:java} 10.67.5.134 | CHANGED | rc=0 >> -rw-r--r-- 1 spark spark 126 Feb 2 08:48 /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/13/shuffle_4_47_0.data -rw-r--r-- 1 spark spark 126 Feb 2 08:48 /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/31/shuffle_4_38_0.data -rw-r--r-- 
1 spark spark 32 Feb 2 08:48 /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/3a/shuffle_5_52_0.data 10.67.5.139 | CHANGED | rc=0 >> -rw-r--r-- 1 spark spark 126 Feb 2 08:48 /mnt/spark/spark-ab501bec-ddd2-4b82-af3e-f2731066e580/executor-1ca5ad78-1d75-453d-88ab-487d7cdfacb7/blockmgr-f09eb18d-b0e4-48f9-a4ed-5587cef25a16/27/shuffle_4_41_0.data -rw-r--r-- 1 spark spark 126 Feb 2 08:48 /mnt/spark/spark-ab501bec-ddd2-4b82-af3e-f2731066e580/executor-1ca5ad78-1d75-453d-88ab-487d7cdfacb7/blockmgr-f09eb18d-b0e4-48f9-a4ed-5587cef25a16/36/shuffle_4_44_0.data -rw-r--r-- 1 spark spark 32 Feb 2 08:48 /mnt/spark/spark-ab501bec-ddd2-4b82-af3e-f2731066e580/executor-1ca5ad78-1d75-453d-88ab-487d7cd
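Until the leak described above is fixed in Spark itself, leftover files can be reclaimed externally. The sketch below is a hedged workaround, not anything the report prescribes: the directory layout and the TTL are assumptions based on the listings above. It walks a worker's local block-manager tree and deletes shuffle files older than a TTL.

```python
import time
from pathlib import Path


def remove_stale_shuffle_files(root, ttl_seconds, now=None):
    """Delete shuffle_*.data / shuffle_*.index files under `root` whose
    mtime is older than `ttl_seconds`, returning the removed paths.

    `root` is assumed to be a worker's local dir holding the
    spark-*/executor-*/blockmgr-* tree, as in the listing above.
    """
    now = time.time() if now is None else now
    removed = []
    for pattern in ("shuffle_*.data", "shuffle_*.index"):
        for f in Path(root).rglob(pattern):
            if now - f.stat().st_mtime > ttl_seconds:
                f.unlink()
                removed.append(f)
    return removed
```

Run this only against applications that have fully completed; deleting files belonging to a live application would cause fetch failures on the executors still reading them.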
[jira] [Updated] (SPARK-46957) Migrated shuffle data files from the decommissioned node should be removed when job completed
[ https://issues.apache.org/jira/browse/SPARK-46957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu-Jhe Li updated SPARK-46957: -- Description: Hi, we have a long-lived Spark application running on a standalone cluster on GCP and we are using spot instances. To reduce the impact of preempted instances, we have enabled node decommission to let the preempted node migrate its shuffle data to other instances before it is deleted by GCP. However, we found that the migrated shuffle data from the decommissioned node is never removed. (same behavior on spark-3.5) *Steps to reproduce:* 1. Start spark-shell with 3 executors and enable decommission on both driver/worker {code:java} start-worker.sh[3331]: Spark Command: /usr/lib/jvm/java-17-openjdk-amd64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Dspark.worker.cleanup.appDataTtl=1800 -Dspark.decommission.enabled=true -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master-01.com:7077 {code} {code:java} /opt/spark/bin/spark-shell --master spark://master-01.spark.com:7077 \ --total-executor-cores 12 \ --conf spark.decommission.enabled=true \ --conf spark.storage.decommission.enabled=true \ --conf spark.storage.decommission.shuffleBlocks.enabled=true \ --conf spark.storage.decommission.rddBlocks.enabled=true{code} 2. Manually stop 1 worker during execution {code:java} (1 to 10).foreach { i => println(s"start iter $i ...") val longString = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer eget tortor id libero ultricies faucibus nec ac neque. Vivamus ac risus vitae mi efficitur lacinia. Quisque dignissim quam vel tellus placerat, non laoreet elit rhoncus. Nam et magna id dui tempor sagittis. Aliquam erat volutpat. Integer tristique purus ac eros bibendum, at varius velit viverra. Sed eleifend luctus massa, ac accumsan leo feugiat ac. Sed id nisl et enim tristique auctor. Sed vel ante nec leo placerat tincidunt. 
Ut varius, risus nec sodales tempor, odio augue euismod ipsum, nec tristique e" val df = (1 to 1 * i).map(j => (j, s"${j}_${longString}")).toDF("id", "mystr") df.repartition(6).count() System.gc() println(s"finished iter $i, wait 15s for next round") Thread.sleep(15*1000) } System.gc() start iter 1 ... finished iter 1, wait 15s for next round ... {code} 3. Check the migrated shuffle data files on the remaining workers {*}decommissioned node{*}: migrated shuffle file successfully {code:java} less /mnt/spark_work/app-20240202084807-0003/1/stdout | grep 'Migrated ' 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_4_41 to BlockManagerId(2, 10.67.5.139, 35949, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_4_38 to BlockManagerId(0, 10.67.5.134, 36175, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_4_47 to BlockManagerId(0, 10.67.5.134, 36175, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_4_44 to BlockManagerId(2, 10.67.5.139, 35949, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_5_52 to BlockManagerId(0, 10.67.5.134, 36175, None) 24/02/02 08:48:53 INFO BlockManagerDecommissioner: Migrated migrate_shuffle_5_55 to BlockManagerId(2, 10.67.5.139, 35949, None) {code} {*}remaining shuffle data files on the other workers{*}: the migrated shuffle files are never removed {code:java} 10.67.5.134 | CHANGED | rc=0 >> -rw-r--r-- 1 spark spark 126 Feb 2 08:48 /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/13/shuffle_4_47_0.data -rw-r--r-- 1 spark spark 126 Feb 2 08:48 /mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/31/shuffle_4_38_0.data -rw-r--r-- 1 spark spark 32 Feb 2 08:48 
/mnt/spark/spark-b25878b3-8b3c-4cff-ba4d-41f6d128da7c/executor-b8f83524-9270-4f35-83ca-ceb13af2b7d1/blockmgr-f05c4d8e-e1a5-4822-a6e9-49be760b67a2/3a/shuffle_5_52_0.data 10.67.5.139 | CHANGED | rc=0 >> -rw-r--r-- 1 spark spark 126 Feb 2 08:48 /mnt/spark/spark-ab501bec-ddd2-4b82-af3e-f2731066e580/executor-1ca5ad78-1d75-453d-88ab-487d7cdfacb7/blockmgr-f09eb18d-b0e4-48f9-a4ed-5587cef25a16/27/shuffle_4_41_0.data -rw-r--r-- 1 spark spark 126 Feb 2 08:48 /mnt/spark/spark-ab501bec-ddd2-4b82-af3e-f2731066e580/executor-1ca5ad78-1d75-453d-88ab-487d7cdfacb7/blockmgr-f09eb18d-b0e4-48f9-a4ed-5587cef25a16/36/shuffle_4_44_0.data -rw-r--r-- 1 spark spark 32 Feb 2 08:48 /mnt/spark/spark-ab501bec-ddd2-4b82-af3e-f2731066e580/executor-1ca5ad78-1d75-453d-88ab-487d7cdfacb7/blockmgr-f09eb18d-b0e4-48f9-a4ed-5587cef25a16/29/shuffle_5_55_0.data {code} *Expected behavior:* The migrated shuffle data files should be removed after job completed was: Hi, we have a long-lived Spark application run on a stan
[jira] [Commented] (SPARK-20624) SPIP: Add better handling for node shutdown
[ https://issues.apache.org/jira/browse/SPARK-20624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813592#comment-17813592 ] Yu-Jhe Li commented on SPARK-20624: --- Hi, we found that the migrated shuffle files from the decommissioned node are never deleted, even long after the job has completed. I have created https://issues.apache.org/jira/browse/SPARK-46957 to track this issue. Can anyone help? > SPIP: Add better handling for node shutdown > --- > > Key: SPARK-20624 > URL: https://issues.apache.org/jira/browse/SPARK-20624 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Holden Karau >Priority: Major > > While we've done some good work with better handling when Spark is choosing > to decommission nodes (SPARK-7955), it might make sense in environments where > we get preempted without our own choice (e.g. YARN over-commit, EC2 spot > instances, GCE Preemptiable instances, etc.) to do something for the data on > the node (or at least not schedule any new tasks).
[jira] [Resolved] (SPARK-46949) Support CHAR/VARCHAR through ResolveDefaultColumns
[ https://issues.apache.org/jira/browse/SPARK-46949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46949. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44991 [https://github.com/apache/spark/pull/44991] > Support CHAR/VARCHAR through ResolveDefaultColumns > --- > > Key: SPARK-46949 > URL: https://issues.apache.org/jira/browse/SPARK-46949 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-46949) Support CHAR/VARCHAR through ResolveDefaultColumns
[ https://issues.apache.org/jira/browse/SPARK-46949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46949: Assignee: Kent Yao > Support CHAR/VARCHAR through ResolveDefaultColumns > --- > > Key: SPARK-46949 > URL: https://issues.apache.org/jira/browse/SPARK-46949 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-46958) Fix coerceDefaultValue when canUpCast
Kent Yao created SPARK-46958: Summary: Fix coerceDefaultValue when canUpCast Key: SPARK-46958 URL: https://issues.apache.org/jira/browse/SPARK-46958 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao ``` create table src(key int, c string DEFAULT date'2018-11-17') using parquet; Time taken: 0.133 seconds spark-sql (default)> desc src; [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. ```
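The failing example puts a DATE default on a STRING column, and `desc src` dies with an internal error rather than either coercing the literal or raising a clean analysis error. A minimal Python model of the intended behavior is sketched below; the names, the tiny up-cast table, and the coercion rule are illustrative assumptions, not Spark's actual `ResolveDefaultColumns` code.

```python
from datetime import date

# Illustrative up-cast relation; Spark's real canUpCast covers far more types.
UP_CASTS = {
    ("int", "long"), ("int", "double"), ("long", "double"),
    ("date", "string"),  # a DATE literal can always be rendered as a string
}


def can_up_cast(from_type, to_type):
    return from_type == to_type or (from_type, to_type) in UP_CASTS


def coerce_default(value, from_type, to_type):
    """Coerce a column DEFAULT literal to the column's declared type.

    When no safe cast exists, raise a clear analysis-style error instead
    of surfacing an internal error, which is what the ticket appears to
    ask for.
    """
    if not can_up_cast(from_type, to_type):
        raise ValueError(
            f"DEFAULT value of type {from_type} cannot be cast to {to_type}")
    if to_type == "string":
        return str(value)
    return value
```

With this rule, the ticket's example `coerce_default(date(2018, 11, 17), "date", "string")` yields the string form of the date, and an impossible coercion fails with a readable message rather than a stack trace.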
[jira] [Updated] (SPARK-46958) Fix coerceDefaultValue when canUpCast
[ https://issues.apache.org/jira/browse/SPARK-46958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-46958: - Affects Version/s: 3.5.0 > Fix coerceDefaultValue when canUpCast > - > > Key: SPARK-46958 > URL: https://issues.apache.org/jira/browse/SPARK-46958 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > > ``` > create table src(key int, c string DEFAULT date'2018-11-17') using parquet; > Time taken: 0.133 seconds > spark-sql (default)> desc src; > [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. > You hit a bug in Spark or the Spark plugins you use. Please, report this bug > to the corresponding communities or vendors, and provide the full stack trace. > org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase > analysis failed with an internal error. You hit a bug in Spark or the Spark > plugins you use. Please, report this bug to the corresponding communities or > vendors, and provide the full stack trace. > ```
[jira] [Assigned] (SPARK-46946) Supporting broadcast of multiple filtering keys in DynamicPruning
[ https://issues.apache.org/jira/browse/SPARK-46946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-46946: --- Assignee: Thang Long Vu > Supporting broadcast of multiple filtering keys in DynamicPruning > - > > Key: SPARK-46946 > URL: https://issues.apache.org/jira/browse/SPARK-46946 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Thang Long Vu >Assignee: Thang Long Vu >Priority: Major > Labels: pull-request-available, releasenotes > > This PR extends `DynamicPruningSubquery` to support broadcasting of multiple > filtering keys (instead of one as before). The majority of the PR simply > generalises the single-key case to multiple keys. > Note: We do not actually use the multiple filtering keys of > `DynamicPruningSubquery` in this PR; we are doing this to make supporting DPP > Null Safe Equality or multiple Equality predicates easier in the future. > In a Null Safe Equality JOIN, the JOIN condition `a <=> b` is transformed to > `Coalesce(key1, Literal(key1.dataType)) = Coalesce(key2, > Literal(key2.dataType)) AND IsNull(key1) = IsNull(key2)`. In order to have > the highest pruning efficiency, we broadcast the 2 keys `Coalesce(key, > Literal(key.dataType))` and `IsNull(key)` and use them to prune the other > side at the same time. > Before, the `DynamicPruningSubquery` only had one broadcasting key and we > only supported DPP for one `EqualTo` JOIN predicate; now we are extending the > subquery to multiple broadcasting keys. Please note that DPP has not been > supported for multiple JOIN predicates. > Put another way, at the moment we don't insert a DPP Filter for > multiple JOIN predicates at the same time; we only potentially insert a DPP > Filter for a given Equality JOIN predicate.
[jira] [Resolved] (SPARK-46946) Supporting broadcast of multiple filtering keys in DynamicPruning
[ https://issues.apache.org/jira/browse/SPARK-46946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46946. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44988 [https://github.com/apache/spark/pull/44988] > Supporting broadcast of multiple filtering keys in DynamicPruning > - > > Key: SPARK-46946 > URL: https://issues.apache.org/jira/browse/SPARK-46946 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Thang Long Vu >Assignee: Thang Long Vu >Priority: Major > Labels: pull-request-available, releasenotes > Fix For: 4.0.0 > > > This PR extends `DynamicPruningSubquery` to support broadcasting of multiple > filtering keys (instead of one as before). The majority of the PR simply > generalises the single-key case to multiple keys. > Note: We do not actually use the multiple filtering keys of > `DynamicPruningSubquery` in this PR; we are doing this to make supporting DPP > Null Safe Equality or multiple Equality predicates easier in the future. > In a Null Safe Equality JOIN, the JOIN condition `a <=> b` is transformed to > `Coalesce(key1, Literal(key1.dataType)) = Coalesce(key2, > Literal(key2.dataType)) AND IsNull(key1) = IsNull(key2)`. In order to have > the highest pruning efficiency, we broadcast the 2 keys `Coalesce(key, > Literal(key.dataType))` and `IsNull(key)` and use them to prune the other > side at the same time. > Before, the `DynamicPruningSubquery` only had one broadcasting key and we > only supported DPP for one `EqualTo` JOIN predicate; now we are extending the > subquery to multiple broadcasting keys. Please note that DPP has not been > supported for multiple JOIN predicates. > Put another way, at the moment we don't insert a DPP Filter for > multiple JOIN predicates at the same time; we only potentially insert a DPP > Filter for a given Equality JOIN predicate. 
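The rewrite quoted in the ticket, `a <=> b` becoming `Coalesce(key1, Literal(key1.dataType)) = Coalesce(key2, Literal(key2.dataType)) AND IsNull(key1) = IsNull(key2)`, can be checked mechanically. In the sketch below, Python stands in for SQL: `None` models NULL, and a fixed sentinel models the type's default literal. This is only an illustration of why the `IsNull` conjunct is needed, not Spark code.

```python
_DEFAULT = 0  # stands in for Literal(key.dataType)'s default value


def null_safe_eq(a, b):
    """SQL's a <=> b, with None modeling NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b


def rewritten(a, b):
    """Coalesce(a, default) = Coalesce(b, default) AND IsNull(a) = IsNull(b)."""
    coalesce = lambda x: _DEFAULT if x is None else x
    return coalesce(a) == coalesce(b) and (a is None) == (b is None)
```

The `IsNull(key1) = IsNull(key2)` conjunct is what keeps a NULL from colliding with a value equal to the default: `Coalesce` alone would map both `None` and `0` to `0`, and the two forms would disagree on that pair.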
[jira] [Created] (SPARK-46959) CSV reader reads data inconsistently depending on column position
Martin Rueckl created SPARK-46959: - Summary: CSV reader reads data inconsistently depending on column position Key: SPARK-46959 URL: https://issues.apache.org/jira/browse/SPARK-46959 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.1 Reporter: Martin Rueckl Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the following inconsistent dataframe !image-2024-02-02-13-05-26-203.png|width=352,height=120! As one can see, the quoted empty fields of the last column are not correctly read as null, whereas it works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string; leaving it unset (it defaults to "\") does not seem to trigger this bug.
[jira] [Updated] (SPARK-46959) CSV reader reads data inconsistently depending on column position
[ https://issues.apache.org/jira/browse/SPARK-46959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Rueckl updated SPARK-46959: -- Description: Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the followin inconsistent dataframe ||a||b||c||d|| |10|100,00|Some;String|ok| |20|200,00||still ok| |30|300,00|also ok|"| |40|400,00||"| As one can see, the quoted empty fields of the last column are not correctly read as null, whereas it works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string. Not setting it to "" (defaults to "\") seems to not cause this bug. was: Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the followin inconsistent dataframe ||a||b||c||d|| |10|100,00|Some;String|ok| |20|200,00||still ok| |30|300,00|also ok|"| |40|400,00||"| | | | | | As one can see, the quoted empty fields of the last column are not correctly read as null, whereas it works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string. Not setting it to "" (defaults to "\") seems to not cause this bug. 
> CSV reader reads data inconsistently depending on column position > - > > Key: SPARK-46959 > URL: https://issues.apache.org/jira/browse/SPARK-46959 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Martin Rueckl >Priority: Critical > > Reading the following CSV > {code:java} > "a";"b";"c";"d" > 10;100,00;"Some;String";"ok" > 20;200,00;"";"still ok" > 30;300,00;"also ok";"" > 40;400,00;"";"" {code} > with these options > {code:java} > spark.read > .option("header","true") > .option("sep",";") > .option("encoding","ISO-8859-1") > .option("lineSep","\r\n") > .option("nullValue","") > .option("quote",'"') > .option("escape","") {code} > results in the followin inconsistent dataframe > > ||a||b||c||d|| > |10|100,00|Some;String|ok| > |20|200,00||still ok| > |30|300,00|also ok|"| > |40|400,00||"| > > > As one can see, the quoted empty fields of the last column are not correctly > read as null, whereas it works for column c. > If I recall correctly, this only happens when the "escape" option is set to > an empty string. Not setting it to "" (defaults to "\") seems to not cause > this bug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46959) CSV reader reads data inconsistently depending on column position
[ https://issues.apache.org/jira/browse/SPARK-46959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Rueckl updated SPARK-46959: -- Description: Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the followin inconsistent dataframe ||a||b||c||d|| |10|100,00|Some;String|ok| |20|200,00||still ok| |30|300,00|also ok|"| |40|400,00||"| | | | | | As one can see, the quoted empty fields of the last column are not correctly read as null, whereas it works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string. Not setting it to "" (defaults to "\") seems to not cause this bug. was: Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the followin inconsistent dataframe As one can see, the quoted empty fields of the last column are not correctly read as null, whereas it works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string. Not setting it to "" (defaults to "\") seems to not cause this bug. 
> CSV reader reads data inconsistently depending on column position > - > > Key: SPARK-46959 > URL: https://issues.apache.org/jira/browse/SPARK-46959 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Martin Rueckl >Priority: Critical > > Reading the following CSV > {code:java} > "a";"b";"c";"d" > 10;100,00;"Some;String";"ok" > 20;200,00;"";"still ok" > 30;300,00;"also ok";"" > 40;400,00;"";"" {code} > with these options > {code:java} > spark.read > .option("header","true") > .option("sep",";") > .option("encoding","ISO-8859-1") > .option("lineSep","\r\n") > .option("nullValue","") > .option("quote",'"') > .option("escape","") {code} > results in the followin inconsistent dataframe > > ||a||b||c||d|| > |10|100,00|Some;String|ok| > |20|200,00||still ok| > |30|300,00|also ok|"| > |40|400,00||"| > | | | | | > > > As one can see, the quoted empty fields of the last column are not correctly > read as null, whereas it works for column c. > If I recall correctly, this only happens when the "escape" option is set to > an empty string. Not setting it to "" (defaults to "\") seems to not cause > this bug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46959) CSV reader reads data inconsistently depending on column position
[ https://issues.apache.org/jira/browse/SPARK-46959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Rueckl updated SPARK-46959: -- Description: Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the followin inconsistent dataframe ||a||b||c||d|| |10|100,00|Some;String|ok| |20|200,00||still ok| |30|300,00|also ok|"| |40|400,00||"| As one can see, the quoted empty fields of the last column are not correctly read as null but instead contain a single double quote. It works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string. Not setting it to "" (defaults to "\") seems to not cause this bug. was: Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the followin inconsistent dataframe ||a||b||c||d|| |10|100,00|Some;String|ok| |20|200,00||still ok| |30|300,00|also ok|"| |40|400,00||"| As one can see, the quoted empty fields of the last column are not correctly read as null, whereas it works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string. Not setting it to "" (defaults to "\") seems to not cause this bug. 
> CSV reader reads data inconsistently depending on column position > - > > Key: SPARK-46959 > URL: https://issues.apache.org/jira/browse/SPARK-46959 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Martin Rueckl >Priority: Critical > > Reading the following CSV > {code:java} > "a";"b";"c";"d" > 10;100,00;"Some;String";"ok" > 20;200,00;"";"still ok" > 30;300,00;"also ok";"" > 40;400,00;"";"" {code} > with these options > {code:java} > spark.read > .option("header","true") > .option("sep",";") > .option("encoding","ISO-8859-1") > .option("lineSep","\r\n") > .option("nullValue","") > .option("quote",'"') > .option("escape","") {code} > results in the followin inconsistent dataframe > > ||a||b||c||d|| > |10|100,00|Some;String|ok| > |20|200,00||still ok| > |30|300,00|also ok|"| > |40|400,00||"| > As one can see, the quoted empty fields of the last column are not correctly > read as null but instead contain a single double quote. It works for column c. > If I recall correctly, this only happens when the "escape" option is set to > an empty string. Not setting it to "" (defaults to "\") seems to not cause > this bug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46959) CSV reader reads data inconsistently depending on column position
[ https://issues.apache.org/jira/browse/SPARK-46959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Rueckl updated SPARK-46959: -- Description: Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the following inconsistent dataframe ||a||b||c||d|| |10|100,00|Some;String|ok| |20|200,00||still ok| |30|300,00|also ok|"| |40|400,00||"| As one can see, the quoted empty fields of the last column are not correctly read as null but instead contain a single double quote. It works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string. Leaving "escape" unset (it defaults to "\") does not seem to trigger this bug. I observed this on the Databricks Spark runtime 13.2 (I think that is Spark 3.4.1). was: Reading the following CSV {code:java} "a";"b";"c";"d" 10;100,00;"Some;String";"ok" 20;200,00;"";"still ok" 30;300,00;"also ok";"" 40;400,00;"";"" {code} with these options {code:java} spark.read .option("header","true") .option("sep",";") .option("encoding","ISO-8859-1") .option("lineSep","\r\n") .option("nullValue","") .option("quote",'"') .option("escape","") {code} results in the following inconsistent dataframe ||a||b||c||d|| |10|100,00|Some;String|ok| |20|200,00||still ok| |30|300,00|also ok|"| |40|400,00||"| As one can see, the quoted empty fields of the last column are not correctly read as null but instead contain a single double quote. It works for column c. If I recall correctly, this only happens when the "escape" option is set to an empty string. Leaving "escape" unset (it defaults to "\") does not seem to trigger this bug. 
> CSV reader reads data inconsistently depending on column position > - > > Key: SPARK-46959 > URL: https://issues.apache.org/jira/browse/SPARK-46959 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Martin Rueckl >Priority: Critical > > Reading the following CSV > {code:java} > "a";"b";"c";"d" > 10;100,00;"Some;String";"ok" > 20;200,00;"";"still ok" > 30;300,00;"also ok";"" > 40;400,00;"";"" {code} > with these options > {code:java} > spark.read > .option("header","true") > .option("sep",";") > .option("encoding","ISO-8859-1") > .option("lineSep","\r\n") > .option("nullValue","") > .option("quote",'"') > .option("escape","") {code} > results in the following inconsistent dataframe > > ||a||b||c||d|| > |10|100,00|Some;String|ok| > |20|200,00||still ok| > |30|300,00|also ok|"| > |40|400,00||"| > As one can see, the quoted empty fields of the last column are not correctly > read as null but instead contain a single double quote. It works for column c. > If I recall correctly, this only happens when the "escape" option is set to > an empty string. Leaving "escape" unset (it defaults to "\") does not seem to > trigger this bug. > I observed this on the Databricks Spark runtime 13.2 (I think that is Spark 3.4.1). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
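For comparison, an RFC 4180-style parser with no escape character reads the sample file above with columns c and d behaving identically. A minimal baseline sketch using Python's stdlib csv module (used here only as a reference parser, not as Spark's uniVocity-based reader):

```python
import csv
import io

# The sample file from the report: semicolon-separated, double-quote quoted,
# with quoted empty fields in both column c and column d.
data = (
    '"a";"b";"c";"d"\r\n'
    '10;100,00;"Some;String";"ok"\r\n'
    '20;200,00;"";"still ok"\r\n'
    '30;300,00;"also ok";""\r\n'
    '40;400,00;"";""\r\n'
)

# Baseline parse: every quoted empty field comes back as '', regardless of
# column position, which is what the reporter expects Spark to map to null.
rows = list(csv.reader(io.StringIO(data), delimiter=";", quotechar='"'))
header, body = rows[0], rows[1:]
print(body[1])  # ['20', '200,00', '', 'still ok']
print(body[3])  # ['40', '400,00', '', '']
```

The reported dataframe instead shows a lone double quote in column d, so the divergence appears only in how Spark's reader treats escape="" on the last column.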
[jira] [Resolved] (SPARK-46911) Add deleteIfExists operator to StatefulProcessorHandle
[ https://issues.apache.org/jira/browse/SPARK-46911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-46911. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44903 [https://github.com/apache/spark/pull/44903] > Add deleteIfExists operator to StatefulProcessorHandle > -- > > Key: SPARK-46911 > URL: https://issues.apache.org/jira/browse/SPARK-46911 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Eric Marnadi >Assignee: Eric Marnadi >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Adding the {{deleteIfExists}} method to the {{StatefulProcessorHandle}} in > order to remove state variables from the State Store -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46911) Add deleteIfExists operator to StatefulProcessorHandle
[ https://issues.apache.org/jira/browse/SPARK-46911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-46911: Assignee: Eric Marnadi > Add deleteIfExists operator to StatefulProcessorHandle > -- > > Key: SPARK-46911 > URL: https://issues.apache.org/jira/browse/SPARK-46911 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Eric Marnadi >Assignee: Eric Marnadi >Priority: Major > Labels: pull-request-available > > Adding the {{deleteIfExists}} method to the {{StatefulProcessorHandle}} in > order to remove state variables from the State Store -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42399) CONV() silently overflows returning wrong results
[ https://issues.apache.org/jira/browse/SPARK-42399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-42399: - Labels: correctness pull-request-available (was: pull-request-available) > CONV() silently overflows returning wrong results > - > > Key: SPARK-42399 > URL: https://issues.apache.org/jira/browse/SPARK-42399 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Critical > Labels: correctness, pull-request-available > > spark-sql> SELECT > CONV(SUBSTRING('0x', > 3), 16, 10); > 18446744073709551615 > Time taken: 2.114 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.ansi.enabled = true; > spark.sql.ansi.enabled true > Time taken: 0.068 seconds, Fetched 1 row(s) > spark-sql> SELECT > CONV(SUBSTRING('0x', > 3), 16, 10); > 18446744073709551615 > Time taken: 0.05 seconds, Fetched 1 row(s) > In ANSI mode we should raise an error for sure. > In non ANSI either an error or a NULL maybe be acceptable. > Alternatively, of course, we could consider if we can support arbitrary > domains since the result is a STRING again. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42399) CONV() silently overflows returning wrong results
[ https://issues.apache.org/jira/browse/SPARK-42399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-42399: - Affects Version/s: 3.5.0 > CONV() silently overflows returning wrong results > - > > Key: SPARK-42399 > URL: https://issues.apache.org/jira/browse/SPARK-42399 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: Serge Rielau >Priority: Critical > Labels: correctness, pull-request-available > > spark-sql> SELECT > CONV(SUBSTRING('0x', > 3), 16, 10); > 18446744073709551615 > Time taken: 2.114 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.ansi.enabled = true; > spark.sql.ansi.enabled true > Time taken: 0.068 seconds, Fetched 1 row(s) > spark-sql> SELECT > CONV(SUBSTRING('0x', > 3), 16, 10); > 18446744073709551615 > Time taken: 0.05 seconds, Fetched 1 row(s) > In ANSI mode we should raise an error for sure. > In non ANSI either an error or a NULL maybe be acceptable. > Alternatively, of course, we could consider if we can support arbitrary > domains since the result is a STRING again. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42399) CONV() silently overflows returning wrong results
[ https://issues.apache.org/jira/browse/SPARK-42399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813733#comment-17813733 ] Nicholas Chammas commented on SPARK-42399: -- This issue does indeed appear to be resolved on {{master}} when ANSI mode is enabled: {code:java} >>> spark.sql(f"SELECT CONV('{'f' * 64}', 16, 10) AS >>> result").show(truncate=False) ++ |result | ++ |18446744073709551615| ++ >>> spark.conf.set("spark.sql.ansi.enabled", "true") >>> spark.sql(f"SELECT CONV('{'f' * 64}', 16, 10) AS >>> result").show(truncate=False) Traceback (most recent call last): ... pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003 == SQL (line 1, position 8) == SELECT CONV('', 16, 10) AS result {code} However, there is still a silent overflow when ANSI mode is disabled. The error message suggests this is intended behavior. cc [~gengliang] and [~gurwls223], who resolved SPARK-42427. > CONV() silently overflows returning wrong results > - > > Key: SPARK-42399 > URL: https://issues.apache.org/jira/browse/SPARK-42399 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: Serge Rielau >Priority: Critical > Labels: correctness, pull-request-available > > spark-sql> SELECT > CONV(SUBSTRING('0x', > 3), 16, 10); > 18446744073709551615 > Time taken: 2.114 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.ansi.enabled = true; > spark.sql.ansi.enabled true > Time taken: 0.068 seconds, Fetched 1 row(s) > spark-sql> SELECT > CONV(SUBSTRING('0x', > 3), 16, 10); > 18446744073709551615 > Time taken: 0.05 seconds, Fetched 1 row(s) > In ANSI mode we should raise an error for sure. > In non ANSI either an error or a NULL maybe be acceptable. > Alternatively, of course, we could consider if we can support arbitrary > domains since the result is a STRING again. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42399) CONV() silently overflows returning wrong results
[ https://issues.apache.org/jira/browse/SPARK-42399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-42399: - Affects Version/s: (was: 3.5.0) > CONV() silently overflows returning wrong results > - > > Key: SPARK-42399 > URL: https://issues.apache.org/jira/browse/SPARK-42399 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Critical > Labels: correctness, pull-request-available > > spark-sql> SELECT > CONV(SUBSTRING('0x', > 3), 16, 10); > 18446744073709551615 > Time taken: 2.114 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.ansi.enabled = true; > spark.sql.ansi.enabled true > Time taken: 0.068 seconds, Fetched 1 row(s) > spark-sql> SELECT > CONV(SUBSTRING('0x', > 3), 16, 10); > 18446744073709551615 > Time taken: 0.05 seconds, Fetched 1 row(s) > In ANSI mode we should raise an error for sure. > In non ANSI either an error or a NULL maybe be acceptable. > Alternatively, of course, we could consider if we can support arbitrary > domains since the result is a STRING again. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
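The value 18446744073709551615 reported above is exactly 2**64 - 1, which suggests CONV's non-ANSI behavior is unsigned 64-bit arithmetic that saturates on overflow. A plain-Python sketch of that hypothesis (the function name and the saturation model are assumptions for illustration, not Spark's actual implementation):

```python
# The reported result 18446744073709551615 is exactly 2**64 - 1, the unsigned
# 64-bit maximum -- consistent with CONV doing u64 arithmetic and silently
# saturating on overflow (hypothetical model, not Spark source).
U64_MAX = 2**64 - 1

def conv_hex_to_dec(hex_digits: str) -> str:
    """Model CONV(x, 16, 10) with saturating unsigned 64-bit semantics."""
    value = int(hex_digits, 16)
    return str(min(value, U64_MAX))

# 64 'f' digits need 256 bits, far beyond 64, so the result saturates:
print(conv_hex_to_dec("f" * 64))  # 18446744073709551615
print(conv_hex_to_dec("ff"))      # 255 (no overflow, exact)
```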
[jira] [Commented] (SPARK-38167) CSV parsing error when using escape='"'
[ https://issues.apache.org/jira/browse/SPARK-38167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813741#comment-17813741 ] Nicholas Chammas commented on SPARK-38167: -- [~marnixvandenbroek] - Could you link to the bug report you filed with Univocity? cc [~maxgekk] - I believe you have hit some parsing bugs in Univocity recently. > CSV parsing error when using escape='"' > > > Key: SPARK-38167 > URL: https://issues.apache.org/jira/browse/SPARK-38167 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.2.1 > Environment: Pyspark on a single-node Databricks managed Spark 3.1.2 > cluster. >Reporter: Marnix van den Broek >Priority: Major > Labels: correctness, csv, csvparser, data-integrity > > hi all, > When reading CSV files with Spark, I ran into a parsing bug. > {*}The summary{*}: > When > # reading a comma separated, double-quote quoted CSV file using the csv > reader options _escape='"'_ and {_}header=True{_}, > # with a row containing a quoted empty field > # followed by a quoted field starting with a comma and followed by one or > more characters > selecting columns from the dataframe at or after the field described in 3) > gives incorrect and inconsistent results > {*}In detail{*}: > When I instruct Spark to read this CSV file: > > {code:java} > col1,col2 > "",",a" > {code} > > using the CSV reader options escape='"' (unnecessary for the example, > necessary for the files I'm processing) and header=True, I expect the > following result: > > {code:java} > spark.read.csv(path, escape='"', header=True).show() > > +++ > |col1|col2| > +++ > |null| ,a| > +++ {code} > > Spark does yield this result, so far so good. 
However, when I select col2 > from the dataframe, Spark yields an incorrect result: > > {code:java} > spark.read.csv(path, escape='"', header=True).select('col2').show() > > ++ > |col2| > ++ > | a"| > ++{code} > > If you run this example with more columns in the file, and more commas in the > field, e.g. ",,,a", the problem compounds, as Spark shifts many values to > the right, causing unexpected and incorrect results. The inconsistency > between both methods surprised me, as it implies the parsing is evaluated > differently between both methods. > I expect the bug to be located in the quote-balancing and un-escaping methods > of the csv parser, but I can't find where that code is located in the code > base. I'd be happy to take a look at it if anyone can point me where it is. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
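The reporter's expected result can be reproduced with a plain RFC 4180-style parser. A short baseline sketch with Python's stdlib csv module (a reference parser only; Spark's uniVocity-based reader with escape='"' evidently diverges from this, which is the bug):

```python
import csv
import io

# The report's two-line file: a quoted empty field followed by a quoted
# field that begins with the separator.
data = 'col1,col2\r\n"",",a"\r\n'

# RFC 4180-style parsing: '""' is an empty field, '",a"' is the single
# field ',a' (the comma is protected by the quotes).
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])  # ['', ',a']
```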
[jira] [Commented] (SPARK-45786) Inaccurate Decimal multiplication and division results
[ https://issues.apache.org/jira/browse/SPARK-45786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813766#comment-17813766 ] Nicholas Chammas commented on SPARK-45786: -- [~kazuyukitanimura] - I'm just curious: How did you find this bug? Was it something you stumbled on by accident or did you search for it using something like a fuzzer? > Inaccurate Decimal multiplication and division results > -- > > Key: SPARK-45786 > URL: https://issues.apache.org/jira/browse/SPARK-45786 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.4, 3.3.3, 3.4.1, 3.5.0, 4.0.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Blocker > Labels: correctness, pull-request-available > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > Decimal multiplication and division results may be inaccurate due to rounding > issues. > h2. Multiplication: > {code:scala} > scala> sql("select -14120025096157587712113961295153.858047 * > -0.4652").show(truncate=false) > ++ > > |(-14120025096157587712113961295153.858047 * -0.4652)| > ++ > |6568635674732509803675414794505.574764 | > ++ > {code} > The correct answer is > {quote}6568635674732509803675414794505.574763 > {quote} > Please note that the last digit is 3 instead of 4 as > > {code:scala} > scala> > java.math.BigDecimal("-14120025096157587712113961295153.858047").multiply(java.math.BigDecimal("-0.4652")) > val res21: java.math.BigDecimal = 6568635674732509803675414794505.5747634644 > {code} > Since the factional part .574763 is followed by 4644, it should not be > rounded up. > h2. 
Division: > {code:scala} > scala> sql("select -0.172787979 / > 533704665545018957788294905796.5").show(truncate=false) > +-+ > |(-0.172787979 / 533704665545018957788294905796.5)| > +-+ > |-3.237521E-31| > +-+ > {code} > The correct answer is > {quote}-3.237520E-31 > {quote} > Please note that the last digit is 0 instead of 1 as > > {code:scala} > scala> > java.math.BigDecimal("-0.172787979").divide(java.math.BigDecimal("533704665545018957788294905796.5"), > 100, java.math.RoundingMode.DOWN) > val res22: java.math.BigDecimal = > -3.237520489418037889998826491401059986665344697406144511563561222578738E-31 > {code} > Since the factional part .237520 is followed by 4894..., it should not be > rounded up. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
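The multiplication case above can be checked with exact decimal arithmetic. A sketch using Python's decimal module standing in for Java BigDecimal (the precision setting is an assumption, chosen large enough to keep the product exact):

```python
from decimal import Decimal, getcontext, ROUND_DOWN

# Exact decimal multiplication needs at most ~42 significant digits here;
# 60 is comfortably enough to avoid any context rounding.
getcontext().prec = 60

a = Decimal("-14120025096157587712113961295153.858047")
b = Decimal("-0.4652")
product = a * b
print(product)  # 6568635674732509803675414794505.5747634644

# The exact digits after .574763 are 4644, so truncation and every correct
# rounding mode keep the last digit 3; Spark's reported ...574764 is off by
# one in the final digit.
truncated = product.quantize(Decimal("0.000001"), rounding=ROUND_DOWN)
print(truncated)  # 6568635674732509803675414794505.574763
```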
[jira] [Assigned] (SPARK-46915) Simplify `UnaryMinus` and align error class
[ https://issues.apache.org/jira/browse/SPARK-46915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-46915: Assignee: BingKun Pan > Simplify `UnaryMinus` and align error class > --- > > Key: SPARK-46915 > URL: https://issues.apache.org/jira/browse/SPARK-46915 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46915) Simplify `UnaryMinus` and align error class
[ https://issues.apache.org/jira/browse/SPARK-46915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-46915. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44942 [https://github.com/apache/spark/pull/44942] > Simplify `UnaryMinus` and align error class > --- > > Key: SPARK-46915 > URL: https://issues.apache.org/jira/browse/SPARK-46915 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40549) PYSPARK: Observation computes the wrong results when using `corr` function
[ https://issues.apache.org/jira/browse/SPARK-40549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813780#comment-17813780 ] Nicholas Chammas commented on SPARK-40549: -- I think this is just a consequence of floating point arithmetic being imprecise. {code:python} >>> for i in range(10): ... o = Observation(f"test_{i}") ... df_o = df.observe(o, F.corr("id", "id2")) ... df_o.count() ... print(o.get) ... {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0002} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0002} {'corr(id, id2)': 0.} {'corr(id, id2)': 1.0} {code} Unfortunately, {{corr}} seems to convert to float internally, so even if you give it decimals you will get a similar result: {code:python} >>> from decimal import Decimal >>> import pyspark.sql.functions as F >>> >>> df = spark.createDataFrame( ... [(Decimal(i), Decimal(i * 10)) for i in range(10)], ... schema="id decimal, id2 decimal", ... ) >>> >>> for i in range(10): ... o = Observation(f"test_{i}") ... df_o = df.observe(o, F.corr("id", "id2")) ... df_o.count() ... print(o.get) ... {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {'corr(id, id2)': 0.} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0002} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {'corr(id, id2)': 1.0} {code} I don't think there is anything that can be done here. > PYSPARK: Observation computes the wrong results when using `corr` function > --- > > Key: SPARK-40549 > URL: https://issues.apache.org/jira/browse/SPARK-40549 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 > Environment: {code:java} > // lsb_release -a > No LSB modules are available. 
> Distributor ID: Ubuntu > Description: Ubuntu 22.04.1 LTS > Release: 22.04 > Codename: jammy {code} > {code:java} > // python -V > python 3.10.4 > {code} > {code:java} > // lshw -class cpu > *-cpu > description: CPU product: AMD Ryzen 9 3900X 12-Core Processor > vendor: Advanced Micro Devices [AMD] physical id: f bus info: > cpu@0 version: 23.113.0 serial: Unknown slot: AM4 > size: 2194MHz capacity: 4672MHz width: 64 bits clock: > 100MHz capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae > mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht > syscall nx mmxext fxsr_opt pdpe1gb rdtscp x86-64 constant_tsc rep_good nopl > nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma > cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy > svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit > wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 > cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm > rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves > cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr > rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean > flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif > v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es cpufreq > configuration: cores=12 enabledcores=12 microcode=141561875 threads=24 > {code} >Reporter: Herminio Vazquez >Priority: Major > Labels: correctness > > Minimalistic description of the odd computation results. > When creating a new `Observation` object and computing a simple correlation > function between 2 columns, the results appear to be non-deterministic. 
> {code:java} > # Init > from pyspark.sql import SparkSession, Observation > import pyspark.sql.functions as F > df = spark.createDataFrame([(float(i), float(i*10),) for i in range(10)], > schema="id double, id2 double") > for i in range(10): > o = Observation(f"test_{i}") > df_o = df.observe(o, F.corr("id", "id2").eqNullSafe(1.0)) > df_o.count() > print(o.get) > # Results > {'(corr(id, id2) <=> 1.0)': False} > {'(corr(id, id2) <=> 1.0)': False} > {'(corr(id, id2) <=> 1.0)': False} > {'(corr(id, id2) <=> 1.0)': True} > {'(corr(id, id2) <=> 1.0)': True} > {'(corr(id, id2) <=> 1.0)': True} > {'(corr(id, id2) <=> 1.0)': True} > {'(corr(id, id2) <=> 1.0)': True} > {'(corr(id, id2) <=> 1.0)': True} > {'(corr(id, id2) <=> 1.0)': False}{code} > -
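The run-to-run variation reported above is consistent with floating-point rounding: Spark computes aggregates like {{corr}} from per-partition partial results, and the order in which those partials are combined can vary between runs. Because floating-point addition is not associative, a different combine order can produce a result that differs in the last bits. A minimal pure-Python illustration of the underlying effect (this is not Spark's {{corr}} code):

```python
# Floating-point addition is not associative, so the order in which
# partial aggregates are combined can change the final result.

left_to_right = (0.1 + 0.2) + 0.3   # one combine order
right_to_left = 0.1 + (0.2 + 0.3)   # another combine order

print(left_to_right)                    # 0.6000000000000001
print(right_to_left)                    # 0.6
print(left_to_right == right_to_left)   # False
```

This also suggests why the `eqNullSafe(1.0)` check in the reproduction flips between True and False: an exact equality test on a float aggregate is fragile, and comparing with a tolerance (e.g. checking that the absolute difference from 1.0 is below a small epsilon) would be more robust.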
[jira] [Created] (SPARK-46960) Testing Multiple Input Streams for TransformWithState operator
Eric Marnadi created SPARK-46960: Summary: Testing Multiple Input Streams for TransformWithState operator Key: SPARK-46960 URL: https://issues.apache.org/jira/browse/SPARK-46960 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Eric Marnadi Adding unit tests to ensure multiple input streams are supported for the TransformWithState operator. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46961) Adding processorHandle as a Context Variable
Eric Marnadi created SPARK-46961: Summary: Adding processorHandle as a Context Variable Key: SPARK-46961 URL: https://issues.apache.org/jira/browse/SPARK-46961 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Eric Marnadi Adding unit tests to ensure multiple input streams are supported for the TransformWithState operator. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46866) Streaming python data source API
[ https://issues.apache.org/jira/browse/SPARK-46866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoqin Li updated SPARK-46866: --- Issue Type: Epic (was: Improvement) > Streaming python data source API > > > Key: SPARK-46866 > URL: https://issues.apache.org/jira/browse/SPARK-46866 > Project: Spark > Issue Type: Epic > Components: SS >Affects Versions: 3.5.0 >Reporter: Chaoqin Li >Priority: Major > > This is a follow up of https://issues.apache.org/jira/browse/SPARK-44076. The > idea is to enable Python developers to develop streaming data sources in > python. The goal is to make a Python-based API that is simple and easy to > use, thus making Spark more accessible to the wider Python developer > community. > > Design doc: > https://docs.google.com/document/d/1cJ-w1hGPOBFp-5DLmf68sTLsAOwb55oW6SAuuAUFEM4/edit -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46962) Implement python worker to run python streaming data source
Chaoqin Li created SPARK-46962: -- Summary: Implement python worker to run python streaming data source Key: SPARK-46962 URL: https://issues.apache.org/jira/browse/SPARK-46962 Project: Spark Issue Type: Improvement Components: SS Affects Versions: 4.0.0 Reporter: Chaoqin Li Implement a Python worker to run Python streaming data sources and communicate with the JVM through a socket. Create a PythonMicrobatchStream to invoke RPC function calls. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
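The actual wire protocol between Spark's JVM side and its Python workers is internal to Spark, but the general pattern the issue describes (RPC-style calls exchanged over a socket) typically relies on length-prefixed message framing. A hypothetical sketch of such framing, purely illustrative and not Spark's real protocol or names:

```python
import struct

# Hypothetical length-prefixed framing for RPC messages between a JVM
# driver and a Python worker. The 4-byte big-endian length header lets
# the receiver know how many payload bytes to read from the socket.

def encode_message(payload: bytes) -> bytes:
    # Prepend a 4-byte big-endian length header to the payload.
    return struct.pack(">i", len(payload)) + payload

def decode_message(buf: bytes) -> tuple[bytes, bytes]:
    # Split one framed message off the front of the buffer;
    # returns (payload, remaining unconsumed bytes).
    (length,) = struct.unpack(">i", buf[:4])
    return buf[4 : 4 + length], buf[4 + length :]

frame = encode_message(b"latestOffset")
payload, rest = decode_message(frame)
print(payload)  # b'latestOffset'
```

Framing like this is what makes it possible for a microbatch stream implementation on the JVM side to issue a call (e.g. asking the Python source for its latest offset) and read back exactly one well-delimited response.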
[jira] [Created] (SPARK-46963) Verify AQE is not enabled for Structured Streaming
Bo Gao created SPARK-46963: -- Summary: Verify AQE is not enabled for Structured Streaming Key: SPARK-46963 URL: https://issues.apache.org/jira/browse/SPARK-46963 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Bo Gao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46963) Verify AQE is not enabled for Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-46963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46963: --- Labels: pull-request-available (was: ) > Verify AQE is not enabled for Structured Streaming > -- > > Key: SPARK-46963 > URL: https://issues.apache.org/jira/browse/SPARK-46963 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Bo Gao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46964) Change the signature of the hllInvalidLgK query execution error to take an integer as 4th argument
Menelaos Karavelas created SPARK-46964: -- Summary: Change the signature of the hllInvalidLgK query execution error to take an integer as 4th argument Key: SPARK-46964 URL: https://issues.apache.org/jira/browse/SPARK-46964 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Menelaos Karavelas The current signature of the {{hllInvalidLgK}} query execution error takes four arguments: # The SQL function (a string). # The minimum possible {{lgk}} value (an integer). # The maximum possible {{lgk}} value (an integer). # The actual invalid {{lgk}} value (a string). There is no meaningful reason for the 4th argument to be a string. This issue is about changing the 4th argument to an integer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46964) Change the signature of the hllInvalidLgK query execution error to take an integer as 4th argument
[ https://issues.apache.org/jira/browse/SPARK-46964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46964: --- Labels: pull-request-available (was: ) > Change the signature of the hllInvalidLgK query execution error to take an > integer as 4th argument > -- > > Key: SPARK-46964 > URL: https://issues.apache.org/jira/browse/SPARK-46964 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Menelaos Karavelas >Priority: Trivial > Labels: pull-request-available > > The current signature of the {{hllInvalidLgK}} query execution error takes > four arguments: > # The SQL function (a string). > # The minimum possible {{lgk}} value (an integer). > # The maximum possible {{lgk}} value (an integer). > # The actual invalid {{lgk}} value (a string). > There is no meaningful reason for the 4th argument to be a string. This issue > is about changing the 4th argument to an integer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46964) Change the signature of the hllInvalidLgK query execution error to take an integer as 4th argument
[ https://issues.apache.org/jira/browse/SPARK-46964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-46964. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44995 [https://github.com/apache/spark/pull/44995] > Change the signature of the hllInvalidLgK query execution error to take an > integer as 4th argument > -- > > Key: SPARK-46964 > URL: https://issues.apache.org/jira/browse/SPARK-46964 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Menelaos Karavelas >Assignee: Menelaos Karavelas >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > > The current signature of the {{hllInvalidLgK}} query execution error takes > four arguments: > # The SQL function (a string). > # The minimum possible {{lgk}} value (an integer). > # The maximum possible {{lgk}} value (an integer). > # The actual invalid {{lgk}} value (a string). > There is no meaningful reason for the 4th argument to be a string. This issue > is about changing the 4th argument to an integer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46964) Change the signature of the hllInvalidLgK query execution error to take an integer as 4th argument
[ https://issues.apache.org/jira/browse/SPARK-46964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-46964: -- Assignee: Menelaos Karavelas > Change the signature of the hllInvalidLgK query execution error to take an > integer as 4th argument > -- > > Key: SPARK-46964 > URL: https://issues.apache.org/jira/browse/SPARK-46964 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Menelaos Karavelas >Assignee: Menelaos Karavelas >Priority: Trivial > Labels: pull-request-available > > The current signature of the {{hllInvalidLgK}} query execution error takes > four arguments: > # The SQL function (a string). > # The minimum possible {{lgk}} value (an integer). > # The maximum possible {{lgk}} value (an integer). > # The actual invalid {{lgk}} value (a string). > There is no meaningful reason for the 4th argument to be a string. This issue > is about changing the 4th argument to an integer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46965) Check logType in Utils.getLog
Dongjoon Hyun created SPARK-46965: - Summary: Check logType in Utils.getLog Key: SPARK-46965 URL: https://issues.apache.org/jira/browse/SPARK-46965 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46965) Check logType in Utils.getLog
[ https://issues.apache.org/jira/browse/SPARK-46965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46965: --- Labels: pull-request-available (was: ) > Check logType in Utils.getLog > - > > Key: SPARK-46965 > URL: https://issues.apache.org/jira/browse/SPARK-46965 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46965) Check logType in Utils.getLog
[ https://issues.apache.org/jira/browse/SPARK-46965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46965: - Assignee: Dongjoon Hyun > Check logType in Utils.getLog > - > > Key: SPARK-46965 > URL: https://issues.apache.org/jira/browse/SPARK-46965 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46638) Create API to acquire execution memory for 'eval' and 'terminate' methods
[ https://issues.apache.org/jira/browse/SPARK-46638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel resolved SPARK-46638. Resolution: Won't Fix > Create API to acquire execution memory for 'eval' and 'terminate' methods > - > > Key: SPARK-46638 > URL: https://issues.apache.org/jira/browse/SPARK-46638 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46966) Create API for 'analyze' method to indicate subset of input table columns to select
Daniel created SPARK-46966: -- Summary: Create API for 'analyze' method to indicate subset of input table columns to select Key: SPARK-46966 URL: https://issues.apache.org/jira/browse/SPARK-46966 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Daniel -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46966) Create API for 'analyze' method to indicate subset of input table columns to select
[ https://issues.apache.org/jira/browse/SPARK-46966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46966: --- Labels: pull-request-available (was: ) > Create API for 'analyze' method to indicate subset of input table columns to > select > --- > > Key: SPARK-46966 > URL: https://issues.apache.org/jira/browse/SPARK-46966 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46965) Check logType in Utils.getLog
[ https://issues.apache.org/jira/browse/SPARK-46965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46965. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45006 [https://github.com/apache/spark/pull/45006] > Check logType in Utils.getLog > - > > Key: SPARK-46965 > URL: https://issues.apache.org/jira/browse/SPARK-46965 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46890) CSV fails on a column with default and without enforcing schema
[ https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-46890: Assignee: Daniel > CSV fails on a column with default and without enforcing schema > --- > > Key: SPARK-46890 > URL: https://issues.apache.org/jira/browse/SPARK-46890 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Attachments: image-2024-01-29-13-22-05-326.png > > > When we create a table using CSV on an existing file with a header, where: > - a column has a default, and > - enforceSchema is false (so the CSV header is taken into account), > then querying a column with a default fails. > The example below shows the issue: > {code:sql} > CREATE TABLE IF NOT EXISTS products ( > product_id INT, > name STRING, > price FLOAT default 0.0, > quantity INT default 0 > ) > USING CSV > OPTIONS ( > header 'true', > inferSchema 'false', > enforceSchema 'false', > path '/Users/maximgekk/tmp/products.csv' > ); > {code} > The CSV file products.csv: > {code:java} > product_id,name,price,quantity > 1,Apple,0.50,100 > 2,Banana,0.25,200 > 3,Orange,0.75,50 > {code} > The query fails: > {code:sql} > spark-sql (default)> SELECT price FROM products; > 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6) > java.lang.IllegalArgumentException: Number of column in CSV header is not > equal to number of fields in the schema: > Header length: 4, schema size: 1 > CSV file: file:///Users/maximgekk/tmp/products.csv > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46890) CSV fails on a column with default and without enforcing schema
[ https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-46890. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44939 [https://github.com/apache/spark/pull/44939] > CSV fails on a column with default and without enforcing schema > --- > > Key: SPARK-46890 > URL: https://issues.apache.org/jira/browse/SPARK-46890 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-01-29-13-22-05-326.png > > > When we create a table using CSV on an existing file with a header, where: > - a column has a default, and > - enforceSchema is false (so the CSV header is taken into account), > then querying a column with a default fails. > The example below shows the issue: > {code:sql} > CREATE TABLE IF NOT EXISTS products ( > product_id INT, > name STRING, > price FLOAT default 0.0, > quantity INT default 0 > ) > USING CSV > OPTIONS ( > header 'true', > inferSchema 'false', > enforceSchema 'false', > path '/Users/maximgekk/tmp/products.csv' > ); > {code} > The CSV file products.csv: > {code:java} > product_id,name,price,quantity > 1,Apple,0.50,100 > 2,Banana,0.25,200 > 3,Orange,0.75,50 > {code} > The query fails: > {code:sql} > spark-sql (default)> SELECT price FROM products; > 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6) > java.lang.IllegalArgumentException: Number of column in CSV header is not > equal to number of fields in the schema: > Header length: 4, schema size: 1 > CSV file: file:///Users/maximgekk/tmp/products.csv > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
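The error in SPARK-46890 ("Header length: 4, schema size: 1") is characteristic of validating the CSV header against the *pruned* read schema rather than the full table schema: selecting a single column prunes the schema down to one field, which then no longer matches the 4-column header. A small pure-Python sketch of that failure mode (illustrative only, not Spark's actual validation code):

```python
# Illustrative sketch: header validation against a pruned schema
# rejects a valid file whenever column pruning is in effect.

header = ["product_id", "name", "price", "quantity"]
full_schema = ["product_id", "name", "price", "quantity"]
pruned_schema = ["price"]  # only the selected column survives pruning

def check_header(header: list, schema: list) -> None:
    # Mimics a strict "header must match schema" check.
    if len(header) != len(schema):
        raise ValueError(
            f"Number of columns in CSV header is not equal to number of "
            f"fields in the schema: header length: {len(header)}, "
            f"schema size: {len(schema)}"
        )

check_header(header, full_schema)        # passes: 4 columns vs 4 fields
try:
    check_header(header, pruned_schema)  # fails: 4 columns vs 1 field
except ValueError as e:
    print(e)
```

Validating against the full table schema (before pruning) avoids the spurious mismatch while still catching genuinely malformed headers.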
[jira] [Updated] (SPARK-44111) Prepare Apache Spark 4.0.0
[ https://issues.apache.org/jira/browse/SPARK-44111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44111: --- Labels: pull-request-available (was: ) > Prepare Apache Spark 4.0.0 > -- > > Key: SPARK-44111 > URL: https://issues.apache.org/jira/browse/SPARK-44111 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > > For now, this issue aims to collect ideas for planning Apache Spark 4.0.0. > We will add more items which will be excluded from Apache Spark 3.5.0 > (Feature Freeze: July 16th, 2023). > {code} > Spark 1: 2014.05 (1.0.0) ~ 2016.11 (1.6.3) > Spark 2: 2016.07 (2.0.0) ~ 2021.05 (2.4.8) > Spark 3: 2020.06 (3.0.0) ~ 2026.xx (3.5.x) > Spark 4: 2024.06 (4.0.0, NEW) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46950) Align `not available codec` error-class
[ https://issues.apache.org/jira/browse/SPARK-46950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-46950: Assignee: BingKun Pan > Align `not available codec` error-class > --- > > Key: SPARK-46950 > URL: https://issues.apache.org/jira/browse/SPARK-46950 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46950) Align `not available codec` error-class
[ https://issues.apache.org/jira/browse/SPARK-46950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-46950. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44992 [https://github.com/apache/spark/pull/44992] > Align `not available codec` error-class > --- > > Key: SPARK-46950 > URL: https://issues.apache.org/jira/browse/SPARK-46950 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46967) Hide `Thread Dump` and `Heap Histogram` of `Dead` executors in `Executors` UI
[ https://issues.apache.org/jira/browse/SPARK-46967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46967: -- Component/s: Web UI > Hide `Thread Dump` and `Heap Histogram` of `Dead` executors in `Executors` UI > - > > Key: SPARK-46967 > URL: https://issues.apache.org/jira/browse/SPARK-46967 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46967) Hide `Thread Dump` and `Heap Histogram` of `Dead` executors in `Executors` UI
Dongjoon Hyun created SPARK-46967: - Summary: Hide `Thread Dump` and `Heap Histogram` of `Dead` executors in `Executors` UI Key: SPARK-46967 URL: https://issues.apache.org/jira/browse/SPARK-46967 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46968) Replace UnsupportedOperationException by SparkUnsupportedOperationException in sql
Max Gekk created SPARK-46968: Summary: Replace UnsupportedOperationException by SparkUnsupportedOperationException in sql Key: SPARK-46968 URL: https://issues.apache.org/jira/browse/SPARK-46968 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 4.0.0 Replace all UnsupportedOperationException by SparkUnsupportedOperationException in sql/core code base, and introduce new legacy error classes with the _LEGACY_ERROR_TEMP_ prefix. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46967) Hide `Thread Dump` and `Heap Histogram` of `Dead` executors in `Executors` UI
[ https://issues.apache.org/jira/browse/SPARK-46967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46967: --- Labels: pull-request-available (was: ) > Hide `Thread Dump` and `Heap Histogram` of `Dead` executors in `Executors` UI > - > > Key: SPARK-46967 > URL: https://issues.apache.org/jira/browse/SPARK-46967 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46968) Replace UnsupportedOperationException by SparkUnsupportedOperationException in sql
[ https://issues.apache.org/jira/browse/SPARK-46968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-46968: - Description: Replace all UnsupportedOperationException by SparkUnsupportedOperationException in the *sql* code base, and introduce new legacy error classes with the _LEGACY_ERROR_TEMP_ prefix. (was: Replace all UnsupportedOperationException by SparkUnsupportedOperationException in sql/core code base, and introduce new legacy error classes with the _LEGACY_ERROR_TEMP_ prefix.) > Replace UnsupportedOperationException by SparkUnsupportedOperationException > in sql > -- > > Key: SPARK-46968 > URL: https://issues.apache.org/jira/browse/SPARK-46968 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Replace all UnsupportedOperationException by > SparkUnsupportedOperationException in the *sql* code base, and introduce new > legacy error classes with the _LEGACY_ERROR_TEMP_ prefix. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org