[jira] [Updated] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.
[ https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21869: - Fix Version/s: (was: 3.0.0) > A cached Kafka producer should not be closed if any task is using it. > - > > Key: SPARK-21869 > URL: https://issues.apache.org/jira/browse/SPARK-21869 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Shixiong Zhu >Assignee: Gabor Somogyi >Priority: Major > > Right now a cached Kafka producer may be closed if a large task uses it for > more than 10 minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
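The usual remedy for this class of bug is reference counting: expiration only marks a cached producer for removal, and the actual close() is deferred until the last task releases it. The sketch below illustrates that idea only; it is not Spark's actual implementation, and all names here (ProducerCache, CachedProducer) are hypothetical.

```python
import threading

class CachedProducer:
    """Stand-in for a Kafka producer; only close() matters for this sketch."""
    def __init__(self, params):
        self.params = params
        self.closed = False

    def close(self):
        self.closed = True

class ProducerCache:
    """Never closes a producer while a task still holds it.

    expire() marks an in-use entry for eviction instead of closing it;
    the close happens in release() when the in-use count drops to zero.
    """
    def __init__(self):
        self._lock = threading.Lock()
        # params -> [producer, in_use_count, marked_for_eviction]
        self._entries = {}

    def acquire(self, params):
        with self._lock:
            entry = self._entries.get(params)
            if entry is None:
                entry = [CachedProducer(params), 0, False]
                self._entries[params] = entry
            entry[1] += 1
            return entry[0]

    def release(self, producer):
        with self._lock:
            entry = self._entries.get(producer.params)
            if entry is None:
                return
            entry[1] -= 1
            if entry[2] and entry[1] == 0:
                producer.close()          # deferred close
                del self._entries[producer.params]

    def expire(self, params):
        """Called by the eviction timer (e.g., after 10 minutes idle)."""
        with self._lock:
            entry = self._entries.get(params)
            if entry is None:
                return
            if entry[1] == 0:
                entry[0].close()          # idle: safe to close now
                del self._entries[params]
            else:
                entry[2] = True           # in use: defer the close
```

With this shape, a long-running task that holds a producer past the idle timeout keeps a live producer; the timer merely flags the entry.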
[jira] [Commented] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.
[ https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992975#comment-16992975 ] Shixiong Zhu commented on SPARK-21869: -- Reopened this. https://github.com/apache/spark/pull/25853 has been reverted. > A cached Kafka producer should not be closed if any task is using it. > - > > Key: SPARK-21869 > URL: https://issues.apache.org/jira/browse/SPARK-21869 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Shixiong Zhu >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > Right now a cached Kafka producer may be closed if a large task uses it for > more than 10 minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.
[ https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-21869: -- > A cached Kafka producer should not be closed if any task is using it. > - > > Key: SPARK-21869 > URL: https://issues.apache.org/jira/browse/SPARK-21869 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Shixiong Zhu >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > Right now a cached Kafka producer may be closed if a large task uses it for > more than 10 minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30208) A race condition when reading from Kafka in PySpark
Shixiong Zhu created SPARK-30208: Summary: A race condition when reading from Kafka in PySpark Key: SPARK-30208 URL: https://issues.apache.org/jira/browse/SPARK-30208 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.4 Reporter: Jiawen Zhu When using PySpark to read from Kafka, there is a race condition that Spark may use KafkaConsumer in multiple threads at the same time and throw the following error: {code} java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access at kafkashaded.org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2215) at kafkashaded.org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2104) at kafkashaded.org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2059) at org.apache.spark.sql.kafka010.InternalKafkaConsumer.close(KafkaDataConsumer.scala:451) at org.apache.spark.sql.kafka010.KafkaDataConsumer$NonCachedKafkaDataConsumer.release(KafkaDataConsumer.scala:508) at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.close(KafkaSourceRDD.scala:126) at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:66) at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anonfun$compute$3.apply(KafkaSourceRDD.scala:131) at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anonfun$compute$3.apply(KafkaSourceRDD.scala:130) at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:162) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:131) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:131) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:144) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:142) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:142) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:130) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:155) at org.apache.spark.scheduler.Task.run(Task.scala:112) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} When using PySpark, reading from Kafka actually happens in a separate writer thread rather than the task thread. When a task is terminated early (e.g., by a limit operator), the task thread may stop the KafkaConsumer while the writer thread is still using it.
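One way to avoid this class of race is to route every call into the non-thread-safe consumer, including close(), through a single lock, so the task-completion listener cannot close it while the writer thread is mid-call. This is a minimal sketch of that pattern only, not the actual Spark fix; GuardedConsumer and its members are invented names.

```python
import threading

class GuardedConsumer:
    """Wraps a non-thread-safe consumer so close() from the task thread
    cannot interleave with poll() running on the writer thread."""
    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self.polls = 0  # how many polls completed successfully

    def poll(self):
        # The writer thread calls this; it holds the lock for the
        # duration of the call, so a concurrent close() must wait.
        with self._lock:
            if self._closed:
                raise RuntimeError("consumer already closed")
            self.polls += 1

    def close(self):
        # The task-completion listener calls this; it blocks until any
        # in-flight poll() finishes instead of racing it.
        with self._lock:
            self._closed = True
```

After close() wins the lock, subsequent polls fail fast with a clear error instead of the ConcurrentModificationException above.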
[jira] [Resolved] (SPARK-29953) File stream source cleanup options may break a file sink output
[ https://issues.apache.org/jira/browse/SPARK-29953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-29953. -- Fix Version/s: 3.0.0 Assignee: Jungtaek Lim Resolution: Fixed > File stream source cleanup options may break a file sink output > --- > > Key: SPARK-29953 > URL: https://issues.apache.org/jira/browse/SPARK-29953 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > SPARK-20568 added options to file streaming source to clean up processed > files. However, when applying these options to a directory that was written > by a file streaming sink, it will make the directory not queryable any more > because we delete files from the directory but they are still tracked by file > sink logs. > I think we should block the options if the input source is a file streaming > sink path (has "_spark_metadata" folder). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29953) File stream source cleanup options may break a file sink output
[ https://issues.apache.org/jira/browse/SPARK-29953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-29953: - Description: SPARK-20568 added options to file streaming source to clean up processed files. However, when applying these options to a directory that was written by a file streaming sink, it will make the directory not queryable any more because we delete files from the directory but they are still tracked by file sink logs. I think we should block the options if the input source is a file streaming sink path (has "_spark_metadata" folder). was: SPARK-20568 added options to file streaming source to clean up processed files. However, when applying these options to a directory that was written by a file streaming sink, it will make the directory not queryable any more. I think we should block the options if the input source is a file streaming sink path (has "_spark_metadata" folder). > File stream source cleanup options may break a file sink output > --- > > Key: SPARK-29953 > URL: https://issues.apache.org/jira/browse/SPARK-29953 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Major > > SPARK-20568 added options to file streaming source to clean up processed > files. However, when applying these options to a directory that was written > by a file streaming sink, it will make the directory not queryable any more > because we delete files from the directory but they are still tracked by file > sink logs. > I think we should block the options if the input source is a file streaming > sink path (has "_spark_metadata" folder). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29953) File stream source cleanup options may break a file sink output
[ https://issues.apache.org/jira/browse/SPARK-29953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976971#comment-16976971 ] Shixiong Zhu commented on SPARK-29953: -- cc [~kabhwan] > File stream source cleanup options may break a file sink output > --- > > Key: SPARK-29953 > URL: https://issues.apache.org/jira/browse/SPARK-29953 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Major > > SPARK-20568 added options to file streaming source to clean up processed > files. However, when applying these options to a directory that was written > by a file streaming sink, it will make the directory not queryable any more. > I think we should block the options if the input source is a file streaming > sink path (has "_spark_metadata" folder). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29953) File stream source cleanup options may break a file sink output
Shixiong Zhu created SPARK-29953: Summary: File stream source cleanup options may break a file sink output Key: SPARK-29953 URL: https://issues.apache.org/jira/browse/SPARK-29953 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Shixiong Zhu SPARK-20568 added options to the file streaming source to clean up processed files. However, applying these options to a directory that was written by a file streaming sink makes the directory no longer queryable. I think we should block these options if the input source is a file streaming sink path (i.e., it has a "_spark_metadata" folder).
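The proposed guard, detecting a file-sink output by its "_spark_metadata" folder and rejecting the cleanup options there, could look roughly like this. This is a hypothetical Python sketch with invented names, not Spark's Scala code:

```python
import os

def check_cleanup_options_allowed(source_path, cleanup_enabled):
    """Refuse file-source cleanup options when the input directory was
    produced by a file streaming sink. Deleting files that the sink log
    still tracks would make the directory unqueryable."""
    if not cleanup_enabled:
        return True
    sink_log = os.path.join(source_path, "_spark_metadata")
    if os.path.isdir(sink_log):
        raise ValueError(
            "cleanup options are not allowed on a file-sink output "
            "directory: found " + sink_log)
    return True
```

The check is cheap (one directory probe) and only runs when a cleanup option is actually set.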
[jira] [Commented] (SPARK-28841) Spark cannot read a relative path containing ":"
[ https://issues.apache.org/jira/browse/SPARK-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961412#comment-16961412 ] Shixiong Zhu commented on SPARK-28841: -- [~srowen] Yep, ":" is not a valid character in an HDFS path, but it is valid on some other file systems, such as the local file system or S3AFileSystem, which allow the user to create such files. I pointed out the code that hits this issue in HDFS-14762. > Spark cannot read a relative path containing ":" > > > Key: SPARK-28841 > URL: https://issues.apache.org/jira/browse/SPARK-28841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Reproducer: > {code} > spark.read.parquet("test:test") > {code} > Error: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: test:test > {code} > This is actually a Hadoop issue since the error is thrown from "new > Path("test:test")". I'm creating this ticket to see if we can work around > this issue in Spark.
[jira] [Commented] (SPARK-28841) Spark cannot read a relative path containing ":"
[ https://issues.apache.org/jira/browse/SPARK-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961402#comment-16961402 ] Shixiong Zhu commented on SPARK-28841: -- [~srowen] "?" is to trigger glob pattern codes. It matches any chars. That's not a typo. The code I post here is trying to read files that match the pattern "/tmp/test/foo?bar". But if "/tmp/test/foo:bar" exists, it will fail because of HDFS-14762. > Spark cannot read a relative path containing ":" > > > Key: SPARK-28841 > URL: https://issues.apache.org/jira/browse/SPARK-28841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Reproducer: > {code} > spark.read.parquet("test:test") > {code} > Error: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: test:test > {code} > This is actually a Hadoop issue since the error is thrown from "new > Path("test:test")". I'm creating this ticket to see if we can work around > this issue in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
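The single-character semantics of "?" described above can be demonstrated with Python's fnmatch, which follows the same shell-style glob convention for this metacharacter (Hadoop's Globber has its own implementation, but "?" means "any one character" there as well):

```python
from fnmatch import fnmatch

# '?' matches exactly one character, whatever it is - including ':'.
# So a pattern written for names like 'foo-bar' also matches 'foo:bar'
# if such a file exists, which is what triggers HDFS-14762 here.
assert fnmatch("foo:bar", "foo?bar")
assert fnmatch("foo-bar", "foo?bar")
assert not fnmatch("foobar", "foo?bar")   # '?' is not optional
```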
[jira] [Resolved] (SPARK-27254) Cleanup complete but becoming invalid output files in ManifestFileCommitProtocol if job is aborted
[ https://issues.apache.org/jira/browse/SPARK-27254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-27254. -- Fix Version/s: 3.0.0 Assignee: Jungtaek Lim Resolution: Fixed > Cleanup complete but becoming invalid output files in > ManifestFileCommitProtocol if job is aborted > -- > > Key: SPARK-27254 > URL: https://issues.apache.org/jira/browse/SPARK-27254 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Minor > Fix For: 3.0.0 > > > ManifestFileCommitProtocol doesn't clean up complete (but will become > invalid) output files when job is aborted. > ManifestFileCommitProtocol doesn't do anything for cleaning up when job is > aborted but just maintains the metadata which list of complete output files > are written. SPARK-27210 addressed for task level cleanup, but it still > doesn't clean up it as job level. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29099) org.apache.spark.sql.catalyst.catalog.CatalogTable.lastAccessTime is not set
[ https://issues.apache.org/jira/browse/SPARK-29099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-29099. -- Resolution: Duplicate > org.apache.spark.sql.catalyst.catalog.CatalogTable.lastAccessTime is not set > > > Key: SPARK-29099 > URL: https://issues.apache.org/jira/browse/SPARK-29099 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Shixiong Zhu >Priority: Major > > I noticed that > "org.apache.spark.sql.catalyst.catalog.CatalogTable.lastAccessTime" is always > 0 in my environment. Looks like Spark never updates this field in metastore > when reading a table. This is fine considering the cost to update it when > reading a table is high. > However, "Last Access" in "describe extended" always shows "Thu Jan 01 > 00:00:00 UTC 1970" and this is confusing. Can we show something alternative > to indicate it's not set? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29099) org.apache.spark.sql.catalyst.catalog.CatalogTable.lastAccessTime is not set
Shixiong Zhu created SPARK-29099: Summary: org.apache.spark.sql.catalyst.catalog.CatalogTable.lastAccessTime is not set Key: SPARK-29099 URL: https://issues.apache.org/jira/browse/SPARK-29099 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Shixiong Zhu I noticed that "org.apache.spark.sql.catalyst.catalog.CatalogTable.lastAccessTime" is always 0 in my environment. Looks like Spark never updates this field in metastore when reading a table. This is fine considering the cost to update it when reading a table is high. However, "Last Access" in "describe extended" always shows "Thu Jan 01 00:00:00 UTC 1970" and this is confusing. Can we show something alternative to indicate it's not set? -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28976) Use KeyLock to simplify MapOutputTracker.getStatuses
[ https://issues.apache.org/jira/browse/SPARK-28976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-28976. -- Fix Version/s: 3.0.0 Resolution: Fixed > Use KeyLock to simplify MapOutputTracker.getStatuses > > > Key: SPARK-28976 > URL: https://issues.apache.org/jira/browse/SPARK-28976 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28976) Use KeyLock to simplify MapOutputTracker.getStatuses
[ https://issues.apache.org/jira/browse/SPARK-28976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-28976: Assignee: Shixiong Zhu > Use KeyLock to simplify MapOutputTracker.getStatuses > > > Key: SPARK-28976 > URL: https://issues.apache.org/jira/browse/SPARK-28976 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28976) Use KeyLock to simplify MapOutputTracker.getStatuses
Shixiong Zhu created SPARK-28976: Summary: Use KeyLock to simplify MapOutputTracker.getStatuses Key: SPARK-28976 URL: https://issues.apache.org/jira/browse/SPARK-28976 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: Shixiong Zhu -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
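The KeyLock named in the ticket serializes work per key while letting different keys proceed concurrently, so one slow fetch does not block unrelated ones. A minimal sketch of that idea (hypothetical Python, not Spark's Scala KeyLock):

```python
import threading

class KeyLock:
    """Run a critical section per key: calls with the same key
    serialize, calls with different keys run concurrently."""
    def __init__(self):
        self._global = threading.Lock()
        self._locks = {}  # key -> (lock, number of current users)

    def with_lock(self, key, fn):
        # Briefly take the global lock only to look up / register the
        # per-key lock; fn itself runs under the per-key lock alone.
        with self._global:
            lock, count = self._locks.get(key, (threading.Lock(), 0))
            self._locks[key] = (lock, count + 1)
        try:
            with lock:
                return fn()
        finally:
            # Drop our reference; forget the lock when nobody uses it,
            # so the map does not grow without bound.
            with self._global:
                lock, count = self._locks[key]
                if count == 1:
                    del self._locks[key]
                else:
                    self._locks[key] = (lock, count - 1)
```

In MapOutputTracker.getStatuses the "key" would be a shuffle id, so concurrent stages fetching different shuffles no longer contend on one global lock.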
[jira] [Resolved] (SPARK-3137) Use finer grained locking in TorrentBroadcast.readObject
[ https://issues.apache.org/jira/browse/SPARK-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-3137. - Fix Version/s: 3.0.0 Resolution: Fixed > Use finer grained locking in TorrentBroadcast.readObject > > > Key: SPARK-3137 > URL: https://issues.apache.org/jira/browse/SPARK-3137 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > > TorrentBroadcast.readObject uses a global lock so only one task can be > fetching the blocks at the same time. > This is not optimal if we are running multiple stages concurrently because > they should be able to independently fetch their own blocks. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28883) Fix a flaky test: ThriftServerQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-28883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921413#comment-16921413 ] Shixiong Zhu commented on SPARK-28883: -- Marked this as a 3.0.0 blocker. We should either fix the issue and re-enable the test, or revert the changes before 3.0.0. > Fix a flaky test: ThriftServerQueryTestSuite > > > Key: SPARK-28883 > URL: https://issues.apache.org/jira/browse/SPARK-28883 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Blocker > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109764/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/ > (2 failures) > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109768/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/ > (4 failures) > Error message: > {noformat} > java.sql.SQLException: Could not open client transport with JDBC Uri: > jdbc:hive2://localhost:14431: java.net.ConnectException: Connection refused > (Connection refused) > {noformat}
[jira] [Updated] (SPARK-28883) Fix a flaky test: ThriftServerQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-28883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28883: - Priority: Blocker (was: Major) > Fix a flaky test: ThriftServerQueryTestSuite > > > Key: SPARK-28883 > URL: https://issues.apache.org/jira/browse/SPARK-28883 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Blocker > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109764/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/ > (2 failures) > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109768/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/ > (4 failures) > Error message: > {noformat} > java.sql.SQLException: Could not open client transport with JDBC Uri: > jdbc:hive2://localhost:14431: java.net.ConnectException: Connection refused > (Connection refused) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28883) Fix a flaky test: ThriftServerQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-28883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28883: - Target Version/s: 3.0.0 > Fix a flaky test: ThriftServerQueryTestSuite > > > Key: SPARK-28883 > URL: https://issues.apache.org/jira/browse/SPARK-28883 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Blocker > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109764/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/ > (2 failures) > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109768/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/ > (4 failures) > Error message: > {noformat} > java.sql.SQLException: Could not open client transport with JDBC Uri: > jdbc:hive2://localhost:14431: java.net.ConnectException: Connection refused > (Connection refused) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-3137) Use finer grained locking in TorrentBroadcast.readObject
[ https://issues.apache.org/jira/browse/SPARK-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-3137: - Assignee: Shixiong Zhu > Use finer grained locking in TorrentBroadcast.readObject > > > Key: SPARK-3137 > URL: https://issues.apache.org/jira/browse/SPARK-3137 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Major > Labels: bulk-closed > > TorrentBroadcast.readObject uses a global lock so only one task can be > fetching the blocks at the same time. > This is not optimal if we are running multiple stages concurrently because > they should be able to independently fetch their own blocks. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28025) HDFSBackedStateStoreProvider should not leak .crc files
[ https://issues.apache.org/jira/browse/SPARK-28025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-28025: Assignee: Jungtaek Lim > HDFSBackedStateStoreProvider should not leak .crc files > > > Key: SPARK-28025 > URL: https://issues.apache.org/jira/browse/SPARK-28025 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 > Environment: Spark 2.4.3 > Kubernetes 1.11(?) (OpenShift) > StateStore storage on a mounted PVC. Viewed as a local filesystem by the > `FileContextBasedCheckpointFileManager` : > {noformat} > scala> glusterfm.isLocal > res17: Boolean = true{noformat} >Reporter: Gerard Maas >Assignee: Jungtaek Lim >Priority: Major > > The HDFSBackedStateStoreProvider when using the default CheckpointFileManager > is leaving '.crc' files behind. There's a .crc file created for each > `atomicFile` operation of the CheckpointFileManager. > Over time, the number of files becomes very large. It makes the state store > file system constantly increase in size and, in our case, deteriorates the > file system performance. > Here's a sample of one of our spark storage volumes after 2 days of execution > (4 stateful streaming jobs, each on a different sub-dir): > # > {noformat} > Total files in PVC (used for checkpoints and state store) > $find . | wc -l > 431796 > # .crc files > $find . -name "*.crc" | wc -l > 418053{noformat} > With each .crc file taking one storage block, the used storage runs into the > GBs of data. > These jobs are running on Kubernetes. Our shared storage provider, GlusterFS, > shows serious performance deterioration with this large number of files: > {noformat} > DEBUG HDFSBackedStateStoreProvider: fetchFiles() took 29164ms{noformat} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28025) HDFSBackedStateStoreProvider should not leak .crc files
[ https://issues.apache.org/jira/browse/SPARK-28025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-28025. -- Fix Version/s: 3.0.0 Resolution: Fixed > HDFSBackedStateStoreProvider should not leak .crc files > > > Key: SPARK-28025 > URL: https://issues.apache.org/jira/browse/SPARK-28025 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 > Environment: Spark 2.4.3 > Kubernetes 1.11(?) (OpenShift) > StateStore storage on a mounted PVC. Viewed as a local filesystem by the > `FileContextBasedCheckpointFileManager` : > {noformat} > scala> glusterfm.isLocal > res17: Boolean = true{noformat} >Reporter: Gerard Maas >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > The HDFSBackedStateStoreProvider when using the default CheckpointFileManager > is leaving '.crc' files behind. There's a .crc file created for each > `atomicFile` operation of the CheckpointFileManager. > Over time, the number of files becomes very large. It makes the state store > file system constantly increase in size and, in our case, deteriorates the > file system performance. > Here's a sample of one of our spark storage volumes after 2 days of execution > (4 stateful streaming jobs, each on a different sub-dir): > # > {noformat} > Total files in PVC (used for checkpoints and state store) > $find . | wc -l > 431796 > # .crc files > $find . -name "*.crc" | wc -l > 418053{noformat} > With each .crc file taking one storage block, the used storage runs into the > GBs of data. > These jobs are running on Kubernetes. Our shared storage provider, GlusterFS, > shows serious performance deterioration with this large number of files: > {noformat} > DEBUG HDFSBackedStateStoreProvider: fetchFiles() took 29164ms{noformat} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28841) Spark cannot read a relative path containing ":"
[ https://issues.apache.org/jira/browse/SPARK-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912719#comment-16912719 ] Shixiong Zhu edited comment on SPARK-28841 at 8/21/19 10:35 PM: Okey, at least the following codes are legit but failing. {code} scala> :paste // Entering paste mode (ctrl-D to finish) import sys.process._ "touch /tmp/test/foo:bar".!! spark.read.text("/tmp/test/foo?bar") // Exiting paste mode, now interpreting. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: foo:bar at org.apache.hadoop.fs.Path.initialize(Path.java:205) at org.apache.hadoop.fs.Path.(Path.java:171) at org.apache.hadoop.fs.Path.(Path.java:93) at org.apache.hadoop.fs.Globber.glob(Globber.java:241) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657) at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:245) at org.apache.spark.deploy.SparkHadoopUtil.globPathIfNecessary(SparkHadoopUtil.scala:255) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:549) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.immutable.List.flatMap(List.scala:355) at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545) at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:711) at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:683) ... 57 elided Caused by: java.net.URISyntaxException: Relative path in absolute URI: foo:bar at java.net.URI.checkPath(URI.java:1823) at java.net.URI.(URI.java:745) at org.apache.hadoop.fs.Path.initialize(Path.java:202) ... 76 more {code} was (Author: zsxwing): Okey, at least the following codes are legit but failing. {code} scala> :paste // Entering paste mode (ctrl-D to finish) import org.apache.hadoop.fs._ val fs = new Path("/").getFileSystem(spark.sessionState.newHadoopConf()) fs.create(new Path("/tmp/test/foo:bar")).close() // Exiting paste mode, now interpreting. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: .foo:bar.crc at org.apache.hadoop.fs.Path.initialize(Path.java:205) at org.apache.hadoop.fs.Path.(Path.java:171) at org.apache.hadoop.fs.Path.(Path.java:93) at org.apache.hadoop.fs.ChecksumFileSystem.getChecksumFile(ChecksumFileSystem.java:90) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:402) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778) ... 
59 elided Caused by: java.net.URISyntaxException: Relative path in absolute URI: .foo:bar.crc at java.net.URI.checkPath(URI.java:1823) at java.net.URI.(URI.java:745) at org.apache.hadoop.fs.Path.initialize(Path.java:202) ... 69 more {code} > Spark cannot read a relative path containing ":" > > > Key: SPARK-28841 > URL: https://issues.apache.org/jira/browse/SPARK-28841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Reproducer: > {code} > spark.read.parquet("test:test") > {code} > Error: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: test:test > {code} > This is actually a Hadoop issue since the error is thrown from "new > Path("test:test")". I'm creating this ticket to see if we can work around > this issue in Spark. -- This message was sent by Atlassian Jira (v8.3.2#803003) --
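The "Relative path in absolute URI" failures above can be reproduced without Spark or Hadoop. Hadoop's Path treats everything before the first ':' as a URI scheme when no '/' appears earlier, then rebuilds the URI, and java.net.URI rejects a non-empty relative path once a scheme is present. A minimal Java sketch (looksLikeScheme is a simplified stand-in for Hadoop's actual parsing, not the real implementation):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {
    // Simplified stand-in for Hadoop Path's scheme detection: the text before
    // the first ':' is treated as a scheme iff no '/' appears earlier.
    static boolean looksLikeScheme(String p) {
        int colon = p.indexOf(':');
        int slash = p.indexOf('/');
        return colon != -1 && (slash == -1 || colon < slash);
    }

    public static void main(String[] args) {
        // "foo:bar" has no slash before the colon, so "foo" is mistaken for a scheme.
        System.out.println(looksLikeScheme("foo:bar"));           // true
        // An absolute path keeps the colon in a later segment.
        System.out.println(looksLikeScheme("/tmp/test/foo:bar")); // false

        // Rebuilding the mis-split pieces as a URI throws the same error seen
        // in the stack traces above.
        try {
            new URI("foo", null, "bar", null, null);
        } catch (URISyntaxException e) {
            System.out.println(e.getReason()); // "Relative path in absolute URI"
        }
    }
}
```

Note that the failing ".foo:bar.crc" checksum name fits the same pattern: the colon precedes any slash, so the prefix is parsed as a scheme.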
[jira] [Commented] (SPARK-28841) Spark cannot read a relative path containing ":"
[ https://issues.apache.org/jira/browse/SPARK-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912719#comment-16912719 ] Shixiong Zhu commented on SPARK-28841: -- Okay, at least the following code is legitimate but failing. {code} scala> :paste // Entering paste mode (ctrl-D to finish) import org.apache.hadoop.fs._ val fs = new Path("/").getFileSystem(spark.sessionState.newHadoopConf()) fs.create(new Path("/tmp/test/foo:bar")).close() spark.read.text("/tmp/test/foo?bar") // Exiting paste mode, now interpreting. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: .foo:bar.crc at org.apache.hadoop.fs.Path.initialize(Path.java:205) at org.apache.hadoop.fs.Path.(Path.java:171) at org.apache.hadoop.fs.Path.(Path.java:93) at org.apache.hadoop.fs.ChecksumFileSystem.getChecksumFile(ChecksumFileSystem.java:90) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:402) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778) ... 59 elided Caused by: java.net.URISyntaxException: Relative path in absolute URI: .foo:bar.crc at java.net.URI.checkPath(URI.java:1823) at java.net.URI.(URI.java:745) at org.apache.hadoop.fs.Path.initialize(Path.java:202) ... 
69 more {code} > Spark cannot read a relative path containing ":" > > > Key: SPARK-28841 > URL: https://issues.apache.org/jira/browse/SPARK-28841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Reproducer: > {code} > spark.read.parquet("test:test") > {code} > Error: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: test:test > {code} > This is actually a Hadoop issue since the error is thrown from "new > Path("test:test")". I'm creating this ticket to see if we can work around > this issue in Spark. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28841) Spark cannot read a relative path containing ":"
[ https://issues.apache.org/jira/browse/SPARK-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912719#comment-16912719 ] Shixiong Zhu edited comment on SPARK-28841 at 8/21/19 10:00 PM: Okey, at least the following codes are legit but failing. {code} scala> :paste // Entering paste mode (ctrl-D to finish) import org.apache.hadoop.fs._ val fs = new Path("/").getFileSystem(spark.sessionState.newHadoopConf()) fs.create(new Path("/tmp/test/foo:bar")).close() // Exiting paste mode, now interpreting. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: .foo:bar.crc at org.apache.hadoop.fs.Path.initialize(Path.java:205) at org.apache.hadoop.fs.Path.(Path.java:171) at org.apache.hadoop.fs.Path.(Path.java:93) at org.apache.hadoop.fs.ChecksumFileSystem.getChecksumFile(ChecksumFileSystem.java:90) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:402) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778) ... 59 elided Caused by: java.net.URISyntaxException: Relative path in absolute URI: .foo:bar.crc at java.net.URI.checkPath(URI.java:1823) at java.net.URI.(URI.java:745) at org.apache.hadoop.fs.Path.initialize(Path.java:202) ... 69 more {code} was (Author: zsxwing): Okey, at least the following codes are legit but failing. 
{code} scala> :paste // Entering paste mode (ctrl-D to finish) import org.apache.hadoop.fs._ val fs = new Path("/").getFileSystem(spark.sessionState.newHadoopConf()) fs.create(new Path("/tmp/test/foo:bar")).close() spark.read.text("/tmp/test/foo?bar") // Exiting paste mode, now interpreting. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: .foo:bar.crc at org.apache.hadoop.fs.Path.initialize(Path.java:205) at org.apache.hadoop.fs.Path.(Path.java:171) at org.apache.hadoop.fs.Path.(Path.java:93) at org.apache.hadoop.fs.ChecksumFileSystem.getChecksumFile(ChecksumFileSystem.java:90) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:402) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778) ... 59 elided Caused by: java.net.URISyntaxException: Relative path in absolute URI: .foo:bar.crc at java.net.URI.checkPath(URI.java:1823) at java.net.URI.(URI.java:745) at org.apache.hadoop.fs.Path.initialize(Path.java:202) ... 69 more {code} > Spark cannot read a relative path containing ":" > > > Key: SPARK-28841 > URL: https://issues.apache.org/jira/browse/SPARK-28841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Reproducer: > {code} > spark.read.parquet("test:test") > {code} > Error: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: test:test > {code} > This is actually a Hadoop issue since the error is thrown from "new > Path("test:test")". 
I'm creating this ticket to see if we can work around > this issue in Spark. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28841) Spark cannot read a relative path containing ":"
Shixiong Zhu created SPARK-28841: Summary: Spark cannot read a relative path containing ":" Key: SPARK-28841 URL: https://issues.apache.org/jira/browse/SPARK-28841 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.3 Reporter: Shixiong Zhu Reproducer: {code} spark.read.parquet("test:test") {code} Error: {code} java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: test:test {code} This is actually a Hadoop issue since the error is thrown from "new Path("test:test")". I'm creating this ticket to see if we can work around this issue in Spark. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28841) Spark cannot read a relative path containing ":"
[ https://issues.apache.org/jira/browse/SPARK-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912572#comment-16912572 ] Shixiong Zhu commented on SPARK-28841: -- On second thought, this is probably a limitation of the "Path" API, since it cannot tell whether "test" is a scheme or just part of a relative path > Spark cannot read a relative path containing ":" > > > Key: SPARK-28841 > URL: https://issues.apache.org/jira/browse/SPARK-28841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Reproducer: > {code} > spark.read.parquet("test:test") > {code} > Error: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: test:test > {code} > This is actually a Hadoop issue since the error is thrown from "new > Path("test:test")". I'm creating this ticket to see if we can work around > this issue in Spark. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28841) Spark cannot read a relative path containing ":"
[ https://issues.apache.org/jira/browse/SPARK-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912572#comment-16912572 ] Shixiong Zhu edited comment on SPARK-28841 at 8/21/19 6:04 PM: --- On second thought, this is probably a limitation of the "Path" API, since it cannot tell whether "test" is a scheme or just part of a relative path. was (Author: zsxwing): In a second thought, this is probably a limitation of "Path" API since it cannot tell whether "test" is a schema or just a part of a relate path > Spark cannot read a relative path containing ":" > > > Key: SPARK-28841 > URL: https://issues.apache.org/jira/browse/SPARK-28841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Reproducer: > {code} > spark.read.parquet("test:test") > {code} > Error: > {code} > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: test:test > {code} > This is actually a Hadoop issue since the error is thrown from "new > Path("test:test")". I'm creating this ticket to see if we can work around > this issue in Spark. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
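The ambiguity described in the comment above is visible at the URI level: to a parser, "test:test" is indistinguishable from a URI with scheme "test", while the same name written as "./test:test" is unambiguously a relative path. This Java sketch shows java.net.URI behavior only; it is not a claim about what Spark or Hadoop accepts:

```java
import java.net.URI;

public class SchemeAmbiguityDemo {
    // Helper that returns the scheme a URI parser sees, or "<invalid>" if the
    // string does not parse at all.
    static String schemeOf(String s) {
        try {
            return new URI(s).getScheme();
        } catch (Exception e) {
            return "<invalid>";
        }
    }

    public static void main(String[] args) throws Exception {
        // "test:test" parses as scheme "test" with an opaque payload.
        System.out.println(schemeOf("test:test"));   // test
        // Prefixing "./" forces the colon into a later path segment, so no
        // scheme is detected and the whole string is a relative path.
        System.out.println(schemeOf("./test:test")); // null
        System.out.println(new URI("./test:test").getPath());
    }
}
```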
[jira] [Commented] (SPARK-28605) Performance regression in SS's foreach
[ https://issues.apache.org/jira/browse/SPARK-28605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911635#comment-16911635 ] Shixiong Zhu commented on SPARK-28605: -- [~kabhwan] Thanks for pointing it out. Yes, we can close this ticket. > Performance regression in SS's foreach > -- > > Key: SPARK-28605 > URL: https://issues.apache.org/jira/browse/SPARK-28605 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > Labels: regresssion > > When "ForeachWriter.open" return "false", ForeachSink v1 will skip the whole > partition without reading data. But in ForeachSink v2, due to the API > limitation, it needs to read the whole partition even if all data just gets > dropped. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28605) Performance regression in SS's foreach
[ https://issues.apache.org/jira/browse/SPARK-28605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-28605. -- Resolution: Invalid > Performance regression in SS's foreach > -- > > Key: SPARK-28605 > URL: https://issues.apache.org/jira/browse/SPARK-28605 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > Labels: regresssion > > When "ForeachWriter.open" return "false", ForeachSink v1 will skip the whole > partition without reading data. But in ForeachSink v2, due to the API > limitation, it needs to read the whole partition even if all data just gets > dropped. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
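The v1-vs-v2 difference described in SPARK-28605 can be sketched without Spark. The Writer interface below only mirrors the shape of Spark's ForeachWriter (open/process/close); it is a hypothetical simplification, not the real API, and the two "driver" loops stand in for the sink implementations:

```java
import java.util.List;

public class ForeachSkipDemo {
    interface Writer<T> {
        boolean open(int partitionId, long epochId);
        void process(T row);
        void close();
    }

    // v1-style behavior: when open() returns false, the partition's rows are
    // never read at all.
    static <T> int runV1(Writer<T> w, int partitionId, long epochId, List<T> partition) {
        int read = 0;
        if (w.open(partitionId, epochId)) {
            for (T row : partition) { read++; w.process(row); }
        }
        w.close();
        return read;
    }

    // v2-style behavior (per the ticket): the API forces every row to be read,
    // even when all of them are then dropped.
    static <T> int runV2(Writer<T> w, int partitionId, long epochId, List<T> partition) {
        int read = 0;
        boolean opened = w.open(partitionId, epochId);
        for (T row : partition) { read++; if (opened) w.process(row); }
        w.close();
        return read;
    }
}
```

With a writer whose open() always returns false, runV1 reads nothing while runV2 still scans the whole partition, which is the performance regression this ticket (later resolved as Invalid) described.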
[jira] [Resolved] (SPARK-28650) Fix the guarantee of ForeachWriter
[ https://issues.apache.org/jira/browse/SPARK-28650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-28650. -- Fix Version/s: 2.4.5 Assignee: Jungtaek Lim Resolution: Fixed > Fix the guarantee of ForeachWriter > -- > > Key: SPARK-28650 > URL: https://issues.apache.org/jira/browse/SPARK-28650 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.5 > > > Right now ForeachWriter has the following guarantee: > {code} > If the streaming query is being executed in the micro-batch mode, then every > partition > represented by a unique tuple (partitionId, epochId) is guaranteed to have > the same data. > Hence, (partitionId, epochId) can be used to deduplicate and/or > transactionally commit data > and achieve exactly-once guarantees. > {code} > > But we can break this easily actually when restarting a query but a batch is > re-run (e.g., upgrade Spark) > * Source returns a different DataFrame that has a different partition number > (e.g., we start to not create empty partitions in Kafka Source V2). > * A new added optimization rule may change the number of partitions in the > new run. > * Change the file split size in the new run. > Since we cannot guarantee that the same (partitionId, epochId) has the same > data. We should update the document for "ForeachWriter". -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28650) Fix the guarantee of ForeachWriter
[ https://issues.apache.org/jira/browse/SPARK-28650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904085#comment-16904085 ] Shixiong Zhu commented on SPARK-28650: -- Go ahead. I'm not working on this. For the signature of "open", I don't think it's worth changing it, since that would break binary compatibility. > Fix the guarantee of ForeachWriter > -- > > Key: SPARK-28650 > URL: https://issues.apache.org/jira/browse/SPARK-28650 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Right now ForeachWriter has the following guarantee: > {code} > If the streaming query is being executed in the micro-batch mode, then every > partition > represented by a unique tuple (partitionId, epochId) is guaranteed to have > the same data. > Hence, (partitionId, epochId) can be used to deduplicate and/or > transactionally commit data > and achieve exactly-once guarantees. > {code} > > But we can break this easily actually when restarting a query but a batch is > re-run (e.g., upgrade Spark) > * Source returns a different DataFrame that has a different partition number > (e.g., we start to not create empty partitions in Kafka Source V2). > * A new added optimization rule may change the number of partitions in the > new run. > * Change the file split size in the new run. > Since we cannot guarantee that the same (partitionId, epochId) has the same > data. We should update the document for "ForeachWriter". -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
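The exactly-once recipe quoted in the ForeachWriter guarantee above (deduplicate by the tuple (partitionId, epochId)) can be sketched as a tiny in-memory sink. This is an illustration of the documented recipe, not Spark code; as the ticket explains, it is only safe if a re-run reproduces the same data for the same key, which repartitioning after a restart can break:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EpochCommitDemo {
    // Committed output, keyed by "partitionId:epochId".
    private final Map<String, List<String>> committed = new HashMap<>();

    // Returns true if this (partitionId, epochId) was newly committed, false
    // if it was a duplicate replay and the rows were skipped.
    public boolean commit(int partitionId, long epochId, List<String> rows) {
        String key = partitionId + ":" + epochId;
        if (committed.containsKey(key)) {
            return false; // same partition/epoch replayed: deduplicate
        }
        committed.put(key, rows);
        return true;
    }

    public int committedCount() {
        return committed.size();
    }
}
```

If a restarted query produces different rows under an already-committed key, this scheme silently keeps the stale data, which is why the ticket downgraded the guarantee to documentation-level advice.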
[jira] [Updated] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28651: - Docs Text: All fields of Structured Streaming's file source schema will be forced to be nullable since Spark 3.0.0. This protects users from corruption when the specified or inferred schema is not compatible with the actual data. If you would like the original behavior, you can set the SQL conf "spark.sql.streaming.fileSource.schema.forceNullable" to "false". This flag was added to reduce migration work when upgrading to Spark 3.0.0 and will be removed in a future release. Please update your code to work with the new behavior as soon as possible. > Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Tomasz Magdanski >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). > However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. > It can cause corrupted parquet files due to the schema mismatch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
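What the release note above means by "forcing a schema to nullable" can be illustrated without Spark: every field keeps its name and type, but its nullable flag becomes true. The Field class here is a hypothetical stand-in for Spark's StructField, and asNullable mirrors the behavior of StructType.asNullable that batch DataFrames already apply:

```java
import java.util.ArrayList;
import java.util.List;

public class NullableSchemaDemo {
    static final class Field {
        final String name;
        final String dataType;
        final boolean nullable;

        Field(String name, String dataType, boolean nullable) {
            this.name = name;
            this.dataType = dataType;
            this.nullable = nullable;
        }
    }

    // Same names and types, nullable everywhere.
    static List<Field> asNullable(List<Field> schema) {
        List<Field> out = new ArrayList<>();
        for (Field f : schema) {
            out.add(new Field(f.name, f.dataType, true));
        }
        return out;
    }
}
```

The mismatch in SPARK-28651 was exactly this: batch reads applied the transformation while streaming reads did not, so files written with a non-nullable schema could disagree with data containing nulls.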
[jira] [Assigned] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-28651: Assignee: Shixiong Zhu > Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Tomasz >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). > However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. > It can cause corrupted parquet files due to the schema mismatch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28651: - Labels: release-notes (was: ) > Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Tomasz >Priority: Major > Labels: release-notes > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). > However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. > It can cause corrupted parquet files due to the schema mismatch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28651: - Description: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. It can cause corrupted parquet files due to the schema mismatch. was: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. Tcaused corrupted parquet files due to the schema mismatch. 
> Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Tomasz >Priority: Major > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). > However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. > It can cause corrupted parquet files due to the schema mismatch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28651: - Description: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. Tcaused corrupted parquet files due to the schema mismatch. was: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. This issue was reported by Tomasz Magdanski. He found it caused corrupted parquet files due to the schema mismatch. 
> Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Tomasz >Priority: Major > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). > However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. > Tcaused corrupted parquet files due to the schema mismatch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28651: - Reporter: Tomasz (was: Shixiong Zhu) > Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Tomasz >Priority: Major > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). > However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. > This issue was reported by Tomasz Magdanski. He found it caused corrupted > parquet files due to the schema mismatch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28651: - Description: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. This issue was reported by Tomasz Magdanski. He found it caused corrupted parquet files due to the schema mismatch. was: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. 
> Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). > However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. > This issue was reported by Tomasz Magdanski. He found it caused corrupted > parquet files due to the schema mismatch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28651: - Description: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. was: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. This issue was rpo > Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). 
> However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28651: - Description: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. This issue was rpo was: Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. > Streaming file source doesn't change the schema to nullable automatically > - > > Key: SPARK-28651 > URL: https://issues.apache.org/jira/browse/SPARK-28651 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Right now, batch DataFrame always changes the schema to nullable > automatically (See this line: > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). 
> However, streaming DataFrame's schema is read in this line > https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 > which doesn't change the schema to nullable automatically. > We should make streaming DataFrame consistent with batch. > This issue was rpo -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28651) Streaming file source doesn't change the schema to nullable automatically
Shixiong Zhu created SPARK-28651: Summary: Streaming file source doesn't change the schema to nullable automatically Key: SPARK-28651 URL: https://issues.apache.org/jira/browse/SPARK-28651 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.3 Reporter: Shixiong Zhu Right now, batch DataFrame always changes the schema to nullable automatically (See this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399). However, streaming DataFrame's schema is read in this line https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259 which doesn't change the schema to nullable automatically. We should make streaming DataFrame consistent with batch. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
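The batch-vs-streaming inconsistency described above can be sketched without Spark. On the JVM side, the batch path calls the schema's `asNullable` conversion while the streaming path reads the schema directly; this toy model (field names and types are illustrative, and the schema representation is a stand-in, not Spark's `StructType`) shows the intended behavior:

```python
# Toy model of the nullable-schema conversion discussed in SPARK-28651.
# A schema is a list of (name, dtype, nullable) tuples; the batch path
# forces every field to nullable, the streaming path (the bug) does not.

def as_nullable(schema):
    """Return a copy of the schema with every field marked nullable."""
    return [(name, dtype, True) for (name, dtype, _nullable) in schema]

user_schema = [("id", "long", False), ("value", "string", True)]

batch_schema = as_nullable(user_schema)      # batch DataFrame behavior
streaming_schema = user_schema               # streaming path skips the call

assert all(nullable for (_, _, nullable) in batch_schema)
assert not all(nullable for (_, _, nullable) in streaming_schema)
```

The proposed fix is simply to route the streaming schema through the same conversion, making the two paths agree.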
[jira] [Comment Edited] (SPARK-26152) Synchronize Worker Cleanup with Worker Shutdown
[ https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902477#comment-16902477 ] Shixiong Zhu edited comment on SPARK-26152 at 8/7/19 9:07 PM: -- [~ajithshetty] does your PR fix the flaky test? If I read correctly, the failure is because of OOM. RejectedExecutionException is just side effect because OOM triggers an unexpected executor shut down. Should we try to figure out why OOM happens? Maybe we have some memory leak. was (Author: zsxwing): [~ajithshetty] does your PR fix the flaky test? If I read correctly, the failure is because of OOM. RejectedExecutionException is just side effect because OOM triggers the executor shut down. Should we try to figure out why OOM happens? Maybe we have some memory leak. > Synchronize Worker Cleanup with Worker Shutdown > --- > > Key: SPARK-26152 > URL: https://issues.apache.org/jira/browse/SPARK-26152 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 3.0.0, 2.4.3 >Reporter: Dongjoon Hyun >Assignee: Ajith S >Priority: Critical > Fix For: 2.4.4, 3.0.0 > > Attachments: Screenshot from 2019-03-11 17-03-40.png > > > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627 > (2018-11-16) > {code} > BroadcastSuite: > - Using TorrentBroadcast locally > - Accessing TorrentBroadcast variables from multiple threads > - Accessing TorrentBroadcast variables in a local cluster (encryption = off) > java.util.concurrent.RejectedExecutionException: Task > scala.concurrent.impl.CallbackRunnable@59428a1 rejected from > java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = > 1, active threads = 1, queued tasks = 0, completed tasks = 0] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) > at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) > at > java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > at > scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63) > at > scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81) > at > scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55) > at > scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870) > at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106) > at > scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103) > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.Threa
[jira] [Commented] (SPARK-26152) Synchronize Worker Cleanup with Worker Shutdown
[ https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902477#comment-16902477 ] Shixiong Zhu commented on SPARK-26152: -- [~ajithshetty] does your PR fix the flaky test? If I read correctly, the failure is because of OOM. RejectedExecutionException is just side effect because OOM triggers the executor shut down. Should we try to figure out why OOM happens? Maybe we have some memory leak. > Synchronize Worker Cleanup with Worker Shutdown > --- > > Key: SPARK-26152 > URL: https://issues.apache.org/jira/browse/SPARK-26152 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 3.0.0, 2.4.3 >Reporter: Dongjoon Hyun >Assignee: Ajith S >Priority: Critical > Fix For: 2.4.4, 3.0.0 > > Attachments: Screenshot from 2019-03-11 17-03-40.png > > > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627 > (2018-11-16) > {code} > BroadcastSuite: > - Using TorrentBroadcast locally > - Accessing TorrentBroadcast variables from multiple threads > - Accessing TorrentBroadcast variables in a local cluster (encryption = off) > java.util.concurrent.RejectedExecutionException: Task > scala.concurrent.impl.CallbackRunnable@59428a1 rejected from > java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = > 1, active threads = 1, queued tasks = 0, completed tasks = 0] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) > at > java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134) > at > 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > at > scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63) > at > scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81) > at > scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55) > at > scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870) > at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106) > at > scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103) > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at 
scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > java.util.concurrent.RejectedExecutionException: Task > scala.concurrent.impl.CallbackRunnable@40a5bf17 rejected from > java.util.concurrent.ThreadPoolExecutor@5a73967[Shutting down, pool size = 1, > active threads = 1, queued tasks = 0, complete
[jira] [Created] (SPARK-28650) Fix the guarantee of ForeachWriter
Shixiong Zhu created SPARK-28650: Summary: Fix the guarantee of ForeachWriter Key: SPARK-28650 URL: https://issues.apache.org/jira/browse/SPARK-28650 Project: Spark Issue Type: Documentation Components: Structured Streaming Affects Versions: 2.4.3 Reporter: Shixiong Zhu Right now ForeachWriter has the following guarantee: {code} If the streaming query is being executed in the micro-batch mode, then every partition represented by a unique tuple (partitionId, epochId) is guaranteed to have the same data. Hence, (partitionId, epochId) can be used to deduplicate and/or transactionally commit data and achieve exactly-once guarantees. {code} But this guarantee can easily be broken when a query is restarted and a batch is re-run (e.g., after upgrading Spark): * The source returns a different DataFrame with a different number of partitions (e.g., Kafka Source V2 stops creating empty partitions). * A newly added optimization rule changes the number of partitions in the new run. * The file split size changes in the new run. Since we cannot guarantee that the same (partitionId, epochId) has the same data, we should update the documentation for "ForeachWriter". -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
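The exactly-once pattern that the quoted ForeachWriter documentation describes — commit each (partitionId, epochId) at most once — can be sketched as below. The sink here is a hypothetical in-memory commit log, not a Spark API; the sketch just makes the hazard concrete: if a restarted batch repartitions the same rows differently, the same key no longer identifies the same data, so skipping a "seen" key may silently drop rows.

```python
# Sketch of dedup-by-(partitionId, epochId), the pattern SPARK-28650 says
# is unsafe across restarts that change partitioning. Illustrative only.

committed = set()  # stands in for a transactional sink's commit log

def write_partition(partition_id, epoch_id, rows):
    key = (partition_id, epoch_id)
    if key in committed:        # replay of an already-committed partition
        return "skipped"
    # ... write `rows` transactionally, then record the commit ...
    committed.add(key)
    return "written"

assert write_partition(0, 5, [1, 2]) == "written"
assert write_partition(0, 5, [1, 2]) == "skipped"  # exact replay: safe
# After a restart with a different partition count, partition 0 of epoch 5
# may hold *different* rows, yet it is still skipped -- the hazard above.
```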
[jira] [Commented] (SPARK-28605) Performance regression in SS's foreach
[ https://issues.apache.org/jira/browse/SPARK-28605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899789#comment-16899789 ] Shixiong Zhu commented on SPARK-28605: -- By the way, this is not a critical regression. It's not easy to hit this issue. > Performance regression in SS's foreach > -- > > Key: SPARK-28605 > URL: https://issues.apache.org/jira/browse/SPARK-28605 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > Labels: regresssion > > When "ForeachWriter.open" return "false", ForeachSink v1 will skip the whole > partition without reading data. But in ForeachSink v2, due to the API > limitation, it needs to read the whole partition even if all data just gets > dropped. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28605) Performance regression in SS's foreach
[ https://issues.apache.org/jira/browse/SPARK-28605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899787#comment-16899787 ] Shixiong Zhu commented on SPARK-28605: -- This is a regression at all 2.4 branches. It's caused by SPARK-23099. > Performance regression in SS's foreach > -- > > Key: SPARK-28605 > URL: https://issues.apache.org/jira/browse/SPARK-28605 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > Labels: regresssion > > When "ForeachWriter.open" return "false", ForeachSink v1 will skip the whole > partition without reading data. But in ForeachSink v2, due to the API > limitation, it needs to read the whole partition even if all data just gets > dropped. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28605) Performance regression in SS's foreach
[ https://issues.apache.org/jira/browse/SPARK-28605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28605: - Affects Version/s: 2.4.0 2.4.1 2.4.2 > Performance regression in SS's foreach > -- > > Key: SPARK-28605 > URL: https://issues.apache.org/jira/browse/SPARK-28605 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > Labels: regresssion > > When "ForeachWriter.open" return "false", ForeachSink v1 will skip the whole > partition without reading data. But in ForeachSink v2, due to the API > limitation, it needs to read the whole partition even if all data just gets > dropped. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28574) Allow to config different sizes for event queues
[ https://issues.apache.org/jira/browse/SPARK-28574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-28574. -- Resolution: Fixed > Allow to config different sizes for event queues > > > Key: SPARK-28574 > URL: https://issues.apache.org/jira/browse/SPARK-28574 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Yun Zou >Assignee: Yun Zou >Priority: Major > Fix For: 3.0.0 > > > Right now all event queues share the same size config. We should allow the > user to set a size for a specific event queue. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28574) Allow to config different sizes for event queues
[ https://issues.apache.org/jira/browse/SPARK-28574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-28574: Assignee: Yun Zou > Allow to config different sizes for event queues > > > Key: SPARK-28574 > URL: https://issues.apache.org/jira/browse/SPARK-28574 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Yun Zou >Assignee: Yun Zou >Priority: Major > Fix For: 3.0.0 > > > Right now all event queues share the same size config. We should allow the > user to set a size for a specific event queue. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28605) Performance regression in SS's foreach
[ https://issues.apache.org/jira/browse/SPARK-28605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28605: - Labels: regresssion (was: ) > Performance regression in SS's foreach > -- > > Key: SPARK-28605 > URL: https://issues.apache.org/jira/browse/SPARK-28605 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > Labels: regresssion > > When "ForeachWriter.open" return "false", ForeachSink v1 will skip the whole > partition without reading data. But in ForeachSink v2, due to the API > limitation, it needs to read the whole partition even if all data just gets > dropped. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28605) Performance regression in SS's foreach
Shixiong Zhu created SPARK-28605: Summary: Performance regression in SS's foreach Key: SPARK-28605 URL: https://issues.apache.org/jira/browse/SPARK-28605 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.3 Reporter: Shixiong Zhu When "ForeachWriter.open" returns "false", ForeachSink v1 will skip the whole partition without reading data. But in ForeachSink v2, due to an API limitation, it needs to read the whole partition even if all the data just gets dropped. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
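The v1 behavior described in SPARK-28605 can be modeled with the ForeachWriter open/process/close contract: when open() returns False, the partition's row iterator is never touched. This sketch is illustrative (the class names are not Spark's); counting next() calls demonstrates that the data is skipped entirely, which is exactly what v2 cannot do.

```python
# Minimal model of ForeachSink v1's short-circuit: open() == False means
# the partition iterator is never consumed. Illustrative, not Spark API.

class CountingRows:
    """Iterator that records how many rows were actually read."""
    def __init__(self, rows):
        self._it = iter(rows)
        self.reads = 0
    def __iter__(self):
        return self
    def __next__(self):
        row = next(self._it)    # raises StopIteration when exhausted
        self.reads += 1
        return row

def run_partition(writer, rows):
    if not writer.open():
        return                  # v1: skip without touching the iterator
    try:
        for row in rows:
            writer.process(row)
    finally:
        writer.close()

class DroppingWriter:
    def open(self):            return False
    def process(self, row):    pass
    def close(self):           pass

rows = CountingRows([1, 2, 3])
run_partition(DroppingWriter(), rows)
assert rows.reads == 0  # v1 reads nothing; v2 would have read all 3 rows
```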
[jira] [Updated] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28556: - Docs Text: In Spark 3.0, the type of "error" parameter in the "org.apache.spark.sql.util.QueryExecutionListener.onFailure" method is changed to "java.lang.Throwable" from "java.lang.Exception" to accept more types of failures such as "java.lang.Error" and its subclasses. (was: In Spark 3.0, the type of "error" parameter in the "org.apache.spark.sql.util.QueryExecutionListener.onFailure" method is changed to "Throwable" from "Exception" to accept more types of failures such as java.lang.Error and its subclasses.) > Error should also be sent to QueryExecutionListener.onFailure > - > > Key: SPARK-28556 > URL: https://issues.apache.org/jira/browse/SPARK-28556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Right now Error is not sent to QueryExecutionListener.onFailure. If there is > any Error when running a query, QueryExecutionListener.onFailure cannot be > triggered. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28556: - Docs Text: In Spark 3.0, the type of "error" parameter in the "org.apache.spark.sql.util.QueryExecutionListener.onFailure" method is changed to "Throwable" from "Exception" to accept more types of failures such as java.lang.Error and its subclasses. > Error should also be sent to QueryExecutionListener.onFailure > - > > Key: SPARK-28556 > URL: https://issues.apache.org/jira/browse/SPARK-28556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Right now Error is not sent to QueryExecutionListener.onFailure. If there is > any Error when running a query, QueryExecutionListener.onFailure cannot be > triggered. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
[ https://issues.apache.org/jira/browse/SPARK-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28556: - Labels: release-notes (was: ) > Error should also be sent to QueryExecutionListener.onFailure > - > > Key: SPARK-28556 > URL: https://issues.apache.org/jira/browse/SPARK-28556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > > Right now Error is not sent to QueryExecutionListener.onFailure. If there is > any Error when running a query, QueryExecutionListener.onFailure cannot be > triggered. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28556) Error should also be sent to QueryExecutionListener.onFailure
Shixiong Zhu created SPARK-28556: Summary: Error should also be sent to QueryExecutionListener.onFailure Key: SPARK-28556 URL: https://issues.apache.org/jira/browse/SPARK-28556 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.3 Reporter: Shixiong Zhu Assignee: Shixiong Zhu Right now Error is not sent to QueryExecutionListener.onFailure. If there is any Error when running a query, QueryExecutionListener.onFailure cannot be triggered. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
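The fix described above widens onFailure's "error" parameter from Exception to Throwable so that JVM Errors (e.g. OutOfMemoryError) also reach listeners. A rough Python analogue (the names here are illustrative, not Spark's Python API): a handler that catches only Exception misses BaseException subclasses such as SystemExit, just as a Throwable parameter typed as Exception misses Errors.

```python
# Python analogue of SPARK-28556: catch the widest failure type so the
# listener callback is always invoked. Sketch only; not a Spark API.

failures = []

def on_failure(error):                 # listener callback (illustrative)
    failures.append(type(error).__name__)

def run_query(action):
    try:
        action()
    except BaseException as e:         # "Throwable"-like: catches fatal errors too
        on_failure(e)

def fatal():
    raise SystemExit(1)                # BaseException, but NOT an Exception

run_query(fatal)
assert failures == ["SystemExit"]      # an "except Exception" handler would miss it
```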
[jira] [Commented] (SPARK-16754) NPE when defining case class and searching Encoder in the same line
[ https://issues.apache.org/jira/browse/SPARK-16754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893125#comment-16893125 ] Shixiong Zhu commented on SPARK-16754: -- I think prepending `org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)` to every command should fix the problem. I already forget why we didn't do this for Scala 2.11. > NPE when defining case class and searching Encoder in the same line > --- > > Key: SPARK-16754 > URL: https://issues.apache.org/jira/browse/SPARK-16754 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.3.0 > Environment: Spark Shell for Scala 2.11 >Reporter: Shixiong Zhu >Priority: Minor > > Reproducer: > {code} > scala> :paste > // Entering paste mode (ctrl-D to finish) > case class TestCaseClass(value: Int) > import spark.implicits._ > Seq(TestCaseClass(1)).toDS().collect() > // Exiting paste mode, now interpreting. > java.lang.RuntimeException: baseClassName: $line14.$read > at > org.apache.spark.sql.catalyst.encoders.OuterScopes$$anonfun$getOuterScope$1.apply(OuterScopes.scala:62) > at > org.apache.spark.sql.catalyst.expressions.objects.NewInstance$$anonfun$12.apply(objects.scala:251) > at > org.apache.spark.sql.catalyst.expressions.objects.NewInstance$$anonfun$12.apply(objects.scala:251) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.catalyst.expressions.objects.NewInstance.doGenCode(objects.scala:251) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$$anonfun$3.apply(GenerateSafeProjection.scala:145) > at > 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$$anonfun$3.apply(GenerateSafeProjection.scala:142) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:142) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:36) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:821) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.constructProjection$lzycompute(ExpressionEncoder.scala:258) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.constructProjection(ExpressionEncoder.scala:258) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:289) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$15.apply(Dataset.scala:2218) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$15.apply(Dataset.scala:2218) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2218) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2568) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2217) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:) > at org.apache.spark.sql.Dataset.withCallback(Dataset.
[jira] [Updated] (SPARK-28489) KafkaOffsetRangeCalculator.getRanges may drop offsets
[ https://issues.apache.org/jira/browse/SPARK-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28489: - Affects Version/s: 2.4.0 2.4.1 2.4.2 > KafkaOffsetRangeCalculator.getRanges may drop offsets > - > > Key: SPARK-28489 > URL: https://issues.apache.org/jira/browse/SPARK-28489 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > Labels: correctness, dataloss > > KafkaOffsetRangeCalculator.getRanges may drop offsets due to round off errors. > > This only affects queries using "minPartitions" option. A workaround is just > removing the "minPartitions" option from the query. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28489) KafkaOffsetRangeCalculator.getRanges may drop offsets
[ https://issues.apache.org/jira/browse/SPARK-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28489: - Description: KafkaOffsetRangeCalculator.getRanges may drop offsets due to round off errors. This only affects queries using "minPartitions" option. A workaround is just removing the "minPartitions" option from the query. was:KafkaOffsetRangeCalculator.getRanges may drop offsets due to round off errors. > KafkaOffsetRangeCalculator.getRanges may drop offsets > - > > Key: SPARK-28489 > URL: https://issues.apache.org/jira/browse/SPARK-28489 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > Labels: correctness, dataloss > > KafkaOffsetRangeCalculator.getRanges may drop offsets due to round off errors. > > This only affects queries using "minPartitions" option. A workaround is just > removing the "minPartitions" option from the query. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28489) KafkaOffsetRangeCalculator.getRanges may drop offsets
Shixiong Zhu created SPARK-28489: Summary: KafkaOffsetRangeCalculator.getRanges may drop offsets Key: SPARK-28489 URL: https://issues.apache.org/jira/browse/SPARK-28489 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.3 Reporter: Shixiong Zhu Assignee: Shixiong Zhu KafkaOffsetRangeCalculator.getRanges may drop offsets due to round-off errors.
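The round-off failure mode reported above can be pictured outside Spark. The sketch below is hypothetical Python, not Spark's actual KafkaOffsetRangeCalculator: it shows how truncating division while splitting an offset range into `minPartitions` pieces drops trailing offsets, and one boundary computation that tiles the range exactly.

```python
def split_naive(start, end, num_splits):
    """Split [start, end) using a truncated per-piece size.
    The lost remainder means trailing offsets are never assigned."""
    size = (end - start) // num_splits
    return [(start + i * size, start + (i + 1) * size) for i in range(num_splits)]


def split_safe(start, end, num_splits):
    """Compute each boundary from the full range so the pieces
    exactly tile [start, end) with no dropped offsets."""
    bounds = [start + (end - start) * i // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds, bounds[1:]))


naive = split_naive(0, 10, 3)  # [(0, 3), (3, 6), (6, 9)] -- offset 9 is dropped
safe = split_safe(0, 10, 3)    # [(0, 3), (3, 6), (6, 10)] -- covers every offset
```

This also explains the workaround: without "minPartitions" no re-splitting happens, so no offsets can be lost to rounding.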
[jira] [Updated] (SPARK-28486) PythonBroadcast may delete the broadcast file while a Python worker still needs it
[ https://issues.apache.org/jira/browse/SPARK-28486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28486: - Issue Type: Bug (was: New Feature) > PythonBroadcast may delete the broadcast file while a Python worker still > needs it > -- > > Key: SPARK-28486 > URL: https://issues.apache.org/jira/browse/SPARK-28486 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Priority: Major > > Steps to reproduce: > * Run "bin/pyspark --master local[1,1] --conf spark.memory.fraction=0.0001" > to start PySpark > * Run the following codes: > {code:java} > b = sc.broadcast([100]) > sc.parallelize([0],1).map(lambda x: 0 if x == 0 else b.value[0]).collect() > sc._jvm.java.lang.System.gc() > import time > time.sleep(5) > sc._jvm.java.lang.System.gc() > time.sleep(5) > sc.parallelize([1],1).map(lambda x: 0 if x == 0 else b.value[0]).collect() > {code} > * Error: > {code} > IOError: [Errno 2] No such file or directory: > u'.../spark-ee2a0da1-7d2e-48fd-be9a-fdcc89c5076c/broadcast4970491472715621982' > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452) > at > org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588) > at > org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at > org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at 
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at > org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at > org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at > org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > ... 1 more > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28486) PythonBroadcast may delete the broadcast file while a Python worker still needs it
Shixiong Zhu created SPARK-28486: Summary: PythonBroadcast may delete the broadcast file while a Python worker still needs it Key: SPARK-28486 URL: https://issues.apache.org/jira/browse/SPARK-28486 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 2.4.3 Reporter: Shixiong Zhu Steps to reproduce: * Run "bin/pyspark --master local[1,1] --conf spark.memory.fraction=0.0001" to start PySpark * Run the following codes: {code:java} b = sc.broadcast([100]) sc.parallelize([0],1).map(lambda x: 0 if x == 0 else b.value[0]).collect() sc._jvm.java.lang.System.gc() import time time.sleep(5) sc._jvm.java.lang.System.gc() time.sleep(5) sc.parallelize([1],1).map(lambda x: 0 if x == 0 else b.value[0]).collect() {code} * Error: {code} IOError: [Errno 2] No such file or directory: u'.../spark-ee2a0da1-7d2e-48fd-be9a-fdcc89c5076c/broadcast4970491472715621982' at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452) at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588) at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at 
org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28456) Add a public API `Encoder.makeCopy` to allow creating Encoder without touching Scala reflections
[ https://issues.apache.org/jira/browse/SPARK-28456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-28456: - Description: Because `Encoder` is not thread safe, the user cannot reuse an `Encoder` in multiple `Dataset`s. However, creating an `Encoder` for a complicated class is slow due to Scala reflections. To reduce the cost of Encoder creation, right now I usually use the private API `ExpressionEncoder.copy` as follows: {code:java} object FooEncoder { private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]() implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy() } {code} This PR proposes a new method `makeCopy` in `Encoder` so that the above codes can be rewritten using public APIs. {code:java} object FooEncoder { private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]() implicit def encoder: Encoder[Foo] = _encoder.makeCopy } {code} was: Because `Encoder` is not thread safe, the user cannot reuse an `Encoder` in multiple `Dataset`s. However, creating an `Encoder` for a complicated class is slow due to Scala reflections. To reduce the cost of Encoder creation, right now I usually use the private API `ExpressionEncoder.copy` as follows: {code} object FooEncoder { private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]() implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy() } {code} This PR proposes a new method `copyEncoder` in `Encoder` so that the above codes can be rewritten using public APIs. {code} object FooEncoder { private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]() implicit def encoder: Encoder[Foo] = _encoder.copyEncoder() } {code} Regarding the method name, - Why not use `copy`? It conflicts with `case class`'s copy. - Why not use `clone`? It conflicts with `Object.clone`. 
Summary: Add a public API `Encoder.makeCopy` to allow creating Encoder without touching Scala reflections (was: Add a public API `Encoder.copyEncoder` to allow creating Encoder without touching Scala reflections) > Add a public API `Encoder.makeCopy` to allow creating Encoder without > touching Scala reflections > > > Key: SPARK-28456 > URL: https://issues.apache.org/jira/browse/SPARK-28456 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > Because `Encoder` is not thread safe, the user cannot reuse an `Encoder` in > multiple `Dataset`s. However, creating an `Encoder` for a complicated class > is slow due to Scala reflections. To reduce the cost of Encoder creation, > right now I usually use the private API `ExpressionEncoder.copy` as follows: > {code:java} > object FooEncoder { > private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]() > implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy() > } > {code} > This PR proposes a new method `makeCopy` in `Encoder` so that the above codes > can be rewritten using public APIs. > {code:java} > object FooEncoder { > private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]() > implicit def encoder: Encoder[Foo] = _encoder.makeCopy > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28456) Add a public API `Encoder.copyEncoder` to allow creating Encoder without touching Scala reflections
Shixiong Zhu created SPARK-28456: Summary: Add a public API `Encoder.copyEncoder` to allow creating Encoder without touching Scala reflections Key: SPARK-28456 URL: https://issues.apache.org/jira/browse/SPARK-28456 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.4.3 Reporter: Shixiong Zhu Assignee: Shixiong Zhu Because `Encoder` is not thread safe, the user cannot reuse an `Encoder` in multiple `Dataset`s. However, creating an `Encoder` for a complicated class is slow due to Scala reflections. To reduce the cost of Encoder creation, right now I usually use the private API `ExpressionEncoder.copy` as follows: {code} object FooEncoder { private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]() implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy() } {code} This PR proposes a new method `copyEncoder` in `Encoder` so that the above codes can be rewritten using public APIs. {code} object FooEncoder { private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]() implicit def encoder: Encoder[Foo] = _encoder.copyEncoder() } {code} Regarding the method name, - Why not use `copy`? It conflicts with `case class`'s copy. - Why not use `clone`? It conflicts with `Object.clone`. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.
[ https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-20547. -- Resolution: Fixed Fix Version/s: 3.0.0 > ExecutorClassLoader's findClass may not work correctly when a task is cancelled. > > > Key: SPARK-20547 > URL: https://issues.apache.org/jira/browse/SPARK-20547 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Affects Versions: 2.1.0, 2.2.0 > Reporter: Shixiong Zhu > Assignee: Shixiong Zhu > Priority: Minor > Fix For: 3.0.0 > > > ExecutorClassLoader's findClass may throw transient exceptions. For example, when a task is cancelled while ExecutorClassLoader is running, you may see an InterruptedException or IOException even though the class can be loaded. The result of findClass is then cached by the JVM, and later loads of the same class (which may still be loadable) simply throw NoClassDefFoundError. > We should make ExecutorClassLoader retry on transient exceptions.
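The failure mode here, a transient error cached as a permanent class-load failure, can be sketched in a few lines. This is an illustrative Python analogy, not Spark or JVM code; `fetch` stands in for the remote class fetch that ExecutorClassLoader performs.

```python
class CachingLoader:
    """Caches every load outcome, including transient failures --
    mirroring how the JVM caches a failed findClass and later throws
    NoClassDefFoundError even after the class becomes loadable."""

    def __init__(self, fetch):
        self._fetch = fetch   # may raise a transient error (e.g. task cancelled)
        self._cache = {}      # name -> ("ok", value) or ("failed", exc)

    def load(self, name):
        if name not in self._cache:
            try:
                self._cache[name] = ("ok", self._fetch(name))
            except IOError as exc:
                # the transient failure is cached forever -- this is the bug
                self._cache[name] = ("failed", exc)
        status, value = self._cache[name]
        if status == "failed":
            raise RuntimeError(f"{name} not found (cached failure)")
        return value
```

The proposed fix is the opposite policy: when the error is known to be transient, retry instead of caching the failure.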
[jira] [Updated] (SPARK-27711) InputFileBlockHolder should be unset at the end of tasks
[ https://issues.apache.org/jira/browse/SPARK-27711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27711: - Component/s: PySpark > InputFileBlockHolder should be unset at the end of tasks > > > Key: SPARK-27711 > URL: https://issues.apache.org/jira/browse/SPARK-27711 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core > Affects Versions: 2.4.3 > Reporter: Jose Torres > Priority: Major > > InputFileBlockHolder should be unset at the end of each task. Otherwise the value of input_file_name() can leak over to other tasks instead of starting as an empty string.
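The leak can be pictured with thread-local state that a task sets but never clears. A minimal Python sketch with hypothetical names, not Spark's actual InputFileBlockHolder API:

```python
import threading

_holder = threading.local()


def input_file_name():
    # empty string when no input file has been recorded for this task
    return getattr(_holder, "path", "")


def run_task(body, input_path=None, unset_at_end=True):
    """Run a task body on the current thread; optionally clear the
    holder afterwards so the next task does not see a stale value."""
    if input_path is not None:
        _holder.path = input_path
    try:
        return body()
    finally:
        if unset_at_end:
            _holder.path = ""  # the fix: unset at the end of every task
```

With `unset_at_end=False` a later task on the same thread observes the previous task's file path, which is exactly the reported leak.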
[jira] [Updated] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.
[ https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20547: - Labels: (was: bulk-closed) > ExecutorClassLoader's findClass may not work correctly when a task is > cancelled. > > > Key: SPARK-20547 > URL: https://issues.apache.org/jira/browse/SPARK-20547 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.1.0, 2.2.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > > ExecutorClassLoader's findClass may throw some transient exception. For > example, when a task is cancelled, if ExecutorClassLoader is running, you may > see InterruptedException or IOException, even if this class can be loaded. > Then the result of findClass will be cached by JVM, and later when the same > class is being loaded (note: in this case, this class may be still loadable), > it will just throw NoClassDefFoundError. > We should make ExecutorClassLoader retry on transient exceptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.
[ https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-20547: -- Assignee: Shixiong Zhu > ExecutorClassLoader's findClass may not work correctly when a task is > cancelled. > > > Key: SPARK-20547 > URL: https://issues.apache.org/jira/browse/SPARK-20547 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.1.0, 2.2.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Labels: bulk-closed > > ExecutorClassLoader's findClass may throw some transient exception. For > example, when a task is cancelled, if ExecutorClassLoader is running, you may > see InterruptedException or IOException, even if this class can be loaded. > Then the result of findClass will be cached by JVM, and later when the same > class is being loaded (note: in this case, this class may be still loadable), > it will just throw NoClassDefFoundError. > We should make ExecutorClassLoader retry on transient exceptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-11095) Simplify Netty RPC implementation by using a separate thread pool for each endpoint
[ https://issues.apache.org/jira/browse/SPARK-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-11095: -- > Simplify Netty RPC implementation by using a separate thread pool for each > endpoint > --- > > Key: SPARK-11095 > URL: https://issues.apache.org/jira/browse/SPARK-11095 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Major > Labels: bulk-closed > > The dispatcher class and the inbox class of the current Netty-based RPC > implementation is fairly complicated. It uses a single, shared thread pool to > execute all the endpoints. This is similar to how Akka does actor message > dispatching. The benefit of this design is that this RPC implementation can > support a very large number of endpoints, as they are all multiplexed into a > single thread pool for execution. The downside is the complexity resulting > from synchronization and coordination. > An alternative implementation is to have a separate message queue and thread > pool for each endpoint. The dispatcher simply routes the messages to the > appropriate message queue, and the threads poll the queue for messages to > process. > If the endpoint is single threaded, then the thread pool should contain only > a single thread. If the endpoint supports concurrent execution, then the > thread pool should contain more threads. > Two additional things we need to be careful with are: > 1. An endpoint should only process normal messages after OnStart is called. > This can be done by having the thread that starts the endpoint processing > OnStart. > 2. An endpoint should process OnStop after all normal messages have been > processed. I think this can be done by having a busy loop to spin until the > size of the message queue is 0. 
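The alternative design described above can be sketched as a dispatcher holding one queue and worker pool per endpoint. This is illustrative Python, not the actual Netty RPC classes; all names are made up.

```python
import queue
import threading


class Dispatcher:
    """One message queue and worker pool per endpoint; the dispatcher
    only routes. A single-threaded endpoint gets num_threads=1."""

    def __init__(self):
        self._queues = {}

    def register(self, name, handler, num_threads=1):
        q = queue.Queue()
        self._queues[name] = q
        for _ in range(num_threads):
            t = threading.Thread(target=self._worker, args=(q, handler), daemon=True)
            t.start()

    def _worker(self, q, handler):
        while True:
            msg = q.get()
            try:
                handler(msg)
            finally:
                q.task_done()

    def post(self, name, msg):
        # routing is just a dictionary lookup plus an enqueue
        self._queues[name].put(msg)

    def stop(self, name):
        # models the OnStop rule above: wait until the queue is empty
        self._queues[name].join()
```

With `num_threads=1` the per-endpoint queue also preserves message order, which is the single-threaded-endpoint case in the description.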
[jira] [Resolved] (SPARK-11095) Simplify Netty RPC implementation by using a separate thread pool for each endpoint
[ https://issues.apache.org/jira/browse/SPARK-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-11095. -- Resolution: Won't Do > Simplify Netty RPC implementation by using a separate thread pool for each > endpoint > --- > > Key: SPARK-11095 > URL: https://issues.apache.org/jira/browse/SPARK-11095 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Major > Labels: bulk-closed > > The dispatcher class and the inbox class of the current Netty-based RPC > implementation is fairly complicated. It uses a single, shared thread pool to > execute all the endpoints. This is similar to how Akka does actor message > dispatching. The benefit of this design is that this RPC implementation can > support a very large number of endpoints, as they are all multiplexed into a > single thread pool for execution. The downside is the complexity resulting > from synchronization and coordination. > An alternative implementation is to have a separate message queue and thread > pool for each endpoint. The dispatcher simply routes the messages to the > appropriate message queue, and the threads poll the queue for messages to > process. > If the endpoint is single threaded, then the thread pool should contain only > a single thread. If the endpoint supports concurrent execution, then the > thread pool should contain more threads. > Two additional things we need to be careful with are: > 1. An endpoint should only process normal messages after OnStart is called. > This can be done by having the thread that starts the endpoint processing > OnStart. > 2. An endpoint should process OnStop after all normal messages have been > processed. I think this can be done by having a busy loop to spin until the > size of the message queue is 0. 
[jira] [Reopened] (SPARK-17858) Provide option for Spark SQL to skip corrupt files
[ https://issues.apache.org/jira/browse/SPARK-17858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-17858: -- Assignee: Shixiong Zhu > Provide option for Spark SQL to skip corrupt files > -- > > Key: SPARK-17858 > URL: https://issues.apache.org/jira/browse/SPARK-17858 > Project: Spark > Issue Type: Improvement > Reporter: Shixiong Zhu > Assignee: Shixiong Zhu > Priority: Major > Labels: bulk-closed > > In Spark 2.0, corrupt files fail a SQL query. However, the user may just want to skip corrupt files and still run the query. > Another pain point is that the current exception doesn't contain the paths of the corrupt files, making it hard for users to fix them. It would be better to include the paths in the error message. > Note: In Spark 1.6, Spark SQL always skips corrupt files because of SPARK-17850.
[jira] [Resolved] (SPARK-17858) Provide option for Spark SQL to skip corrupt files
[ https://issues.apache.org/jira/browse/SPARK-17858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-17858. -- Resolution: Duplicate > Provide option for Spark SQL to skip corrupt files > -- > > Key: SPARK-17858 > URL: https://issues.apache.org/jira/browse/SPARK-17858 > Project: Spark > Issue Type: Improvement > Reporter: Shixiong Zhu > Assignee: Shixiong Zhu > Priority: Major > Labels: bulk-closed > > In Spark 2.0, corrupt files fail a SQL query. However, the user may just want to skip corrupt files and still run the query. > Another pain point is that the current exception doesn't contain the paths of the corrupt files, making it hard for users to fix them. It would be better to include the paths in the error message. > Note: In Spark 1.6, Spark SQL always skips corrupt files because of SPARK-17850.
[jira] [Updated] (SPARK-10719) SQLImplicits.rddToDataFrameHolder is not thread safe when using Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-10719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-10719: - Fix Version/s: 2.3.0 > SQLImplicits.rddToDataFrameHolder is not thread safe when using Scala 2.10 > -- > > Key: SPARK-10719 > URL: https://issues.apache.org/jira/browse/SPARK-10719 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1, 1.4.1, 1.5.0, 1.6.0 > Environment: Scala 2.10 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: bulk-closed > Fix For: 2.3.0 > > > Sometimes the following codes failed > {code} > val conf = new SparkConf().setAppName("sql-memory-leak") > val sc = new SparkContext(conf) > val sqlContext = new SQLContext(sc) > import sqlContext.implicits._ > (1 to 1000).par.foreach { _ => > sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count() > } > {code} > The stack trace is > {code} > Exception in thread "main" java.lang.UnsupportedOperationException: tail of > empty list > at scala.collection.immutable.Nil$.tail(List.scala:339) > at scala.collection.immutable.Nil$.tail(List.scala:334) > at scala.reflect.internal.SymbolTable.popPhase(SymbolTable.scala:172) > at > scala.reflect.internal.Symbols$Symbol.unsafeTypeParams(Symbols.scala:1477) > at scala.reflect.internal.Symbols$TypeSymbol.tpe(Symbols.scala:2777) > at scala.reflect.internal.Mirrors$RootsBase.init(Mirrors.scala:235) > at > scala.reflect.runtime.JavaMirrors$class.createMirror(JavaMirrors.scala:34) > at > scala.reflect.runtime.JavaMirrors$class.runtimeMirror(JavaMirrors.scala:61) > at > scala.reflect.runtime.JavaUniverse.runtimeMirror(JavaUniverse.scala:12) > at > scala.reflect.runtime.JavaUniverse.runtimeMirror(JavaUniverse.scala:12) > at SparkApp$$anonfun$main$1.apply$mcJI$sp(SparkApp.scala:16) > at SparkApp$$anonfun$main$1.apply(SparkApp.scala:15) > at SparkApp$$anonfun$main$1.apply(SparkApp.scala:15) > at scala.Function1$class.apply$mcVI$sp(Function1.scala:39) > at > 
scala.runtime.AbstractFunction1.apply$mcVI$sp(AbstractFunction1.scala:12) > at > scala.collection.parallel.immutable.ParRange$ParRangeIterator.foreach(ParRange.scala:91) > at > scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:975) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) > at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56) > at > scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:972) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:172) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:514) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:162) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514) > at > scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > {code} > Finally, I found the problem. The codes generated by Scala compiler to find > the implicit TypeTag are not thread safe because of an issue in Scala 2.10: > https://issues.scala-lang.org/browse/SI-6240 > This issue was fixed in Scala 2.11 but not backported to 2.10. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
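The root cause above is a thread-unsafe reflective initializer (the Scala 2.10 TypeTag lookup) being hit from parallel tasks. The standard mitigation, short of the Scala 2.11 fix, is to serialize and cache the unsafe call. A toy Python sketch, with `unsafe_init` standing in for the reflection call; none of these names are real Spark or Scala APIs:

```python
import threading

_lock = threading.Lock()
_cache = {}


def unsafe_init(key):
    # stand-in for the non-thread-safe Scala 2.10 reflection call
    return f"encoder[{key}]"


def get_encoder(key):
    """Serialize all calls to the unsafe initializer behind one lock,
    caching the result so the cost is paid once per key."""
    with _lock:
        if key not in _cache:
            _cache[key] = unsafe_init(key)
        return _cache[key]
```

Concurrent callers then all observe one consistent initialization instead of racing inside the reflective machinery.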
[jira] [Closed] (SPARK-10719) SQLImplicits.rddToDataFrameHolder is not thread safe when using Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-10719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu closed SPARK-10719. Assignee: Shixiong Zhu We can close this since Scala 2.10 has been dropped in Spark 2.3.0. > SQLImplicits.rddToDataFrameHolder is not thread safe when using Scala 2.10 > -- > > Key: SPARK-10719 > URL: https://issues.apache.org/jira/browse/SPARK-10719 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1, 1.4.1, 1.5.0, 1.6.0 > Environment: Scala 2.10 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: bulk-closed > > Sometimes the following codes failed > {code} > val conf = new SparkConf().setAppName("sql-memory-leak") > val sc = new SparkContext(conf) > val sqlContext = new SQLContext(sc) > import sqlContext.implicits._ > (1 to 1000).par.foreach { _ => > sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count() > } > {code} > The stack trace is > {code} > Exception in thread "main" java.lang.UnsupportedOperationException: tail of > empty list > at scala.collection.immutable.Nil$.tail(List.scala:339) > at scala.collection.immutable.Nil$.tail(List.scala:334) > at scala.reflect.internal.SymbolTable.popPhase(SymbolTable.scala:172) > at > scala.reflect.internal.Symbols$Symbol.unsafeTypeParams(Symbols.scala:1477) > at scala.reflect.internal.Symbols$TypeSymbol.tpe(Symbols.scala:2777) > at scala.reflect.internal.Mirrors$RootsBase.init(Mirrors.scala:235) > at > scala.reflect.runtime.JavaMirrors$class.createMirror(JavaMirrors.scala:34) > at > scala.reflect.runtime.JavaMirrors$class.runtimeMirror(JavaMirrors.scala:61) > at > scala.reflect.runtime.JavaUniverse.runtimeMirror(JavaUniverse.scala:12) > at > scala.reflect.runtime.JavaUniverse.runtimeMirror(JavaUniverse.scala:12) > at SparkApp$$anonfun$main$1.apply$mcJI$sp(SparkApp.scala:16) > at SparkApp$$anonfun$main$1.apply(SparkApp.scala:15) > at SparkApp$$anonfun$main$1.apply(SparkApp.scala:15) > at 
scala.Function1$class.apply$mcVI$sp(Function1.scala:39) > at > scala.runtime.AbstractFunction1.apply$mcVI$sp(AbstractFunction1.scala:12) > at > scala.collection.parallel.immutable.ParRange$ParRangeIterator.foreach(ParRange.scala:91) > at > scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:975) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) > at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56) > at > scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:972) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:172) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:514) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:162) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514) > at > scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > {code} > Finally, I found the problem. The codes generated by Scala compiler to find > the implicit TypeTag are not thread safe because of an issue in Scala 2.10: > https://issues.scala-lang.org/browse/SI-6240 > This issue was fixed in Scala 2.11 but not backported to 2.10. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
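The root cause above (SI-6240) is a race when several threads trigger the compiler-generated implicit TypeTag lookup at once. A minimal sketch of the generic workaround pattern — perform the thread-unsafe initialization exactly once behind a lock and share the result. `TagCache`, and the `String` standing in for the tag, are hypothetical names for illustration, not Spark code:

```scala
// Hypothetical sketch: double-checked, lock-guarded one-time initialization.
// `compute` stands in for the thread-unsafe TypeTag materialization.
object TagCache {
  private val lock = new Object
  @volatile private var cached: Option[String] = None

  def get(compute: () => String): String =
    cached.getOrElse(lock.synchronized {
      // Re-check under the lock so only the first thread runs `compute`.
      if (cached.isEmpty) cached = Some(compute())
      cached.get
    })
}
```

With this pattern the `(1 to 1000).par.foreach { ... }` loop from the report would hit the unsafe path only once, on a single thread.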
[jira] [Created] (SPARK-27753) Support SQL expressions for interval parameter in Structured Streaming
Shixiong Zhu created SPARK-27753: Summary: Support SQL expressions for interval parameter in Structured Streaming Key: SPARK-27753 URL: https://issues.apache.org/jira/browse/SPARK-27753 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 2.4.3 Reporter: Shixiong Zhu Structured Streaming has several methods that accept an interval string. It would be great if we could use the parser to parse it so that we can also support SQL expressions.
[jira] [Updated] (SPARK-27735) Interval string in upper case is not supported in Trigger
[ https://issues.apache.org/jira/browse/SPARK-27735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27735: - Description: Some APIs in Structured Streaming require the user to specify an interval. Right now these APIs don't accept upper-case strings. > Interval string in upper case is not supported in Trigger > - > > Key: SPARK-27735 > URL: https://issues.apache.org/jira/browse/SPARK-27735 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > Some APIs in Structured Streaming require the user to specify an interval. > Right now these APIs don't accept upper-case strings.
[jira] [Created] (SPARK-27735) Interval string in upper case is not supported in Trigger
Shixiong Zhu created SPARK-27735: Summary: Interval string in upper case is not supported in Trigger Key: SPARK-27735 URL: https://issues.apache.org/jira/browse/SPARK-27735 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.3 Reporter: Shixiong Zhu Assignee: Shixiong Zhu
[jira] [Updated] (SPARK-27494) Null keys/values don't work in Kafka source v2
[ https://issues.apache.org/jira/browse/SPARK-27494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27494: - Description: Right now Kafka source v2 doesn't support null keys or values. * When processing a null key, all of the following keys in the same partition will be null. This is a correctness bug. * When processing a null value, it will throw NPE. The workaround is setting sql conf "spark.sql.streaming.disabledV2MicroBatchReaders" to "org.apache.spark.sql.kafka010.KafkaSourceProvider" to use the v1 source. was:Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values. > Null keys/values don't work in Kafka source v2 > -- > > Key: SPARK-27494 > URL: https://issues.apache.org/jira/browse/SPARK-27494 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.1 >Reporter: Shixiong Zhu >Assignee: Genmao Yu >Priority: Major > Labels: correctness > Fix For: 3.0.0, 2.4.3 > > > Right now Kafka source v2 doesn't support null keys or values. > * When processing a null key, all of the following keys in the same > partition will be null. This is a correctness bug. > * When processing a null value, it will throw NPE. > The workaround is setting sql conf > "spark.sql.streaming.disabledV2MicroBatchReaders" to > "org.apache.spark.sql.kafka010.KafkaSourceProvider" to use the v1 source. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
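A null-tolerant conversion can be sketched as follows. `KafkaRecord` and `toRow` here are illustrative stand-ins for the converter's intended semantics, not Spark's actual `KafkaRecordToUnsafeRowConverter` API — the point is that a null key or value maps to an absent field rather than being dereferenced:

```scala
// Hypothetical stand-in for a Kafka record with possibly-null key/value bytes.
final case class KafkaRecord(key: Array[Byte], value: Array[Byte])

// Wrap each side in Option instead of calling methods on a possible null.
def toRow(r: KafkaRecord): (Option[Seq[Byte]], Option[Seq[Byte]]) =
  (Option(r.key).map(_.toSeq), Option(r.value).map(_.toSeq))
```

A converter written this way cannot NPE on a null value, and a null key stays `None` for that record only, instead of poisoning the following keys in the partition.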
[jira] [Updated] (SPARK-27494) Null keys/values don't work in Kafka source v2
[ https://issues.apache.org/jira/browse/SPARK-27494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27494: - Summary: Null keys/values don't work in Kafka source v2 (was: Null values don't work in Kafka source v2) > Null keys/values don't work in Kafka source v2 > -- > > Key: SPARK-27494 > URL: https://issues.apache.org/jira/browse/SPARK-27494 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.1 >Reporter: Shixiong Zhu >Assignee: Genmao Yu >Priority: Major > Labels: correctness > Fix For: 3.0.0, 2.4.3 > > > Right now Kafka source v2 doesn't support null values. The issue is in > org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow > which doesn't handle null values. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27494) Null values don't work in Kafka source v2
[ https://issues.apache.org/jira/browse/SPARK-27494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27494: - Labels: correctness (was: ) > Null values don't work in Kafka source v2 > - > > Key: SPARK-27494 > URL: https://issues.apache.org/jira/browse/SPARK-27494 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.1 >Reporter: Shixiong Zhu >Assignee: Genmao Yu >Priority: Major > Labels: correctness > Fix For: 3.0.0, 2.4.3 > > > Right now Kafka source v2 doesn't support null values. The issue is in > org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow > which doesn't handle null values. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27496) RPC should send back the fatal errors
Shixiong Zhu created SPARK-27496: Summary: RPC should send back the fatal errors Key: SPARK-27496 URL: https://issues.apache.org/jira/browse/SPARK-27496 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.1 Reporter: Shixiong Zhu Assignee: Shixiong Zhu Right now, when a fatal error is thrown from "receiveAndReply", the sender will not be notified. We should try our best to send it back.
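The intended behavior can be sketched like this. `receiveAndReply` below is a simplified, hypothetical stand-in for the RPC endpoint dispatch, not Spark's actual implementation — the key detail is that a fatal error is still reported to the sender before it is rethrown locally:

```scala
import scala.concurrent.Promise
import scala.util.control.NonFatal

// Sketch: complete the reply even for fatal errors, then let them propagate.
def receiveAndReply[T](reply: Promise[T])(body: => T): Unit =
  try reply.success(body)
  catch {
    case NonFatal(e) => reply.failure(e)
    case t: Throwable =>
      reply.failure(t) // notify the remote sender first...
      throw t          // ...then rethrow the fatal error locally
  }
```

Without the fatal branch, the sender's future never completes and the caller waits until timeout.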
[jira] [Commented] (SPARK-27468) "Storage Level" in "RDD Storage Page" is not correct
[ https://issues.apache.org/jira/browse/SPARK-27468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820460#comment-16820460 ] Shixiong Zhu commented on SPARK-27468: -- [~shahid] You need to use "--master local-cluster[2,1,1024]". The local mode has only one BlockManager. > "Storage Level" in "RDD Storage Page" is not correct > > > Key: SPARK-27468 > URL: https://issues.apache.org/jira/browse/SPARK-27468 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1 >Reporter: Shixiong Zhu >Priority: Major > Attachments: Screenshot from 2019-04-17 10-42-55.png > > > I ran the following unit test and checked the UI. > {code} > val conf = new SparkConf() > .setAppName("test") > .setMaster("local-cluster[2,1,1024]") > .set("spark.ui.enabled", "true") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2) > rdd.count() > Thread.sleep(360) > {code} > The storage level is "Memory Deserialized 1x Replicated" in the RDD storage > page. > I tried to debug and found this is because Spark emitted the following two > events: > {code} > event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, > 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 > replicas),56,0)) > event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, > 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 > replicas),56,0)) > {code} > The storage level in the second event will overwrite the first one. "1 > replicas" comes from this line: > https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457 > Maybe AppStatusListener should calculate the replicas from events? > Another fact we may need to think about is when replicas is 2, will two Spark > events arrive in the same order? Currently, two RPCs from different executors > can arrive in any order. 
> Credit goes to [~srfnmnk] who reported this issue originally.
[jira] [Created] (SPARK-27494) Null values don't work in Kafka source v2
Shixiong Zhu created SPARK-27494: Summary: Null values don't work in Kafka source v2 Key: SPARK-27494 URL: https://issues.apache.org/jira/browse/SPARK-27494 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.1 Reporter: Shixiong Zhu Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.
[jira] [Updated] (SPARK-27468) "Storage Level" in "RDD Storage Page" is not correct
[ https://issues.apache.org/jira/browse/SPARK-27468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27468: - Description: I ran the following unit test and checked the UI. {code} val conf = new SparkConf() .setAppName("test") .setMaster("local-cluster[2,1,1024]") .set("spark.ui.enabled", "true") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2) rdd.count() Thread.sleep(360) {code} The storage level is "Memory Deserialized 1x Replicated" in the RDD storage page. I tried to debug and found this is because Spark emitted the following two events: {code} event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 replicas),56,0)) event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 replicas),56,0)) {code} The storage level in the second event will overwrite the first one. "1 replicas" comes from this line: https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457 Maybe AppStatusListener should calculate the replicas from events? Another fact we may need to think about is when replicas is 2, will two Spark events arrive in the same order? Currently, two RPCs from different executors can arrive in any order. Credit goes to [~srfnmnk] who reported this issue originally. was: I ran the following unit test and checked the UI. {code} val conf = new SparkConf() .setAppName("test") .setMaster("local-cluster[2,1,1024]") .set("spark.ui.enabled", "true") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2) rdd.count() Thread.sleep(360) {code} The storage level is "Memory Deserialized 1x Replicated" in the RDD storage page. 
I tried to debug and found this is because Spark emitted the following two events: {code} event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 replicas),56,0)) event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 replicas),56,0)) {code} The storage level in the second event will overwrite the first one. "1 replicas" comes from this line: https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457 Maybe AppStatusListener should calculate the replicas from events? Another fact we may need to think about is when replicas is 2, will two Spark events arrive in the same order? Currently, two RPCs from different executors can arrive in any order. Credit goes to @dani > "Storage Level" in "RDD Storage Page" is not correct > > > Key: SPARK-27468 > URL: https://issues.apache.org/jira/browse/SPARK-27468 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1 >Reporter: Shixiong Zhu >Priority: Major > > I ran the following unit test and checked the UI. > {code} > val conf = new SparkConf() > .setAppName("test") > .setMaster("local-cluster[2,1,1024]") > .set("spark.ui.enabled", "true") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2) > rdd.count() > Thread.sleep(360) > {code} > The storage level is "Memory Deserialized 1x Replicated" in the RDD storage > page. 
> I tried to debug and found this is because Spark emitted the following two > events: > {code} > event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, > 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 > replicas),56,0)) > event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, > 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 > replicas),56,0)) > {code} > The storage level in the second event will overwrite the first one. "1 > replicas" comes from this line: > https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457 > Maybe AppStatusListener should calculate the replicas from events? > Another fact we may need to think about is when replicas is 2, will two Spark > events arrive in the same order? Currently, two RPCs from different executors > can arrive in any order. > Credit goes to [~srfnmnk] who reported this issue originally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) ---
[jira] [Updated] (SPARK-27468) "Storage Level" in "RDD Storage Page" is not correct
[ https://issues.apache.org/jira/browse/SPARK-27468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27468: - Description: I ran the following unit test and checked the UI. {code} val conf = new SparkConf() .setAppName("test") .setMaster("local-cluster[2,1,1024]") .set("spark.ui.enabled", "true") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2) rdd.count() Thread.sleep(360) {code} The storage level is "Memory Deserialized 1x Replicated" in the RDD storage page. I tried to debug and found this is because Spark emitted the following two events: {code} event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 replicas),56,0)) event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 replicas),56,0)) {code} The storage level in the second event will overwrite the first one. "1 replicas" comes from this line: https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457 Maybe AppStatusListener should calculate the replicas from events? Another fact we may need to think about is when replicas is 2, will two Spark events arrive in the same order? Currently, two RPCs from different executors can arrive in any order. Credit goes to @dani was: I ran the following unit test and checked the UI. {code} val conf = new SparkConf() .setAppName("test") .setMaster("local-cluster[2,1,1024]") .set("spark.ui.enabled", "true") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2) rdd.count() Thread.sleep(360) {code} The storage level is "Memory Deserialized 1x Replicated" in the RDD storage page. 
I tried to debug and found this is because Spark emitted the following two events: {code} event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 replicas),56,0)) event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 replicas),56,0)) {code} The storage level in the second event will overwrite the first one. "1 replicas" comes from this line: https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457 Maybe AppStatusListener should calculate the replicas from events? Another fact we may need to think about is when replicas is 2, will two Spark events arrive in the same order? Currently, two RPCs from different executors can arrive in any order. > "Storage Level" in "RDD Storage Page" is not correct > > > Key: SPARK-27468 > URL: https://issues.apache.org/jira/browse/SPARK-27468 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1 >Reporter: Shixiong Zhu >Priority: Major > > I ran the following unit test and checked the UI. > {code} > val conf = new SparkConf() > .setAppName("test") > .setMaster("local-cluster[2,1,1024]") > .set("spark.ui.enabled", "true") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2) > rdd.count() > Thread.sleep(360) > {code} > The storage level is "Memory Deserialized 1x Replicated" in the RDD storage > page. 
> I tried to debug and found this is because Spark emitted the following two > events: > {code} > event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, > 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 > replicas),56,0)) > event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, > 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 > replicas),56,0)) > {code} > The storage level in the second event will overwrite the first one. "1 > replicas" comes from this line: > https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457 > Maybe AppStatusListener should calculate the replicas from events? > Another fact we may need to think about is when replicas is 2, will two Spark > events arrive in the same order? Currently, two RPCs from different executors > can arrive in any order. > Credit goes to @dani -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org F
[jira] [Created] (SPARK-27468) "Storage Level" in "RDD Storage Page" is not correct
Shixiong Zhu created SPARK-27468: Summary: "Storage Level" in "RDD Storage Page" is not correct Key: SPARK-27468 URL: https://issues.apache.org/jira/browse/SPARK-27468 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.1 Reporter: Shixiong Zhu I ran the following unit test and checked the UI. {code} val conf = new SparkConf() .setAppName("test") .setMaster("local-cluster[2,1,1024]") .set("spark.ui.enabled", "true") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2) rdd.count() Thread.sleep(360) {code} The storage level is "Memory Deserialized 1x Replicated" in the RDD storage page. I tried to debug and found this is because Spark emitted the following two events: {code} event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 replicas),56,0)) event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 replicas),56,0)) {code} The storage level in the second event will overwrite the first one. "1 replicas" comes from this line: https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457 Maybe AppStatusListener should calculate the replicas from events? Another fact we may need to think about is when replicas is 2, will two Spark events arrive in the same order? Currently, two RPCs from different executors can arrive in any order. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
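The suggestion above — having the listener derive the replica count from events instead of trusting the last event's storage level — might look roughly like this. `BlockUpdate` and `effectiveReplicas` are hypothetical simplifications of the listener state, not the actual `AppStatusListener` code:

```scala
// Hypothetical simplified event: which executor reports which block.
final case class BlockUpdate(execId: String, blockId: String, replicas: Int)

// Count distinct block managers holding the block, ignoring the per-event
// `replicas` field that the last event would otherwise overwrite.
def effectiveReplicas(events: Seq[BlockUpdate], blockId: String): Int =
  events.filter(_.blockId == blockId).map(_.execId).distinct.size
```

With the two events from the description, this yields 2 replicas regardless of event order, which also sidesteps the ordering question raised at the end of the report.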
[jira] [Updated] (SPARK-27394) The staleness of UI may last minutes or hours when no tasks start or finish
[ https://issues.apache.org/jira/browse/SPARK-27394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27394: - Fix Version/s: 2.4.2 > The staleness of UI may last minutes or hours when no tasks start or finish > --- > > Key: SPARK-27394 > URL: https://issues.apache.org/jira/browse/SPARK-27394 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.2, 3.0.0 > > > Run the following codes on a cluster that has at least 2 cores. > {code} > sc.makeRDD(1 to 1000, 1000).foreach { i => > Thread.sleep(30) > } > {code} > The jobs page will just show one running task. > This is because when the second task event calls > "AppStatusListener.maybeUpdate" for a job, it will just ignore since the gap > between two events is smaller than `spark.ui.liveUpdate.period`. > After the second task event, in the above case, because there won't be any > other task events, the Spark UI will be always stale until the next task > event gets fired (after 300 seconds). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27419) When setting spark.executor.heartbeatInterval to a value less than 1 second, it will always fail
[ https://issues.apache.org/jira/browse/SPARK-27419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-27419. -- Resolution: Fixed Fix Version/s: 2.4.2 > When setting spark.executor.heartbeatInterval to a value less than 1 seconds, > it will always fail > - > > Key: SPARK-27419 > URL: https://issues.apache.org/jira/browse/SPARK-27419 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.2 > > > When setting spark.executor.heartbeatInterval to a value less than 1 seconds > in branch-2.4, it will always fail because the value will be converted to 0 > and the heartbeat will always timeout and finally kill the executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27419) When setting spark.executor.heartbeatInterval to a value less than 1 seconds, it will always fail
Shixiong Zhu created SPARK-27419: Summary: When setting spark.executor.heartbeatInterval to a value less than 1 seconds, it will always fail Key: SPARK-27419 URL: https://issues.apache.org/jira/browse/SPARK-27419 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.1, 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu When setting spark.executor.heartbeatInterval to a value less than 1 seconds in branch-2.4, it will always fail because the value will be converted to 0 and the heartbeat will always timeout and finally kill the executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
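The arithmetic behind this bug is easy to reproduce. This sketch only demonstrates the truncation, assuming a `TimeUnit`-style whole-seconds conversion of the configured interval (the exact conversion site in branch-2.4 is not shown here):

```scala
import java.util.concurrent.TimeUnit

// Converting a sub-second interval to whole seconds truncates to 0,
// so a 0-length heartbeat interval makes every heartbeat look overdue.
def intervalToSeconds(intervalMs: Long): Long =
  TimeUnit.MILLISECONDS.toSeconds(intervalMs)
```

For example, a 10000 ms interval converts to 10 seconds, while an 800 ms interval converts to 0, matching the "converted to 0" behavior described above.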
[jira] [Commented] (SPARK-27348) HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-27348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812657#comment-16812657 ] Shixiong Zhu commented on SPARK-27348: -- [~sandeep.katta2007] I cannot reproduce this locally. Ideally, when we decide to remove an executor, we should remove it from all places rather than counting on a TCP disconnect event which may not happen sometimes. > HeartbeatReceiver doesn't remove lost executors from > CoarseGrainedSchedulerBackend > -- > > Key: SPARK-27348 > URL: https://issues.apache.org/jira/browse/SPARK-27348 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Priority: Major > > When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost > executors from CoarseGrainedSchedulerBackend. When a connection of an > executor is not gracefully shut down, CoarseGrainedSchedulerBackend may not > receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still > thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask > TaskScheduler to run tasks on this lost executor. This task will never finish > and the job will hang forever. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27394) The staleness of UI may last minutes or hours when no tasks start or finish
Shixiong Zhu created SPARK-27394: Summary: The staleness of UI may last minutes or hours when no tasks start or finish Key: SPARK-27394 URL: https://issues.apache.org/jira/browse/SPARK-27394 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.4.1, 2.4.0 Reporter: Shixiong Zhu Run the following code on a cluster that has at least 2 cores. {code} sc.makeRDD(1 to 1000, 1000).foreach { i => Thread.sleep(30) } {code} The jobs page will just show one running task. This is because when the second task event calls "AppStatusListener.maybeUpdate" for a job, the update is simply ignored because the gap between two events is smaller than `spark.ui.liveUpdate.period`. After the second task event, in the above case, because there won't be any other task events, the Spark UI will always be stale until the next task event gets fired (after 300 seconds).
[jira] [Assigned] (SPARK-27394) The staleness of UI may last minutes or hours when no tasks start or finish
[ https://issues.apache.org/jira/browse/SPARK-27394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-27394: Assignee: Shixiong Zhu > The staleness of UI may last minutes or hours when no tasks start or finish > --- > > Key: SPARK-27394 > URL: https://issues.apache.org/jira/browse/SPARK-27394 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > Run the following codes on a cluster that has at least 2 cores. > {code} > sc.makeRDD(1 to 1000, 1000).foreach { i => > Thread.sleep(30) > } > {code} > The jobs page will just show one running task. > This is because when the second task event calls > "AppStatusListener.maybeUpdate" for a job, it will just ignore since the gap > between two events is smaller than `spark.ui.liveUpdate.period`. > After the second task event, in the above case, because there won't be any > other task events, the Spark UI will be always stale until the next task > event gets fired (after 300 seconds). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27348) HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-27348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27348: - Description: When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost executors from CoarseGrainedSchedulerBackend. When a connection is not gracefully shut down, CoarseGrainedSchedulerBackend may not receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask TaskScheduler to run tasks on this lost executor. This task will never finish and the job will hang forever. (was: When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost executors from CoarseGrainedSchedulerBackend. When a connection is gracefully shut down, CoarseGrainedSchedulerBackend will not receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask TaskScheduler to run tasks on this lost executor. This task will never finish and the job will hang forever.) > HeartbeatReceiver doesn't remove lost executors from > CoarseGrainedSchedulerBackend > -- > > Key: SPARK-27348 > URL: https://issues.apache.org/jira/browse/SPARK-27348 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Priority: Major > > When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost > executors from CoarseGrainedSchedulerBackend. When a connection is not > gracefully shut down, CoarseGrainedSchedulerBackend may not receive a > disconnect event. In this case, CoarseGrainedSchedulerBackend still thinks a > lost executor is still alive. CoarseGrainedSchedulerBackend may ask > TaskScheduler to run tasks on this lost executor. This task will never finish > and the job will hang forever. 
[jira] [Updated] (SPARK-27348) HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-27348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27348: - Description: When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost executors from CoarseGrainedSchedulerBackend. When a connection of an executor is not gracefully shut down, CoarseGrainedSchedulerBackend may not receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask TaskScheduler to run tasks on this lost executor. This task will never finish and the job will hang forever. (was: When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost executors from CoarseGrainedSchedulerBackend. When a connection is not gracefully shut down, CoarseGrainedSchedulerBackend may not receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask TaskScheduler to run tasks on this lost executor. This task will never finish and the job will hang forever.) > HeartbeatReceiver doesn't remove lost executors from > CoarseGrainedSchedulerBackend > -- > > Key: SPARK-27348 > URL: https://issues.apache.org/jira/browse/SPARK-27348 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Priority: Major > > When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost > executors from CoarseGrainedSchedulerBackend. When a connection of an > executor is not gracefully shut down, CoarseGrainedSchedulerBackend may not > receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still > thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask > TaskScheduler to run tasks on this lost executor. This task will never finish > and the job will hang forever. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27348) HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend
Shixiong Zhu created SPARK-27348: Summary: HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend Key: SPARK-27348 URL: https://issues.apache.org/jira/browse/SPARK-27348 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0 Reporter: Shixiong Zhu When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost executors from CoarseGrainedSchedulerBackend. When a connection is not gracefully shut down, CoarseGrainedSchedulerBackend may not receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still considers the lost executor alive and may ask TaskScheduler to run tasks on it. Such a task will never finish and the job will hang forever.
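The failure mode above can be sketched with a small, self-contained simulation. This is plain Python with invented names (`SchedulerBackend`, `HeartbeatMonitor`), not Spark's actual Scala classes: on heartbeat timeout, the monitor must also tell the scheduler backend to forget the executor, otherwise the backend keeps treating the dead executor as schedulable.

```python
class SchedulerBackend:
    """Toy stand-in for CoarseGrainedSchedulerBackend: tracks live executors."""
    def __init__(self):
        self.executors = set()

    def register(self, executor_id):
        self.executors.add(executor_id)

    def remove_executor(self, executor_id, reason):
        # Without this call on heartbeat timeout, a dead executor stays
        # schedulable and any task assigned to it never finishes.
        self.executors.discard(executor_id)

class HeartbeatMonitor:
    """Toy stand-in for HeartbeatReceiver."""
    def __init__(self, backend, timeout_s):
        self.backend = backend
        self.timeout_s = timeout_s
        self.last_seen = {}

    def heartbeat(self, executor_id, now):
        self.last_seen[executor_id] = now

    def expire_dead_executors(self, now):
        for executor_id, seen in list(self.last_seen.items()):
            if now - seen > self.timeout_s:
                del self.last_seen[executor_id]
                # The point of the fix: propagate the loss to the backend
                # so no new tasks are scheduled on the dead executor.
                self.backend.remove_executor(executor_id, "heartbeat timeout")

backend = SchedulerBackend()
monitor = HeartbeatMonitor(backend, timeout_s=10)
backend.register("exec-1")
monitor.heartbeat("exec-1", now=0)
monitor.expire_dead_executors(now=60)
print(sorted(backend.executors))  # → []
```

Dropping the `remove_executor` call reproduces the bug described above: the executor is gone from the heartbeat table but still present in `backend.executors`.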
[jira] [Updated] (SPARK-27275) Potential corruption in EncryptedMessage.transferTo
[ https://issues.apache.org/jira/browse/SPARK-27275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27275: - Labels: correctness (was: ) > Potential corruption in EncryptedMessage.transferTo > --- > > Key: SPARK-27275 > URL: https://issues.apache.org/jira/browse/SPARK-27275 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: correctness > > `EncryptedMessage.transferTo` has a potential corruption issue. When the > underlying buffer has more than `1024 * 32` bytes (this should be rare, but it > can happen with large error messages sent over the wire), it may send only a > partial message because `EncryptedMessage.count` becomes less than > `transferred`. This causes the client to hang forever (or time out) while > waiting for the expected number of bytes, or leads to strange errors (such as > corruption or silent correctness issues) if the channel is reused by other > messages.
[jira] [Created] (SPARK-27275) Potential corruption in EncryptedMessage.transferTo
Shixiong Zhu created SPARK-27275: Summary: Potential corruption in EncryptedMessage.transferTo Key: SPARK-27275 URL: https://issues.apache.org/jira/browse/SPARK-27275 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu `EncryptedMessage.transferTo` has a potential corruption issue. When the underlying buffer has more than `1024 * 32` bytes (this should be rare, but it can happen with large error messages sent over the wire), it may send only a partial message because `EncryptedMessage.count` becomes less than `transferred`. This causes the client to hang forever (or time out) while waiting for the expected number of bytes, or leads to strange errors (such as corruption or silent correctness issues) if the channel is reused by other messages.
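The hazard described above can be illustrated with a hypothetical chunked-write loop (plain Python with invented names; Spark's real implementation is Scala/Java over Netty channels): when a message is larger than 32 KiB it must be written in chunks, and the loop must continue until the full `count` has been transferred. Treating a partial chunk write as completion is exactly the kind of bug that leaves the receiver waiting for bytes that never arrive.

```python
import io

CHUNK = 32 * 1024  # 32 KiB, matching the 1024 * 32 threshold mentioned above

def transfer_to(source: bytes, channel, count: int) -> int:
    """Write `count` bytes of `source` to `channel` in 32 KiB chunks and
    return the number of bytes actually transferred. The loop must run
    until `transferred == count`; returning early after one chunk would
    send a partial message while the receiver still expects `count` bytes."""
    transferred = 0
    while transferred < count:
        chunk = source[transferred:transferred + CHUNK]
        written = channel.write(chunk)  # a real channel may write partially
        if written == 0:
            break  # channel refused more data; caller must retry later
        transferred += written
    return transferred

payload = bytes(100_000)   # > 32 KiB, like a large error message on the wire
channel = io.BytesIO()
sent = transfer_to(payload, channel, len(payload))
print(sent)  # → 100000
```

A receiver framed on message length only unblocks once all 100,000 bytes arrive, which is why a short count on the sender side manifests as a hang or timeout rather than an immediate error.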
[jira] [Resolved] (SPARK-27210) Cleanup incomplete output files in ManifestFileCommitProtocol if task is aborted
[ https://issues.apache.org/jira/browse/SPARK-27210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-27210. -- Resolution: Fixed Assignee: Jungtaek Lim Fix Version/s: 3.0.0 > Cleanup incomplete output files in ManifestFileCommitProtocol if task is > aborted > > > Key: SPARK-27210 > URL: https://issues.apache.org/jira/browse/SPARK-27210 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Minor > Fix For: 3.0.0 > > > Unlike HadoopMapReduceCommitProtocol, ManifestFileCommitProtocol doesn't > clean up incomplete output files in either case: when a task is aborted or > when the job is aborted. > HadoopMapReduceCommitProtocol writes intermediate files to a staging > directory, so once a job is aborted it can simply delete the staging > directory to clean up everything. It even puts extra effort into cleaning up > intermediate files on the task side when a task is aborted. > ManifestFileCommitProtocol does no cleanup; it only maintains metadata > listing which completed output files were written. It would be better if > ManifestFileCommitProtocol made a best effort to clean up: it may not be able > to do job-level cleanup since it doesn't use a staging directory, but it can > clearly still make a best effort at task-level cleanup.
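The best-effort task-level cleanup suggested above can be sketched as follows. This is an illustrative Python toy with invented names, not Spark's actual `ManifestFileCommitProtocol` API: the protocol records every file a task opens so that aborting the task can delete its incomplete output, even without a staging directory to drop wholesale.

```python
import os
import tempfile

class ManifestLikeCommitProtocol:
    """Toy commit protocol: tracks the files a task writes so abortTask
    can delete incomplete output (best-effort task-level cleanup)."""
    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.task_files = []

    def new_task_file(self, name):
        # Record the path before handing it out, so we know what to
        # delete if the task later aborts.
        path = os.path.join(self.output_dir, name)
        self.task_files.append(path)
        return path

    def commit_task(self):
        # On success, the paths would be recorded in the manifest.
        committed = list(self.task_files)
        self.task_files.clear()
        return committed

    def abort_task(self):
        # Best-effort cleanup: remove whatever this task managed to write.
        for path in self.task_files:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # never created or already gone; nothing to do
        self.task_files.clear()

out = tempfile.mkdtemp()
protocol = ManifestLikeCommitProtocol(out)
path = protocol.new_task_file("part-00000")
with open(path, "w") as f:
    f.write("partial output")
protocol.abort_task()
print(os.path.exists(path))  # → False
```

Job-level cleanup is harder for exactly the reason the issue states: without a staging directory there is no single location to delete, so per-task bookkeeping is the practical fallback.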
[jira] [Updated] (SPARK-27221) Improve the assert error message in TreeNode.parseToJson
[ https://issues.apache.org/jira/browse/SPARK-27221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27221: - Summary: Improve the assert error message in TreeNode.parseToJson (was: Improve the assert error message in TreeNode) > Improve the assert error message in TreeNode.parseToJson > > > Key: SPARK-27221 > URL: https://issues.apache.org/jira/browse/SPARK-27221 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > > TreeNode.parseToJson may throw an assertion error without any error message > when a TreeNode is not implemented properly, which makes it hard to find the > bad TreeNode implementation. > It would be better to improve the error message to indicate the type of the > TreeNode.
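The proposed improvement amounts to a few lines; here is an illustrative sketch in Python (not Spark's Scala `TreeNode` code; the function name, supported types, and message text are invented): including the node's type in the assertion message turns a bare assertion failure into one that names the offending implementation.

```python
import json

def to_json(node):
    """Serialize a node to JSON, failing with a message that names the
    node's type instead of a bare, message-less assertion error."""
    supported = (dict, list, str, int)
    assert isinstance(node, supported), (
        f"Cannot convert node of type {type(node).__name__} to JSON"
    )
    return json.dumps(node)

try:
    to_json(object())
except AssertionError as e:
    print(e)  # → Cannot convert node of type object to JSON
```

With the bare form `assert isinstance(node, supported)` the traceback shows only `AssertionError`, which is why the issue asks for the type to be included.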