[jira] [Updated] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3
[ https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Pedamallu updated SPARK-34107:
---------------------------------------
    Attachment: SHS_Profiling_Sorted.csv

> Spark History not loading when service has to load 300k applications initially from S3
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-34107
>                 URL: https://issues.apache.org/jira/browse/SPARK-34107
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Shashank Pedamallu
>            Priority: Major
>         Attachments: SHS_Profiling_Sorted.csv, blank_shs.png
>
> The Spark History Server does not finish loading when it has to load 300k+ applications from S3 on startup. Details and snapshots follow.
> Number of files in `spark.history.fs.logDirectory` (using xxx for anonymity):
> {noformat}
> spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
> | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
> 305571
> spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
> {noformat}
> Logs when starting the Spark History Server:
> {noformat}
> root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private# /go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh --properties-file /etc/spark-history-config/shs-default.properties
> 2021/01/14 02:40:28 Spark spark wrapper is disabled
> 2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0
> 2021/01/14 02:40:28 Statsd disabled
> 2021/01/14 02:40:28 Debug log: /tmp/.log
> 2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 Mozart 0
> 2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with arguments [org.apache.spark.deploy.history.HistoryServer --properties-file /etc/spark-history-config/shs-default.properties]
> 21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 2077@shs-with-statsd-86d7f54679-t8fqr
> 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM
> 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP
> 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT
> 21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root
> 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root
> 21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to:
> 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to:
> 21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
> 21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
> 21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system started
> 21/01/14 02:40:31 INFO log: Logging initialized @1933ms to org.sparkproject.jetty.util.log.Slf4jLog
> 21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 1.8.0_242-b08
> 21/01/14 02:40:31 INFO Server: Started @1999ms
> 21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f {HTTP/1.1,[http/1.1]} {0.0.0.0:18080}
> 21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080.
> 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@b9dfc5a {/,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1bbae752 {/json,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5cf87cfd {/api,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@74971ed9 {/static,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1542af63 {/history,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://shs-with-statsd-86d7f54679-t8fqr:18080
> 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 seconds
> 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning
> {noformat}
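Not part of the original report: for history servers that must replay very large event-log directories on startup, the settings below are the FsHistoryProvider knobs usually tuned. This is a sketch only; the values and the local store path are illustrative assumptions to adapt per deployment, and the log-directory URI is a placeholder.

```properties
# Illustrative values only; adjust for your environment.
spark.history.fs.logDirectory=s3a://<bucket>/spark-history-fs-logDirectory
# More threads for replaying event logs during the initial scan
spark.history.fs.numReplayThreads=32
# Cache parsed application data on local disk so restarts do not re-replay everything
spark.history.store.path=/var/lib/spark-history/store
# How often to check the log directory for new or updated event logs
spark.history.fs.update.interval=30s
# Expire old applications so the listing stays bounded
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.maxAge=30d
```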
[jira] [Commented] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3
[ https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264672#comment-17264672 ]

Shashank Pedamallu commented on SPARK-34107:
--------------------------------------------

Screenshot of the Spark History UI: !blank_shs.png!

Also, please find attached the dynamic tracing analysis (collected with [btrace|https://github.com/btraceio/btrace]): [^SHS_Profiling_Sorted.csv]
[jira] [Updated] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3
[ https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Pedamallu updated SPARK-34107:
---------------------------------------
    Attachment: blank_shs.png
[jira] [Assigned] (SPARK-34096) Improve performance for nth_value ignore nulls over offset window
[ https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34096:
------------------------------------
    Assignee: Apache Spark

> Improve performance for nth_value ignore nulls over offset window
> -----------------------------------------------------------------
>
>                 Key: SPARK-34096
>                 URL: https://issues.apache.org/jira/browse/SPARK-34096
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jiaan.geng
>            Assignee: Apache Spark
>            Priority: Major
>
> The current {code:java}UnboundedOffsetWindowFunctionFrame{code} and {code:java}UnboundedPrecedingOffsetWindowFunctionFrame{code} only support nth_value in respect-nulls mode, so nth_value with ignore nulls has to execute {code:java}updateExpressions{code} multiple times.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34096) Improve performance for nth_value ignore nulls over offset window
[ https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34096:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-34096) Improve performance for nth_value ignore nulls over offset window
[ https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264649#comment-17264649 ]

Apache Spark commented on SPARK-34096:
--------------------------------------

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/31178
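To make the ticket concrete: when nth_value must ignore nulls, the frame can no longer read a single fixed offset, because the position of the n-th non-null row depends on the data. The sketch below is a plain-Python model of the semantics only (an assumption-laden illustration, not Spark's UnboundedOffsetWindowFunctionFrame implementation):

```python
def nth_value(values, n, ignore_nulls=False):
    """Return the n-th (1-based) value of a window frame, or None.

    With ignore_nulls=True, None entries are skipped before counting,
    which is why a naive frame implementation must rescan rows instead
    of reading one fixed offset.
    """
    if ignore_nulls:
        values = [v for v in values if v is not None]
    return values[n - 1] if len(values) >= n else None

frame = [None, "a", None, "b", "c"]
print(nth_value(frame, 2))                     # respect nulls: "a"
print(nth_value(frame, 2, ignore_nulls=True))  # ignore nulls: "b"
```

With respect-nulls semantics the answer is always `frame[n-1]`, so it can be computed once; with ignore-nulls semantics the scan depends on where the nulls fall, which is the repeated work the ticket aims to avoid.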
[jira] [Commented] (SPARK-34110) Upgrade ZooKeeper to 3.6.2
[ https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264639#comment-17264639 ]

Apache Spark commented on SPARK-34110:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31177

> Upgrade ZooKeeper to 3.6.2
> --------------------------
>
>                 Key: SPARK-34110
>                 URL: https://issues.apache.org/jira/browse/SPARK-34110
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> When running Spark on JDK 14:
> {noformat}
> 21/01/13 20:25:32,533 WARN [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] zookeeper.ClientCnxn:1164 : Session 0x0 for server apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not resolvable
>         at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
>         at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
>         at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> {noformat}
> Please see ZOOKEEPER-3779 for more details.
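The "Unable to canonicalize address ... because it's not resolvable" failure comes from ZooKeeper's SaslServerPrincipal resolving the quorum host before connecting (the behavior ZOOKEEPER-3779 addresses). As a rough, hedged analogue of that resolution step in plain Python (not ZooKeeper's Java code; the function name is invented for illustration):

```python
import socket

def canonicalize(host):
    """Illustrative analogue of the failing step: resolve a quorum host
    and return its canonical name, or raise if it is not resolvable."""
    try:
        info = socket.getaddrinfo(host, None, flags=socket.AI_CANONNAME)
    except socket.gaierror as e:
        # Mirrors the "Unable to canonicalize address" error above
        raise ValueError(f"Unable to canonicalize address {host}: {e}")
    # getaddrinfo puts the canonical name in the first result tuple
    return info[0][3] or host

print(canonicalize("localhost"))
```

If the hostname cannot be resolved at connect time, the client fails the connection attempt instead of retrying with the unresolved name, which is what surfaces as the reconnect loop in the log.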
[jira] [Assigned] (SPARK-34110) Upgrade ZooKeeper to 3.6.2
[ https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34110:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-34110) Upgrade ZooKeeper to 3.6.2
[ https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34110:
------------------------------------
    Assignee: Apache Spark
[jira] [Updated] (SPARK-34096) Improve performance for nth_value ignore nulls over offset window
[ https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-34096:
-------------------------------
    Summary: Improve performance for nth_value ignore nulls over offset window (was: Improve performance for nth_value ignore nulls with offset window)
[jira] [Updated] (SPARK-34096) Improve performance for nth_value ignore nulls with offset window
[ https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-34096:
-------------------------------
    Summary: Improve performance for nth_value ignore nulls with offset window (was: Improve performance for nth_value ignore nulls)
[jira] [Created] (SPARK-34112) Upgrade ORC
Dongjoon Hyun created SPARK-34112:
----------------------------------

             Summary: Upgrade ORC
                 Key: SPARK-34112
                 URL: https://issues.apache.org/jira/browse/SPARK-34112
             Project: Spark
          Issue Type: Sub-task
          Components: Build
    Affects Versions: 3.2.0
            Reporter: Dongjoon Hyun

Apache ORC doesn't support Java 14 yet. We need to upgrade it when it's ready.
[jira] [Commented] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
[ https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264631#comment-17264631 ]

Hyukjin Kwon commented on SPARK-34111:
--------------------------------------

I just marked it as a blocker because the duplicated jars might cause an issue that's hard to debug. However, I am fine with lowering the priority, [~dongjoon]. I will leave it to you.

> Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-34111
>                 URL: https://issues.apache.org/jira/browse/SPARK-34111
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Blocker
>
> After SPARK-33705, we now happen to have two jars in the release artifact with Hadoop 3, in {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:
> {code}
> ...
> jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
> ...
> javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
> ...
> {code}
> This can potentially cause an issue, and we should remove {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN tests.
[jira] [Commented] (SPARK-34110) Upgrade ZooKeeper to 3.6.2
[ https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264630#comment-17264630 ]

Dongjoon Hyun commented on SPARK-34110:
---------------------------------------

Thank you, [~yumwang]!
[jira] [Commented] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
[ https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264629#comment-17264629 ]

Kent Yao commented on SPARK-34111:
----------------------------------

Thanks [~hyukjin.kwon] for pinging me.
[jira] [Commented] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
[ https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264628#comment-17264628 ]

Dongjoon Hyun commented on SPARK-34111:
---------------------------------------

Oh, is this a blocker?
[jira] [Commented] (SPARK-23431) Expose the new executor memory metrics at the stage level
[ https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264626#comment-17264626 ]

Dongjoon Hyun commented on SPARK-23431:
---------------------------------------

Thank you!

> Expose the new executor memory metrics at the stage level
> ---------------------------------------------------------
>
>                 Key: SPARK-23431
>                 URL: https://issues.apache.org/jira/browse/SPARK-23431
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Edward Lu
>            Assignee: Terry Kim
>            Priority: Major
>             Fix For: 3.1.0
>
> Collect and show the new executor memory metrics for each stage, to provide more information on how memory is used per stage.
> Modify the AppStatusListener to track the peak values for JVM used memory, execution memory, storage memory, and unified memory for each executor for each stage.
> This is a subtask for SPARK-23206. Please refer to the design doc for that ticket for more details.
[jira] [Commented] (SPARK-34110) Upgrade ZooKeeper to 3.6.2
[ https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264621#comment-17264621 ] Yuming Wang commented on SPARK-34110: - Another issue is: {noformat} 21/01/13 22:49:56,890 ERROR [Driver] server.HiveServer2:186 : Unable to create a znode for this server instance java.lang.Exception: Max znode creation wait time: 120s exhausted at org.apache.hive.service.server.HiveServer2.addServerInstanceToZooKeeper(HiveServer2.java:183) at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:128) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:230) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:159) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732) {noformat} > Upgrade ZooKeeper to 3.6.2 > -- > > Key: SPARK-34110 > URL: https://issues.apache.org/jira/browse/SPARK-34110 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > When running Spark on JDK 14: > {noformat} > 21/01/13 20:25:32,533 WARN > [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] > zookeeper.ClientCnxn:1164 : Session 0x0 for server > apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing > socket connection and attempting reconnect > java.lang.IllegalArgumentException: Unable to canonicalize address > carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not > resolvable > at > 
org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65) > at > org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) > {noformat} > Please see ZOOKEEPER-3779 for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
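For background on the stack trace above: before building the SASL principal, the ZooKeeper client canonicalizes the server hostname, and an address it cannot resolve aborts the connection attempt (the referenced ZOOKEEPER-3779 changes this canonicalization path). A rough Python sketch of the failing check — the function names and the injected resolver are illustrative, not ZooKeeper's actual API:

```python
# Sketch of ZooKeeper-style hostname canonicalization before SASL principal
# construction (hypothetical names; the real logic is in
# org.apache.zookeeper.SaslServerPrincipal). The resolver is a plain dict so
# the example needs no network access.

def canonicalize(host, resolver):
    """Return the canonical hostname, mimicking the failing code path."""
    if host not in resolver:
        # Mirrors: IllegalArgumentException: Unable to canonicalize address ...
        raise ValueError(
            f"Unable to canonicalize address {host} because it's not resolvable")
    return resolver[host]

def server_principal(service, host, resolver):
    """Build a service/host principal from the canonicalized name."""
    return f"{service}/{canonicalize(host, resolver)}"

resolver = {"zk-1.example.com": "zk-1.internal.example.com"}
print(server_principal("zookeeper", "zk-1.example.com", resolver))
# An unknown host raises, matching the stack trace above:
try:
    server_principal("zookeeper", "zk-missing.example.com", resolver)
except ValueError as e:
    print(e)
```

The upgrade to ZooKeeper 3.6.2 picks up the upstream handling of this path rather than working around it on the Spark side.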
[jira] [Commented] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
[ https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264620#comment-17264620 ] Hyukjin Kwon commented on SPARK-34111: -- cc [~Qin Yao] FYI > Deconflict the jars jakarta.servlet-api-4.0.3.jar and > javax.servlet-api-3.1.0.jar > - > > Key: SPARK-34111 > URL: https://issues.apache.org/jira/browse/SPARK-34111 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Blocker > > After SPARK-33705, we now happened to have two jars in the release artifact > with Hadoop 3: > {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}: > {code} > ... > jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar > ... > javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar > ... > {code} > It can potentially cause an issue, and we should better remove > {{javax.servlet-api-3.1.0.jar}} which is apparently only required for YARN > tests.
[jira] [Updated] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
[ https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34111: - Description: After SPARK-33705, we now happened to have two jars in the release artifact with Hadoop 3: {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}: {code} ... jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar ... javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar ... {code} It can potentially cause an issue, and we should better remove {{javax.servlet-api-3.1.0.jar}} which is apparently only required for YARN tests. was: After SPARK-33705, we now happened to have two jars in the release artifact with Hadoop 3: {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}: {code} ... jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar ... javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar ... {code} It can potentially cause an issue, and we should better remove {{javax.servlet-api-3.1.0.jar }} which is apparently only required for YARN tests. > Deconflict the jars jakarta.servlet-api-4.0.3.jar and > javax.servlet-api-3.1.0.jar > - > > Key: SPARK-34111 > URL: https://issues.apache.org/jira/browse/SPARK-34111 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Blocker > > After SPARK-33705, we now happened to have two jars in the release artifact > with Hadoop 3: > {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}: > {code} > ... > jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar > ... > javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar > ... > {code} > It can potentially cause an issue, and we should better remove > {{javax.servlet-api-3.1.0.jar}} which is apparently only required for YARN > tests.
[jira] [Created] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
Hyukjin Kwon created SPARK-34111: Summary: Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar Key: SPARK-34111 URL: https://issues.apache.org/jira/browse/SPARK-34111 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.1.0 Reporter: Hyukjin Kwon After SPARK-33705, we now happened to have two jars in the release artifact with Hadoop 3: {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}: {code} ... jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar ... javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar ... {code} It can potentially cause an issue, and we should better remove {{javax.servlet-api-3.1.0.jar }} which is apparently only required for YARN tests.
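Such a duplicate is easy to spot mechanically: both artifacts provide the servlet API under different coordinates. A small Python sketch (the regex and helper are illustrative, not part of Spark's build tooling) that flags a deps listing carrying more than one servlet-api jar:

```python
import re

# Flag dependency manifests that carry more than one servlet-api jar, e.g.
# jakarta.servlet-api-4.0.3.jar alongside javax.servlet-api-3.1.0.jar as in
# dev/deps/spark-deps-hadoop-3.2-hive-2.3.
SERVLET_API = re.compile(r"^(jakarta|javax)\.servlet-api-[\d.]+\.jar$")

def servlet_api_conflicts(jar_names):
    """Return the conflicting servlet-api jars, or [] if at most one is present."""
    hits = sorted(j for j in jar_names if SERVLET_API.match(j))
    return hits if len(hits) > 1 else []

deps = [
    "jakarta.servlet-api-4.0.3.jar",
    "javax.servlet-api-3.1.0.jar",
    "jackson-core-2.10.0.jar",
]
print(servlet_api_conflicts(deps))
# → ['jakarta.servlet-api-4.0.3.jar', 'javax.servlet-api-3.1.0.jar']
```

A check of this shape could run against the `dev/deps` manifests to catch the regression before release.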
[jira] [Updated] (SPARK-34110) Upgrade ZooKeeper to 3.6.2
[ https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-34110: Description: When running Spark on JDK 14: {noformat} 21/01/13 20:25:32,533 WARN [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] zookeeper.ClientCnxn:1164 : Session 0x0 for server apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing socket connection and attempting reconnect java.lang.IllegalArgumentException: Unable to canonicalize address carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not resolvable at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65) at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) {noformat} Please see ZOOKEEPER-3779 for more details. was: When running Spark on JDK 14: {noformat} 21/01/13 20:25:32,533 WARN [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] zookeeper.ClientCnxn:1164 : Session 0x0 for server apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing socket connection and attempting reconnect java.lang.IllegalArgumentException: Unable to canonicalize address carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not resolvable at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65) at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) {noformat} > Upgrade ZooKeeper to 3.6.2 > -- > > Key: SPARK-34110 > URL: https://issues.apache.org/jira/browse/SPARK-34110 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > 
When running Spark on JDK 14: > {noformat} > 21/01/13 20:25:32,533 WARN > [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] > zookeeper.ClientCnxn:1164 : Session 0x0 for server > apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing > socket connection and attempting reconnect > java.lang.IllegalArgumentException: Unable to canonicalize address > carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not > resolvable > at > org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65) > at > org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) > {noformat} > Please see ZOOKEEPER-3779 for more details.
[jira] [Resolved] (SPARK-34106) Hide FValueTest and AnovaTest
[ https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-34106. -- Resolution: Duplicate > Hide FValueTest and AnovaTest > - > > Key: SPARK-34106 > URL: https://issues.apache.org/jira/browse/SPARK-34106 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Major > > hide the added test classes for now. > they are not very practical for big data. If there are valid use cases, we > should see more requests from the community.
[jira] [Created] (SPARK-34110) Upgrade ZooKeeper to 3.6.2
Yuming Wang created SPARK-34110: --- Summary: Upgrade ZooKeeper to 3.6.2 Key: SPARK-34110 URL: https://issues.apache.org/jira/browse/SPARK-34110 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 3.2.0 Reporter: Yuming Wang When running Spark on JDK 14: {noformat} 21/01/13 20:25:32,533 WARN [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] zookeeper.ClientCnxn:1164 : Session 0x0 for server apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing socket connection and attempting reconnect java.lang.IllegalArgumentException: Unable to canonicalize address carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not resolvable at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65) at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) {noformat}
[jira] [Comment Edited] (SPARK-33507) Improve and fix cache behavior in v1 and v2
[ https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264608#comment-17264608 ] Chao Sun edited comment on SPARK-33507 at 1/14/21, 5:23 AM: Thanks [~hyukjin.kwon]. From my side, there is no regression. Although I feel SPARK-34052 is a bit important since it concerns correctness. I'm working on a fix but got delayed by a few other issues found along the way :(. The issue has been there for a long time though so I'm fine moving this to the next release. was (Author: csun): Thanks [~hyukjin.kwon]. From my side, there is no regression. Although I feel SPARK-34052 is a bit important since it concerns correctness. I'm working on a fix but got delayed by a few other issues found during the process :( > Improve and fix cache behavior in v1 and v2 > --- > > Key: SPARK-33507 > URL: https://issues.apache.org/jira/browse/SPARK-33507 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Chao Sun >Priority: Critical > > This is an umbrella JIRA to track fixes & improvements for caching behavior > in Spark datasource v1 and v2, which includes: > - fix existing cache behavior in v1 and v2. > - fix inconsistent cache behavior between v1 and v2 > - implement missing features in v2 to align with those in v1.
[jira] [Commented] (SPARK-33507) Improve and fix cache behavior in v1 and v2
[ https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264608#comment-17264608 ] Chao Sun commented on SPARK-33507: -- Thanks [~hyukjin.kwon]. From my side, there is no regression. Although I feel SPARK-34052 is a bit important since it concerns correctness. I'm working on a fix but got delayed by a few other issues found during the process :( > Improve and fix cache behavior in v1 and v2 > --- > > Key: SPARK-33507 > URL: https://issues.apache.org/jira/browse/SPARK-33507 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Chao Sun >Priority: Critical > > This is an umbrella JIRA to track fixes & improvements for caching behavior > in Spark datasource v1 and v2, which includes: > - fix existing cache behavior in v1 and v2. > - fix inconsistent cache behavior between v1 and v2 > - implement missing features in v2 to align with those in v1.
[jira] [Commented] (SPARK-33507) Improve and fix cache behavior in v1 and v2
[ https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264601#comment-17264601 ] Hyukjin Kwon commented on SPARK-33507: -- Hey guys, I think we should just start RC regardless of these issues here. Looks like it's going to take too long and the release schedule will have to be delayed. These are non-regressions, right? Please directly ping me and let me know if there are regressions. > Improve and fix cache behavior in v1 and v2 > --- > > Key: SPARK-33507 > URL: https://issues.apache.org/jira/browse/SPARK-33507 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Chao Sun >Priority: Critical > > This is an umbrella JIRA to track fixes & improvements for caching behavior > in Spark datasource v1 and v2, which includes: > - fix existing cache behavior in v1 and v2. > - fix inconsistent cache behavior between v1 and v2 > - implement missing features in v2 to align with those in v1.
[jira] [Assigned] (SPARK-34081) Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join
[ https://issues.apache.org/jira/browse/SPARK-34081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34081: --- Assignee: Yuming Wang > Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as > broadcast join > --- > > Key: SPARK-34081 > URL: https://issues.apache.org/jira/browse/SPARK-34081 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > Should not pushdown LeftSemi/LeftAnti over Aggregate for some cases. > {code:scala} > spark.range(5000L).selectExpr("id % 1 as a", "id % 1 as > b").write.saveAsTable("t1") spark.range(4000L).selectExpr("id % 8000 as > c", "id % 8000 as d").write.saveAsTable("t2") > spark.sql("SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM > t2").explain > {code} > Current: > {noformat} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) >+- HashAggregate(keys=[a#16L, b#17L], functions=[]) > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) > +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, > [id=#72] > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) >+- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), > coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), > coalesce(d#19L, 0), isnull(d#19L)], LeftSemi > :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) > ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS > FIRST], false, 0 > : +- Exchange hashpartitioning(coalesce(a#16L, 0), > isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, > [id=#65] > : +- FileScan parquet default.t1[a#16L,b#17L] Batched: > true, DataFilters: [], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., > PartitionFilters: [], PushedFilters: [], ReadSchema: > struct > +- Sort 
[coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) > ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS > FIRST], false, 0 > +- Exchange hashpartitioning(coalesce(c#18L, 0), > isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, > [id=#66] > +- HashAggregate(keys=[c#18L, d#19L], functions=[]) >+- Exchange hashpartitioning(c#18L, d#19L, 5), > ENSURE_REQUIREMENTS, [id=#61] > +- HashAggregate(keys=[c#18L, d#19L], > functions=[]) > +- FileScan parquet default.t2[c#18L,d#19L] > Batched: true, DataFilters: [], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., > PartitionFilters: [], PushedFilters: [], ReadSchema: > struct > {noformat} > > Expected: > {noformat} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) >+- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, > [id=#74] > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) > +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, > 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), > isnull(d#19L)], LeftSemi > :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC > NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS > FIRST], false, 0 > : +- Exchange hashpartitioning(coalesce(a#16L, 0), > isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, > [id=#67] > : +- HashAggregate(keys=[a#16L, b#17L], functions=[]) > :+- Exchange hashpartitioning(a#16L, b#17L, 5), > ENSURE_REQUIREMENTS, [id=#61] > : +- HashAggregate(keys=[a#16L, b#17L], functions=[]) > : +- FileScan parquet default.t1[a#16L,b#17L] > Batched: true, DataFilters: [], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., > PartitionFilters: [], PushedFilters: [], ReadSchema: > struct > +- Sort [coalesce(c#18L, 0) ASC 
NULLS FIRST, isnull(c#18L) ASC > NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS > FIRST], false, 0 >+- Exchange hashpartitioning(coalesce(c#18L, 0), > isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, > [id=#68] > +-
[jira] [Resolved] (SPARK-34081) Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join
[ https://issues.apache.org/jira/browse/SPARK-34081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34081. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31145 [https://github.com/apache/spark/pull/31145] > Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as > broadcast join > --- > > Key: SPARK-34081 > URL: https://issues.apache.org/jira/browse/SPARK-34081 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > Should not pushdown LeftSemi/LeftAnti over Aggregate for some cases. > {code:scala} > spark.range(5000L).selectExpr("id % 1 as a", "id % 1 as > b").write.saveAsTable("t1") spark.range(4000L).selectExpr("id % 8000 as > c", "id % 8000 as d").write.saveAsTable("t2") > spark.sql("SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM > t2").explain > {code} > Current: > {noformat} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) >+- HashAggregate(keys=[a#16L, b#17L], functions=[]) > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) > +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, > [id=#72] > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) >+- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), > coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), > coalesce(d#19L, 0), isnull(d#19L)], LeftSemi > :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) > ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS > FIRST], false, 0 > : +- Exchange hashpartitioning(coalesce(a#16L, 0), > isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, > [id=#65] > : +- FileScan parquet default.t1[a#16L,b#17L] Batched: > true, DataFilters: [], Format: Parquet, Location: > 
InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., > PartitionFilters: [], PushedFilters: [], ReadSchema: > struct > +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) > ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS > FIRST], false, 0 > +- Exchange hashpartitioning(coalesce(c#18L, 0), > isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, > [id=#66] > +- HashAggregate(keys=[c#18L, d#19L], functions=[]) >+- Exchange hashpartitioning(c#18L, d#19L, 5), > ENSURE_REQUIREMENTS, [id=#61] > +- HashAggregate(keys=[c#18L, d#19L], > functions=[]) > +- FileScan parquet default.t2[c#18L,d#19L] > Batched: true, DataFilters: [], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., > PartitionFilters: [], PushedFilters: [], ReadSchema: > struct > {noformat} > > Expected: > {noformat} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) >+- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, > [id=#74] > +- HashAggregate(keys=[a#16L, b#17L], functions=[]) > +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, > 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), > isnull(d#19L)], LeftSemi > :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC > NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS > FIRST], false, 0 > : +- Exchange hashpartitioning(coalesce(a#16L, 0), > isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, > [id=#67] > : +- HashAggregate(keys=[a#16L, b#17L], functions=[]) > :+- Exchange hashpartitioning(a#16L, b#17L, 5), > ENSURE_REQUIREMENTS, [id=#61] > : +- HashAggregate(keys=[a#16L, b#17L], functions=[]) > : +- FileScan parquet default.t1[a#16L,b#17L] > Batched: true, DataFilters: [], Format: Parquet, Location: > 
InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., > PartitionFilters: [], PushedFilters: [], ReadSchema: > struct > +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC > NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS > FIRST], false, 0 >+- Exchange
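The rewrite at issue is semantics-preserving either way; the question is cost. A distinct followed by a left-semi join yields the same rows as a left-semi join followed by a distinct, so the optimizer may push the join below the aggregate — but, as the fix says, that only pays off when the right side is small enough to broadcast. A toy Python model (sets and lists stand in for DataFrames; this is an illustration of the equivalence, not Spark's planner):

```python
def left_semi_join(left_rows, right_rows):
    """Keep left rows that have a match on the right (toy model of LeftSemi)."""
    right_keys = set(right_rows)
    return [r for r in left_rows if r in right_keys]

def distinct(rows):
    """Toy model of the HashAggregate used for SELECT DISTINCT."""
    return sorted(set(rows))

t1 = [(1, 1), (1, 1), (2, 2), (3, 3)]
t2 = [(1, 1), (3, 3), (3, 3), (4, 4)]

# Plan A: aggregate first, then semi join (kept unless the right side can
# be planned as a broadcast join, per SPARK-34081).
plan_a = left_semi_join(distinct(t1), distinct(t2))
# Plan B: the pushed-down form, joining before the aggregation.
plan_b = distinct(left_semi_join(t1, t2))
assert plan_a == plan_b  # same result; only the work distribution differs
print(plan_a)  # → [(1, 1), (3, 3)]
```

In plan B the semi join touches every pre-aggregation row, which is only cheap when the build side fits in a broadcast; otherwise the extra shuffles and aggregates in the "Current" plan above are pure overhead.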
[jira] [Updated] (SPARK-34096) Improve performance for nth_value ignore nulls
[ https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-34096: --- Description: The current {code:java} UnboundedOffsetWindowFunctionFrame {code} and {code:java} UnboundedPrecedingOffsetWindowFunctionFrame {code} only support nth_value that respect nulls. So nth_value will execute {code:java} updateExpressions {code} multiple times. was: The current {code:java} UnboundedPrecedingOffsetWindowFunctionFrame {code} only support nth_value that respect nulls. So nth_value will execute {code:java} updateExpressions {code} multiple times. > Improve performance for nth_value ignore nulls > -- > > Key: SPARK-34096 > URL: https://issues.apache.org/jira/browse/SPARK-34096 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > The current > {code:java} > UnboundedOffsetWindowFunctionFrame > {code} > and > {code:java} > UnboundedPrecedingOffsetWindowFunctionFrame > {code} > only support nth_value that respect nulls. So nth_value will execute > {code:java} > updateExpressions > {code} > multiple times.
[jira] [Updated] (SPARK-34096) Improve performance for nth_value ignore nulls
[ https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-34096: --- Summary: Improve performance for nth_value ignore nulls (was: Improve performance for nth_value ignore nulls over unbounded preceding window frame) > Improve performance for nth_value ignore nulls > -- > > Key: SPARK-34096 > URL: https://issues.apache.org/jira/browse/SPARK-34096 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > The current > {code:java} > UnboundedPrecedingOffsetWindowFunctionFrame > {code} > only support nth_value that respect nulls. So nth_value will execute > {code:java} > updateExpressions > {code} > multiple times.
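As the description notes, the optimized offset frames only handle the respect-nulls case, so the ignore-nulls variant falls back to repeated `updateExpressions` evaluation. For reference, the semantics being optimized, sketched in Python (`None` stands in for SQL NULL; the helper is illustrative, not Spark's implementation):

```python
def nth_value(rows, n, ignore_nulls=False):
    """nth_value over a whole partition (1-based n), as in SQL window functions.

    With ignore_nulls=True, nulls are skipped when counting to the nth value;
    otherwise they count like any other row. Returns None if fewer than n
    qualifying values exist.
    """
    values = [v for v in rows if v is not None] if ignore_nulls else rows
    return values[n - 1] if n <= len(values) else None

partition = [None, "a", None, "b", "c"]
print(nth_value(partition, 2))                     # respect nulls → "a"... wait
```

Hold the comment above aside; concretely: `nth_value(partition, 2)` counts the leading `None` and returns `"a"`, while `nth_value(partition, 2, ignore_nulls=True)` skips the nulls and returns `"b"`. Computing this in one pass per partition, rather than re-running the update expressions, is the performance win the ticket is after.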
[jira] [Updated] (SPARK-34109) Killing executors excluded on failure, results in additional executors being marked as excluded due to fetch failures
[ https://issues.apache.org/jira/browse/SPARK-34109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaruna Godthi updated SPARK-34109: -- Description: Configuration: {code:java} spark.excludeOnFailure.enabled: true # aka deprecated spark.blacklist.enabled spark.excludeOnFailure.application.fetchFailure.enabled: true # aka deprecated spark.blacklist.application.fetchFailure.enabled spark.excludeOnFailure.killExcludedExecutors: true # aka deprecated spark.blacklist.killBlacklistedExecutors {code} In this case, we have noticed when a few executors are excluded due to task failures (maybe due to host issues), then those executors are killed after being excluded. However, when other executors try to fetch shuffle blocks from these killed executors, then these other executors also end up getting excluded due to `spark.excludeOnFailure.application.fetchFailure.enabled`. Instead, the fetch failures in case of fetch from these excluded executors should not be considered when excluding executors based on `spark.excludeOnFailure.application.fetchFailure.enabled` was: Configuration: ``` spark.excludeOnFailure.enabled: true # aka deprecated spark.blacklist.enabled spark.excludeOnFailure.application.fetchFailure.enabled: true # aka deprecated spark.blacklist.application.fetchFailure.enabled spark.excludeOnFailure.killExcludedExecutors: true # aka deprecated spark.blacklist.killBlacklistedExecutors ``` In this case, we have noticed when a few executors are excluded due to task failures (maybe due to host issues), then those executors are killed after being excluded. However, when other executors try to fetch shuffle blocks from these killed executors, then these other executors also end up getting excluded due to `spark.excludeOnFailure.application.fetchFailure.enabled`. 
Instead, the fetch failures in case of fetch from these excluded executors should not be considered when excluding executors based on `spark.excludeOnFailure.application.fetchFailure.enabled` > Killing executors excluded on failure, results in additional executors being > marked as excluded due to fetch failures > - > > Key: SPARK-34109 > URL: https://issues.apache.org/jira/browse/SPARK-34109 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Aaruna Godthi >Priority: Major > > Configuration: > > {code:java} > spark.excludeOnFailure.enabled: true # aka deprecated spark.blacklist.enabled > spark.excludeOnFailure.application.fetchFailure.enabled: true # aka > deprecated spark.blacklist.application.fetchFailure.enabled > spark.excludeOnFailure.killExcludedExecutors: true # aka deprecated > spark.blacklist.killBlacklistedExecutors > {code} > > > > In this case, we have noticed when a few executors are excluded due to task > failures (maybe due to host issues), then those executors are killed after > being excluded. > However, when other executors try to fetch shuffle blocks from these killed > executors, then these other executors also end up getting excluded due to > `spark.excludeOnFailure.application.fetchFailure.enabled`. > Instead, the fetch failures in case of fetch from these excluded executors > should not be considered when excluding executors based on > `spark.excludeOnFailure.application.fetchFailure.enabled`
[jira] [Created] (SPARK-34109) Killing executors excluded on failure, results in additional executors being marked as excluded due to fetch failures
Aaruna Godthi created SPARK-34109: - Summary: Killing executors excluded on failure, results in additional executors being marked as excluded due to fetch failures Key: SPARK-34109 URL: https://issues.apache.org/jira/browse/SPARK-34109 Project: Spark Issue Type: Bug Components: Kubernetes, Shuffle, Spark Core Affects Versions: 3.0.1, 3.0.0 Reporter: Aaruna Godthi Configuration: ``` spark.excludeOnFailure.enabled: true # aka deprecated spark.blacklist.enabled spark.excludeOnFailure.application.fetchFailure.enabled: true # aka deprecated spark.blacklist.application.fetchFailure.enabled spark.excludeOnFailure.killExcludedExecutors: true # aka deprecated spark.blacklist.killBlacklistedExecutors ``` In this case, we have noticed when a few executors are excluded due to task failures (maybe due to host issues), then those executors are killed after being excluded. However, when other executors try to fetch shuffle blocks from these killed executors, then these other executors also end up getting excluded due to `spark.excludeOnFailure.application.fetchFailure.enabled`. Instead, the fetch failures in case of fetch from these excluded executors should not be considered when excluding executors based on `spark.excludeOnFailure.application.fetchFailure.enabled`
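The proposed behavior can be modeled simply: before counting a fetch failure toward application-level exclusion, check whether the executor the blocks were fetched *from* was already excluded (and hence deliberately killed). A hedged sketch of that rule — class and method names are hypothetical, not Spark's actual scheduler API:

```python
class ExcludeTracker:
    """Toy model of executor exclusion (names hypothetical, not Spark's API)."""

    def __init__(self):
        self.excluded = set()

    def exclude_for_task_failures(self, executor_id):
        # Exclusion for task failures; with killExcludedExecutors the
        # executor is then killed.
        self.excluded.add(executor_id)

    def on_fetch_failure(self, fetching_exec, source_exec):
        """Return True if the fetching executor was newly excluded."""
        # SPARK-34109: a fetch failure whose source is an already-excluded
        # (killed) executor is expected fallout and should not get the
        # *fetching* executor excluded as well.
        if source_exec in self.excluded:
            return False
        self.excluded.add(fetching_exec)
        return True

tracker = ExcludeTracker()
tracker.exclude_for_task_failures("exec-1")
print(tracker.on_fetch_failure("exec-2", source_exec="exec-1"))  # → False
print(tracker.on_fetch_failure("exec-3", source_exec="exec-9"))  # → True
```

Without the guard in `on_fetch_failure`, killing the first excluded executor cascades: every executor holding a reference to its shuffle blocks gets excluded in turn, which is the behavior the ticket reports.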
[jira] [Resolved] (SPARK-34086) RaiseError generates too much code and may fail codegen in length check for char varchar
[ https://issues.apache.org/jira/browse/SPARK-34086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34086. - Fix Version/s: 3.1.1 Resolution: Fixed Issue resolved by pull request 31168 [https://github.com/apache/spark/pull/31168] > RaiseError generates too much code and may fail codegen in length check for > char varchar > - > > Key: SPARK-34086 > URL: https://issues.apache.org/jira/browse/SPARK-34086 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.1 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133928/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/ > We can reduce more than 8000 bytes by removing the unnecessary CONCAT > expression.
[jira] [Assigned] (SPARK-34086) RaiseError generates too much code and may fail codegen in length check for char varchar
[ https://issues.apache.org/jira/browse/SPARK-34086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34086: --- Assignee: Kent Yao > RaiseError generates too much code and may fail codegen in length check for > char varchar > - > > Key: SPARK-34086 > URL: https://issues.apache.org/jira/browse/SPARK-34086 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133928/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/ > We can reduce more than 8000 bytes by removing the unnecessary CONCAT > expression.
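One way to read "removing the unnecessary CONCAT expression" is that the error message for the char/varchar length check can be assembled once, up front, rather than concatenated inside the per-row generated code. A miniature Python analogue of that refactoring — the function, parameter names, and message wording are hypothetical, not Spark's generated Java:

```python
def make_length_check(col_name, limit):
    """Build a per-row length check whose error message is precomputed.

    The message is constructed once here, outside the per-row path; the
    per-row closure only compares a length and raises. This mirrors the idea
    of moving string concatenation out of the generated code (wording and
    names are hypothetical, not Spark's actual error text).
    """
    message = f"value for column {col_name} exceeds the length limit {limit}"

    def check(value):
        if value is not None and len(value) > limit:
            raise ValueError(message)
        return value

    return check

check = make_length_check("c", 5)
print(check("abc"))  # → abc
```

Per-row work stays tiny, and the generated-code size no longer grows with the size of the message, which is the ~8000-byte saving the ticket describes.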
[jira] [Updated] (SPARK-23431) Expose the new executor memory metrics at the stage level
[ https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-23431: --- Fix Version/s: 3.1.0 > Expose the new executor memory metrics at the stage level > - > > Key: SPARK-23431 > URL: https://issues.apache.org/jira/browse/SPARK-23431 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Edward Lu >Assignee: Terry Kim >Priority: Major > Fix For: 3.1.0 > > > Collect and show the new executor memory metrics for each stage, to provide > more information on how memory is used per stage. > Modify the AppStatusListener to track the peak values for JVM used memory, > execution memory, storage memory, and unified memory for each executor for > each stage. > This is a subtask for SPARK-23206. Please refer to the design doc for that > ticket for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-23431) Expose the new executor memory metrics at the stage level
[ https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264571#comment-17264571 ] Gengliang Wang edited comment on SPARK-23431 at 1/14/21, 3:20 AM: -- [~dongjoon] Done. Sorry for missing the fixed version field. was (Author: gengliang.wang): [~dongjoon]Done. Sorry for missing the fixed version field. > Expose the new executor memory metrics at the stage level > - > > Key: SPARK-23431 > URL: https://issues.apache.org/jira/browse/SPARK-23431 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Edward Lu >Assignee: Terry Kim >Priority: Major > Fix For: 3.1.0 > > > Collect and show the new executor memory metrics for each stage, to provide > more information on how memory is used per stage. > Modify the AppStatusListener to track the peak values for JVM used memory, > execution memory, storage memory, and unified memory for each executor for > each stage. > This is a subtask for SPARK-23206. Please refer to the design doc for that > ticket for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23431) Expose the new executor memory metrics at the stage level
[ https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264571#comment-17264571 ] Gengliang Wang commented on SPARK-23431: [~dongjoon]Done. Sorry for missing the fixed version field. > Expose the new executor memory metrics at the stage level > - > > Key: SPARK-23431 > URL: https://issues.apache.org/jira/browse/SPARK-23431 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Edward Lu >Assignee: Terry Kim >Priority: Major > Fix For: 3.1.0 > > > Collect and show the new executor memory metrics for each stage, to provide > more information on how memory is used per stage. > Modify the AppStatusListener to track the peak values for JVM used memory, > execution memory, storage memory, and unified memory for each executor for > each stage. > This is a subtask for SPARK-23206. Please refer to the design doc for that > ticket for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34108) Caching with permanent view doesn't work in certain cases
[ https://issues.apache.org/jira/browse/SPARK-34108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-34108: - Summary: Caching with permanent view doesn't work in certain cases (was: Caching doesn't work completely with permanent view) > Caching with permanent view doesn't work in certain cases > - > > Key: SPARK-34108 > URL: https://issues.apache.org/jira/browse/SPARK-34108 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Chao Sun >Priority: Major > > Currently, caching a permanent view doesn't work in certain cases. For > instance, in the following: > {code:sql} > CREATE TABLE t (key bigint, value string) USING parquet > CREATE VIEW v1 AS SELECT key FROM t > CACHE TABLE v1 > SELECT key FROM t > {code} > The last SELECT query will hit the cached {{v1}}. On the other hand: > {code:sql} > CREATE TABLE t (key bigint, value string) USING parquet > CREATE VIEW v1 AS SELECT key FROM t ORDER by key > CACHE TABLE v1 > SELECT key FROM t ORDER BY key > {code} > The SELECT won't hit the cache. > It seems this is related to {{EliminateView}}. In the second case, it will > insert an extra project operator which makes the comparison on canonicalized > plan during cache lookup fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34108) Caching doesn't work completely with permanent view
[ https://issues.apache.org/jira/browse/SPARK-34108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-34108: - Description: Currently, caching a permanent view doesn't work in certain cases. For instance, in the following: {code:sql} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t CACHE TABLE v1 SELECT key FROM t {code} The last SELECT query will hit the cached {{v1}}. On the other hand: {code:sql} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t ORDER by key CACHE TABLE v1 SELECT key FROM t ORDER BY key {code} The SELECT won't hit the cache. It seems this is related to {{EliminateView}}. In the second case, it will insert an extra project operator which makes the comparison on canonicalized plan during cache lookup fail. was: Currently, caching a permanent view doesn't work in some cases. For instance, in the following: {code:sql} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t CACHE TABLE v1 SELECT key FROM t {code} The last SELECT query will hit the cached {{v1}}. However, in the following: {code:sql} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t ORDER by key CACHE TABLE v1 SELECT key FROM t ORDER BY key {code} The SELECT won't hit the cache. It seems this is related to {{EliminateView}}. In the second case, it will insert an extra project operator which makes the comparison on canonicalized plan during cache lookup fail. > Caching doesn't work completely with permanent view > --- > > Key: SPARK-34108 > URL: https://issues.apache.org/jira/browse/SPARK-34108 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Chao Sun >Priority: Major > > Currently, caching a permanent view doesn't work in certain cases. 
For > instance, in the following: > {code:sql} > CREATE TABLE t (key bigint, value string) USING parquet > CREATE VIEW v1 AS SELECT key FROM t > CACHE TABLE v1 > SELECT key FROM t > {code} > The last SELECT query will hit the cached {{v1}}. On the other hand: > {code:sql} > CREATE TABLE t (key bigint, value string) USING parquet > CREATE VIEW v1 AS SELECT key FROM t ORDER by key > CACHE TABLE v1 > SELECT key FROM t ORDER BY key > {code} > The SELECT won't hit the cache. > It seems this is related to {{EliminateView}}. In the second case, it will > insert an extra project operator which makes the comparison on canonicalized > plan during cache lookup fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34108) Caching doesn't work completely with permanent view
[ https://issues.apache.org/jira/browse/SPARK-34108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-34108: - Description: Currently, caching a permanent view doesn't work in some cases. For instance, in the following: {code:sql} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t CACHE TABLE v1 SELECT key FROM t {code} The last SELECT query will hit the cached {{v1}}. However, in the following: {code:sql} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t ORDER by key CACHE TABLE v1 SELECT key FROM t ORDER BY key {code} The SELECT won't hit the cache. It seems this is related to {{EliminateView}}. In the second case, it will insert an extra project operator which makes the comparison on canonicalized plan during cache lookup fail. was: Currently, caching a permanent view doesn't work in some cases. For instance, in the following: {code} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t CACHE TABLE v1 SELECT key FROM t {code} The last SELECT query will hit the cached {{v1}}. However, in the following: {code} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t ORDER by key CACHE TABLE v1 SELECT key FROM t ORDER BY key {code} The SELECT won't hit the cache. It seems this is related to {{EliminateView}}. In the second case, it will insert an extra project operator which makes the comparison on canonicalized plan during cache lookup fail. > Caching doesn't work completely with permanent view > --- > > Key: SPARK-34108 > URL: https://issues.apache.org/jira/browse/SPARK-34108 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Chao Sun >Priority: Major > > Currently, caching a permanent view doesn't work in some cases. 
For instance, > in the following: > {code:sql} > CREATE TABLE t (key bigint, value string) USING parquet > CREATE VIEW v1 AS SELECT key FROM t > CACHE TABLE v1 > SELECT key FROM t > {code} > The last SELECT query will hit the cached {{v1}}. However, in the following: > {code:sql} > CREATE TABLE t (key bigint, value string) USING parquet > CREATE VIEW v1 AS SELECT key FROM t ORDER by key > CACHE TABLE v1 > SELECT key FROM t ORDER BY key > {code} > The SELECT won't hit the cache. > It seems this is related to {{EliminateView}}. In the second case, it will > insert an extra project operator which makes the comparison on canonicalized > plan during cache lookup fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34108) Caching doesn't work completely with permanent view
Chao Sun created SPARK-34108: Summary: Caching doesn't work completely with permanent view Key: SPARK-34108 URL: https://issues.apache.org/jira/browse/SPARK-34108 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Chao Sun Currently, caching a permanent view doesn't work in some cases. For instance, in the following: {code} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t CACHE TABLE v1 SELECT key FROM t {code} The last SELECT query will hit the cached {{v1}}. However, in the following: {code} CREATE TABLE t (key bigint, value string) USING parquet CREATE VIEW v1 AS SELECT key FROM t ORDER by key CACHE TABLE v1 SELECT key FROM t ORDER BY key {code} The SELECT won't hit the cache. It seems this is related to {{EliminateView}}. In the second case, it will insert an extra project operator which makes the comparison on canonicalized plan during cache lookup fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
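The cache-miss mechanism described in SPARK-34108 — an extra Project operator inserted by {{EliminateView}} defeating the canonicalized-plan comparison — can be illustrated with a toy model. This is a hedged sketch in plain Python, not Spark's actual `LogicalPlan`/`CacheManager` code: the dataclasses `Scan`, `Sort`, and `Project` are hypothetical stand-ins, and structural equality stands in for canonicalized-plan comparison.

```python
# Toy model (NOT Spark internals): cache lookup keyed on structural
# plan equality misses when view resolution leaves an extra Project
# node on top, mirroring the second CACHE TABLE scenario above.
from dataclasses import dataclass

@dataclass(frozen=True)
class Scan:
    table: str
    columns: tuple

@dataclass(frozen=True)
class Sort:
    key: str
    child: object

@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object

# Plan cached for view v1: an extra Project sits on top of the Sort.
cached_plan = Project(("key",), Sort("key", Scan("t", ("key",))))

# Plan for the user's equivalent SELECT ... ORDER BY: no extra Project.
query_plan = Sort("key", Scan("t", ("key",)))

# Cache lookup by structural equality (stand-in for canonicalization).
cache = {cached_plan: "cached data"}
print(query_plan in cache)  # False — the extra Project makes the lookup miss
```

In the first scenario from the report (no ORDER BY), the resolved view plan and the user's query plan canonicalize to the same shape, so the lookup succeeds; the sketch only models why the shapes diverge in the second scenario.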
[jira] [Assigned] (SPARK-34106) Hide FValueTest and AnovaTest
[ https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34106: Assignee: Apache Spark (was: zhengruifeng) > Hide FValueTest and AnovaTest > - > > Key: SPARK-34106 > URL: https://issues.apache.org/jira/browse/SPARK-34106 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: zhengruifeng >Assignee: Apache Spark >Priority: Major > > hide the added test classes for now. > they are not very practical for big data. If there are valid use cases, we > should see more requests from the community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34106) Hide FValueTest and AnovaTest
[ https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264567#comment-17264567 ] Apache Spark commented on SPARK-34106: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/31176 > Hide FValueTest and AnovaTest > - > > Key: SPARK-34106 > URL: https://issues.apache.org/jira/browse/SPARK-34106 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Major > > hide the added test classes for now. > they are not very practical for big data. If there are valid use cases, we > should see more requests from the community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34106) Hide FValueTest and AnovaTest
[ https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34106: Assignee: zhengruifeng (was: Apache Spark) > Hide FValueTest and AnovaTest > - > > Key: SPARK-34106 > URL: https://issues.apache.org/jira/browse/SPARK-34106 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Major > > hide the added test classes for now. > they are not very practical for big data. If there are valid use cases, we > should see more requests from the community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33311) Improve semantics for REFRESH TABLE
[ https://issues.apache.org/jira/browse/SPARK-33311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-33311: - Parent: SPARK-33507 Issue Type: Sub-task (was: Improvement) > Improve semantics for REFRESH TABLE > --- > > Key: SPARK-33311 > URL: https://issues.apache.org/jira/browse/SPARK-33311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.1 >Reporter: Chao Sun >Priority: Major > > Currently, the semantics for {{REFRESH TABLE t}} are not well defined for a view > (let's say {{view}}) that references the table {{t}}: > 1. If {{view}} is cached, the behavior is not well-defined. Should Spark > invalidate the cache (current behavior) or recache it? > 2. If {{view}} is a temporary view, currently refreshing {{t}} does not > refresh {{view}}, since it will just reuse the logical plan defined in the > session catalog. This could lead to query failures (although with a helpful > error message) or to incorrect results, depending on the refresh behavior. > I think we should clearly define and document the behavior here, so that users > won't get confused. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3
[ https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Pedamallu updated SPARK-34107: --- Description: Spark History Service is having trouble loading when loading initially with 300k+ applications from S3. Following are the details and snapshots: Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity) {noformat} spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $ | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l 305571 spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat} {noformat} Logs when starting SparkHistory: {noformat} root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private# /go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh --properties-file /etc/spark-history-config/shs-default.properties 2021/01/14 02:40:28 Spark spark wrapper is disabled 2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0 2021/01/14 02:40:28 Statsd disabled 2021/01/14 02:40:28 Debug log: /tmp/.log 2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 Mozart 0 2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with arguments [org.apache.spark.deploy.history.HistoryServer --properties-file /etc/spark-history-config/shs-default.properties] 21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 2077@shs-with-statsd-86d7f54679-t8fqr 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT 21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root 21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to: 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to: 21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions 21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties 21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system started 21/01/14 02:40:31 INFO log: Logging initialized @1933ms to org.sparkproject.jetty.util.log.Slf4jLog 21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 1.8.0_242-b08 21/01/14 02:40:31 INFO Server: Started @1999ms 21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f {HTTP/1.1,[http/1.1]} {0.0.0.0:18080} 21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080. 
21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@b9dfc5a {/,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1bbae752 {/json,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5cf87cfd {/api,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@74971ed9 {/static,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1542af63 {/history,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://shs-with-statsd-86d7f54679-t8fqr:18080 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 seconds 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning s3a://-company/spark-history-fs-logDirectory/ with lastScanTime==-1{noformat} was: Spark History Service is having trouble loading when loading initially with 300k+ applications from S3. Following are the details and snapshots: Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity) {noformat} spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $ | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l 305571 spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat} {noformat} Logs when starting SparkHistory: {noformat} root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
[jira] [Updated] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3
[ https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Pedamallu updated SPARK-34107: --- Description: Spark History Service is having trouble loading when loading initially with 300k+ applications from S3. Following are the details and snapshots: Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity) {noformat} spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $ | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l 305571 spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat} {noformat} Logs when starting SparkHistory: {noformat} root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private# /go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh --properties-file /etc/spark-history-config/shs-default.properties 2021/01/14 02:40:28 Spark spark wrapper is disabled 2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0 2021/01/14 02:40:28 Statsd disabled 2021/01/14 02:40:28 Debug log: /tmp/.log 2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 Mozart 0 2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with arguments [org.apache.spark.deploy.history.HistoryServer --properties-file /etc/spark-history-config/shs-default.properties] 21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 2077@shs-with-statsd-86d7f54679-t8fqr 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT 21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root 21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to: 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to: 21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions 21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties 21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system started 21/01/14 02:40:31 INFO log: Logging initialized @1933ms to org.sparkproject.jetty.util.log.Slf4jLog 21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 1.8.0_242-b08 21/01/14 02:40:31 INFO Server: Started @1999ms 21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f {HTTP/1.1,[http/1.1]} {0.0.0.0:18080} 21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080. 
21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@b9dfc5a {/,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1bbae752 {/json,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5cf87cfd {/api,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@74971ed9 {/static,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1542af63 {/history,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://shs-with-statsd-86d7f54679-t8fqr:18080 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 seconds 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning s3a://-company/spark-history-fs-logDirectory/ with lastScanTime==-1{noformat} was: Spark History Service is having trouble loading when loading initially with 300k+ applications from S3. Following are the details and snapshots: Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity) {noformat} spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $ | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l 305571 spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat} Logs when starting SparkHistory: {noformat} root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
[jira] [Created] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3
Shashank Pedamallu created SPARK-34107: -- Summary: Spark History not loading when service has to load 300k applications initially from S3 Key: SPARK-34107 URL: https://issues.apache.org/jira/browse/SPARK-34107 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Shashank Pedamallu Spark History Service is having trouble loading when loading initially with 300k+ applications from S3. Following are the details and snapshots: Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity) {noformat} spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $ | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l 305571 spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat} Logs when starting SparkHistory: {noformat} root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private# /go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh --properties-file /etc/spark-history-config/shs-default.properties 2021/01/14 02:40:28 Spark spark wrapper is disabled 2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0 2021/01/14 02:40:28 Statsd disabled 2021/01/14 02:40:28 Debug log: /tmp/.log 2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 Mozart 0 2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with arguments [org.apache.spark.deploy.history.HistoryServer --properties-file /etc/spark-history-config/shs-default.properties] 21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 2077@shs-with-statsd-86d7f54679-t8fqr 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT 21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root 21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to: 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to: 21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions 21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties 21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system started 21/01/14 02:40:31 INFO log: Logging initialized @1933ms to org.sparkproject.jetty.util.log.Slf4jLog 21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 1.8.0_242-b08 21/01/14 02:40:31 INFO Server: Started @1999ms 21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f{HTTP/1.1,[http/1.1]}{0.0.0.0:18080} 21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080. 
21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@b9dfc5a{/,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1bbae752{/json,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5cf87cfd{/api,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@74971ed9{/static,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1542af63{/history,null,AVAILABLE,@Spark} 21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://shs-with-statsd-86d7f54679-t8fqr:18080 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 seconds 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning s3a://-company/spark-history-fs-logDirectory/ with lastScanTime==-1{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
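A rough sense of why the initial scan in SPARK-34107 stalls: S3's LIST API returns at most 1,000 keys per request, so merely enumerating ~305k event-log files requires hundreds of sequential round trips before {{FsHistoryProvider}} can start replaying any logs. The numbers below are back-of-the-envelope assumptions (the 100 ms round-trip latency is hypothetical, not taken from the report), and the functions are illustrative, not Spark or AWS SDK code.

```python
# Back-of-the-envelope sketch with assumed numbers (not from the report):
# S3 ListObjectsV2 caps each response at 1,000 keys, so a directory of
# 305,571 event logs needs hundreds of sequential LIST calls before the
# history server's first scan can even begin parsing.
import math

def list_requests(num_files: int, page_size: int = 1000) -> int:
    """Minimum number of sequential LIST calls to enumerate num_files keys."""
    return math.ceil(num_files / page_size)

def listing_lower_bound_seconds(num_files: int, rtt_s: float = 0.1,
                                page_size: int = 1000) -> float:
    """Lower bound on listing time from round-trip latency alone."""
    return list_requests(num_files, page_size) * rtt_s

print(list_requests(305571))               # 306 LIST calls
print(listing_lower_bound_seconds(305571)) # ~30s just to list, before parsing
```

Listing is only the prelude: each of the 300k+ logs must then be opened and replayed, which is why the UI stays blank long after the `Scanning ... with lastScanTime==-1` debug line appears.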
[jira] [Created] (SPARK-34106) Hide FValueTest and AnovaTest
zhengruifeng created SPARK-34106: Summary: Hide FValueTest and AnovaTest Key: SPARK-34106 URL: https://issues.apache.org/jira/browse/SPARK-34106 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.2.0, 3.1.1 Reporter: zhengruifeng hide the added test classes for now. they are not very practical for big data. If there are valid use cases, we should see more requests from the community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34106) Hide FValueTest and AnovaTest
[ https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-34106: Assignee: zhengruifeng > Hide FValueTest and AnovaTest > - > > Key: SPARK-34106 > URL: https://issues.apache.org/jira/browse/SPARK-34106 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Major > > hide the added test classes for now. > they are not very practical for big data. If there are valid use cases, we > should see more requests from the community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33557) spark.storage.blockManagerSlaveTimeoutMs default value does not follow spark.network.timeout value when the latter was changed
[ https://issues.apache.org/jira/browse/SPARK-33557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33557: - Fix Version/s: 3.0.2 > spark.storage.blockManagerSlaveTimeoutMs default value does not follow > spark.network.timeout value when the latter was changed > -- > > Key: SPARK-33557 > URL: https://issues.apache.org/jira/browse/SPARK-33557 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Ohad >Assignee: Yang Jie >Priority: Minor > Fix For: 3.0.2, 3.1.0 > > > According to the documentation, "spark.network.timeout" is the default timeout > for "spark.storage.blockManagerSlaveTimeoutMs", which implies that when the > user sets "spark.network.timeout", the effective value of > "spark.storage.blockManagerSlaveTimeoutMs" should also change if it was > not specifically set. > However, this is not the case: the default value of > "spark.storage.blockManagerSlaveTimeoutMs" is always the default value of > "spark.network.timeout" (120s). > > "spark.storage.blockManagerSlaveTimeoutMs" is defined in the package object > of "org.apache.spark.internal.config" as follows: > {code:java} > private[spark] val STORAGE_BLOCKMANAGER_SLAVE_TIMEOUT = > ConfigBuilder("spark.storage.blockManagerSlaveTimeoutMs") > .version("0.7.0") > .timeConf(TimeUnit.MILLISECONDS) > .createWithDefaultString(Network.NETWORK_TIMEOUT.defaultValueString) > {code} > So it seems that its default value is indeed "fixed" to > "spark.network.timeout"'s default value.
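The snippet quoted above copies `spark.network.timeout`'s default value string once, at definition time. A minimal Python sketch (not Spark's actual `ConfigBuilder` API; all names here are illustrative) contrasts a default frozen at definition time with a lazy fallback to the other key's effective value:

```python
# Illustrative sketch, not Spark's ConfigBuilder API: a default copied once
# at definition time vs. a default resolved lazily from another key.
class Conf:
    def __init__(self):
        self.settings = {}
        self.defaults = {"spark.network.timeout": "120s"}

    def set(self, key, value):
        self.settings[key] = value

    def get_frozen(self, key, frozen_default):
        # What the quoted definition does: the default string was captured once.
        return self.settings.get(key, frozen_default)

    def get_fallback(self, key, fallback_key):
        # The documented intent: fall back to the other key's *effective* value.
        fallback = self.settings.get(fallback_key, self.defaults[fallback_key])
        return self.settings.get(key, fallback)

conf = Conf()
frozen = conf.defaults["spark.network.timeout"]  # captured at definition: "120s"
conf.set("spark.network.timeout", "300s")        # user raises the network timeout

frozen_value = conf.get_frozen("spark.storage.blockManagerSlaveTimeoutMs", frozen)
fallback_value = conf.get_fallback("spark.storage.blockManagerSlaveTimeoutMs",
                                   "spark.network.timeout")
print(frozen_value, fallback_value)  # 120s 300s
```

With the frozen default, raising `spark.network.timeout` leaves the storage timeout at 120s, which is exactly the mismatch the report describes.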
[jira] [Commented] (SPARK-34097) overflow for datetime datatype when creating stride + JDBC parallel read
[ https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264538#comment-17264538 ] Takeshi Yamamuro commented on SPARK-34097: -- Ah, I see. Thanks for the report. This issue reminds me of https://issues.apache.org/jira/browse/SPARK-28587. Since the timestamp part in the where clause looks database-dependent, I'm thinking now that we might need to handle it in JdbcDialect... > overflow for datetime datatype when creating stride + JDBC parallel read > - > > Key: SPARK-34097 > URL: https://issues.apache.org/jira/browse/SPARK-34097 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.0.1 > Environment: spark 3.0.1 > sql server v12.0 >Reporter: Pradip Sodha >Priority: Major > > I'm trying to do JDBC parallel read with datetime column as partition column > {code:java} > create table eData (eid int, start_time datetime) -- sql server v12.0 > --inserting some data{code} > > {code:java} > val df = spark // spark 3.0.1 > .read > .format("jdbc") > .option("url", "jdbc:sqlserver://...") > .option("partitionColumn", "start_time") > .option("lowerBound", "2000-01-01T01:01:11.546") > .option("upperBound", "2000-01-02T01:01:11.547") > .option("numPartitions", "10") > .option("dbtable", "eData") > .load(); > df.show(false){code} > and getting this error, > {code:java} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 > (TID 7, 10.139.64.6, executor 0): > com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when > converting date and/or time from character string. 
at > com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262) > at > com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435) > at > com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770) > at > com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731) > at > org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80) > at > org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144) > at org.apache.spark.scheduler.Task.run(Task.scala:117) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ... 
> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed > when converting date and/or time from character string.{code} > > which is expected, because the query designed by Spark is, > {code:java} > 21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses > of these partitions: "start_time" < '2000-01-01 03:25:11.5461' or > "start_time" is null, "start_time" >= '2000-01-01 03:25:11.5461' AND > "start_time" < '2000-01-01 05:49:11.5462', "start_time" >= '2000-01-01 > 05:49:11.5462' AND "start_time" < '2000-01-01 08:13:11.5463', "start_time" >= > '2000-01-01 08:13:11.5463' AND "start_time" < '2000-01-01 10:37:11.5464', > "start_time" >= '2000-01-01 10:37:11.5464' AND "start_time" < '2000-01-01 > 13:01:11.5465', "start_time" >= '2000-01-01 13:01:11.5465' AND "start_time" < > '2000-01-01 15:25:11.5466', "start_time" >= '2000-01-01 15:25:11.5466' AND > "start_time" < '2000-01-01 17:49:11.5467',
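The four-digit fractional seconds in the logged WHERE clauses can be reproduced arithmetically: the requested range is 24 hours plus 1 ms, so dividing it into 10 strides leaves a 100-microsecond remainder per stride, and every boundary picks up an extra fractional digit that SQL Server's `datetime` literal conversion then rejects. A sketch in plain Python (not Spark's actual stride code):

```python
from datetime import datetime

# Reproduce the partition boundaries from the "INFO JDBCRelation" log line
# above using plain datetime arithmetic (not Spark's stride computation).
lower = datetime(2000, 1, 1, 1, 1, 11, 546000)   # lowerBound
upper = datetime(2000, 1, 2, 1, 1, 11, 547000)   # upperBound
num_partitions = 10

stride = (upper - lower) / num_partitions         # 2:24:00.000100
bounds = [lower + i * stride for i in range(1, num_partitions)]
print(bounds[0])  # 2000-01-01 03:25:11.546100 -> rendered as '...11.5461'
print(bounds[1])  # 2000-01-01 05:49:11.546200 -> rendered as '...11.5462'
```

The boundaries match the log ('03:25:11.5461', '05:49:11.5462', ...), showing the extra digit comes from the non-integral millisecond stride rather than from the stored data.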
[jira] [Commented] (SPARK-34097) overflow for datetime datatype when creating stride + JDBC parallel read
[ https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264516#comment-17264516 ] Hyukjin Kwon commented on SPARK-34097: -- cc [~maropu] FYI
[jira] [Resolved] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
[ https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34100. -- Resolution: Cannot Reproduce > pyspark 2.4 packages can't be installed via pip on Amazon Linux 2 > - > > Key: SPARK-34100 > URL: https://issues.apache.org/jira/browse/SPARK-34100 > Project: Spark > Issue Type: Bug > Components: Deploy, PySpark >Affects Versions: 2.4.7 > Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also > tested with pip 20.3.3), using Docker or EMR 5.32.0 > > Example Dockerfile to reproduce: > {{FROM amazonlinux:2}} > {{RUN yum install -y python3}} > {{RUN pip3 install pyspark==2.4.7}} > >Reporter: Devin Boyer >Priority: Minor > > I'm unable to install the pyspark Python package on Amazon Linux 2, whether > in a Docker image or an EMR cluster. Amazon Linux 2 currently ships with > Python 3.7 and pip 9.0.3, but upgrading pip yields the same result. > > When installing the package, the installation will fail with the error > "ValueError: bad marshal data (unknown type code)". Full example stack below. > > This bug prevents use of pyspark for simple testing environments, and from > using tools where the pyspark package is a dependency, like > [https://github.com/awslabs/python-deequ.] > > Stack Trace: > {{Step 3/3 : RUN pip3 install pyspark==2.4.7}} > {{ ---> Running in 2c6e1c1de62f}} > {{WARNING: Running pip install with root privileges is generally not a good > idea. 
Try `pip3 install --user` instead.}} > {{Collecting pyspark==2.4.7}} > {{ Downloading > https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz > (217.9MB)}} > {{ Complete output from command python setup.py egg_info:}} > {{ Could not import pypandoc - required to package PySpark}} > {{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown > distribution option: 'long_description_content_type'}} > {{ warnings.warn(msg)}} > {{ zip_safe flag not set; analyzing archive contents...}} > {{ Traceback (most recent call last):}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, > in save_modules}} > {{ yield saved}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, > in setup_context}} > {{ yield}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, > in run_setup}} > {{ _execfile(setup_script, ns)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in > _execfile}} > {{ exec(code, globals, locals)}} > {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in > }} > {{ # using Python imports instead which will be resolved correctly.}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, > in setup}} > {{ return distutils.core.setup(**attrs)}} > {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}} > {{ dist.run_commands()}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}} > {{ self.run_command(cmd)}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}} > {{ cmd_obj.run()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 218, in run}} > {{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 269, in zip_safe}} > {{ return 
analyze_egg(self.bdist_dir, self.stubs)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 379, in analyze_egg}} > {{ safe = scan_module(egg_dir, base, name, stubs) and safe}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 416, in scan_module}} > {{ code = marshal.load(f)}} > {{ ValueError: bad marshal data (unknown type code)}}{{During handling of the > above exception, another exception occurred:}}{{Traceback (most recent call > last):}} > {{ File "", line 1, in }} > {{ File "/tmp/pip-build-j3d56a0n/pyspark/setup.py", line 224, in }} > {{ 'Programming Language :: Python :: Implementation :: PyPy']}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 128, > in setup}} > {{ _install_setup_requires(attrs)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 123, > in _install_setup_requires}} > {{ dist.fetch_build_eggs(dist.setup_requires)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 461, in > fetch_build_eggs}} > {{ replace_conflicting=True,}} > {{ File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line > 866, in resolve}} > {{ replace_conflicting=replace_conflicting}} > {{ File
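The traceback bottoms out in `marshal.load()` while setuptools scans a bundled `.pyc` inside the pypandoc egg. Marshal is Python's version-specific bytecode serialization format, so a `.pyc` written by a different interpreter version (or otherwise unreadable bytes) produces exactly this error. A minimal, self-contained reproduction of the failure mode (illustrative only, unrelated to pypandoc itself):

```python
import io
import marshal

# marshal's binary format begins with a one-byte type code; a byte that is
# not a known code (e.g. 0x00) raises the same ValueError seen in the trace.
try:
    marshal.load(io.BytesIO(b"\x00\x00\x00\x00"))
    msg = None
except ValueError as exc:
    msg = str(exc)
print(msg)  # bad marshal data (unknown type code)
```

This is consistent with an environment mismatch (the `.pyc` shipped in the egg vs. the Python 3.7 on Amazon Linux 2) rather than a bug in pyspark's own code, which fits the "Cannot Reproduce" resolution below.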
[jira] [Commented] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
[ https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264510#comment-17264510 ] Hyukjin Kwon commented on SPARK-34100: -- If this is fixed upstream, it's best to identify the ticket and port it back instead of filing a new JIRA to request a backport. I am resolving this ticket for now, but it would be great if we can identify the ticket that fixed this issue.
[jira] [Updated] (SPARK-34097) overflow for datetime datatype when creating stride + JDBC parallel read
[ https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34097: - Target Version/s: (was: 3.0.1)
[jira] [Resolved] (SPARK-34075) Hidden directories are being listed for partition inference
[ https://issues.apache.org/jira/browse/SPARK-34075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34075. -- Fix Version/s: 3.1.1 Resolution: Fixed Issue resolved by pull request 31169 [https://github.com/apache/spark/pull/31169] > Hidden directories are being listed for partition inference > --- > > Key: SPARK-34075 > URL: https://issues.apache.org/jira/browse/SPARK-34075 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Burak Yavuz >Assignee: Gengliang Wang >Priority: Blocker > Fix For: 3.1.1 > > > Marking this as a blocker since it seems to be a regression. We are running > Delta's tests against Spark 3.1 as part of QA here: > [https://github.com/delta-io/delta/pull/579] > > We have noticed that one of our tests regressed with: > {code:java} > java.lang.AssertionError: assertion failed: Conflicting directory structures > detected. Suspicious paths: > [info] > file:/private/var/folders/_2/xn1c9yr11_93wjdk2vkvmwm0gp/t/spark-18706bcc-23ea-4853-b8bc-c4cc2a5ed551 > [info] > file:/private/var/folders/_2/xn1c9yr11_93wjdk2vkvmwm0gp/t/spark-18706bcc-23ea-4853-b8bc-c4cc2a5ed551/_delta_log > [info] > [info] If provided paths are partition directories, please set "basePath" in > the options of the data source to specify the root directory of the table. If > there are multiple root directories, please load them separately and then > union them. 
> [info] at scala.Predef$.assert(Predef.scala:223) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:172) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:104) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:158) > [info] at > org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:73) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50) > [info] at > org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:167) > [info] at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:418) > [info] at > org.apache.spark.sql.execution.datasources.ResolveSQLOnFile$$anonfun$apply$1.applyOrElse(rules.scala:62) > [info] at > org.apache.spark.sql.execution.datasources.ResolveSQLOnFile$$anonfun$apply$1.applyOrElse(rules.scala:45) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108) > [info] at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29) > 
[info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.execution.datasources.ResolveSQLOnFile.apply(rules.scala:45) > [info] at > org.apache.spark.sql.execution.datasources.ResolveSQLOnFile.apply(rules.scala:40) > [info] at > org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216) > [info] at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > [info] at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > [info] at scala.collection.immutable.List.foldLeft(List.scala:89) > [info] at > org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213) > [info] at > org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205) > [info] at scala.collection.immutable.List.foreach(List.scala:392) > [info] at >
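The regression reported above is that the `_delta_log` metadata directory gets swept into partition inference, where its path depth conflicts with that of the data files. A minimal sketch of the idea behind the fix (illustrative Python, not Spark's actual `PartitioningUtils` code): exclude paths containing segments conventionally treated as hidden (leading `_` or `.`) before inferring partitions.

```python
# Illustrative sketch, not Spark's PartitioningUtils: drop paths containing
# hidden segments (leading '_' or '.') before partition inference, so a
# metadata directory such as _delta_log cannot trigger the depth conflict.
paths = [
    "/tmp/table/date=2021-01-14/part-0000.parquet",
    "/tmp/table/_delta_log/00000000.json",
]

def has_hidden_segment(path: str) -> bool:
    return any(seg.startswith(("_", ".")) for seg in path.split("/") if seg)

visible = [p for p in paths if not has_hidden_segment(p)]
print(visible)  # ['/tmp/table/date=2021-01-14/part-0000.parquet']
```

With the metadata path filtered out, only same-depth data directories remain and the "Conflicting directory structures" assertion no longer fires.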
[jira] [Assigned] (SPARK-34075) Hidden directories are being listed for partition inference
[ https://issues.apache.org/jira/browse/SPARK-34075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34075: Assignee: Gengliang Wang > Hidden directories are being listed for partition inference > --- > > Key: SPARK-34075 > URL: https://issues.apache.org/jira/browse/SPARK-34075 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Burak Yavuz >Assignee: Gengliang Wang >Priority: Blocker > > Marking this as a blocker since it seems to be a regression. We are running > Delta's tests against Spark 3.1 as part of QA here: > [https://github.com/delta-io/delta/pull/579] > > We have noticed that one of our tests regressed with: > {code:java} > java.lang.AssertionError: assertion failed: Conflicting directory structures > detected. Suspicious paths: > [info] > file:/private/var/folders/_2/xn1c9yr11_93wjdk2vkvmwm0gp/t/spark-18706bcc-23ea-4853-b8bc-c4cc2a5ed551 > [info] > file:/private/var/folders/_2/xn1c9yr11_93wjdk2vkvmwm0gp/t/spark-18706bcc-23ea-4853-b8bc-c4cc2a5ed551/_delta_log > [info] > [info] If provided paths are partition directories, please set "basePath" in > the options of the data source to specify the root directory of the table. If > there are multiple root directories, please load them separately and then > union them. 
> [info] at scala.Predef$.assert(Predef.scala:223) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:172) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:104) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:158) > [info] at > org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:73) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50) > [info] at > org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:167) > [info] at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:418) > [info] at > org.apache.spark.sql.execution.datasources.ResolveSQLOnFile$$anonfun$apply$1.applyOrElse(rules.scala:62) > [info] at > org.apache.spark.sql.execution.datasources.ResolveSQLOnFile$$anonfun$apply$1.applyOrElse(rules.scala:45) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108) > [info] at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29) > 
[info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.execution.datasources.ResolveSQLOnFile.apply(rules.scala:45) > [info] at > org.apache.spark.sql.execution.datasources.ResolveSQLOnFile.apply(rules.scala:40) > [info] at > org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216) > [info] at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > [info] at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > [info] at scala.collection.immutable.List.foldLeft(List.scala:89) > [info] at > org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213) > [info] at > org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205) > [info] at scala.collection.immutable.List.foreach(List.scala:392) > [info] at > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205) > [info] at >
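The fix presumably needs partition inference to skip hidden paths such as `_delta_log`. A rough Python sketch of the kind of filter involved (illustration only; the exact rules in Spark's `InMemoryFileIndex` may differ, and the metadata-file exceptions here are assumptions):

```python
def should_filter_out(path_name: str) -> bool:
    """Return True for hidden paths that partition inference should skip,
    e.g. _delta_log or .staging, while keeping Parquet summary files."""
    hidden = path_name.startswith("_") or path_name.startswith(".")
    metadata = path_name.startswith("_metadata") or path_name.startswith("_common_metadata")
    return hidden and not metadata

paths = ["year=2021", "_delta_log", ".staging", "_metadata", "part-00000"]
# Only non-hidden paths (plus metadata summary files) survive the listing.
visible = [p for p in paths if not should_filter_out(p)]
```

With a filter like this in place, `_delta_log` would no longer be treated as a suspicious partition directory by `parsePartitions`.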
[jira] [Assigned] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
[ https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34103: Assignee: Dongjoon Hyun > Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x > -- > > Key: SPARK-34103 > URL: https://issues.apache.org/jira/browse/SPARK-34103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
[ https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34103. -- Fix Version/s: 3.0.2 3.1.0 Resolution: Fixed Issue resolved by pull request 31174 [https://github.com/apache/spark/pull/31174] > Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x > -- > > Key: SPARK-34103 > URL: https://issues.apache.org/jira/browse/SPARK-34103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0, 3.0.2 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33557) spark.storage.blockManagerSlaveTimeoutMs default value does not follow spark.network.timeout value when the latter was changed
[ https://issues.apache.org/jira/browse/SPARK-33557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264496#comment-17264496 ] Apache Spark commented on SPARK-33557: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31175 > spark.storage.blockManagerSlaveTimeoutMs default value does not follow > spark.network.timeout value when the latter was changed > -- > > Key: SPARK-33557 > URL: https://issues.apache.org/jira/browse/SPARK-33557 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Ohad >Assignee: Yang Jie >Priority: Minor > Fix For: 3.1.0 > > > According to the documentation, "spark.network.timeout" is the default timeout > for "spark.storage.blockManagerSlaveTimeoutMs", which implies that when the > user sets "spark.network.timeout" the effective value of > "spark.storage.blockManagerSlaveTimeoutMs" should also change if it was > not specifically set. > However, this is not the case, since the default value of > "spark.storage.blockManagerSlaveTimeoutMs" is always the default value of > "spark.network.timeout" (120s). > > "spark.storage.blockManagerSlaveTimeoutMs" is defined in the package object > of "org.apache.spark.internal.config" as follows: > {code:java} > private[spark] val STORAGE_BLOCKMANAGER_SLAVE_TIMEOUT = > ConfigBuilder("spark.storage.blockManagerSlaveTimeoutMs") > .version("0.7.0") > .timeConf(TimeUnit.MILLISECONDS) > .createWithDefaultString(Network.NETWORK_TIMEOUT.defaultValueString) > {code} > So it seems that its default value is indeed "fixed" to the > "spark.network.timeout" default value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
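The distinction at issue here — a default captured once at definition time versus a fallback resolved at read time — can be illustrated in plain Python (a toy model, not Spark's actual `ConfigBuilder` API):

```python
class Conf:
    """Toy config store contrasting a static default with a dynamic fallback."""

    def __init__(self):
        self.settings = {}
        # Static default: the parent's default is copied once at definition
        # time, so it never tracks later changes to the parent key.
        self.static_defaults = {"spark.storage.blockManagerSlaveTimeoutMs": "120s",
                                "spark.network.timeout": "120s"}

    def set(self, key, value):
        self.settings[key] = value

    def get(self, key, fallback_key=None):
        if key in self.settings:
            return self.settings[key]
        if fallback_key is not None:
            # Dynamic fallback: resolved at read time, so it follows the
            # *current* value of the parent key.
            return self.get(fallback_key)
        return self.static_defaults[key]

conf = Conf()
conf.set("spark.network.timeout", "300s")
# Reported behaviour: the static default stays at 120s.
conf.get("spark.storage.blockManagerSlaveTimeoutMs")  # "120s"
# Expected behaviour with a read-time fallback: follows the network timeout.
conf.get("spark.storage.blockManagerSlaveTimeoutMs",
         fallback_key="spark.network.timeout")         # "300s"
```

In Spark itself this corresponds to defining the config with a fallback to `Network.NETWORK_TIMEOUT` rather than copying its default value string.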
[jira] [Resolved] (SPARK-34051) Support 32-bit unicode escape in string literals
[ https://issues.apache.org/jira/browse/SPARK-34051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-34051. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31096 [https://github.com/apache/spark/pull/31096] > Support 32-bit unicode escape in string literals > > > Key: SPARK-34051 > URL: https://issues.apache.org/jira/browse/SPARK-34051 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.2.0 > > > Currently, Spark supports 16-bit unicode escapes like "\u0041" in string > literals. > I think it would be nice if 32-bit unicode escapes were also supported, as PostgreSQL and > modern programming languages do (e.g., C++11, Rust). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
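As an illustration of what a 32-bit escape involves, here is a small Python sketch. The `\UXXXXXXXX` (8 hex digits) syntax shown is an assumption for illustration; the SQL syntax actually accepted is defined by the pull request above:

```python
import re

def decode_32bit_escapes(s: str) -> str:
    """Replace \\UXXXXXXXX-style 8-hex-digit escapes with the corresponding
    code point, covering characters beyond the 16-bit \\uXXXX range."""
    return re.sub(r"\\U([0-9a-fA-F]{8})",
                  lambda m: chr(int(m.group(1), 16)), s)

decode_32bit_escapes(r"\U00000041")  # "A"
decode_32bit_escapes(r"\U0001F600")  # a single code point outside the BMP (U+1F600)
```

The point of the feature is the second case: U+1F600 cannot be written with a single 16-bit `\u` escape.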
[jira] [Resolved] (SPARK-34068) Remove redundant collection conversion in Spark code
[ https://issues.apache.org/jira/browse/SPARK-34068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-34068. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31125 [https://github.com/apache/spark/pull/31125] > Remove redundant collection conversion in Spark code > > > Key: SPARK-34068 > URL: https://issues.apache.org/jira/browse/SPARK-34068 > Project: Spark > Issue Type: Improvement > Components: GraphX, MLlib, Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0 > > > There are some redundant collection conversions that can be removed; for version > compatibility, clean these up with the Scala 2.13 profile. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34068) Remove redundant collection conversion in Spark code
[ https://issues.apache.org/jira/browse/SPARK-34068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-34068: Assignee: Yang Jie > Remove redundant collection conversion in Spark code > > > Key: SPARK-34068 > URL: https://issues.apache.org/jira/browse/SPARK-34068 > Project: Spark > Issue Type: Improvement > Components: GraphX, MLlib, Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > There are some redundant collection conversions that can be removed; for version > compatibility, clean these up with the Scala 2.13 profile. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34105) In addition to killing excluded/flaky executors, we should support decommissioning them
Holden Karau created SPARK-34105: Summary: In addition to killing excluded/flaky executors, we should support decommissioning them Key: SPARK-34105 URL: https://issues.apache.org/jira/browse/SPARK-34105 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.0 Reporter: Holden Karau Decommissioning will give the executor a chance to migrate its files to a more stable node. Note: we want SPARK-34104 to be integrated as well so that flaky executors which cannot decommission are eventually killed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34104) Allow users to specify a maximum decommissioning time
Holden Karau created SPARK-34104: Summary: Allow users to specify a maximum decommissioning time Key: SPARK-34104 URL: https://issues.apache.org/jira/browse/SPARK-34104 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.1.0, 3.2.0, 3.1.1 Reporter: Holden Karau We currently have the ability for users to set the predicted time at which the cluster manager or cloud provider will terminate a decommissioning executor, but for nodes where Spark itself is triggering decommissioning we should add the ability for users to specify a maximum time we want to allow the executor to decommission. This is especially important if we start to use decommissioning in more places (like with excluded executors that are found to be flaky, which may or may not be able to decommission successfully). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
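The proposal amounts to bounding how long an executor may spend decommissioning before it is force-killed. A hedged sketch of that control flow (the helper and its callbacks are hypothetical, not Spark code):

```python
import time

def decommission_with_deadline(migrate_step, max_seconds: float,
                               now=time.monotonic) -> bool:
    """Run migration steps until done or until the user-specified maximum
    decommissioning time elapses. Returns True if migration finished in
    time; on False the caller would force-kill the executor."""
    deadline = now() + max_seconds
    while now() < deadline:
        if migrate_step():  # hypothetical callback: True when all blocks migrated
            return True
    return False
```

A cluster manager loop would call this with the configured maximum, killing executors that cannot finish migrating (e.g. flaky excluded executors) once the deadline passes.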
[jira] [Assigned] (SPARK-34104) Allow users to specify a maximum decommissioning time
[ https://issues.apache.org/jira/browse/SPARK-34104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau reassigned SPARK-34104: Assignee: Holden Karau > Allow users to specify a maximum decommissioning time > - > > Key: SPARK-34104 > URL: https://issues.apache.org/jira/browse/SPARK-34104 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0, 3.2.0, 3.1.1 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > We currently have the ability for users to set the predicted time at which the > cluster manager or cloud provider will terminate a decommissioning executor, > but for nodes where Spark itself is triggering decommissioning we should > add the ability for users to specify a maximum time we want to allow the > executor to decommission. > > This is especially important if we start to use decommissioning in more places (like with > excluded executors that are found to be flaky, which may or may not be able to > decommission successfully). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34104) Allow users to specify a maximum decommissioning time
[ https://issues.apache.org/jira/browse/SPARK-34104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264488#comment-17264488 ] Holden Karau commented on SPARK-34104: -- I'm working on this. > Allow users to specify a maximum decommissioning time > - > > Key: SPARK-34104 > URL: https://issues.apache.org/jira/browse/SPARK-34104 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0, 3.2.0, 3.1.1 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > We currently have the ability for users to set the predicted time at which the > cluster manager or cloud provider will terminate a decommissioning executor, > but for nodes where Spark itself is triggering decommissioning we should > add the ability for users to specify a maximum time we want to allow the > executor to decommission. > > This is especially important if we start to use decommissioning in more places (like with > excluded executors that are found to be flaky, which may or may not be able to > decommission successfully). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23431) Expose the new executor memory metrics at the stage level
[ https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264468#comment-17264468 ] Dongjoon Hyun commented on SPARK-23431: --- Hi, [~Gengliang.Wang]. What is the fixed version of this JIRA? Could you set it please? > Expose the new executor memory metrics at the stage level > - > > Key: SPARK-23431 > URL: https://issues.apache.org/jira/browse/SPARK-23431 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Edward Lu >Assignee: Terry Kim >Priority: Major > > Collect and show the new executor memory metrics for each stage, to provide > more information on how memory is used per stage. > Modify the AppStatusListener to track the peak values for JVM used memory, > execution memory, storage memory, and unified memory for each executor for > each stage. > This is a subtask for SPARK-23206. Please refer to the design doc for that > ticket for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
[ https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34103: -- Comment: was deleted (was: User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31174) > Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x > -- > > Key: SPARK-34103 > URL: https://issues.apache.org/jira/browse/SPARK-34103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API
[ https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264463#comment-17264463 ] Apache Spark commented on SPARK-23429: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31174 > Add executor memory metrics to heartbeat and expose in executors REST API > - > > Key: SPARK-23429 > URL: https://issues.apache.org/jira/browse/SPARK-23429 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Edward Lu >Assignee: Edward Lu >Priority: Major > Fix For: 3.0.0 > > > Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, > offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, > onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the > executors REST API. This information will help provide insight into how > executor and driver JVM memory is used, and for the different memory regions. > It can be used to help determine good values for spark.executor.memory, > spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction. > Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, > offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This > will track the memory usage at the executor level. The new ExecutorMetrics > will be sent by executors to the driver as part of the Heartbeat. A heartbeat > will be added for the driver as well, to collect these metrics for the driver. > Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there > is a new peak value for one of the memory metrics for an executor and stage. > Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize > additional logging. Analysis on a set of sample applications showed an > increase of 0.25% in the size of the Spark history log, with this approach. 
> Modify the AppStatusListener to collect snapshots of peak values for each > memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and > storageMemory, and list of active stages. > Add the new memory metrics (snapshots of peak values for each memory metric) > to the executors REST API. > This is a subtask for SPARK-23206. Please refer to the design doc for that > ticket for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
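The peak-tracking condition described above ("log an ExecutorMetricsUpdate only when a metric reaches a new peak") can be sketched as follows (illustrative Python, not the actual EventLoggingListener/AppStatusListener code):

```python
def update_peaks(peaks: dict, metrics: dict) -> bool:
    """Record new peak values per metric; return True if any metric hit a
    new peak, i.e. the condition under which an update would be logged."""
    new_peak = False
    for name, value in metrics.items():
        if value > peaks.get(name, float("-inf")):
            peaks[name] = value
            new_peak = True
    return new_peak

peaks = {}
update_peaks(peaks, {"jvmUsedMemory": 100, "onHeapExecutionMemory": 40})  # True
update_peaks(peaks, {"jvmUsedMemory": 90, "onHeapExecutionMemory": 40})   # False
update_peaks(peaks, {"jvmUsedMemory": 150, "onHeapExecutionMemory": 40})  # True
```

Logging only on new peaks is what keeps the event-log overhead small (the ~0.25% growth measured above).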
[jira] [Updated] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
[ https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34103: -- Affects Version/s: 3.1.0 3.0.0 3.0.1 > Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x > -- > > Key: SPARK-34103 > URL: https://issues.apache.org/jira/browse/SPARK-34103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
[ https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264459#comment-17264459 ] Apache Spark commented on SPARK-34103: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31174 > Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x > -- > > Key: SPARK-34103 > URL: https://issues.apache.org/jira/browse/SPARK-34103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0, 3.1.1 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
[ https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34103: Assignee: (was: Apache Spark) > Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x > -- > > Key: SPARK-34103 > URL: https://issues.apache.org/jira/browse/SPARK-34103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0, 3.1.1 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
[ https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34103: Assignee: Apache Spark > Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x > -- > > Key: SPARK-34103 > URL: https://issues.apache.org/jira/browse/SPARK-34103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0, 3.1.1 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
Dongjoon Hyun created SPARK-34103: - Summary: Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x Key: SPARK-34103 URL: https://issues.apache.org/jira/browse/SPARK-34103 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.2.0, 3.1.1 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API
[ https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-23429: - Assignee: Edward Lu > Add executor memory metrics to heartbeat and expose in executors REST API > - > > Key: SPARK-23429 > URL: https://issues.apache.org/jira/browse/SPARK-23429 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Edward Lu >Assignee: Edward Lu >Priority: Major > Fix For: 3.0.0 > > > Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, > offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, > onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the > executors REST API. This information will help provide insight into how > executor and driver JVM memory is used, and for the different memory regions. > It can be used to help determine good values for spark.executor.memory, > spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction. > Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, > offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This > will track the memory usage at the executor level. The new ExecutorMetrics > will be sent by executors to the driver as part of the Heartbeat. A heartbeat > will be added for the driver as well, to collect these metrics for the driver. > Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there > is a new peak value for one of the memory metrics for an executor and stage. > Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize > additional logging. Analysis on a set of sample applications showed an > increase of 0.25% in the size of the Spark history log, with this approach. > Modify the AppStatusListener to collect snapshots of peak values for each > memory metric. 
Each snapshot has the time, jvmUsedMemory, executionMemory and > storageMemory, and list of active stages. > Add the new memory metrics (snapshots of peak values for each memory metric) > to the executors REST API. > This is a subtask for SPARK-23206. Please refer to the design doc for that > ticket for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
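The peak-tracking behavior described above (log an ExecutorMetricsUpdate event only when a heartbeat carries a new peak for some metric) can be sketched in a few lines. This is a hypothetical Python model for illustration, not Spark's actual Scala implementation:

```python
# Hypothetical sketch of per-executor peak tracking: keep the maximum
# observed value of each memory metric across heartbeats, and report
# whether any metric reached a new peak (i.e. an event would be logged).

METRIC_NAMES = [
    "jvmUsedMemory",
    "onHeapExecutionMemory", "offHeapExecutionMemory",
    "onHeapStorageMemory", "offHeapStorageMemory",
    "onHeapUnifiedMemory", "offHeapUnifiedMemory",
]

class PeakMemoryTracker:
    """Tracks peak values of executor memory metrics across heartbeats."""

    def __init__(self):
        self.peaks = {name: 0 for name in METRIC_NAMES}

    def update(self, heartbeat_metrics):
        """Merge one heartbeat's metrics; return True if any metric hit
        a new peak (only then would an update event be logged)."""
        new_peak = False
        for name, value in heartbeat_metrics.items():
            if value > self.peaks[name]:
                self.peaks[name] = value
                new_peak = True
        return new_peak

tracker = PeakMemoryTracker()
assert tracker.update({"jvmUsedMemory": 512}) is True   # first value is a peak
assert tracker.update({"jvmUsedMemory": 256}) is False  # below peak: no event
```

Logging only on new peaks is what keeps the reported history-log growth small: most heartbeats do not move any maximum.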
[jira] [Created] (SPARK-34102) Spark SQL cannot escape both \ and other special characters
Noah Kawasaki created SPARK-34102: - Summary: Spark SQL cannot escape both \ and other special characters Key: SPARK-34102 URL: https://issues.apache.org/jira/browse/SPARK-34102 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1, 2.4.5, 2.3.0, 2.2.2, 2.1.3, 2.0.2 Reporter: Noah Kawasaki Spark literal string parsing does not properly escape backslashes or other special characters. This is an extension of this issue: https://issues.apache.org/jira/browse/SPARK-17647# The issue is that depending on how spark.sql.parser.escapedStringLiterals is set, you will either be able to correctly get escaped backslashes in a string literal, but not escaped other special characters, OR, you can have correctly escaped other special characters, but not correctly escaped backslashes. So you have to choose which configuration you care about more. I have tested Spark versions 2.1, 2.2, 2.3, 2.4, and 3.0 and they all experience the issue: {code:java} # These do not return the expected backslash SET spark.sql.parser.escapedStringLiterals=false; SELECT '\\'; > \ (should return \\) SELECT 'hi\hi'; > hihi (should return hi\hi) # These are correctly escaped SELECT '\"'; > " SELECT '\''; > '{code} If I switch this: {code:java} # These now work SET spark.sql.parser.escapedStringLiterals=true; SELECT '\\'; > \\ SELECT 'hi\hi'; > hi\hi # These are now not correctly escaped SELECT '\"'; > \" (should return ") SELECT '\''; > \' (should return ' ){code} So basically we have to choose: SET spark.sql.parser.escapedStringLiterals=false; if we want backslashes correctly escaped but not other special characters SET spark.sql.parser.escapedStringLiterals=true; if we want other special characters correctly escaped but not backslashes -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
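The trade-off reported above can be modeled outside Spark with a tiny Python sketch of the two parser modes. This is an illustration of the described behavior, not Spark's actual parser code:

```python
# Illustrative model (not Spark's parser) of the two escaping modes.
# escapedStringLiterals=false: escape sequences are processed, so the
#   backslash itself is consumed ('\\' -> '\', 'hi\hi' -> 'hihi').
# escapedStringLiterals=true: the literal is taken verbatim, so
#   backslashes survive but '\"' is never resolved to '"'.

def parse_literal(raw, escaped_string_literals):
    if escaped_string_literals:
        return raw  # verbatim: backslashes preserved, escapes untouched
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])  # keep only the escaped character
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)

# escapedStringLiterals=false: special chars unescape, backslashes are lost
assert parse_literal("\\\\", False) == "\\"      # '\\' collapses to '\'
assert parse_literal("hi\\hi", False) == "hihi"  # the lone backslash vanishes
# escapedStringLiterals=true: backslashes survive, escapes do not resolve
assert parse_literal("\\\"", True) == "\\\""     # '\"' stays '\"'
```

Neither branch of the model can produce both behaviors at once, which is exactly the choice the reporter describes.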
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264452#comment-17264452 ] Noah Kawasaki commented on SPARK-17647: --- I can also confirm that this issue is not fully resolved. Like what [~swiegleb] has shown, escape characters are not fully supported. I have tested Spark versions 2.1, 2.2, 2.3, 2.4, and 3.0 and they all experience the issue: {code:java} # These do not return the expected backslash SET spark.sql.parser.escapedStringLiterals=false; SELECT '\\'; > \ (should return \\) SELECT 'hi\hi'; > hihi (should return hi\hi) # These are correctly escaped SELECT '\"'; > " SELECT '\''; > '{code} If I switch this: {code:java} # These now work SET spark.sql.parser.escapedStringLiterals=true; SELECT '\\'; > \\ SELECT 'hi\hi'; > hi\hi # These are now not correctly escaped SELECT '\"'; > \" (should return ") SELECT '\''; > \' (should return ' ){code} So basically we have to choose: SET spark.sql.parser.escapedStringLiterals=false; if we want backslashes correctly escaped but not other special characters SET spark.sql.parser.escapedStringLiterals=true; if we want other special characters correctly escaped but not backslashes > SQL LIKE does not handle backslashes correctly > -- > > Key: SPARK-17647 > URL: https://issues.apache.org/jira/browse/SPARK-17647 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Major > Labels: correctness > Fix For: 2.1.1, 2.2.0 > > > Try the following in SQL shell: > {code} > select '' like '%\\%'; > {code} > It returned false, which is wrong. > cc: [~yhuai] [~joshrosen] > A false-negative considered previously: > {code} > select '' rlike '.*.*'; > {code} > It returned true, which is correct if we assume that the pattern is treated > as a Java string but not raw string. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
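The LIKE semantics at issue can be sketched with a minimal SQL-LIKE-to-regex translation in Python. This is a simplified illustration of the intended behavior (backslash escapes the next pattern character; `%` matches any sequence, `_` one character), not Spark's implementation:

```python
import re

# Minimal sketch of SQL LIKE semantics: '\x' makes x literal,
# '%' -> '.*', '_' -> '.', everything else is matched literally.

def sql_like(value, pattern):
    regex, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            regex.append(re.escape(pattern[i + 1]))  # escaped literal char
            i += 2
        elif c == "%":
            regex.append(".*")
            i += 1
        elif c == "_":
            regex.append(".")
            i += 1
        else:
            regex.append(re.escape(c))
            i += 1
    return re.fullmatch("".join(regex), value) is not None

# A pattern containing an escaped backslash (the four characters % \ \ %)
# should match any value containing a literal backslash:
assert sql_like("a\\b", "%\\\\%") is True
assert sql_like("ab", "%\\\\%") is False
```

Under these semantics, a value containing a backslash matched against `'%\\%'` should return true, which is why the original false result was flagged as a correctness bug.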
[jira] [Commented] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command
[ https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264451#comment-17264451 ] Apache Spark commented on SPARK-34101: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31173 > Make spark-sql CLI configurable for the behavior of printing header by SET > command > -- > > Key: SPARK-34101 > URL: https://issues.apache.org/jira/browse/SPARK-34101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Like Hive CLI, spark-sql CLI accepts hive.cli.print.header property and we > can change the behavior of printing header. > But spark-sql CLI doesn't allow users to change Hive specific configurations > dynamically by SET command. > So, it's better to support the way to change the behavior by SET command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command
[ https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34101: Assignee: Kousuke Saruta (was: Apache Spark) > Make spark-sql CLI configurable for the behavior of printing header by SET > command > -- > > Key: SPARK-34101 > URL: https://issues.apache.org/jira/browse/SPARK-34101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Like Hive CLI, spark-sql CLI accepts hive.cli.print.header property and we > can change the behavior of printing header. > But spark-sql CLI doesn't allow users to change Hive specific configurations > dynamically by SET command. > So, it's better to support the way to change the behavior by SET command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command
[ https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34101: Assignee: Apache Spark (was: Kousuke Saruta) > Make spark-sql CLI configurable for the behavior of printing header by SET > command > -- > > Key: SPARK-34101 > URL: https://issues.apache.org/jira/browse/SPARK-34101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > Like Hive CLI, spark-sql CLI accepts hive.cli.print.header property and we > can change the behavior of printing header. > But spark-sql CLI doesn't allow users to change Hive specific configurations > dynamically by SET command. > So, it's better to support the way to change the behavior by SET command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command
[ https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264450#comment-17264450 ] Apache Spark commented on SPARK-34101: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31173 > Make spark-sql CLI configurable for the behavior of printing header by SET > command > -- > > Key: SPARK-34101 > URL: https://issues.apache.org/jira/browse/SPARK-34101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Like Hive CLI, spark-sql CLI accepts hive.cli.print.header property and we > can change the behavior of printing header. > But spark-sql CLI doesn't allow users to change Hive specific configurations > dynamically by SET command. > So, it's better to support the way to change the behavior by SET command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command
[ https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-34101: --- Description: Like Hive CLI, spark-sql CLI accepts hive.cli.print.header property and we can change the behavior of printing header. But spark-sql CLI doesn't allow users to change Hive specific configurations dynamically by SET command. So, it's better to support the way to change the behavior by SET command. was: Like Hive CLI, spark-sql CLI accept hive.cli.print.header property and we can change the behavior of printing header. But spark-sql CLI doesn't allow users to change Hive specific configurations dynamically by SET command. So, it's better to support the way to change the behavior by SET command. > Make spark-sql CLI configurable for the behavior of printing header by SET > command > -- > > Key: SPARK-34101 > URL: https://issues.apache.org/jira/browse/SPARK-34101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Like Hive CLI, spark-sql CLI accepts hive.cli.print.header property and we > can change the behavior of printing header. > But spark-sql CLI doesn't allow users to change Hive specific configurations > dynamically by SET command. > So, it's better to support the way to change the behavior by SET command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command
[ https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-34101: --- Description: Like Hive CLI, spark-sql CLI accept hive.cli.print.header property and we can change the behavior of printing header. But spark-sql CLI doesn't allow users to change Hive specific configurations dynamically by SET command. So, it's better to support the way to change the behavior by SET command. was: Like Hive CLI, spark-sql CLI accept hive.cli.print.header property and we can change the behavior of printing header. But spark-sql CLI doesn't allow users to change Hive specific configurations dynamically by SET command. So, it's better to support the way to change the behavior by SET command. > Make spark-sql CLI configurable for the behavior of printing header by SET > command > -- > > Key: SPARK-34101 > URL: https://issues.apache.org/jira/browse/SPARK-34101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Like Hive CLI, spark-sql CLI accept hive.cli.print.header property and we can > change the behavior of printing header. > But spark-sql CLI doesn't allow users to change Hive specific configurations > dynamically by SET command. > So, it's better to support the way to change the behavior by SET command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command
Kousuke Saruta created SPARK-34101: -- Summary: Make spark-sql CLI configurable for the behavior of printing header by SET command Key: SPARK-34101 URL: https://issues.apache.org/jira/browse/SPARK-34101 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Like Hive CLI, spark-sql CLI accept hive.cli.print.header property and we can change the behavior of printing header. But spark-sql CLI doesn't allow users to change Hive specific configurations dynamically by SET command. So, it's better to support the way to change the behavior by SET command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
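The intended usage can be illustrated with a hypothetical spark-sql CLI session. The property name `hive.cli.print.header` is the real Hive CLI property; whether SET takes effect dynamically depends on the linked change being merged:

```sql
-- Illustrative spark-sql CLI session (assumes the proposed change):
-- toggle header printing at runtime instead of only at launch time.
SET hive.cli.print.header=true;
SELECT 1 AS col_a, 2 AS col_b;
-- subsequent query results now include the column-header row

SET hive.cli.print.header=false;
-- results are printed without the header row again
```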
[jira] [Commented] (SPARK-19169) columns changed orc table encounters 'IndexOutOfBoundsException' when reading the old schema files
[ https://issues.apache.org/jira/browse/SPARK-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264325#comment-17264325 ] Dongjoon Hyun commented on SPARK-19169: --- [~angerszhuuu]. Given the context, this looks like one of the ancient issues in the code between Hive and ORC. Please use the `convertMetastoreOrc` option as a workaround if you still see the issue with Apache Spark 2.3.2. I added a native ORC reader to Spark to avoid that kind of Hive ORC issue. BTW, both Apache Spark 2.3.2 and its Apache ORC 1.4.4 are EOL versions. I'd recommend upgrading to the latest versions. If there is a real issue, it would be great if we could have a reproducible example with Apache Spark 3.1.0 RC1. > columns changed orc table encounters 'IndexOutOfBoundsException' when reading > the old schema files > - > > Key: SPARK-19169 > URL: https://issues.apache.org/jira/browse/SPARK-19169 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: roncenzhao >Priority: Major > > We have an orc table called orc_test_tbl and have inserted some data into it. > After that, we changed the table schema by dropping some columns. > When reading the old schema files, we get the following exception. 
> ``` > java.lang.IndexOutOfBoundsException: toIndex = 65 > at java.util.ArrayList.subListRangeCheck(ArrayList.java:962) > at java.util.ArrayList.subList(ArrayList.java:954) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.getSchemaOnRead(RecordReaderFactory.java:161) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.createTreeReader(RecordReaderFactory.java:66) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:202) > at > org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:539) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:183) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:226) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:437) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1215) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1113) > at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:245) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)
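The workaround suggested in the comment above looks like the following in a SQL session. `spark.sql.hive.convertMetastoreOrc` is the real configuration key behind the `convertMetastoreOrc` option; it routes reads of Hive metastore ORC tables through Spark's native ORC reader instead of the Hive ORC reader shown in the stack trace:

```sql
-- Workaround: use Spark's native ORC reader for metastore ORC tables,
-- bypassing the Hive ORC code path that throws IndexOutOfBoundsException
-- on files written with the old (wider) schema.
SET spark.sql.hive.convertMetastoreOrc=true;
SELECT * FROM orc_test_tbl;
```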
[jira] [Comment Edited] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
[ https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264309#comment-17264309 ] Devin Boyer edited comment on SPARK-34100 at 1/13/21, 5:43 PM: --- Noting that I found a workaround here: it appears that this is due to [an issue with the version of the setuptools|https://stackoverflow.com/a/55167875/316079] package bundled into the Python distribution with Amazon Linux 2, and the "wheel" library not being installed. If this command is run on an Amazon Linux 2 installation with Python 3.7 installed, then pyspark 2.4.x package installation succeeds: {{pip3 install --upgrade --force-reinstall setuptools && pip3 install wheel}} I noticed this doesn't happen with 3.0.x package versions, so maybe there's a difference in how the package is distributed between 2.4 and 3.x? was (Author: drboyer): Noting that I found a workaround here: it appears that this is due to an issue with the version of the setuptools package bundled into the Python distribution with Amazon Linux 2, and the "wheel" library not being installed. If this command is run on an Amazon Linux 2 installation with Python 3.7 installed, then pyspark 2.4.x package installation succeeds: {{pip3 install --upgrade --force-reinstall setuptools && pip3 install wheel}} I noticed this doesn't happen with 3.0.x package versions, so maybe there's a difference in how the package is distributed between 2.4 and 3.x? 
> pyspark 2.4 packages can't be installed via pip on Amazon Linux 2 > - > > Key: SPARK-34100 > URL: https://issues.apache.org/jira/browse/SPARK-34100 > Project: Spark > Issue Type: Bug > Components: Deploy, PySpark >Affects Versions: 2.4.7 > Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also > tested with pip 20.3.3), using Docker or EMR 5.32.0 > > Example Dockerfile to reproduce: > {{FROM amazonlinux:2}} > {{RUN yum install -y python3}} > {{RUN pip3 install pyspark==2.4.7}} > >Reporter: Devin Boyer >Priority: Minor > > I'm unable to install the pyspark Python package on Amazon Linux 2, whether > in a Docker image or an EMR cluster. Amazon Linux 2 currently ships with > Python 3.7 and pip 9.0.3, but upgrading pip yields the same result. > > When installing the package, the installation will fail with the error > "ValueError: bad marshal data (unknown type code)". Full example stack below. > > This bug prevents use of pyspark for simple testing environments, and from > using tools where the pyspark package is a dependency, like > [https://github.com/awslabs/python-deequ.] > > Stack Trace: > {{Step 3/3 : RUN pip3 install pyspark==2.4.7}} > {{ ---> Running in 2c6e1c1de62f}} > {{WARNING: Running pip install with root privileges is generally not a good > idea. 
Try `pip3 install --user` instead.}} > {{Collecting pyspark==2.4.7}} > {{ Downloading > https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz > (217.9MB)}} > {{ Complete output from command python setup.py egg_info:}} > {{ Could not import pypandoc - required to package PySpark}} > {{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown > distribution option: 'long_description_content_type'}} > {{ warnings.warn(msg)}} > {{ zip_safe flag not set; analyzing archive contents...}} > {{ Traceback (most recent call last):}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, > in save_modules}} > {{ yield saved}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, > in setup_context}} > {{ yield}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, > in run_setup}} > {{ _execfile(setup_script, ns)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in > _execfile}} > {{ exec(code, globals, locals)}} > {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in > }} > {{ # using Python imports instead which will be resolved correctly.}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, > in setup}} > {{ return distutils.core.setup(**attrs)}} > {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}} > {{ dist.run_commands()}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}} > {{ self.run_command(cmd)}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}} > {{ cmd_obj.run()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 218, in run}} > {{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 269, in zip_safe}}
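Applying the workaround from the comment above to the reporter's reproduction Dockerfile gives a build that is expected to succeed (illustrative; the key addition is the setuptools/wheel line before the pyspark install):

```dockerfile
# Reproduction Dockerfile with the reported workaround applied:
# refresh setuptools and install wheel before installing pyspark 2.4.x.
FROM amazonlinux:2
RUN yum install -y python3
RUN pip3 install --upgrade --force-reinstall setuptools && pip3 install wheel
RUN pip3 install pyspark==2.4.7
```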
[jira] [Commented] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
[ https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264309#comment-17264309 ] Devin Boyer commented on SPARK-34100: - Noting that I found a workaround here: it appears that this is due to an issue with the version of the setuptools package bundled into the Python distribution with Amazon Linux 2, and the "wheel" library not being installed. If this command is run on an Amazon Linux 2 installation with Python 3.7 installed, then pyspark 2.4.x package installation succeeds: {{pip3 install --upgrade --force-reinstall setuptools && pip3 install wheel}} I noticed this doesn't happen with 3.0.x package versions, so maybe there's a difference in how the package is distributed between 2.4 and 3.x? > pyspark 2.4 packages can't be installed via pip on Amazon Linux 2 > - > > Key: SPARK-34100 > URL: https://issues.apache.org/jira/browse/SPARK-34100 > Project: Spark > Issue Type: Bug > Components: Deploy, PySpark >Affects Versions: 2.4.7 > Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also > tested with pip 20.3.3), using Docker or EMR 5.32.0 > > Example Dockerfile to reproduce: > {{FROM amazonlinux:2}} > {{RUN yum install -y python3}} > {{RUN pip3 install pyspark==2.4.7}} > >Reporter: Devin Boyer >Priority: Minor > > I'm unable to install the pyspark Python package on Amazon Linux 2, whether > in a Docker image or an EMR cluster. Amazon Linux 2 currently ships with > Python 3.7 and pip 9.0.3, but upgrading pip yields the same result. > > When installing the package, the installation will fail with the error > "ValueError: bad marshal data (unknown type code)". Full example stack below. > > This bug prevents use of pyspark for simple testing environments, and from > using tools where the pyspark package is a dependency, like > [https://github.com/awslabs/python-deequ.] 
> > Stack Trace: > {{Step 3/3 : RUN pip3 install pyspark==2.4.7}} > {{ ---> Running in 2c6e1c1de62f}} > {{WARNING: Running pip install with root privileges is generally not a good > idea. Try `pip3 install --user` instead.}} > {{Collecting pyspark==2.4.7}} > {{ Downloading > https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz > (217.9MB)}} > {{ Complete output from command python setup.py egg_info:}} > {{ Could not import pypandoc - required to package PySpark}} > {{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown > distribution option: 'long_description_content_type'}} > {{ warnings.warn(msg)}} > {{ zip_safe flag not set; analyzing archive contents...}} > {{ Traceback (most recent call last):}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, > in save_modules}} > {{ yield saved}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, > in setup_context}} > {{ yield}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, > in run_setup}} > {{ _execfile(setup_script, ns)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in > _execfile}} > {{ exec(code, globals, locals)}} > {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in > }} > {{ # using Python imports instead which will be resolved correctly.}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, > in setup}} > {{ return distutils.core.setup(**attrs)}} > {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}} > {{ dist.run_commands()}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}} > {{ self.run_command(cmd)}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}} > {{ cmd_obj.run()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 218, in run}} > {{ 
os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 269, in zip_safe}} > {{ return analyze_egg(self.bdist_dir, self.stubs)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 379, in analyze_egg}} > {{ safe = scan_module(egg_dir, base, name, stubs) and safe}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 416, in scan_module}} > {{ code = marshal.load(f)}} > {{ ValueError: bad marshal data (unknown type code)}}{{During handling of the > above exception, another exception occurred:}}{{Traceback (most recent call > last):}} > {{ File "", line 1, in }} > {{ File "/tmp/pip-build-j3d56a0n/pyspark/setup.py", line 224, in }} > {{ 'Programming Language :: Python :: Implementation :: PyPy']}} > {{ File
[jira] [Commented] (SPARK-32333) Drop references to Master
[ https://issues.apache.org/jira/browse/SPARK-32333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264303#comment-17264303 ] Neil Shah-Quinn commented on SPARK-32333: - I'm glad there's a plan to improve this language! For what it's worth, I like "Scheduler" or "Coordinator". They're short and accurately reflect that (as I understand it) its role is simply to assign executors which then communicate directly with the driver program. > Drop references to Master > - > > Key: SPARK-32333 > URL: https://issues.apache.org/jira/browse/SPARK-32333 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > We have a lot of references to "master" in the code base. It will be > beneficial to remove references to problematic language that can alienate > potential community members. > SPARK-32004 removed references to slave > > Here is a IETF draft to fix up some of the most egregious examples > (master/slave, whitelist/backlist) with proposed alternatives. > https://tools.ietf.org/id/draft-knodel-terminology-00.html#rfc.section.1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
[jira] [Created] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2

Devin Boyer created SPARK-34100:
-----------------------------------
Summary: pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
Key: SPARK-34100
URL: https://issues.apache.org/jira/browse/SPARK-34100
Project: Spark
Issue Type: Bug
Components: Deploy, PySpark
Affects Versions: 2.4.7
Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also tested with pip 20.3.3), using Docker or EMR 5.32.0
Reporter: Devin Boyer

Example Dockerfile to reproduce:
{noformat}
FROM amazonlinux:2
RUN yum install -y python3
RUN pip3 install pyspark==2.4.7
{noformat}

I'm unable to install the pyspark Python package on Amazon Linux 2, whether in a Docker image or on an EMR cluster. Amazon Linux 2 currently ships with Python 3.7 and pip 9.0.3; upgrading pip yields the same result. The installation fails with the error "ValueError: bad marshal data (unknown type code)". The full stack trace is below.

This bug prevents using pyspark in simple testing environments, and blocks tools that depend on the pyspark package, such as [https://github.com/awslabs/python-deequ].

Stack Trace:
{noformat}
Step 3/3 : RUN pip3 install pyspark==2.4.7
 ---> Running in 2c6e1c1de62f
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting pyspark==2.4.7
  Downloading https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz (217.9MB)
  Complete output from command python setup.py egg_info:
  Could not import pypandoc - required to package PySpark
  /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
  zip_safe flag not set; analyzing archive contents...
  Traceback (most recent call last):
    File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, in save_modules
      yield saved
    File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, in setup_context
      yield
    File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, in run_setup
      _execfile(setup_script, ns)
    File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in _execfile
      exec(code, globals, locals)
    File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in <module>
      # using Python imports instead which will be resolved correctly.
    File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, in setup
      return distutils.core.setup(**attrs)
    File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 218, in run
      os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()
    File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 269, in zip_safe
      return analyze_egg(self.bdist_dir, self.stubs)
    File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 379, in analyze_egg
      safe = scan_module(egg_dir, base, name, stubs) and safe
    File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 416, in scan_module
      code = marshal.load(f)
  ValueError: bad marshal data (unknown type code)

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-build-j3d56a0n/pyspark/setup.py", line 224, in <module>
      'Programming Language :: Python :: Implementation :: PyPy']
    File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 128, in setup
      _install_setup_requires(attrs)
    File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 123, in _install_setup_requires
      dist.fetch_build_eggs(dist.setup_requires)
    File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 461, in fetch_build_eggs
      replace_conflicting=True,
    File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 866, in resolve
      replace_conflicting=replace_conflicting
    File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1146, in best_match
      return self.obtain(req, installer)
    File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1158, in obtain
      return installer(requirement)
    File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 528, in fetch_build_egg
      return cmd.easy_install(req)
    File
{noformat}
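The second traceback shows where things go wrong: pyspark's setup.py lists pypandoc in `setup_requires`, so setuptools fetches the pypandoc sdist via its easy_install path and the failure fires while `bdist_egg` marshal-scans that fetched archive. A commonly suggested workaround (not confirmed in this ticket, and the `1.4` pin below is a guess) is to pre-install pypandoc with pip so the requirement is already satisfied and the easy_install fetch is skipped entirely. A sketch on the same base image:

```dockerfile
FROM amazonlinux:2
RUN yum install -y python3
# Hypothetical workaround: satisfy pyspark's setup_requires up front so
# setuptools never fetches and marshal-scans the pypandoc sdist itself.
RUN pip3 install pypandoc==1.4
RUN pip3 install pyspark==2.4.7
```

If this works, it narrows the bug to the pypandoc archive that easy_install fetches rather than to pyspark's own tarball.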
[jira] [Assigned] (SPARK-34070) Replaces find and emptiness check with exists.
[ https://issues.apache.org/jira/browse/SPARK-34070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen reassigned SPARK-34070:
------------------------------------
Assignee: Yang Jie

> Replaces find and emptiness check with exists.
> ----------------------------------------------
>
> Key: SPARK-34070
> URL: https://issues.apache.org/jira/browse/SPARK-34070
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.2.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Trivial
>
> Semantic consistency and code simplification.
>
> Before:
> {code:java}
> seq.find(p).isDefined
> or
> seq.find(p).isEmpty
> {code}
> After:
> {code:java}
> seq.exists(p)
> or
> !seq.exists(p)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
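The before/after pair in the description holds for any predicate: `find` traverses until the first match and allocates an `Option` just so the caller can test it, while `exists` answers the same yes/no question directly. A minimal illustrative sketch (not code from the patch itself):

```scala
object ExistsDemo extends App {
  val seq = Seq(1, 2, 3, 4)
  val p: Int => Boolean = _ % 2 == 0

  // find(p).isDefined builds an Option only to discard it;
  // exists(p) short-circuits on the first match and returns the Boolean.
  assert(seq.find(p).isDefined == seq.exists(p))
  assert(seq.find(p).isEmpty == !seq.exists(p))
}
```

Both forms short-circuit, so the rewrite is purely about intent and avoiding the intermediate `Option`.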
[jira] [Resolved] (SPARK-34070) Replaces find and emptiness check with exists.
[ https://issues.apache.org/jira/browse/SPARK-34070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-34070.
----------------------------------
Fix Version/s: 3.2.0
Resolution: Fixed

Issue resolved by pull request 31130
[https://github.com/apache/spark/pull/31130]

> Replaces find and emptiness check with exists.
> ----------------------------------------------
>
> Key: SPARK-34070
> URL: https://issues.apache.org/jira/browse/SPARK-34070
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.2.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Trivial
> Fix For: 3.2.0
[jira] [Updated] (SPARK-34070) Replaces find and emptiness check with exists.
[ https://issues.apache.org/jira/browse/SPARK-34070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen updated SPARK-34070:
---------------------------------
Priority: Trivial (was: Minor)

> Replaces find and emptiness check with exists.
> ----------------------------------------------
>
> Key: SPARK-34070
> URL: https://issues.apache.org/jira/browse/SPARK-34070
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.2.0
> Reporter: Yang Jie
> Priority: Trivial