[jira] [Updated] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3

2021-01-13 Thread Shashank Pedamallu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashank Pedamallu updated SPARK-34107:
---
Attachment: SHS_Profiling_Sorted.csv

> Spark History not loading when service has to load 300k applications 
> initially from S3
> --
>
> Key: SPARK-34107
> URL: https://issues.apache.org/jira/browse/SPARK-34107
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Shashank Pedamallu
>Priority: Major
> Attachments: SHS_Profiling_Sorted.csv, blank_shs.png
>
>
> The Spark History Server has trouble loading when it initially has to load
> 300k+ applications from S3. Details and snapshots follow:
> Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity)
> {noformat}
> spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
> | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
>   305571
> spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
> {noformat}
> Logs when starting SparkHistory:
> {noformat}
> root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
>  
> /go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh
>  --properties-file /etc/spark-history-config/shs-default.properties
>  2021/01/14 02:40:28 Spark spark wrapper is disabled
>  2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0
>  2021/01/14 02:40:28 Statsd disabled
>  2021/01/14 02:40:28 Debug log: /tmp/.log
>  2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 
> Mozart 0
>  2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with 
> arguments [org.apache.spark.deploy.history.HistoryServer --properties-file 
> /etc/spark-history-config/shs-default.properties]
>  21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 
> 2077@shs-with-statsd-86d7f54679-t8fqr
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT
>  21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>  21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root
>  21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root
>  21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to:
>  21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to:
>  21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(root); groups 
> with view permissions: Set(); users with modify permissions: Set(root); 
> groups with modify permissions: Set()
>  21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; 
> users with admin permissions: ; groups with admin permissions
>  21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
>  21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period 
> at 10 second(s).
>  21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system 
> started
>  21/01/14 02:40:31 INFO log: Logging initialized @1933ms to 
> org.sparkproject.jetty.util.log.Slf4jLog
>  21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 
> 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 
> 1.8.0_242-b08
>  21/01/14 02:40:31 INFO Server: Started @1999ms
>  21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f
> {HTTP/1.1,[http/1.1]} {0.0.0.0:18080}
> 21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080.
>  21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@b9dfc5a
> {/,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@1bbae752
> {/json,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@5cf87cfd
> {/api,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@74971ed9
> {/static,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@1542af63
> {/history,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and 
> started at http://shs-with-statsd-86d7f54679-t8fqr:18080
>  21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 
> seconds
>  21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning 
> 
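For context, a rough sketch of the kind of bulk listing FsHistoryProvider has to perform
against {{spark.history.fs.logDirectory}} on each scan. The bucket and prefix below are
placeholders (not the anonymized values above), and it assumes the S3A connector
(hadoop-aws) and credentials are available on the classpath:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ListEventLogs {
  def main(args: Array[String]): Unit = {
    // Placeholder log directory; substitute the real spark.history.fs.logDirectory value.
    val logDir = new Path("s3a://example-bucket/spark-history-fs-logDirectory/")
    val fs = FileSystem.get(logDir.toUri, new Configuration())

    val start = System.nanoTime()
    val entries = fs.listStatus(logDir) // one bulk listing, roughly what each scan starts with
    val seconds = (System.nanoTime() - start) / 1e9

    println(f"Listed ${entries.length} entries in $seconds%.1f s")
  }
}
{code}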

[jira] [Commented] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3

2021-01-13 Thread Shashank Pedamallu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264672#comment-17264672
 ] 

Shashank Pedamallu commented on SPARK-34107:


Screenshot of the Spark History UI:

!blank_shs.png!

 

 

Also, please find attached the dynamic tracing analysis (using
[btrace|https://github.com/btraceio/btrace]): [^SHS_Profiling_Sorted.csv]

> Spark History not loading when service has to load 300k applications 
> initially from S3
> --
>
> Key: SPARK-34107
> URL: https://issues.apache.org/jira/browse/SPARK-34107
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Shashank Pedamallu
>Priority: Major
> Attachments: SHS_Profiling_Sorted.csv, blank_shs.png
>
>
> The Spark History Server has trouble loading when it initially has to load
> 300k+ applications from S3. Details and snapshots follow:
> Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity)
> {noformat}
> spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
> | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
>   305571
> spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
> {noformat}
> Logs when starting SparkHistory:
> {noformat}
> root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
>  
> /go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh
>  --properties-file /etc/spark-history-config/shs-default.properties
>  2021/01/14 02:40:28 Spark spark wrapper is disabled
>  2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0
>  2021/01/14 02:40:28 Statsd disabled
>  2021/01/14 02:40:28 Debug log: /tmp/.log
>  2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 
> Mozart 0
>  2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with 
> arguments [org.apache.spark.deploy.history.HistoryServer --properties-file 
> /etc/spark-history-config/shs-default.properties]
>  21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 
> 2077@shs-with-statsd-86d7f54679-t8fqr
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT
>  21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>  21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root
>  21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root
>  21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to:
>  21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to:
>  21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(root); groups 
> with view permissions: Set(); users with modify permissions: Set(root); 
> groups with modify permissions: Set()
>  21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; 
> users with admin permissions: ; groups with admin permissions
>  21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
>  21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period 
> at 10 second(s).
>  21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system 
> started
>  21/01/14 02:40:31 INFO log: Logging initialized @1933ms to 
> org.sparkproject.jetty.util.log.Slf4jLog
>  21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 
> 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 
> 1.8.0_242-b08
>  21/01/14 02:40:31 INFO Server: Started @1999ms
>  21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f
> {HTTP/1.1,[http/1.1]} {0.0.0.0:18080}
> 21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080.
>  21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@b9dfc5a
> {/,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@1bbae752
> {/json,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@5cf87cfd
> {/api,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@74971ed9
> {/static,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@1542af63
> {/history,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, 

[jira] [Updated] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3

2021-01-13 Thread Shashank Pedamallu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashank Pedamallu updated SPARK-34107:
---
Attachment: blank_shs.png

> Spark History not loading when service has to load 300k applications 
> initially from S3
> --
>
> Key: SPARK-34107
> URL: https://issues.apache.org/jira/browse/SPARK-34107
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Shashank Pedamallu
>Priority: Major
> Attachments: blank_shs.png
>
>
> The Spark History Server has trouble loading when it initially has to load
> 300k+ applications from S3. Details and snapshots follow:
> Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity)
> {noformat}
> spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
> | => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
>   305571
> spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
> {noformat}
> Logs when starting SparkHistory:
> {noformat}
> root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
>  
> /go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh
>  --properties-file /etc/spark-history-config/shs-default.properties
>  2021/01/14 02:40:28 Spark spark wrapper is disabled
>  2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0
>  2021/01/14 02:40:28 Statsd disabled
>  2021/01/14 02:40:28 Debug log: /tmp/.log
>  2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 
> Mozart 0
>  2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with 
> arguments [org.apache.spark.deploy.history.HistoryServer --properties-file 
> /etc/spark-history-config/shs-default.properties]
>  21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 
> 2077@shs-with-statsd-86d7f54679-t8fqr
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP
>  21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT
>  21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>  21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root
>  21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root
>  21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to:
>  21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to:
>  21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(root); groups 
> with view permissions: Set(); users with modify permissions: Set(root); 
> groups with modify permissions: Set()
>  21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; 
> users with admin permissions: ; groups with admin permissions
>  21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
>  21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period 
> at 10 second(s).
>  21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system 
> started
>  21/01/14 02:40:31 INFO log: Logging initialized @1933ms to 
> org.sparkproject.jetty.util.log.Slf4jLog
>  21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 
> 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 
> 1.8.0_242-b08
>  21/01/14 02:40:31 INFO Server: Started @1999ms
>  21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f
> {HTTP/1.1,[http/1.1]} {0.0.0.0:18080}
> 21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080.
>  21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@b9dfc5a
> {/,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@1bbae752
> {/json,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@5cf87cfd
> {/api,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@74971ed9
> {/static,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO ContextHandler: Started 
> o.s.j.s.ServletContextHandler@1542af63
> {/history,null,AVAILABLE,@Spark}
> 21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and 
> started at http://shs-with-statsd-86d7f54679-t8fqr:18080
>  21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 
> seconds
>  21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning 
> 

[jira] [Assigned] (SPARK-34096) Improve performance for nth_value ignore nulls over offset window

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34096:


Assignee: Apache Spark

> Improve performance for nth_value ignore nulls over offset window
> -
>
> Key: SPARK-34096
> URL: https://issues.apache.org/jira/browse/SPARK-34096
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> The current
> {code:java}
> UnboundedOffsetWindowFunctionFrame
> {code}
> and
> {code:java}
> UnboundedPrecedingOffsetWindowFunctionFrame
> {code}
> only support nth_value that respects nulls, so nth_value with ignore nulls executes
> {code:java}
> updateExpressions
> {code}
> multiple times.
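For illustration, a minimal spark-shell sketch of the query shape this affects (data and
column names are made up; assumes Spark 3.1+, where nth_value with IGNORE NULLS is
available):

{code:scala}
// nth_value with IGNORE NULLS over an unbounded window frame; per the description
// above, this is the case that cannot yet use the offset window function frames.
spark.sql("""
  SELECT id,
         nth_value(v, 2) IGNORE NULLS OVER (
           ORDER BY id
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
         ) AS second_non_null_v
  FROM VALUES (1, CAST(NULL AS INT)), (2, 7), (3, 9) AS t(id, v)
""").show()
{code}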






[jira] [Assigned] (SPARK-34096) Improve performance for nth_value ignore nulls over offset window

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34096:


Assignee: (was: Apache Spark)

> Improve performance for nth_value ignore nulls over offset window
> -
>
> Key: SPARK-34096
> URL: https://issues.apache.org/jira/browse/SPARK-34096
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current
> {code:java}
> UnboundedOffsetWindowFunctionFrame
> {code}
> and
> {code:java}
> UnboundedPrecedingOffsetWindowFunctionFrame
> {code}
> only support nth_value that respects nulls, so nth_value with ignore nulls executes
> {code:java}
> updateExpressions
> {code}
> multiple times.






[jira] [Commented] (SPARK-34096) Improve performance for nth_value ignore nulls over offset window

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264649#comment-17264649
 ] 

Apache Spark commented on SPARK-34096:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/31178

> Improve performance for nth_value ignore nulls over offset window
> -
>
> Key: SPARK-34096
> URL: https://issues.apache.org/jira/browse/SPARK-34096
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current
> {code:java}
> UnboundedOffsetWindowFunctionFrame
> {code}
> and
> {code:java}
> UnboundedPrecedingOffsetWindowFunctionFrame
> {code}
> only support nth_value that respects nulls, so nth_value with ignore nulls executes
> {code:java}
> updateExpressions
> {code}
> multiple times.






[jira] [Commented] (SPARK-34110) Upgrade ZooKeeper to 3.6.2

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264639#comment-17264639
 ] 

Apache Spark commented on SPARK-34110:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31177

> Upgrade ZooKeeper to 3.6.2
> --
>
> Key: SPARK-34110
> URL: https://issues.apache.org/jira/browse/SPARK-34110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> When running Spark on JDK 14:
> {noformat}
> 21/01/13 20:25:32,533 WARN 
> [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
> zookeeper.ClientCnxn:1164 : Session 0x0 for server 
> apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
> socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address 
> carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
> resolvable
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> {noformat}
> Please see ZOOKEEPER-3779 for more details.
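As a quick check before and after such an upgrade, the ZooKeeper client version actually on
the classpath can be printed from spark-shell ({{org.apache.zookeeper.Version}} ships inside
the ZooKeeper jar):

{code:scala}
// Prints the full version/build string of the bundled ZooKeeper client.
println(org.apache.zookeeper.Version.getFullVersion())
{code}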






[jira] [Assigned] (SPARK-34110) Upgrade ZooKeeper to 3.6.2

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34110:


Assignee: (was: Apache Spark)

> Upgrade ZooKeeper to 3.6.2
> --
>
> Key: SPARK-34110
> URL: https://issues.apache.org/jira/browse/SPARK-34110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> When running Spark on JDK 14:
> {noformat}
> 21/01/13 20:25:32,533 WARN 
> [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
> zookeeper.ClientCnxn:1164 : Session 0x0 for server 
> apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
> socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address 
> carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
> resolvable
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> {noformat}
> Please see ZOOKEEPER-3779 for more details.






[jira] [Assigned] (SPARK-34110) Upgrade ZooKeeper to 3.6.2

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34110:


Assignee: Apache Spark

> Upgrade ZooKeeper to 3.6.2
> --
>
> Key: SPARK-34110
> URL: https://issues.apache.org/jira/browse/SPARK-34110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> When running Spark on JDK 14:
> {noformat}
> 21/01/13 20:25:32,533 WARN 
> [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
> zookeeper.ClientCnxn:1164 : Session 0x0 for server 
> apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
> socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address 
> carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
> resolvable
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> {noformat}
> Please see ZOOKEEPER-3779 for more details.






[jira] [Updated] (SPARK-34096) Improve performance for nth_value ignore nulls over offset window

2021-01-13 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-34096:
---
Summary: Improve performance for nth_value ignore nulls over offset window  
(was: Improve performance for nth_value ignore nulls with offset window)

> Improve performance for nth_value ignore nulls over offset window
> -
>
> Key: SPARK-34096
> URL: https://issues.apache.org/jira/browse/SPARK-34096
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current
> {code:java}
> UnboundedOffsetWindowFunctionFrame
> {code}
> and
> {code:java}
> UnboundedPrecedingOffsetWindowFunctionFrame
> {code}
> only support nth_value that respects nulls, so nth_value with ignore nulls executes
> {code:java}
> updateExpressions
> {code}
> multiple times.






[jira] [Updated] (SPARK-34096) Improve performance for nth_value ignore nulls with offset window

2021-01-13 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-34096:
---
Summary: Improve performance for nth_value ignore nulls with offset window  
(was: Improve performance for nth_value ignore nulls)

> Improve performance for nth_value ignore nulls with offset window
> -
>
> Key: SPARK-34096
> URL: https://issues.apache.org/jira/browse/SPARK-34096
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current
> {code:java}
> UnboundedOffsetWindowFunctionFrame
> {code}
> and
> {code:java}
> UnboundedPrecedingOffsetWindowFunctionFrame
> {code}
> only support nth_value that respects nulls, so nth_value with ignore nulls executes
> {code:java}
> updateExpressions
> {code}
> multiple times.






[jira] [Created] (SPARK-34112) Upgrade ORC

2021-01-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-34112:
-

 Summary: Upgrade ORC
 Key: SPARK-34112
 URL: https://issues.apache.org/jira/browse/SPARK-34112
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun


Apache ORC doesn't support Java 14 yet. We need to upgrade it when it's ready.






[jira] [Commented] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar

2021-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264631#comment-17264631
 ] 

Hyukjin Kwon commented on SPARK-34111:
--

I just marked it as a blocker because the duplicated jars might cause an issue 
that's hard to debug. However, I am fine with lowering the priority, 
[~dongjoon]. I will leave it to you.

> Deconflict the jars jakarta.servlet-api-4.0.3.jar and 
> javax.servlet-api-3.1.0.jar
> -
>
> Key: SPARK-34111
> URL: https://issues.apache.org/jira/browse/SPARK-34111
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After SPARK-33705, we now happened to have two jars in the release artifact 
> with Hadoop 3:
> {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:
> {code}
> ...
> jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
> ...
> javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
> ...
> {code}
> This can potentially cause an issue, and we had better remove
> {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN
> tests.
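For reference, a small sketch (run from a Spark source checkout) to confirm the duplication
in the pinned dependency manifest quoted above:

{code:scala}
import scala.io.Source

// Print every servlet-api entry in the Hadoop 3 dependency manifest.
val manifest = Source.fromFile("dev/deps/spark-deps-hadoop-3.2-hive-2.3")
try manifest.getLines().filter(_.contains("servlet-api")).foreach(println)
finally manifest.close()
{code}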






[jira] [Commented] (SPARK-34110) Upgrade ZooKeeper to 3.6.2

2021-01-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264630#comment-17264630
 ] 

Dongjoon Hyun commented on SPARK-34110:
---

Thank you, [~yumwang]!

> Upgrade ZooKeeper to 3.6.2
> --
>
> Key: SPARK-34110
> URL: https://issues.apache.org/jira/browse/SPARK-34110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> When running Spark on JDK 14:
> {noformat}
> 21/01/13 20:25:32,533 WARN 
> [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
> zookeeper.ClientCnxn:1164 : Session 0x0 for server 
> apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
> socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address 
> carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
> resolvable
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> {noformat}
> Please see ZOOKEEPER-3779 for more details.






[jira] [Commented] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar

2021-01-13 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264629#comment-17264629
 ] 

Kent Yao commented on SPARK-34111:
--

Thanks [~hyukjin.kwon] for pinging me.

> Deconflict the jars jakarta.servlet-api-4.0.3.jar and 
> javax.servlet-api-3.1.0.jar
> -
>
> Key: SPARK-34111
> URL: https://issues.apache.org/jira/browse/SPARK-34111
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After SPARK-33705, we now happened to have two jars in the release artifact 
> with Hadoop 3:
> {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:
> {code}
> ...
> jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
> ...
> javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
> ...
> {code}
> This can potentially cause an issue, and we had better remove
> {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN
> tests.






[jira] [Commented] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar

2021-01-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264628#comment-17264628
 ] 

Dongjoon Hyun commented on SPARK-34111:
---

Oh, is this a blocker?

> Deconflict the jars jakarta.servlet-api-4.0.3.jar and 
> javax.servlet-api-3.1.0.jar
> -
>
> Key: SPARK-34111
> URL: https://issues.apache.org/jira/browse/SPARK-34111
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After SPARK-33705, we now happened to have two jars in the release artifact 
> with Hadoop 3:
> {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:
> {code}
> ...
> jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
> ...
> javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
> ...
> {code}
> This can potentially cause an issue, and we had better remove
> {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN
> tests.






[jira] [Commented] (SPARK-23431) Expose the new executor memory metrics at the stage level

2021-01-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264626#comment-17264626
 ] 

Dongjoon Hyun commented on SPARK-23431:
---

Thank you!

> Expose the new executor memory metrics at the stage level
> -
>
> Key: SPARK-23431
> URL: https://issues.apache.org/jira/browse/SPARK-23431
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Edward Lu
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.1.0
>
>
> Collect and show the new executor memory metrics for each stage, to provide 
> more information on how memory is used per stage.
> Modify the AppStatusListener to track the peak values for JVM used memory, 
> execution memory, storage memory, and unified memory for each executor for 
> each stage.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.






[jira] [Commented] (SPARK-34110) Upgrade ZooKeeper to 3.6.2

2021-01-13 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264621#comment-17264621
 ] 

Yuming Wang commented on SPARK-34110:
-

Another issue is:

{noformat}
21/01/13 22:49:56,890 ERROR [Driver] server.HiveServer2:186 : Unable to create 
a znode for this server instance
java.lang.Exception: Max znode creation wait time: 120s exhausted
at 
org.apache.hive.service.server.HiveServer2.addServerInstanceToZooKeeper(HiveServer2.java:183)
at 
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:128)
at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:230)
at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:159)
at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732)
{noformat}


> Upgrade ZooKeeper to 3.6.2
> --
>
> Key: SPARK-34110
> URL: https://issues.apache.org/jira/browse/SPARK-34110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> When running Spark on JDK 14:
> {noformat}
> 21/01/13 20:25:32,533 WARN 
> [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
> zookeeper.ClientCnxn:1164 : Session 0x0 for server 
> apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
> socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address 
> carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
> resolvable
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> {noformat}
> Please see ZOOKEEPER-3779 for more details.






[jira] [Commented] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar

2021-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264620#comment-17264620
 ] 

Hyukjin Kwon commented on SPARK-34111:
--

cc [~Qin Yao] FYI

> Deconflict the jars jakarta.servlet-api-4.0.3.jar and 
> javax.servlet-api-3.1.0.jar
> -
>
> Key: SPARK-34111
> URL: https://issues.apache.org/jira/browse/SPARK-34111
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After SPARK-33705, we now happened to have two jars in the release artifact 
> with Hadoop 3:
> {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:
> {code}
> ...
> jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
> ...
> javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
> ...
> {code}
> This can potentially cause an issue, and we had better remove
> {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN
> tests.






[jira] [Updated] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar

2021-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34111:
-
Description: 
After SPARK-33705, we now happened to have two jars in the release artifact 
with Hadoop 3:

{{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:

{code}
...
jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
...
javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
...
{code}

This can potentially cause an issue, and we had better remove
{{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN
tests.

  was:
After SPARK-33705, we now happened to have two jars in the release artifact 
with Hadoop 3:

{{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:

{code}
...
jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
...
javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
...
{code}

It can potentially cause an issue, and we should better remove 
{{javax.servlet-api-3.1.0.jar
}} which is apparently only required for YARN tests.


> Deconflict the jars jakarta.servlet-api-4.0.3.jar and 
> javax.servlet-api-3.1.0.jar
> -
>
> Key: SPARK-34111
> URL: https://issues.apache.org/jira/browse/SPARK-34111
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After SPARK-33705, we now happened to have two jars in the release artifact 
> with Hadoop 3:
> {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:
> {code}
> ...
> jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
> ...
> javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
> ...
> {code}
> This can potentially cause an issue, and we had better remove
> {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN
> tests.






[jira] [Created] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar

2021-01-13 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-34111:


 Summary: Deconflict the jars jakarta.servlet-api-4.0.3.jar and 
javax.servlet-api-3.1.0.jar
 Key: SPARK-34111
 URL: https://issues.apache.org/jira/browse/SPARK-34111
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.1.0
Reporter: Hyukjin Kwon


After SPARK-33705, we now happened to have two jars in the release artifact 
with Hadoop 3:

{{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:

{code}
...
jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
...
javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
...
{code}

It can potentially cause an issue, and we should better remove 
{{javax.servlet-api-3.1.0.jar
}} which is apparently only required for YARN tests.






[jira] [Updated] (SPARK-34110) Upgrade ZooKeeper to 3.6.2

2021-01-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-34110:

Description: 
When running Spark on JDK 14:
{noformat}
21/01/13 20:25:32,533 WARN 
[Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
zookeeper.ClientCnxn:1164 : Session 0x0 for server 
apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
socket connection and attempting reconnect
java.lang.IllegalArgumentException: Unable to canonicalize address 
carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
resolvable
at 
org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
at 
org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
at 
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
{noformat}

Please see ZOOKEEPER-3779 for more details.


  was:
When running Spark on JDK 14:
{noformat}
21/01/13 20:25:32,533 WARN 
[Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
zookeeper.ClientCnxn:1164 : Session 0x0 for server 
apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
socket connection and attempting reconnect
java.lang.IllegalArgumentException: Unable to canonicalize address 
carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
resolvable
at 
org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
at 
org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
at 
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
{noformat}




> Upgrade ZooKeeper to 3.6.2
> --
>
> Key: SPARK-34110
> URL: https://issues.apache.org/jira/browse/SPARK-34110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> When running Spark on JDK 14:
> {noformat}
> 21/01/13 20:25:32,533 WARN 
> [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
> zookeeper.ClientCnxn:1164 : Session 0x0 for server 
> apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
> socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address 
> carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
> resolvable
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
>   at 
> org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> {noformat}
> Please see ZOOKEEPER-3779 for more details.






[jira] [Resolved] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng resolved SPARK-34106.
--
Resolution: Duplicate

> Hide FValueTest and AnovaTest
> -
>
> Key: SPARK-34106
> URL: https://issues.apache.org/jira/browse/SPARK-34106
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Major
>
> Hide the added test classes for now.
> They are not very practical for big data. If there are valid use cases, we
> should see more requests from the community.






[jira] [Created] (SPARK-34110) Upgrade ZooKeeper to 3.6.2

2021-01-13 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-34110:
---

 Summary: Upgrade ZooKeeper to 3.6.2
 Key: SPARK-34110
 URL: https://issues.apache.org/jira/browse/SPARK-34110
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.2.0
Reporter: Yuming Wang


When running Spark on JDK 14:
{noformat}
21/01/13 20:25:32,533 WARN 
[Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] 
zookeeper.ClientCnxn:1164 : Session 0x0 for server 
apache-spark-zk-3.vip.hadoop.com/:2181, unexpected error, closing 
socket connection and attempting reconnect
java.lang.IllegalArgumentException: Unable to canonicalize address 
carmel-rno-zk-3.vip.hadoop.ebay.com/:2181 because it's not 
resolvable
at 
org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
at 
org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
at 
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
{noformat}








[jira] [Comment Edited] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2021-01-13 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264608#comment-17264608
 ] 

Chao Sun edited comment on SPARK-33507 at 1/14/21, 5:23 AM:


Thanks [~hyukjin.kwon]. From my side, there is no regression, although I feel
SPARK-34052 is a bit important since it concerns correctness. I'm working on a
fix but got delayed by a few other issues found along the way :(.

The issue has been there for a long time, though, so I'm fine with moving this to the
next release.
 

 


was (Author: csun):
Thanks [~hyukjin.kwon]. From my side, there is no regression, although I feel
SPARK-34052 is a bit important since it concerns correctness. I'm working on a
fix but got delayed by a few other issues found during the process :(

> Improve and fix cache behavior in v1 and v2
> ---
>
> Key: SPARK-33507
> URL: https://issues.apache.org/jira/browse/SPARK-33507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Critical
>
> This is an umbrella JIRA to track fixes & improvements for caching behavior 
> in Spark datasource v1 and v2, which includes:
>   - fix existing cache behavior in v1 and v2.
>   - fix inconsistent cache behavior between v1 and v2
>   - implement missing features in v2 to align with those in v1.






[jira] [Commented] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2021-01-13 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264608#comment-17264608
 ] 

Chao Sun commented on SPARK-33507:
--

Thanks [~hyukjin.kwon]. From my side, there is no regression, although I feel
SPARK-34052 is a bit important since it concerns correctness. I'm working on a
fix but got delayed by a few other issues found during the process :(

> Improve and fix cache behavior in v1 and v2
> ---
>
> Key: SPARK-33507
> URL: https://issues.apache.org/jira/browse/SPARK-33507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Critical
>
> This is an umbrella JIRA to track fixes & improvements for caching behavior 
> in Spark datasource v1 and v2, which includes:
>   - fix existing cache behavior in v1 and v2.
>   - fix inconsistent cache behavior between v1 and v2
>   - implement missing features in v2 to align with those in v1.






[jira] [Commented] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2021-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264601#comment-17264601
 ] 

Hyukjin Kwon commented on SPARK-33507:
--

Hey guys, I think we should just start RC regardless of these issues here. 
Looks like it's going to take too long and the release schedule will have to be 
delayed.
These are non-regressions, right? Please directly ping me and let me know if 
there are regressions.

> Improve and fix cache behavior in v1 and v2
> ---
>
> Key: SPARK-33507
> URL: https://issues.apache.org/jira/browse/SPARK-33507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Critical
>
> This is an umbrella JIRA to track fixes & improvements for caching behavior 
> in Spark datasource v1 and v2, which includes:
>   - fix existing cache behavior in v1 and v2.
>   - fix inconsistent cache behavior between v1 and v2
>   - implement missing features in v2 to align with those in v1.






[jira] [Assigned] (SPARK-34081) Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join

2021-01-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34081:
---

Assignee: Yuming Wang

> Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as 
> broadcast join
> ---
>
> Key: SPARK-34081
> URL: https://issues.apache.org/jira/browse/SPARK-34081
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> We should not push down LeftSemi/LeftAnti over Aggregate in some cases.
> {code:scala}
> spark.range(5000L).selectExpr("id % 1 as a", "id % 1 as b").write.saveAsTable("t1")
> spark.range(4000L).selectExpr("id % 8000 as c", "id % 8000 as d").write.saveAsTable("t2")
> spark.sql("SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM 
> t2").explain
> {code}
> Current:
> {noformat}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>+- HashAggregate(keys=[a#16L, b#17L], functions=[])
>   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>  +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, 
> [id=#72]
> +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>+- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), 
> coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), 
> coalesce(d#19L, 0), isnull(d#19L)], LeftSemi
>   :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) 
> ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS 
> FIRST], false, 0
>   :  +- Exchange hashpartitioning(coalesce(a#16L, 0), 
> isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, 
> [id=#65]
>   : +- FileScan parquet default.t1[a#16L,b#17L] Batched: 
> true, DataFilters: [], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct
>   +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) 
> ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS 
> FIRST], false, 0
>  +- Exchange hashpartitioning(coalesce(c#18L, 0), 
> isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, 
> [id=#66]
> +- HashAggregate(keys=[c#18L, d#19L], functions=[])
>+- Exchange hashpartitioning(c#18L, d#19L, 5), 
> ENSURE_REQUIREMENTS, [id=#61]
>   +- HashAggregate(keys=[c#18L, d#19L], 
> functions=[])
>  +- FileScan parquet default.t2[c#18L,d#19L] 
> Batched: true, DataFilters: [], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct
> {noformat}
>  
> Expected:
> {noformat}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>+- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, 
> [id=#74]
>   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>  +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 
> 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), 
> isnull(d#19L)], LeftSemi
> :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC 
> NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS 
> FIRST], false, 0
> :  +- Exchange hashpartitioning(coalesce(a#16L, 0), 
> isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, 
> [id=#67]
> : +- HashAggregate(keys=[a#16L, b#17L], functions=[])
> :+- Exchange hashpartitioning(a#16L, b#17L, 5), 
> ENSURE_REQUIREMENTS, [id=#61]
> :   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
> :  +- FileScan parquet default.t1[a#16L,b#17L] 
> Batched: true, DataFilters: [], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct
> +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC 
> NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS 
> FIRST], false, 0
>+- Exchange hashpartitioning(coalesce(c#18L, 0), 
> isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, 
> [id=#68]
>   +- 
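For reference, whether the LeftSemi join above can be planned as a broadcast join is
governed by {{spark.sql.autoBroadcastJoinThreshold}}; a small spark-shell sketch (assuming
the t1/t2 tables from the snippet above already exist):

{code:scala}
// Broadcast hash joins are considered for relations smaller than this size in bytes (-1 disables).
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024)
spark.sql("SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM t2").explain()
{code}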

[jira] [Resolved] (SPARK-34081) Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join

2021-01-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34081.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31145
[https://github.com/apache/spark/pull/31145]

> Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as 
> broadcast join
> ---
>
> Key: SPARK-34081
> URL: https://issues.apache.org/jira/browse/SPARK-34081
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> We should not push down LeftSemi/LeftAnti over Aggregate in some cases.
> {code:scala}
> spark.range(5000L).selectExpr("id % 1 as a", "id % 1 as b").write.saveAsTable("t1")
> spark.range(4000L).selectExpr("id % 8000 as c", "id % 8000 as d").write.saveAsTable("t2")
> spark.sql("SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM 
> t2").explain
> {code}
> Current:
> {noformat}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>+- HashAggregate(keys=[a#16L, b#17L], functions=[])
>   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>  +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, 
> [id=#72]
> +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>+- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), 
> coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), 
> coalesce(d#19L, 0), isnull(d#19L)], LeftSemi
>   :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) 
> ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS 
> FIRST], false, 0
>   :  +- Exchange hashpartitioning(coalesce(a#16L, 0), 
> isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, 
> [id=#65]
>   : +- FileScan parquet default.t1[a#16L,b#17L] Batched: 
> true, DataFilters: [], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct
>   +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) 
> ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS 
> FIRST], false, 0
>  +- Exchange hashpartitioning(coalesce(c#18L, 0), 
> isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, 
> [id=#66]
> +- HashAggregate(keys=[c#18L, d#19L], functions=[])
>+- Exchange hashpartitioning(c#18L, d#19L, 5), 
> ENSURE_REQUIREMENTS, [id=#61]
>   +- HashAggregate(keys=[c#18L, d#19L], 
> functions=[])
>  +- FileScan parquet default.t2[c#18L,d#19L] 
> Batched: true, DataFilters: [], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct
> {noformat}
>  
> Expected:
> {noformat}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>+- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, 
> [id=#74]
>   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
>  +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 
> 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), 
> isnull(d#19L)], LeftSemi
> :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC 
> NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS 
> FIRST], false, 0
> :  +- Exchange hashpartitioning(coalesce(a#16L, 0), 
> isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, 
> [id=#67]
> : +- HashAggregate(keys=[a#16L, b#17L], functions=[])
> :+- Exchange hashpartitioning(a#16L, b#17L, 5), 
> ENSURE_REQUIREMENTS, [id=#61]
> :   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
> :  +- FileScan parquet default.t1[a#16L,b#17L] 
> Batched: true, DataFilters: [], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct
> +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC 
> NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS 
> FIRST], false, 0
>+- Exchange 
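
As context for the intended behavior, below is a minimal sketch (not from this
ticket) of how one could check whether the LeftSemi join gets planned as a
broadcast join, by toggling the existing spark.sql.autoBroadcastJoinThreshold
setting and comparing the physical plans. It assumes the t1/t2 tables created
by the repro above; everything else is illustrative.
{code:scala}
// Sketch only: compare plans with broadcast joins disabled vs. enabled.
// Assumes t1 and t2 already exist (see the repro above).
val query = "SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM t2"

// Disable broadcast joins: the LeftSemi should stay a SortMergeJoin.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.sql(query).explain()

// Re-enable broadcast joins with the default 10 MB threshold.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10485760")
spark.sql(query).explain()
{code}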

[jira] [Updated] (SPARK-34096) Improve performance for nth_value ignore nulls

2021-01-13 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-34096:
---
Description: 
The current 
{code:java}
UnboundedOffsetWindowFunctionFrame
{code}
and
{code:java}
UnboundedPrecedingOffsetWindowFunctionFrame
{code}
 only support nth_value with respect-nulls semantics, so nth_value with ignore nulls will execute 
{code:java}
updateExpressions
{code}
 multiple times. 

  was:
The current 
{code:java}
UnboundedPrecedingOffsetWindowFunctionFrame
{code}
 only support nth_value that respect nulls. So nth_value will execute 
{code:java}
updateExpressions
{code}
 multiple times. 


> Improve performance for nth_value ignore nulls
> --
>
> Key: SPARK-34096
> URL: https://issues.apache.org/jira/browse/SPARK-34096
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current 
> {code:java}
> UnboundedOffsetWindowFunctionFrame
> {code}
> and
> {code:java}
> UnboundedPrecedingOffsetWindowFunctionFrame
> {code}
>  only support nth_value with respect-nulls semantics, so nth_value with ignore nulls will execute 
> {code:java}
> updateExpressions
> {code}
>  multiple times. 
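
For context, a minimal usage sketch of nth_value with ignore nulls over an
unbounded-preceding frame, which is the case this ticket targets. The data and
column names are assumptions for illustration and are not from this ticket; it
assumes a spark-shell session where the implicits are available.
{code:scala}
import spark.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Hypothetical data: (group, ordering key, possibly-null value).
val df = Seq(("a", 1, None), ("a", 2, Some(10)), ("a", 3, Some(20)))
  .toDF("g", "ord", "v")

val w = Window.partitionBy("g").orderBy("ord")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

// nth_value(..., ignoreNulls = true) is the variant being optimized here.
df.withColumn("second_non_null", nth_value(col("v"), 2, ignoreNulls = true).over(w))
  .show()
{code}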



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34096) Improve performance for nth_value ignore nulls

2021-01-13 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-34096:
---
Summary: Improve performance for nth_value ignore nulls  (was: Improve 
performance for nth_value ignore nulls over unbounded preceding window frame)

> Improve performance for nth_value ignore nulls
> --
>
> Key: SPARK-34096
> URL: https://issues.apache.org/jira/browse/SPARK-34096
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current 
> {code:java}
> UnboundedPrecedingOffsetWindowFunctionFrame
> {code}
>  only support nth_value that respect nulls. So nth_value will execute 
> {code:java}
> updateExpressions
> {code}
>  multiple times. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34109) Killing executors excluded on failure results in additional executors being marked as excluded due to fetch failures

2021-01-13 Thread Aaruna Godthi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaruna Godthi updated SPARK-34109:
--
Description: 
Configuration:

 
{code:java}
spark.excludeOnFailure.enabled: true # aka deprecated spark.blacklist.enabled
spark.excludeOnFailure.application.fetchFailure.enabled: true # aka deprecated 
spark.blacklist.application.fetchFailure.enabled
spark.excludeOnFailure.killExcludedExecutors: true # aka deprecated 
spark.blacklist.killBlacklistedExecutors
{code}
 

 

 

With this configuration, we have noticed that when a few executors are 
excluded due to task failures (for example, due to host issues), those 
executors are killed after being excluded.

However, when other executors try to fetch shuffle blocks from these killed 
executors, those executors also end up getting excluded because of 
`spark.excludeOnFailure.application.fetchFailure.enabled`.

Instead, fetch failures caused by fetching from these excluded (and killed) 
executors should not be counted when excluding executors based on 
`spark.excludeOnFailure.application.fetchFailure.enabled`

  was:
Configuration:

 

```

spark.excludeOnFailure.enabled: true # aka deprecated spark.blacklist.enabled


spark.excludeOnFailure.application.fetchFailure.enabled: true # aka deprecated 
spark.blacklist.application.fetchFailure.enabled

spark.excludeOnFailure.killExcludedExecutors: true # aka deprecated 
spark.blacklist.killBlacklistedExecutors

```

 

In this case, we have noticed when a few executors are excluded due to task 
failures (maybe due to host issues), then those executors are killed after 
being excluded.

However, when other executors try to fetch shuffle blocks from these killed 
executors, then  these other executors also end up getting excluded due to 
`spark.excludeOnFailure.application.fetchFailure.enabled`. 

Instead, the fetch failures in case of fetch from these excluded executors 
should not be considered when excluding executors based on 
`spark.excludeOnFailure.application.fetchFailure.enabled`


> Killing executors excluded on failure results in additional executors being 
> marked as excluded due to fetch failures
> -
>
> Key: SPARK-34109
> URL: https://issues.apache.org/jira/browse/SPARK-34109
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Aaruna Godthi
>Priority: Major
>
> Configuration:
>  
> {code:java}
> spark.excludeOnFailure.enabled: true # aka deprecated spark.blacklist.enabled
> spark.excludeOnFailure.application.fetchFailure.enabled: true # aka 
> deprecated spark.blacklist.application.fetchFailure.enabled
> spark.excludeOnFailure.killExcludedExecutors: true # aka deprecated 
> spark.blacklist.killBlacklistedExecutors
> {code}
>  
>  
>  
> With this configuration, we have noticed that when a few executors are 
> excluded due to task failures (for example, due to host issues), those 
> executors are killed after being excluded.
> However, when other executors try to fetch shuffle blocks from these killed 
> executors, those executors also end up getting excluded because of 
> `spark.excludeOnFailure.application.fetchFailure.enabled`.
> Instead, fetch failures caused by fetching from these excluded (and killed) 
> executors should not be counted when excluding executors based on 
> `spark.excludeOnFailure.application.fetchFailure.enabled`
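
For reference, a minimal sketch of setting the same exclusion properties
programmatically through SparkConf, using only the keys quoted above; this is
illustrative and not a fix proposed in this ticket.
{code:scala}
import org.apache.spark.SparkConf

// Sketch only: mirrors the properties listed in the description.
val conf = new SparkConf()
  .set("spark.excludeOnFailure.enabled", "true")
  .set("spark.excludeOnFailure.application.fetchFailure.enabled", "true")
  .set("spark.excludeOnFailure.killExcludedExecutors", "true")
{code}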



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34109) Killing executors excluded on failure results in additional executors being marked as excluded due to fetch failures

2021-01-13 Thread Aaruna Godthi (Jira)
Aaruna Godthi created SPARK-34109:
-

 Summary: Killing executors excluded on failure results in 
additional executors being marked as excluded due to fetch failures
 Key: SPARK-34109
 URL: https://issues.apache.org/jira/browse/SPARK-34109
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Shuffle, Spark Core
Affects Versions: 3.0.1, 3.0.0
Reporter: Aaruna Godthi


Configuration:

 

```

spark.excludeOnFailure.enabled: true # aka deprecated spark.blacklist.enabled


spark.excludeOnFailure.application.fetchFailure.enabled: true # aka deprecated 
spark.blacklist.application.fetchFailure.enabled

spark.excludeOnFailure.killExcludedExecutors: true # aka deprecated 
spark.blacklist.killBlacklistedExecutors

```

 

In this case, we have noticed when a few executors are excluded due to task 
failures (maybe due to host issues), then those executors are killed after 
being excluded.

However, when other executors try to fetch shuffle blocks from these killed 
executors, then  these other executors also end up getting excluded due to 
`spark.excludeOnFailure.application.fetchFailure.enabled`. 

Instead, the fetch failures in case of fetch from these excluded executors 
should not be considered when excluding executors based on 
`spark.excludeOnFailure.application.fetchFailure.enabled`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34086) RaiseError generates too much code and may fail codegen in length check for char varchar

2021-01-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34086.
-
Fix Version/s: 3.1.1
   Resolution: Fixed

Issue resolved by pull request 31168
[https://github.com/apache/spark/pull/31168]

> RaiseError generates too much code and may fail codegen in length check for 
> char varchar
> -
>
> Key: SPARK-34086
> URL: https://issues.apache.org/jira/browse/SPARK-34086
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.1
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133928/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/
> We can reduce more than 8000 bytes by removing the unnecessary CONCAT 
> expression.
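
For background, a small illustrative sketch (table and values are assumptions,
not from this ticket) of the char/varchar length check whose generated code is
being trimmed here; run against a Spark build with char/varchar support (3.1+).
{code:scala}
// Sketch only: the declared lengths below are arbitrary.
spark.sql("CREATE TABLE char_demo (c CHAR(5), v VARCHAR(5)) USING parquet")

// Values within the declared lengths are accepted.
spark.sql("INSERT INTO char_demo VALUES ('abc', 'abc')")

// A value longer than the declared length should fail the runtime length check.
spark.sql("INSERT INTO char_demo VALUES ('abcdef', 'abcdef')")
{code}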



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34086) RaiseError generates too much code and may fail codegen in length check for char varchar

2021-01-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34086:
---

Assignee: Kent Yao

> RaiseError generates too much code and may fail codegen in length check for 
> char varchar
> -
>
> Key: SPARK-34086
> URL: https://issues.apache.org/jira/browse/SPARK-34086
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133928/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/
> We can reduce more than 8000 bytes by removing the unnecessary CONCAT 
> expression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23431) Expose the new executor memory metrics at the stage level

2021-01-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-23431:
---
Fix Version/s: 3.1.0

> Expose the new executor memory metrics at the stage level
> -
>
> Key: SPARK-23431
> URL: https://issues.apache.org/jira/browse/SPARK-23431
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Edward Lu
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.1.0
>
>
> Collect and show the new executor memory metrics for each stage, to provide 
> more information on how memory is used per stage.
> Modify the AppStatusListener to track the peak values for JVM used memory, 
> execution memory, storage memory, and unified memory for each executor for 
> each stage.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23431) Expose the new executor memory metrics at the stage level

2021-01-13 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264571#comment-17264571
 ] 

Gengliang Wang edited comment on SPARK-23431 at 1/14/21, 3:20 AM:
--

[~dongjoon] Done. Sorry for missing the fixed version field.


was (Author: gengliang.wang):
[~dongjoon]Done. Sorry for missing the fixed version field.

> Expose the new executor memory metrics at the stage level
> -
>
> Key: SPARK-23431
> URL: https://issues.apache.org/jira/browse/SPARK-23431
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Edward Lu
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.1.0
>
>
> Collect and show the new executor memory metrics for each stage, to provide 
> more information on how memory is used per stage.
> Modify the AppStatusListener to track the peak values for JVM used memory, 
> execution memory, storage memory, and unified memory for each executor for 
> each stage.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23431) Expose the new executor memory metrics at the stage level

2021-01-13 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264571#comment-17264571
 ] 

Gengliang Wang commented on SPARK-23431:


[~dongjoon]Done. Sorry for missing the fixed version field.

> Expose the new executor memory metrics at the stage level
> -
>
> Key: SPARK-23431
> URL: https://issues.apache.org/jira/browse/SPARK-23431
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Edward Lu
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.1.0
>
>
> Collect and show the new executor memory metrics for each stage, to provide 
> more information on how memory is used per stage.
> Modify the AppStatusListener to track the peak values for JVM used memory, 
> execution memory, storage memory, and unified memory for each executor for 
> each stage.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34108) Caching with permanent view doesn't work in certain cases

2021-01-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-34108:
-
Summary: Caching with permanent view doesn't work in certain cases  (was: 
Caching doesn't work completely with permanent view)

> Caching with permanent view doesn't work in certain cases
> -
>
> Key: SPARK-34108
> URL: https://issues.apache.org/jira/browse/SPARK-34108
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently, caching a permanent view doesn't work in certain cases. For 
> instance, in the following:
> {code:sql}
> CREATE TABLE t (key bigint, value string) USING parquet
> CREATE VIEW v1 AS SELECT key FROM t
> CACHE TABLE v1
> SELECT key FROM t
> {code}
> The last SELECT query will hit the cached {{v1}}. On the other hand:
> {code:sql}
> CREATE TABLE t (key bigint, value string) USING parquet
> CREATE VIEW v1 AS SELECT key FROM t ORDER by key
> CACHE TABLE v1
> SELECT key FROM t ORDER BY key
> {code}
> The SELECT won't hit the cache.
> It seems this is related to {{EliminateView}}. In the second case, it will 
> insert an extra project operator which makes the comparison on canonicalized 
> plan during cache lookup fail.
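
As a quick way to observe the behavior described above, a minimal sketch
(assuming a spark-shell session) that checks whether the second SELECT picks up
the cached view by looking for an InMemoryTableScan node in the physical plan:
{code:scala}
// Sketch only: mirrors the second (non-hitting) case from the description.
spark.sql("CREATE TABLE t (key bigint, value string) USING parquet")
spark.sql("CREATE VIEW v1 AS SELECT key FROM t ORDER BY key")
spark.sql("CACHE TABLE v1")

val plan = spark.sql("SELECT key FROM t ORDER BY key").queryExecution.executedPlan
// A cache hit would surface as an InMemoryTableScan in the physical plan.
println(plan.toString.contains("InMemoryTableScan"))
{code}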



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34108) Caching doesn't work completely with permanent view

2021-01-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-34108:
-
Description: 
Currently, caching a permanent view doesn't work in certain cases. For 
instance, in the following:
{code:sql}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t
CACHE TABLE v1
SELECT key FROM t
{code}
The last SELECT query will hit the cached {{v1}}. On the other hand:
{code:sql}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t ORDER by key
CACHE TABLE v1
SELECT key FROM t ORDER BY key
{code}
The SELECT won't hit the cache.

It seems this is related to {{EliminateView}}. In the second case, it will 
insert an extra project operator which makes the comparison on canonicalized 
plan during cache lookup fail.

  was:
Currently, caching a permanent view doesn't work in some cases. For instance, 
in the following:
{code:sql}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t
CACHE TABLE v1
SELECT key FROM t
{code}

The last SELECT query will hit the cached {{v1}}. However, in the following:
{code:sql}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t ORDER by key
CACHE TABLE v1
SELECT key FROM t ORDER BY key
{code}

The SELECT won't hit the cache.

It seems this is related to {{EliminateView}}. In the second case, it will 
insert an extra project operator which makes the comparison on canonicalized 
plan during cache lookup fail.


> Caching doesn't work completely with permanent view
> ---
>
> Key: SPARK-34108
> URL: https://issues.apache.org/jira/browse/SPARK-34108
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently, caching a permanent view doesn't work in certain cases. For 
> instance, in the following:
> {code:sql}
> CREATE TABLE t (key bigint, value string) USING parquet
> CREATE VIEW v1 AS SELECT key FROM t
> CACHE TABLE v1
> SELECT key FROM t
> {code}
> The last SELECT query will hit the cached {{v1}}. On the other hand:
> {code:sql}
> CREATE TABLE t (key bigint, value string) USING parquet
> CREATE VIEW v1 AS SELECT key FROM t ORDER by key
> CACHE TABLE v1
> SELECT key FROM t ORDER BY key
> {code}
> The SELECT won't hit the cache.
> It seems this is related to {{EliminateView}}. In the second case, it will 
> insert an extra project operator which makes the comparison on canonicalized 
> plan during cache lookup fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34108) Caching doesn't work completely with permanent view

2021-01-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-34108:
-
Description: 
Currently, caching a permanent view doesn't work in some cases. For instance, 
in the following:
{code:sql}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t
CACHE TABLE v1
SELECT key FROM t
{code}

The last SELECT query will hit the cached {{v1}}. However, in the following:
{code:sql}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t ORDER by key
CACHE TABLE v1
SELECT key FROM t ORDER BY key
{code}

The SELECT won't hit the cache.

It seems this is related to {{EliminateView}}. In the second case, it will 
insert an extra project operator which makes the comparison on canonicalized 
plan during cache lookup fail.

  was:
Currently, caching a permanent view doesn't work in some cases. For instance, 
in the following:
{code}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t
CACHE TABLE v1
SELECT key FROM t
{code}

The last SELECT query will hit the cached {{v1}}. However, in the following:
{code}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t ORDER by key
CACHE TABLE v1
SELECT key FROM t ORDER BY key
{code}

The SELECT won't hit the cache.

It seems this is related to {{EliminateView}}. In the second case, it will 
insert an extra project operator which makes the comparison on canonicalized 
plan during cache lookup fail.


> Caching doesn't work completely with permanent view
> ---
>
> Key: SPARK-34108
> URL: https://issues.apache.org/jira/browse/SPARK-34108
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently, caching a permanent view doesn't work in some cases. For instance, 
> in the following:
> {code:sql}
> CREATE TABLE t (key bigint, value string) USING parquet
> CREATE VIEW v1 AS SELECT key FROM t
> CACHE TABLE v1
> SELECT key FROM t
> {code}
> The last SELECT query will hit the cached {{v1}}. However, in the following:
> {code:sql}
> CREATE TABLE t (key bigint, value string) USING parquet
> CREATE VIEW v1 AS SELECT key FROM t ORDER by key
> CACHE TABLE v1
> SELECT key FROM t ORDER BY key
> {code}
> The SELECT won't hit the cache.
> It seems this is related to {{EliminateView}}. In the second case, it will 
> insert an extra project operator which makes the comparison on canonicalized 
> plan during cache lookup fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34108) Caching doesn't work completely with permanent view

2021-01-13 Thread Chao Sun (Jira)
Chao Sun created SPARK-34108:


 Summary: Caching doesn't work completely with permanent view
 Key: SPARK-34108
 URL: https://issues.apache.org/jira/browse/SPARK-34108
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Chao Sun


Currently, caching a permanent view doesn't work in some cases. For instance, 
in the following:
{code}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t
CACHE TABLE v1
SELECT key FROM t
{code}

The last SELECT query will hit the cached {{v1}}. However, in the following:
{code}
CREATE TABLE t (key bigint, value string) USING parquet
CREATE VIEW v1 AS SELECT key FROM t ORDER by key
CACHE TABLE v1
SELECT key FROM t ORDER BY key
{code}

The SELECT won't hit the cache.

It seems this is related to {{EliminateView}}. In the second case, it will 
insert an extra project operator which makes the comparison on canonicalized 
plan during cache lookup fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34106:


Assignee: Apache Spark  (was: zhengruifeng)

> Hide FValueTest and AnovaTest
> -
>
> Key: SPARK-34106
> URL: https://issues.apache.org/jira/browse/SPARK-34106
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Major
>
> hide the added test classes for now.
> they are not very practical for big data. If there are valid use cases, we 
> should see more requests from the community.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264567#comment-17264567
 ] 

Apache Spark commented on SPARK-34106:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/31176

> Hide FValueTest and AnovaTest
> -
>
> Key: SPARK-34106
> URL: https://issues.apache.org/jira/browse/SPARK-34106
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Major
>
> hide the added test classes for now.
> they are not very practical for big data. If there are valid use cases, we 
> should see more requests from the community.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34106:


Assignee: zhengruifeng  (was: Apache Spark)

> Hide FValueTest and AnovaTest
> -
>
> Key: SPARK-34106
> URL: https://issues.apache.org/jira/browse/SPARK-34106
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Major
>
> hide the added test classes for now.
> they are not very practical for big data. If there are valid use cases, we 
> should see more requests from the community.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33311) Improve semantics for REFRESH TABLE

2021-01-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-33311:
-
Parent: SPARK-33507
Issue Type: Sub-task  (was: Improvement)

> Improve semantics for REFRESH TABLE
> ---
>
> Key: SPARK-33311
> URL: https://issues.apache.org/jira/browse/SPARK-33311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Chao Sun
>Priority: Major
>
> Currently, the semantics of {{REFRESH TABLE t}} are not well defined for a view 
> (let's say {{view}}) that references the table {{t}}:
> 1. If {{view}} is cached, the behavior is not well-defined. Should Spark 
> invalidate the cache (current behavior) or recache it?
> 2. If {{view}} is a temporary view, refreshing {{t}} currently does not 
> refresh {{view}}, since Spark just reuses the logical plan defined in the 
> session catalog. This could lead to query failures (although with a helpful 
> error message) or to incorrect results, depending on the refresh behavior.
> I think we should clearly define and document the behavior here, so that users 
> won't get confused.
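
To make the two scenarios concrete, a small illustrative sketch; the table and
view names are assumptions and are not taken from this ticket.
{code:scala}
// Sketch only: a cached permanent view and a temporary view over the same table.
spark.sql("CREATE TABLE t (id bigint) USING parquet")
spark.sql("CREATE VIEW perm_view AS SELECT id FROM t")
spark.sql("CREATE TEMPORARY VIEW temp_view AS SELECT id FROM t")
spark.sql("CACHE TABLE perm_view")

// Case 1: should this invalidate perm_view's cache or recache it?
spark.sql("REFRESH TABLE t")

// Case 2: temp_view keeps the logical plan captured at creation time, so a
// change to t may not be reflected until the temporary view is re-created.
spark.sql("SELECT * FROM temp_view").show()
{code}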



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3

2021-01-13 Thread Shashank Pedamallu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashank Pedamallu updated SPARK-34107:
---
Description: 
Spark History Service has trouble loading when it initially has to load 300k+ 
applications from S3. Following are the details and snapshots:

Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity)
{noformat}
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
| => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
  305571
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat}
{noformat}
Logs when starting SparkHistory:
{noformat}
root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
 
/go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh 
--properties-file /etc/spark-history-config/shs-default.properties
 2021/01/14 02:40:28 Spark spark wrapper is disabled
 2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0
 2021/01/14 02:40:28 Statsd disabled
 2021/01/14 02:40:28 Debug log: /tmp/.log
 2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 
Mozart 0
 2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with 
arguments [org.apache.spark.deploy.history.HistoryServer --properties-file 
/etc/spark-history-config/shs-default.properties]
 21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 
2077@shs-with-statsd-86d7f54679-t8fqr
 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM
 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP
 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT
 21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
 21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root
 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root
 21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to:
 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to:
 21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(root); groups with 
view permissions: Set(); users with modify permissions: Set(root); groups with 
modify permissions: Set()
 21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; 
users with admin permissions: ; groups with admin permissions
 21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried 
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
 21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 
10 second(s).
 21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system 
started
 21/01/14 02:40:31 INFO log: Logging initialized @1933ms to 
org.sparkproject.jetty.util.log.Slf4jLog
 21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 
2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 
1.8.0_242-b08
 21/01/14 02:40:31 INFO Server: Started @1999ms
 21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f
{HTTP/1.1,[http/1.1]} {0.0.0.0:18080}
21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080.
 21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@b9dfc5a
{/,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@1bbae752
{/json,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@5cf87cfd
{/api,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@74971ed9
{/static,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@1542af63
{/history,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and 
started at http://shs-with-statsd-86d7f54679-t8fqr:18080
 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 
seconds
 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning 
s3a://-company/spark-history-fs-logDirectory/ with 
lastScanTime==-1{noformat}

  was:
Spark History Service is having trouble loading when loading initially with 
300k+ applications from S3. Following are the details and snapshots:

Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity)
{noformat}
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
| => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
  305571
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat}
{noformat}
Logs when starting SparkHistory:

 
{noformat}
root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
 

[jira] [Updated] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3

2021-01-13 Thread Shashank Pedamallu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashank Pedamallu updated SPARK-34107:
---
Description: 
Spark History Service has trouble loading when it initially has to load 300k+ 
applications from S3. Following are the details and snapshots:

Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity)
{noformat}
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
| => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
  305571
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat}
{noformat}
Logs when starting SparkHistory:

 
{noformat}
root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
 
/go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh 
--properties-file /etc/spark-history-config/shs-default.properties
 2021/01/14 02:40:28 Spark spark wrapper is disabled
 2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0
 2021/01/14 02:40:28 Statsd disabled
 2021/01/14 02:40:28 Debug log: /tmp/.log
 2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 
Mozart 0
 2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with 
arguments [org.apache.spark.deploy.history.HistoryServer --properties-file 
/etc/spark-history-config/shs-default.properties]
 21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 
2077@shs-with-statsd-86d7f54679-t8fqr
 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM
 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP
 21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT
 21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
 21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root
 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root
 21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to:
 21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to:
 21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(root); groups with 
view permissions: Set(); users with modify permissions: Set(root); groups with 
modify permissions: Set()
 21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; 
users with admin permissions: ; groups with admin permissions
 21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried 
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
 21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 
10 second(s).
 21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system 
started
 21/01/14 02:40:31 INFO log: Logging initialized @1933ms to 
org.sparkproject.jetty.util.log.Slf4jLog
 21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 
2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 
1.8.0_242-b08
 21/01/14 02:40:31 INFO Server: Started @1999ms
 21/01/14 02:40:31 INFO AbstractConnector: Started ServerConnector@51751e5f
{HTTP/1.1,[http/1.1]} {0.0.0.0:18080}
21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080.
 21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@b9dfc5a
{/,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@1bbae752
{/json,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@5cf87cfd
{/api,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@74971ed9
{/static,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@1542af63
{/history,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and 
started at http://shs-with-statsd-86d7f54679-t8fqr:18080
 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 
seconds
 21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning 
s3a://-company/spark-history-fs-logDirectory/ with 
lastScanTime==-1{noformat}

  was:
Spark History Service is having trouble loading when loading initially with 
300k+ applications from S3. Following are the details and snapshots:

Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity)
{noformat}
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
| => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
  305571
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat}
Logs when starting SparkHistory:
{noformat}
root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
 

[jira] [Created] (SPARK-34107) Spark History not loading when service has to load 300k applications initially from S3

2021-01-13 Thread Shashank Pedamallu (Jira)
Shashank Pedamallu created SPARK-34107:
--

 Summary: Spark History not loading when service has to load 300k 
applications initially from S3
 Key: SPARK-34107
 URL: https://issues.apache.org/jira/browse/SPARK-34107
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Shashank Pedamallu


Spark History Service has trouble loading when it initially has to load 300k+ 
applications from S3. Following are the details and snapshots:

Number of files in `spark.history.fs.logDirectory`: (Using xxx for anonymity)
{noformat}
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) $
| => aws s3 ls s3://-company/spark-history-fs-logDirectory/ | wc -l
  305571
spedamallu@spedamallu-mbp143 ~/src/spark (spark-bug) ${noformat}
Logs when starting SparkHistory:
{noformat}
root@shs-with-statsd-86d7f54679-t8fqr:/go/src/github.com/-company/spark-private#
 
/go/src/github.com/-company/spark-private/bootstrap/start-history-server.sh 
--properties-file /etc/spark-history-config/shs-default.properties
2021/01/14 02:40:28 Spark spark wrapper is disabled
2021/01/14 02:40:28 Attempt number 0, Max attempts 0, Left Attempts 0
2021/01/14 02:40:28 Statsd disabled
2021/01/14 02:40:28 Debug log: /tmp/.log
2021/01/14 02:40:28 Job submitted 0 seconds ago, Operator 0, ETL 0, Flyte 0 
Mozart 0
2021/01/14 02:40:28 Running command /opt/spark/bin/spark-class.orig with 
arguments [org.apache.spark.deploy.history.HistoryServer --properties-file 
/etc/spark-history-config/shs-default.properties]
21/01/14 02:40:29 INFO HistoryServer: Started daemon with process name: 
2077@shs-with-statsd-86d7f54679-t8fqr
21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for TERM
21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for HUP
21/01/14 02:40:29 INFO SignalUtils: Registered signal handler for INT
21/01/14 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
21/01/14 02:40:30 INFO SecurityManager: Changing view acls to: root
21/01/14 02:40:30 INFO SecurityManager: Changing modify acls to: root
21/01/14 02:40:30 INFO SecurityManager: Changing view acls groups to:
21/01/14 02:40:30 INFO SecurityManager: Changing modify acls groups to:
21/01/14 02:40:30 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(root); groups 
with view permissions: Set(); users  with modify permissions: Set(root); groups 
with modify permissions: Set()
21/01/14 02:40:30 INFO FsHistoryProvider: History server ui acls disabled; 
users with admin permissions: ; groups with admin permissions
21/01/14 02:40:30 WARN MetricsConfig: Cannot locate configuration: tried 
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
21/01/14 02:40:30 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 
10 second(s).
21/01/14 02:40:30 INFO MetricsSystemImpl: s3a-file-system metrics system started
21/01/14 02:40:31 INFO log: Logging initialized @1933ms to 
org.sparkproject.jetty.util.log.Slf4jLog
21/01/14 02:40:31 INFO Server: jetty-9.4.z-SNAPSHOT; built: 
2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 
1.8.0_242-b08
21/01/14 02:40:31 INFO Server: Started @1999ms
21/01/14 02:40:31 INFO AbstractConnector: Started 
ServerConnector@51751e5f{HTTP/1.1,[http/1.1]}{0.0.0.0:18080}
21/01/14 02:40:31 INFO Utils: Successfully started service on port 18080.
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@b9dfc5a{/,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@1bbae752{/json,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@5cf87cfd{/api,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@74971ed9{/static,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO ContextHandler: Started 
o.s.j.s.ServletContextHandler@1542af63{/history,null,AVAILABLE,@Spark}
21/01/14 02:40:31 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and 
started at http://shs-with-statsd-86d7f54679-t8fqr:18080
21/01/14 02:40:31 DEBUG FsHistoryProvider: Scheduling update thread every 10 
seconds
21/01/14 02:40:31 DEBUG FsHistoryProvider: Scanning 
s3a://-company/spark-history-fs-logDirectory/ with 
lastScanTime==-1{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34106:


 Summary: Hide FValueTest and AnovaTest
 Key: SPARK-34106
 URL: https://issues.apache.org/jira/browse/SPARK-34106
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.2.0, 3.1.1
Reporter: zhengruifeng


hide the added test classes for now.

they are not very practical for big data. If there are valid use cases, we 
should see more requests from the community.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34106) Hide FValueTest and AnovaTest

2021-01-13 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-34106:


Assignee: zhengruifeng

> Hide FValueTest and AnovaTest
> -
>
> Key: SPARK-34106
> URL: https://issues.apache.org/jira/browse/SPARK-34106
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Major
>
> hide the added test classes for now.
> they are not very practical for big data. If there are valid use cases, we 
> should see more requests from the community.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33557) spark.storage.blockManagerSlaveTimeoutMs default value does not follow spark.network.timeout value when the latter was changed

2021-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33557:
-
Fix Version/s: 3.0.2

> spark.storage.blockManagerSlaveTimeoutMs default value does not follow 
> spark.network.timeout value when the latter was changed
> --
>
> Key: SPARK-33557
> URL: https://issues.apache.org/jira/browse/SPARK-33557
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Ohad
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
>
> According to the documentation, "spark.network.timeout" is the default timeout 
> for "spark.storage.blockManagerSlaveTimeoutMs", which implies that when the 
> user sets "spark.network.timeout", the effective value of 
> "spark.storage.blockManagerSlaveTimeoutMs" should also change if it was 
> not set explicitly.
> However, this is not the case: the default value of 
> "spark.storage.blockManagerSlaveTimeoutMs" is always the default value of 
> "spark.network.timeout" (120s).
>  
> "spark.storage.blockManagerSlaveTimeoutMs" is defined in the package object 
> of "org.apache.spark.internal.config" as follows:
> {code:java}
> private[spark] val STORAGE_BLOCKMANAGER_SLAVE_TIMEOUT =
>   ConfigBuilder("spark.storage.blockManagerSlaveTimeoutMs")
> .version("0.7.0")
> .timeConf(TimeUnit.MILLISECONDS)
> .createWithDefaultString(Network.NETWORK_TIMEOUT.defaultValueString)
> {code}
> So it seems its default value is indeed "fixed" to the default value of 
> "spark.network.timeout".
>  
>  
>  
>  
>  
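
Until the fix is picked up, an obvious workaround sketch (not from this ticket)
is to set both timeouts explicitly instead of relying on the fallback; the 300s
value below is just an example.
{code:scala}
import org.apache.spark.SparkConf

// Sketch only: set both values explicitly so they stay in sync.
val conf = new SparkConf()
  .set("spark.network.timeout", "300s")
  .set("spark.storage.blockManagerSlaveTimeoutMs", "300s")
{code}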



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34097) overflow for datetime datatype when creating stride + JDBC parallel read

2021-01-13 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264538#comment-17264538
 ] 

Takeshi Yamamuro commented on SPARK-34097:
--

Ah, I see. Thanks for the report. This issue reminds me of the issue: 
https://issues.apache.org/jira/browse/SPARK-28587. Since the timestamp part in 
the WHERE clause looks database-dependent, I'm thinking now that we might need 
to handle it in JdbcDialect...

> overflow for datetime datatype when creating stride +  JDBC parallel read
> -
>
> Key: SPARK-34097
> URL: https://issues.apache.org/jira/browse/SPARK-34097
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
> Environment: spark 3.0.1
> sql server v12.0
>Reporter: Pradip Sodha
>Priority: Major
>
> I'm trying to do a JDBC parallel read with a datetime column as the partition column
> {code:java}
> create table eData (eid int, start_time datetime) -- sql server v12.0
> --inserting some data{code}
>  
> {code:java}
> val df = spark // spark 3.0.1
> .read
> .format("jdbc")
> .option("url", "jdbc:sqlserver://...")
> .option("partitionColumn", "start_time")
> .option("lowerBound", "2000-01-01T01:01:11.546")
> .option("upperBound", "2000-01-02T01:01:11.547")
> .option("numPartitions", "10")
> .option("dbtable", "eData")
> .load();
> df.show(false){code}
> and getting this error,
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 
> (TID 7, 10.139.64.6, executor 0): 
> com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when 
> converting date and/or time from character string.   at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
>   at 
> org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
>   at 
> org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
>   at org.apache.spark.scheduler.Task.run(Task.scala:117)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ...
> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed 
> when converting date and/or time from character string.{code}
>  
> which is expected, because the query designed by Spark is:
> {code:java}
> 21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses 
> of these partitions: "start_time" < '2000-01-01 03:25:11.5461' or 
> "start_time" is null, "start_time" >= '2000-01-01 03:25:11.5461' AND 
> "start_time" < '2000-01-01 05:49:11.5462', "start_time" >= '2000-01-01 
> 05:49:11.5462' AND "start_time" < '2000-01-01 08:13:11.5463', "start_time" >= 
> '2000-01-01 08:13:11.5463' AND "start_time" < '2000-01-01 10:37:11.5464', 
> "start_time" >= '2000-01-01 10:37:11.5464' AND "start_time" < '2000-01-01 
> 13:01:11.5465', "start_time" >= '2000-01-01 13:01:11.5465' AND "start_time" < 
> '2000-01-01 15:25:11.5466', "start_time" >= '2000-01-01 15:25:11.5466' AND 
> "start_time" < '2000-01-01 17:49:11.5467', 

[jira] [Commented] (SPARK-34097) overflow for datetime datatype when creating stride + JDBC parallel read

2021-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264516#comment-17264516
 ] 

Hyukjin Kwon commented on SPARK-34097:
--

cc [~maropu] FYI

> overflow for datetime datatype when creating stride +  JDBC parallel read
> -
>
> Key: SPARK-34097
> URL: https://issues.apache.org/jira/browse/SPARK-34097
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
> Environment: spark 3.0.1
> sql server v12.0
>Reporter: Pradip Sodha
>Priority: Major
>
> I'm trying to do a JDBC parallel read with a datetime column as the partition column
> {code:java}
> create table eData (eid int, start_time datetime) -- sql server v12.0
> --inserting some data{code}
>  
> {code:java}
> val df = spark // spark 3.0.1
> .read
> .format("jdbc")
> .option("url", "jdbc:sqlserver://...")
> .option("partitionColumn", "start_time")
> .option("lowerBound", "2000-01-01T01:01:11.546")
> .option("upperBound", "2000-01-02T01:01:11.547")
> .option("numPartitions", "10")
> .option("dbtable", "eData")
> .load();
> df.show(false){code}
> and getting this error,
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 
> (TID 7, 10.139.64.6, executor 0): 
> com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when 
> converting date and/or time from character string.   at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
>   at 
> org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
>   at 
> org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
>   at org.apache.spark.scheduler.Task.run(Task.scala:117)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ...
> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed 
> when converting date and/or time from character string.{code}
>  
> which is expected, because the query designed by Spark is:
> {code:java}
> 21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses 
> of these partitions: "start_time" < '2000-01-01 03:25:11.5461' or 
> "start_time" is null, "start_time" >= '2000-01-01 03:25:11.5461' AND 
> "start_time" < '2000-01-01 05:49:11.5462', "start_time" >= '2000-01-01 
> 05:49:11.5462' AND "start_time" < '2000-01-01 08:13:11.5463', "start_time" >= 
> '2000-01-01 08:13:11.5463' AND "start_time" < '2000-01-01 10:37:11.5464', 
> "start_time" >= '2000-01-01 10:37:11.5464' AND "start_time" < '2000-01-01 
> 13:01:11.5465', "start_time" >= '2000-01-01 13:01:11.5465' AND "start_time" < 
> '2000-01-01 15:25:11.5466', "start_time" >= '2000-01-01 15:25:11.5466' AND 
> "start_time" < '2000-01-01 17:49:11.5467', "start_time" >= '2000-01-01 
> 17:49:11.5467' AND "start_time" < '2000-01-01 20:13:11.5468', "start_time" >= 
> '2000-01-01 20:13:11.5468' AND "start_time" < '2000-01-01 22:37:11.5469', 
> "start_time" >= '2000-01-01 22:37:11.5469'
> {code}
> so, the date used in 
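For reference, here is a rough sketch of where those boundary values come from. It is simplified arithmetic, not the actual JDBCRelation code: it assumes the range is split into equal strides at microsecond precision, which reproduces the extra fourth fractional digit ("...11.5461") that SQL Server's millisecond-precision datetime then refuses to parse.
{code:java}
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

val lower = LocalDateTime.parse("2000-01-01T01:01:11.546")
val upper = LocalDateTime.parse("2000-01-02T01:01:11.547")
val numPartitions = 10

// The range is 86,400,001,000 microseconds, so one stride is 8,640,000,100
// microseconds (2h24m plus 100 microseconds): every boundary gains 0.0001s.
val strideMicros = ChronoUnit.MICROS.between(lower, upper) / numPartitions
val boundaries =
  (1 until numPartitions).map(i => lower.plus(i * strideMicros, ChronoUnit.MICROS))
boundaries.foreach(println)
// 2000-01-01T03:25:11.546100, 2000-01-01T05:49:11.546200, ... matching the log above
{code}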

[jira] [Resolved] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2

2021-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-34100.
--
Resolution: Cannot Reproduce

> pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
> -
>
> Key: SPARK-34100
> URL: https://issues.apache.org/jira/browse/SPARK-34100
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, PySpark
>Affects Versions: 2.4.7
> Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also 
> tested with pip 20.3.3), using Docker or EMR 5.32.0
>  
> Example Dockerfile to reproduce:
> {{FROM amazonlinux:2}}
> {{RUN yum install -y python3}}
> {{RUN pip3 install pyspark==2.4.7}}
>  
>Reporter: Devin Boyer
>Priority: Minor
>
> I'm unable to install the pyspark Python package on Amazon Linux 2, whether 
> in a Docker image or an EMR cluster. Amazon Linux 2 currently ships with 
> Python 3.7 and pip 9.0.3, but upgrading pip yields the same result.
>  
> When installing the package, the installation will fail with the error 
> "ValueError: bad marshal data (unknown type code)". Full example stack below.
>  
> This bug prevents using pyspark in simple testing environments, and prevents 
> using tools where the pyspark package is a dependency, like 
> [https://github.com/awslabs/python-deequ].
>  
> Stack Trace:
> {{Step 3/3 : RUN pip3 install pyspark==2.4.7}}
> {{ ---> Running in 2c6e1c1de62f}}
> {{WARNING: Running pip install with root privileges is generally not a good 
> idea. Try `pip3 install --user` instead.}}
> {{Collecting pyspark==2.4.7}}
> {{ Downloading 
> https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz
>  (217.9MB)}}
> {{ Complete output from command python setup.py egg_info:}}
> {{ Could not import pypandoc - required to package PySpark}}
> {{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown 
> distribution option: 'long_description_content_type'}}
> {{ warnings.warn(msg)}}
> {{ zip_safe flag not set; analyzing archive contents...}}
> {{ Traceback (most recent call last):}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, 
> in save_modules}}
> {{ yield saved}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, 
> in setup_context}}
> {{ yield}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, 
> in run_setup}}
> {{ _execfile(setup_script, ns)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in 
> _execfile}}
> {{ exec(code, globals, locals)}}
> {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in 
> }}
> {{ # using Python imports instead which will be resolved correctly.}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, 
> in setup}}
> {{ return distutils.core.setup(**attrs)}}
> {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}}
> {{ dist.run_commands()}}
> {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}}
> {{ self.run_command(cmd)}}
> {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}}
> {{ cmd_obj.run()}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 218, in run}}
> {{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 269, in zip_safe}}
> {{ return analyze_egg(self.bdist_dir, self.stubs)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 379, in analyze_egg}}
> {{ safe = scan_module(egg_dir, base, name, stubs) and safe}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 416, in scan_module}}
> {{ code = marshal.load(f)}}
> {{ ValueError: bad marshal data (unknown type code)}}{{During handling of the 
> above exception, another exception occurred:}}{{Traceback (most recent call 
> last):}}
> {{ File "", line 1, in }}
> {{ File "/tmp/pip-build-j3d56a0n/pyspark/setup.py", line 224, in }}
> {{ 'Programming Language :: Python :: Implementation :: PyPy']}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 128, 
> in setup}}
> {{ _install_setup_requires(attrs)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 123, 
> in _install_setup_requires}}
> {{ dist.fetch_build_eggs(dist.setup_requires)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 461, in 
> fetch_build_eggs}}
> {{ replace_conflicting=True,}}
> {{ File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 
> 866, in resolve}}
> {{ replace_conflicting=replace_conflicting}}
> {{ File 

[jira] [Commented] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2

2021-01-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264510#comment-17264510
 ] 

Hyukjin Kwon commented on SPARK-34100:
--

If this is fixed upstream, it's best to identify the ticket and port it back 
instead of filing a new JIRA to request a backport. I am resolving this ticket 
for now, but it would be great if we could identify the ticket that fixed this 
issue.

> pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
> -
>
> Key: SPARK-34100
> URL: https://issues.apache.org/jira/browse/SPARK-34100
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, PySpark
>Affects Versions: 2.4.7
> Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also 
> tested with pip 20.3.3), using Docker or EMR 5.32.0
>  
> Example Dockerfile to reproduce:
> {{FROM amazonlinux:2}}
> {{RUN yum install -y python3}}
> {{RUN pip3 install pyspark==2.4.7}}
>  
>Reporter: Devin Boyer
>Priority: Minor
>
> I'm unable to install the pyspark Python package on Amazon Linux 2, whether 
> in a Docker image or an EMR cluster. Amazon Linux 2 currently ships with 
> Python 3.7 and pip 9.0.3, but upgrading pip yields the same result.
>  
> When installing the package, the installation will fail with the error 
> "ValueError: bad marshal data (unknown type code)". Full example stack below.
>  
> This bug prevents using pyspark in simple testing environments, and prevents 
> using tools where the pyspark package is a dependency, like 
> [https://github.com/awslabs/python-deequ].
>  
> Stack Trace:
> {{Step 3/3 : RUN pip3 install pyspark==2.4.7}}
> {{ ---> Running in 2c6e1c1de62f}}
> {{WARNING: Running pip install with root privileges is generally not a good 
> idea. Try `pip3 install --user` instead.}}
> {{Collecting pyspark==2.4.7}}
> {{ Downloading 
> https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz
>  (217.9MB)}}
> {{ Complete output from command python setup.py egg_info:}}
> {{ Could not import pypandoc - required to package PySpark}}
> {{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown 
> distribution option: 'long_description_content_type'}}
> {{ warnings.warn(msg)}}
> {{ zip_safe flag not set; analyzing archive contents...}}
> {{ Traceback (most recent call last):}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, 
> in save_modules}}
> {{ yield saved}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, 
> in setup_context}}
> {{ yield}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, 
> in run_setup}}
> {{ _execfile(setup_script, ns)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in 
> _execfile}}
> {{ exec(code, globals, locals)}}
> {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in 
> }}
> {{ # using Python imports instead which will be resolved correctly.}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, 
> in setup}}
> {{ return distutils.core.setup(**attrs)}}
> {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}}
> {{ dist.run_commands()}}
> {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}}
> {{ self.run_command(cmd)}}
> {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}}
> {{ cmd_obj.run()}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 218, in run}}
> {{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 269, in zip_safe}}
> {{ return analyze_egg(self.bdist_dir, self.stubs)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 379, in analyze_egg}}
> {{ safe = scan_module(egg_dir, base, name, stubs) and safe}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 416, in scan_module}}
> {{ code = marshal.load(f)}}
> {{ ValueError: bad marshal data (unknown type code)}}{{During handling of the 
> above exception, another exception occurred:}}{{Traceback (most recent call 
> last):}}
> {{ File "", line 1, in }}
> {{ File "/tmp/pip-build-j3d56a0n/pyspark/setup.py", line 224, in }}
> {{ 'Programming Language :: Python :: Implementation :: PyPy']}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 128, 
> in setup}}
> {{ _install_setup_requires(attrs)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 123, 
> in _install_setup_requires}}
> {{ dist.fetch_build_eggs(dist.setup_requires)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 461, in 
> 

[jira] [Updated] (SPARK-34097) overflow for datetime datatype when creating stride + JDBC parallel read

2021-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34097:
-
Target Version/s:   (was: 3.0.1)

> overflow for datetime datatype when creating stride +  JDBC parallel read
> -
>
> Key: SPARK-34097
> URL: https://issues.apache.org/jira/browse/SPARK-34097
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
> Environment: spark 3.0.1
> sql server v12.0
>Reporter: Pradip Sodha
>Priority: Major
>
> I'm trying to do a JDBC parallel read with a datetime column as the partition column:
> {code:java}
> create table eData (eid int, start_time datetime) -- sql server v12.0
> --inserting some data{code}
>  
> {code:java}
> val df = spark // spark 3.0.1
>   .read
>   .format("jdbc")
>   .option("url", "jdbc:sqlserver://...")
>   .option("partitionColumn", "start_time")
>   .option("lowerBound", "2000-01-01T01:01:11.546")
>   .option("upperBound", "2000-01-02T01:01:11.547")
>   .option("numPartitions", "10")
>   .option("dbtable", "eData")
>   .load()
> df.show(false){code}
> and getting this error,
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 
> (TID 7, 10.139.64.6, executor 0): 
> com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when 
> converting date and/or time from character string.   at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
>   at 
> org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
>   at 
> org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
>   at org.apache.spark.scheduler.Task.run(Task.scala:117)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ...
> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed 
> when converting date and/or time from character string.{code}
>  
> which is expected, because the query designed by Spark is:
> {code:java}
> 21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses 
> of these partitions: "start_time" < '2000-01-01 03:25:11.5461' or 
> "start_time" is null, "start_time" >= '2000-01-01 03:25:11.5461' AND 
> "start_time" < '2000-01-01 05:49:11.5462', "start_time" >= '2000-01-01 
> 05:49:11.5462' AND "start_time" < '2000-01-01 08:13:11.5463', "start_time" >= 
> '2000-01-01 08:13:11.5463' AND "start_time" < '2000-01-01 10:37:11.5464', 
> "start_time" >= '2000-01-01 10:37:11.5464' AND "start_time" < '2000-01-01 
> 13:01:11.5465', "start_time" >= '2000-01-01 13:01:11.5465' AND "start_time" < 
> '2000-01-01 15:25:11.5466', "start_time" >= '2000-01-01 15:25:11.5466' AND 
> "start_time" < '2000-01-01 17:49:11.5467', "start_time" >= '2000-01-01 
> 17:49:11.5467' AND "start_time" < '2000-01-01 20:13:11.5468', "start_time" >= 
> '2000-01-01 20:13:11.5468' AND "start_time" < '2000-01-01 22:37:11.5469', 
> "start_time" >= '2000-01-01 22:37:11.5469'
> {code}
> so, the date used in the query is '2000-01-01 

[jira] [Resolved] (SPARK-34075) Hidden directories are being listed for partition inference

2021-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-34075.
--
Fix Version/s: 3.1.1
   Resolution: Fixed

Issue resolved by pull request 31169
[https://github.com/apache/spark/pull/31169]

> Hidden directories are being listed for partition inference
> ---
>
> Key: SPARK-34075
> URL: https://issues.apache.org/jira/browse/SPARK-34075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Burak Yavuz
>Assignee: Gengliang Wang
>Priority: Blocker
> Fix For: 3.1.1
>
>
> Marking this as a blocker since it seems to be a regression. We are running 
> Delta's tests against Spark 3.1 as part of QA here: 
> [https://github.com/delta-io/delta/pull/579]
>  
> We have noticed that one of our tests regressed with:
> {code:java}
> java.lang.AssertionError: assertion failed: Conflicting directory structures 
> detected. Suspicious paths:
> [info]
> file:/private/var/folders/_2/xn1c9yr11_93wjdk2vkvmwm0gp/t/spark-18706bcc-23ea-4853-b8bc-c4cc2a5ed551
> [info]
> file:/private/var/folders/_2/xn1c9yr11_93wjdk2vkvmwm0gp/t/spark-18706bcc-23ea-4853-b8bc-c4cc2a5ed551/_delta_log
> [info] 
> [info] If provided paths are partition directories, please set "basePath" in 
> the options of the data source to specify the root directory of the table. If 
> there are multiple root directories, please load them separately and then 
> union them.
> [info]   at scala.Predef$.assert(Predef.scala:223)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:172)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:104)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:158)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:73)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
> [info]   at 
> org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:167)
> [info]   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:418)
> [info]   at 
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile$$anonfun$apply$1.applyOrElse(rules.scala:62)
> [info]   at 
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile$$anonfun$apply$1.applyOrElse(rules.scala:45)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
> [info]   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile.apply(rules.scala:45)
> [info]   at 
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile.apply(rules.scala:40)
> [info]   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)
> [info]   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> [info]   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
> [info]   at scala.collection.immutable.List.foldLeft(List.scala:89)
> [info]   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)
> [info]   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)
> [info]   at scala.collection.immutable.List.foreach(List.scala:392)
> [info]   at 
> 

[jira] [Assigned] (SPARK-34075) Hidden directories are being listed for partition inference

2021-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-34075:


Assignee: Gengliang Wang

> Hidden directories are being listed for partition inference
> ---
>
> Key: SPARK-34075
> URL: https://issues.apache.org/jira/browse/SPARK-34075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Burak Yavuz
>Assignee: Gengliang Wang
>Priority: Blocker
>
> Marking this as a blocker since it seems to be a regression. We are running 
> Delta's tests against Spark 3.1 as part of QA here: 
> [https://github.com/delta-io/delta/pull/579]
>  
> We have noticed that one of our tests regressed with:
> {code:java}
> java.lang.AssertionError: assertion failed: Conflicting directory structures 
> detected. Suspicious paths:
> [info]
> file:/private/var/folders/_2/xn1c9yr11_93wjdk2vkvmwm0gp/t/spark-18706bcc-23ea-4853-b8bc-c4cc2a5ed551
> [info]
> file:/private/var/folders/_2/xn1c9yr11_93wjdk2vkvmwm0gp/t/spark-18706bcc-23ea-4853-b8bc-c4cc2a5ed551/_delta_log
> [info] 
> [info] If provided paths are partition directories, please set "basePath" in 
> the options of the data source to specify the root directory of the table. If 
> there are multiple root directories, please load them separately and then 
> union them.
> [info]   at scala.Predef$.assert(Predef.scala:223)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:172)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:104)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:158)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:73)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
> [info]   at 
> org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:167)
> [info]   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:418)
> [info]   at 
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile$$anonfun$apply$1.applyOrElse(rules.scala:62)
> [info]   at 
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile$$anonfun$apply$1.applyOrElse(rules.scala:45)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
> [info]   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile.apply(rules.scala:45)
> [info]   at 
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile.apply(rules.scala:40)
> [info]   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)
> [info]   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> [info]   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
> [info]   at scala.collection.immutable.List.foldLeft(List.scala:89)
> [info]   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)
> [info]   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)
> [info]   at scala.collection.immutable.List.foreach(List.scala:392)
> [info]   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)
> [info]   at 
> 

[jira] [Assigned] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-34103:


Assignee: Dongjoon Hyun

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-34103.
--
Fix Version/s: 3.0.2
   3.1.0
   Resolution: Fixed

Issue resolved by pull request 31174
[https://github.com/apache/spark/pull/31174]

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33557) spark.storage.blockManagerSlaveTimeoutMs default value does not follow spark.network.timeout value when the latter was changed

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264496#comment-17264496
 ] 

Apache Spark commented on SPARK-33557:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31175

> spark.storage.blockManagerSlaveTimeoutMs default value does not follow 
> spark.network.timeout value when the latter was changed
> --
>
> Key: SPARK-33557
> URL: https://issues.apache.org/jira/browse/SPARK-33557
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Ohad
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.1.0
>
>
> According to the documentation, "spark.network.timeout" is the default timeout 
> for "spark.storage.blockManagerSlaveTimeoutMs", which implies that when the 
> user sets "spark.network.timeout", the effective value of 
> "spark.storage.blockManagerSlaveTimeoutMs" should also change if it was not 
> set explicitly.
> However, this is not the case: the effective default of 
> "spark.storage.blockManagerSlaveTimeoutMs" is always the default value of 
> "spark.network.timeout" (120s), not the value the user actually set.
>  
> "spark.storage.blockManagerSlaveTimeoutMs" is defined in the package object 
> of "org.apache.spark.internal.config" as follows:
> {code:java}
> private[spark] val STORAGE_BLOCKMANAGER_SLAVE_TIMEOUT =
>   ConfigBuilder("spark.storage.blockManagerSlaveTimeoutMs")
> .version("0.7.0")
> .timeConf(TimeUnit.MILLISECONDS)
> .createWithDefaultString(Network.NETWORK_TIMEOUT.defaultValueString)
> {code}
> So it seems like its default value is indeed "fixed" to the default value of 
> "spark.network.timeout".
>  
>  
>  
>  
>  
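As a quick illustration of the reported symptom, the sketch below sets only spark.network.timeout and reads both keys back with an explicit 120s default (the value quoted in the description); it is a minimal sketch of the user-visible behaviour, not the Spark config code itself.
{code:java}
import org.apache.spark.SparkConf

// Only the network timeout is changed by the user.
val conf = new SparkConf().set("spark.network.timeout", "300s")

// SparkConf.get returns the supplied default for keys that were never set, so
// the block manager timeout stays at 120s instead of following the 300s above.
val networkTimeout = conf.get("spark.network.timeout", "120s")                         // "300s"
val blockManagerTimeout = conf.get("spark.storage.blockManagerSlaveTimeoutMs", "120s") // "120s"

println(s"network=$networkTimeout blockManager=$blockManagerTimeout")
{code}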



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33557) spark.storage.blockManagerSlaveTimeoutMs default value does not follow spark.network.timeout value when the latter was changed

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264494#comment-17264494
 ] 

Apache Spark commented on SPARK-33557:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31175

> spark.storage.blockManagerSlaveTimeoutMs default value does not follow 
> spark.network.timeout value when the latter was changed
> --
>
> Key: SPARK-33557
> URL: https://issues.apache.org/jira/browse/SPARK-33557
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Ohad
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.1.0
>
>
> According to the documentation, "spark.network.timeout" is the default timeout 
> for "spark.storage.blockManagerSlaveTimeoutMs", which implies that when the 
> user sets "spark.network.timeout", the effective value of 
> "spark.storage.blockManagerSlaveTimeoutMs" should also change if it was not 
> set explicitly.
> However, this is not the case: the effective default of 
> "spark.storage.blockManagerSlaveTimeoutMs" is always the default value of 
> "spark.network.timeout" (120s), not the value the user actually set.
>  
> "spark.storage.blockManagerSlaveTimeoutMs" is defined in the package object 
> of "org.apache.spark.internal.config" as follows:
> {code:java}
> private[spark] val STORAGE_BLOCKMANAGER_SLAVE_TIMEOUT =
>   ConfigBuilder("spark.storage.blockManagerSlaveTimeoutMs")
> .version("0.7.0")
> .timeConf(TimeUnit.MILLISECONDS)
> .createWithDefaultString(Network.NETWORK_TIMEOUT.defaultValueString)
> {code}
> So it seems like its default value is indeed "fixed" to the default value of 
> "spark.network.timeout".
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34051) Support 32-bit unicode escape in string literals

2021-01-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-34051.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31096
[https://github.com/apache/spark/pull/31096]

> Support 32-bit unicode escape in string literals
> 
>
> Key: SPARK-34051
> URL: https://issues.apache.org/jira/browse/SPARK-34051
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 3.2.0
>
>
> Currently, Spark supports 16-bit unicode escapes like "\u0041" in string 
> literals.
> I think it would be nice if 32-bit unicode escapes were also supported, as 
> PostgreSQL and modern programming languages do (e.g., C++11, Rust).
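A small illustration, assuming an active SparkSession named spark: the first statement shows the 16-bit escape that works today, and the commented-out line shows the proposed 32-bit form for characters outside the BMP (that syntax is an assumption based on the PostgreSQL-style proposal, not something Spark accepts yet). The doubled backslashes keep Scala from pre-processing the escape before it reaches the SQL parser.
{code:java}
// 16-bit escape, supported today: the SQL parser turns \u0041 into "A".
spark.sql("SELECT '\\u0041' AS c").show()

// Proposed 32-bit escape (syntax assumed, not yet supported):
// spark.sql("SELECT '\\U0001F600' AS c").show()
{code}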



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34068) Remove redundant collection conversion in Spark code

2021-01-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-34068.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31125
[https://github.com/apache/spark/pull/31125]

> Remove redundant collection conversion in Spark code
> 
>
> Key: SPARK-34068
> URL: https://issues.apache.org/jira/browse/SPARK-34068
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, MLlib, Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.2.0
>
>
> There are some redundant collection conversions that can be removed; for 
> version compatibility, clean these up with the Scala 2.13 profile.
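For illustration, this is the kind of redundant conversion meant here (a made-up example, not a specific Spark call site):
{code:java}
val xs: Seq[Int] = Seq(1, 2, 3)

val a = xs.toSeq.map(_ + 1)   // .toSeq is redundant: xs is already a Seq
val b = xs.map(_ + 1)         // equivalent, without the extra conversion
{code}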



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34068) Remove redundant collection conversion in Spark code

2021-01-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-34068:


Assignee: Yang Jie

> Remove redundant collection conversion in Spark code
> 
>
> Key: SPARK-34068
> URL: https://issues.apache.org/jira/browse/SPARK-34068
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, MLlib, Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> There are some redundant collection conversions that can be removed; for 
> version compatibility, clean these up with the Scala 2.13 profile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34105) In addition to killing excluded/flakey executors which should support decommissioning

2021-01-13 Thread Holden Karau (Jira)
Holden Karau created SPARK-34105:


 Summary: In addition to killing excluded/flakey executors which 
should support decommissioning
 Key: SPARK-34105
 URL: https://issues.apache.org/jira/browse/SPARK-34105
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Holden Karau


Decommissioning will give the executor a chance to migrate its files to a more 
stable node.

 

Note: we want SPARK-34104 to be integrated as well, so that flaky executors 
which cannot decommission are eventually killed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34104) Allow users to specify a maximum decommissioning time

2021-01-13 Thread Holden Karau (Jira)
Holden Karau created SPARK-34104:


 Summary: Allow users to specify a maximum decommissioning time
 Key: SPARK-34104
 URL: https://issues.apache.org/jira/browse/SPARK-34104
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.1.0, 3.2.0, 3.1.1
Reporter: Holden Karau


We currently let users set the time at which the cluster manager or cloud 
provider is predicted to terminate a decommissioning executor, but for nodes 
where Spark itself triggers decommissioning we should add the ability for users 
to specify a maximum time the executor is allowed to take to decommission.

 

This is especially important if we start triggering decommissioning in more 
places (for example for excluded executors that are found to be flaky, which may 
or may not be able to decommission successfully).
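One possible shape for such a setting, following the ConfigBuilder pattern used in org.apache.spark.internal.config (as quoted in the SPARK-33557 mails above); the key name, version, and unit here are hypothetical, not the actual change:
{code:java}
import java.util.concurrent.TimeUnit
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical entry: upper bound on how long a Spark-triggered decommission
// may run before the executor is killed outright.
private[spark] val EXECUTOR_DECOMMISSION_MAX_TIME =
  ConfigBuilder("spark.executor.decommission.maxTime")   // hypothetical key
    .doc("Maximum time to wait for an executor to finish decommissioning " +
      "before it is forcibly killed.")
    .version("3.2.0")
    .timeConf(TimeUnit.SECONDS)
    .createOptional
{code}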



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34104) Allow users to specify a maximum decommissioning time

2021-01-13 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau reassigned SPARK-34104:


Assignee: Holden Karau

> Allow users to specify a maximum decommissioning time
> -
>
> Key: SPARK-34104
> URL: https://issues.apache.org/jira/browse/SPARK-34104
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0, 3.1.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> We currently let users set the time at which the cluster manager or cloud 
> provider is predicted to terminate a decommissioning executor, but for nodes 
> where Spark itself triggers decommissioning we should add the ability for 
> users to specify a maximum time the executor is allowed to take to 
> decommission.
>  
> This is especially important if we start triggering decommissioning in more 
> places (for example for excluded executors that are found to be flaky, which 
> may or may not be able to decommission successfully).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34104) Allow users to specify a maximum decommissioning time

2021-01-13 Thread Holden Karau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264488#comment-17264488
 ] 

Holden Karau commented on SPARK-34104:
--

I'm working on this.

> Allow users to specify a maximum decommissioning time
> -
>
> Key: SPARK-34104
> URL: https://issues.apache.org/jira/browse/SPARK-34104
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0, 3.1.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> We currently let users set the time at which the cluster manager or cloud 
> provider is predicted to terminate a decommissioning executor, but for nodes 
> where Spark itself triggers decommissioning we should add the ability for 
> users to specify a maximum time the executor is allowed to take to 
> decommission.
>  
> This is especially important if we start triggering decommissioning in more 
> places (for example for excluded executors that are found to be flaky, which 
> may or may not be able to decommission successfully).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23431) Expose the new executor memory metrics at the stage level

2021-01-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264468#comment-17264468
 ] 

Dongjoon Hyun commented on SPARK-23431:
---

Hi, [~Gengliang.Wang]. What is the fix version of this JIRA? Could you set it, 
please?

> Expose the new executor memory metrics at the stage level
> -
>
> Key: SPARK-23431
> URL: https://issues.apache.org/jira/browse/SPARK-23431
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Edward Lu
>Assignee: Terry Kim
>Priority: Major
>
> Collect and show the new executor memory metrics for each stage, to provide 
> more information on how memory is used per stage.
> Modify the AppStatusListener to track the peak values for JVM used memory, 
> execution memory, storage memory, and unified memory for each executor for 
> each stage.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34103:
--
Comment: was deleted

(was: User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31174)

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264463#comment-17264463
 ] 

Apache Spark commented on SPARK-23429:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31174

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edward Lu
>Assignee: Edward Lu
>Priority: Major
> Fix For: 3.0.0
>
>
> Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used, and for the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track the memory usage at the executor level. The new ExecutorMetrics 
> will be sent by executors to the driver as part of the Heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.
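As a quick way to inspect these metrics once they are exposed, the executors REST endpoint can be read directly. The sketch below uses a hypothetical application id and the default driver UI port, and simply dumps the JSON rather than assuming exact field names.
{code:java}
import scala.io.Source

// Hypothetical application id; the endpoint path is the standard monitoring REST API.
val appId = "app-20210113000000-0001"
val url = s"http://localhost:4040/api/v1/applications/$appId/executors"

// Dump the JSON and look for the new per-executor memory metrics described above.
println(Source.fromURL(url).mkString)
{code}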



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264462#comment-17264462
 ] 

Apache Spark commented on SPARK-23429:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31174

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edward Lu
>Assignee: Edward Lu
>Priority: Major
> Fix For: 3.0.0
>
>
> Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used, and for the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track the memory usage at the executor level. The new ExecutorMetrics 
> will be sent by executors to the driver as part of the Heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264460#comment-17264460
 ] 

Apache Spark commented on SPARK-23429:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31174

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edward Lu
>Assignee: Edward Lu
>Priority: Major
> Fix For: 3.0.0
>
>
> Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used, and for the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track the memory usage at the executor level. The new ExecutorMetrics 
> will be sent by executors to the driver as part of the Heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264461#comment-17264461
 ] 

Apache Spark commented on SPARK-23429:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31174

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edward Lu
>Assignee: Edward Lu
>Priority: Major
> Fix For: 3.0.0
>
>
> Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used, and for the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track the memory usage at the executor level. The new ExecutorMetrics 
> will be sent by executors to the driver as part of the Heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34103:
--
Affects Version/s: 3.1.0
   3.0.0
   3.0.1

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264459#comment-17264459
 ] 

Apache Spark commented on SPARK-34103:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31174

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34103:


Assignee: (was: Apache Spark)

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264458#comment-17264458
 ] 

Apache Spark commented on SPARK-34103:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31174

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34103:


Assignee: Apache Spark

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-34103:
-

 Summary: Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
 Key: SPARK-34103
 URL: https://issues.apache.org/jira/browse/SPARK-34103
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.2.0, 3.1.1
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2021-01-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-23429:
-

Assignee: Edward Lu

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edward Lu
>Assignee: Edward Lu
>Priority: Major
> Fix For: 3.0.0
>
>
> Add new executor-level memory metrics (jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used across the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track memory usage at the executor level. The new ExecutorMetrics will 
> be sent by executors to the driver as part of the heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, not the TaskMetrics, to minimize 
> additional logging. Analysis of a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory, 
> storageMemory, and the list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask of SPARK-23206. Please refer to the design doc for that 
> ticket for more details.
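
As a rough illustration of how these metrics could be consumed once exposed, the
sketch below fetches the executors REST endpoint of a running application and
prints the raw JSON. The host/port, the application ID, and the exact response
fields (e.g. a per-executor peak memory entry) are assumptions for illustration
based on the description above, not a verified contract; a history server would
be queried on its own port instead of the driver UI.

{code:java}
import scala.io.Source

object PeakExecutorMemorySketch {
  def main(args: Array[String]): Unit = {
    // Assumed values: a locally running driver UI and a hypothetical app ID.
    val uiBase = "http://localhost:4040"
    val appId  = "app-20210113000000-0000"

    // The executors endpoint returns one JSON entry per executor; per the
    // description above, each entry should also carry the peak memory metrics
    // (jvmUsedMemory, on-/off-heap execution and storage memory, ...).
    val url = s"$uiBase/api/v1/applications/$appId/executors"

    val src = Source.fromURL(url)
    try {
      // Printed raw here; a real consumer would parse the JSON and read the
      // peak memory fields for each executor.
      println(src.mkString)
    } finally {
      src.close()
    }
  }
}
{code}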



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34102) Spark SQL cannot escape both \ and other special characters

2021-01-13 Thread Noah Kawasaki (Jira)
Noah Kawasaki created SPARK-34102:
-

 Summary: Spark SQL cannot escape both \ and other special 
characters 
 Key: SPARK-34102
 URL: https://issues.apache.org/jira/browse/SPARK-34102
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 2.4.5, 2.3.0, 2.2.2, 2.1.3, 2.0.2
Reporter: Noah Kawasaki


Spark string-literal parsing does not handle backslashes and other escaped 
special characters consistently. This is an extension of 
https://issues.apache.org/jira/browse/SPARK-17647

Depending on how spark.sql.parser.escapedStringLiterals is set, you can either 
get backslashes in a string literal handled the way you expect but not the 
other escaped special characters, or the other special characters handled 
correctly but not backslashes.

So you have to choose which behavior you care about more.

I have tested Spark versions 2.1, 2.2, 2.3, 2.4, and 3.0 and they all 
experience the issue:
{code:java}
# These do not return the expected backslash
SET spark.sql.parser.escapedStringLiterals=false;
SELECT '\\';
> \
(should return \\)

SELECT 'hi\hi';
> hihi
(should return hi\hi) 


# These are correctly escaped
SELECT '\"';
> "

 SELECT '\'';
> '{code}
If I switch this: 
{code:java}
# These now work
SET spark.sql.parser.escapedStringLiterals=true;
SELECT '\\';
> \\

SELECT 'hi\hi';
> hi\hi


# These are now not correctly escaped
SELECT '\"';
> \"
(should return ")

SELECT '\'';
> \'
(should return ' ){code}
So basically we have to choose:

SET spark.sql.parser.escapedStringLiterals=false; if we want the other special 
characters handled correctly but not backslashes, or

SET spark.sql.parser.escapedStringLiterals=true; if we want backslashes 
handled correctly but not the other special characters.
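
For anyone who wants to reproduce the comparison outside the SQL shell, here is
a minimal Scala sketch that runs the same literals under both settings of
spark.sql.parser.escapedStringLiterals so the two behaviors can be observed side
by side; the session setup is illustrative only.

{code:java}
import org.apache.spark.sql.SparkSession

object EscapedStringLiteralsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("escaped-string-literals-demo")
      .master("local[*]")
      .getOrCreate()

    // The same literals from the report: '\\', 'hi\hi', and '\"'.
    val query = """SELECT '\\' AS bs, 'hi\hi' AS mid, '\"' AS quote"""

    // Default behavior: escape sequences in string literals are processed.
    spark.conf.set("spark.sql.parser.escapedStringLiterals", "false")
    spark.sql(query).show(false)

    // Legacy (Spark 1.6 / Hive-style) behavior: backslashes are kept literally.
    spark.conf.set("spark.sql.parser.escapedStringLiterals", "true")
    spark.sql(query).show(false)

    spark.stop()
  }
}
{code}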



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly

2021-01-13 Thread Noah Kawasaki (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264452#comment-17264452
 ] 

Noah Kawasaki commented on SPARK-17647:
---

I can also confirm that this issue is not fully resolved. As [~swiegleb] has 
shown, escape characters are not fully supported.

I have tested Spark versions 2.1, 2.2, 2.3, 2.4, and 3.0 and they all 
experience the issue:
{code:java}
# These do not return the expected backslash
SET spark.sql.parser.escapedStringLiterals=false;
SELECT '\\';
> \
(should return \\)

SELECT 'hi\hi';
> hihi
(should return hi\hi) 


# These are correctly escaped
SELECT '\"';
> "

 SELECT '\'';
> '{code}
If I switch this: 
{code:java}
# These now work
SET spark.sql.parser.escapedStringLiterals=true;
SELECT '\\';
> \\

SELECT 'hi\hi';
> hi\hi


# These are now not correctly escaped
SELECT '\"';
> \"
(should return ")

SELECT '\'';
> \'
(should return ' ){code}
So basically we have to choose:

SET spark.sql.parser.escapedStringLiterals=false; if we want the other special 
characters handled correctly but not backslashes, or

SET spark.sql.parser.escapedStringLiterals=true; if we want backslashes 
handled correctly but not the other special characters.

> SQL LIKE does not handle backslashes correctly
> --
>
> Key: SPARK-17647
> URL: https://issues.apache.org/jira/browse/SPARK-17647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Major
>  Labels: correctness
> Fix For: 2.1.1, 2.2.0
>
>
> Try the following in SQL shell:
> {code}
> select '' like '%\\%';
> {code}
> It returned false, which is wrong.
> cc: [~yhuai] [~joshrosen]
> A false-negative considered previously:
> {code}
> select '' rlike '.*.*';
> {code}
> It returned true, which is correct if we assume that the pattern is treated 
> as a Java string but not raw string.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264451#comment-17264451
 ] 

Apache Spark commented on SPARK-34101:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/31173

> Make spark-sql CLI configurable for the behavior of printing header by SET 
> command
> --
>
> Key: SPARK-34101
> URL: https://issues.apache.org/jira/browse/SPARK-34101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like the Hive CLI, the spark-sql CLI accepts the hive.cli.print.header 
> property, which controls whether a header row is printed with query results.
> But the spark-sql CLI doesn't allow users to change Hive-specific 
> configurations dynamically with the SET command.
> So it would be better to support changing this behavior via the SET command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34101:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Make spark-sql CLI configurable for the behavior of printing header by SET 
> command
> --
>
> Key: SPARK-34101
> URL: https://issues.apache.org/jira/browse/SPARK-34101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like the Hive CLI, the spark-sql CLI accepts the hive.cli.print.header 
> property, which controls whether a header row is printed with query results.
> But the spark-sql CLI doesn't allow users to change Hive-specific 
> configurations dynamically with the SET command.
> So it would be better to support changing this behavior via the SET command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command

2021-01-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34101:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Make spark-sql CLI configurable for the behavior of printing header by SET 
> command
> --
>
> Key: SPARK-34101
> URL: https://issues.apache.org/jira/browse/SPARK-34101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> Like the Hive CLI, the spark-sql CLI accepts the hive.cli.print.header 
> property, which controls whether a header row is printed with query results.
> But the spark-sql CLI doesn't allow users to change Hive-specific 
> configurations dynamically with the SET command.
> So it would be better to support changing this behavior via the SET command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command

2021-01-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264450#comment-17264450
 ] 

Apache Spark commented on SPARK-34101:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/31173

> Make spark-sql CLI configurable for the behavior of printing header by SET 
> command
> --
>
> Key: SPARK-34101
> URL: https://issues.apache.org/jira/browse/SPARK-34101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like the Hive CLI, the spark-sql CLI accepts the hive.cli.print.header 
> property, which controls whether a header row is printed with query results.
> But the spark-sql CLI doesn't allow users to change Hive-specific 
> configurations dynamically with the SET command.
> So it would be better to support changing this behavior via the SET command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command

2021-01-13 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-34101:
---
Description: 
Like Hive CLI, spark-sql CLI accepts hive.cli.print.header property and we can 
change the behavior of printing header.
But spark-sql CLI doesn't allow users to change Hive specific configurations 
dynamically by SET command.
So, it's better to support the way to change the behavior by SET command.

  was:
Like Hive CLI, spark-sql CLI accept hive.cli.print.header property and we can 
change the behavior of printing header.
But spark-sql CLI doesn't allow users to change Hive specific configurations 
dynamically by SET command.
So, it's better to support the way to change the behavior by SET command.


> Make spark-sql CLI configurable for the behavior of printing header by SET 
> command
> --
>
> Key: SPARK-34101
> URL: https://issues.apache.org/jira/browse/SPARK-34101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like the Hive CLI, the spark-sql CLI accepts the hive.cli.print.header 
> property, which controls whether a header row is printed with query results.
> But the spark-sql CLI doesn't allow users to change Hive-specific 
> configurations dynamically with the SET command.
> So it would be better to support changing this behavior via the SET command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command

2021-01-13 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-34101:
---
Description: 
Like Hive CLI, spark-sql CLI accept hive.cli.print.header property and we can 
change the behavior of printing header.
But spark-sql CLI doesn't allow users to change Hive specific configurations 
dynamically by SET command.
So, it's better to support the way to change the behavior by SET command.

  was:
Like Hive CLI, spark-sql CLI accept hive.cli.print.header property and we can 
change the behavior of printing header.

But spark-sql CLI doesn't allow users to change Hive specific configurations 
dynamically by SET command.

So, it's better to support the way to change the behavior by SET command.


> Make spark-sql CLI configurable for the behavior of printing header by SET 
> command
> --
>
> Key: SPARK-34101
> URL: https://issues.apache.org/jira/browse/SPARK-34101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like the Hive CLI, the spark-sql CLI accepts the hive.cli.print.header 
> property, which controls whether a header row is printed with query results.
> But the spark-sql CLI doesn't allow users to change Hive-specific 
> configurations dynamically with the SET command.
> So it would be better to support changing this behavior via the SET command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34101) Make spark-sql CLI configurable for the behavior of printing header by SET command

2021-01-13 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-34101:
--

 Summary: Make spark-sql CLI configurable for the behavior of 
printing header by SET command
 Key: SPARK-34101
 URL: https://issues.apache.org/jira/browse/SPARK-34101
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


Like the Hive CLI, the spark-sql CLI accepts the hive.cli.print.header 
property, which controls whether a header row is printed with query results.

But the spark-sql CLI doesn't allow users to change Hive-specific 
configurations dynamically with the SET command.

So it would be better to support changing this behavior via the SET command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19169) columns-changed ORC table encounters 'IndexOutOfBoundsException' when reading the old schema files

2021-01-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264325#comment-17264325
 ] 

Dongjoon Hyun commented on SPARK-19169:
---

[~angerszhuuu]. Given the context, this looks like one of the ancient issues in 
the code path between Hive and ORC. Please use the `convertMetastoreOrc` option 
as a workaround if you still see the issue with Apache Spark 2.3.2. I added a 
native ORC reader to Spark to avoid that kind of Hive ORC issue. BTW, both 
Apache Spark 2.3.2 and its Apache ORC 1.4.4 are EOL versions. I'd recommend 
upgrading to the latest versions.

If there is a real issue, it would be great if we could have a reproducible 
example with Apache Spark 3.1.0 RC1.
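
A hedged sketch of that workaround in Scala follows; the table name comes from 
the report below, the config key is assumed to be 
spark.sql.hive.convertMetastoreOrc, and depending on the Spark version it may 
need to be set in spark-defaults.conf rather than at runtime.

{code:java}
import org.apache.spark.sql.SparkSession

object ConvertMetastoreOrcWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-schema-evolution-workaround")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // Route reads of Hive ORC tables through Spark's native ORC reader
    // instead of the Hive reader path that throws IndexOutOfBoundsException
    // on files written with the old schema.
    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

    // orc_test_tbl is the table from the report below.
    spark.sql("SELECT * FROM orc_test_tbl").show()

    spark.stop()
  }
}
{code}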

> columns-changed ORC table encounters 'IndexOutOfBoundsException' when reading 
> the old schema files
> -
>
> Key: SPARK-19169
> URL: https://issues.apache.org/jira/browse/SPARK-19169
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: roncenzhao
>Priority: Major
>
> We have an ORC table called orc_test_tbl and have inserted some data into it.
> After that, we changed the table schema by dropping some columns.
> When reading files written with the old schema, we get the following exception.
> ```
> java.lang.IndexOutOfBoundsException: toIndex = 65
> at java.util.ArrayList.subListRangeCheck(ArrayList.java:962)
> at java.util.ArrayList.subList(ArrayList.java:954)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.getSchemaOnRead(RecordReaderFactory.java:161)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.createTreeReader(RecordReaderFactory.java:66)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:202)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:539)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:183)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:226)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:437)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1215)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1113)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:245)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2

2021-01-13 Thread Devin Boyer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264309#comment-17264309
 ] 

Devin Boyer edited comment on SPARK-34100 at 1/13/21, 5:43 PM:
---

Noting that I found a workaround here: it appears that this is due to [an issue 
with the version of the setuptools|https://stackoverflow.com/a/55167875/316079] 
package bundled into the Python distribution with Amazon Linux 2, and the 
"wheel" library not being installed.

If this command is run on an Amazon Linux 2 installation with Python 3.7 
installed, then pyspark 2.4.x package installation succeeds:

 

{{pip3 install --upgrade --force-reinstall setuptools && pip3 install wheel}}

 

I noticed this doesn't happen with 3.0.x package versions, so maybe there's a 
difference in how the package is distributed between 2.4 and 3.x?

 


was (Author: drboyer):
Noting that I found a workaround here: it appears that this is due to an issue 
with the version of the setuptools package bundled into the Python distribution 
with Amazon Linux 2, and the "wheel" library not being installed.

If this command is run on an Amazon Linux 2 installation with Python 3.7 
installed, then pyspark 2.4.x package installation succeeds:

 

{{pip3 install --upgrade --force-reinstall setuptools && pip3 install wheel}}

 

I noticed this doesn't happen with 3.0.x package versions, so maybe there's a 
difference in how the package is distributed between 2.4 and 3.x?

 

> pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
> -
>
> Key: SPARK-34100
> URL: https://issues.apache.org/jira/browse/SPARK-34100
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, PySpark
>Affects Versions: 2.4.7
> Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also 
> tested with pip 20.3.3), using Docker or EMR 5.32.0
>  
> Example Dockerfile to reproduce:
> {{FROM amazonlinux:2}}
> {{RUN yum install -y python3}}
> {{RUN pip3 install pyspark==2.4.7}}
>  
>Reporter: Devin Boyer
>Priority: Minor
>
> I'm unable to install the pyspark Python package on Amazon Linux 2, whether 
> in a Docker image or an EMR cluster. Amazon Linux 2 currently ships with 
> Python 3.7 and pip 9.0.3, but upgrading pip yields the same result.
>  
> When installing the package, the installation will fail with the error 
> "ValueError: bad marshal data (unknown type code)". Full example stack below.
>  
> This bug prevents use of pyspark for simple testing environments, and from 
> using tools where the pyspark package is a dependency, like 
> [https://github.com/awslabs/python-deequ.]
>  
> Stack Trace:
> {{Step 3/3 : RUN pip3 install pyspark==2.4.7}}
> {{ ---> Running in 2c6e1c1de62f}}
> {{WARNING: Running pip install with root privileges is generally not a good 
> idea. Try `pip3 install --user` instead.}}
> {{Collecting pyspark==2.4.7}}
> {{ Downloading 
> https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz
>  (217.9MB)}}
> {{ Complete output from command python setup.py egg_info:}}
> {{ Could not import pypandoc - required to package PySpark}}
> {{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown 
> distribution option: 'long_description_content_type'}}
> {{ warnings.warn(msg)}}
> {{ zip_safe flag not set; analyzing archive contents...}}
> {{ Traceback (most recent call last):}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, 
> in save_modules}}
> {{ yield saved}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, 
> in setup_context}}
> {{ yield}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, 
> in run_setup}}
> {{ _execfile(setup_script, ns)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in 
> _execfile}}
> {{ exec(code, globals, locals)}}
> {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in 
> }}
> {{ # using Python imports instead which will be resolved correctly.}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, 
> in setup}}
> {{ return distutils.core.setup(**attrs)}}
> {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}}
> {{ dist.run_commands()}}
> {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}}
> {{ self.run_command(cmd)}}
> {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}}
> {{ cmd_obj.run()}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 218, in run}}
> {{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 269, in zip_safe}}

[jira] [Commented] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2

2021-01-13 Thread Devin Boyer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264309#comment-17264309
 ] 

Devin Boyer commented on SPARK-34100:
-

Noting that I found a workaround here: it appears that this is due to an issue 
with the version of the setuptools package bundled into the Python distribution 
with Amazon Linux 2, and the "wheel" library not being installed.

If this command is run on an Amazon Linux 2 installation with Python 3.7 
installed, then pyspark 2.4.x package installation succeeds:

 

{{pip3 install --upgrade --force-reinstall setuptools && pip3 install wheel}}

 

I noticed this doesn't happen with 3.0.x package versions, so maybe there's a 
difference in how the package is distributed between 2.4 and 3.x?

 

> pyspark 2.4 packages can't be installed via pip on Amazon Linux 2
> -
>
> Key: SPARK-34100
> URL: https://issues.apache.org/jira/browse/SPARK-34100
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, PySpark
>Affects Versions: 2.4.7
> Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also 
> tested with pip 20.3.3), using Docker or EMR 5.32.0
>  
> Example Dockerfile to reproduce:
> {{FROM amazonlinux:2}}
> {{RUN yum install -y python3}}
> {{RUN pip3 install pyspark==2.4.7}}
>  
>Reporter: Devin Boyer
>Priority: Minor
>
> I'm unable to install the pyspark Python package on Amazon Linux 2, whether 
> in a Docker image or an EMR cluster. Amazon Linux 2 currently ships with 
> Python 3.7 and pip 9.0.3, but upgrading pip yields the same result.
>  
> When installing the package, the installation will fail with the error 
> "ValueError: bad marshal data (unknown type code)". Full example stack below.
>  
> This bug prevents use of pyspark for simple testing environments, and from 
> using tools where the pyspark package is a dependency, like 
> [https://github.com/awslabs/python-deequ.]
>  
> Stack Trace:
> {{Step 3/3 : RUN pip3 install pyspark==2.4.7}}
> {{ ---> Running in 2c6e1c1de62f}}
> {{WARNING: Running pip install with root privileges is generally not a good 
> idea. Try `pip3 install --user` instead.}}
> {{Collecting pyspark==2.4.7}}
> {{ Downloading 
> https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz
>  (217.9MB)}}
> {{ Complete output from command python setup.py egg_info:}}
> {{ Could not import pypandoc - required to package PySpark}}
> {{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown 
> distribution option: 'long_description_content_type'}}
> {{ warnings.warn(msg)}}
> {{ zip_safe flag not set; analyzing archive contents...}}
> {{ Traceback (most recent call last):}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, 
> in save_modules}}
> {{ yield saved}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, 
> in setup_context}}
> {{ yield}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, 
> in run_setup}}
> {{ _execfile(setup_script, ns)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in 
> _execfile}}
> {{ exec(code, globals, locals)}}
> {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in 
> }}
> {{ # using Python imports instead which will be resolved correctly.}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, 
> in setup}}
> {{ return distutils.core.setup(**attrs)}}
> {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}}
> {{ dist.run_commands()}}
> {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}}
> {{ self.run_command(cmd)}}
> {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}}
> {{ cmd_obj.run()}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 218, in run}}
> {{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 269, in zip_safe}}
> {{ return analyze_egg(self.bdist_dir, self.stubs)}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 379, in analyze_egg}}
> {{ safe = scan_module(egg_dir, base, name, stubs) and safe}}
> {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
> line 416, in scan_module}}
> {{ code = marshal.load(f)}}
> {{ ValueError: bad marshal data (unknown type code)}}{{During handling of the 
> above exception, another exception occurred:}}{{Traceback (most recent call 
> last):}}
> {{ File "", line 1, in }}
> {{ File "/tmp/pip-build-j3d56a0n/pyspark/setup.py", line 224, in }}
> {{ 'Programming Language :: Python :: Implementation :: PyPy']}}
> {{ File 

[jira] [Commented] (SPARK-32333) Drop references to Master

2021-01-13 Thread Neil Shah-Quinn (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264303#comment-17264303
 ] 

Neil Shah-Quinn commented on SPARK-32333:
-

I'm glad there's a plan to improve this language!

For what it's worth, I like "Scheduler" or "Coordinator". They're short and 
accurately reflect that (as I understand it) its role is simply to assign 
executors which then communicate directly with the driver program.

> Drop references to Master
> -
>
> Key: SPARK-32333
> URL: https://issues.apache.org/jira/browse/SPARK-32333
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
>
> We have a lot of references to "master" in the code base. It will be 
> beneficial to remove references to problematic language that can alienate 
> potential community members. 
> SPARK-32004 removed references to slave
>  
> Here is a IETF draft to fix up some of the most egregious examples
> (master/slave, whitelist/backlist) with proposed alternatives.
> https://tools.ietf.org/id/draft-knodel-terminology-00.html#rfc.section.1.1.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34100) pyspark 2.4 packages can't be installed via pip on Amazon Linux 2

2021-01-13 Thread Devin Boyer (Jira)
Devin Boyer created SPARK-34100:
---

 Summary: pyspark 2.4 packages can't be installed via pip on Amazon 
Linux 2
 Key: SPARK-34100
 URL: https://issues.apache.org/jira/browse/SPARK-34100
 Project: Spark
  Issue Type: Bug
  Components: Deploy, PySpark
Affects Versions: 2.4.7
 Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also 
tested with pip 20.3.3), using Docker or EMR 5.32.0

 

Example Dockerfile to reproduce:

{{FROM amazonlinux:2}}
{{RUN yum install -y python3}}
{{RUN pip3 install pyspark==2.4.7}}

 
Reporter: Devin Boyer


I'm unable to install the pyspark Python package on Amazon Linux 2, whether in 
a Docker image or an EMR cluster. Amazon Linux 2 currently ships with Python 
3.7 and pip 9.0.3, but upgrading pip yields the same result.

 

When installing the package, the installation will fail with the error 
"ValueError: bad marshal data (unknown type code)". Full example stack below.

 

This bug prevents use of pyspark for simple testing environments, and from 
using tools where the pyspark package is a dependency, like 
[https://github.com/awslabs/python-deequ.]

 

Stack Trace:

{{Step 3/3 : RUN pip3 install pyspark==2.4.7}}
{{ ---> Running in 2c6e1c1de62f}}
{{WARNING: Running pip install with root privileges is generally not a good 
idea. Try `pip3 install --user` instead.}}
{{Collecting pyspark==2.4.7}}
{{ Downloading 
https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz
 (217.9MB)}}
{{ Complete output from command python setup.py egg_info:}}
{{ Could not import pypandoc - required to package PySpark}}
{{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown 
distribution option: 'long_description_content_type'}}
{{ warnings.warn(msg)}}
{{ zip_safe flag not set; analyzing archive contents...}}
{{ Traceback (most recent call last):}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, in 
save_modules}}
{{ yield saved}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, in 
setup_context}}
{{ yield}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, in 
run_setup}}
{{ _execfile(setup_script, ns)}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in 
_execfile}}
{{ exec(code, globals, locals)}}
{{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in 
}}
{{ # using Python imports instead which will be resolved correctly.}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, in 
setup}}
{{ return distutils.core.setup(**attrs)}}
{{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}}
{{ dist.run_commands()}}
{{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}}
{{ self.run_command(cmd)}}
{{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}}
{{ cmd_obj.run()}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
line 218, in run}}
{{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
line 269, in zip_safe}}
{{ return analyze_egg(self.bdist_dir, self.stubs)}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
line 379, in analyze_egg}}
{{ safe = scan_module(egg_dir, base, name, stubs) and safe}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", 
line 416, in scan_module}}
{{ code = marshal.load(f)}}
{{ ValueError: bad marshal data (unknown type code)}}{{During handling of the 
above exception, another exception occurred:}}{{Traceback (most recent call 
last):}}
{{ File "", line 1, in }}
{{ File "/tmp/pip-build-j3d56a0n/pyspark/setup.py", line 224, in }}
{{ 'Programming Language :: Python :: Implementation :: PyPy']}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 128, in 
setup}}
{{ _install_setup_requires(attrs)}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 123, in 
_install_setup_requires}}
{{ dist.fetch_build_eggs(dist.setup_requires)}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 461, in 
fetch_build_eggs}}
{{ replace_conflicting=True,}}
{{ File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 866, 
in resolve}}
{{ replace_conflicting=replace_conflicting}}
{{ File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 
1146, in best_match}}
{{ return self.obtain(req, installer)}}
{{ File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 
1158, in obtain}}
{{ return installer(requirement)}}
{{ File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 528, in 
fetch_build_egg}}
{{ return cmd.easy_install(req)}}
{{ File 

[jira] [Assigned] (SPARK-34070) Replaces find and emptiness check with exists.

2021-01-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-34070:


Assignee: Yang Jie

> Replaces find and emptiness check with exists.
> --
>
> Key: SPARK-34070
> URL: https://issues.apache.org/jira/browse/SPARK-34070
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
>
> Semantic consistency and code simplification
> before:
> {code:java}
> seq.find(p).isDefined
> or
> seq.find(p).isEmpty
> {code}
> after:
> {code:java}
> seq.exists(p)
> or
> !seq.exists(p)
> {code}
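
A tiny, self-contained illustration of the rewrite described above (the 
sequence and predicate are made up for the example):

{code:java}
object ExistsVsFind {
  def main(args: Array[String]): Unit = {
    val seq = Seq(1, 2, 3, 4)
    val p: Int => Boolean = _ % 2 == 0

    // Before: builds an Option just to test whether a match exists.
    val before = seq.find(p).isDefined
    // After: states the intent directly and short-circuits the same way.
    val after = seq.exists(p)

    assert(before == after)
    println(s"exists = $after")
  }
}
{code}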



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34070) Replaces find and emptiness check with exists.

2021-01-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-34070.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31130
[https://github.com/apache/spark/pull/31130]

> Replaces find and emptiness check with exists.
> --
>
> Key: SPARK-34070
> URL: https://issues.apache.org/jira/browse/SPARK-34070
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
> Fix For: 3.2.0
>
>
> Semantic consistency and code simplification
> before:
> {code:java}
> seq.find(p).isDefined
> or
> seq.find(p).isEmpty
> {code}
> after:
> {code:java}
> seq.exists(p)
> or
> !seq.exists(p)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34070) Replaces find and emptiness check with exists.

2021-01-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-34070:
-
Priority: Trivial  (was: Minor)

> Replaces find and emptiness check with exists.
> --
>
> Key: SPARK-34070
> URL: https://issues.apache.org/jira/browse/SPARK-34070
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Trivial
>
> Semantic consistency and code simplification
> before:
> {code:java}
> seq.find(p).isDefined
> or
> seq.find(p).isEmpty
> {code}
> after:
> {code:java}
> seq.exists(p)
> or
> !seq.exists(p)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


