[jira] [Assigned] (SPARK-35948) Simplify release scripts by removing Spark 2.4/Java7 parts

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35948:


Assignee: Apache Spark

> Simplify release scripts by removing Spark 2.4/Java7 parts
> --
>
> Key: SPARK-35948
> URL: https://issues.apache.org/jira/browse/SPARK-35948
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-35948) Simplify release scripts by removing Spark 2.4/Java7 parts

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35948:


Assignee: (was: Apache Spark)

> Simplify release scripts by removing Spark 2.4/Java7 parts
> --
>
> Key: SPARK-35948
> URL: https://issues.apache.org/jira/browse/SPARK-35948
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-35948) Simplify release scripts by removing Spark 2.4/Java7 parts

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371851#comment-17371851
 ] 

Apache Spark commented on SPARK-35948:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33150

> Simplify release scripts by removing Spark 2.4/Java7 parts
> --
>
> Key: SPARK-35948
> URL: https://issues.apache.org/jira/browse/SPARK-35948
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-35935) REPAIR TABLE fails on table refreshing

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371853#comment-17371853
 ] 

Apache Spark commented on SPARK-35935:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/33152
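
For context, a minimal sketch of the kind of statement that exercises this code 
path, assuming an active SparkSession `spark`; the table name, schema, and 
location are illustrative and not taken from the report:

{code:scala}
// Create an external, partitioned datasource table (illustrative names/paths).
spark.sql(
  """CREATE TABLE t (id INT, part INT)
    |USING parquet
    |PARTITIONED BY (part)
    |LOCATION '/tmp/t'""".stripMargin)

// A partition directory such as /tmp/t/part=1 is added outside of Spark;
// recovering the partitions triggers the table refresh during which the
// ticket reports the exception quoted below.
spark.sql("MSCK REPAIR TABLE t")
{code}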

> REPAIR TABLE fails on table refreshing
> --
>
> Key: SPARK-35935
> URL: https://issues.apache.org/jira/browse/SPARK-35935
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> MSCK REPAIR TABLE can fail while table recovering with the exception:
> {code:java}
> Error in SQL statement: AnalysisException: Incompatible format detected.
> ...
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$verifyNonDeltaTable(DataSourceStrategy.scala:297)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply0$1.applyOrElse(DataSourceStrategy.scala:378)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply0$1.applyOrElse(DataSourceStrategy.scala:342)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:86)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:316)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:175)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1093)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1092)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode.mapChildren(LogicalPlan.scala:187)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:175)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:316)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:98)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:95)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:74)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply0(DataSourceStrategy.scala:342)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:336)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:248)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:221)
>   at 
> com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:221)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(Li

[jira] [Commented] (SPARK-35935) REPAIR TABLE fails on table refreshing

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371854#comment-17371854
 ] 

Apache Spark commented on SPARK-35935:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/33152

> REPAIR TABLE fails on table refreshing
> --
>
> Key: SPARK-35935
> URL: https://issues.apache.org/jira/browse/SPARK-35935
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> MSCK REPAIR TABLE can fail while table recovering with the exception:
> {code:java}
> Error in SQL statement: AnalysisException: Incompatible format detected.
> ...
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$verifyNonDeltaTable(DataSourceStrategy.scala:297)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply0$1.applyOrElse(DataSourceStrategy.scala:378)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply0$1.applyOrElse(DataSourceStrategy.scala:342)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:86)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:316)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:175)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1093)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1092)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode.mapChildren(LogicalPlan.scala:187)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:175)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:316)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:98)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:95)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:74)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply0(DataSourceStrategy.scala:342)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:336)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:248)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:221)
>   at 
> com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:221)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(Li

[jira] [Created] (SPARK-35949) On client mode, spark context will stopped while application is started.

2021-06-30 Thread SunPeng (Jira)
SunPeng created SPARK-35949:
---

 Summary: On client mode, spark context will stopped while 
application is started.
 Key: SPARK-35949
 URL: https://issues.apache.org/jira/browse/SPARK-35949
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: SunPeng


In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
application has started.
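
A rough sketch of the reported setup, a long-running application that creates 
its SparkSession in YARN client mode; every name and configuration value below 
is an illustrative assumption, not taken from this report:

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical driver embedded in a long-lived server process (the logs quoted
// later in this thread show a Spring Boot app); YARN client mode keeps the
// driver local to that process.
val spark = SparkSession.builder()
  .appName("embedded-spark-app")
  .master("yarn")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.initialExecutors", "1")
  .getOrCreate()

// Reported symptom: right after the surrounding application finishes starting,
// the SparkContext is stopped (see "Successfully stopped SparkContext" in the
// logs quoted below).
{code}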

 






[jira] [Updated] (SPARK-35949) On client mode, spark context will stopped while application is started.

2021-06-30 Thread SunPeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SunPeng updated SPARK-35949:

Description: 
In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
application has started.
{quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors 
and spark.executor.instances
21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
300(ns)
21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page template: 
index
21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
["http-nio-9000"]
21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 (http) 
with context path ''
21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
seconds (JVM running for 529.958)
21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
(http/1.1)}{0.0.0.0:4040}
21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
http://mongodb10.pek01.rack.zhihu.com:4040
21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor thread
21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all executors
21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
executor to shut down
21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
backend Stopped
21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
21/06/30 12:03:40 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
'dispatcherServlet'
21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
'dispatcherServlet'
21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
{quote}

  was:
In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
application has started.

 


> On client mode, spark context will stopped while application is started.
> 
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: SunPeng
>Priority: Critical
>
> In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
> application has started.
> {quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
> ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
> 21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
> 21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
> 300(ns)
> 21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
> template: index
> 21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
> ["http-nio-9000"]
> 21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 
> (http) with context path ''
> 21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
> seconds (JVM running for 529.958)
> 21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
> (http/1.1)}{0.0.0.0:4040}
> 21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
> http://mongodb10.pek01.rack.zhihu.com:4040
> 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor thread
> 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all executors
> 21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down
> 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
> backend Stopped
> 21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
> 21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
> 21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
> 21/06/30 12:03:40 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stoppe

[jira] [Updated] (SPARK-35949) On client mode, spark context will stopped while application is started.

2021-06-30 Thread SunPeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SunPeng updated SPARK-35949:

Description: 
In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
application has started.
{quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
 21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors 
and spark.executor.instances
 21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
300(ns)
 21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
template: index
 21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
["http-nio-9000"]
 21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 (http) 
with context path ''
 21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
seconds (JVM running for 529.958)
 21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
(http/1.1)}
Unknown macro: \{0.0.0.0}
21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor thread
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all executors
 21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
executor to shut down
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
backend Stopped
 21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
 21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
 21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
 21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
 21/06/30 12:03:40 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
 21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
 21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
'dispatcherServlet'
 21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
'dispatcherServlet'
 21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
{quote}

  was:
In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
application has started.
{quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors 
and spark.executor.instances
21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
300(ns)
21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page template: 
index
21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
["http-nio-9000"]
21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 (http) 
with context path ''
21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
seconds (JVM running for 529.958)
21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
(http/1.1)}{0.0.0.0:4040}
21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
http://mongodb10.pek01.rack.zhihu.com:4040
21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor thread
21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all executors
21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
executor to shut down
21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
backend Stopped
21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
21/06/30 12:03:40 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
'dispatcherServlet'
21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
'dispatcherServlet'
21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
{quote}


> On client mode, spark context will stopped while application is started.
> 
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  I

[jira] [Assigned] (SPARK-35947) Increase JVM stack size in release-build.sh

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35947:


Assignee: Dongjoon Hyun

> Increase JVM stack size in release-build.sh
> ---
>
> Key: SPARK-35947
> URL: https://issues.apache.org/jira/browse/SPARK-35947
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-35947) Increase JVM stack size in release-build.sh

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35947.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33149
[https://github.com/apache/spark/pull/33149]

> Increase JVM stack size in release-build.sh
> ---
>
> Key: SPARK-35947
> URL: https://issues.apache.org/jira/browse/SPARK-35947
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Assigned] (SPARK-35948) Simplify release scripts by removing Spark 2.4/Java7 parts

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35948:


Assignee: Dongjoon Hyun

> Simplify release scripts by removing Spark 2.4/Java7 parts
> --
>
> Key: SPARK-35948
> URL: https://issues.apache.org/jira/browse/SPARK-35948
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-35948) Simplify release scripts by removing Spark 2.4/Java7 parts

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35948.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33150
[https://github.com/apache/spark/pull/33150]

> Simplify release scripts by removing Spark 2.4/Java7 parts
> --
>
> Key: SPARK-35948
> URL: https://issues.apache.org/jira/browse/SPARK-35948
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Updated] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Kevin Su (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Su updated SPARK-35950:
-
Attachment: Screenshot from 2021-06-30 15-55-05.png

> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When clicking the checkbox "Exec Loss Reason" on the executor page, the 
> "Failed tasks" column disappear instead of the "Exec Loss Reason" column. 
> !image-2021-06-30-15-56-27-770.png!
> !image-2021-06-30-15-56-03-613.png!
>  






[jira] [Created] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Kevin Su (Jira)
Kevin Su created SPARK-35950:


 Summary: Failed to toggle Exec Loss Reason in the executors page
 Key: SPARK-35950
 URL: https://issues.apache.org/jira/browse/SPARK-35950
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.2.0
Reporter: Kevin Su
 Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
2021-06-30 15-55-05.png

When clicking the checkbox "Exec Loss Reason" on the executor page, the "Failed 
tasks" column disappear instead of the "Exec Loss Reason" column. 

!image-2021-06-30-15-56-27-770.png!

!image-2021-06-30-15-56-03-613.png!

 






[jira] [Updated] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Kevin Su (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Su updated SPARK-35950:
-
Attachment: Screenshot from 2021-06-30 13-28-16.png

> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When clicking the checkbox "Exec Loss Reason" on the executor page, the 
> "Failed tasks" column disappear instead of the "Exec Loss Reason" column. 
> !image-2021-06-30-15-56-27-770.png!
> !image-2021-06-30-15-56-03-613.png!
>  






[jira] [Updated] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Kevin Su (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Su updated SPARK-35950:
-
Description: 
When clicking the checkbox "Exec Loss Reason" on the executor page,

the "Failed tasks" column disappear instead of the "Exec Loss Reason" column.  

  was:
When clicking the checkbox "Exec Loss Reason" on the executor page, the "Failed 
tasks" column disappear instead of the "Exec Loss Reason" column. 

!image-2021-06-30-15-56-27-770.png!

!image-2021-06-30-15-56-03-613.png!

 


> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When clicking the checkbox "Exec Loss Reason" on the executor page,
> the "Failed tasks" column disappear instead of the "Exec Loss Reason" column. 
>  






[jira] [Updated] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Kevin Su (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Su updated SPARK-35950:
-
Description: 
When clicking the checkbox "Exec Loss Reason" on the executor page,

the "Failed tasks" column disappears instead of the "Exec Loss Reason" column.  

  was:
When clicking the checkbox "Exec Loss Reason" on the executor page,

the "Failed tasks" column disappear instead of the "Exec Loss Reason" column.  


> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When clicking the checkbox "Exec Loss Reason" on the executor page,
> the "Failed tasks" column disappears instead of the "Exec Loss Reason" 
> column.  






[jira] [Commented] (SPARK-30332) When running sql query with limit catalyst throw StackOverFlow exception

2021-06-30 Thread Roy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371868#comment-17371868
 ] 

Roy commented on SPARK-30332:
-

We ran into a similar situation. We apply many window functions on a dataset; 
the first time we execute dataset.collectAsList() with a limit, we get a 
StackOverflowError. However, if we execute dataset.save() without the limit 
first, it works well, and after that we can also get the correct result from 
dataset.collectAsList() with the limit.
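
A compact sketch of the failure mode and the workaround described above; the 
dataset, column names, and output path are illustrative, and the real job 
applies many more window functions:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
import spark.implicits._

val df = Seq((1, 1, 10.0), (1, 2, 20.0), (2, 1, 5.0)).toDF("key", "ts", "value")
val w  = Window.partitionBy("key").orderBy("ts")
val withWindows = df
  .withColumn("rn", row_number().over(w))
  .withColumn("running_sum", sum("value").over(w))
  // ... many more window columns in the real job ...

// Reported failure: collecting a limited result directly can throw a
// StackOverflowError during planning.
// withWindows.limit(10).collectAsList()

// Workaround described above: materialize the un-limited result first; the
// limited collect then returns the expected rows.
withWindows.write.mode("overwrite").parquet("/tmp/with_windows")
withWindows.limit(10).collectAsList()
{code}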

> When running sql query with limit catalyst throw StackOverFlow exception 
> -
>
> Key: SPARK-30332
> URL: https://issues.apache.org/jira/browse/SPARK-30332
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: spark version 3.0.0-preview
>Reporter: Izek Greenfield
>Priority: Major
> Attachments: AGGR_41380.csv, AGGR_41390.csv, AGGR_41406.csv, 
> AGGR_41406.csv, AGGR_41410.csv, AGGR_41418.csv, PORTFOLIO_41446.csv, 
> T_41233.csv
>
>
> Running that SQL:
> {code:sql}
> SELECT  BT_capital.asof_date,
> BT_capital.run_id,
> BT_capital.v,
> BT_capital.id,
> BT_capital.entity,
> BT_capital.level_1,
> BT_capital.level_2,
> BT_capital.level_3,
> BT_capital.level_4,
> BT_capital.level_5,
> BT_capital.level_6,
> BT_capital.path_bt_capital,
> BT_capital.line_item,
> t0.target_line_item,
> t0.line_description,
> BT_capital.col_item,
> BT_capital.rep_amount,
> root.orgUnitId,
> root.cptyId,
> root.instId,
> root.startDate,
> root.maturityDate,
> root.amount,
> root.nominalAmount,
> root.quantity,
> root.lkupAssetLiability,
> root.lkupCurrency,
> root.lkupProdType,
> root.interestResetDate,
> root.interestResetTerm,
> root.noticePeriod,
> root.historicCostAmount,
> root.dueDate,
> root.lkupResidence,
> root.lkupCountryOfUltimateRisk,
> root.lkupSector,
> root.lkupIndustry,
> root.lkupAccountingPortfolioType,
> root.lkupLoanDepositTerm,
> root.lkupFixedFloating,
> root.lkupCollateralType,
> root.lkupRiskType,
> root.lkupEligibleRefinancing,
> root.lkupHedging,
> root.lkupIsOwnIssued,
> root.lkupIsSubordinated,
> root.lkupIsQuoted,
> root.lkupIsSecuritised,
> root.lkupIsSecuritisedServiced,
> root.lkupIsSyndicated,
> root.lkupIsDeRecognised,
> root.lkupIsRenegotiated,
> root.lkupIsTransferable,
> root.lkupIsNewBusiness,
> root.lkupIsFiduciary,
> root.lkupIsNonPerforming,
> root.lkupIsInterGroup,
> root.lkupIsIntraGroup,
> root.lkupIsRediscounted,
> root.lkupIsCollateral,
> root.lkupIsExercised,
> root.lkupIsImpaired,
> root.facilityId,
> root.lkupIsOTC,
> root.lkupIsDefaulted,
> root.lkupIsSavingsPosition,
> root.lkupIsForborne,
> root.lkupIsDebtRestructuringLoan,
> root.interestRateAAR,
> root.interestRateAPRC,
> root.custom1,
> root.custom2,
> root.custom3,
> root.lkupSecuritisationType,
> root.lkupIsCashPooling,
> root.lkupIsEquityParticipationGTE10,
> root.lkupIsConvertible,
> root.lkupEconomicHedge,
> root.lkupIsNonCurrHeldForSale,
> root.lkupIsEmbeddedDerivative,
> root.lkupLoanPurpose,
> root.lkupRegulated,
> root.lkupRepaymentType,
> root.glAccount,
> root.lkupIsRecourse,
> root.lkupIsNotFullyGuaranteed,
> root.lkupImpairmentStage,
> root.lkupIsEntireAmountWrittenOff,
> root.lkupIsLowCreditRisk,
> root.lkupIsOBSWithinIFRS9,
> root.lkupIsUnderSpecialSurveillance,
> root.lkupProtection,
> root.lkupIsGeneralAllowance,
> root.lkupSectorUltimateRisk,
> root.cptyOrgUnitId,
> root.name,
> root.lkupNationality,
> root.lkupSize,
> root.lkupIsSPV,
> root.lkupIsCentralCounterparty,
> root.lkupIsMMRMFI,
> root.lkupIsKeyManagement,
> root.lkupIsOtherRelatedParty,
> root.lkupResidenceProvince,
> root.lkupIsTradingBook,
> root.entityHierarchy_entityId,
> root.entityHierarchy_Residence,
> root.lkupLocalCurrency,
> root.cpty_entityhierarchy_entityId,
> root.lkupRelationship,
> root.cpty_lkupRelationship,
> root.entityNationality,
> root.lkupRepCurrency,
> root.startDateFinancialYear,
> root.numEmployees,
> root.numEmployeesTotal,
> root.collateralAmount,
> root.guaranteeAmount,
> root.impairmentSpecificIndividual,
> root.impairmentSpecificCollective,
> root.impairmentGeneral,
> root.creditRiskAmount,
> root.provisionSpecificIndividual,
> root.provisionSpecificCollective,
> root.provisionGeneral,
> root.writeOffAmount,
> root.interest,
> root.fairValueAmount,
> root.grossCarryingAmount,
> root.carryingAmount,
> root.code,
> root.lkupInstrumentType,
> root.price,
> root.amountAtIssue,
> root.yield,
> root.totalFacilityAmount,
> root.facility_rate,
> root.spec_indiv_est,
> root.spec_coll_est,
> root.coll_inc_loss,
> root.impairment_amount,
> root.provision_amount,
> root.accumulated_impairment,
> root.exclusionFlag,
> root.lkupIsHoldingCompany,
> root.instrument_startDate,
> root.entityResidence,
> fxRate.enumerator,
> fxRate.lkupFromCurrency,
>

[jira] [Commented] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Kevin Su (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371870#comment-17371870
 ] 

Kevin Su commented on SPARK-35950:
--

I'm working on it

> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When clicking the checkbox "Exec Loss Reason" on the executor page,
> the "Failed tasks" column disappears instead of the "Exec Loss Reason" 
> column.  






[jira] [Assigned] (SPARK-35949) On client mode, spark context will stopped while application is started.

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35949:


Assignee: Apache Spark

> On client mode, spark context will stopped while application is started.
> 
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: SunPeng
>Assignee: Apache Spark
>Priority: Critical
>
> In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
> application has started.
> {quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
> ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
>  21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>  21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
> 300(ns)
>  21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
> template: index
>  21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
> ["http-nio-9000"]
>  21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 
> (http) with context path ''
>  21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
> seconds (JVM running for 529.958)
>  21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
> (http/1.1)}
> Unknown macro: \{0.0.0.0}
> 21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor 
> thread
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all 
> executors
>  21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
> backend Stopped
>  21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
>  21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
>  21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
>  21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
>  21/06/30 12:03:40 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
>  21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
>  21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
> {quote}






[jira] [Commented] (SPARK-35949) On client mode, spark context will stopped while application is started.

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371871#comment-17371871
 ] 

Apache Spark commented on SPARK-35949:
--

User 'sunpe' has created a pull request for this issue:
https://github.com/apache/spark/pull/33151

> On client mode, spark context will stopped while application is started.
> 
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: SunPeng
>Priority: Critical
>
> In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
> application has started.
> {quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
> ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
>  21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>  21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
> 300(ns)
>  21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
> template: index
>  21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
> ["http-nio-9000"]
>  21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 
> (http) with context path ''
>  21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
> seconds (JVM running for 529.958)
>  21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
> (http/1.1)}
> Unknown macro: \{0.0.0.0}
> 21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor 
> thread
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all 
> executors
>  21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
> backend Stopped
>  21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
>  21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
>  21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
>  21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
>  21/06/30 12:03:40 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
>  21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
>  21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
> {quote}






[jira] [Assigned] (SPARK-35949) On client mode, spark context will stopped while application is started.

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35949:


Assignee: (was: Apache Spark)

> On client mode, spark context will stopped while application is started.
> 
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: SunPeng
>Priority: Critical
>
> In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
> application has started.
> {quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
> ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
>  21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>  21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
> 300(ns)
>  21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
> template: index
>  21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
> ["http-nio-9000"]
>  21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 
> (http) with context path ''
>  21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
> seconds (JVM running for 529.958)
>  21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
> (http/1.1)}
> Unknown macro: \{0.0.0.0}
> 21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor 
> thread
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all 
> executors
>  21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
> backend Stopped
>  21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
>  21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
>  21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
>  21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
>  21/06/30 12:03:40 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
>  21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
>  21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
> {quote}






[jira] [Resolved] (SPARK-34365) Support configurable Avro schema field matching for positional or by-name

2021-06-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-34365.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31490
[https://github.com/apache/spark/pull/31490]

> Support configurable Avro schema field matching for positional or by-name
> -
>
> Key: SPARK-34365
> URL: https://issues.apache.org/jira/browse/SPARK-34365
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.2.0
>
>
> When reading an Avro dataset (using the dataset's schema or by overriding it 
> with 'avroSchema') or writing an Avro dataset with a provided schema by 
> 'avroSchema', currently the matching of Catalyst-to-Avro fields is done by 
> field name.
> This behavior is somewhat recent; prior to SPARK-27762 (fixed in 3.0.0), at 
> least on the write path, we would match the schemas positionally 
> ("structural" comparison). While I agree that this is much more sensible for 
> default behavior, I propose that we make this behavior configurable using an 
> {{option}} for the Avro datasource. Even at the time that SPARK-27762 was 
> handled, there was [interest in making this behavior 
> configurable|https://github.com/apache/spark/pull/24635#issuecomment-494205251],
>  but it appears it went unaddressed.
> There is precedent for configurability of this behavior, as seen in 
> SPARK-32864, which added this support for ORC. Besides this precedent, the 
> behavior of Hive is to perform matching positionally 
> ([ref|https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-WritingtablestoAvrofiles]),
>  so this is behavior that Hadoop/Hive ecosystem users are familiar with:
> {quote}
> Hive is very forgiving about types: it will attempt to store whatever value 
> matches the provided column in the equivalent column position in the new 
> table. No matching is done on column names, for instance.
> {quote}
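
As a usage note, a brief sketch of what the configurable matching looks like 
from the user side, via the `positionalFieldMatching` option this ticket added 
(see also SPARK-35951 below); it assumes an active SparkSession `spark` and the 
spark-avro module on the classpath, and the schema and paths are illustrative:

{code:scala}
// Illustrative user-provided Avro schema.
val avroSchemaJson =
  """{"type": "record", "name": "Person", "fields": [
    |  {"name": "name", "type": "string"},
    |  {"name": "age",  "type": "int"}]}""".stripMargin

// Default behavior: Catalyst fields are matched to Avro fields by name.
val byName = spark.read.format("avro")
  .option("avroSchema", avroSchemaJson)
  .load("/tmp/people.avro")

// Spark 3.2+: match fields by position instead, mirroring the pre-SPARK-27762
// write-path behavior and Hive's AvroSerDe.
val byPosition = spark.read.format("avro")
  .option("avroSchema", avroSchemaJson)
  .option("positionalFieldMatching", "true")
  .load("/tmp/people.avro")
{code}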






[jira] [Assigned] (SPARK-34365) Support configurable Avro schema field matching for positional or by-name

2021-06-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-34365:
--

Assignee: Erik Krogen

> Support configurable Avro schema field matching for positional or by-name
> -
>
> Key: SPARK-34365
> URL: https://issues.apache.org/jira/browse/SPARK-34365
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> When reading an Avro dataset (using the dataset's schema or by overriding it 
> with 'avroSchema') or writing an Avro dataset with a provided schema by 
> 'avroSchema', currently the matching of Catalyst-to-Avro fields is done by 
> field name.
> This behavior is somewhat recent; prior to SPARK-27762 (fixed in 3.0.0), at 
> least on the write path, we would match the schemas positionally 
> ("structural" comparison). While I agree that this is much more sensible for 
> default behavior, I propose that we make this behavior configurable using an 
> {{option}} for the Avro datasource. Even at the time that SPARK-27762 was 
> handled, there was [interest in making this behavior 
> configurable|https://github.com/apache/spark/pull/24635#issuecomment-494205251],
>  but it appears it went unaddressed.
> There is precedent for configurability of this behavior, as seen in 
> SPARK-32864, which added this support for ORC. Besides this precedent, the 
> behavior of Hive is to perform matching positionally 
> ([ref|https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-WritingtablestoAvrofiles]),
>  so this is behavior that Hadoop/Hive ecosystem users are familiar with:
> {quote}
> Hive is very forgiving about types: it will attempt to store whatever value 
> matches the provided column in the equivalent column position in the new 
> table. No matching is done on column names, for instance.
> {quote}






[jira] [Updated] (SPARK-35935) REPAIR TABLE fails on table refreshing

2021-06-30 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-35935:
-
Affects Version/s: 3.0.4
   3.1.3

> REPAIR TABLE fails on table refreshing
> --
>
> Key: SPARK-35935
> URL: https://issues.apache.org/jira/browse/SPARK-35935
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.1.3, 3.0.4
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> MSCK REPAIR TABLE can fail while table recovering with the exception:
> {code:java}
> Error in SQL statement: AnalysisException: Incompatible format detected.
> ...
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$verifyNonDeltaTable(DataSourceStrategy.scala:297)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply0$1.applyOrElse(DataSourceStrategy.scala:378)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply0$1.applyOrElse(DataSourceStrategy.scala:342)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:86)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:316)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:175)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1093)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1092)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode.mapChildren(LogicalPlan.scala:187)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:175)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:316)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:98)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:95)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:74)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply0(DataSourceStrategy.scala:342)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:336)
>   at 
> org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:248)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:221)
>   at 
> com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:221)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:89)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(

[jira] [Created] (SPARK-35951) Add since versions for Avro options in Documentation

2021-06-30 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35951:
--

 Summary: Add since versions for Avro options in Documentation
 Key: SPARK-35951
 URL: https://issues.apache.org/jira/browse/SPARK-35951
 Project: Spark
  Issue Type: Task
  Components: Documentation, SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


There are two new Avro options `datetimeRebaseMode` and 
`positionalFieldMatching` after Spark 3.2.
We should document the since version so that users can know whether the option 
works in their Spark version.
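A hedged usage sketch (spark-shell style) of where these two options plug in on the Avro read path; the input path and the option values below are illustrative, not taken from this ticket:

{code:scala}
// Sketch only: the two options whose "since" versions the docs change covers.
val df = spark.read
  .format("avro")
  .option("datetimeRebaseMode", "CORRECTED")    // added in Spark 3.2
  .option("positionalFieldMatching", "true")    // added in Spark 3.2
  .load("/path/to/events.avro")                 // illustrative path
{code}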



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35951) Add since versions for Avro options in Documentation

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35951:


Assignee: Gengliang Wang  (was: Apache Spark)

> Add since versions for Avro options in Documentation
> 
>
> Key: SPARK-35951
> URL: https://issues.apache.org/jira/browse/SPARK-35951
> Project: Spark
>  Issue Type: Task
>  Components: Documentation, SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> There are two new Avro options `datetimeRebaseMode` and 
> `positionalFieldMatching` after Spark 3.2.
> We should document the since version so that users can know whether the 
> option works in their Spark version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35951) Add since versions for Avro options in Documentation

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35951:


Assignee: Apache Spark  (was: Gengliang Wang)

> Add since versions for Avro options in Documentation
> 
>
> Key: SPARK-35951
> URL: https://issues.apache.org/jira/browse/SPARK-35951
> Project: Spark
>  Issue Type: Task
>  Components: Documentation, SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
> There are two new Avro options `datetimeRebaseMode` and 
> `positionalFieldMatching` after Spark 3.2.
> We should document the since version so that users can know whether the 
> option works in their Spark version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35951) Add since versions for Avro options in Documentation

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371908#comment-17371908
 ] 

Apache Spark commented on SPARK-35951:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33153

> Add since versions for Avro options in Documentation
> 
>
> Key: SPARK-35951
> URL: https://issues.apache.org/jira/browse/SPARK-35951
> Project: Spark
>  Issue Type: Task
>  Components: Documentation, SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> There are two new Avro options `datetimeRebaseMode` and 
> `positionalFieldMatching` after Spark 3.2.
> We should document the since version so that users can know whether the 
> option works in their Spark version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35951) Add since versions for Avro options in Documentation

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371907#comment-17371907
 ] 

Apache Spark commented on SPARK-35951:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33153

> Add since versions for Avro options in Documentation
> 
>
> Key: SPARK-35951
> URL: https://issues.apache.org/jira/browse/SPARK-35951
> Project: Spark
>  Issue Type: Task
>  Components: Documentation, SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> There are two new Avro options `datetimeRebaseMode` and 
> `positionalFieldMatching` after Spark 3.2.
> We should document the since version so that users can know whether the 
> option works in their Spark version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35949) On client mode, spark context will stopped while application is started.

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371912#comment-17371912
 ] 

Apache Spark commented on SPARK-35949:
--

User 'sunpe' has created a pull request for this issue:
https://github.com/apache/spark/pull/33154

> On client mode, spark context will stopped while application is started.
> 
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: SunPeng
>Priority: Critical
>
> In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
> application has started.
> {quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
> ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
>  21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>  21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
> 300(ns)
>  21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
> template: index
>  21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
> ["http-nio-9000"]
>  21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 
> (http) with context path ''
>  21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
> seconds (JVM running for 529.958)
>  21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
> (http/1.1)}
> Unknown macro: \{0.0.0.0}
> 21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor 
> thread
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all 
> executors
>  21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
> backend Stopped
>  21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
>  21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
>  21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
>  21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
>  21/06/30 12:03:40 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
>  21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
>  21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35949) On client mode, spark context will stopped while application is started.

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371913#comment-17371913
 ] 

Apache Spark commented on SPARK-35949:
--

User 'sunpe' has created a pull request for this issue:
https://github.com/apache/spark/pull/33154

> On client mode, spark context will stopped while application is started.
> 
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: SunPeng
>Priority: Critical
>
> In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
> application has started.
> {quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
> ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
>  21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>  21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
> 300(ns)
>  21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
> template: index
>  21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
> ["http-nio-9000"]
>  21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 
> (http) with context path ''
>  21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
> seconds (JVM running for 529.958)
>  21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
> (http/1.1)}
> Unknown macro: \{0.0.0.0}
> 21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor 
> thread
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all 
> executors
>  21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
> backend Stopped
>  21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
>  21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
>  21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
>  21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
>  21/06/30 12:03:40 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
>  21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
>  21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34920) Introduce SQLSTATE and ERRORCODE to SQL Exception

2021-06-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34920.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32850
[https://github.com/apache/spark/pull/32850]

> Introduce SQLSTATE and ERRORCODE to SQL Exception
> -
>
> Key: SPARK-34920
> URL: https://issues.apache.org/jira/browse/SPARK-34920
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> SQLSTATE is SQL standard state. Please see



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34920) Introduce SQLSTATE and ERRORCODE to SQL Exception

2021-06-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34920:
---

Assignee: Karen Feng

> Introduce SQLSTATE and ERRORCODE to SQL Exception
> -
>
> Key: SPARK-34920
> URL: https://issues.apache.org/jira/browse/SPARK-34920
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Karen Feng
>Priority: Major
> Fix For: 3.2.0
>
>
> SQLSTATE is SQL standard state. Please see



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35951) Add since versions for Avro options in Documentation

2021-06-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35951.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33153
[https://github.com/apache/spark/pull/33153]

> Add since versions for Avro options in Documentation
> 
>
> Key: SPARK-35951
> URL: https://issues.apache.org/jira/browse/SPARK-35951
> Project: Spark
>  Issue Type: Task
>  Components: Documentation, SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
> Fix For: 3.2.0
>
>
> There are two new Avro options `datetimeRebaseMode` and 
> `positionalFieldMatching` after Spark 3.2.
> We should document the since version so that users can know whether the 
> option works in their Spark version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35949) working with spring boot, spark context will stopped while application is started.

2021-06-30 Thread SunPeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SunPeng updated SPARK-35949:

Summary: working with spring boot, spark context will stopped while 
application is started.  (was: On client mode, spark context will stopped while 
application is started.)

> working with spring boot, spark context will stopped while application is 
> started.
> --
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: SunPeng
>Priority: Critical
>
> In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
> application has started.
> {quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
> ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
>  21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>  21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
> 300(ns)
>  21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
> template: index
>  21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
> ["http-nio-9000"]
>  21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 
> (http) with context path ''
>  21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
> seconds (JVM running for 529.958)
>  21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
> (http/1.1)}
> Unknown macro: \{0.0.0.0}
> 21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor 
> thread
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all 
> executors
>  21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
> backend Stopped
>  21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
>  21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
>  21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
>  21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
>  21/06/30 12:03:40 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
>  21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
>  21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35949) working with spring boot, spark context will stopped while application is started.

2021-06-30 Thread SunPeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SunPeng updated SPARK-35949:

Description: 
With Spark 3.1.2 and Spring Boot, in client mode, the Spark context is stopped 
as soon as the application has started.

 

 
{code:java}
// Code placeholder: Spring bean that provides the shared SparkSession
@Bean
@ConditionalOnMissingBean(SparkSession.class)
public SparkSession sparkSession(SparkConf conf) {
    return SparkSession.builder()
        .enableHiveSupport()
        .config(conf)
        .getOrCreate();
}{code}
 
{quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
 21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors 
and spark.executor.instances
 21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
300(ns)
 21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
template: index
 21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
["http-nio-9000"]
 21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 (http) 
with context path ''
 21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
seconds (JVM running for 529.958)
 21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
(http/1.1)}
 Unknown macro: \{0.0.0.0}
 21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor thread
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all executors
 21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
executor to shut down
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
backend Stopped
 21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
 21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
 21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
 21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
 21/06/30 12:03:40 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
 21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
 21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
'dispatcherServlet'
 21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
'dispatcherServlet'
 21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
{quote}

  was:
In Spark 3.1.2, in client mode, the Spark context is stopped as soon as the 
application has started.
{quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
 21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors 
and spark.executor.instances
 21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
300(ns)
 21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
template: index
 21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
["http-nio-9000"]
 21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 (http) 
with context path ''
 21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
seconds (JVM running for 529.958)
 21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea\{HTTP/1.1, 
(http/1.1)}
Unknown macro: \{0.0.0.0}
21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor thread
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all executors
 21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
executor to shut down
 21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
backend Stopped
 21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
 21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
 21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
 21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
 21/06/30 12:03:40 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
 21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
 21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
'dispatcherServlet'
 21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
'dispatcherServlet'
 21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
{quote}


> working with spring boot, spark co

[jira] [Commented] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371931#comment-17371931
 ] 

Apache Spark commented on SPARK-35950:
--

User 'pingsutw' has created a pull request for this issue:
https://github.com/apache/spark/pull/33155

> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When Clicked the checkbox "Exec Loss Reason" on the executor page,
> the "Failed tasks" column disappears instead of the "Exec Loss Reason" 
> column.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35950:


Assignee: (was: Apache Spark)

> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When Clicked the checkbox "Exec Loss Reason" on the executor page,
> the "Failed tasks" column disappears instead of the "Exec Loss Reason" 
> column.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35950:


Assignee: Apache Spark

> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Assignee: Apache Spark
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When Clicked the checkbox "Exec Loss Reason" on the executor page,
> the "Failed tasks" column disappears instead of the "Exec Loss Reason" 
> column.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Kevin Su (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Su updated SPARK-35950:
-
Description: 
When Clicked the checkbox "Exec Loss Reason" on the executor page,

the "Active tasks" column disappears instead of the "Exec Loss Reason" column.  

  was:
When Clicked the checkbox "Exec Loss Reason" on the executor page,

the "Failed tasks" column disappears instead of the "Exec Loss Reason" column.  


> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When Clicked the checkbox "Exec Loss Reason" on the executor page,
> the "Active tasks" column disappears instead of the "Exec Loss Reason" 
> column.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35950) Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread Kevin Su (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Su updated SPARK-35950:
-
Description: 
When unselected the checkbox "Exec Loss Reason" on the executor page,

the "Active tasks" column disappears instead of the "Exec Loss Reason" column.  

  was:
When Clicked the checkbox "Exec Loss Reason" on the executor page,

the "Active tasks" column disappears instead of the "Exec Loss Reason" column.  


> Failed to toggle Exec Loss Reason in the executors page
> ---
>
> Key: SPARK-35950
> URL: https://issues.apache.org/jira/browse/SPARK-35950
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Kevin Su
>Priority: Minor
> Attachments: Screenshot from 2021-06-30 13-28-16.png, Screenshot from 
> 2021-06-30 15-55-05.png
>
>
> When unselected the checkbox "Exec Loss Reason" on the executor page,
> the "Active tasks" column disappears instead of the "Exec Loss Reason" 
> column.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35952) Inclusive terminology usage in Spark

2021-06-30 Thread Abhishek Rao (Jira)
Abhishek Rao created SPARK-35952:


 Summary: Inclusive terminology usage in Spark
 Key: SPARK-35952
 URL: https://issues.apache.org/jira/browse/SPARK-35952
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.3
Reporter: Abhishek Rao


Terms such as Blacklist/Whitelist and master/slave are used in different places 
in the Spark code. Do we have any plans to change these to more inclusive 
terminology, e.g. Denylist/Allowlist and Leader/Follower?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35932) Support extracting hour/minute/second from timestamp without time zone

2021-06-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35932.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33136
[https://github.com/apache/spark/pull/33136]

> Support extracting hour/minute/second from timestamp without time zone
> --
>
> Key: SPARK-35932
> URL: https://issues.apache.org/jira/browse/SPARK-35932
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Support extracting hour/minute/second fields from timestamp without time zone 
> values. In detail, the following syntax is supported:
> 1. extract [hour | minute | second] from timestampWithoutTZ
> 2. date_part('[hour | minute | second]', timestampWithoutTZ)
> 3. hour(timestampWithoutTZ)
> 4. minute(timestampWithoutTZ)
> 5. second(timestampWithoutTZ)
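A minimal sketch (spark-shell style) of the syntax listed above, assuming a hypothetical `events` table whose `ts_ntz` column has the new timestamp-without-time-zone type:

{code:scala}
// Sketch only; `events` and `ts_ntz` are hypothetical.
spark.sql("SELECT extract(HOUR FROM ts_ntz) FROM events").show()
spark.sql("SELECT date_part('minute', ts_ntz) FROM events").show()
spark.sql("SELECT second(ts_ntz) FROM events").show()
{code}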



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35952) Inclusive terminology usage in Spark

2021-06-30 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-35952.
--
Resolution: Duplicate

> Inclusive terminology usage in Spark
> 
>
> Key: SPARK-35952
> URL: https://issues.apache.org/jira/browse/SPARK-35952
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3
>Reporter: Abhishek Rao
>Priority: Major
>
> Terms such as Blacklist/Whitelist and master/slave are used in different 
> places in the Spark code. Do we have any plans to change these to more 
> inclusive terminology, e.g. Denylist/Allowlist and Leader/Follower?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35735) Take into account day-time interval fields in cast

2021-06-30 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-35735:


Assignee: angerszhu

> Take into account day-time interval fields in cast
> --
>
> Key: SPARK-35735
> URL: https://issues.apache.org/jira/browse/SPARK-35735
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: angerszhu
>Priority: Major
>
> Take day-time interval fields into account in casts of input values (strings, 
> for example) to DayTimeIntervalType. We need to follow the SQL standard if the 
> input fields don't match the target type.
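For context, a hedged sketch of the kind of cast this change covers; the exact string formats that are accepted follow the SQL-standard rules the ticket refers to, so the literal values below are illustrative only:

{code:scala}
// Sketch only; literal values are illustrative, not from the ticket.
spark.sql("SELECT CAST('3 04:05:06' AS INTERVAL DAY TO SECOND)").show()
spark.sql("SELECT CAST('04:05' AS INTERVAL HOUR TO MINUTE)").show()
{code}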



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35735) Take into account day-time interval fields in cast

2021-06-30 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35735.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32943
[https://github.com/apache/spark/pull/32943]

> Take into account day-time interval fields in cast
> --
>
> Key: SPARK-35735
> URL: https://issues.apache.org/jira/browse/SPARK-35735
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Take day-time interval fields into account in casts of input values (strings, 
> for example) to DayTimeIntervalType. We need to follow the SQL standard if the 
> input fields don't match the target type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32929) StreamSuite failure on IBM Z: - SPARK-20432: union one stream with itself

2021-06-30 Thread Simrit Kaur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371997#comment-17371997
 ] 

Simrit Kaur commented on SPARK-32929:
-

I am also facing this issue with Spark 3.1.1 on IBM Z. It seems that Java isn't 
shielding the impact. Can we make the changes to the code suggested here to fix 
this? Any other suggestions or hints are highly appreciated.

> StreamSuite failure on IBM Z: - SPARK-20432: union one stream with itself
> -
>
> Key: SPARK-32929
> URL: https://issues.apache.org/jira/browse/SPARK-32929
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: openjdk version "11.0.8" 2020-07-14
> OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.8+10)
> OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.8+10, mixed mode)
> Linux 4.15.0-117-generic #118-Ubuntu SMP Fri Sep 4 20:00:20 UTC 2020 s390x 
> s390x s390x GNU/Linux
>Reporter: Michael Munday
>Priority: Minor
>  Labels: big-endian
>
> I am getting zeros in the output of this test on IBM Z. This is a big-endian 
> system. See error below.
> I think this issue is related to the use of {{IntegerType}} in the schema for 
> {{FakeDefaultSource}}. Modifying the schema to use {{LongType}} fixes the 
> issue. Another workaround is to remove {{.select("a")}} (see patch below).
> My working theory is that long data (longs are generated by Range) is being 
> read using unsafe int operations (as specified in the schema). This would 
> 'work' on little-endian systems but not big-endian systems. I'm still working 
> to figure out what the mechanism is and I'd appreciate any hints or insights.
> The error looks like this:
> {noformat}
> - SPARK-20432: union one stream with itself *** FAILED ***
>   Decoded objects do not match expected objects:
>   expected: WrappedArray(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 
> 6, 7, 8, 9, 10)
>   actual:   WrappedArray(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0)
>   assertnotnull(upcast(getcolumnbyordinal(0, LongType), LongType, - root 
> class: "scala.Long"))
>   +- upcast(getcolumnbyordinal(0, LongType), LongType, - root class: 
> "scala.Long")
>  +- getcolumnbyordinal(0, LongType) (QueryTest.scala:88)
> {noformat}
> This change fixes the issue: 
> {code:java}
> --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
> +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
> @@ -45,7 +45,7 @@ import org.apache.spark.sql.functions._
>  import org.apache.spark.sql.internal.SQLConf
>  import org.apache.spark.sql.sources.StreamSourceProvider
>  import org.apache.spark.sql.streaming.util.{BlockOnStopSourceProvider, 
> StreamManualClock}
> -import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> +import org.apache.spark.sql.types.{IntegerType, LongType, StructField, 
> StructType}
>  import org.apache.spark.util.Utils
>  class StreamSuite extends StreamTest {
> @@ -1265,7 +1265,7 @@ class StreamSuite extends StreamTest {
>  }
>  abstract class FakeSource extends StreamSourceProvider {
> -  private val fakeSchema = StructType(StructField("a", IntegerType) :: Nil)
> +  private val fakeSchema = StructType(StructField("a", LongType) :: Nil)
>override def sourceSchema(
>spark: SQLContext,
> @@ -1287,7 +1287,7 @@ class FakeDefaultSource extends FakeSource {
>  new Source {
>private var offset = -1L
> -  override def schema: StructType = StructType(StructField("a", 
> IntegerType) :: Nil)
> +  override def schema: StructType = StructType(StructField("a", 
> LongType) :: Nil)
>override def getOffset: Option[Offset] = {
>  if (offset >= 10) {
> {code}
> Alternatively, this change also fixes the issue:
> {code:java}
> --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
> +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
> @@ -154,7 +154,7 @@ class StreamSuite extends StreamTest {
>}
>  
>test("SPARK-20432: union one stream with itself") {
> -val df = 
> spark.readStream.format(classOf[FakeDefaultSource].getName).load().select("a")
> +val df = 
> spark.readStream.format(classOf[FakeDefaultSource].getName).load()
>  val unioned = df.union(df)
>  withTempDir { outputDir =>
>withTempDir { checkpointDir =>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35945) Unable to parse multi character row and column delimited files using Spark

2021-06-30 Thread Chandra (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandra updated SPARK-35945:

Issue Type: Improvement  (was: Bug)

> Unable to parse  multi character row and column delimited files using Spark
> ---
>
> Key: SPARK-35945
> URL: https://issues.apache.org/jira/browse/SPARK-35945
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 2.4.4
> Environment: development
>Reporter: Chandra
>Priority: Major
>
> My requirement is to process files that have a multi-character row and 
> column delimiter.
> I tried multiple options but ran into a few issues.
>  
> File sample:
> 127'~'127433'~''~''~'2'~'ICR'~'STDLONG'~'NR'~'NR'~'1997-06-25 
> 14:47:37'~''~'NR'~''~'1997-06-25 14:47:37'~'BBB'~''~'Stable'~''~''~''~'Not 
> Rated'~'CreditWatch/Outlook'~'OL'~''~''~''~'#@#@#152'~'308044'~''~''~'2'~'ICR'~'FCLONG'~'NR'~'NR'~'1997-12-05
>  14:23:33'~'NM'~'NR'~'1997-12-05 14:23:33'~'1997-12-05 14:23:33'~'B+'~'Watch 
> Pos'~'NM'~''~''~''~'Not 
> Rated'~'CreditWatch/Outlook'~'OL'~''~''~''~'#@#@#155'~'308044'~''~''~'2'~'ICR'~'STDLONG'~'NR'~'NR'~'1997-12-05
>  14:23:34'~'NM'~'NR'~'1997-12-05 14:23:34'~'1997-12-05 14:23:34'~'B+'~'Watch 
> Pos'~'NM'~''~''~''~'Not 
> Rated'~'CreditWatch/Outlook'~'OL'~''~''~''~'#@#[~infrabot]
>  
> Row delimiter is :  #@#@#     COlumn Delimiter:   '~'   
> Code:
> df2 = spark.read.load("spRatingData_sample.txt",
>  format="csv", 
>  sep="'~'",
>  lineSep="#@#@#")
>  print("two.csv rowcount: {}".format(df2.count()))
>  
> ERROR:
> : java.lang.IllegalArgumentException: Delimiter cannot be more than one 
> character: '~'
>  at 
> org.apache.spark.sql.execution.datasources.csv.CSVUtils$.toChar(CSVUtils.scala:118)
>  at 
> org.apache.spark.sql.execution.datasources.csv.CSVOptions.(CSVOptions.scala:87)
>  at 
> org.apache.spark.sql.execution.datasources.csv.CSVOptions.(CSVOptions.scala:45)
>  at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:58)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:183)
>  at scala.Option.orElse(Option.scala:447)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:180)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>  at py4j.Gateway.invoke(Gateway.java:282)
>  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>  at py4j.commands.CallCommand.execute(CallCommand.java:79)
>  at py4j.GatewayConnection.run(GatewayConnection.java:238)
>  at java.lang.Thread.run(Thread.java:748)
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>  File "", line 4, in 
>  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 166, in load
>  return self._df(self._jreader.load(path))
>  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", 
> line 1257, in __call__
>  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 79, in deco
>  raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
>  pyspark.sql.utils.IllegalArgumentException: "Delimiter cannot be more than 
> one character: '~'"
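Not a change to the CSV reader itself, but a hedged workaround sketch for Spark 2.4 that sidesteps the single-character limit by splitting records and columns manually; the file name and both delimiters are taken from the report above, everything else is illustrative:

{code:scala}
// Workaround sketch only, not the requested multi-character CSV support.
val sc = spark.sparkContext
// Let Hadoop's TextInputFormat split records on the multi-character row delimiter.
sc.hadoopConfiguration.set("textinputformat.record.delimiter", "#@#@#")

val records = sc.textFile("spRatingData_sample.txt")
  .filter(_.trim.nonEmpty)
  .map(_.split("'~'", -1))   // split columns on the literal '~' token

println(s"row count: ${records.count()}")
{code}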



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35953) Support extracting date fields from timestamp without time zone

2021-06-30 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35953:
--

 Summary: Support extracting date fields from timestamp without 
time zone
 Key: SPARK-35953
 URL: https://issues.apache.org/jira/browse/SPARK-35953
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Support extracting date fields from timestamp without time zone, which includes:
* year
* month
* day
* year of week
* week
* day of week 
* quarter
* day of month
* day of year
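A minimal sketch (spark-shell style) of the planned syntax, assuming a hypothetical `events` table whose `ts_ntz` column has the timestamp-without-time-zone type:

{code:scala}
// Sketch only; `events` and `ts_ntz` are hypothetical.
spark.sql(
  """SELECT year(ts_ntz), month(ts_ntz), dayofmonth(ts_ntz),
    |       quarter(ts_ntz), extract(WEEK FROM ts_ntz), dayofyear(ts_ntz)
    |FROM events""".stripMargin).show()
{code}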



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35953) Support extracting date fields from timestamp without time zone

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35953:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support extracting date fields from timestamp without time zone
> ---
>
> Key: SPARK-35953
> URL: https://issues.apache.org/jira/browse/SPARK-35953
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Support extracting date fields from timestamp without time zone, which 
> includes:
> * year
> * month
> * day
> * year of week
> * week
> * day of week 
> * quarter
> * day of month
> * day of year



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35953) Support extracting date fields from timestamp without time zone

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372025#comment-17372025
 ] 

Apache Spark commented on SPARK-35953:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33156

> Support extracting date fields from timestamp without time zone
> ---
>
> Key: SPARK-35953
> URL: https://issues.apache.org/jira/browse/SPARK-35953
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support extracting date fields from timestamp without time zone, which 
> includes:
> * year
> * month
> * day
> * year of week
> * week
> * day of week 
> * quarter
> * day of month
> * day of year



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35953) Support extracting date fields from timestamp without time zone

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35953:


Assignee: Gengliang Wang  (was: Apache Spark)

> Support extracting date fields from timestamp without time zone
> ---
>
> Key: SPARK-35953
> URL: https://issues.apache.org/jira/browse/SPARK-35953
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support extracting date fields from timestamp without time zone, which 
> includes:
> * year
> * month
> * day
> * year of week
> * week
> * day of week 
> * quarter
> * day of month
> * day of year



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35954) Upgrade Apache Curator Dependency to 4.2.0

2021-06-30 Thread Nikita ROUSSEAU (Jira)
Nikita ROUSSEAU created SPARK-35954:
---

 Summary: Upgrade Apache Curator Dependency to 4.2.0
 Key: SPARK-35954
 URL: https://issues.apache.org/jira/browse/SPARK-35954
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Deploy
Affects Versions: 3.1.2
 Environment: * OS: Linux
 * JAVA: 1.8.0_292
 * {color:#FF}*hadoop-3.3.1*{color}

 
Reporter: Nikita ROUSSEAU


+Abstract :+ as a Spark Cluster Administrator, I want to connect spark masters 
deployed in HA mode to Zookeeper over SSL/TLS, so that my network traffic is 
encrypted between my components. 
([https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper]
  )

 

With the release of Hadoop 3.3.1, ZKFC libraries and their dependencies were 
updated : it is now possible to connect ZKFC to ZooKeeper over TLS.

 

+Note:+ TLS is possible with Zookeeper Server Version >= 3.5.6 
([https://docs.confluent.io/platform/current/installation/versions-interoperability.html#zk]
 )

 

Spark 3.2.0 aims to support Hadoop 3.3.1 ; this Hadoop release bundles the 
following shared libraries :
 * curator-client-4.2.0.jar
 * curator-framework-4.2.0.jar
 * curator-recipes-4.2.0.jar
 * zookeeper-3.5.6.jar
 * zookeeper-jute-3.5.6.jar

 

Currently, the Spark dependency on the Curator framework is set to 2.13.0 
([https://github.com/apache/spark/blob/master/pom.xml#L127).|https://github.com/apache/spark/blob/master/pom.xml#L127)]

 

It would be great to update "curator-*" dependencies to 4.2.0 in order to be 
compatible with shared jars of the hadoop stack.

Moreover, it will allow administrators to connect Spark Masters to ZooKeeper 
over TLS.

 

Some patches will be required, such as :
 * 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkCuratorUtil.scala#L51]

 

I will try to prepare a MR for this.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35954) Upgrade Apache Curator Dependency to 4.2.0

2021-06-30 Thread Nikita ROUSSEAU (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikita ROUSSEAU updated SPARK-35954:

Target Version/s:   (was: 3.2.0)

> Upgrade Apache Curator Dependency to 4.2.0
> --
>
> Key: SPARK-35954
> URL: https://issues.apache.org/jira/browse/SPARK-35954
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Deploy
>Affects Versions: 3.1.2
> Environment: * OS: Linux
>  * JAVA: 1.8.0_292
>  * {color:#FF}*hadoop-3.3.1*{color}
>  
>Reporter: Nikita ROUSSEAU
>Priority: Major
>
> +Abstract :+ as a Spark Cluster Administrator, I want to connect spark 
> masters deployed in HA mode to Zookeeper over SSL/TLS, so that my network 
> traffic is encrypted between my components. 
> ([https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper]
>   )
>  
> With the release of Hadoop 3.3.1, ZKFC libraries and their dependencies were 
> updated : it is now possible to connect ZKFC to ZooKeeper over TLS.
>  
> +Note:+ TLS is possible with Zookeeper Server Version >= 3.5.6 
> ([https://docs.confluent.io/platform/current/installation/versions-interoperability.html#zk]
>  )
>  
> Spark 3.2.0 aims to support Hadoop 3.3.1 ; this Hadoop release bundles the 
> following shared libraries :
>  * curator-client-4.2.0.jar
>  * curator-framework-4.2.0.jar
>  * curator-recipes-4.2.0.jar
>  * zookeeper-3.5.6.jar
>  * zookeeper-jute-3.5.6.jar
>  
> Currently, the Spark dependency on the Curator framework is set to 2.13.0 
> ([https://github.com/apache/spark/blob/master/pom.xml#L127).|https://github.com/apache/spark/blob/master/pom.xml#L127)]
>  
> It would be great to update "curator-*" dependencies to 4.2.0 in order to be 
> compatible with shared jars of the hadoop stack.
> Moreover, it will allow administrators to connect Spark Masters to ZooKeeper 
> over TLS.
>  
> Some patches will be required, such as :
>  * 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkCuratorUtil.scala#L51]
>  
> I will try to prepare a MR for this.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35954) Upgrade Apache Curator Dependency to 4.2.0

2021-06-30 Thread Nikita ROUSSEAU (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372079#comment-17372079
 ] 

Nikita ROUSSEAU commented on SPARK-35954:
-

Since it is my first contribution, do not hesitate to point me guidelines about 
this issue :)

> Upgrade Apache Curator Dependency to 4.2.0
> --
>
> Key: SPARK-35954
> URL: https://issues.apache.org/jira/browse/SPARK-35954
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Deploy
>Affects Versions: 3.1.2
> Environment: * OS: Linux
>  * JAVA: 1.8.0_292
>  * {color:#FF}*hadoop-3.3.1*{color}
>  
>Reporter: Nikita ROUSSEAU
>Priority: Major
>
> +Abstract :+ as a Spark Cluster Administrator, I want to connect spark 
> masters deployed in HA mode to Zookeeper over SSL/TLS, so that my network 
> traffic is encrypted between my components. 
> ([https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper]
>   )
>  
> With the release of Hadoop 3.3.1, ZKFC libraries and their dependencies were 
> updated : it is now possible to connect ZKFC to ZooKeeper over TLS.
>  
> +Note:+ TLS is possible with Zookeeper Server Version >= 3.5.6 
> ([https://docs.confluent.io/platform/current/installation/versions-interoperability.html#zk]
>  )
>  
> Spark 3.2.0 aims to support Hadoop 3.3.1 ; this Hadoop release bundles the 
> following shared libraries :
>  * curator-client-4.2.0.jar
>  * curator-framework-4.2.0.jar
>  * curator-recipes-4.2.0.jar
>  * zookeeper-3.5.6.jar
>  * zookeeper-jute-3.5.6.jar
>  
> Currently, the Spark dependency on the Curator framework is set to 2.13.0 
> ([https://github.com/apache/spark/blob/master/pom.xml#L127).|https://github.com/apache/spark/blob/master/pom.xml#L127)]
>  
> It would be great to update "curator-*" dependencies to 4.2.0 in order to be 
> compatible with shared jars of the hadoop stack.
> Moreover, it will allow administrators to connect Spark Masters to ZooKeeper 
> over TLS.
>  
> Some patches will be required, such as :
>  * 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkCuratorUtil.scala#L51]
>  
> I will try to prepare a MR for this.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35954) [Deploy] Upgrade Apache Curator Dependency to 4.2.0

2021-06-30 Thread Nikita ROUSSEAU (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikita ROUSSEAU updated SPARK-35954:

Summary: [Deploy] Upgrade Apache Curator Dependency to 4.2.0  (was: Upgrade 
Apache Curator Dependency to 4.2.0)

> [Deploy] Upgrade Apache Curator Dependency to 4.2.0
> ---
>
> Key: SPARK-35954
> URL: https://issues.apache.org/jira/browse/SPARK-35954
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Deploy
>Affects Versions: 3.1.2
> Environment: * OS: Linux
>  * JAVA: 1.8.0_292
>  * {color:#FF}*hadoop-3.3.1*{color}
>  
>Reporter: Nikita ROUSSEAU
>Priority: Major
>
> +Abstract :+ as a Spark Cluster Administrator, I want to connect spark 
> masters deployed in HA mode to Zookeeper over SSL/TLS, so that my network 
> traffic is encrypted between my components. 
> ([https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper]
>   )
>  
> With the release of Hadoop 3.3.1, ZKFC libraries and their dependencies were 
> updated : it is now possible to connect ZKFC to ZooKeeper over TLS.
>  
> +Note:+ TLS is possible with Zookeeper Server Version >= 3.5.6 
> ([https://docs.confluent.io/platform/current/installation/versions-interoperability.html#zk]
>  )
>  
> Spark 3.2.0 aims to support Hadoop 3.3.1 ; this Hadoop release bundles the 
> following shared libraries :
>  * curator-client-4.2.0.jar
>  * curator-framework-4.2.0.jar
>  * curator-recipes-4.2.0.jar
>  * zookeeper-3.5.6.jar
>  * zookeeper-jute-3.5.6.jar
>  
> Currently, the Spark dependency on the Curator framework is set to 2.13.0 
> ([https://github.com/apache/spark/blob/master/pom.xml#L127).|https://github.com/apache/spark/blob/master/pom.xml#L127)]
>  
> It would be great to update "curator-*" dependencies to 4.2.0 in order to be 
> compatible with shared jars of the hadoop stack.
> Moreover, it will allow administrators to connect Spark Masters to ZooKeeper 
> over TLS.
>  
> Some patches will be required, such as :
>  * 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkCuratorUtil.scala#L51]
>  
> I will try to prepare a MR for this.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35954) [Deploy] Upgrade Apache Curator Dependency to 4.2.0

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35954:


Assignee: Apache Spark

> [Deploy] Upgrade Apache Curator Dependency to 4.2.0
> ---
>
> Key: SPARK-35954
> URL: https://issues.apache.org/jira/browse/SPARK-35954
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Deploy
>Affects Versions: 3.1.2
> Environment: * OS: Linux
>  * JAVA: 1.8.0_292
>  * {color:#FF}*hadoop-3.3.1*{color}
>  
>Reporter: Nikita ROUSSEAU
>Assignee: Apache Spark
>Priority: Major
>
> +Abstract:+ as a Spark Cluster Administrator, I want to connect Spark 
> masters deployed in HA mode to ZooKeeper over SSL/TLS, so that my network 
> traffic is encrypted between my components 
> ([https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper]).
>  
> With the release of Hadoop 3.3.1, the ZKFC libraries and their dependencies were 
> updated: it is now possible to connect ZKFC to ZooKeeper over TLS.
>  
> +Note:+ TLS requires a ZooKeeper server version >= 3.5.6 
> ([https://docs.confluent.io/platform/current/installation/versions-interoperability.html#zk]).
>  
> Spark 3.2.0 aims to support Hadoop 3.3.1; this Hadoop release bundles the 
> following shared libraries:
>  * curator-client-4.2.0.jar
>  * curator-framework-4.2.0.jar
>  * curator-recipes-4.2.0.jar
>  * zookeeper-3.5.6.jar
>  * zookeeper-jute-3.5.6.jar
>  
> Currently, Spark's Curator dependency is set to 2.13.0 
> ([https://github.com/apache/spark/blob/master/pom.xml#L127]).
>  
> It would be great to update the "curator-*" dependencies to 4.2.0 in order to be 
> compatible with the shared jars of the Hadoop stack.
> Moreover, it will allow administrators to connect Spark Masters to ZooKeeper 
> over TLS.
>  
> Some patches will be required, such as :
>  * 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkCuratorUtil.scala#L51]
>  
> I will try to prepare an MR for this.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35954) [Deploy] Upgrade Apache Curator Dependency to 4.2.0

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35954:


Assignee: (was: Apache Spark)

> [Deploy] Upgrade Apache Curator Dependency to 4.2.0
> ---
>
> Key: SPARK-35954
> URL: https://issues.apache.org/jira/browse/SPARK-35954
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Deploy
>Affects Versions: 3.1.2
> Environment: * OS: Linux
>  * JAVA: 1.8.0_292
>  * {color:#FF}*hadoop-3.3.1*{color}
>  
>Reporter: Nikita ROUSSEAU
>Priority: Major
>
> +Abstract:+ as a Spark Cluster Administrator, I want to connect Spark 
> masters deployed in HA mode to ZooKeeper over SSL/TLS, so that my network 
> traffic is encrypted between my components 
> ([https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper]).
>  
> With the release of Hadoop 3.3.1, the ZKFC libraries and their dependencies were 
> updated: it is now possible to connect ZKFC to ZooKeeper over TLS.
>  
> +Note:+ TLS requires a ZooKeeper server version >= 3.5.6 
> ([https://docs.confluent.io/platform/current/installation/versions-interoperability.html#zk]).
>  
> Spark 3.2.0 aims to support Hadoop 3.3.1; this Hadoop release bundles the 
> following shared libraries:
>  * curator-client-4.2.0.jar
>  * curator-framework-4.2.0.jar
>  * curator-recipes-4.2.0.jar
>  * zookeeper-3.5.6.jar
>  * zookeeper-jute-3.5.6.jar
>  
> Currently, Spark's Curator dependency is set to 2.13.0 
> ([https://github.com/apache/spark/blob/master/pom.xml#L127]).
>  
> It would be great to update the "curator-*" dependencies to 4.2.0 in order to be 
> compatible with the shared jars of the Hadoop stack.
> Moreover, it will allow administrators to connect Spark Masters to ZooKeeper 
> over TLS.
>  
> Some patches will be required, such as :
>  * 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkCuratorUtil.scala#L51]
>  
> I will try to prepare an MR for this.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35953) Support extracting date fields from timestamp without time zone

2021-06-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35953.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33156
[https://github.com/apache/spark/pull/33156]

> Support extracting date fields from timestamp without time zone
> ---
>
> Key: SPARK-35953
> URL: https://issues.apache.org/jira/browse/SPARK-35953
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Support extracting date fields from timestamp without time zone, which 
> includes:
> * year
> * month
> * day
> * year of week
> * week
> * day of week 
> * quarter
> * day of month
> * day of year



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30707) Lead/Lag window function throws AnalysisException without ORDER BY clause

2021-06-30 Thread Catalin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372127#comment-17372127
 ] 

Catalin commented on SPARK-30707:
-

At Lyft we are also in the process of deprecating Hive, and we hit the same 
issue as the one highlighted here.

[~hyukjin.kwon] going over the PR, I was wondering what feedback for 
improvement there is on this specific point?

[~angerszhuuu] are you still interested in merging it?

> Lead/Lag window function throws AnalysisException without ORDER BY clause
> -
>
> Key: SPARK-30707
> URL: https://issues.apache.org/jira/browse/SPARK-30707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
>  Lead/Lag window function throws AnalysisException without ORDER BY clause:
> {code:java}
> SELECT lead(ten, four + 1) OVER (PARTITION BY four), ten, four
> FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten)s
> org.apache.spark.sql.AnalysisException
> Window function lead(ten#x, (four#x + 1), null) requires window to be 
> ordered, please add ORDER BY clause. For example SELECT lead(ten#x, (four#x + 
> 1), null)(value_expr) OVER (PARTITION BY window_partition ORDER BY 
> window_ordering) from table;
> {code}
>  
> Maybe we need to fix this issue.
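For anyone hitting this in the meantime, a sketch of the workaround the error message itself suggests; the ordering key chosen below is only illustrative:

{code:java}
// Adding an ORDER BY inside the OVER clause satisfies the analyzer's requirement.
spark.sql(
  """SELECT lead(ten, four + 1) OVER (PARTITION BY four ORDER BY ten), ten, four
    |FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten) s""".stripMargin)
{code}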



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35955) Fix decimal overflow issues for Average

2021-06-30 Thread Karen Feng (Jira)
Karen Feng created SPARK-35955:
--

 Summary: Fix decimal overflow issues for Average
 Key: SPARK-35955
 URL: https://issues.apache.org/jira/browse/SPARK-35955
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Karen Feng


Return null on overflow for decimal average.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35955) Fix decimal overflow issues for Average

2021-06-30 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-35955:
---
Description: Return null on overflow for decimal average. Linked to 
SPARK-32018 and SPARK-28067, which address decimal sum.  (was: Return null on 
overflow for decimal average.)

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Priority: Major
>
> Return null on overflow for decimal average. Linked to SPARK-32018 and 
> SPARK-28067, which address decimal sum.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35956) Support auto-assigning labels to less important pods (e.g. decommissioning pods)

2021-06-30 Thread Holden Karau (Jira)
Holden Karau created SPARK-35956:


 Summary: Support auto-assigning labels to less important pods 
(e.g. decommissioning pods)
 Key: SPARK-35956
 URL: https://issues.apache.org/jira/browse/SPARK-35956
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.2.0
Reporter: Holden Karau
Assignee: Holden Karau


To allow folks to use pod disruption budgets or replicasets, we should 
indicate which pods Spark cares about "the least"; those would be pods that are 
otherwise exiting soon.

 

With PDBs, the user would create a PDB matching the label of decommissioning 
executors, and this PDB could tolerate a higher number of unavailable pods than the 
PDB for the "regular" executors. For people using replicasets on Kubernetes 1.21, we 
could also set a "controller.kubernetes.io/pod-deletion-cost" label (see 
[https://github.com/kubernetes/kubernetes/pull/99163]) to hint to the 
controller that a pod is less important to us.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35888) Add dataSize field in CoalescedPartitionSpec

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372136#comment-17372136
 ] 

Apache Spark commented on SPARK-35888:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33158

> Add dataSize field in CoalescedPartitionSpec
> 
>
> Key: SPARK-35888
> URL: https://issues.apache.org/jira/browse/SPARK-35888
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, the test suites that cover `CoalescedPartitionSpec` do not check the 
> data size, because it doesn't contain a data size field.
> We can add a data size field to `CoalescedPartitionSpec` and then add test cases for 
> better coverage.
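A rough sketch of the shape of such a change; the field type and default are assumptions for illustration, not the exact Spark source:

{code:java}
// Illustrative only: carry the coalesced byte size next to the reducer index range,
// so test suites can assert on it.
case class CoalescedPartitionSpec(
    startReducerIndex: Int,
    endReducerIndex: Int,
    dataSize: Option[Long] = None)
{code}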



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35888) Add dataSize field in CoalescedPartitionSpec

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372137#comment-17372137
 ] 

Apache Spark commented on SPARK-35888:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33158

> Add dataSize field in CoalescedPartitionSpec
> 
>
> Key: SPARK-35888
> URL: https://issues.apache.org/jira/browse/SPARK-35888
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, the test suites that cover `CoalescedPartitionSpec` do not check the 
> data size, because it doesn't contain a data size field.
> We can add a data size field to `CoalescedPartitionSpec` and then add test cases for 
> better coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35725) Support repartition expand partitions in AQE

2021-06-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35725:
---

Assignee: XiDuo You

> Support repartition expand partitions in AQE
> 
>
> Key: SPARK-35725
> URL: https://issues.apache.org/jira/browse/SPARK-35725
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
>
> Currently, we don't support expanding partitions dynamically in AQE, which is not 
> friendly for some data-skew jobs.
> Say we have a simple query:
> {code:java}
> SELECT * FROM table DISTRIBUTE BY col
> {code}
> If the column `col` is skewed, some shuffle partitions will handle much more 
> data than others.
> If we haven't introduced an extra shuffle, we can optimize this case by expanding 
> partitions in AQE.
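An illustrative setup for the scenario described above (the config key is the standard AQE switch; the table and column names are placeholders):

{code:java}
// AQE must be enabled for any runtime change to the number of shuffle partitions;
// with this improvement, a skewed DISTRIBUTE BY could be split into more partitions
// instead of only being coalesced.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.sql("SELECT * FROM table DISTRIBUTE BY col")
{code}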



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35725) Support repartition expand partitions in AQE

2021-06-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35725.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32883
[https://github.com/apache/spark/pull/32883]

> Support repartition expand partitions in AQE
> 
>
> Key: SPARK-35725
> URL: https://issues.apache.org/jira/browse/SPARK-35725
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, we don't support expanding partitions dynamically in AQE, which is not 
> friendly for some data-skew jobs.
> Say we have a simple query:
> {code:java}
> SELECT * FROM table DISTRIBUTE BY col
> {code}
> If the column `col` is skewed, some shuffle partitions will handle much more 
> data than others.
> If we haven't introduced an extra shuffle, we can optimize this case by expanding 
> partitions in AQE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35957) Cannot convert Avro schema to catalyst type because schema at path is not compatible

2021-06-30 Thread Jake Dalli (Jira)
Jake Dalli created SPARK-35957:
--

 Summary: Cannot convert Avro schema to catalyst type because 
schema at path is not compatible
 Key: SPARK-35957
 URL: https://issues.apache.org/jira/browse/SPARK-35957
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 3.0.0
Reporter: Jake Dalli


* The Apache Avro specification has a *null* primitive type.
 * Using org.apache.spark:spark-avro_2.12:3.0.3 on Spark 3.0.0 with Scala 2.12
 * I try to load an Avro schema with a field defined as follows:
 * 
```
{
  "name": "messageKey",
  "type": "null"
},
```

 * I get the following error:
```
ERROR Client: Application diagnostics message: User class threw exception: 
org.apache.spark.sql.avro.IncompatibleSchemaException: Unsupported type NULL
```

This issue is experienced when using Apache Hudi 0.7.0.
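A possible workaround sketch, offered as an assumption rather than something verified for this ticket: declare the always-null field as a nullable union instead of the bare `null` primitive, which spark-avro maps to a nullable column:

{code:java}
// Hypothetical schema fragment: a ["null", <concrete type>] union is read as a nullable
// string column, avoiding the unsupported bare NULL type.
val messageKeyField =
  """{ "name": "messageKey", "type": ["null", "string"], "default": null }"""
{code}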




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35957) Cannot convert Avro schema to catalyst type because schema at path is not compatible

2021-06-30 Thread Jake Dalli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Dalli updated SPARK-35957:
---
Description: 
* The Apache Avro specification has a *null* primitive type.
 * Using org.apache.spark:spark-avro_2.12:3.0.3 on Spark 3.0.0 with Scala 2.12
 * I try to load an Avro schema with a field defined as follows:
 


{code:java}
{
  "name": "messageKey",
  "type": "null"
},
{code}



 * I get the following error:


{code:java}
ERROR Client: Application diagnostics message: User class threw exception: 
org.apache.spark.sql.avro.IncompatibleSchemaException: Unsupported type NULL

{code}


This issue is experienced when using Apache Hudi 0.7.0.

Full stack trace:

{code:java}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
1.0 (TID 4, ip-10-102-8-124.eu-central-1.compute.internal, executor 1): 
org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot convert Avro 
schema to catalyst type because schema at path messageKey is not compatible 
(avroType = NullType, sqlType = NULL).
Source Avro Schema: ...
Target Catalyst type: ...
at 
org.apache.hudi.AvroConversionHelper$.createConverter$1(AvroConversionHelper.scala:265)
at 
org.apache.hudi.AvroConversionHelper$.createConverter$1(AvroConversionHelper.scala:146)
at 
org.apache.hudi.AvroConversionHelper$.createConverterToRow(AvroConversionHelper.scala:273)
at 
org.apache.hudi.AvroConversionUtils$.$anonfun$createDataFrame$1(AvroConversionUtils.scala:42)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at 
org.apache.spark.sql.execution.SQLExecutionRDD.$anonfun$compute$1(SQLExecutionRDD.scala:52)
at 
org.apache.spark.sql.internal.SQLConf$.withExistingConf(SQLConf.scala:100)
at 
org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2175)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2124)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2123)
at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2123)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:990)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:990)
at scala.Option.foreach(Option.scala:407)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DA

[jira] [Updated] (SPARK-35955) Fix decimal overflow issues for Average

2021-06-30 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-35955:
---
Description: 
Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
SPARK-32018 and SPARK-28067, which address decimal sum.

Repro:

 
{code:java}
import org.apache.spark.sql.functions._
spark.conf.set("spark.sql.ansi.enabled", true)

val df = Seq(
 (BigDecimal("1000"), 1),
 (BigDecimal("1000"), 1),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2),
 (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
"intNum").agg(mean("decNum"))
df2.show(40,false)
{code}
 

Should throw an exception (as sum overflows), but instead returns:

 
{code:java}
+---+
|avg(decNum)|
+---+
|null   |
+---+{code}
 

  was:Return null on overflow for decimal average. Linked to SPARK-32018 and 
SPARK-28067, which address decimal sum.


> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Priority: Major
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +---+
> |avg(decNum)|
> +---+
> |null   |
> +---+{code}
>  
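For contrast with the linked sum tickets, a sketch built on the repro above: aggregating the same join with sum instead of mean is expected (per SPARK-32018/SPARK-28067, not re-verified here) to fail fast under ANSI mode rather than silently return null:

{code:java}
// Same data and join as the repro above, but using sum: with spark.sql.ansi.enabled=true
// this should raise a decimal overflow error instead of returning null.
val sumDf = df.withColumnRenamed("decNum", "decNum2").join(df, "intNum").agg(sum("decNum"))
sumDf.show(40, false)
{code}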



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35944) Introduce type aliases for names or labels.

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35944:


Assignee: Apache Spark

> Introduce type aliases for names or labels.
> ---
>
> Key: SPARK-35944
> URL: https://issues.apache.org/jira/browse/SPARK-35944
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35944) Introduce type aliases for names or labels.

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372165#comment-17372165
 ] 

Apache Spark commented on SPARK-35944:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33159

> Introduce type aliases for names or labels.
> ---
>
> Key: SPARK-35944
> URL: https://issues.apache.org/jira/browse/SPARK-35944
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35944) Introduce type aliases for names or labels.

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35944:


Assignee: (was: Apache Spark)

> Introduce type aliases for names or labels.
> ---
>
> Key: SPARK-35944
> URL: https://issues.apache.org/jira/browse/SPARK-35944
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35958) Refactor SparkError.scala to SparkThrowable.java

2021-06-30 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-35958:
---
Description: 
Following up from SPARK-34920:

Error has a special meaning in Java; SparkError should encompass all 
Throwables. It'd be more correct to rename SparkError to SparkThrowable.

In addition, some Throwables come from Java, so to maximize usability, we 
should migrate the base trait from Scala to Java.

  was:
Error has a special meaning in Java; SparkError should encompass all 
Throwables. It'd be more correct to rename SparkError to SparkThrowable.

In addition, some Throwables come from Java, so to maximize usability, we 
should migrate the base trait from Scala to Java.


> Refactor SparkError.scala to SparkThrowable.java
> 
>
> Key: SPARK-35958
> URL: https://issues.apache.org/jira/browse/SPARK-35958
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Following up from SPARK-34920:
> Error has a special meaning in Java; SparkError should encompass all 
> Throwables. It'd be more correct to rename SparkError to SparkThrowable.
> In addition, some Throwables come from Java, so to maximize usability, we 
> should migrate the base trait from Scala to Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35958) Refactor SparkError.scala to SparkThrowable.java

2021-06-30 Thread Karen Feng (Jira)
Karen Feng created SPARK-35958:
--

 Summary: Refactor SparkError.scala to SparkThrowable.java
 Key: SPARK-35958
 URL: https://issues.apache.org/jira/browse/SPARK-35958
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Karen Feng


Error has a special meaning in Java; SparkError should encompass all 
Throwables. It'd be more correct to rename SparkError to SparkThrowable.

In addition, some Throwables come from Java, so to maximize usability, we 
should migrate the base trait from Scala to Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35959) Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions

2021-06-30 Thread Chao Sun (Jira)
Chao Sun created SPARK-35959:


 Summary: Add a new Maven profile "no-shaded-client" for older 
Hadoop 3.x versions 
 Key: SPARK-35959
 URL: https://issues.apache.org/jira/browse/SPARK-35959
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: Chao Sun


Currently Spark uses the Hadoop shaded client by default. However, if Spark users 
want to build Spark with an older version of Hadoop, such as 3.1.x, the shaded 
client cannot be used (currently it only supports Hadoop 3.2.2+ and 3.3.1+). 
Therefore, this proposes offering a new Maven profile "no-shaded-client" for 
this use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34859) Vectorized parquet reader needs synchronization among pages for column index

2021-06-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34859.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32753
[https://github.com/apache/spark/pull/32753]

> Vectorized parquet reader needs synchronization among pages for column index
> 
>
> Key: SPARK-34859
> URL: https://issues.apache.org/jira/browse/SPARK-34859
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Li Xian
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.2.0
>
> Attachments: 
> part-0-bee08cae-04cd-491c-9602-4c66791af3d0-c000.snappy.parquet
>
>
> The current implementation has a problem: the pages returned by 
> `readNextFilteredRowGroup` may not be aligned, and some columns may have more 
> rows than others.
> Parquet uses `org.apache.parquet.column.impl.SynchronizingColumnReader` 
> with `rowIndexes` to make sure that rows are aligned. 
> Currently `VectorizedParquetRecordReader` doesn't have such synchronization 
> among pages from different columns, so using `readNextFilteredRowGroup` may 
> produce incorrect results.
>  
> I have attached an example parquet file. This file is generated with 
> `spark.range(0, 2000).map(i => (i.toLong, i.toInt))`, and the layout of this 
> file is listed below.
> row group 0
> 
> _1:  INT64 SNAPPY DO:0 FPO:4 SZ:8161/16104/1.97 VC:2000 ENC:PLAIN,BIT_PACKED 
> [more]...
> _2:  INT32 SNAPPY DO:0 FPO:8165 SZ:8061/8052/1.00 VC:2000 
> ENC:PLAIN,BIT_PACKED [more]...
>     _1 TV=2000 RL=0 DL=0
>     
> 
>     page 0:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:500
>     page 1:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:500
>     page 2:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:500
>     page 3:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:500
>     _2 TV=2000 RL=0 DL=0
>     
> 
>     page 0:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:1000
>     page 1:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:1000
>  
> As you can see in row group 0, column _1 has 4 data pages, each with 500 
> values, and column _2 has 2 data pages with 1000 values each. 
> If we want to filter the rows with _1 = 510 using the column index, 
> Parquet will return page 1 of column _1 and page 0 of column _2. Page 1 
> of column _1 starts at row 500, and page 0 of column _2 starts at row 0, 
> so it will be incorrect if we simply read the two values as one row.
>  
> As an example, if you try to filter with _1 = 510 with the column index on in the 
> current version, it will give you the wrong result:
> +---+---+
> |_1 |_2 |
> +---+---+
> |510|10 |
> +---+---+
> And if turn columnindex off, you can get the correct result
> +---+---+
> |_1 |_2 |
> +---+---+
> |510|510|
> +---+---+
>  
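A self-contained sketch of the repro described above; the output path is a placeholder, and toggling column-index filtering through parquet-mr's Hadoop configuration key is an assumption for illustration:

{code:java}
import spark.implicits._

// Recreate a file with the same shape as the attachment: two columns whose data pages
// stop lining up once column-index filtering prunes them differently.
spark.range(0, 2000).map(i => (i.toLong, i.toInt)).toDF("_1", "_2")
  .coalesce(1)
  .write.mode("overwrite").parquet("/tmp/spark-34859-repro")

// With column-index filtering on, the filter below reads misaligned pages (wrong result);
// with it off, the result is correct.
spark.sparkContext.hadoopConfiguration.set("parquet.filter.columnindex.enabled", "true")
spark.read.parquet("/tmp/spark-34859-repro").filter($"_1" === 510).show(false)

spark.sparkContext.hadoopConfiguration.set("parquet.filter.columnindex.enabled", "false")
spark.read.parquet("/tmp/spark-34859-repro").filter($"_1" === 510).show(false)
{code}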



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34859) Vectorized parquet reader needs synchronization among pages for column index

2021-06-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34859:
-

Assignee: Chao Sun

> Vectorized parquet reader needs synchronization among pages for column index
> 
>
> Key: SPARK-34859
> URL: https://issues.apache.org/jira/browse/SPARK-34859
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Li Xian
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.2.0
>
> Attachments: 
> part-0-bee08cae-04cd-491c-9602-4c66791af3d0-c000.snappy.parquet
>
>
> The current implementation has a problem: the pages returned by 
> `readNextFilteredRowGroup` may not be aligned, and some columns may have more 
> rows than others.
> Parquet uses `org.apache.parquet.column.impl.SynchronizingColumnReader` 
> with `rowIndexes` to make sure that rows are aligned. 
> Currently `VectorizedParquetRecordReader` doesn't have such synchronization 
> among pages from different columns, so using `readNextFilteredRowGroup` may 
> produce incorrect results.
>  
> I have attached an example parquet file. This file is generated with 
> `spark.range(0, 2000).map(i => (i.toLong, i.toInt))`, and the layout of this 
> file is listed below.
> row group 0
> 
> _1:  INT64 SNAPPY DO:0 FPO:4 SZ:8161/16104/1.97 VC:2000 ENC:PLAIN,BIT_PACKED 
> [more]...
> _2:  INT32 SNAPPY DO:0 FPO:8165 SZ:8061/8052/1.00 VC:2000 
> ENC:PLAIN,BIT_PACKED [more]...
>     _1 TV=2000 RL=0 DL=0
>     
> 
>     page 0:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:500
>     page 1:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:500
>     page 2:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:500
>     page 3:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:500
>     _2 TV=2000 RL=0 DL=0
>     
> 
>     page 0:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:1000
>     page 1:  DLE:BIT_PACKED RLE:BIT_PACKED VLE:PLAIN ST:[no stats for  
> [more]... VC:1000
>  
> As you can see in row group 0, column _1 has 4 data pages, each with 500 
> values, and column _2 has 2 data pages with 1000 values each. 
> If we want to filter the rows with _1 = 510 using the column index, 
> Parquet will return page 1 of column _1 and page 0 of column _2. Page 1 
> of column _1 starts at row 500, and page 0 of column _2 starts at row 0, 
> so it will be incorrect if we simply read the two values as one row.
>  
> As an example, if you try to filter with _1 = 510 with the column index on in the 
> current version, it will give you the wrong result:
> +---+---+
> |_1 |_2 |
> +---+---+
> |510|10 |
> +---+---+
> And if turn columnindex off, you can get the correct result
> +---+---+
> |_1 |_2 |
> +---+---+
> |510|510|
> +---+---+
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35615) Refactor python/pyspark/pandas/base.py for better abstraction

2021-06-30 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35615:
-
Description: 
python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
don't have a clear boundary now.

We ought to revisit these two files and refactor them for better abstraction.

Basic operators in python/pyspark/pandas/data_type_ops/base.py.

  was:
python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
don't have a clear boundary now.

We ought to revisit these two files and refactor them for better abstraction.


> Refactor python/pyspark/pandas/base.py for better abstraction
> -
>
> Key: SPARK-35615
> URL: https://issues.apache.org/jira/browse/SPARK-35615
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
> don't have a clear boundary now.
> We ought to revisit these two files and refactor them for better abstraction.
> Basic operators in python/pyspark/pandas/data_type_ops/base.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35615) Refactor python/pyspark/pandas/base.py for better abstraction

2021-06-30 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35615:
-
Description: 
python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
don't have a clear boundary now.

We ought to revisit these two files and refactor them for better abstraction.

All basic operators in python/pyspark/pandas/data_type_ops/base.py.

- Spark column, isnull

  was:
python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
don't have a clear boundary now.

We ought to revisit these two files and refactor them for better abstraction.

Basic operators in python/pyspark/pandas/data_type_ops/base.py.


> Refactor python/pyspark/pandas/base.py for better abstraction
> -
>
> Key: SPARK-35615
> URL: https://issues.apache.org/jira/browse/SPARK-35615
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
> don't have a clear boundary now.
> We ought to revisit these two files and refactor them for better abstraction.
> All basic operators in python/pyspark/pandas/data_type_ops/base.py.
> - Spark column, isnull



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35959) Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372270#comment-17372270
 ] 

Apache Spark commented on SPARK-35959:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/33160

> Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions 
> -
>
> Key: SPARK-35959
> URL: https://issues.apache.org/jira/browse/SPARK-35959
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently Spark uses the Hadoop shaded client by default. However, if Spark users 
> want to build Spark with an older version of Hadoop, such as 3.1.x, the shaded 
> client cannot be used (currently it only supports Hadoop 3.2.2+ and 3.3.1+). 
> Therefore, this proposes offering a new Maven profile "no-shaded-client" for 
> this use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35959) Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372271#comment-17372271
 ] 

Apache Spark commented on SPARK-35959:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/33160

> Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions 
> -
>
> Key: SPARK-35959
> URL: https://issues.apache.org/jira/browse/SPARK-35959
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Major
>
> Currently Spark uses the Hadoop shaded client by default. However, if Spark users 
> want to build Spark with an older version of Hadoop, such as 3.1.x, the shaded 
> client cannot be used (currently it only supports Hadoop 3.2.2+ and 3.3.1+). 
> Therefore, this proposes offering a new Maven profile "no-shaded-client" for 
> this use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35959) Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35959:


Assignee: Apache Spark

> Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions 
> -
>
> Key: SPARK-35959
> URL: https://issues.apache.org/jira/browse/SPARK-35959
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Major
>
> Currently Spark uses the Hadoop shaded client by default. However, if Spark users 
> want to build Spark with an older version of Hadoop, such as 3.1.x, the shaded 
> client cannot be used (currently it only supports Hadoop 3.2.2+ and 3.3.1+). 
> Therefore, this proposes offering a new Maven profile "no-shaded-client" for 
> this use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35959) Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35959:


Assignee: (was: Apache Spark)

> Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions 
> -
>
> Key: SPARK-35959
> URL: https://issues.apache.org/jira/browse/SPARK-35959
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently Spark uses the Hadoop shaded client by default. However, if Spark users 
> want to build Spark with an older version of Hadoop, such as 3.1.x, the shaded 
> client cannot be used (currently it only supports Hadoop 3.2.2+ and 3.3.1+). 
> Therefore, this proposes offering a new Maven profile "no-shaded-client" for 
> this use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35938) Add deprecation warning for Python 3.6

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35938:


Assignee: Xinrong Meng

> Add deprecation warning for Python 3.6
> --
>
> Key: SPARK-35938
> URL: https://issues.apache.org/jira/browse/SPARK-35938
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Add deprecation warning for Python 3.6.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35938) Add deprecation warning for Python 3.6

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35938.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33139
[https://github.com/apache/spark/pull/33139]

> Add deprecation warning for Python 3.6
> --
>
> Key: SPARK-35938
> URL: https://issues.apache.org/jira/browse/SPARK-35938
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> Add deprecation warning for Python 3.6.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35936) Deprecate Python 3.6 support

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35936.
--
  Assignee: Xinrong Meng
Resolution: Done

> Deprecate Python 3.6 support
> 
>
> Key: SPARK-35936
> URL: https://issues.apache.org/jira/browse/SPARK-35936
> Project: Spark
>  Issue Type: Story
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> According to [https://endoflife.date/python], Python 3.6 will be EOL on 23 
> Dec, 2021.
> We should prepare for the deprecation of Python 3.6 support in Spark in 
> advance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35939) Deprecate Python 3.6 in Spark documentation

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35939.
--
Fix Version/s: 3.2.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/33141

> Deprecate Python 3.6 in Spark documentation
> ---
>
> Key: SPARK-35939
> URL: https://issues.apache.org/jira/browse/SPARK-35939
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> Deprecate Python 3.6 in Spark documentation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35936) Deprecate Python 3.6 support

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-35936:
-
Fix Version/s: 3.2.0

> Deprecate Python 3.6 support
> 
>
> Key: SPARK-35936
> URL: https://issues.apache.org/jira/browse/SPARK-35936
> Project: Spark
>  Issue Type: Story
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> According to [https://endoflife.date/python], Python 3.6 will be EOL on 23 
> Dec, 2021.
> We should prepare for the deprecation of Python 3.6 support in Spark in 
> advance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31973) Add ability to disable Sort,Spill in Partial aggregation

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372286#comment-17372286
 ] 

Apache Spark commented on SPARK-31973:
--

User 'shipra-a' has created a pull request for this issue:
https://github.com/apache/spark/pull/33161

> Add ability to disable Sort,Spill in Partial aggregation 
> -
>
> Key: SPARK-31973
> URL: https://issues.apache.org/jira/browse/SPARK-31973
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karuppayya
>Priority: Major
>
> In the case of hash aggregation, a partial aggregation (update) is done followed by 
> a final aggregation (merge).
> During partial aggregation we sort and spill to disk every time the fast 
> map (when enabled) and the UnsafeFixedWidthAggregationMap get exhausted.
> *When the cardinality of the grouping column is close to the total number of 
> records being processed*, sorting the data and spilling it to disk is not 
> required, since it is essentially a no-op and the rows can be used directly in the final 
> aggregation.
> When users are aware of the nature of their data, they currently have no control over 
> disabling this sort-and-spill operation.
> This is similar to the following issues in Hive:
> https://issues.apache.org/jira/browse/HIVE-223
> https://issues.apache.org/jira/browse/HIVE-291
>  
>  
>  
>  
>  
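An illustrative query for the scenario described above, with placeholder table and column names: when the grouping key is nearly unique, the map-side partial aggregation collapses almost nothing, so its sorting and spilling is pure overhead before the final aggregation.

{code:java}
// Nearly-unique grouping key: every time the UnsafeFixedWidthAggregationMap fills up,
// rows are sorted and spilled even though the final aggregation must process them anyway.
spark.sql("SELECT user_id, count(*) AS cnt FROM events GROUP BY user_id")
{code}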



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35944) Introduce type aliases for names or labels.

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35944.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33159
[https://github.com/apache/spark/pull/33159]

> Introduce type aliases for names or labels.
> ---
>
> Key: SPARK-35944
> URL: https://issues.apache.org/jira/browse/SPARK-35944
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35944) Introduce type aliases for names or labels.

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35944:


Assignee: Takuya Ueshin

> Introduce type aliases for names or labels.
> ---
>
> Key: SPARK-35944
> URL: https://issues.apache.org/jira/browse/SPARK-35944
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35615) Refactor python/pyspark/pandas/base.py for better abstraction

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35615:


Assignee: (was: Apache Spark)

> Refactor python/pyspark/pandas/base.py for better abstraction
> -
>
> Key: SPARK-35615
> URL: https://issues.apache.org/jira/browse/SPARK-35615
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
> don't have a clear boundary now.
> We ought to revisit these two files and refactor them for better abstraction.
> All basic operators in python/pyspark/pandas/data_type_ops/base.py.
> - Spark column, isnull



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35615) Refactor python/pyspark/pandas/base.py for better abstraction

2021-06-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372288#comment-17372288
 ] 

Apache Spark commented on SPARK-35615:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/33162

> Refactor python/pyspark/pandas/base.py for better abstraction
> -
>
> Key: SPARK-35615
> URL: https://issues.apache.org/jira/browse/SPARK-35615
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
> don't have a clear boundary now.
> We ought to revisit these two files and refactor them for better abstraction.
> All basic operators in python/pyspark/pandas/data_type_ops/base.py.
> - Spark column, isnull



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35615) Refactor python/pyspark/pandas/base.py for better abstraction

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35615:


Assignee: Apache Spark

> Refactor python/pyspark/pandas/base.py for better abstraction
> -
>
> Key: SPARK-35615
> URL: https://issues.apache.org/jira/browse/SPARK-35615
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> python/pyspark/pandas/data_type_ops/base.py and python/pyspark/pandas/base.py 
> don't have a clear boundary now.
> We ought to revisit these two files and refactor them for better abstraction.
> All basic operators in python/pyspark/pandas/data_type_ops/base.py.
> - Spark column, isnull



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35949) working with spring boot, spark context will stopped while application is started.

2021-06-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-35949:
-
Priority: Major  (was: Critical)

> working with spring boot, spark context will stopped while application is 
> started.
> --
>
> Key: SPARK-35949
> URL: https://issues.apache.org/jira/browse/SPARK-35949
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: SunPeng
>Priority: Major
>
> Spark 3.1.2 with Spring Boot, in client mode: the Spark context is stopped 
> while the application is starting.
>  
>  
> {code:java}
> // code placeholder
> @Bean
> @ConditionalOnMissingBean(SparkSession.class)
> public SparkSession sparkSession(SparkConf conf) {
> return SparkSession.builder()
> .enableHiveSupport()
> .config(conf)
> .getOrCreate();
> }{code}
>  
> {quote} 21/06/30 12:03:38 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: 
> ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
>  21/06/30 12:03:38 INFO Utils: Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>  21/06/30 12:03:38 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 
> 300(ns)
>  21/06/30 12:03:39 INFO WelcomePageHandlerMapping: Adding welcome page 
> template: index
>  21/06/30 12:03:40 INFO Http11NioProtocol: Starting ProtocolHandler 
> ["http-nio-9000"]
>  21/06/30 12:03:40 INFO TomcatWebServer: Tomcat started on port(s): 9000 
> (http) with context path ''
>  21/06/30 12:03:40 INFO SpringApplication: Started application in 525.411 
> seconds (JVM running for 529.958)
>  21/06/30 12:03:40 INFO AbstractConnector: Stopped Spark@3e1d19ea{HTTP/1.1, 
> (http/1.1)}{0.0.0.0}
>  21/06/30 12:03:40 INFO SparkUI: Stopped Spark web UI at 
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Interrupting monitor 
> thread
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: Shutting down all 
> executors
>  21/06/30 12:03:40 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down
>  21/06/30 12:03:40 INFO YarnClientSchedulerBackend: YARN client scheduler 
> backend Stopped
>  21/06/30 12:03:40 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
>  21/06/30 12:03:40 INFO MemoryStore: MemoryStore cleared
>  21/06/30 12:03:40 INFO BlockManager: BlockManager stopped
>  21/06/30 12:03:40 INFO BlockManagerMaster: BlockManagerMaster stopped
>  21/06/30 12:03:40 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
>  21/06/30 12:03:40 INFO SparkContext: Successfully stopped SparkContext
>  21/06/30 12:03:40 INFO [/]: Initializing Spring DispatcherServlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Initializing Servlet 
> 'dispatcherServlet'
>  21/06/30 12:03:40 INFO DispatcherServlet: Completed initialization in 1 ms
> {quote}
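A minimal diagnostic sketch (an assumption about how one might investigate, not a confirmed fix and not part of the reporter's code): register a SparkListener right after the session is created, so the application logs explicitly when Spark reports the application as ended and the stop can be correlated with the Spring Boot startup timeline in the quoted log. Written against Spark's Scala API; the equivalent calls exist from Java.

{code:scala}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}
import org.apache.spark.sql.SparkSession

object ContextStopProbe {
  def main(args: Array[String]): Unit = {
    // Build the session the same way the Spring bean in the description does.
    val spark = SparkSession.builder()
      .enableHiveSupport()
      .getOrCreate()

    // Log when Spark reports the application as ended; SparkListenerApplicationEnd
    // carries the end timestamp, which can be matched against the Spring Boot log.
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit =
        println(s"SparkContext reported application end at ${end.time}")
    })
  }
}
{code}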



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35960) sbt test:compile of tags is broken

2021-06-30 Thread Holden Karau (Jira)
Holden Karau created SPARK-35960:


 Summary: sbt test:compile of tags is broken
 Key: SPARK-35960
 URL: https://issues.apache.org/jira/browse/SPARK-35960
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Holden Karau
Assignee: Holden Karau


Upgrading scalatestplus to 3.2.9.0 also requires updating scalatest to 3.2.9; 
otherwise sbt fails to resolve the dependencies correctly. The issue only 
affects test:compile.
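For reference, the fix amounts to keeping the two test dependencies on matching versions. A minimal build.sbt sketch, using the scalacheck-1-15 bridge as a representative scalatestplus artifact (an assumption for illustration; Spark's actual build declares its test dependencies elsewhere):

{code:scala}
// Keep scalatest and the scalatestplus bridge aligned so sbt resolves a
// consistent test classpath for test:compile.
libraryDependencies ++= Seq(
  "org.scalatest"     %% "scalatest"       % "3.2.9"   % Test,
  // scalatestplus 3.2.9.0 is built against scalatest 3.2.9
  "org.scalatestplus" %% "scalacheck-1-15" % "3.2.9.0" % Test
)
{code}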



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35960) sbt test:compile of tags is broken

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35960:


Assignee: Apache Spark  (was: Holden Karau)

> sbt test:compile of tags is broken
> --
>
> Key: SPARK-35960
> URL: https://issues.apache.org/jira/browse/SPARK-35960
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Major
>
> Upgrading scalatestplus to 3.2.9.0 also requires updating scalatest to 
> 3.2.9; otherwise sbt fails to resolve the dependencies correctly. The issue 
> only affects test:compile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35960) sbt test:compile of tags is broken

2021-06-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35960:


Assignee: Holden Karau  (was: Apache Spark)

> sbt test:compile of tags is broken
> --
>
> Key: SPARK-35960
> URL: https://issues.apache.org/jira/browse/SPARK-35960
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Upgrading scalatestplus to 3.2.9.0 also requires updating scalatest to 
> 3.2.9; otherwise sbt fails to resolve the dependencies correctly. The issue 
> only affects test:compile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


