[jira] [Resolved] (SPARK-40312) Add missing configuration documentation in Spark History Server

2022-09-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-40312.
-
Fix Version/s: 3.4.0
 Assignee: dzcxzl
   Resolution: Fixed

> Add missing configuration documentation in Spark History Server
> ---
>
> Key: SPARK-40312
> URL: https://issues.apache.org/jira/browse/SPARK-40312
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39414) Upgrade Scala to 2.12.16

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599807#comment-17599807
 ] 

Apache Spark commented on SPARK-39414:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/37780

> Upgrade Scala to 2.12.16
> 
>
> Key: SPARK-39414
> URL: https://issues.apache.org/jira/browse/SPARK-39414
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> https://github.com/scala/scala/releases/tag/v2.12.16



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40319) Remove duplicated query execution error method for PARSE_DATETIME_BY_NEW_PARSER

2022-09-02 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40319.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37776
[https://github.com/apache/spark/pull/37776]

> Remove duplicated query execution error method for 
> PARSE_DATETIME_BY_NEW_PARSER
> ---
>
> Key: SPARK-40319
> URL: https://issues.apache.org/jira/browse/SPARK-40319
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Trivial
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599764#comment-17599764
 ] 

Apache Spark commented on SPARK-40320:
--

User 'yabola' has created a pull request for this issue:
https://github.com/apache/spark/pull/37779

> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor shows as active in the Spark UI, but it is actually broken and
> never receives any tasks.
> *Root Cause:*
> Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
> `UnsatisfiedLinkError` as a fatal error and handles it in
> `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process stays
> alive, but the communication thread is no longer working (see
> `MessageLoop#receiveLoopRunnable`: the `receiveLoop()` while loop is broken,
> so the executor no longer receives any messages).
> Some ideas:
> It is very hard to know what happened here without reading the code. The
> Executor is active but cannot do anything, and it is unclear whether the
> driver or the Executor is at fault. At minimum the Executor status should not
> be shown as active, or the Executor should call exitExecutor (kill itself).
>  
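For illustration only (not from this ticket): a minimal Scala sketch of how a plugin author can keep a failing initialization from escalating into the fatal-error path described above, by catching the setup error inside `init` and disabling the plugin instead. The plugin name, library name, and config key are made up; the `ExecutorPlugin`/`PluginContext` API is the real one from `org.apache.spark.api.plugin`.

{code:java}
import java.util.{Map => JMap}

import org.apache.spark.api.plugin.{ExecutorPlugin, PluginContext}
import org.apache.spark.internal.Logging

// Hypothetical plugin that guards its own setup so a failure does not escape
// into the executor's RPC inbox as a fatal error.
class GuardedExecutorPlugin extends ExecutorPlugin with Logging {

  @volatile private var ready = false

  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
    try {
      // Hypothetical setup work that may throw, e.g. an UnsatisfiedLinkError
      // from loading a native library.
      System.loadLibrary(extraConf.getOrDefault("guarded.plugin.lib", "mynativelib"))
      ready = true
    } catch {
      case t: Throwable =>
        // Degrade gracefully: the plugin stays disabled, but the executor keeps
        // its message loop and continues to accept tasks.
        logWarning("GuardedExecutorPlugin disabled: initialization failed", t)
    }
  }

  override def shutdown(): Unit = {
    if (ready) {
      // Release whatever init acquired (omitted in this sketch).
    }
  }
}
{code}

This only works around the symptom from the plugin side; the point of the ticket is that Spark itself should either stop reporting the executor as active or let it exit.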



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40320:


Assignee: Apache Spark

> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Assignee: Apache Spark
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor shows as active in the Spark UI, but it is actually broken and
> never receives any tasks.
> *Root Cause:*
> Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
> `UnsatisfiedLinkError` as a fatal error and handles it in
> `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process stays
> alive, but the communication thread is no longer working (see
> `MessageLoop#receiveLoopRunnable`: the `receiveLoop()` while loop is broken,
> so the executor no longer receives any messages).
> Some ideas:
> It is very hard to know what happened here without reading the code. The
> Executor is active but cannot do anything, and it is unclear whether the
> driver or the Executor is at fault. At minimum the Executor status should not
> be shown as active, or the Executor should call exitExecutor (kill itself).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40320:


Assignee: (was: Apache Spark)

> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor shows as active in the Spark UI, but it is actually broken and
> never receives any tasks.
> *Root Cause:*
> Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
> `UnsatisfiedLinkError` as a fatal error and handles it in
> `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process stays
> alive, but the communication thread is no longer working (see
> `MessageLoop#receiveLoopRunnable`: the `receiveLoop()` while loop is broken,
> so the executor no longer receives any messages).
> Some ideas:
> It is very hard to know what happened here without reading the code. The
> Executor is active but cannot do anything, and it is unclear whether the
> driver or the Executor is at fault. At minimum the Executor status should not
> be shown as active, or the Executor should call exitExecutor (kill itself).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599763#comment-17599763
 ] 

Apache Spark commented on SPARK-40320:
--

User 'yabola' has created a pull request for this issue:
https://github.com/apache/spark/pull/37779

> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor shows as active in the Spark UI, but it is actually broken and
> never receives any tasks.
> *Root Cause:*
> Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
> `UnsatisfiedLinkError` as a fatal error and handles it in
> `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process stays
> alive, but the communication thread is no longer working (see
> `MessageLoop#receiveLoopRunnable`: the `receiveLoop()` while loop is broken,
> so the executor no longer receives any messages).
> Some ideas:
> It is very hard to know what happened here without reading the code. The
> Executor is active but cannot do anything, and it is unclear whether the
> driver or the Executor is at fault. At minimum the Executor status should not
> be shown as active, or the Executor should call exitExecutor (kill itself).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
Reproduce step:
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor shows as active in the Spark UI, but it is actually broken and
never receives any tasks.

Root Cause:

Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
`UnsatisfiedLinkError` as a fatal error and handles it in `dealWithFatalError`.
The `CoarseGrainedExecutorBackend` JVM process stays alive, but the
communication thread is no longer working (see `MessageLoop#receiveLoopRunnable`:
the `receiveLoop()` while loop is broken, so the executor no longer receives any
messages).

Solution:
It is very hard to know what happened here without reading the code. The
Executor is active but cannot do anything, and it is unclear whether the driver
or the Executor is at fault. At minimum the Executor status should not be shown
as active, or the Executor should call exitExecutor (kill itself).

 

  was:
Reproduce step:
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

Root Cause:

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> Reproduce step:
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor is active when we check in spark-ui, however it was broken and 
> doesn't receive any task.
> Root Cause:
> I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in 
> method `dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` 
> JVM process  is active but the  communication thread is no longer working ( 
> please see  `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was 
> broken here, so executor doesn't receive any message)
> Solution:
> I think it is very hard to know what happened here unless we check in the 
> code. The Executor is active but it can't do anything. We will wonder if the 
> driver is broken or the Executor 

[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
*Reproduce step:*
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor shows as active in the Spark UI, but it is actually broken and
never receives any tasks.

*Root Cause:*

Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
`UnsatisfiedLinkError` as a fatal error and handles it in `dealWithFatalError`.
The `CoarseGrainedExecutorBackend` JVM process stays alive, but the
communication thread is no longer working (see `MessageLoop#receiveLoopRunnable`:
the `receiveLoop()` while loop is broken, so the executor no longer receives any
messages).

Some ideas:
It is very hard to know what happened here without reading the code. The
Executor is active but cannot do anything, and it is unclear whether the driver
or the Executor is at fault. At minimum the Executor status should not be shown
as active, or the Executor should call exitExecutor (kill itself).

 

  was:
*Reproduce step:*
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

*Root Cause:*

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working ( please see  
`MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so 
executor doesn't receive any message)

Some idea:
I think it is very hard to know what happened here unless we check in the code. 
The Executor is active but it can't do anything. We will wonder if the driver 
is broken or the Executor problem.  I think at least the Executor status 
shouldn't be active here or the Executor can exitExecutor (kill itself)

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor is active when we check in spark-ui, however it was broken and 
> doesn't receive any task.
> *Root Cause:*
> I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in 
> method `dealWithFatalError` 

[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
*Reproduce step:*
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor shows as active in the Spark UI, but it is actually broken and
never receives any tasks.

*Root Cause:*

Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
`UnsatisfiedLinkError` as a fatal error and handles it in `dealWithFatalError`.
The `CoarseGrainedExecutorBackend` JVM process stays alive, but the
communication thread is no longer working (see `MessageLoop#receiveLoopRunnable`:
the `receiveLoop()` while loop is broken, so the executor no longer receives any
messages).

Some ideas:
It is very hard to know what happened here without reading the code. The
Executor is active but cannot do anything, and it is unclear whether the driver
or the Executor is at fault. At minimum the Executor status should not be shown
as active, or the Executor should call exitExecutor (kill itself).

 

  was:
Reproduce step:
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

Root Cause:

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working ( please see  
`MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so 
executor doesn't receive any message)

Solution:
I think it is very hard to know what happened here unless we check in the code. 
The Executor is active but it can't do anything. We will wonder if the driver 
is broken or the Executor problem.  I think at least the Executor status 
shouldn't be active here or the Executor can exitExecutor (kill itself)

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor is active when we check in spark-ui, however it was broken and 
> doesn't receive any task.
> *Root Cause:*
> I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in 
> method `dealWithFatalError` . 

[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
Reproduce step:
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor shows as active in the Spark UI, but it is actually broken and
never receives any tasks.

Root Cause:

Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
`UnsatisfiedLinkError` as a fatal error and handles it in `dealWithFatalError`.
The `CoarseGrainedExecutorBackend` JVM process stays alive, but the
communication thread is no longer working.

 

  was:
Reproduce step:
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

Root Cause:

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the executor 

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> Reproduce step:
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor shows as active in the Spark UI, but it is actually broken and
> never receives any tasks.
> Root Cause:
> Checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` treats
> `UnsatisfiedLinkError` as a fatal error and handles it in `dealWithFatalError`.
> The `CoarseGrainedExecutorBackend` JVM process stays alive, but the
> communication thread is no longer working.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)
Mars created SPARK-40320:


 Summary: When the Executor plugin fails to initialize, the 
Executor shows active but does not accept tasks forever, just like being hung
 Key: SPARK-40320
 URL: https://issues.apache.org/jira/browse/SPARK-40320
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 3.0.0
Reporter: Mars






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40283) Update mima's previousSparkVersion to 3.3.0

2022-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-40283.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37741
[https://github.com/apache/spark/pull/37741]

> Update mima's previousSparkVersion to 3.3.0
> ---
>
> Key: SPARK-40283
> URL: https://issues.apache.org/jira/browse/SPARK-40283
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40283) Update mima's previousSparkVersion to 3.3.0

2022-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-40283:
-

Assignee: Yang Jie  (was: Yuming Wang)

> Update mima's previousSparkVersion to 3.3.0
> ---
>
> Key: SPARK-40283
> URL: https://issues.apache.org/jira/browse/SPARK-40283
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40283) Update mima's previousSparkVersion to 3.3.0

2022-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-40283:
-

Assignee: Yuming Wang

> Update mima's previousSparkVersion to 3.3.0
> ---
>
> Key: SPARK-40283
> URL: https://issues.apache.org/jira/browse/SPARK-40283
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38542) UnsafeHashedRelation should serialize numKeys out

2022-09-02 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-38542:
---
Labels: correctness  (was: )

> UnsafeHashedRelation should serialize numKeys out
> -
>
> Key: SPARK-38542
> URL: https://issues.apache.org/jira/browse/SPARK-38542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: mcdull_zhang
>Priority: Critical
>  Labels: correctness
> Fix For: 3.3.0, 3.2.2
>
>
> At present, UnsafeHashedRelation does not write out numKeys during 
> serialization, so the numKeys of UnsafeHashedRelation obtained by 
> deserialization is equal to 0. The numFields of UnsafeRows returned by 
> UnsafeHashedRelation.keys() are all 0, which can lead to missing or incorrect 
> data.
>  
> For example, in SubqueryBroadcastExec, the HashedRelation.keys() function is 
> called.
> {code:java}
> val broadcastRelation = child.executeBroadcast[HashedRelation]().value
> val (iter, expr) = if (broadcastRelation.isInstanceOf[LongHashedRelation]) {
>   (broadcastRelation.keys(), HashJoin.extractKeyExprAt(buildKeys, index))
> } else {
>   (broadcastRelation.keys(),
> BoundReference(index, buildKeys(index).dataType, 
> buildKeys(index).nullable))
> }{code}
>  
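For illustration only (not Spark's actual UnsafeHashedRelation code): a minimal, self-contained Scala sketch of the failure mode, in which a field that is not written in writeExternal comes back as its default value 0 after deserialization. The class and field names are made up.

{code:java}
import java.io._

// Hypothetical relation that serializes its payload but forgets numKeys.
class TinyRelation(var numKeys: Int, var payload: Array[Byte]) extends Externalizable {
  def this() = this(0, Array.empty[Byte]) // no-arg constructor required by Externalizable

  override def writeExternal(out: ObjectOutput): Unit = {
    // BUG: numKeys is never written out.
    out.writeInt(payload.length)
    out.write(payload)
  }

  override def readExternal(in: ObjectInput): Unit = {
    val len = in.readInt()
    payload = new Array[Byte](len)
    in.readFully(payload)
    // numKeys keeps its default value 0, mirroring the issue described above.
  }
}

object RoundTrip {
  def main(args: Array[String]): Unit = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(new TinyRelation(42, Array[Byte](1, 2, 3)))
    out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    val copy = in.readObject().asInstanceOf[TinyRelation]
    println(copy.numKeys) // prints 0, not 42; the fix is to write and read numKeys as well
  }
}
{code}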



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40309) Introduce sql_conf context manager for pyspark.sql

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40309:


Assignee: Apache Spark

> Introduce sql_conf context manager for pyspark.sql
> --
>
> Key: SPARK-40309
> URL: https://issues.apache.org/jira/browse/SPARK-40309
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>  Labels: release-notes
>
> That would simplify the control of Spark SQL configuration as below
> from
> {code:java}
> original_value = spark.conf.get("key")
> spark.conf.set("key", "value")
> ...
> spark.conf.set("key", original_value){code}
> to
> {code:java}
> with sql_conf({"key": "value"}):
> ...
> {code}
> [Here|https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]
>  is such a context manager in the Pandas API on Spark.
> We should introduce one in `pyspark.sql`, and deduplicate code if possible.
>  
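For illustration only: the ticket asks for a Python context manager in `pyspark.sql`, but the same set-run-restore idea can be sketched in Scala against the public `SparkSession.conf` API. The helper below is a made-up sketch, not an existing PySpark or Spark API.

{code:java}
import org.apache.spark.sql.SparkSession

object SqlConfSketch {
  // Set the given SQL confs, run the body, then restore the previous values,
  // even if the body throws.
  def withSQLConf[T](spark: SparkSession, pairs: (String, String)*)(body: => T): T = {
    val conf = spark.conf
    val previous = pairs.map { case (k, _) => k -> conf.getOption(k) }
    pairs.foreach { case (k, v) => conf.set(k, v) }
    try {
      body
    } finally {
      previous.foreach {
        case (k, Some(v)) => conf.set(k, v)
        case (k, None)    => conf.unset(k)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sql-conf-sketch").getOrCreate()
    // The shuffle-partition setting only applies inside the block and is restored afterwards.
    withSQLConf(spark, "spark.sql.shuffle.partitions" -> "4") {
      spark.range(100).selectExpr("id % 10 as bucket").groupBy("bucket").count().show()
    }
    spark.stop()
  }
}
{code}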



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40309) Introduce sql_conf context manager for pyspark.sql

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40309:


Assignee: (was: Apache Spark)

> Introduce sql_conf context manager for pyspark.sql
> --
>
> Key: SPARK-40309
> URL: https://issues.apache.org/jira/browse/SPARK-40309
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: release-notes
>
> That would simplify the control of Spark SQL configuration as below
> from
> {code:java}
> original_value = spark.conf.get("key")
> spark.conf.set("key", "value")
> ...
> spark.conf.set("key", original_value){code}
> to
> {code:java}
> with sql_conf({"key": "value"}):
> ...
> {code}
> [Here|https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]
>  is such a context manager in the Pandas API on Spark.
> We should introduce one in `pyspark.sql`, and deduplicate code if possible.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40309) Introduce sql_conf context manager for pyspark.sql

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599718#comment-17599718
 ] 

Apache Spark commented on SPARK-40309:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/3

> Introduce sql_conf context manager for pyspark.sql
> --
>
> Key: SPARK-40309
> URL: https://issues.apache.org/jira/browse/SPARK-40309
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: release-notes
>
> That would simplify the control of Spark SQL configuration as below
> from
> {code:java}
> original_value = spark.conf.get("key")
> spark.conf.set("key", "value")
> ...
> spark.conf.set("key", original_value){code}
> to
> {code:java}
> with sql_conf({"key": "value"}):
> ...
> {code}
> [Here|https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]
>  is such a context manager in the Pandas API on Spark.
> We should introduce one in `pyspark.sql`, and deduplicate code if possible.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40309) Introduce sql_conf context manager for pyspark.sql

2022-09-02 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40309:
-
Description: 
That would simplify the control of Spark SQL configuration as below

from
{code:java}
original_value = spark.conf.get("key")
spark.conf.set("key", "value")
...
spark.conf.set("key", original_value){code}
to
{code:java}
with sql_conf({"key": "value"}):
...
{code}
[Here|https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]
 is such a context manager in the Pandas API on Spark.

We should introduce one in `pyspark.sql`, and deduplicate code if possible.

 

  was:
[Here|https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]
 a context manager is introduced to set the Spark SQL configuration and
then restores it back when it exits, in Pandas API on Spark.

That simplifies the control of Spark SQL configuration as below

from
{code:java}
original_value = spark.conf.get("key")
spark.conf.set("key", "value")
...
spark.conf.set("key", original_value){code}
to
{code:java}
with sql_conf({"key": "value"}):
...
{code}
We should introduce a similar context manager in `pyspark.sql`, and deduplicate 
code if possible.

 


> Introduce sql_conf context manager for pyspark.sql
> --
>
> Key: SPARK-40309
> URL: https://issues.apache.org/jira/browse/SPARK-40309
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: release-notes
>
> That would simplify the control of Spark SQL configuration as below
> from
> {code:java}
> original_value = spark.conf.get("key")
> spark.conf.set("key", "value")
> ...
> spark.conf.set("key", original_value){code}
> to
> {code:java}
> with sql_conf({"key": "value"}):
> ...
> {code}
> [Here|https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]
>  is such a context manager in the Pandas API on Spark.
> We should introduce one in `pyspark.sql`, and deduplicate code if possible.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40319) Remove duplicated query execution error method for PARSE_DATETIME_BY_NEW_PARSER

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599717#comment-17599717
 ] 

Apache Spark commented on SPARK-40319:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37776

> Remove duplicated query execution error method for 
> PARSE_DATETIME_BY_NEW_PARSER
> ---
>
> Key: SPARK-40319
> URL: https://issues.apache.org/jira/browse/SPARK-40319
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40319) Remove duplicated query execution error method for PARSE_DATETIME_BY_NEW_PARSER

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599716#comment-17599716
 ] 

Apache Spark commented on SPARK-40319:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37776

> Remove duplicated query execution error method for 
> PARSE_DATETIME_BY_NEW_PARSER
> ---
>
> Key: SPARK-40319
> URL: https://issues.apache.org/jira/browse/SPARK-40319
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40319) Remove duplicated query execution error method for PARSE_DATETIME_BY_NEW_PARSER

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40319:


Assignee: Gengliang Wang  (was: Apache Spark)

> Remove duplicated query execution error method for 
> PARSE_DATETIME_BY_NEW_PARSER
> ---
>
> Key: SPARK-40319
> URL: https://issues.apache.org/jira/browse/SPARK-40319
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40319) Remove duplicated query execution error method for PARSE_DATETIME_BY_NEW_PARSER

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40319:


Assignee: Apache Spark  (was: Gengliang Wang)

> Remove duplicated query execution error method for 
> PARSE_DATETIME_BY_NEW_PARSER
> ---
>
> Key: SPARK-40319
> URL: https://issues.apache.org/jira/browse/SPARK-40319
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40319) Remove duplicated query execution error method for PARSE_DATETIME_BY_NEW_PARSER

2022-09-02 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40319:
--

 Summary: Remove duplicated query execution error method for 
PARSE_DATETIME_BY_NEW_PARSER
 Key: SPARK-40319
 URL: https://issues.apache.org/jira/browse/SPARK-40319
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40309) Introduce sql_conf context manager for pyspark.sql

2022-09-02 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40309:
-
Description: 
[Here|https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]
 a context manager is introduced in the Pandas API on Spark that sets the Spark
SQL configuration and restores it when the block exits.

That simplifies the control of Spark SQL configuration as below

from
{code:java}
original_value = spark.conf.get("key")
spark.conf.set("key", "value")
...
spark.conf.set("key", original_value){code}
to
{code:java}
with sql_conf({"key": "value"}):
...
{code}
We should introduce a similar context manager in `pyspark.sql`, and deduplicate 
code if possible.

 

  was:
[https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]

a context manager is introduced to set the Spark SQL configuration and
then restores it back when it exits, in Pandas API on Spark.

That simplifies the control of Spark SQL configuration, 

from
{code:java}
original_value = spark.conf.get("key")
spark.conf.set("key", "value")
...
spark.conf.set("key", original_value){code}
to
{code:java}
with sql_conf({"key": "value"}):
...
{code}
 

 


> Introduce sql_conf context manager for pyspark.sql
> --
>
> Key: SPARK-40309
> URL: https://issues.apache.org/jira/browse/SPARK-40309
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: release-notes
>
> [Here|https://github.com/apache/spark/blob/master/python/pyspark/pandas/utils.py#L490]
>  a context manager is introduced in the Pandas API on Spark that sets the Spark
> SQL configuration and restores it when the block exits.
> That simplifies the control of Spark SQL configuration as below
> from
> {code:java}
> original_value = spark.conf.get("key")
> spark.conf.set("key", "value")
> ...
> spark.conf.set("key", original_value){code}
> to
> {code:java}
> with sql_conf({"key": "value"}):
> ...
> {code}
> We should introduce a similar context manager in `pyspark.sql`, and 
> deduplicate code if possible.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40318) try_avg() should throw the exceptions from its child

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599703#comment-17599703
 ] 

Apache Spark commented on SPARK-40318:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37775

> try_avg() should throw the exceptions from its child
> 
>
> Key: SPARK-40318
> URL: https://issues.apache.org/jira/browse/SPARK-40318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Similar to [https://github.com/apache/spark/pull/37486] and
> [https://github.com/apache/spark/pull/37663], the errors from try_avg()'s
> child should be shown instead of ignored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40318) try_avg() should throw the exceptions from its child

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599702#comment-17599702
 ] 

Apache Spark commented on SPARK-40318:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37775

> try_avg() should throw the exceptions from its child
> 
>
> Key: SPARK-40318
> URL: https://issues.apache.org/jira/browse/SPARK-40318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Similar to [https://github.com/apache/spark/pull/37486] and
> [https://github.com/apache/spark/pull/37663], the errors from try_avg()'s
> child should be shown instead of ignored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40318) try_avg() should throw the exceptions from its child

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40318:


Assignee: Gengliang Wang  (was: Apache Spark)

> try_avg() should throw the exceptions from its child
> 
>
> Key: SPARK-40318
> URL: https://issues.apache.org/jira/browse/SPARK-40318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Similar to [https://github.com/apache/spark/pull/37486] and
> [https://github.com/apache/spark/pull/37663], the errors from try_avg()'s
> child should be shown instead of ignored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40318) try_avg() should throw the exceptions from its child

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40318:


Assignee: Apache Spark  (was: Gengliang Wang)

> try_avg() should throw the exceptions from its child
> 
>
> Key: SPARK-40318
> URL: https://issues.apache.org/jira/browse/SPARK-40318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Similar to [https://github.com/apache/spark/pull/37486] and
> [https://github.com/apache/spark/pull/37663], the errors from try_avg()'s
> child should be shown instead of ignored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40318) try_avg() should throw the exceptions from its child

2022-09-02 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40318:
--

 Summary: try_avg() should throw the exceptions from its child
 Key: SPARK-40318
 URL: https://issues.apache.org/jira/browse/SPARK-40318
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Similar to [https://github.com/apache/spark/pull/37486] and
[https://github.com/apache/spark/pull/37663], the errors from try_avg()'s
child should be shown instead of ignored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40210) Fix math atan2, hypot, pow and pmod float argument call

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599672#comment-17599672
 ] 

Apache Spark commented on SPARK-40210:
--

User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37774

> Fix math atan2, hypot, pow and pmod float argument call
> ---
>
> Key: SPARK-40210
> URL: https://issues.apache.org/jira/browse/SPARK-40210
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Minor
>
> PySpark atan2, hypot, pow and pmod functions are marked as accepting a float
> argument but produce an error when actually called with one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40317) Improvement to JDBC predicate for queries involving joins

2022-09-02 Thread David Ahern (Jira)
David Ahern created SPARK-40317:
---

 Summary: Improvement to JDBC predicate for queries involving joins
 Key: SPARK-40317
 URL: https://issues.apache.org/jira/browse/SPARK-40317
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.2
Reporter: David Ahern


Current behaviour on queries involving joins seems to be to use a subquery as 
follows:

select * from
(
select a, b, c from tbl1
left join tbl2 on tbl1.col1 = tbl2.col1
left join tbl3 on tbl1.col2 = tbl3.col2
)
where predicate = 1
and predicate = 2
and predicate = 3

More desirable would be:

select * from
(
select a, b, c from tbl1 where (predicate = 1 and predicate = 2, etc.)
left join tbl2 on tbl1.col1 = tbl2.col1
left join tbl3 on tbl1.col2 = tbl3.col2
)

so that the join runs only on the subset of data rather than joining all the 
data and then filtering. Predicate pushdown usually only works on columns that 
have been indexed, so even if the data isn't indexed this would reduce the 
amount of data that needs to be moved. In many cases it is better to do the 
join on the database side than to pull everything into Spark.
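
For illustration, a minimal sketch of a workaround that is already possible 
today (connection details, credentials, and column names are placeholders, and 
an existing SparkSession named spark is assumed): handing the JDBC source the 
joined, pre-filtered query as its "dbtable" makes the database perform the join 
and filter before Spark reads anything.

{code:scala}
// Hedged sketch with placeholder connection details; the derived-table alias
// ("src") is needed because Spark wraps dbtable in its own SELECT.
val pushedDownQuery =
  """(select a, b, c
    |   from tbl1
    |   left join tbl2 on tbl1.col1 = tbl2.col1
    |   left join tbl3 on tbl1.col2 = tbl3.col2
    |  where tbl1.a = 1) src""".stripMargin

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")  // placeholder URL
  .option("dbtable", pushedDownQuery)
  .option("user", "reader")        // placeholder credentials
  .option("password", "secret")
  .load()
{code}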

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-02 Thread Sachit (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599644#comment-17599644
 ] 

Sachit edited comment on SPARK-40316 at 9/2/22 5:30 PM:


Hi [~srowen] 

I am using a UDF in Scala which is causing this. To debug, I tried changing the 
data type from Seq[Long] to List[Long], and even tried changing to 
Option[Seq[Long]].

Here is the signature of the UDF:

val myUDF: UserDefinedFunction = udf((inputList: Seq[Long], newList: Seq[Long], 
n: Int, p: Int) => {
  // body omitted in the ticket
})


was (Author: JIRAUSER287754):
I am using UDF in scala which is causing this. To debug tried changing data 
type from Seq[Long] to List[Long] 

Even tried changing to Option[Seq[Long]]. 

 

Here is signature of UDF



val myUDF: UserDefinedFunction = udf((inputList: Seq[Long], newList: Seq[Long], 
n: Int, p: Int) => {
}

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting below error while upgrading to Spark3
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-02 Thread Sachit (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599644#comment-17599644
 ] 

Sachit commented on SPARK-40316:


I am using a UDF in Scala which is causing this. To debug, I tried changing the 
data type from Seq[Long] to List[Long], and even tried changing to 
Option[Seq[Long]].

Here is the signature of the UDF:

val myUDF: UserDefinedFunction = udf((inputList: Seq[Long], newList: Seq[Long], 
n: Int, p: Int) => {
  // body omitted in the ticket
})
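
Purely to illustrate the workaround the error message hints at (the real UDF 
body is not in the ticket, so the body below is a placeholder): declaring the 
array elements as java.lang.Long, a nullable boxed type, lets the decoder 
tolerate null entries that the primitive scala.Long cannot hold.

{code:scala}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// Hedged sketch: java.lang.Long replaces scala.Long so a null array element no
// longer crashes the decoder; the body is a placeholder, not the original logic.
val myNullSafeUDF: UserDefinedFunction =
  udf((inputList: Seq[java.lang.Long], newList: Seq[java.lang.Long], n: Int, p: Int) => {
    val cleaned = Option(inputList).getOrElse(Seq.empty).filter(_ != null).map(_.longValue)
    cleaned.take(n).sum + p
  })
{code}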

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting below error while upgrading to Spark3
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40299) java api calls the count() method to appear: java.lang.ArithmeticException: BigInteger would overflow supported range

2022-09-02 Thread Bhavya Jain (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599634#comment-17599634
 ] 

Bhavya Jain commented on SPARK-40299:
-

[~code1v5] Can you please elaborate more on the issue? Could you add some 
examples and scenarios showing how we can replicate it?

> java api calls the count() method to appear: java.lang.ArithmeticException: 
> BigInteger would overflow supported range
> -
>
> Key: SPARK-40299
> URL: https://issues.apache.org/jira/browse/SPARK-40299
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.2
>Reporter: code1v5
>Priority: Major
>
> ive Session ID = a372ea31-ac98-4e01-9de3-dfb623df87a4
> 22/09/01 13:50:32 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> [Stage 0:>                                                          (0 + 8) / 
> 8]22/09/01 13:50:41 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, 
> hdp3-10-106, executor 6): java.lang.ArithmeticException: BigInteger would 
> overflow supported range
>     at java.math.BigInteger.reportOverflow(BigInteger.java:1084)
>     at java.math.BigInteger.pow(BigInteger.java:2391)
>     at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3574)
>     at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3707)
>     at java.math.BigDecimal.setScale(BigDecimal.java:2448)
>     at java.math.BigDecimal.setScale(BigDecimal.java:2515)
>     at 
> org.apache.hadoop.hive.common.type.HiveDecimal.trim(HiveDecimal.java:241)
>     at 
> org.apache.hadoop.hive.common.type.HiveDecimal.normalize(HiveDecimal.java:252)
>     at 
> org.apache.hadoop.hive.common.type.HiveDecimal.create(HiveDecimal.java:83)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazyHiveDecimal.init(LazyHiveDecimal.java:79)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:226)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:202)
>     at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:128)
>     at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:439)
>     at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:434)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown
>  Source)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>     at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>     at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>     at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 22/09/01 13:50:42 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; 
> aborting job
> 22/09/01 13:50:42 WARN TaskSetManager: Lost task 7.0 in stage 0.0 (TID 7, 
> hdp2-10-105, executor 8): TaskKilled (Stage cancelled)
> [Stage 0:>                                                          (0 + 6) / 
> 8]org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 
> in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 
> (TID 10, hdp3-10-106, executor 6): java.lang.ArithmeticException: BigInteger 
> would overflow supported range
>     at java.math.BigInteger.reportOverflow(BigInteger.java:1084)
>     at java.math.BigInteger.pow(BigInteger.java:2391)
>     at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3574)
>     at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3707)
>     at java.math.BigDecimal.setScale(BigDecimal.java:2448)
>     at 

[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-02 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599629#comment-17599629
 ] 

Sean R. Owen commented on SPARK-40316:
--

There's no info about what you're doing that triggers this, or what you've done 
to debug

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting below error while upgrading to Spark3
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40303) The performance will be worse after codegen

2022-09-02 Thread Kris Mok (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599616#comment-17599616
 ] 

Kris Mok commented on SPARK-40303:
--

Nice findings [~LuciferYang]!

{quote}
After some experiments, I found when the number of parameters exceeds 50, the 
performance of the case in the Jira description will significant deterioration.
{quote}
Sounds reasonable. Note that in a STD compilation, the only things that need to 
be live at the method entry are the method parameters (both implicit ones like 
{{this}}, and explicit ones); however, for an OSR compilation, it would be all 
of the parameters/local variables that are live at the loop entry point, so in 
this case both the {{doConsume}} parameters and the local variables contribute 
to the problem.

Just FYI, I have an old write-up on PrintCompilation and OSR here: 
https://gist.github.com/rednaxelafx/1165804#file-notes-md
(Gee, just realized that was from 11 years ago...)

{quote}
maybe try to make the input parameters of the `doConsume` method fixed length 
will help, such as using a List or Array
{quote}
Welp, hoisting the parameters into an Arguments object is rather common in 
"code splitting" in code generators. Since we're already doing codegen, it's 
possible to generate tailor-made Arguments classes to retain the type 
information. Using List/Array would require extra boxing for primitive types 
and it's less ideal.
(An array-based box is already used in Spark SQL's codegen in the form of the 
{{references}} array. Indeed the type info is lost on the interface level and 
you'd have to do a cast when you get data out of it. It's still usable though.)
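
To make the idea concrete, here is a rough sketch of the "tailor-made Arguments 
class" approach in plain Scala (the real generated code is Java, and the names 
below are invented for illustration): the long doConsume parameter list is 
grouped into one strongly typed holder, so the split method keeps a short 
signature without the boxing a List or Array[Any] would require.

{code:scala}
// Hedged sketch with invented names; this is not Spark's actual generated code.

// Before: every value and its null flag is a separate parameter, which is what
// pushes past ~50 parameters/locals and hurts the OSR compilation.
def doConsumeWide(agg0: Long, agg0IsNull: Boolean,
                  agg1: Double, agg1IsNull: Boolean /* ...dozens more... */): Unit = {
  if (!agg0IsNull) { /* consume agg0 */ }
}

// After: a generated, strongly typed holder keeps the fields primitive, so the
// type information is retained and no boxing is needed.
final class ConsumeArgs(var agg0: Long, var agg0IsNull: Boolean,
                        var agg1: Double, var agg1IsNull: Boolean /* ... */)

def doConsumeNarrow(args: ConsumeArgs): Unit = {
  if (!args.agg0IsNull) { /* consume args.agg0 */ }
}
{code}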

> The performance will be worse after codegen
> ---
>
> Key: SPARK-40303
> URL: https://issues.apache.org/jira/browse/SPARK-40303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:scala}
> import org.apache.spark.benchmark.Benchmark
> val dir = "/tmp/spark/benchmark"
> val N = 200
> val columns = Range(0, 100).map(i => s"id % $i AS id$i")
> spark.range(N).selectExpr(columns: _*).write.mode("Overwrite").parquet(dir)
> // Seq(1, 2, 5, 10, 15, 25, 40, 60, 100)
> Seq(60).foreach{ cnt =>
>   val selectExps = columns.take(cnt).map(_.split(" ").last).map(c => 
> s"count(distinct $c)")
>   val benchmark = new Benchmark("Benchmark count distinct", N, minNumIters = 
> 1)
>   benchmark.addCase(s"$cnt count distinct with codegen") { _ =>
> withSQLConf(
>   "spark.sql.codegen.wholeStage" -> "true",
>   "spark.sql.codegen.factoryMode" -> "FALLBACK") {
>   spark.read.parquet(dir).selectExpr(selectExps: 
> _*).write.format("noop").mode("Overwrite").save()
> }
>   }
>   benchmark.addCase(s"$cnt count distinct without codegen") { _ =>
> withSQLConf(
>   "spark.sql.codegen.wholeStage" -> "false",
>   "spark.sql.codegen.factoryMode" -> "NO_CODEGEN") {
>   spark.read.parquet(dir).selectExpr(selectExps: 
> _*).write.format("noop").mode("Overwrite").save()
> }
>   }
>   benchmark.run()
> }
> {code}
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark count distinct: Best Time(ms)   Avg Time(ms)   
> Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
> 
> 60 count distinct with codegen   628146 628146
>0  0.0  314072.8   1.0X
> 60 count distinct without codegen147635 147635
>0  0.0   73817.5   4.3X
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-02 Thread Sachit (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599614#comment-17599614
 ] 

Sachit commented on SPARK-40316:


Hi [~hyukjin.kwon] / [~srowen]  , Can you please help  here?

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting below error while upgrading to Spark3
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-02 Thread Sachit (Jira)
Sachit created SPARK-40316:
--

 Summary: Upgrading to Spark 3 is giving NullPointerException
 Key: SPARK-40316
 URL: https://issues.apache.org/jira/browse/SPARK-40316
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.2
Reporter: Sachit


Getting below error while upgrading to Spark3

 

java.lang.RuntimeException: Error while decoding: 
java.lang.NullPointerException: Null value appeared in non-nullable field:
- array element class: "scala.Long"
- root class: "scala.collection.Seq"
If the schema is inferred from a Scala tuple/case class, or a Java bean, please 
try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer 
instead of int/scala.Int).
mapobjects(lambdavariable(MapObject, LongType, true, -1), 
assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
array, true], Some(interface scala.collection.Seq))
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
    at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
    at 
org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
    at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
 Source)
    at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
    at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
    at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40251) Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2

2022-09-02 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40251.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37700
[https://github.com/apache/spark/pull/37700]

> Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2
> --
>
> Key: SPARK-40251
> URL: https://issues.apache.org/jira/browse/SPARK-40251
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://github.com/luhenry/netlib/compare/v2.2.1...v3.0.2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40251) Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2

2022-09-02 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40251:


Assignee: BingKun Pan

> Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2
> --
>
> Key: SPARK-40251
> URL: https://issues.apache.org/jira/browse/SPARK-40251
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>
> https://github.com/luhenry/netlib/compare/v2.2.1...v3.0.2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40310) try_sum() should throw the exceptions from its child

2022-09-02 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-40310.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37764
[https://github.com/apache/spark/pull/37764]

> try_sum() should throw the exceptions from its child
> 
>
> Key: SPARK-40310
> URL: https://issues.apache.org/jira/browse/SPARK-40310
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Similar to [https://github.com/apache/spark/pull/37486] and 
> [https://github.com/apache/spark/pull/37663,]  the errors from try_sum()'s 
> child should be shown instead of ignored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40228) Don't simplify multiLike if child is not attribute

2022-09-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-40228:
---

Assignee: Yuming Wang

> Don't simplify multiLike if child is not attribute
> --
>
> Key: SPARK-40228
> URL: https://issues.apache.org/jira/browse/SPARK-40228
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> {code:scala}
> sql("create table t1(name string) using parquet")
> sql("select * from t1 where substr(name, 1, 5) like any('%a', 'b%', 
> '%c%')").explain(true)
> {code}
> {noformat}
> == Physical Plan ==
> *(1) Filter ((EndsWith(substr(name#0, 1, 5), a) OR StartsWith(substr(name#0, 
> 1, 5), b)) OR Contains(substr(name#0, 1, 5), c))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[name#0] Batched: true, DataFilters: 
> [((EndsWith(substr(name#0, 1, 5), a) OR StartsWith(substr(name#0, 1, 5), b)) 
> OR Contains(substr(n..., Format: Parquet, PartitionFilters: [], 
> PushedFilters: [], ReadSchema: struct
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40228) Don't simplify multiLike if child is not attribute

2022-09-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-40228.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37672
[https://github.com/apache/spark/pull/37672]

> Don't simplify multiLike if child is not attribute
> --
>
> Key: SPARK-40228
> URL: https://issues.apache.org/jira/browse/SPARK-40228
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:scala}
> sql("create table t1(name string) using parquet")
> sql("select * from t1 where substr(name, 1, 5) like any('%a', 'b%', 
> '%c%')").explain(true)
> {code}
> {noformat}
> == Physical Plan ==
> *(1) Filter ((EndsWith(substr(name#0, 1, 5), a) OR StartsWith(substr(name#0, 
> 1, 5), b)) OR Contains(substr(name#0, 1, 5), c))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[name#0] Batched: true, DataFilters: 
> [((EndsWith(substr(name#0, 1, 5), a) OR StartsWith(substr(name#0, 1, 5), b)) 
> OR Contains(substr(n..., Format: Parquet, PartitionFilters: [], 
> PushedFilters: [], ReadSchema: struct
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40098) Format error messages in the Thrift Server

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599503#comment-17599503
 ] 

Apache Spark commented on SPARK-40098:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37773

> Format error messages in the Thrift Server
> --
>
> Key: SPARK-40098
> URL: https://issues.apache.org/jira/browse/SPARK-40098
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> # Introduce a config to control the format of error messages: plain text and 
> JSON
> # Modify the Thrift Server to output errors from Spark SQL according to the 
> config



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40098) Format error messages in the Thrift Server

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599504#comment-17599504
 ] 

Apache Spark commented on SPARK-40098:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37773

> Format error messages in the Thrift Server
> --
>
> Key: SPARK-40098
> URL: https://issues.apache.org/jira/browse/SPARK-40098
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> # Introduce a config to control the format of error messages: plain text and 
> JSON
> # Modify the Thrift Server to output errors from Spark SQL according to the 
> config



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599501#comment-17599501
 ] 

Apache Spark commented on SPARK-40163:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37772

> [SPARK][SQL] feat: SparkSession.confing(Map)
> 
>
> Key: SPARK-40163
> URL: https://issues.apache.org/jira/browse/SPARK-40163
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: seunggabi
>Assignee: seunggabi
>Priority: Trivial
> Fix For: 3.4.0
>
>
> [https://github.com/apache/spark/pull/37478] 
> - as-is
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
> val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
> var b = builder
> map.keys.forEach {
> val k = it
> val v = map[k]
> b = when (v) {
> is Long -> b.config(k, v)
> is String -> b.config(k, v)
> is Double -> b.config(k, v)
> is Boolean -> b.config(k, v)
> else -> b
> }
> }
> return b
> }
> } {code}
> - to-be
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
> val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
> return builder.config(map)
> }
> } {code}
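
A hedged sketch of what the Scala-side call could look like once a Map overload 
exists; the exact signature is an assumption based on the linked pull request, 
not a confirmed API.

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical usage: a single config(Map) call instead of looping over entries.
val spark = SparkSession.builder()
  .master("local[*]")
  .config(Map(
    "spark.sql.shuffle.partitions" -> "8",
    "spark.ui.enabled" -> "false"))
  .getOrCreate()
{code}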



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599500#comment-17599500
 ] 

Apache Spark commented on SPARK-40163:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37772

> [SPARK][SQL] feat: SparkSession.confing(Map)
> 
>
> Key: SPARK-40163
> URL: https://issues.apache.org/jira/browse/SPARK-40163
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: seunggabi
>Assignee: seunggabi
>Priority: Trivial
> Fix For: 3.4.0
>
>
> [https://github.com/apache/spark/pull/37478] 
> - as-is
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
> val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
> var b = builder
> map.keys.forEach {
> val k = it
> val v = map[k]
> b = when (v) {
> is Long -> b.config(k, v)
> is String -> b.config(k, v)
> is Double -> b.config(k, v)
> is Boolean -> b.config(k, v)
> else -> b
> }
> }
> return b
> }
> } {code}
> - to-be
> {code:java}
> private fun config(builder: SparkSession.Builder): SparkSession.Builder {
> val map = YamlUtils.read(this::class.java, "spark", Extension.YAML)
> return builder.config(map)
> }
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40315) Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599475#comment-17599475
 ] 

Apache Spark commented on SPARK-40315:
--

User 'c27kwan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37771

> Non-deterministic hashCode() calculations for ArrayBasedMapData on equal 
> objects
> 
>
> Key: SPARK-40315
> URL: https://issues.apache.org/jira/browse/SPARK-40315
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Carmen Kwan
>Priority: Major
>
> There is no explicit `hashCode()` function override for the 
> `ArrayBasedMapData` LogicalPlan. As a result, the `hashCode()` computed for 
> `ArrayBasedMapData` can be different for two equal objects (objects with 
> equal keys and values).
> This error is non-deterministic and hard to reproduce, as we don't control 
> the default `hashCode()` function.
> We should override the `hashCode` function so that it works exactly as we 
> expect. We should also have an explicit `equals()` function for consistency 
> with how `Literals` check for equality of `ArrayBasedMapData`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40315) Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40315:


Assignee: (was: Apache Spark)

> Non-deterministic hashCode() calculations for ArrayBasedMapData on equal 
> objects
> 
>
> Key: SPARK-40315
> URL: https://issues.apache.org/jira/browse/SPARK-40315
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Carmen Kwan
>Priority: Major
>
> There is no explicit `hashCode()` function override for the 
> `ArrayBasedMapData` LogicalPlan. As a result, the `hashCode()` computed for 
> `ArrayBasedMapData` can be different for two equal objects (objects with 
> equal keys and values).
> This error is non-deterministic and hard to reproduce, as we don't control 
> the default `hashCode()` function.
> We should override the `hashCode` function so that it works exactly as we 
> expect. We should also have an explicit `equals()` function for consistency 
> with how `Literals` check for equality of `ArrayBasedMapData`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40315) Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40315:


Assignee: Apache Spark

> Non-deterministic hashCode() calculations for ArrayBasedMapData on equal 
> objects
> 
>
> Key: SPARK-40315
> URL: https://issues.apache.org/jira/browse/SPARK-40315
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Carmen Kwan
>Assignee: Apache Spark
>Priority: Major
>
> There is no explicit `hashCode()` function override for the 
> `ArrayBasedMapData` LogicalPlan. As a result, the `hashCode()` computed for 
> `ArrayBasedMapData` can be different for two equal objects (objects with 
> equal keys and values).
> This error is non-deterministic and hard to reproduce, as we don't control 
> the default `hashCode()` function.
> We should override the `hashCode` function so that it works exactly as we 
> expect. We should also have an explicit `equals()` function for consistency 
> with how `Literals` check for equality of `ArrayBasedMapData`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40315) Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599474#comment-17599474
 ] 

Apache Spark commented on SPARK-40315:
--

User 'c27kwan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37771

> Non-deterministic hashCode() calculations for ArrayBasedMapData on equal 
> objects
> 
>
> Key: SPARK-40315
> URL: https://issues.apache.org/jira/browse/SPARK-40315
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Carmen Kwan
>Priority: Major
>
> There is no explicit `hashCode()` function override for the 
> `ArrayBasedMapData` LogicalPlan. As a result, the `hashCode()` computed for 
> `ArrayBasedMapData` can be different for two equal objects (objects with 
> equal keys and values).
> This error is non-deterministic and hard to reproduce, as we don't control 
> the default `hashCode()` function.
> We should override the `hashCode` function so that it works exactly as we 
> expect. We should also have an explicit `equals()` function for consistency 
> with how `Literals` check for equality of `ArrayBasedMapData`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40315) Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects

2022-09-02 Thread Carmen Kwan (Jira)
Carmen Kwan created SPARK-40315:
---

 Summary: Non-deterministic hashCode() calculations for 
ArrayBasedMapData on equal objects
 Key: SPARK-40315
 URL: https://issues.apache.org/jira/browse/SPARK-40315
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.2
Reporter: Carmen Kwan


There is no explicit `hashCode()` function override for the `ArrayBasedMapData` 
LogicalPlan. As a result, the `hashCode()` computed for `ArrayBasedMapData` can 
be different for two equal objects (objects with equal keys and values).

This error is non-deterministic and hard to reproduce, as we don't control the 
default `hashCode()` function.

We should override the `hashCode` function so that it works exactly as we 
expect. We should also have an explicit `equals()` function for consistency 
with how `Literals` check for equality of `ArrayBasedMapData`.
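
As a minimal, self-contained sketch of the contract being asked for (this uses 
a simplified stand-in class, not the actual ArrayBasedMapData patch): two 
instances holding equal keys and values must compare equal and hash identically.

{code:scala}
// Hedged stand-in for ArrayBasedMapData: structural equals/hashCode over the
// key and value sequences, instead of the default identity-based versions.
final class SimpleMapData(val keyArray: Seq[Any], val valueArray: Seq[Any]) {
  override def equals(o: Any): Boolean = o match {
    case other: SimpleMapData =>
      keyArray == other.keyArray && valueArray == other.valueArray
    case _ => false
  }
  override def hashCode(): Int = 31 * keyArray.hashCode + valueArray.hashCode
}

// Equal contents now guarantee equal hash codes, which the default
// implementation does not.
val a = new SimpleMapData(Seq(1, 2), Seq("x", "y"))
val b = new SimpleMapData(Seq(1, 2), Seq("x", "y"))
assert(a == b && a.hashCode == b.hashCode)
{code}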



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40314) Add inline Scala and Python bindings

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599438#comment-17599438
 ] 

Apache Spark commented on SPARK-40314:
--

User 'Kimahriman' has created a pull request for this issue:
https://github.com/apache/spark/pull/37770

> Add inline Scala and Python bindings
> 
>
> Key: SPARK-40314
> URL: https://issues.apache.org/jira/browse/SPARK-40314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Adam Binford
>Priority: Major
>
> Add bindings for inline and inline_outer to Scala and Python function APIs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40314) Add inline Scala and Python bindings

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40314:


Assignee: Apache Spark

> Add inline Scala and Python bindings
> 
>
> Key: SPARK-40314
> URL: https://issues.apache.org/jira/browse/SPARK-40314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Adam Binford
>Assignee: Apache Spark
>Priority: Major
>
> Add bindings for inline and inline_outer to Scala and Python function APIs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40314) Add inline Scala and Python bindings

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599434#comment-17599434
 ] 

Apache Spark commented on SPARK-40314:
--

User 'Kimahriman' has created a pull request for this issue:
https://github.com/apache/spark/pull/37770

> Add inline Scala and Python bindings
> 
>
> Key: SPARK-40314
> URL: https://issues.apache.org/jira/browse/SPARK-40314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Adam Binford
>Priority: Major
>
> Add bindings for inline and inline_outer to Scala and Python function APIs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40314) Add inline Scala and Python bindings

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40314:


Assignee: (was: Apache Spark)

> Add inline Scala and Python bindings
> 
>
> Key: SPARK-40314
> URL: https://issues.apache.org/jira/browse/SPARK-40314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Adam Binford
>Priority: Major
>
> Add bindings for inline and inline_outer to Scala and Python function APIs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40314) Add inline Scala and Python bindings

2022-09-02 Thread Adam Binford (Jira)
Adam Binford created SPARK-40314:


 Summary: Add inline Scala and Python bindings
 Key: SPARK-40314
 URL: https://issues.apache.org/jira/browse/SPARK-40314
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Adam Binford


Add bindings for inline and inline_outer to Scala and Python function APIs
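
For context, a hedged sketch of how inline is reachable today through expr(), 
and the kind of direct binding the ticket asks for; it assumes an existing 
SparkSession named spark, and the proposed signature is an assumption rather 
than the merged API.

{code:scala}
import org.apache.spark.sql.functions._

// Assumes a SparkSession named `spark` is already in scope (e.g. spark-shell).
import spark.implicits._

// An array-of-structs column, which is what inline explodes into rows/columns.
val df = Seq(Seq((1, "a"), (2, "b"))).toDF("arr")

// Today: only reachable through a SQL expression string.
df.select(expr("inline(arr)")).show()

// After the ticket (hypothetical signature): a first-class binding such as
//   def inline(e: Column): Column
// would allow df.select(inline($"arr")) directly.
{code}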



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40210) Fix math atan2, hypot, pow and pmod float argument call

2022-09-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40210.
---
Target Version/s: 3.4.0
  Resolution: Resolved

> Fix math atan2, hypot, pow and pmod float argument call
> ---
>
> Key: SPARK-40210
> URL: https://issues.apache.org/jira/browse/SPARK-40210
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Minor
>
> PySpark atan2, hypot, pow and pmod functions marked as accepting float type 
> as argument but produce error when called together



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40210) Fix math atan2, hypot, pow and pmod float argument call

2022-09-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40210:
-

Assignee: Khalid Mammadov

> Fix math atan2, hypot, pow and pmod float argument call
> ---
>
> Key: SPARK-40210
> URL: https://issues.apache.org/jira/browse/SPARK-40210
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Minor
>
> PySpark atan2, hypot, pow and pmod functions marked as accepting float type 
> as argument but produce error when called together



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40313) ps.DataFrame(data, index) should support the same anchor

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599359#comment-17599359
 ] 

Apache Spark commented on SPARK-40313:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37768

> ps.DataFrame(data, index) should support the same anchor
> 
>
> Key: SPARK-40313
> URL: https://issues.apache.org/jira/browse/SPARK-40313
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40313) ps.DataFrame(data, index) should support the same anchor

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40313:


Assignee: (was: Apache Spark)

> ps.DataFrame(data, index) should support the same anchor
> 
>
> Key: SPARK-40313
> URL: https://issues.apache.org/jira/browse/SPARK-40313
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40312) Add missing configuration documentation in Spark History Server

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599360#comment-17599360
 ] 

Apache Spark commented on SPARK-40312:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/37769

> Add missing configuration documentation in Spark History Server
> ---
>
> Key: SPARK-40312
> URL: https://issues.apache.org/jira/browse/SPARK-40312
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: dzcxzl
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40313) ps.DataFrame(data, index) should support the same anchor

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599358#comment-17599358
 ] 

Apache Spark commented on SPARK-40313:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37768

> ps.DataFrame(data, index) should support the same anchor
> 
>
> Key: SPARK-40313
> URL: https://issues.apache.org/jira/browse/SPARK-40313
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40312) Add missing configuration documentation in Spark History Server

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40312:


Assignee: (was: Apache Spark)

> Add missing configuration documentation in Spark History Server
> ---
>
> Key: SPARK-40312
> URL: https://issues.apache.org/jira/browse/SPARK-40312
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: dzcxzl
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40312) Add missing configuration documentation in Spark History Server

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40312:


Assignee: Apache Spark

> Add missing configuration documentation in Spark History Server
> ---
>
> Key: SPARK-40312
> URL: https://issues.apache.org/jira/browse/SPARK-40312
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40312) Add missing configuration documentation in Spark History Server

2022-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599361#comment-17599361
 ] 

Apache Spark commented on SPARK-40312:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/37769

> Add missing configuration documentation in Spark History Server
> ---
>
> Key: SPARK-40312
> URL: https://issues.apache.org/jira/browse/SPARK-40312
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: dzcxzl
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40313) ps.DataFrame(data, index) should support the same anchor

2022-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40313:


Assignee: Apache Spark

> ps.DataFrame(data, index) should support the same anchor
> 
>
> Key: SPARK-40313
> URL: https://issues.apache.org/jira/browse/SPARK-40313
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40313) ps.DataFrame(data, index) should support the same anchor

2022-09-02 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40313:
-

 Summary: ps.DataFrame(data, index) should support the same anchor
 Key: SPARK-40313
 URL: https://issues.apache.org/jira/browse/SPARK-40313
 Project: Spark
  Issue Type: Improvement
  Components: ps
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40312) Add missing configuration documentation in Spark History Server

2022-09-02 Thread dzcxzl (Jira)
dzcxzl created SPARK-40312:
--

 Summary: Add missing configuration documentation in Spark History 
Server
 Key: SPARK-40312
 URL: https://issues.apache.org/jira/browse/SPARK-40312
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.3.0
Reporter: dzcxzl






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40303) The performance will be worse after codegen

2022-09-02 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599332#comment-17599332
 ] 

Yang Jie commented on SPARK-40303:
--

Thank you for the explanation, very clear [~rednaxelafx].

I guess this may be related to the number of `doConsume` method parameters. After 
some experiments, I found that once the number of parameters exceeds 50, the 
performance of the case in the Jira description deteriorates significantly.

If we need to change the code, it might help to give `doConsume` a fixed-length 
input, such as a List or Array, but that loses the parameter types, and I am not 
sure whether it is feasible in practice.
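
A minimal Scala sketch of the idea (hypothetical, hand-written method names, not 
the actual generated `doConsume`):

{code:scala}
object DoConsumeSketch {
  // Current shape (conceptually): one JVM parameter per flowing column,
  // so the generated method's signature grows with the number of aggregates.
  def doConsumeManyParams(v0: Long, v1: Long, v2: Long): Long =
    v0 + v1 + v2 // ...imagine 60+ such parameters

  // Fixed-arity alternative: pack all values into a single array.
  // The parameter count no longer grows, but the static types are lost
  // and every access needs a cast/unboxing.
  def doConsumeArray(values: Array[Any]): Long =
    values.map(_.asInstanceOf[Long]).sum
}
{code}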

> The performance will be worse after codegen
> ---
>
> Key: SPARK-40303
> URL: https://issues.apache.org/jira/browse/SPARK-40303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:scala}
> import org.apache.spark.benchmark.Benchmark
> val dir = "/tmp/spark/benchmark"
> val N = 200
> val columns = Range(0, 100).map(i => s"id % $i AS id$i")
> spark.range(N).selectExpr(columns: _*).write.mode("Overwrite").parquet(dir)
> // Seq(1, 2, 5, 10, 15, 25, 40, 60, 100)
> Seq(60).foreach{ cnt =>
>   val selectExps = columns.take(cnt).map(_.split(" ").last).map(c => 
> s"count(distinct $c)")
>   val benchmark = new Benchmark("Benchmark count distinct", N, minNumIters = 
> 1)
>   benchmark.addCase(s"$cnt count distinct with codegen") { _ =>
> withSQLConf(
>   "spark.sql.codegen.wholeStage" -> "true",
>   "spark.sql.codegen.factoryMode" -> "FALLBACK") {
>   spark.read.parquet(dir).selectExpr(selectExps: 
> _*).write.format("noop").mode("Overwrite").save()
> }
>   }
>   benchmark.addCase(s"$cnt count distinct without codegen") { _ =>
> withSQLConf(
>   "spark.sql.codegen.wholeStage" -> "false",
>   "spark.sql.codegen.factoryMode" -> "NO_CODEGEN") {
>   spark.read.parquet(dir).selectExpr(selectExps: 
> _*).write.format("noop").mode("Overwrite").save()
> }
>   }
>   benchmark.run()
> }
> {code}
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark count distinct: Best Time(ms)   Avg Time(ms)   
> Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
> 
> 60 count distinct with codegen   628146 628146
>0  0.0  314072.8   1.0X
> 60 count distinct without codegen147635 147635
>0  0.0   73817.5   4.3X
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40303) The performance will be worse after codegen

2022-09-02 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599314#comment-17599314
 ] 

Yang Jie commented on SPARK-40303:
--

I did some investigation on Linux with Spark in `local[2]` mode; maybe this helps:
 # Upgrading to Java 17 significantly improves performance (about 10x over Java 8: 
from 20 minutes to 1.9 minutes)
 # When staying on Java 8, setting `spark.sql.CodeGenerator.validParamLength = 48` 
also helps (from 20 minutes to 3.1 minutes)
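
For reference, a minimal way to try the second setting (a sketch; the property 
name is quoted from above and I have not re-verified it):

{code:scala}
import org.apache.spark.sql.SparkSession

// Applying the setting mentioned above when building a local session.
val spark = SparkSession.builder()
  .master("local[2]")
  .config("spark.sql.CodeGenerator.validParamLength", "48")
  .getOrCreate()
{code}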

> The performance will be worse after codegen
> ---
>
> Key: SPARK-40303
> URL: https://issues.apache.org/jira/browse/SPARK-40303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:scala}
> import org.apache.spark.benchmark.Benchmark
> val dir = "/tmp/spark/benchmark"
> val N = 200
> val columns = Range(0, 100).map(i => s"id % $i AS id$i")
> spark.range(N).selectExpr(columns: _*).write.mode("Overwrite").parquet(dir)
> // Seq(1, 2, 5, 10, 15, 25, 40, 60, 100)
> Seq(60).foreach{ cnt =>
>   val selectExps = columns.take(cnt).map(_.split(" ").last).map(c => 
> s"count(distinct $c)")
>   val benchmark = new Benchmark("Benchmark count distinct", N, minNumIters = 
> 1)
>   benchmark.addCase(s"$cnt count distinct with codegen") { _ =>
> withSQLConf(
>   "spark.sql.codegen.wholeStage" -> "true",
>   "spark.sql.codegen.factoryMode" -> "FALLBACK") {
>   spark.read.parquet(dir).selectExpr(selectExps: 
> _*).write.format("noop").mode("Overwrite").save()
> }
>   }
>   benchmark.addCase(s"$cnt count distinct without codegen") { _ =>
> withSQLConf(
>   "spark.sql.codegen.wholeStage" -> "false",
>   "spark.sql.codegen.factoryMode" -> "NO_CODEGEN") {
>   spark.read.parquet(dir).selectExpr(selectExps: 
> _*).write.format("noop").mode("Overwrite").save()
> }
>   }
>   benchmark.run()
> }
> {code}
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark count distinct: Best Time(ms)   Avg Time(ms)   
> Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
> 
> 60 count distinct with codegen   628146 628146
>0  0.0  314072.8   1.0X
> 60 count distinct without codegen147635 147635
>0  0.0   73817.5   4.3X
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40303) The performance will be worse after codegen

2022-09-02 Thread Kris Mok (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599302#comment-17599302
 ] 

Kris Mok commented on SPARK-40303:
--

Interesting, thanks for posting your findings, [~LuciferYang]!

In the JDK8 case, the message:
{code}
COMPILE SKIPPED: unsupported calling sequence (not retryable)
{code}
indicates that the compiler deemed this method should not be compiled again on 
any tier, and because this is an OSR compilation (i.e. loop compilation), it 
marks the method as "never try to perform OSR compilation again on any tier". 
On JDK17 this is relaxed a bit and it'll still try tier 1.

Note that OSR compilation and STD compilation (normal compilation) are separate 
things. Marking one as not compilable doesn't affect the other. I haven't 
checked the IR yet, but if I had to guess, the reason why this method is 
recorded as not OSR compilable is that there are too many live local variables 
at the OSR (loop) entry point, beyond what the HotSpot JVM can support.
So only OSR is affected; STD should still be fine.

In general, the tiered compilation system in the HotSpot JVM works as:
- tier 0: interpreter
- tier 1: C1 no profiling (best code quality for C1, same as HotSpot Client 
VM's C1)
- tier 2: C1 basic profiling (lower performance than tier 1, only used when the 
target level is tier 3 but the C1 queue is too long)
- tier 3: C1 full profiling (lower performance than tier 2, for collecting 
profile to perform tier 4 profile-guided optimization)
- tier 4: C2

As such, tiers 2 and 3 are only useful if tier 4 is available. So if a method 
is recorded as "not compilable on tier 4", the only realistic option left is to 
try tier 1.
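
If anyone wants to see these decisions locally, one straightforward way (a 
standard HotSpot flag; the output format differs slightly across JDK versions) 
is to run the benchmark with compilation logging enabled, e.g.:

{noformat}
./bin/spark-shell --driver-java-options "-XX:+PrintCompilation"
{noformat}

The per-method log lines include the compilation tier, and "%" marks OSR 
compilations, so a message like the "COMPILE SKIPPED" one above should show up 
next to the affected generated method.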

> The performance will be worse after codegen
> ---
>
> Key: SPARK-40303
> URL: https://issues.apache.org/jira/browse/SPARK-40303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:scala}
> import org.apache.spark.benchmark.Benchmark
> val dir = "/tmp/spark/benchmark"
> val N = 200
> val columns = Range(0, 100).map(i => s"id % $i AS id$i")
> spark.range(N).selectExpr(columns: _*).write.mode("Overwrite").parquet(dir)
> // Seq(1, 2, 5, 10, 15, 25, 40, 60, 100)
> Seq(60).foreach{ cnt =>
>   val selectExps = columns.take(cnt).map(_.split(" ").last).map(c => 
> s"count(distinct $c)")
>   val benchmark = new Benchmark("Benchmark count distinct", N, minNumIters = 
> 1)
>   benchmark.addCase(s"$cnt count distinct with codegen") { _ =>
> withSQLConf(
>   "spark.sql.codegen.wholeStage" -> "true",
>   "spark.sql.codegen.factoryMode" -> "FALLBACK") {
>   spark.read.parquet(dir).selectExpr(selectExps: 
> _*).write.format("noop").mode("Overwrite").save()
> }
>   }
>   benchmark.addCase(s"$cnt count distinct without codegen") { _ =>
> withSQLConf(
>   "spark.sql.codegen.wholeStage" -> "false",
>   "spark.sql.codegen.factoryMode" -> "NO_CODEGEN") {
>   spark.read.parquet(dir).selectExpr(selectExps: 
> _*).write.format("noop").mode("Overwrite").save()
> }
>   }
>   benchmark.run()
> }
> {code}
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark count distinct: Best Time(ms)   Avg Time(ms)   
> Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
> 
> 60 count distinct with codegen   628146 628146
>0  0.0  314072.8   1.0X
> 60 count distinct without codegen147635 147635
>0  0.0   73817.5   4.3X
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org