from:"Thomas Graves \(Jira\)"

[jira] [Updated] (SPARK-48608) Spark 3.5: fails to build with value defaultValueNotConstantError is not a member of object org.apache.spark.sql.errors.QueryCompilationErrors

2024-06-12 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-48608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-48608:
--
Priority: Blocker  (was: Major)

> Spark 3.5: fails to build with value defaultValueNotConstantError is not a 
> member of object org.apache.spark.sql.errors.QueryCompilationErrors 
> ---
>
> Key: SPARK-48608
> URL: https://issues.apache.org/jira/browse/SPARK-48608
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.2
>Reporter: Thomas Graves
>Priority: Blocker
>
> PR [https://github.com/apache/spark/pull/46594] seems to have broken the 
> Spark 3.5 build.
> [ERROR] [Error] 
> ...sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:299:
>  value defaultValueNotConstantError is not a member of object 
> org.apache.spark.sql.errors.QueryCompilationErrors
> I don't see that definition defined on the 3.5 branch - 
> [https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala]
> I see it defined on master by 
> https://issues.apache.org/jira/browse/SPARK-46905 which only went into 4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-48608) Spark 3.5: fails to build with value defaultValueNotConstantError is not a member of object org.apache.spark.sql.errors.QueryCompilationErrors

2024-06-12 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-48608:
-

 Summary: Spark 3.5: fails to build with value 
defaultValueNotConstantError is not a member of object 
org.apache.spark.sql.errors.QueryCompilationErrors 
 Key: SPARK-48608
 URL: https://issues.apache.org/jira/browse/SPARK-48608
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.2
Reporter: Thomas Graves


PR [https://github.com/apache/spark/pull/46594] seems to have broken the Spark 
3.5 build.

[ERROR] [Error] 
...sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:299:
 value defaultValueNotConstantError is not a member of object 
org.apache.spark.sql.errors.QueryCompilationErrors

I don't see that definition defined on the 3.5 branch - 
[https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala]

I see it defined on master by https://issues.apache.org/jira/browse/SPARK-46905 
which only went into 4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-47398) AQE doesn't allow for extension of InMemoryTableScanExec

2024-03-21 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-47398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-47398.
---
Fix Version/s: 4.0.0
   3.5.2
 Assignee: Raza Jafri
   Resolution: Fixed

> AQE doesn't allow for extension of InMemoryTableScanExec
> 
>
> Key: SPARK-47398
> URL: https://issues.apache.org/jira/browse/SPARK-47398
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Raza Jafri
>Assignee: Raza Jafri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> As part of SPARK-42101, we added support to AQE for handling 
> InMemoryTableScanExec. 
> This change directly references `InMemoryTableScanExec` which limits users 
> from extending the caching functionality that was added as part of 
> SPARK-32274 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-47458) Incorrect to calculate the concurrent task number

2024-03-19 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-47458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-47458.
---
Fix Version/s: 4.0.0
 Assignee: Bobby Wang
   Resolution: Fixed

> Incorrect to calculate the concurrent task number
> -
>
> Key: SPARK-47458
> URL: https://issues.apache.org/jira/browse/SPARK-47458
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Bobby Wang
>Assignee: Bobby Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The below test case failed,
>  
> {code:java}
> test("problem of calculating the maximum concurrent task") {
>   withTempDir { dir =>
> val discoveryScript = createTempScriptWithExpectedOutput(
>   dir, "gpuDiscoveryScript", """{"name": "gpu","addresses":["0", "1", 
> "2", "3"]}""")
> val conf = new SparkConf()
>   // Setup a local cluster which would only has one executor with 2 CPUs 
> and 1 GPU.
>   .setMaster("local-cluster[1, 6, 1024]")
>   .setAppName("test-cluster")
>   .set(WORKER_GPU_ID.amountConf, "4")
>   .set(WORKER_GPU_ID.discoveryScriptConf, discoveryScript)
>   .set(EXECUTOR_GPU_ID.amountConf, "4")
>   .set(TASK_GPU_ID.amountConf, "2")
>   // disable barrier stage retry to fail the application as soon as 
> possible
>   .set(BARRIER_MAX_CONCURRENT_TASKS_CHECK_MAX_FAILURES, 1)
> sc = new SparkContext(conf)
> TestUtils.waitUntilExecutorsUp(sc, 1, 6)
> // Setup a barrier stage which contains 2 tasks and each task requires 1 
> CPU and 1 GPU.
> // Therefore, the total resources requirement (2 CPUs and 2 GPUs) of this 
> barrier stage
> // can not be satisfied since the cluster only has 2 CPUs and 1 GPU in 
> total.
> assert(sc.parallelize(Range(1, 10), 2)
>   .barrier()
>   .mapPartitions { iter => iter }
>   .collect() sameElements Range(1, 10).toArray[Int])
>   }
> } {code}
> The error log
>  
>  
> [SPARK-24819]: Barrier execution mode does not allow run a barrier stage that 
> requires more slots than the total number of slots in the cluster currently. 
> Please init a new cluster with more resources(e.g. CPU, GPU) or repartition 
> the input RDD(s) to reduce the number of slots required to run this barrier 
> stage.
> org.apache.spark.scheduler.BarrierJobSlotsNumberCheckFailed: [SPARK-24819]: 
> Barrier execution mode does not allow run a barrier stage that requires more 
> slots than the total number of slots in the cluster currently. Please init a 
> new cluster with more resources(e.g. CPU, GPU) or repartition the input 
> RDD(s) to reduce the number of slots required to run this barrier stage.
> at 
> org.apache.spark.errors.SparkCoreErrors$.numPartitionsGreaterThanMaxNumConcurrentTasksError(SparkCoreErrors.scala:241)
> at 
> org.apache.spark.scheduler.DAGScheduler.checkBarrierStageWithNumSlots(DAGScheduler.scala:576)
> at 
> org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:654)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1321)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3055)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3046)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3035)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-47208) Allow overriding base overhead memory

2024-03-14 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-47208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-47208:
-

Assignee: Joao Correia

> Allow overriding base overhead memory
> -
>
> Key: SPARK-47208
> URL: https://issues.apache.org/jira/browse/SPARK-47208
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Core, YARN
>Affects Versions: 3.5.1
>Reporter: Joao Correia
>Assignee: Joao Correia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We can already select the desired overhead memory directly via the 
> _'spark.driver/executor.memoryOverhead'_ flags, however, if that flag is not 
> present the overhead memory calculation goes as follows:
> {code:java}
> overhead_memory = Max(384, 'spark.driver/executor.memory' * 
> 'spark.driver/executor.memoryOverheadFactor')
> where the 'memoryOverheadFactor' flag defaults to 0.1{code}
> There are certain times where being able to override the 384Mb minimum 
> directly can be beneficial. We may have a scenario where a lot of off-heap 
> operations are performed (ex: using package managers/native 
> compression/decompression) where we don't have a need for a large JVM heap 
> but we may still need a signficant amount of memory in the spark node. 
> Using the '{_}memoryOverheadFactor{_}' flag may not prove appropriate. Since 
> we may not want the overhead allocation to directly scale with JVM memory, as 
> a cost saving/resource limitation problem.
> As such, I propose the addition of a 
> 'spark.driver/executor.minMemoryOverhead' flag, which can be used to override 
> the 384Mib value used in the overhead calculation.
> The memory overhead calculation will now be :
> {code:java}
> min_memory = 
> sparkConf.get('spark.driver/executor.minMemoryOverhead').getOrElse(384)
> overhead_memory = Max(min_memory, 'spark.driver/executor.memory' * 
> 'spark.driver/executor.memoryOverheadFactor'){code}
> PR: https://github.com/apache/spark/pull/45240  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-47208) Allow overriding base overhead memory

2024-03-14 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-47208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-47208.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Allow overriding base overhead memory
> -
>
> Key: SPARK-47208
> URL: https://issues.apache.org/jira/browse/SPARK-47208
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Core, YARN
>Affects Versions: 3.5.1
>Reporter: Joao Correia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We can already select the desired overhead memory directly via the 
> _'spark.driver/executor.memoryOverhead'_ flags, however, if that flag is not 
> present the overhead memory calculation goes as follows:
> {code:java}
> overhead_memory = Max(384, 'spark.driver/executor.memory' * 
> 'spark.driver/executor.memoryOverheadFactor')
> where the 'memoryOverheadFactor' flag defaults to 0.1{code}
> There are certain times where being able to override the 384Mb minimum 
> directly can be beneficial. We may have a scenario where a lot of off-heap 
> operations are performed (ex: using package managers/native 
> compression/decompression) where we don't have a need for a large JVM heap 
> but we may still need a signficant amount of memory in the spark node. 
> Using the '{_}memoryOverheadFactor{_}' flag may not prove appropriate. Since 
> we may not want the overhead allocation to directly scale with JVM memory, as 
> a cost saving/resource limitation problem.
> As such, I propose the addition of a 
> 'spark.driver/executor.minMemoryOverhead' flag, which can be used to override 
> the 384Mib value used in the overhead calculation.
> The memory overhead calculation will now be :
> {code:java}
> min_memory = 
> sparkConf.get('spark.driver/executor.minMemoryOverhead').getOrElse(384)
> overhead_memory = Max(min_memory, 'spark.driver/executor.memory' * 
> 'spark.driver/executor.memoryOverheadFactor'){code}
> PR: https://github.com/apache/spark/pull/45240  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-45527) Task fraction resource request is not expected

2024-02-27 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821279#comment-17821279
 ] 

Thomas Graves commented on SPARK-45527:
---

Note that this is related to SPARK-39853 which was supposed to implement stage 
level scheduling with dynamic allocation disabled.  That pr did not properly 
handle resources (gpu, fpga, etc)

> Task fraction resource request is not expected
> --
>
> Key: SPARK-45527
> URL: https://issues.apache.org/jira/browse/SPARK-45527
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Assignee: Bobby Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
>  
> {code:java}
> test("SPARK-XXX") {
>   import org.apache.spark.resource.{ResourceProfileBuilder, 
> TaskResourceRequests}
>   withTempDir { dir =>
> val scriptPath = createTempScriptWithExpectedOutput(dir, 
> "gpuDiscoveryScript",
>   """{"name": "gpu","addresses":["0"]}""")
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[1, 12, 1024]")
>   .set("spark.executor.cores", "12")
> conf.set(TASK_GPU_ID.amountConf, "0.08")
> conf.set(WORKER_GPU_ID.amountConf, "1")
> conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
> conf.set(EXECUTOR_GPU_ID.amountConf, "1")
> sc = new SparkContext(conf)
> val rdd = sc.range(0, 100, 1, 4)
> var rdd1 = rdd.repartition(3)
> val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
> val rp = new ResourceProfileBuilder().require(treqs).build
> rdd1 = rdd1.withResources(rp)
> assert(rdd1.collect().size === 100)
>   }
> } {code}
> In the above test, the 3 tasks generated by rdd1 are expected to be executed 
> in sequence as we expect "new TaskResourceRequests().cpus(1).resource("gpu", 
> 1.0)" should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". However, 
> those 3 tasks are run in parallel in fact.
> The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static. 
> In this case, the "gpu.numParts" is initialized with 12 (1/0.08) and won't 
> change even if there's a new task resource request (e.g., resource("gpu", 
> 1.0) in this case). Thus, those 3 tasks are able to be executed in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-45527) Task fraction resource request is not expected

2024-01-04 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-45527.
---
Resolution: Fixed

> Task fraction resource request is not expected
> --
>
> Key: SPARK-45527
> URL: https://issues.apache.org/jira/browse/SPARK-45527
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Assignee: Bobby Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
>  
> {code:java}
> test("SPARK-XXX") {
>   import org.apache.spark.resource.{ResourceProfileBuilder, 
> TaskResourceRequests}
>   withTempDir { dir =>
> val scriptPath = createTempScriptWithExpectedOutput(dir, 
> "gpuDiscoveryScript",
>   """{"name": "gpu","addresses":["0"]}""")
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[1, 12, 1024]")
>   .set("spark.executor.cores", "12")
> conf.set(TASK_GPU_ID.amountConf, "0.08")
> conf.set(WORKER_GPU_ID.amountConf, "1")
> conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
> conf.set(EXECUTOR_GPU_ID.amountConf, "1")
> sc = new SparkContext(conf)
> val rdd = sc.range(0, 100, 1, 4)
> var rdd1 = rdd.repartition(3)
> val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
> val rp = new ResourceProfileBuilder().require(treqs).build
> rdd1 = rdd1.withResources(rp)
> assert(rdd1.collect().size === 100)
>   }
> } {code}
> In the above test, the 3 tasks generated by rdd1 are expected to be executed 
> in sequence as we expect "new TaskResourceRequests().cpus(1).resource("gpu", 
> 1.0)" should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". However, 
> those 3 tasks are run in parallel in fact.
> The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static. 
> In this case, the "gpu.numParts" is initialized with 12 (1/0.08) and won't 
> change even if there's a new task resource request (e.g., resource("gpu", 
> 1.0) in this case). Thus, those 3 tasks are able to be executed in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-45527) Task fraction resource request is not expected

2024-01-04 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-45527:
--
Fix Version/s: 4.0.0

> Task fraction resource request is not expected
> --
>
> Key: SPARK-45527
> URL: https://issues.apache.org/jira/browse/SPARK-45527
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Assignee: Bobby Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
>  
> {code:java}
> test("SPARK-XXX") {
>   import org.apache.spark.resource.{ResourceProfileBuilder, 
> TaskResourceRequests}
>   withTempDir { dir =>
> val scriptPath = createTempScriptWithExpectedOutput(dir, 
> "gpuDiscoveryScript",
>   """{"name": "gpu","addresses":["0"]}""")
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[1, 12, 1024]")
>   .set("spark.executor.cores", "12")
> conf.set(TASK_GPU_ID.amountConf, "0.08")
> conf.set(WORKER_GPU_ID.amountConf, "1")
> conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
> conf.set(EXECUTOR_GPU_ID.amountConf, "1")
> sc = new SparkContext(conf)
> val rdd = sc.range(0, 100, 1, 4)
> var rdd1 = rdd.repartition(3)
> val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
> val rp = new ResourceProfileBuilder().require(treqs).build
> rdd1 = rdd1.withResources(rp)
> assert(rdd1.collect().size === 100)
>   }
> } {code}
> In the above test, the 3 tasks generated by rdd1 are expected to be executed 
> in sequence as we expect "new TaskResourceRequests().cpus(1).resource("gpu", 
> 1.0)" should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". However, 
> those 3 tasks are run in parallel in fact.
> The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static. 
> In this case, the "gpu.numParts" is initialized with 12 (1/0.08) and won't 
> change even if there's a new task resource request (e.g., resource("gpu", 
> 1.0) in this case). Thus, those 3 tasks are able to be executed in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-45527) Task fraction resource request is not expected

2024-01-04 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-45527:
-

Assignee: Bobby Wang

> Task fraction resource request is not expected
> --
>
> Key: SPARK-45527
> URL: https://issues.apache.org/jira/browse/SPARK-45527
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Assignee: Bobby Wang
>Priority: Major
>  Labels: pull-request-available
>
>  
> {code:java}
> test("SPARK-XXX") {
>   import org.apache.spark.resource.{ResourceProfileBuilder, 
> TaskResourceRequests}
>   withTempDir { dir =>
> val scriptPath = createTempScriptWithExpectedOutput(dir, 
> "gpuDiscoveryScript",
>   """{"name": "gpu","addresses":["0"]}""")
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[1, 12, 1024]")
>   .set("spark.executor.cores", "12")
> conf.set(TASK_GPU_ID.amountConf, "0.08")
> conf.set(WORKER_GPU_ID.amountConf, "1")
> conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
> conf.set(EXECUTOR_GPU_ID.amountConf, "1")
> sc = new SparkContext(conf)
> val rdd = sc.range(0, 100, 1, 4)
> var rdd1 = rdd.repartition(3)
> val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
> val rp = new ResourceProfileBuilder().require(treqs).build
> rdd1 = rdd1.withResources(rp)
> assert(rdd1.collect().size === 100)
>   }
> } {code}
> In the above test, the 3 tasks generated by rdd1 are expected to be executed 
> in sequence as we expect "new TaskResourceRequests().cpus(1).resource("gpu", 
> 1.0)" should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". However, 
> those 3 tasks are run in parallel in fact.
> The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static. 
> In this case, the "gpu.numParts" is initialized with 12 (1/0.08) and won't 
> change even if there's a new task resource request (e.g., resource("gpu", 
> 1.0) in this case). Thus, those 3 tasks are able to be executed in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-40129) Decimal multiply can produce the wrong answer because it rounds twice

2023-11-27 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-40129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790107#comment-17790107
 ] 

Thomas Graves commented on SPARK-40129:
---

this looks like a dup of https://issues.apache.org/jira/browse/SPARK-45786

> Decimal multiply can produce the wrong answer because it rounds twice
> -
>
> Key: SPARK-40129
> URL: https://issues.apache.org/jira/browse/SPARK-40129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Priority: Major
>  Labels: pull-request-available
>
> This looks like it has been around for a long time, but I have reproduced it 
> in 3.2.0+
> The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), 
> but I think it can be reproduced with other number combinations, and possibly 
> with divide too.
> {code:java}
> Seq("9173594185998001607642838421.5479932913").toDF.selectExpr("CAST(value as 
> DECIMAL(38,10)) as a").selectExpr("a * CAST(-12 as 
> DECIMAL(38,10))").show(truncate=false)
> {code}
> This produces an answer in Spark of 
> {{-110083130231976019291714061058.575920}} But if I do the calculation in 
> regular java BigDecimal I get {{-110083130231976019291714061058.575919}}
> {code:java}
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.00");
> BigDecimal prod = l.multiply(r);
> BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);
> {code}
> Spark does essentially all of the same operations, but it used Decimal to do 
> it instead of java's BigDecimal directly. Spark, by way of Decimal, will set 
> a MathContext for the multiply operation that has a max precision of 38 and 
> will do half up rounding. That means that the result of the multiply 
> operation in Spark is {{{}-110083130231976019291714061058.57591950{}}}, but 
> for the java BigDecimal code the result is 
> {{{}-110083130231976019291714061058.575919495600{}}}. Then in 
> CheckOverflow for 3.2.0 and 3.3.0 or in just the regular Multiply expression 
> in 3.4.0 the setScale is called (as a part of Decimal.setPrecision). At that 
> point the already rounded number is rounded yet again resulting in what is 
> arguably a wrong answer by Spark.
> I have not fully tested this, but it looks like we could just remove the 
> MathContext entirely in Decimal, or set it to UNLIMITED. All of the decimal 
> operations appear to have their own overflow and rounding anyways. If we want 
> to potentially reduce the total memory usage, we could also set the max 
> precision to 39 and truncate (round down) the result in the math context 
> instead.  That would then let us round the result correctly in setPrecision 
> afterwards.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-45937) Fix documentation of spark.executor.maxNumFailures

2023-11-20 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-45937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-45937.
---
Resolution: Duplicate

> Fix documentation of spark.executor.maxNumFailures
> --
>
> Key: SPARK-45937
> URL: https://issues.apache.org/jira/browse/SPARK-45937
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Thomas Graves
>Priority: Critical
>
> https://issues.apache.org/jira/browse/SPARK-41210 added support for 
> spark.executor.maxNumFailures on Kubernetes, it made this config generic and 
> deprecated the yarn version.  This config isn't documented and defaults are 
> not documented.
>  
> [https://github.com/apache/spark/commit/40872e9a094f8459b0b6f626937ced48a8d98efb]
> \
> It also added {color:#0a3069}spark.executor.failuresValidityInterval.{color}
>  
> {color:#0a3069}Both need to have default values specified for yarn and k8s, 
> it also needs to remove the yarn documentation for equivalent configs 
> spark.yarn.max.executor.failures configuration{color}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-45937) Fix documentation of spark.executor.maxNumFailures

2023-11-15 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-45937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786395#comment-17786395
 ] 

Thomas Graves commented on SPARK-45937:
---

 

@Cheng Pan  Could you fix this as followup?

> Fix documentation of spark.executor.maxNumFailures
> --
>
> Key: SPARK-45937
> URL: https://issues.apache.org/jira/browse/SPARK-45937
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Thomas Graves
>Priority: Critical
>
> https://issues.apache.org/jira/browse/SPARK-41210 added support for 
> spark.executor.maxNumFailures on Kubernetes, it made this config generic and 
> deprecated the yarn version.  This config isn't documented and defaults are 
> not documented.
>  
> [https://github.com/apache/spark/commit/40872e9a094f8459b0b6f626937ced48a8d98efb]
> \
> It also added {color:#0a3069}spark.executor.failuresValidityInterval.{color}
>  
> {color:#0a3069}Both need to have default values specified for yarn and k8s, 
> it also needs to remove the yarn documentation for equivalent configs 
> spark.yarn.max.executor.failures configuration{color}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-45937) Fix documentation of spark.executor.maxNumFailures

2023-11-15 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-45937:
-

 Summary: Fix documentation of spark.executor.maxNumFailures
 Key: SPARK-45937
 URL: https://issues.apache.org/jira/browse/SPARK-45937
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Thomas Graves


https://issues.apache.org/jira/browse/SPARK-41210 added support for 
spark.executor.maxNumFailures on Kubernetes, it made this config generic and 
deprecated the yarn version.  This config isn't documented and defaults are not 
documented.

 

[https://github.com/apache/spark/commit/40872e9a094f8459b0b6f626937ced48a8d98efb]

\

It also added {color:#0a3069}spark.executor.failuresValidityInterval.{color}

 

{color:#0a3069}Both need to have default values specified for yarn and k8s, it 
also needs to remove the yarn documentation for equivalent configs 
spark.yarn.max.executor.failures configuration{color}

 
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-45495) Support stage level task resource profile for k8s cluster when dynamic allocation disabled

2023-10-13 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-45495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-45495.
---
Resolution: Fixed

> Support stage level task resource profile for k8s cluster when dynamic 
> allocation disabled
> --
>
> Key: SPARK-45495
> URL: https://issues.apache.org/jira/browse/SPARK-45495
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Bobby Wang
>Assignee: Bobby Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>
> [https://github.com/apache/spark/pull/37268] has introduced a new feature 
> that supports stage-level schedule task resource profile for standalone 
> cluster when dynamic allocation is disabled. It's really cool feature, 
> especially for ML/DL cases, more details can be found in that PR.
>  
> The problem here is that the feature is only available for standalone and 
> YARN cluster for now, but most users would also expect it can be used for 
> other spark clusters like K8s.
>  
> So I filed this issue to track this task.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-45527) Task fraction resource request is not expected

2023-10-13 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774957#comment-17774957
 ] 

Thomas Graves commented on SPARK-45527:
---

thanks for filing and digging into this. I assume this is only with the 
TaskResourceRequests and using the default ExecutorResourceRequests.  seems a 
bug since that functionality was added.  Either way when we fix should add 
tests similar if we can.

> Task fraction resource request is not expected
> --
>
> Key: SPARK-45527
> URL: https://issues.apache.org/jira/browse/SPARK-45527
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Priority: Major
>
>  
> {code:java}
> test("SPARK-XXX") {
>   import org.apache.spark.resource.{ResourceProfileBuilder, 
> TaskResourceRequests}
>   withTempDir { dir =>
> val scriptPath = createTempScriptWithExpectedOutput(dir, 
> "gpuDiscoveryScript",
>   """{"name": "gpu","addresses":["0"]}""")
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[1, 12, 1024]")
>   .set("spark.executor.cores", "12")
> conf.set(TASK_GPU_ID.amountConf, "0.08")
> conf.set(WORKER_GPU_ID.amountConf, "1")
> conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
> conf.set(EXECUTOR_GPU_ID.amountConf, "1")
> sc = new SparkContext(conf)
> val rdd = sc.range(0, 100, 1, 4)
> var rdd1 = rdd.repartition(3)
> val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
> val rp = new ResourceProfileBuilder().require(treqs).build
> rdd1 = rdd1.withResources(rp)
> assert(rdd1.collect().size === 100)
>   }
> } {code}
> In the above test, the 3 tasks generated by rdd1 are expected to be executed 
> in sequence as we expect "new TaskResourceRequests().cpus(1).resource("gpu", 
> 1.0)" should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". However, 
> those 3 tasks are run in parallel in fact.
> The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static. 
> In this case, the "gpu.numParts" is initialized with 12 (1/0.08) and won't 
> change even if there's a new task resource request (e.g., resource("gpu", 
> 1.0) in this case). Thus, those 3 tasks are able to be executed in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-45250) Support stage level task resource profile for yarn cluster when dynamic allocation disabled

2023-10-05 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-45250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-45250:
--
Fix Version/s: 3.5.1

> Support stage level task resource profile for yarn cluster when dynamic 
> allocation disabled
> ---
>
> Key: SPARK-45250
> URL: https://issues.apache.org/jira/browse/SPARK-45250
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Bobby Wang
>Assignee: Bobby Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>
> [https://github.com/apache/spark/pull/37268] has introduced a new feature 
> that supports stage-level schedule task resource profile for standalone 
> cluster when dynamic allocation is disabled. It's really cool feature, 
> especially for ML/DL cases, more details can be found in that PR.
>  
> The problem here is that the feature is only available for standalone cluster 
> for now, but most users would also expect it can be used for other spark 
> clusters like yarn and k8s.
>  
> So I file this issue to track this task.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-44940) Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-26 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-44940:
--
Fix Version/s: 3.5.0
   (was: 3.5.1)

> Improve performance of JSON parsing when 
> "spark.sql.json.enablePartialResults" is enabled
> -
>
> Key: SPARK-44940
> URL: https://issues.apache.org/jira/browse/SPARK-44940
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 3.4.2, 3.5.0
>
>
> Follow-up on https://issues.apache.org/jira/browse/SPARK-40646.
> I found that JSON parsing is significantly slower due to exception creation 
> in control flow. Also, some fields are not parsed correctly and the exception 
> is thrown in certain cases: 
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to 
> org.apache.spark.sql.catalyst.InternalRow
>   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct(rows.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct$(rows.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:195)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:590)
>   ... 39 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-44940) Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-26 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769338#comment-17769338
 ] 

Thomas Graves commented on SPARK-44940:
---

 I noticed this went into 3.5.0  
([https://github.com/apache/spark/commits/v3.5.0)] so updating the fixed 
versions.

> Improve performance of JSON parsing when 
> "spark.sql.json.enablePartialResults" is enabled
> -
>
> Key: SPARK-44940
> URL: https://issues.apache.org/jira/browse/SPARK-44940
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 3.4.2, 3.5.1
>
>
> Follow-up on https://issues.apache.org/jira/browse/SPARK-40646.
> I found that JSON parsing is significantly slower due to exception creation 
> in control flow. Also, some fields are not parsed correctly and the exception 
> is thrown in certain cases: 
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to 
> org.apache.spark.sql.catalyst.InternalRow
>   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct(rows.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct$(rows.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:195)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:590)
>   ... 39 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-43919) Extract JSON functionality out of Row

2023-09-18 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-43919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766400#comment-17766400
 ] 

Thomas Graves commented on SPARK-43919:
---

 This is missing description, comments, and link to the pr, I don't understand 
how this can be resolved without any of those.  Doing some searching seems: 
[https://github.com/apache/spark/pull/41425] 

[~hvanhovell]  please make sure proper linkage before resolving.

> Extract JSON functionality out of  Row
> --
>
> Key: SPARK-43919
> URL: https://issues.apache.org/jira/browse/SPARK-43919
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-44284) Introduce simpe conf system for sql/api

2023-09-05 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-44284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762102#comment-17762102
 ] 

Thomas Graves commented on SPARK-44284:
---

Can we get a description on this? This seems like a fairly significant change 
for a one line without description here or in the pr.

> Introduce simpe conf system for sql/api
> ---
>
> Key: SPARK-44284
> URL: https://issues.apache.org/jira/browse/SPARK-44284
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>
> Create a simple conf system for classes in sql/api



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-44144) Enable `spark.authenticate` by default in K8s environment

2023-08-25 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-44144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759061#comment-17759061
 ] 

Thomas Graves commented on SPARK-44144:
---

I'm not necessarily against this but it seems odd to do for one resource 
manager and not others, especially like YARN where the same is true that it 
automatically generates secrets.  Its also inconsistent with what we have done 
in the past with essentially auth off by default.  Does this affect performance 
for instance?

> Enable `spark.authenticate` by default in K8s environment
> -
>
> Key: SPARK-44144
> URL: https://issues.apache.org/jira/browse/SPARK-44144
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Apache Spark supports spark.authenticate and spark.authenticate.secret since 
> 1.0.0.
> This issue proposes to set `spark.authenticate=true` simply in K8s 
> environment. There is no other required change because Spark will 
> automatically generate an authentication secret unique to each application 
> for a little improved isolation and security per applications in K8s 
> environment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-44144) Enable `spark.authenticate` by default in K8s environment

2023-08-24 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-44144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758602#comment-17758602
 ] 

Thomas Graves commented on SPARK-44144:
---

Can you add a description on this?  Why do we want this on by default?

> Enable `spark.authenticate` by default in K8s environment
> -
>
> Key: SPARK-44144
> URL: https://issues.apache.org/jira/browse/SPARK-44144
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-44871) Fix PERCENTILE_DISC behaviour

2023-08-18 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-44871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756077#comment-17756077
 ] 

Thomas Graves commented on SPARK-44871:
---

Can you add a description to this please

> Fix PERCENTILE_DISC behaviour
> -
>
> Key: SPARK-44871
> URL: https://issues.apache.org/jira/browse/SPARK-44871
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0, 3.5.0, 4.0.0
>Reporter: Peter Toth
>Priority: Critical
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-44134) Can't set resources (GPU/FPGA) to 0 when they are set to positive value in spark-defaults.conf

2023-06-23 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-44134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-44134:
--
Fix Version/s: 3.4.2
   (was: 3.4.1)

> Can't set resources (GPU/FPGA) to 0 when they are set to positive value in 
> spark-defaults.conf
> --
>
> Key: SPARK-44134
> URL: https://issues.apache.org/jira/browse/SPARK-44134
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.3.3, 3.5.0, 3.4.2
>
>
> With resource aware scheduling, if you specify a default value in the 
> spark-defaults.conf, a user can't override that to set it to 0.
> Meaning spark-defaults.conf has something like:
> {{spark.executor.resource.\{resourceName}.amount=1}}
> {{spark.task.resource.\{resourceName}.amount}} =1
> If the user tries to override when submitting an application with 
> {{{}spark.executor.resource.\{resourceName}.amount{}}}=0 and 
> {{spark.task.resource.\{resourceName}.amount}} =0, it gives the user an error:
>  
> {code:java}
> 23/06/21 09:12:57 ERROR Main: Failed to initialize Spark session.
> org.apache.spark.SparkException: No executor resource configs were not 
> specified for the following task configs: gpu
>         at 
> org.apache.spark.resource.ResourceProfile.calculateTasksAndLimitingResource(ResourceProfile.scala:206)
>         at 
> org.apache.spark.resource.ResourceProfile.$anonfun$limitingResource$1(ResourceProfile.scala:139)
>         at scala.Option.getOrElse(Option.scala:189)
>         at 
> org.apache.spark.resource.ResourceProfile.limitingResource(ResourceProfile.scala:138)
>         at 
> org.apache.spark.resource.ResourceProfileManager.addResourceProfile(ResourceProfileManager.scala:95)
>         at 
> org.apache.spark.resource.ResourceProfileManager.(ResourceProfileManager.scala:49)
>         at org.apache.spark.SparkContext.(SparkContext.scala:455)
>         at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>         at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953){code}
> This used to work, my guess is this may have gotten broken with the stage 
> level scheduling feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-44134) Can't set resources (GPU/FPGA) to 0 when they are set to positive value in spark-defaults.conf

2023-06-22 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-44134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-44134:
--
Description: 
With resource aware scheduling, if you specify a default value in the 
spark-defaults.conf, a user can't override that to set it to 0.

Meaning spark-defaults.conf has something like:
{{spark.executor.resource.\{resourceName}.amount=1}}

{{spark.task.resource.\{resourceName}.amount}} =1

If the user tries to override when submitting an application with 
{{{}spark.executor.resource.\{resourceName}.amount{}}}=0 and 
{{spark.task.resource.\{resourceName}.amount}} =0, it gives the user an error:

 
{code:java}
23/06/21 09:12:57 ERROR Main: Failed to initialize Spark session.
org.apache.spark.SparkException: No executor resource configs were not 
specified for the following task configs: gpu
        at 
org.apache.spark.resource.ResourceProfile.calculateTasksAndLimitingResource(ResourceProfile.scala:206)
        at 
org.apache.spark.resource.ResourceProfile.$anonfun$limitingResource$1(ResourceProfile.scala:139)
        at scala.Option.getOrElse(Option.scala:189)
        at 
org.apache.spark.resource.ResourceProfile.limitingResource(ResourceProfile.scala:138)
        at 
org.apache.spark.resource.ResourceProfileManager.addResourceProfile(ResourceProfileManager.scala:95)
        at 
org.apache.spark.resource.ResourceProfileManager.(ResourceProfileManager.scala:49)
        at org.apache.spark.SparkContext.(SparkContext.scala:455)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
        at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953){code}
This used to work, my guess is this may have gotten broken with the stage level 
scheduling feature.

  was:
With resource aware scheduling, if you specify a default value in the 
spark-defaults.conf, a user can't override that to set it to 0.

Meaning spark-defaults.conf has something like:
{{spark.executor.resource.\{resourceName}.amount=1}}

{{spark.task.resource.\{resourceName}.amount}} =1

{{}}

If the user tries to override when submitting an application with 
{{{}spark.executor.resource.\{resourceName}.amount{}}}=0 and 
{{{}{}}}{{{}spark.task.resource.\{resourceName}.amount{}}}{{ =0, it gives the 
user an error:}}

{{}}
{code:java}
23/06/21 09:12:57 ERROR Main: Failed to initialize Spark session.
org.apache.spark.SparkException: No executor resource configs were not 
specified for the following task configs: gpu
        at 
org.apache.spark.resource.ResourceProfile.calculateTasksAndLimitingResource(ResourceProfile.scala:206)
        at 
org.apache.spark.resource.ResourceProfile.$anonfun$limitingResource$1(ResourceProfile.scala:139)
        at scala.Option.getOrElse(Option.scala:189)
        at 
org.apache.spark.resource.ResourceProfile.limitingResource(ResourceProfile.scala:138)
        at 
org.apache.spark.resource.ResourceProfileManager.addResourceProfile(ResourceProfileManager.scala:95)
        at 
org.apache.spark.resource.ResourceProfileManager.(ResourceProfileManager.scala:49)
        at org.apache.spark.SparkContext.(SparkContext.scala:455)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
        at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953){code}
This used to work, my guess is this may have gotten broken with the stage level 
scheduling feature.


> Can't set resources (GPU/FPGA) to 0 when they are set to positive value in 
> spark-defaults.conf
> --
>
> Key: SPARK-44134
> URL: https://issues.apache.org/jira/browse/SPARK-44134
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>Priority: Major
>
> With resource aware scheduling, if you specify a default value in the 
> spark-defaults.conf, a user can't override that to set it to 0.
> Meaning spark-defaults.conf has something like:
> {{spark.executor.resource.\{resourceName}.amount=1}}
> {{spark.task.resource.\{resourceName}.amount}} =1
> If the user tries to override when submitting an application with 
> {{{}spark.executor.resource.\{resourceName}.amount{}}}=0 and 
> {{spark.task.resource.\{resourceName}.amount}} =0, it gives the user an error:
>  
> {code:java}
> 23/06/21 09:12:57 ERROR Main: Failed to initialize Spark session.
> org.apache.spark.SparkException: No executor resource configs were not 
> specified for the following task configs: gpu
>         at 
> org.apache.spark.resource.ResourceProfile.calculateTasksAndLimitingResource(ResourceProfile.scala:206)
>         at 
> org.apache.spark.resource.ResourceProfile.$anonfun$limitingResource$1(ResourceProfile.scala:139)
>         at scala.Option.getOrElse(Opt

[jira] [Created] (SPARK-44134) Can't set resources (GPU/FPGA) to 0 when they are set to positive value in spark-defaults.conf

2023-06-21 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-44134:
-

 Summary: Can't set resources (GPU/FPGA) to 0 when they are set to 
positive value in spark-defaults.conf
 Key: SPARK-44134
 URL: https://issues.apache.org/jira/browse/SPARK-44134
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Thomas Graves


With resource aware scheduling, if you specify a default value in the 
spark-defaults.conf, a user can't override that to set it to 0.

Meaning spark-defaults.conf has something like:
{{spark.executor.resource.\{resourceName}.amount=1}}

{{spark.task.resource.\{resourceName}.amount}} =1

{{}}

If the user tries to override when submitting an application with 
{{{}spark.executor.resource.\{resourceName}.amount{}}}=0 and 
{{{}{}}}{{{}spark.task.resource.\{resourceName}.amount{}}}{{ =0, it gives the 
user an error:}}

{{}}
{code:java}
23/06/21 09:12:57 ERROR Main: Failed to initialize Spark session.
org.apache.spark.SparkException: No executor resource configs were not 
specified for the following task configs: gpu
        at 
org.apache.spark.resource.ResourceProfile.calculateTasksAndLimitingResource(ResourceProfile.scala:206)
        at 
org.apache.spark.resource.ResourceProfile.$anonfun$limitingResource$1(ResourceProfile.scala:139)
        at scala.Option.getOrElse(Option.scala:189)
        at 
org.apache.spark.resource.ResourceProfile.limitingResource(ResourceProfile.scala:138)
        at 
org.apache.spark.resource.ResourceProfileManager.addResourceProfile(ResourceProfileManager.scala:95)
        at 
org.apache.spark.resource.ResourceProfileManager.(ResourceProfileManager.scala:49)
        at org.apache.spark.SparkContext.(SparkContext.scala:455)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
        at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953){code}
This used to work, my guess is this may have gotten broken with the stage level 
scheduling feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-44134) Can't set resources (GPU/FPGA) to 0 when they are set to positive value in spark-defaults.conf

2023-06-21 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-44134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735746#comment-17735746
 ] 

Thomas Graves commented on SPARK-44134:
---

I'm working on a fix for this

> Can't set resources (GPU/FPGA) to 0 when they are set to positive value in 
> spark-defaults.conf
> --
>
> Key: SPARK-44134
> URL: https://issues.apache.org/jira/browse/SPARK-44134
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>Priority: Major
>
> With resource aware scheduling, if you specify a default value in the 
> spark-defaults.conf, a user can't override that to set it to 0.
> Meaning spark-defaults.conf has something like:
> {{spark.executor.resource.\{resourceName}.amount=1}}
> {{spark.task.resource.\{resourceName}.amount}} =1
> {{}}
> If the user tries to override when submitting an application with 
> {{{}spark.executor.resource.\{resourceName}.amount{}}}=0 and 
> {{{}{}}}{{{}spark.task.resource.\{resourceName}.amount{}}}{{ =0, it gives the 
> user an error:}}
> {{}}
> {code:java}
> 23/06/21 09:12:57 ERROR Main: Failed to initialize Spark session.
> org.apache.spark.SparkException: No executor resource configs were not 
> specified for the following task configs: gpu
>         at 
> org.apache.spark.resource.ResourceProfile.calculateTasksAndLimitingResource(ResourceProfile.scala:206)
>         at 
> org.apache.spark.resource.ResourceProfile.$anonfun$limitingResource$1(ResourceProfile.scala:139)
>         at scala.Option.getOrElse(Option.scala:189)
>         at 
> org.apache.spark.resource.ResourceProfile.limitingResource(ResourceProfile.scala:138)
>         at 
> org.apache.spark.resource.ResourceProfileManager.addResourceProfile(ResourceProfileManager.scala:95)
>         at 
> org.apache.spark.resource.ResourceProfileManager.(ResourceProfileManager.scala:49)
>         at org.apache.spark.SparkContext.(SparkContext.scala:455)
>         at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>         at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953){code}
> This used to work, my guess is this may have gotten broken with the stage 
> level scheduling feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-43510) Spark application hangs when YarnAllocator adds running executors after processing completed containers

2023-06-06 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-43510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-43510.
---
Fix Version/s: 3.4.1
   3.5.0
 Assignee: Manu Zhang
   Resolution: Fixed

> Spark application hangs when YarnAllocator adds running executors after 
> processing completed containers
> ---
>
> Key: SPARK-43510
> URL: https://issues.apache.org/jira/browse/SPARK-43510
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.4.0
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>Priority: Major
> Fix For: 3.4.1, 3.5.0
>
>
> I see application hangs when containers are preempted immediately after 
> allocation as follows.
> {code:java}
> 23/05/14 09:11:33 INFO YarnAllocator: Launching container 
> container_e3812_1684033797982_57865_01_000382 on host 
> hdc42-mcc10-01-0910-4207-015-tess0028.stratus.rno.ebay.com for executor with 
> ID 277 for ResourceProfile Id 0 
> 23/05/14 09:11:33 WARN YarnAllocator: Cannot find executorId for container: 
> container_e3812_1684033797982_57865_01_000382
> 23/05/14 09:11:33 INFO YarnAllocator: Completed container 
> container_e3812_1684033797982_57865_01_000382 (state: COMPLETE, exit status: 
> -102)
> 23/05/14 09:11:33 INFO YarnAllocator: Container 
> container_e3812_1684033797982_57865_01_000382 was preempted.{code}
> Note the warning log where YarnAllocator cannot find executorId for the 
> container when processing completed containers. The only plausible cause is 
> YarnAllocator added the running executor after processing completed 
> containers. The former happens in a separate thread after executor launch.
> YarnAllocator believes there are still running executors, although they are 
> already lost due to preemption. Hence, the application hangs without any 
> running executors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-41660) only propagate metadata columns if they are used

2023-05-26 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726740#comment-17726740
 ] 

Thomas Graves commented on SPARK-41660:
---

it looks like this was backported to 3.3. with 
https://github.com/apache/spark/pull/40889

> only propagate metadata columns if they are used
> 
>
> Key: SPARK-41660
> URL: https://issues.apache.org/jira/browse/SPARK-41660
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.3.3, 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-41660) only propagate metadata columns if they are used

2023-05-26 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-41660:
--
Fix Version/s: 3.3.3

> only propagate metadata columns if they are used
> 
>
> Key: SPARK-41660
> URL: https://issues.apache.org/jira/browse/SPARK-41660
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.3.3, 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-43340) JsonProtocol is not backward compatible

2023-05-02 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-43340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718601#comment-17718601
 ] 

Thomas Graves commented on SPARK-43340:
---

Likely related to SPARK-39489

> JsonProtocol is not backward compatible
> ---
>
> Key: SPARK-43340
> URL: https://issues.apache.org/jira/browse/SPARK-43340
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Ahmed Hussein
>Priority: Major
> Fix For: 3.4.1, 3.5.0
>
>
> Recently I was testing with some 3.0.2 eventlogs.
> The SHS-3.4+ does not interpret failed jobs/ failed SQLs correctly.
> Instead it will list them as "Incomplete/Active" whereas it should be listed 
> as "Failed".
> The problem is due to missing fields in eventlogs generated by previous 
> versions. In this case the eventlog does not have "Stack Trace" field which 
> causes a NPE
>  
>  
>  
> {code:java}
> {"Event":"SparkListenerJobEnd","Job ID":31,"Completion 
> Time":1616171909785,"Job Result":{"Result":"JobFailed","Exception":
> {"Message":"Job aborted"}
> }}
> {code}
>  
>  
> The SHS logfile
>  
>  
> {code:java}
> 23/05/01 21:57:16 INFO FsHistoryProvider: Parsing file:/tmp/nds_q86_fail_test 
> to re-build UI...
> 23/05/01 21:57:17 ERROR ReplayListenerBus: Exception parsing Spark event log: 
> file:/tmp/nds_q86_fail_test
> java.lang.NullPointerException
>     at 
> org.apache.spark.util.JsonProtocol$JsonNodeImplicits.extractElements(JsonProtocol.scala:1589)
>     at 
> org.apache.spark.util.JsonProtocol$.stackTraceFromJson(JsonProtocol.scala:1558)
>     at 
> org.apache.spark.util.JsonProtocol$.exceptionFromJson(JsonProtocol.scala:1569)
>     at 
> org.apache.spark.util.JsonProtocol$.jobResultFromJson(JsonProtocol.scala:1423)
>     at 
> org.apache.spark.util.JsonProtocol$.jobEndFromJson(JsonProtocol.scala:967)
>     at 
> org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:878)
>     at 
> org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:865)
>     at 
> org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:88)
>     at 
> org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:59)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3(FsHistoryProvider.scala:1140)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3$adapted(FsHistoryProvider.scala:1138)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1(FsHistoryProvider.scala:1138)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1$adapted(FsHistoryProvider.scala:1136)
>     at scala.collection.immutable.List.foreach(List.scala:431)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.parseAppEventLogs(FsHistoryProvider.scala:1136)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.rebuildAppStore(FsHistoryProvider.scala:1117)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.createInMemoryStore(FsHistoryProvider.scala:1355)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:345)
>     at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:199)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:134)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>     at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:55)
>     at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:51)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:88)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:100)
>     at 
> o

[jira] [Resolved] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation

2023-03-20 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-41585.
---
   Fix Version/s: 3.5.0
Target Version/s: 3.5.0
Assignee: Luca Canali
  Resolution: Fixed

> The Spark exclude node functionality for YARN should work independently of 
> dynamic allocation
> -
>
> Key: SPARK-41585
> URL: https://issues.apache.org/jira/browse/SPARK-41585
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.3, 3.1.3, 3.2.2, 3.3.1
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
> Fix For: 3.5.0
>
>
> The Spark exclude node functionality for Spark on YARN, introduced in 
> SPARK-26688, allows users to specify a list of node names that are excluded 
> from resource allocation. This is done using the configuration parameter: 
> {{spark.yarn.exclude.nodes}}
> The feature currently works only for executors allocated via dynamic 
> allocation. To use the feature on Spark 3.3.1, for example, one may set the 
> configurations {{{}spark.dynamicAllocation.enabled{}}}=true, 
> spark.dynamicAllocation.minExecutors=0 and spark.executor.instances=0, thus 
> making Spark spawning executors only via dynamic allocation.
> This proposes to document this behavior for the current Spark release and 
> also proposes an improvement of this feature by extending the scope of Spark 
> exclude node functionality for YARN beyond dynamic allocation, which I 
> believe makes it more generally useful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals

2023-02-22 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692408#comment-17692408
 ] 

Thomas Graves commented on SPARK-41793:
---

[~ulysses] [~cloud_fan] [~xinrong] 

We need to decide what we are doing with this for 3.4 before doing any release.

> Incorrect result for window frames defined by a range clause on large 
> decimals 
> ---
>
> Key: SPARK-41793
> URL: https://issues.apache.org/jira/browse/SPARK-41793
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gera Shegalov
>Priority: Blocker
>  Labels: correctness
>
> Context 
> https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two 
> non-empty windows as a result
> {code}
> from pprint import pprint
> data = [
>   ('9223372036854775807', '11342371013783243717493546650944543.47'),
>   ('9223372036854775807', '.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
>   SELECT 
> COUNT(1) OVER (
>   PARTITION BY a 
>   ORDER BY b ASC 
>   RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
> ) AS CNT_1 
>   FROM 
> test_table
>   ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> Spark 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> Row(CNT_1=1), Row(CNT_1=1)]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2023-02-13 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688016#comment-17688016
 ] 

Thomas Graves commented on SPARK-39375:
---

So regarding UDFs, its not clear to me how that is currently being implemented? 
  If I have a python UDF does it require the server to be started like pyspark 
so the python process is already present?  Or is it starting python on the 
side.  It would be nice to have a design as [~xkrogen]  mentioned.

> SPIP: Spark Connect - A client and server interface for Apache Spark
> 
>
> Key: SPARK-39375
> URL: https://issues.apache.org/jira/browse/SPARK-39375
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Critical
>  Labels: SPIP
>
> Please find the full document for discussion here: [Spark Connect 
> SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
>  Below, we have just referenced the introduction.
> h2. What are you trying to do?
> While Spark is used extensively, it was designed nearly a decade ago, which, 
> in the age of serverless computing and ubiquitous programming language use, 
> poses a number of limitations. Most of the limitations stem from the tightly 
> coupled Spark driver architecture and fact that clusters are typically shared 
> across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark 
> driver runs both the client application and scheduler, which results in a 
> heavyweight architecture that requires proximity to the cluster. There is no 
> built-in capability to  remotely connect to a Spark cluster in languages 
> other than SQL and users therefore rely on external solutions such as the 
> inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich 
> developer experience{*}: The current architecture and APIs do not cater for 
> interactive data exploration (as done with Notebooks), or allow for building 
> out rich developer experience common in modern code editors. (3) 
> {*}Stability{*}: with the current shared driver architecture, users causing 
> critical exceptions (e.g. OOM) bring the whole cluster down for all users. 
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs 
> (e.g. first and third-party dependencies in the classpath) does not allow for 
> seamless upgrades between Spark versions (and with that, hinders new feature 
> adoption).
>  
> We propose to overcome these challenges by building on the DataFrame API and 
> the underlying unresolved logical plans. The DataFrame API is widely used and 
> makes it very easy to iteratively express complex logic. We will introduce 
> {_}Spark Connect{_}, a remote option of the DataFrame API that separates the 
> client from the Spark server. With Spark Connect, Spark will become 
> decoupled, allowing for built-in remote connectivity: The decoupled client 
> SDK can be used to run interactive data exploration and connect to the server 
> for DataFrame operations. 
>  
> Spark Connect will benefit Spark developers in different ways: The decoupled 
> architecture will result in improved stability, as clients are separated from 
> the driver. From the Spark Connect client perspective, Spark will be (almost) 
> versionless, and thus enable seamless upgradability, as server APIs can 
> evolve without affecting the client API. The decoupled client-server 
> architecture can be leveraged to build close integrations with local 
> developer tooling. Finally, separating the client process from the Spark 
> server process will improve Spark’s overall security posture by avoiding the 
> tight coupling of the client inside the Spark runtime environment.
>  
> Spark Connect will strengthen Spark’s position as the modern unified engine 
> for large-scale data analytics and expand applicability to use cases and 
> developers we could not reach with the current setup: Spark will become 
> ubiquitously usable as the DataFrame API can be used with (almost) any 
> programming language.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-42374) User-facing documentaiton

2023-02-13 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687978#comment-17687978
 ] 

Thomas Graves commented on SPARK-42374:
---

Just a note that we should make sure to document that there is no built in 
authentication with this, unless that has changed since Design

> User-facing documentaiton
> -
>
> Key: SPARK-42374
> URL: https://issues.apache.org/jira/browse/SPARK-42374
> Project: Spark
>  Issue Type: Documentation
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Haejoon Lee
>Priority: Major
>
> Should provide the user-facing documentation so end users how to use Spark 
> Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals

2023-01-19 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-41793:
--
Labels: correctness  (was: )

> Incorrect result for window frames defined by a range clause on large 
> decimals 
> ---
>
> Key: SPARK-41793
> URL: https://issues.apache.org/jira/browse/SPARK-41793
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gera Shegalov
>Priority: Major
>  Labels: correctness
>
> Context 
> https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two 
> non-empty windows as a result
> {code}
> from pprint import pprint
> data = [
>   ('9223372036854775807', '11342371013783243717493546650944543.47'),
>   ('9223372036854775807', '.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
>   SELECT 
> COUNT(1) OVER (
>   PARTITION BY a 
>   ORDER BY b ASC 
>   RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
> ) AS CNT_1 
>   FROM 
> test_table
>   ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> Spark 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> Row(CNT_1=1), Row(CNT_1=1)]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals

2023-01-19 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-41793:
--
Priority: Blocker  (was: Major)

> Incorrect result for window frames defined by a range clause on large 
> decimals 
> ---
>
> Key: SPARK-41793
> URL: https://issues.apache.org/jira/browse/SPARK-41793
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gera Shegalov
>Priority: Blocker
>  Labels: correctness
>
> Context 
> https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two 
> non-empty windows as a result
> {code}
> from pprint import pprint
> data = [
>   ('9223372036854775807', '11342371013783243717493546650944543.47'),
>   ('9223372036854775807', '.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
>   SELECT 
> COUNT(1) OVER (
>   PARTITION BY a 
>   ORDER BY b ASC 
>   RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
> ) AS CNT_1 
>   FROM 
> test_table
>   ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> Spark 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> Row(CNT_1=1), Row(CNT_1=1)]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals

2023-01-19 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678744#comment-17678744
 ] 

Thomas Graves commented on SPARK-41793:
---

this sounds like a correctness issue - [~cloud_fan] [~ulyssesyou] am I missing 
something here?

> Incorrect result for window frames defined by a range clause on large 
> decimals 
> ---
>
> Key: SPARK-41793
> URL: https://issues.apache.org/jira/browse/SPARK-41793
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gera Shegalov
>Priority: Major
>
> Context 
> https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two 
> non-empty windows as a result
> {code}
> from pprint import pprint
> data = [
>   ('9223372036854775807', '11342371013783243717493546650944543.47'),
>   ('9223372036854775807', '.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
>   SELECT 
> COUNT(1) OVER (
>   PARTITION BY a 
>   ORDER BY b ASC 
>   RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
> ) AS CNT_1 
>   FROM 
> test_table
>   ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> Spark 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> Row(CNT_1=1), Row(CNT_1=1)]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-39601) AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-12-13 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-39601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-39601.
---
Fix Version/s: 3.4.0
 Assignee: Cheng Pan
   Resolution: Fixed

> AllocationFailure should not be treated as exitCausedByApp when driver is 
> shutting down
> ---
>
> Key: SPARK-39601
> URL: https://issues.apache.org/jira/browse/SPARK-39601
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.4.0
>
>
> I observed some Spark Applications successfully completed all jobs but failed 
> during the shutting down phase w/ reason: Max number of executor failures 
> (16) reached, the timeline is
> Driver - Job success, Spark starts shutting down procedure.
> {code:java}
> 2022-06-23 19:50:55 CST AbstractConnector INFO - Stopped 
> Spark@74e9431b{HTTP/1.1, (http/1.1)}
> {0.0.0.0:0}
> 2022-06-23 19:50:55 CST SparkUI INFO - Stopped Spark web UI at 
> http://hadoop2627.xxx.org:28446
> 2022-06-23 19:50:55 CST YarnClusterSchedulerBackend INFO - Shutting down all 
> executors
> {code}
> Driver - A container allocate successful during shutting down phase.
> {code:java}
> 2022-06-23 19:52:21 CST YarnAllocator INFO - Launching container 
> container_e94_1649986670278_7743380_02_25 on host hadoop4388.xxx.org for 
> executor with ID 24 for ResourceProfile Id 0{code}
> Executor - The executor can not connect to driver endpoint because driver 
> already stopped the endpoint.
> {code:java}
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:393)
>   at 
> org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:81)
>   at 
> org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:413)
>   at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
>   at 
> scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>   at scala.collection.immutable.Range.foreach(Range.scala:158)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:411)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>   ... 4 more
> Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: Cannot find 
> endpoint: spark://coarsegrainedschedu...@hadoop2627.xxx.org:21956
>   at 
> org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1(NettyRpcEnv.scala:148)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1$adapted(NettyRpcEnv.scala:144)
>   at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
>   at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>   at org.apache.spark.util.ThreadUtils$$anon$1.execute(ThreadUtils.scala:99)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$4.execute(ExecutionContextImpl.scala:138)
>   at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288){code}
> Driver - YarnAllocator received container launch error message and treat it 
> as `exitCausedByApp`
> {code:java}
> 2022-06-23 19:52:27 CST YarnAllocator

[jira] [Updated] (SPARK-40524) local mode with resource scheduling can hang

2022-09-21 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-40524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-40524:
--
Summary: local mode with resource scheduling can hang  (was: local mode 
with resource scheduling should just fail)

> local mode with resource scheduling can hang
> 
>
> Key: SPARK-40524
> URL: https://issues.apache.org/jira/browse/SPARK-40524
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Thomas Graves
>Priority: Major
>
> If you try to run spark in local mode and request custom resources like 
> GPU's, Spark will hang.  Resource scheduling isn't supported in local mode so 
> just removing the request for resources fixes the issue, but its really 
> confusing to users since it just hangs.
>  
> ie to reproduce:
> spark-sql --conf spark.executor.resource.gpu.amount=1 --conf 
> spark.task.resource.gpu.amount=1
> Run:
> select 1
> result: hangs
> To fix run:
> spark-sql 
>  
> spark-sql> select 1;
> 1
> Time taken: 2.853 seconds, Fetched 1 row(s)
>  
> It would be nice if we just fail to start or threw an exception when using 
> those options in local mode



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-40524) local mode with resource scheduling should just fail

2022-09-21 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-40524:
-

 Summary: local mode with resource scheduling should just fail
 Key: SPARK-40524
 URL: https://issues.apache.org/jira/browse/SPARK-40524
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.0
Reporter: Thomas Graves


If you try to run spark in local mode and request custom resources like GPU's, 
Spark will hang.  Resource scheduling isn't supported in local mode so just 
removing the request for resources fixes the issue, but its really confusing to 
users since it just hangs.

 

ie to reproduce:

spark-sql --conf spark.executor.resource.gpu.amount=1 --conf 
spark.task.resource.gpu.amount=1

Run:

select 1

result: hangs

To fix run:

spark-sql 

 

spark-sql> select 1;
1
Time taken: 2.853 seconds, Fetched 1 row(s)

 

It would be nice if we just fail to start or threw an exception when using 
those options in local mode



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-40490) `YarnShuffleIntegrationSuite` no longer verifies `registeredExecFile` reload after SPARK-17321

2022-09-21 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-40490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-40490.
---
Fix Version/s: 3.4.0
 Assignee: Yang Jie
   Resolution: Fixed

> `YarnShuffleIntegrationSuite` no longer verifies `registeredExecFile`  reload 
> after  SPARK-17321
> 
>
> Key: SPARK-40490
> URL: https://issues.apache.org/jira/browse/SPARK-40490
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests, YARN
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> After SPARK-17321, YarnShuffleService will persist data to local shuffle 
> state db and reload data from  local shuffle state db only when Yarn 
> NodeManager  start with `YarnConfiguration#NM_RECOVERY_ENABLED = true` , but 
> `YarnShuffleIntegrationSuite` not set this config and the default value of 
> the configuration is false,  so `YarnShuffleIntegrationSuite` will neither 
> trigger data persistence to the db nor verify the reload of data
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-40280) Failure to create parquet predicate push down for ints and longs on some valid files

2022-09-08 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-40280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-40280.
---
Fix Version/s: 3.4.0
   3.3.1
   3.2.3
 Assignee: Robert Joseph Evans
   Resolution: Fixed

> Failure to create parquet predicate push down for ints and longs on some 
> valid files
> 
>
> Key: SPARK-40280
> URL: https://issues.apache.org/jira/browse/SPARK-40280
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
>
> The [parquet 
> format|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#signed-integers]
>  specification states that...
> bq. {{{}INT(8, true){}}}, {{{}INT(16, true){}}}, and {{INT(32, true)}} must 
> annotate an {{int32}} primitive type and {{INT(64, true)}} must annotate an 
> {{int64}} primitive type. {{INT(32, true)}} and {{INT(64, true)}} are implied 
> by the {{int32}} and {{int64}} primitive types if no other annotation is 
> present and should be considered optional.
> But the code inside of 
> [ParquetFilters.scala|https://github.com/apache/spark/blob/296fe49ec855ac8c15c080e7bab6d519fe504bd3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L125-L126]
>  requires that for {{int32}} and {{int64}} that there be no annotation. If 
> there is an annotation for those columns and they are a part of a predicate 
> push down, the hard coded types will not match and the corresponding filter 
> ends up being {{None}}.
> This can be a huge performance penalty for a valid parquet file.
> I am happy to provide files that show the issue if needed for testing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`

2022-08-15 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579738#comment-17579738
 ] 

Thomas Graves commented on SPARK-3:
---

Just curious does rocksdb give us some particular benefit - performance or 
compatibility?  Is leveldb not support on apple silicon?  Just curious and 
would be good to record why we add support.

> Add `RocksDBProvider` similar to `LevelDBProvider`
> --
>
> Key: SPARK-3
> URL: https://issues.apache.org/jira/browse/SPARK-3
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and 
> `YarnShuffleService`, a corresponding `RocksDB` implementation should be added



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-08-10 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38910:
--
Fix Version/s: 3.4.0

> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Priority: Minor
> Fix For: 3.4.0
>
>
> {code:java}
>   ShutdownHookManager.addShutdownHook(priority) { () =>
> try {
>   val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
>   val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts
>   if (!finished) {
> // The default state of ApplicationMaster is failed if it is 
> invoked by shut down hook.
> // This behavior is different compared to 1.x version.
> // If user application is exited ahead of time by calling 
> System.exit(N), here mark
> // this application as failed with EXIT_EARLY. For a good 
> shutdown, user shouldn't call
> // System.exit(0) to terminate the application.
> finish(finalStatus,
>   ApplicationMaster.EXIT_EARLY,
>   "Shutdown hook called before final status was reported.")
>   }
>   if (!unregistered) {
> // we only want to unregister if we don't want the RM to retry
> if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
> isLastAttempt) {
>   unregister(finalStatus, finalMsg)
>   cleanupStagingDir(new 
> Path(System.getenv("SPARK_YARN_STAGING_DIR")))
> }
>   }
> } catch {
>   case e: Throwable =>
> logWarning("Ignoring Exception while stopping ApplicationMaster 
> from shutdown hook", e)
> }
>   }{code}
> unregister may throw exception, clean staging dir should before unregister.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-08-10 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-38910.
---
Resolution: Fixed

> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Minor
> Fix For: 3.4.0
>
>
> {code:java}
>   ShutdownHookManager.addShutdownHook(priority) { () =>
> try {
>   val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
>   val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts
>   if (!finished) {
> // The default state of ApplicationMaster is failed if it is 
> invoked by shut down hook.
> // This behavior is different compared to 1.x version.
> // If user application is exited ahead of time by calling 
> System.exit(N), here mark
> // this application as failed with EXIT_EARLY. For a good 
> shutdown, user shouldn't call
> // System.exit(0) to terminate the application.
> finish(finalStatus,
>   ApplicationMaster.EXIT_EARLY,
>   "Shutdown hook called before final status was reported.")
>   }
>   if (!unregistered) {
> // we only want to unregister if we don't want the RM to retry
> if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
> isLastAttempt) {
>   unregister(finalStatus, finalMsg)
>   cleanupStagingDir(new 
> Path(System.getenv("SPARK_YARN_STAGING_DIR")))
> }
>   }
> } catch {
>   case e: Throwable =>
> logWarning("Ignoring Exception while stopping ApplicationMaster 
> from shutdown hook", e)
> }
>   }{code}
> unregister may throw exception, clean staging dir should before unregister.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-08-10 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-38910:
-

Assignee: angerszhu

> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Minor
> Fix For: 3.4.0
>
>
> {code:java}
>   ShutdownHookManager.addShutdownHook(priority) { () =>
> try {
>   val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
>   val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts
>   if (!finished) {
> // The default state of ApplicationMaster is failed if it is 
> invoked by shut down hook.
> // This behavior is different compared to 1.x version.
> // If user application is exited ahead of time by calling 
> System.exit(N), here mark
> // this application as failed with EXIT_EARLY. For a good 
> shutdown, user shouldn't call
> // System.exit(0) to terminate the application.
> finish(finalStatus,
>   ApplicationMaster.EXIT_EARLY,
>   "Shutdown hook called before final status was reported.")
>   }
>   if (!unregistered) {
> // we only want to unregister if we don't want the RM to retry
> if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
> isLastAttempt) {
>   unregister(finalStatus, finalMsg)
>   cleanupStagingDir(new 
> Path(System.getenv("SPARK_YARN_STAGING_DIR")))
> }
>   }
> } catch {
>   case e: Throwable =>
> logWarning("Ignoring Exception while stopping ApplicationMaster 
> from shutdown hook", e)
> }
>   }{code}
> unregister may throw exception, clean staging dir should before unregister.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-39976) NULL check in ArrayIntersect adds extraneous null from first param

2022-08-08 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-39976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-39976:
--
Labels: correctness  (was: )

> NULL check in ArrayIntersect adds extraneous null from first param
> --
>
> Key: SPARK-39976
> URL: https://issues.apache.org/jira/browse/SPARK-39976
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Navin Kumar
>Priority: Major
>  Labels: correctness
>
> This is very likely a regression from SPARK-36829.
> When using {{array_intersect(a, b)}}, if the first parameter contains a 
> {{NULL}} value and the second one does not, an extraneous {{NULL}} is present 
> in the output. This also leads to {{array_intersect(a, b) != 
> array_intersect(b, a)}} which is incorrect as set intersection should be 
> commutative.
> Example using PySpark:
> {code:python}
> >>> a = [1, 2, 3]
> >>> b = [3, None, 5]
> >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
> >>> df.show()
> +-++
> |a|   b|
> +-++
> |[1, 2, 3]|[3, null, 5]|
> +-++
> >>> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |  [3]|
> +-+
> >>> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |[3, null]|
> +-+
> {code}
> Note that in the first case, {{a}} does not contain a {{NULL}}, and the final 
> output is correct: {{[3]}}. In the second case, since {{b}} does contain 
> {{NULL}} and is now the first parameter.
> The same behavior occurs in Scala when writing to Parquet:
> {code:scala}
> scala> val a = Array[java.lang.Integer](1, 2, null, 4)
> a: Array[Integer] = Array(1, 2, null, 4)
> scala> val b = Array[java.lang.Integer](4, 5, 6, 7)
> b: Array[Integer] = Array(4, 5, 6, 7)
> scala> val df = Seq((a, b)).toDF("a","b")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.write.parquet("/tmp/simple.parquet")
> scala> val df = spark.read.parquet("/tmp/simple.parquet")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.show()
> +---++
> |  a|   b|
> +---++
> |[1, 2, null, 4]|[4, 5, 6, 7]|
> +---++
> scala> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |[null, 4]|
> +-+
> scala> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |  [4]|
> +-+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-39976) NULL check in ArrayIntersect adds extraneous null from first param

2022-08-04 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-39976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-39976:
--
Labels:   (was: corr)

> NULL check in ArrayIntersect adds extraneous null from first param
> --
>
> Key: SPARK-39976
> URL: https://issues.apache.org/jira/browse/SPARK-39976
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Navin Kumar
>Priority: Blocker
>
> This is very likely a regression from SPARK-36829.
> When using {{array_intersect(a, b)}}, if the first parameter contains a 
> {{NULL}} value and the second one does not, an extraneous {{NULL}} is present 
> in the output. This also leads to {{array_intersect(a, b) != 
> array_intersect(b, a)}} which is incorrect as set intersection should be 
> commutative.
> Example using PySpark:
> {code:python}
> >>> a = [1, 2, 3]
> >>> b = [3, None, 5]
> >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
> >>> df.show()
> +-++
> |a|   b|
> +-++
> |[1, 2, 3]|[3, null, 5]|
> +-++
> >>> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |  [3]|
> +-+
> >>> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |[3, null]|
> +-+
> {code}
> Note that in the first case, {{a}} does not contain a {{NULL}}, and the final 
> output is correct: {{[3]}}. In the second case, since {{b}} does contain 
> {{NULL}} and is now the first parameter.
> The same behavior occurs in Scala when writing to Parquet:
> {code:scala}
> scala> val a = Array[java.lang.Integer](1, 2, null, 4)
> a: Array[Integer] = Array(1, 2, null, 4)
> scala> val b = Array[java.lang.Integer](4, 5, 6, 7)
> b: Array[Integer] = Array(4, 5, 6, 7)
> scala> val df = Seq((a, b)).toDF("a","b")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.write.parquet("/tmp/simple.parquet")
> scala> val df = spark.read.parquet("/tmp/simple.parquet")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.show()
> +---++
> |  a|   b|
> +---++
> |[1, 2, null, 4]|[4, 5, 6, 7]|
> +---++
> scala> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |[null, 4]|
> +-+
> scala> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |  [4]|
> +-+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-39976) NULL check in ArrayIntersect adds extraneous null from first param

2022-08-04 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-39976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-39976:
--
Priority: Blocker  (was: Major)

> NULL check in ArrayIntersect adds extraneous null from first param
> --
>
> Key: SPARK-39976
> URL: https://issues.apache.org/jira/browse/SPARK-39976
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Navin Kumar
>Priority: Blocker
>
> This is very likely a regression from SPARK-36829.
> When using {{array_intersect(a, b)}}, if the first parameter contains a 
> {{NULL}} value and the second one does not, an extraneous {{NULL}} is present 
> in the output. This also leads to {{array_intersect(a, b) != 
> array_intersect(b, a)}} which is incorrect as set intersection should be 
> commutative.
> Example using PySpark:
> {code:python}
> >>> a = [1, 2, 3]
> >>> b = [3, None, 5]
> >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
> >>> df.show()
> +-++
> |a|   b|
> +-++
> |[1, 2, 3]|[3, null, 5]|
> +-++
> >>> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |  [3]|
> +-+
> >>> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |[3, null]|
> +-+
> {code}
> Note that in the first case, {{a}} does not contain a {{NULL}}, and the final 
> output is correct: {{[3]}}. In the second case, since {{b}} does contain 
> {{NULL}} and is now the first parameter.
> The same behavior occurs in Scala when writing to Parquet:
> {code:scala}
> scala> val a = Array[java.lang.Integer](1, 2, null, 4)
> a: Array[Integer] = Array(1, 2, null, 4)
> scala> val b = Array[java.lang.Integer](4, 5, 6, 7)
> b: Array[Integer] = Array(4, 5, 6, 7)
> scala> val df = Seq((a, b)).toDF("a","b")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.write.parquet("/tmp/simple.parquet")
> scala> val df = spark.read.parquet("/tmp/simple.parquet")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.show()
> +---++
> |  a|   b|
> +---++
> |[1, 2, null, 4]|[4, 5, 6, 7]|
> +---++
> scala> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |[null, 4]|
> +-+
> scala> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |  [4]|
> +-+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-39976) NULL check in ArrayIntersect adds extraneous null from first param

2022-08-04 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-39976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-39976:
--
Labels: corr  (was: )

> NULL check in ArrayIntersect adds extraneous null from first param
> --
>
> Key: SPARK-39976
> URL: https://issues.apache.org/jira/browse/SPARK-39976
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Navin Kumar
>Priority: Blocker
>  Labels: corr
>
> This is very likely a regression from SPARK-36829.
> When using {{array_intersect(a, b)}}, if the first parameter contains a 
> {{NULL}} value and the second one does not, an extraneous {{NULL}} is present 
> in the output. This also leads to {{array_intersect(a, b) != 
> array_intersect(b, a)}} which is incorrect as set intersection should be 
> commutative.
> Example using PySpark:
> {code:python}
> >>> a = [1, 2, 3]
> >>> b = [3, None, 5]
> >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
> >>> df.show()
> +-++
> |a|   b|
> +-++
> |[1, 2, 3]|[3, null, 5]|
> +-++
> >>> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |  [3]|
> +-+
> >>> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |[3, null]|
> +-+
> {code}
> Note that in the first case, {{a}} does not contain a {{NULL}}, and the final 
> output is correct: {{[3]}}. In the second case, since {{b}} does contain 
> {{NULL}} and is now the first parameter.
> The same behavior occurs in Scala when writing to Parquet:
> {code:scala}
> scala> val a = Array[java.lang.Integer](1, 2, null, 4)
> a: Array[Integer] = Array(1, 2, null, 4)
> scala> val b = Array[java.lang.Integer](4, 5, 6, 7)
> b: Array[Integer] = Array(4, 5, 6, 7)
> scala> val df = Seq((a, b)).toDF("a","b")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.write.parquet("/tmp/simple.parquet")
> scala> val df = spark.read.parquet("/tmp/simple.parquet")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.show()
> +---++
> |  a|   b|
> +---++
> |[1, 2, null, 4]|[4, 5, 6, 7]|
> +---++
> scala> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |[null, 4]|
> +-+
> scala> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |  [4]|
> +-+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-39976) NULL check in ArrayIntersect adds extraneous null from first param

2022-08-04 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-39976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575398#comment-17575398
 ] 

Thomas Graves commented on SPARK-39976:
---

[~cloud_fan]  [~angerszhuuu]  who worked on original issue. This sounds like 
correctness to me so we should add label if so.

> NULL check in ArrayIntersect adds extraneous null from first param
> --
>
> Key: SPARK-39976
> URL: https://issues.apache.org/jira/browse/SPARK-39976
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Navin Kumar
>Priority: Major
>
> This is very likely a regression from SPARK-36829.
> When using {{array_intersect(a, b)}}, if the first parameter contains a 
> {{NULL}} value and the second one does not, an extraneous {{NULL}} is present 
> in the output. This also leads to {{array_intersect(a, b) != 
> array_intersect(b, a)}} which is incorrect as set intersection should be 
> commutative.
> Example using PySpark:
> {code:python}
> >>> a = [1, 2, 3]
> >>> b = [3, None, 5]
> >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
> >>> df.show()
> +-++
> |a|   b|
> +-++
> |[1, 2, 3]|[3, null, 5]|
> +-++
> >>> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |  [3]|
> +-+
> >>> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |[3, null]|
> +-+
> {code}
> Note that in the first case, {{a}} does not contain a {{NULL}}, and the final 
> output is correct: {{[3]}}. In the second case, since {{b}} does contain 
> {{NULL}} and is now the first parameter.
> The same behavior occurs in Scala when writing to Parquet:
> {code:scala}
> scala> val a = Array[java.lang.Integer](1, 2, null, 4)
> a: Array[Integer] = Array(1, 2, null, 4)
> scala> val b = Array[java.lang.Integer](4, 5, 6, 7)
> b: Array[Integer] = Array(4, 5, 6, 7)
> scala> val df = Seq((a, b)).toDF("a","b")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.write.parquet("/tmp/simple.parquet")
> scala> val df = spark.read.parquet("/tmp/simple.parquet")
> df: org.apache.spark.sql.DataFrame = [a: array, b: array]
> scala> df.show()
> +---++
> |  a|   b|
> +---++
> |[1, 2, null, 4]|[4, 5, 6, 7]|
> +---++
> scala> df.selectExpr("array_intersect(a,b)").show()
> +-+
> |array_intersect(a, b)|
> +-+
> |[null, 4]|
> +-+
> scala> df.selectExpr("array_intersect(b,a)").show()
> +-+
> |array_intersect(b, a)|
> +-+
> |  [4]|
> +-+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-39491) Hadoop 2.7 build fails due to org.apache.hadoop.yarn.api.records.NodeState.DECOMMISSIONING

2022-06-16 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-39491:
-

 Summary: Hadoop 2.7 build fails due to 
org.apache.hadoop.yarn.api.records.NodeState.DECOMMISSIONING
 Key: SPARK-39491
 URL: https://issues.apache.org/jira/browse/SPARK-39491
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Thomas Graves


trying to build with the hadoop-2 profile, which uses hadoop 2.7 version fails 
due to:

resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala:454:
 value DECOMMISSIONING is not a member of object 
org.apache.hadoop.yarn.api.records.NodeState

 

DECOMMISSIONING was only added in hadoop 2.8.

This was added in https://issues.apache.org/jira/browse/SPARK-30835



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-39107) Silent change in regexp_replace's handling of empty strings

2022-06-16 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-39107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555111#comment-17555111
 ] 

Thomas Graves commented on SPARK-39107:
---

[~srowen]   I think this actually went into 3.1.4,  not 3.1.3, could you 
confirm before I update Fixed versions? 

> Silent change in regexp_replace's handling of empty strings
> ---
>
> Key: SPARK-39107
> URL: https://issues.apache.org/jira/browse/SPARK-39107
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Willi Raschkowski
>Assignee: Lorenzo Martini
>Priority: Major
>  Labels: correctness, release-notes
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>
> Hi, we just upgraded from 3.0.2 to 3.1.2 and noticed a silent behavior change 
> that a) seems incorrect, and b) is undocumented in the [migration 
> guide|https://spark.apache.org/docs/latest/sql-migration-guide.html]:
> {code:title=3.0.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", 
> "")).show
> +---++
> |col|replaced|
> +---++
> |   | |
> +---++
> {code}
> {code:title=3.1.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", 
> "")).show
> +---++
> |col|replaced|
> +---++
> |   ||
> +---++
> {code}
> Note, the regular expression {{^$}} should match the empty string, but 
> doesn't in version 3.1. E.g. this is the Java behavior:
> {code}
> scala> "".replaceAll("^$", "");
> res1: String = 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-39434) Provide runtime error query context when array index is out of bound

2022-06-13 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-39434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-39434:
--
Fix Version/s: 3.4.0

> Provide runtime error query context when array index is out of bound
> 
>
> Key: SPARK-39434
> URL: https://issues.apache.org/jira/browse/SPARK-39434
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-39363) fix spark.kubernetes.memoryOverheadFactor deprecation warning

2022-06-02 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-39363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545479#comment-17545479
 ] 

Thomas Graves commented on SPARK-39363:
---

[~Kimahriman] 

> fix spark.kubernetes.memoryOverheadFactor deprecation warning
> -
>
> Key: SPARK-39363
> URL: https://issues.apache.org/jira/browse/SPARK-39363
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Thomas Graves
>Priority: Major
>
> see [https://github.com/apache/spark/pull/36744] for details.
>  
> It looks like we deprecated {{spark.kubernetes.memoryOverheadFactor, but it 
> has a default value which leads it to printing warnings all the time.}}
> {{}}
> {code:java}
> 22/06/01 23:53:49 WARN SparkConf: The configuration key 
> 'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 
> and may be removed in the future. Please use 
> spark.driver.memoryOverheadFactor and 
> spark.executor.memoryOverheadFactor{code}
> {{}}
> {{We should remove the default value if possible. It should only be used as 
> fallback but we should be able to use the default from }}
> spark.driver.memoryOverheadFactor.
> {{{}{}}}{{{}{}}}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-39363) fix spark.kubernetes.memoryOverheadFactor deprecation warning

2022-06-02 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-39363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-39363:
--
Description: 
see [https://github.com/apache/spark/pull/36744] for details.

 

It looks like we deprecated {{spark.kubernetes.memoryOverheadFactor, but it has 
a default value which leads it to printing warnings all the time.}}

{{}}
{code:java}
22/06/01 23:53:49 WARN SparkConf: The configuration key 
'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 
and may be removed in the future. Please use spark.driver.memoryOverheadFactor 
and spark.executor.memoryOverheadFactor{code}
{{}}

{{We should remove the default value if possible. It should only be used as 
fallback but we should be able to use the default from 
}}spark.driver.memoryOverheadFactor.

 

  was:
see [https://github.com/apache/spark/pull/36744] for details.

 

It looks like we deprecated {{spark.kubernetes.memoryOverheadFactor, but it has 
a default value which leads it to printing warnings all the time.}}

{{}}
{code:java}
22/06/01 23:53:49 WARN SparkConf: The configuration key 
'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 
and may be removed in the future. Please use spark.driver.memoryOverheadFactor 
and spark.executor.memoryOverheadFactor{code}
{{}}

{{We should remove the default value if possible. It should only be used as 
fallback but we should be able to use the default from }}

spark.driver.memoryOverheadFactor.

{{{}{}}}{{{}{}}}


> fix spark.kubernetes.memoryOverheadFactor deprecation warning
> -
>
> Key: SPARK-39363
> URL: https://issues.apache.org/jira/browse/SPARK-39363
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Thomas Graves
>Priority: Major
>
> see [https://github.com/apache/spark/pull/36744] for details.
>  
> It looks like we deprecated {{spark.kubernetes.memoryOverheadFactor, but it 
> has a default value which leads it to printing warnings all the time.}}
> {{}}
> {code:java}
> 22/06/01 23:53:49 WARN SparkConf: The configuration key 
> 'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 
> and may be removed in the future. Please use 
> spark.driver.memoryOverheadFactor and 
> spark.executor.memoryOverheadFactor{code}
> {{}}
> {{We should remove the default value if possible. It should only be used as 
> fallback but we should be able to use the default from 
> }}spark.driver.memoryOverheadFactor.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-39363) fix spark.kubernetes.memoryOverheadFactor deprecation warning

2022-06-02 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-39363:
-

 Summary: fix spark.kubernetes.memoryOverheadFactor deprecation 
warning
 Key: SPARK-39363
 URL: https://issues.apache.org/jira/browse/SPARK-39363
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Thomas Graves


see [https://github.com/apache/spark/pull/36744] for details.

 

It looks like we deprecated {{spark.kubernetes.memoryOverheadFactor, but it has 
a default value which leads it to printing warnings all the time.}}

{{}}
{code:java}
22/06/01 23:53:49 WARN SparkConf: The configuration key 
'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 
and may be removed in the future. Please use spark.driver.memoryOverheadFactor 
and spark.executor.memoryOverheadFactor{code}
{{}}

{{We should remove the default value if possible. It should only be used as 
fallback but we should be able to use the default from }}

spark.driver.memoryOverheadFactor.

{{{}{}}}{{{}{}}}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38955) from_csv can corrupt surrounding lines if a lineSep is in the data

2022-04-20 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38955:
--
Labels:   (was: corr)

> from_csv can corrupt surrounding lines if a lineSep is in the data
> --
>
> Key: SPARK-38955
> URL: https://issues.apache.org/jira/browse/SPARK-38955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Priority: Blocker
>
> I don't know how critical this is. I was doing some general testing to 
> understand {{from_csv}} and found that if I happen to have a {{lineSep}} in 
> the input data and I noticed that the next row appears to be corrupted. 
> {{multiLine}} does not appear to fix it. Because this is data corruption I am 
> inclined to mark this as CRITICAL or BLOCKER, but it is an odd corner case so 
> I m not going to set it myself.
> {code}
> Seq[String]("1,\n2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]())).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |   1,\n2,3,4,5|  {1, null}|
> |6,7,8,9,10|  {null, 8}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}
> {code}
> Seq[String]("1,:2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]("lineSep" -> ":"))).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |1,:2,3,4,5|  {1, null}|
> |6,7,8,9,10|  {null, 8}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}
> {code}
> Seq[String]("1,\n2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]("lineSep" -> ":"))).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |   1,\n2,3,4,5|   {1, \n2}|
> |6,7,8,9,10| {6, 7}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38955) from_csv can corrupt surrounding lines if a lineSep is in the data

2022-04-20 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38955:
--
Priority: Blocker  (was: Major)

> from_csv can corrupt surrounding lines if a lineSep is in the data
> --
>
> Key: SPARK-38955
> URL: https://issues.apache.org/jira/browse/SPARK-38955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Priority: Blocker
>
> I don't know how critical this is. I was doing some general testing to 
> understand {{from_csv}} and found that if I happen to have a {{lineSep}} in 
> the input data and I noticed that the next row appears to be corrupted. 
> {{multiLine}} does not appear to fix it. Because this is data corruption I am 
> inclined to mark this as CRITICAL or BLOCKER, but it is an odd corner case so 
> I m not going to set it myself.
> {code}
> Seq[String]("1,\n2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]())).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |   1,\n2,3,4,5|  {1, null}|
> |6,7,8,9,10|  {null, 8}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}
> {code}
> Seq[String]("1,:2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]("lineSep" -> ":"))).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |1,:2,3,4,5|  {1, null}|
> |6,7,8,9,10|  {null, 8}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}
> {code}
> Seq[String]("1,\n2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]("lineSep" -> ":"))).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |   1,\n2,3,4,5|   {1, \n2}|
> |6,7,8,9,10| {6, 7}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38955) from_csv can corrupt surrounding lines if a lineSep is in the data

2022-04-20 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38955:
--
Labels: corr  (was: )

> from_csv can corrupt surrounding lines if a lineSep is in the data
> --
>
> Key: SPARK-38955
> URL: https://issues.apache.org/jira/browse/SPARK-38955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Priority: Blocker
>  Labels: corr
>
> I don't know how critical this is. I was doing some general testing to 
> understand {{from_csv}} and found that if I happen to have a {{lineSep}} in 
> the input data and I noticed that the next row appears to be corrupted. 
> {{multiLine}} does not appear to fix it. Because this is data corruption I am 
> inclined to mark this as CRITICAL or BLOCKER, but it is an odd corner case so 
> I m not going to set it myself.
> {code}
> Seq[String]("1,\n2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]())).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |   1,\n2,3,4,5|  {1, null}|
> |6,7,8,9,10|  {null, 8}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}
> {code}
> Seq[String]("1,:2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]("lineSep" -> ":"))).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |1,:2,3,4,5|  {1, null}|
> |6,7,8,9,10|  {null, 8}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}
> {code}
> Seq[String]("1,\n2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]("lineSep" -> ":"))).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |   1,\n2,3,4,5|   {1, \n2}|
> |6,7,8,9,10| {6, 7}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-38955) from_csv can corrupt surrounding lines if a lineSep is in the data

2022-04-20 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-38955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524985#comment-17524985
 ] 

Thomas Graves commented on SPARK-38955:
---

the from_csv docs point to the data source options which contain the lineSep so 
it seems like we should update documentation and then like you said don't 
permit it to be specified. since its a corruption seems bad, so marking as 
blocker to atleast get more visibility and input.

> from_csv can corrupt surrounding lines if a lineSep is in the data
> --
>
> Key: SPARK-38955
> URL: https://issues.apache.org/jira/browse/SPARK-38955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Priority: Major
>
> I don't know how critical this is. I was doing some general testing to 
> understand {{from_csv}} and found that if I happen to have a {{lineSep}} in 
> the input data and I noticed that the next row appears to be corrupted. 
> {{multiLine}} does not appear to fix it. Because this is data corruption I am 
> inclined to mark this as CRITICAL or BLOCKER, but it is an odd corner case so 
> I m not going to set it myself.
> {code}
> Seq[String]("1,\n2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]())).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |   1,\n2,3,4,5|  {1, null}|
> |6,7,8,9,10|  {null, 8}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}
> {code}
> Seq[String]("1,:2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]("lineSep" -> ":"))).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |1,:2,3,4,5|  {1, null}|
> |6,7,8,9,10|  {null, 8}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}
> {code}
> Seq[String]("1,\n2,3,4,5","6,7,8,9,10", "11,12,13,14,15", 
> null).toDF.select(col("value"), from_csv(col("value"), 
> StructType(Seq(StructField("a", LongType), StructField("b", StringType))), 
> Map[String,String]("lineSep" -> ":"))).show()
> +--+---+
> | value|from_csv(value)|
> +--+---+
> |   1,\n2,3,4,5|   {1, \n2}|
> |6,7,8,9,10| {6, 7}|
> |11,12,13,14,15|   {11, 12}|
> |  null|   null|
> +--+---+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38677) pyspark hangs in local mode running rdd map operation

2022-03-28 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38677:
--
Description: 
In spark 3.2.1 (spark 3.2.0 doesn't show this issue), pyspark will hang when 
running and RDD map operations and converting to a dataframe.  Code is below to 
reproduce.  

Env:
spark 3.2.1 local mode, just run {{./bin/pyspark --driver-memory G 
--driver-cores }}

{{download dataset from here 
[https://rapidsai-data.s3.us-east-2.amazonaws.com/spark/mortgage.zip]}}
{{just 20 rows could reproduce the issue }}{{head -n 20 
mortgage_eval_merged.csv > mortgage_eval_merged-small.csv}}{{{} but if the 
input dataset is small, such 5 rows, it works well.{}}}}}{}}}run codes 
below:
{code:java}
path = "//mortgage_eval_merged-small.csv"
src_data = sc.textFile(path).map(lambda x:x.split(","))
column_list 
=['c1','c2','c3','c4','c5','c6','c7','c8','c9','c10','c11','c12','c13','c14','c15','c16','c17','c18','c19','c20','c21','c22','c23','c24','c25','c26','c27','c28']
df = spark.createDataFrame(src_data,column_list)
print(df.show(1)){code}

  was:
In spark 3.2.1 (spark 3.2.0 doesn't show this issue), pyspark will hang when 
running and RDD map operations and converting to a dataframe.  Code is below to 
reproduce.  

Env:
spark 3.2.1 local mode, just run {{./bin/pyspark --driver-memory G 
--driver-cores }}

{{download dataset from here 
[https://rapidsai-data.s3.us-east-2.amazonaws.com/spark/mortgage.zip]}}
{{just 20 rows could reproduce the issue }}{{head -n 20 
mortgage_eval_merged.csv > mortgage_eval_merged-small.csv}}{{{} but if the 
input dataset is small, such 5 rows, it works well.{}}}{{{}{}}}run codes 
below:
{code:java}
path = "//mortgage_eval_merged-small.csv" src_data = 
sc.textFile(path).map(lambda x:x.split(",")) column_list = 
['c1','c2','c3','c4','c5','c6','c7','c8','c9','c10','c11','c12','c13','c14','c15','c16','c17','c18','c19','c20','c21','c22','c23','c24','c25','c26','c27','c28']
 df = spark.createDataFrame(src_data,column_list) print(df.show(1)){code}


> pyspark hangs in local mode running rdd map operation
> -
>
> Key: SPARK-38677
> URL: https://issues.apache.org/jira/browse/SPARK-38677
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Thomas Graves
>Priority: Blocker
>
> In spark 3.2.1 (spark 3.2.0 doesn't show this issue), pyspark will hang when 
> running and RDD map operations and converting to a dataframe.  Code is below 
> to reproduce.  
> Env:
> spark 3.2.1 local mode, just run {{./bin/pyspark --driver-memory G 
> --driver-cores }}
> {{download dataset from here 
> [https://rapidsai-data.s3.us-east-2.amazonaws.com/spark/mortgage.zip]}}
> {{just 20 rows could reproduce the issue }}{{head -n 20 
> mortgage_eval_merged.csv > mortgage_eval_merged-small.csv}}{{{} but if the 
> input dataset is small, such 5 rows, it works well.{}}}}}{}}}run 
> codes below:
> {code:java}
> path = "//mortgage_eval_merged-small.csv"
> src_data = sc.textFile(path).map(lambda x:x.split(","))
> column_list 
> =['c1','c2','c3','c4','c5','c6','c7','c8','c9','c10','c11','c12','c13','c14','c15','c16','c17','c18','c19','c20','c21','c22','c23','c24','c25','c26','c27','c28']
> df = spark.createDataFrame(src_data,column_list)
> print(df.show(1)){code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-38677) pyspark hangs in local mode running rdd map operation

2022-03-28 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-38677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513536#comment-17513536
 ] 

Thomas Graves commented on SPARK-38677:
---

Note, if you kill the python.daemon process while its hung, it will return to 
pyspark console and have the right results.

I looked through commits in 3.2.1 and it appears that this was introduced by 
https://issues.apache.org/jira/browse/SPARK-33277

Specifically commit 
[https://github.com/apache/spark/commit/243c321db2f02f6b4d926114bd37a6e74c2be185]
 

At least I revert that commit and rebuilt and it then works.  Also this did not 
reproduce in standalone mode so it might just be a local mode issue.

 

[~ueshin] [~ankurdave] [~hyukjin.kwon] 

> pyspark hangs in local mode running rdd map operation
> -
>
> Key: SPARK-38677
> URL: https://issues.apache.org/jira/browse/SPARK-38677
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Thomas Graves
>Priority: Blocker
>
> In spark 3.2.1 (spark 3.2.0 doesn't show this issue), pyspark will hang when 
> running and RDD map operations and converting to a dataframe.  Code is below 
> to reproduce.  
> Env:
> spark 3.2.1 local mode, just run {{./bin/pyspark --driver-memory G 
> --driver-cores }}
> {{download dataset from here 
> [https://rapidsai-data.s3.us-east-2.amazonaws.com/spark/mortgage.zip]}}
> {{just 20 rows could reproduce the issue }}{{head -n 20 
> mortgage_eval_merged.csv > mortgage_eval_merged-small.csv}}{{{} but if the 
> input dataset is small, such 5 rows, it works well.{}}}{{{}{}}}run codes 
> below:
> {code:java}
> path = "//mortgage_eval_merged-small.csv" src_data = 
> sc.textFile(path).map(lambda x:x.split(",")) column_list = 
> ['c1','c2','c3','c4','c5','c6','c7','c8','c9','c10','c11','c12','c13','c14','c15','c16','c17','c18','c19','c20','c21','c22','c23','c24','c25','c26','c27','c28']
>  df = spark.createDataFrame(src_data,column_list) print(df.show(1)){code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38677) pyspark hangs in local mode running rdd map operation

2022-03-28 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38677:
--
Affects Version/s: 3.3.0

> pyspark hangs in local mode running rdd map operation
> -
>
> Key: SPARK-38677
> URL: https://issues.apache.org/jira/browse/SPARK-38677
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Thomas Graves
>Priority: Blocker
>
> In spark 3.2.1 (spark 3.2.0 doesn't show this issue), pyspark will hang when 
> running and RDD map operations and converting to a dataframe.  Code is below 
> to reproduce.  
> Env:
> spark 3.2.1 local mode, just run {{./bin/pyspark --driver-memory G 
> --driver-cores }}
> {{download dataset from here 
> [https://rapidsai-data.s3.us-east-2.amazonaws.com/spark/mortgage.zip]}}
> {{just 20 rows could reproduce the issue }}{{head -n 20 
> mortgage_eval_merged.csv > mortgage_eval_merged-small.csv}}{{{} but if the 
> input dataset is small, such 5 rows, it works well.{}}}{{{}{}}}run codes 
> below:
> {code:java}
> path = "//mortgage_eval_merged-small.csv" src_data = 
> sc.textFile(path).map(lambda x:x.split(",")) column_list = 
> ['c1','c2','c3','c4','c5','c6','c7','c8','c9','c10','c11','c12','c13','c14','c15','c16','c17','c18','c19','c20','c21','c22','c23','c24','c25','c26','c27','c28']
>  df = spark.createDataFrame(src_data,column_list) print(df.show(1)){code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-38677) pyspark hangs in local mode running rdd map operation

2022-03-28 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-38677:
-

 Summary: pyspark hangs in local mode running rdd map operation
 Key: SPARK-38677
 URL: https://issues.apache.org/jira/browse/SPARK-38677
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.2.1
Reporter: Thomas Graves


In spark 3.2.1 (spark 3.2.0 doesn't show this issue), pyspark will hang when 
running and RDD map operations and converting to a dataframe.  Code is below to 
reproduce.  

Env:
spark 3.2.1 local mode, just run {{./bin/pyspark --driver-memory G 
--driver-cores }}

{{download dataset from here 
[https://rapidsai-data.s3.us-east-2.amazonaws.com/spark/mortgage.zip]}}
{{just 20 rows could reproduce the issue }}{{head -n 20 
mortgage_eval_merged.csv > mortgage_eval_merged-small.csv}}{{{} but if the 
input dataset is small, such 5 rows, it works well.{}}}{{{}{}}}run codes 
below:
{code:java}
path = "//mortgage_eval_merged-small.csv" src_data = 
sc.textFile(path).map(lambda x:x.split(",")) column_list = 
['c1','c2','c3','c4','c5','c6','c7','c8','c9','c10','c11','c12','c13','c14','c15','c16','c17','c18','c19','c20','c21','c22','c23','c24','c25','c26','c27','c28']
 df = spark.createDataFrame(src_data,column_list) print(df.show(1)){code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-37618) Support cleaning up shuffle blocks from external shuffle service

2022-03-25 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-37618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-37618.
---
Fix Version/s: 3.3.0
 Assignee: Adam Binford
   Resolution: Fixed

> Support cleaning up shuffle blocks from external shuffle service
> 
>
> Key: SPARK-37618
> URL: https://issues.apache.org/jira/browse/SPARK-37618
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Adam Binford
>Assignee: Adam Binford
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently shuffle data is not cleaned up when an external shuffle service is 
> used and the associated executor has been deallocated before the shuffle is 
> cleaned up. Shuffle data is only cleaned up once the application ends.
> There have been various issues filed for this:
> https://issues.apache.org/jira/browse/SPARK-26020
> https://issues.apache.org/jira/browse/SPARK-17233
> https://issues.apache.org/jira/browse/SPARK-4236
> But shuffle files will still stick around until an application completes. 
> Dynamic allocation is commonly used for long running jobs (such as structured 
> streaming), so any long running jobs with a large shuffle involved will 
> eventually fill up local disk space. The shuffle service already supports 
> cleaning up shuffle service persisted RDDs, so it should be able to support 
> cleaning up shuffle blocks as well once the shuffle is removed by the 
> ContextCleaner. 
> The current alternative is to use shuffle tracking instead of an external 
> shuffle service, but this is less optimal from a resource perspective as all 
> executors must be kept alive until the shuffle has been fully consumed and 
> cleaned up (and with the default GC interval being 30 minutes this can waste 
> a lot of time with executors held onto but not doing anything).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38194) Make memory overhead factor configurable

2022-03-21 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38194:
--
Fix Version/s: 3.3.0

> Make memory overhead factor configurable
> 
>
> Key: SPARK-38194
> URL: https://issues.apache.org/jira/browse/SPARK-38194
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Mesos, YARN
>Affects Versions: 3.4.0
>Reporter: Adam Binford
>Assignee: Adam Binford
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> Currently if the memory overhead is not provided for a Yarn job, it defaults 
> to 10% of the respective driver/executor memory. This 10% is hard-coded and 
> the only way to increase memory overhead is to set the exact memory overhead. 
> We have seen more than 10% memory being used, and it would be helpful to be 
> able to set the default overhead factor so that the overhead doesn't need to 
> be pre-calculated for any driver/executor memory size. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38194) Make Yarn memory overhead factor configurable

2022-03-16 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38194:
--
Fix Version/s: 3.3.0
   (was: 3.4.0)

> Make Yarn memory overhead factor configurable
> -
>
> Key: SPARK-38194
> URL: https://issues.apache.org/jira/browse/SPARK-38194
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.2.1
>Reporter: Adam Binford
>Assignee: Adam Binford
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently if the memory overhead is not provided for a Yarn job, it defaults 
> to 10% of the respective driver/executor memory. This 10% is hard-coded and 
> the only way to increase memory overhead is to set the exact memory overhead. 
> We have seen more than 10% memory being used, and it would be helpful to be 
> able to set the default overhead factor so that the overhead doesn't need to 
> be pre-calculated for any driver/executor memory size. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-38194) Make Yarn memory overhead factor configurable

2022-03-16 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-38194:
--
Fix Version/s: 3.4.0
   (was: 3.3.0)

> Make Yarn memory overhead factor configurable
> -
>
> Key: SPARK-38194
> URL: https://issues.apache.org/jira/browse/SPARK-38194
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.2.1
>Reporter: Adam Binford
>Assignee: Adam Binford
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently if the memory overhead is not provided for a Yarn job, it defaults 
> to 10% of the respective driver/executor memory. This 10% is hard-coded and 
> the only way to increase memory overhead is to set the exact memory overhead. 
> We have seen more than 10% memory being used, and it would be helpful to be 
> able to set the default overhead factor so that the overhead doesn't need to 
> be pre-calculated for any driver/executor memory size. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-38194) Make Yarn memory overhead factor configurable

2022-03-16 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-38194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-38194.
---
Fix Version/s: 3.3.0
 Assignee: Adam Binford
   Resolution: Fixed

> Make Yarn memory overhead factor configurable
> -
>
> Key: SPARK-38194
> URL: https://issues.apache.org/jira/browse/SPARK-38194
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.2.1
>Reporter: Adam Binford
>Assignee: Adam Binford
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently if the memory overhead is not provided for a Yarn job, it defaults 
> to 10% of the respective driver/executor memory. This 10% is hard-coded and 
> the only way to increase memory overhead is to set the exact memory overhead. 
> We have seen more than 10% memory being used, and it would be helpful to be 
> able to set the default overhead factor so that the overhead doesn't need to 
> be pre-calculated for any driver/executor memory size. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes

2022-03-08 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503229#comment-17503229
 ] 

Thomas Graves commented on SPARK-38379:
---

so the issue here is there is a race between when kubernetes call 
MountVolumesFeatureStep via adding it to the ExecutorPodsLifecycleManager which 
calls addSubscriber in ExecutorPodsSnapshotsStoreImpl. and when the 
spark.app.id is actually set in the Spark Context.  Here spark context isn't 
set until after the scheduler backend has started.    If its not set the only 
way to get the appId is to get the one generated in 
KubernetesClusterSchedulerBackend since that is wha tis ultimately used in 
spark context to set spark.app.id.  I'll investigate a fix.

 

> Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes 
> --
>
> Key: SPARK-38379
> URL: https://issues.apache.org/jira/browse/SPARK-38379
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Thomas Graves
>Priority: Major
>
> I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in 
> client mode.  I'm using persistent local volumes to mount nvme under /data in 
> the executors and on startup the driver always throws the warning below.
> using these options:
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
>  
>  
> {code:java}
> 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> java.util.NoSuchElementException: spark.app.id
>         at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.SparkConf.get(SparkConf.scala:245)
>         at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
>         at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>         at scala.collection.Iterator.foreach(Iterator.scala:943)
>         at scala.collection.Iterator.foreach$(Iterator.scala:943)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>         at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>         at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>         at scala.collection.immutable.List.foldLeft(List.scala:91)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
>         at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at 
> scala.c

[jira] [Commented] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2022-03-08 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503221#comment-17503221
 ] 

Thomas Graves commented on SPARK-34960:
---

if I'm reading the orc spec right the ColumnStatistics footer are optional in 
Orc, correct?

I'm assuming that is why PR says "If the file does not have valid statistics, 
Spark will throw exception and fail query."     I guess the only way to know 
its there or not is to read it so we can't really determine ahead of time?

This seems like behavior that should be documented in the very least.  I want 
to make sure I'm not missing something here.

> Aggregate (Min/Max/Count) push down for ORC
> ---
>
> Key: SPARK-34960
> URL: https://issues.apache.org/jira/browse/SPARK-34960
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0
>
>
> Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we 
> can also push down certain aggregations into ORC. ORC exposes column 
> statistics in interface `org.apache.orc.Reader` 
> ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118]
>  ), where Spark can utilize for aggregation push down.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-36645) Aggregate (Min/Max/Count) push down for Parquet

2022-03-08 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-36645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-36645:
--
Summary: Aggregate (Min/Max/Count) push down for Parquet  (was: Aggregate 
(Count) push down for Parquet)

> Aggregate (Min/Max/Count) push down for Parquet
> ---
>
> Key: SPARK-36645
> URL: https://issues.apache.org/jira/browse/SPARK-36645
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> Push down Aggregate (Min/Max/Count)  for Parquet for performance improvement



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-36645) Aggregate (Min/Max/Count) push down for Parquet

2022-03-08 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-36645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503212#comment-17503212
 ] 

Thomas Graves edited comment on SPARK-36645 at 3/8/22, 10:52 PM:
-

Note it appears this only really pushes down count and min and max for some 
types because:

Parquet Binary min/max could be truncated. We may get wrong result if we rely 
on parquet Binary min/max.

I'm going to update the title to reflect this.


was (Author: tgraves):
Note it appears this only really pushes down count because:

Parquet Binary min/max could be truncated. We may get wrong result if we rely 
on parquet Binary min/max.

I'm going to update the title to reflect this.

> Aggregate (Min/Max/Count) push down for Parquet
> ---
>
> Key: SPARK-36645
> URL: https://issues.apache.org/jira/browse/SPARK-36645
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> Push down Aggregate (Min/Max/Count)  for Parquet for performance improvement



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-36645) Aggregate (Count) push down for Parquet

2022-03-08 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-36645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-36645:
--
Summary: Aggregate (Count) push down for Parquet  (was: Aggregate 
(Min/Max/Count) push down for Parquet)

> Aggregate (Count) push down for Parquet
> ---
>
> Key: SPARK-36645
> URL: https://issues.apache.org/jira/browse/SPARK-36645
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> Push down Aggregate (Min/Max/Count)  for Parquet for performance improvement



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-36645) Aggregate (Min/Max/Count) push down for Parquet

2022-03-08 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-36645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503212#comment-17503212
 ] 

Thomas Graves commented on SPARK-36645:
---

Note it appears this only really pushes down count because:

Parquet Binary min/max could be truncated. We may get wrong result if we rely 
on parquet Binary min/max.

I'm going to update the title to reflect this.

> Aggregate (Min/Max/Count) push down for Parquet
> ---
>
> Key: SPARK-36645
> URL: https://issues.apache.org/jira/browse/SPARK-36645
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> Push down Aggregate (Min/Max/Count)  for Parquet for performance improvement



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2022-03-08 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503138#comment-17503138
 ] 

Thomas Graves commented on TEZ-3362:


[~jeagles] [~kshukla]  I know its been a while, I was looking at this feature 
and I'm wondering how this works on a secure yarn setup?  Generally the files 
and directories are read/write by the user and read only by the Hadoop group.  
If this runs in the auxiliary shuffle handler in the node manager it wouldn't 
have permissions to remove the directories.  

Is this somehow relying on other permission or configuration changes or is it 
running the remove as the user and I'm not seeing it?

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch, 
> TEZ-3362.006.patch, TEZ-3362.007.patch, TEZ-3362.008.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes

2022-03-07 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502555#comment-17502555
 ] 

Thomas Graves commented on SPARK-38379:
---

so I actually created another pod with Spark client in it and use the 
spark-shell.

[https://spark.apache.org/docs/3.2.1/running-on-kubernetes.html#client-mode]

Only thing I had to do was make sure ports were available. 

Since you don't run in this mode, I can investigate more.

> Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes 
> --
>
> Key: SPARK-38379
> URL: https://issues.apache.org/jira/browse/SPARK-38379
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Thomas Graves
>Priority: Major
>
> I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in 
> client mode.  I'm using persistent local volumes to mount nvme under /data in 
> the executors and on startup the driver always throws the warning below.
> using these options:
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
>  
>  
> {code:java}
> 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> java.util.NoSuchElementException: spark.app.id
>         at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.SparkConf.get(SparkConf.scala:245)
>         at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
>         at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>         at scala.collection.Iterator.foreach(Iterator.scala:943)
>         at scala.collection.Iterator.foreach$(Iterator.scala:943)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>         at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>         at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>         at scala.collection.immutable.List.foldLeft(List.scala:91)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
>         at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
>         at

[jira] [Commented] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes

2022-03-02 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500354#comment-17500354
 ] 

Thomas Graves commented on SPARK-38379:
---

just going by the stack trace this looks related to change 
https://issues.apache.org/jira/browse/SPARK-35182

[~dongjoon] Just curious if you have run into this?

> Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes 
> --
>
> Key: SPARK-38379
> URL: https://issues.apache.org/jira/browse/SPARK-38379
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Thomas Graves
>Priority: Major
>
> I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in 
> client mode.  I'm using persistent local volumes to mount nvme under /data in 
> the executors and on startup the driver always throws the warning below.
> using these options:
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
>  
>  
> {code:java}
> 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> java.util.NoSuchElementException: spark.app.id
>         at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.SparkConf.get(SparkConf.scala:245)
>         at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
>         at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>         at scala.collection.Iterator.foreach(Iterator.scala:943)
>         at scala.collection.Iterator.foreach$(Iterator.scala:943)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>         at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>         at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>         at scala.collection.immutable.List.foldLeft(List.scala:91)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
>         at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117)
>

[jira] [Created] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes

2022-03-01 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-38379:
-

 Summary: Kubernetes: NoSuchElementException: spark.app.id when 
using PersistentVolumes 
 Key: SPARK-38379
 URL: https://issues.apache.org/jira/browse/SPARK-38379
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.2.1
Reporter: Thomas Graves


I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in 
client mode.  I'm using persistent local volumes to mount nvme under /data in 
the executors and on startup the driver always throws the warning below.

using these options:

--conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
 \
     --conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks
 \
     --conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
 \
     --conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
 \
     --conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false

 

 
{code:java}
22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying 
snapshot subscriber.
java.util.NoSuchElementException: spark.app.id
        at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.SparkConf.get(SparkConf.scala:245)
        at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
        at 
org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
        at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at 
org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
        at 
org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
        at 
org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
        at 
scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
        at 
scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
        at scala.collection.immutable.List.foldLeft(List.scala:91)
        at 
org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
        at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:117)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:138)
       at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126)
        at 
org.apache.spark.scheduler.

[jira] [Commented] (SPARK-37461) yarn-client mode client's appid value is null

2021-11-30 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-37461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451172#comment-17451172
 ] 

Thomas Graves commented on SPARK-37461:
---

[~angerszhuuu] please add a description to this issue.

> yarn-client mode client's appid value is null
> -
>
> Key: SPARK-37461
> URL: https://issues.apache.org/jira/browse/SPARK-37461
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Minor
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-09 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-37260:
-

 Summary: PYSPARK Arrow 3.2.0 docs link invalid
 Key: SPARK-37260
 URL: https://issues.apache.org/jira/browse/SPARK-37260
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.2.0
Reporter: Thomas Graves


[http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]

links to:

[https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]

which links to:

[https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]

But that is an invalid link.

I assume its supposed to point to:

https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type

2021-11-04 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438784#comment-17438784
 ] 

Thomas Graves commented on SPARK-37208:
---

Note, I'm working on this.

> Support mapping Spark gpu/fpga resource types to custom YARN resource type
> --
>
> Key: SPARK-37208
> URL: https://issues.apache.org/jira/browse/SPARK-37208
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
>
> Currently Spark supports gpu/fpga resource scheduling and specifically on 
> YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and 
> yarn.io/fpga.  YARN also supports custom resource types and in Hadoop 3.3.1 
> made it easier for users to plugin in custom resource types. This means users 
> may create a custom resource type that represents a GPU or FPGAs because they 
> want additional logic that YARN the built in versions don't have. Ideally 
> Spark users still just  use the generic "gpu" or "fpga" types in Spark. So we 
> should add the ability to change the Spark internal mappings.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type

2021-11-04 Thread Thomas Graves (Jira)

Thomas Graves created SPARK-37208:
-

 Summary: Support mapping Spark gpu/fpga resource types to custom 
YARN resource type
 Key: SPARK-37208
 URL: https://issues.apache.org/jira/browse/SPARK-37208
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 3.0.0
Reporter: Thomas Graves


Currently Spark supports gpu/fpga resource scheduling and specifically on YARN 
it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and 
yarn.io/fpga.  YARN also supports custom resource types and in Hadoop 3.3.1 
made it easier for users to plugin in custom resource types. This means users 
may create a custom resource type that represents a GPU or FPGAs because they 
want additional logic that YARN the built in versions don't have. Ideally Spark 
users still just  use the generic "gpu" or "fpga" types in Spark. So we should 
add the ability to change the Spark internal mappings.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-36540) AM should not just finish with Success when dissconnected

2021-10-11 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-36540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-36540.
---
Fix Version/s: 3.3.0
 Assignee: angerszhu
   Resolution: Fixed

> AM should not just finish with Success when dissconnected
> -
>
> Key: SPARK-36540
> URL: https://issues.apache.org/jira/browse/SPARK-36540
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> We meet a case AM lose connection
> {code}
> 21/08/18 02:14:15 ERROR TransportRequestHandler: Error sending result 
> RpcResponse{requestId=5675952834716124039, 
> body=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=47 cap=64]}} to 
> xx.xx.xx.xx:41420; closing connection
> java.nio.channels.ClosedChannelException
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
> at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Check the code about client, when AMEndpoint dissconnected, will finish 
> Application with SUCCESS final status
> {code}
> override def onDisconnected(remoteAddress: RpcAddress): Unit = {
>   // In cluster mode or unmanaged am case, do not rely on the 
> disassociated event to exit
>   // This avoids potentially reporting incorrect exit codes if the driver 
> fails
>   if (!(isClusterMode || sparkConf.get(YARN_UNMANAGED_AM))) {
> logInfo(s"Driver terminated or disconnected! Shutting down. 
> $remoteAddress")
> finish(FinalApplicationStatus.SUCCEEDED, 
> ApplicationMaster.EXIT_SUCCESS)
>   }
> }
> {code}
> Nomally in client mode, when application success, driver will stop and AM 
> loss connection, it's ok that exit with SUCCESS, but if there is a not work 
> problem cause dissconnected. Still finish with final status is not correct.
> Then YarnClientSchedulerBackend will receive application report with final 
> status with success and stop SparkContext cause application failed but mark 
> it as a normal stop.
> {code}
>   private class MonitorThread extends Thread {
> private var allowInterrupt = true
> override def run() {
>   try {
> val YarnAppReport(_, state, diags) =
>   client.monitorApplication(appId.get, logApplicationReport = false)
> logError(s"YARN application has exited unexpectedly with state 
> $state! " +
>   "Check the YARN application logs for more details.")
> diags.foreach { err =>
>   logError(s"Diagnostics message: $err")
> }
> allowInterrupt = false
> sc.stop()
>   } catch {
> case e: InterruptedException => logInfo("Interrupting monitor thread")
>   }
> }
> def stopMonitor(): Unit = {
>   if (allowInterrupt) {
> this.interrupt()
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-36624) When application killed, sc should not exit with code 0

2021-09-29 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-36624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-36624.
---
Fix Version/s: 3.3.0
 Assignee: angerszhu
   Resolution: Fixed

> When application killed, sc should not exit with code 0
> ---
>
> Key: SPARK-36624
> URL: https://issues.apache.org/jira/browse/SPARK-36624
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> When application killed, sc should not exit with code 0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-36817) Does Apache Spark 3 support GPU usage for Spark RDDs?

2021-09-27 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-36817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420737#comment-17420737
 ] 

Thomas Graves commented on SPARK-36817:
---

please refer to https://github.com/NVIDIA/spark-rapids/issues/35791

> Does Apache Spark 3 support GPU usage for Spark RDDs?
> -
>
> Key: SPARK-36817
> URL: https://issues.apache.org/jira/browse/SPARK-36817
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Abhishek Shakya
>Priority: Major
>
> I am currently trying to run genomic analyses pipelines using 
> [Hail|https://hail.is/](library for genomics analyses written in python and 
> Scala). Recently, Apache Spark 3 was released and it supported GPU usage.
> I tried [spark-rapids|https://nvidia.github.io/spark-rapids/] library start 
> an on-premise slurm cluster with gpu nodes. I was able to initialise the 
> cluster. However, when I tried running hail tasks, the executors keep getting 
> killed.
> On querying in Hail forum, I got the response that
> {quote}That’s a GPU code generator for Spark-SQL, and Hail doesn’t use any 
> Spark-SQL interfaces, only the RDD interfaces.
> {quote}
> So, does Spark3 not support GPU usage for RDD interfaces?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Reopened] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-09-23 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reopened SPARK-35672:
---

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-09-23 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419312#comment-17419312
 ] 

Thomas Graves commented on SPARK-35672:
---

Ok, sounds like we should revert then so this doesn't block 3.2 release

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-36772) FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-17 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-36772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-36772:
--
Target Version/s: 3.2.0

> FinalizeShuffleMerge fails with an exception due to attempt id not matching
> ---
>
> Key: SPARK-36772
> URL: https://issues.apache.org/jira/browse/SPARK-36772
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Priority: Blocker
>
> As part of driver request to external shuffle services (ESS) to finalize the 
> merge, it also passes its [application attempt 
> id|https://github.com/apache/spark/blob/3f09093a21306b0fbcb132d4c9f285e56ac6b43c/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java#L180]
>  so that ESS can validate the request is from the correct attempt.
> This attempt id is fetched from the TransportConf passed in when creating the 
> [ExternalBlockStoreClient|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkEnv.scala#L352]
>  - and the transport conf leverages a [cloned 
> copy|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala#L47]
>  of the SparkConf passed to it.
> Application attempt id is set as part of SparkContext 
> [initialization|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L586].
> But this happens after driver SparkEnv has [already been 
> created|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L460].
> Hence the attempt id that ExternalBlockStoreClient uses will always end up 
> being -1 : which will not match the attempt id at ESS (which is based on 
> spark.app.attempt.id) : resulting in merge finalization to always fail (" 
> java.lang.IllegalArgumentException: The attempt id -1 in this 
> FinalizeShuffleMerge message does not match with the current attempt id 1 
> stored in shuffle service for application ...")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec

2021-09-03 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-3:
--
Priority: Blocker  (was: Major)

> [SQL] Regression in AQEShuffleReadExec
> --
>
> Key: SPARK-3
> URL: https://issues.apache.org/jira/browse/SPARK-3
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Andy Grove
>Priority: Blocker
>
> I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark 
> 3.2 release candidate and there is a regression in AQEShuffleReadExec where 
> it now throws an exception if the shuffle's output partitioning does not 
> match a specific list of schemes.
> The problem can be solved by returning UnknownPartitioning, as it already 
> does in some cases, rather than throwing an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-36622) spark.history.kerberos.principal doesn't take value _HOST

2021-09-02 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-36622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408813#comment-17408813
 ] 

Thomas Graves commented on SPARK-36622:
---

Supported _HOST for SHS likely makes sense since its a server.

> spark.history.kerberos.principal doesn't take value _HOST
> -
>
> Key: SPARK-36622
> URL: https://issues.apache.org/jira/browse/SPARK-36622
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Security, Spark Core
>Affects Versions: 3.0.1, 3.1.2
>Reporter: pralabhkumar
>Priority: Minor
>
> spark.history.kerberos.principal doesn't understand value _HOST. 
> It says failure to login for principal : spark/_HOST@realm . 
> It will be helpful to take _HOST value via config file and change it with 
> current hostname(similar to what Hive does) . This will also help to run SHS 
> on multiple machines without hardcoding principal hostname.  
> .spark.history.kerberos.principal
>  
> It require minor change in HistoryServer.scala in initSecurity  method . 
>  
> Please let me know , if this request make sense , I'll create the PR . 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-32333) Drop references to Master

2021-08-27 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-32333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405814#comment-17405814
 ] 

Thomas Graves commented on SPARK-32333:
---

I was looking to break this up into subtasks but not sure how much we will be 
able to, perhaps something like this:

Note we need to keep backwards compatibility so rename is copy for any public 
api's/scripts.

Also note, one thing we may not want to change or discuss more is rename 
ApplicationMaster since that is the name Hadoop uses for it.  We can certainly 
change internal api's and functions but external we may not want to.

1) Rename the standalone Master and public apis (SparkConf.setMaster), scripts, 
docs that reference those scripts

2) Rename standalone Master classes that aren't public - MasterArguments, 
MasterUI, MasterMessages, etc
3) Rename internal classes and messages with Master in name, note we could also 
rename class with Master in the name of them like BlockManagerMaster, 
MapOutputTrackerMaster, etc. We could break this up further if people are 
interesting in helping.



 

> Drop references to Master
> -
>
> Key: SPARK-32333
> URL: https://issues.apache.org/jira/browse/SPARK-32333
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
>
> We have a lot of references to "master" in the code base. It will be 
> beneficial to remove references to problematic language that can alienate 
> potential community members. 
> SPARK-32004 removed references to slave
>  
> Here is a IETF draft to fix up some of the most egregious examples
> (master/slave, whitelist/backlist) with proposed alternatives.
> https://tools.ietf.org/id/draft-knodel-terminology-00.html#rfc.section.1.1.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-36446) YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor

2021-08-11 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-36446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397461#comment-17397461
 ] 

Thomas Graves commented on SPARK-36446:
---

[~adamkennedy77] ^

> YARN shuffle server restart crashes all dynamic allocation jobs that have 
> deallocated an executor
> -
>
> Key: SPARK-36446
> URL: https://issues.apache.org/jira/browse/SPARK-36446
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.8, 3.1.2
>Reporter: Adam Kennedy
>Priority: Critical
>
> When dynamic allocation is enabled, executors that deallocate rely on the 
> shuffle server to hold blocks and supply them to remaining executors.
> When YARN Shuffle Server restarts (either intentionally or due to a crash), 
> it loses block information and relies on being able to contact Executors (the 
> locations of which it durably stores) to refetch the list of blocks.
> This mutual dependency on the other to hold block information fails fatally 
> under some common scenarios.
> For example, if a Spark application is running under dynamic allocation, some 
> amount of executors will almost always shut down.
> If, after this has occurred, any shuffle server crashes, or is restarted 
> (either directly when running as a standalone service, or as part of a YARN 
> node manager restart) then there is no way to restore block data and it is 
> permanently lost.
> Worse, when Executors try to fetch blocks from the shuffle server, the 
> shuffle server cannot location the exeutor, decides it doesn't exist, treats 
> it as a fatal exception, and causes the application to terminate and crash.
> Thus, in a real world scenario that we observe on a 1000+ node multi-tenant 
> cluster  where dynamic allocation is on by default, a rolling restart of the 
> YARN node managers will cause ALL jobs that have deallocated any executor and 
> have shuffles or transferred blocks to the shuffle server in order to shut 
> down, to crash.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-36446) YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor

2021-08-06 Thread Thomas Graves (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-36446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394931#comment-17394931
 ] 

Thomas Graves commented on SPARK-36446:
---

Is this with the yarn nodemangar recovery enabled?  ie yarn stores the 
necessarily information in a database which on restart it loads back up, if 
that is not configured it will not remember blocks.  

> YARN shuffle server restart crashes all dynamic allocation jobs that have 
> deallocated an executor
> -
>
> Key: SPARK-36446
> URL: https://issues.apache.org/jira/browse/SPARK-36446
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.8, 3.1.2
>Reporter: Adam Kennedy
>Priority: Critical
>
> When dynamic allocation is enabled, executors that deallocate rely on the 
> shuffle server to hold blocks and supply them to remaining executors.
> When YARN Shuffle Server restarts (either intentionally or due to a crash), 
> it loses block information and relies on being able to contact Executors (the 
> locations of which it durably stores) to refetch the list of blocks.
> This mutual dependency on the other to hold block information fails fatally 
> under some common scenarios.
> For example, if a Spark application is running under dynamic allocation, some 
> amount of executors will almost always shut down.
> If, after this has occurred, any shuffle server crashes, or is restarted 
> (either directly when running as a standalone service, or as part of a YARN 
> node manager restart) then there is no way to restore block data and it is 
> permanently lost.
> Worse, when Executors try to fetch blocks from the shuffle server, the 
> shuffle server cannot location the exeutor, decides it doesn't exist, treats 
> it as a fatal exception, and causes the application to terminate and crash.
> Thus, in a real world scenario that we observe on a 1000+ node multi-tenant 
> cluster  where dynamic allocation is on by default, a rolling restart of the 
> YARN node managers will cause ALL jobs that have deallocated any executor and 
> have shuffles or transferred blocks to the shuffle server in order to shut 
> down, to crash.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-595) Document "local-cluster" mode

2021-08-06 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-595:
---

Assignee: Yuto Akutsu

> Document "local-cluster" mode
> -
>
> Key: SPARK-595
> URL: https://issues.apache.org/jira/browse/SPARK-595
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: 0.6.0
>Reporter: Josh Rosen
>Assignee: Yuto Akutsu
>Priority: Minor
> Fix For: 3.2.0, 3.3.0
>
>
> The 'Spark Standalone Mode' guide describes how to manually launch a 
> standalone cluster, which can be done locally for testing, but it does not 
> mention SparkContext's `local-cluster` option.
> What are the differences between these approaches?  Which one should I prefer 
> for local testing?  Can I still use the standalone web interface if I use 
> 'local-cluster' mode?
> It would be useful to document this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-595) Document "local-cluster" mode

2021-08-06 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-595.
-
Fix Version/s: 3.3.0
   3.2.0
   Resolution: Fixed

> Document "local-cluster" mode
> -
>
> Key: SPARK-595
> URL: https://issues.apache.org/jira/browse/SPARK-595
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: 0.6.0
>Reporter: Josh Rosen
>Priority: Minor
> Fix For: 3.2.0, 3.3.0
>
>
> The 'Spark Standalone Mode' guide describes how to manually launch a 
> standalone cluster, which can be done locally for testing, but it does not 
> mention SparkContext's `local-cluster` option.
> What are the differences between these approaches?  Which one should I prefer 
> for local testing?  Can I still use the standalone web interface if I use 
> 'local-cluster' mode?
> It would be useful to document this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Reopened] (SPARK-595) Document "local-cluster" mode

2021-08-06 Thread Thomas Graves (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reopened SPARK-595:
-

> Document "local-cluster" mode
> -
>
> Key: SPARK-595
> URL: https://issues.apache.org/jira/browse/SPARK-595
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: 0.6.0
>Reporter: Josh Rosen
>Priority: Minor
>
> The 'Spark Standalone Mode' guide describes how to manually launch a 
> standalone cluster, which can be done locally for testing, but it does not 
> mention SparkContext's `local-cluster` option.
> What are the differences between these approaches?  Which one should I prefer 
> for local testing?  Can I still use the standalone web interface if I use 
> 'local-cluster' mode?
> It would be useful to document this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3514 matches

Mail list logo