[jira] [Updated] (SPARK-43225) Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution

2023-04-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-43225:

Summary: Remove jackson-core-asl and jackson-mapper-asl from pre-built 
distribution  (was: Change the scope of jackson-mapper-asl from compile to test)

> Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution
> --
>
> Key: SPARK-43225
> URL: https://issues.apache.org/jira/browse/SPARK-43225
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> To fix CVE issue: https://github.com/apache/spark/security/dependabot/50






[jira] [Resolved] (SPARK-43102) Upgrade commons-compress to 1.23.0

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43102.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40751
[https://github.com/apache/spark/pull/40751]

> Upgrade commons-compress to 1.23.0
> --
>
> Key: SPARK-43102
> URL: https://issues.apache.org/jira/browse/SPARK-43102
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.5.0
>
>
> https://commons.apache.org/proper/commons-compress/changes-report.html#a1.23.0






[jira] [Assigned] (SPARK-43102) Upgrade commons-compress to 1.23.0

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43102:
-

Assignee: Yang Jie

> Upgrade commons-compress to 1.23.0
> --
>
> Key: SPARK-43102
> URL: https://issues.apache.org/jira/browse/SPARK-43102
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> https://commons.apache.org/proper/commons-compress/changes-report.html#a1.23.0






[jira] [Created] (SPARK-43227) Fix deserialisation issue when UDFs contain a lambda expression

2023-04-20 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-43227:


 Summary: Fix deserialisation issue when UDFs contain a lambda 
expression
 Key: SPARK-43227
 URL: https://issues.apache.org/jira/browse/SPARK-43227
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


The following code:
{code:java}
class A(x: Int) { def get = x * 20 + 5 }
val dummyUdf = (x: Int) => new A(x).get
val myUdf = udf(dummyUdf)
spark.range(5).select(myUdf(col("id"))).as[Int].collect() {code}
hits the following error:
{noformat}
io.grpc.StatusRuntimeException: INTERNAL: cannot assign instance of 
java.lang.invoke.SerializedLambda to field ammonite.$sess.cmd26$Helper.dummyUdf 
of type scala.Function1 in instance of ammonite.$sess.cmd26$Helper
  io.grpc.Status.asRuntimeException(Status.java:535)
  io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
  
org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62)
  org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114)
  org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131)
  org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687)
  org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088)
  org.apache.spark.sql.Dataset.collect(Dataset.scala:2686)
  ammonite.$sess.cmd28$Helper.<init>(cmd28.sc:1)
  ammonite.$sess.cmd28$.<init>(cmd28.sc:7)
  ammonite.$sess.cmd28$.<clinit>(cmd28.sc){noformat}
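For context, the stack shows a raw java.lang.invoke.SerializedLambda left assigned to the scala.Function1 field of the REPL wrapper (ammonite.$sess.cmd26$Helper), i.e. the lambda's deserialization hook never ran for the capturing class. A hypothetical, unverified workaround sketch: keep the computation inside the lambda itself so it captures no REPL state.
{code:java}
// Hypothetical workaround sketch (not from this ticket, not verified): the
// lambda below closes over nothing, so no enclosing helper object needs to
// be serialized along with it.
val myUdf = udf((x: Int) => x * 20 + 5)
spark.range(5).select(myUdf(col("id"))).as[Int].collect()
{code}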






[jira] [Resolved] (SPARK-43213) Add `DataFrame.offset` to PySpark

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43213.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40873
[https://github.com/apache/spark/pull/40873]

> Add `DataFrame.offset` to PySpark
> -
>
> Key: SPARK-43213
> URL: https://issues.apache.org/jira/browse/SPARK-43213
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43213) Add `DataFrame.offset` to PySpark

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43213:
-

Assignee: Ruifeng Zheng

> Add `DataFrame.offset` to PySpark
> -
>
> Key: SPARK-43213
> URL: https://issues.apache.org/jira/browse/SPARK-43213
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-43224) Executor should not be removed when decommissioned in standalone

2023-04-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714820#comment-17714820
 ] 

Hyukjin Kwon commented on SPARK-43224:
--

[~warrenzhu25] It would be great if we had some description of this issue.

> Executor should not be removed when decommissioned in standalone
> 
>
> Key: SPARK-43224
> URL: https://issues.apache.org/jira/browse/SPARK-43224
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Zhongwei Zhu
>Priority: Minor
>







[jira] [Updated] (SPARK-43226) Define extractors for file-constant metadata columns

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-43226:
-
Target Version/s:   (was: 3.5.0)

> Define extractors for file-constant metadata columns
> 
>
> Key: SPARK-43226
> URL: https://issues.apache.org/jira/browse/SPARK-43226
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ryan Johnson
>Priority: Major
>
> File-source constant metadata columns are often derived indirectly from 
> file-level metadata values rather than exposing those values directly. For 
> example, {{_metadata.file_name}} is currently hard-coded in 
> {{FileFormat.updateMetadataInternalRow}} as:
>  
> {code:java}
> UTF8String.fromString(filePath.getName){code}
>  
> We should add support for metadata extractors, functions that map from 
> {{PartitionedFile}} to {{{}Literal{}}}, so that we can express such columns 
> in a generic way instead of hard-coding them.
> We can't just add them to the metadata map because then they have to be 
> pre-computed even if it turns out the query does not select that field.






[jira] [Commented] (SPARK-43225) Change the scope of jackson-mapper-asl from compile to test

2023-04-20 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714817#comment-17714817
 ] 

Snoot.io commented on SPARK-43225:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40893

> Change the scope of jackson-mapper-asl from compile to test
> ---
>
> Key: SPARK-43225
> URL: https://issues.apache.org/jira/browse/SPARK-43225
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> To fix CVE issue: https://github.com/apache/spark/security/dependabot/50






[jira] [Commented] (SPARK-43222) Remove check of `isHadoop3`

2023-04-20 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714819#comment-17714819
 ] 

Snoot.io commented on SPARK-43222:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40882

> Remove check of `isHadoop3`
> ---
>
> Key: SPARK-43222
> URL: https://issues.apache.org/jira/browse/SPARK-43222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Commented] (SPARK-43193) Remove workaround for HADOOP-12074

2023-04-20 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714818#comment-17714818
 ] 

Snoot.io commented on SPARK-43193:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/40852

> Remove workaround for HADOOP-12074
> --
>
> Key: SPARK-43193
> URL: https://issues.apache.org/jira/browse/SPARK-43193
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Commented] (SPARK-31733) Make YarnClient.`specify a more specific type for the application` pass in Hadoop-3.2

2023-04-20 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714816#comment-17714816
 ] 

Snoot.io commented on SPARK-31733:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40877

> Make YarnClient.`specify a more specific type for the application` pass in 
> Hadoop-3.2
> -
>
> Key: SPARK-31733
> URL: https://issues.apache.org/jira/browse/SPARK-31733
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-43193) Remove workaround for HADOOP-12074

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43193.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40852
[https://github.com/apache/spark/pull/40852]

> Remove workaround for HADOOP-12074
> --
>
> Key: SPARK-43193
> URL: https://issues.apache.org/jira/browse/SPARK-43193
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43193) Remove workaround for HADOOP-12074

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43193:


Assignee: Cheng Pan

> Remove workaround for HADOOP-12074
> --
>
> Key: SPARK-43193
> URL: https://issues.apache.org/jira/browse/SPARK-43193
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>







[jira] [Created] (SPARK-43226) Define extractors for file-constant metadata columns

2023-04-20 Thread Ryan Johnson (Jira)
Ryan Johnson created SPARK-43226:


 Summary: Define extractors for file-constant metadata columns
 Key: SPARK-43226
 URL: https://issues.apache.org/jira/browse/SPARK-43226
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Ryan Johnson


File-source constant metadata columns are often derived indirectly from 
file-level metadata values rather than exposing those values directly. For 
example, {{_metadata.file_name}} is currently hard-coded in 
{{FileFormat.updateMetadataInternalRow}} as:

 
{code:java}
UTF8String.fromString(filePath.getName){code}
 

We should add support for metadata extractors, functions that map from 
{{PartitionedFile}} to {{{}Literal{}}}, so that we can express such columns in 
a generic way instead of hard-coding them.

We can't just add them to the metadata map because then they have to be 
pre-computed even if it turns out the query does not select that field.
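A minimal sketch of the proposal, under assumed names (the extractors map and its exact signature are hypothetical; PartitionedFile and Literal are the types named above):
{code:java}
// Hypothetical sketch: map a metadata column name to a function that derives
// its value from a PartitionedFile, invoked only if the query selects it.
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.execution.datasources.PartitionedFile

val extractors: Map[String, PartitionedFile => Literal] = Map(
  "file_name" -> { pf => Literal(new Path(pf.filePath.toString).getName) }
)
{code}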






[jira] [Commented] (SPARK-43128) Streaming progress struct (especially in Scala)

2023-04-20 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714812#comment-17714812
 ] 

Snoot.io commented on SPARK-43128:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40892

> Streaming progress struct (especially in Scala)
> ---
>
> Key: SPARK-43128
> URL: https://issues.apache.org/jira/browse/SPARK-43128
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
>
> Streaming Spark Connect transfers streaming progress as the full “json”.
> This works OK for Python since it does not have any schema defined.
> But in Scala, it is a full-fledged class. We need to decide whether we want to
> match the legacy Progress struct in spark-connect.
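For orientation, a sketch of the two shapes in classic (non-Connect) Spark; this is existing API, not the Connect design the ticket leaves open, and `df` below is assumed to be some streaming DataFrame:
{code:java}
// Classic API for illustration: Scala exposes a typed StreamingQueryProgress,
// whose .json is the "full json" form that Connect currently transfers.
val query = df.writeStream.format("console").start()
val progress = query.lastProgress  // typed StreamingQueryProgress (Scala class)
val asJson = progress.json         // the JSON string form
{code}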






[jira] [Commented] (SPARK-43128) Streaming progress struct (especially in Scala)

2023-04-20 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714810#comment-17714810
 ] 

Snoot.io commented on SPARK-43128:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40892

> Streaming progress struct (especially in Scala)
> ---
>
> Key: SPARK-43128
> URL: https://issues.apache.org/jira/browse/SPARK-43128
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
>
> Streaming Spark Connect transfers streaming progress as the full “json”.
> This works OK for Python since it does not have any schema defined.
> But in Scala, it is a full-fledged class. We need to decide whether we want to
> match the legacy Progress struct in spark-connect.






[jira] [Commented] (SPARK-42945) Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect

2023-04-20 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714808#comment-17714808
 ] 

Snoot.io commented on SPARK-42945:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40575

> Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect
> ---
>
> Key: SPARK-42945
> URL: https://issues.apache.org/jira/browse/SPARK-42945
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Make the PySpark setting PYSPARK_JVM_STACKTRACE_ENABLED work with Spark 
> Connect.
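For context, in non-Connect PySpark this setting is backed by a SQL conf; the mapping of PYSPARK_JVM_STACKTRACE_ENABLED to the key below is an assumption, so treat this as a sketch:
{code:java}
// Assumed conf key behind the PYSPARK_JVM_STACKTRACE_ENABLED setting;
// toggled like any other runtime conf.
spark.conf.set("spark.sql.pyspark.jvmStacktrace.enabled", "true")
{code}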






[jira] [Commented] (SPARK-43136) Scala mapGroup, coGroup

2023-04-20 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714807#comment-17714807
 ] 

Snoot.io commented on SPARK-43136:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40729

> Scala mapGroup, coGroup
> ---
>
> Key: SPARK-43136
> URL: https://issues.apache.org/jira/browse/SPARK-43136
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
>
> Add basic support for Dataset#groupByKey -> KeyValueGroupedDataset
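A minimal sketch of the API surface in question, using the classic Dataset API for illustration (the ticket ports this surface to Spark Connect):
{code:java}
// Dataset#groupByKey yields a KeyValueGroupedDataset, which then supports
// mapGroups, cogroup, and friends.
import spark.implicits._

val grouped = spark.range(10).as[Long].groupByKey(_ % 3)
val sizes = grouped.mapGroups((key, rows) => (key, rows.size))  // Dataset[(Long, Int)]
{code}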






[jira] [Created] (SPARK-43225) Change the scope of jackson-mapper-asl from compile to test

2023-04-20 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-43225:
---

 Summary: Change the scope of jackson-mapper-asl from compile to 
test
 Key: SPARK-43225
 URL: https://issues.apache.org/jira/browse/SPARK-43225
 Project: Spark
  Issue Type: Bug
  Components: Build, SQL
Affects Versions: 3.5.0
Reporter: Yuming Wang


To fix CVE issue: https://github.com/apache/spark/security/dependabot/50






[jira] [Assigned] (SPARK-43119) Support Get SQL Keywords Dynamically

2023-04-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-43119:


Assignee: Kent Yao

> Support Get SQL Keywords Dynamically
> 
>
> Key: SPARK-43119
> URL: https://issues.apache.org/jira/browse/SPARK-43119
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> Implements the JDBC standard API and an auxiliary function






[jira] [Resolved] (SPARK-43119) Support Get SQL Keywords Dynamically

2023-04-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-43119.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40768
[https://github.com/apache/spark/pull/40768]

> Support Get SQL Keywords Dynamically
> 
>
> Key: SPARK-43119
> URL: https://issues.apache.org/jira/browse/SPARK-43119
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.5.0
>
>
> Implements the JDBC standard API and an auxiliary function
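For context, the JDBC standard entry point here is presumably DatabaseMetaData.getSQLKeywords (a standard JDBC method); a usage sketch follows, with an assumed local Spark Thrift Server URL:
{code:java}
// Standard JDBC call; the connection URL is an assumption for a local
// Spark Thrift Server, and the Hive JDBC driver must be on the classpath.
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000")
val keywords = conn.getMetaData.getSQLKeywords  // comma-separated keyword list
conn.close()
{code}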






[jira] [Closed] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression

2023-04-20 Thread Jia Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Fan closed SPARK-43220.
---

> INSERT INTO REPLACE statement can't support WHERE with bool_expression
> --
>
> Key: SPARK-43220
> URL: https://issues.apache.org/jira/browse/SPARK-43220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-04-20-23-40-25-212.png
>
>
> {code:java}
> sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
> sql("CREATE TABLE persons2 (name string,address String,ssn int) USING 
> parquet")
> sql("INSERT INTO TABLE persons VALUES " +
> "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
> "('Eddie Davis','245 Market St, Milpitas',345678901)")
> sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
> Cupertino', 432795921)")
> sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM 
> persons2")
> sql("SELECT * FROM persons").show(){code}
> When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported at the moment.
> Neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.
> !image-2023-04-20-23-40-25-212.png|width=795,height=152!






[jira] [Commented] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression

2023-04-20 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714794#comment-17714794
 ] 

Jia Fan commented on SPARK-43220:
-

After I tested it another way, it works. My fault.

> INSERT INTO REPLACE statement can't support WHERE with bool_expression
> --
>
> Key: SPARK-43220
> URL: https://issues.apache.org/jira/browse/SPARK-43220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-04-20-23-40-25-212.png
>
>
> {code:java}
> sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
> sql("CREATE TABLE persons2 (name string,address String,ssn int) USING 
> parquet")
> sql("INSERT INTO TABLE persons VALUES " +
> "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
> "('Eddie Davis','245 Market St, Milpitas',345678901)")
> sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
> Cupertino', 432795921)")
> sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM 
> persons2")
> sql("SELECT * FROM persons").show(){code}
> When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported at the moment.
> Neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.
> !image-2023-04-20-23-40-25-212.png|width=795,height=152!






[jira] [Resolved] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression

2023-04-20 Thread Jia Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Fan resolved SPARK-43220.
-
Resolution: Invalid

> INSERT INTO REPLACE statement can't support WHERE with bool_expression
> --
>
> Key: SPARK-43220
> URL: https://issues.apache.org/jira/browse/SPARK-43220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-04-20-23-40-25-212.png
>
>
> {code:java}
> sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
> sql("CREATE TABLE persons2 (name string,address String,ssn int) USING 
> parquet")
> sql("INSERT INTO TABLE persons VALUES " +
> "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
> "('Eddie Davis','245 Market St, Milpitas',345678901)")
> sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
> Cupertino', 432795921)")
> sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM 
> persons2")
> sql("SELECT * FROM persons").show(){code}
> When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported at the moment.
> Neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.
> !image-2023-04-20-23-40-25-212.png|width=795,height=152!






[jira] [Commented] (SPARK-39203) Fix remote table location based on database location

2023-04-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714793#comment-17714793
 ] 

Hyukjin Kwon commented on SPARK-39203:
--

But to be clear, this change exists in Spark 3.4.0.
It was taken out of 3.4.1 and 3.5.0.

> Fix remote table location based on database location
> 
>
> Key: SPARK-39203
> URL: https://issues.apache.org/jira/browse/SPARK-39203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> We have HDFS and Hive on cluster A. We have Spark on cluster B and need to 
> read data from cluster A. The table location is incorrect:
> {noformat}
> spark-sql> desc formatted  default.test_table;
> fas_acct_id   decimal(18,0)
> fas_acct_cd   string
> cmpny_cd  string
> entity_id string
> cre_date  date
> cre_user  string
> upd_date  timestamp
> upd_user  string
> # Detailed Table Information
> Database default
> Table test_table
> Type  EXTERNAL
> Provider  parquet
> Statistics25310025737 bytes
> Location  /user/hive/warehouse/test_table
> Serde Library 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat   
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat  
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Storage Properties[compression=snappy]
> spark-sql> desc database default;
> Namespace Namedefault
> Comment
> Location  viewfs://clusterA/user/hive/warehouse/
> Owner hive_dba
> {noformat}
> The correct table location should be 
> viewfs://clusterA/user/hive/warehouse/test_table.
>  






[jira] [Updated] (SPARK-39203) Fix remote table location based on database location

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39203:
-
Fix Version/s: (was: 3.4.0)

> Fix remote table location based on database location
> 
>
> Key: SPARK-39203
> URL: https://issues.apache.org/jira/browse/SPARK-39203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> We have HDFS and Hive on cluster A. We have Spark on cluster B and need to 
> read data from cluster A. The table location is incorrect:
> {noformat}
> spark-sql> desc formatted  default.test_table;
> fas_acct_id   decimal(18,0)
> fas_acct_cd   string
> cmpny_cd  string
> entity_id string
> cre_date  date
> cre_user  string
> upd_date  timestamp
> upd_user  string
> # Detailed Table Information
> Database default
> Table test_table
> Type  EXTERNAL
> Provider  parquet
> Statistics25310025737 bytes
> Location  /user/hive/warehouse/test_table
> Serde Library 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat   
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat  
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Storage Properties[compression=snappy]
> spark-sql> desc database default;
> Namespace Namedefault
> Comment
> Location  viewfs://clusterA/user/hive/warehouse/
> Owner hive_dba
> {noformat}
> The correct table location should be 
> viewfs://clusterA/user/hive/warehouse/test_table.
>  






[jira] [Reopened] (SPARK-39203) Fix remote table location based on database location

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-39203:
--
  Assignee: (was: Yuming Wang)

Reverted in https://github.com/apache/spark/pull/40871

> Fix remote table location based on database location
> 
>
> Key: SPARK-39203
> URL: https://issues.apache.org/jira/browse/SPARK-39203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> We have HDFS and Hive on cluster A. We have Spark on cluster B and need to 
> read data from cluster A. The table location is incorrect:
> {noformat}
> spark-sql> desc formatted  default.test_table;
> fas_acct_id   decimal(18,0)
> fas_acct_cd   string
> cmpny_cd  string
> entity_id string
> cre_date  date
> cre_user  string
> upd_date  timestamp
> upd_user  string
> # Detailed Table Information
> Database default
> Table test_table
> Type  EXTERNAL
> Provider  parquet
> Statistics25310025737 bytes
> Location  /user/hive/warehouse/test_table
> Serde Library 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat   
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat  
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Storage Properties[compression=snappy]
> spark-sql> desc database default;
> Namespace Namedefault
> Comment
> Location  viewfs://clusterA/user/hive/warehouse/
> Owner hive_dba
> {noformat}
> The correct table location should be 
> viewfs://clusterA/user/hive/warehouse/test_table.
>  






[jira] [Resolved] (SPARK-43124) Dataset.show should not trigger job execution on CommandResults

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43124.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40779
[https://github.com/apache/spark/pull/40779]

> Dataset.show should not trigger job execution on CommandResults
> ---
>
> Key: SPARK-43124
> URL: https://issues.apache.org/jira/browse/SPARK-43124
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43124) Dataset.show should not trigger job execution on CommandResults

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-43124:


Assignee: Peter Toth

> Dataset.show should not trigger job execution on CommandResults
> ---
>
> Key: SPARK-43124
> URL: https://issues.apache.org/jira/browse/SPARK-43124
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
>







[jira] [Resolved] (SPARK-42960) Add remaining Streaming Query commands like await_termination()

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42960.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40785
[https://github.com/apache/spark/pull/40785]

> Add remaining Streaming Query commands like await_termination()
> ---
>
> Key: SPARK-42960
> URL: https://issues.apache.org/jira/browse/SPARK-42960
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> Add remaining Streaming Query API, including: 
>  * await_termination() : needs to be a streaming RPC.
>  * exception()
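For orientation, the classic (non-Connect) Scala equivalents of these commands, shown only as a reference for what the Connect versions must cover:
{code:java}
// Existing StreamingQuery API; the ticket adds the Connect counterparts
// (await_termination as a streaming RPC, plus exception()).
val query = spark.readStream.format("rate").load()
  .writeStream.format("console").start()

query.awaitTermination(1000)  // Boolean: did it terminate within the timeout?
val err = query.exception     // Option[StreamingQueryException]
query.stop()
{code}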






[jira] [Assigned] (SPARK-42960) Add remaining Streaming Query commands like await_termination()

2023-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42960:


Assignee: Raghu Angadi

> Add remaining Streaming Query commands like await_termination()
> ---
>
> Key: SPARK-42960
> URL: https://issues.apache.org/jira/browse/SPARK-42960
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
>
> Add remaining Streaming Query API, including: 
>  * await_termination() : needs to be a streaming RPC.
>  * exception()






[jira] [Created] (SPARK-43224) Executor should not be removed when decommissioned in standalone

2023-04-20 Thread Zhongwei Zhu (Jira)
Zhongwei Zhu created SPARK-43224:


 Summary: Executor should not be removed when decommissioned in 
standalone
 Key: SPARK-43224
 URL: https://issues.apache.org/jira/browse/SPARK-43224
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Zhongwei Zhu









[jira] [Resolved] (SPARK-43211) Remove Hadoop2 support in IsolatedClientLoader

2023-04-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-43211.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40870
[https://github.com/apache/spark/pull/40870]

> Remove Hadoop2 support in IsolatedClientLoader
> --
>
> Key: SPARK-43211
> URL: https://issues.apache.org/jira/browse/SPARK-43211
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43211) Remove Hadoop2 support in IsolatedClientLoader

2023-04-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-43211:


Assignee: Cheng Pan

> Remove Hadoop2 support in IsolatedClientLoader
> --
>
> Key: SPARK-43211
> URL: https://issues.apache.org/jira/browse/SPARK-43211
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>







[jira] [Resolved] (SPARK-43202) Replace reflection w/ direct calling for YARN Resource API

2023-04-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-43202.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40860
[https://github.com/apache/spark/pull/40860]

> Replace reflection w/ direct calling for YARN Resource API
> --
>
> Key: SPARK-43202
> URL: https://issues.apache.org/jira/browse/SPARK-43202
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43202) Replace reflection w/ direct calling for YARN Resource API

2023-04-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-43202:


Assignee: Cheng Pan

> Replace reflection w/ direct calling for YARN Resource API
> --
>
> Key: SPARK-43202
> URL: https://issues.apache.org/jira/browse/SPARK-43202
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>







[jira] [Updated] (SPARK-43201) Inconsistency between from_avro and from_json function

2023-04-20 Thread Philip Adetiloye (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Adetiloye updated SPARK-43201:
-
Description: 
Spark's from_avro function does not allow the schema parameter to be a dataframe
column; it takes only a String schema:
{code:java}
def from_avro(col: Column, jsonFormatSchema: String): Column {code}
This makes it impossible to deserialize rows of Avro records with different
schemas, since only one schema string can be passed externally.

Here is what I would expect, as with the from_json function:
{code:java}
def from_avro(col: Column, jsonFormatSchema: Column): Column  {code}
code example:
{code:java}
import org.apache.spark.sql.functions.from_avro

val avroSchema1 = 
"""{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
 

val avroSchema2 = 
"""{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""


val df = Seq(
  (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1),
  (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2)
).toDF("binaryData", "schema")


val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData"))


parsed.show()


// Output:
// ++
// |  parsedData|
// ++
// |[apple1, 1.0]|
// |[apple2, 2.0]|
// ++
 {code}
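For comparison, from_json already offers an overload in org.apache.spark.sql.functions that takes the schema as a Column (existing API); `df` and the column names below are hypothetical:
{code:java}
// The from_json shape this issue asks from_avro to match: a per-row schema
// supplied as a Column instead of one external String.
import org.apache.spark.sql.functions.{col, from_json}

val parsedJson = df.select(from_json(col("jsonData"), col("schema")).as("parsed"))
{code}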
 

  was:
Spark's from_avro function does not allow the schema parameter to be a dataframe
column; it takes only a String schema:
{code:java}
def from_avro(col: Column, jsonFormatSchema: String): Column {code}
This makes it impossible to deserialize rows of Avro records with different
schemas, since only one schema string can be passed externally.

 

Here is what I would expect:
{code:java}
def from_avro(col: Column, jsonFormatSchema: Column): Column  {code}
code example:
{code:java}
import org.apache.spark.sql.functions.from_avro

val avroSchema1 = 
"""{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
 

val avroSchema2 = 
"""{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""


val df = Seq(
  (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1),
  (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2)
).toDF("binaryData", "schema")


val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData"))


parsed.show()


// Output:
// ++
// |  parsedData|
// ++
// |[apple1, 1.0]|
// |[apple2, 2.0]|
// ++
 {code}
 


> Inconsistency between from_avro and from_json function
> --
>
> Key: SPARK-43201
> URL: https://issues.apache.org/jira/browse/SPARK-43201
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Philip Adetiloye
>Priority: Major
>
> Spark's from_avro function does not allow the schema parameter to be a dataframe
> column; it takes only a String schema:
> {code:java}
> def from_avro(col: Column, jsonFormatSchema: String): Column {code}
> This makes it impossible to deserialize rows of Avro records with different
> schemas, since only one schema string can be passed externally.
> 
> Here is what I would expect, as with the from_json function:
> {code:java}
> def from_avro(col: Column, jsonFormatSchema: Column): Column  {code}
> code example:
> {code:java}
> import org.apache.spark.sql.functions.from_avro
> val avroSchema1 = 
> """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
>  
> val avroSchema2 = 
> """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
> val df = Seq(
>   (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1),
>   (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2)
> ).toDF("binaryData", "schema")
> val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData"))
> parsed.show()
> // Output:
> // ++
> // |  parsedData|
> // ++
> // |[apple1, 1.0]|
> // |[apple2, 2.0]|
> // ++
>  {code}
>  






[jira] [Updated] (SPARK-43201) Inconsistency between from_avro and from_json function

2023-04-20 Thread Philip Adetiloye (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Adetiloye updated SPARK-43201:
-
Description: 
Spark's from_avro function does not allow the schema parameter to be a dataframe
column; it takes only a String schema:
{code:java}
def from_avro(col: Column, jsonFormatSchema: String): Column {code}
This makes it impossible to deserialize rows of Avro records with different
schemas, since only one schema string can be passed externally.

 

Here is what I would expect:
{code:java}
def from_avro(col: Column, jsonFormatSchema: Column): Column  {code}
code example:
{code:java}
import org.apache.spark.sql.functions.from_avro

val avroSchema1 = 
"""{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
 

val avroSchema2 = 
"""{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""


val df = Seq(
  (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1),
  (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2)
).toDF("binaryData", "schema")


val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData"))


parsed.show()


// Output:
// ++
// |  parsedData|
// ++
// |[apple1, 1.0]|
// |[apple2, 2.0]|
// ++
 {code}
 

  was:
Spark's from_avro function does not allow the schema to be a dataframe column but
takes a String schema:
{code:java}
def from_avro(col: Column, jsonFormatSchema: String): Column {code}
This makes it impossible to deserialize rows of Avro records with different
schemas, since only one schema string can be passed externally.

 

Here is what I would expect:
{code:java}
def from_avro(col: Column, jsonFormatSchema: Column): Column  {code}
code example:
{code:java}
import org.apache.spark.sql.functions.from_avro

val avroSchema1 = 
"""{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
 

val avroSchema2 = 
"""{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""


val df = Seq(
  (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1),
  (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2)
).toDF("binaryData", "schema")


val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData"))


parsed.show()


// Output:
// ++
// |  parsedData|
// ++
// |[apple1, 1.0]|
// |[apple2, 2.0]|
// ++
 {code}
 


> Inconsistency between from_avro and from_json function
> --
>
> Key: SPARK-43201
> URL: https://issues.apache.org/jira/browse/SPARK-43201
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Philip Adetiloye
>Priority: Major
>
> Spark's from_avro function does not allow the schema parameter to be a dataframe
> column; it takes only a String schema:
> {code:java}
> def from_avro(col: Column, jsonFormatSchema: String): Column {code}
> This makes it impossible to deserialize rows of Avro records with different
> schemas, since only one schema string can be passed externally.
>  
> Here is what I would expect:
> {code:java}
> def from_avro(col: Column, jsonFormatSchema: Column): Column  {code}
> code example:
> {code:java}
> import org.apache.spark.sql.functions.from_avro
> val avroSchema1 = 
> """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
>  
> val avroSchema2 = 
> """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
> val df = Seq(
>   (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1),
>   (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2)
> ).toDF("binaryData", "schema")
> val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData"))
> parsed.show()
> // Output:
> // ++
> // |  parsedData|
> // ++
> // |[apple1, 1.0]|
> // |[apple2, 2.0]|
> // ++
>  {code}
>  






[jira] [Commented] (SPARK-43202) Replace reflection w/ direct calling for YARN Resource API

2023-04-20 Thread Mike K (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714707#comment-17714707
 ] 

Mike K commented on SPARK-43202:


User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/40860

> Replace reflection w/ direct calling for YARN Resource API
> --
>
> Key: SPARK-43202
> URL: https://issues.apache.org/jira/browse/SPARK-43202
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Priority: Major
>







[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Description: 
Spark on Yarn Cluster

When multiple executors exist on a node, the same block can exist on both
executors, with some copies in memory and some on disk.

Probabilistically, the executor fails to obtain the block and throws an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

 

Next, I will replay how the problem occurs:

step 1:

The executor requests the driver to obtain block
information (locationsAndStatusOption). The input parameters are the BlockId and the
host of its own node. Please note that it does not carry port information.

line:1092

!image-2023-04-21-00-24-22-059.png!

step 2:

On the driver side, the driver obtains all blockManagers holding the block
based on the BlockId. For non-remote-shuffle scenarios, the driver will
retrieve the first (blockId, blockManager) pair from the locations.

Assume that there are two BlockManagers holding the BlockId on this node:
BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and
stores it on disk.

Assume the returned status is of type memory and its diskSize is 0.

line: 852, 856

!image-2023-04-21-00-30-41-851.png!

step 3:

This method will return a BlockLocationsAndStatus object. If there are BMs 
using disk, the disk's path information will be stored in localDirs

!image-2023-04-21-00-50-10-918.png!

step 4:

When the executor obtains locationsAndStatusOption, localDirs is not empty, but 
status.diskSize is 0

line: 1102

!image-2023-04-21-00-54-11-968.png!

step 5:

readDiskBlockFromSameHostExecutor only determines whether the block file
exists, and then directly uses the incoming block size to read the byte array.
If the block size is 0, it returns an empty byte array.

It only checks that the file exists.

line: 1234, 1240

!image-2023-04-21-00-57-29-140.png!

Taking a value from the empty array then causes the out-of-bounds error.
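A stripped-down sketch of the step-5 failure (hypothetical and simplified; the names are not from the actual code):
{code:java}
// The memory-backed status reports diskSize = 0, so the reader allocates an
// empty buffer, and the first element access fails.
val diskSize = 0L
val buf = new Array[Byte](diskSize.toInt)  // empty array "read" from disk
val first = buf(0)  // java.lang.ArrayIndexOutOfBoundsException: 0
{code}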

  was:
Spark on Yarn Cluster

When multiple executors exist on a node, and the same block exists on both 
executors, with some in memory and some on disk.

Probabilistically, the executor fails to obtain the block and throws an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

 

Next, I will replay the process of the problem occurring: 

step 1:

The executor requests the driver to obtain block 
information(locationsAndStatusOption). The input parameters are BlockId and the 
host of its own node. Please note that it does not carry port information

line:1092

!image-2023-04-21-00-24-22-059.png!

step 2:

On the driver side, the driver obtains all blockManagers holding the block 
based on the BlockId. For non remote shuffle scenarios, the driver will 
retrieve the first one with the blockId and blockManager from the locations

Assuming that there are two BlockManagers holding the BlockId on this node, 
BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and 
stores it in disk

Assuming the returned status is of type memory and its disksize is 0

line: 852, 856

!image-2023-04-21-00-30-41-851.png!

step 3:

This method will return a BlockLocationsAndStatus object. If there are BMs 
using disk, the disk's path information will be stored in localDirs

!image-2023-04-21-00-50-10-918.png!

step 4:

When the executor obtains locationsAndStatusOption, localDirs is not empty, but 
status.diskSize is 0

line: 1102

!image-2023-04-21-00-54-11-968.png!

step 5:

The readDiskBlockFromSameHostExecutor only determines whether the Block file 
exists, and then directly uses the incoming blocksize to read the byte array. 
If the blocksize is 0, it returns an empty byte array

only check 

line: 1234, 1240

!image-2023-04-21-00-57-29-140.png!


> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, 
> image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png, 
> image-2023-04-21-00-54-11-968.png, image-2023-04-21-00-57-29-140.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node, and the same block exists on both 
> executors, with some in memory and some on disk.
> Probabilistically, the executor failed to obtain the block,throw 

[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Description: 
Spark on Yarn Cluster

When multiple executors exist on a node, and the same block exists on both 
executors, with some in memory and some on disk.

Probabilistically, the executor fails to obtain the block and throws an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

 

Next, I will replay the process of the problem occurring: 

step 1:

The executor requests the driver to obtain block 
information(locationsAndStatusOption). The input parameters are BlockId and the 
host of its own node. Please note that it does not carry port information

line:1092

!image-2023-04-21-00-24-22-059.png!

step 2:

On the driver side, the driver obtains all blockManagers holding the block 
based on the BlockId. For non remote shuffle scenarios, the driver will 
retrieve the first one with the blockId and blockManager from the locations

Assuming that there are two BlockManagers holding the BlockId on this node, 
BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and 
stores it in disk

Assuming the returned status is of type memory and its disksize is 0

line: 852, 856

!image-2023-04-21-00-30-41-851.png!

step 3:

This method will return a BlockLocationsAndStatus object. If there are BMs 
using disk, the disk's path information will be stored in localDirs

!image-2023-04-21-00-50-10-918.png!

step 4:

When the executor obtains locationsAndStatusOption, localDirs is not empty, but 
status.diskSize is 0

line: 1102

!image-2023-04-21-00-54-11-968.png!

step 5:

readDiskBlockFromSameHostExecutor only checks whether the block file exists, 
and then directly uses the incoming blockSize to read the byte array. If 
blockSize is 0, it returns an empty byte array

line: 1234, 1240

!image-2023-04-21-00-57-29-140.png!

  was:
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.

line:1092

!image-2023-04-21-00-24-22-059.png!

step 2:

On the driver side, the driver obtains all blockManagers holding the block 
based on the BlockId. For non-remote-shuffle scenarios, the driver retrieves 
the first blockManager, together with its block status, from the locations.

Assume that there are two BlockManagers holding the BlockId on this node: 
BM-1 holds the block and stores it in memory, and BM-2 holds the block and 
stores it on disk.

line: 852, 856

!image-2023-04-21-00-30-41-851.png!

step 3:

 

 


> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, 
> image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png, 
> image-2023-04-21-00-54-11-968.png, image-2023-04-21-00-57-29-140.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> line:1092
> !image-2023-04-21-00-24-22-059.png!
> step 2:
> On the driver side, the driver obtains all blockManagers holding the block 
> based on the BlockId. For non-remote-shuffle scenarios, the driver retrieves 
> the first blockManager, together with its block status, from the locations.
> Assuming that there are two BlockManagers holding 

[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Attachment: image-2023-04-21-00-57-29-140.png

> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, 
> image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png, 
> image-2023-04-21-00-54-11-968.png, image-2023-04-21-00-57-29-140.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> line:1092
> !image-2023-04-21-00-24-22-059.png!
> step 2:
> On the driver side, the driver obtains all blockManagers holding the block 
> based on the BlockId. For non-remote-shuffle scenarios, the driver retrieves 
> the first blockManager, together with its block status, from the locations.
> Assume that there are two BlockManagers holding the BlockId on this node: 
> BM-1 holds the block and stores it in memory, and BM-2 holds the block and 
> stores it on disk.
> line: 852, 856
> !image-2023-04-21-00-30-41-851.png!
> step 3:
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Attachment: image-2023-04-21-00-54-11-968.png

> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, 
> image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png, 
> image-2023-04-21-00-54-11-968.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> line:1092
> !image-2023-04-21-00-24-22-059.png!
> step 2:
> On the driver side, the driver obtains all blockManagers holding the block 
> based on the BlockId. For non-remote-shuffle scenarios, the driver retrieves 
> the first blockManager, together with its block status, from the locations.
> Assume that there are two BlockManagers holding the BlockId on this node: 
> BM-1 holds the block and stores it in memory, and BM-2 holds the block and 
> stores it on disk.
> line: 852, 856
> !image-2023-04-21-00-30-41-851.png!
> step 3:
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43208) IsolatedClassLoader should close barrier class InputStream after reading

2023-04-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-43208:


Assignee: Cheng Pan

> IsolatedClassLoader should close barrier class InputStream after reading
> 
>
> Key: SPARK-43208
> URL: https://issues.apache.org/jira/browse/SPARK-43208
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>
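
For context, a minimal sketch of the fix idea (hypothetical helper, not the 
actual patch): the barrier class's bytes are read from an InputStream, which 
should be closed once the read completes.
{code:java}
import java.io.InputStream

// Read a class file's bytes and always close the underlying stream
// (previously the stream could be left open after reading).
def readClassBytes(in: InputStream): Array[Byte] = {
  try in.readAllBytes() // JDK 9+
  finally in.close()
}
{code}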




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Attachment: image-2023-04-21-00-53-20-720.png

> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, 
> image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> line:1092
> !image-2023-04-21-00-24-22-059.png!
> step 2:
> On the driver side, the driver obtains all blockManagers holding the block 
> based on the BlockId. For non-remote-shuffle scenarios, the driver retrieves 
> the first blockManager, together with its block status, from the locations.
> Assume that there are two BlockManagers holding the BlockId on this node: 
> BM-1 holds the block and stores it in memory, and BM-2 holds the block and 
> stores it on disk.
> line: 852, 856
> !image-2023-04-21-00-30-41-851.png!
> step 3:
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43208) IsolatedClassLoader should close barrier class InputStream after reading

2023-04-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-43208.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40867
[https://github.com/apache/spark/pull/40867]

> IsolatedClassLoader should close barrier class InputStream after reading
> 
>
> Key: SPARK-43208
> URL: https://issues.apache.org/jira/browse/SPARK-43208
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Attachment: image-2023-04-21-00-50-10-918.png

> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, 
> image-2023-04-21-00-50-10-918.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> line:1092
> !image-2023-04-21-00-24-22-059.png!
> step 2:
> On the driver side, the driver obtains all blockManagers holding the block 
> based on the BlockId. For non-remote-shuffle scenarios, the driver retrieves 
> the first blockManager, together with its block status, from the locations.
> Assume that there are two BlockManagers holding the BlockId on this node: 
> BM-1 holds the block and stores it in memory, and BM-2 holds the block and 
> stores it on disk.
> line: 852, 856
> !image-2023-04-21-00-30-41-851.png!
> step 3:
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Description: 
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.

line:1092

!image-2023-04-21-00-24-22-059.png!

step 2:

On the driver side, the driver obtains all blockManagers holding the block 
based on the BlockId. For non-remote-shuffle scenarios, the driver retrieves 
the first blockManager, together with its block status, from the locations.

Assume that there are two BlockManagers holding the BlockId on this node: 
BM-1 holds the block and stores it in memory, and BM-2 holds the block and 
stores it on disk.

line: 852, 856

!image-2023-04-21-00-30-41-851.png!

step 3:

 

 

  was:
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.

code: !image-2023-04-21-00-19-58-021.png!

step 2:

 


> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> line:1092
> !image-2023-04-21-00-24-22-059.png!
> step 2:
> On the driver side, the driver obtains all blockManagers holding the block 
> based on the BlockId. For non-remote-shuffle scenarios, the driver retrieves 
> the first blockManager, together with its block status, from the locations.
> Assume that there are two BlockManagers holding the BlockId on this node: 
> BM-1 holds the block and stores it in memory, and BM-2 holds the block and 
> stores it on disk.
> line: 852, 856
> !image-2023-04-21-00-30-41-851.png!
> step 3:
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43222) Remove check of `isHadoop3`

2023-04-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-43222:
-
Component/s: (was: YARN)

> Remove check of `isHadoop3`
> ---
>
> Key: SPARK-43222
> URL: https://issues.apache.org/jira/browse/SPARK-43222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43223) KeyValueGroupedDataset#agg

2023-04-20 Thread Zhen Li (Jira)
Zhen Li created SPARK-43223:
---

 Summary: KeyValueGroupedDataset#agg
 Key: SPARK-43223
 URL: https://issues.apache.org/jira/browse/SPARK-43223
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


Adding missing agg functions in the KVGDS API
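
For reference, a sketch of the kind of typed aggregation the Connect client 
should support, mirroring the existing sql.KeyValueGroupedDataset API (assumes 
a spark-shell session with `spark` in scope):
{code:java}
import org.apache.spark.sql.functions._
import spark.implicits._

// Typed aggregation over a grouped Dataset.
val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
val agged = ds.groupByKey(_._1)
  .agg(count("*").as[Long], sum($"_2").as[Long])
agged.show()
{code}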



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Attachment: image-2023-04-21-00-30-41-851.png

> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> code: !image-2023-04-21-00-19-58-021.png!
> step 2:
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35959) Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions

2023-04-20 Thread Sun Chao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Chao resolved SPARK-35959.
--
Resolution: Won't Fix

> Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions 
> -
>
> Key: SPARK-35959
> URL: https://issues.apache.org/jira/browse/SPARK-35959
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently Spark uses the Hadoop shaded client by default. However, if Spark 
> users want to build Spark with an older version of Hadoop, such as 3.1.x, the 
> shaded client cannot be used (currently it only supports Hadoop 3.2.2+ and 3.3.1+). 
> Therefore, this proposes to offer a new Maven profile "no-shaded-client" for 
> this use case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Attachment: image-2023-04-21-00-24-22-059.png

> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, 
> image-2023-04-21-00-24-22-059.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> code: !image-2023-04-21-00-19-58-021.png!
> step 2:
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Attachment: image-2023-04-21-00-19-58-021.png

> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> code: !image-2023-04-21-00-19-58-021.png!
> step 2:
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Description: 
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.

code: !image-2023-04-21-00-19-58-021.png!

step 2:

 

  was:
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.
{code:java}
  private[spark] def getRemoteBlock[T](
      blockId: BlockId,
      bufferTransformer: ManagedBuffer => T): Option[T] = {
    logDebug(s"Getting remote block $blockId")
    require(blockId != null, "BlockId is null")
    // Because all the remote blocks are registered in driver, it is not
    // necessary to ask all the storage endpoints to get block status.
    val locationsAndStatusOption =
      master.getLocationsAndStatus(blockId, blockManagerId.host) {code}
step 2:

 


> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> code: !image-2023-04-21-00-19-58-021.png!
> step 2:
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43222) Remove check of `isHadoop3`

2023-04-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-43222:
-
Summary: Remove check of `isHadoop3`  (was: Remove check of 
`VersionUtils.isHadoop3`)

> Remove check of `isHadoop3`
> ---
>
> Key: SPARK-43222
> URL: https://issues.apache.org/jira/browse/SPARK-43222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL, YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Description: 
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.
{code:java}
  private[spark] def getRemoteBlock[T](
      blockId: BlockId,
      bufferTransformer: ManagedBuffer => T): Option[T] = {
    logDebug(s"Getting remote block $blockId")
    require(blockId != null, "BlockId is null")
    // Because all the remote blocks are registered in driver, it is not
    // necessary to ask all the storage endpoints to get block status.
    val locationsAndStatusOption =
      master.getLocationsAndStatus(blockId, blockManagerId.host) {code}
step 2:

 

  was:
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.

 

step 2:

 


> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> {code:java}
>   private[spark] def getRemoteBlock[T](
>       blockId: BlockId,
>       bufferTransformer: ManagedBuffer => T): Option[T] = {
>     logDebug(s"Getting remote block $blockId")
>     require(blockId != null, "BlockId is null")
>     // Because all the remote blocks are registered in driver, it is not
>     // necessary to ask all the storage endpoints to get block status.
>     val locationsAndStatusOption =
>       master.getLocationsAndStatus(blockId, blockManagerId.host) {code}
> step 2:
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Description: 
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.

 

step 2:

 

  was:
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.

!image-2023-04-21-00-07-51-400.png!

step 2:

 


> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
>  
> step 2:
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
---
Description: 
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will walk through how the problem occurs:

step 1:

The executor requests the block information (locationsAndStatusOption) from 
the driver. The input parameters are the BlockId and the host of its own 
node. Note that it does not carry port information.

!image-2023-04-21-00-07-51-400.png!

step 2:

 

  was:
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)


> Executor obtained error information 
> 
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.1, 3.2.0, 3.3.0
>Reporter: Qiang Yang
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on Yarn Cluster
> When multiple executors exist on a node and the same block exists on both 
> executors, with one copy in memory and one on disk, the executor may 
> intermittently fail to obtain the block and throw an exception:
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at 
> org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>  
> Next, I will walk through how the problem occurs:
> step 1:
> The executor requests the block information (locationsAndStatusOption) from 
> the driver. The input parameters are the BlockId and the host of its own 
> node. Note that it does not carry port information.
> !image-2023-04-21-00-07-51-400.png!
> step 2:
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43222) Remove check of `VersionUtils.isHadoop3`

2023-04-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-43222:


 Summary: Remove check of `VersionUtils.isHadoop3`
 Key: SPARK-43222
 URL: https://issues.apache.org/jira/browse/SPARK-43222
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL, YARN
Affects Versions: 3.5.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43221) Executor obtained error information

2023-04-20 Thread Qiang Yang (Jira)
Qiang Yang created SPARK-43221:
--

 Summary: Executor obtained error information 
 Key: SPARK-43221
 URL: https://issues.apache.org/jira/browse/SPARK-43221
 Project: Spark
  Issue Type: Bug
  Components: Block Manager
Affects Versions: 3.3.0, 3.2.0, 3.1.1
Reporter: Qiang Yang


Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both 
executors, with one copy in memory and one on disk, the executor may 
intermittently fail to obtain the block and throw an exception:

java.lang.ArrayIndexOutOfBoundsException: 0

    at 
org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43197) Clean up the code written for compatibility with Hadoop 2

2023-04-20 Thread GridGain Integration (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714647#comment-17714647
 ] 

GridGain Integration commented on SPARK-43197:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/40860

> Clean up the code written for compatibility with Hadoop 2
> -
>
> Key: SPARK-43197
> URL: https://issues.apache.org/jira/browse/SPARK-43197
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL, YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> SPARK-42452 removed support for Hadoop 2, so we can clean up the code written 
> for compatibility with Hadoop 2 to make it more concise.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43215) Remove `ResourceRequestHelper#isYarnResourceTypesAvailable`

2023-04-20 Thread GridGain Integration (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714646#comment-17714646
 ] 

GridGain Integration commented on SPARK-43215:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40876

> Remove `ResourceRequestHelper#isYarnResourceTypesAvailable`
> ---
>
> Key: SPARK-43215
> URL: https://issues.apache.org/jira/browse/SPARK-43215
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43215) Remove `ResourceRequestHelper#isYarnResourceTypesAvailable`

2023-04-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-43215.
--
Resolution: Duplicate

> Remove `ResourceRequestHelper#isYarnResourceTypesAvailable`
> ---
>
> Key: SPARK-43215
> URL: https://issues.apache.org/jira/browse/SPARK-43215
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43113) Codegen error when full outer join's bound condition has multiple references to the same stream-side column

2023-04-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714645#comment-17714645
 ] 

Hudson commented on SPARK-43113:


User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/40881

> Codegen error when full outer join's bound condition has multiple references 
> to the same stream-side column
> ---
>
> Key: SPARK-43113
> URL: https://issues.apache.org/jira/browse/SPARK-43113
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
> Fix For: 3.4.1, 3.5.0
>
>
> Example # 1 (sort merge join):
> {noformat}
> create or replace temp view v1 as
> select * from values
> (1, 1),
> (2, 2),
> (3, 1)
> as v1(key, value);
> create or replace temp view v2 as
> select * from values
> (1, 22, 22),
> (3, -1, -1),
> (7, null, null)
> as v2(a, b, c);
> select *
> from v1
> full outer join v2
> on key = a
> and value > b
> and value > c;
> {noformat}
> The join's generated code causes the following compilation error:
> {noformat}
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 277, Column 9: Redefinition of local variable "smj_isNull_7"
> {noformat}
> Example #2 (shuffle hash join):
> {noformat}
> select /*+ SHUFFLE_HASH(v2) */ *
> from v1
> full outer join v2
> on key = a
> and value > b
> and value > c;
> {noformat}
> The shuffle hash join's generated code causes the following compilation error:
> {noformat}
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 174, Column 5: Redefinition of local variable "shj_value_1" 
> {noformat}
> With default configuration, both queries end up succeeding, since Spark falls 
> back to running each query with whole-stage codegen disabled.
> The issue happens only when the join's bound condition refers to the same 
> stream-side column more than once.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression

2023-04-20 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714644#comment-17714644
 ] 

Jia Fan commented on SPARK-43220:
-

Maybe the test itself has a problem, or `InMemoryTable` does not support this; 
I will test again to confirm.

> INSERT INTO REPLACE statement can't support WHERE with bool_expression
> --
>
> Key: SPARK-43220
> URL: https://issues.apache.org/jira/browse/SPARK-43220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-04-20-23-40-25-212.png
>
>
> {code:java}
> sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
> sql("CREATE TABLE persons2 (name string,address String,ssn int) USING 
> parquet")
> sql("INSERT INTO TABLE persons VALUES " +
> "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
> "('Eddie Davis','245 Market St, Milpitas',345678901)")
> sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
> Cupertino', 432795921)")
> sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM 
> persons2")
> sql("SELECT * FROM persons").show(){code}
> When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
> at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.
> !image-2023-04-20-23-40-25-212.png|width=795,height=152!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43216) Refactor `ResourceRequestHelper ` to no longer use reflection

2023-04-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-43216.
--
Resolution: Duplicate

> Refactor `ResourceRequestHelper ` to no longer use reflection
> -
>
> Key: SPARK-43216
> URL: https://issues.apache.org/jira/browse/SPARK-43216
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression

2023-04-20 Thread Jia Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Fan updated SPARK-43220:

Description: 
{code:java}
sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet")
sql("INSERT INTO TABLE persons VALUES " +
"('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
"('Eddie Davis','245 Market St, Milpitas',345678901)")

sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
Cupertino', 432795921)")
sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM persons2")
sql("SELECT * FROM persons").show(){code}

When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.

!image-2023-04-20-23-40-25-212.png|width=795,height=152!

  was:
sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet")
sql("INSERT INTO TABLE persons VALUES " +
"('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
"('Eddie Davis','245 Market St, Milpitas',345678901)")
sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
Cupertino', 432795921)")
sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM persons2")
sql("SELECT * FROM persons").show()
 
When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.

!image-2023-04-20-23-40-25-212.png|width=795,height=152!


> INSERT INTO REPLACE statement can't support WHERE with bool_expression
> --
>
> Key: SPARK-43220
> URL: https://issues.apache.org/jira/browse/SPARK-43220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-04-20-23-40-25-212.png
>
>
> {code:java}
> sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
> sql("CREATE TABLE persons2 (name string,address String,ssn int) USING 
> parquet")
> sql("INSERT INTO TABLE persons VALUES " +
> "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
> "('Eddie Davis','245 Market St, Milpitas',345678901)")
> sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
> Cupertino', 432795921)")
> sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM 
> persons2")
> sql("SELECT * FROM persons").show(){code}
> When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
> at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.
> !image-2023-04-20-23-40-25-212.png|width=795,height=152!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression

2023-04-20 Thread Jia Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Fan updated SPARK-43220:

Description: 
sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet")
sql("INSERT INTO TABLE persons VALUES " +
"('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
"('Eddie Davis','245 Market St, Milpitas',345678901)")
sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
Cupertino', 432795921)")
sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM persons2")
sql("SELECT * FROM persons").show()
 
When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.

!image-2023-04-20-23-40-25-212.png|width=795,height=152!

  was:
sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet")
sql("INSERT INTO TABLE persons VALUES " +
"('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
"('Eddie Davis','245 Market St, Milpitas',345678901)")
sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
Cupertino', 432795921)")
sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM persons2")
sql("SELECT * FROM persons").show()
 
When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.


> INSERT INTO REPLACE statement can't support WHERE with bool_expression
> --
>
> Key: SPARK-43220
> URL: https://issues.apache.org/jira/browse/SPARK-43220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-04-20-23-40-25-212.png
>
>
> sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
> sql("CREATE TABLE persons2 (name string,address String,ssn int) USING 
> parquet")
> sql("INSERT INTO TABLE persons VALUES " +
> "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
> "('Eddie Davis','245 Market St, Milpitas',345678901)")
> sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
> Cupertino', 432795921)")
> sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM 
> persons2")
> sql("SELECT * FROM persons").show()
>  
> When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
> at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.
> !image-2023-04-20-23-40-25-212.png|width=795,height=152!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression

2023-04-20 Thread Jia Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Fan updated SPARK-43220:

Attachment: image-2023-04-20-23-40-25-212.png

> INSERT INTO REPLACE statement can't support WHERE with bool_expression
> --
>
> Key: SPARK-43220
> URL: https://issues.apache.org/jira/browse/SPARK-43220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-04-20-23-40-25-212.png
>
>
> sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
> sql("CREATE TABLE persons2 (name string,address String,ssn int) USING 
> parquet")
> sql("INSERT INTO TABLE persons VALUES " +
> "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
> "('Eddie Davis','245 Market St, Milpitas',345678901)")
> sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
> Cupertino', 432795921)")
> sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM 
> persons2")
> sql("SELECT * FROM persons").show()
>  
> When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
> at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression

2023-04-20 Thread Jia Fan (Jira)
Jia Fan created SPARK-43220:
---

 Summary: INSERT INTO REPLACE statement can't support WHERE with 
bool_expression
 Key: SPARK-43220
 URL: https://issues.apache.org/jira/browse/SPARK-43220
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Jia Fan


sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet")
sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet")
sql("INSERT INTO TABLE persons VALUES " +
"('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," +
"('Eddie Davis','245 Market St, Milpitas',345678901)")
sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, 
Cupertino', 432795921)")
sql("INSERT INTO persons REPLACE WHERE ssn =  123456789 SELECT * FROM persons2")
sql("SELECT * FROM persons").show()
 
When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported 
at the moment; neither `WHERE ssn = 123456789` nor `WHERE FALSE` is supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43219) Website can't find INSERT INTO REPLACE Statement

2023-04-20 Thread Jia Fan (Jira)
Jia Fan created SPARK-43219:
---

 Summary: Website can't find INSERT INTO REPLACE Statement
 Key: SPARK-43219
 URL: https://issues.apache.org/jira/browse/SPARK-43219
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 3.4.0
Reporter: Jia Fan


The `INSERT INTO REPLACE` statement was added in [SPARK-40956], but it can't be 
found on the website.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43218) Support "ESCAPE BY" in SparkScriptTransformationExec

2023-04-20 Thread jiang13021 (Jira)
jiang13021 created SPARK-43218:
--

 Summary: Support "ESCAPE BY" in SparkScriptTransformationExec
 Key: SPARK-43218
 URL: https://issues.apache.org/jira/browse/SPARK-43218
 Project: Spark
  Issue Type: Wish
  Components: SQL
Affects Versions: 3.4.0, 3.3.0, 3.2.0
Reporter: jiang13021


If I don't set `spark.sql.catalogImplementation=hive`, I can't use "SELECT 
TRANSFORM" with "ESCAPE BY". Although HiveScriptTransform doesn't implement 
ESCAPE BY either, I can use RowFormatSerde to achieve the same ability.

 

In fact, HiveScriptTransform doesn't need to connect to the Hive Metastore: I can 
use reflection to forcibly call HiveScriptTransformationExec without connecting 
to the Hive Metastore, and it works properly. Maybe HiveScriptTransform could be 
made more generic.
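For reference, a sketch of the kind of query the reporter wants to run without a
Hive catalog (the table name and the `cat` script are illustrative):

{code:java}
// SELECT TRANSFORM with an escape character in the delimited row format,
// piping the rows through an external command.
sql("""
  SELECT TRANSFORM (key, value)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      ESCAPED BY '\\'
      LINES TERMINATED BY '\n'
    USING 'cat' AS (key, value)
  FROM kv_table
""")
{code}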



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43152) User-defined output metadata path (_spark_metadata)

2023-04-20 Thread Jacek Laskowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski updated SPARK-43152:

Summary: User-defined output metadata path (_spark_metadata)  (was: 
Parametrisable output metadata path (_spark_metadata))

> User-defined output metadata path (_spark_metadata)
> ---
>
> Key: SPARK-43152
> URL: https://issues.apache.org/jira/browse/SPARK-43152
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Wojciech Indyk
>Priority: Major
>
> Currently, the path of the output checkpoint metadata is hardcoded: the metadata 
> is saved under the output path in the _spark_metadata folder. This is a constraint 
> on the structure of paths that could easily be relaxed by making the output 
> metadata path parametrisable. It would help with issues like [changing the output 
> directory of a Spark streaming 
> job|https://kb.databricks.com/en_US/streaming/file-sink-streaming], [two jobs 
> writing to the same output 
> path|https://issues.apache.org/jira/browse/SPARK-30542] or [partition 
> discovery|https://stackoverflow.com/questions/61904732/is-it-possible-to-change-location-of-spark-metadata-folder-in-spark-structured/61905158].
> It would also help with separating metadata from data in the path structure.
> The main target of the change is the getMetadataLogPath method in FileStreamSink. 
> It has access to sqlConf, so this method can override the default 
> _spark_metadata path if one is defined in config. Introducing a parametrised 
> metadata path requires reconsidering the meaning of the hasMetadata method in 
> FileStreamSink.
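> A minimal sketch of the proposed change, assuming a new (hypothetical) SQL config 
> key for the metadata location; getMetadataLogPath would fall back to the current 
> default when the option is unset:
> {code:java}
> import org.apache.hadoop.fs.Path
> import org.apache.spark.sql.execution.streaming.FileStreamSink
> import org.apache.spark.sql.internal.SQLConf
> 
> // "spark.sql.streaming.fileSink.metadataDir" is an illustrative key, not a real config.
> def getMetadataLogPath(outputPath: Path, sqlConf: SQLConf): Path = {
>   val configured = sqlConf.getConfString("spark.sql.streaming.fileSink.metadataDir", "")
>   if (configured.nonEmpty) new Path(configured)          // user-defined metadata location
>   else new Path(outputPath, FileStreamSink.metadataDir)  // default: <output>/_spark_metadata
> }
> {code}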



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField

2023-04-20 Thread Johan Lasperas (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Lasperas updated SPARK-43217:
---
Description: 
[StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
 is unable to reach nested fields below two directly nested maps or arrays. 
Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
exception if the child is not a struct.

The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
`a`.`element`.`element` is not a struct.', even though the access path is 
valid:
{code:java}
val schema = new StructType()
  .add("a", ArrayType(ArrayType(
    new StructType().add("i", "int"))))
findNestedField(Seq("a", "element", "element", "i"), schema) {code}
 

  was:
[StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
 is unable to reach nested fields below two directly nested maps or arrays. 
Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
exception if the child is not a struct.

The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
`a`.`element`.`element` is not a struct.', even though the access path is 
valid:
{code:java}
val schema = new StructType()
  .add("a", ArrayType(ArrayType(
    new StructType().add("i", "int"))))
findNestedField(Seq("a", "element", "element", "i"), schema) {code}
 


> Correctly recurse into maps of maps and arrays of arrays in 
> StructType.findNestedField
> --
>
> Key: SPARK-43217
> URL: https://issues.apache.org/jira/browse/SPARK-43217
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Johan Lasperas
>Priority: Minor
>
> [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
>  is unable to reach nested fields below two directly nested maps or arrays. 
> Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
> exception if the child is not a struct.
> The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
> `a`.`element`.`element` is not a struct.', even though the access path is 
> valid:
> {code:java}
> val schema = new StructType()
>   .add("a", ArrayType(ArrayType(
>     new StructType().add("i", "int"))))
> findNestedField(Seq("a", "element", "element", "i"), schema) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField

2023-04-20 Thread Johan Lasperas (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Lasperas updated SPARK-43217:
---
Description: 
[StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
 is unable to reach nested fields below two directly nested maps or arrays. 
Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
exception if the child is not a struct.

The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
`a`.`element`.`element` is not a struct.', even though the access path is 
valid:
{code:java}
val schema = new StructType()
  .add("a", ArrayType(ArrayType(
    new StructType().add("i", "int"))))
findNestedField(Seq("a", "element", "element", "i"), schema) {code}
 

  was:
[StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
 is unable to reach nested fields below two directly nested maps or arrays. 
Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
exception if the child is not a struct.

The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
`a`.`element`.`element` is not a struct.', even though the access path is valid:
{code:java}
val schema = new StructType()
  .add("a", ArrayType(ArrayType(
    new StructType().add("i", "int"))))
findNestedField(Seq("a", "element", "element", "i"), schema) {code}
 


> Correctly recurse into maps of maps and arrays of arrays in 
> StructType.findNestedField
> --
>
> Key: SPARK-43217
> URL: https://issues.apache.org/jira/browse/SPARK-43217
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Johan Lasperas
>Priority: Minor
>
> [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
>  is unable to reach nested fields below two directly nested maps or arrays. 
> Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
> exception if the child is not a struct.
> The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
> `a`.`element`.`element` is not a struct.', even though the access path is 
> valid:
> {code:java}
> val schema = new StructType()
>   .add("a", ArrayType(ArrayType(
>     new StructType().add("i", "int"))))
> findNestedField(Seq("a", "element", "element", "i"), schema) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField

2023-04-20 Thread Johan Lasperas (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Lasperas updated SPARK-43217:
---
Description: 
[StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
 is unable to reach nested fields below two directly nested maps or arrays. 
Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
exception if the child is not a struct.

The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
`a`.`element`.`element` is not a struct.', even though the access path is valid:
{code:java}
val schema = new StructType()
  .add("a", ArrayType(ArrayType(
    new StructType().add("i", "int"))))
findNestedField(Seq("a", "element", "element", "i"), schema) {code}
 

  was:
[StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
 is unable to reach nested fields below two directly nested maps or arrays. 
Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
exception if the child is not a struct.

The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
`a`.`element`.`element` is not a struct.', even though the access path is valid:

```
val schema = new StructType()
  .add("a", ArrayType(ArrayType(
    new StructType().add("i", "int"))))
findNestedField(Seq("a", "element", "element", "i"), schema)
```


> Correctly recurse into maps of maps and arrays of arrays in 
> StructType.findNestedField
> --
>
> Key: SPARK-43217
> URL: https://issues.apache.org/jira/browse/SPARK-43217
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Johan Lasperas
>Priority: Minor
>
> [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
>  is unable to reach nested fields below two directly nested maps or arrays. 
> Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
> exception if the child is not a struct.
> The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
> `a`.`element`.`element` is not a struct.', even though the access path is 
> valid:
> {code:java}
> val schema = new StructType()
>   .add("a", ArrayType(ArrayType(
>     new StructType().add("i", "int"))))
> findNestedField(Seq("a", "element", "element", "i"), schema) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField

2023-04-20 Thread Johan Lasperas (Jira)
Johan Lasperas created SPARK-43217:
--

 Summary: Correctly recurse into maps of maps and arrays of arrays 
in StructType.findNestedField
 Key: SPARK-43217
 URL: https://issues.apache.org/jira/browse/SPARK-43217
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Johan Lasperas


[StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
 is unable to reach nested fields below two directly nested maps or arrays. 
Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
exception if the child is not a struct.

The following throws 'Field name `a`.`element`.`element`.`i` is invalid: 
`a`.`element`.`element` is not a struct.', even though the access path is valid:

```
val schema = new StructType()
  .add("a", ArrayType(ArrayType(
    new StructType().add("i", "int"))))
findNestedField(Seq("a", "element", "element", "i"), schema)
```
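A hedged sketch of the recursion the fix needs: treat the synthetic field names
"element", "key" and "value" as steps into arrays and maps instead of requiring a
struct at every level (the method name and shape here are illustrative, not the
actual patch):

```
import org.apache.spark.sql.types._

def findNested(names: Seq[String], dt: DataType): Option[DataType] =
  (names, dt) match {
    case (Seq(), t) => Some(t)
    case (Seq("element", rest @ _*), ArrayType(et, _)) => findNested(rest, et)
    case (Seq("key", rest @ _*), MapType(kt, _, _))    => findNested(rest, kt)
    case (Seq("value", rest @ _*), MapType(_, vt, _))  => findNested(rest, vt)
    case (Seq(name, rest @ _*), st: StructType) =>
      st.fields.find(_.name == name).flatMap(f => findNested(rest, f.dataType))
    case _ => None
  }
```

With this shape, the example path Seq("a", "element", "element", "i") resolves to
Some(IntegerType) for the schema above.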



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36058) Support replicasets/job API

2023-04-20 Thread Hu Ziqian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714556#comment-17714556
 ] 

Hu Ziqian commented on SPARK-36058:
---

hi [~holden], I have a question about the statefulsetPodsAllocator.

I understand that with dynamic allocation, the driver will delete executors that 
have been idle beyond the timeout. For example, suppose we have executors 0 to 9 
and executor 5 is idle: the driver will delete executor 5 and adjust the target 
pod number from 10 to 9. But with a stateful set, Kubernetes will try to delete 
the pod with the highest index, for example executor 9.

So there is a conflict between deletion by the driver and deletion by the 
Kubernetes controller manager.

I want to know whether there is any limitation when using the statefulset pod 
allocator. If not, how can the conflict above be avoided?
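For reference, a minimal sketch of the configuration in question: enabling the
StatefulSet-based allocator together with dynamic allocation (config keys as
documented for Spark 3.3+):

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.kubernetes.allocation.pods.allocator", "statefulset") // default is "direct"
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()
{code}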

> Support replicasets/job API
> ---
>
> Key: SPARK-36058
> URL: https://issues.apache.org/jira/browse/SPARK-36058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.3.0
>
>
> Volcano & Yunikorn both support scheduling individual pods, but they also 
> support higher-level abstractions, similar to vanilla Kube replicasets, 
> which we can use to improve scheduling performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43216) Refactor `ResourceRequestHelper ` to no longer use reflection

2023-04-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-43216:


 Summary: Refactor `ResourceRequestHelper ` to no longer use 
reflection
 Key: SPARK-43216
 URL: https://issues.apache.org/jira/browse/SPARK-43216
 Project: Spark
  Issue Type: Sub-task
  Components: YARN
Affects Versions: 3.5.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43215) Remove `ResourceRequestHelper#isYarnResourceTypesAvailable`

2023-04-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-43215:


 Summary: Remove 
`ResourceRequestHelper#isYarnResourceTypesAvailable`
 Key: SPARK-43215
 URL: https://issues.apache.org/jira/browse/SPARK-43215
 Project: Spark
  Issue Type: Sub-task
  Components: YARN
Affects Versions: 3.5.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43214) Post driver-side metrics for LocalTableScanExec/CommandResultExec

2023-04-20 Thread Fu Chen (Jira)
Fu Chen created SPARK-43214:
---

 Summary: Post driver-side metrics for 
LocalTableScanExec/CommandResultExec
 Key: SPARK-43214
 URL: https://issues.apache.org/jira/browse/SPARK-43214
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Fu Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43184) Resume using enumeration to compare `NodeState.DECOMMISSIONING`

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43184.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40846
[https://github.com/apache/spark/pull/40846]

> Resume using enumeration to compare  `NodeState.DECOMMISSIONING`
> 
>
> Key: SPARK-43184
> URL: https://issues.apache.org/jira/browse/SPARK-43184
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43184) Resume using enumeration to compare `NodeState.DECOMMISSIONING`

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43184:
-

Assignee: Yang Jie

> Resume using enumeration to compare  `NodeState.DECOMMISSIONING`
> 
>
> Key: SPARK-43184
> URL: https://issues.apache.org/jira/browse/SPARK-43184
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43213) Add `DataFrame.offset` to PySpark

2023-04-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714501#comment-17714501
 ] 

ASF GitHub Bot commented on SPARK-43213:


User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40873

> Add `DataFrame.offset` to PySpark
> -
>
> Key: SPARK-43213
> URL: https://issues.apache.org/jira/browse/SPARK-43213
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39203) Fix remote table location based on database location

2023-04-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714499#comment-17714499
 ] 

ASF GitHub Bot commented on SPARK-39203:


User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40871

> Fix remote table location based on database location
> 
>
> Key: SPARK-39203
> URL: https://issues.apache.org/jira/browse/SPARK-39203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> We have HDFS and Hive on cluster A. We have Spark on cluster B and need to 
> read data from cluster A. The table location is incorrect:
> {noformat}
> spark-sql> desc formatted  default.test_table;
> fas_acct_id   decimal(18,0)
> fas_acct_cd   string
> cmpny_cd  string
> entity_id string
> cre_date  date
> cre_user  string
> upd_date  timestamp
> upd_user  string
> # Detailed Table Information
> Database default
> Table test_table
> Type  EXTERNAL
> Provider  parquet
> Statistics25310025737 bytes
> Location  /user/hive/warehouse/test_table
> Serde Library 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat   
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat  
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Storage Properties[compression=snappy]
> spark-sql> desc database default;
> Namespace Namedefault
> Comment
> Location  viewfs://clusterA/user/hive/warehouse/
> Owner hive_dba
> {noformat}
> The correct table location should be 
> viewfs://clusterA/user/hive/warehouse/test_table.
>  
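> A hedged sketch of the intended fix: a schemeless table path should borrow the 
> scheme and authority of the database location (names are illustrative):
> {code:java}
> import java.net.URI
> import org.apache.hadoop.fs.Path
> 
> def qualifyTableLocation(tableLocation: String, dbLocationUri: URI): URI = {
>   val raw = new Path(tableLocation).toUri
>   if (raw.getScheme != null) raw   // already fully qualified
>   else new URI(dbLocationUri.getScheme, dbLocationUri.getAuthority,
>     raw.getPath, null, null)       // e.g. viewfs://clusterA/user/hive/warehouse/test_table
> }
> {code}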



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43213) Add `DataFrame.offset` to PySpark

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-43213:
--
Issue Type: New Feature  (was: Improvement)

> Add `DataFrame.offset` to PySpark
> -
>
> Key: SPARK-43213
> URL: https://issues.apache.org/jira/browse/SPARK-43213
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43169) Update mima's previousSparkVersion to 3.4.0

2023-04-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714498#comment-17714498
 ] 

ASF GitHub Bot commented on SPARK-43169:


User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40862

> Update mima's previousSparkVersion to 3.4.0
> ---
>
> Key: SPARK-43169
> URL: https://issues.apache.org/jira/browse/SPARK-43169
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43213) Add `DataFrame.offset` to PySpark

2023-04-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-43213:
-

 Summary: Add `DataFrame.offset` to PySpark
 Key: SPARK-43213
 URL: https://issues.apache.org/jira/browse/SPARK-43213
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43183) Move update event on idleness in streaming query listener to separate callback method

2023-04-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-43183.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40845
[https://github.com/apache/spark/pull/40845]

> Move update event on idleness in streaming query listener to separate 
> callback method
> -
>
> Key: SPARK-43183
> URL: https://issues.apache.org/jira/browse/SPARK-43183
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.5.0
>
>
> People have been having a lot of confusion about the update event on idleness; 
> it’s not only a matter of understanding but also the source of various 
> kinds of complaints. For example, since we give the latest batch ID for the 
> update event on idleness, a listener implementation that blindly performs 
> upserts based on batch ID risks losing metrics.
> This also complicates the logic, because we have to remember the execution of 
> the previous batch, which is arguably not necessary.
> Because of this, we’d be better off moving the idle event out of the progress 
> update event and having a separate callback method for it.
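> A minimal sketch of what a listener could look like with a dedicated idle 
> callback; the callback name below is an assumption, not the final API:
> {code:java}
> import org.apache.spark.sql.streaming.StreamingQueryListener
> import org.apache.spark.sql.streaming.StreamingQueryListener._
> 
> class MetricsListener extends StreamingQueryListener {
>   override def onQueryStarted(e: QueryStartedEvent): Unit = ()
>   override def onQueryProgress(e: QueryProgressEvent): Unit = {
>     // Only real batches would arrive here, so upserting by batchId is safe.
>     upsert(e.progress.batchId, e.progress.json)
>   }
>   // Hypothetical new callback: idleness no longer repeats the last batch ID.
>   def onQueryIdle(queryId: java.util.UUID): Unit = ()
>   override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
>   private def upsert(batchId: Long, progressJson: String): Unit = ()
> }
> {code}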



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43183) Move update event on idleness in streaming query listener to separate callback method

2023-04-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-43183:


Assignee: Jungtaek Lim

> Move update event on idleness in streaming query listener to separate 
> callback method
> -
>
> Key: SPARK-43183
> URL: https://issues.apache.org/jira/browse/SPARK-43183
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> People have been having a lot of confusion about the update event on idleness; 
> it’s not only a matter of understanding but also the source of various 
> kinds of complaints. For example, since we give the latest batch ID for the 
> update event on idleness, a listener implementation that blindly performs 
> upserts based on batch ID risks losing metrics.
> This also complicates the logic, because we have to remember the execution of 
> the previous batch, which is arguably not necessary.
> Because of this, we’d be better off moving the idle event out of the progress 
> update event and having a separate callback method for it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43207) Add helper functions for extract value from literal expression

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43207:
-

Assignee: Ruifeng Zheng

> Add helper functions for extract value from literal expression
> --
>
> Key: SPARK-43207
> URL: https://issues.apache.org/jira/browse/SPARK-43207
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43207) Add helper functions for extract value from literal expression

2023-04-20 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43207.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40863
[https://github.com/apache/spark/pull/40863]

> Add helper functions for extract value from literal expression
> --
>
> Key: SPARK-43207
> URL: https://issues.apache.org/jira/browse/SPARK-43207
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43212) Migrate Structured Streaming errors into error class

2023-04-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43212:
---

 Summary: Migrate Structured Streaming errors into error class
 Key: SPARK-43212
 URL: https://issues.apache.org/jira/browse/SPARK-43212
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Structured Streaming
Affects Versions: 3.5.0
Reporter: Haejoon Lee


Migrate Structured Streaming errors into error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43190) ListQuery.childOutput should be consistent with child output

2023-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-43190.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40851
[https://github.com/apache/spark/pull/40851]

> ListQuery.childOutput should be consistent with child output
> 
>
> Key: SPARK-43190
> URL: https://issues.apache.org/jira/browse/SPARK-43190
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43190) ListQuery.childOutput should be consistent with child output

2023-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-43190:
---

Assignee: Wenchen Fan

> ListQuery.childOutput should be consistent with child output
> 
>
> Key: SPARK-43190
> URL: https://issues.apache.org/jira/browse/SPARK-43190
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43211) Remove Hadoop2 support in IsolatedClientLoader

2023-04-20 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-43211:
-

 Summary: Remove Hadoop2 support in IsolatedClientLoader
 Key: SPARK-43211
 URL: https://issues.apache.org/jira/browse/SPARK-43211
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43210) Introduce PySparkAssertionError

2023-04-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43210:
---

 Summary: Introduce PySparkAssertionError
 Key: SPARK-43210
 URL: https://issues.apache.org/jira/browse/SPARK-43210
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


Introduce PySparkAssertionError



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43209) Migrate Expression errors into error class

2023-04-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43209:
---

 Summary: Migrate Expression errors into error class
 Key: SPARK-43209
 URL: https://issues.apache.org/jira/browse/SPARK-43209
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


Migrate Expression errors into error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


