[jira] [Commented] (SPARK-40100) Add Int128 type

2022-08-15 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580084#comment-17580084
 ] 

jiaan.geng commented on SPARK-40100:


I'm working on it.

> Add Int128 type
> ---
>
> Key: SPARK-40100
> URL: https://issues.apache.org/jira/browse/SPARK-40100
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Extend Catalyst's type system by a new type:
> Int128Type represents the Int128



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40100) Add Int128 type

2022-08-15 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-40100:
--

 Summary: Add Int128 type
 Key: SPARK-40100
 URL: https://issues.apache.org/jira/browse/SPARK-40100
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng


Extend Catalyst's type system by a new type:
Int128Type represents the Int128



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40099) Merge adjacent CaseWhen branches if their values are the same

2022-08-15 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-40099:
---

 Summary: Merge adjacent CaseWhen branches if their values are the 
same
 Key: SPARK-40099
 URL: https://issues.apache.org/jira/browse/SPARK-40099
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yuming Wang


For example:
{code:sql}
  CASE
WHEN f1.buyer_id IS NOT NULL THEN 1
WHEN f2.buyer_id IS NOT NULL THEN 1
ELSE 0
  END
{code}

The expected result:
{code:sql}
  CASE
WHEN f1.buyer_id IS NOT NULL or f2.buyer_id IS NOT NULL 
THEN 1
ELSE 0
  END
{code}
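
A rough sketch of how such a merge could be expressed as a Catalyst optimizer
rule is shown below (illustrative only, not necessarily the rule this ticket
will add; a real rule would also have to verify that the OR-ed conditions are
deterministic):
{code:java}
import org.apache.spark.sql.catalyst.expressions.{CaseWhen, Expression, Or}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Illustrative sketch: fold adjacent CASE WHEN branches whose values are
// semantically equal into a single branch with the conditions OR-ed together.
object MergeAdjacentCaseWhenBranches extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
    case cw @ CaseWhen(branches, elseValue) =>
      val merged = branches.foldLeft(List.empty[(Expression, Expression)]) {
        case ((prevCond, prevValue) :: rest, (cond, value))
            if prevValue.semanticEquals(value) =>
          // Same value as the previous branch: OR the conditions together.
          (Or(prevCond, cond), prevValue) :: rest
        case (acc, branch) => branch :: acc
      }.reverse
      if (merged.size < branches.size) CaseWhen(merged, elseValue) else cw
  }
}
{code}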




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40095) sc.uiWebUrl should not throw exception when webui is disabled

2022-08-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40095.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37530
[https://github.com/apache/spark/pull/37530]

> sc.uiWebUrl should not throw exception when webui is disabled
> -
>
> Key: SPARK-40095
> URL: https://issues.apache.org/jira/browse/SPARK-40095
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40098) Format error messages in the Thrift Server

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40098:


Assignee: Max Gekk  (was: Apache Spark)

> Format error messages in the Thrift Server
> --
>
> Key: SPARK-40098
> URL: https://issues.apache.org/jira/browse/SPARK-40098
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> # Introduce a config to control the format of error messages: plain text and 
> JSON
> # Modify the Thrift Server to output errors from Spark SQL according to the 
> config



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40098) Format error messages in the Thrift Server

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40098:


Assignee: Apache Spark  (was: Max Gekk)

> Format error messages in the Thrift Server
> --
>
> Key: SPARK-40098
> URL: https://issues.apache.org/jira/browse/SPARK-40098
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> # Introduce a config to control the format of error messages: plain text and 
> JSON
> # Modify the Thrift Server to output errors from Spark SQL according to the 
> config



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40098) Format error messages in the Thrift Server

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580074#comment-17580074
 ] 

Apache Spark commented on SPARK-40098:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37520

> Format error messages in the Thrift Server
> --
>
> Key: SPARK-40098
> URL: https://issues.apache.org/jira/browse/SPARK-40098
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> # Introduce a config to control the format of error messages: plain text and 
> JSON
> # Modify the Thrift Server to output errors from Spark SQL according to the 
> config



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40098) Format error messages in the Thrift Server

2022-08-15 Thread Max Gekk (Jira)
Max Gekk created SPARK-40098:


 Summary: Format error messages in the Thrift Server
 Key: SPARK-40098
 URL: https://issues.apache.org/jira/browse/SPARK-40098
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk


# Introduce a config to control the format of error messages: plain text and 
JSON
# Modify the Thrift Server to output errors from Spark SQL according to the 
config
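
For illustration only (hypothetical names, not Spark's actual config or 
classes), the idea is that the same error gets rendered either as plain text 
or as JSON depending on the configured format:
{code:java}
// Hypothetical sketch: render an error either as plain text or as JSON,
// selected by a format setting. Not Spark's actual implementation.
def formatError(errorClass: String, message: String, asJson: Boolean): String =
  if (asJson) s"""{"errorClass": "$errorClass", "message": "$message"}"""
  else s"[$errorClass] $message"
{code}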



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

2022-08-15 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580073#comment-17580073
 ] 

Hyukjin Kwon commented on SPARK-40063:
--

Just to clarify, does that only happen in a single column, or in other columns 
too? Spark itself doesn't guarantee the order of rows. If you need to keep the 
natural order, you could try enabling {{compute.ordered_head}} and see what you 
get.

> pyspark.pandas .apply() changing rows ordering
> --
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
> Environment: Databricks Runtime 11.1
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>  Labels: Pandas, PySpark
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up changing the ordering of the column's rows.
> A command like this:
> {code:java}
> def example_func(df_col):
>   return df_col ** 2 
> df['col_to_apply_function'] = df.apply(lambda row: 
> example_func(row['col_to_apply_function']), axis=1) {code}
> A workaround is to assign the results to a new column instead of the same 
> one, but if the old column is dropped, the same error is produced.
> Setting one column as index also didn't work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40097) Support Int128 type

2022-08-15 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-40097:
---
Description: 
Spark SQL today supports the Decimal data type. The implementation of Spark 
Decimal holds a BigDecimal or Long value. Spark Decimal provides operators 
like +, -, *, /, % and so on. These operators rely heavily on the computational 
power of BigDecimal or Long itself. For ease of understanding, take + as an 
example; the implementation is shown below.

{code:java}
  def + (that: Decimal): Decimal = {
    if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == that.scale) {
      Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 1, scale)
    } else {
      Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
    }
  }
{code}

We can see that the + of Long will be called if the Spark Decimal holds a Long 
value; otherwise, the add of BigDecimal will be called if it holds a BigDecimal 
value. The other operators of Spark Decimal work in a similar way. Furthermore, 
the code shown above calls Decimal.apply to construct a new instance of Spark 
Decimal. As we know, the add operator of BigDecimal constructs a new instance 
of BigDecimal. So, if we call the + operator on a Spark Decimal that holds a 
Long value, Spark will construct a new instance of Decimal; otherwise, Spark 
will construct a new instance of BigDecimal and a new instance of Decimal.
Through rough analysis, we know:
1. The computational power of Spark Decimal may depend on BigDecimal.
2. The calculation operators of Spark Decimal create a lot of new instances of 
Decimal and may create a lot of new instances of BigDecimal.
If a large table has a field called 'colA whose type is Spark Decimal, the 
execution of SUM('colA) will involve the creation of a large number of Spark 
Decimal instances and BigDecimal instances. These Spark Decimal and BigDecimal 
instances lead to frequent garbage collection.
Int128 is a high-performance data type, about 2X~10X more efficient than Spark 
Decimal for typical operations. It uses a finite (128-bit) precision and can 
handle values up to decimal(38, X). The implementation of Int128 just uses two 
Long values to represent the high and low 64 bits of the 128 bits respectively. 
Int128 is lighter than Spark Decimal, reducing the cost of allocation and 
garbage collection.
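For illustration, a minimal sketch of such a two-Long representation and its 
addition (assumed names, not the proposed implementation itself) could look 
like:
{code:java}
// Illustrative sketch only: a 128-bit integer held as two Longs, where `low`
// is treated as an unsigned 64-bit word; overflow of `high` is not handled.
final case class Int128(high: Long, low: Long) {
  def +(that: Int128): Int128 = {
    val newLow = low + that.low
    // Unsigned overflow of the low word produces a carry into the high word.
    val carry = if (java.lang.Long.compareUnsigned(newLow, low) < 0) 1L else 0L
    Int128(high + that.high + carry, newLow)
  }
}
{code}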

h3. Milestone 1 – Spark Decimal equivalency (the new type Int128 meets or 
exceeds all functionality of the existing SQL Decimal):
* Add a new DataType implementation for Int128.
* Support Int128 in Dataset/UDF.
* Int128 literals
* Int128 arithmetic (e.g. Int128 + Int128, Int128 - Decimal)
* Decimal or math functions/operators: POWER, LOG, ROUND, etc.
* Cast to and from Int128, cast String/Decimal to Int128, cast Int128 to string 
(pretty printing)/Decimal, with SQL syntax to specify the types
* Support sorting Int128.

h3. Milestone 2 – Persistence:
 * Ability to create tables of type Int128
 * Ability to write to common file formats such as Parquet and JSON.
 * INSERT, SELECT, UPDATE, MERGE
 * Discovery

h3. Milestone 3 – Client support
 * JDBC support
 * Hive Thrift server

h3. Milestone 4 – PySpark and Spark R integration
 * Python UDF can take and return Int128
 * DataFrame support

> Support Int128 type
> ---
>
> Key: SPARK-40097
> URL: https://issues.apache.org/jira/browse/SPARK-40097
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL today supports the Decimal data type. The implementation of Spark 
> Decimal holds a BigDecimal or Long value. Spark Decimal provides some 
> operators like +, -, *, /, % and so on. These operators rely heavily on the 
> computational power of BigDecimal or Long itself. For ease of understanding, 
> take the + as an example. The implementation shows below.
> {code:java}
>   def + (that: Decimal): Decimal = {
> if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == 
> that.scale) {
>   Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 
> 1, scale)
> } else {
>   Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
> }
>   }
> {code}
> We can see the + of Long will be called if Spark Decimal holds a Long value. 
> Otherwise,  the add of BigDecimal will be called if Spark Decimal holds a 
> BigDecimal value. The other operators of Spark Decimal adopt the similar way. 
> Furthermore, the code shown above calls Decimal.apply to construct a new 
> instance of Spark Decimal.  As we know, the add operator of BigDecimal 
> constructed a new instance of BigDecimal. So, if we call the + operator of 
> Spark Decimal who holds a Long value, Spark will construct a new instance of 
> Decimal. Otherwise, Spark wi

[jira] [Updated] (SPARK-40097) Support Int128 type

2022-08-15 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-40097:
---
Attachment: Benchmark for performance comparison between Int128 and Spark 
decimal.pdf

> Support Int128 type
> ---
>
> Key: SPARK-40097
> URL: https://issues.apache.org/jira/browse/SPARK-40097
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
> Attachments: Benchmark for performance comparison between Int128 and 
> Spark decimal.pdf
>
>
> Spark SQL today supports the Decimal data type. The implementation of Spark 
> Decimal holds a BigDecimal or Long value. Spark Decimal provides some 
> operators like +, -, *, /, % and so on. These operators rely heavily on the 
> computational power of BigDecimal or Long itself. For ease of understanding, 
> take the + as an example. The implementation shows below.
> {code:java}
>   def + (that: Decimal): Decimal = {
> if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == 
> that.scale) {
>   Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 
> 1, scale)
> } else {
>   Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
> }
>   }
> {code}
> We can see the + of Long will be called if Spark Decimal holds a Long value. 
> Otherwise,  the add of BigDecimal will be called if Spark Decimal holds a 
> BigDecimal value. The other operators of Spark Decimal adopt the similar way. 
> Furthermore, the code shown above calls Decimal.apply to construct a new 
> instance of Spark Decimal.  As we know, the add operator of BigDecimal 
> constructed a new instance of BigDecimal. So, if we call the + operator of 
> Spark Decimal who holds a Long value, Spark will construct a new instance of 
> Decimal. Otherwise, Spark will construct a new instance of BigDecimal and a 
> new instance of Decimal.
> Through rough analysis, we know:
> 1. The computational power of Spark Decimal may depend on BigDecimal.
> 2. The calculation operators of Spark Decimal create a lot of new instances 
> of Decimal and may create a lot of new instances of BigDecimal.
> If a large table has a field called 'colA whose type is Spark Decimal, the 
> execution of SUM('colA) will involve the creation of a large number of Spark 
> Decimal instances and BigDecimal instances. These Spark Decimal instances and 
> BigDecimal instances will lead to garbage collection frequently.
> Int128 is a high-performance data type about 2X~10X more efficient than Spark 
> Decimal for typical operations. It uses a finite (128 bit) precision and can 
> handle up to decimal(38, X). The implementation of Int128 just uses two Long 
> values to represent the high and low bits of 128 bits respectively. Int128 is 
> lighter than Spark Decimal, reduces the cost of new() and garbage collection.
> h3. Milestone 1 – Spark Decimal equivalency ( The new type Int128 meets or 
> exceeds all function of the existing SQL Decimal):
> * Add a new DataType implementation for Int128.
> * Support Int128 in Dataset/UDF.
> * Int128 literals
> * Int128 arithmetic(e.g. Int128 + Int128, Int128 - Decimal)
> * Decimal or Math functions/operators: POWER, LOG, Round, etc
> * Cast to and from Int128, cast String/Decimal to Int128, cast Int128 to 
> string (pretty printing)/Decimal, with the * * SQL syntax to specify the types
> * Support sorting Int128.
> h3. Milestone 2 – Persistence:
>  * Ability to create tables of type Int128
>  * Ability to write to common file formats such as Parquet and JSON.
>  * INSERT, SELECT, UPDATE, MERGE
>  * Discovery
> h3. Milestone 3 – Client support
>  * JDBC support
>  * Hive Thrift server
> h3. Milestone 4 – PySpark and Spark R integration
>  * Python UDF can take and return Int128
>  * DataFrame support



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40082) DAGScheduler may not schedule new stage when push-based shuffle is enabled

2022-08-15 Thread Min Shen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580070#comment-17580070
 ] 

Min Shen commented on SPARK-40082:
--

[~csingh] [~mridul] 

Want to bring your attention to this ticket. This seems to be an issue that we 
saw previously. Does upstream already have a fix for this?

> DAGScheduler may not schedule new stage when push-based shuffle is 
> enabled
> --
>
> Key: SPARK-40082
> URL: https://issues.apache.org/jira/browse/SPARK-40082
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.1.1
>Reporter: Penglei Shi
>Priority: Major
> Attachments: missParentStages.png, shuffleMergeFinalized.png, 
> submitMissingTasks.png
>
>
> With push-based shuffle enabled and speculative tasks present, a 
> shuffleMapStage will be resubmitted once a fetchFailed occurs; its parent 
> stages are then resubmitted first, which takes some time to compute. Before 
> the shuffleMapStage is resubmitted, all of its speculative tasks succeed and 
> register their map output, but the speculative task success events cannot 
> trigger shuffleMergeFinalized because the stage has already been removed 
> from runningStages.
> The stage is then resubmitted, but since the speculative tasks have already 
> registered map output there are no missing tasks to compute, so resubmitting 
> the stage will not trigger shuffleMergeFinalized either. Eventually the 
> stage's _shuffleMergedFinalized stays false.
> Then AQE will submit the next stages, which depend on the shuffleMapStage 
> that hit the fetchFailed. In getMissingParentStages this stage will be 
> marked as missing and resubmitted, but the next stages are only added to 
> waitingStages after this stage finishes, so they will not be submitted even 
> though the resubmission has completed.
> I have only hit this a few times in my production environment and it is 
> difficult to reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40091) is rocksdbjni required if not using streaming?

2022-08-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40091.
--
Resolution: Duplicate

> is rocksdbjni required if not using streaming?
> --
>
> Key: SPARK-40091
> URL: https://issues.apache.org/jira/browse/SPARK-40091
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Affects Versions: 3.3.0
>Reporter: t oo
>Priority: Major
>
> My Docker file is very big with PySpark.
> Can I remove this file below if I don't use Spark Streaming?
> 35M     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/rocksdbjni-6.20.3.jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40092) is breeze required if not using ML?

2022-08-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40092.
--
Resolution: Duplicate

> is breeze required if not using ML?
> ---
>
> Key: SPARK-40092
> URL: https://issues.apache.org/jira/browse/SPARK-40092
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Affects Versions: 3.3.0
>Reporter: t oo
>Priority: Major
>
> My Docker file is very big with PySpark.
> Can I remove these files below if I don't use ML?
> 14M     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/breeze_2.12-1.2.jar
> 5.9M    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/spark-mllib_2.12-3.3.0.jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40093) is kubernetes jar required if not using that executor?

2022-08-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40093.
--
Resolution: Won't Fix

> is kubernetes jar required if not using that executor?
> --
>
> Key: SPARK-40093
> URL: https://issues.apache.org/jira/browse/SPARK-40093
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Affects Versions: 3.3.0
>Reporter: t oo
>Priority: Major
>
> My Docker file is very big with PySpark.
> Can I remove these files below if I don't use the Kubernetes executor?
> 11M     total
> 4.0M    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-core-5.12.2.jar
> 840K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-client-5.12.2.jar
> 760K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-admissionregistration-5.12.2.jar
> 704K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-apiextensions-5.12.2.jar
> 640K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-autoscaling-5.12.2.jar
> 528K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-extensions-5.12.2.jar
> 516K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/spark-kubernetes_2.12-3.3.0.jar
> 456K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-networking-5.12.2.jar
> 436K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-apps-5.12.2.jar
> 364K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-storageclass-5.12.2.jar
> 336K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-policy-5.12.2.jar
> 264K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-flowcontrol-5.12.2.jar
> 244K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-batch-5.12.2.jar
> 192K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-discovery-5.12.2.jar
> 176K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-rbac-5.12.2.jar
> 160K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-node-5.12.2.jar
> 144K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-certificates-5.12.2.jar
> 104K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-events-5.12.2.jar
> 80K     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-metrics-5.12.2.jar
> 68K     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-scheduling-5.12.2.jar
> 48K     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-coordination-5.12.2.jar
> 20K     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-common-5.12.2.jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40093) is kubernetes jar required if not using that executor?

2022-08-15 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580069#comment-17580069
 ] 

Hyukjin Kwon commented on SPARK-40093:
--

[~toopt4] the profile and release distributions are matched with our regular 
Spark release. I think it would be better to raise a discussion on the mailing 
list before filing this as a JIRA.

Short answer: I don't think we'll do this, because such releases with the 
exclusions are over-complicated while the benefit (the size of the release) is 
rather trivial.

> is kubernetes jar required if not using that executor?
> --
>
> Key: SPARK-40093
> URL: https://issues.apache.org/jira/browse/SPARK-40093
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Affects Versions: 3.3.0
>Reporter: t oo
>Priority: Major
>
> My Docker file is very big with PySpark.
> Can I remove these files below if I don't use the Kubernetes executor?
> 11M     total
> 4.0M    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-core-5.12.2.jar
> 840K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-client-5.12.2.jar
> 760K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-admissionregistration-5.12.2.jar
> 704K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-apiextensions-5.12.2.jar
> 640K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-autoscaling-5.12.2.jar
> 528K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-extensions-5.12.2.jar
> 516K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/spark-kubernetes_2.12-3.3.0.jar
> 456K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-networking-5.12.2.jar
> 436K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-apps-5.12.2.jar
> 364K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-storageclass-5.12.2.jar
> 336K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-policy-5.12.2.jar
> 264K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-flowcontrol-5.12.2.jar
> 244K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-batch-5.12.2.jar
> 192K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-discovery-5.12.2.jar
> 176K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-rbac-5.12.2.jar
> 160K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-node-5.12.2.jar
> 144K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-certificates-5.12.2.jar
> 104K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-events-5.12.2.jar
> 80K     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-metrics-5.12.2.jar
> 68K     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-scheduling-5.12.2.jar
> 48K     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-coordination-5.12.2.jar
> 20K     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-common-5.12.2.jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40097) Support Int128 type

2022-08-15 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-40097:
--

 Summary: Support Int128 type
 Key: SPARK-40097
 URL: https://issues.apache.org/jira/browse/SPARK-40097
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40095) sc.uiWebUrl should not throw exception when webui is disabled

2022-08-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40095:


Assignee: Ruifeng Zheng

> sc.uiWebUrl should not throw exception when webui is disabled
> -
>
> Key: SPARK-40095
> URL: https://issues.apache.org/jira/browse/SPARK-40095
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40085) use INTERNAL_ERROR error class instead of IllegalStateException to indicate bugs

2022-08-15 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40085.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37524
[https://github.com/apache/spark/pull/37524]

> use INTERNAL_ERROR error class instead of IllegalStateException to indicate 
> bugs
> 
>
> Key: SPARK-40085
> URL: https://issues.apache.org/jira/browse/SPARK-40085
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40085) use INTERNAL_ERROR error class instead of IllegalStateException to indicate bugs

2022-08-15 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-40085:


Assignee: Wenchen Fan

> use INTERNAL_ERROR error class instead of IllegalStateException to indicate 
> bugs
> 
>
> Key: SPARK-40085
> URL: https://issues.apache.org/jira/browse/SPARK-40085
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40096:


Assignee: (was: Apache Spark)

> Finalize shuffle merge slow due to connection creation fails
> 
>
> Key: SPARK-40096
> URL: https://issues.apache.org/jira/browse/SPARK-40096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Priority: Major
>
> *How to reproduce this issue*
>  * Enable push based shuffle
>  * Remove some merger nodes before sending finalize RPCs
>  * Driver try to connect those merger shuffle services and send finalize RPC 
> one by one, each connection creation will timeout after 
> SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default)
>  
> We can send these RPCs in *shuffleMergeFinalizeScheduler*  thread pool and 
> handle the connection creation exception



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40096:


Assignee: Apache Spark

> Finalize shuffle merge slow due to connection creation fails
> 
>
> Key: SPARK-40096
> URL: https://issues.apache.org/jira/browse/SPARK-40096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Assignee: Apache Spark
>Priority: Major
>
> *How to reproduce this issue*
>  * Enable push based shuffle
>  * Remove some merger nodes before sending finalize RPCs
>  * Driver try to connect those merger shuffle services and send finalize RPC 
> one by one, each connection creation will timeout after 
> SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default)
>  
> We can send these RPCs in *shuffleMergeFinalizeScheduler*  thread pool and 
> handle the connection creation exception



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580059#comment-17580059
 ] 

Apache Spark commented on SPARK-40096:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/37533

> Finalize shuffle merge slow due to connection creation fails
> 
>
> Key: SPARK-40096
> URL: https://issues.apache.org/jira/browse/SPARK-40096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Priority: Major
>
> *How to reproduce this issue*
>  * Enable push based shuffle
>  * Remove some merger nodes before sending finalize RPCs
>  * Driver try to connect those merger shuffle services and send finalize RPC 
> one by one, each connection creation will timeout after 
> SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default)
>  
> We can send these RPCs in *shuffleMergeFinalizeScheduler*  thread pool and 
> handle the connection creation exception



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails

2022-08-15 Thread Wan Kun (Jira)
Wan Kun created SPARK-40096:
---

 Summary: Finalize shuffle merge slow due to connection creation 
fails
 Key: SPARK-40096
 URL: https://issues.apache.org/jira/browse/SPARK-40096
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Wan Kun


*How to reproduce this issue*
 * Enable push-based shuffle
 * Remove some merger nodes before sending finalize RPCs
 * The driver tries to connect to those merger shuffle services and sends the 
finalize RPCs one by one; each connection creation will time out after 
SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default)

 
We can send these RPCs in the *shuffleMergeFinalizeScheduler* thread pool and 
handle the connection creation exception.
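
As a rough illustration of that idea (names here are hypothetical, not Spark's 
actual internals), the finalize RPCs could be submitted to a thread pool so 
that one unreachable merger does not make the others wait for its 
connection-creation timeout:
{code:java}
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

// Hypothetical sketch: fire the finalize RPCs concurrently and swallow
// per-merger connection failures so the remaining RPCs are not delayed.
def finalizeAllMergers(mergers: Seq[String], sendFinalizeRpc: String => Unit): Unit = {
  val pool = Executors.newFixedThreadPool(math.min(8, math.max(1, mergers.size)))
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
  val attempts = mergers.map { host =>
    Future(sendFinalizeRpc(host)).recover { case _: Exception => () }
  }
  // Wait until every attempt has either succeeded or failed, then clean up.
  Await.ready(Future.sequence(attempts), Duration.Inf)
  pool.shutdown()
}
{code}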



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39989) Support estimate column statistics if it is foldable expression

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580046#comment-17580046
 ] 

Apache Spark commented on SPARK-39989:
--

User 'linhongliu-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/37532

> Support estimate column statistics if it is foldable expression
> ---
>
> Key: SPARK-39989
> URL: https://issues.apache.org/jira/browse/SPARK-39989
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39989) Support estimate column statistics if it is foldable expression

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580045#comment-17580045
 ] 

Apache Spark commented on SPARK-39989:
--

User 'linhongliu-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/37532

> Support estimate column statistics if it is foldable expression
> ---
>
> Key: SPARK-39989
> URL: https://issues.apache.org/jira/browse/SPARK-39989
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38334) Implement support for DEFAULT values for columns in tables

2022-08-15 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580028#comment-17580028
 ] 

Daniel commented on SPARK-38334:


I think it is OK to declare that this feature is implemented in Apache Spark 
now. Marking this as fixed.

> Implement support for DEFAULT values for columns in tables 
> ---
>
> Key: SPARK-38334
> URL: https://issues.apache.org/jira/browse/SPARK-38334
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>
> This story tracks the implementation of DEFAULT values for columns in tables.
> CREATE TABLE and ALTER TABLE invocations will support setting column default 
> values for future operations. Following INSERT, UPDATE, MERGE statements may 
> then reference the value using the DEFAULT keyword as needed.
> Examples:
> {code:sql}
> CREATE TABLE T(a INT, b INT NOT NULL);
> -- The default default is NULL
> INSERT INTO T VALUES (DEFAULT, 0);
> INSERT INTO T(b)  VALUES (1);
> SELECT * FROM T;
> (NULL, 0)
> (NULL, 1)
> -- Adding a default to a table with rows, sets the values for the
> -- existing rows (exist default) and new rows (current default).
> ALTER TABLE T ADD COLUMN c INT DEFAULT 5;
> INSERT INTO T VALUES (1, 2, DEFAULT);
> SELECT * FROM T;
> (NULL, 0, 5)
> (NULL, 1, 5)
> (1, 2, 5) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38334) Implement support for DEFAULT values for columns in tables

2022-08-15 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580028#comment-17580028
 ] 

Daniel edited comment on SPARK-38334 at 8/16/22 3:48 AM:
-

I think it is OK to declare that this feature is implemented in Apache Spark 
now. Marking this as fixed. We can always reopen (or file another ticket) if we 
would like to support other data sources.


was (Author: JIRAUSER285772):
I think it is OK to declare that this feature is implemented in Apache Spark 
now. Marking this as fixed.

> Implement support for DEFAULT values for columns in tables 
> ---
>
> Key: SPARK-38334
> URL: https://issues.apache.org/jira/browse/SPARK-38334
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
> Fix For: 3.4.0
>
>
> This story tracks the implementation of DEFAULT values for columns in tables.
> CREATE TABLE and ALTER TABLE invocations will support setting column default 
> values for future operations. Following INSERT, UPDATE, MERGE statements may 
> then reference the value using the DEFAULT keyword as needed.
> Examples:
> {code:sql}
> CREATE TABLE T(a INT, b INT NOT NULL);
> -- The default default is NULL
> INSERT INTO T VALUES (DEFAULT, 0);
> INSERT INTO T(b)  VALUES (1);
> SELECT * FROM T;
> (NULL, 0)
> (NULL, 1)
> -- Adding a default to a table with rows, sets the values for the
> -- existing rows (exist default) and new rows (current default).
> ALTER TABLE T ADD COLUMN c INT DEFAULT 5;
> INSERT INTO T VALUES (1, 2, DEFAULT);
> SELECT * FROM T;
> (NULL, 0, 5)
> (NULL, 1, 5)
> (1, 2, 5) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38334) Implement support for DEFAULT values for columns in tables

2022-08-15 Thread Daniel (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel resolved SPARK-38334.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Implement support for DEFAULT values for columns in tables 
> ---
>
> Key: SPARK-38334
> URL: https://issues.apache.org/jira/browse/SPARK-38334
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
> Fix For: 3.4.0
>
>
> This story tracks the implementation of DEFAULT values for columns in tables.
> CREATE TABLE and ALTER TABLE invocations will support setting column default 
> values for future operations. Following INSERT, UPDATE, MERGE statements may 
> then reference the value using the DEFAULT keyword as needed.
> Examples:
> {code:sql}
> CREATE TABLE T(a INT, b INT NOT NULL);
> -- The default default is NULL
> INSERT INTO T VALUES (DEFAULT, 0);
> INSERT INTO T(b)  VALUES (1);
> SELECT * FROM T;
> (NULL, 0)
> (NULL, 1)
> -- Adding a default to a table with rows, sets the values for the
> -- existing rows (exist default) and new rows (current default).
> ALTER TABLE T ADD COLUMN c INT DEFAULT 5;
> INSERT INTO T VALUES (1, 2, DEFAULT);
> SELECT * FROM T;
> (NULL, 0, 5)
> (NULL, 1, 5)
> (1, 2, 5) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40095) sc.uiWebUrl should not throw exception when webui is disabled

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580022#comment-17580022
 ] 

Apache Spark commented on SPARK-40095:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37530

> sc.uiWebUrl should not throw exception when webui is disabled
> -
>
> Key: SPARK-40095
> URL: https://issues.apache.org/jira/browse/SPARK-40095
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40095) sc.uiWebUrl should not throw exception when webui is disabled

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40095:


Assignee: (was: Apache Spark)

> sc.uiWebUrl should not throw exception when webui is disabled
> -
>
> Key: SPARK-40095
> URL: https://issues.apache.org/jira/browse/SPARK-40095
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40095) sc.uiWebUrl should not throw exception when webui is disabled

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40095:


Assignee: Apache Spark

> sc.uiWebUrl should not throw exception when webui is disabled
> -
>
> Key: SPARK-40095
> URL: https://issues.apache.org/jira/browse/SPARK-40095
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40095) sc.uiWebUrl should not throw exception when webui is disabled

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580021#comment-17580021
 ] 

Apache Spark commented on SPARK-40095:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37530

> sc.uiWebUrl should not throw exception when webui is disabled
> -
>
> Key: SPARK-40095
> URL: https://issues.apache.org/jira/browse/SPARK-40095
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40095) sc.uiWebUrl should not throw exception when webui is disabled

2022-08-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40095:
-

 Summary: sc.uiWebUrl should not throw exception when webui is 
disabled
 Key: SPARK-40095
 URL: https://issues.apache.org/jira/browse/SPARK-40095
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40000) Add config to toggle whether to automatically add default values for INSERTs without user-specified fields

2022-08-15 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-40000.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37430
[https://github.com/apache/spark/pull/37430]

> Add config to toggle whether to automatically add default values for INSERTs 
> without user-specified fields
> --
>
> Key: SPARK-40000
> URL: https://issues.apache.org/jira/browse/SPARK-40000
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40000) Add config to toggle whether to automatically add default values for INSERTs without user-specified fields

2022-08-15 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-40000:
--

Assignee: Daniel

> Add config to toggle whether to automatically add default values for INSERTs 
> without user-specified fields
> --
>
> Key: SPARK-40000
> URL: https://issues.apache.org/jira/browse/SPARK-40000
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40094) Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException to release executors for dynamic allocation

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580003#comment-17580003
 ] 

Apache Spark commented on SPARK-40094:
--

User 'wangshengjie123' has created a pull request for this issue:
https://github.com/apache/spark/pull/37528

>  Send TaskEnd event when task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException to release executors for dynamic 
> allocation 
> --
>
> Key: SPARK-40094
> URL: https://issues.apache.org/jira/browse/SPARK-40094
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: wangshengjie
>Priority: Major
>
> We found if task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException, wont send TaskEnd event, and this will 
> cause dynamic allocation not release executor normally.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40094) Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException to release executors for dynamic allocation

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40094:


Assignee: (was: Apache Spark)

>  Send TaskEnd event when task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException to release executors for dynamic 
> allocation 
> --
>
> Key: SPARK-40094
> URL: https://issues.apache.org/jira/browse/SPARK-40094
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: wangshengjie
>Priority: Major
>
> We found if task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException, wont send TaskEnd event, and this will 
> cause dynamic allocation not release executor normally.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40094) Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException to release executors for dynamic allocation

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40094:


Assignee: Apache Spark

>  Send TaskEnd event when task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException to release executors for dynamic 
> allocation 
> --
>
> Key: SPARK-40094
> URL: https://issues.apache.org/jira/browse/SPARK-40094
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: wangshengjie
>Assignee: Apache Spark
>Priority: Major
>
> We found if task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException, wont send TaskEnd event, and this will 
> cause dynamic allocation not release executor normally.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40094) Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException to release executors for dynamic allocation

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580002#comment-17580002
 ] 

Apache Spark commented on SPARK-40094:
--

User 'wangshengjie123' has created a pull request for this issue:
https://github.com/apache/spark/pull/37528

>  Send TaskEnd event when task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException to release executors for dynamic 
> allocation 
> --
>
> Key: SPARK-40094
> URL: https://issues.apache.org/jira/browse/SPARK-40094
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: wangshengjie
>Priority: Major
>
> We found if task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException, wont send TaskEnd event, and this will 
> cause dynamic allocation not release executor normally.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40094) Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException to release executors for dynamic allocation

2022-08-15 Thread wangshengjie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1757#comment-1757
 ] 

wangshengjie commented on SPARK-40094:
--

I'm working on this; a PR will be submitted later, thanks.
 

>  Send TaskEnd event when task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException to release executors for dynamic 
> allocation 
> --
>
> Key: SPARK-40094
> URL: https://issues.apache.org/jira/browse/SPARK-40094
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: wangshengjie
>Priority: Major
>
> We found if task failed with NotSerializableException or 
> TaskOutputFileAlreadyExistException, wont send TaskEnd event, and this will 
> cause dynamic allocation not release executor normally.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40094) Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException to release executors for dynamic allocation

2022-08-15 Thread wangshengjie (Jira)
wangshengjie created SPARK-40094:


 Summary:  Send TaskEnd event when task failed with 
NotSerializableException or TaskOutputFileAlreadyExistException to release 
executors for dynamic allocation 
 Key: SPARK-40094
 URL: https://issues.apache.org/jira/browse/SPARK-40094
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: wangshengjie


We found that if a task fails with NotSerializableException or 
TaskOutputFileAlreadyExistException, Spark won't send a TaskEnd event, and this 
will cause dynamic allocation to not release executors normally.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40093) is kubernetes jar required if not using that executor?

2022-08-15 Thread t oo (Jira)
t oo created SPARK-40093:


 Summary: is kubernetes jar required if not using that executor?
 Key: SPARK-40093
 URL: https://issues.apache.org/jira/browse/SPARK-40093
 Project: Spark
  Issue Type: Question
  Components: Deploy
Affects Versions: 3.3.0
Reporter: t oo


My docker file is very big with pyspark.

Can I remove these files below if I don't use 'ML'?

14M     /usr/local/lib/python3.10/site-packages/pyspark/jars/breeze_2.12-1.2.jar

5.9M    
/usr/local/lib/python3.10/site-packages/pyspark/jars/spark-mllib_2.12-3.3.0.jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40093) is kubernetes jar required if not using that executor?

2022-08-15 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated SPARK-40093:
-
Description: 
My docker file is very big with pyspark.

Can I remove these files below if I don't use the 'kubernetes executor'?

11M     total
4.0M    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-core-5.12.2.jar
840K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-client-5.12.2.jar
760K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-admissionregistration-5.12.2.jar
704K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-apiextensions-5.12.2.jar
640K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-autoscaling-5.12.2.jar
528K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-extensions-5.12.2.jar
516K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/spark-kubernetes_2.12-3.3.0.jar
456K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-networking-5.12.2.jar
436K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-apps-5.12.2.jar
364K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-storageclass-5.12.2.jar
336K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-policy-5.12.2.jar
264K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-flowcontrol-5.12.2.jar
244K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-batch-5.12.2.jar
192K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-discovery-5.12.2.jar
176K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-rbac-5.12.2.jar
160K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-node-5.12.2.jar
144K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-certificates-5.12.2.jar
104K    
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-events-5.12.2.jar
80K     
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-metrics-5.12.2.jar
68K     
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-scheduling-5.12.2.jar
48K     
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-coordination-5.12.2.jar
20K     
/usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-common-5.12.2.jar

  was:
My docker file is very big with pyspark.

Can I remove these files below if I don't use 'ML'?

14M     /usr/local/lib/python3.10/site-packages/pyspark/jars/breeze_2.12-1.2.jar

5.9M    
/usr/local/lib/python3.10/site-packages/pyspark/jars/spark-mllib_2.12-3.3.0.jar


> is kubernetes jar required if not using that executor?
> --
>
> Key: SPARK-40093
> URL: https://issues.apache.org/jira/browse/SPARK-40093
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Affects Versions: 3.3.0
>Reporter: t oo
>Priority: Major
>
> My docker file is very big with pyspark.
> Can I remove these files below if I don't use the 'kubernetes executor'?
> 11M     total
> 4.0M    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-core-5.12.2.jar
> 840K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-client-5.12.2.jar
> 760K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-admissionregistration-5.12.2.jar
> 704K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-apiextensions-5.12.2.jar
> 640K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-autoscaling-5.12.2.jar
> 528K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-extensions-5.12.2.jar
> 516K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/spark-kubernetes_2.12-3.3.0.jar
> 456K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-networking-5.12.2.jar
> 436K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-apps-5.12.2.jar
> 364K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-storageclass-5.12.2.jar
> 336K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-policy-5.12.2.jar
> 264K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-flowcontrol-5.12.2.jar
> 244K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-batch-5.12.2.jar
> 192K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-discovery-5.12.2.jar
> 176K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-rbac-5.12.2.jar
> 160K    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/kubernetes-model-node-5.12.2.jar
> 144K    
> /usr/local/lib/python3.10/site-

[jira] [Created] (SPARK-40092) is breeze required if not using ML?

2022-08-15 Thread t oo (Jira)
t oo created SPARK-40092:


 Summary: is breeze required if not using ML?
 Key: SPARK-40092
 URL: https://issues.apache.org/jira/browse/SPARK-40092
 Project: Spark
  Issue Type: Question
  Components: Deploy
Affects Versions: 3.3.0
Reporter: t oo


My docker file is very big with pyspark.

Can I remove this file below if I don't use 'spark streaming'?

35M     
/usr/local/lib/python3.10/site-packages/pyspark/jars/rocksdbjni-6.20.3.jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40092) is breeze required if not using ML?

2022-08-15 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated SPARK-40092:
-
Description: 
My docker file is very big with pyspark.

Can I remove these files below if I don't use 'ML'?

14M     /usr/local/lib/python3.10/site-packages/pyspark/jars/breeze_2.12-1.2.jar

5.9M    
/usr/local/lib/python3.10/site-packages/pyspark/jars/spark-mllib_2.12-3.3.0.jar

  was:
My docker file is very big with pyspark.

Can I remove this file below if I don't use 'spark streaming'?

35M     
/usr/local/lib/python3.10/site-packages/pyspark/jars/rocksdbjni-6.20.3.jar


> is breeze required if not using ML?
> ---
>
> Key: SPARK-40092
> URL: https://issues.apache.org/jira/browse/SPARK-40092
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Affects Versions: 3.3.0
>Reporter: t oo
>Priority: Major
>
> My docker file is very big with pyspark.
> Can I remove these files below if I don't use 'ML'?
> 14M     
> /usr/local/lib/python3.10/site-packages/pyspark/jars/breeze_2.12-1.2.jar
> 5.9M    
> /usr/local/lib/python3.10/site-packages/pyspark/jars/spark-mllib_2.12-3.3.0.jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40091) is rocksdbjni required if not using streaming?

2022-08-15 Thread t oo (Jira)
t oo created SPARK-40091:


 Summary: is rocksdbjni required if not using streaming?
 Key: SPARK-40091
 URL: https://issues.apache.org/jira/browse/SPARK-40091
 Project: Spark
  Issue Type: Question
  Components: Deploy
Affects Versions: 3.3.0
Reporter: t oo


My docker file is very big with pyspark.

Can I remove this file below if I don't use 'spark streaming'?

35M     
/usr/local/lib/python3.10/site-packages/pyspark/jars/rocksdbjni-6.20.3.jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579984#comment-17579984
 ] 

XiDuo You commented on SPARK-40089:
---

Thank you [~revans2] for reporting the issue. I can reproduce it by:
{code:java}
SELECT cast(col1 as decimal(20,2)) as c FROM VALUES 
(99.50),(99.49),(1.11) ORDER BY c;

-- output:
99.50
1.11
99.49{code}
 

Do you want to send a PR to fix it?

> Sorting of at least Decimal(20, 2) fails for some values near the max.
> -
>
> Key: SPARK-40089
> URL: https://issues.apache.org/jira/browse/SPARK-40089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Priority: Major
> Attachments: input.parquet
>
>
> I have been doing some testing with Decimal values for the RAPIDS Accelerator 
> for Apache Spark. I have been trying to add in new corner cases and when I 
> tried to enable the maximum supported value for a sort I started to get 
> failures.  On closer inspection it looks like the CPU is sorting things 
> incorrectly.  Specifically anything that is "99.50" or above 
> is placed as a chunk in the wrong location in the outputs.
>  In local mode with 12 tasks.
> {code:java}
> spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println)
>  {code}
>  
> Here you will notice that the last entry printed is 
> {{[99.49]}}, and {{[99.99]}} is near the top 
> near {{[-99.99]}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40009) Add missing doc string info to DataFrame API

2022-08-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40009.
--
Fix Version/s: 3.4.0
 Assignee: Khalid Mammadov
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37441

> Add missing doc string info to DataFrame API
> 
>
> Key: SPARK-40009
> URL: https://issues.apache.org/jira/browse/SPARK-40009
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Minor
> Fix For: 3.4.0
>
>
> Some of the docstrings in the Python DataFrame API are not complete; for 
> example, some are missing the Parameters, Returns, or Examples sections. It 
> would help users if we can provide this missing information for all 
> methods/functions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40077) Make pyspark.context examples self-contained

2022-08-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40077:


Assignee: Ruifeng Zheng

> Make pyspark.context examples self-contained
> 
>
> Key: SPARK-40077
> URL: https://issues.apache.org/jira/browse/SPARK-40077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40077) Make pyspark.context examples self-contained

2022-08-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40077.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37517
[https://github.com/apache/spark/pull/37517]

> Make pyspark.context examples self-contained
> 
>
> Key: SPARK-40077
> URL: https://issues.apache.org/jira/browse/SPARK-40077
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37442) In AQE, wrong InMemoryRelation size estimation causes "Cannot broadcast the table that is larger than 8GB: 8 GB" failure

2022-08-15 Thread EmmaYang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579969#comment-17579969
 ] 

EmmaYang commented on SPARK-37442:
--

Hello, I have exactly this issue, and I am using Spark 2.4.

My broadcast dataframe is built on top of files, and the overall file size is 
over 12 GB, but I only use a sub dataframe. In the STORAGE tab it showed only 
2.5 GB, yet it still gives me the "larger than 8GB" broadcast error.

Is there any workaround for it?

 

Thank you.

 

{code}
: : : +- ResolvedHint (broadcast)
: : : +- Filter isnotnull(invlv_pty_id#5078)
: : : +- InMemoryRelation [invlv_pty_id#5078, invlv_pty_id#5078], StorageLevel(disk, memory, deserialized, 1 replicas)
: : : +- *(1) Project [invlv_pty_id#5078, invlv_pty_id#5078]
: : : +- *(1) FileScan csv [invlv_pty_id#5078] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://gftsdev/data/gfrrsnsd/standardization/hive/gfrrsnsd_standardization/trl_..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
: : +- ResolvedHint (broadcast)
: : +- Filter isnotnull(invlv_pty_id#5078)
: : +- InMemoryRelation [invlv_pty_id#5078, invlv_pty_id#5078], StorageLevel(disk, memory, deserialized, 1 replicas)
: : +- *(1) Project [invlv_pty_id#5078, invlv_pty_id#5078]
: : +- *(1) FileScan csv [invlv_pty_id#5078] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://gftsdev/data/gfrrsnsd/standardization/hive/gfrrsnsd_standardization/trl_..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
: +- ResolvedHint (broadcast)
: +- Filter isnotnull(invlv_pty_id#5078)
: +- InMemoryRelation [invlv_pty_id#5078, invlv_pty_id#5078], StorageLevel(disk, memory, deserialized, 1 replicas)
: +- *(1) Project [invlv_pty_id#5078, invlv_pty_id#5078]
: +- *(1) FileScan csv [invlv_pty_id#5078] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://gftsdev/data/gfrrsnsd/standardization/hive/gfrrsnsd_standardization/trl_..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
+- ResolvedHint (broadcast)
+- Filter isnotnull(invlv_pty_id#5078)
+- InMemoryRelation [invlv_pty_id#5078, invlv_pty_id#5078], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) Project [invlv_pty_id#5078, invlv_pty_id#5078]
+- *(1) FileScan csv [invlv_pty_id#5078] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://gftsdev/data/gfrrsnsd/standardization/hive/gfrrsnsd_standardization/trl_..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
{code}

> In AQE, wrong InMemoryRelation size estimation causes "Cannot broadcast the 
> table that is larger than 8GB: 8 GB" failure
> 
>
> Key: SPARK-37442
> URL: https://issues.apache.org/jira/browse/SPARK-37442
> Project: Spark
>  Issue Type: Sub-task
>  Components: Optimizer, SQL
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Michael Chen
>Assignee: Michael Chen
>Priority: Major
> Fix For: 3.2.1, 3.3.0
>
>
> There is a period in time where an InMemoryRelation will have the cached 
> buffers loaded, but the statistics will be inaccurate (anywhere between 0 -> 
> size in bytes reported by accumulators). When AQE is enabled, it is possible 
> that join planning strategies will happen in this window. In this scenario, 
> join children sizes including InMemoryRelation are greatly underestimated and 
> a broadcast join can be planned when it shouldn't be. We have seen scenarios 
> where a broadcast join is planned with the builder size greater than 8GB 
> because at planning time, the optimizer believes the InMemoryRelation is 0 
> bytes.
> Here is an example test case where the broadcast threshold is being ignored. 
> It can mimic the 8GB error by increasing the size of the tables.
> {code:java}
> withSQLConf(
>   SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
>   SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "1048584") {
>   // Spark estimates a string column as 20 bytes so with 60k rows, these 
> relations should be
>   // estimated at ~120m bytes which is greater than the broadcast join 
> threshold
>   Seq.fill(6)("a").toDF("key")
> .createOrReplaceTempView("temp")
>   Seq.fill(6)("b").toDF("key")
> .createOrReplaceTempView("temp2")
>   Seq("a").toDF("key").createOrReplaceTempView("smallTemp")
>   spark.sql("SELECT key as newKey FROM temp").persist()
>   val query =
>   s"""
>  |SELECT t3.newKey
>  |FROM
>  |  (SELECT t1.newKey
>  |  FROM (SELECT key as newKey FROM temp) as t1
>  |JOIN
>  |(SELECT key FROM smallTemp) as t2
>  |ON t1.newKey = t2.key
>  |  ) as t3
>  |  JOIN
>  |  (SELECT key FROM temp2) as t4
>  |  ON t3.newKey = t4.key
>  |UNION
>  |SELECT t1.newKey
>   

[jira] [Resolved] (SPARK-40066) ANSI mode: always return null on invalid access to map column

2022-08-15 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-40066.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37503
[https://github.com/apache/spark/pull/37503]

> ANSI mode: always return null on invalid access to map column
> -
>
> Key: SPARK-40066
> URL: https://issues.apache.org/jira/browse/SPARK-40066
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Since https://github.com/apache/spark/pull/30386, Spark always throws an 
> error on invalid access to a map column. There is no such syntax in the ANSI 
> SQL standard since there is no Map type in it. There is a similar type 
> `multiset` which returns null on non-existing element access.
> Also, I investigated PostgreSQL/Snowflake/BigQuery, and all of them return 
> null when a map (JSON) key does not exist.
> I suggest loosening the syntax here. When users get the error, most of them 
> will just use `try_element_at()` to get the same behavior or just turn off 
> ANSI SQL mode.
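
A minimal illustration of the two behaviors contrasted above, runnable in 
spark-shell (the map literal and missing key are made-up examples, and 
`spark.sql.ansi.enabled` is assumed to be the relevant flag):

{code:scala}
// With ANSI mode on, invalid map access currently raises an error.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT element_at(map('a', 1), 'b')").show()      // fails: key 'b' does not exist

// The lenient function mentioned above returns NULL instead of failing.
spark.sql("SELECT try_element_at(map('a', 1), 'b')").show()  // prints a single NULL row
{code}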



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40090) Upgrade to Py4J 0.10.9.7

2022-08-15 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-40090.
--
Resolution: Duplicate

> Upgrade to Py4J 0.10.9.7
> 
>
> Key: SPARK-40090
> URL: https://issues.apache.org/jira/browse/SPARK-40090
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Upgrade to Py4J 0.10.9.7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40090) Upgrade to Py4J 0.10.9.7

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40090:


Assignee: (was: Apache Spark)

> Upgrade to Py4J 0.10.9.7
> 
>
> Key: SPARK-40090
> URL: https://issues.apache.org/jira/browse/SPARK-40090
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Upgrade to Py4J 0.10.9.7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40090) Upgrade to Py4J 0.10.9.7

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579930#comment-17579930
 ] 

Apache Spark commented on SPARK-40090:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37527

> Upgrade to Py4J 0.10.9.7
> 
>
> Key: SPARK-40090
> URL: https://issues.apache.org/jira/browse/SPARK-40090
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Upgrade to Py4J 0.10.9.7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40090) Upgrade to Py4J 0.10.9.7

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40090:


Assignee: Apache Spark

> Upgrade to Py4J 0.10.9.7
> 
>
> Key: SPARK-40090
> URL: https://issues.apache.org/jira/browse/SPARK-40090
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Upgrade to Py4J 0.10.9.7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40090) Upgrade to Py4J 0.10.9.7

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579929#comment-17579929
 ] 

Apache Spark commented on SPARK-40090:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37527

> Upgrade to Py4J 0.10.9.7
> 
>
> Key: SPARK-40090
> URL: https://issues.apache.org/jira/browse/SPARK-40090
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Upgrade to Py4J 0.10.9.7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40090) Upgrade to Py4J 0.10.9.7

2022-08-15 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-40090:


 Summary: Upgrade to Py4J 0.10.9.7
 Key: SPARK-40090
 URL: https://issues.apache.org/jira/browse/SPARK-40090
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


Upgrade to Py4J 0.10.9.7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579897#comment-17579897
 ] 

Robert Joseph Evans commented on SPARK-40089:
-

Looking at the code it appears that the prefix calculator has an overflow bug 
in it.

{code}
if (value.changePrecision(p, s)) value.toUnscaledLong else Long.MinValue
{code}

We are rounding up when changing the precision, and when that happens we fall 
back to {{Long.MinValue}}, a.k.a. -9223372036854775808, which results in the 
failure.
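
The rounding-induced overflow can be sketched outside Spark. A minimal 
illustration, assuming a made-up near-max Decimal(20, 2) value and the 
18-digit, scale-0 prefix target implied by the snippet above (not the actual 
rows in input.parquet):

{code:scala}
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Illustrative Decimal(20, 2) value near the max: 18 integral digits plus ".50".
val nearMax = new JBigDecimal("999999999999999999.50")

// Reducing it to 18 total digits (scale 0) forces a round-up into 19 digits,
// so changePrecision(18, 0) cannot succeed for such values.
val rounded = nearMax.setScale(0, RoundingMode.HALF_UP)
println(rounded)            // 1000000000000000000
println(rounded.precision)  // 19, which no longer fits in 18 digits

// When that happens the sort prefix falls back to Long.MinValue, which is why
// these positive rows end up sorted before the negative ones.
{code}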

> Sorting of at least Decimal(20, 2) fails for some values near the max.
> -
>
> Key: SPARK-40089
> URL: https://issues.apache.org/jira/browse/SPARK-40089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Priority: Major
> Attachments: input.parquet
>
>
> I have been doing some testing with Decimal values for the RAPIDS Accelerator 
> for Apache Spark. I have been trying to add in new corner cases and when I 
> tried to enable the maximum supported value for a sort I started to get 
> failures.  On closer inspection it looks like the CPU is sorting things 
> incorrectly.  Specifically anything that is "99.50" or above 
> is placed as a chunk in the wrong location in the outputs.
>  In local mode with 12 tasks.
> {code:java}
> spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println)
>  {code}
>  
> Here you will notice that the last entry printed is 
> {{[99.49]}}, and {{[99.99]}} is near the top 
> near {{[-99.99]}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579892#comment-17579892
 ] 

Robert Joseph Evans commented on SPARK-40089:
-

It sure looks like it is related to the prefix calculator. I think it is 
overflowing somehow. I added some debugging into 3.2.0 and got back:

{code}
22/08/15 20:17:58 ERROR SortExec: PREFIX FOR 99.99 IS false 
-9223372036854775808
{code}

The prefix should not be negative for non-negative values.


> Sorting of at least Decimal(20, 2) fails for some values near the max.
> -
>
> Key: SPARK-40089
> URL: https://issues.apache.org/jira/browse/SPARK-40089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Priority: Major
> Attachments: input.parquet
>
>
> I have been doing some testing with Decimal values for the RAPIDS Accelerator 
> for Apache Spark. I have been trying to add in new corner cases and when I 
> tried to enable the maximum supported value for a sort I started to get 
> failures.  On closer inspection it looks like the CPU is sorting things 
> incorrectly.  Specifically anything that is "99.50" or above 
> is placed as a chunk in the wrong location in the outputs.
>  In local mode with 12 tasks.
> {code:java}
> spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println)
>  {code}
>  
> Here you will notice that the last entry printed is 
> {{[99.49]}}, and {{[99.99]}} is near the top 
> near {{[-99.99]}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579887#comment-17579887
 ] 

Robert Joseph Evans commented on SPARK-40089:
-

I have been trying to debug this and it does not look like it is related to the 
partitioner. I can run with a single shuffle partition and I get the same 
results. Not sure if the prefix calculation is doing this or what.

> Sorting of at least Decimal(20, 2) fails for some values near the max.
> -
>
> Key: SPARK-40089
> URL: https://issues.apache.org/jira/browse/SPARK-40089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Priority: Major
> Attachments: input.parquet
>
>
> I have been doing some testing with Decimal values for the RAPIDS Accelerator 
> for Apache Spark. I have been trying to add in new corner cases and when I 
> tried to enable the maximum supported value for a sort I started to get 
> failures.  On closer inspection it looks like the CPU is sorting things 
> incorrectly.  Specifically anything that is "99.50" or above 
> is placed as a chunk in the wrong location in the outputs.
>  In local mode with 12 tasks.
> {code:java}
> spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println)
>  {code}
>  
> Here you will notice that the last entry printed is 
> {{[99.49]}}, and {{[99.99]}} is near the top 
> near {{[-99.99]}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated SPARK-40089:

Attachment: input.parquet

> Sorting of at least Decimal(20, 2) fails for some values near the max.
> -
>
> Key: SPARK-40089
> URL: https://issues.apache.org/jira/browse/SPARK-40089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Priority: Major
> Attachments: input.parquet
>
>
> I have been doing some testing with Decimal values for the RAPIDS Accelerator 
> for Apache Spark. I have been trying to add in new corner cases and when I 
> tried to enable the maximum supported value for a sort I started to get 
> failures.  On closer inspection it looks like the CPU is sorting things 
> incorrectly.  Specifically anything that is "99.50" or above 
> is placed as a chunk in the wrong location in the outputs.
>  In local mode with 12 tasks.
> {code:java}
> spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println)
>  {code}
>  
> Here you will notice that the last entry printed is 
> {{[99.49]}}, and {{[99.99]}} is near the top 
> near {{[-99.99]}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-40089:
---

 Summary: Sorting of at least Decimal(20, 2) fails for some values 
near the max.
 Key: SPARK-40089
 URL: https://issues.apache.org/jira/browse/SPARK-40089
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0, 3.2.0, 3.4.0
Reporter: Robert Joseph Evans


I have been doing some testing with Decimal values for the RAPIDS Accelerator 
for Apache Spark. I have been trying to add in new corner cases and when I 
tried to enable the maximum supported value for a sort I started to get 
failures.  On closer inspection it looks like the CPU is sorting things 
incorrectly.  Specifically anything that is "99.50" or above is 
placed as a chunk in the wrong location in the outputs.

 In local mode with 12 tasks.
{code:java}
spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println)
 {code}
 

Here you will notice that the last entry printed is 
{{[99.49]}}, and {{[99.99]}} is near the top 
near {{[-99.99]}}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`

2022-08-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579845#comment-17579845
 ] 

Dongjoon Hyun commented on SPARK-3:
---

BTW, SPARK-35781 has more background.
- LevelDB, RocksDB, and BrotliCodec had known issues on Apple Silicon.
- Only the RocksDB community managed to fix it recently, in RocksDB 7.0.3.
- However, since moving from RocksDB 6 to RocksDB 7 is a big change, SPARK-38257 
landed only in Apache Spark 3.4.0 in the community.

That said, I'm using RocksDB 7 in production internally.

> Add `RocksDBProvider` similar to `LevelDBProvider`
> --
>
> Key: SPARK-3
> URL: https://issues.apache.org/jira/browse/SPARK-3
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and 
> `YarnShuffleService`; a corresponding `RocksDB` implementation should be added.
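
For reference, a minimal sketch of the rocksdbjni calls such a provider would 
wrap (the database path and keys below are made-up placeholders, not the 
provider's actual layout):

{code:scala}
import org.rocksdb.{Options, RocksDB}

// Load the native library once per JVM, open (or create) a database directory,
// write a key/value pair, and read it back.
RocksDB.loadLibrary()
val options = new Options().setCreateIfMissing(true)
val db = RocksDB.open(options, "/tmp/shuffle-state-db")
db.put("app-20220815-0001".getBytes("UTF-8"), "executor-state".getBytes("UTF-8"))
println(new String(db.get("app-20220815-0001".getBytes("UTF-8")), "UTF-8"))
db.close()
options.close()
{code}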



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`

2022-08-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579839#comment-17579839
 ] 

Dongjoon Hyun commented on SPARK-3:
---

Yes, [~tgraves]. LevelDB is ancient and has a compatibility issue on Apple 
Silicon, which is tracked by SPARK-35782.

> Add `RocksDBProvider` similar to `LevelDBProvider`
> --
>
> Key: SPARK-3
> URL: https://issues.apache.org/jira/browse/SPARK-3
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and 
> `YarnShuffleService`; a corresponding `RocksDB` implementation should be added.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40064) Use V2 Filter in SupportsOverwrite

2022-08-15 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-40064.

Fix Version/s: 3.4.0
 Assignee: Huaxin Gao
   Resolution: Fixed

> Use V2 Filter in SupportsOverwrite
> --
>
> Key: SPARK-40064
> URL: https://issues.apache.org/jira/browse/SPARK-40064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
>  Add V2 Filter support in SupportsOverwrite



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40088) Add SparkPlanWithAQESuite

2022-08-15 Thread Kazuyuki Tanimura (Jira)
Kazuyuki Tanimura created SPARK-40088:
-

 Summary: Add SparkPlanWithAQESuite
 Key: SPARK-40088
 URL: https://issues.apache.org/jira/browse/SPARK-40088
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 3.4.0
Reporter: Kazuyuki Tanimura


Currently `SparkPlanSuite` assumes that AQE is always turned off. We should 
also test with AQE turned on.
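
A minimal way to exercise a plan with AQE turned on, e.g. from spark-shell 
(the config key is the standard adaptive-execution flag; the query is an 
arbitrary example, not the suite's actual test case):

{code:scala}
// With adaptive execution enabled, a query that introduces an exchange is
// planned as an AdaptiveSparkPlanExec rather than the static plan the
// existing suite assumes.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.range(100).repartition(10).groupBy("id").count().collect()
{code}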



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40087) Support multiple Column drop in R

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40087:


Assignee: Apache Spark

> Support multiple Column drop in R
> -
>
> Key: SPARK-40087
> URL: https://issues.apache.org/jira/browse/SPARK-40087
> Project: Spark
>  Issue Type: New Feature
>  Components: R
>Affects Versions: 3.3.0
>Reporter: Santosh Pingale
>Assignee: Apache Spark
>Priority: Minor
>
> This is a follow-up on SPARK-39895. The PR previously attempted to adjust 
> the implementation for R as well to match signatures, but that part was 
> removed and we only focused on getting the Python implementation to behave 
> correctly.
> *{{Change supports following operations:}}*
> {{df <- select(read.json(jsonPath), "name", "age")}}
> {{df$age2 <- df$age}}
> {{df1 <- drop(df, df$age, df$name)}}
> {{expect_equal(columns(df1), c("age2"))}}
> {{df1 <- drop(df, list(df$age, column("random")))}}
> {{expect_equal(columns(df1), c("name", "age2"))}}
> {{df1 <- drop(df, list(df$age, df$name))}}
> {{expect_equal(columns(df1), c("age2"))}}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40087) Support multiple Column drop in R

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40087:


Assignee: (was: Apache Spark)

> Support multiple Column drop in R
> -
>
> Key: SPARK-40087
> URL: https://issues.apache.org/jira/browse/SPARK-40087
> Project: Spark
>  Issue Type: New Feature
>  Components: R
>Affects Versions: 3.3.0
>Reporter: Santosh Pingale
>Priority: Minor
>
> This is a follow-up on SPARK-39895. The PR previously attempted to adjust 
> the implementation for R as well to match signatures, but that part was 
> removed and we only focused on getting the Python implementation to behave 
> correctly.
> *{{Change supports following operations:}}*
> {{df <- select(read.json(jsonPath), "name", "age")}}
> {{df$age2 <- df$age}}
> {{df1 <- drop(df, df$age, df$name)}}
> {{expect_equal(columns(df1), c("age2"))}}
> {{df1 <- drop(df, list(df$age, column("random")))}}
> {{expect_equal(columns(df1), c("name", "age2"))}}
> {{df1 <- drop(df, list(df$age, df$name))}}
> {{expect_equal(columns(df1), c("age2"))}}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40087) Support multiple Column drop in R

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579826#comment-17579826
 ] 

Apache Spark commented on SPARK-40087:
--

User 'santosh-d3vpl3x' has created a pull request for this issue:
https://github.com/apache/spark/pull/37526

> Support multiple Column drop in R
> -
>
> Key: SPARK-40087
> URL: https://issues.apache.org/jira/browse/SPARK-40087
> Project: Spark
>  Issue Type: New Feature
>  Components: R
>Affects Versions: 3.3.0
>Reporter: Santosh Pingale
>Priority: Minor
>
> This is a follow-up on SPARK-39895. The PR previously attempted to adjust 
> the implementation for R as well to match signatures, but that part was 
> removed and we only focused on getting the Python implementation to behave 
> correctly.
> *{{Change supports following operations:}}*
> {{df <- select(read.json(jsonPath), "name", "age")}}
> {{df$age2 <- df$age}}
> {{df1 <- drop(df, df$age, df$name)}}
> {{expect_equal(columns(df1), c("age2"))}}
> {{df1 <- drop(df, list(df$age, column("random")))}}
> {{expect_equal(columns(df1), c("name", "age2"))}}
> {{df1 <- drop(df, list(df$age, df$name))}}
> {{expect_equal(columns(df1), c("age2"))}}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40087) Support multiple Column drop in R

2022-08-15 Thread Santosh Pingale (Jira)
Santosh Pingale created SPARK-40087:
---

 Summary: Support multiple Column drop in R
 Key: SPARK-40087
 URL: https://issues.apache.org/jira/browse/SPARK-40087
 Project: Spark
  Issue Type: New Feature
  Components: R
Affects Versions: 3.3.0
Reporter: Santosh Pingale


This is a follow-up on SPARK-39895. The PR previously attempted to adjust the 
implementation for R as well to match signatures, but that part was removed and 
we only focused on getting the Python implementation to behave correctly.

*{{Change supports following operations:}}*

{{df <- select(read.json(jsonPath), "name", "age")}}

{{df$age2 <- df$age}}

{{df1 <- drop(df, df$age, df$name)}}
{{expect_equal(columns(df1), c("age2"))}}

{{df1 <- drop(df, list(df$age, column("random")))}}
{{expect_equal(columns(df1), c("name", "age2"))}}

{{df1 <- drop(df, list(df$age, df$name))}}
{{expect_equal(columns(df1), c("age2"))}}

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40086) Improve AliasAwareOutputPartitioning to take all aliases into account

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40086:


Assignee: (was: Apache Spark)

> Improve AliasAwareOutputPartitioning to take all aliases into account
> -
>
> Key: SPARK-40086
> URL: https://issues.apache.org/jira/browse/SPARK-40086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Priority: Major
>
> Currently AliasAwareOutputPartitioning takes only the last alias of each 
> aliased expression into account.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40086) Improve AliasAwareOutputPartitioning to take all aliases into account

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579814#comment-17579814
 ] 

Apache Spark commented on SPARK-40086:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/37525

> Improve AliasAwareOutputPartitioning to take all aliases into account
> -
>
> Key: SPARK-40086
> URL: https://issues.apache.org/jira/browse/SPARK-40086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Priority: Major
>
> Currently AliasAwareOutputPartitioning takes only the last alias of each 
> aliased expression into account.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40086) Improve AliasAwareOutputPartitioning to take all aliases into account

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40086:


Assignee: Apache Spark

> Improve AliasAwareOutputPartitioning to take all aliases into account
> -
>
> Key: SPARK-40086
> URL: https://issues.apache.org/jira/browse/SPARK-40086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Assignee: Apache Spark
>Priority: Major
>
> Currently AliasAwareOutputPartitioning takes only the last alias of each 
> aliased expression into account.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40086) Improve AliasAwareOutputPartitioning to take all aliases into account

2022-08-15 Thread Peter Toth (Jira)
Peter Toth created SPARK-40086:
--

 Summary: Improve AliasAwareOutputPartitioning to take all aliases 
into account
 Key: SPARK-40086
 URL: https://issues.apache.org/jira/browse/SPARK-40086
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Peter Toth


Currently AliasAwareOutputPartitioning takes only the last alias of each 
aliased expression into account.
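
A sketch of the kind of plan this affects, runnable in spark-shell (column 
names are illustrative): both output columns alias the partitioning key, but 
only the last alias is currently recognized when checking the output 
partitioning.

{code:scala}
import org.apache.spark.sql.functions.col

// The child is hash-partitioned by `id`; the Project aliases `id` twice.
val df = spark.range(100)
  .repartition(col("id"))
  .select(col("id").as("c1"), col("id").as("c2"))

// A downstream requirement on `c1` may still insert an exchange today, even
// though `c1` carries the same partitioning as `id`; tracking all aliases
// would let either alias satisfy it.
df.groupBy("c1").count().explain()
{code}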



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40085) use INTERNAL_ERROR error class instead of IllegalStateException to indicate bugs

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579749#comment-17579749
 ] 

Apache Spark commented on SPARK-40085:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37524

> use INTERNAL_ERROR error class instead of IllegalStateException to indicate 
> bugs
> 
>
> Key: SPARK-40085
> URL: https://issues.apache.org/jira/browse/SPARK-40085
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40085) use INTERNAL_ERROR error class instead of IllegalStateException to indicate bugs

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40085:


Assignee: Apache Spark

> use INTERNAL_ERROR error class instead of IllegalStateException to indicate 
> bugs
> 
>
> Key: SPARK-40085
> URL: https://issues.apache.org/jira/browse/SPARK-40085
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40085) use INTERNAL_ERROR error class instead of IllegalStateException to indicate bugs

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40085:


Assignee: (was: Apache Spark)

> use INTERNAL_ERROR error class instead of IllegalStateException to indicate 
> bugs
> 
>
> Key: SPARK-40085
> URL: https://issues.apache.org/jira/browse/SPARK-40085
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40085) use INTERNAL_ERROR error class instead of IllegalStateException to indicate bugs

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579748#comment-17579748
 ] 

Apache Spark commented on SPARK-40085:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37524

> use INTERNAL_ERROR error class instead of IllegalStateException to indicate 
> bugs
> 
>
> Key: SPARK-40085
> URL: https://issues.apache.org/jira/browse/SPARK-40085
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40085) use INTERNAL_ERROR error class instead of IllegalStateException to indicate bugs

2022-08-15 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-40085:
---

 Summary: use INTERNAL_ERROR error class instead of 
IllegalStateException to indicate bugs
 Key: SPARK-40085
 URL: https://issues.apache.org/jira/browse/SPARK-40085
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`

2022-08-15 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579738#comment-17579738
 ] 

Thomas Graves commented on SPARK-3:
---

Just curious, does RocksDB give us some particular benefit - performance or 
compatibility? Is LevelDB not supported on Apple Silicon? Just curious, and it 
would be good to record why we are adding support.

> Add `RocksDBProvider` similar to `LevelDBProvider`
> --
>
> Key: SPARK-3
> URL: https://issues.apache.org/jira/browse/SPARK-3
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and 
> `YarnShuffleService`; a corresponding `RocksDB` implementation should be added.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40058:


Assignee: ZiyueGuan

> Avoid filter twice in HadoopFSUtils
> ---
>
> Key: SPARK-40058
> URL: https://issues.apache.org/jira/browse/SPARK-40058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: ZiyueGuan
>Assignee: ZiyueGuan
>Priority: Minor
> Fix For: 3.4.0
>
>
> In HadoopFSUtils, listLeafFiles will apply the filter more than once in 
> recursive method calls. This may waste time when the filter logic is heavy. 
> It would be good to refactor this.
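
A generic sketch of the double-filtering pattern being described, using plain 
JVM file listing (this is not Spark's actual HadoopFSUtils code):

{code:scala}
import java.io.File

// Each level filters the files it finds, and the caller then filters the
// already-filtered results of the recursive call again, so the predicate can
// run more than once per path.
def listLeafFiles(dir: File, filter: File => Boolean): Seq[File] = {
  val children = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
  children.flatMap { f =>
    if (f.isDirectory) listLeafFiles(f, filter).filter(filter) // second application
    else if (filter(f)) Seq(f)                                 // first application
    else Seq.empty
  }
}
{code}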



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40058.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37498

> Avoid filter twice in HadoopFSUtils
> ---
>
> Key: SPARK-40058
> URL: https://issues.apache.org/jira/browse/SPARK-40058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: ZiyueGuan
>Priority: Minor
> Fix For: 3.4.0
>
>
> In HadoopFSUtils, listLeafFiles will apply the filter more than once in 
> recursive method calls. This may waste time when the filter logic is heavy. 
> It would be good to refactor this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40035) Avoid apply filter twice when listing files

2022-08-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40035.
--
Resolution: Duplicate

> Avoid apply filter twice when listing files
> ---
>
> Key: SPARK-40035
> URL: https://issues.apache.org/jira/browse/SPARK-40035
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: EdisonWang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39887) Expression transform error

2022-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39887:
---

Assignee: Peter Toth

> Expression transform error
> --
>
> Key: SPARK-39887
> URL: https://issues.apache.org/jira/browse/SPARK-39887
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.2.2
>Reporter: zhuml
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.1.4
>
>
> {code:java}
> spark.sql(
>   """
> |select to_date(a) a, to_date(b) b from
> |(select  a, a as b from
> |(select to_date(a) a from
> | values ('2020-02-01') as t1(a)
> | group by to_date(a)) t3
> |union all
> |select a, b from
> |(select to_date(a) a, to_date(b) b from
> |values ('2020-01-01','2020-01-02') as t1(a, b)
> | group by to_date(a), to_date(b)) t4) t5
> |group by to_date(a), to_date(b)
> |""".stripMargin).show(){code}
> result is (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-01)
> expected (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-02)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39887) Expression transform error

2022-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-39887:

Fix Version/s: 3.4.0
   3.3.1
   3.2.3

> Expression transform error
> --
>
> Key: SPARK-39887
> URL: https://issues.apache.org/jira/browse/SPARK-39887
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.2.2
>Reporter: zhuml
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.1.4, 3.4.0, 3.3.1, 3.2.3
>
>
> {code:java}
> spark.sql(
>   """
> |select to_date(a) a, to_date(b) b from
> |(select  a, a as b from
> |(select to_date(a) a from
> | values ('2020-02-01') as t1(a)
> | group by to_date(a)) t3
> |union all
> |select a, b from
> |(select to_date(a) a, to_date(b) b from
> |values ('2020-01-01','2020-01-02') as t1(a, b)
> | group by to_date(a), to_date(b)) t4) t5
> |group by to_date(a), to_date(b)
> |""".stripMargin).show(){code}
> result is (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-01)
> expected (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-02)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39887) Expression transform error

2022-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39887.
-
Fix Version/s: 3.1.4
   Resolution: Fixed

Issue resolved by pull request 37496
[https://github.com/apache/spark/pull/37496]

> Expression transform error
> --
>
> Key: SPARK-39887
> URL: https://issues.apache.org/jira/browse/SPARK-39887
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.2.2
>Reporter: zhuml
>Priority: Major
> Fix For: 3.1.4
>
>
> {code:java}
> spark.sql(
>   """
> |select to_date(a) a, to_date(b) b from
> |(select  a, a as b from
> |(select to_date(a) a from
> | values ('2020-02-01') as t1(a)
> | group by to_date(a)) t3
> |union all
> |select a, b from
> |(select to_date(a) a, to_date(b) b from
> |values ('2020-01-01','2020-01-02') as t1(a, b)
> | group by to_date(a), to_date(b)) t4) t5
> |group by to_date(a), to_date(b)
> |""".stripMargin).show(){code}
> result is (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-01)
> expected (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-02)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39982) StructType.fromJson method missing documentation

2022-08-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39982.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37408

> StructType.fromJson method missing documentation
> 
>
> Key: SPARK-39982
> URL: https://issues.apache.org/jira/browse/SPARK-39982
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Trivial
> Fix For: 3.4.0
>
>
> The StructType.fromJson method does not have any documentation. It would be 
> good to add a docstring that explains how to use it.
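As a rough illustration of what such documentation could cover, here is a minimal PySpark sketch (not taken from the ticket) that round-trips a schema through jsonValue() and StructType.fromJson; the field names are made up for the example:

{code:python}
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Build a schema, serialize it to a JSON-compatible dict, and rebuild it.
schema = StructType([
    StructField("id", IntegerType(), nullable=False),
    StructField("name", StringType(), nullable=True),
])

as_dict = schema.jsonValue()             # plain dict, safe to store as JSON
restored = StructType.fromJson(as_dict)  # reconstructs an equivalent StructType

print(restored.simpleString())           # struct<id:int,name:string>
{code}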



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39982) StructType.fromJson method missing documentation

2022-08-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-39982:


Assignee: Khalid Mammadov

> StructType.fromJson method missing documentation
> 
>
> Key: SPARK-39982
> URL: https://issues.apache.org/jira/browse/SPARK-39982
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Trivial
>
> The StructType.fromJson method does not have any documentation. It would be 
> good to add a docstring that explains how to use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40019) Refactor comment of ArrayType

2022-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40019.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37453
[https://github.com/apache/spark/pull/37453]

> Refactor comment of ArrayType
> -
>
> Key: SPARK-40019
> URL: https://issues.apache.org/jira/browse/SPARK-40019
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0, 3.2.2
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.4.0
>
>
> The parameter `containsNull` of ArrayType/MapType is confusing; a clarifying 
> comment needs to be added. 
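For reference, a small PySpark sketch (not from the ticket) of what the flag declares; the variable names are illustrative only:

{code:python}
from pyspark.sql.types import ArrayType, MapType, IntegerType, StringType

# containsNull describes the elements of the array, not the column itself:
# a column of this type may still be nullable even if its elements are not.
elems_non_null = ArrayType(IntegerType(), containsNull=False)
elems_nullable = ArrayType(IntegerType(), containsNull=True)

# MapType has the analogous valueContainsNull flag for map values.
values_non_null = MapType(StringType(), IntegerType(), valueContainsNull=False)

# Prints the JSON form of the type, including the containsNull flag.
print(elems_nullable.json())
{code}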



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40019) Refactor comment of ArrayType

2022-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40019:
---

Assignee: angerszhu

> Refactor comment of ArrayType
> -
>
> Key: SPARK-40019
> URL: https://issues.apache.org/jira/browse/SPARK-40019
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0, 3.2.2
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> The parameter `containsNull` of ArrayType/MapType is confusing; a clarifying 
> comment needs to be added. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40073) Should Use `connector/${moduleName}` instead of `external/${moduleName}`

2022-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40073.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37512
[https://github.com/apache/spark/pull/37512]

> Should Use `connector/${moduleName}` instead of `external/${moduleName}`
> 
>
> Key: SPARK-40073
> URL: https://issues.apache.org/jira/browse/SPARK-40073
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra, PySpark, Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> SPARK-38569 renamed the `external` top-level dir to `connector`, but 
> `external/${moduleName}` is still used in the documentation instead of 
> `connector/${moduleName}`
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-40083) Add shuffle index cache expire time policy to avoid unused continuous memory consumption

2022-08-15 Thread wangshengjie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579692#comment-17579692
 ] 

wangshengjie edited comment on SPARK-40083 at 8/15/22 1:03 PM:
---

Maybe I should also add an expiry policy for the push merge shuffle manager.


was (Author: wangshengjie):
Maybe I should add an expiry policy for the push merge shuffle manager.

> Add shuffle index cache expire time policy to avoid unused continuous memory 
> consumption
> 
>
> Key: SPARK-40083
> URL: https://issues.apache.org/jira/browse/SPARK-40083
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 3.3.0
>Reporter: wangshengjie
>Priority: Major
>
> In our production environment, we found applications that had finished about 
> 2 days earlier but whose cache was still in memory, so we could add a Guava 
> cache expiry-time policy to save memory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40073) Should Use `connector/${moduleName}` instead of `external/${moduleName}`

2022-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40073:
---

Assignee: Yang Jie

> Should Use `connector/${moduleName}` instead of `external/${moduleName}`
> 
>
> Key: SPARK-40073
> URL: https://issues.apache.org/jira/browse/SPARK-40073
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra, PySpark, Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> SPARK-38569 renamed the `external` top-level dir to `connector`, but 
> `external/${moduleName}` is still used in the documentation instead of 
> `connector/${moduleName}`
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40083) Add shuffle index cache expire time policy to avoid unused continuous memory consumption

2022-08-15 Thread wangshengjie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579692#comment-17579692
 ] 

wangshengjie commented on SPARK-40083:
--

Maybe I should add an expiry policy for the push merge shuffle manager.

> Add shuffle index cache expire time policy to avoid unused continuous memory 
> consumption
> 
>
> Key: SPARK-40083
> URL: https://issues.apache.org/jira/browse/SPARK-40083
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 3.3.0
>Reporter: wangshengjie
>Priority: Major
>
> In our production environment, we found applications that had finished about 
> 2 days earlier but whose cache was still in memory, so we could add a Guava 
> cache expiry-time policy to save memory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39976) NULL check in ArrayIntersect adds extraneous null from first param

2022-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39976.
-
Fix Version/s: 3.4.0
   3.3.1
 Assignee: angerszhu
   Resolution: Fixed

> NULL check in ArrayIntersect adds extraneous null from first param
> --
>
> Key: SPARK-39976
> URL: https://issues.apache.org/jira/browse/SPARK-39976
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Navin Kumar
>Assignee: angerszhu
>Priority: Major
>  Labels: correctness
> Fix For: 3.4.0, 3.3.1
>
>
> This is very likely a regression from SPARK-36829.
> When using {{array_intersect(a, b)}}, if the first parameter contains a 
> {{NULL}} value and the second one does not, an extraneous {{NULL}} is present 
> in the output. This also leads to {{array_intersect(a, b) != 
> array_intersect(b, a)}} which is incorrect as set intersection should be 
> commutative.
> Example using PySpark:
> {code:python}
> >>> a = [1, 2, 3]
> >>> b = [3, None, 5]
> >>> data = [(a, b)]
> >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
> >>> df.show()
> +---------+------------+
> |        a|           b|
> +---------+------------+
> |[1, 2, 3]|[3, null, 5]|
> +---------+------------+
> >>> df.selectExpr("array_intersect(a,b)").show()
> +---------------------+
> |array_intersect(a, b)|
> +---------------------+
> |                  [3]|
> +---------------------+
> >>> df.selectExpr("array_intersect(b,a)").show()
> +---------------------+
> |array_intersect(b, a)|
> +---------------------+
> |            [3, null]|
> +---------------------+
> {code}
> Note that in the first case, {{a}} does not contain a {{NULL}}, and the final 
> output is correct: {{[3]}}. In the second case, {{b}} does contain a {{NULL}} 
> and, as the first parameter, contributes the extraneous {{NULL}} to the output.
> The same behavior occurs in Scala when writing to Parquet:
> {code:scala}
> scala> val a = Array[java.lang.Integer](1, 2, null, 4)
> a: Array[Integer] = Array(1, 2, null, 4)
> scala> val b = Array[java.lang.Integer](4, 5, 6, 7)
> b: Array[Integer] = Array(4, 5, 6, 7)
> scala> val df = Seq((a, b)).toDF("a","b")
> df: org.apache.spark.sql.DataFrame = [a: array<int>, b: array<int>]
> scala> df.write.parquet("/tmp/simple.parquet")
> scala> val df = spark.read.parquet("/tmp/simple.parquet")
> df: org.apache.spark.sql.DataFrame = [a: array<int>, b: array<int>]
> scala> df.show()
> +---------------+------------+
> |              a|           b|
> +---------------+------------+
> |[1, 2, null, 4]|[4, 5, 6, 7]|
> +---------------+------------+
> scala> df.selectExpr("array_intersect(a,b)").show()
> +---------------------+
> |array_intersect(a, b)|
> +---------------------+
> |            [null, 4]|
> +---------------------+
> scala> df.selectExpr("array_intersect(b,a)").show()
> +---------------------+
> |array_intersect(b, a)|
> +---------------------+
> |                  [4]|
> +---------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40084) Upgrade Py4J from 0.10.9.5 to 0.10.9.7

2022-08-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579685#comment-17579685
 ] 

Apache Spark commented on SPARK-40084:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37523

> Upgrade Py4J from 0.10.9.5 to 0.10.9.7
> --
>
> Key: SPARK-40084
> URL: https://issues.apache.org/jira/browse/SPARK-40084
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> * Java side: Add support for Java 11/17
> Release note: https://www.py4j.org/changelog.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40084) Upgrade Py4J from 0.10.9.5 to 0.10.9.7

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40084:


Assignee: (was: Apache Spark)

> Upgrade Py4J from 0.10.9.5 to 0.10.9.7
> --
>
> Key: SPARK-40084
> URL: https://issues.apache.org/jira/browse/SPARK-40084
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> * Java side: Add support for Java 11/17
> Release note: https://www.py4j.org/changelog.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40084) Upgrade Py4J from 0.10.9.5 to 0.10.9.7

2022-08-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40084:


Assignee: Apache Spark

> Upgrade Py4J from 0.10.9.5 to 0.10.9.7
> --
>
> Key: SPARK-40084
> URL: https://issues.apache.org/jira/browse/SPARK-40084
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.4.0
>
>
> * Java side: Add support for Java 11/17
> Release note: https://www.py4j.org/changelog.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


