[jira] [Updated] (SPARK-37004) Job cancellation causes py4j errors on Jupyter due to pinned thread mode

2021-10-13 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-37004:
--
Description: 
Spark 3.2.0 turned on py4j pinned thread mode by default (SPARK-35303). 
However, in a Jupyter notebook, after I cancel (interrupt) a long-running Spark 
job, the next Spark command fails with py4j errors. See the attached notebook 
for a repro.

I cannot reproduce the issue after turning off pinned thread mode.
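
For context, a minimal sketch of the workaround, assuming the standard PYSPARK_PIN_THREAD environment variable is what toggles pinned thread mode and that it is set before the JVM starts; the long-running query below is only illustrative:

{code:python}
# Sketch of the workaround: disable py4j pinned thread mode before PySpark starts.
# PYSPARK_PIN_THREAD must be set before the SparkContext (and JVM) is created.
import os
os.environ["PYSPARK_PIN_THREAD"] = "false"

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# In Jupyter: run this cell and interrupt the kernel while it is running ...
spark.range(10_000_000_000).selectExpr("sum(id)").show()

# ... then run another command; with pinned thread mode off, this still works.
spark.range(10).count()
{code}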

  was:
3.2.0 turned on py4j pinned thread mode by default. However, in a Jupyter 
notebook, after I cancel (interrupt) a long-running Spark job, the next Spark 
command fails with py4j errors. See the attached notebook for a repro.

I cannot reproduce the issue after turning off pinned thread mode.


> Job cancellation causes py4j errors on Jupyter due to pinned thread mode
> 
>
> Key: SPARK-37004
> URL: https://issues.apache.org/jira/browse/SPARK-37004
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xiangrui Meng
>Priority: Blocker
> Attachments: pinned.ipynb
>
>
> Spark 3.2.0 turned on py4j pinned thread mode by default (SPARK-35303). 
> However, in a Jupyter notebook, after I cancel (interrupt) a long-running 
> Spark job, the next Spark command fails with py4j errors. See the 
> attached notebook for a repro.
> I cannot reproduce the issue after turning off pinned thread mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37004) Job cancellation causes py4j errors on Jupyter due to pinned thread mode

2021-10-13 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-37004:
--
Issue Type: Bug  (was: Improvement)

> Job cancellation causes py4j errors on Jupyter due to pinned thread mode
> 
>
> Key: SPARK-37004
> URL: https://issues.apache.org/jira/browse/SPARK-37004
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xiangrui Meng
>Priority: Blocker
> Attachments: pinned.ipynb
>
>
> 3.2.0 turned on py4j pinned thread mode by default. However, in a Jupyter 
> notebook, after I cancel (interrupt) a long-running Spark job, the next Spark 
> command fails with py4j errors. See the attached notebook for a repro.
> I cannot reproduce the issue after turning off pinned thread mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37004) Job cancellation causes py4j errors on Jupyter due to pinned thread mode

2021-10-13 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-37004:
--
Attachment: pinned.ipynb

> Job cancellation causes py4j errors on Jupyter due to pinned thread mode
> 
>
> Key: SPARK-37004
> URL: https://issues.apache.org/jira/browse/SPARK-37004
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xiangrui Meng
>Priority: Blocker
> Attachments: pinned.ipynb
>
>
> 3.2.0 turned on py4j pinned thread mode by default. However, in a Jupyter 
> notebook, after I cancel (interrupt) a long-running Spark job, the next Spark 
> command fails with py4j errors. See the attached notebook for a repro.
> I cannot reproduce the issue after turning off pinned thread mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37004) Job cancellation causes py4j errors on Jupyter due to pinned thread mode

2021-10-13 Thread Xiangrui Meng (Jira)
Xiangrui Meng created SPARK-37004:
-

 Summary: Job cancellation causes py4j errors on Jupyter due to 
pinned thread mode
 Key: SPARK-37004
 URL: https://issues.apache.org/jira/browse/SPARK-37004
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xiangrui Meng


3.2.0 turned on py4j pinned thread mode by default. However, in a Jupyter 
notebook, after I cancel (interrupt) a long-running Spark job, the next Spark 
command fails with py4j errors. See the attached notebook for a repro.

I cannot reproduce the issue after turning off pinned thread mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS

2021-10-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35973:
---

Assignee: PengLei

> DataSourceV2: Support SHOW CATALOGS
> ---
>
> Key: SPARK-35973
> URL: https://issues.apache.org/jira/browse/SPARK-35973
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Major
>
> DataSource V2 supports multiple catalogs. Having a "SHOW CATALOGS" command to list 
> the catalogs and their corresponding default-namespace info would be useful.
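
For illustration only, a minimal sketch of how the proposed command might be used from PySpark once it exists; the catalog name and implementation class below are placeholders, and the exact output columns are an assumption:

{code:python}
# Sketch: list the configured catalogs via the proposed SHOW CATALOGS command.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # "my_catalog" / "org.example.MyCatalog" are hypothetical placeholders
         # for a custom DataSource V2 catalog registration.
         .config("spark.sql.catalog.my_catalog", "org.example.MyCatalog")
         .getOrCreate())

# Expected to return one row per registered catalog (e.g. spark_catalog, my_catalog).
spark.sql("SHOW CATALOGS").show()
{code}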



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS

2021-10-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35973.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34232
[https://github.com/apache/spark/pull/34232]

> DataSourceV2: Support SHOW CATALOGS
> ---
>
> Key: SPARK-35973
> URL: https://issues.apache.org/jira/browse/SPARK-35973
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Major
> Fix For: 3.3.0
>
>
> DataSource V2 supports multiple catalogs. Having a "SHOW CATALOGS" command to list 
> the catalogs and their corresponding default-namespace info would be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37003) Merge INSERT related docs

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37003:


Assignee: Apache Spark

> Merge INSERT related docs
> -
>
> Key: SPARK-37003
> URL: https://issues.apache.org/jira/browse/SPARK-37003
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> The current INSERT docs contain a lot of duplicated content; merge the INSERT INTO and INSERT OVERWRITE pages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37003) Merge INSERT related docs

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37003:


Assignee: (was: Apache Spark)

> Merge INSERT related docs
> -
>
> Key: SPARK-37003
> URL: https://issues.apache.org/jira/browse/SPARK-37003
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> The current INSERT docs contain a lot of duplicated content; merge the INSERT INTO and INSERT OVERWRITE pages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37003) Merge INSERT related docs

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428600#comment-17428600
 ] 

Apache Spark commented on SPARK-37003:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34282

> Merge INSERT related docs
> -
>
> Key: SPARK-37003
> URL: https://issues.apache.org/jira/browse/SPARK-37003
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> The current INSERT docs contain a lot of duplicated content; merge the INSERT INTO and INSERT OVERWRITE pages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37002) Introduce the 'compute.check_identical_indices' option

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428599#comment-17428599
 ] 

Apache Spark commented on SPARK-37002:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34281

> Introduce the 'compute.check_identical_indices' option
> --
>
> Key: SPARK-37002
> URL: https://issues.apache.org/jira/browse/SPARK-37002
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-36968
> [https://github.com/apache/spark/pull/34235]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37002) Introduce the 'compute.check_identical_indices' option

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37002:


Assignee: Apache Spark

> Introduce the 'compute.check_identical_indices' option
> --
>
> Key: SPARK-37002
> URL: https://issues.apache.org/jira/browse/SPARK-37002
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: Apache Spark
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-36968
> [https://github.com/apache/spark/pull/34235]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37002) Introduce the 'compute.check_identical_indices' option

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428598#comment-17428598
 ] 

Apache Spark commented on SPARK-37002:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34281

> Introduce the 'compute.check_identical_indices' option
> --
>
> Key: SPARK-37002
> URL: https://issues.apache.org/jira/browse/SPARK-37002
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-36968
> [https://github.com/apache/spark/pull/34235]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37002) Introduce the 'compute.check_identical_indices' option

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37002:


Assignee: (was: Apache Spark)

> Introduce the 'compute.check_identical_indices' option
> --
>
> Key: SPARK-37002
> URL: https://issues.apache.org/jira/browse/SPARK-37002
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-36968
> [https://github.com/apache/spark/pull/34235]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36858) Spark API to apply same function to multiple columns

2021-10-13 Thread Armand BERGES (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428596#comment-17428596
 ] 

Armand BERGES commented on SPARK-36858:
---

Honestly, I feel a little dumb for not having thought of it earlier... 

I fixed our implementation and it is way better! :)

To be clear, our method looks like this: 


{code:java}
// Note: df is a var DataFrame held in the enclosing scope (e.g. a small helper class).
def withColumns(cols: Seq[String], columnTransform: String => Column, 
    nameTransform: String => String = identity): DataFrame = { 
  // See https://issues.apache.org/jira/browse/SPARK-36858 
  cols.foreach((colName: String) => 
    df = df.withColumn(nameTransform(colName), columnTransform(colName))) 
  df 
}
{code}

I think the method signature could easily be improved, and we could discuss 
it.

Based on your comment, this ticket could probably be changed to "Add a question to 
some tutorial" so that newcomers do not fall into the trap I mentioned.

Of course, if Spark implemented this method with a nice API it would be even 
easier to avoid this trap :) 

> Spark API to apply same function to multiple columns
> 
>
> Key: SPARK-36858
> URL: https://issues.apache.org/jira/browse/SPARK-36858
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.4.8, 3.1.2
>Reporter: Armand BERGES
>Priority: Minor
>
> Hi
> My team and I regularly need to apply the same function to multiple 
> columns at once.
> For example, we want to remove all non-alphanumeric characters from each 
> column of our dataframes. 
> When we first hit this use case, some people on my team were using this kind 
> of code: 
> {code:java}
> val colListToClean: Seq[String] = ... // Generate some list, could be very long.
> val dfToClean: DataFrame = ... // This is the dataframe we want to clean
> def cleanFunction(colName: String): Column = ... // Write some function to 
> manipulate a column based on its name.
> val dfCleaned = colListToClean.foldLeft(dfToClean)((df, colName) => 
> df.withColumn(colName, cleanFunction(colName))){code}
> This kind of code, when applied to a large set of columns, overloaded our 
> driver (because a new DataFrame is generated for each column to clean).
> Based on this issue, we developed some code to add two functions: 
>  * One to apply the same function to multiple columns
>  * One to rename multiple columns based on a Map. 
>  
> I wonder if you have ever been asked to add this kind of API? If you did, did 
> you run into any issues with the implementation? If you didn't, is this 
> an idea you could add to Spark? 
> Best regards, 
>  
> LvffY
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37003) Merge INSERT related docs

2021-10-13 Thread angerszhu (Jira)
angerszhu created SPARK-37003:
-

 Summary: Merge INSERT related docs
 Key: SPARK-37003
 URL: https://issues.apache.org/jira/browse/SPARK-37003
 Project: Spark
  Issue Type: Improvement
  Components: docs
Affects Versions: 3.2.0
Reporter: angerszhu
 Fix For: 3.3.0


The current INSERT docs contain a lot of duplicated content; merge the INSERT INTO and INSERT OVERWRITE pages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37002) Introduce the 'compute.check_identical_indices' option

2021-10-13 Thread dch nguyen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dch nguyen updated SPARK-37002:
---
Summary: Introduce the 'compute.check_identical_indices' option  (was: 
Introduce the 'compute.check_identical_index' option)

> Introduce the 'compute.check_identical_indices' option
> --
>
> Key: SPARK-37002
> URL: https://issues.apache.org/jira/browse/SPARK-37002
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-36968
> [https://github.com/apache/spark/pull/34235]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37002) Introduce the 'compute.check_identical_index' option

2021-10-13 Thread dch nguyen (Jira)
dch nguyen created SPARK-37002:
--

 Summary: Introduce the 'compute.check_identical_index' option
 Key: SPARK-37002
 URL: https://issues.apache.org/jira/browse/SPARK-37002
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen


https://issues.apache.org/jira/browse/SPARK-36968

[https://github.com/apache/spark/pull/34235]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36994) Upgrade Apache Thrift

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428581#comment-17428581
 ] 

Apache Spark commented on SPARK-36994:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34280

> Upgrade Apache Thrift
> -
>
> Key: SPARK-36994
> URL: https://issues.apache.org/jira/browse/SPARK-36994
> Project: Spark
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 3.0.1
>Reporter: kaja girish
>Priority: Major
>
> *Image:*
>  * spark:3.0.1
> *Components Affected:* 
>  * Apache Thrift
> *Recommendation:*
>  * upgrade Apache Thrift 
> *CVE:*
>  
> |Component Name|Component Version Name|Vulnerability|Fixed version|
> |Apache Thrift|0.11.0-4.|CVE-2019-0205|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2019-0210|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2020-13949|0.14.1|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36994) Upgrade Apache Thrift

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428579#comment-17428579
 ] 

Apache Spark commented on SPARK-36994:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34280

> Upgrade Apache Thrift
> -
>
> Key: SPARK-36994
> URL: https://issues.apache.org/jira/browse/SPARK-36994
> Project: Spark
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 3.0.1
>Reporter: kaja girish
>Priority: Major
>
> *Image:*
>  * spark:3.0.1
> *Components Affected:* 
>  * Apache Thrift
> *Recommendation:*
>  * upgrade Apache Thrift 
> *CVE:*
>  
> |Component Name|Component Version Name|Vulnerability|Fixed version|
> |Apache Thrift|0.11.0-4.|CVE-2019-0205|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2019-0210|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2020-13949|0.14.1|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36994) Upgrade Apache Thrift

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36994:


Assignee: Apache Spark

> Upgrade Apache Thrift
> -
>
> Key: SPARK-36994
> URL: https://issues.apache.org/jira/browse/SPARK-36994
> Project: Spark
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 3.0.1
>Reporter: kaja girish
>Assignee: Apache Spark
>Priority: Major
>
> *Image:*
>  * spark:3.0.1
> *Components Affected:* 
>  * Apache Thrift
> *Recommendation:*
>  * upgrade Apache Thrift 
> *CVE:*
>  
> |Component Name|Component Version Name|Vulnerability|Fixed version|
> |Apache Thrift|0.11.0-4.|CVE-2019-0205|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2019-0210|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2020-13949|0.14.1|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36994) Upgrade Apache Thrift

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36994:


Assignee: (was: Apache Spark)

> Upgrade Apache Thrift
> -
>
> Key: SPARK-36994
> URL: https://issues.apache.org/jira/browse/SPARK-36994
> Project: Spark
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 3.0.1
>Reporter: kaja girish
>Priority: Major
>
> *Image:*
>  * spark:3.0.1
> *Components Affected:* 
>  * Apache Thrift
> *Recommendation:*
>  * upgrade Apache Thrift 
> *CVE:*
>  
> |Component Name|Component Version Name|Vulnerability|Fixed version|
> |Apache Thrift|0.11.0-4.|CVE-2019-0205|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2019-0210|0.13.0|
> |Apache Thrift|0.11.0-4.|CVE-2020-13949|0.14.1|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36546) Make unionByName null-filling behavior work with array of struct columns

2021-10-13 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-36546:

Affects Version/s: (was: 3.1.1)
   3.3.0

> Make unionByName null-filling behavior work with array of struct columns
> 
>
> Key: SPARK-36546
> URL: https://issues.apache.org/jira/browse/SPARK-36546
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Vishal Dhavale
>Assignee: Adam Binford
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, unionByName works with two DataFrames that have slightly different 
> schemas. It would be good if it also worked with arrays of struct columns.
>  
> unionByName fails if we try to merge DataFrames whose arrays of struct 
> columns have slightly different schemas.
> Below is an example.
> Step 1: DataFrame arrayStructDf1 with column booksIntersted of type array of 
> struct
> {code:java}
> val arrayStructData = Seq(
>  Row("James",List(Row("Java","XX",120),Row("Scala","XA",300))),
>  Row("Lilly",List(Row("Java","XY",200),Row("Scala","XB",500
> val arrayStructSchema = new StructType().add("name",StringType)
>  .add("booksIntersted",ArrayType(new StructType()
>  .add("name",StringType)
>  .add("author",StringType)
>  .add("pages",IntegerType)))
> val arrayStructDf1 = 
> spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData),arrayStructSchema)
> arrayStructDf1.printSchema() 
> scala> arrayStructDf1.printSchema()
> root
>  |-- name: string (nullable = true)
>  |-- booksIntersted: array (nullable = true)
>  ||-- element: struct (containsNull = true)
>  |||-- name: string (nullable = true)
>  |||-- author: string (nullable = true)
>  |||-- pages: integer (nullable = true)
> {code}
>  
> Step 2: Another dataframe arrayStructDf2 with column booksIntersted of type 
> array of a struct but struct contains an extra field called "new_column"
> {code:java}
> val arrayStructData2 = Seq(
>  
> Row("James",List(Row("Java","XX",120,"new_column_data"),Row("Scala","XA",300,"new_column_data"))),
>  
> Row("Lilly",List(Row("Java","XY",200,"new_column_data"),Row("Scala","XB",500,"new_column_data"
> val arrayStructSchemaNewClm = new StructType().add("name",StringType)
>  .add("booksIntersted",ArrayType(new StructType()
>  .add("name",StringType)
>  .add("author",StringType)
>  .add("pages",IntegerType)
>  .add("new_column",StringType)))
> val arrayStructDf2 = 
> spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData2),arrayStructSchemaNewClm)
> arrayStructDf2.printSchema()
> scala> arrayStructDf2.printSchema()
> root
>  |-- name: string (nullable = true)
>  |-- booksIntersted: array (nullable = true)
>  ||-- element: struct (containsNull = true)
>  |||-- name: string (nullable = true)
>  |||-- author: string (nullable = true)
>  |||-- pages: integer (nullable = true)
>  |||-- new_column: string (nullable = true){code}
>  
> Step3:  Merge arrayStructDf1 and arrayStructDf2 using unionByName
> We see the error org.apache.spark.sql.AnalysisException: Union can only be 
> performed on tables with the compatible column types. 
> {code:java}
> scala> arrayStructDf1.unionByName(arrayStructDf2,allowMissingColumns=true)
> org.apache.spark.sql.AnalysisException: Union can only be performed on tables 
> with the compatible column types. 
> array<struct<name:string,author:string,pages:int>> <> 
> array<struct<name:string,author:string,pages:int,new_column:string>> at the second column of 
> the second table;
> 'Union false, false
> :- LogicalRDD [name#183, booksIntersted#184], false
> +- Project [name#204, booksIntersted#205]
>  +- LogicalRDD [name#204, booksIntersted#205], false{code}
>  
> unionByName should fill the missing data with null, as it already does for 
> columns of struct type.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36984) Misleading Spark Streaming source documentation

2021-10-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428566#comment-17428566
 ] 

Hyukjin Kwon commented on SPARK-36984:
--

Yeah, we should fix the docs. Feel free to edit - we no longer have that support 
in DStreams. Users should migrate to Structured Streaming instead.

> Misleading Spark Streaming source documentation
> ---
>
> Key: SPARK-36984
> URL: https://issues.apache.org/jira/browse/SPARK-36984
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark, Structured Streaming
>Affects Versions: 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2
>Reporter: Lukáš
>Priority: Trivial
> Attachments: docs_highlight.png
>
>
> The documentation at 
> [https://spark.apache.org/docs/latest/streaming-programming-guide.html#advanced-sources]
>  clearly states that *Kafka* (and Kinesis) are available in the Python API v 
> 3.1.2 in *Spark Streaming (DStreams)*. (see attachments for highlight)
> However, there is no way to create a DStream from Kafka in PySpark >= 3.0.0, as 
> the `kafka.py` file is missing in 
> [https://github.com/apache/spark/tree/master/python/pyspark/streaming]. I'm 
> coming from PySpark 2.4.4, where this was possible. _Should Kafka be excluded 
> as an advanced source for Spark Streaming in the Python API in the docs?_
>  
> Note that I'm aware of this Kafka integration guide 
> [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html]
>  but I'm not interested in Structured Streaming as it doesn't support 
> arbitrary stateful operations in Python. DStreams support this functionality 
> with `updateStateByKey`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36546) Make unionByName null-filling behavior work with array of struct columns

2021-10-13 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-36546:
---

Assignee: Adam Binford

> Make unionByName null-filling behavior work with array of struct columns
> 
>
> Key: SPARK-36546
> URL: https://issues.apache.org/jira/browse/SPARK-36546
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: Vishal Dhavale
>Assignee: Adam Binford
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, unionByName works with two DataFrames that have slightly different 
> schemas. It would be good if it also worked with arrays of struct columns.
>  
> unionByName fails if we try to merge DataFrames whose arrays of struct 
> columns have slightly different schemas.
> Below is an example.
> Step 1: DataFrame arrayStructDf1 with column booksIntersted of type array of 
> struct
> {code:java}
> val arrayStructData = Seq(
>  Row("James",List(Row("Java","XX",120),Row("Scala","XA",300))),
>  Row("Lilly",List(Row("Java","XY",200),Row("Scala","XB",500
> val arrayStructSchema = new StructType().add("name",StringType)
>  .add("booksIntersted",ArrayType(new StructType()
>  .add("name",StringType)
>  .add("author",StringType)
>  .add("pages",IntegerType)))
> val arrayStructDf1 = 
> spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData),arrayStructSchema)
> arrayStructDf1.printSchema() 
> scala> arrayStructDf1.printSchema()
> root
>  |-- name: string (nullable = true)
>  |-- booksIntersted: array (nullable = true)
>  ||-- element: struct (containsNull = true)
>  |||-- name: string (nullable = true)
>  |||-- author: string (nullable = true)
>  |||-- pages: integer (nullable = true)
> {code}
>  
> Step 2: Another dataframe arrayStructDf2 with column booksIntersted of type 
> array of a struct but struct contains an extra field called "new_column"
> {code:java}
> val arrayStructData2 = Seq(
>  
> Row("James",List(Row("Java","XX",120,"new_column_data"),Row("Scala","XA",300,"new_column_data"))),
>  
> Row("Lilly",List(Row("Java","XY",200,"new_column_data"),Row("Scala","XB",500,"new_column_data"
> val arrayStructSchemaNewClm = new StructType().add("name",StringType)
>  .add("booksIntersted",ArrayType(new StructType()
>  .add("name",StringType)
>  .add("author",StringType)
>  .add("pages",IntegerType)
>  .add("new_column",StringType)))
> val arrayStructDf2 = 
> spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData2),arrayStructSchemaNewClm)
> arrayStructDf2.printSchema()
> scala> arrayStructDf2.printSchema()
> root
>  |-- name: string (nullable = true)
>  |-- booksIntersted: array (nullable = true)
>  ||-- element: struct (containsNull = true)
>  |||-- name: string (nullable = true)
>  |||-- author: string (nullable = true)
>  |||-- pages: integer (nullable = true)
>  |||-- new_column: string (nullable = true){code}
>  
> Step3:  Merge arrayStructDf1 and arrayStructDf2 using unionByName
> We see the error org.apache.spark.sql.AnalysisException: Union can only be 
> performed on tables with the compatible column types. 
> {code:java}
> scala> arrayStructDf1.unionByName(arrayStructDf2,allowMissingColumns=true)
> org.apache.spark.sql.AnalysisException: Union can only be performed on tables 
> with the compatible column types. 
> array<struct<name:string,author:string,pages:int>> <> 
> array<struct<name:string,author:string,pages:int,new_column:string>> at the second column of 
> the second table;
> 'Union false, false
> :- LogicalRDD [name#183, booksIntersted#184], false
> +- Project [name#204, booksIntersted#205]
>  +- LogicalRDD [name#204, booksIntersted#205], false{code}
>  
> unionByName should fill the missing data with null, as it already does for 
> columns of struct type.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36546) Make unionByName null-filling behavior work with array of struct columns

2021-10-13 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-36546.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34246
[https://github.com/apache/spark/pull/34246]

> Make unionByName null-filling behavior work with array of struct columns
> 
>
> Key: SPARK-36546
> URL: https://issues.apache.org/jira/browse/SPARK-36546
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: Vishal Dhavale
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, unionByName works with two DataFrames that have slightly different 
> schemas. It would be good if it also worked with arrays of struct columns.
>  
> unionByName fails if we try to merge DataFrames whose arrays of struct 
> columns have slightly different schemas.
> Below is an example.
> Step 1: DataFrame arrayStructDf1 with column booksIntersted of type array of 
> struct
> {code:java}
> val arrayStructData = Seq(
>  Row("James",List(Row("Java","XX",120),Row("Scala","XA",300))),
>  Row("Lilly",List(Row("Java","XY",200),Row("Scala","XB",500
> val arrayStructSchema = new StructType().add("name",StringType)
>  .add("booksIntersted",ArrayType(new StructType()
>  .add("name",StringType)
>  .add("author",StringType)
>  .add("pages",IntegerType)))
> val arrayStructDf1 = 
> spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData),arrayStructSchema)
> arrayStructDf1.printSchema() 
> scala> arrayStructDf1.printSchema()
> root
>  |-- name: string (nullable = true)
>  |-- booksIntersted: array (nullable = true)
>  ||-- element: struct (containsNull = true)
>  |||-- name: string (nullable = true)
>  |||-- author: string (nullable = true)
>  |||-- pages: integer (nullable = true)
> {code}
>  
> Step 2: Another dataframe arrayStructDf2 with column booksIntersted of type 
> array of a struct but struct contains an extra field called "new_column"
> {code:java}
> val arrayStructData2 = Seq(
>  
> Row("James",List(Row("Java","XX",120,"new_column_data"),Row("Scala","XA",300,"new_column_data"))),
>  
> Row("Lilly",List(Row("Java","XY",200,"new_column_data"),Row("Scala","XB",500,"new_column_data"
> val arrayStructSchemaNewClm = new StructType().add("name",StringType)
>  .add("booksIntersted",ArrayType(new StructType()
>  .add("name",StringType)
>  .add("author",StringType)
>  .add("pages",IntegerType)
>  .add("new_column",StringType)))
> val arrayStructDf2 = 
> spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData2),arrayStructSchemaNewClm)
> arrayStructDf2.printSchema()
> scala> arrayStructDf2.printSchema()
> root
>  |-- name: string (nullable = true)
>  |-- booksIntersted: array (nullable = true)
>  ||-- element: struct (containsNull = true)
>  |||-- name: string (nullable = true)
>  |||-- author: string (nullable = true)
>  |||-- pages: integer (nullable = true)
>  |||-- new_column: string (nullable = true){code}
>  
> Step3:  Merge arrayStructDf1 and arrayStructDf2 using unionByName
> We see the error org.apache.spark.sql.AnalysisException: Union can only be 
> performed on tables with the compatible column types. 
> {code:java}
> scala> arrayStructDf1.unionByName(arrayStructDf2,allowMissingColumns=true)
> org.apache.spark.sql.AnalysisException: Union can only be performed on tables 
> with the compatible column types. 
> array<struct<name:string,author:string,pages:int>> <> 
> array<struct<name:string,author:string,pages:int,new_column:string>> at the second column of 
> the second table;
> 'Union false, false
> :- LogicalRDD [name#183, booksIntersted#184], false
> +- Project [name#204, booksIntersted#205]
>  +- LogicalRDD [name#204, booksIntersted#205], false{code}
>  
> unionByName should fill the missing data with null, as it already does for 
> columns of struct type.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36988) What ciphers spark support for internode communication?

2021-10-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36988.
--
Resolution: Invalid

> What ciphers spark support for internode communication?
> ---
>
> Key: SPARK-36988
> URL: https://issues.apache.org/jira/browse/SPARK-36988
> Project: Spark
>  Issue Type: Question
>  Components: Security
>Affects Versions: 3.1.2
>Reporter: zoli
>Priority: Minor
>
> {{Spark documentation mentions this:}}
>  {{[https://spark.apache.org/docs/3.0.0/security.html]}}
> {code:java}
> spark.network.crypto.config.*
> "Configuration values for the commons-crypto library, such as which cipher 
> implementations to use. The config name should be the name of commons-crypto 
> configuration without the commons.crypto prefix."{code}
> {{What does this mean?}}
>  {{If I leave it unset, what will happen? Will no encryption be used, 
> or will it fall back to some default?}}
>  {{The commons-crypto docs mention that it uses JCE or OpenSSL implementations, 
> but say nothing about the ciphers.}}
>  {{Does it support everything the given JVM does?}}
> {{The documentation is vague on this.}}
> {{However the spark ui part for the security is clear:}}
> {code:java}
> ${ns}.enabledAlgorithms
> A comma-separated list of ciphers. The specified ciphers must be supported by 
> JVM. The reference list of protocols can be found in the "JSSE Cipher Suite 
> Names" section of the Java security guide. The list for Java 8 can be found 
> at this page. Note: If not set, the default cipher suite for the JRE will be 
> used.{code}
> {{ }}
>  {{So what will happen if I leave spark.network.crypto.config.* to None?}}
>  {{And what ciphers are supported?}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36988) What ciphers spark support for internode communication?

2021-10-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428564#comment-17428564
 ] 

Hyukjin Kwon commented on SPARK-36988:
--

For questions, let's use the Spark mailing list instead of filing a JIRA 
issue. You are more likely to get a good answer there.

> What ciphers spark support for internode communication?
> ---
>
> Key: SPARK-36988
> URL: https://issues.apache.org/jira/browse/SPARK-36988
> Project: Spark
>  Issue Type: Question
>  Components: Security
>Affects Versions: 3.1.2
>Reporter: zoli
>Priority: Minor
>
> {{Spark documentation mentions this:}}
>  {{[https://spark.apache.org/docs/3.0.0/security.html]}}
> {code:java}
> spark.network.crypto.config.*
> "Configuration values for the commons-crypto library, such as which cipher 
> implementations to use. The config name should be the name of commons-crypto 
> configuration without the commons.crypto prefix."{code}
> {{What does this mean?}}
>  {{If I leave it unset, what will happen? Will no encryption be used, 
> or will it fall back to some default?}}
>  {{The commons-crypto docs mention that it uses JCE or OpenSSL implementations, 
> but say nothing about the ciphers.}}
>  {{Does it support everything the given JVM does?}}
> {{The documentation is vague on this.}}
> {{However the spark ui part for the security is clear:}}
> {code:java}
> ${ns}.enabledAlgorithms
> A comma-separated list of ciphers. The specified ciphers must be supported by 
> JVM. The reference list of protocols can be found in the "JSSE Cipher Suite 
> Names" section of the Java security guide. The list for Java 8 can be found 
> at this page. Note: If not set, the default cipher suite for the JRE will be 
> used.{code}
> {{ }}
>  {{So what will happen if I leave spark.network.crypto.config.* to None?}}
>  {{And what ciphers are supported?}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36999) Document the command ALTER TABLE RECOVER PARTITIONS

2021-10-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36999:
-
Labels: starter  (was: Starter starter)

> Document the command ALTER TABLE RECOVER PARTITIONS
> ---
>
> Key: SPARK-36999
> URL: https://issues.apache.org/jira/browse/SPARK-36999
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>  Labels: starter
>
> Update the page 
> [https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-alter-table.html] 
> and document the command ALTER TABLE RECOVER PARTITIONS.
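
For reference, a minimal sketch of the command the page should cover, run from PySpark; the table name is a placeholder for illustration only:

{code:python}
# Sketch: recover partition directories that exist on storage but are not yet
# registered in the catalog (similar in spirit to MSCK REPAIR TABLE).
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("ALTER TABLE my_partitioned_table RECOVER PARTITIONS")
{code}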



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428557#comment-17428557
 ] 

Apache Spark commented on SPARK-36993:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/34279

> Fix json_tuple throw NPE if fields exist no foldable null value
> ---
>
> Key: SPARK-36993
> URL: https://issues.apache.org/jira/browse/SPARK-36993
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.1.3, 3.2.1, 3.3.0
>
>
> If json_tuple has a non-foldable field expression that evaluates to null, Spark 
> throws an NPE when evaluating field.toString.
> E.g. the following query fails with:
> {code:java}
> SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS 
> c1 );
> {code}
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36646) Push down group by partition column for Aggregate (Min/Max/Count) for Parquet

2021-10-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36646:
-
Summary: Push down group by partition column for Aggregate (Min/Max/Count) 
for Parquet  (was: push down group by partition column for Aggregate 
(Min/Max/Count) for Parquet)

> Push down group by partition column for Aggregate (Min/Max/Count) for Parquet
> -
>
> Key: SPARK-36646
> URL: https://issues.apache.org/jira/browse/SPARK-36646
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>
> If an Aggregate (Min/Max/Count) over Parquet groups by a partition column, push 
> down the group by as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37001) Disable two level of map for final hash aggregation by default

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37001:


Assignee: (was: Apache Spark)

> Disable two level of map for final hash aggregation by default
> --
>
> Key: SPARK-37001
> URL: https://issues.apache.org/jira/browse/SPARK-37001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Priority: Minor
>
> This JIRA is to disable the two-level hash map for final hash aggregation by 
> default. The feature was introduced in 
> [#32242|https://github.com/apache/spark/pull/32242], and we found it can lead 
> to a query performance regression when the final aggregation gets rows with 
> many distinct keys: the first-level hash map fills up, so many rows waste a 
> first-level lookup before being inserted into the second-level map. The feature 
> still benefits queries with few distinct keys, so we introduce a config that 
> lets a query enable it when it is beneficial.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37001) Disable two level of map for final hash aggregation by default

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37001:


Assignee: Apache Spark

> Disable two level of map for final hash aggregation by default
> --
>
> Key: SPARK-37001
> URL: https://issues.apache.org/jira/browse/SPARK-37001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Minor
>
> This JIRA is to disable the two-level hash map for final hash aggregation by 
> default. The feature was introduced in 
> [#32242|https://github.com/apache/spark/pull/32242], and we found it can lead 
> to a query performance regression when the final aggregation gets rows with 
> many distinct keys: the first-level hash map fills up, so many rows waste a 
> first-level lookup before being inserted into the second-level map. The feature 
> still benefits queries with few distinct keys, so we introduce a config that 
> lets a query enable it when it is beneficial.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37001) Disable two level of map for final hash aggregation by default

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428529#comment-17428529
 ] 

Apache Spark commented on SPARK-37001:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/34270

> Disable two level of map for final hash aggregation by default
> --
>
> Key: SPARK-37001
> URL: https://issues.apache.org/jira/browse/SPARK-37001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Priority: Minor
>
> This JIRA is to disable the two-level hash map for final hash aggregation by 
> default. The feature was introduced in 
> [#32242|https://github.com/apache/spark/pull/32242], and we found it can lead 
> to a query performance regression when the final aggregation gets rows with 
> many distinct keys: the first-level hash map fills up, so many rows waste a 
> first-level lookup before being inserted into the second-level map. The feature 
> still benefits queries with few distinct keys, so we introduce a config that 
> lets a query enable it when it is beneficial.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37001) Disable two level of map for final hash aggregation by default

2021-10-13 Thread Cheng Su (Jira)
Cheng Su created SPARK-37001:


 Summary: Disable two level of map for final hash aggregation by 
default
 Key: SPARK-37001
 URL: https://issues.apache.org/jira/browse/SPARK-37001
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Cheng Su


This JIRA is to disable the two-level hash map for final hash aggregation by 
default. The feature was introduced in 
[#32242|https://github.com/apache/spark/pull/32242], and we found it can lead to 
a query performance regression when the final aggregation gets rows with many 
distinct keys: the first-level hash map fills up, so many rows waste a 
first-level lookup before being inserted into the second-level map. The feature 
still benefits queries with few distinct keys, so we introduce a config that 
lets a query enable it when it is beneficial.
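
As a rough illustration of that last point, opting back in for a query known to have few distinct keys could look like the sketch below; the config key and the table name are placeholders for illustration, not the actual names introduced by the PR:

{code:python}
# Sketch only: opt back in to two-level hash maps for final aggregation.
# The config key below is a placeholder; check the PR for the real name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.codegen.aggregate.final.map.twolevel.enabled", "true")

# A GROUP BY with few distinct keys is where the feature is expected to help.
spark.sql("SELECT dept, count(*) FROM employees GROUP BY dept").show()
{code}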



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36990) Long columns cannot read columns with INT32 type in the parquet file

2021-10-13 Thread Catalin Toda (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Catalin Toda resolved SPARK-36990.
--
Resolution: Duplicate

> Long columns cannot read columns with INT32 type in the parquet file
> 
>
> Key: SPARK-36990
> URL: https://issues.apache.org/jira/browse/SPARK-36990
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: Catalin Toda
>Priority: Major
>
> The code below does not work on either Spark 3.1 or Spark 3.2.
> Part of the issue is that the fileSchema has logicalTypeAnnotation 
> == null 
> ([https://github.com/apache/spark/blob/5013171fd36e6221a540c801cb7fd9e298a6b5ba/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L92]), 
>  which makes isUnsignedTypeMatched always return false:
> [https://github.com/apache/spark/blob/5b2f1912280e7a5afb92a96b894a7bc5f263aa6e/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java#L180]
>  
> Even if logicalTypeAnnotation were not null, I am not sure 
> isUnsignedTypeMatched is supposed to return true for this use case.
> Python repro:
> {code:java}
> import os
> from pyspark.sql.functions import *
> from pyspark.sql import SparkSession
> from pyspark.sql.types import *
> spark = SparkSession.builder \
> .config("spark.hadoop.fs.s3.impl", 
> "org.apache.hadoop.fs.s3a.S3AFileSystem") \
> .config("spark.hadoop.fs.AbstractFileSystem.s3.impl", 
> "org.apache.hadoop.fs.s3a.S3A") \
> .getOrCreate()
> df = 
> spark.createDataFrame([(1,2),(2,3)],StructType([StructField("id",IntegerType(),True),StructField("id2",IntegerType(),True)])).select("id")
> df.write.mode("overwrite").parquet("s3://bucket/test")
> df=spark.read.schema(StructType([StructField("id",LongType(),True)])).parquet("s3://bucket/test")
> df.show(1, False)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37000:


Assignee: (was: Apache Spark)

> Add type hints to python/pyspark/sql/util.py
> 
>
> Key: SPARK-37000
> URL: https://issues.apache.org/jira/browse/SPARK-37000
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Add type hints for python/pyspark/sql/utils.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37000:


Assignee: Apache Spark

> Add type hints to python/pyspark/sql/util.py
> 
>
> Key: SPARK-37000
> URL: https://issues.apache.org/jira/browse/SPARK-37000
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Add type hints for python/pyspark/sql/utils.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428523#comment-17428523
 ] 

Apache Spark commented on SPARK-37000:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/34278

> Add type hints to python/pyspark/sql/util.py
> 
>
> Key: SPARK-37000
> URL: https://issues.apache.org/jira/browse/SPARK-37000
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Add type hints for python/pyspark/sql/utils.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37000:


Assignee: Apache Spark

> Add type hints to python/pyspark/sql/util.py
> 
>
> Key: SPARK-37000
> URL: https://issues.apache.org/jira/browse/SPARK-37000
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Add type hints for python/pyspark/sql/utils.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428524#comment-17428524
 ] 

Apache Spark commented on SPARK-37000:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/34278

> Add type hints to python/pyspark/sql/util.py
> 
>
> Key: SPARK-37000
> URL: https://issues.apache.org/jira/browse/SPARK-37000
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Add type hints for python/pyspark/sql/utils.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35640) Refactor Parquet vectorized reader to remove duplicated code paths

2021-10-13 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428522#comment-17428522
 ] 

Chao Sun commented on SPARK-35640:
--

[~catalinii] this change seems unrelated, since it is only in Spark 3.2.0 but you 
mentioned the issue also happens in Spark 3.1.2. The issue also seems to be 
well-known; see SPARK-16544.

> Refactor Parquet vectorized reader to remove duplicated code paths
> --
>
> Key: SPARK-35640
> URL: https://issues.apache.org/jira/browse/SPARK-35640
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently in the Parquet vectorized code path there is a lot of code duplication, 
> such as the following:
> {code:java}
>   public void readIntegers(
>   int total,
>   WritableColumnVector c,
>   int rowId,
>   int level,
>   VectorizedValuesReader data) throws IOException {
> int left = total;
> while (left > 0) {
>   if (this.currentCount == 0) this.readNextGroup();
>   int n = Math.min(left, this.currentCount);
>   switch (mode) {
> case RLE:
>   if (currentValue == level) {
> data.readIntegers(n, c, rowId);
>   } else {
> c.putNulls(rowId, n);
>   }
>   break;
> case PACKED:
>   for (int i = 0; i < n; ++i) {
> if (currentBuffer[currentBufferIdx++] == level) {
>   c.putInt(rowId + i, data.readInteger());
> } else {
>   c.putNull(rowId + i);
> }
>   }
>   break;
>   }
>   rowId += n;
>   left -= n;
>   currentCount -= n;
> }
>   }
> {code}
> This makes the code hard to maintain, as any change has to be replicated in 20+ 
> places. The issue becomes more serious as we move to implement column index and 
> complex type support for the vectorized path.
> The original intention was performance. However, nowadays JIT compilers tend to 
> be smart about this and will inline virtual calls as much as possible.
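
For illustration, a minimal, self-contained Scala sketch of the shape such a 
refactoring could take; all names below (ValueUpdater, DefinitionLevelLoop) are made 
up for this example and are not Spark's actual reader API:

{code:scala}
// Sketch only: the duplicated readIntegers/readLongs/... loops collapse into one
// generic definition-level loop once the per-value work sits behind a tiny trait.
trait ValueUpdater {
  def putValue(offset: Int): Unit  // decode one non-null value into the output
  def putNull(offset: Int): Unit   // mark the slot as null
}

object DefinitionLevelLoop {
  // defLevels(i) == maxLevel means "value present"; anything else means null.
  def read(defLevels: Array[Int], maxLevel: Int, updater: ValueUpdater): Unit = {
    var i = 0
    while (i < defLevels.length) {
      if (defLevels(i) == maxLevel) updater.putValue(i) else updater.putNull(i)
      i += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val values = Iterator(10, 20, 30)
    val out = Array.fill[Any](4)(null)
    read(Array(1, 0, 1, 1), maxLevel = 1, new ValueUpdater {
      def putValue(offset: Int): Unit = out(offset) = values.next()
      def putNull(offset: Int): Unit = out(offset) = null
    })
    println(out.mkString(", "))  // prints: 10, null, 20, 30
  }
}
{code}

The performance argument above is that a JIT compiler can usually inline the updater's 
virtual calls when only a few implementations are hot, so the single loop need not be 
slower than the hand-duplicated versions.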



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35640) Refactor Parquet vectorized reader to remove duplicated code paths

2021-10-13 Thread Catalin Toda (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428520#comment-17428520
 ] 

Catalin Toda commented on SPARK-35640:
--

I opened https://issues.apache.org/jira/browse/SPARK-36990, which seems to be related 
to this change as well.
It seems that logicalTypeAnnotation is null in my environment, while this PR relies on 
logicalTypeAnnotation, so the check would then always return false.
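
As a generic illustration of the concern (not the specific check this report is about), 
a null-safe way to probe a Parquet logical type annotation could look like the sketch 
below; the decimal check is only an example:

{code:scala}
import org.apache.parquet.schema.LogicalTypeAnnotation.DecimalLogicalTypeAnnotation
import org.apache.parquet.schema.PrimitiveType

// Sketch only: files written by older writers may carry no logical type
// annotation at all, so guard against null instead of answering "false".
def isDecimal(primitiveType: PrimitiveType): Boolean =
  Option(primitiveType.getLogicalTypeAnnotation) match {
    case Some(_: DecimalLogicalTypeAnnotation) => true
    case Some(_)                               => false
    case None =>
      // Legacy file: fall back to the converted/original type here rather
      // than silently treating the column as "not decimal".
      false
  }
{code}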

> Refactor Parquet vectorized reader to remove duplicated code paths
> --
>
> Key: SPARK-35640
> URL: https://issues.apache.org/jira/browse/SPARK-35640
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently in the Parquet vectorized code path there is a lot of code duplication, 
> such as the following:
> {code:java}
>   public void readIntegers(
>   int total,
>   WritableColumnVector c,
>   int rowId,
>   int level,
>   VectorizedValuesReader data) throws IOException {
> int left = total;
> while (left > 0) {
>   if (this.currentCount == 0) this.readNextGroup();
>   int n = Math.min(left, this.currentCount);
>   switch (mode) {
> case RLE:
>   if (currentValue == level) {
> data.readIntegers(n, c, rowId);
>   } else {
> c.putNulls(rowId, n);
>   }
>   break;
> case PACKED:
>   for (int i = 0; i < n; ++i) {
> if (currentBuffer[currentBufferIdx++] == level) {
>   c.putInt(rowId + i, data.readInteger());
> } else {
>   c.putNull(rowId + i);
> }
>   }
>   break;
>   }
>   rowId += n;
>   left -= n;
>   currentCount -= n;
> }
>   }
> {code}
> This makes the code hard to maintain, as any change has to be replicated in 20+ 
> places. The issue becomes more serious as we move to implement column index and 
> complex type support for the vectorized path.
> The original intention was performance. However, nowadays JIT compilers tend to 
> be smart about this and will inline virtual calls as much as possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-13 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-37000:
--
Description: Add type hints for python/pyspark/sql/utils.py.

> Add type hints to python/pyspark/sql/util.py
> 
>
> Key: SPARK-37000
> URL: https://issues.apache.org/jira/browse/SPARK-37000
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Add type hints for python/pyspark/sql/utils.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-13 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37000:
-

 Summary: Add type hints to python/pyspark/sql/util.py
 Key: SPARK-37000
 URL: https://issues.apache.org/jira/browse/SPARK-37000
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-13 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428475#comment-17428475
 ] 

Takuya Ueshin commented on SPARK-37000:
---

I'm working on this.

> Add type hints to python/pyspark/sql/util.py
> 
>
> Key: SPARK-37000
> URL: https://issues.apache.org/jira/browse/SPARK-37000
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36982) Migrate SHOW NAMESPACES to use V2 command by default

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-36982.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34255
[https://github.com/apache/spark/pull/34255]

> Migrate SHOW NAMESPACES to use V2 command by default
> 
>
> Key: SPARK-36982
> URL: https://issues.apache.org/jira/browse/SPARK-36982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.3.0
>
>
> Migrate SHOW NAMESPACES to use V2 command by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36982) Migrate SHOW NAMESPACES to use V2 command by default

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-36982:


Assignee: Terry Kim

> Migrate SHOW NAMESPACES to use V2 command by default
> 
>
> Key: SPARK-36982
> URL: https://issues.apache.org/jira/browse/SPARK-36982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> Migrate SHOW NAMESPACES to use V2 command by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36991) Inline type hints for spark/python/pyspark/sql/streaming.py

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36991:


Assignee: (was: Apache Spark)

> Inline type hints for spark/python/pyspark/sql/streaming.py
> ---
>
> Key: SPARK-36991
> URL: https://issues.apache.org/jira/browse/SPARK-36991
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Inline type hints for spark/python/pyspark/sql/streaming.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36991) Inline type hints for spark/python/pyspark/sql/streaming.py

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428444#comment-17428444
 ] 

Apache Spark commented on SPARK-36991:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/34277

> Inline type hints for spark/python/pyspark/sql/streaming.py
> ---
>
> Key: SPARK-36991
> URL: https://issues.apache.org/jira/browse/SPARK-36991
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Inline type hints for spark/python/pyspark/sql/streaming.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36991) Inline type hints for spark/python/pyspark/sql/streaming.py

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36991:


Assignee: Apache Spark

> Inline type hints for spark/python/pyspark/sql/streaming.py
> ---
>
> Key: SPARK-36991
> URL: https://issues.apache.org/jira/browse/SPARK-36991
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Inline type hints for spark/python/pyspark/sql/streaming.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36999) Document the command ALTER TABLE RECOVER PARTITIONS

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-36999:
-
Labels: Starter starter  (was: )

> Document the command ALTER TABLE RECOVER PARTITIONS
> ---
>
> Key: SPARK-36999
> URL: https://issues.apache.org/jira/browse/SPARK-36999
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>  Labels: Starter, starter
>
> Update the page 
> [https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-alter-table.html] 
> and document the command ALTER TABLE RECOVER PARTITIONS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36999) Document the command ALTER TABLE RECOVER PARTITIONS

2021-10-13 Thread Max Gekk (Jira)
Max Gekk created SPARK-36999:


 Summary: Document the command ALTER TABLE RECOVER PARTITIONS
 Key: SPARK-36999
 URL: https://issues.apache.org/jira/browse/SPARK-36999
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk


Update the page 
[https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-alter-table.html] and 
document the command ALTER TABLE RECOVER PARTITIONS.
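
For reference, a minimal spark-shell illustration of the command the page would 
document (the table name below is made up):

{code:scala}
// Illustrative only: rescan the table location and register any partition
// directories that exist on storage but are missing from the catalog.
spark.sql("ALTER TABLE my_partitioned_table RECOVER PARTITIONS")
{code}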



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36998) Handle concurrent eviction of same application in SHS

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428384#comment-17428384
 ] 

Apache Spark commented on SPARK-36998:
--

User 'thejdeep' has created a pull request for this issue:
https://github.com/apache/spark/pull/34276

> Handle concurrent eviction of same application in SHS
> -
>
> Key: SPARK-36998
> URL: https://issues.apache.org/jira/browse/SPARK-36998
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Thejdeep Gudivada
>Priority: Minor
>
> The SHS throws this exception when trying to make room for parsing a log file. 
> The reason is a race condition: space is being made for processing two log 
> files at the same time, and the deleteDirectory calls overlap.
> {code:java}
> 21/10/13 09:13:54 INFO HistoryServerDiskManager: Lease of 49.0 KiB may cause 
> usage to exceed max (101.7 GiB > 100.0 GiB) 21/10/13 09:13:54 WARN 
> HttpChannel: handleException 
> /api/v1/applications/application_1632281309592_2767775/1/jobs 
> java.io.IOException : Unable to delete directory 
> /grid/spark/sparkhistory-leveldb/apps/application_1631288241341_3657151_1.ldb.
>  21/10/13 09:13:54 WARN HttpChannelState: unhandled due to prior sendError 
> javax.servlet.ServletException: 
> org.glassfish.jersey.server.ContainerException: java.io.IOException: Unable 
> to delete directory /grid 
> /spark/sparkhistory-leveldb/apps/application_1631288241341_3657151_1.ldb. at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) 
> at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) 
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>  at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
>  at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>  at org.sparkproject.jetty.server.Server.handle(Server.java:516) at 
> org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
>  at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) 
> at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:380) at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)
>  at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>  at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) at 
> org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36998) Handle concurrent eviction of same application in SHS

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36998:


Assignee: Apache Spark

> Handle concurrent eviction of same application in SHS
> -
>
> Key: SPARK-36998
> URL: https://issues.apache.org/jira/browse/SPARK-36998
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Thejdeep Gudivada
>Assignee: Apache Spark
>Priority: Minor
>
> The SHS throws this exception when trying to make room for parsing a log file. 
> The reason is a race condition: space is being made for processing two log 
> files at the same time, and the deleteDirectory calls overlap.
> {code:java}
> 21/10/13 09:13:54 INFO HistoryServerDiskManager: Lease of 49.0 KiB may cause 
> usage to exceed max (101.7 GiB > 100.0 GiB) 21/10/13 09:13:54 WARN 
> HttpChannel: handleException 
> /api/v1/applications/application_1632281309592_2767775/1/jobs 
> java.io.IOException : Unable to delete directory 
> /grid/spark/sparkhistory-leveldb/apps/application_1631288241341_3657151_1.ldb.
>  21/10/13 09:13:54 WARN HttpChannelState: unhandled due to prior sendError 
> javax.servlet.ServletException: 
> org.glassfish.jersey.server.ContainerException: java.io.IOException: Unable 
> to delete directory /grid 
> /spark/sparkhistory-leveldb/apps/application_1631288241341_3657151_1.ldb. at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) 
> at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) 
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>  at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
>  at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>  at org.sparkproject.jetty.server.Server.handle(Server.java:516) at 
> org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
>  at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) 
> at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:380) at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)
>  at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>  at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) at 
> org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36998) Handle concurrent eviction of same application in SHS

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36998:


Assignee: (was: Apache Spark)

> Handle concurrent eviction of same application in SHS
> -
>
> Key: SPARK-36998
> URL: https://issues.apache.org/jira/browse/SPARK-36998
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Thejdeep Gudivada
>Priority: Minor
>
> The SHS throws this exception when trying to make room for parsing a log file. 
> The reason is a race condition: space is being made for processing two log 
> files at the same time, and the deleteDirectory calls overlap.
> {code:java}
> 21/10/13 09:13:54 INFO HistoryServerDiskManager: Lease of 49.0 KiB may cause 
> usage to exceed max (101.7 GiB > 100.0 GiB) 21/10/13 09:13:54 WARN 
> HttpChannel: handleException 
> /api/v1/applications/application_1632281309592_2767775/1/jobs 
> java.io.IOException : Unable to delete directory 
> /grid/spark/sparkhistory-leveldb/apps/application_1631288241341_3657151_1.ldb.
>  21/10/13 09:13:54 WARN HttpChannelState: unhandled due to prior sendError 
> javax.servlet.ServletException: 
> org.glassfish.jersey.server.ContainerException: java.io.IOException: Unable 
> to delete directory /grid 
> /spark/sparkhistory-leveldb/apps/application_1631288241341_3657151_1.ldb. at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) 
> at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) 
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>  at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
>  at 
> org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>  at org.sparkproject.jetty.server.Server.handle(Server.java:516) at 
> org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
>  at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) 
> at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:380) at 
> org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)
>  at 
> org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>  at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) at 
> org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
>  at 
> org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36998) Handle concurrent eviction of same application in SHS

2021-10-13 Thread Thejdeep Gudivada (Jira)
Thejdeep Gudivada created SPARK-36998:
-

 Summary: Handle concurrent eviction of same application in SHS
 Key: SPARK-36998
 URL: https://issues.apache.org/jira/browse/SPARK-36998
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Thejdeep Gudivada


The SHS throws this exception when trying to make room for parsing a log file. 
The reason is a race condition: space is being made for processing two log files 
at the same time, and the deleteDirectory calls overlap.
{code:java}
21/10/13 09:13:54 INFO HistoryServerDiskManager: Lease of 49.0 KiB may cause 
usage to exceed max (101.7 GiB > 100.0 GiB) 21/10/13 09:13:54 WARN HttpChannel: 
handleException /api/v1/applications/application_1632281309592_2767775/1/jobs 
java.io.IOException : Unable to delete directory 
/grid/spark/sparkhistory-leveldb/apps/application_1631288241341_3657151_1.ldb. 
21/10/13 09:13:54 WARN HttpChannelState: unhandled due to prior sendError 
javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: 
java.io.IOException: Unable to delete directory /grid 
/spark/sparkhistory-leveldb/apps/application_1631288241341_3657151_1.ldb. at 
org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) at 
org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
 at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
 at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
 at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
at 
org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
 at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at 
org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at 
org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
 at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) 
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
 at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
 at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) 
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
 at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
 at 
org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
 at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
 at org.sparkproject.jetty.server.Server.handle(Server.java:516) at 
org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) 
at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) at 
org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:380) at 
org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)
 at 
org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
 at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) at 
org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
 at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
{code}
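
One possible shape of a fix, sketched below in Scala with made-up names (this is not 
the SHS code): serialize eviction per application so that overlapping deleteDirectory 
calls for the same store take turns instead of racing.

{code:scala}
import java.util.concurrent.ConcurrentHashMap

// Sketch only: one lock object per application id.
object EvictionLocks {
  private val locks = new ConcurrentHashMap[String, Object]()

  def withAppLock[T](appId: String)(body: => T): T = {
    val lock = locks.computeIfAbsent(appId, _ => new Object)
    lock.synchronized(body)
  }
}

// Hypothetical call site: the second concurrent caller either waits its turn or
// finds the directory already gone and can skip the delete entirely.
//   EvictionLocks.withAppLock(appId) {
//     if (storeDir.exists()) FileUtils.deleteDirectory(storeDir)
//   }
{code}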



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32161) Hide JVM traceback for SparkUpgradeException

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32161:


Assignee: (was: Apache Spark)

> Hide JVM traceback for SparkUpgradeException
> 
>
> Key: SPARK-32161
> URL: https://issues.apache.org/jira/browse/SPARK-32161
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We added {{SparkUpgradeException}}, for which the JVM traceback is pretty useless. 
> See also https://github.com/apache/spark/pull/28736/files#r449184881
> It would be better to also whitelist this exception and hide the JVM traceback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32161) Hide JVM traceback for SparkUpgradeException

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32161:


Assignee: Apache Spark

> Hide JVM traceback for SparkUpgradeException
> 
>
> Key: SPARK-32161
> URL: https://issues.apache.org/jira/browse/SPARK-32161
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> We added {{SparkUpgradeException}}, for which the JVM traceback is pretty useless. 
> See also https://github.com/apache/spark/pull/28736/files#r449184881
> It would be better to also whitelist this exception and hide the JVM traceback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32161) Hide JVM traceback for SparkUpgradeException

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428345#comment-17428345
 ] 

Apache Spark commented on SPARK-32161:
--

User 'pralabhkumar' has created a pull request for this issue:
https://github.com/apache/spark/pull/34275

> Hide JVM traceback for SparkUpgradeException
> 
>
> Key: SPARK-32161
> URL: https://issues.apache.org/jira/browse/SPARK-32161
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We added {{SparkUpgradeException}}, for which the JVM traceback is pretty useless. 
> See also https://github.com/apache/spark/pull/28736/files#r449184881
> It would be better to also whitelist this exception and hide the JVM traceback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-36993:
-
Fix Version/s: (was: 3.2.0)
   3.2.1

> Fix json_tuple throw NPE if fields exist no foldable null value
> ---
>
> Key: SPARK-36993
> URL: https://issues.apache.org/jira/browse/SPARK-36993
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.1.3, 3.2.1, 3.3.0
>
>
> If json_tuple is given a field that is not foldable and evaluates to null, Spark 
> throws an NPE while evaluating field.toString.
> E.g. the following query will fail:
> {code:java}
> SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS 
> c1 );
> {code}
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413)
> {code}
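
A standalone Scala sketch of the guard (this mirrors the pattern only, not the actual 
JsonTuple code): wrap the evaluated field in Option before calling toString, so a null 
field becomes a missing column rather than an NPE.

{code:scala}
object JsonTupleNullGuard {
  // Model each requested field as a function of the input row.
  def fieldNames(fieldExprs: Seq[Any => Any], row: Any): Seq[Option[String]] =
    // Option(...) turns a null result into None instead of letting
    // .toString throw a NullPointerException.
    fieldExprs.map(f => Option(f(row)).map(_.toString))

  def main(args: Array[String]): Unit = {
    val exprs: Seq[Any => Any] = Seq(_ => "a", _ => null)
    println(fieldNames(exprs, row = ()))  // List(Some(a), None)
  }
}
{code}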



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-36993:
-
Fix Version/s: 3.3.0

> Fix json_tuple throw NPE if fields exist no foldable null value
> ---
>
> Key: SPARK-36993
> URL: https://issues.apache.org/jira/browse/SPARK-36993
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.3.0
>
>
> If json_tuple is given a field that is not foldable and evaluates to null, Spark 
> throws an NPE while evaluating field.toString.
> E.g. the following query will fail:
> {code:java}
> SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS 
> c1 );
> {code}
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36949) Fix CREATE TABLE AS SELECT of ANSI intervals

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-36949.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34259
[https://github.com/apache/spark/pull/34259]

> Fix CREATE TABLE AS SELECT of ANSI intervals
> 
>
> Key: SPARK-36949
> URL: https://issues.apache.org/jira/browse/SPARK-36949
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> The given SQL should work:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 STORED AS PARQUET AS SELECT INTERVAL '1-1' YEAR 
> TO MONTH AS YM;
> 21/10/07 21:35:59 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
> 'interval year to month' but 'interval year to month' is found
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36949) Fix CREATE TABLE AS SELECT of ANSI intervals

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-36949:


Assignee: Max Gekk

> Fix CREATE TABLE AS SELECT of ANSI intervals
> 
>
> Key: SPARK-36949
> URL: https://issues.apache.org/jira/browse/SPARK-36949
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The given SQL should work:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 STORED AS PARQUET AS SELECT INTERVAL '1-1' YEAR 
> TO MONTH AS YM;
> 21/10/07 21:35:59 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
> 'interval year to month' but 'interval year to month' is found
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-36993.
--
Fix Version/s: 3.1.3
   3.2.0
   Resolution: Fixed

Issue resolved by pull request 34268
[https://github.com/apache/spark/pull/34268]

> Fix json_tuple throw NPE if fields exist no foldable null value
> ---
>
> Key: SPARK-36993
> URL: https://issues.apache.org/jira/browse/SPARK-36993
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> If json_tuple is given a field that is not foldable and evaluates to null, Spark 
> throws an NPE while evaluating field.toString.
> E.g. the following query will fail:
> {code:java}
> SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS 
> c1 );
> {code}
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-36993:


Assignee: XiDuo You

> Fix json_tuple throw NPE if fields exist no foldable null value
> ---
>
> Key: SPARK-36993
> URL: https://issues.apache.org/jira/browse/SPARK-36993
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
>
> If json_tuple is given a field that is not foldable and evaluates to null, Spark 
> throws an NPE while evaluating field.toString.
> E.g. the following query will fail:
> {code:java}
> SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS 
> c1 );
> {code}
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428301#comment-17428301
 ] 

Apache Spark commented on SPARK-36868:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34274

> Migrate CreateFunctionStatement to v2 command framework
> ---
>
> Key: SPARK-36868
> URL: https://issues.apache.org/jira/browse/SPARK-36868
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428300#comment-17428300
 ] 

Apache Spark commented on SPARK-36868:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34274

> Migrate CreateFunctionStatement to v2 command framework
> ---
>
> Key: SPARK-36868
> URL: https://issues.apache.org/jira/browse/SPARK-36868
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36871) Migrate CreateViewStatement to v2 command

2021-10-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36871:
---

Assignee: Huaxin Gao

> Migrate CreateViewStatement to v2 command
> -
>
> Key: SPARK-36871
> URL: https://issues.apache.org/jira/browse/SPARK-36871
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36871) Migrate CreateViewStatement to v2 command

2021-10-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36871.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34131
[https://github.com/apache/spark/pull/34131]

> Migrate CreateViewStatement to v2 command
> -
>
> Key: SPARK-36871
> URL: https://issues.apache.org/jira/browse/SPARK-36871
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32161) Hide JVM traceback for SparkUpgradeException

2021-10-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428228#comment-17428228
 ] 

Hyukjin Kwon commented on SPARK-32161:
--

Please go ahead!

> Hide JVM traceback for SparkUpgradeException
> 
>
> Key: SPARK-32161
> URL: https://issues.apache.org/jira/browse/SPARK-32161
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We added {{SparkUpgradeException}}, for which the JVM traceback is pretty useless. 
> See also https://github.com/apache/spark/pull/28736/files#r449184881
> It would be better to also whitelist this exception and hide the JVM traceback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32161) Hide JVM traceback for SparkUpgradeException

2021-10-13 Thread pralabhkumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428185#comment-17428185
 ] 

pralabhkumar commented on SPARK-32161:
--

[~hyukjin.kwon]

Please let me know if I can work on this.

IMHO it's a change in the convert_exception method of sql/util.py to take care of 
SparkUpgradeException.

> Hide JVM traceback for SparkUpgradeException
> 
>
> Key: SPARK-32161
> URL: https://issues.apache.org/jira/browse/SPARK-32161
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We added {{SparkUpgradeException}}, for which the JVM traceback is pretty useless. 
> See also https://github.com/apache/spark/pull/28736/files#r449184881
> It would be better to also whitelist this exception and hide the JVM traceback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36921) The DIV function should support ANSI intervals

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-36921.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34257
[https://github.com/apache/spark/pull/34257]

> The DIV function should support ANSI intervals
> --
>
> Key: SPARK-36921
> URL: https://issues.apache.org/jira/browse/SPARK-36921
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: PengLei
>Priority: Major
> Fix For: 3.3.0
>
>
> Extend the div function to support ANSI intervals. The operation should 
> produce the quotient of the division.
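
A hedged spark-shell illustration of the intended behaviour (assumes a build that 
includes this change; the literals are arbitrary examples):

{code:scala}
// 7 years divided by 2 years: the integral quotient is 3.
spark.sql("SELECT INTERVAL '7' YEAR div INTERVAL '2' YEAR AS q").show()
// Expected: q = 3
{code}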



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36921) The DIV function should support ANSI intervals

2021-10-13 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-36921:


Assignee: PengLei

> The DIV function should support ANSI intervals
> --
>
> Key: SPARK-36921
> URL: https://issues.apache.org/jira/browse/SPARK-36921
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: PengLei
>Priority: Major
>
> Extend the div function to support ANSI intervals. The operation should 
> produce the quotient of the division.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36997) Test type hints against examples

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428145#comment-17428145
 ] 

Apache Spark commented on SPARK-36997:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/34273

> Test type hints against examples
> 
>
> Key: SPARK-36997
> URL: https://issues.apache.org/jira/browse/SPARK-36997
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Next to tests, examples are the largest piece of code we have that shows actual 
> usage of the public API.
> Originally, {{pyspark-stubs}} tested the {{sql}} and {{ml}} examples, but these 
> tests weren't ported during the initial migration.
> We should add them back and extend coverage to the {{mllib}}, {{streaming}} and 
> core files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36997) Test type hints against examples

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36997:


Assignee: Apache Spark

> Test type hints against examples
> 
>
> Key: SPARK-36997
> URL: https://issues.apache.org/jira/browse/SPARK-36997
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Major
>
> Next to tests, examples are the largest piece of code we have that shows actual 
> usage of the public API.
> Originally, {{pyspark-stubs}} tested the {{sql}} and {{ml}} examples, but these 
> tests weren't ported during the initial migration.
> We should add them back and extend coverage to the {{mllib}}, {{streaming}} and 
> core files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36997) Test type hints against examples

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428143#comment-17428143
 ] 

Apache Spark commented on SPARK-36997:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/34273

> Test type hints against examples
> 
>
> Key: SPARK-36997
> URL: https://issues.apache.org/jira/browse/SPARK-36997
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Next to tests, examples are the largest piece of code we have that shows actual 
> usage of the public API.
> Originally, {{pyspark-stubs}} tested the {{sql}} and {{ml}} examples, but these 
> tests weren't ported during the initial migration.
> We should add them back and extend coverage to the {{mllib}}, {{streaming}} and 
> core files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36997) Test type hints against examples

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36997:


Assignee: (was: Apache Spark)

> Test type hints against examples
> 
>
> Key: SPARK-36997
> URL: https://issues.apache.org/jira/browse/SPARK-36997
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Next to tests, examples are the largest piece of code we have that shows actual 
> usage of the public API.
> Originally, {{pyspark-stubs}} tested the {{sql}} and {{ml}} examples, but these 
> tests weren't ported during the initial migration.
> We should add them back and extend coverage to the {{mllib}}, {{streaming}} and 
> core files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428141#comment-17428141
 ] 

Senthil Kumar commented on SPARK-36996:
---

Sample output after this change:

SQL :

mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
varchar(255), Age int);

 

mysql> desc Persons;
+---+--+--+-+-+---+
| Field | Type | Null | Key | Default | Extra |
+---+--+--+-+-+---+
| Id | int | NO | | NULL | |
| FirstName | varchar(255) | YES | | NULL | |
| LastName | varchar(255) | YES | | NULL | |
| Age | int | YES | | NULL | |
+---+--+--+-+-+---+


Spark:

scala> val df = 
spark.read.format("jdbc").option("database","Test_DB").option("user", 
"root").option("password", "").option("driver", 
"com.mysql.cj.jdbc.Driver").option("url", 
"jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
 df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
fields]

scala> df.printSchema()
 root
 |-- Id: integer (nullable = false)
 |-- FirstName: string (nullable = true)
 |-- LastName: string (nullable = true)
 |-- Age: integer (nullable = true)

 

 

And for TIMESTAMP columns

 

SQL:
create table timestamp_test(id int(11), time_stamp timestamp not null default 
current_timestamp);

SPARK:

scala> val df = 
spark.read.format("jdbc").option("database","Test_DB").option("user", 
"root").option("password", "").option("driver", 
"com.mysql.cj.jdbc.Driver").option("url", 
"jdbc:mysql://localhost:3306/Test_DB").option("dbtable", 
"timestamp_test").load()
df: org.apache.spark.sql.DataFrame = [id: int, time_stamp: timestamp]

scala> df.printSchema()
root
|-- id: integer (nullable = true)
|-- time_stamp: timestamp (nullable = true)
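
Until such a change is available, a possible user-side workaround is to rebuild the 
DataFrame with an explicitly adjusted schema. The sketch below is illustrative only 
(the helper name {{withNullability}} is made up) and assumes the {{df}} read from the 
Persons table in the repro above; it only changes the declared schema and does not 
re-validate the underlying data:

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper: mark the given columns as non-nullable in the schema.
def withNullability(df: DataFrame, nonNullable: Set[String]): DataFrame = {
  val adjusted = StructType(df.schema.map {
    case StructField(name, dataType, _, metadata) if nonNullable.contains(name) =>
      StructField(name, dataType, nullable = false, metadata)
    case other => other
  })
  df.sparkSession.createDataFrame(df.rdd, adjusted)
}

// val strictDf = withNullability(df, Set("Id"))
// strictDf.printSchema()   // Id: integer (nullable = false)
{code}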

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> SQL 'nullable' column settings are not retained as-is when the table is read 
> through Spark's {{jdbc}} data source.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428140#comment-17428140
 ] 

Senthil Kumar commented on SPARK-36996:
---

We need to consider 2 scenarios (see the sketch below):

 
 # maintain the NULLABLE value from the SQL metadata for non-timestamp columns
 # always set NULLABLE to true for timestamp columns
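
A minimal sketch of that decision rule, assuming we have the column's java.sql type 
code and the NULLABLE flag reported by the database metadata (the function name and 
signature below are illustrative only, not the actual Spark code):

{code:scala}
import java.sql.Types

// Illustrative only: decide the Spark-side nullability of a JDBC column.
// Scenario 1: keep the database's NULLABLE flag for ordinary columns.
// Scenario 2: always report TIMESTAMP columns as nullable, since the database
// may fill them via defaults (e.g. CURRENT_TIMESTAMP) that Spark cannot verify.
def sparkNullable(sqlType: Int, metadataSaysNullable: Boolean): Boolean =
  sqlType match {
    case Types.TIMESTAMP | Types.TIMESTAMP_WITH_TIMEZONE => true
    case _ => metadataSaysNullable
  }

// sparkNullable(Types.INTEGER, metadataSaysNullable = false)   // false (kept as-is)
// sparkNullable(Types.TIMESTAMP, metadataSaysNullable = false) // true (forced)
{code}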

 

 

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> SQL 'nullable' column settings are not retained as-is when the table is read 
> through Spark's {{jdbc}} data source.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36997) Test type hints against examples

2021-10-13 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-36997:
--

 Summary: Test type hints against examples
 Key: SPARK-36997
 URL: https://issues.apache.org/jira/browse/SPARK-36997
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Tests
Affects Versions: 3.3.0
Reporter: Maciej Szymkiewicz


Next to tests, examples are the largest body of code we have that shows actual 
usage of the public API.

Originally, {{pyspark-stubs}} tested the {{sql}} and {{ml}} examples, but these 
tests weren't ported during the initial migration.

We should add these back and extend them to {{mllib}}, {{streaming}}, and the core 
files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36996:


Assignee: Apache Spark

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Assignee: Apache Spark
>Priority: Major
>
> SQL 'nullable' column settings are not retained as-is when the table is read 
> through Spark's {{jdbc}} data source.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36996:


Assignee: (was: Apache Spark)

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> SQL 'nullable' column settings are not retained as-is when the table is read 
> through Spark's {{jdbc}} data source.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428137#comment-17428137
 ] 

Apache Spark commented on SPARK-36996:
--

User 'senthh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34272

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> SQL 'nullable' column settings are not retained as-is when the table is read 
> through Spark's {{jdbc}} data source.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36855) Uniformly execute the mergeContinuousShuffleBlockIdsIfNeeded method first

2021-10-13 Thread jinhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jinhai resolved SPARK-36855.

Resolution: Not A Problem

> Uniformly execute the mergeContinuousShuffleBlockIdsIfNeeded method first
> -
>
> Key: SPARK-36855
> URL: https://issues.apache.org/jira/browse/SPARK-36855
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: jinhai
>Priority: Major
>
> In the ShuffleBlockFetcherIterator.partitionBlocksByFetchMode method, both 
> local and host-local blocks execute the mergeContinuousShuffleBlockIdsIfNeeded 
> method first, but remote blocks execute the method many times inside the 
> createFetchRequests method. Could the block-merging method be executed only 
> once, in advance?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36438) Support list-like Python objects for Series comparison

2021-10-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36438:


Assignee: Haejoon Lee

> Support list-like Python objects for Series comparison
> --
>
> Key: SPARK-36438
> URL: https://issues.apache.org/jira/browse/SPARK-36438
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36438) Support list-like Python objects for Series comparison

2021-10-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36438.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34114
[https://github.com/apache/spark/pull/34114]

> Support list-like Python objects for Series comparison
> --
>
> Key: SPARK-36438
> URL: https://issues.apache.org/jira/browse/SPARK-36438
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428104#comment-17428104
 ] 

Senthil Kumar commented on SPARK-36996:
---

Based on further analysis, Spark is hard-coding "nullable" as "true" always. 
This change was included due to 
"https://issues.apache.org/jira/browse/SPARK-19726".

 

 

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> SQL 'nullable' column settings are not retained as-is when the table is read 
> through Spark's {{jdbc}} data source.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428105#comment-17428105
 ] 

Senthil Kumar commented on SPARK-36996:
---

I'm working on this.

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> SQL 'nullable' column settings are not retained as-is when the table is read 
> through Spark's {{jdbc}} data source.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)
Senthil Kumar created SPARK-36996:
-

 Summary: fixing "SQL column nullable setting not retained as part 
of spark read" issue
 Key: SPARK-36996
 URL: https://issues.apache.org/jira/browse/SPARK-36996
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
Reporter: Senthil Kumar


SQL 'nullable' column settings are not retained as-is when the table is read 
through Spark's {{jdbc}} data source.

 

SQL :



 

mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
varchar(255), Age int);

 

mysql> desc Persons;
+---+--+--+-+-+---+
| Field | Type | Null | Key | Default | Extra |
+---+--+--+-+-+---+
| Id | int | NO | | NULL | |
| FirstName | varchar(255) | YES | | NULL | |
| LastName | varchar(255) | YES | | NULL | |
| Age | int | YES | | NULL | |
+---+--+--+-+-+---+

 

But in Spark  we get all the columns as "Nullable":

=

scala> val df = 
spark.read.format("jdbc").option("database","Test_DB").option("user", 
"root").option("password", "").option("driver", 
"com.mysql.cj.jdbc.Driver").option("url", 
"jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
fields]

scala> df.printSchema()
root
 |-- Id: integer (nullable = true)
 |-- FirstName: string (nullable = true)
 |-- LastName: string (nullable = true)
 |-- Age: integer (nullable = true)

=

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36995) Add new SQLFileCommitProtocol

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36995:


Assignee: Apache Spark

> Add new SQLFileCommitProtocol
> -
>
> Key: SPARK-36995
> URL: https://issues.apache.org/jira/browse/SPARK-36995
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36995) Add new SQLFileCommitProtocol

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428082#comment-17428082
 ] 

Apache Spark commented on SPARK-36995:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34271

> Add new SQLFileCommitProtocol
> -
>
> Key: SPARK-36995
> URL: https://issues.apache.org/jira/browse/SPARK-36995
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36995) Add new SQLFileCommitProtocol

2021-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36995:


Assignee: (was: Apache Spark)

> Add new SQLFileCommitProtocol
> -
>
> Key: SPARK-36995
> URL: https://issues.apache.org/jira/browse/SPARK-36995
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36989) Migrate type hint data tests

2021-10-13 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz updated SPARK-36989:
---
Component/s: Tests

> Migrate type hint data tests
> 
>
> Key: SPARK-36989
> URL: https://issues.apache.org/jira/browse/SPARK-36989
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Before the migration, {{pyspark-stubs}} contained a set of [data 
> tests|https://github.com/zero323/pyspark-stubs/tree/branch-3.0/test-data/unit],
>  modeled after, and using, the internal test utilities of mypy.
> These were omitted during the migration for a few reasons:
>  * Simplicity.
>  * Relative slowness.
>  * Dependence on non-public API.
>  
> Data tests are useful for a number of reasons:
>  
>  * Improving test coverage for type hints.
>  * Checking whether type checkers infer the expected types.
>  * Checking whether type checkers reject incorrect code.
>  * Detecting unusual errors in code that otherwise type checks.
>  
> The last two functions, especially, are not fulfilled by simple validation of 
> the existing codebase.
>  
> Data tests are not required for all annotations and can be restricted to code 
> that has a high possibility of failure:
>  * Complex overloaded signatures.
>  * Complex generics.
>  * Generic {{self}} annotations.
>  * Code containing {{type: ignore}}.
> The biggest risk is that output matchers have to be updated when signatures 
> change and/or mypy output changes.
> An example of a problem detected with data tests can be found in the SPARK-36894 
> PR ([https://github.com/apache/spark/pull/34146]).
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36989) Migrate type hint data tests

2021-10-13 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz updated SPARK-36989:
---
Issue Type: Improvement  (was: Bug)

> Migrate type hint data tests
> 
>
> Key: SPARK-36989
> URL: https://issues.apache.org/jira/browse/SPARK-36989
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> Before the migration, {{pyspark-stubs}} contained a set of [data 
> tests|https://github.com/zero323/pyspark-stubs/tree/branch-3.0/test-data/unit],
>  modeled after, and using, the internal test utilities of mypy.
> These were omitted during the migration for a few reasons:
>  * Simplicity.
>  * Relative slowness.
>  * Dependence on non-public API.
>  
> Data tests are useful for a number of reasons:
>  
>  * Improving test coverage for type hints.
>  * Checking whether type checkers infer the expected types.
>  * Checking whether type checkers reject incorrect code.
>  * Detecting unusual errors in code that otherwise type checks.
>  
> The last two functions, especially, are not fulfilled by simple validation of 
> the existing codebase.
>  
> Data tests are not required for all annotations and can be restricted to code 
> that has a high possibility of failure:
>  * Complex overloaded signatures.
>  * Complex generics.
>  * Generic {{self}} annotations.
>  * Code containing {{type: ignore}}.
> The biggest risk is that output matchers have to be updated when signatures 
> change and/or mypy output changes.
> An example of a problem detected with data tests can be found in the SPARK-36894 
> PR ([https://github.com/apache/spark/pull/34146]).
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35141) Support two level map for final hash aggregation

2021-10-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428075#comment-17428075
 ] 

Apache Spark commented on SPARK-35141:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/34270

> Support two level map for final hash aggregation
> 
>
> Key: SPARK-35141
> URL: https://issues.apache.org/jira/browse/SPARK-35141
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.2.0
>
>
> For partial hash aggregation (code-gen path), we have two levels of hash map 
> for aggregation. The first level comes from `RowBasedHashMapGenerator`, which is 
> computationally faster than the second level from 
> `UnsafeFixedWidthAggregationMap`. Introducing a two-level hash map helps improve 
> the CPU performance of queries, as the first-level hash map normally fits in the 
> hardware cache and has a cheaper hash function for key lookup.
> For final hash aggregation, we can also support a two-level hash map to 
> improve query performance further.
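
To illustrate the idea (this is a conceptual sketch only, not the actual code paths in 
RowBasedHashMapGenerator / UnsafeFixedWidthAggregationMap), a two-level lookup can be 
thought of as a small, bounded, cache-friendly first-level map backed by an unbounded 
fallback map:

{code:scala}
import scala.collection.mutable

// Conceptual sketch: the first level is a small bounded map intended to stay
// cache-resident; the second level absorbs everything that does not fit.
class TwoLevelMap[K, V](firstLevelCapacity: Int) {
  private val fast = mutable.HashMap.empty[K, V]      // stand-in for the fast first-level map
  private val fallback = mutable.HashMap.empty[K, V]  // stand-in for the general second-level map

  def getOrElseUpdate(key: K, default: => V): V =
    fast.get(key)
      .orElse(fallback.get(key))
      .getOrElse {
        val value = default
        if (fast.size < firstLevelCapacity) fast.put(key, value)
        else fallback.put(key, value)
        value
      }
}

// val buffers = new TwoLevelMap[String, Long](firstLevelCapacity = 1024)
// buffers.getOrElseUpdate("someGroupKey", 0L)
{code}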



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36717) Wrong order of variable initialization may lead to incorrect behavior

2021-10-13 Thread Alexey Diomin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428061#comment-17428061
 ] 

Alexey Diomin commented on SPARK-36717:
---

please update "fix versions"

commit was not included in 3.2.0 and will be included in 3.2.1

> Wrong order of variable initialization may lead to incorrect behavior
> -
>
> Key: SPARK-36717
> URL: https://issues.apache.org/jira/browse/SPARK-36717
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Jianmeng Li
>Assignee: Jianmeng Li
>Priority: Minor
> Fix For: 3.2.0, 3.1.3, 3.0.4, 3.3.0
>
>
> Incorrect order of variable initialization may lead to incorrect behavior. 
> Related code: 
> [TorrentBroadcast.scala|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L94].
> TorrentBroadcast ends up with the wrong checksumEnabled value after 
> initialization, which may not be what we want; we can move L94 in front of 
> setConf(SparkEnv.get.conf) to avoid this.
> Supplement:
> Snippet 1:
> {code:java}
> class Broadcast {
>   def setConf(): Unit = {
> checksumEnabled = true
>   }
>   setConf()
>   var checksumEnabled = false
> }
> println(new Broadcast().checksumEnabled){code}
> output:
> {code:java}
> false{code}
> Snippet 2:
> {code:java}
> class Broadcast {
>   var checksumEnabled = false
>   def setConf(): Unit = {
> checksumEnabled = true
>   }
>   setConf()
> }
> println(new Broadcast().checksumEnabled){code}
> output: 
> {code:java}
> true{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36995) Add new SQLFileCommitProtocol

2021-10-13 Thread angerszhu (Jira)
angerszhu created SPARK-36995:
-

 Summary: Add new SQLFileCommitProtocol
 Key: SPARK-36995
 URL: https://issues.apache.org/jira/browse/SPARK-36995
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34990) Add ParquetEncryptionSuite

2021-10-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-34990:
---
Priority: Minor  (was: Major)

> Add ParquetEncryptionSuite
> --
>
> Key: SPARK-34990
> URL: https://issues.apache.org/jira/browse/SPARK-34990
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maya Anderson
>Assignee: Maya Anderson
>Priority: Minor
> Fix For: 3.2.0
>
>
> Now that Parquet Modular Encryption is available in Spark as of SPARK-34542 , 
> we need a test to demonstrate and verify its usage from Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34990) Add ParquetEncryptionSuite

2021-10-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-34990:
---
Issue Type: Test  (was: New Feature)

> Add ParquetEncryptionSuite
> --
>
> Key: SPARK-34990
> URL: https://issues.apache.org/jira/browse/SPARK-34990
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maya Anderson
>Assignee: Maya Anderson
>Priority: Major
> Fix For: 3.2.0
>
>
> Now that Parquet Modular Encryption is available in Spark as of SPARK-34542 , 
> we need a test to demonstrate and verify its usage from Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36973) Deduplicate prepare data method for HistogramPlotBase and KdePlotBase

2021-10-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36973.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34251
[https://github.com/apache/spark/pull/34251]

> Deduplicate prepare data method for HistogramPlotBase and KdePlotBase
> -
>
> Key: SPARK-36973
> URL: https://issues.apache.org/jira/browse/SPARK-36973
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Minor
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


