[jira] [Assigned] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36830:


Assignee: (was: Apache Spark)

> Read/write dataframes with ANSI intervals from/to JSON files
> 
>
> Key: SPARK-36830
> URL: https://issues.apache.org/jira/browse/SPARK-36830
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading of ANSI interval (year-month and day-time 
> interval) columns in dataframes to/from JSON datasources.
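
A minimal PySpark sketch of the round trip this sub-task targets (hypothetical behavior: JSON support for ANSI interval columns is exactly what is not implemented yet; the path is illustrative):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# ANSI interval columns: year-month and day-time.
df = spark.sql("""
    SELECT INTERVAL '1-2' YEAR TO MONTH        AS ym,
           INTERVAL '3 04:05:06' DAY TO SECOND AS dt
""")

# Once this lands, the write/read below should round-trip. JSON itself is
# untyped, so the schema has to be supplied on read.
df.write.mode("overwrite").json("/tmp/ansi_intervals")
restored = spark.read.schema(df.schema).json("/tmp/ansi_intervals")
restored.printSchema()
{code}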






[jira] [Assigned] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36830:


Assignee: Apache Spark

> Read/write dataframes with ANSI intervals from/to JSON files
> 
>
> Key: SPARK-36830
> URL: https://issues.apache.org/jira/browse/SPARK-36830
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Implement writing and reading of ANSI interval (year-month and day-time 
> interval) columns in dataframes to/from JSON datasources.






[jira] [Commented] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422555#comment-17422555
 ] 

Apache Spark commented on SPARK-36830:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/34155

> Read/write dataframes with ANSI intervals from/to JSON files
> 
>
> Key: SPARK-36830
> URL: https://issues.apache.org/jira/browse/SPARK-36830
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading of ANSI interval (year-month and day-time 
> interval) columns in dataframes to/from JSON datasources.






[jira] [Updated] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-09-29 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-36900:
-
Description: 
Execute

 
{code:java}
build/mvn clean install  -pl core -am -Dtest=none 
-DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
{code}
with JDK 17, and the run aborts with an OutOfMemoryError:
{code:java}
ChunkedByteBufferOutputStreamSuite:
- empty output
- write a single byte
- write a single near boundary
- write a single at boundary
- single chunk output
- single chunk output at boundary size
- multiple chunk output
- multiple chunk output at boundary size
*** RUN ABORTED ***
  java.lang.OutOfMemoryError: Java heap space
  at java.base/java.lang.Integer.valueOf(Integer.java:1081)
  at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
  at 
org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
  at 
org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
  at java.base/java.io.OutputStream.write(OutputStream.java:127)
  at 
org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
  at 
org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
 Source)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
{code}
 

 

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Major
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17, and the run aborts with an OutOfMemoryError:
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  






[jira] [Created] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-09-29 Thread Yang Jie (Jira)
Yang Jie created SPARK-36900:


 Summary: "SPARK-36464: size returns correct positive number even 
with over 2GB data" will oom with JDK17 
 Key: SPARK-36900
 URL: https://issues.apache.org/jira/browse/SPARK-36900
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Yang Jie









[jira] [Resolved] (SPARK-36899) Support ILIKE API on R

2021-09-29 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-36899.

Fix Version/s: 3.3.0
 Assignee: Leona Yoda
   Resolution: Fixed

Issue resolved in https://github.com/apache/spark/pull/34152

> Support ILIKE API on R
> --
>
> Key: SPARK-36899
> URL: https://issues.apache.org/jira/browse/SPARK-36899
> Project: Spark
>  Issue Type: Sub-task
>  Components: R
>Affects Versions: 3.3.0
>Reporter: Leona Yoda
>Assignee: Leona Yoda
>Priority: Major
> Fix For: 3.3.0
>
>
> Support ILIKE (case-insensitive LIKE) API on R
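
A quick sketch of the semantics being exposed, shown through SQL (assumes the ILIKE expression targeted for Spark 3.3; this sub-task adds the R wrapper on top of it):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# ILIKE matches case-insensitively; plain LIKE stays case sensitive.
spark.sql("SELECT 'Spark' ILIKE '%spark%'").show()  # true
spark.sql("SELECT 'Spark' LIKE  '%spark%'").show()  # false
{code}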






[jira] [Assigned] (SPARK-36896) Return boolean for `dropTempView` and `dropGlobalTempView`

2021-09-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36896:


Assignee: Xinrong Meng

> Return boolean for `dropTempView` and `dropGlobalTempView`
> --
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Currently `dropTempView` and `dropGlobalTempView` don't 
> have return values, which conflicts with their docstring:
> `Returns true if this view is dropped successfully, false otherwise.`
>  
> We should fix that.
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
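
A sketch of the behavior the fix aims for, i.e. what the docstring already promises (assuming the patch simply propagates the JVM-side boolean):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(1).createOrReplaceTempView("v")

# After the fix, both calls return a boolean instead of None.
assert spark.catalog.dropTempView("v") is True    # view existed, was dropped
assert spark.catalog.dropTempView("v") is False   # view no longer exists
{code}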







[jira] [Resolved] (SPARK-36896) Return boolean for `dropTempView` and `dropGlobalTempView`

2021-09-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36896.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34147
[https://github.com/apache/spark/pull/34147]

> Return boolean for `dropTempView` and `dropGlobalTempView`
> --
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Currently `dropTempView` and `dropGlobalTempView` don't 
> have return values, which conflicts with their docstring:
> `Returns true if this view is dropped successfully, false otherwise.`
>  
> We should fix that.
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-09-29 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422533#comment-17422533
 ] 

Gengliang Wang commented on SPARK-36892:


[~zhouyejoe] Thank you!

> Disable batch fetch for a shuffle when push based shuffle is enabled
> 
>
> Key: SPARK-36892
> URL: https://issues.apache.org/jira/browse/SPARK-36892
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Priority: Blocker
>
> When push based shuffle is enabled, efficient fetch of merged mapper shuffle 
> output happens.
> Unfortunately, this currently interacts badly with 
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing shuffle 
> fetch to hang and/or duplicate data to be fetched, causing correctness issues.
> Given batch fetch does not benefit spark stages reading merged blocks when 
> push based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch can 
> be disabled when push based shuffle is enabled.
> Thx to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]






[jira] [Resolved] (SPARK-36893) upgrade mesos into 1.4.3

2021-09-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36893.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34144
[https://github.com/apache/spark/pull/34144]

> upgrade mesos into 1.4.3
> 
>
> Key: SPARK-36893
> URL: https://issues.apache.org/jira/browse/SPARK-36893
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Assignee: Zhongwei Zhu
>Priority: Minor
> Fix For: 3.3.0
>
>
> Upgrade mesos to 1.4.3 to fix CVE-2018-11793






[jira] [Assigned] (SPARK-36893) upgrade mesos into 1.4.3

2021-09-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-36893:
-

Assignee: Zhongwei Zhu

> upgrade mesos into 1.4.3
> 
>
> Key: SPARK-36893
> URL: https://issues.apache.org/jira/browse/SPARK-36893
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Assignee: Zhongwei Zhu
>Priority: Minor
>
> Upgrade mesos to 1.4.3 to fix CVE-2018-11793






[jira] [Commented] (SPARK-36796) Make all unit tests pass on Java 17

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422517#comment-17422517
 ] 

Apache Spark commented on SPARK-36796:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34153

> Make all unit tests pass on Java 17
> ---
>
> Key: SPARK-36796
> URL: https://issues.apache.org/jira/browse/SPARK-36796
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-36796) Make all unit tests pass on Java 17

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36796:


Assignee: (was: Apache Spark)

> Make all unit tests pass on Java 17
> ---
>
> Key: SPARK-36796
> URL: https://issues.apache.org/jira/browse/SPARK-36796
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-36796) Make all unit tests pass on Java 17

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36796:


Assignee: Apache Spark

> Make all unit tests pass on Java 17
> ---
>
> Key: SPARK-36796
> URL: https://issues.apache.org/jira/browse/SPARK-36796
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-36796) Make all unit tests pass on Java 17

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422516#comment-17422516
 ] 

Apache Spark commented on SPARK-36796:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34153

> Make all unit tests pass on Java 17
> ---
>
> Key: SPARK-36796
> URL: https://issues.apache.org/jira/browse/SPARK-36796
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Commented] (SPARK-36899) Support ILIKE API on R

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422507#comment-17422507
 ] 

Apache Spark commented on SPARK-36899:
--

User 'yoda-mon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34152

> Support ILIKE API on R
> --
>
> Key: SPARK-36899
> URL: https://issues.apache.org/jira/browse/SPARK-36899
> Project: Spark
>  Issue Type: Sub-task
>  Components: R
>Affects Versions: 3.3.0
>Reporter: Leona Yoda
>Priority: Major
>
> Support ILIKE (case-insensitive LIKE) API on R






[jira] [Assigned] (SPARK-36899) Support ILIKE API on R

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36899:


Assignee: (was: Apache Spark)

> Support ILIKE API on R
> --
>
> Key: SPARK-36899
> URL: https://issues.apache.org/jira/browse/SPARK-36899
> Project: Spark
>  Issue Type: Sub-task
>  Components: R
>Affects Versions: 3.3.0
>Reporter: Leona Yoda
>Priority: Major
>
> Support ILIKE (case-insensitive LIKE) API on R






[jira] [Commented] (SPARK-36899) Support ILIKE API on R

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422506#comment-17422506
 ] 

Apache Spark commented on SPARK-36899:
--

User 'yoda-mon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34152

> Support ILIKE API on R
> --
>
> Key: SPARK-36899
> URL: https://issues.apache.org/jira/browse/SPARK-36899
> Project: Spark
>  Issue Type: Sub-task
>  Components: R
>Affects Versions: 3.3.0
>Reporter: Leona Yoda
>Priority: Major
>
> Support ILIKE (case-insensitive LIKE) API on R






[jira] [Assigned] (SPARK-36899) Support ILIKE API on R

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36899:


Assignee: Apache Spark

> Support ILIKE API on R
> --
>
> Key: SPARK-36899
> URL: https://issues.apache.org/jira/browse/SPARK-36899
> Project: Spark
>  Issue Type: Sub-task
>  Components: R
>Affects Versions: 3.3.0
>Reporter: Leona Yoda
>Assignee: Apache Spark
>Priority: Major
>
> Support ILIKE (case-insensitive LIKE) API on R






[jira] [Assigned] (SPARK-36886) Inline type hints for python/pyspark/sql/context.py

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36886:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/sql/context.py
> ---
>
> Key: SPARK-36886
> URL: https://issues.apache.org/jira/browse/SPARK-36886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/sql/context.py from Inline type hints 
> for python/pyspark/sql/context.pyi.






[jira] [Commented] (SPARK-36886) Inline type hints for python/pyspark/sql/context.py

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422503#comment-17422503
 ] 

Apache Spark commented on SPARK-36886:
--

User 'dgd-contributor' has created a pull request for this issue:
https://github.com/apache/spark/pull/34151

> Inline type hints for python/pyspark/sql/context.py
> ---
>
> Key: SPARK-36886
> URL: https://issues.apache.org/jira/browse/SPARK-36886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/sql/context.py from Inline type hints 
> for python/pyspark/sql/context.pyi.






[jira] [Assigned] (SPARK-36886) Inline type hints for python/pyspark/sql/context.py

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36886:


Assignee: Apache Spark

> Inline type hints for python/pyspark/sql/context.py
> ---
>
> Key: SPARK-36886
> URL: https://issues.apache.org/jira/browse/SPARK-36886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Assignee: Apache Spark
>Priority: Major
>
> Inline type hints for python/pyspark/sql/context.py from Inline type hints 
> for python/pyspark/sql/context.pyi.






[jira] [Created] (SPARK-36899) Support ILIKE API on R

2021-09-29 Thread Leona Yoda (Jira)
Leona Yoda created SPARK-36899:
--

 Summary: Support ILIKE API on R
 Key: SPARK-36899
 URL: https://issues.apache.org/jira/browse/SPARK-36899
 Project: Spark
  Issue Type: Sub-task
  Components: R
Affects Versions: 3.3.0
Reporter: Leona Yoda


Support ILIKE (case-insensitive LIKE) API on R






[jira] [Commented] (SPARK-36898) Make the shuffle hash join factor configurable

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422499#comment-17422499
 ] 

Apache Spark commented on SPARK-36898:
--

User 'JkSelf' has created a pull request for this issue:
https://github.com/apache/spark/pull/34150

> Make the shuffle hash join factor configurable
> --
>
> Key: SPARK-36898
> URL: https://issues.apache.org/jira/browse/SPARK-36898
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Ke Jia
>Priority: Major
>
> Make the shuffle hash join factor configurable.
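
For context: Spark only picks a shuffled hash join when one side is smaller than the other by a hard-coded factor (3); this ticket turns that factor into a config. A sketch, where the config key is an assumption based on the linked PR rather than something confirmed in this thread:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical key; the default would mirror the previously hard-coded 3.
# Lowering it makes the planner more willing to choose a shuffled hash join.
spark.conf.set("spark.sql.shuffledHashJoinFactor", "3")
{code}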






[jira] [Assigned] (SPARK-36898) Make the shuffle hash join factor configurable

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36898:


Assignee: Apache Spark

> Make the shuffle hash join factor configurable
> --
>
> Key: SPARK-36898
> URL: https://issues.apache.org/jira/browse/SPARK-36898
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Ke Jia
>Assignee: Apache Spark
>Priority: Major
>
> Make the shuffle hash join factor configurable.






[jira] [Commented] (SPARK-36898) Make the shuffle hash join factor configurable

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422498#comment-17422498
 ] 

Apache Spark commented on SPARK-36898:
--

User 'JkSelf' has created a pull request for this issue:
https://github.com/apache/spark/pull/34150

> Make the shuffle hash join factor configurable
> --
>
> Key: SPARK-36898
> URL: https://issues.apache.org/jira/browse/SPARK-36898
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Ke Jia
>Priority: Major
>
> Make the shuffle hash join factor configurable.






[jira] [Assigned] (SPARK-36898) Make the shuffle hash join factor configurable

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36898:


Assignee: (was: Apache Spark)

> Make the shuffle hash join factor configurable
> --
>
> Key: SPARK-36898
> URL: https://issues.apache.org/jira/browse/SPARK-36898
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Ke Jia
>Priority: Major
>
> Make the shuffle hash join factor configurable.






[jira] [Created] (SPARK-36898) Make the shuffle hash join factor configurable

2021-09-29 Thread Ke Jia (Jira)
Ke Jia created SPARK-36898:
--

 Summary: Make the shuffle hash join factor configurable
 Key: SPARK-36898
 URL: https://issues.apache.org/jira/browse/SPARK-36898
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2
Reporter: Ke Jia


Make the shuffle hash join factor configurable.






[jira] [Created] (SPARK-36897) Replace collections.namedtuple() by typing.NamedTuple

2021-09-29 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-36897:


 Summary: Replace collections.namedtuple() by typing.NamedTuple
 Key: SPARK-36897
 URL: https://issues.apache.org/jira/browse/SPARK-36897
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Xinrong Meng


Per the discussion under 
[https://github.com/apache/spark/pull/34133#discussion_r718833451], we want 
to replace collections.namedtuple() with typing.NamedTuple.
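
A minimal before/after sketch of the replacement (illustrative names only):
{code:python}
from collections import namedtuple
from typing import NamedTuple

# Before: works at runtime, but field types are opaque to static checkers.
CacheKey = namedtuple("CacheKey", ["table", "version"])

# After: identical runtime behavior, with typed fields that mypy can verify.
class TypedCacheKey(NamedTuple):
    table: str
    version: int

assert TypedCacheKey("t1", 2) == ("t1", 2)  # still a plain tuple underneath
{code}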






[jira] [Commented] (SPARK-36845) Inline type hint files

2021-09-29 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422487#comment-17422487
 ] 

Takuya Ueshin commented on SPARK-36845:
---

For the {{_typing.pyi}}, it uses {{Protocol}}, which is supported in Python 3.8 and 
above, so it won't be straightforward, IIUC.

> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.
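
A generic sketch of the two styles (not actual PySpark code):
{code:python}
# Stub-file style: the hints live in util.pyi next to util.py, so the
# implementation body itself is never type-checked:
#     def chunk(data: bytes, size: int) -> List[bytes]: ...

# Inlined style: one file, and static checkers now verify the body too.
from typing import List

def chunk(data: bytes, size: int) -> List[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]
{code}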






[jira] [Comment Edited] (SPARK-36845) Inline type hint files

2021-09-29 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422486#comment-17422486
 ] 

Takuya Ueshin edited comment on SPARK-36845 at 9/30/21, 1:36 AM:
-

It would be great if we could use the "annotations" future flag!


was (Author: ueshin):
It would be great if we could use the "annotation" future flag!

> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.






[jira] [Commented] (SPARK-36845) Inline type hint files

2021-09-29 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422486#comment-17422486
 ] 

Takuya Ueshin commented on SPARK-36845:
---

It would be great if we could use the "annotation" future flag!
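
For reference, a sketch of what the {{annotations}} future flag (PEP 563) buys here: annotations are stored as strings and never evaluated at import time, so newer typing syntax in hints does not break older interpreters:
{code:python}
from __future__ import annotations

# `int | None` needs Python 3.10 if evaluated, but under PEP 563 the
# annotation stays a string, so this module still imports on 3.7+.
def head(xs: list[int]) -> int | None:
    return xs[0] if xs else None
{code}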

> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.






[jira] [Updated] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files

2021-09-29 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36830:
---
Description: Implement writing and reading of ANSI interval (year-month and 
day-time interval) columns in dataframes to/from JSON datasources.  (was: Implement 
writing and reading of ANSI interval (year-month and day-time interval) columns 
in dataframes to/from Parquet datasources.)

> Read/write dataframes with ANSI intervals from/to JSON files
> 
>
> Key: SPARK-36830
> URL: https://issues.apache.org/jira/browse/SPARK-36830
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading of ANSI interval (year-month and day-time 
> interval) columns in dataframes to/from JSON datasources.






[jira] [Commented] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files

2021-09-29 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422483#comment-17422483
 ] 

Kousuke Saruta commented on SPARK-36830:


Thank you, will do.

> Read/write dataframes with ANSI intervals from/to JSON files
> 
>
> Key: SPARK-36830
> URL: https://issues.apache.org/jira/browse/SPARK-36830
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading of ANSI interval (year-month and day-time 
> interval) columns in dataframes to/from Parquet datasources.






[jira] [Resolved] (SPARK-36888) Sha2 with bit_length 512 not being tested

2021-09-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36888.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34145
[https://github.com/apache/spark/pull/34145]

> Sha2 with bit_length 512 not being tested
> -
>
> Key: SPARK-36888
> URL: https://issues.apache.org/jira/browse/SPARK-36888
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Assignee: Richard Chen
>Priority: Major
> Fix For: 3.3.0
>
>
> Looking at 
> [https://github.com/apache/spark/commit/6c6291b3f6ac13b8415b87b2b741a9cd95bc6c3b]
>  for https://issues.apache.org/jira/browse/SPARK-36836, it's clear that 512 
> bits are supported:
> {code:scala}
> bitLength match {
>   [...]
>   case 512 =>
>     UTF8String.fromString(DigestUtils.sha512Hex(input))
> {code}
> resp.
> {code:scala}
> nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
>   [...]
>   else if ($eval2 == 512) {
>     ${ev.value} = UTF8String.fromString($digestUtils.sha512Hex($eval1));
> {code}
> but the test claims it is unsupported:
> {code:scala}
> // unsupported bit length
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(1024)), null)
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(512)), null)
> {code}
> To avoid a similar fate as SPARK-36836, tests should be added.
> CC [~richardc-db]
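
A one-liner that exercises the 512-bit path from SQL (sketch; the expected digest value is omitted rather than invented):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sha2(expr, 512) should go through the DigestUtils.sha512Hex branch above.
spark.sql("SELECT sha2('Spark', 512) AS h").show(truncate=False)
{code}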






[jira] [Assigned] (SPARK-36888) Sha2 with bit_length 512 not being tested

2021-09-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36888:


Assignee: Richard Chen

> Sha2 with bit_length 512 not being tested
> -
>
> Key: SPARK-36888
> URL: https://issues.apache.org/jira/browse/SPARK-36888
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Assignee: Richard Chen
>Priority: Major
>
> Looking at 
> [https://github.com/apache/spark/commit/6c6291b3f6ac13b8415b87b2b741a9cd95bc6c3b]
>  for https://issues.apache.org/jira/browse/SPARK-36836, it's clear that 512 
> bits are supported:
> {code:scala}
> bitLength match {
>   [...]
>   case 512 =>
>     UTF8String.fromString(DigestUtils.sha512Hex(input))
> {code}
> resp.
> {code:scala}
> nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
>   [...]
>   else if ($eval2 == 512) {
>     ${ev.value} = UTF8String.fromString($digestUtils.sha512Hex($eval1));
> {code}
> but the test claims it is unsupported:
> {code:scala}
> // unsupported bit length
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(1024)), null)
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(512)), null)
> {code}
> To avoid a similar fate as SPARK-36836, tests should be added.
> CC [~richardc-db]






[jira] [Commented] (SPARK-36891) Add new test suite to cover Parquet decoding

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422480#comment-17422480
 ] 

Apache Spark commented on SPARK-36891:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34149

> Add new test suite to cover Parquet decoding
> 
>
> Key: SPARK-36891
> URL: https://issues.apache.org/jira/browse/SPARK-36891
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Priority: Major
>
> Add a new test suite to add more coverage for Parquet vectorized decoding, 
> focusing on different combinations of Parquet column index, dictionary, batch 
> size, page size, etc.
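
A sketch of the kind of sweep such a suite runs (the write options and the batch-size conf are standard Parquet/Spark knobs, but driving them this way is an assumption, not code from the PR):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

for dictionary in ("true", "false"):
    for page_size in (64 * 1024, 1024 * 1024):
        (spark.range(100_000).toDF("c")
            .write.mode("overwrite")
            .option("parquet.enable.dictionary", dictionary)
            .option("parquet.page.size", str(page_size))
            .parquet("/tmp/parquet_combos"))
        spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", "512")
        assert spark.read.parquet("/tmp/parquet_combos").count() == 100_000
{code}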






[jira] [Commented] (SPARK-36891) Add new test suite to cover Parquet decoding

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422476#comment-17422476
 ] 

Apache Spark commented on SPARK-36891:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34149

> Add new test suite to cover Parquet decoding
> 
>
> Key: SPARK-36891
> URL: https://issues.apache.org/jira/browse/SPARK-36891
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Priority: Major
>
> Add a new test suite to add more coverage for Parquet vectorized decoding, 
> focusing on different combinations of Parquet column index, dictionary, batch 
> size, page size, etc.






[jira] [Assigned] (SPARK-36891) Add new test suite to cover Parquet decoding

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36891:


Assignee: Apache Spark

> Add new test suite to cover Parquet decoding
> 
>
> Key: SPARK-36891
> URL: https://issues.apache.org/jira/browse/SPARK-36891
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Major
>
> Add a new test suite to add more coverage for Parquet vectorized decoding, 
> focusing on different combinations of Parquet column index, dictionary, batch 
> size, page size, etc.






[jira] [Assigned] (SPARK-36891) Add new test suite to cover Parquet decoding

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36891:


Assignee: (was: Apache Spark)

> Add new test suite to cover Parquet decoding
> 
>
> Key: SPARK-36891
> URL: https://issues.apache.org/jira/browse/SPARK-36891
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Priority: Major
>
> Add a new test suite to add more coverage for Parquet vectorized decoding, 
> focusing on different combinations of Parquet column index, dictionary, batch 
> size, page size, etc.






[jira] [Commented] (SPARK-36895) Add Create Index syntax support

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422468#comment-17422468
 ] 

Apache Spark commented on SPARK-36895:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34148

> Add Create Index syntax support
> ---
>
> Key: SPARK-36895
> URL: https://issues.apache.org/jira/browse/SPARK-36895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>
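
For illustration, the shape of the DDL this sub-task adds (the exact grammar is defined by the linked PR and may differ from this sketch; it also requires a DS v2 table whose catalog supports indexes):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical catalog/table names.
spark.sql("CREATE INDEX idx1 ON TABLE testcat.ns.tbl (col1)")
{code}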







[jira] [Commented] (SPARK-36895) Add Create Index syntax support

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422466#comment-17422466
 ] 

Apache Spark commented on SPARK-36895:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34148

> Add Create Index syntax support
> ---
>
> Key: SPARK-36895
> URL: https://issues.apache.org/jira/browse/SPARK-36895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>







[jira] [Assigned] (SPARK-36895) Add Create Index syntax support

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36895:


Assignee: (was: Apache Spark)

> Add Create Index syntax support
> ---
>
> Key: SPARK-36895
> URL: https://issues.apache.org/jira/browse/SPARK-36895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>







[jira] [Assigned] (SPARK-36895) Add Create Index syntax support

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36895:


Assignee: Apache Spark

> Add Create Index syntax support
> ---
>
> Key: SPARK-36895
> URL: https://issues.apache.org/jira/browse/SPARK-36895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-36845) Inline type hint files

2021-09-29 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422464#comment-17422464
 ] 

Hyukjin Kwon commented on SPARK-36845:
--

Yeah, about the _typing.pyi one I am not sure. Agreed, we should think about which approach 
we'll take on this.
I added SPARK-36145 as a related ticket for now. I will make sure it gets 
resolved all together for Spark 3.3.

> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.






[jira] [Updated] (SPARK-36894) RDD.toDF should be synchronized with dispatched variants of SparkSession.createDataFrame

2021-09-29 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz updated SPARK-36894:
---
Description: 
There are several variants of {{SparkSession.createDataFrame}} dispatch that {{toDF}} should mirror:
 * Providing a schema as a {{str}} object for {{RDD[RowLike]}} objects
 * Providing a schema as a {{Tuple[str, ...]}} of names
 * Calling {{toDF}} on an {{RDD}} of atomic values, when a schema of {{str}} or 
{{AtomicType}} is provided.

  was:In {{toDF}} docs we explicitly mention that {{str}} schema is supported, 
so it should be reflected in the type hints.
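
A hedged sketch of those variants in use (illustrative data; the forms mirror what {{SparkSession.createDataFrame}} dispatches on):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

rdd = spark.sparkContext.parallelize([(1, "a"), (2, "b")])
df1 = rdd.toDF("id: int, name: string")   # schema as a DDL str
df2 = rdd.toDF(("id", "name"))            # schema as a tuple of names

atomic = spark.sparkContext.parallelize([1, 2, 3])
df3 = atomic.toDF(IntegerType())          # atomic values + AtomicType schema
{code}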


> RDD.toDF should be synchronized with dispatched variants of 
> SparkSession.createDataFrame
> 
>
> Key: SPARK-36894
> URL: https://issues.apache.org/jira/browse/SPARK-36894
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> There are several variants of {{SparkSession.createDataFrame}} dispatch that {{toDF}} should mirror:
>  * Providing a schema as a {{str}} object for {{RDD[RowLike]}} objects
>  * Providing a schema as a {{Tuple[str, ...]}} of names
>  * Calling {{toDF}} on an {{RDD}} of atomic values, when a schema of {{str}} or 
> {{AtomicType}} is provided.






[jira] [Updated] (SPARK-36894) RDD.toDF should be synchronized with dispatched variants of SparkSession.createDataFrame

2021-09-29 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz updated SPARK-36894:
---
Summary: RDD.toDF should be synchronized with dispatched variants of 
SparkSession.createDataFrame  (was: RDD.toDF should support string schema)

> RDD.toDF should be synchronized with dispatched variants of 
> SparkSession.createDataFrame
> 
>
> Key: SPARK-36894
> URL: https://issues.apache.org/jira/browse/SPARK-36894
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> In {{toDF}} docs we explicitly mention that {{str}} schema is supported, so 
> it should be reflected in the type hints.






[jira] [Updated] (SPARK-36896) Return boolean for `dropTempView` and `dropGlobalTempView`

2021-09-29 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-36896:
-
Environment: 
Currently `dropTempView` and `dropGlobalTempView` don't have return values, which 
conflicts with their docstring:
`Returns true if this view is dropped successfully, false otherwise.`

 

We should fix that.

  was:
dropTempView, dropGlobalTempView should have return values.

setCurrentDatabase shouldn't return anything.

We should fix these APIs accordingly.


> Return boolean for `dropTempView` and `dropGlobalTempView`
> --
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Currently `dropTempView` and `dropGlobalTempView` don't 
> have return values, which conflicts with their docstring:
> `Returns true if this view is dropped successfully, false otherwise.`
>  
> We should fix that.
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Updated] (SPARK-36896) Return boolean for `dropTempView` and `dropGlobalTempView`

2021-09-29 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-36896:
-
Summary: Return boolean for `dropTempView` and `dropGlobalTempView`  (was: 
Fix returns of functions in python/pyspark/sql/catalog.py)

> Return boolean for `dropTempView` and `dropGlobalTempView`
> --
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: dropTempView, dropGlobalTempView should have return 
> values.
> setCurrentDatabase shouldn't return anything.
> We should fix these APIs accordingly.
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Updated] (SPARK-36896) Fix returns of functions in python/pyspark/sql/catalog.py

2021-09-29 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-36896:
-
Summary: Fix returns of functions in python/pyspark/sql/catalog.py  (was: 
Fix returns of functions python/pyspark/sql/catalog.py)

> Fix returns of functions in python/pyspark/sql/catalog.py
> -
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: dropTempView, dropGlobalTempView should have return 
> values.
> setCurrentDatabase shouldn't return anything.
> We should fix these APIs accordingly.
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Commented] (SPARK-36896) Fix returns of functions python/pyspark/sql/catalog.py

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422440#comment-17422440
 ] 

Apache Spark commented on SPARK-36896:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/34147

> Fix returns of functions python/pyspark/sql/catalog.py
> --
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: dropTempView, dropGlobalTempView should have return 
> values.
> setCurrentDatabase shouldn't return anything.
> We should fix these APIs accordingly.
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Assigned] (SPARK-36896) Fix returns of functions python/pyspark/sql/catalog.py

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36896:


Assignee: Apache Spark

> Fix returns of functions python/pyspark/sql/catalog.py
> --
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: dropTempView, dropGlobalTempView should have return 
> values.
> setCurrentDatabase shouldn't return anything.
> We should fix these APIs accordingly.
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-36896) Fix returns of functions python/pyspark/sql/catalog.py

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422439#comment-17422439
 ] 

Apache Spark commented on SPARK-36896:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/34147

> Fix returns of functions python/pyspark/sql/catalog.py
> --
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: dropTempView, dropGlobalTempView should have return 
> values.
> setCurrentDatabase shouldn't return anything.
> We should fix these APIs accordingly.
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Assigned] (SPARK-36896) Fix returns of functions python/pyspark/sql/catalog.py

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36896:


Assignee: (was: Apache Spark)

> Fix returns of functions python/pyspark/sql/catalog.py
> --
>
> Key: SPARK-36896
> URL: https://issues.apache.org/jira/browse/SPARK-36896
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: dropTempView, dropGlobalTempView should have return 
> values.
> setCurrentDatabase shouldn't return anything.
> We should fix these APIs accordingly.
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Created] (SPARK-36896) Fix returns of functions python/pyspark/sql/catalog.py

2021-09-29 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-36896:


 Summary: Fix returns of functions python/pyspark/sql/catalog.py
 Key: SPARK-36896
 URL: https://issues.apache.org/jira/browse/SPARK-36896
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.3.0
 Environment: dropTempView, dropGlobalTempView should have return 
values.

setCurrentDatabase shouldn't return anything.

We should fix these APIs accordingly.
Reporter: Xinrong Meng









[jira] [Created] (SPARK-36895) Add Create Index syntax support

2021-09-29 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-36895:
--

 Summary: Add Create Index syntax support
 Key: SPARK-36895
 URL: https://issues.apache.org/jira/browse/SPARK-36895
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao









[jira] [Commented] (SPARK-36894) RDD.toDF should support string schema

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422430#comment-17422430
 ] 

Apache Spark commented on SPARK-36894:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/34146

> RDD.toDF should support string schema
> -
>
> Key: SPARK-36894
> URL: https://issues.apache.org/jira/browse/SPARK-36894
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> In the {{toDF}} docs we explicitly mention that a {{str}} schema is supported, 
> so it should be reflected in the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36894) RDD.toDF should support string schema

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36894:


Assignee: (was: Apache Spark)

> RDD.toDF should support string schema
> -
>
> Key: SPARK-36894
> URL: https://issues.apache.org/jira/browse/SPARK-36894
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> In the {{toDF}} docs we explicitly mention that a {{str}} schema is supported, 
> so it should be reflected in the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36894) RDD.toDF should support string schema

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422429#comment-17422429
 ] 

Apache Spark commented on SPARK-36894:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/34146

> RDD.toDF should support string schema
> -
>
> Key: SPARK-36894
> URL: https://issues.apache.org/jira/browse/SPARK-36894
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> In the {{toDF}} docs we explicitly mention that a {{str}} schema is supported, 
> so it should be reflected in the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36894) RDD.toDF should support string schema

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36894:


Assignee: Apache Spark

> RDD.toDF should support string schema
> -
>
> Key: SPARK-36894
> URL: https://issues.apache.org/jira/browse/SPARK-36894
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> In the {{toDF}} docs we explicitly mention that a {{str}} schema is supported, 
> so it should be reflected in the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-09-29 Thread Ye Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422425#comment-17422425
 ] 

Ye Zhou commented on SPARK-36892:
-

I am working on this issue. We have a job that reproduces the hang, and after 
disabling batch fetch the job completes successfully. Will post a PR soon.

> Disable batch fetch for a shuffle when push based shuffle is enabled
> 
>
> Key: SPARK-36892
> URL: https://issues.apache.org/jira/browse/SPARK-36892
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Priority: Blocker
>
> When push-based shuffle is enabled, merged mapper shuffle output is fetched 
> efficiently.
> Unfortunately, this currently interacts badly with 
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing the shuffle 
> fetch to hang and/or duplicate data to be fetched, which leads to correctness 
> issues.
> Given that batch fetch does not benefit Spark stages reading merged blocks 
> when push-based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch 
> can be disabled when push-based shuffle is enabled.
> Thanks to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36894) RDD.toDF should support string schema

2021-09-29 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-36894:
--

 Summary: RDD.toDF should support string schema
 Key: SPARK-36894
 URL: https://issues.apache.org/jira/browse/SPARK-36894
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 3.1.2, 3.2.0, 3.3.0
Reporter: Maciej Szymkiewicz


In the {{toDF}} docs we explicitly mention that a {{str}} schema is supported, 
so it should be reflected in the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
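
For context, a minimal sketch of the runtime behavior the hints should reflect, using the DDL-string schema form that the docs already mention:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
rdd = spark.sparkContext.parallelize([(1, "a"), (2, "b")])

# Schema passed as a DDL-formatted string; this works at runtime today,
# so the type hints for toDF should accept str as well.
df = rdd.toDF("id: int, value: string")
df.show()
{code}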



[jira] [Assigned] (SPARK-36888) Sha2 with bit_length 512 not being tested

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36888:


Assignee: (was: Apache Spark)

> Sha2 with bit_length 512 not being tested
> -
>
> Key: SPARK-36888
> URL: https://issues.apache.org/jira/browse/SPARK-36888
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> Looking at 
> [https://github.com/apache/spark/commit/6c6291b3f6ac13b8415b87b2b741a9cd95bc6c3b]
> for https://issues.apache.org/jira/browse/SPARK-36836, it's clear that 512 
> bits are supported:
> {code:scala}
> bitLength match {
>   [...]
>   case 512 =>
>     UTF8String.fromString(DigestUtils.sha512Hex(input))
> {code}
> resp.
> {code:scala}
> nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
>   [...]
>   else if ($eval2 == 512) {
>     ${ev.value} = UTF8String.fromString($digestUtils.sha512Hex($eval1));
> {code}
> but the test claims it is unsupported:
> {code:scala}
> // unsupported bit length
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(1024)), null)
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(512)), null)
> {code}
> To avoid a similar fate to SPARK-36836, tests should be added.
> CC [~richardc-db]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
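
For context, a minimal sketch of the kind of positive test the issue asks for, assuming the usual ExpressionEvalHelper.checkEvaluation harness and deriving the expected digest from the same commons-codec helper rather than hard-coding it:

{code:scala}
import java.nio.charset.StandardCharsets
import org.apache.commons.codec.digest.DigestUtils

val input = "ABC".getBytes(StandardCharsets.UTF_8)
// Positive path: 512 is a supported bit length, so the digest must match.
checkEvaluation(
  Sha2(Literal.create(input, BinaryType), Literal(512)),
  DigestUtils.sha512Hex(input))
{code}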



[jira] [Commented] (SPARK-36888) Sha2 with bit_length 512 not being tested

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422424#comment-17422424
 ] 

Apache Spark commented on SPARK-36888:
--

User 'richardc-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/34145

> Sha2 with bit_length 512 not being tested
> -
>
> Key: SPARK-36888
> URL: https://issues.apache.org/jira/browse/SPARK-36888
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> Looking at 
> [https://github.com/apache/spark/commit/6c6291b3f6ac13b8415b87b2b741a9cd95bc6c3b]
> for https://issues.apache.org/jira/browse/SPARK-36836, it's clear that 512 
> bits are supported:
> {code:scala}
> bitLength match {
>   [...]
>   case 512 =>
>     UTF8String.fromString(DigestUtils.sha512Hex(input))
> {code}
> resp.
> {code:scala}
> nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
>   [...]
>   else if ($eval2 == 512) {
>     ${ev.value} = UTF8String.fromString($digestUtils.sha512Hex($eval1));
> {code}
> but the test claims it is unsupported:
> {code:scala}
> // unsupported bit length
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(1024)), null)
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(512)), null)
> {code}
> To avoid a similar fate to SPARK-36836, tests should be added.
> CC [~richardc-db]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36888) Sha2 with bit_length 512 not being tested

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36888:


Assignee: Apache Spark

> Sha2 with bit_length 512 not being tested
> -
>
> Key: SPARK-36888
> URL: https://issues.apache.org/jira/browse/SPARK-36888
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Assignee: Apache Spark
>Priority: Major
>
> Looking at 
> [https://github.com/apache/spark/commit/6c6291b3f6ac13b8415b87b2b741a9cd95bc6c3b]
> for https://issues.apache.org/jira/browse/SPARK-36836, it's clear that 512 
> bits are supported:
> {code:scala}
> bitLength match {
>   [...]
>   case 512 =>
>     UTF8String.fromString(DigestUtils.sha512Hex(input))
> {code}
> resp.
> {code:scala}
> nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
>   [...]
>   else if ($eval2 == 512) {
>     ${ev.value} = UTF8String.fromString($digestUtils.sha512Hex($eval1));
> {code}
> but the test claims it is unsupported:
> {code:scala}
> // unsupported bit length
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(1024)), null)
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(512)), null)
> {code}
> To avoid a similar fate to SPARK-36836, tests should be added.
> CC [~richardc-db]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36888) Sha2 with bit_length 512 not being tested

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422422#comment-17422422
 ] 

Apache Spark commented on SPARK-36888:
--

User 'richardc-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/34145

> Sha2 with bit_length 512 not being tested
> -
>
> Key: SPARK-36888
> URL: https://issues.apache.org/jira/browse/SPARK-36888
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> Looking at 
> [https://github.com/apache/spark/commit/6c6291b3f6ac13b8415b87b2b741a9cd95bc6c3b]
> for https://issues.apache.org/jira/browse/SPARK-36836, it's clear that 512 
> bits are supported:
> {code:scala}
> bitLength match {
>   [...]
>   case 512 =>
>     UTF8String.fromString(DigestUtils.sha512Hex(input))
> {code}
> resp.
> {code:scala}
> nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
>   [...]
>   else if ($eval2 == 512) {
>     ${ev.value} = UTF8String.fromString($digestUtils.sha512Hex($eval1));
> {code}
> but the test claims it is unsupported:
> {code:scala}
> // unsupported bit length
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(1024)), null)
> checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(512)), null)
> {code}
> To avoid a similar fate to SPARK-36836, tests should be added.
> CC [~richardc-db]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36893) Upgrade Mesos to 1.4.3

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422395#comment-17422395
 ] 

Apache Spark commented on SPARK-36893:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/34144

> Upgrade Mesos to 1.4.3
> 
>
> Key: SPARK-36893
> URL: https://issues.apache.org/jira/browse/SPARK-36893
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> Upgrade Mesos to 1.4.3 to fix CVE-2018-11793



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36893) Upgrade Mesos to 1.4.3

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422393#comment-17422393
 ] 

Apache Spark commented on SPARK-36893:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/34144

> Upgrade Mesos to 1.4.3
> 
>
> Key: SPARK-36893
> URL: https://issues.apache.org/jira/browse/SPARK-36893
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> Upgrade Mesos to 1.4.3 to fix CVE-2018-11793



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36893) Upgrade Mesos to 1.4.3

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36893:


Assignee: Apache Spark

> Upgrade Mesos to 1.4.3
> 
>
> Key: SPARK-36893
> URL: https://issues.apache.org/jira/browse/SPARK-36893
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Assignee: Apache Spark
>Priority: Minor
>
> Upgrade Mesos to 1.4.3 to fix CVE-2018-11793



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36893) Upgrade Mesos to 1.4.3

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36893:


Assignee: (was: Apache Spark)

> Upgrade Mesos to 1.4.3
> 
>
> Key: SPARK-36893
> URL: https://issues.apache.org/jira/browse/SPARK-36893
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> Upgrade Mesos to 1.4.3 to fix CVE-2018-11793



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36893) Upgrade Mesos to 1.4.3

2021-09-29 Thread Zhongwei Zhu (Jira)
Zhongwei Zhu created SPARK-36893:


 Summary: Upgrade Mesos to 1.4.3
 Key: SPARK-36893
 URL: https://issues.apache.org/jira/browse/SPARK-36893
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1.2
Reporter: Zhongwei Zhu


Upgrade Mesos to 1.4.3 to fix CVE-2018-11793



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36845) Inline type hint files

2021-09-29 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422379#comment-17422379
 ] 

Maciej Szymkiewicz commented on SPARK-36845:


Additionally, we should probably rethink the `_typing.pyi` modules. In general 
it should be safe to move these to plain Python modules (see, for example, how 
the pandas folks do a similar thing).

> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
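
For context, a minimal sketch of what such a move could look like, with names along the lines of the current stubs (a plain pyspark/sql/_typing.py module instead of a .pyi file; everything here is illustrative):

{code:python}
from typing import TypeVar, Union


class Column: ...  # stand-in for pyspark.sql.Column, to keep the sketch self-contained


# Aliases that today live in a stub file can be ordinary runtime definitions.
ColumnOrName = Union[Column, str]
T = TypeVar("T")
{code}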



[jira] [Commented] (SPARK-36869) Spark job fails due to java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible

2021-09-29 Thread Hamid EL MAAZOUZ (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422375#comment-17422375
 ] 

Hamid EL MAAZOUZ commented on SPARK-36869:
--

Thank you :)

> Spark job fails due to java.io.InvalidClassException: 
> scala.collection.mutable.WrappedArray$ofRef; local class incompatible
> ---
>
> Key: SPARK-36869
> URL: https://issues.apache.org/jira/browse/SPARK-36869
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.2
> Environment: * RHEL 8.4
>  * Java 11.0.12
>  * Spark 3.1.2 (only prebuilt with *2.12.10)*
>  * Scala *2.12.14* for the application code
>Reporter: Hamid EL MAAZOUZ
>Priority: Blocker
>  Labels: scala, serialization, spark
>
> This is a Scala problem. It has already been reported here 
> [https://github.com/scala/bug/issues/5046] and a fix has been merged here 
> [https://github.com/scala/scala/pull/9166].
> According to 
> [https://github.com/scala/bug/issues/5046#issuecomment-928108088], the *fix* 
> is available on *Scala 2.12.14*, but *Spark 3.0+* is only pre-built with 
> Scala *2.12.10*.
>  
>  * Stacktrace of the failure: (Taken from stderr of a worker process)
> {code:java}
> Spark Executor Command: "/usr/java/jdk-11.0.12/bin/java" "-cp" 
> "/opt/apache/spark-3.1.2-bin-hadoop3.2/conf/:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/*"
>  "-Xmx1024M" "-Dspark.driver.port=45887" 
> "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
> "spark://CoarseGrainedScheduler@192.168.0.191:45887" "--executor-id" "0" 
> "--hostname" "192.168.0.191" "--cores" "12" "--app-id" 
> "app-20210927231035-" "--worker-url" "spark://Worker@192.168.0.191:35261"
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 21/09/27 23:10:36 INFO CoarseGrainedExecutorBackend: Started daemon with 
> process name: 18957@localhost
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for TERM
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for HUP
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for INT
> 21/09/27 23:10:36 WARN Utils: Your hostname, localhost resolves to a loopback 
> address: 127.0.0.1; using 192.168.0.191 instead (on interface wlp82s0)
> 21/09/27 23:10:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
> (file:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) 
> to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 21/09/27 23:10:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 21/09/27 23:10:36 INFO SecurityManager: Changing view acls to: hamidelmaazouz
> 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls to: 
> hamidelmaazouz
> 21/09/27 23:10:36 INFO SecurityManager: Changing view acls groups to: 
> 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls groups to: 
> 21/09/27 23:10:36 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: 
> Set(hamidelmaazouz); groups with view permissions: Set(); users  with modify 
> permissions: Set(hamidelmaazouz); groups with modify permissions: Set()
> 21/09/27 23:10:37 INFO TransportClientFactory: Successfully created 
> connection to /192.168.0.191:45887 after 44 ms (0 ms spent in bootstraps)
> 21/09/27 23:10:37 WARN TransportChannelHandler: Exception in connection from 
> /192.168.0.191:45887
> java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; 
> local class incompatible: stream classdesc serialVersionUID = 
> 3456489343829468865, local class serialVersionUID = 1028182004549731694
>   at 
> java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
>   at 
> java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2012)
>   at 
> java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862)
>   at 
> java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169)
>   at 
> java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
>   at 
> java.base/
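
For context, a minimal way to confirm the serialVersionUID mismatch described above, using the standard java.io.ObjectStreamClass API (a sketch; run it under each Scala patch release in question and compare the printed values):

{code:scala}
import java.io.ObjectStreamClass

// Prints the serialVersionUID the local Scala library computes for the class;
// without the upstream fix, this differs across 2.12.x patch releases.
val suid = ObjectStreamClass
  .lookup(classOf[scala.collection.mutable.WrappedArray.ofRef[_]])
  .getSerialVersionUID
println(suid)
{code}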

[jira] [Created] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-09-29 Thread Mridul Muralidharan (Jira)
Mridul Muralidharan created SPARK-36892:
---

 Summary: Disable batch fetch for a shuffle when push based shuffle 
is enabled
 Key: SPARK-36892
 URL: https://issues.apache.org/jira/browse/SPARK-36892
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.2.0
Reporter: Mridul Muralidharan


When push-based shuffle is enabled, merged mapper shuffle output is fetched 
efficiently.
Unfortunately, this currently interacts badly with 
spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing the shuffle 
fetch to hang and/or duplicate data to be fetched, which leads to correctness 
issues.

Given that batch fetch does not benefit Spark stages reading merged blocks when 
push-based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch can be 
disabled when push-based shuffle is enabled.


Thanks to [~Ngone51] for surfacing this issue.
+CC [~Gengliang.Wang]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
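
For context, a minimal sketch of the proposed guard (the names are assumptions; the real change lives in ShuffleBlockFetcherIterator):

{code:scala}
// Batch fetch of continuous shuffle blocks is only worthwhile when push-based
// shuffle is off, so gate it on both flags.
def shouldBatchFetch(
    fetchInBatchEnabled: Boolean,
    pushBasedShuffleEnabled: Boolean): Boolean =
  fetchInBatchEnabled && !pushBasedShuffleEnabled
{code}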



[jira] [Comment Edited] (SPARK-36845) Inline type hint files

2021-09-29 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422371#comment-17422371
 ] 

Maciej Szymkiewicz edited comment on SPARK-36845 at 9/29/21, 8:14 PM:
--

I'd recommend adding SPARK-36145 as a blocker for this, so we can proceed cleanly 
with

{code:python}

from __future__ import annotations

{code}

and avoid all the quoting in inlined annotations.

See [PEP 563|https://www.python.org/dev/peps/pep-0563/#id6]


was (Author: zero323):
I'd recommend adding SPARK-36145 as a blocker for this, so we can proceed cleanly 
with

{code:python}

from __future__ import annotations

{code}

and avoid all the quoting in inlined annotations.



> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
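
For context, a minimal sketch of what the future import buys (names illustrative):

{code:python}
from __future__ import annotations


class DataFrame:
    # Without the future import, the forward reference would have to be
    # written as the quoted string "DataFrame"; with PEP 563 postponed
    # evaluation, the bare name works.
    def alias(self, alias: str) -> DataFrame:
        return self
{code}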



[jira] [Comment Edited] (SPARK-36845) Inline type hint files

2021-09-29 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422371#comment-17422371
 ] 

Maciej Szymkiewicz edited comment on SPARK-36845 at 9/29/21, 8:13 PM:
--

I'd recommend adding SPARK-36145 as a blocker for this, so we can proceed cleanly 
with

{code:python}

from __future__ import annotations

{code}

and avoid all the quoting in inlined annotations.




was (Author: zero323):
I'd recommend adding SPARK-36145 as a blocker for this, so we can proceed cleanly 
with {{from __future__ import annotations}} and avoid all the quoting in 
inlined annotations.



> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36845) Inline type hint files

2021-09-29 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422371#comment-17422371
 ] 

Maciej Szymkiewicz commented on SPARK-36845:


I'd recommend adding SPARK-36145 as a blocker for this, so we can proceed cleanly 
with {{from __future__ import annotations}} and avoid all the quoting in 
inlined annotations.



> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36830) Read/write dataframes with ANSI intervals from/to JSON files

2021-09-29 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422341#comment-17422341
 ] 

Max Gekk commented on SPARK-36830:
--

[~sarutak] FYI, I don't plan to work on this. Please feel free to take it.

> Read/write dataframes with ANSI intervals from/to JSON files
> 
>
> Key: SPARK-36830
> URL: https://issues.apache.org/jira/browse/SPARK-36830
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading ANSI interval (year-month and day-time 
> interval) columns in dataframes to/from JSON datasources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
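
For context, a minimal sketch of the round-trip this sub-task enables, using the standard DataFrameReader/Writer API (path illustrative):

{code:scala}
val df = spark.sql(
  """SELECT INTERVAL '1-2' YEAR TO MONTH AS ym,
    |       INTERVAL '1 2:3:4' DAY TO SECOND AS dt""".stripMargin)
df.write.mode("overwrite").json("/tmp/ansi_intervals")

// Reading back needs an explicit schema, since JSON itself has no interval type.
val back = spark.read.schema(df.schema).json("/tmp/ansi_intervals")
back.show()
{code}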



[jira] [Resolved] (SPARK-36883) Upgrade R version to 4.1.1 in CI images

2021-09-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36883.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34138
[https://github.com/apache/spark/pull/34138]

> Upgrade R version to 4.1.1 in CI images
> ---
>
> Key: SPARK-36883
> URL: https://issues.apache.org/jira/browse/SPARK-36883
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>
> https://developer.r-project.org/#:~:text=Release%20plans,on%202021%2D08%2D10.
> R 4.1.1 has been released. We had better test the latest version of R with 
> SparkR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36883) Upgrade R version to 4.1.1 in CI images

2021-09-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-36883:
-

Assignee: Dongjoon Hyun

> Upgrade R version to 4.1.1 in CI images
> ---
>
> Key: SPARK-36883
> URL: https://issues.apache.org/jira/browse/SPARK-36883
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Dongjoon Hyun
>Priority: Major
>
> https://developer.r-project.org/#:~:text=Release%20plans,on%202021%2D08%2D10.
> R 4.1.1 has been released. We had better test the latest version of R with 
> SparkR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files

2021-09-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-36831.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34142
[https://github.com/apache/spark/pull/34142]

> Read/write dataframes with ANSI intervals from/to CSV files
> ---
>
> Key: SPARK-36831
> URL: https://issues.apache.org/jira/browse/SPARK-36831
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.3.0
>
>
> Implement writing and reading ANSI interval (year-month and day-time 
> interval) columns in dataframes to/from CSV datasources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files

2021-09-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-36831:


Assignee: Kousuke Saruta

> Read/write dataframes with ANSI intervals from/to CSV files
> ---
>
> Key: SPARK-36831
> URL: https://issues.apache.org/jira/browse/SPARK-36831
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Kousuke Saruta
>Priority: Major
>
> Implement writing and reading ANSI interval (year-month and day-time 
> interval) columns in dataframes to/from CSV datasources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36891) Add new test suite to cover Parquet decoding

2021-09-29 Thread Chao Sun (Jira)
Chao Sun created SPARK-36891:


 Summary: Add new test suite to cover Parquet decoding
 Key: SPARK-36891
 URL: https://issues.apache.org/jira/browse/SPARK-36891
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.3.0
Reporter: Chao Sun


Add a new test suite to increase coverage of Parquet vectorized decoding, 
focusing on different combinations of Parquet column index, dictionary 
encoding, batch size, page size, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
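
For context, a minimal sketch of the combinations such a suite could sweep (all values illustrative):

{code:scala}
for {
  dictionaryEnabled <- Seq(true, false)
  pageSize          <- Seq(64, 1024, 1024 * 1024)
  batchSize         <- Seq(16, 256, 4096)
} {
  // 1. Write a Parquet file with the given page size and dictionary setting.
  // 2. Read it back with the vectorized reader at the given batch size,
  //    with and without a column-index-driven page filter.
  // 3. Compare the result against the rows from the non-vectorized path.
}
{code}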



[jira] [Commented] (SPARK-36869) Spark job fails due to java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible

2021-09-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422294#comment-17422294
 ] 

Dongjoon Hyun commented on SPARK-36869:
---

Thank you for your confirmation, [~hamidelmaazouz]. As you can tell from the RC 
number (6), Apache Spark 3.2.0 is almost ready.
> However, Scala 2.12.15 against the RC6 jars work fine 

> Spark job fails due to java.io.InvalidClassException: 
> scala.collection.mutable.WrappedArray$ofRef; local class incompatible
> ---
>
> Key: SPARK-36869
> URL: https://issues.apache.org/jira/browse/SPARK-36869
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.2
> Environment: * RHEL 8.4
>  * Java 11.0.12
>  * Spark 3.1.2 (only prebuilt with *2.12.10)*
>  * Scala *2.12.14* for the application code
>Reporter: Hamid EL MAAZOUZ
>Priority: Blocker
>  Labels: scala, serialization, spark
>
> This is a Scala problem. It has already been reported here 
> [https://github.com/scala/bug/issues/5046] and a fix has been merged here 
> [https://github.com/scala/scala/pull/9166].
> According to 
> [https://github.com/scala/bug/issues/5046#issuecomment-928108088], the *fix* 
> is available on *Scala 2.12.14*, but *Spark 3.0+* is only pre-built with 
> Scala *2.12.10*.
>  
>  * Stacktrace of the failure: (Taken from stderr of a worker process)
> {code:java}
> Spark Executor Command: "/usr/java/jdk-11.0.12/bin/java" "-cp" 
> "/opt/apache/spark-3.1.2-bin-hadoop3.2/conf/:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/*"
>  "-Xmx1024M" "-Dspark.driver.port=45887" 
> "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
> "spark://CoarseGrainedScheduler@192.168.0.191:45887" "--executor-id" "0" 
> "--hostname" "192.168.0.191" "--cores" "12" "--app-id" 
> "app-20210927231035-" "--worker-url" "spark://Worker@192.168.0.191:35261"
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 21/09/27 23:10:36 INFO CoarseGrainedExecutorBackend: Started daemon with 
> process name: 18957@localhost
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for TERM
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for HUP
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for INT
> 21/09/27 23:10:36 WARN Utils: Your hostname, localhost resolves to a loopback 
> address: 127.0.0.1; using 192.168.0.191 instead (on interface wlp82s0)
> 21/09/27 23:10:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
> (file:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) 
> to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 21/09/27 23:10:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 21/09/27 23:10:36 INFO SecurityManager: Changing view acls to: hamidelmaazouz
> 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls to: 
> hamidelmaazouz
> 21/09/27 23:10:36 INFO SecurityManager: Changing view acls groups to: 
> 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls groups to: 
> 21/09/27 23:10:36 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: 
> Set(hamidelmaazouz); groups with view permissions: Set(); users  with modify 
> permissions: Set(hamidelmaazouz); groups with modify permissions: Set()
> 21/09/27 23:10:37 INFO TransportClientFactory: Successfully created 
> connection to /192.168.0.191:45887 after 44 ms (0 ms spent in bootstraps)
> 21/09/27 23:10:37 WARN TransportChannelHandler: Exception in connection from 
> /192.168.0.191:45887
> java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; 
> local class incompatible: stream classdesc serialVersionUID = 
> 3456489343829468865, local class serialVersionUID = 1028182004549731694
>   at 
> java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
>   at 
> java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2012)
>   at 
> java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862)
>   at 
> java.base/java.io.ObjectInputStream

[jira] [Commented] (SPARK-36890) Websocket timeouts to K8s-API

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422252#comment-17422252
 ] 

Apache Spark commented on SPARK-36890:
--

User 'Reamer' has created a pull request for this issue:
https://github.com/apache/spark/pull/34143

> Websocket timeouts to K8s-API
> -
>
> Key: SPARK-36890
> URL: https://issues.apache.org/jira/browse/SPARK-36890
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2
>Reporter: Philipp Dallig
>Priority: Major
>
> If you access the Kubernetes API via a load balancer (e.g. HAProxy) that has 
> a tunnel timeout set, the following exception is thrown each time the 
> timeout expires.
> {code}
> >>> 21/09/27 15:35:19 WARN WatchConnectionManager: Exec Failure
> java.io.EOFException
> at okio.RealBufferedSource.require(RealBufferedSource.java:61)
> at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
> at 
> okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
> at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
> at 
> okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
> at 
> okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is quite annoying when working interactively with a paused 
> PySpark shell where the driver component runs locally but the executors run 
> in Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36890) Websocket timeouts to K8s-API

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36890:


Assignee: (was: Apache Spark)

> Websocket timeouts to K8s-API
> -
>
> Key: SPARK-36890
> URL: https://issues.apache.org/jira/browse/SPARK-36890
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2
>Reporter: Philipp Dallig
>Priority: Major
>
> If you access the Kubernetes API via a load balancer (e.g. HAProxy) that has 
> a tunnel timeout set, the following exception is thrown each time the 
> timeout expires.
> {code}
> >>> 21/09/27 15:35:19 WARN WatchConnectionManager: Exec Failure
> java.io.EOFException
> at okio.RealBufferedSource.require(RealBufferedSource.java:61)
> at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
> at 
> okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
> at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
> at 
> okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
> at 
> okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is quite annoying when working interactively with a paused 
> PySpark shell where the driver component runs locally but the executors run 
> in Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36890) Websocket timeouts to K8s-API

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422253#comment-17422253
 ] 

Apache Spark commented on SPARK-36890:
--

User 'Reamer' has created a pull request for this issue:
https://github.com/apache/spark/pull/34143

> Websocket timeouts to K8s-API
> -
>
> Key: SPARK-36890
> URL: https://issues.apache.org/jira/browse/SPARK-36890
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2
>Reporter: Philipp Dallig
>Priority: Major
>
> If you access the Kubernetes API via a load balancer (e.g. HAProxy) that has 
> a tunnel timeout set, the following exception is thrown each time the 
> timeout expires.
> {code}
> >>> 21/09/27 15:35:19 WARN WatchConnectionManager: Exec Failure
> java.io.EOFException
> at okio.RealBufferedSource.require(RealBufferedSource.java:61)
> at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
> at 
> okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
> at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
> at 
> okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
> at 
> okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is quite annoying when working interactively with a paused 
> PySpark shell where the driver component runs locally but the executors run 
> in Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36890) Websocket timeouts to K8s-API

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36890:


Assignee: Apache Spark

> Websocket timeouts to K8s-API
> -
>
> Key: SPARK-36890
> URL: https://issues.apache.org/jira/browse/SPARK-36890
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2
>Reporter: Philipp Dallig
>Assignee: Apache Spark
>Priority: Major
>
> If you access the Kubernetes API via a load balancer (e.g. HAProxy) that has 
> a tunnel timeout set, the following exception is thrown each time the 
> timeout expires.
> {code}
> >>> 21/09/27 15:35:19 WARN WatchConnectionManager: Exec Failure
> java.io.EOFException
> at okio.RealBufferedSource.require(RealBufferedSource.java:61)
> at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
> at 
> okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
> at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
> at 
> okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
> at 
> okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is quite annoying when working interactively with a paused 
> PySpark shell where the driver component runs locally but the executors run 
> in Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36890) Websocket timeouts to K8s-API

2021-09-29 Thread Philipp Dallig (Jira)
Philipp Dallig created SPARK-36890:
--

 Summary: Websocket timeouts to K8s-API
 Key: SPARK-36890
 URL: https://issues.apache.org/jira/browse/SPARK-36890
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0, 2.4.8, 
2.4.7, 2.4.6, 2.4.5, 2.4.4, 2.4.3, 2.4.2, 2.4.1, 2.4.0, 2.3.4, 2.3.3, 2.3.2, 
2.3.1, 2.3.0
Reporter: Philipp Dallig


If you access the Kubernetes API via a load balancer (e.g. HAProxy) that has a 
tunnel timeout set, the following exception is thrown each time the timeout 
expires.
{code}
>>> 21/09/27 15:35:19 WARN WatchConnectionManager: Exec Failure
java.io.EOFException
at okio.RealBufferedSource.require(RealBufferedSource.java:61)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
at 
okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
at 
okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

This exception is quite annoying when working interactively with a paused 
PySpark shell where the driver component runs locally but the executors run in 
Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
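
For context, a minimal sketch of one way to keep the watch connection alive: enabling websocket pings on the underlying OkHttp client (interval illustrative; the actual PR may wire this through the fabric8 Kubernetes client configuration instead):

{code:scala}
import java.util.concurrent.TimeUnit
import okhttp3.OkHttpClient

// Periodic pings generate traffic on the tunnel, so an idle-timeout-enforcing
// load balancer no longer drops the watch connection.
val client: OkHttpClient = new OkHttpClient.Builder()
  .pingInterval(30, TimeUnit.SECONDS)
  .build()
{code}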



[jira] [Commented] (SPARK-36869) Spark job fails due to java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible

2021-09-29 Thread Hamid EL MAAZOUZ (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422240#comment-17422240
 ] 

Hamid EL MAAZOUZ commented on SPARK-36869:
--

Testing Scala 2.12.15 against the old Spark jars (3.1.2 pulled from the Maven 
repos) fails similarly (a serialVersionUID mismatch for a different class).

However, Scala 2.12.15 against the RC6 jars works fine :)

> Spark job fails due to java.io.InvalidClassException: 
> scala.collection.mutable.WrappedArray$ofRef; local class incompatible
> ---
>
> Key: SPARK-36869
> URL: https://issues.apache.org/jira/browse/SPARK-36869
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.2
> Environment: * RHEL 8.4
>  * Java 11.0.12
>  * Spark 3.1.2 (only prebuilt with *2.12.10)*
>  * Scala *2.12.14* for the application code
>Reporter: Hamid EL MAAZOUZ
>Priority: Blocker
>  Labels: scala, serialization, spark
>
> This is a Scala problem. It has already been reported here 
> [https://github.com/scala/bug/issues/5046] and a fix has been merged here 
> [https://github.com/scala/scala/pull/9166].
> According to 
> [https://github.com/scala/bug/issues/5046#issuecomment-928108088], the *fix* 
> is available on *Scala 2.12.14*, but *Spark 3.0+* is only pre-built with 
> Scala *2.12.10*.
>  
>  * Stacktrace of the failure: (Taken from stderr of a worker process)
> {code:java}
> Spark Executor Command: "/usr/java/jdk-11.0.12/bin/java" "-cp" 
> "/opt/apache/spark-3.1.2-bin-hadoop3.2/conf/:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/*"
>  "-Xmx1024M" "-Dspark.driver.port=45887" 
> "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
> "spark://CoarseGrainedScheduler@192.168.0.191:45887" "--executor-id" "0" 
> "--hostname" "192.168.0.191" "--cores" "12" "--app-id" 
> "app-20210927231035-" "--worker-url" "spark://Worker@192.168.0.191:35261"
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 21/09/27 23:10:36 INFO CoarseGrainedExecutorBackend: Started daemon with 
> process name: 18957@localhost
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for TERM
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for HUP
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for INT
> 21/09/27 23:10:36 WARN Utils: Your hostname, localhost resolves to a loopback 
> address: 127.0.0.1; using 192.168.0.191 instead (on interface wlp82s0)
> 21/09/27 23:10:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
> (file:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) 
> to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 21/09/27 23:10:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 21/09/27 23:10:36 INFO SecurityManager: Changing view acls to: hamidelmaazouz
> 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls to: 
> hamidelmaazouz
> 21/09/27 23:10:36 INFO SecurityManager: Changing view acls groups to: 
> 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls groups to: 
> 21/09/27 23:10:36 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: 
> Set(hamidelmaazouz); groups with view permissions: Set(); users  with modify 
> permissions: Set(hamidelmaazouz); groups with modify permissions: Set()
> 21/09/27 23:10:37 INFO TransportClientFactory: Successfully created 
> connection to /192.168.0.191:45887 after 44 ms (0 ms spent in bootstraps)
> 21/09/27 23:10:37 WARN TransportChannelHandler: Exception in connection from 
> /192.168.0.191:45887
> java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; 
> local class incompatible: stream classdesc serialVersionUID = 
> 3456489343829468865, local class serialVersionUID = 1028182004549731694
>   at 
> java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
>   at 
> java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2012)
>   at 
> java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862)
>   at 
> java

[jira] [Resolved] (SPARK-36624) When the application is killed, sc should not exit with code 0

2021-09-29 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-36624.
---
Fix Version/s: 3.3.0
 Assignee: angerszhu
   Resolution: Fixed

> When the application is killed, sc should not exit with code 0
> ---
>
> Key: SPARK-36624
> URL: https://issues.apache.org/jira/browse/SPARK-36624
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> When the application is killed, sc should not exit with code 0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36869) Spark job fails due to java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible

2021-09-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422215#comment-17422215
 ] 

Dongjoon Hyun commented on SPARK-36869:
---

You can test Scala 2.12.15 with Apache Spark 3.2.0 RC6 binaries.
- https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/

> Spark job fails due to java.io.InvalidClassException: 
> scala.collection.mutable.WrappedArray$ofRef; local class incompatible
> ---
>
> Key: SPARK-36869
> URL: https://issues.apache.org/jira/browse/SPARK-36869
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.2
> Environment: * RHEL 8.4
>  * Java 11.0.12
>  * Spark 3.1.2 (only prebuilt with *2.12.10)*
>  * Scala *2.12.14* for the application code
>Reporter: Hamid EL MAAZOUZ
>Priority: Blocker
>  Labels: scala, serialization, spark
>
> This is a Scala problem. It has already been reported here 
> [https://github.com/scala/bug/issues/5046] and a fix has been merged here 
> [https://github.com/scala/scala/pull/9166].
> According to 
> [https://github.com/scala/bug/issues/5046#issuecomment-928108088], the *fix* 
> is available on *Scala 2.12.14*, but *Spark 3.0+* is only pre-built with 
> Scala *2.12.10*.
>  
>  * Stacktrace of the failure: (Taken from stderr of a worker process)
> {code:java}
> Spark Executor Command: "/usr/java/jdk-11.0.12/bin/java" "-cp" 
> "/opt/apache/spark-3.1.2-bin-hadoop3.2/conf/:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/*"
>  "-Xmx1024M" "-Dspark.driver.port=45887" 
> "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
> "spark://CoarseGrainedScheduler@192.168.0.191:45887" "--executor-id" "0" 
> "--hostname" "192.168.0.191" "--cores" "12" "--app-id" 
> "app-20210927231035-" "--worker-url" "spark://Worker@192.168.0.191:35261"
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 21/09/27 23:10:36 INFO CoarseGrainedExecutorBackend: Started daemon with 
> process name: 18957@localhost
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for TERM
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for HUP
> 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for INT
> 21/09/27 23:10:36 WARN Utils: Your hostname, localhost resolves to a loopback 
> address: 127.0.0.1; using 192.168.0.191 instead (on interface wlp82s0)
> 21/09/27 23:10:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
> (file:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) 
> to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 21/09/27 23:10:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 21/09/27 23:10:36 INFO SecurityManager: Changing view acls to: hamidelmaazouz
> 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls to: 
> hamidelmaazouz
> 21/09/27 23:10:36 INFO SecurityManager: Changing view acls groups to: 
> 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls groups to: 
> 21/09/27 23:10:36 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: 
> Set(hamidelmaazouz); groups with view permissions: Set(); users  with modify 
> permissions: Set(hamidelmaazouz); groups with modify permissions: Set()
> 21/09/27 23:10:37 INFO TransportClientFactory: Successfully created 
> connection to /192.168.0.191:45887 after 44 ms (0 ms spent in bootstraps)
> 21/09/27 23:10:37 WARN TransportChannelHandler: Exception in connection from 
> /192.168.0.191:45887
> java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; 
> local class incompatible: stream classdesc serialVersionUID = 
> 3456489343829468865, local class serialVersionUID = 1028182004549731694
>   at 
> java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
>   at 
> java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2012)
>   at 
> java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862)
>   at 
> java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169)
> {code}
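
One way to confirm which serialVersionUID each side of the connection carries is to ask the JDK's serialization machinery directly; a minimal sketch, assuming a Scala 2.12 REPL launched against the same scala-library jar each JVM uses:

{code:scala}
import java.io.ObjectStreamClass

// Prints the serialVersionUID of WrappedArray$ofRef for the scala-library on
// this classpath; run it on both the driver's and the executor's classpath
// and compare the two numbers against the mismatch in the exception above.
val desc = ObjectStreamClass.lookup(classOf[scala.collection.mutable.WrappedArray.ofRef[_]])
println(desc.getSerialVersionUID)
{code}

The workaround discussed in this thread is to align the Scala patch version on both sides: build the application with the same Scala version the Spark distribution bundles, or try the Spark 3.2.0 RC binaries built with Scala 2.12.15.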
> 

[jira] [Assigned] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36831:


Assignee: (was: Apache Spark)

> Read/write dataframes with ANSI intervals from/to CSV files
> ---
>
> Key: SPARK-36831
> URL: https://issues.apache.org/jira/browse/SPARK-36831
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading ANSI intervals (year-month and day-time 
> intervals) columns in dataframes to CSV datasources.
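
Once implemented, a round trip like the following sketch should work (the local session setup and the /tmp path are illustrative assumptions, not part of the ticket):

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative sketch of the intended CSV round trip for ANSI intervals.
val spark = SparkSession.builder().master("local[*]").appName("ansi-interval-csv").getOrCreate()

val df = spark.sql(
  """SELECT INTERVAL '1-2' YEAR TO MONTH AS ym,
    |       INTERVAL '1 02:03:04' DAY TO SECOND AS dt""".stripMargin)

// Before this sub-task the CSV writer rejects interval columns; afterwards
// both directions should succeed.
df.write.mode("overwrite").csv("/tmp/ansi_intervals_csv")
val back = spark.read.schema(df.schema).csv("/tmp/ansi_intervals_csv")
back.show(false)
{code}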



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files

2021-09-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36831:


Assignee: Apache Spark

> Read/write dataframes with ANSI intervals from/to CSV files
> ---
>
> Key: SPARK-36831
> URL: https://issues.apache.org/jira/browse/SPARK-36831
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Implement writing and reading ANSI intervals (year-month and day-time 
> intervals) columns in dataframes to CSV datasources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files

2021-09-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422117#comment-17422117
 ] 

Apache Spark commented on SPARK-36831:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/34142

> Read/write dataframes with ANSI intervals from/to CSV files
> ---
>
> Key: SPARK-36831
> URL: https://issues.apache.org/jira/browse/SPARK-36831
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading ANSI intervals (year-month and day-time 
> intervals) columns in dataframes to CSV datasources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36550) Propagation cause when UDF reflection fails

2021-09-29 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-36550:


Assignee: dzcxzl

> Propagation cause when UDF reflection fails
> ---
>
> Key: SPARK-36550
> URL: https://issues.apache.org/jira/browse/SPARK-36550
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
>
> Currently, when UDF reflection fails, an InvocationTargetException is thrown, 
> but the error message does not surface the specific underlying cause.
> {code:java}
> Error in query: No handler for Hive UDF 'XXX': 
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> {code}
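
The improvement is essentially to unwrap the reflective wrapper before reporting the error; a minimal sketch of the idea (hypothetical helper, not the actual change in the linked pull request):

{code:scala}
import java.lang.reflect.InvocationTargetException

object UdfReflectSketch {
  // Hypothetical sketch: surface the real failure instead of the opaque
  // InvocationTargetException when instantiating a UDF class reflectively.
  def instantiate(clazz: Class[_]): Any =
    try clazz.getDeclaredConstructor().newInstance()
    catch {
      case e: InvocationTargetException if e.getCause != null =>
        throw new RuntimeException(
          s"Failed to instantiate UDF ${clazz.getName}: ${e.getCause.getMessage}",
          e.getCause)
    }
}
{code}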



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36550) Propagation cause when UDF reflection fails

2021-09-29 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-36550.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33796
[https://github.com/apache/spark/pull/33796]

> Propagation cause when UDF reflection fails
> ---
>
> Key: SPARK-36550
> URL: https://issues.apache.org/jira/browse/SPARK-36550
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Fix For: 3.3.0
>
>
> Currently, when UDF reflection fails, an InvocationTargetException is thrown, 
> but the error message does not surface the specific underlying cause.
> {code:java}
> Error in query: No handler for Hive UDF 'XXX': 
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files

2021-09-29 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422108#comment-17422108
 ] 

Kousuke Saruta commented on SPARK-36831:


Thank you. I'll open a PR.

> Read/write dataframes with ANSI intervals from/to CSV files
> ---
>
> Key: SPARK-36831
> URL: https://issues.apache.org/jira/browse/SPARK-36831
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading ANSI intervals (year-month and day-time 
> intervals) columns in dataframes to CSV datasources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36831) Read/write dataframes with ANSI intervals from/to CSV files

2021-09-29 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422093#comment-17422093
 ] 

Max Gekk commented on SPARK-36831:
--

[~sarutak] No, feel free to take this.

> Read/write dataframes with ANSI intervals from/to CSV files
> ---
>
> Key: SPARK-36831
> URL: https://issues.apache.org/jira/browse/SPARK-36831
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Implement writing and reading ANSI intervals (year-month and day-time 
> intervals) columns in dataframes to CSV datasources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36424) Support eliminate limits in AQE Optimizer

2021-09-29 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-36424:
--
Parent: SPARK-33828
Issue Type: Sub-task  (was: Improvement)

> Support eliminate limits in AQE Optimizer
> -
>
> Key: SPARK-36424
> URL: https://issues.apache.org/jira/browse/SPARK-36424
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Priority: Major
> Fix For: 3.3.0
>
>
> In ad-hoc scenarios, we always add a limit to the query if the user has not 
> specified one, but not every limit is necessary.
> With the power of AQE, we can eliminate such limits using runtime statistics.
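
A toy model of the rule (deliberately simplified, not the actual Catalyst/AQE implementation): once runtime statistics prove a subtree produces no more rows than the limit asks for, the limit node can be dropped.

{code:scala}
// Simplified sketch of limit elimination over a toy plan; the real rule
// operates on Spark's physical plan using AQE runtime statistics.
sealed trait Plan { def rowCount: Option[Long] }
case class Scan(rowCount: Option[Long]) extends Plan
case class Limit(n: Long, child: Plan) extends Plan {
  def rowCount: Option[Long] = child.rowCount.map(math.min(_, n))
}

def eliminateLimits(plan: Plan): Plan = plan match {
  // Runtime stats prove the child already respects the limit: drop it.
  case Limit(n, child) if child.rowCount.exists(_ <= n) => eliminateLimits(child)
  case Limit(n, child)                                  => Limit(n, eliminateLimits(child))
  case leaf                                             => leaf
}

// Example: LIMIT 100 over a scan known (at runtime) to have 42 rows is redundant.
assert(eliminateLimits(Limit(100, Scan(Some(42L)))) == Scan(Some(42L)))
{code}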



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


