[jira] [Assigned] (SPARK-36255) FileNotFound exceptions from the shuffle push can cause the executor to terminate

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36255:


Assignee: (was: Apache Spark)

> FileNotFound exceptions from the shuffle push can cause the executor to 
> terminate
> -
>
> Key: SPARK-36255
> URL: https://issues.apache.org/jira/browse/SPARK-36255
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> When the shuffle files are cleaned up by the executors once a job in a Spark 
> application completes, the push of the shuffle data by the executor can throw 
> a FileNotFoundException. When this exception is thrown from the 
> {{shuffle-block-push-thread}}, it causes the executor to fail, because the 
> default uncaught exception handler for Spark daemon threads terminates the 
> executor whenever a daemon thread throws an uncaught exception.
> {code:java}
> 21/06/17 16:03:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[block-push-thread-1,5,main]
> java.lang.Error: java.io.IOException: Error in opening 
> FileSegmentManagedBuffer
> {file=/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
>  offset=10640, length=190}
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Error in opening 
> FileSegmentManagedBuffer{file=***/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
>  offset=10640, length=190}
> at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:89)
> at 
> org.apache.spark.shuffle.ShuffleWriter.sliceReqBufferIntoBlockBuffers(ShuffleWriter.scala:294)
> at 
> org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$sendRequest(ShuffleWriter.scala:270)
> at 
> org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$pushUpToMax(ShuffleWriter.scala:191)
> at 
> org.apache.spark.shuffle.ShuffleWriter$$anon$2$$anon$4.run(ShuffleWriter.scala:244)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ... 2 more
> Caused by: java.io.FileNotFoundException: 
> **/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data
>  (No such file or directory)
> at java.io.RandomAccessFile.open0(Native Method)
> at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
> at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:62)
> {code}
> We can address the issue by handling {{FileNotFound}} exceptions in the push 
> threads and the Netty threads, stopping the push when one is encountered.
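The proposed handling can be sketched as a minimal, hypothetical illustration (the helper names below are illustrative, not Spark's actual `ShuffleWriter` API): the push loop catches `FileNotFoundException` and stops pushing, so the error never escapes to the daemon thread's default uncaught-exception handler, which would otherwise terminate the executor JVM.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

public class PushThreadSketch {
    // Hypothetical helper: opens the shuffle block file as the push would.
    // Throws FileNotFoundException if the executor has already cleaned it up.
    static void pushBlock(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            // a real push would read the file segment and send it out
        }
    }

    // Returns how many blocks were pushed before stopping. A missing file
    // stops the push quietly instead of propagating an error out of the
    // daemon thread (which would kill the executor).
    static int pushUpToMax(List<String> blockPaths) {
        int pushed = 0;
        for (String path : blockPaths) {
            try {
                pushBlock(path);
                pushed++;
            } catch (FileNotFoundException e) {
                break; // shuffle files cleaned up: stop pushing, don't rethrow
            } catch (IOException e) {
                break; // other I/O failure: also stop this push attempt
            }
        }
        return pushed;
    }

    public static void main(String[] args) throws Exception {
        java.io.File real = java.io.File.createTempFile("shuffle", ".data");
        real.deleteOnExit();
        int pushed = pushUpToMax(List.of(
                real.getAbsolutePath(),
                "/no/such/dir/shuffle_1_690_0.data"));
        System.out.println(pushed); // prints 1: the missing block stops the push
    }
}
```

The key point is that the catch happens inside the push task itself; relying on `Thread.setDefaultUncaughtExceptionHandler` is exactly what causes the termination described above.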



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36257) Updated the version of TimestampNTZ related changes as 3.3.0

2021-07-21 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-36257:
--

 Summary: Updated the version of TimestampNTZ related changes as 
3.3.0
 Key: SPARK-36257
 URL: https://issues.apache.org/jira/browse/SPARK-36257
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


As we decided to release TimestampNTZ type in Spark 3.3, we should update the 
versions of TimestampNTZ related changes as 3.3.0.






[jira] [Commented] (SPARK-36255) FileNotFound exceptions from the shuffle push can cause the executor to terminate

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385280#comment-17385280
 ] 

Apache Spark commented on SPARK-36255:
--

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/33477




[jira] [Assigned] (SPARK-36255) FileNotFound exceptions from the shuffle push can cause the executor to terminate

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36255:


Assignee: Apache Spark




[jira] [Commented] (SPARK-36255) FileNotFound exceptions from the shuffle push can cause the executor to terminate

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385279#comment-17385279
 ] 

Apache Spark commented on SPARK-36255:
--

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/33477




[jira] [Commented] (SPARK-36256) Upgrade lz4-java to 1.8.0

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385278#comment-17385278
 ] 

Apache Spark commented on SPARK-36256:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33476

> Upgrade lz4-java to 1.8.0
> -
>
> Key: SPARK-36256
> URL: https://issues.apache.org/jira/browse/SPARK-36256
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> lz4-java 1.8.0 was released, which includes not only performance improvement 
> but also Darwin aarch64 support.
> https://github.com/lz4/lz4-java/releases/tag/1.8.0
> https://github.com/lz4/lz4-java/blob/1.8.0/CHANGES.md
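For context, a dependency upgrade like this is typically a one-line version bump. In a Maven build, using the coordinates the lz4-java project publishes on Maven Central, it would look like:

```xml
<dependency>
  <groupId>org.lz4</groupId>
  <artifactId>lz4-java</artifactId>
  <version>1.8.0</version>
</dependency>
```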






[jira] [Assigned] (SPARK-36256) Upgrade lz4-java to 1.8.0

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36256:


Assignee: Kousuke Saruta  (was: Apache Spark)




[jira] [Commented] (SPARK-36256) Upgrade lz4-java to 1.8.0

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385277#comment-17385277
 ] 

Apache Spark commented on SPARK-36256:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33476




[jira] [Assigned] (SPARK-36256) Upgrade lz4-java to 1.8.0

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36256:


Assignee: Apache Spark  (was: Kousuke Saruta)




[jira] [Commented] (SPARK-36246) WorkerDecommissionExtendedSuite flakes with GHA

2021-07-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385275#comment-17385275
 ] 

Hyukjin Kwon commented on SPARK-36246:
--

Fixed in https://github.com/apache/spark/pull/33467

> WorkerDecommissionExtendedSuite flakes with GHA
> ---
>
> Key: SPARK-36246
> URL: https://issues.apache.org/jira/browse/SPARK-36246
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>







[jira] [Resolved] (SPARK-36246) WorkerDecommissionExtendedSuite flakes with GHA

2021-07-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36246.
--
Fix Version/s: 3.1.3
   3.2.0
   Resolution: Fixed




[jira] [Created] (SPARK-36256) Upgrade lz4-java to 1.8.0

2021-07-21 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-36256:
--

 Summary: Upgrade lz4-java to 1.8.0
 Key: SPARK-36256
 URL: https://issues.apache.org/jira/browse/SPARK-36256
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.3.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


lz4-java 1.8.0 was released, which includes not only performance improvement 
but also Darwin aarch64 support.
https://github.com/lz4/lz4-java/releases/tag/1.8.0
https://github.com/lz4/lz4-java/blob/1.8.0/CHANGES.md






[jira] [Assigned] (SPARK-36249) Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36249:


Assignee: Apache Spark

> Add remove_categories to CategoricalAccessor and CategoricalIndex
> -
>
> Key: SPARK-36249
> URL: https://issues.apache.org/jira/browse/SPARK-36249
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
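For reference, `remove_categories` already exists on plain pandas' categorical accessor, which the pandas-on-Spark API mirrors; the sketch below uses pandas itself to show the behavior being ported:

```python
import pandas as pd

# pandas' remove_categories, which SPARK-36249 proposes to mirror on
# pyspark.pandas' CategoricalAccessor and CategoricalIndex.
s = pd.Series(["a", "b", "c", "a"], dtype="category")
t = s.cat.remove_categories(["c"])

print(list(t.cat.categories))  # ['a', 'b']
print(int(t.isna().sum()))     # 1: values of the removed category become NaN
```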







[jira] [Assigned] (SPARK-36249) Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36249:


Assignee: (was: Apache Spark)




[jira] [Commented] (SPARK-36249) Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385273#comment-17385273
 ] 

Apache Spark commented on SPARK-36249:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33474




[jira] [Updated] (SPARK-36255) FileNotFound exceptions from the shuffle push can cause the executor to terminate

2021-07-21 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-36255:
--
Summary: FileNotFound exceptions from the shuffle push can cause the 
executor to terminate  (was: FileNotFound exceptions in the Shuffle-push-thread 
can cause the executor to fail)




[jira] [Updated] (SPARK-36255) FileNotFound exceptions in the Shuffle-push-thread can cause the executor to fail

2021-07-21 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-36255:
--
Description: 
When the shuffle files are cleaned up by the executors once a job in a Spark 
application completes, the push of the shuffle data by the executor can throw 
FileNotFound exception. When this exception is thrown from the 
{{shuffle-block-push-thread}}, it causes the executor to fail. This is because 
of the default uncaught exception handler for Spark daemon threads which 
terminates the executor when there are uncaught exceptions for the daemon 
threads.
{code:java}
21/06/17 16:03:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception 
in thread Thread[block-push-thread-1,5,main]
java.lang.Error: java.io.IOException: Error in opening FileSegmentManagedBuffer

{file=/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
 offset=10640, length=190}
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Error in opening 
FileSegmentManagedBuffer{file=***/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
 offset=10640, length=190}

at 
org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:89)
at 
org.apache.spark.shuffle.ShuffleWriter.sliceReqBufferIntoBlockBuffers(ShuffleWriter.scala:294)
at 
org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$sendRequest(ShuffleWriter.scala:270)
at 
org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$pushUpToMax(ShuffleWriter.scala:191)
at 
org.apache.spark.shuffle.ShuffleWriter$$anon$2$$anon$4.run(ShuffleWriter.scala:244)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
... 2 more
Caused by: java.io.FileNotFoundException: 
**/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data
 (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at 
org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:62)
{code}
We can address the issue by handling {{FileNotFound}} exceptions in the push 
threads and the Netty threads, stopping the push when one is encountered.

  was:
When the shuffle files are cleaned up by the executors once a job in a Spark 
application completes, the push of the shuffle data by the executor can throw 
FileNotFound exception. When this exception is thrown from the 
{{shuffle-block-push-thread}}, it causes the executor to fail. This is because 
of the default uncaught exception handler for Spark daemon threads which 
terminates the executor when there are exceptions for the daemon threads.
{code:java}
21/06/17 16:03:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception 
in thread Thread[block-push-thread-1,5,main]
java.lang.Error: java.io.IOException: Error in opening FileSegmentManagedBuffer

{file=/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
 offset=10640, length=190}
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Error in opening 
FileSegmentManagedBuffer{file=***/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
 offset=10640, length=190}

at 
org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:89)
at 
org.apache.spark.shuffle.ShuffleWriter.sliceReqBufferIntoBlockBuffers(ShuffleWriter.scala:294)
at 
org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$sendRequest(ShuffleWriter.scala:270)
at 
org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$pushUpToMax(ShuffleWriter.scala:191)
at 
org.apache.spark.shuffle.ShuffleWriter$$anon$2$$anon$4.run(ShuffleWriter.scala:244)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
... 2 more
Caused by: java.io.FileNotFoundException: 
**/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data
 (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at 
org.apache.spark.network.bu

[jira] [Updated] (SPARK-36255) FileNotFound exceptions in the Shuffle-push-thread can cause the executor to fail

2021-07-21 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-36255:
--
Description: 
When the shuffle files are cleaned up by the executors once a job in a Spark 
application completes, the push of the shuffle data by the executor can throw 
FileNotFound exception. When this exception is thrown from the 
{{shuffle-block-push-thread}}, it causes the executor to fail. This is because 
of the default uncaught exception handler for Spark daemon threads, which 
terminates the executor when a daemon thread throws an uncaught exception.
{code:java}
21/06/17 16:03:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception 
in thread Thread[block-push-thread-1,5,main]
java.lang.Error: java.io.IOException: Error in opening FileSegmentManagedBuffer

{file=/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
 offset=10640, length=190}
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Error in opening 
FileSegmentManagedBuffer{file=***/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
 offset=10640, length=190}

at 
org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:89)
at 
org.apache.spark.shuffle.ShuffleWriter.sliceReqBufferIntoBlockBuffers(ShuffleWriter.scala:294)
at 
org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$sendRequest(ShuffleWriter.scala:270)
at 
org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$pushUpToMax(ShuffleWriter.scala:191)
at 
org.apache.spark.shuffle.ShuffleWriter$$anon$2$$anon$4.run(ShuffleWriter.scala:244)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
... 2 more
Caused by: java.io.FileNotFoundException: 
**/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data
 (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at 
org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:62)
{code}
We can address the issue by handling {{FileNotFoundException}} in the push 
threads and the Netty threads, stopping the push as soon as it is encountered 
instead of letting it propagate as an uncaught exception.

  was:When the shuffle files are cleaned up the executors once a job completes, 
the push of the shuffle data will throw FileNotFound exceptions. This exception 
when thrown from the {{shuffle-block-push-thread}} still causes the executor to 
fail. 


> FileNotFound exceptions in the Shuffle-push-thread can cause the executor to 
> fail
> -
>
> Key: SPARK-36255
> URL: https://issues.apache.org/jira/browse/SPARK-36255
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> When the shuffle files are cleaned up by the executors once a job in a Spark 
> application completes, the push of the shuffle data by the executor can throw 
> FileNotFound exception. When this exception is thrown from the 
> {{shuffle-block-push-thread}}, it causes the executor to fail. This is 
> because of the default uncaught exception handler for Spark daemon threads, 
> which terminates the executor when a daemon thread throws an uncaught 
> exception.
> {code:java}
> 21/06/17 16:03:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[block-push-thread-1,5,main]
> java.lang.Error: java.io.IOException: Error in opening 
> FileSegmentManagedBuffer
> {file=/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
>  offset=10640, length=190}
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Error in opening 
> FileSegmentManagedBuffer{file=***/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data,
>  offset=10640, length=190}
> at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:89)
> at 
> org.apache.spark.shuffle.ShuffleWriter.sliceReqBufferIntoBlockBuffers(ShuffleWriter.scala:294)
> at 
> org.apache.spark.shuffle.ShuffleWriter

[jira] [Resolved] (SPARK-36214) Add add_categories to CategoricalAccessor and CategoricalIndex.

2021-07-21 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36214.
---
Fix Version/s: 3.2.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 33470
https://github.com/apache/spark/pull/33470

> Add add_categories to CategoricalAccessor and CategoricalIndex.
> ---
>
> Key: SPARK-36214
> URL: https://issues.apache.org/jira/browse/SPARK-36214
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36255) FileNotFound exceptions in the Shuffle-push-thread can cause the executor to fail

2021-07-21 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-36255:
--
Parent: SPARK-30602
Issue Type: Sub-task  (was: Bug)

> FileNotFound exceptions in the Shuffle-push-thread can cause the executor to 
> fail
> -
>
> Key: SPARK-36255
> URL: https://issues.apache.org/jira/browse/SPARK-36255
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> When the shuffle files are cleaned up by the executors once a job completes, 
> the push of the shuffle data will throw FileNotFound exceptions. This 
> exception, when thrown from the {{shuffle-block-push-thread}}, still causes 
> the executor to fail.






[jira] [Created] (SPARK-36255) FileNotFound exceptions in the Shuffle-push-thread can cause the executor to fail

2021-07-21 Thread Chandni Singh (Jira)
Chandni Singh created SPARK-36255:
-

 Summary: FileNotFound exceptions in the Shuffle-push-thread can 
cause the executor to fail
 Key: SPARK-36255
 URL: https://issues.apache.org/jira/browse/SPARK-36255
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.1.0
Reporter: Chandni Singh


When the shuffle files are cleaned up by the executors once a job completes, 
the push of the shuffle data will throw FileNotFound exceptions. This 
exception, when thrown from the {{shuffle-block-push-thread}}, still causes 
the executor to fail.






[jira] [Updated] (SPARK-36253) Document added version of pandas-on-Spark support

2021-07-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36253:
-
Fix Version/s: 3.2.0

> Document added version of pandas-on-Spark support
> -
>
> Key: SPARK-36253
> URL: https://issues.apache.org/jira/browse/SPARK-36253
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.2.0
>
>
> We should document the version in which support for the pandas API on Spark was added.






[jira] [Resolved] (SPARK-36253) Document added version of pandas-on-Spark support

2021-07-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36253.
--
Resolution: Fixed

> Document added version of pandas-on-Spark support
> -
>
> Key: SPARK-36253
> URL: https://issues.apache.org/jira/browse/SPARK-36253
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.2.0
>
>
> We should document the version in which support for the pandas API on Spark was added.






[jira] [Commented] (SPARK-36253) Document added version of pandas-on-Spark support

2021-07-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385263#comment-17385263
 ] 

Hyukjin Kwon commented on SPARK-36253:
--

Fixed in https://github.com/apache/spark/pull/33473

> Document added version of pandas-on-Spark support
> -
>
> Key: SPARK-36253
> URL: https://issues.apache.org/jira/browse/SPARK-36253
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> We should document the version in which support for the pandas API on Spark was added.






[jira] [Created] (SPARK-36254) Install mlflow and delta in Github Actions CI

2021-07-21 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-36254:
---

 Summary: Install mlflow and delta in Github Actions CI
 Key: SPARK-36254
 URL: https://issues.apache.org/jira/browse/SPARK-36254
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Haejoon Lee


Since pandas-on-Spark includes mlflow and delta features and related tests, we 
should install mlflow and delta in our GitHub Actions CI so that those tests 
are not skipped from Spark 3.2 onwards.

 

We should also add logic to check the Spark version, so that mlflow and delta 
are installed only for Spark 3.2 and above.






[jira] [Commented] (SPARK-36253) Document added version of pandas-on-Spark support

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385245#comment-17385245
 ] 

Apache Spark commented on SPARK-36253:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/33473

> Document added version of pandas-on-Spark support
> -
>
> Key: SPARK-36253
> URL: https://issues.apache.org/jira/browse/SPARK-36253
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> We should document the version in which support for the pandas API on Spark was added.






[jira] [Assigned] (SPARK-36253) Document added version of pandas-on-Spark support

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36253:


Assignee: Hyukjin Kwon  (was: Apache Spark)

> Document added version of pandas-on-Spark support
> -
>
> Key: SPARK-36253
> URL: https://issues.apache.org/jira/browse/SPARK-36253
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> We should document the version in which support for the pandas API on Spark was added.






[jira] [Assigned] (SPARK-36253) Document added version of pandas-on-Spark support

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36253:


Assignee: Apache Spark  (was: Hyukjin Kwon)

> Document added version of pandas-on-Spark support
> -
>
> Key: SPARK-36253
> URL: https://issues.apache.org/jira/browse/SPARK-36253
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> We should document the version in which support for the pandas API on Spark was added.






[jira] [Commented] (SPARK-36253) Document added version of pandas-on-Spark support

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385243#comment-17385243
 ] 

Apache Spark commented on SPARK-36253:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/33473

> Document added version of pandas-on-Spark support
> -
>
> Key: SPARK-36253
> URL: https://issues.apache.org/jira/browse/SPARK-36253
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> We should document the version in which support for the pandas API on Spark was added.






[jira] [Created] (SPARK-36253) Document added version of pandas-on-Spark support

2021-07-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-36253:


 Summary: Document added version of pandas-on-Spark support
 Key: SPARK-36253
 URL: https://issues.apache.org/jira/browse/SPARK-36253
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Hyukjin Kwon
Assignee: Hyukjin Kwon


We should document the version in which support for the pandas API on Spark was added.






[jira] [Updated] (SPARK-36252) Add log files rolling policy for driver running in cluster mode with spark standalone cluster

2021-07-21 Thread Jack Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Hu updated SPARK-36252:

Description: 
For a long running driver in cluster mode, there is no rolling policy; the 
logs (stdout/stderr) may occupy a lot of space, and the user needs an external 
tool to clean up the old logs, which is not user friendly. 

For executors, the following five configurations are used to control the log 
file rolling policy:
{code:java}
spark.executor.logs.rolling.maxRetainedFiles
spark.executor.logs.rolling.enableCompression
spark.executor.logs.rolling.maxSize
spark.executor.logs.rolling.strategy
spark.executor.logs.rolling.time.interval
{code}

For a driver running in cluster mode, there are two options:
1. reuse the executor settings
2. similarly to the executor, add the following configurations (which only 
apply to the stderr/stdout of a driver in cluster mode)
{code:java}
spark.driver.logs.rolling.maxRetainedFiles
spark.driver.logs.rolling.enableCompression
spark.driver.logs.rolling.maxSize
spark.driver.logs.rolling.strategy
spark.driver.logs.rolling.time.interval
{code}

#2 seems better, do you agree?
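For concreteness, option 2 would mirror the executor-side keys in 
{{spark-defaults.conf}}. The executor keys below already exist in Spark; the 
driver keys are only the names proposed in this ticket (they do not exist 
yet), and the values are illustrative (size-based rolling at 128 MiB, keeping 
five compressed files):

```
# Existing executor-side settings, shown with illustrative values:
spark.executor.logs.rolling.strategy           size
spark.executor.logs.rolling.maxSize            134217728
spark.executor.logs.rolling.maxRetainedFiles   5
spark.executor.logs.rolling.enableCompression  true

# Proposed driver-side equivalents (option 2), not yet part of Spark:
spark.driver.logs.rolling.strategy             size
spark.driver.logs.rolling.maxSize              134217728
spark.driver.logs.rolling.maxRetainedFiles     5
spark.driver.logs.rolling.enableCompression    true
```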

  was:
For a long running driver in cluster mode, there is no rolling policy, the log 
(stdout/stderr) may accupy lots of space, user needs a external tool to clean 
the old logs, it's not friendly. 

For executor, following 5 configurations is used to control the log file 
rolling policy:
{code:java}
spark.executor.logs.rolling.maxRetainedFiles
spark.executor.logs.rolling.enableCompression
spark.executor.logs.rolling.maxSize
spark.executor.logs.rolling.strategy
spark.executor.logs.rolling.time.interval
{code}

For driver running in cluster mode:
1. reuse the executor settings
2. similar to executor: add following configurations (only works for 
stderr/stdout for driver in cluster mode)
{code:java}
spark.driver.logs.rolling.maxRetainedFiles
spark.driver.logs.rolling.enableCompression
spark.driver.logs.rolling.maxSize
spark.driver.logs.rolling.strategy
spark.driver.logs.rolling.time.interval
{code}

#2 seems better, do you agree?


> Add log files rolling policy for driver running in cluster mode with spark 
> standalone cluster
> -
>
> Key: SPARK-36252
> URL: https://issues.apache.org/jira/browse/SPARK-36252
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Jack Hu
>Priority: Major
>
> For a long running driver in cluster mode, there is no rolling policy; the 
> logs (stdout/stderr) may occupy a lot of space, and the user needs an 
> external tool to clean up the old logs, which is not user friendly. 
> For executors, the following five configurations are used to control the log 
> file rolling policy:
> {code:java}
> spark.executor.logs.rolling.maxRetainedFiles
> spark.executor.logs.rolling.enableCompression
> spark.executor.logs.rolling.maxSize
> spark.executor.logs.rolling.strategy
> spark.executor.logs.rolling.time.interval
> {code}
> For a driver running in cluster mode, there are two options:
> 1. reuse the executor settings
> 2. similarly to the executor, add the following configurations (which only 
> apply to the stderr/stdout of a driver in cluster mode)
> {code:java}
> spark.driver.logs.rolling.maxRetainedFiles
> spark.driver.logs.rolling.enableCompression
> spark.driver.logs.rolling.maxSize
> spark.driver.logs.rolling.strategy
> spark.driver.logs.rolling.time.interval
> {code}
> #2 seems better, do you agree?






[jira] [Created] (SPARK-36252) Add log files rolling policy for driver running in cluster mode with spark standalone cluster

2021-07-21 Thread Jack Hu (Jira)
Jack Hu created SPARK-36252:
---

 Summary: Add log files rolling policy for driver running in 
cluster mode with spark standalone cluster
 Key: SPARK-36252
 URL: https://issues.apache.org/jira/browse/SPARK-36252
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: Jack Hu


For a long running driver in cluster mode, there is no rolling policy; the 
logs (stdout/stderr) may occupy a lot of space, and the user needs an external 
tool to clean up the old logs, which is not user friendly. 

For executors, the following five configurations are used to control the log 
file rolling policy:
{code:java}
spark.executor.logs.rolling.maxRetainedFiles
spark.executor.logs.rolling.enableCompression
spark.executor.logs.rolling.maxSize
spark.executor.logs.rolling.strategy
spark.executor.logs.rolling.time.interval
{code}

For a driver running in cluster mode, there are two options:
1. reuse the executor settings
2. similarly to the executor, add the following configurations (which only 
apply to the stderr/stdout of a driver in cluster mode)
{code:java}
spark.driver.logs.rolling.maxRetainedFiles
spark.driver.logs.rolling.enableCompression
spark.driver.logs.rolling.maxSize
spark.driver.logs.rolling.strategy
spark.driver.logs.rolling.time.interval
{code}

#2 seems better, do you agree?






[jira] [Resolved] (SPARK-36063) Optimize OneRowRelation subqueries

2021-07-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36063.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33284
[https://github.com/apache/spark/pull/33284]

> Optimize OneRowRelation subqueries
> --
>
> Key: SPARK-36063
> URL: https://issues.apache.org/jira/browse/SPARK-36063
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Inline subqueries with OneRowRelation as leaf nodes instead of decorrelating 
> and rewriting them as left outer joins.
> Scalar subquery:
>  ```
>  SELECT (SELECT c1) FROM t1 -> SELECT c1 FROM t1
>  ```
> Lateral subquery:
>  ```
>  SELECT * FROM t1, LATERAL (SELECT c1, c2) -> SELECT c1, c2 , c1, c2 FROM t1
>  ```






[jira] [Assigned] (SPARK-36063) Optimize OneRowRelation subqueries

2021-07-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36063:
---

Assignee: Allison Wang

> Optimize OneRowRelation subqueries
> --
>
> Key: SPARK-36063
> URL: https://issues.apache.org/jira/browse/SPARK-36063
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Inline subqueries with OneRowRelation as leaf nodes instead of decorrelating 
> and rewriting them as left outer joins.
> Scalar subquery:
>  ```
>  SELECT (SELECT c1) FROM t1 -> SELECT c1 FROM t1
>  ```
> Lateral subquery:
>  ```
>  SELECT * FROM t1, LATERAL (SELECT c1, c2) -> SELECT c1, c2 , c1, c2 FROM t1
>  ```






[jira] [Resolved] (SPARK-36244) Upgrade zstd-jni to 1.5.0-3 to avoid a bug about buffer size calculation

2021-07-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36244.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33464
[https://github.com/apache/spark/pull/33464]

> Upgrade zstd-jni to 1.5.0-3 to avoid a bug about buffer size calculation
> 
>
> Key: SPARK-36244
> URL: https://issues.apache.org/jira/browse/SPARK-36244
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 3.2.0
>
>
> zstd-jni 1.5.0-3 was released a few days ago.
> This release resolves an issue with buffer size calculation, which can 
> affect usage in Spark.
> https://github.com/luben/zstd-jni/releases/tag/v1.5.0-3






[jira] [Resolved] (SPARK-35912) [SQL] JSON read behavior is different depending on the cache setting when nullable is false.

2021-07-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35912.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33436
[https://github.com/apache/spark/pull/33436]

> [SQL] JSON read behavior is different depending on the cache setting when 
> nullable is false.
> 
>
> Key: SPARK-35912
> URL: https://issues.apache.org/jira/browse/SPARK-35912
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Heedo Lee
>Assignee: Fu Chen
>Priority: Minor
> Fix For: 3.3.0
>
>
> Below is code that reproduces the issue.
>  
> {code:java}
> import org.apache.spark.sql.Encoders
>  
> case class TestSchema(x: Int, y: Int)
> case class BaseSchema(value: TestSchema)
>  
> val schema = Encoders.product[BaseSchema].schema
> val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
> val jsonDS = spark.read.schema(schema).json(testDS)
> jsonDS.show
> +---------+
> |    value|
> +---------+
> |{1, null}|
> |{2, null}|
> +---------+
> jsonDS.cache.show
> +------+
> | value|
> +------+
> |{1, 0}|
> |{2, 0}|
> +------+
> {code}
>  
> The above result occurs when a schema is created with a nested StructType and 
> nullable of StructField is false.
>  






[jira] [Assigned] (SPARK-35912) [SQL] JSON read behavior is different depending on the cache setting when nullable is false.

2021-07-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35912:


Assignee: Fu Chen

> [SQL] JSON read behavior is different depending on the cache setting when 
> nullable is false.
> 
>
> Key: SPARK-35912
> URL: https://issues.apache.org/jira/browse/SPARK-35912
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Heedo Lee
>Assignee: Fu Chen
>Priority: Minor
>
> Below is code that reproduces the issue.
>  
> {code:java}
> import org.apache.spark.sql.Encoders
>  
> case class TestSchema(x: Int, y: Int)
> case class BaseSchema(value: TestSchema)
>  
> val schema = Encoders.product[BaseSchema].schema
> val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
> val jsonDS = spark.read.schema(schema).json(testDS)
> jsonDS.show
> +---------+
> |    value|
> +---------+
> |{1, null}|
> |{2, null}|
> +---------+
> jsonDS.cache.show
> +------+
> | value|
> +------+
> |{1, 0}|
> |{2, 0}|
> +------+
> {code}
>  
> The above result occurs when a schema is created with a nested StructType and 
> nullable of StructField is false.
>  






[jira] [Commented] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385176#comment-17385176
 ] 

Hyukjin Kwon commented on SPARK-32666:
--

Thanks [~shaneknapp]!!!

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.3.0
>
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.






[jira] [Commented] (SPARK-36251) Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385174#comment-17385174
 ] 

Apache Spark commented on SPARK-36251:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/33472

> Cover GitHub Actions runs without SHA in testing script
> ---
>
> Key: SPARK-36251
> URL: https://issues.apache.org/jira/browse/SPARK-36251
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-36204 added the periodic jobs for branch-3.2 too, but the job runs 
> without SHA being set.
> The test script should be able to handle this case.






[jira] [Assigned] (SPARK-36251) Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36251:


Assignee: (was: Apache Spark)

> Cover GitHub Actions runs without SHA in testing script
> ---
>
> Key: SPARK-36251
> URL: https://issues.apache.org/jira/browse/SPARK-36251
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-36204 added the periodic jobs for branch-3.2 too, but the job runs 
> without SHA being set.
> The test script should be able to handle this case.






[jira] [Commented] (SPARK-36251) Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385173#comment-17385173
 ] 

Apache Spark commented on SPARK-36251:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/33472

> Cover GitHub Actions runs without SHA in testing script
> ---
>
> Key: SPARK-36251
> URL: https://issues.apache.org/jira/browse/SPARK-36251
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-36204 added the periodic jobs for branch-3.2 too, but the job runs
> without the SHA being set.
> The test script should be able to handle this case.






[jira] [Assigned] (SPARK-36251) Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36251:


Assignee: Apache Spark

> Cover GitHub Actions runs without SHA in testing script
> ---
>
> Key: SPARK-36251
> URL: https://issues.apache.org/jira/browse/SPARK-36251
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> SPARK-36204 added the periodic jobs for branch-3.2 too, but the job runs
> without the SHA being set.
> The test script should be able to handle this case.






[jira] [Created] (SPARK-36251) Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-36251:


 Summary: Cover GitHub Actions runs without SHA in testing script
 Key: SPARK-36251
 URL: https://issues.apache.org/jira/browse/SPARK-36251
 Project: Spark
  Issue Type: Test
  Components: Project Infra
Affects Versions: 3.2.0
Reporter: Hyukjin Kwon


SPARK-36204 added the periodic jobs for branch-3.2 too, but the job runs
without the SHA being set.

The test script should be able to handle this case.
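A minimal sketch of the kind of fallback the test script could apply, assuming the commit hash arrives via the GITHUB_SHA environment variable; the helper name and fallback behavior are illustrative assumptions, not the actual Spark test script:

```python
import os

def resolve_commit_sha(env=None):
    # Hypothetical helper: return the SHA the workflow provided,
    # or None so callers can skip SHA-dependent steps gracefully.
    if env is None:
        env = os.environ
    sha = env.get("GITHUB_SHA", "").strip()
    return sha or None

print(resolve_commit_sha({"GITHUB_SHA": "abc123"}))  # abc123
print(resolve_commit_sha({}))                        # None
```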






[jira] [Created] (SPARK-36250) Add support for running make-distribution without a "clean"

2021-07-21 Thread Holden Karau (Jira)
Holden Karau created SPARK-36250:


 Summary: Add support for running make-distribution without a 
"clean"
 Key: SPARK-36250
 URL: https://issues.apache.org/jira/browse/SPARK-36250
 Project: Spark
  Issue Type: Improvement
  Components: Build, Kubernetes
Affects Versions: 3.2.0, 3.3.0
Reporter: Holden Karau


Running the K8s integration tests requires building a distribution, but clean
builds are very slow. We could make the BUILD_COMMAND param only set if
unset, or add a --skip-clean flag to our shell script, so folks can test
their K8s-related changes more quickly.
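One way to sketch the idea, assuming a wrapper that assembles the build command; the flag name and command layout here are hypothetical, not the actual make-distribution.sh contents:

```python
def make_distribution_command(args):
    # Hypothetical sketch: drop the expensive `clean` phase
    # when the caller passes --skip-clean.
    cmd = ["./build/mvn"]
    if "--skip-clean" not in args:
        cmd.append("clean")
    cmd += ["package", "-DskipTests"]
    return cmd

print(make_distribution_command([]))                # default: full clean build
print(make_distribution_command(["--skip-clean"]))  # incremental rebuild
```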






[jira] [Assigned] (SPARK-36248) Add rename_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36248:


Assignee: Apache Spark

> Add rename_categories to CategoricalAccessor and CategoricalIndex
> -
>
> Key: SPARK-36248
> URL: https://issues.apache.org/jira/browse/SPARK-36248
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Add rename_categories to CategoricalAccessor and CategoricalIndex to follow 
> pandas.
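For reference, this is the pandas behavior being mirrored, shown with plain pandas since the pandas-on-Spark counterpart is what this ticket proposes:

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")
# rename_categories accepts a dict mapping old category names to new ones
renamed = s.cat.rename_categories({"a": "apple", "b": "banana"})
print(list(renamed))                  # ['apple', 'banana', 'apple']
print(list(renamed.cat.categories))   # ['apple', 'banana']
```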






[jira] [Commented] (SPARK-36248) Add rename_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385161#comment-17385161
 ] 

Apache Spark commented on SPARK-36248:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/33471

> Add rename_categories to CategoricalAccessor and CategoricalIndex
> -
>
> Key: SPARK-36248
> URL: https://issues.apache.org/jira/browse/SPARK-36248
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Add rename_categories to CategoricalAccessor and CategoricalIndex to follow 
> pandas.






[jira] [Assigned] (SPARK-36248) Add rename_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36248:


Assignee: (was: Apache Spark)

> Add rename_categories to CategoricalAccessor and CategoricalIndex
> -
>
> Key: SPARK-36248
> URL: https://issues.apache.org/jira/browse/SPARK-36248
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Add rename_categories to CategoricalAccessor and CategoricalIndex to follow 
> pandas.






[jira] [Resolved] (SPARK-33242) Install numpydoc in Jenkins machines

2021-07-21 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp resolved SPARK-33242.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33469
[https://github.com/apache/spark/pull/33469]

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.3.0
>
>
> To switch from reST style to numpydoc style, we should install numpydoc as
> well. It is used by Sphinx. See the parent JIRA as well.






[jira] [Resolved] (SPARK-32391) Install pydata_sphinx_theme in Jenkins machines

2021-07-21 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp resolved SPARK-32391.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33469
[https://github.com/apache/spark/pull/33469]

> Install pydata_sphinx_theme in Jenkins machines
> ---
>
> Key: SPARK-32391
> URL: https://issues.apache.org/jira/browse/SPARK-32391
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.3.0
>
>
> After SPARK-32179, {{pydata_sphinx_theme}}
> (https://pypi.org/project/pydata-sphinx-theme/) is needed as a new Python
> dependency for the PySpark documentation build.
> We should install it in Jenkins to test the PySpark documentation build with
> Python 3.
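Once the package is installed, selecting it is a one-line Sphinx configuration change; the file path below is an assumption about the docs layout:

```python
# docs/source/conf.py (Sphinx configuration)
# Select pydata-sphinx-theme for the HTML output.
html_theme = "pydata_sphinx_theme"
```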






[jira] [Resolved] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp resolved SPARK-32666.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33469
[https://github.com/apache/spark/pull/33469]

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.3.0
>
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.






[jira] [Resolved] (SPARK-32797) Install mypy on the Jenkins CI workers

2021-07-21 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp resolved SPARK-32797.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33469
[https://github.com/apache/spark/pull/33469]

> Install mypy on the Jenkins CI workers
> --
>
> Key: SPARK-32797
> URL: https://issues.apache.org/jira/browse/SPARK-32797
> Project: Spark
>  Issue Type: Improvement
>  Components: jenkins, PySpark
>Affects Versions: 3.1.0
>Reporter: Fokko Driesprong
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.3.0
>
>
> We want to check the types of the PySpark code. This requires mypy to be 
> installed on the CI. Can you do this [~shaneknapp]? 
> Related PR: [https://github.com/apache/spark/pull/29180]
> You can install this using pip: [https://pypi.org/project/mypy/] Should be 
> similar to flake8 and sphinx. The latest version is ok! Thanks!






[jira] [Commented] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385132#comment-17385132
 ] 

Apache Spark commented on SPARK-32666:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.






[jira] [Commented] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385133#comment-17385133
 ] 

Apache Spark commented on SPARK-32666:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.






[jira] [Assigned] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32666:


Assignee: Shane Knapp  (was: Apache Spark)

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.






[jira] [Commented] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385131#comment-17385131
 ] 

Apache Spark commented on SPARK-32666:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.






[jira] [Commented] (SPARK-33242) Install numpydoc in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385130#comment-17385130
 ] 

Apache Spark commented on SPARK-33242:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> To switch from reST style to numpydoc style, we should install numpydoc as
> well. It is used by Sphinx. See the parent JIRA as well.






[jira] [Assigned] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32666:


Assignee: Shane Knapp  (was: Apache Spark)

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.






[jira] [Commented] (SPARK-33242) Install numpydoc in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385129#comment-17385129
 ] 

Apache Spark commented on SPARK-33242:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> To switch from reST style to numpydoc style, we should install numpydoc as
> well. It is used by Sphinx. See the parent JIRA as well.






[jira] [Assigned] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32666:


Assignee: Apache Spark  (was: Shane Knapp)

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.






[jira] [Assigned] (SPARK-33242) Install numpydoc in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33242:


Assignee: Shane Knapp  (was: Apache Spark)

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> To switch from reST style to numpydoc style, we should install numpydoc as
> well. It is used by Sphinx. See the parent JIRA as well.






[jira] [Commented] (SPARK-33242) Install numpydoc in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385128#comment-17385128
 ] 

Apache Spark commented on SPARK-33242:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> To switch from reST style to numpydoc style, we should install numpydoc as
> well. It is used by Sphinx. See the parent JIRA as well.






[jira] [Assigned] (SPARK-33242) Install numpydoc in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33242:


Assignee: Apache Spark  (was: Shane Knapp)

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> To switch from reST style to numpydoc style, we should install numpydoc as
> well. It is used by Sphinx. See the parent JIRA as well.






[jira] [Commented] (SPARK-32391) Install pydata_sphinx_theme in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385127#comment-17385127
 ] 

Apache Spark commented on SPARK-32391:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install pydata_sphinx_theme in Jenkins machines
> ---
>
> Key: SPARK-32391
> URL: https://issues.apache.org/jira/browse/SPARK-32391
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> After SPARK-32179, {{pydata_sphinx_theme}}
> (https://pypi.org/project/pydata-sphinx-theme/) is needed as a new Python
> dependency for the PySpark documentation build.
> We should install it in Jenkins to test the PySpark documentation build with
> Python 3.






[jira] [Assigned] (SPARK-32391) Install pydata_sphinx_theme in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32391:


Assignee: Apache Spark  (was: Shane Knapp)

> Install pydata_sphinx_theme in Jenkins machines
> ---
>
> Key: SPARK-32391
> URL: https://issues.apache.org/jira/browse/SPARK-32391
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> After SPARK-32179, {{pydata_sphinx_theme}}
> (https://pypi.org/project/pydata-sphinx-theme/) is needed as a new Python
> dependency for the PySpark documentation build.
> We should install it in Jenkins to test the PySpark documentation build with
> Python 3.






[jira] [Assigned] (SPARK-32391) Install pydata_sphinx_theme in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32391:


Assignee: Shane Knapp  (was: Apache Spark)

> Install pydata_sphinx_theme in Jenkins machines
> ---
>
> Key: SPARK-32391
> URL: https://issues.apache.org/jira/browse/SPARK-32391
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> After SPARK-32179, {{pydata_sphinx_theme}}
> (https://pypi.org/project/pydata-sphinx-theme/) is needed as a new Python
> dependency for the PySpark documentation build.
> We should install it in Jenkins to test the PySpark documentation build with
> Python 3.






[jira] [Commented] (SPARK-32797) Install mypy on the Jenkins CI workers

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385124#comment-17385124
 ] 

Apache Spark commented on SPARK-32797:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install mypy on the Jenkins CI workers
> --
>
> Key: SPARK-32797
> URL: https://issues.apache.org/jira/browse/SPARK-32797
> Project: Spark
>  Issue Type: Improvement
>  Components: jenkins, PySpark
>Affects Versions: 3.1.0
>Reporter: Fokko Driesprong
>Assignee: Shane Knapp
>Priority: Major
>
> We want to check the types of the PySpark code. This requires mypy to be 
> installed on the CI. Can you do this [~shaneknapp]? 
> Related PR: [https://github.com/apache/spark/pull/29180]
> You can install this using pip: [https://pypi.org/project/mypy/] Should be 
> similar to flake8 and sphinx. The latest version is ok! Thanks!






[jira] [Commented] (SPARK-32391) Install pydata_sphinx_theme in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385125#comment-17385125
 ] 

Apache Spark commented on SPARK-32391:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install pydata_sphinx_theme in Jenkins machines
> ---
>
> Key: SPARK-32391
> URL: https://issues.apache.org/jira/browse/SPARK-32391
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> After SPARK-32179, {{pydata_sphinx_theme}}
> (https://pypi.org/project/pydata-sphinx-theme/) is needed as a new Python
> dependency for the PySpark documentation build.
> We should install it in Jenkins to test the PySpark documentation build with
> Python 3.






[jira] [Commented] (SPARK-32391) Install pydata_sphinx_theme in Jenkins machines

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385126#comment-17385126
 ] 

Apache Spark commented on SPARK-32391:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install pydata_sphinx_theme in Jenkins machines
> ---
>
> Key: SPARK-32391
> URL: https://issues.apache.org/jira/browse/SPARK-32391
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> After SPARK-32179, {{pydata_sphinx_theme}}
> (https://pypi.org/project/pydata-sphinx-theme/) is needed as a new Python
> dependency for the PySpark documentation build.
> We should install it in Jenkins to test the PySpark documentation build with
> Python 3.






[jira] [Assigned] (SPARK-32797) Install mypy on the Jenkins CI workers

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32797:


Assignee: Apache Spark  (was: Shane Knapp)

> Install mypy on the Jenkins CI workers
> --
>
> Key: SPARK-32797
> URL: https://issues.apache.org/jira/browse/SPARK-32797
> Project: Spark
>  Issue Type: Improvement
>  Components: jenkins, PySpark
>Affects Versions: 3.1.0
>Reporter: Fokko Driesprong
>Assignee: Apache Spark
>Priority: Major
>
> We want to check the types of the PySpark code. This requires mypy to be 
> installed on the CI. Can you do this [~shaneknapp]? 
> Related PR: [https://github.com/apache/spark/pull/29180]
> You can install this using pip: [https://pypi.org/project/mypy/] Should be 
> similar to flake8 and sphinx. The latest version is ok! Thanks!






[jira] [Assigned] (SPARK-32797) Install mypy on the Jenkins CI workers

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32797:


Assignee: Shane Knapp  (was: Apache Spark)

> Install mypy on the Jenkins CI workers
> --
>
> Key: SPARK-32797
> URL: https://issues.apache.org/jira/browse/SPARK-32797
> Project: Spark
>  Issue Type: Improvement
>  Components: jenkins, PySpark
>Affects Versions: 3.1.0
>Reporter: Fokko Driesprong
>Assignee: Shane Knapp
>Priority: Major
>
> We want to check the types of the PySpark code. This requires mypy to be 
> installed on the CI. Can you do this [~shaneknapp]? 
> Related PR: [https://github.com/apache/spark/pull/29180]
> You can install this using pip: [https://pypi.org/project/mypy/] Should be 
> similar to flake8 and sphinx. The latest version is ok! Thanks!






[jira] [Commented] (SPARK-32797) Install mypy on the Jenkins CI workers

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385123#comment-17385123
 ] 

Apache Spark commented on SPARK-32797:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/33469

> Install mypy on the Jenkins CI workers
> --
>
> Key: SPARK-32797
> URL: https://issues.apache.org/jira/browse/SPARK-32797
> Project: Spark
>  Issue Type: Improvement
>  Components: jenkins, PySpark
>Affects Versions: 3.1.0
>Reporter: Fokko Driesprong
>Assignee: Shane Knapp
>Priority: Major
>
> We want to check the types of the PySpark code. This requires mypy to be 
> installed on the CI. Can you do this [~shaneknapp]? 
> Related PR: [https://github.com/apache/spark/pull/29180]
> You can install this using pip: [https://pypi.org/project/mypy/] Should be 
> similar to flake8 and sphinx. The latest version is ok! Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31162) Provide Configuration Parameter to select/enforce the Hive Hash for Bucketing

2021-07-21 Thread Ashish Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385071#comment-17385071
 ] 

Ashish Singh commented on SPARK-31162:
--

This is needed for reasons beyond supporting Hive bucketed writes. For 
example, it is also needed to ensure that custom partitioners from Hive (using 
a Hive UDF) can partition data the same way Hive does.
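
For context, Hive's bucket hash for strings follows Java's String.hashCode algorithm, whereas Spark defaults to Murmur3, which is why a configuration to select the Hive hash matters for interoperability. A rough Python sketch of the string case (illustrative only; the function names are ours, not Spark's or Hive's API):

```python
def hive_string_hash(s: str) -> int:
    # Java String.hashCode semantics: h = 31*h + char, over 32-bit signed ints.
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit result as a signed int, as Java would.
    return h - 0x100000000 if h >= 0x80000000 else h

def hive_bucket(s: str, num_buckets: int) -> int:
    # Hive masks with Integer.MAX_VALUE before the modulo so buckets are non-negative.
    return (hive_string_hash(s) & 0x7FFFFFFF) % num_buckets

print(hive_string_hash("abc"))  # 96354, same value Java's "abc".hashCode() yields
print(hive_bucket("abc", 8))    # 2
```

A row hashed this way lands in a different bucket than one hashed with Murmur3, so mixing the two schemes on the same table breaks bucket pruning.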

Assigning it to myself, but let me know if you are working on this already 
[~maropu].

> Provide Configuration Parameter to select/enforce the Hive Hash for Bucketing
> -
>
> Key: SPARK-31162
> URL: https://issues.apache.org/jira/browse/SPARK-31162
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: Felix Kizhakkel Jose
>Priority: Major
>
> I couldn't find a configuration parameter to choose Hive hashing instead of 
> Spark's default Murmur hash when performing a Spark bucketBy operation. 
> Following the discussion with [~maropu] and [~hyukjin.kwon], it was suggested 
> to open a new JIRA. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36249) Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385069#comment-17385069
 ] 

Takuya Ueshin commented on SPARK-36249:
---

I'm working on this.

> Add remove_categories to CategoricalAccessor and CategoricalIndex
> -
>
> Key: SPARK-36249
> URL: https://issues.apache.org/jira/browse/SPARK-36249
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36249) Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-36249:
-

 Summary: Add remove_categories to CategoricalAccessor and 
CategoricalIndex
 Key: SPARK-36249
 URL: https://issues.apache.org/jira/browse/SPARK-36249
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36214) Add add_categories to CategoricalAccessor and CategoricalIndex.

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385067#comment-17385067
 ] 

Apache Spark commented on SPARK-36214:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33470

> Add add_categories to CategoricalAccessor and CategoricalIndex.
> ---
>
> Key: SPARK-36214
> URL: https://issues.apache.org/jira/browse/SPARK-36214
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36214) Add add_categories to CategoricalAccessor and CategoricalIndex.

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36214:


Assignee: (was: Apache Spark)

> Add add_categories to CategoricalAccessor and CategoricalIndex.
> ---
>
> Key: SPARK-36214
> URL: https://issues.apache.org/jira/browse/SPARK-36214
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36214) Add add_categories to CategoricalAccessor and CategoricalIndex.

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36214:


Assignee: Apache Spark

> Add add_categories to CategoricalAccessor and CategoricalIndex.
> ---
>
> Key: SPARK-36214
> URL: https://issues.apache.org/jira/browse/SPARK-36214
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35546) Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better way

2021-07-21 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-35546:
---

Assignee: Ye Zhou

> Enable push-based shuffle when multiple app attempts are enabled and manage 
> concurrent access to the state in a better way 
> ---
>
> Key: SPARK-35546
> URL: https://issues.apache.org/jira/browse/SPARK-35546
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Ye Zhou
>Assignee: Ye Zhou
>Priority: Major
> Fix For: 3.2.0
>
>
> In the current implementation of RemoteBlockPushResolver, two 
> ConcurrentHashMaps are used to store (1) applicationId -> 
> mergedShuffleLocalDirPath and (2) applicationId+attemptId+shuffleId -> 
> mergedShufflePartitionInfo. Since the four message types (ExecutorRegister, 
> PushBlocks, FinalizeShuffleMerge, and ApplicationRemove) each trigger 
> different operations on these two hashmaps, the information stored in them 
> must be kept strongly consistent. Otherwise there will be either data 
> corruption/correctness issues or a memory leak in the shuffle server. 
> We should come up with a systematic way to resolve this, rather than spot 
> fixing the potential issues.
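
To illustrate the consistency concern, here is a minimal Python sketch (not the actual Java RemoteBlockPushResolver; the class and method names are ours): operations that touch both maps are performed under one lock, so a racing ApplicationRemove can never leave a dangling partition entry behind.

```python
import threading

class MergedShuffleState:
    """Illustrative sketch of the two maps described above."""
    def __init__(self):
        self._lock = threading.Lock()  # one lock guarding both maps together
        self.app_dirs = {}             # applicationId -> merged shuffle local dir path
        self.partitions = {}           # (appId, attemptId, shuffleId) -> partition info

    def register_executor(self, app_id, local_dir):
        with self._lock:
            self.app_dirs[app_id] = local_dir

    def push_block(self, app_id, attempt_id, shuffle_id, info):
        with self._lock:
            # Refuse pushes for apps already removed, so no partition entry can
            # be re-created after cleanup (one source of the leak/corruption).
            if app_id not in self.app_dirs:
                raise KeyError(f"application {app_id} not registered")
            self.partitions[(app_id, attempt_id, shuffle_id)] = info

    def remove_application(self, app_id):
        with self._lock:
            # Remove from both maps atomically so readers never observe a
            # partition entry whose local dir is already gone.
            self.app_dirs.pop(app_id, None)
            for key in [k for k in self.partitions if k[0] == app_id]:
                del self.partitions[key]
```

With independent ConcurrentHashMaps and no such coordination, each individual map operation is atomic but the compound register/push/finalize/remove sequences are not, which is the systematic gap the issue describes.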



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36248) Add rename_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385063#comment-17385063
 ] 

Xinrong Meng commented on SPARK-36248:
--

I'm working on this.

> Add rename_categories to CategoricalAccessor and CategoricalIndex
> -
>
> Key: SPARK-36248
> URL: https://issues.apache.org/jira/browse/SPARK-36248
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Add rename_categories to CategoricalAccessor and CategoricalIndex to follow 
> pandas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36248) Add rename_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-36248:


 Summary: Add rename_categories to CategoricalAccessor and 
CategoricalIndex
 Key: SPARK-36248
 URL: https://issues.apache.org/jira/browse/SPARK-36248
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng


Add rename_categories to CategoricalAccessor and CategoricalIndex to follow 
pandas.
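
For reference, the pandas behavior being matched (a small illustrative snippet using the standard pandas API, not pyspark.pandas code):

```python
import pandas as pd

pser = pd.Series(["a", "b", "a"], dtype="category")
# rename_categories accepts a dict mapping old category names to new ones
renamed = pser.cat.rename_categories({"a": "x"})
print(list(renamed.cat.categories))  # ['x', 'b']
print(renamed.tolist())              # ['x', 'b', 'x']
```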



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36188) Add categories setter to CategoricalAccessor and CategoricalIndex.

2021-07-21 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36188.
---
Fix Version/s: 3.2.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 33448
https://github.com/apache/spark/pull/33448

> Add categories setter to CategoricalAccessor and CategoricalIndex.
> --
>
> Key: SPARK-36188
> URL: https://issues.apache.org/jira/browse/SPARK-36188
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36247) check string length for char/varchar in UPDATE/MERGE command

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36247:


Assignee: (was: Apache Spark)

> check string length for char/varchar in UPDATE/MERGE command
> 
>
> Key: SPARK-36247
> URL: https://issues.apache.org/jira/browse/SPARK-36247
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36247) check string length for char/varchar in UPDATE/MERGE command

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36247:


Assignee: Apache Spark

> check string length for char/varchar in UPDATE/MERGE command
> 
>
> Key: SPARK-36247
> URL: https://issues.apache.org/jira/browse/SPARK-36247
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36247) check string length for char/varchar in UPDATE/MERGE command

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385045#comment-17385045
 ] 

Apache Spark commented on SPARK-36247:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33468

> check string length for char/varchar in UPDATE/MERGE command
> 
>
> Key: SPARK-36247
> URL: https://issues.apache.org/jira/browse/SPARK-36247
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36142) Adjust exponentiation between Series with missing values and bool literal to follow pandas

2021-07-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-36142:
-
Summary: Adjust exponentiation between Series with missing values and bool 
literal to follow pandas  (was: Adjust exponentiation between ExtentionDtypes 
and bools to follow pandas)

> Adjust exponentiation between Series with missing values and bool literal to 
> follow pandas
> --
>
> Key: SPARK-36142
> URL: https://issues.apache.org/jira/browse/SPARK-36142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Currently, exponentiation between ExtensionDtypes and bools is not consistent 
> with pandas' behavior.
>  
> {code:python}
> >>> pser = pd.Series([1, 2, np.nan], dtype=float)
> >>> psser = ps.from_pandas(pser)
> >>> pser ** False
> 0    1.0
> 1    1.0
> 2    1.0
> dtype: float64
> >>> psser ** False
> 0    1.0
> 1    1.0
> 2    NaN
> dtype: float64
> {code}
> We ought to adjust that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32391) Install pydata_sphinx_theme in Jenkins machines

2021-07-21 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385044#comment-17385044
 ] 

Shane Knapp commented on SPARK-32391:
-

anyways, i installed this via conda and will roll out to all workers later this 
week.  :)

> Install pydata_sphinx_theme in Jenkins machines
> ---
>
> Key: SPARK-32391
> URL: https://issues.apache.org/jira/browse/SPARK-32391
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> After SPARK-32179, {{pydata_sphinx_theme}} 
> https://pypi.org/project/pydata-sphinx-theme/ is needed as a new Python 
> dependency for PySpark documentation build.
> We should install it in Jenkins to test PySpark documentation build in Python 
> 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36247) check string length for char/varchar in UPDATE/MERGE command

2021-07-21 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-36247:
---

 Summary: check string length for char/varchar in UPDATE/MERGE 
command
 Key: SPARK-36247
 URL: https://issues.apache.org/jira/browse/SPARK-36247
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34930) Install PyArrow and pandas on Jenkins

2021-07-21 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385040#comment-17385040
 ] 

Shane Knapp commented on SPARK-34930:
-

oh yeah, a LOT of those skipped tests are for pypy3, not python3.6

> Install PyArrow and pandas on Jenkins
> -
>
> Key: SPARK-34930
> URL: https://issues.apache.org/jira/browse/SPARK-34930
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Critical
>
> Looks like the Jenkins machines don't have pandas and PyArrow (ever since they 
> got upgraded?), which results in skipping the related tests in PySpark; see also 
> https://github.com/apache/spark/pull/31470#issuecomment-811618571
> It would be great if we could install both in Python 3.6 on Jenkins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32797) Install mypy on the Jenkins CI workers

2021-07-21 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385038#comment-17385038
 ] 

Shane Knapp commented on SPARK-32797:
-

ill roll this out (and other python package updates) later today/this week.

> Install mypy on the Jenkins CI workers
> --
>
> Key: SPARK-32797
> URL: https://issues.apache.org/jira/browse/SPARK-32797
> Project: Spark
>  Issue Type: Improvement
>  Components: jenkins, PySpark
>Affects Versions: 3.1.0
>Reporter: Fokko Driesprong
>Assignee: Shane Knapp
>Priority: Major
>
> We want to check the types of the PySpark code. This requires mypy to be 
> installed on the CI. Can you do this [~shaneknapp]? 
> Related PR: [https://github.com/apache/spark/pull/29180]
> You can install this using pip: [https://pypi.org/project/mypy/] Should be 
> similar to flake8 and sphinx. The latest version is ok! Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36246) WorkerDecommissionExtendedSuite flakes with GHA

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36246:


Assignee: Holden Karau  (was: Apache Spark)

> WorkerDecommissionExtendedSuite flakes with GHA
> ---
>
> Key: SPARK-36246
> URL: https://issues.apache.org/jira/browse/SPARK-36246
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36246) WorkerDecommissionExtendedSuite flakes with GHA

2021-07-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385035#comment-17385035
 ] 

Apache Spark commented on SPARK-36246:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/33467

> WorkerDecommissionExtendedSuite flakes with GHA
> ---
>
> Key: SPARK-36246
> URL: https://issues.apache.org/jira/browse/SPARK-36246
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36246) WorkerDecommissionExtendedSuite flakes with GHA

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36246:


Assignee: Apache Spark  (was: Holden Karau)

> WorkerDecommissionExtendedSuite flakes with GHA
> ---
>
> Key: SPARK-36246
> URL: https://issues.apache.org/jira/browse/SPARK-36246
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29183) Upgrade JDK 11 Installation to 11.0.6

2021-07-21 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp resolved SPARK-29183.
-
Resolution: Fixed

this is done and all java11 installs are at 11.0.10

> Upgrade JDK 11 Installation to 11.0.6
> -
>
> Key: SPARK-29183
> URL: https://issues.apache.org/jira/browse/SPARK-29183
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> Every JDK 11.0.x release has many fixes, including performance regression 
> fixes. We had better upgrade to the latest, 11.0.4.
> - https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8221760



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34930) Install PyArrow and pandas on Jenkins

2021-07-21 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385030#comment-17385030
 ] 

Shane Knapp commented on SPARK-34930:
-

pandas is installed, so i'm a little curious as to why the tests aren't running:
{noformat}
jenkins@research-jenkins-worker-01:~$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> pandas.__version__
'0.24.2'
>>>{noformat}
pyarrow is a much more complex install than just adding the package, and 
requires manual compilation.  i'll revisit pyarrow in the next couple of 
weeks...

> Install PyArrow and pandas on Jenkins
> -
>
> Key: SPARK-34930
> URL: https://issues.apache.org/jira/browse/SPARK-34930
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Critical
>
> Looks like the Jenkins machines don't have pandas and PyArrow (ever since they 
> got upgraded?), which results in skipping the related tests in PySpark; see also 
> https://github.com/apache/spark/pull/31470#issuecomment-811618571
> It would be great if we could install both in Python 3.6 on Jenkins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32391) Install pydata_sphinx_theme in Jenkins machines

2021-07-21 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385026#comment-17385026
 ] 

Shane Knapp commented on SPARK-32391:
-

[~hyukjin.kwon] i am able to install this via conda...  any particular reason 
why you're requesting this through pip?

 

pydata-sphinx-theme-0.6.3 | pyhd8ed1ab_0 1.3 MB conda-forge

> Install pydata_sphinx_theme in Jenkins machines
> ---
>
> Key: SPARK-32391
> URL: https://issues.apache.org/jira/browse/SPARK-32391
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> After SPARK-32179, {{pydata_sphinx_theme}} 
> https://pypi.org/project/pydata-sphinx-theme/ is needed as a new Python 
> dependency for PySpark documentation build.
> We should install it in Jenkins to test PySpark documentation build in Python 
> 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32666) Install ipython and nbsphinx in Jenkins for Binder integration

2021-07-21 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385021#comment-17385021
 ] 

Shane Knapp commented on SPARK-32666:
-

ill roll this out (and other python package updates) later today/this week.

> Install ipython and nbsphinx in Jenkins for Binder integration
> --
>
> Key: SPARK-32666
> URL: https://issues.apache.org/jira/browse/SPARK-32666
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> Binder integration requires IPython and nbsphinx to use the notebook file as 
> the documentation in PySpark.
> See SPARK-32204 and its PR for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33242) Install numpydoc in Jenkins machines

2021-07-21 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385022#comment-17385022
 ] 

Shane Knapp commented on SPARK-33242:
-

ill roll this out (and other python package updates) later today/this week.

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> To switch from reST style to numpydoc style, we should install numpydoc as 
> well. It is used by Sphinx. See the parent JIRA as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36246) WorkerDecommissionExtendedSuite flakes with GHA

2021-07-21 Thread Holden Karau (Jira)
Holden Karau created SPARK-36246:


 Summary: WorkerDecommissionExtendedSuite flakes with GHA
 Key: SPARK-36246
 URL: https://issues.apache.org/jira/browse/SPARK-36246
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Tests
Affects Versions: 3.3.0
Reporter: Holden Karau
Assignee: Holden Karau






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36143) Adjust astype of Series with missing values to follow pandas

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36143:


Assignee: (was: Apache Spark)

> Adjust astype of Series with missing values to follow pandas
> 
>
> Key: SPARK-36143
> URL: https://issues.apache.org/jira/browse/SPARK-36143
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> {code:python}
> >>> pser = pd.Series([1, 2, np.nan], dtype=float)
> >>> psser = ps.from_pandas(pser)
> >>> pser.astype(int)
> ...
> ValueError: Cannot convert non-finite values (NA or inf) to integer
> >>> psser.astype(int)
> 0    1.0
> 1    2.0
> 2    NaN
> dtype: float64
> {code}
> As shown above, astype of a Series with missing values doesn't behave the same 
> as in pandas; we ought to adjust that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36143) Adjust astype of Series with missing values to follow pandas

2021-07-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36143:


Assignee: Apache Spark

> Adjust astype of Series with missing values to follow pandas
> 
>
> Key: SPARK-36143
> URL: https://issues.apache.org/jira/browse/SPARK-36143
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> {code:python}
> >>> pser = pd.Series([1, 2, np.nan], dtype=float)
> >>> psser = ps.from_pandas(pser)
> >>> pser.astype(int)
> ...
> ValueError: Cannot convert non-finite values (NA or inf) to integer
> >>> psser.astype(int)
> 0    1.0
> 1    2.0
> 2    NaN
> dtype: float64
> {code}
> As shown above, astype of a Series with missing values doesn't behave the same 
> as in pandas; we ought to adjust that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


