[jira] [Resolved] (SPARK-35683) Fix Index.difference to avoid collect 'other' to driver side

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35683. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32853

[jira] [Updated] (SPARK-35696) Refine the code examples in the pandas API on Spark documentation.

2021-06-14 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35696: Summary: Refine the code examples in the pandas API on Spark documentation. (was: Refine the

[jira] [Commented] (SPARK-35696) Refine the code examples in the pandas APIs on Spark documentation.

2021-06-14 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363377#comment-17363377 ] Haejoon Lee commented on SPARK-35696: - I'm working on this > Refine the code examples in the pandas

[jira] [Commented] (SPARK-35064) Group exception messages in spark/sql (catalyst)

2021-06-14 Thread dgd_contributor (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363375#comment-17363375 ] dgd_contributor commented on SPARK-35064: - I would like to work on this > Group exception

[jira] [Updated] (SPARK-35762) Errors while using spark-sql read hive 3.1 orc table

2021-06-14 Thread laokong (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] laokong updated SPARK-35762: Description: === Steps to reproduce the problem: 1. Create ORC table in Hive ``` hive> drop table demo; OK

[jira] [Updated] (SPARK-35762) Errors while using spark-sql read hive 3.1 orc table

2021-06-14 Thread laokong (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] laokong updated SPARK-35762: Attachment: full-stack-trace.log > Errors while using spark-sql read hive 3.1 orc table >

[jira] [Updated] (SPARK-35762) Errors while using spark-sql read hive 3.1 orc table

2021-06-14 Thread laokong (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] laokong updated SPARK-35762: Description: problem *** === Steps to reproduce the problem: 1. Create ORC table in Hive ``` hive> drop table

[jira] [Created] (SPARK-35762) Errors while using spark-sql read hive 3.1 orc table

2021-06-14 Thread laokong (Jira)
laokong created SPARK-35762: --- Summary: Errors while using spark-sql read hive 3.1 orc table Key: SPARK-35762 URL: https://issues.apache.org/jira/browse/SPARK-35762 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-35622) DataFrame's count function do not need groupBy and avoid shuffle

2021-06-14 Thread dgd_contributor (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363336#comment-17363336 ] dgd_contributor commented on SPARK-35622: - Ran a benchmark on my computer, df.rdd.count()

[jira] [Resolved] (SPARK-35678) add a common softmax function

2021-06-14 Thread zhengruifeng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-35678. -- Resolution: Resolved > add a common softmax function > - > >

[jira] [Commented] (SPARK-35426) When addMergerLocation exceed the maxRetainedMergerLocations , we should remove the merger based on merged shuffle data size.

2021-06-14 Thread Qi Zhu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363318#comment-17363318 ] Qi Zhu commented on SPARK-35426: Thanks [~mshen] for clarifying; I will check the corresponding code. >

[jira] [Assigned] (SPARK-35761) Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings.

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35761: Assignee: Takuya Ueshin > Use type-annotation based pandas_udf or avoid specifying udf

[jira] [Resolved] (SPARK-35761) Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings.

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35761. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32913

[jira] [Commented] (SPARK-35758) Spark Core doesn't build when selecting -Dhadoop.version=2.x from Spark 3.1.1

2021-06-14 Thread Kousuke Saruta (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363307#comment-17363307 ] Kousuke Saruta commented on SPARK-35758: [~ferranjr] Could you build with -Phadoop-2.7 ? I can

[jira] [Assigned] (SPARK-35750) Rename "pandas APIs on Spark" to "pandas API on Spark" in the documents

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35750: Assignee: Haejoon Lee > Rename "pandas APIs on Spark" to "pandas API on Spark" in the

[jira] [Resolved] (SPARK-35750) Rename "pandas APIs on Spark" to "pandas API on Spark" in the documents

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35750. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32903

[jira] [Assigned] (SPARK-35759) Remove the upperbound for numpy for pandas-on-Spark

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35759: Assignee: Takuya Ueshin > Remove the upperbound for numpy for pandas-on-Spark >

[jira] [Resolved] (SPARK-35759) Remove the upperbound for numpy for pandas-on-Spark

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35759. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32908

[jira] [Resolved] (SPARK-35755) Use higher PyArrow in GitHub Actions build

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35755. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32906

[jira] [Resolved] (SPARK-35616) Make astype data-type-based

2021-06-14 Thread Takuya Ueshin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-35616. --- Fix Version/s: 3.2.0 Assignee: Xinrong Meng Resolution: Fixed Issue

[jira] [Commented] (SPARK-35761) Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings.

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363288#comment-17363288 ] Apache Spark commented on SPARK-35761: -- User 'ueshin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-35761) Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings.

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35761: Assignee: Apache Spark > Use type-annotation based pandas_udf or avoid specifying udf

[jira] [Commented] (SPARK-35761) Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings.

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363287#comment-17363287 ] Apache Spark commented on SPARK-35761: -- User 'ueshin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-35761) Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings.

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35761: Assignee: (was: Apache Spark) > Use type-annotation based pandas_udf or avoid

[jira] [Created] (SPARK-35761) Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings.

2021-06-14 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35761: - Summary: Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings. Key: SPARK-35761 URL: https://issues.apache.org/jira/browse/SPARK-35761

[jira] [Commented] (SPARK-35429) Remove commons-httpclient due to EOL and CVEs

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363232#comment-17363232 ] Apache Spark commented on SPARK-35429: -- User 'sumeetgajjar' has created a pull request for this

[jira] [Commented] (SPARK-35429) Remove commons-httpclient due to EOL and CVEs

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363233#comment-17363233 ] Apache Spark commented on SPARK-35429: -- User 'sumeetgajjar' has created a pull request for this

[jira] [Commented] (SPARK-35426) When addMergerLocation exceed the maxRetainedMergerLocations , we should remove the merger based on merged shuffle data size.

2021-06-14 Thread Min Shen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363231#comment-17363231 ] Min Shen commented on SPARK-35426: -- When a merger is removed from the retained list, it only prevents

[jira] [Commented] (SPARK-35760) Fix the max rows check for broadcast exchange

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363230#comment-17363230 ] Apache Spark commented on SPARK-35760: -- User 'c21' has created a pull request for this issue:

[jira] [Assigned] (SPARK-35760) Fix the max rows check for broadcast exchange

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35760: Assignee: (was: Apache Spark) > Fix the max rows check for broadcast exchange >

[jira] [Assigned] (SPARK-35760) Fix the max rows check for broadcast exchange

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35760: Assignee: Apache Spark > Fix the max rows check for broadcast exchange >

[jira] [Commented] (SPARK-35760) Fix the max rows check for broadcast exchange

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363229#comment-17363229 ] Apache Spark commented on SPARK-35760: -- User 'c21' has created a pull request for this issue:

[jira] [Created] (SPARK-35760) Fix the max rows check for broadcast exchange

2021-06-14 Thread Cheng Su (Jira)
Cheng Su created SPARK-35760: Summary: Fix the max rows check for broadcast exchange Key: SPARK-35760 URL: https://issues.apache.org/jira/browse/SPARK-35760 Project: Spark Issue Type:

[jira] [Assigned] (SPARK-35614) Make the conversion to pandas data-type-based for ExtensionDtypes

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35614: Assignee: Apache Spark > Make the conversion to pandas data-type-based for

[jira] [Assigned] (SPARK-35614) Make the conversion to pandas data-type-based for ExtensionDtypes

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35614: Assignee: (was: Apache Spark) > Make the conversion to pandas data-type-based for

[jira] [Commented] (SPARK-35614) Make the conversion to pandas data-type-based for ExtensionDtypes

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363225#comment-17363225 ] Apache Spark commented on SPARK-35614: -- User 'xinrong-databricks' has created a pull request for

[jira] [Updated] (SPARK-35614) Make the conversion to pandas data-type-based for ExtensionDtypes

2021-06-14 Thread Xinrong Meng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35614: - Summary: Make the conversion to pandas data-type-based for ExtensionDtypes (was: Make the

[jira] [Commented] (SPARK-35680) Support fields by year-month interval type

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363207#comment-17363207 ] Apache Spark commented on SPARK-35680: -- User 'MaxGekk' has created a pull request for this issue:

[jira] [Assigned] (SPARK-35759) Remove the upperbound for numpy for pandas-on-Spark

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35759: Assignee: Apache Spark > Remove the upperbound for numpy for pandas-on-Spark >

[jira] [Commented] (SPARK-35759) Remove the upperbound for numpy for pandas-on-Spark

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363143#comment-17363143 ] Apache Spark commented on SPARK-35759: -- User 'ueshin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-35759) Remove the upperbound for numpy for pandas-on-Spark

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35759: Assignee: (was: Apache Spark) > Remove the upperbound for numpy for pandas-on-Spark

[jira] [Created] (SPARK-35759) Remove the upperbound for numpy for pandas-on-Spark

2021-06-14 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35759: - Summary: Remove the upperbound for numpy for pandas-on-Spark Key: SPARK-35759 URL: https://issues.apache.org/jira/browse/SPARK-35759 Project: Spark Issue

[jira] [Commented] (SPARK-35429) Remove commons-httpclient due to EOL and CVEs

2021-06-14 Thread Sumeet (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363131#comment-17363131 ] Sumeet commented on SPARK-35429: Re-opening this Jira since Spark upgraded to Hive 2.3.9 which no longer

[jira] [Commented] (SPARK-35744) Performance degradation in avro SpecificRecordBuilders

2021-06-14 Thread Steven Aerts (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363119#comment-17363119 ] Steven Aerts commented on SPARK-35744: -- [~xkrogen] in the past we used them solely in RDDs as I

[jira] [Commented] (SPARK-35688) GeneratePredicate eliminate will fail in some case

2021-06-14 Thread Fu Chen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363080#comment-17363080 ] Fu Chen commented on SPARK-35688: - I think the subexpressions evaluations in the SpecificPredicate

[jira] [Commented] (SPARK-35688) GeneratePredicate eliminate will fail in some case

2021-06-14 Thread Fu Chen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363077#comment-17363077 ] Fu Chen commented on SPARK-35688: - Hi, [~maropu] , [~hyukjin.kwon] . I can reproduce the bug by

[jira] [Commented] (SPARK-29683) Job failed due to executor failures all available nodes are blacklisted

2021-06-14 Thread Jogesh Anand (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363055#comment-17363055 ] Jogesh Anand commented on SPARK-29683: -- Facing the same issue with 3.0.1 with streaming:

[jira] [Assigned] (SPARK-35757) Add bitwise AND operation to BitArray and add intersect AND operation for bloom filters

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35757: Assignee: Apache Spark > Add bitwise AND operation to BitArray and add intersect AND

[jira] [Commented] (SPARK-35757) Add bitwise AND operation to BitArray and add intersect AND operation for bloom filters

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363041#comment-17363041 ] Apache Spark commented on SPARK-35757: -- User 'kudhru' has created a pull request for this issue:

[jira] [Assigned] (SPARK-35757) Add bitwise AND operation to BitArray and add intersect AND operation for bloom filters

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35757: Assignee: (was: Apache Spark) > Add bitwise AND operation to BitArray and add

[jira] [Commented] (SPARK-35757) Add bitwise AND operation to BitArray and add intersect AND operation for bloom filters

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363042#comment-17363042 ] Apache Spark commented on SPARK-35757: -- User 'kudhru' has created a pull request for this issue:

[jira] [Created] (SPARK-35758) Spark Core doesn't build when selecting -Dhadoop.version=2.x from Spark 3.1.1

2021-06-14 Thread Ferran Puig-Calvache (Jira)
Ferran Puig-Calvache created SPARK-35758: Summary: Spark Core doesn't build when selecting -Dhadoop.version=2.x from Spark 3.1.1 Key: SPARK-35758 URL: https://issues.apache.org/jira/browse/SPARK-35758

[jira] [Commented] (SPARK-35744) Performance degradation in avro SpecificRecordBuilders

2021-06-14 Thread Erik Krogen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363029#comment-17363029 ] Erik Krogen commented on SPARK-35744: - [~steven.aerts] can you elaborate on where you're using

[jira] [Updated] (SPARK-35757) Add bitwise AND operation to BitArray and add intersect AND operation for bloom filters

2021-06-14 Thread Dhruv Kumar (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruv Kumar updated SPARK-35757: Issue Type: Improvement (was: New Feature) > Add bitwise AND operation to BitArray and add

[jira] [Created] (SPARK-35757) Add bitwise AND operation to BitArray and add intersect AND operation for bloom filters

2021-06-14 Thread Dhruv Kumar (Jira)
Dhruv Kumar created SPARK-35757: --- Summary: Add bitwise AND operation to BitArray and add intersect AND operation for bloom filters Key: SPARK-35757 URL: https://issues.apache.org/jira/browse/SPARK-35757

[jira] [Assigned] (SPARK-35755) Use higher PyArrow in GitHub Actions build

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35755: Assignee: Apache Spark (was: Hyukjin Kwon) > Use higher PyArrow in GitHub Actions build

[jira] [Updated] (SPARK-35755) Use higher PyArrow in GitHub Actions build

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-35755: - Summary: Use higher PyArrow in GitHub Actions build (was: Use PyArrow 4.x in GitHub Actions

[jira] [Commented] (SPARK-35755) Use higher PyArrow in GitHub Actions build

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363007#comment-17363007 ] Apache Spark commented on SPARK-35755: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-35755) Use higher PyArrow in GitHub Actions build

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35755: Assignee: Hyukjin Kwon (was: Apache Spark) > Use higher PyArrow in GitHub Actions build

[jira] [Updated] (SPARK-35755) Use PyArrow 4.x in GitHub Actions build

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-35755: - Summary: Use PyArrow 4.x in GitHub Actions build (was: Use PyArrow 3.x for Python 3.8 in

[jira] [Updated] (SPARK-35755) Use PyArrow 4.x in GitHub Actions build

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-35755: - Description: We set the upperbound of PyArrow at SPARK-33190 that's fixed in SPARK-33189. With

[jira] [Created] (SPARK-35756) unionByName should support nested struct also

2021-06-14 Thread Wassim Almaaoui (Jira)
Wassim Almaaoui created SPARK-35756: --- Summary: unionByName should support nested struct also Key: SPARK-35756 URL: https://issues.apache.org/jira/browse/SPARK-35756 Project: Spark Issue

[jira] [Created] (SPARK-35755) Use PyArrow 3.x for Python 3.8 in GitHub Actions build

2021-06-14 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-35755: Summary: Use PyArrow 3.x for Python 3.8 in GitHub Actions build Key: SPARK-35755 URL: https://issues.apache.org/jira/browse/SPARK-35755 Project: Spark Issue

[jira] [Updated] (SPARK-35755) Use PyArrow 3.x for Python 3.8 in GitHub Actions build

2021-06-14 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-35755: - Issue Type: Test (was: Improvement) > Use PyArrow 3.x for Python 3.8 in GitHub Actions build >

[jira] [Commented] (SPARK-35737) Parse day-time interval literals to tightest types

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362964#comment-17362964 ] Apache Spark commented on SPARK-35737: -- User 'sarutak' has created a pull request for this issue:

[jira] [Commented] (SPARK-35737) Parse day-time interval literals to tightest types

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362963#comment-17362963 ] Apache Spark commented on SPARK-35737: -- User 'sarutak' has created a pull request for this issue:

[jira] [Commented] (SPARK-35736) Parse any day-time interval types in SQL

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362962#comment-17362962 ] Apache Spark commented on SPARK-35736: -- User 'sarutak' has created a pull request for this issue:

[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads

2021-06-14 Thread Brandon Dahler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362955#comment-17362955 ] Brandon Dahler commented on SPARK-35739: I can do array instead of List, note that it'll have to

[jira] [Reopened] (SPARK-35745) Serie to Scalar pandas_udf in GroupedData.agg() breaks the following monotonically_increasing_id()

2021-06-14 Thread Hadrien Glaude (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hadrien Glaude reopened SPARK-35745: The ticket has been marked as resolved because marking the udf function as non-deterministic

[jira] [Resolved] (SPARK-35748) Fix StreamingJoinHelper to be able to handle day-time interval

2021-06-14 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35748. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32896

[jira] [Commented] (SPARK-35745) Serie to Scalar pandas_udf in GroupedData.agg() breaks the following monotonically_increasing_id()

2021-06-14 Thread Hadrien Glaude (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362929#comment-17362929 ] Hadrien Glaude commented on SPARK-35745: > This is the correct way to avoid this problem. How

[jira] [Commented] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-06-14 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362910#comment-17362910 ] Robert Joseph Evans commented on SPARK-35563: - [~dc-heros] Thanks for looking into this. I

[jira] [Commented] (SPARK-33122) Remove redundant aggregates in the Optimzier

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362901#comment-17362901 ] Apache Spark commented on SPARK-33122: -- User 'tanelk' has created a pull request for this issue:

[jira] [Commented] (SPARK-35750) Rename "pandas APIs on Spark" to "pandas API on Spark" in the documents

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362864#comment-17362864 ] Apache Spark commented on SPARK-35750: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-35750) Rename "pandas APIs on Spark" to "pandas API on Spark" in the documents

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35750: Assignee: (was: Apache Spark) > Rename "pandas APIs on Spark" to "pandas API on

[jira] [Commented] (SPARK-35750) Rename "pandas APIs on Spark" to "pandas API on Spark" in the documents

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362862#comment-17362862 ] Apache Spark commented on SPARK-35750: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-35750) Rename "pandas APIs on Spark" to "pandas API on Spark" in the documents

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35750: Assignee: Apache Spark > Rename "pandas APIs on Spark" to "pandas API on Spark" in the

[jira] [Commented] (SPARK-35754) Put blocks only on disk while migrating RDD cached data

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362860#comment-17362860 ] Apache Spark commented on SPARK-35754: -- User 'q2w' has created a pull request for this issue:

[jira] [Commented] (SPARK-35754) Put blocks only on disk while migrating RDD cached data

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362859#comment-17362859 ] Apache Spark commented on SPARK-35754: -- User 'q2w' has created a pull request for this issue:

[jira] [Assigned] (SPARK-35754) Put blocks only on disk while migrating RDD cached data

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35754: Assignee: Apache Spark > Put blocks only on disk while migrating RDD cached data >

[jira] [Assigned] (SPARK-35754) Put blocks only on disk while migrating RDD cached data

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35754: Assignee: (was: Apache Spark) > Put blocks only on disk while migrating RDD cached

[jira] [Created] (SPARK-35754) Put blocks only on disk while migrating RDD cached data

2021-06-14 Thread abhishek kumar tiwari (Jira)
abhishek kumar tiwari created SPARK-35754: - Summary: Put blocks only on disk while migrating RDD cached data Key: SPARK-35754 URL: https://issues.apache.org/jira/browse/SPARK-35754 Project:

[jira] [Commented] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-06-14 Thread dgd_contributor (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362824#comment-17362824 ] dgd_contributor commented on SPARK-35563: - After looking into this, I found out rowNumber in

[jira] [Commented] (SPARK-35753) Increase stacksize in Maven for AppVeyor build

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362819#comment-17362819 ] Apache Spark commented on SPARK-35753: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-35753) Increase stacksize in Maven for AppVeyor build

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35753: Assignee: Apache Spark > Increase stacksize in Maven for AppVeyor build >

[jira] [Assigned] (SPARK-35753) Increase stacksize in Maven for AppVeyor build

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35753: Assignee: (was: Apache Spark) > Increase stacksize in Maven for AppVeyor build >

[jira] [Created] (SPARK-35753) Increase stacksize in Maven for AppVeyor build

2021-06-14 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-35753: Summary: Increase stacksize in Maven for AppVeyor build Key: SPARK-35753 URL: https://issues.apache.org/jira/browse/SPARK-35753 Project: Spark Issue Type:

[jira] [Commented] (SPARK-35752) Clean up unused code in getLocalInputVariableValues

2021-06-14 Thread L. C. Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362805#comment-17362805 ] L. C. Hsieh commented on SPARK-35752: - Found exceptional case. Seems invalid. > Clean up unused

[jira] [Resolved] (SPARK-35752) Clean up unused code in getLocalInputVariableValues

2021-06-14 Thread L. C. Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-35752. - Resolution: Invalid > Clean up unused code in getLocalInputVariableValues >

[jira] [Commented] (SPARK-35744) Performance degradation in avro SpecificRecordBuilders

2021-06-14 Thread Steven Aerts (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362804#comment-17362804 ] Steven Aerts commented on SPARK-35744: -- [~Gengliang.Wang] in the avro java/scala world there are

[jira] [Created] (SPARK-35752) Clean up unused code in getLocalInputVariableValues

2021-06-14 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-35752: --- Summary: Clean up unused code in getLocalInputVariableValues Key: SPARK-35752 URL: https://issues.apache.org/jira/browse/SPARK-35752 Project: Spark Issue

[jira] [Commented] (SPARK-35744) Performance degradation in avro SpecificRecordBuilders

2021-06-14 Thread Gengliang Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362780#comment-17362780 ] Gengliang Wang commented on SPARK-35744: [~steven.aerts] Thanks for reporting. Could you tell

[jira] [Commented] (SPARK-32891) Enhance UTF8String.trim

2021-06-14 Thread dgd_contributor (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362766#comment-17362766 ] dgd_contributor commented on SPARK-32891: - After looking into this and running a few benchmarks, I

[jira] [Resolved] (SPARK-35737) Parse day-time interval literals to tightest types

2021-06-14 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35737. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32892

[jira] [Assigned] (SPARK-35737) Parse day-time interval literals to tightest types

2021-06-14 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-35737: Assignee: Kousuke Saruta > Parse day-time interval literals to tightest types >

[jira] [Commented] (SPARK-35751) Support Joint eviction strategies for cached RDD partitions

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362730#comment-17362730 ] Apache Spark commented on SPARK-35751: -- User 'qfoxzjd' has created a pull request for this issue:

[jira] [Assigned] (SPARK-35751) Support Joint eviction strategies for cached RDD partitions

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35751: Assignee: Apache Spark > Support Joint eviction strategies for cached RDD partitions >

[jira] [Assigned] (SPARK-35751) Support Joint eviction strategies for cached RDD partitions

2021-06-14 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35751: Assignee: (was: Apache Spark) > Support Joint eviction strategies for cached RDD

[jira] [Created] (SPARK-35751) Support Joint eviction strategies for cached RDD partitions

2021-06-14 Thread JindongZhang (Jira)
JindongZhang created SPARK-35751: Summary: Support Joint eviction strategies for cached RDD partitions Key: SPARK-35751 URL: https://issues.apache.org/jira/browse/SPARK-35751 Project: Spark