[jira] [Created] (SPARK-46666) Make lxml an optional testing dependency in test_session
Hyukjin Kwon created SPARK-46666:
---------------------------------

Summary: Make lxml an optional testing dependency in test_session
Key: SPARK-46666
URL: https://issues.apache.org/jira/browse/SPARK-46666
Project: Spark
Issue Type: Test
Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

{code}
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/__w/spark/spark/python/pyspark/sql/tests/test_session.py", line 22, in
    from lxml import etree
ModuleNotFoundError: No module named 'lxml'
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
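The fix the ticket asks for is a standard pattern: probe for lxml once at import time and skip the dependent tests when it is missing, instead of letting the whole module fail with `ModuleNotFoundError`. A minimal sketch; the test class and its contents are hypothetical illustrations, not Spark's actual test_session code:

```python
import unittest

# Probe for lxml once; record the result instead of failing the module import.
try:
    from lxml import etree

    have_lxml = True
except ImportError:
    have_lxml = False


class XmlReportTest(unittest.TestCase):
    # This test only runs when lxml is installed; otherwise it is skipped.
    @unittest.skipIf(not have_lxml, "lxml is not installed")
    def test_report_is_well_formed(self):
        root = etree.fromstring(b"<suite><case name='t1'/></suite>")
        self.assertEqual(root.tag, "suite")
```

With this layout, environments without lxml report the test as skipped rather than erroring out during collection.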
[jira] [Assigned] (SPARK-46651) Split `FrameTakeTests`
[ https://issues.apache.org/jira/browse/SPARK-46651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46651:
------------------------------------

Assignee: Ruifeng Zheng

> Split `FrameTakeTests`
> ----------------------
>
> Key: SPARK-46651
> URL: https://issues.apache.org/jira/browse/SPARK-46651
> Project: Spark
> Issue Type: Sub-task
> Components: PS, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-46651) Split `FrameTakeTests`
[ https://issues.apache.org/jira/browse/SPARK-46651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46651.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44656
[https://github.com/apache/spark/pull/44656]

> Split `FrameTakeTests`
> ----------------------
>
> Key: SPARK-46651
> URL: https://issues.apache.org/jira/browse/SPARK-46651
> Project: Spark
> Issue Type: Sub-task
> Components: PS, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-46649) Run PyPy 3 and Python 3.10 tests independently
[ https://issues.apache.org/jira/browse/SPARK-46649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46649:
------------------------------------

Assignee: Hyukjin Kwon

> Run PyPy 3 and Python 3.10 tests independently
> ----------------------------------------------
>
> Key: SPARK-46649
> URL: https://issues.apache.org/jira/browse/SPARK-46649
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Minor
> Labels: pull-request-available
>
> https://github.com/apache/spark/actions/runs/7462843546/job/20306241275
> The job seems to terminate midway because of OOM; we should split the runs.
[jira] [Assigned] (SPARK-46645) Exclude unittest-xml-reporting in Python 3.12 image
[ https://issues.apache.org/jira/browse/SPARK-46645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46645:
------------------------------------

Assignee: Hyukjin Kwon

> Exclude unittest-xml-reporting in Python 3.12 image
> ---------------------------------------------------
>
> Key: SPARK-46645
> URL: https://issues.apache.org/jira/browse/SPARK-46645
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> unittest-xml-reporting does not seem to support Python 3.12, and it hides the real error:
> {code}
> File "/__w/spark/spark/python/pyspark/streaming/tests/test_kinesis.py", line 118, in
>   unittest.main(testRunner=testRunner, verbosity=2)
> File "/usr/lib/python3.12/unittest/main.py", line 105, in __init__
>   self.runTests()
> File "/usr/lib/python3.12/unittest/main.py", line 281, in runTests
>   self.result = testRunner.run(self.test)
> File "/usr/local/lib/python3.12/dist-packages/xmlrunner/runner.py", line 67, in run
>   test(result)
> File "/usr/lib/python3.12/unittest/suite.py", line 84, in __call__
>   return self.run(*args, **kwds)
> File "/usr/lib/python3.12/unittest/suite.py", line 122, in run
>   test(result)
> File "/usr/lib/python3.12/unittest/suite.py", line 84, in __call__
>   return self.run(*args, **kwds)
> File "/usr/lib/python3.12/unittest/suite.py", line 122, in run
>   test(result)
> File "/usr/lib/python3.12/unittest/case.py", line 692, in __call__
>   return self.run(*args, **kwds)
> File "/usr/lib/python3.12/unittest/case.py", line 662, in run
>   result.stopTest(self)
> File "/usr/local/lib/python3.12/dist-packages/xmlrunner/result.py", line 327, in stopTest
>   self.callback()
> File "/usr/local/lib/python3.12/dist-packages/xmlrunner/result.py", line 235, in callback
>   test_info.test_finished()
> File "/usr/local/lib/python3.12/dist-packages/xmlrunner/result.py", line 180, in test_finished
>   self.test_result.stop_time - self.test_result.start_time
> AttributeError: '_XMLTestResult' object has no attribute 'start_time'. Did you mean: 'stop_time'?
> {code}
> This is an optional testing dependency, so we can exclude it.
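A common way to make such a reporter optional is to fall back to unittest's default text runner when the `xmlrunner` package (unittest-xml-reporting) is missing or broken on the current Python version. A sketch under that assumption; the output directory is illustrative, not Spark's actual configuration:

```python
# Prefer XML test reports when unittest-xml-reporting is importable,
# otherwise leave the runner unset so unittest falls back to TextTestRunner.
try:
    import xmlrunner

    test_runner = xmlrunner.XMLTestRunner(output="target/test-reports", verbosity=2)
except ImportError:
    test_runner = None  # unittest.main(testRunner=None) uses its default runner

# A test module's entry point would then pass the selected runner:
# unittest.main(testRunner=test_runner, verbosity=2)
```

Because `unittest.main` treats `testRunner=None` as "use the default", the same entry point works with or without the XML reporter installed.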
[jira] [Resolved] (SPARK-46649) Run PyPy 3 and Python 3.10 tests independently
[ https://issues.apache.org/jira/browse/SPARK-46649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46649.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44655
[https://github.com/apache/spark/pull/44655]

> Run PyPy 3 and Python 3.10 tests independently
> ----------------------------------------------
>
> Key: SPARK-46649
> URL: https://issues.apache.org/jira/browse/SPARK-46649
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> https://github.com/apache/spark/actions/runs/7462843546/job/20306241275
> The job seems to terminate midway because of OOM; we should split the runs.
[jira] [Resolved] (SPARK-46645) Exclude unittest-xml-reporting in Python 3.12 image
[ https://issues.apache.org/jira/browse/SPARK-46645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46645.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44652
[https://github.com/apache/spark/pull/44652]

> Exclude unittest-xml-reporting in Python 3.12 image
> ---------------------------------------------------
>
> Key: SPARK-46645
> URL: https://issues.apache.org/jira/browse/SPARK-46645
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> unittest-xml-reporting does not seem to support Python 3.12, and it hides the real error (the full `AttributeError: '_XMLTestResult' object has no attribute 'start_time'` stack trace is quoted in the earlier SPARK-46645 message above). This is an optional testing dependency, so we can exclude it.
[jira] [Updated] (SPARK-46649) Run PyPy 3 and Python 3.10 tests independently
[ https://issues.apache.org/jira/browse/SPARK-46649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-46649:
---------------------------------
Description:
https://github.com/apache/spark/actions/runs/7462843546/job/20306241275
The job seems to terminate midway because of OOM; we should split the runs.

> Run PyPy 3 and Python 3.10 tests independently
> ----------------------------------------------
>
> Key: SPARK-46649
> URL: https://issues.apache.org/jira/browse/SPARK-46649
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Minor
>
> https://github.com/apache/spark/actions/runs/7462843546/job/20306241275
> The job seems to terminate midway because of OOM; we should split the runs.
[jira] [Created] (SPARK-46649) Run PyPy 3 and Python 3.10 tests independently
Hyukjin Kwon created SPARK-46649:
---------------------------------

Summary: Run PyPy 3 and Python 3.10 tests independently
Key: SPARK-46649
URL: https://issues.apache.org/jira/browse/SPARK-46649
Project: Spark
Issue Type: Test
Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-46536) Support GROUP BY calendar_interval_type
[ https://issues.apache.org/jira/browse/SPARK-46536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46536.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44538
[https://github.com/apache/spark/pull/44538]

> Support GROUP BY calendar_interval_type
> ---------------------------------------
>
> Key: SPARK-46536
> URL: https://issues.apache.org/jira/browse/SPARK-46536
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Wenchen Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Currently, Spark GROUP BY only allows orderable data types; otherwise plan analysis fails:
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala#L197-L203
> However, this is too strict, as GROUP BY only cares about equality, not ordering. The CalendarInterval type is not orderable (given 1 month and 30 days, we don't know which is larger), but it has well-defined equality. In fact, we already support `SELECT DISTINCT calendar_interval_type` in some cases (when hash aggregate is picked by the planner).
> The proposal here is to officially support the calendar interval type in GROUP BY. We should relax the check inside `CheckAnalysis`, make `CalendarInterval` implement `Comparable` using natural ordering (compare months first, then days, then seconds), and test with both hash aggregate and sort aggregate.
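The natural ordering proposed in the ticket (months first, then days, then the sub-day part) can be modeled in a few lines via lexicographic tuple comparison. This is a hypothetical Python model of the JVM-side `CalendarInterval` (months, days, microseconds), not Spark's actual implementation:

```python
from functools import total_ordering


@total_ordering
class CalendarInterval:
    """Toy model of an interval: compare months, then days, then microseconds."""

    def __init__(self, months, days, microseconds):
        self.months = months
        self.days = days
        self.microseconds = microseconds

    def _key(self):
        # Tuple comparison gives exactly the proposed lexicographic ordering.
        return (self.months, self.days, self.microseconds)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()
```

Under this ordering, 1 month sorts after 30 days (months compare first), which is an arbitrary but total order — enough for sort aggregate, while equality alone suffices for hash aggregate.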
[jira] [Created] (SPARK-46647) Add unittest-xml-reporting into Python 3.12 image
Hyukjin Kwon created SPARK-46647:
---------------------------------

Summary: Add unittest-xml-reporting into Python 3.12 image
Key: SPARK-46647
URL: https://issues.apache.org/jira/browse/SPARK-46647
Project: Spark
Issue Type: Test
Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

unittest-xml-reporting does not seem to support Python 3.12. We should add it back once it does; see also SPARK-46645.
[jira] [Created] (SPARK-46645) Exclude unittest-xml-reporting in Python 3.12 image
Hyukjin Kwon created SPARK-46645:
---------------------------------

Summary: Exclude unittest-xml-reporting in Python 3.12 image
Key: SPARK-46645
URL: https://issues.apache.org/jira/browse/SPARK-46645
Project: Spark
Issue Type: Bug
Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

unittest-xml-reporting does not seem to support Python 3.12, and it hides the real error (the full `AttributeError: '_XMLTestResult' object has no attribute 'start_time'` stack trace is quoted in the reassignment notice for SPARK-46645 above). This is an optional testing dependency, so we can exclude it.
[jira] [Updated] (SPARK-46645) Exclude unittest-xml-reporting in Python 3.12 image
[ https://issues.apache.org/jira/browse/SPARK-46645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-46645:
---------------------------------
Issue Type: Test (was: Improvement)

> Exclude unittest-xml-reporting in Python 3.12 image
> ---------------------------------------------------
>
> Key: SPARK-46645
> URL: https://issues.apache.org/jira/browse/SPARK-46645
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> unittest-xml-reporting does not seem to support Python 3.12, and it hides the real error (the full `AttributeError: '_XMLTestResult' object has no attribute 'start_time'` stack trace is quoted in the reassignment notice for SPARK-46645 above). This is an optional testing dependency, so we can exclude it.
[jira] [Updated] (SPARK-46645) Exclude unittest-xml-reporting in Python 3.12 image
[ https://issues.apache.org/jira/browse/SPARK-46645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-46645:
---------------------------------
Issue Type: Improvement (was: Bug)

> Exclude unittest-xml-reporting in Python 3.12 image
> ---------------------------------------------------
>
> Key: SPARK-46645
> URL: https://issues.apache.org/jira/browse/SPARK-46645
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> unittest-xml-reporting does not seem to support Python 3.12, and it hides the real error (the full `AttributeError: '_XMLTestResult' object has no attribute 'start_time'` stack trace is quoted in the reassignment notice for SPARK-46645 above). This is an optional testing dependency, so we can exclude it.
[jira] [Assigned] (SPARK-37039) np.nan series.astype(bool) should be True
[ https://issues.apache.org/jira/browse/SPARK-37039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-37039:
------------------------------------

Assignee: Haejoon Lee

> np.nan series.astype(bool) should be True
> -----------------------------------------
>
> Key: SPARK-37039
> URL: https://issues.apache.org/jira/browse/SPARK-37039
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Yikun Jiang
> Assignee: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> np.nan series.astype(bool) should be True, rather than False:
> https://github.com/apache/spark/blob/46bcef7472edd40c23afd9ac74cffe13c6a608ad/python/pyspark/pandas/data_type_ops/base.py#L147
> >>> pd.Series([1, 2, np.nan], dtype=float).astype(bool)
> >>> pd.Series([1, 2, np.nan], dtype=str).astype(bool)
> >>> pd.Series([datetime.date(1994, 1, 31), datetime.date(1994, 2, 1), np.nan])
> 0 True
> 1 True
> 2 True
> dtype: bool
> But in pyspark, it is:
> 0 True
> 1 True
> 2 False
> dtype: bool
[jira] [Resolved] (SPARK-37039) np.nan series.astype(bool) should be True
[ https://issues.apache.org/jira/browse/SPARK-37039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37039.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44570
[https://github.com/apache/spark/pull/44570]

> np.nan series.astype(bool) should be True
> -----------------------------------------
>
> Key: SPARK-37039
> URL: https://issues.apache.org/jira/browse/SPARK-37039
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Yikun Jiang
> Assignee: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> np.nan series.astype(bool) should be True, rather than False (see the pandas examples quoted in the reassignment notice above): pandas returns True for the NaN element, but pandas-on-Spark returns False.
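The pandas side of the behavior the ticket describes is easy to reproduce: `float('nan')` is truthy in Python, so pandas maps NaN to True under `astype(bool)`:

```python
import numpy as np
import pandas as pd

# NaN is a non-zero float, so bool(NaN) is True; pandas follows suit.
s = pd.Series([1.0, 2.0, np.nan]).astype(bool)
print(s.tolist())  # → [True, True, True]
```

The ticket's point is that pandas-on-Spark returned `False` for the NaN element here, diverging from pandas.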
[jira] [Resolved] (SPARK-46633) Reading a non-empty Avro file with empty blocks returns 0 records
[ https://issues.apache.org/jira/browse/SPARK-46633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46633.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44635
[https://github.com/apache/spark/pull/44635]

> Reading a non-empty Avro file with empty blocks returns 0 records
> -----------------------------------------------------------------
>
> Key: SPARK-46633
> URL: https://issues.apache.org/jira/browse/SPARK-46633
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Ivan Sadikov
> Assignee: Ivan Sadikov
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> When an Avro file contains empty blocks, Spark returns 0 records, while "fastavro" and "avro-python-3" both read the file correctly and return records.
>
> This is due to the way Spark handles empty blocks (or fails to handle them). A call to `hasNext` loads the next block and, if that block is empty, returns false. Instead of exiting the loop there, we need to probe subsequent blocks until the sync point is reached.
[jira] [Assigned] (SPARK-46633) Reading a non-empty Avro file with empty blocks returns 0 records
[ https://issues.apache.org/jira/browse/SPARK-46633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46633:
------------------------------------

Assignee: Ivan Sadikov

> Reading a non-empty Avro file with empty blocks returns 0 records
> -----------------------------------------------------------------
>
> Key: SPARK-46633
> URL: https://issues.apache.org/jira/browse/SPARK-46633
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Ivan Sadikov
> Assignee: Ivan Sadikov
> Priority: Major
> Labels: pull-request-available
>
> When an Avro file contains empty blocks, Spark returns 0 records, while "fastavro" and "avro-python-3" both read the file correctly and return records. The fix is to keep probing subsequent blocks until the sync point is reached instead of stopping at the first empty block (full description in the resolution notice above).
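The described fix — keep probing past empty blocks instead of stopping at the first one — can be sketched generically. This toy reader models Avro blocks as lists of records; it is an illustration of the control flow, not Spark's actual Scala reader:

```python
def records(blocks):
    """Yield records from an iterator of blocks, where each block is a
    list of records; empty lists model Avro blocks with zero records."""
    for block in blocks:
        if not block:
            # Empty block: don't report end-of-data; probe the next block.
            continue
        yield from block


# A file whose first and third blocks are empty still yields all records:
data = [[], ["r1", "r2"], [], ["r3"]]
```

The buggy behavior corresponds to returning as soon as one block turns out to be empty, which would drop `r1`..`r3` here whenever the first block is empty.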
[jira] [Assigned] (SPARK-46437) Enable conditional includes in Jekyll documentation
[ https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46437:
------------------------------------

Assignee: Nicholas Chammas

> Enable conditional includes in Jekyll documentation
> ---------------------------------------------------
>
> Key: SPARK-46437
> URL: https://issues.apache.org/jira/browse/SPARK-46437
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.5.0
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Priority: Minor
> Labels: pull-request-available
[jira] [Resolved] (SPARK-46437) Enable conditional includes in Jekyll documentation
[ https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46437.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44630
[https://github.com/apache/spark/pull/44630]

> Enable conditional includes in Jekyll documentation
> ---------------------------------------------------
>
> Key: SPARK-46437
> URL: https://issues.apache.org/jira/browse/SPARK-46437
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.5.0
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-46593) Refactor `data_type_ops` tests
[ https://issues.apache.org/jira/browse/SPARK-46593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46593.
----------------------------------
Resolution: Fixed

Issue resolved by pull request 44637
[https://github.com/apache/spark/pull/44637]

> Refactor `data_type_ops` tests
> ------------------------------
>
> Key: SPARK-46593
> URL: https://issues.apache.org/jira/browse/SPARK-46593
> Project: Spark
> Issue Type: Sub-task
> Components: PS, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-46637) Enhancing the Visual Appeal of Spark doc website
[ https://issues.apache.org/jira/browse/SPARK-46637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46637.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44642
[https://github.com/apache/spark/pull/44642]

> Enhancing the Visual Appeal of Spark doc website
> ------------------------------------------------
>
> Key: SPARK-46637
> URL: https://issues.apache.org/jira/browse/SPARK-46637
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 4.0.0, 3.5.1
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-46630) XML: Validate XML element name on write
[ https://issues.apache.org/jira/browse/SPARK-46630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46630.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44634
[https://github.com/apache/spark/pull/44634]

> XML: Validate XML element name on write
> ---------------------------------------
>
> Key: SPARK-46630
> URL: https://issues.apache.org/jira/browse/SPARK-46630
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Sandip Agarwala
> Assignee: Sandip Agarwala
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-46630) XML: Validate XML element name on write
[ https://issues.apache.org/jira/browse/SPARK-46630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46630:
------------------------------------

Assignee: Sandip Agarwala

> XML: Validate XML element name on write
> ---------------------------------------
>
> Key: SPARK-46630
> URL: https://issues.apache.org/jira/browse/SPARK-46630
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Sandip Agarwala
> Assignee: Sandip Agarwala
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-46626) Bump jekyll version to support Ruby 3.3
[ https://issues.apache.org/jira/browse/SPARK-46626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46626:
------------------------------------

Assignee: Nicholas Chammas

> Bump jekyll version to support Ruby 3.3
> ---------------------------------------
>
> Key: SPARK-46626
> URL: https://issues.apache.org/jira/browse/SPARK-46626
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 4.0.0
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Priority: Minor
> Labels: pull-request-available
[jira] [Resolved] (SPARK-46626) Bump jekyll version to support Ruby 3.3
[ https://issues.apache.org/jira/browse/SPARK-46626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46626.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44628
[https://github.com/apache/spark/pull/44628]

> Bump jekyll version to support Ruby 3.3
> ---------------------------------------
>
> Key: SPARK-46626
> URL: https://issues.apache.org/jira/browse/SPARK-46626
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 4.0.0
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-46621) Address null from Exception.getMessage in Py4J captured exception
[ https://issues.apache.org/jira/browse/SPARK-46621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46621: - Priority: Minor (was: Major) > Address null from Exception.getMessage in Py4J captured exception > - > > Key: SPARK-46621 > URL: https://issues.apache.org/jira/browse/SPARK-46621 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > If JVM throws an exception without a message, the message becomes null and > returns: > {code} > File "/.../pyspark/errors/exceptions/captured.py", line 88, in __str__ > desc = desc + "\n\nJVM stacktrace:\n%s" % self._stackTrace > TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
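The fix amounts to guarding against a None message before the string concatenation in `__str__`; a minimal sketch (hypothetical helper name, not the actual patch in pyspark/errors/exceptions/captured.py):

```python
def build_description(desc, stack_trace):
    # Exception.getMessage on the JVM side can return null, which Py4J
    # surfaces as None; fall back to an empty string before concatenating
    # so the TypeError ('NoneType' + 'str') cannot occur.
    if desc is None:
        desc = ""
    return desc + "\n\nJVM stacktrace:\n%s" % stack_trace
```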
[jira] [Updated] (SPARK-46621) Address null from Exception.getMessage in Py4J captured exception
[ https://issues.apache.org/jira/browse/SPARK-46621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46621: - Description: If JVM throws an exception without a message, the message becomes null and returns: {code} File "/.../pyspark/errors/exceptions/captured.py", line 88, in __str__ desc = desc + "\n\nJVM stacktrace:\n%s" % self._stackTrace TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' {code} was: If JVM throws an exception without a message, the message becomes null and returns: {code} pyspark.errors.exceptions.captured.UnsupportedOperationException: JVM stacktrace: java.lang.UnsupportedOperationException at com.databricks.sql.acl.PlaceholderScimClient.getUserInfo(MockScimClient.scala:49) at com.databricks.sql.acl.InlineUserInfoExpressions.userInfo$lzycompute$1(InlineUserInfoExpressions.scala:73) at com.databricks.sql.acl.InlineUserInfoExpressions.com$databricks$sql$acl$InlineUserInfoExpressions$$userInfo$1(InlineUserInfoExpressions.scala:73) at com.databricks.sql.acl.InlineUserInfoExpressions$$anonfun$rewrite$2.$anonfun$applyOrElse$2(InlineUserInfoExpressions.scala:98) at scala.Option.getOrElse(Option.scala:189) at com.databricks.sql.acl.InlineUserInfoExpressions$$anonfun$rewrite$2.applyOrElse(InlineUserInfoExpressions.scala:98) at com.databricks.sql.acl.InlineUserInfoExpressions$$anonfun$rewrite$2.applyOrElse(InlineUserInfoExpressions.scala:84) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:473) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:83) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:473) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:478) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1277) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1276) at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:656) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:478) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:174) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:215) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:83) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:215) at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:226) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:231) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) {code} > Address null from Exception.getMessage in Py4J captured exception > - > > Key: SPARK-46621 > URL: https://issues.apache.org/jira/browse/SPARK-46621 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > If JVM throws an exception without a message, the message becomes null and > returns: > {code} > File "/.../pyspark/errors/exceptions/captured.py", line 88, in __str__ > desc = desc + "\n\nJVM stacktrace:\n%s" % self._stackTrace > TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional 
commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46601) Fix log error in handleStatusMessage
[ https://issues.apache.org/jira/browse/SPARK-46601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46601: Assignee: qingbo jiao > Fix log error in handleStatusMessage > > > Key: SPARK-46601 > URL: https://issues.apache.org/jira/browse/SPARK-46601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: qingbo jiao >Assignee: qingbo jiao >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46601) Fix log error in handleStatusMessage
[ https://issues.apache.org/jira/browse/SPARK-46601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46601. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44606 [https://github.com/apache/spark/pull/44606] > Fix log error in handleStatusMessage > > > Key: SPARK-46601 > URL: https://issues.apache.org/jira/browse/SPARK-46601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: qingbo jiao >Assignee: qingbo jiao >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46522) Block Python data source registration with name conflicts
[ https://issues.apache.org/jira/browse/SPARK-46522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46522. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44507 [https://github.com/apache/spark/pull/44507] > Block Python data source registration with name conflicts > - > > Key: SPARK-46522 > URL: https://issues.apache.org/jira/browse/SPARK-46522 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Users should not be allowed to register Python data sources with names that > are the same as builtin or existing Scala/Java data sources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
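The intended guard can be pictured with a toy registry (hypothetical class and method names; the real check happens inside Spark's data source lookup path, not in user code):

```python
class DataSourceRegistry:
    """Toy sketch of blocking registration under a conflicting name."""

    def __init__(self, builtin_names):
        # Names of built-in / existing Scala-Java data sources.
        self._builtin = set(builtin_names)
        self._registered = {}

    def register(self, name, source):
        # Reject names that collide with built-ins or earlier registrations.
        if name in self._builtin or name in self._registered:
            raise ValueError("Data source '%s' already exists" % name)
        self._registered[name] = source
```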
[jira] [Assigned] (SPARK-46522) Block Python data source registration with name conflicts
[ https://issues.apache.org/jira/browse/SPARK-46522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46522: Assignee: Allison Wang > Block Python data source registration with name conflicts > - > > Key: SPARK-46522 > URL: https://issues.apache.org/jira/browse/SPARK-46522 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Users should not be allowed to register Python data sources with names that > are the same as builtin or existing Scala/Java data sources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs
[ https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-46437: -- Assignee: (was: Nicholas Chammas) Reverted at https://github.com/apache/spark/commit/a88c64e7dbdd813fa0a9df85a0ce9f1db6706ede > Remove unnecessary cruft from SQL built-in functions docs > - > > Key: SPARK-46437 > URL: https://issues.apache.org/jira/browse/SPARK-46437 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs
[ https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46437: - Fix Version/s: (was: 4.0.0) > Remove unnecessary cruft from SQL built-in functions docs > - > > Key: SPARK-46437 > URL: https://issues.apache.org/jira/browse/SPARK-46437 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
[ https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46613. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44617 [https://github.com/apache/spark/pull/44617] > Log full exception when failed to lookup Python Data Sources > > > Key: SPARK-46613 > URL: https://issues.apache.org/jira/browse/SPARK-46613 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > See https://github.com/apache/spark/pull/44617 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
[ https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46613: Assignee: Hyukjin Kwon > Log full exception when failed to lookup Python Data Sources > > > Key: SPARK-46613 > URL: https://issues.apache.org/jira/browse/SPARK-46613 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > See https://github.com/apache/spark/pull/44617 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46603) Refine docstring of `parse_url/url_encode/url_decode`
[ https://issues.apache.org/jira/browse/SPARK-46603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46603: Assignee: Yang Jie > Refine docstring of `parse_url/url_encode/url_decode` > - > > Key: SPARK-46603 > URL: https://issues.apache.org/jira/browse/SPARK-46603 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46603) Refine docstring of `parse_url/url_encode/url_decode`
[ https://issues.apache.org/jira/browse/SPARK-46603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46603. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44604 [https://github.com/apache/spark/pull/44604] > Refine docstring of `parse_url/url_encode/url_decode` > - > > Key: SPARK-46603 > URL: https://issues.apache.org/jira/browse/SPARK-46603 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46606) Refine docstring of `convert_timezone/make_dt_interval/make_interval`
[ https://issues.apache.org/jira/browse/SPARK-46606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46606. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44610 [https://github.com/apache/spark/pull/44610] > Refine docstring of `convert_timezone/make_dt_interval/make_interval` > - > > Key: SPARK-46606 > URL: https://issues.apache.org/jira/browse/SPARK-46606 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46606) Refine docstring of `convert_timezone/make_dt_interval/make_interval`
[ https://issues.apache.org/jira/browse/SPARK-46606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46606: Assignee: BingKun Pan > Refine docstring of `convert_timezone/make_dt_interval/make_interval` > - > > Key: SPARK-46606 > URL: https://issues.apache.org/jira/browse/SPARK-46606 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
[ https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46613: - Description: See https://github.com/apache/spark/pull/44617 > Log full exception when failed to lookup Python Data Sources > > > Key: SPARK-46613 > URL: https://issues.apache.org/jira/browse/SPARK-46613 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > See https://github.com/apache/spark/pull/44617 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
Hyukjin Kwon created SPARK-46613: Summary: Log full exception when failed to lookup Python Data Sources Key: SPARK-46613 URL: https://issues.apache.org/jira/browse/SPARK-46613 Project: Spark Issue Type: Sub-task Components: PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML
[ https://issues.apache.org/jira/browse/SPARK-46248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46248: Assignee: Shujing Yang > Support ignoreCorruptFiles and ignoreMissingFiles options in XML > > > Key: SPARK-46248 > URL: https://issues.apache.org/jira/browse/SPARK-46248 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > > This PR corrects the handling of corrupt or missing multiline XML files by > respecting user-specified options. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46607) Check the testing mode
[ https://issues.apache.org/jira/browse/SPARK-46607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46607: Assignee: Ruifeng Zheng > Check the testing mode > -- > > Key: SPARK-46607 > URL: https://issues.apache.org/jira/browse/SPARK-46607 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML
[ https://issues.apache.org/jira/browse/SPARK-46248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46248. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44163 [https://github.com/apache/spark/pull/44163] > Support ignoreCorruptFiles and ignoreMissingFiles options in XML > > > Key: SPARK-46248 > URL: https://issues.apache.org/jira/browse/SPARK-46248 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This PR corrects the handling of corrupt or missing multiline XML files by > respecting user-specified options. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46607) Check the testing mode
[ https://issues.apache.org/jira/browse/SPARK-46607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46607. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44611 [https://github.com/apache/spark/pull/44611] > Check the testing mode > -- > > Key: SPARK-46607 > URL: https://issues.apache.org/jira/browse/SPARK-46607 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46599) XML: Use TypeCoercion.findTightestCommonType for compatibility check
[ https://issues.apache.org/jira/browse/SPARK-46599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46599. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44601 [https://github.com/apache/spark/pull/44601] > XML: Use TypeCoercion.findTightestCommonType for compatibility check > > > Key: SPARK-46599 > URL: https://issues.apache.org/jira/browse/SPARK-46599 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46599) XML: Use TypeCoercion.findTightestCommonType for compatibility check
[ https://issues.apache.org/jira/browse/SPARK-46599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46599: Assignee: Sandip Agarwala > XML: Use TypeCoercion.findTightestCommonType for compatibility check > > > Key: SPARK-46599 > URL: https://issues.apache.org/jira/browse/SPARK-46599 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46587) XML: Fix XSD big integer conversion
[ https://issues.apache.org/jira/browse/SPARK-46587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46587. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44587 [https://github.com/apache/spark/pull/44587] > XML: Fix XSD big integer conversion > --- > > Key: SPARK-46587 > URL: https://issues.apache.org/jira/browse/SPARK-46587 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46582) Upgrade R Tools version from 4.0.2 to 4.3.2 in AppVeyor
Hyukjin Kwon created SPARK-46582: Summary: Upgrade R Tools version from 4.0.2 to 4.3.2 in AppVeyor Key: SPARK-46582 URL: https://issues.apache.org/jira/browse/SPARK-46582 Project: Spark Issue Type: Bug Components: Project Infra, R Affects Versions: 4.0.0 Reporter: Hyukjin Kwon R Tools 4.3.X is for R 4.3.X. We previously did not upgrade because of a test failure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46571) Re-enable TODOs that are resolved from recent Pandas
[ https://issues.apache.org/jira/browse/SPARK-46571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46571: Assignee: Haejoon Lee > Re-enable TODOs that are resolved from recent Pandas > > > Key: SPARK-46571 > URL: https://issues.apache.org/jira/browse/SPARK-46571 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We can uncomment some TODOs that are already resolved in tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46571) Re-enable TODOs that are resolved from recent Pandas
[ https://issues.apache.org/jira/browse/SPARK-46571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46571. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44568 [https://github.com/apache/spark/pull/44568] > Re-enable TODOs that are resolved from recent Pandas > > > Key: SPARK-46571 > URL: https://issues.apache.org/jira/browse/SPARK-46571 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We can uncomment some TODOs that are already resolved in tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46565) Improve Python data source error classes and messages
[ https://issues.apache.org/jira/browse/SPARK-46565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46565: Assignee: Allison Wang > Improve Python data source error classes and messages > - > > Key: SPARK-46565 > URL: https://issues.apache.org/jira/browse/SPARK-46565 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46565) Improve Python data source error classes and messages
[ https://issues.apache.org/jira/browse/SPARK-46565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46565. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44560 [https://github.com/apache/spark/pull/44560] > Improve Python data source error classes and messages > - > > Key: SPARK-46565 > URL: https://issues.apache.org/jira/browse/SPARK-46565 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44001) Improve parsing of well known wrapper types
[ https://issues.apache.org/jira/browse/SPARK-44001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44001. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43767 [https://github.com/apache/spark/pull/43767] > Improve parsing of well known wrapper types > --- > > Key: SPARK-44001 > URL: https://issues.apache.org/jira/browse/SPARK-44001 > Project: Spark > Issue Type: Improvement > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Parth Upadhyay >Assignee: Parth Upadhyay >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Under `com.google.protobuf`, there are some well known wrapper types for > primitives, > [namely|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto], > useful for distinguishing between absence of primitive fields and their > default values, as well as for use within `google.protobuf.Any` types. These > types are: > {code} > DoubleValue > FloatValue > Int64Value > Uint64Value > Int32Value > Uint32Value > BoolValue > StringValue > BytesValue > {code} > Currently, when we deserialize these from a serialized protobuf into a spark > struct, we expand them as if they were normal messages. Concretely, if we have > {code} > syntax = "proto3"; > import "google/protobuf/wrappers.proto" > message WktExample { > google.protobuf.BoolValue bool_val = 1; > google.protobuf.Int32Value int32_val = 2; > } > {code} > And a message like > {code} > WktExample(true, 100) > {code} > Then the behavior today is to deserialize this as. 
> {code} > {"bool_val": {"value": true}, "int32_val": {"value": 100}} > {code} > This is quite difficult to work with and not in the spirit of the wrapper > type, so it would be nice to deserialize as > {code} > {"bool_val": true, "int32_val": 100} > {code} > This is also the behavior of other popular deserialization libraries, > including java protobuf util > [Jsonformat|https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L904-L914] > and golang's > [jsonpb|https://github.com/gogo/protobuf/blob/master/jsonpb/jsonpb.go#L207-L214]. > So for consistency with other libraries and improved usability, I propose we > deserialize well known types in this way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
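The proposed unwrapping can be pictured with a toy transform over dict-shaped rows (an illustration only; the real conversion happens inside the protobuf connector's schema handling, and production code dispatches on the protobuf message type rather than on the key set):

```python
def unwrap_wrappers(value):
    # Collapse {"value": x} structs (the struct form of the well-known
    # wrapper messages such as BoolValue or Int32Value) into the bare
    # primitive, mirroring JsonFormat/jsonpb behavior.
    # Caveat: a genuine message whose only field happens to be named
    # "value" would also be collapsed by this toy version.
    if isinstance(value, dict):
        if set(value) == {"value"}:
            return value["value"]
        return {k: unwrap_wrappers(v) for k, v in value.items()}
    return value
```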
[jira] [Resolved] (SPARK-46570) Run Python 3.11 and 3.12 test independently
[ https://issues.apache.org/jira/browse/SPARK-46570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46570. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44566 [https://github.com/apache/spark/pull/44566] > Run Python 3.11 and 3.12 test independently > --- > > Key: SPARK-46570 > URL: https://issues.apache.org/jira/browse/SPARK-46570 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46570) Run Python 3.11 and 3.12 test independently
[ https://issues.apache.org/jira/browse/SPARK-46570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46570: Assignee: Dongjoon Hyun > Run Python 3.11 and 3.12 test independently > --- > > Key: SPARK-46570 > URL: https://issues.apache.org/jira/browse/SPARK-46570 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46553) FutureWarning for interpolate with object dtype
[ https://issues.apache.org/jira/browse/SPARK-46553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46553. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44550 [https://github.com/apache/spark/pull/44550] > FutureWarning for interpolate with object dtype > --- > > Key: SPARK-46553 > URL: https://issues.apache.org/jira/browse/SPARK-46553 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > >>> pdf.interpolate() > :1: FutureWarning: DataFrame.interpolate with object dtype is > deprecated and will raise in a future version. Call > obj.infer_objects(copy=False) before interpolating instead. > A B > 0 a 1 > 1 b 2 > 2 c 3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
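On the pandas side, the warning can be sidestepped by interpolating only the numeric columns (a sketch against pandas 2.x; `pdf` mirrors the frame in the report, and this is a workaround, not the Pandas-API-on-Spark fix itself):

```python
import pandas as pd

pdf = pd.DataFrame({"A": ["a", "b", "c"], "B": [1.0, None, 3.0]})

# Interpolate only the numeric columns; the object-dtype column "A" passes
# through untouched, so the deprecated object-dtype path is never hit.
out = pdf.copy()
num_cols = pdf.select_dtypes("number").columns
out[num_cols] = pdf[num_cols].interpolate()
```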
[jira] [Assigned] (SPARK-46553) FutureWarning for interpolate with object dtype
[ https://issues.apache.org/jira/browse/SPARK-46553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46553: Assignee: Haejoon Lee > FutureWarning for interpolate with object dtype > --- > > Key: SPARK-46553 > URL: https://issues.apache.org/jira/browse/SPARK-46553 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > >>> pdf.interpolate() > :1: FutureWarning: DataFrame.interpolate with object dtype is > deprecated and will raise in a future version. Call > obj.infer_objects(copy=False) before interpolating instead. > A B > 0 a 1 > 1 b 2 > 2 c 3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
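The FutureWarning above recommends calling `obj.infer_objects(copy=False)` before interpolating, i.e. converting object-typed entries to a numeric dtype first. A minimal stdlib-only sketch of what that conversion-then-interpolation amounts to (the helper names are illustrative, not part of pandas or Spark):

```python
# Sketch: convert object-typed entries to floats where possible, then
# linearly interpolate interior gaps -- the behaviour pandas keeps for
# numeric dtypes but deprecates for object dtype.
def infer_numeric(values):
    """Best-effort conversion of object values to float (None stays None)."""
    out = []
    for v in values:
        if v is None:
            out.append(None)
        else:
            try:
                out.append(float(v))
            except (TypeError, ValueError):
                return None  # not a numeric column; leave untouched
    return out

def interpolate_linear(values):
    """Fill interior None gaps by linear interpolation between neighbours."""
    result = list(values)
    i = 0
    while i < len(result):
        if result[i] is None:
            j = i
            while j < len(result) and result[j] is None:
                j += 1
            if 0 < i and j < len(result):  # only gaps with both neighbours
                lo, hi = result[i - 1], result[j]
                step = (hi - lo) / (j - i + 1)
                for k in range(i, j):
                    result[k] = lo + step * (k - i + 1)
            i = j
        else:
            i += 1
    return result

nums = infer_numeric(["1", 2, None, 4])   # object-like column
print(interpolate_linear(nums))           # [1.0, 2.0, 3.0, 4.0]
```

This mirrors why pandas deprecates the object-dtype path: interpolation is only well-defined once the values are known to be numeric.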
[jira] [Created] (SPARK-46564) Exclude unrelated files by using omit options properly in PySpark coverage report
Hyukjin Kwon created SPARK-46564: Summary: Exclude unrelated files by using omit options properly in PySpark coverage report Key: SPARK-46564 URL: https://issues.apache.org/jira/browse/SPARK-46564 Project: Spark Issue Type: Bug Components: Project Infra, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon For some reason, the files are not excluded in the PySpark test coverage report (https://app.codecov.io/gh/apache/spark) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
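With coverage.py, exclusions of this kind are usually expressed through the `omit` option. A hedged sketch of the configuration shape (the glob patterns here are illustrative, not the actual patterns used in the Spark repository):

```ini
# .coveragerc -- illustrative only; the real Spark configuration may differ
[run]
omit =
    */pyspark/cloudpickle/*
    */pyspark/tests/*
    */site-packages/*
```

The same patterns can equivalently be passed on the command line via `coverage run --omit=...`; a pattern that does not match the paths as the report tool sees them silently fails to exclude anything, which is one common cause of unrelated files showing up in a report.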
[jira] [Assigned] (SPARK-46557) Refine docstring for DataFrame.schema/explain/printSchema
[ https://issues.apache.org/jira/browse/SPARK-46557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46557: Assignee: Hyukjin Kwon > Refine docstring for DataFrame.schema/explain/printSchema > - > > Key: SPARK-46557 > URL: https://issues.apache.org/jira/browse/SPARK-46557 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46557) Refine docstring for DataFrame.schema/explain/printSchema
[ https://issues.apache.org/jira/browse/SPARK-46557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46557. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44553 [https://github.com/apache/spark/pull/44553] > Refine docstring for DataFrame.schema/explain/printSchema > - > > Key: SPARK-46557 > URL: https://issues.apache.org/jira/browse/SPARK-46557 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46540) Respect column names when Python data source read function outputs named Row objects
[ https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46540. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44531 [https://github.com/apache/spark/pull/44531] > Respect column names when Python data source read function outputs named Row > objects > > > Key: SPARK-46540 > URL: https://issues.apache.org/jira/browse/SPARK-46540 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46540) Respect column names when Python data source read function outputs named Row objects
[ https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46540: Assignee: Allison Wang > Respect column names when Python data source read function outputs named Row > objects > > > Key: SPARK-46540 > URL: https://issues.apache.org/jira/browse/SPARK-46540 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46556) Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
[ https://issues.apache.org/jira/browse/SPARK-46556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46556. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44552 [https://github.com/apache/spark/pull/44552] > Refine docstring for > DataFrame.createGlobalTempView/createOrReplaceGlobalTempView > - > > Key: SPARK-46556 > URL: https://issues.apache.org/jira/browse/SPARK-46556 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46556) Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
[ https://issues.apache.org/jira/browse/SPARK-46556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46556: Assignee: Hyukjin Kwon > Refine docstring for > DataFrame.createGlobalTempView/createOrReplaceGlobalTempView > - > > Key: SPARK-46556 > URL: https://issues.apache.org/jira/browse/SPARK-46556 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46555) Refine docstring for DataFrame.createTempView/createOrReplaceTempView
[ https://issues.apache.org/jira/browse/SPARK-46555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46555: Assignee: Hyukjin Kwon > Refine docstring for DataFrame.createTempView/createOrReplaceTempView > - > > Key: SPARK-46555 > URL: https://issues.apache.org/jira/browse/SPARK-46555 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46555) Refine docstring for DataFrame.createTempView/createOrReplaceTempView
[ https://issues.apache.org/jira/browse/SPARK-46555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46555. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44551 [https://github.com/apache/spark/pull/44551] > Refine docstring for DataFrame.createTempView/createOrReplaceTempView > - > > Key: SPARK-46555 > URL: https://issues.apache.org/jira/browse/SPARK-46555 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46557) Refine docstring for DataFrame.schema/explain/printSchema
[ https://issues.apache.org/jira/browse/SPARK-46557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46557: - Summary: Refine docstring for DataFrame.schema/explain/printSchema (was: Refine docstring for DataFrame.explain/printSchema) > Refine docstring for DataFrame.schema/explain/printSchema > - > > Key: SPARK-46557 > URL: https://issues.apache.org/jira/browse/SPARK-46557 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46557) Refine docstring for DataFrame.explain/printSchema
Hyukjin Kwon created SPARK-46557: Summary: Refine docstring for DataFrame.explain/printSchema Key: SPARK-46557 URL: https://issues.apache.org/jira/browse/SPARK-46557 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46556) Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
Hyukjin Kwon created SPARK-46556: Summary: Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView Key: SPARK-46556 URL: https://issues.apache.org/jira/browse/SPARK-46556 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46555) Refine docstring for DataFrame.createTempView/createOrReplaceTempView
[ https://issues.apache.org/jira/browse/SPARK-46555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46555: - Summary: Refine docstring for DataFrame.createTempView/createOrReplaceTempView (was: Refine docstring for DataFrame.registerTempTable/createTempView/createOrReplaceTempView) > Refine docstring for DataFrame.createTempView/createOrReplaceTempView > - > > Key: SPARK-46555 > URL: https://issues.apache.org/jira/browse/SPARK-46555 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46555) Refine docstring for DataFrame.registerTempTable/createTempView/createOrReplaceTempView
Hyukjin Kwon created SPARK-46555: Summary: Refine docstring for DataFrame.registerTempTable/createTempView/createOrReplaceTempView Key: SPARK-46555 URL: https://issues.apache.org/jira/browse/SPARK-46555 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45914) Support `commit` and `abort` API for Python data source write
[ https://issues.apache.org/jira/browse/SPARK-45914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45914. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44497 [https://github.com/apache/spark/pull/44497] > Support `commit` and `abort` API for Python data source write > - > > Key: SPARK-45914 > URL: https://issues.apache.org/jira/browse/SPARK-45914 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Support `commit` and `abort` API for Python data source write. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45914) Support `commit` and `abort` API for Python data source write
[ https://issues.apache.org/jira/browse/SPARK-45914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45914: Assignee: Allison Wang > Support `commit` and `abort` API for Python data source write > - > > Key: SPARK-45914 > URL: https://issues.apache.org/jira/browse/SPARK-45914 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Support `commit` and `abort` API for Python data source write. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
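The `commit`/`abort` pair described here is a standard two-phase write protocol: tasks write to a staging area and report back, and the driver then publishes or discards the staged results. A stdlib-only sketch of that pattern (the class and method names other than `commit` and `abort` are illustrative, not the exact PySpark data source API):

```python
# Sketch of a commit/abort write protocol: each task writes to a staging
# area and returns a commit message; the driver either commits all staged
# results at once or aborts and cleans them up.
class StagingWriter:
    def __init__(self):
        self.staged = []      # results produced by individual write tasks
        self.committed = []   # results visible only after a successful commit

    def write(self, row):
        """Simulate one task writing a row to staging."""
        self.staged.append(row)
        return {"rows": 1}    # per-task commit message

    def commit(self, messages):
        """All tasks succeeded: publish the staged results atomically."""
        self.committed = list(self.staged)
        self.staged = []
        return sum(m["rows"] for m in messages)

    def abort(self, messages):
        """Some task failed: discard everything written so far."""
        self.staged = []

writer = StagingWriter()
msgs = [writer.write(r) for r in ["a", "b", "c"]]
total = writer.commit(msgs)
print(total, writer.committed)  # 3 ['a', 'b', 'c']
```

The design point is that readers never observe a partially written result: nothing moves out of staging until `commit`, and `abort` leaves the destination untouched.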
[jira] [Assigned] (SPARK-46382) XML: Capture values interspersed between elements
[ https://issues.apache.org/jira/browse/SPARK-46382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46382: Assignee: Shujing Yang > XML: Capture values interspersed between elements > - > > Key: SPARK-46382 > URL: https://issues.apache.org/jira/browse/SPARK-46382 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > > In XML, elements typically consist of a name and a value, with the value > enclosed between the opening and closing tags. But XML also allows arbitrary > values to be interspersed between these elements. To address this, we > provide an option named `valueTags`, which is enabled by default, to capture > these values. Consider the following example: > ``` > > 1 > value1 > > value2 > 2 > value3 > > > ``` > In this example, ``,``, and `` are named elements with their > respective values enclosed within tags. There are arbitrary values value1 > value2 value3 interspersed between the elements. Please note that there can > be multiple occurrences of values in a single element (i.e. there are value2, > value3 in the element ) > > We should parse the values between tags into the valueTags field. If there > are multiple occurrences of value tags, the value tag field will be converted > to an array type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46382) XML: Capture values interspersed between elements
[ https://issues.apache.org/jira/browse/SPARK-46382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46382. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44318 [https://github.com/apache/spark/pull/44318] > XML: Capture values interspersed between elements > - > > Key: SPARK-46382 > URL: https://issues.apache.org/jira/browse/SPARK-46382 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In XML, elements typically consist of a name and a value, with the value > enclosed between the opening and closing tags. But XML also allows arbitrary > values to be interspersed between these elements. To address this, we > provide an option named `valueTags`, which is enabled by default, to capture > these values. Consider the following example: > ``` > > 1 > value1 > > value2 > 2 > value3 > > > ``` > In this example, ``,``, and `` are named elements with their > respective values enclosed within tags. There are arbitrary values value1 > value2 value3 interspersed between the elements. Please note that there can > be multiple occurrences of values in a single element (i.e. there are value2, > value3 in the element ) > > We should parse the values between tags into the valueTags field. If there > are multiple occurrences of value tags, the value tag field will be converted > to an array type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
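The "values interspersed between elements" this ticket describes are what `xml.etree` exposes as an element's `text` (before the first child) and each child's `tail` (after that child's closing tag). A stdlib sketch of collecting them, using a reconstructed document with illustrative tag names (the `valueTags` behaviour itself is the Spark option proposed above, not reproduced here):

```python
import xml.etree.ElementTree as ET

# Interspersed character data lives in .text (before the first child)
# and in each child's .tail (after that child's closing tag).
doc = "<ROW><a>1</a>value1<b>value2<c>2</c>value3</b></ROW>"
root = ET.fromstring(doc)

def interspersed_values(elem):
    """Collect non-tag character data appearing directly inside elem."""
    values = []
    if elem.text and elem.text.strip():
        values.append(elem.text.strip())
    for child in elem:
        if child.tail and child.tail.strip():
            values.append(child.tail.strip())
    return values

print(interspersed_values(root))           # ['value1']
print(interspersed_values(root.find("b"))) # ['value2', 'value3']
```

The second call illustrates the multiple-occurrence case the ticket mentions: one element can carry several interspersed values, which is why a value-tag field may need to become an array type.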
[jira] [Created] (SPARK-46530) Check Python executable when looking up available Data Sources
Hyukjin Kwon created SPARK-46530: Summary: Check Python executable when looking up available Data Sources Key: SPARK-46530 URL: https://issues.apache.org/jira/browse/SPARK-46530 Project: Spark Issue Type: Sub-task Components: PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon When looking up available Data Sources, we should check whether the `python` executable is available on the system. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
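A minimal sketch of the kind of availability check this ticket describes, using the stdlib's `shutil.which` (the actual Spark-side check lives elsewhere in the codebase, so this is illustrative only):

```python
import shutil
import sys

def python_available(executable="python3"):
    """Return the resolved path of the executable, or None if not on PATH."""
    return shutil.which(executable)

# Fall back gracefully instead of failing the data source lookup outright.
path = python_available() or python_available("python")
if path is None:
    print("No Python executable found; skipping Python data source lookup")
else:
    print(f"Using Python executable at {path}")
```

`shutil.which` mirrors the shell's `which`: it honours PATH (and PATHEXT on Windows), which makes it a safer probe than spawning the interpreter just to see whether the spawn fails.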
[jira] [Resolved] (SPARK-45917) Statically register Python Data Source
[ https://issues.apache.org/jira/browse/SPARK-45917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45917. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44504 [https://github.com/apache/spark/pull/44504] > Statically register Python Data Source > -- > > Key: SPARK-45917 > URL: https://issues.apache.org/jira/browse/SPARK-45917 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See the inlined comment in {{DataSourceManager}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45917) Statically register Python Data Source
[ https://issues.apache.org/jira/browse/SPARK-45917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45917: Assignee: Hyukjin Kwon > Statically register Python Data Source > -- > > Key: SPARK-45917 > URL: https://issues.apache.org/jira/browse/SPARK-45917 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See the inlined comment in {{DataSourceManager}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46521) Refine docstring of `array_compact/array_distinct/array_remove`
[ https://issues.apache.org/jira/browse/SPARK-46521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46521. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44506 [https://github.com/apache/spark/pull/44506] > Refine docstring of `array_compact/array_distinct/array_remove` > --- > > Key: SPARK-46521 > URL: https://issues.apache.org/jira/browse/SPARK-46521 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46521) Refine docstring of `array_compact/array_distinct/array_remove`
[ https://issues.apache.org/jira/browse/SPARK-46521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46521: Assignee: Yang Jie > Refine docstring of `array_compact/array_distinct/array_remove` > --- > > Key: SPARK-46521 > URL: https://issues.apache.org/jira/browse/SPARK-46521 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46528) Upgrade zstd-jni to 1.5.5-11
[ https://issues.apache.org/jira/browse/SPARK-46528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46528: Assignee: Dongjoon Hyun > Upgrade zstd-jni to 1.5.5-11 > > > Key: SPARK-46528 > URL: https://issues.apache.org/jira/browse/SPARK-46528 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46528) Upgrade zstd-jni to 1.5.5-11
[ https://issues.apache.org/jira/browse/SPARK-46528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46528. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44515 [https://github.com/apache/spark/pull/44515] > Upgrade zstd-jni to 1.5.5-11 > > > Key: SPARK-46528 > URL: https://issues.apache.org/jira/browse/SPARK-46528 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46520) Support overwrite mode for Python data source write
[ https://issues.apache.org/jira/browse/SPARK-46520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46520: Assignee: Allison Wang > Support overwrite mode for Python data source write > --- > > Key: SPARK-46520 > URL: https://issues.apache.org/jira/browse/SPARK-46520 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Support the `overwrite` mode for Python data source -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46520) Support overwrite mode for Python data source write
[ https://issues.apache.org/jira/browse/SPARK-46520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46520. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44505 [https://github.com/apache/spark/pull/44505] > Support overwrite mode for Python data source write > --- > > Key: SPARK-46520 > URL: https://issues.apache.org/jira/browse/SPARK-46520 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Support the `overwrite` mode for Python data source -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46513) Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*`
[ https://issues.apache.org/jira/browse/SPARK-46513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46513. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44499 [https://github.com/apache/spark/pull/44499] > Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*` > - > > Key: SPARK-46513 > URL: https://issues.apache.org/jira/browse/SPARK-46513 > Project: Spark > Issue Type: Sub-task > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46513) Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*`
[ https://issues.apache.org/jira/browse/SPARK-46513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46513: Assignee: Ruifeng Zheng > Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*` > - > > Key: SPARK-46513 > URL: https://issues.apache.org/jira/browse/SPARK-46513 > Project: Spark > Issue Type: Sub-task > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45559) Support spark.read.schema(...) for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-45559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45559. -- Resolution: Invalid It is supported, and I see some test cases in `PythonDataSourceSuite` > Support spark.read.schema(...) for Python data source API > - > > Key: SPARK-45559 > URL: https://issues.apache.org/jira/browse/SPARK-45559 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > Support `spark.read.schema(...)` for Python data source read. > Add test cases where we send the schema as a string instead of StructType, > and a positive case as well as a negative case where it doesn't parse > successfully with fromDDL? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46503) Move test_default_index to `pyspark.pandas.tests.indexes.*`
[ https://issues.apache.org/jira/browse/SPARK-46503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46503. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44482 [https://github.com/apache/spark/pull/44482] > Move test_default_index to `pyspark.pandas.tests.indexes.*` > --- > > Key: SPARK-46503 > URL: https://issues.apache.org/jira/browse/SPARK-46503 > Project: Spark > Issue Type: Sub-task > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46503) Move test_default_index to `pyspark.pandas.tests.indexes.*`
[ https://issues.apache.org/jira/browse/SPARK-46503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46503: Assignee: Ruifeng Zheng > Move test_default_index to `pyspark.pandas.tests.indexes.*` > --- > > Key: SPARK-46503 > URL: https://issues.apache.org/jira/browse/SPARK-46503 > Project: Spark > Issue Type: Sub-task > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs
[ https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46437. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44393 [https://github.com/apache/spark/pull/44393] > Remove unnecessary cruft from SQL built-in functions docs > - > > Key: SPARK-46437 > URL: https://issues.apache.org/jira/browse/SPARK-46437 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs
[ https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46437: Assignee: Nicholas Chammas > Remove unnecessary cruft from SQL built-in functions docs > - > > Key: SPARK-46437 > URL: https://issues.apache.org/jira/browse/SPARK-46437 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46465) Implement Column.isNaN
[ https://issues.apache.org/jira/browse/SPARK-46465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46465: Assignee: Ruifeng Zheng > Implement Column.isNaN > -- > > Key: SPARK-46465 > URL: https://issues.apache.org/jira/browse/SPARK-46465 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46465) Implement Column.isNaN
[ https://issues.apache.org/jira/browse/SPARK-46465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46465.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44422
[https://github.com/apache/spark/pull/44422]

> Implement Column.isNaN
> ----------------------
>
>                 Key: SPARK-46465
>                 URL: https://issues.apache.org/jira/browse/SPARK-46465
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
[jira] [Assigned] (SPARK-46462) Reorganize `OpsOnDiffFramesGroupByRollingTests`
[ https://issues.apache.org/jira/browse/SPARK-46462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46462:
------------------------------------
    Assignee: Ruifeng Zheng

> Reorganize `OpsOnDiffFramesGroupByRollingTests`
> -----------------------------------------------
>
>                 Key: SPARK-46462
>                 URL: https://issues.apache.org/jira/browse/SPARK-46462
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PS, Tests
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Minor
>              Labels: pull-request-available
>
[jira] [Resolved] (SPARK-46462) Reorganize `OpsOnDiffFramesGroupByRollingTests`
[ https://issues.apache.org/jira/browse/SPARK-46462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46462.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44420
[https://github.com/apache/spark/pull/44420]

> Reorganize `OpsOnDiffFramesGroupByRollingTests`
> -----------------------------------------------
>
>                 Key: SPARK-46462
>                 URL: https://issues.apache.org/jira/browse/SPARK-46462
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PS, Tests
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
[jira] [Resolved] (SPARK-46463) Reorganize `OpsOnDiffFramesGroupByExpandingTests`
[ https://issues.apache.org/jira/browse/SPARK-46463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46463.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44421
[https://github.com/apache/spark/pull/44421]

> Reorganize `OpsOnDiffFramesGroupByExpandingTests`
> -------------------------------------------------
>
>                 Key: SPARK-46463
>                 URL: https://issues.apache.org/jira/browse/SPARK-46463
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PS, Tests
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
[jira] [Assigned] (SPARK-46463) Reorganize `OpsOnDiffFramesGroupByExpandingTests`
[ https://issues.apache.org/jira/browse/SPARK-46463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46463:
------------------------------------
    Assignee: Ruifeng Zheng

> Reorganize `OpsOnDiffFramesGroupByExpandingTests`
> -------------------------------------------------
>
>                 Key: SPARK-46463
>                 URL: https://issues.apache.org/jira/browse/SPARK-46463
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PS, Tests
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Minor
>              Labels: pull-request-available
>
[jira] [Resolved] (SPARK-46413) Validate returnType of Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46413.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44362
[https://github.com/apache/spark/pull/44362]

> Validate returnType of Arrow Python UDF
> ---------------------------------------
>
>                 Key: SPARK-46413
>                 URL: https://issues.apache.org/jira/browse/SPARK-46413
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Xinrong Meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> Validate returnType of Arrow Python UDF