[jira] [Commented] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844808#comment-17844808 ] Hyukjin Kwon commented on SPARK-48094: -- Woohoo! > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 4.0.0 > > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
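The concurrency targets quoted above map onto standard GitHub Actions workflow keys (`concurrency` groups and `strategy.max-parallel`). A hedged sketch of a workflow fragment that stays under the policy limits — job names, matrix values, and the test script are illustrative assumptions, not Spark's actual configuration:

```yaml
# Hypothetical workflow fragment illustrating the ASF concurrency targets.
name: build
on: [push, pull_request]

# Cancel superseded runs of the same ref so stale jobs stop counting
# against the project's runner-minute allowance.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # Keep the whole matrix at or below the SHOULD target of 15
      # concurrent jobs (the MUST ceiling is 20).
      max-parallel: 15
      matrix:
        module: [core, sql, python]
    steps:
      - uses: actions/checkout@v4
      - run: ./dev/run-tests --module ${{ matrix.module }}  # illustrative script
```

`concurrency.group`, `cancel-in-progress`, and `strategy.max-parallel` are documented workflow-syntax keys; only the job layout here is invented for illustration.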
[jira] [Resolved] (SPARK-48205) Remove the private[sql] modifier for Python data sources
[ https://issues.apache.org/jira/browse/SPARK-48205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48205. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46487 [https://github.com/apache/spark/pull/46487] > Remove the private[sql] modifier for Python data sources > > > Key: SPARK-48205 > URL: https://issues.apache.org/jira/browse/SPARK-48205 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To make it consistent with UDFs and UDTFs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48205) Remove the private[sql] modifier for Python data sources
[ https://issues.apache.org/jira/browse/SPARK-48205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48205: Assignee: Allison Wang > Remove the private[sql] modifier for Python data sources > > > Key: SPARK-48205 > URL: https://issues.apache.org/jira/browse/SPARK-48205 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > To make it consistent with UDFs and UDTFs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48087: Assignee: Hyukjin Kwon > Python UDTF incompatibility in 3.5 client <> 4.0 server > --- > > Key: SPARK-48087 > URL: https://issues.apache.org/jira/browse/SPARK-48087 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > FAIL [0.103s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > self._check_result_or_exception(TestUDTF, ret_type, expected) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", > line 598, in _check_result_or_exception > with self.assertRaisesRegex(err_type, expected): > AssertionError: "AttributeError" does not match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1391, in mapper > yield eval(*[a[o] for o in args_kwargs_offsets]) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1371, in evaluate > return tuple(map(verify_and_convert_result, res)) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1340, in verify_and_convert_result > return toInternal(result) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1291, in toInternal > return tuple( >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1292, in > f.toInternal(v) if c else v > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 907, in toInternal > return self.dataType.toInternal(obj) >^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 372, in toInternal > calendar.timegm(dt.utctimetuple()) if dt.tzinfo else > time.mktime(dt.timetuple()) > ..." 
> {code} > {code} > == > FAIL [0.096s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 946, in read_udtf > raise PySparkRuntimeError( > pyspark.errors.exceptions.base.PySparkRuntimeError: >
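The tracebacks above bottom out in `pyspark/sql/types.py`, where `TimestampType.toInternal` converts a datetime with the stdlib expression `calendar.timegm(dt.utctimetuple()) if dt.tzinfo else time.mktime(dt.timetuple())`. A minimal sketch of that conversion (simplified from the logic visible in the trace, not the exact Spark implementation):

```python
import calendar
import time
from datetime import datetime, timezone

def timestamp_to_internal(dt: datetime) -> int:
    """Convert a datetime to microseconds since the epoch, mirroring the
    stdlib-based branch shown in the pyspark/sql/types.py traceback above
    (illustrative sketch only)."""
    if dt.tzinfo:
        # Timezone-aware datetimes: read the UTC time tuple directly.
        seconds = calendar.timegm(dt.utctimetuple())
    else:
        # Naive datetimes: interpret in the local timezone.
        seconds = time.mktime(dt.timetuple())
    return int(seconds) * 1_000_000 + dt.microsecond

# An aware UTC datetime converts deterministically regardless of the host TZ:
epoch_plus_1s = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
```

The aware/naive branch is exactly where a 3.5 client and 4.0 server can disagree if their conversion logic diverges, which is the kind of cross-version mismatch this sub-task tracks.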
[jira] [Resolved] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48087. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46473 [https://github.com/apache/spark/pull/46473] > Python UDTF incompatibility in 3.5 client <> 4.0 server > --- > > Key: SPARK-48087 > URL: https://issues.apache.org/jira/browse/SPARK-48087 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > FAIL [0.103s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > self._check_result_or_exception(TestUDTF, ret_type, expected) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", > line 598, in _check_result_or_exception > with self.assertRaisesRegex(err_type, expected): > AssertionError: "AttributeError" does not match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1391, in mapper > yield eval(*[a[o] for o in args_kwargs_offsets]) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1371, in evaluate > return tuple(map(verify_and_convert_result, res)) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1340, in verify_and_convert_result > return toInternal(result) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1291, in toInternal > return tuple( >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1292, in > f.toInternal(v) if c else v > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 907, in toInternal > return self.dataType.toInternal(obj) >^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 372, in toInternal > calendar.timegm(dt.utctimetuple()) if dt.tzinfo else > time.mktime(dt.timetuple()) > ..." 
> {code} > {code} > == > FAIL [0.096s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > > ^^^ > File >
[jira] [Resolved] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48094. -- Assignee: Dongjoon Hyun Resolution: Done Seems like we're done :-)? I will resolve this one for now but feel free to reopen if there is more work to be done! > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48094: - Fix Version/s: 4.0.0 > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 4.0.0 > > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48163: - Fix Version/s: (was: 4.0.0) > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-48163: -- Assignee: (was: Dongjoon Hyun) > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844618#comment-17844618 ] Hyukjin Kwon commented on SPARK-48163: -- reverted in https://github.com/apache/spark/commit/bd896cac168aa5793413058ca706c73705edbf96 > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48164) Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48164. -- Resolution: Invalid > Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > -- > > Key: SPARK-48164 > URL: https://issues.apache.org/jira/browse/SPARK-48164 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48164) Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48164: - Target Version/s: (was: 4.0.0) > Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > -- > > Key: SPARK-48164 > URL: https://issues.apache.org/jira/browse/SPARK-48164 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48193) Make `maven-deploy-plugin` retry 3 times
[ https://issues.apache.org/jira/browse/SPARK-48193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48193. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46471 [https://github.com/apache/spark/pull/46471] > Make `maven-deploy-plugin` retry 3 times > > > Key: SPARK-48193 > URL: https://issues.apache.org/jira/browse/SPARK-48193 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
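maven-deploy-plugin exposes a documented `retryFailedDeploymentCount` parameter for exactly this purpose. A pom fragment along the lines the issue title suggests might look like the following — the retry count of 3 comes from the title, but the exact placement in Spark's pom.xml is an assumption, not a quote from the merged change:

```xml
<!-- Hypothetical pom.xml fragment: retry a failed artifact deployment
     up to 3 times. retryFailedDeploymentCount is a documented
     maven-deploy-plugin configuration parameter. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-deploy-plugin</artifactId>
  <configuration>
    <retryFailedDeploymentCount>3</retryFailedDeploymentCount>
  </configuration>
</plugin>
```

The same knob can be set ad hoc on the command line with `-Dretry.failed.deployment.count=3` in recent plugin versions; retrying absorbs transient repository outages during release publishing.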
[jira] [Assigned] (SPARK-48193) Make `maven-deploy-plugin` retry 3 times
[ https://issues.apache.org/jira/browse/SPARK-48193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48193: Assignee: BingKun Pan > Make `maven-deploy-plugin` retry 3 times > > > Key: SPARK-48193 > URL: https://issues.apache.org/jira/browse/SPARK-48193 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48192) Enable TPC-DS tests in forked repository
[ https://issues.apache.org/jira/browse/SPARK-48192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48192. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46470 [https://github.com/apache/spark/pull/46470] > Enable TPC-DS tests in forked repository > > > Key: SPARK-48192 > URL: https://issues.apache.org/jira/browse/SPARK-48192 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > TPC-DS is pretty important in SQL. We should at least enable it in forked > repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
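Workflow runs in a fork draw on the fork owner's GitHub Actions quota rather than the shared ASF allowance, so a heavy suite like TPC-DS can be gated to run only in forks. A hedged sketch of that pattern — the job name and test script are illustrative, not Spark's actual workflow:

```yaml
jobs:
  tpcds:
    # Run only in forked repositories (PR builders), so the job consumes
    # the fork owner's quota instead of the ASF project allowance.
    if: github.repository != 'apache/spark'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./dev/run-tpcds-tests  # illustrative script name
```

The job-level `if:` condition with the `github.repository` context is a standard GitHub Actions construct for distinguishing the canonical repository from forks.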
[jira] [Assigned] (SPARK-48192) Enable TPC-DS tests in forked repository
[ https://issues.apache.org/jira/browse/SPARK-48192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48192: Assignee: Hyukjin Kwon > Enable TPC-DS tests in forked repository > > > Key: SPARK-48192 > URL: https://issues.apache.org/jira/browse/SPARK-48192 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > TPC-DS is pretty important in SQL. We should at least enable it in forked > repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48192) Enable TPC-DS tests in forked repository
Hyukjin Kwon created SPARK-48192: Summary: Enable TPC-DS tests in forked repository Key: SPARK-48192 URL: https://issues.apache.org/jira/browse/SPARK-48192 Project: Spark Issue Type: Sub-task Components: Project Infra, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon TPC-DS is pretty important in SQL. We should at least enable it in forked repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48045: Assignee: Saidatt Sinai Amonkar > Pandas API groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Assignee: Saidatt Sinai Amonkar >Priority: Minor > Labels: pull-request-available > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include group keys in the resulting DataFrame. This diverges from > expected behavior as well as from the behavior of native Pandas, e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48045. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46391 [https://github.com/apache/spark/pull/46391] > Pandas API groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Assignee: Saidatt Sinai Amonkar >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include group keys in the resulting DataFrame. This diverges from > expected behavior as well as from the behavior of native Pandas, e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
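The fix suggested in the report — prepend the groupby key columns to `order` and `columns` before filtering — can be sketched over plain column-name lists. This is a minimal illustration of the reordering idea only, not the actual code in `pyspark/pandas/groupby.py`:

```python
def prepend_group_keys(group_keys, agg_columns):
    """Return the output column order with groupby keys first, as the
    as_index=False fix suggests: key columns are prepended to the
    aggregation output instead of being dropped. Plain-list sketch."""
    # Keys come first; aggregation columns follow, minus any duplicates.
    return list(group_keys) + [c for c in agg_columns if c not in group_keys]

# With the example from the report: grouping by "a" and aggregating b_max,
# the as_index=False output should expose both "a" and "b_max".
columns = prepend_group_keys(["a"], ["b_max"])
```

Applied to the report's example, this ordering yields the expected frame with the `a` key column preserved alongside `b_max`, matching native pandas behavior.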
[jira] [Resolved] (SPARK-48086) Different Arrow versions in client and server
[ https://issues.apache.org/jira/browse/SPARK-48086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48086. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46431 [https://github.com/apache/spark/pull/46431] > Different Arrow versions in client and server > -- > > Key: SPARK-48086 > URL: https://issues.apache.org/jira/browse/SPARK-48086 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > FAIL [1.071s]: test_pandas_udf_arrow_overflow > (pyspark.sql.tests.connect.test_parity_pandas_udf.PandasUDFParityTests.test_pandas_udf_arrow_overflow) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > return 
ArrowStreamSerializer.dump_stream(self, > init_stream_yield_batches(), stream) > > > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 104, in dump_stream > for batch in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 525, in init_stream_yield_batches > batch = self._create_batch(series) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 511, in _create_batch > arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast)) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 330, in _create_array > raise PySparkValueError(error_msg % (series.dtype, series.na... > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", > line 299, in test_pandas_udf_arrow_overflow > with self.assertRaisesRegex( > AssertionError: "Exception thrown when converting pandas.Series" does not > match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File >
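The `ArrowInvalid: Integer value 128 not in range: -128 to 127` failure above is a signed 8-bit overflow raised while converting a pandas Series to an Arrow `int8` array, and the two Arrow versions surface it differently. The invariant Arrow enforces can be illustrated in plain Python — a stdlib sketch of the range check, not PyArrow's implementation (which runs natively in C++):

```python
def check_int8_range(values):
    """Raise ValueError for any value outside the signed 8-bit range,
    mirroring the overflow check behind the ArrowInvalid error in the
    traceback above (illustrative sketch only)."""
    lo, hi = -128, 127  # bounds of a signed 8-bit integer (Arrow int8)
    for v in values:
        if not (lo <= v <= hi):
            raise ValueError(f"Integer value {v} not in range: {lo} to {hi}")
    return values

# Values within range pass through unchanged; 128 would raise.
ok = check_int8_range([0, 127, -128])
```

Because the check happens inside the Python worker, a client pinned to one PyArrow version can see a different error message than the server expects, which is why the parity test's `assertRaisesRegex` pattern fails to match.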
[jira] [Assigned] (SPARK-48086) Different Arrow versions in client and server
[ https://issues.apache.org/jira/browse/SPARK-48086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48086: Assignee: Hyukjin Kwon > Different Arrow versions in client and server > -- > > Key: SPARK-48086 > URL: https://issues.apache.org/jira/browse/SPARK-48086 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > FAIL [1.071s]: test_pandas_udf_arrow_overflow > (pyspark.sql.tests.connect.test_parity_pandas_udf.PandasUDFParityTests.test_pandas_udf_arrow_overflow) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > return ArrowStreamSerializer.dump_stream(self, > init_stream_yield_batches(), stream) > > > File > 
"/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 104, in dump_stream > for batch in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 525, in init_stream_yield_batches > batch = self._create_batch(series) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 511, in _create_batch > arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast)) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 330, in _create_array > raise PySparkValueError(error_msg % (series.dtype, series.na... > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", > line 299, in test_pandas_udf_arrow_overflow > with self.assertRaisesRegex( > AssertionError: "Exception thrown when converting pandas.Series" does not > match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > Traceback (most recent call last): > File >
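The ArrowInvalid in the report above is Arrow enforcing the signed 8-bit value range while converting a pandas Series to an int8 array. The same range check can be reproduced in miniature with only the standard library (this sketch uses `struct`, not Arrow itself, so it is an analogy rather than the actual code path):

```python
import struct

# struct's "b" format packs a signed 8-bit integer: the same -128..127
# range that Arrow's int8 conversion enforces in the traceback above.
in_range = struct.pack("b", 127)      # fits: b"\x7f"

overflow_caught = False
try:
    struct.pack("b", 128)             # one past the int8 maximum
except struct.error:
    overflow_caught = True            # analogous to ArrowInvalid above
```

The test failure itself is not the overflow (that is expected) but the error message changing shape between Arrow versions, so the 3.5 test's assertRaisesRegex pattern no longer matches what the 4.0 server returns.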
[jira] [Resolved] (SPARK-48113) Allow Plugins to integrate with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48113. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46364 [https://github.com/apache/spark/pull/46364] > Allow Plugins to integrate with Spark Connect > - > > Key: SPARK-48113 > URL: https://issues.apache.org/jira/browse/SPARK-48113 > Project: Spark > Issue Type: Story > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48167) Skip known behaviour change by SPARK-46122
[ https://issues.apache.org/jira/browse/SPARK-48167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48167. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46430 [https://github.com/apache/spark/pull/46430] > Skip known behaviour change by SPARK-46122 > -- > > Key: SPARK-48167 > URL: https://issues.apache.org/jira/browse/SPARK-48167 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > >
[jira] [Assigned] (SPARK-48167) Skip known behaviour change by SPARK-46122
[ https://issues.apache.org/jira/browse/SPARK-48167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48167: Assignee: Hyukjin Kwon > Skip known behaviour change by SPARK-46122 > -- > > Key: SPARK-48167 > URL: https://issues.apache.org/jira/browse/SPARK-48167 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-48167) Skip known behaviour change by SPARK-46122
[ https://issues.apache.org/jira/browse/SPARK-48167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48167: - Affects Version/s: 3.5.2 (was: 4.0.0) > Skip known behaviour change by SPARK-46122 > -- > > Key: SPARK-48167 > URL: https://issues.apache.org/jira/browse/SPARK-48167 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Priority: Major >
[jira] [Created] (SPARK-48167) Skip known behaviour change by SPARK-46122
Hyukjin Kwon created SPARK-48167: Summary: Skip known behaviour change by SPARK-46122 Key: SPARK-48167 URL: https://issues.apache.org/jira/browse/SPARK-48167 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-48090) Streaming exception catch failure in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48090. -- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46426 [https://github.com/apache/spark/pull/46426] > Streaming exception catch failure in 3.5 client <> 4.0 server > - > > Key: SPARK-48090 > URL: https://issues.apache.org/jira/browse/SPARK-48090 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {code} > == > FAIL [1.975s]: test_stream_exception > (pyspark.sql.tests.connect.streaming.test_parity_streaming.StreamingParityTests.test_stream_exception) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", > line 287, in test_stream_exception > sq.processAllAvailable() > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 129, in processAllAvailable > self._execute_streaming_query_cmd(cmd) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 177, in _execute_streaming_query_cmd > (_, properties) = self._session.client.execute_command(exec_cmd) > ^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 982, in execute_command > data, _, _, _, properties = self._execute_and_fetch(req) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1283, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1264, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1503, in _handle_error > self._handle_rpc_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1539, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.StreamingQueryException: [STREAM_FAILED] > Query [id = 38d0d145-1f57-4b92-b317-d9de727d9468, runId = > 2b963119-d391-4c62-abea-970274859b80] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 79.0 (TID 116) > (fv-az1144-341.tm43j05r3bqe3lauap1nzddazg.ex.internal.cloudapp.net executor > driver): org.apache.spark.api.python.PythonException: Traceback (most recent > call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1734, in mapper > result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1734, in > result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) >^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 112, in > return args_kwargs_offsets, lambda *a: func(*a) > > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line > 118, in wrapper > return f(*args, **kwargs) >
[jira] [Assigned] (SPARK-48090) Streaming exception catch failure in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48090: Assignee: Hyukjin Kwon > Streaming exception catch failure in 3.5 client <> 4.0 server > - > > Key: SPARK-48090 > URL: https://issues.apache.org/jira/browse/SPARK-48090 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > FAIL [1.975s]: test_stream_exception > (pyspark.sql.tests.connect.streaming.test_parity_streaming.StreamingParityTests.test_stream_exception) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", > line 287, in test_stream_exception > sq.processAllAvailable() > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 129, in processAllAvailable > self._execute_streaming_query_cmd(cmd) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 177, in _execute_streaming_query_cmd > (_, properties) = self._session.client.execute_command(exec_cmd) > ^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 982, in execute_command > data, _, _, _, properties = self._execute_and_fetch(req) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1283, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1264, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1503, in _handle_error > self._handle_rpc_error(error) > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1539, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.StreamingQueryException: [STREAM_FAILED] > Query [id = 38d0d145-1f57-4b92-b317-d9de727d9468, runId = > 2b963119-d391-4c62-abea-970274859b80] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 79.0 (TID 116) > (fv-az1144-341.tm43j05r3bqe3lauap1nzddazg.ex.internal.cloudapp.net executor > driver): org.apache.spark.api.python.PythonException: Traceback (most recent > call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1734, in mapper > result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1734, in > result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) >^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 112, in > return args_kwargs_offsets, lambda *a: func(*a) > > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/util.py", 
line > 118, in wrapper > return f(*args, **kwargs) >^^ > File "/home/runner/work/spark/spark-3 > During handling of the above exception, another exception occurred: > Traceback (most recent
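The SPARK-48090 failure comes down to whether the original ZeroDivisionError is still discoverable in the exception chain that a 4.0 server reports back to a 3.5 client. A minimal sketch of walking a Python exception chain for a message follows; the helper name here is hypothetical and is not PySpark's actual `_assert_exception_tree_contains_msg` implementation:

```python
def chain_contains(exc, msg):
    # Walk the __cause__/__context__ links that "raise ... from ..." builds,
    # looking for msg in any message or exception type name along the chain.
    while exc is not None:
        if msg in str(exc) or msg in type(exc).__name__:
            return True
        exc = exc.__cause__ or exc.__context__
    return False

# The wrapped-failure pattern from the log: a ZeroDivisionError raised
# inside a task, surfaced to the caller as a higher-level error.
try:
    try:
        1 / 0
    except ZeroDivisionError as inner:
        raise RuntimeError("query terminated") from inner
except RuntimeError as outer:
    found = chain_contains(outer, "ZeroDivisionError")
```

If the server flattens the chain into a single message string before sending it over the wire, a walker like this finds nothing, which is the `AssertionError: False is not true` seen in the test.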
[jira] [Assigned] (SPARK-48084) pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48084: Assignee: Weichen Xu > pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server > - > > Key: SPARK-48084 > URL: https://issues.apache.org/jira/browse/SPARK-48084 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Weichen Xu >Priority: Major > > {code} > == > ERROR [3.966s]: test_regressor_evaluator > (pyspark.ml.tests.connect.test_connect_evaluation.EvaluationTestsOnConnect.test_regressor_evaluator) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_evaluation.py", > line 69, in test_regressor_evaluator > rmse = rmse_evaluator.evaluate(df1) > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 255, in evaluate > return self._evaluate(dataset) >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", > line 70, in _evaluate > return aggregate_dataframe( > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", > line 93, in aggregate_dataframe > state = cloudpickle.loads(state) > > AttributeError: Can't get attribute '_class_setstate' on 'pyspark.cloudpickle.cloudpickle' from > '/home/runner/work/spark/spark-3.5/python/pyspark/cloudpickle/cloudpickle.py'> > -- > {code} > {code} > == > ERROR [4.664s]: test_copy > (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_copy) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", > line 115, in test_copy > cvModel = cv.fit(dataset) > ^^^ > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 106, in fit > return self._fit(dataset) >^^ > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 437, in _fit > for j, metric in pool.imap_unordered(lambda f: f(), tasks): > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", > line 873, in next > raise value > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", > line 125, in worker > result = (True, func(*args, **kwds)) > ^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 437, in > for j, metric in pool.imap_unordered(lambda f: f(), tasks): >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 188, in single_task > metric = evaluator.evaluate( > ^^^ > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 255, in evaluate > return self._evaluate(dataset) >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", > line 70, in _evaluate > return aggregate_dataframe( > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", > line 93, in aggregate_dataframe > state = cloudpickle.loads(state) > > AttributeError: Can't get attribute '_class_setstate' on 'pyspark.cloudpickle.cloudpickle' from > '/home/runner/work/spark/spark-3.5/python/pyspark/cloudpickle/cloudpickle.py'> > {code} > {code} > == > ERROR [3.938s]: test_fit_minimize_metric > (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_fit_minimize_metric) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", > line 149, in test_fit_minimize_metric > cvModel = cv.fit(dataset) > ^^^ > File
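The AttributeError in SPARK-48084 is a pickle version-skew failure: the 4.0 server serializes state that refers by name to a cloudpickle helper (`_class_setstate`) which the 3.5 client's bundled cloudpickle does not define. The shape of the failure can be reproduced with the stdlib `pickle` module alone; the module built below is a throwaway stand-in, not real cloudpickle:

```python
import pickle
import sys
import types

# Create a fake module with one helper, pickle a by-name reference to it,
# then delete the helper to simulate an older library on the unpickling
# side. Names are made up for illustration only.
mod = types.ModuleType("fake_cloudpickle")
sys.modules["fake_cloudpickle"] = mod
exec("def _class_setstate(obj, state):\n    return obj", mod.__dict__)

payload = pickle.dumps(mod._class_setstate)  # stored as module.attribute

del mod._class_setstate  # the "3.5 side" never shipped this helper
error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc  # same failure shape as "Can't get attribute ..." above
```

This is why mixed-version Spark Connect deployments need either a compatible cloudpickle on both sides or serialized state that avoids version-specific helpers.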
[jira] [Assigned] (SPARK-48083) session.copyFromLocalToFs failure with 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48083: Assignee: Weichen Xu > session.copyFromLocalToFs failure with 3.5 client <> 4.0 server > --- > > Key: SPARK-48083 > URL: https://issues.apache.org/jira/browse/SPARK-48083 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Weichen Xu >Priority: Major > Labels: pull-request-available > > {code} > == > ERROR [1.120s]: test_save_load > (pyspark.ml.tests.connect.test_connect_classification.ClassificationTestsOnConnect.test_save_load) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py", > line 144, in test_save_load > estimator.save(fs_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 248, in save > _copy_dir_from_local_to_fs(tmp_local_dir, path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 57, in _copy_dir_from_local_to_fs > _copy_file_from_local_to_fs(file_path, dest_file_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 39, in _copy_file_from_local_to_fs > session.copyFromLocalToFs(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/session.py", > line 756, in copyFromLocalToFs > self._client.copy_from_local_to_fs(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1549, in copy_from_local_to_fs > self._artifact_manager._add_forward_to_fs_artifacts(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 280, in _add_forward_to_fs_artifacts > self._request_add_artifacts(requests) > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 259, in _request_add_artifacts > response: proto.AddArtifactsResponse = self._retrieve_responses(requests) >^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 256, in _retrieve_responses > return self._stub.AddArtifacts(requests, metadata=self._metadata) >^^ > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", > line 1536, in __call__ > return _end_unary_response_blocking(state, call, False, None) >^^ > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", > line 1006, in _end_unary_response_blocking > raise _InactiveRpcError(state) # pytype: disable=not-instantiable > ^^ > grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated > with: > status = StatusCode.INTERNAL > details = "Uploading artifact file to local file system destination > path is not supported." > debug_error_string = "UNKNOWN:Error received from peer > {grpc_message:"Uploading artifact file to local file system destination path > is not supported.", grpc_status:13, > created_time:"2024-05-01T03:01:32.[558](https://github.com/HyukjinKwon/spark/actions/runs/8904629949/job/24454181142#step:9:559)489983+00:00"}" > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48083) session.copyFromLocalToFs failure with 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48083. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46419 [https://github.com/apache/spark/pull/46419] > session.copyFromLocalToFs failure with 3.5 client <> 4.0 server > --- > > Key: SPARK-48083 > URL: https://issues.apache.org/jira/browse/SPARK-48083 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Weichen Xu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > ERROR [1.120s]: test_save_load > (pyspark.ml.tests.connect.test_connect_classification.ClassificationTestsOnConnect.test_save_load) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py", > line 144, in test_save_load > estimator.save(fs_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 248, in save > _copy_dir_from_local_to_fs(tmp_local_dir, path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 57, in _copy_dir_from_local_to_fs > _copy_file_from_local_to_fs(file_path, dest_file_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 39, in _copy_file_from_local_to_fs > session.copyFromLocalToFs(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/session.py", > line 756, in copyFromLocalToFs > self._client.copy_from_local_to_fs(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1549, in copy_from_local_to_fs > self._artifact_manager._add_forward_to_fs_artifacts(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 
280, in _add_forward_to_fs_artifacts > self._request_add_artifacts(requests) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 259, in _request_add_artifacts > response: proto.AddArtifactsResponse = self._retrieve_responses(requests) >^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 256, in _retrieve_responses > return self._stub.AddArtifacts(requests, metadata=self._metadata) >^^ > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", > line 1536, in __call__ > return _end_unary_response_blocking(state, call, False, None) >^^ > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", > line 1006, in _end_unary_response_blocking > raise _InactiveRpcError(state) # pytype: disable=not-instantiable > ^^ > grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated > with: > status = StatusCode.INTERNAL > details = "Uploading artifact file to local file system destination > path is not supported." > debug_error_string = "UNKNOWN:Error received from peer > {grpc_message:"Uploading artifact file to local file system destination path > is not supported.", grpc_status:13, > created_time:"2024-05-01T03:01:32.[558](https://github.com/HyukjinKwon/spark/actions/runs/8904629949/job/24454181142#step:9:559)489983+00:00"}" > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
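The INTERNAL gRPC error in SPARK-48083 is the 4.0 server refusing to forward an uploaded artifact to a destination that resolves to its own local file system. A hedged sketch of the kind of scheme check involved is below; `is_local_destination` is illustrative only and not Spark's actual implementation:

```python
from urllib.parse import urlparse

def is_local_destination(dest_path: str) -> bool:
    # Bare paths and file:// URIs both point at the server's local file
    # system; schemes like hdfs:// or s3a:// point at remote storage.
    scheme = urlparse(dest_path).scheme
    return scheme in ("", "file")

local = is_local_destination("/tmp/model")                  # True
remote = is_local_destination("hdfs://namenode:8020/m1")    # False
```

A 3.5 client calling `session.copyFromLocalToFs` with a bare path therefore trips the server-side rejection seen in the log, while a genuinely remote destination would not.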
[jira] [Resolved] (SPARK-48084) pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48084. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46419 [https://github.com/apache/spark/pull/46419] > pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server > - > > Key: SPARK-48084 > URL: https://issues.apache.org/jira/browse/SPARK-48084 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Weichen Xu >Priority: Major > Fix For: 3.5.2 > > > {code} > == > ERROR [3.966s]: test_regressor_evaluator > (pyspark.ml.tests.connect.test_connect_evaluation.EvaluationTestsOnConnect.test_regressor_evaluator) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_evaluation.py", > line 69, in test_regressor_evaluator > rmse = rmse_evaluator.evaluate(df1) > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 255, in evaluate > return self._evaluate(dataset) >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", > line 70, in _evaluate > return aggregate_dataframe( > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", > line 93, in aggregate_dataframe > state = cloudpickle.loads(state) > > AttributeError: Can't get attribute '_class_setstate' on 'pyspark.cloudpickle.cloudpickle' from > '/home/runner/work/spark/spark-3.5/python/pyspark/cloudpickle/cloudpickle.py'> > -- > {code} > {code} > == > ERROR [4.664s]: test_copy > (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_copy) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", > line 115, in test_copy > cvModel = cv.fit(dataset) > ^^^ > File 
"/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 106, in fit > return self._fit(dataset) >^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 437, in _fit > for j, metric in pool.imap_unordered(lambda f: f(), tasks): > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", > line 873, in next > raise value > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", > line 125, in worker > result = (True, func(*args, **kwds)) > ^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 437, in > for j, metric in pool.imap_unordered(lambda f: f(), tasks): >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 188, in single_task > metric = evaluator.evaluate( > ^^^ > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 255, in evaluate > return self._evaluate(dataset) >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", > line 70, in _evaluate > return aggregate_dataframe( > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", > line 93, in aggregate_dataframe > state = cloudpickle.loads(state) > > AttributeError: Can't get attribute '_class_setstate' on 'pyspark.cloudpickle.cloudpickle' from > '/home/runner/work/spark/spark-3.5/python/pyspark/cloudpickle/cloudpickle.py'> > {code} > {code} > == > ERROR [3.938s]: test_fit_minimize_metric > (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_fit_minimize_metric) > -- > Traceback (most recent call last): > File >
[jira] [Updated] (SPARK-48090) Streaming exception catch failure in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48090: - Description: {code} == FAIL [1.975s]: test_stream_exception (pyspark.sql.tests.connect.streaming.test_parity_streaming.StreamingParityTests.test_stream_exception) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 287, in test_stream_exception sq.processAllAvailable() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 129, in processAllAvailable self._execute_streaming_query_cmd(cmd) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 177, in _execute_streaming_query_cmd (_, properties) = self._session.client.execute_command(exec_cmd) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 982, in execute_command data, _, _, _, properties = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.StreamingQueryException: [STREAM_FAILED] Query [id = 38d0d145-1f57-4b92-b317-d9de727d9468, runId = 2b963119-d391-4c62-abea-970274859b80] terminated with exception: Job aborted due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
79.0 (TID 116) (fv-az1144-341.tm43j05r3bqe3lauap1nzddazg.ex.internal.cloudapp.net executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 224, in dump_stream self.serializer.dump_stream(self._batched(iterator), stream) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 145, in dump_stream for obj in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 213, in _batched for item in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1734, in mapper result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) ^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1734, in result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) ^^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 112, in return args_kwargs_offsets, lambda *a: func(*a) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 118, in wrapper return f(*args, **kwargs) ^^ File "/home/runner/work/spark/spark-3 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 291, in test_stream_exception self._assert_exception_tree_contains_msg(e, "ZeroDivisionError") File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 300, in _assert_exception_tree_contains_msg 
self._assert_exception_tree_contains_msg_connect(exception, msg) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 305, in _assert_exception_tree_contains_msg_connect self.assertTrue( AssertionError: False is not true : Exception tree doesn't contain the expected message: ZeroDivisionError
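The failing assertion above walks an exception and its chained causes looking for an expected substring ("ZeroDivisionError"). A minimal plain-Python sketch of that kind of check (hypothetical helper name, not the actual PySpark test code):

```python
def exception_tree_contains(exc, msg):
    """Return True if msg appears in exc, or in any exception chained
    beneath it via __cause__ / __context__ (the "exception tree")."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if msg in str(exc) or msg in type(exc).__name__:
            return True
        # Prefer the explicit cause ('raise ... from e'); fall back to
        # the implicit context set while handling another exception.
        exc = exc.__cause__ or exc.__context__
    return False

try:
    try:
        1 / 0
    except ZeroDivisionError as e:
        raise RuntimeError("query terminated") from e
except RuntimeError as outer:
    # The parity test expects the same to hold for the server-side error;
    # in the 3.5 client <> 4.0 server combination the chain is cut off,
    # so the expected message is never found and the assertion fails.
    assert exception_tree_contains(outer, "ZeroDivisionError")
```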
[jira] [Resolved] (SPARK-48147) Remove all client listeners when local spark session is deleted
[ https://issues.apache.org/jira/browse/SPARK-48147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48147. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46406 [https://github.com/apache/spark/pull/46406] > Remove all client listeners when local spark session is deleted > --- > > Key: SPARK-48147 > URL: https://issues.apache.org/jira/browse/SPARK-48147 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48035) try_add/try_multiply should not be semantic equal to add/multiply
[ https://issues.apache.org/jira/browse/SPARK-48035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48035: Assignee: Gengliang Wang > try_add/try_multiply should not be semantic equal to add/multiply > - > > Key: SPARK-48035 > URL: https://issues.apache.org/jira/browse/SPARK-48035 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > In the current implementation, the following code will return true > {code:java} > val l1 = Literal(1) > val l2 = Literal(2) > val l3 = Literal(3) > val expr1 = Add(Add(l1, l2), l3) > val expr2 = Add(Add(l2, l1, EvalMode.TRY), l3) > expr1.semanticEquals(expr2) {code} > The same applies to Multiply. > When creating MultiCommutativeOp for Add/Multiply, we should ensure all the > evalMode are consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48035) try_add/try_multiply should not be semantic equal to add/multiply
[ https://issues.apache.org/jira/browse/SPARK-48035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48035. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46307 [https://github.com/apache/spark/pull/46307] > try_add/try_multiply should not be semantic equal to add/multiply > - > > Key: SPARK-48035 > URL: https://issues.apache.org/jira/browse/SPARK-48035 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In the current implementation, the following code will return true > {code:java} > val l1 = Literal(1) > val l2 = Literal(2) > val l3 = Literal(3) > val expr1 = Add(Add(l1, l2), l3) > val expr2 = Add(Add(l2, l1, EvalMode.TRY), l3) > expr1.semanticEquals(expr2) {code} > The same applies to Multiply. > When creating MultiCommutativeOp for Add/Multiply, we should ensure all the > evalMode are consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
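The `{code}` snippet above shows the bug: canonicalizing a commutative Add/Multiply ignored the eval mode, so a TRY-mode expression compared semantically equal to an ANSI one. A toy Python model of the fix, assuming canonicalization builds an order-insensitive key from the operands (an illustration only, not Catalyst's actual implementation):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Add:
    operands: Tuple[int, ...]  # stand-ins for child expressions
    eval_mode: str = "ANSI"    # "ANSI", "LEGACY", or "TRY"

    def canonical(self):
        # Sorting the operands is what makes a commutative operation
        # order-insensitive (a + b == b + a). The fix is keeping the
        # eval mode in the key too: try_add(a, b) must NOT equal add(a, b).
        return ("Add", tuple(sorted(self.operands)), self.eval_mode)

def semantic_equals(x, y):
    return x.canonical() == y.canonical()

assert semantic_equals(Add((1, 2)), Add((2, 1)))             # commutativity preserved
assert not semantic_equals(Add((1, 2)), Add((2, 1), "TRY"))  # differing eval modes
```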
[jira] [Resolved] (SPARK-48112) Expose session in SparkConnectPlanner to plugin developers
[ https://issues.apache.org/jira/browse/SPARK-48112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48112. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46363 [https://github.com/apache/spark/pull/46363] > Expose session in SparkConnectPlanner to plugin developers > -- > > Key: SPARK-48112 > URL: https://issues.apache.org/jira/browse/SPARK-48112 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47777: Assignee: (was: Chaoqin Li) > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS, Tests >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Priority: Major > Labels: pull-request-available > > Make python streaming data source pyspark test also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-47777: -- Reverted at https://github.com/apache/spark/commit/4e69857195a6f95c22f962e3eed950876036c04f > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS, Tests >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Make python streaming data source pyspark test also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47777: - Fix Version/s: (was: 4.0.0) > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS, Tests >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > > Make python streaming data source pyspark test also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48111) Move tpcds-1g and docker-integration-tests to daily scheduled jobs
Hyukjin Kwon created SPARK-48111: Summary: Move tpcds-1g and docker-integration-tests to daily scheduled jobs Key: SPARK-48111 URL: https://issues.apache.org/jira/browse/SPARK-48111 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0
[ https://issues.apache.org/jira/browse/SPARK-48088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48088: Assignee: Hyukjin Kwon > Skip tests being failed in client 3.5 <> server 4.0 > --- > > Key: SPARK-48088 > URL: https://issues.apache.org/jira/browse/SPARK-48088 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We should skip, and set the CI first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0
[ https://issues.apache.org/jira/browse/SPARK-48088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48088. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46334 [https://github.com/apache/spark/pull/46334] > Skip tests being failed in client 3.5 <> server 4.0 > --- > > Key: SPARK-48088 > URL: https://issues.apache.org/jira/browse/SPARK-48088 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > We should skip, and set the CI first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48054) Backward compatibility test for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48054: Assignee: Hyukjin Kwon > Backward compatibility test for Spark Connect > - > > Key: SPARK-48054 > URL: https://issues.apache.org/jira/browse/SPARK-48054 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Now that we can run the Spark Connect server separately in CI, we can run the > Spark Connect server of lower version, and higher version of client, and the > opposite as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48054) Backward compatibility test for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48054. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46298 [https://github.com/apache/spark/pull/46298] > Backward compatibility test for Spark Connect > - > > Key: SPARK-48054 > URL: https://issues.apache.org/jira/browse/SPARK-48054 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Now that we can run the Spark Connect server separately in CI, we can run the > Spark Connect server of lower version, and higher version of client, and the > opposite as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48056) [CONNECT][PYTHON] Session not found error should automatically retry during reattach
[ https://issues.apache.org/jira/browse/SPARK-48056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48056. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46297 [https://github.com/apache/spark/pull/46297] > [CONNECT][PYTHON] Session not found error should automatically retry during > reattach > > > Key: SPARK-48056 > URL: https://issues.apache.org/jira/browse/SPARK-48056 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.4.3 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When an OPERATION_NOT_FOUND error is raised and no prior responses were > received, the client retries the ExecutePlan RPC: > [https://github.com/apache/spark/blob/e6217c111fbdd73f202400494c42091e93d3041f/python/pyspark/sql/connect/client/reattach.py#L257] > > Another error SESSION_NOT_FOUND should follow the same logic. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
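The retry behavior described above, extended to both error classes, can be sketched in plain Python (hypothetical names; the real client logic lives in `reattach.py`):

```python
# Error classes that are safe to retry when the stream produced no
# responses yet (the server lost the operation/session state).
RETRYABLE_WHEN_NO_RESPONSES = {"OPERATION_NOT_FOUND", "SESSION_NOT_FOUND"}

class RpcError(Exception):
    def __init__(self, error_class):
        super().__init__(error_class)
        self.error_class = error_class

def execute_with_reattach(send, max_retries=3):
    """Iterate over send(); if a retryable error arrives before any
    response was received, re-issue the call instead of failing."""
    attempt = 0
    while True:
        received_any = False
        try:
            for response in send():
                received_any = True
                yield response
            return
        except RpcError as e:
            attempt += 1
            if (received_any
                    or attempt >= max_retries
                    or e.error_class not in RETRYABLE_WHEN_NO_RESPONSES):
                raise

# First call dies with SESSION_NOT_FOUND before yielding anything,
# the second succeeds; the caller never sees the transient error.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RpcError("SESSION_NOT_FOUND")
    yield "result"

assert list(execute_with_reattach(flaky)) == ["result"]
assert calls["n"] == 2
```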
[jira] [Created] (SPARK-48090) Streaming exception catch failure in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48090: Summary: Streaming exception catch failure in 3.5 client <> 4.0 server Key: SPARK-48090 URL: https://issues.apache.org/jira/browse/SPARK-48090 Project: Spark Issue Type: Sub-task Components: PySpark, Structured Streaming Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == FAIL [1.975s]: test_stream_exception (pyspark.sql.tests.connect.streaming.test_parity_streaming.StreamingParityTests.test_stream_exception) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 287, in test_stream_exception sq.processAllAvailable() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 129, in processAllAvailable self._execute_streaming_query_cmd(cmd) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 177, in _execute_streaming_query_cmd (_, properties) = self._session.client.execute_command(exec_cmd) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 982, in execute_command data, _, _, _, properties = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.StreamingQueryException: [STREAM_FAILED] Query [id =
38d0d145-1f57-4b92-b317-d9de727d9468, runId = 2b963119-d391-4c62-abea-970274859b80] terminated with exception: Job aborted due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent failure: Lost task 0.0 in stage 79.0 (TID 116) (fv-az1144-341.tm43j05r3bqe3lauap1nzddazg.ex.internal.cloudapp.net executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 224, in dump_stream self.serializer.dump_stream(self._batched(iterator), stream) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 145, in dump_stream for obj in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 213, in _batched for item in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1734, in mapper result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) ^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1734, in result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) ^^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 112, in return args_kwargs_offsets, lambda *a: func(*a) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 118, in wrapper return f(*args, **kwargs) ^^ File "/home/runner/work/spark/spark-3 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 291, in test_stream_exception 
self._assert_exception_tree_contains_msg(e, "ZeroDivisionError") File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 300, in _assert_exception_tree_contains_msg
[jira] [Created] (SPARK-48089) Streaming query listener not working in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48089: Summary: Streaming query listener not working in 3.5 client <> 4.0 server Key: SPARK-48089 URL: https://issues.apache.org/jira/browse/SPARK-48089 Project: Spark Issue Type: Sub-task Components: PySpark, Structured Streaming Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == ERROR [1.488s]: test_listener_events (pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests.test_listener_events) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/streaming/test_parity_listener.py", line 53, in test_listener_events self.spark.streams.addListener(test_listener) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 244, in addListener self._execute_streaming_query_manager_cmd(cmd) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 260, in _execute_streaming_query_manager_cmd (_, properties) = self._session.client.execute_command(exec_cmd) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 982, in execute_command data, _, _, _, properties = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.SparkConnectGrpcException: (java.io.EOFException) -- {code} -- This message was sent by Atlassian 
Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48085) ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48085. -- Resolution: Invalid We actually made this all compatible in master branch. I will avoid backporting them because they are just tests. > ANSI enabled by default brings different results in the tests in 3.5 client > <> 4.0 server > - > > Key: SPARK-48085 > URL: https://issues.apache.org/jira/browse/SPARK-48085 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > == > FAIL [0.169s]: test_checking_csv_header > (pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests.test_checking_csv_header) > -- > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered > error while reading file > file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. > SQLSTATE: KD001 > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_datasources.py", > line 167, > in test_checking_csv_header > self.assertRaisesRegex( > AssertionError: "CSV header does not conform to the schema" does not match > "(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered > error while reading file > file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv.
> SQLSTATE: KD001" > {code} > {code} > == > ERROR [0.059s]: test_large_variable_types > (pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests.test_large_variable_types) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_map.py", > line 115, in test_large_variable_types > actual = df.mapInPandas(func, "str string, bin binary").collect() > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", > line 1645, in collect > table, schema = self._session.client.to_table(query) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 858, in to_table > table, schema, _, _, _ = self._execute_and_fetch(req) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1283, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1264, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1503, in _handle_error > self._handle_rpc_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1539, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.IllegalArgumentException: > [INVALID_PARAMETER_VALUE.CHARSET] The value of parameter(s) `charset` in > `encode` is invalid: expects one of the charsets 'US-ASCII', 'ISO-8859-1', > 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got utf8. 
SQLSTATE: > 22023 > {code} > {code} > == > ERROR [0.024s]: test_assert_approx_equal_decimaltype_custom_rtol_pass > (pyspark.sql.tests.connect.test_utils.ConnectUtilsTests.test_assert_approx_equal_decimaltype_custom_rtol_pass) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_utils.py", > line 279, in test_assert_approx_equal_decimaltype_custom_rtol_pass >
[jira] [Updated] (SPARK-48085) ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48085: - Description: {code} == FAIL [0.169s]: test_checking_csv_header (pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests.test_checking_csv_header) -- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. SQLSTATE: KD001 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_datasources.py", line 167, in test_checking_csv_header self.assertRaisesRegex( AssertionError: "CSV header does not conform to the schema" does not match "(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv.
SQLSTATE: KD001" {code} {code} == ERROR [0.059s]: test_large_variable_types (pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests.test_large_variable_types) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_map.py", line 115, in test_large_variable_types actual = df.mapInPandas(func, "str string, bin binary").collect() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table table, schema, _, _, _ = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.IllegalArgumentException: [INVALID_PARAMETER_VALUE.CHARSET] The value of parameter(s) `charset` in `encode` is invalid: expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got utf8. 
SQLSTATE: 22023 {code} {code} == ERROR [0.024s]: test_assert_approx_equal_decimaltype_custom_rtol_pass (pyspark.sql.tests.connect.test_utils.ConnectUtilsTests.test_assert_approx_equal_decimaltype_custom_rtol_pass) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_utils.py", line 279, in test_assert_approx_equal_decimaltype_custom_rtol_pass assertDataFrameEqual(df1, df2, rtol=1e-1) File "/home/runner/work/spark/spark-3.5/python/pyspark/testing/utils.py", line 595, in assertDataFrameEqual actual_list = actual.collect() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table table, schema, _, _, _ = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for
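The charset failure quoted above comes from stricter validation on the 4.0 server: only canonical charset names are accepted, so the alias `utf8` sent by the 3.5-era test is rejected. A minimal sketch of that kind of allow-list validation (illustrative only, not Spark's implementation):

```python
# Canonical charset names, taken from the error message above.
SUPPORTED_CHARSETS = {"US-ASCII", "ISO-8859-1", "UTF-8",
                      "UTF-16BE", "UTF-16LE", "UTF-16"}

def validate_charset(name):
    # Strict validation: the exact canonical name is required, so the
    # common alias "utf8" is rejected instead of being normalized.
    if name not in SUPPORTED_CHARSETS:
        raise ValueError(
            f"invalid charset {name!r}; expected one of {sorted(SUPPORTED_CHARSETS)}")
    return name

assert validate_charset("UTF-8") == "UTF-8"
raised = False
try:
    validate_charset("utf8")  # the value the old test sends
except ValueError:
    raised = True
assert raised
```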
[jira] [Updated] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0
[ https://issues.apache.org/jira/browse/SPARK-48088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48088: - Affects Version/s: 3.5.2 (was: 4.0.0) > Skip tests being failed in client 3.5 <> server 4.0 > --- > > Key: SPARK-48088 > URL: https://issues.apache.org/jira/browse/SPARK-48088 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Priority: Major > > We should skip, and set the CI first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0
Hyukjin Kwon created SPARK-48088: Summary: Skip tests being failed in client 3.5 <> server 4.0 Key: SPARK-48088 URL: https://issues.apache.org/jira/browse/SPARK-48088 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We should skip, and set the CI first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48087: Summary: Python UDTF incompatibility in 3.5 client <> 4.0 server Key: SPARK-48087 URL: https://issues.apache.org/jira/browse/SPARK-48087 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == FAIL [0.103s]: test_udtf_init_with_additional_args (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args) -- pyspark.errors.exceptions.connect.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type) self._check_result_or_exception(TestUDTF, ret_type, expected) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", line 598, in _check_result_or_exception with self.assertRaisesRegex(err_type, expected): AssertionError: "AttributeError" does not match " An exception was thrown from the Python worker. Please see the stack trace below. 
Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 224, in dump_stream self.serializer.dump_stream(self._batched(iterator), stream) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 145, in dump_stream for obj in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 213, in _batched for item in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1391, in mapper yield eval(*[a[o] for o in args_kwargs_offsets]) ^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1371, in evaluate return tuple(map(verify_and_convert_result, res)) ^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1340, in verify_and_convert_result return toInternal(result) ^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1291, in toInternal return tuple( ^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1292, in f.toInternal(v) if c else v ^^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 907, in toInternal return self.dataType.toInternal(obj) ^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 372, in toInternal calendar.timegm(dt.utctimetuple()) if dt.tzinfo else time.mktime(dt.timetuple()) ..." 
{code} {code} == FAIL [0.096s]: test_udtf_init_with_additional_args (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args) -- pyspark.errors.exceptions.connect.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type) ^^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 946, in read_udtf raise PySparkRuntimeError( pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD] Failed to evaluate the user-defined table function 'TestUDTF' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again. During handling of the above exception, another exception occurred: Traceback (most recent call last): File
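For context on the `[UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD]` failure above: the 4.0 server rejects a UDTF whose constructor takes extra arguments unless the class defines an `analyze` method. A stdlib-only sketch of that rule as the error message states it (`check_udtf_constructor` is a hypothetical helper written for illustration; the real validation lives in `read_udtf` in `pyspark/worker.py`):

```python
import inspect

def check_udtf_constructor(cls) -> bool:
    # Mirrors the Spark 4.0 server-side rule quoted in the error above:
    # without an 'analyze' method, __init__ may only accept 'self'.
    if hasattr(cls, "analyze"):
        return True
    params = inspect.signature(cls.__init__).parameters
    return len(params) <= 1  # only 'self'

class OldStyleUDTF:  # extra ctor arg, no 'analyze' -> rejected by the 4.0 server
    def __init__(self, analyze_result=None):
        pass

class NewStyleUDTF:  # plain ctor -> accepted
    def __init__(self):
        pass

class AnalyzedUDTF:  # extra ctor arg is fine when 'analyze' exists
    def __init__(self, analyze_result):
        pass
    @staticmethod
    def analyze(*args):
        pass

print(check_udtf_constructor(OldStyleUDTF))  # False
print(check_udtf_constructor(NewStyleUDTF))  # True
print(check_udtf_constructor(AnalyzedUDTF))  # True
```

This is why a UDTF that passed on a 3.5 server can fail against a 4.0 server: the stricter constructor check fires before the function body ever runs.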
[jira] [Created] (SPARK-48085) ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48085: Summary: ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server Key: SPARK-48085 URL: https://issues.apache.org/jira/browse/SPARK-48085 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == FAIL [0.169s]: test_checking_csv_header (pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests.test_checking_csv_header) -- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. SQLSTATE: KD001 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_datasources.py", line 167, in test_checking_csv_header self.assertRaisesRegex( AssertionError: "CSV header does not conform to the schema" does not match "(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. 
SQLSTATE: KD001" {code} {code} == ERROR [0.059s]: test_large_variable_types (pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests.test_large_variable_types) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_map.py", line 115, in test_large_variable_types actual = df.mapInPandas(func, "str string, bin binary").collect() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table table, schema, _, _, _ = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.IllegalArgumentException: [INVALID_PARAMETER_VALUE.CHARSET] The value of parameter(s) `charset` in `encode` is invalid: expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got utf8. 
SQLSTATE: 22023 {code} {code} == ERROR [0.024s]: test_assert_approx_equal_decimaltype_custom_rtol_pass (pyspark.sql.tests.connect.test_utils.ConnectUtilsTests.test_assert_approx_equal_decimaltype_custom_rtol_pass) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_utils.py", line 279, in test_assert_approx_equal_decimaltype_custom_rtol_pass assertDataFrameEqual(df1, df2, rtol=1e-1) File "/home/runner/work/spark/spark-3.5/python/pyspark/testing/utils.py", line 595, in assertDataFrameEqual actual_list = actual.collect() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table
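The `INVALID_PARAMETER_VALUE.CHARSET` failure above is a naming mismatch: the 3.5 test passes the Python-style alias `utf8`, while the 4.0 server validates against the Java-style charset names listed in the error. A hedged, stdlib-only sketch of a client-side normalization shim (`normalize_charset` is a hypothetical helper, not a PySpark API; the accepted-charset list is taken from the error message above):

```python
import codecs

# Charsets Spark 4.0's `encode` accepts, per the error message above.
SPARK_CHARSETS = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"}

# Canonical Python codec names whose spelling differs from the Java-style names.
_ALIASES = {
    "ASCII": "US-ASCII",
    "ISO8859-1": "ISO-8859-1",
    "UTF-16-BE": "UTF-16BE",
    "UTF-16-LE": "UTF-16LE",
}

def normalize_charset(name: str) -> str:
    # Resolve any Python alias ('utf8', 'latin1', ...) to its canonical
    # codec name, then map it onto the spelling Spark expects.
    canonical = codecs.lookup(name).name.upper()  # e.g. 'utf8' -> 'UTF-8'
    canonical = _ALIASES.get(canonical, canonical)
    if canonical not in SPARK_CHARSETS:
        raise ValueError(f"charset {name!r} is not supported by Spark's encode()")
    return canonical

print(normalize_charset("utf8"))    # UTF-8
print(normalize_charset("latin1"))
```

With such a shim, `df.select(encode(col("s"), normalize_charset("utf8")))` would send a name the 4.0 server accepts, instead of failing at execution time.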
[jira] [Created] (SPARK-48084) pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48084: Summary: pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server Key: SPARK-48084 URL: https://issues.apache.org/jira/browse/SPARK-48084 Project: Spark Issue Type: Sub-task Components: ML, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == ERROR [3.966s]: test_regressor_evaluator (pyspark.ml.tests.connect.test_connect_evaluation.EvaluationTestsOnConnect.test_regressor_evaluator) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_evaluation.py", line 69, in test_regressor_evaluator rmse = rmse_evaluator.evaluate(df1) File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", line 255, in evaluate return self._evaluate(dataset) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", line 70, in _evaluate return aggregate_dataframe( File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", line 93, in aggregate_dataframe state = cloudpickle.loads(state) AttributeError: Can't get attribute '_class_setstate' on -- {code} {code} == ERROR [4.664s]: test_copy (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_copy) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", line 115, in test_copy cvModel = cv.fit(dataset) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", line 106, in fit return self._fit(dataset) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line 437, in _fit for j, metric in pool.imap_unordered(lambda f: f(), tasks): File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", line 873, in next raise value File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) ^^^ File 
"/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line 437, in for j, metric in pool.imap_unordered(lambda f: f(), tasks): ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line 188, in single_task metric = evaluator.evaluate( ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", line 255, in evaluate return self._evaluate(dataset) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", line 70, in _evaluate return aggregate_dataframe( File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", line 93, in aggregate_dataframe state = cloudpickle.loads(state) AttributeError: Can't get attribute '_class_setstate' on {code} {code} == ERROR [3.938s]: test_fit_minimize_metric (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_fit_minimize_metric) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", line 149, in test_fit_minimize_metric cvModel = cv.fit(dataset) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", line 106, in fit return self._fit(dataset) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line 437, in _fit for j, metric in pool.imap_unordered(lambda f: f(), tasks): File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", line 873, in next raise value File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line
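The `Can't get attribute '_class_setstate'` errors above are the classic pickle-by-reference failure: the client serializes a reference to an internal cloudpickle helper that the server's cloudpickle version does not define (names like `_class_setstate` are internal and unstable across releases). A minimal stdlib reproduction of the mechanism — `fake_cloudpickle` is a stand-in module for illustration, not the real library:

```python
import pickle
import sys
import types

# Stand-in for the client's cloudpickle module, defining an internal helper.
mod = types.ModuleType("fake_cloudpickle")
sys.modules["fake_cloudpickle"] = mod

def _class_setstate(obj):
    return obj

_class_setstate.__module__ = "fake_cloudpickle"
_class_setstate.__qualname__ = "_class_setstate"
mod._class_setstate = _class_setstate

# "Client" side: the function is pickled by reference (module + attribute name),
# not by value, so the bytes only work if the receiver has the same attribute.
payload = pickle.dumps(_class_setstate)

# "Server" side: its cloudpickle version lacks the helper.
del mod._class_setstate
try:
    pickle.loads(payload)
except AttributeError as exc:
    print(exc)  # Can't get attribute '_class_setstate' on <module 'fake_cloudpickle'>
```

This is why mixing a 3.5 client with a 4.0 server breaks `aggregate_dataframe`: the pickled state references helpers from one cloudpickle version that the other side cannot resolve.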
[jira] [Created] (SPARK-48083) session.copyFromLocalToFs failure with 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48083: Summary: session.copyFromLocalToFs failure with 3.5 client <> 4.0 server Key: SPARK-48083 URL: https://issues.apache.org/jira/browse/SPARK-48083 Project: Spark Issue Type: Sub-task Components: Connect, ML, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == ERROR [1.120s]: test_save_load (pyspark.ml.tests.connect.test_connect_classification.ClassificationTestsOnConnect.test_save_load) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py", line 144, in test_save_load estimator.save(fs_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", line 248, in save _copy_dir_from_local_to_fs(tmp_local_dir, path) File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", line 57, in _copy_dir_from_local_to_fs _copy_file_from_local_to_fs(file_path, dest_file_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", line 39, in _copy_file_from_local_to_fs session.copyFromLocalToFs(local_path, dest_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/session.py", line 756, in copyFromLocalToFs self._client.copy_from_local_to_fs(local_path, dest_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1549, in copy_from_local_to_fs self._artifact_manager._add_forward_to_fs_artifacts(local_path, dest_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", line 280, in _add_forward_to_fs_artifacts self._request_add_artifacts(requests) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", line 259, in _request_add_artifacts response: proto.AddArtifactsResponse = self._retrieve_responses(requests) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", line 256, in _retrieve_responses return 
self._stub.AddArtifacts(requests, metadata=self._metadata) ^^ File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", line 1536, in __call__ return _end_unary_response_blocking(state, call, False, None) ^^ File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking raise _InactiveRpcError(state) # pytype: disable=not-instantiable ^^ grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.INTERNAL details = "Uploading artifact file to local file system destination path is not supported." debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Uploading artifact file to local file system destination path is not supported.", grpc_status:13, created_time:"2024-05-01T03:01:32.558489983+00:00"}" {code}
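The gRPC `INTERNAL` status above comes from the 4.0 server refusing a local-filesystem destination for `copyFromLocalToFs`. One way a test could guard against this before calling the API, sketched with a hypothetical helper (`is_remote_fs_path` is illustrative, not part of PySpark):

```python
from urllib.parse import urlparse

def is_remote_fs_path(dest: str) -> bool:
    # The Spark Connect server rejects plain local destinations, so only
    # treat URIs with a non-local scheme (hdfs://, s3a://, ...) as usable
    # targets for copyFromLocalToFs. Heuristic for illustration only.
    scheme = urlparse(dest).scheme
    return scheme not in ("", "file")

print(is_remote_fs_path("/tmp/model"))            # False -> server raises INTERNAL
print(is_remote_fs_path("file:/tmp/model"))       # False
print(is_remote_fs_path("hdfs://nn:9000/model"))  # True
```

A test like `test_save_load` could skip (or pick a remote `fs_path`) when this check fails, rather than hitting the `AddArtifacts` RPC and failing with the opaque `_InactiveRpcError`.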
[jira] [Created] (SPARK-48086) Different Arrow versions in client and server
Hyukjin Kwon created SPARK-48086: Summary: Different Arrow versions in client and server Key: SPARK-48086 URL: https://issues.apache.org/jira/browse/SPARK-48086 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == FAIL [1.071s]: test_pandas_udf_arrow_overflow (pyspark.sql.tests.connect.test_parity_pandas_udf.PandasUDFParityTests.test_pandas_udf_arrow_overflow) -- pyspark.errors.exceptions.connect.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 302, in _create_array return pa.Array.from_pandas( ^ File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas File "pyarrow/array.pxi", line 323, in pyarrow.lib.array File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 531, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 104, in dump_stream for batch in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 525, in init_stream_yield_batches batch = self._create_batch(series) ^^ File 
"/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 511, in _create_batch arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast)) ^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 330, in _create_array raise PySparkValueError(error_msg % (series.dtype, series.na... During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", line 299, in test_pandas_udf_arrow_overflow with self.assertRaisesRegex( AssertionError: "Exception thrown when converting pandas.Series" does not match " An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 302, in _create_array return pa.Array.from_pandas( ^ File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas File "pyarrow/array.pxi", line 323, in pyarrow.lib.array File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 531, in dump_stream Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", line 279, in test_pandas_udf_detect_unsafe_type_conversion with 
self.assertRaisesRegex( AssertionError: "Exception thrown when converting pandas.Series" does not match " An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line
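The `ArrowInvalid: Integer value 128 not in range: -128 to 127` failures come from Arrow range-checking values against the declared SQL type (here a signed byte) during serialization. The bounds involved, sketched in plain Python (`fits` is a hypothetical helper; the real check happens inside `pa.Array.from_pandas` in `pyspark/sql/pandas/serializers.py`):

```python
# Signed-integer ranges that the Arrow conversion enforces when a pandas UDF
# declares e.g. "byte" as its return type.
RANGES = {
    "byte":  (-2**7,  2**7 - 1),    # -128 .. 127
    "short": (-2**15, 2**15 - 1),
    "int":   (-2**31, 2**31 - 1),
    "long":  (-2**63, 2**63 - 1),
}

def fits(value: int, sql_type: str) -> bool:
    lo, hi = RANGES[sql_type]
    return lo <= value <= hi

print(fits(127, "byte"))  # True
print(fits(128, "byte"))  # False -> pyarrow raises ArrowInvalid
```

The test failure itself is a message mismatch, not a behavior change: both client and server reject the overflow, but different Arrow versions wrap the error differently, so the 3.5 test's `assertRaisesRegex` pattern no longer matches.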
[jira] [Updated] (SPARK-48054) Backward compatibility test for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48054: - Parent: SPARK-48082 Issue Type: Sub-task (was: Improvement) > Backward compatibility test for Spark Connect > - > > Key: SPARK-48054 > URL: https://issues.apache.org/jira/browse/SPARK-48054 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Now that we can run the Spark Connect server separately in CI, we can run a > lower-version Spark Connect server against a higher-version client, and vice > versa.
[jira] [Created] (SPARK-48082) Recover compatibility with Spark Connect client 3.5 <> Spark Connect server 4.0
Hyukjin Kwon created SPARK-48082: Summary: Recover compatibility with Spark Connect client 3.5 <> Spark Connect server 4.0 Key: SPARK-48082 URL: https://issues.apache.org/jira/browse/SPARK-48082 Project: Spark Issue Type: Umbrella Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/pull/46298#issuecomment-2087905857 Test failures were identified when running the Spark 3.5 Spark Connect client against the Spark Connect server 4.0. They should ideally be compatible.
[jira] [Updated] (SPARK-45988) Fix `pyspark.pandas.tests.computation.test_apply_func` in Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-45988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45988: - Fix Version/s: 3.5.2 > Fix `pyspark.pandas.tests.computation.test_apply_func` in Python 3.11 > - > > Key: SPARK-45988 > URL: https://issues.apache.org/jira/browse/SPARK-45988 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > https://github.com/apache/spark/actions/runs/6914662405/job/18812759697 > {code} > == > ERROR [0.686s]: test_apply_batch_with_type > (pyspark.pandas.tests.computation.test_apply_func.FrameApplyFunctionTests.test_apply_batch_with_type) > -- > Traceback (most recent call last): > File > "/__w/spark/spark/python/pyspark/pandas/tests/computation/test_apply_func.py", > line 248, in test_apply_batch_with_type > def identify3(x) -> ps.DataFrame[float, [int, List[int]]]: > ^ > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13540, in > __class_getitem__ > return create_tuple_for_frame_type(params) >^^^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 721, in create_tuple_for_frame_type > return Tuple[_to_type_holders(params)] > > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 766, in _to_type_holders > data_types = _new_type_holders(data_types, NameTypeHolder) > ^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 832, in _new_type_holders > raise TypeError( > TypeError: Type hints should be specified as one of: > - DataFrame[type, type, ...] > - DataFrame[name: type, name: type, ...] 
> - DataFrame[dtypes instance] > - DataFrame[zip(names, types)] > - DataFrame[index_type, [type, ...]] > - DataFrame[(index_name, index_type), [(name, type), ...]] > - DataFrame[dtype instance, dtypes instance] > - DataFrame[(index_name, index_type), zip(names, types)] > - DataFrame[[index_type, ...], [type, ...]] > - DataFrame[[(index_name, index_type), ...], [(name, type), ...]] > - DataFrame[dtypes instance, dtypes instance] > - DataFrame[zip(index_names, index_types), zip(names, types)] > However, got (<class 'int'>, typing.List[int]). > -- > Ran 10 tests in 34.327s > FAILED (errors=1) > {code}
[jira] [Updated] (SPARK-45989) Fix `pyspark.pandas.tests.connect.computation.test_parity_apply_func` in Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-45989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45989: - Fix Version/s: 3.5.2 > Fix `pyspark.pandas.tests.connect.computation.test_parity_apply_func` in > Python 3.11 > > > Key: SPARK-45989 > URL: https://issues.apache.org/jira/browse/SPARK-45989 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 4.0.0, 3.5.2 > > > https://github.com/apache/spark/actions/runs/6914662405/job/18816505612 > {code} > == > ERROR [1.237s]: test_apply_batch_with_type > (pyspark.pandas.tests.connect.computation.test_parity_apply_func.FrameParityApplyFunctionTests.test_apply_batch_with_type) > -- > Traceback (most recent call last): > File > "/__w/spark/spark/python/pyspark/pandas/tests/computation/test_apply_func.py", > line 248, in test_apply_batch_with_type > def identify3(x) -> ps.DataFrame[float, [int, List[int]]]: > ^ > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13540, in > __class_getitem__ > return create_tuple_for_frame_type(params) >^^^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 721, in create_tuple_for_frame_type > return Tuple[_to_type_holders(params)] > > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 766, in _to_type_holders > data_types = _new_type_holders(data_types, NameTypeHolder) > ^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 832, in _new_type_holders > raise TypeError( > TypeError: Type hints should be specified as one of: > - DataFrame[type, type, ...] > - DataFrame[name: type, name: type, ...] 
> - DataFrame[dtypes instance] > - DataFrame[zip(names, types)] > - DataFrame[index_type, [type, ...]] > - DataFrame[(index_name, index_type), [(name, type), ...]] > - DataFrame[dtype instance, dtypes instance] > - DataFrame[(index_name, index_type), zip(names, types)] > - DataFrame[[index_type, ...], [type, ...]] > - DataFrame[[(index_name, index_type), ...], [(name, type), ...]] > - DataFrame[dtypes instance, dtypes instance] > - DataFrame[zip(index_names, index_types), zip(names, types)] > However, got (<class 'int'>, typing.List[int]). > -- > Ran 10 tests in 78.247s > FAILED (errors=1) > {code}
[jira] [Resolved] (SPARK-48075) Enforce type checking in from_avro and to_avro functions
[ https://issues.apache.org/jira/browse/SPARK-48075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48075. -- Fix Version/s: 4.0.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/46324 > Enforce type checking in from_avro and to_avro functions > > > Key: SPARK-48075 > URL: https://issues.apache.org/jira/browse/SPARK-48075 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Fanyue Xia >Assignee: Fanyue Xia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Because types are not enforced in Python, users can pass in all sorts of > arguments for functions. This could lead to invoking the wrong functions in Spark > Connect. > If we perform type checking for arguments and output sensible errors when the > types of arguments passed into the functions don't match, we can give the user > a better user experience.
[jira] [Assigned] (SPARK-48064) Improve error messages for routine related errors
[ https://issues.apache.org/jira/browse/SPARK-48064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48064: Assignee: Allison Wang > Improve error messages for routine related errors > - > > Key: SPARK-48064 > URL: https://issues.apache.org/jira/browse/SPARK-48064 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-48064) Improve error messages for routine related errors
[ https://issues.apache.org/jira/browse/SPARK-48064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48064. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46310 [https://github.com/apache/spark/pull/46310] > Improve error messages for routine related errors > - > > Key: SPARK-48064 > URL: https://issues.apache.org/jira/browse/SPARK-48064 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-43727) Parity returnType check in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43727. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46300 [https://github.com/apache/spark/pull/46300] > Parity returnType check in Spark Connect > > > Key: SPARK-43727 > URL: https://issues.apache.org/jira/browse/SPARK-43727 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48058) `UserDefinedFunction.returnType` parse the DDL string
[ https://issues.apache.org/jira/browse/SPARK-48058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48058: Assignee: Ruifeng Zheng > `UserDefinedFunction.returnType` parse the DDL string > - > > Key: SPARK-48058 > URL: https://issues.apache.org/jira/browse/SPARK-48058 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-48058) `UserDefinedFunction.returnType` parse the DDL string
[ https://issues.apache.org/jira/browse/SPARK-48058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48058. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46300 [https://github.com/apache/spark/pull/46300] > `UserDefinedFunction.returnType` parse the DDL string > - > > Key: SPARK-48058 > URL: https://issues.apache.org/jira/browse/SPARK-48058 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-48062) Add pyspark test for SimpleDataSourceStreamingReader
[ https://issues.apache.org/jira/browse/SPARK-48062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48062. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46306 [https://github.com/apache/spark/pull/46306] > Add pyspark test for SimpleDataSourceStreamingReader > > > Key: SPARK-48062 > URL: https://issues.apache.org/jira/browse/SPARK-48062 > Project: Spark > Issue Type: Test > Components: SS >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add end-to-end pyspark test for SimpleDataSourceStreamingReader
[jira] [Assigned] (SPARK-48062) Add pyspark test for SimpleDataSourceStreamingReader
[ https://issues.apache.org/jira/browse/SPARK-48062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48062: Assignee: Chaoqin Li > Add pyspark test for SimpleDataSourceStreamingReader > Key: SPARK-48062 > URL: https://issues.apache.org/jira/browse/SPARK-48062 > Project: Spark > Issue Type: Test > Components: SS >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Add an end-to-end PySpark test for SimpleDataSourceStreamingReader
[jira] [Assigned] (SPARK-46894) Move PySpark error conditions into standalone JSON file
[ https://issues.apache.org/jira/browse/SPARK-46894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46894: Assignee: Nicholas Chammas > Move PySpark error conditions into standalone JSON file > Key: SPARK-46894 > URL: https://issues.apache.org/jira/browse/SPARK-46894 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-46894) Move PySpark error conditions into standalone JSON file
[ https://issues.apache.org/jira/browse/SPARK-46894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46894. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44920 [https://github.com/apache/spark/pull/44920] > Move PySpark error conditions into standalone JSON file > Key: SPARK-46894 > URL: https://issues.apache.org/jira/browse/SPARK-46894 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-48052) Recover pyspark-connect CI by parent classes
[ https://issues.apache.org/jira/browse/SPARK-48052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48052. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46294 [https://github.com/apache/spark/pull/46294] > Recover pyspark-connect CI by parent classes > Key: SPARK-48052 > URL: https://issues.apache.org/jira/browse/SPARK-48052 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
[ https://issues.apache.org/jira/browse/SPARK-48053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48053: Assignee: Hyukjin Kwon > SparkSession.createDataFrame should warn for unsupported options > Key: SPARK-48053 > URL: https://issues.apache.org/jira/browse/SPARK-48053 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > {code} > spark.createDataFrame([1,2,3], verifySchema=True) > {code} > and > {code} > spark.createDataFrame([1,2,3], samplingRatio=0.5) > {code} > Do not work with Spark Connect.
[jira] [Resolved] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
[ https://issues.apache.org/jira/browse/SPARK-48053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48053. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46295 [https://github.com/apache/spark/pull/46295] > SparkSession.createDataFrame should warn for unsupported options > Key: SPARK-48053 > URL: https://issues.apache.org/jira/browse/SPARK-48053 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > {code} > spark.createDataFrame([1,2,3], verifySchema=True) > {code} > and > {code} > spark.createDataFrame([1,2,3], samplingRatio=0.5) > {code} > Do not work with Spark Connect.
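The two arguments quoted in the issue are accepted by the classic API but cannot be honored by a Spark Connect client, and the issue asks for a warning instead of silent acceptance. A minimal pure-Python sketch of that warn-on-unsupported-argument pattern (`create_dataframe` here is a hypothetical stand-in, not PySpark's actual internals):

```python
import warnings

def create_dataframe(data, samplingRatio=None, verifySchema=None):
    # Hypothetical sketch: a Connect client cannot honor these options,
    # so emit a warning instead of silently ignoring them.
    unsupported = {"samplingRatio": samplingRatio, "verifySchema": verifySchema}
    for name, value in unsupported.items():
        if value is not None:
            warnings.warn(
                f"'{name}' is ignored; it is not supported with Spark Connect."
            )
    return list(data)  # stand-in for actually building a DataFrame
```

The call still succeeds, so existing scripts keep working, but the user learns the option had no effect.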
[jira] [Created] (SPARK-48054) Backward compatibility test for Spark Connect
Hyukjin Kwon created SPARK-48054: Summary: Backward compatibility test for Spark Connect Key: SPARK-48054 URL: https://issues.apache.org/jira/browse/SPARK-48054 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Now that we can run the Spark Connect server separately in CI, we can test a lower-version server against a higher-version client, and vice versa.
[jira] [Created] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
Hyukjin Kwon created SPARK-48053: Summary: SparkSession.createDataFrame should warn for unsupported options Key: SPARK-48053 URL: https://issues.apache.org/jira/browse/SPARK-48053 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} spark.createDataFrame([1,2,3], verifySchema=True) {code} and {code} spark.createDataFrame([1,2,3], samplingRatio=0.5) {code} Do not work with Spark Connect.
[jira] [Created] (SPARK-48052) Recover pyspark-connect CI by parent classes
Hyukjin Kwon created SPARK-48052: Summary: Recover pyspark-connect CI by parent classes Key: SPARK-48052 URL: https://issues.apache.org/jira/browse/SPARK-48052 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-48039) Update the error class for `group.apply`
[ https://issues.apache.org/jira/browse/SPARK-48039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48039. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46277 [https://github.com/apache/spark/pull/46277] > Update the error class for `group.apply` > Key: SPARK-48039 > URL: https://issues.apache.org/jira/browse/SPARK-48039 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-47292) safeMapToJValue should consider when map is null
[ https://issues.apache.org/jira/browse/SPARK-47292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47292. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46260 [https://github.com/apache/spark/pull/46260] > safeMapToJValue should consider when map is null > Key: SPARK-47292 > URL: https://issues.apache.org/jira/browse/SPARK-47292 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0, 3.5.1 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48014) Change the makeFromJava error in EvaluatePython to a user-facing error
[ https://issues.apache.org/jira/browse/SPARK-48014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48014: Assignee: Allison Wang > Change the makeFromJava error in EvaluatePython to a user-facing error > Key: SPARK-48014 > URL: https://issues.apache.org/jira/browse/SPARK-48014 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-48014) Change the makeFromJava error in EvaluatePython to a user-facing error
[ https://issues.apache.org/jira/browse/SPARK-48014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48014. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46250 [https://github.com/apache/spark/pull/46250] > Change the makeFromJava error in EvaluatePython to a user-facing error > Key: SPARK-48014 > URL: https://issues.apache.org/jira/browse/SPARK-48014 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-48024) Enable `UDFParityTests.test_udf_timestamp_ntz`
[ https://issues.apache.org/jira/browse/SPARK-48024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48024. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46257 [https://github.com/apache/spark/pull/46257] > Enable `UDFParityTests.test_udf_timestamp_ntz` > Key: SPARK-48024 > URL: https://issues.apache.org/jira/browse/SPARK-48024 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners
[ https://issues.apache.org/jira/browse/SPARK-48002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48002. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46237 [https://github.com/apache/spark/pull/46237] > Add Observed metrics test in PySpark StreamingQueryListeners > Key: SPARK-48002 > URL: https://issues.apache.org/jira/browse/SPARK-48002 > Project: Spark > Issue Type: New Feature > Components: SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners
[ https://issues.apache.org/jira/browse/SPARK-48002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48002: Assignee: Wei Liu > Add Observed metrics test in PySpark StreamingQueryListeners > Key: SPARK-48002 > URL: https://issues.apache.org/jira/browse/SPARK-48002 > Project: Spark > Issue Type: New Feature > Components: SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-47993) Drop Python 3.8 support
[ https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47993. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46228 [https://github.com/apache/spark/pull/46228] > Drop Python 3.8 support > Key: SPARK-47993 > URL: https://issues.apache.org/jira/browse/SPARK-47993 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available, release-notes > Fix For: 4.0.0 > Python 3.8 reaches EOL this October. Considering the release schedule, we should drop it.
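Dropping an interpreter version is usually enforced with a fail-fast check at import or startup time, so users get a clear message instead of an obscure failure later. A sketch under that assumption (illustrative only, not PySpark's actual code; `MIN_PYTHON` is a hypothetical constant):

```python
import sys

MIN_PYTHON = (3, 9)  # assumed new floor once 3.8 support is dropped

def check_python_version(version_info=sys.version_info):
    # Reject interpreters below the supported floor with a clear error,
    # instead of letting 3.9+-only syntax or APIs fail somewhere deep inside.
    if tuple(version_info[:2]) < MIN_PYTHON:
        raise RuntimeError("Python %d.%d or above is required" % MIN_PYTHON)
    return True
```

Such a guard typically lives next to the package's `__init__` so the error surfaces on the very first import.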
[jira] [Updated] (SPARK-47993) Drop Python 3.8 support
[ https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47993: - Labels: release-notes (was: release-note) > Drop Python 3.8 support > Key: SPARK-47993 > URL: https://issues.apache.org/jira/browse/SPARK-47993 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: release-notes > Python 3.8 reaches EOL this October. Considering the release schedule, we should drop it.
[jira] [Resolved] (SPARK-47962) Improve doc test in pyspark dataframe
[ https://issues.apache.org/jira/browse/SPARK-47962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47962. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46189 [https://github.com/apache/spark/pull/46189] > Improve doc test in pyspark dataframe > Key: SPARK-47962 > URL: https://issues.apache.org/jira/browse/SPARK-47962 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > The doc test for the DataFrame observe API doesn't use a streaming DataFrame, which is wrong. We should start a streaming DataFrame to make sure it runs.
[jira] [Resolved] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47965. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46197 [https://github.com/apache/spark/pull/46197] > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > Configuration values/keys cannot be nulls. We should fix: > {code} > diff --git > a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala > b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala > index 1f19e9444d38..d06535722625 100644 > --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala > +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala > @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T]( >import ConfigHelpers._ >def this(parent: ConfigBuilder, converter: String => T) = { > -this(parent, converter, Option(_).map(_.toString).orNull) > +this(parent, converter, { v: T => v.toString }) >} >/** Apply a transformation to the user-provided values of the config > entry. */ > {code}
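The Scala diff quoted in this entry swaps a stringifier that silently maps a null value to null (`Option(_).map(_.toString).orNull`) for one that assumes the value is non-null (`{ v: T => v.toString }`). A loose Python analogue of the before/after behavior (the names below are illustrative, not Spark code):

```python
def stringify_before(v):
    # Loose analogue of Option(_).map(_.toString).orNull:
    # a null (None) value silently stays null and propagates downstream.
    return str(v) if v is not None else None

def stringify_after(v):
    # Loose analogue of { v: T => v.toString }: configuration values/keys
    # can never be null, so reject None loudly at the conversion boundary.
    if v is None:
        raise ValueError("configuration values/keys cannot be null")
    return str(v)
```

Failing at the conversion boundary surfaces a bad config immediately, rather than leaking a null into code that assumes every configured value is a real string.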
[jira] [Assigned] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47965: Assignee: Hyukjin Kwon > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-47964) Hide SQLContext and HiveContext
[ https://issues.apache.org/jira/browse/SPARK-47964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47964. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46194 [https://github.com/apache/spark/pull/46194] > Hide SQLContext and HiveContext > Key: SPARK-47964 > URL: https://issues.apache.org/jira/browse/SPARK-47964 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Issue Type: Improvement (was: Bug) > Avoid orNull in TypedConfigBuilder > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Summary: Avoid orNull in TypedConfigBuilder and OptionalConfigEntry (was: Avoid orNull in TypedConfigBuilder) > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Priority: Minor (was: Major) > Avoid orNull in TypedConfigBuilder > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor
[jira] [Created] (SPARK-47965) Avoid orNull in TypedConfigBuilder
Hyukjin Kwon created SPARK-47965: Summary: Avoid orNull in TypedConfigBuilder Key: SPARK-47965 URL: https://issues.apache.org/jira/browse/SPARK-47965 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Configuration values/keys cannot be nulls. We should fix: {code} diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala index 1f19e9444d38..d06535722625 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T]( import ConfigHelpers._ def this(parent: ConfigBuilder, converter: String => T) = { -this(parent, converter, Option(_).map(_.toString).orNull) +this(parent, converter, { v: T => v.toString }) } /** Apply a transformation to the user-provided values of the config entry. */ {code}
[jira] [Assigned] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47933: Assignee: Hyukjin Kwon > Parent Column class for Spark Connect and Spark Classic > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47933. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46155 [https://github.com/apache/spark/pull/46155] > Parent Column class for Spark Connect and Spark Classic > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47903. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46122 [https://github.com/apache/spark/pull/46122] > Add remaining scalar types to the Python variant library > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > Added support for reading the remaining scalar data types (binary, timestamp, timestamp_ntz, date, float) to the Python Variant library.
[jira] [Assigned] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47903: Assignee: Harsh Motwani > Add remaining scalar types to the Python variant library > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Added support for reading the remaining scalar data types (binary, timestamp, timestamp_ntz, date, float) to the Python Variant library.
[jira] [Resolved] (SPARK-47890) Add python and scala dataframe variant expression aliases.
[ https://issues.apache.org/jira/browse/SPARK-47890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47890. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46123 [https://github.com/apache/spark/pull/46123] > Add python and scala dataframe variant expression aliases. > Key: SPARK-47890 > URL: https://issues.apache.org/jira/browse/SPARK-47890 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
Hyukjin Kwon created SPARK-47933: Summary: Parent Column class for Spark Connect and Spark Classic Key: SPARK-47933 URL: https://issues.apache.org/jira/browse/SPARK-47933 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon