[jira] [Commented] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844808#comment-17844808 ] Hyukjin Kwon commented on SPARK-48094: -- Woohoo! > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 4.0.0 > > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
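The concurrency targets quoted above map onto standard GitHub Actions workflow keys (`concurrency` groups and `strategy.max-parallel`). A hedged sketch of a workflow fragment that stays under the policy limits — job names, matrix values, and the test script are illustrative assumptions, not Spark's actual configuration:

```yaml
# Hypothetical workflow fragment illustrating the ASF concurrency targets.
name: build
on: [push, pull_request]

# Cancel superseded runs of the same ref so stale jobs stop counting
# against the project's runner-minute allowance.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # Keep the whole matrix at or below the SHOULD target of 15
      # concurrent jobs (the MUST ceiling is 20).
      max-parallel: 15
      matrix:
        module: [core, sql, python]
    steps:
      - uses: actions/checkout@v4
      - run: ./dev/run-tests --module ${{ matrix.module }}  # illustrative script
```

`concurrency.group`, `cancel-in-progress`, and `strategy.max-parallel` are documented workflow-syntax keys; only the job layout here is invented for illustration.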
[jira] [Resolved] (SPARK-48205) Remove the private[sql] modifier for Python data sources
[ https://issues.apache.org/jira/browse/SPARK-48205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48205. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46487 [https://github.com/apache/spark/pull/46487] > Remove the private[sql] modifier for Python data sources > > > Key: SPARK-48205 > URL: https://issues.apache.org/jira/browse/SPARK-48205 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To make it consistent with UDFs and UDTFs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48205) Remove the private[sql] modifier for Python data sources
[ https://issues.apache.org/jira/browse/SPARK-48205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48205: Assignee: Allison Wang > Remove the private[sql] modifier for Python data sources > > > Key: SPARK-48205 > URL: https://issues.apache.org/jira/browse/SPARK-48205 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > To make it consistent with UDFs and UDTFs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48087: Assignee: Hyukjin Kwon > Python UDTF incompatibility in 3.5 client <> 4.0 server > --- > > Key: SPARK-48087 > URL: https://issues.apache.org/jira/browse/SPARK-48087 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > FAIL [0.103s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > self._check_result_or_exception(TestUDTF, ret_type, expected) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", > line 598, in _check_result_or_exception > with self.assertRaisesRegex(err_type, expected): > AssertionError: "AttributeError" does not match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1391, in mapper > yield eval(*[a[o] for o in args_kwargs_offsets]) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1371, in evaluate > return tuple(map(verify_and_convert_result, res)) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1340, in verify_and_convert_result > return toInternal(result) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1291, in toInternal > return tuple( >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1292, in > f.toInternal(v) if c else v > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 907, in toInternal > return self.dataType.toInternal(obj) >^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 372, in toInternal > calendar.timegm(dt.utctimetuple()) if dt.tzinfo else > time.mktime(dt.timetuple()) > ..." 
> {code} > {code} > == > FAIL [0.096s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 946, in read_udtf > raise PySparkRuntimeError( > pyspark.errors.exceptions.base.PySparkRuntimeError: >
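The tracebacks above bottom out in `pyspark/sql/types.py`, where `TimestampType.toInternal` converts a datetime with the stdlib expression `calendar.timegm(dt.utctimetuple()) if dt.tzinfo else time.mktime(dt.timetuple())`. A minimal sketch of that conversion (simplified from the logic visible in the trace, not the exact Spark implementation):

```python
import calendar
import time
from datetime import datetime, timezone

def timestamp_to_internal(dt: datetime) -> int:
    """Convert a datetime to microseconds since the epoch, mirroring the
    stdlib-based branch shown in the pyspark/sql/types.py traceback above
    (illustrative sketch only)."""
    if dt.tzinfo:
        # Timezone-aware datetimes: read the UTC time tuple directly.
        seconds = calendar.timegm(dt.utctimetuple())
    else:
        # Naive datetimes: interpret in the local timezone.
        seconds = time.mktime(dt.timetuple())
    return int(seconds) * 1_000_000 + dt.microsecond

# An aware UTC datetime converts deterministically regardless of the host TZ:
epoch_plus_1s = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
```

The aware/naive branch is exactly where a 3.5 client and 4.0 server can disagree if their conversion logic diverges, which is the kind of cross-version mismatch this sub-task tracks.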
[jira] [Resolved] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48087. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46473 [https://github.com/apache/spark/pull/46473] > Python UDTF incompatibility in 3.5 client <> 4.0 server > --- > > Key: SPARK-48087 > URL: https://issues.apache.org/jira/browse/SPARK-48087 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > FAIL [0.103s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > self._check_result_or_exception(TestUDTF, ret_type, expected) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", > line 598, in _check_result_or_exception > with self.assertRaisesRegex(err_type, expected): > AssertionError: "AttributeError" does not match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1391, in mapper > yield eval(*[a[o] for o in args_kwargs_offsets]) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1371, in evaluate > return tuple(map(verify_and_convert_result, res)) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1340, in verify_and_convert_result > return toInternal(result) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1291, in toInternal > return tuple( >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1292, in > f.toInternal(v) if c else v > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 907, in toInternal > return self.dataType.toInternal(obj) >^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 372, in toInternal > calendar.timegm(dt.utctimetuple()) if dt.tzinfo else > time.mktime(dt.timetuple()) > ..." 
> {code} > {code} > == > FAIL [0.096s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > > ^^^ > File >
[jira] [Resolved] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48094. -- Assignee: Dongjoon Hyun Resolution: Done Seems like we're done :-)? I will resolve this one for now but feel free to reopen if there is more work to be done! > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48094: - Fix Version/s: 4.0.0 > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 4.0.0 > > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48163: - Fix Version/s: (was: 4.0.0) > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-48163: -- Assignee: (was: Dongjoon Hyun) > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844618#comment-17844618 ] Hyukjin Kwon commented on SPARK-48163: -- reverted in https://github.com/apache/spark/commit/bd896cac168aa5793413058ca706c73705edbf96 > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48164) Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48164. -- Resolution: Invalid > Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > -- > > Key: SPARK-48164 > URL: https://issues.apache.org/jira/browse/SPARK-48164 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48164) Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48164: - Target Version/s: (was: 4.0.0) > Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > -- > > Key: SPARK-48164 > URL: https://issues.apache.org/jira/browse/SPARK-48164 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48193) Make `maven-deploy-plugin` retry 3 times
[ https://issues.apache.org/jira/browse/SPARK-48193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48193. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46471 [https://github.com/apache/spark/pull/46471] > Make `maven-deploy-plugin` retry 3 times > > > Key: SPARK-48193 > URL: https://issues.apache.org/jira/browse/SPARK-48193 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
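maven-deploy-plugin exposes a documented `retryFailedDeploymentCount` parameter for exactly this purpose. A pom fragment along the lines the issue title suggests might look like the following — the retry count of 3 comes from the title, but the exact placement in Spark's pom.xml is an assumption, not a quote from the merged change:

```xml
<!-- Hypothetical pom.xml fragment: retry a failed artifact deployment
     up to 3 times. retryFailedDeploymentCount is a documented
     maven-deploy-plugin configuration parameter. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-deploy-plugin</artifactId>
  <configuration>
    <retryFailedDeploymentCount>3</retryFailedDeploymentCount>
  </configuration>
</plugin>
```

The same knob can be set ad hoc on the command line with `-Dretry.failed.deployment.count=3` in recent plugin versions; retrying absorbs transient repository outages during release publishing.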
[jira] [Assigned] (SPARK-48193) Make `maven-deploy-plugin` retry 3 times
[ https://issues.apache.org/jira/browse/SPARK-48193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48193: Assignee: BingKun Pan > Make `maven-deploy-plugin` retry 3 times > > > Key: SPARK-48193 > URL: https://issues.apache.org/jira/browse/SPARK-48193 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48192) Enable TPC-DS tests in forked repository
[ https://issues.apache.org/jira/browse/SPARK-48192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48192. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46470 [https://github.com/apache/spark/pull/46470] > Enable TPC-DS tests in forked repository > > > Key: SPARK-48192 > URL: https://issues.apache.org/jira/browse/SPARK-48192 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > TPC-DS is pretty important in SQL. We should at least enable it in forked > repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
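Workflow runs in a fork draw on the fork owner's GitHub Actions quota rather than the shared ASF allowance, so a heavy suite like TPC-DS can be gated to run only in forks. A hedged sketch of that pattern — the job name and test script are illustrative, not Spark's actual workflow:

```yaml
jobs:
  tpcds:
    # Run only in forked repositories (PR builders), so the job consumes
    # the fork owner's quota instead of the ASF project allowance.
    if: github.repository != 'apache/spark'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./dev/run-tpcds-tests  # illustrative script name
```

The job-level `if:` condition with the `github.repository` context is a standard GitHub Actions construct for distinguishing the canonical repository from forks.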
[jira] [Assigned] (SPARK-48192) Enable TPC-DS tests in forked repository
[ https://issues.apache.org/jira/browse/SPARK-48192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48192: Assignee: Hyukjin Kwon > Enable TPC-DS tests in forked repository > > > Key: SPARK-48192 > URL: https://issues.apache.org/jira/browse/SPARK-48192 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > TPC-DS is pretty important in SQL. We should at least enable it in forked > repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48192) Enable TPC-DS tests in forked repository
Hyukjin Kwon created SPARK-48192: Summary: Enable TPC-DS tests in forked repository Key: SPARK-48192 URL: https://issues.apache.org/jira/browse/SPARK-48192 Project: Spark Issue Type: Sub-task Components: Project Infra, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon TPC-DS is pretty important in SQL. We should at least enable it in forked repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48045: Assignee: Saidatt Sinai Amonkar > Pandas API groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Assignee: Saidatt Sinai Amonkar >Priority: Minor > Labels: pull-request-available > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include group keys in the resulting DataFrame. This diverges from > expected behavior as well as from the behavior of native Pandas, e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48045. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46391 [https://github.com/apache/spark/pull/46391] > Pandas API groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Assignee: Saidatt Sinai Amonkar >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include group keys in the resulting DataFrame. This diverges from > expected behavior as well as from the behavior of native Pandas, e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
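The fix suggested in the report — prepend the groupby key columns to `order` and `columns` before filtering — can be sketched over plain column-name lists. This is a minimal illustration of the reordering idea only, not the actual code in `pyspark/pandas/groupby.py`:

```python
def prepend_group_keys(group_keys, agg_columns):
    """Return the output column order with groupby keys first, as the
    as_index=False fix suggests: key columns are prepended to the
    aggregation output instead of being dropped. Plain-list sketch."""
    # Keys come first; aggregation columns follow, minus any duplicates.
    return list(group_keys) + [c for c in agg_columns if c not in group_keys]

# With the example from the report: grouping by "a" and aggregating b_max,
# the as_index=False output should expose both "a" and "b_max".
columns = prepend_group_keys(["a"], ["b_max"])
```

Applied to the report's example, this ordering yields the expected frame with the `a` key column preserved alongside `b_max`, matching native pandas behavior.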
[jira] [Resolved] (SPARK-48086) Different Arrow versions in client and server
[ https://issues.apache.org/jira/browse/SPARK-48086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48086. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46431 [https://github.com/apache/spark/pull/46431] > Different Arrow versions in client and server > -- > > Key: SPARK-48086 > URL: https://issues.apache.org/jira/browse/SPARK-48086 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > FAIL [1.071s]: test_pandas_udf_arrow_overflow > (pyspark.sql.tests.connect.test_parity_pandas_udf.PandasUDFParityTests.test_pandas_udf_arrow_overflow) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > return 
ArrowStreamSerializer.dump_stream(self, > init_stream_yield_batches(), stream) > > > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 104, in dump_stream > for batch in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 525, in init_stream_yield_batches > batch = self._create_batch(series) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 511, in _create_batch > arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast)) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 330, in _create_array > raise PySparkValueError(error_msg % (series.dtype, series.na... > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", > line 299, in test_pandas_udf_arrow_overflow > with self.assertRaisesRegex( > AssertionError: "Exception thrown when converting pandas.Series" does not > match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File >
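The `ArrowInvalid: Integer value 128 not in range: -128 to 127` failure above is a signed 8-bit overflow raised while converting a pandas Series to an Arrow `int8` array, and the two Arrow versions surface it differently. The invariant Arrow enforces can be illustrated in plain Python — a stdlib sketch of the range check, not PyArrow's implementation (which runs natively in C++):

```python
def check_int8_range(values):
    """Raise ValueError for any value outside the signed 8-bit range,
    mirroring the overflow check behind the ArrowInvalid error in the
    traceback above (illustrative sketch only)."""
    lo, hi = -128, 127  # bounds of a signed 8-bit integer (Arrow int8)
    for v in values:
        if not (lo <= v <= hi):
            raise ValueError(f"Integer value {v} not in range: {lo} to {hi}")
    return values

# Values within range pass through unchanged; 128 would raise.
ok = check_int8_range([0, 127, -128])
```

Because the check happens inside the Python worker, a client pinned to one PyArrow version can see a different error message than the server expects, which is why the parity test's `assertRaisesRegex` pattern fails to match.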
[jira] [Assigned] (SPARK-48086) Different Arrow versions in client and server
[ https://issues.apache.org/jira/browse/SPARK-48086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48086: Assignee: Hyukjin Kwon > Different Arrow versions in client and server > -- > > Key: SPARK-48086 > URL: https://issues.apache.org/jira/browse/SPARK-48086 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > FAIL [1.071s]: test_pandas_udf_arrow_overflow > (pyspark.sql.tests.connect.test_parity_pandas_udf.PandasUDFParityTests.test_pandas_udf_arrow_overflow) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > return ArrowStreamSerializer.dump_stream(self, > init_stream_yield_batches(), stream) > > > File > 
"/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 104, in dump_stream > for batch in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 525, in init_stream_yield_batches > batch = self._create_batch(series) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 511, in _create_batch > arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast)) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 330, in _create_array > raise PySparkValueError(error_msg % (series.dtype, series.na... > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", > line 299, in test_pandas_udf_arrow_overflow > with self.assertRaisesRegex( > AssertionError: "Exception thrown when converting pandas.Series" does not > match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > Traceback (most recent call last): > File >
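The ArrowInvalid in the report above is Arrow enforcing the signed 8-bit value range while converting a pandas Series to an int8 array. The same range check can be reproduced in miniature with only the standard library (this sketch uses `struct`, not Arrow itself, so it is an analogy rather than the actual code path):

```python
import struct

# struct's "b" format packs a signed 8-bit integer: the same -128..127
# range that Arrow's int8 conversion enforces in the traceback above.
in_range = struct.pack("b", 127)      # fits: b"\x7f"

overflow_caught = False
try:
    struct.pack("b", 128)             # one past the int8 maximum
except struct.error:
    overflow_caught = True            # analogous to ArrowInvalid above
```

The test failure itself is not the overflow (that is expected) but the error message changing shape between Arrow versions, so the 3.5 test's assertRaisesRegex pattern no longer matches what the 4.0 server returns.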
[jira] [Resolved] (SPARK-48113) Allow Plugins to integrate with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48113. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46364 [https://github.com/apache/spark/pull/46364] > Allow Plugins to integrate with Spark Connect > - > > Key: SPARK-48113 > URL: https://issues.apache.org/jira/browse/SPARK-48113 > Project: Spark > Issue Type: Story > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48167) Skip known behaviour change by SPARK-46122
[ https://issues.apache.org/jira/browse/SPARK-48167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48167. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46430 [https://github.com/apache/spark/pull/46430] > Skip known behaviour change by SPARK-46122 > -- > > Key: SPARK-48167 > URL: https://issues.apache.org/jira/browse/SPARK-48167 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > >
[jira] [Assigned] (SPARK-48167) Skip known behaviour change by SPARK-46122
[ https://issues.apache.org/jira/browse/SPARK-48167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48167: Assignee: Hyukjin Kwon > Skip known behaviour change by SPARK-46122 > -- > > Key: SPARK-48167 > URL: https://issues.apache.org/jira/browse/SPARK-48167 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-48167) Skip known behaviour change by SPARK-46122
[ https://issues.apache.org/jira/browse/SPARK-48167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48167: - Affects Version/s: 3.5.2 (was: 4.0.0) > Skip known behaviour change by SPARK-46122 > -- > > Key: SPARK-48167 > URL: https://issues.apache.org/jira/browse/SPARK-48167 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Priority: Major >
[jira] [Created] (SPARK-48167) Skip known behaviour change by SPARK-46122
Hyukjin Kwon created SPARK-48167: Summary: Skip known behaviour change by SPARK-46122 Key: SPARK-48167 URL: https://issues.apache.org/jira/browse/SPARK-48167 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-48090) Streaming exception catch failure in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48090. -- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46426 [https://github.com/apache/spark/pull/46426] > Streaming exception catch failure in 3.5 client <> 4.0 server > - > > Key: SPARK-48090 > URL: https://issues.apache.org/jira/browse/SPARK-48090 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {code} > == > FAIL [1.975s]: test_stream_exception > (pyspark.sql.tests.connect.streaming.test_parity_streaming.StreamingParityTests.test_stream_exception) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", > line 287, in test_stream_exception > sq.processAllAvailable() > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 129, in processAllAvailable > self._execute_streaming_query_cmd(cmd) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 177, in _execute_streaming_query_cmd > (_, properties) = self._session.client.execute_command(exec_cmd) > ^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 982, in execute_command > data, _, _, _, properties = self._execute_and_fetch(req) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1283, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1264, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1503, in _handle_error > self._handle_rpc_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1539, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.StreamingQueryException: [STREAM_FAILED] > Query [id = 38d0d145-1f57-4b92-b317-d9de727d9468, runId = > 2b963119-d391-4c62-abea-970274859b80] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 79.0 (TID 116) > (fv-az1144-341.tm43j05r3bqe3lauap1nzddazg.ex.internal.cloudapp.net executor > driver): org.apache.spark.api.python.PythonException: Traceback (most recent > call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1734, in mapper > result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1734, in > result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) >^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 112, in > return args_kwargs_offsets, lambda *a: func(*a) > > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line > 118, in wrapper > return f(*args, **kwargs) >
[jira] [Assigned] (SPARK-48090) Streaming exception catch failure in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48090: Assignee: Hyukjin Kwon > Streaming exception catch failure in 3.5 client <> 4.0 server > - > > Key: SPARK-48090 > URL: https://issues.apache.org/jira/browse/SPARK-48090 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > FAIL [1.975s]: test_stream_exception > (pyspark.sql.tests.connect.streaming.test_parity_streaming.StreamingParityTests.test_stream_exception) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", > line 287, in test_stream_exception > sq.processAllAvailable() > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 129, in processAllAvailable > self._execute_streaming_query_cmd(cmd) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 177, in _execute_streaming_query_cmd > (_, properties) = self._session.client.execute_command(exec_cmd) > ^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 982, in execute_command > data, _, _, _, properties = self._execute_and_fetch(req) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1283, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1264, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1503, in _handle_error > self._handle_rpc_error(error) > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1539, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.StreamingQueryException: [STREAM_FAILED] > Query [id = 38d0d145-1f57-4b92-b317-d9de727d9468, runId = > 2b963119-d391-4c62-abea-970274859b80] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 79.0 (TID 116) > (fv-az1144-341.tm43j05r3bqe3lauap1nzddazg.ex.internal.cloudapp.net executor > driver): org.apache.spark.api.python.PythonException: Traceback (most recent > call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1734, in mapper > result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1734, in > result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) >^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 112, in > return args_kwargs_offsets, lambda *a: func(*a) > > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/util.py", 
line > 118, in wrapper > return f(*args, **kwargs) >^^ > File "/home/runner/work/spark/spark-3 > During handling of the above exception, another exception occurred: > Traceback (most recent
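The SPARK-48090 failure comes down to whether the original ZeroDivisionError is still discoverable in the exception chain that a 4.0 server reports back to a 3.5 client. A minimal sketch of walking a Python exception chain for a message follows; the helper name here is hypothetical and is not PySpark's actual `_assert_exception_tree_contains_msg` implementation:

```python
def chain_contains(exc, msg):
    # Walk the __cause__/__context__ links that "raise ... from ..." builds,
    # looking for msg in any message or exception type name along the chain.
    while exc is not None:
        if msg in str(exc) or msg in type(exc).__name__:
            return True
        exc = exc.__cause__ or exc.__context__
    return False

# The wrapped-failure pattern from the log: a ZeroDivisionError raised
# inside a task, surfaced to the caller as a higher-level error.
try:
    try:
        1 / 0
    except ZeroDivisionError as inner:
        raise RuntimeError("query terminated") from inner
except RuntimeError as outer:
    found = chain_contains(outer, "ZeroDivisionError")
```

If the server flattens the chain into a single message string before sending it over the wire, a walker like this finds nothing, which is the `AssertionError: False is not true` seen in the test.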
[jira] [Assigned] (SPARK-48084) pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48084: Assignee: Weichen Xu > pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server > - > > Key: SPARK-48084 > URL: https://issues.apache.org/jira/browse/SPARK-48084 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Weichen Xu >Priority: Major > > {code} > == > ERROR [3.966s]: test_regressor_evaluator > (pyspark.ml.tests.connect.test_connect_evaluation.EvaluationTestsOnConnect.test_regressor_evaluator) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_evaluation.py", > line 69, in test_regressor_evaluator > rmse = rmse_evaluator.evaluate(df1) > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 255, in evaluate > return self._evaluate(dataset) >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", > line 70, in _evaluate > return aggregate_dataframe( > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", > line 93, in aggregate_dataframe > state = cloudpickle.loads(state) > > AttributeError: Can't get attribute '_class_setstate' on 'pyspark.cloudpickle.cloudpickle' from > '/home/runner/work/spark/spark-3.5/python/pyspark/cloudpickle/cloudpickle.py'> > -- > {code} > {code} > == > ERROR [4.664s]: test_copy > (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_copy) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", > line 115, in test_copy > cvModel = cv.fit(dataset) > ^^^ > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 106, in fit > return self._fit(dataset) >^^ > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 437, in _fit > for j, metric in pool.imap_unordered(lambda f: f(), tasks): > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", > line 873, in next > raise value > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", > line 125, in worker > result = (True, func(*args, **kwds)) > ^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 437, in > for j, metric in pool.imap_unordered(lambda f: f(), tasks): >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 188, in single_task > metric = evaluator.evaluate( > ^^^ > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 255, in evaluate > return self._evaluate(dataset) >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", > line 70, in _evaluate > return aggregate_dataframe( > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", > line 93, in aggregate_dataframe > state = cloudpickle.loads(state) > > AttributeError: Can't get attribute '_class_setstate' on 'pyspark.cloudpickle.cloudpickle' from > '/home/runner/work/spark/spark-3.5/python/pyspark/cloudpickle/cloudpickle.py'> > {code} > {code} > == > ERROR [3.938s]: test_fit_minimize_metric > (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_fit_minimize_metric) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", > line 149, in test_fit_minimize_metric > cvModel = cv.fit(dataset) > ^^^ > File
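The AttributeError in SPARK-48084 is a pickle version-skew failure: the 4.0 server serializes state that refers by name to a cloudpickle helper (`_class_setstate`) which the 3.5 client's bundled cloudpickle does not define. The shape of the failure can be reproduced with the stdlib `pickle` module alone; the module built below is a throwaway stand-in, not real cloudpickle:

```python
import pickle
import sys
import types

# Create a fake module with one helper, pickle a by-name reference to it,
# then delete the helper to simulate an older library on the unpickling
# side. Names are made up for illustration only.
mod = types.ModuleType("fake_cloudpickle")
sys.modules["fake_cloudpickle"] = mod
exec("def _class_setstate(obj, state):\n    return obj", mod.__dict__)

payload = pickle.dumps(mod._class_setstate)  # stored as module.attribute

del mod._class_setstate  # the "3.5 side" never shipped this helper
error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc  # same failure shape as "Can't get attribute ..." above
```

This is why mixed-version Spark Connect deployments need either a compatible cloudpickle on both sides or serialized state that avoids version-specific helpers.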
[jira] [Assigned] (SPARK-48083) session.copyFromLocalToFs failure with 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48083: Assignee: Weichen Xu > session.copyFromLocalToFs failure with 3.5 client <> 4.0 server > --- > > Key: SPARK-48083 > URL: https://issues.apache.org/jira/browse/SPARK-48083 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Weichen Xu >Priority: Major > Labels: pull-request-available > > {code} > == > ERROR [1.120s]: test_save_load > (pyspark.ml.tests.connect.test_connect_classification.ClassificationTestsOnConnect.test_save_load) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py", > line 144, in test_save_load > estimator.save(fs_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 248, in save > _copy_dir_from_local_to_fs(tmp_local_dir, path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 57, in _copy_dir_from_local_to_fs > _copy_file_from_local_to_fs(file_path, dest_file_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 39, in _copy_file_from_local_to_fs > session.copyFromLocalToFs(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/session.py", > line 756, in copyFromLocalToFs > self._client.copy_from_local_to_fs(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1549, in copy_from_local_to_fs > self._artifact_manager._add_forward_to_fs_artifacts(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 280, in _add_forward_to_fs_artifacts > self._request_add_artifacts(requests) > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 259, in _request_add_artifacts > response: proto.AddArtifactsResponse = self._retrieve_responses(requests) >^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 256, in _retrieve_responses > return self._stub.AddArtifacts(requests, metadata=self._metadata) >^^ > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", > line 1536, in __call__ > return _end_unary_response_blocking(state, call, False, None) >^^ > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", > line 1006, in _end_unary_response_blocking > raise _InactiveRpcError(state) # pytype: disable=not-instantiable > ^^ > grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated > with: > status = StatusCode.INTERNAL > details = "Uploading artifact file to local file system destination > path is not supported." > debug_error_string = "UNKNOWN:Error received from peer > {grpc_message:"Uploading artifact file to local file system destination path > is not supported.", grpc_status:13, > created_time:"2024-05-01T03:01:32.[558](https://github.com/HyukjinKwon/spark/actions/runs/8904629949/job/24454181142#step:9:559)489983+00:00"}" > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48083) session.copyFromLocalToFs failure with 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48083. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46419 [https://github.com/apache/spark/pull/46419] > session.copyFromLocalToFs failure with 3.5 client <> 4.0 server > --- > > Key: SPARK-48083 > URL: https://issues.apache.org/jira/browse/SPARK-48083 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Weichen Xu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > ERROR [1.120s]: test_save_load > (pyspark.ml.tests.connect.test_connect_classification.ClassificationTestsOnConnect.test_save_load) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py", > line 144, in test_save_load > estimator.save(fs_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 248, in save > _copy_dir_from_local_to_fs(tmp_local_dir, path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 57, in _copy_dir_from_local_to_fs > _copy_file_from_local_to_fs(file_path, dest_file_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", > line 39, in _copy_file_from_local_to_fs > session.copyFromLocalToFs(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/session.py", > line 756, in copyFromLocalToFs > self._client.copy_from_local_to_fs(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1549, in copy_from_local_to_fs > self._artifact_manager._add_forward_to_fs_artifacts(local_path, dest_path) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 
280, in _add_forward_to_fs_artifacts > self._request_add_artifacts(requests) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 259, in _request_add_artifacts > response: proto.AddArtifactsResponse = self._retrieve_responses(requests) >^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", > line 256, in _retrieve_responses > return self._stub.AddArtifacts(requests, metadata=self._metadata) >^^ > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", > line 1536, in __call__ > return _end_unary_response_blocking(state, call, False, None) >^^ > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", > line 1006, in _end_unary_response_blocking > raise _InactiveRpcError(state) # pytype: disable=not-instantiable > ^^ > grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated > with: > status = StatusCode.INTERNAL > details = "Uploading artifact file to local file system destination > path is not supported." > debug_error_string = "UNKNOWN:Error received from peer > {grpc_message:"Uploading artifact file to local file system destination path > is not supported.", grpc_status:13, > created_time:"2024-05-01T03:01:32.[558](https://github.com/HyukjinKwon/spark/actions/runs/8904629949/job/24454181142#step:9:559)489983+00:00"}" > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
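The INTERNAL gRPC error in SPARK-48083 is the 4.0 server refusing to forward an uploaded artifact to a destination that resolves to its own local file system. A hedged sketch of the kind of scheme check involved is below; `is_local_destination` is illustrative only and not Spark's actual implementation:

```python
from urllib.parse import urlparse

def is_local_destination(dest_path: str) -> bool:
    # Bare paths and file:// URIs both point at the server's local file
    # system; schemes like hdfs:// or s3a:// point at remote storage.
    scheme = urlparse(dest_path).scheme
    return scheme in ("", "file")

local = is_local_destination("/tmp/model")                  # True
remote = is_local_destination("hdfs://namenode:8020/m1")    # False
```

A 3.5 client calling `session.copyFromLocalToFs` with a bare path therefore trips the server-side rejection seen in the log, while a genuinely remote destination would not.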
[jira] [Resolved] (SPARK-48084) pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48084. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46419 [https://github.com/apache/spark/pull/46419] > pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server > - > > Key: SPARK-48084 > URL: https://issues.apache.org/jira/browse/SPARK-48084 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Weichen Xu >Priority: Major > Fix For: 3.5.2 > > > {code} > == > ERROR [3.966s]: test_regressor_evaluator > (pyspark.ml.tests.connect.test_connect_evaluation.EvaluationTestsOnConnect.test_regressor_evaluator) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_evaluation.py", > line 69, in test_regressor_evaluator > rmse = rmse_evaluator.evaluate(df1) > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 255, in evaluate > return self._evaluate(dataset) >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", > line 70, in _evaluate > return aggregate_dataframe( > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", > line 93, in aggregate_dataframe > state = cloudpickle.loads(state) > > AttributeError: Can't get attribute '_class_setstate' on 'pyspark.cloudpickle.cloudpickle' from > '/home/runner/work/spark/spark-3.5/python/pyspark/cloudpickle/cloudpickle.py'> > -- > {code} > {code} > == > ERROR [4.664s]: test_copy > (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_copy) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", > line 115, in test_copy > cvModel = cv.fit(dataset) > ^^^ > File 
"/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 106, in fit > return self._fit(dataset) >^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 437, in _fit > for j, metric in pool.imap_unordered(lambda f: f(), tasks): > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", > line 873, in next > raise value > File > "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", > line 125, in worker > result = (True, func(*args, **kwds)) > ^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 437, in > for j, metric in pool.imap_unordered(lambda f: f(), tasks): >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line > 188, in single_task > metric = evaluator.evaluate( > ^^^ > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", > line 255, in evaluate > return self._evaluate(dataset) >^^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", > line 70, in _evaluate > return aggregate_dataframe( > > File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", > line 93, in aggregate_dataframe > state = cloudpickle.loads(state) > > AttributeError: Can't get attribute '_class_setstate' on 'pyspark.cloudpickle.cloudpickle' from > '/home/runner/work/spark/spark-3.5/python/pyspark/cloudpickle/cloudpickle.py'> > {code} > {code} > == > ERROR [3.938s]: test_fit_minimize_metric > (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_fit_minimize_metric) > -- > Traceback (most recent call last): > File >
[jira] [Updated] (SPARK-48090) Streaming exception catch failure in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48090: - Description: {code} == FAIL [1.975s]: test_stream_exception (pyspark.sql.tests.connect.streaming.test_parity_streaming.StreamingParityTests.test_stream_exception) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 287, in test_stream_exception sq.processAllAvailable() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 129, in processAllAvailable self._execute_streaming_query_cmd(cmd) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 177, in _execute_streaming_query_cmd (_, properties) = self._session.client.execute_command(exec_cmd) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 982, in execute_command data, _, _, _, properties = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.StreamingQueryException: [STREAM_FAILED] Query [id = 38d0d145-1f57-4b92-b317-d9de727d9468, runId = 2b963119-d391-4c62-abea-970274859b80] terminated with exception: Job aborted due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
79.0 (TID 116) (fv-az1144-341.tm43j05r3bqe3lauap1nzddazg.ex.internal.cloudapp.net executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 224, in dump_stream self.serializer.dump_stream(self._batched(iterator), stream) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 145, in dump_stream for obj in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 213, in _batched for item in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1734, in mapper result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) ^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1734, in result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) ^^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 112, in return args_kwargs_offsets, lambda *a: func(*a) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 118, in wrapper return f(*args, **kwargs) ^^ File "/home/runner/work/spark/spark-3 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 291, in test_stream_exception self._assert_exception_tree_contains_msg(e, "ZeroDivisionError") File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 300, in _assert_exception_tree_contains_msg 
self._assert_exception_tree_contains_msg_connect(exception, msg) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 305, in _assert_exception_tree_contains_msg_connect self.assertTrue( AssertionError: False is not true : Exception tree doesn't contain the expected message: ZeroDivisionError
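The failing assertion above walks an exception and its chained causes looking for an expected substring ("ZeroDivisionError"). A minimal plain-Python sketch of that kind of check (hypothetical helper name, not the actual PySpark test code):

```python
def exception_tree_contains(exc, msg):
    """Return True if msg appears in exc, or in any exception chained
    beneath it via __cause__ / __context__ (the "exception tree")."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if msg in str(exc) or msg in type(exc).__name__:
            return True
        # Prefer the explicit cause ('raise ... from e'); fall back to
        # the implicit context set while handling another exception.
        exc = exc.__cause__ or exc.__context__
    return False

try:
    try:
        1 / 0
    except ZeroDivisionError as e:
        raise RuntimeError("query terminated") from e
except RuntimeError as outer:
    # The parity test expects the same to hold for the server-side error;
    # in the 3.5 client <> 4.0 server combination the chain is cut off,
    # so the expected message is never found and the assertion fails.
    assert exception_tree_contains(outer, "ZeroDivisionError")
```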
[jira] [Resolved] (SPARK-48147) Remove all client listeners when local spark session is deleted
[ https://issues.apache.org/jira/browse/SPARK-48147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48147. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46406 [https://github.com/apache/spark/pull/46406] > Remove all client listeners when local spark session is deleted > --- > > Key: SPARK-48147 > URL: https://issues.apache.org/jira/browse/SPARK-48147 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48035) try_add/try_multiply should not be semantic equal to add/multiply
[ https://issues.apache.org/jira/browse/SPARK-48035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48035: Assignee: Gengliang Wang > try_add/try_multiply should not be semantic equal to add/multiply > - > > Key: SPARK-48035 > URL: https://issues.apache.org/jira/browse/SPARK-48035 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > In the current implementation, the following code will return true > {code:java} > val l1 = Literal(1) > val l2 = Literal(2) > val l3 = Literal(3) > val expr1 = Add(Add(l1, l2), l3) > val expr2 = Add(Add(l2, l1, EvalMode.TRY), l3) > expr1.semanticEquals(expr2) {code} > The same applies to Multiply. > When creating MultiCommutativeOp for Add/Multiply, we should ensure all the > evalMode are consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48035) try_add/try_multiply should not be semantic equal to add/multiply
[ https://issues.apache.org/jira/browse/SPARK-48035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48035. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46307 [https://github.com/apache/spark/pull/46307] > try_add/try_multiply should not be semantic equal to add/multiply > - > > Key: SPARK-48035 > URL: https://issues.apache.org/jira/browse/SPARK-48035 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In the current implementation, the following code will return true > {code:java} > val l1 = Literal(1) > val l2 = Literal(2) > val l3 = Literal(3) > val expr1 = Add(Add(l1, l2), l3) > val expr2 = Add(Add(l2, l1, EvalMode.TRY), l3) > expr1.semanticEquals(expr2) {code} > The same applies to Multiply. > When creating MultiCommutativeOp for Add/Multiply, we should ensure all the > evalMode are consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
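The `{code}` snippet above shows the bug: canonicalizing a commutative Add/Multiply ignored the eval mode, so a TRY-mode expression compared semantically equal to an ANSI one. A toy Python model of the fix, assuming canonicalization builds an order-insensitive key from the operands (an illustration only, not Catalyst's actual implementation):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Add:
    operands: Tuple[int, ...]  # stand-ins for child expressions
    eval_mode: str = "ANSI"    # "ANSI", "LEGACY", or "TRY"

    def canonical(self):
        # Sorting the operands is what makes a commutative operation
        # order-insensitive (a + b == b + a). The fix is keeping the
        # eval mode in the key too: try_add(a, b) must NOT equal add(a, b).
        return ("Add", tuple(sorted(self.operands)), self.eval_mode)

def semantic_equals(x, y):
    return x.canonical() == y.canonical()

assert semantic_equals(Add((1, 2)), Add((2, 1)))             # commutativity preserved
assert not semantic_equals(Add((1, 2)), Add((2, 1), "TRY"))  # differing eval modes
```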
[jira] [Resolved] (SPARK-48112) Expose session in SparkConnectPlanner to plugin developers
[ https://issues.apache.org/jira/browse/SPARK-48112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48112. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46363 [https://github.com/apache/spark/pull/46363] > Expose session in SparkConnectPlanner to plugin developers > -- > > Key: SPARK-48112 > URL: https://issues.apache.org/jira/browse/SPARK-48112 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47777: Assignee: (was: Chaoqin Li) > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS, Tests >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Priority: Major > Labels: pull-request-available > > Make python streaming data source pyspark test also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-47777: -- Reverted at https://github.com/apache/spark/commit/4e69857195a6f95c22f962e3eed950876036c04f > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS, Tests >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Make python streaming data source pyspark test also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47777: - Fix Version/s: (was: 4.0.0) > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS, Tests >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > > Make python streaming data source pyspark test also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48111) Move tpcds-1g and docker-integration-tests to daily scheduled jobs
Hyukjin Kwon created SPARK-48111: Summary: Move tpcds-1g and docker-integration-tests to daily scheduled jobs Key: SPARK-48111 URL: https://issues.apache.org/jira/browse/SPARK-48111 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0
[ https://issues.apache.org/jira/browse/SPARK-48088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48088: Assignee: Hyukjin Kwon > Skip tests being failed in client 3.5 <> server 4.0 > --- > > Key: SPARK-48088 > URL: https://issues.apache.org/jira/browse/SPARK-48088 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We should skip, and set the CI first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0
[ https://issues.apache.org/jira/browse/SPARK-48088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48088. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46334 [https://github.com/apache/spark/pull/46334] > Skip tests being failed in client 3.5 <> server 4.0 > --- > > Key: SPARK-48088 > URL: https://issues.apache.org/jira/browse/SPARK-48088 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > We should skip, and set the CI first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48054) Backward compatibility test for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48054: Assignee: Hyukjin Kwon > Backward compatibility test for Spark Connect > - > > Key: SPARK-48054 > URL: https://issues.apache.org/jira/browse/SPARK-48054 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Now that we can run the Spark Connect server separately in CI, we can run the > Spark Connect server of lower version, and higher version of client, and the > opposite as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48054) Backward compatibility test for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48054. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46298 [https://github.com/apache/spark/pull/46298] > Backward compatibility test for Spark Connect > - > > Key: SPARK-48054 > URL: https://issues.apache.org/jira/browse/SPARK-48054 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Now that we can run the Spark Connect server separately in CI, we can run the > Spark Connect server of lower version, and higher version of client, and the > opposite as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48056) [CONNECT][PYTHON] Session not found error should automatically retry during reattach
[ https://issues.apache.org/jira/browse/SPARK-48056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48056. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46297 [https://github.com/apache/spark/pull/46297] > [CONNECT][PYTHON] Session not found error should automatically retry during > reattach > > > Key: SPARK-48056 > URL: https://issues.apache.org/jira/browse/SPARK-48056 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.4.3 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When an OPERATION_NOT_FOUND error is raised and no prior responses were > received, the client retries the ExecutePlan RPC: > [https://github.com/apache/spark/blob/e6217c111fbdd73f202400494c42091e93d3041f/python/pyspark/sql/connect/client/reattach.py#L257] > > Another error SESSION_NOT_FOUND should follow the same logic. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
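The retry behavior described above, extended to both error classes, can be sketched in plain Python (hypothetical names; the real client logic lives in `reattach.py`):

```python
# Error classes that are safe to retry when the stream produced no
# responses yet (the server lost the operation/session state).
RETRYABLE_WHEN_NO_RESPONSES = {"OPERATION_NOT_FOUND", "SESSION_NOT_FOUND"}

class RpcError(Exception):
    def __init__(self, error_class):
        super().__init__(error_class)
        self.error_class = error_class

def execute_with_reattach(send, max_retries=3):
    """Iterate over send(); if a retryable error arrives before any
    response was received, re-issue the call instead of failing."""
    attempt = 0
    while True:
        received_any = False
        try:
            for response in send():
                received_any = True
                yield response
            return
        except RpcError as e:
            attempt += 1
            if (received_any
                    or attempt >= max_retries
                    or e.error_class not in RETRYABLE_WHEN_NO_RESPONSES):
                raise

# First call dies with SESSION_NOT_FOUND before yielding anything,
# the second succeeds; the caller never sees the transient error.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RpcError("SESSION_NOT_FOUND")
    yield "result"

assert list(execute_with_reattach(flaky)) == ["result"]
assert calls["n"] == 2
```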
[jira] [Created] (SPARK-48090) Streaming exception catch failure in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48090: Summary: Streaming exception catch failure in 3.5 client <> 4.0 server Key: SPARK-48090 URL: https://issues.apache.org/jira/browse/SPARK-48090 Project: Spark Issue Type: Sub-task Components: PySpark, Structured Streaming Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == FAIL [1.975s]: test_stream_exception (pyspark.sql.tests.connect.streaming.test_parity_streaming.StreamingParityTests.test_stream_exception) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 287, in test_stream_exception sq.processAllAvailable() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 129, in processAllAvailable self._execute_streaming_query_cmd(cmd) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 177, in _execute_streaming_query_cmd (_, properties) = self._session.client.execute_command(exec_cmd) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 982, in execute_command data, _, _, _, properties = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.StreamingQueryException: [STREAM_FAILED] Query [id =
38d0d145-1f57-4b92-b317-d9de727d9468, runId = 2b963119-d391-4c62-abea-970274859b80] terminated with exception: Job aborted due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent failure: Lost task 0.0 in stage 79.0 (TID 116) (fv-az1144-341.tm43j05r3bqe3lauap1nzddazg.ex.internal.cloudapp.net executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 224, in dump_stream self.serializer.dump_stream(self._batched(iterator), stream) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 145, in dump_stream for obj in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 213, in _batched for item in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1734, in mapper result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) ^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1734, in result = tuple(f(*[a[o] for o in arg_offsets]) for arg_offsets, f in udfs) ^^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 112, in return args_kwargs_offsets, lambda *a: func(*a) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 118, in wrapper return f(*args, **kwargs) ^^ File "/home/runner/work/spark/spark-3 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 291, in test_stream_exception 
self._assert_exception_tree_contains_msg(e, "ZeroDivisionError") File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/streaming/test_streaming.py", line 300, in _assert_exception_tree_contains_msg
[jira] [Created] (SPARK-48089) Streaming query listener not working in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48089: Summary: Streaming query listener not working in 3.5 client <> 4.0 server Key: SPARK-48089 URL: https://issues.apache.org/jira/browse/SPARK-48089 Project: Spark Issue Type: Sub-task Components: PySpark, Structured Streaming Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == ERROR [1.488s]: test_listener_events (pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests.test_listener_events) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/streaming/test_parity_listener.py", line 53, in test_listener_events self.spark.streams.addListener(test_listener) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 244, in addListener self._execute_streaming_query_manager_cmd(cmd) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", line 260, in _execute_streaming_query_manager_cmd (_, properties) = self._session.client.execute_command(exec_cmd) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 982, in execute_command data, _, _, _, properties = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.SparkConnectGrpcException: (java.io.EOFException) -- {code} -- This message was sent by Atlassian 
Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48085) ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48085. -- Resolution: Invalid We actually made this all compatible in master branch. I will avoid backporting them because they are just tests. > ANSI enabled by default brings different results in the tests in 3.5 client > <> 4.0 server > - > > Key: SPARK-48085 > URL: https://issues.apache.org/jira/browse/SPARK-48085 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > == > FAIL [0.169s]: test_checking_csv_header > (pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests.test_checking_csv_header) > -- > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered > error while reading file > file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. > SQLSTATE: KD001 > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_datasources.py", > line 167, > in test_checking_csv_header > self.assertRaisesRegex( > AssertionError: "CSV header does not conform to the schema" does not match > "(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered > error while reading file > file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv.
> SQLSTATE: KD001" > {code} > {code} > == > ERROR [0.059s]: test_large_variable_types > (pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests.test_large_variable_types) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_map.py", > line 115, in test_large_variable_types > actual = df.mapInPandas(func, "str string, bin binary").collect() > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", > line 1645, in collect > table, schema = self._session.client.to_table(query) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 858, in to_table > table, schema, _, _, _ = self._execute_and_fetch(req) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1283, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1264, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1503, in _handle_error > self._handle_rpc_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1539, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.IllegalArgumentException: > [INVALID_PARAMETER_VALUE.CHARSET] The value of parameter(s) `charset` in > `encode` is invalid: expects one of the charsets 'US-ASCII', 'ISO-8859-1', > 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got utf8. 
SQLSTATE: > 22023 > {code} > {code} > == > ERROR [0.024s]: test_assert_approx_equal_decimaltype_custom_rtol_pass > (pyspark.sql.tests.connect.test_utils.ConnectUtilsTests.test_assert_approx_equal_decimaltype_custom_rtol_pass) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_utils.py", > line 279, in test_assert_approx_equal_decimaltype_custom_rtol_pass >
[jira] [Updated] (SPARK-48085) ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48085: - Description: {code} == FAIL [0.169s]: test_checking_csv_header (pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests.test_checking_csv_header) -- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. SQLSTATE: KD001 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_datasources.py", line 167, in test_checking_csv_header self.assertRaisesRegex( AssertionError: "CSV header does not conform to the schema" does not match "(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv.
SQLSTATE: KD001" {code} {code} == ERROR [0.059s]: test_large_variable_types (pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests.test_large_variable_types) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_map.py", line 115, in test_large_variable_types actual = df.mapInPandas(func, "str string, bin binary").collect() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table table, schema, _, _, _ = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.IllegalArgumentException: [INVALID_PARAMETER_VALUE.CHARSET] The value of parameter(s) `charset` in `encode` is invalid: expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got utf8. 
SQLSTATE: 22023 {code} {code} == ERROR [0.024s]: test_assert_approx_equal_decimaltype_custom_rtol_pass (pyspark.sql.tests.connect.test_utils.ConnectUtilsTests.test_assert_approx_equal_decimaltype_custom_rtol_pass) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_utils.py", line 279, in test_assert_approx_equal_decimaltype_custom_rtol_pass assertDataFrameEqual(df1, df2, rtol=1e-1) File "/home/runner/work/spark/spark-3.5/python/pyspark/testing/utils.py", line 595, in assertDataFrameEqual actual_list = actual.collect() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table table, schema, _, _, _ = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for
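The charset failure quoted above comes from stricter validation on the 4.0 server: only canonical charset names are accepted, so the alias `utf8` sent by the 3.5-era test is rejected. A minimal sketch of that kind of allow-list validation (illustrative only, not Spark's implementation):

```python
# Canonical charset names, taken from the error message above.
SUPPORTED_CHARSETS = {"US-ASCII", "ISO-8859-1", "UTF-8",
                      "UTF-16BE", "UTF-16LE", "UTF-16"}

def validate_charset(name):
    # Strict validation: the exact canonical name is required, so the
    # common alias "utf8" is rejected instead of being normalized.
    if name not in SUPPORTED_CHARSETS:
        raise ValueError(
            f"invalid charset {name!r}; expected one of {sorted(SUPPORTED_CHARSETS)}")
    return name

assert validate_charset("UTF-8") == "UTF-8"
raised = False
try:
    validate_charset("utf8")  # the value the old test sends
except ValueError:
    raised = True
assert raised
```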
[jira] [Updated] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0
[ https://issues.apache.org/jira/browse/SPARK-48088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48088: - Affects Version/s: 3.5.2 (was: 4.0.0) > Skip tests being failed in client 3.5 <> server 4.0 > --- > > Key: SPARK-48088 > URL: https://issues.apache.org/jira/browse/SPARK-48088 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.2 >Reporter: Hyukjin Kwon >Priority: Major > > We should skip, and set the CI first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0
Hyukjin Kwon created SPARK-48088: Summary: Skip tests being failed in client 3.5 <> server 4.0 Key: SPARK-48088 URL: https://issues.apache.org/jira/browse/SPARK-48088 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We should skip, and set the CI first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48087: Summary: Python UDTF incompatibility in 3.5 client <> 4.0 server Key: SPARK-48087 URL: https://issues.apache.org/jira/browse/SPARK-48087 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == FAIL [0.103s]: test_udtf_init_with_additional_args (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args) -- pyspark.errors.exceptions.connect.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type) self._check_result_or_exception(TestUDTF, ret_type, expected) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", line 598, in _check_result_or_exception with self.assertRaisesRegex(err_type, expected): AssertionError: "AttributeError" does not match " An exception was thrown from the Python worker. Please see the stack trace below. 
Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 224, in dump_stream self.serializer.dump_stream(self._batched(iterator), stream) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 145, in dump_stream for obj in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 213, in _batched for item in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1391, in mapper yield eval(*[a[o] for o in args_kwargs_offsets]) ^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1371, in evaluate return tuple(map(verify_and_convert_result, res)) ^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1340, in verify_and_convert_result return toInternal(result) ^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1291, in toInternal return tuple( ^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1292, in f.toInternal(v) if c else v ^^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 907, in toInternal return self.dataType.toInternal(obj) ^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 372, in toInternal calendar.timegm(dt.utctimetuple()) if dt.tzinfo else time.mktime(dt.timetuple()) ..." 
{code} {code} == FAIL [0.096s]: test_udtf_init_with_additional_args (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args) -- pyspark.errors.exceptions.connect.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type) ^^^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 946, in read_udtf raise PySparkRuntimeError( pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD] Failed to evaluate the user-defined table function 'TestUDTF' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again. During handling of the above exception, another exception occurred: Traceback (most recent call last): File
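For context on the `[UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD]` failure above: the 4.0 server rejects a UDTF whose constructor takes extra arguments unless the class defines an `analyze` method. A stdlib-only sketch of that rule as the error message states it (`check_udtf_constructor` is a hypothetical helper written for illustration; the real validation lives in `read_udtf` in `pyspark/worker.py`):

```python
import inspect

def check_udtf_constructor(cls) -> bool:
    # Mirrors the Spark 4.0 server-side rule quoted in the error above:
    # without an 'analyze' method, __init__ may only accept 'self'.
    if hasattr(cls, "analyze"):
        return True
    params = inspect.signature(cls.__init__).parameters
    return len(params) <= 1  # only 'self'

class OldStyleUDTF:  # extra ctor arg, no 'analyze' -> rejected by the 4.0 server
    def __init__(self, analyze_result=None):
        pass

class NewStyleUDTF:  # plain ctor -> accepted
    def __init__(self):
        pass

class AnalyzedUDTF:  # extra ctor arg is fine when 'analyze' exists
    def __init__(self, analyze_result):
        pass
    @staticmethod
    def analyze(*args):
        pass

print(check_udtf_constructor(OldStyleUDTF))  # False
print(check_udtf_constructor(NewStyleUDTF))  # True
print(check_udtf_constructor(AnalyzedUDTF))  # True
```

This is why a UDTF that passed on a 3.5 server can fail against a 4.0 server: the stricter constructor check fires before the function body ever runs.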
[jira] [Created] (SPARK-48085) ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48085: Summary: ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server Key: SPARK-48085 URL: https://issues.apache.org/jira/browse/SPARK-48085 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == FAIL [0.169s]: test_checking_csv_header (pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests.test_checking_csv_header) -- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. SQLSTATE: KD001 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_datasources.py", line 167, in test_checking_csv_header self.assertRaisesRegex( AssertionError: "CSV header does not conform to the schema" does not match "(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-0-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. 
SQLSTATE: KD001" {code} {code} == ERROR [0.059s]: test_large_variable_types (pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests.test_large_variable_types) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_map.py", line 115, in test_large_variable_types actual = df.mapInPandas(func, "str string, bin binary").collect() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table table, schema, _, _, _ = self._execute_and_fetch(req) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.IllegalArgumentException: [INVALID_PARAMETER_VALUE.CHARSET] The value of parameter(s) `charset` in `encode` is invalid: expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got utf8. 
SQLSTATE: 22023 {code} {code} == ERROR [0.024s]: test_assert_approx_equal_decimaltype_custom_rtol_pass (pyspark.sql.tests.connect.test_utils.ConnectUtilsTests.test_assert_approx_equal_decimaltype_custom_rtol_pass) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_utils.py", line 279, in test_assert_approx_equal_decimaltype_custom_rtol_pass assertDataFrameEqual(df1, df2, rtol=1e-1) File "/home/runner/work/spark/spark-3.5/python/pyspark/testing/utils.py", line 595, in assertDataFrameEqual actual_list = actual.collect() File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table
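The `INVALID_PARAMETER_VALUE.CHARSET` failure above is a naming mismatch: the 3.5 test passes the Python-style alias `utf8`, while the 4.0 server validates against the Java-style charset names listed in the error. A hedged, stdlib-only sketch of a client-side normalization shim (`normalize_charset` is a hypothetical helper, not a PySpark API; the accepted-charset list is taken from the error message above):

```python
import codecs

# Charsets Spark 4.0's `encode` accepts, per the error message above.
SPARK_CHARSETS = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"}

# Canonical Python codec names whose spelling differs from the Java-style names.
_ALIASES = {
    "ASCII": "US-ASCII",
    "ISO8859-1": "ISO-8859-1",
    "UTF-16-BE": "UTF-16BE",
    "UTF-16-LE": "UTF-16LE",
}

def normalize_charset(name: str) -> str:
    # Resolve any Python alias ('utf8', 'latin1', ...) to its canonical
    # codec name, then map it onto the spelling Spark expects.
    canonical = codecs.lookup(name).name.upper()  # e.g. 'utf8' -> 'UTF-8'
    canonical = _ALIASES.get(canonical, canonical)
    if canonical not in SPARK_CHARSETS:
        raise ValueError(f"charset {name!r} is not supported by Spark's encode()")
    return canonical

print(normalize_charset("utf8"))    # UTF-8
print(normalize_charset("latin1"))
```

With such a shim, `df.select(encode(col("s"), normalize_charset("utf8")))` would send a name the 4.0 server accepts, instead of failing at execution time.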
[jira] [Created] (SPARK-48084) pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48084: Summary: pyspark.ml.connect.evaluation not working in 3.5 client <> 4.0 server Key: SPARK-48084 URL: https://issues.apache.org/jira/browse/SPARK-48084 Project: Spark Issue Type: Sub-task Components: ML, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == ERROR [3.966s]: test_regressor_evaluator (pyspark.ml.tests.connect.test_connect_evaluation.EvaluationTestsOnConnect.test_regressor_evaluator) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_evaluation.py", line 69, in test_regressor_evaluator rmse = rmse_evaluator.evaluate(df1) File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", line 255, in evaluate return self._evaluate(dataset) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", line 70, in _evaluate return aggregate_dataframe( File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", line 93, in aggregate_dataframe state = cloudpickle.loads(state) AttributeError: Can't get attribute '_class_setstate' on -- {code} {code} == ERROR [4.664s]: test_copy (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_copy) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", line 115, in test_copy cvModel = cv.fit(dataset) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", line 106, in fit return self._fit(dataset) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line 437, in _fit for j, metric in pool.imap_unordered(lambda f: f(), tasks): File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", line 873, in next raise value File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) ^^^ File 
"/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line 437, in for j, metric in pool.imap_unordered(lambda f: f(), tasks): ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line 188, in single_task metric = evaluator.evaluate( ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", line 255, in evaluate return self._evaluate(dataset) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/evaluation.py", line 70, in _evaluate return aggregate_dataframe( File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/util.py", line 93, in aggregate_dataframe state = cloudpickle.loads(state) AttributeError: Can't get attribute '_class_setstate' on {code} {code} == ERROR [3.938s]: test_fit_minimize_metric (pyspark.ml.tests.connect.test_connect_tuning.CrossValidatorTestsOnConnect.test_fit_minimize_metric) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", line 149, in test_fit_minimize_metric cvModel = cv.fit(dataset) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/base.py", line 106, in fit return self._fit(dataset) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line 437, in _fit for j, metric in pool.imap_unordered(lambda f: f(), tasks): File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", line 873, in next raise value File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) ^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/tuning.py", line
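The `Can't get attribute '_class_setstate'` errors above are the classic pickle-by-reference failure: the client serializes a reference to an internal cloudpickle helper that the server's cloudpickle version does not define (names like `_class_setstate` are internal and unstable across releases). A minimal stdlib reproduction of the mechanism — `fake_cloudpickle` is a stand-in module for illustration, not the real library:

```python
import pickle
import sys
import types

# Stand-in for the client's cloudpickle module, defining an internal helper.
mod = types.ModuleType("fake_cloudpickle")
sys.modules["fake_cloudpickle"] = mod

def _class_setstate(obj):
    return obj

_class_setstate.__module__ = "fake_cloudpickle"
_class_setstate.__qualname__ = "_class_setstate"
mod._class_setstate = _class_setstate

# "Client" side: the function is pickled by reference (module + attribute name),
# not by value, so the bytes only work if the receiver has the same attribute.
payload = pickle.dumps(_class_setstate)

# "Server" side: its cloudpickle version lacks the helper.
del mod._class_setstate
try:
    pickle.loads(payload)
except AttributeError as exc:
    print(exc)  # Can't get attribute '_class_setstate' on <module 'fake_cloudpickle'>
```

This is why mixing a 3.5 client with a 4.0 server breaks `aggregate_dataframe`: the pickled state references helpers from one cloudpickle version that the other side cannot resolve.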
[jira] [Created] (SPARK-48083) session.copyFromLocalToFs failure with 3.5 client <> 4.0 server
Hyukjin Kwon created SPARK-48083: Summary: session.copyFromLocalToFs failure with 3.5 client <> 4.0 server Key: SPARK-48083 URL: https://issues.apache.org/jira/browse/SPARK-48083 Project: Spark Issue Type: Sub-task Components: Connect, ML, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == ERROR [1.120s]: test_save_load (pyspark.ml.tests.connect.test_connect_classification.ClassificationTestsOnConnect.test_save_load) -- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py", line 144, in test_save_load estimator.save(fs_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", line 248, in save _copy_dir_from_local_to_fs(tmp_local_dir, path) File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", line 57, in _copy_dir_from_local_to_fs _copy_file_from_local_to_fs(file_path, dest_file_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/ml/connect/io_utils.py", line 39, in _copy_file_from_local_to_fs session.copyFromLocalToFs(local_path, dest_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/session.py", line 756, in copyFromLocalToFs self._client.copy_from_local_to_fs(local_path, dest_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1549, in copy_from_local_to_fs self._artifact_manager._add_forward_to_fs_artifacts(local_path, dest_path) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", line 280, in _add_forward_to_fs_artifacts self._request_add_artifacts(requests) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", line 259, in _request_add_artifacts response: proto.AddArtifactsResponse = self._retrieve_responses(requests) ^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/artifact.py", line 256, in _retrieve_responses return 
self._stub.AddArtifacts(requests, metadata=self._metadata) ^^ File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", line 1536, in __call__ return _end_unary_response_blocking(state, call, False, None) ^^ File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking raise _InactiveRpcError(state) # pytype: disable=not-instantiable ^^ grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.INTERNAL details = "Uploading artifact file to local file system destination path is not supported." debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Uploading artifact file to local file system destination path is not supported.", grpc_status:13, created_time:"2024-05-01T03:01:32.558489983+00:00"}" {code}
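The gRPC `INTERNAL` status above comes from the 4.0 server refusing a local-filesystem destination for `copyFromLocalToFs`. One way a test could guard against this before calling the API, sketched with a hypothetical helper (`is_remote_fs_path` is illustrative, not part of PySpark):

```python
from urllib.parse import urlparse

def is_remote_fs_path(dest: str) -> bool:
    # The Spark Connect server rejects plain local destinations, so only
    # treat URIs with a non-local scheme (hdfs://, s3a://, ...) as usable
    # targets for copyFromLocalToFs. Heuristic for illustration only.
    scheme = urlparse(dest).scheme
    return scheme not in ("", "file")

print(is_remote_fs_path("/tmp/model"))            # False -> server raises INTERNAL
print(is_remote_fs_path("file:/tmp/model"))       # False
print(is_remote_fs_path("hdfs://nn:9000/model"))  # True
```

A test like `test_save_load` could skip (or pick a remote `fs_path`) when this check fails, rather than hitting the `AddArtifacts` RPC and failing with the opaque `_InactiveRpcError`.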
[jira] [Created] (SPARK-48086) Different Arrow versions in client and server
Hyukjin Kwon created SPARK-48086: Summary: Different Arrow versions in client and server Key: SPARK-48086 URL: https://issues.apache.org/jira/browse/SPARK-48086 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == FAIL [1.071s]: test_pandas_udf_arrow_overflow (pyspark.sql.tests.connect.test_parity_pandas_udf.PandasUDFParityTests.test_pandas_udf_arrow_overflow) -- pyspark.errors.exceptions.connect.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 302, in _create_array return pa.Array.from_pandas( ^ File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas File "pyarrow/array.pxi", line 323, in pyarrow.lib.array File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 531, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 104, in dump_stream for batch in iterator: File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 525, in init_stream_yield_batches batch = self._create_batch(series) ^^ File 
"/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 511, in _create_batch arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast)) ^ File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 330, in _create_array raise PySparkValueError(error_msg % (series.dtype, series.na... During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", line 299, in test_pandas_udf_arrow_overflow with self.assertRaisesRegex( AssertionError: "Exception thrown when converting pandas.Series" does not match " An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 302, in _create_array return pa.Array.from_pandas( ^ File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas File "pyarrow/array.pxi", line 323, in pyarrow.lib.array File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main process() File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process serializer.dump_stream(out_iter, outfile) File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 531, in dump_stream Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", line 279, in test_pandas_udf_detect_unsafe_type_conversion with 
self.assertRaisesRegex( AssertionError: "Exception thrown when converting pandas.Series" does not match " An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line
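The `ArrowInvalid: Integer value 128 not in range: -128 to 127` failures come from Arrow range-checking values against the declared SQL type (here a signed byte) during serialization. The bounds involved, sketched in plain Python (`fits` is a hypothetical helper; the real check happens inside `pa.Array.from_pandas` in `pyspark/sql/pandas/serializers.py`):

```python
# Signed-integer ranges that the Arrow conversion enforces when a pandas UDF
# declares e.g. "byte" as its return type.
RANGES = {
    "byte":  (-2**7,  2**7 - 1),    # -128 .. 127
    "short": (-2**15, 2**15 - 1),
    "int":   (-2**31, 2**31 - 1),
    "long":  (-2**63, 2**63 - 1),
}

def fits(value: int, sql_type: str) -> bool:
    lo, hi = RANGES[sql_type]
    return lo <= value <= hi

print(fits(127, "byte"))  # True
print(fits(128, "byte"))  # False -> pyarrow raises ArrowInvalid
```

The test failure itself is a message mismatch, not a behavior change: both client and server reject the overflow, but different Arrow versions wrap the error differently, so the 3.5 test's `assertRaisesRegex` pattern no longer matches.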
[jira] [Updated] (SPARK-48054) Backward compatibility test for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48054: - Parent: SPARK-48082 Issue Type: Sub-task (was: Improvement) > Backward compatibility test for Spark Connect > - > > Key: SPARK-48054 > URL: https://issues.apache.org/jira/browse/SPARK-48054 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Now that we can run the Spark Connect server separately in CI, we can run a > lower-version Spark Connect server against a higher-version client, and vice > versa.
[jira] [Created] (SPARK-48082) Recover compatibility with Spark Connect client 3.5 <> Spark Connect server 4.0
Hyukjin Kwon created SPARK-48082: Summary: Recover compatibility with Spark Connect client 3.5 <> Spark Connect server 4.0 Key: SPARK-48082 URL: https://issues.apache.org/jira/browse/SPARK-48082 Project: Spark Issue Type: Umbrella Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/pull/46298#issuecomment-2087905857 Test failures were identified when running the Spark 3.5 Spark Connect client against the Spark Connect server 4.0. They should ideally be compatible.
[jira] [Updated] (SPARK-45988) Fix `pyspark.pandas.tests.computation.test_apply_func` in Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-45988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45988: - Fix Version/s: 3.5.2 > Fix `pyspark.pandas.tests.computation.test_apply_func` in Python 3.11 > - > > Key: SPARK-45988 > URL: https://issues.apache.org/jira/browse/SPARK-45988 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > https://github.com/apache/spark/actions/runs/6914662405/job/18812759697 > {code} > == > ERROR [0.686s]: test_apply_batch_with_type > (pyspark.pandas.tests.computation.test_apply_func.FrameApplyFunctionTests.test_apply_batch_with_type) > -- > Traceback (most recent call last): > File > "/__w/spark/spark/python/pyspark/pandas/tests/computation/test_apply_func.py", > line 248, in test_apply_batch_with_type > def identify3(x) -> ps.DataFrame[float, [int, List[int]]]: > ^ > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13540, in > __class_getitem__ > return create_tuple_for_frame_type(params) >^^^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 721, in create_tuple_for_frame_type > return Tuple[_to_type_holders(params)] > > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 766, in _to_type_holders > data_types = _new_type_holders(data_types, NameTypeHolder) > ^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 832, in _new_type_holders > raise TypeError( > TypeError: Type hints should be specified as one of: > - DataFrame[type, type, ...] > - DataFrame[name: type, name: type, ...] 
> - DataFrame[dtypes instance] > - DataFrame[zip(names, types)] > - DataFrame[index_type, [type, ...]] > - DataFrame[(index_name, index_type), [(name, type), ...]] > - DataFrame[dtype instance, dtypes instance] > - DataFrame[(index_name, index_type), zip(names, types)] > - DataFrame[[index_type, ...], [type, ...]] > - DataFrame[[(index_name, index_type), ...], [(name, type), ...]] > - DataFrame[dtypes instance, dtypes instance] > - DataFrame[zip(index_names, index_types), zip(names, types)] > However, got (<class 'int'>, typing.List[int]). > -- > Ran 10 tests in 34.327s > FAILED (errors=1) > {code}
[jira] [Updated] (SPARK-45989) Fix `pyspark.pandas.tests.connect.computation.test_parity_apply_func` in Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-45989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45989: - Fix Version/s: 3.5.2 > Fix `pyspark.pandas.tests.connect.computation.test_parity_apply_func` in > Python 3.11 > > > Key: SPARK-45989 > URL: https://issues.apache.org/jira/browse/SPARK-45989 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 4.0.0, 3.5.2 > > > https://github.com/apache/spark/actions/runs/6914662405/job/18816505612 > {code} > == > ERROR [1.237s]: test_apply_batch_with_type > (pyspark.pandas.tests.connect.computation.test_parity_apply_func.FrameParityApplyFunctionTests.test_apply_batch_with_type) > -- > Traceback (most recent call last): > File > "/__w/spark/spark/python/pyspark/pandas/tests/computation/test_apply_func.py", > line 248, in test_apply_batch_with_type > def identify3(x) -> ps.DataFrame[float, [int, List[int]]]: > ^ > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13540, in > __class_getitem__ > return create_tuple_for_frame_type(params) >^^^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 721, in create_tuple_for_frame_type > return Tuple[_to_type_holders(params)] > > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 766, in _to_type_holders > data_types = _new_type_holders(data_types, NameTypeHolder) > ^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 832, in _new_type_holders > raise TypeError( > TypeError: Type hints should be specified as one of: > - DataFrame[type, type, ...] > - DataFrame[name: type, name: type, ...] 
> - DataFrame[dtypes instance] > - DataFrame[zip(names, types)] > - DataFrame[index_type, [type, ...]] > - DataFrame[(index_name, index_type), [(name, type), ...]] > - DataFrame[dtype instance, dtypes instance] > - DataFrame[(index_name, index_type), zip(names, types)] > - DataFrame[[index_type, ...], [type, ...]] > - DataFrame[[(index_name, index_type), ...], [(name, type), ...]] > - DataFrame[dtypes instance, dtypes instance] > - DataFrame[zip(index_names, index_types), zip(names, types)] > However, got (<class 'int'>, typing.List[int]). > -- > Ran 10 tests in 78.247s > FAILED (errors=1) > {code}
[jira] [Resolved] (SPARK-48075) Enforce type checking in from_avro and to_avro functions
[ https://issues.apache.org/jira/browse/SPARK-48075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48075. -- Fix Version/s: 4.0.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/46324 > Enforce type checking in from_avro and to_avro functions > > > Key: SPARK-48075 > URL: https://issues.apache.org/jira/browse/SPARK-48075 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Fanyue Xia >Assignee: Fanyue Xia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Because types are not enforced in Python, users can pass in all sorts of > arguments for functions. This could lead to invoking the wrong functions in Spark > Connect. > If we perform type checking for arguments and output sensible errors when the > types of arguments passed into the functions don't match, we can give the user > a better user experience.
[jira] [Assigned] (SPARK-48064) Improve error messages for routine related errors
[ https://issues.apache.org/jira/browse/SPARK-48064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48064: Assignee: Allison Wang > Improve error messages for routine related errors > - > > Key: SPARK-48064 > URL: https://issues.apache.org/jira/browse/SPARK-48064 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-48064) Improve error messages for routine related errors
[ https://issues.apache.org/jira/browse/SPARK-48064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48064. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46310 [https://github.com/apache/spark/pull/46310] > Improve error messages for routine related errors > - > > Key: SPARK-48064 > URL: https://issues.apache.org/jira/browse/SPARK-48064 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-43727) Parity returnType check in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43727. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46300 [https://github.com/apache/spark/pull/46300] > Parity returnType check in Spark Connect > > > Key: SPARK-43727 > URL: https://issues.apache.org/jira/browse/SPARK-43727 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48058) `UserDefinedFunction.returnType` parse the DDL string
[ https://issues.apache.org/jira/browse/SPARK-48058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48058: Assignee: Ruifeng Zheng > `UserDefinedFunction.returnType` parse the DDL string > - > > Key: SPARK-48058 > URL: https://issues.apache.org/jira/browse/SPARK-48058 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-48058) `UserDefinedFunction.returnType` parse the DDL string
[ https://issues.apache.org/jira/browse/SPARK-48058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48058. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46300 [https://github.com/apache/spark/pull/46300] > `UserDefinedFunction.returnType` parse the DDL string > - > > Key: SPARK-48058 > URL: https://issues.apache.org/jira/browse/SPARK-48058 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-48062) Add pyspark test for SimpleDataSourceStreamingReader
[ https://issues.apache.org/jira/browse/SPARK-48062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48062. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46306 [https://github.com/apache/spark/pull/46306] > Add pyspark test for SimpleDataSourceStreamingReader > > > Key: SPARK-48062 > URL: https://issues.apache.org/jira/browse/SPARK-48062 > Project: Spark > Issue Type: Test > Components: SS >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add end-to-end pyspark test for SimpleDataSourceStreamingReader
[jira] [Assigned] (SPARK-48062) Add pyspark test for SimpleDataSourceStreamingReader
[ https://issues.apache.org/jira/browse/SPARK-48062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48062: Assignee: Chaoqin Li > Add pyspark test for SimpleDataSourceStreamingReader > Key: SPARK-48062 > URL: https://issues.apache.org/jira/browse/SPARK-48062 > Project: Spark > Issue Type: Test > Components: SS >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Assignee: Chaoqin Li >Priority: Major > Labels: pull-request-available > Add an end-to-end PySpark test for SimpleDataSourceStreamingReader
[jira] [Assigned] (SPARK-46894) Move PySpark error conditions into standalone JSON file
[ https://issues.apache.org/jira/browse/SPARK-46894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46894: Assignee: Nicholas Chammas > Move PySpark error conditions into standalone JSON file > Key: SPARK-46894 > URL: https://issues.apache.org/jira/browse/SPARK-46894 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-46894) Move PySpark error conditions into standalone JSON file
[ https://issues.apache.org/jira/browse/SPARK-46894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46894. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44920 [https://github.com/apache/spark/pull/44920] > Move PySpark error conditions into standalone JSON file > Key: SPARK-46894 > URL: https://issues.apache.org/jira/browse/SPARK-46894 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-48052) Recover pyspark-connect CI by parent classes
[ https://issues.apache.org/jira/browse/SPARK-48052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48052. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46294 [https://github.com/apache/spark/pull/46294] > Recover pyspark-connect CI by parent classes > Key: SPARK-48052 > URL: https://issues.apache.org/jira/browse/SPARK-48052 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
[ https://issues.apache.org/jira/browse/SPARK-48053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48053: Assignee: Hyukjin Kwon > SparkSession.createDataFrame should warn for unsupported options > Key: SPARK-48053 > URL: https://issues.apache.org/jira/browse/SPARK-48053 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > {code} > spark.createDataFrame([1,2,3], verifySchema=True) > {code} > and > {code} > spark.createDataFrame([1,2,3], samplingRatio=0.5) > {code} > Do not work with Spark Connect.
[jira] [Resolved] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
[ https://issues.apache.org/jira/browse/SPARK-48053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48053. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46295 [https://github.com/apache/spark/pull/46295] > SparkSession.createDataFrame should warn for unsupported options > Key: SPARK-48053 > URL: https://issues.apache.org/jira/browse/SPARK-48053 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > {code} > spark.createDataFrame([1,2,3], verifySchema=True) > {code} > and > {code} > spark.createDataFrame([1,2,3], samplingRatio=0.5) > {code} > Do not work with Spark Connect.
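The two arguments quoted in the issue are accepted by the classic API but cannot be honored by a Spark Connect client, and the issue asks for a warning instead of silent acceptance. A minimal pure-Python sketch of that warn-on-unsupported-argument pattern (`create_dataframe` here is a hypothetical stand-in, not PySpark's actual internals):

```python
import warnings

def create_dataframe(data, samplingRatio=None, verifySchema=None):
    # Hypothetical sketch: a Connect client cannot honor these options,
    # so emit a warning instead of silently ignoring them.
    unsupported = {"samplingRatio": samplingRatio, "verifySchema": verifySchema}
    for name, value in unsupported.items():
        if value is not None:
            warnings.warn(
                f"'{name}' is ignored; it is not supported with Spark Connect."
            )
    return list(data)  # stand-in for actually building a DataFrame
```

The call still succeeds, so existing scripts keep working, but the user learns the option had no effect.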
[jira] [Created] (SPARK-48054) Backward compatibility test for Spark Connect
Hyukjin Kwon created SPARK-48054: Summary: Backward compatibility test for Spark Connect Key: SPARK-48054 URL: https://issues.apache.org/jira/browse/SPARK-48054 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Now that we can run the Spark Connect server separately in CI, we can test a lower-version server against a higher-version client, and vice versa.
[jira] [Created] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
Hyukjin Kwon created SPARK-48053: Summary: SparkSession.createDataFrame should warn for unsupported options Key: SPARK-48053 URL: https://issues.apache.org/jira/browse/SPARK-48053 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} spark.createDataFrame([1,2,3], verifySchema=True) {code} and {code} spark.createDataFrame([1,2,3], samplingRatio=0.5) {code} Do not work with Spark Connect.
[jira] [Created] (SPARK-48052) Recover pyspark-connect CI by parent classes
Hyukjin Kwon created SPARK-48052: Summary: Recover pyspark-connect CI by parent classes Key: SPARK-48052 URL: https://issues.apache.org/jira/browse/SPARK-48052 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-48039) Update the error class for `group.apply`
[ https://issues.apache.org/jira/browse/SPARK-48039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48039. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46277 [https://github.com/apache/spark/pull/46277] > Update the error class for `group.apply` > Key: SPARK-48039 > URL: https://issues.apache.org/jira/browse/SPARK-48039 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-47292) safeMapToJValue should consider when map is null
[ https://issues.apache.org/jira/browse/SPARK-47292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47292. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46260 [https://github.com/apache/spark/pull/46260] > safeMapToJValue should consider when map is null > Key: SPARK-47292 > URL: https://issues.apache.org/jira/browse/SPARK-47292 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0, 3.5.1 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48014) Change the makeFromJava error in EvaluatePython to a user-facing error
[ https://issues.apache.org/jira/browse/SPARK-48014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48014: Assignee: Allison Wang > Change the makeFromJava error in EvaluatePython to a user-facing error > Key: SPARK-48014 > URL: https://issues.apache.org/jira/browse/SPARK-48014 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-48014) Change the makeFromJava error in EvaluatePython to a user-facing error
[ https://issues.apache.org/jira/browse/SPARK-48014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48014. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46250 [https://github.com/apache/spark/pull/46250] > Change the makeFromJava error in EvaluatePython to a user-facing error > Key: SPARK-48014 > URL: https://issues.apache.org/jira/browse/SPARK-48014 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-48024) Enable `UDFParityTests.test_udf_timestamp_ntz`
[ https://issues.apache.org/jira/browse/SPARK-48024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48024. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46257 [https://github.com/apache/spark/pull/46257] > Enable `UDFParityTests.test_udf_timestamp_ntz` > Key: SPARK-48024 > URL: https://issues.apache.org/jira/browse/SPARK-48024 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners
[ https://issues.apache.org/jira/browse/SPARK-48002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48002. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46237 [https://github.com/apache/spark/pull/46237] > Add Observed metrics test in PySpark StreamingQueryListeners > Key: SPARK-48002 > URL: https://issues.apache.org/jira/browse/SPARK-48002 > Project: Spark > Issue Type: New Feature > Components: SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners
[ https://issues.apache.org/jira/browse/SPARK-48002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48002: Assignee: Wei Liu > Add Observed metrics test in PySpark StreamingQueryListeners > Key: SPARK-48002 > URL: https://issues.apache.org/jira/browse/SPARK-48002 > Project: Spark > Issue Type: New Feature > Components: SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-47993) Drop Python 3.8 support
[ https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47993. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46228 [https://github.com/apache/spark/pull/46228] > Drop Python 3.8 support > Key: SPARK-47993 > URL: https://issues.apache.org/jira/browse/SPARK-47993 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available, release-notes > Fix For: 4.0.0 > Python 3.8 reaches EOL this October. Considering the release schedule, we should drop it.
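Dropping an interpreter version is usually enforced with a fail-fast check at import or startup time, so users get a clear message instead of an obscure failure later. A sketch under that assumption (illustrative only, not PySpark's actual code; `MIN_PYTHON` is a hypothetical constant):

```python
import sys

MIN_PYTHON = (3, 9)  # assumed new floor once 3.8 support is dropped

def check_python_version(version_info=sys.version_info):
    # Reject interpreters below the supported floor with a clear error,
    # instead of letting 3.9+-only syntax or APIs fail somewhere deep inside.
    if tuple(version_info[:2]) < MIN_PYTHON:
        raise RuntimeError("Python %d.%d or above is required" % MIN_PYTHON)
    return True
```

Such a guard typically lives next to the package's `__init__` so the error surfaces on the very first import.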
[jira] [Updated] (SPARK-47993) Drop Python 3.8 support
[ https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47993: - Labels: release-notes (was: release-note) > Drop Python 3.8 support > Key: SPARK-47993 > URL: https://issues.apache.org/jira/browse/SPARK-47993 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: release-notes > Python 3.8 reaches EOL this October. Considering the release schedule, we should drop it.
[jira] [Resolved] (SPARK-47962) Improve doc test in pyspark dataframe
[ https://issues.apache.org/jira/browse/SPARK-47962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47962. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46189 [https://github.com/apache/spark/pull/46189] > Improve doc test in pyspark dataframe > Key: SPARK-47962 > URL: https://issues.apache.org/jira/browse/SPARK-47962 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > The doc test for the DataFrame observe API doesn't use a streaming DataFrame, which is wrong. We should start a streaming DataFrame to make sure it runs.
[jira] [Resolved] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47965. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46197 [https://github.com/apache/spark/pull/46197] > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > Configuration values/keys cannot be nulls. We should fix: > {code} > diff --git > a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala > b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala > index 1f19e9444d38..d06535722625 100644 > --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala > +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala > @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T]( >import ConfigHelpers._ >def this(parent: ConfigBuilder, converter: String => T) = { > -this(parent, converter, Option(_).map(_.toString).orNull) > +this(parent, converter, { v: T => v.toString }) >} >/** Apply a transformation to the user-provided values of the config > entry. */ > {code}
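The Scala diff quoted in this entry swaps a stringifier that silently maps a null value to null (`Option(_).map(_.toString).orNull`) for one that assumes the value is non-null (`{ v: T => v.toString }`). A loose Python analogue of the before/after behavior (the names below are illustrative, not Spark code):

```python
def stringify_before(v):
    # Loose analogue of Option(_).map(_.toString).orNull:
    # a null (None) value silently stays null and propagates downstream.
    return str(v) if v is not None else None

def stringify_after(v):
    # Loose analogue of { v: T => v.toString }: configuration values/keys
    # can never be null, so reject None loudly at the conversion boundary.
    if v is None:
        raise ValueError("configuration values/keys cannot be null")
    return str(v)
```

Failing at the conversion boundary surfaces a bad config immediately, rather than leaking a null into code that assumes every configured value is a real string.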
[jira] [Assigned] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47965: Assignee: Hyukjin Kwon > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-47964) Hide SQLContext and HiveContext
[ https://issues.apache.org/jira/browse/SPARK-47964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47964. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46194 [https://github.com/apache/spark/pull/46194] > Hide SQLContext and HiveContext > Key: SPARK-47964 > URL: https://issues.apache.org/jira/browse/SPARK-47964 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Issue Type: Improvement (was: Bug) > Avoid orNull in TypedConfigBuilder > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Summary: Avoid orNull in TypedConfigBuilder and OptionalConfigEntry (was: Avoid orNull in TypedConfigBuilder) > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Priority: Minor (was: Major) > Avoid orNull in TypedConfigBuilder > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor
[jira] [Created] (SPARK-47965) Avoid orNull in TypedConfigBuilder
Hyukjin Kwon created SPARK-47965: Summary: Avoid orNull in TypedConfigBuilder Key: SPARK-47965 URL: https://issues.apache.org/jira/browse/SPARK-47965 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Configuration values/keys cannot be nulls. We should fix: {code} diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala index 1f19e9444d38..d06535722625 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T]( import ConfigHelpers._ def this(parent: ConfigBuilder, converter: String => T) = { -this(parent, converter, Option(_).map(_.toString).orNull) +this(parent, converter, { v: T => v.toString }) } /** Apply a transformation to the user-provided values of the config entry. */ {code}
[jira] [Assigned] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47933: Assignee: Hyukjin Kwon > Parent Column class for Spark Connect and Spark Classic > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47933. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46155 [https://github.com/apache/spark/pull/46155] > Parent Column class for Spark Connect and Spark Classic > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47903. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46122 [https://github.com/apache/spark/pull/46122] > Add remaining scalar types to the Python variant library > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > Added support for reading the remaining scalar data types (binary, timestamp, timestamp_ntz, date, float) to the Python Variant library.
[jira] [Assigned] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47903: Assignee: Harsh Motwani > Add remaining scalar types to the Python variant library > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Added support for reading the remaining scalar data types (binary, timestamp, timestamp_ntz, date, float) to the Python Variant library.
[jira] [Resolved] (SPARK-47890) Add python and scala dataframe variant expression aliases.
[ https://issues.apache.org/jira/browse/SPARK-47890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47890. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46123 [https://github.com/apache/spark/pull/46123] > Add python and scala dataframe variant expression aliases. > Key: SPARK-47890 > URL: https://issues.apache.org/jira/browse/SPARK-47890 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
Hyukjin Kwon created SPARK-47933: Summary: Parent Column class for Spark Connect and Spark Classic Key: SPARK-47933 URL: https://issues.apache.org/jira/browse/SPARK-47933 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon