Hyukjin Kwon created SPARK-48085: ------------------------------------ Summary: ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server Key: SPARK-48085 URL: https://issues.apache.org/jira/browse/SPARK-48085 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
{code} ====================================================================== FAIL [0.169s]: test_checking_csv_header (pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests.test_checking_csv_header) ---------------------------------------------------------------------- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-00000-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. SQLSTATE: KD001 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_datasources.py", line [167](https://github.com/HyukjinKwon/spark/actions/runs/8908464265/job/24464135564#step:9:168), in test_checking_csv_header self.assertRaisesRegex( AssertionError: "CSV header does not conform to the schema" does not match "(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-00000-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv. SQLSTATE: KD001" {code} {code} ====================================================================== ERROR [0.059s]: test_large_variable_types (pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests.test_large_variable_types) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_map.py", line 115, in test_large_variable_types actual = df.mapInPandas(func, "str string, bin binary").collect() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table table, schema, _, _, _ = self._execute_and_fetch(req) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.IllegalArgumentException: [INVALID_PARAMETER_VALUE.CHARSET] The value of parameter(s) `charset` in `encode` is invalid: expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got utf8. SQLSTATE: 2[202](https://github.com/HyukjinKwon/spark/actions/runs/8909131027/job/24465959134#step:9:203)3 {code} {code} ====================================================================== ERROR [0.024s]: test_assert_approx_equal_decimaltype_custom_rtol_pass (pyspark.sql.tests.connect.test_utils.ConnectUtilsTests.test_assert_approx_equal_decimaltype_custom_rtol_pass) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_utils.py", line 279, in test_assert_approx_equal_decimaltype_custom_rtol_pass assertDataFrameEqual(df1, df2, rtol=1e-1) File "/home/runner/work/spark/spark-3.5/python/pyspark/testing/utils.py", line 595, in assertDataFrameEqual actual_list = actual.collect() ^^^^^^^^^^^^^^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1645, in collect table, schema = self._session.client.to_table(query) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 858, in to_table table, schema, _, _, _ = self._execute_and_fetch(req) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch for response in self._execute_and_fetch_as_iterator(req): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator self._handle_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error self._handle_rpc_error(error) File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error raise convert_exception(info, status.message) from None pyspark.errors.exceptions.connect.ArithmeticException: [NUMERIC_VALUE_OUT_OF_RANGE.WITH_SUGGESTION] 83.14 cannot be represented as Decimal(4, 3). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error, and return NULL instead. SQLSTATE: 22003 ---------------------------------------------------------------------- {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org