[jira] [Created] (SPARK-48085) ANSI enabled by default brings different results in the tests in 3.5 client <> 4.0 server

Hyukjin Kwon (Jira) Thu, 02 May 2024 02:41:14 -0700

Hyukjin Kwon created SPARK-48085:
------------------------------------

             Summary: ANSI enabled by default brings different results in the 
tests in 3.5 client <> 4.0 server
                 Key: SPARK-48085
                 URL: https://issues.apache.org/jira/browse/SPARK-48085
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, PySpark, SQL
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon



{code}
======================================================================
FAIL [0.169s]: test_checking_csv_header 
(pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests.test_checking_csv_header)
----------------------------------------------------------------------
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error 
while reading file 
file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-00000-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv.
  SQLSTATE: KD001
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_datasources.py",
 line 167, in test_checking_csv_header
    self.assertRaisesRegex(
AssertionError: "CSV header does not conform to the schema" does not match 
"(org.apache.spark.SparkException) [FAILED_READ_FILE.NO_HINT] Encountered error 
while reading file 
file:///home/runner/work/spark/spark-3.5/python/target/38acabf5-710b-4c21-b359-f61619e2adc7/tmpm7qyq23g/part-00000-d6c8793b-772d-44e7-bcca-6eeae9cc0ec7-c000.csv.
  SQLSTATE: KD001"
{code}
{code}
======================================================================
ERROR [0.059s]: test_large_variable_types 
(pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests.test_large_variable_types)
----------------------------------------------------------------------
Traceback (most recent call last):
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_map.py",
 line 115, in test_large_variable_types
    actual = df.mapInPandas(func, "str string, bin binary").collect()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", 
line 1645, in collect
    table, schema = self._session.client.to_table(query)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 858, in to_table
    table, schema, _, _, _ = self._execute_and_fetch(req)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 1283, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req):
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 1264, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 1503, in _handle_error
    self._handle_rpc_error(error)
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 1539, in _handle_rpc_error
    raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.IllegalArgumentException: 
[INVALID_PARAMETER_VALUE.CHARSET] The value of parameter(s) `charset` in 
`encode` is invalid: expects one of the charsets 'US-ASCII', 'ISO-8859-1', 
'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got utf8. SQLSTATE: 
22023
{code}
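The failure above comes down to the 4.0 server rejecting the alias `utf8` where it expects one of the six listed charset names. A minimal sketch of how a caller could map common aliases onto those names before sending them is shown below. It uses only Python's standard `codecs` module; `ACCEPTED` is copied from the error message, and `normalize_charset` is a hypothetical helper for illustration, not part of PySpark and not the fix adopted for this issue.

```python
import codecs

# Charset names listed in the INVALID_PARAMETER_VALUE.CHARSET message above.
ACCEPTED = ["US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"]

# Map Python's canonical codec name to the spelling the server accepts.
_CANONICAL_TO_ACCEPTED = {codecs.lookup(name).name: name for name in ACCEPTED}


def normalize_charset(name: str) -> str:
    """Hypothetical helper: return the accepted spelling for a charset alias."""
    try:
        canonical = codecs.lookup(name).name
    except LookupError:
        raise ValueError(f"unknown charset: {name!r}") from None
    try:
        return _CANONICAL_TO_ACCEPTED[canonical]
    except KeyError:
        raise ValueError(f"charset not in the accepted list: {name!r}") from None


print(normalize_charset("utf8"))  # UTF-8
```

Whether such normalization belongs on the client, on the server, or in the test itself is a design question this sketch does not settle.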
{code}
======================================================================
ERROR [0.024s]: test_assert_approx_equal_decimaltype_custom_rtol_pass 
(pyspark.sql.tests.connect.test_utils.ConnectUtilsTests.test_assert_approx_equal_decimaltype_custom_rtol_pass)
----------------------------------------------------------------------
Traceback (most recent call last):
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_utils.py", 
line 279, in test_assert_approx_equal_decimaltype_custom_rtol_pass
    assertDataFrameEqual(df1, df2, rtol=1e-1)
  File "/home/runner/work/spark/spark-3.5/python/pyspark/testing/utils.py", 
line 595, in assertDataFrameEqual
    actual_list = actual.collect()
                  ^^^^^^^^^^^^^^^^
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", 
line 1645, in collect
    table, schema = self._session.client.to_table(query)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 858, in to_table
    table, schema, _, _, _ = self._execute_and_fetch(req)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 1283, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req):
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 1264, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 1503, in _handle_error
    self._handle_rpc_error(error)
  File 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", 
line 1539, in _handle_rpc_error
    raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.ArithmeticException: 
[NUMERIC_VALUE_OUT_OF_RANGE.WITH_SUGGESTION]  83.14 cannot be represented as 
Decimal(4, 3). If necessary set "spark.sql.ansi.enabled" to "false" to bypass 
this error, and return NULL instead. SQLSTATE: 22003
----------------------------------------------------------------------
{code}
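The last error says that 83.14 cannot be represented as Decimal(4, 3): with precision 4 and scale 3, only one digit is left for the integer part, so the largest representable value is 9.999. The sketch below reproduces that check with Python's standard `decimal` module only. `fits_decimal` is a hypothetical helper for illustration, and the half-up rounding is an assumption about how the value is brought to the target scale.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Hypothetical helper: does value fit DECIMAL(precision, scale)?"""
    # Round to the target scale first (assumed half-up rounding).
    step = Decimal(1).scaleb(-scale)  # e.g. 0.001 for scale 3
    quantized = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)
    # Count all digits of the rounded value, integer and fractional parts.
    return len(quantized.as_tuple().digits) <= precision


print(fits_decimal("83.14", 4, 3))  # False: 83.140 needs 5 digits
print(fits_decimal("8.314", 4, 3))  # True
```

As the error message itself notes, setting "spark.sql.ansi.enabled" to "false" makes the server return NULL for such a value instead of raising, which is why the test behaves differently against a 4.0 server where ANSI mode is on by default.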




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
