Petar Vasiljevic created SPARK-50092:
----------------------------------------
Summary: Fix PostgreSQL connector behaviour for multidimensional
arrays
Key: SPARK-50092
URL: https://issues.apache.org/jira/browse/SPARK-50092
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0
Reporter: Petar Vasiljevic
Fix For: 4.0.0
A bug was introduced in PR
[https://github.com/apache/spark/pull/46006]. That PR fixed the behaviour of
the PostgreSQL connector for multidimensional arrays, which had previously
all been mapped to 1D arrays.
The following scenario is now broken:
* A user has a table t1 in Postgres and runs a CTAS (CREATE TABLE AS SELECT)
command to create table t2 with the same data.
* PR 46006 resolves the dimensionality of a column by reading the attndims
column from the pg_attribute metadata table.
* This query returns the correct dimensionality for table t1, but for table
t2, which was created via CTAS, it always returns 0. As a result, all of the
arrays are mapped to a 0-D array, which is the element type itself (for
example, int).
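The failure mode above can be sketched in Python as follows. This is only an
illustration, not the connector's code: attndims_query and map_type are
hypothetical helpers, the catalog query is one plausible way to read attndims,
and the attndims values 2 and 0 are assumed for the example.

```python
def attndims_query(table: str, column: str) -> str:
    """Build a catalog lookup for the declared dimensionality of a column."""
    # Hypothetical helper. Identifiers are interpolated for illustration only;
    # real code should quote or parameterize them.
    return (
        "SELECT attndims FROM pg_attribute "
        f"WHERE attrelid = '{table}'::regclass AND attname = '{column}'"
    )


def map_type(element_type: str, ndims: int) -> str:
    """Wrap element_type in ndims levels of array<...>."""
    result = element_type
    for _ in range(ndims):
        result = f"array<{result}>"
    return result


# Assumed: t1 declares a 2-D array column, so attndims is 2.
print(map_type("int", 2))  # array<array<int>>
# t2 was created via CTAS, so attndims is reported as 0 and the array is lost.
print(map_type("int", 0))  # int
```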
As a solution, we can call the array_ndims function on the given column, which
returns the number of dimensions of the array stored in it. This also works
for tables created via CTAS.
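A minimal sketch of the proposed lookup, in Python for illustration only. The
helper names, the LIMIT 1 sampling, and the fallback to one dimension are
assumptions, not the actual fix. The fallback is there because array_ndims
yields NULL for NULL or empty arrays, and an empty table yields no row at all.

```python
from typing import Optional


def array_ndims_query(table: str, column: str) -> str:
    """Build a query that inspects the dimensionality of a stored value."""
    # Hypothetical helper. Identifiers are interpolated for illustration only.
    return f"SELECT array_ndims({column}) FROM {table} LIMIT 1"


def resolve_ndims(value: Optional[int], default: int = 1) -> int:
    """Use the sampled dimensionality, falling back when none is available."""
    # value is None when the table is empty or the sampled array is NULL/empty.
    if value is None or value <= 0:
        return default  # assumed fallback: treat the column as a 1-D array
    return value


print(array_ndims_query("t2", "c"))  # SELECT array_ndims(c) FROM t2 LIMIT 1
print(resolve_ndims(2))     # 2
print(resolve_ndims(None))  # 1
```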