[jira] [Commented] (PHOENIX-7377) phoenix5-spark dataframe issue with schema inference

ASF GitHub Bot (Jira) Mon, 29 Jul 2024 00:42:05 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869255#comment-17869255
 ]


ASF GitHub Bot commented on PHOENIX-7377:
-----------------------------------------

rejeb opened a new pull request, #134:
URL: https://github.com/apache/phoenix-connectors/pull/134

   This PR adds a flag to fix PHOENIX-7377.
   - Add a flag `escapeColumnFamily`, which defaults to false.
   - Make `SparkSchemaUtil.normalizeColumnName` and 
`SparkSchemaUtil.phoenixTypeToCatalystType` private since they are used 
exclusively in `SparkSchemaUtil`.
   - Update README.




> phoenix5-spark dataframe issue with schema inference
> ----------------------------------------------------
>
>                 Key: PHOENIX-7377
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7377
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: rejeb ben rejeb
>            Priority: Major
>
> The fix for PHOENIX-4981 introduced a breaking change in the way the 
> schema is inferred.
> In previous versions of the connector, columns in a non-default column 
> family were mapped to "columnName" in the DataFrame. Now they are mapped to 
> "columnFamily.columnName".
> No unit tests cover this case; all tests use tables with the 
> default column family "0".
> The change was made in this [pull 
> request|https://github.com/apache/phoenix/pull/402] (the project has since 
> moved to another git repo):
>  * The previous version of the code used `ColumnInfo.getDisplayName` to 
> define the name of the column in the DataFrame.
>  * In the new class SparkSchemaUtil, the method used is 
> `ColumnInfo.getColumnName`, which returns the column name as 
> `columnFamilyName.columnName`.
> The pull request is related to ticket PHOENIX-4981, but the change is not 
> documented.
> This change breaks jobs that read from tables with a non-default column 
> family.
> The spark3 connector has the same issue, since the code was duplicated from 
> the spark2 module into the spark3 module.
> The V1 API was modified to use the same method to resolve the schema, so it 
> has the same behavior. It should not, because its classes are now deprecated 
> and should not contain a breaking change.
>  
> *Resolution proposal:*
> The best way to fix the issue is to add a property that lets users choose 
> between the two column name mappings for non-default column families.
> The issue is in the Spark connector, so its resolution will not have side 
> effects on other phoenix-connectors such as phoenix5-hive.
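
For illustration only, the two naming behaviours described in the issue can be
sketched in plain Java. This is not connector code: the class
`ColumnNameMapping`, the method `dataFrameFieldName` and the parameter
`dropColumnFamily` are hypothetical names, and the sketch does not claim to
match the semantics of the `escapeColumnFamily` flag from the pull request. It
assumes that columns in the default column family "0" keep their bare name in
both modes, which is what the issue implies when it says the existing tests
were not affected.

```java
public final class ColumnNameMapping {

    // Hypothetical helper, not part of phoenix5-spark.
    // Returns the DataFrame field name for a Phoenix column.
    //   dropColumnFamily == false: "columnFamily.columnName" (behaviour reported in the issue)
    //   dropColumnFamily == true:  "columnName" (behaviour of earlier connector versions)
    public static String dataFrameFieldName(String columnFamily,
                                            String columnName,
                                            boolean dropColumnFamily) {
        // Assumption: the default column family "0" is never shown in the field name.
        boolean isDefaultFamily = columnFamily == null || columnFamily.equals("0");
        if (isDefaultFamily || dropColumnFamily) {
            return columnName;
        }
        return columnFamily + "." + columnName;
    }

    public static void main(String[] args) {
        System.out.println(dataFrameFieldName("CF1", "COL1", false)); // prints CF1.COL1
        System.out.println(dataFrameFieldName("CF1", "COL1", true));  // prints COL1
        System.out.println(dataFrameFieldName("0", "COL1", false));   // prints COL1
    }
}
```

A job written against the older behaviour selects "COL1" and fails once the
field is named "CF1.COL1", which is why a switch between the two mappings is
proposed rather than a silent change in either direction.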



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
