rahil-c commented on code in PR #1862:
URL: https://github.com/apache/polaris/pull/1862#discussion_r2180940060
##########
plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/PolarisCatalogUtils.java:
##########
@@ -64,9 +92,13 @@ public static boolean isTableWithSparkManagedLocation(Map<String, String> proper
* Load spark table using DataSourceV2.
*
* @return V2Table if DataSourceV2 is available for the table format. For delta table, it returns
- * DeltaTableV2.
+ * DeltaTableV2. For hudi it should return HoodieInternalV2Table.
*/
- public static Table loadSparkTable(GenericTable genericTable) {
+ public static Table loadSparkTable(GenericTable genericTable, Identifier identifier) {
+ if (genericTable.getFormat().equalsIgnoreCase("hudi")) {
+ // hudi does not implement table provider interface, so will need to catch it
Review Comment:
Currently, `PolarisCatalogUtils.loadSparkTable` uses `DataSourceV2Utils` to load the table through Spark's table provider, as seen
[here](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/PolarisCatalogUtils.java#L86).
For Delta, this goes through Delta's data source implementation, which implements the V2 `TableProvider` interface; see `DeltaDataSource`:
https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L57
`PolarisCatalogUtils.loadSparkTable` currently assumes that other formats also implement this same `TableProvider` interface. If they do not, the load fails with an exception.
Hudi's Spark integration does not implement that interface; see `DefaultSource`, the entry point class for the Hudi data source:
https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala#L55
That means we have to provide another way to load the Hudi table, which is why I added this condition and a method called `loadHudiSparkTable`.
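
To illustrate the dispatch described above, here is a minimal, self-contained sketch. It uses stand-in types only: the nested `TableProvider` interface, `ProviderBackedSource`, `LegacySource`, `lookupSource`, and the string return values are all hypothetical and are not the real Spark, Delta, or Hudi classes. It only shows why a format whose source does not implement the provider interface needs its own branch.

```java
import java.util.Map;

public class LoaderDispatchSketch {

    /** Hypothetical stand-in for a table provider interface; NOT Spark's real TableProvider. */
    interface TableProvider {
        String getTable(Map<String, String> properties);
    }

    /** Stand-in for a data source that implements the provider interface (the Delta case). */
    static class ProviderBackedSource implements TableProvider {
        @Override
        public String getTable(Map<String, String> properties) {
            return "V2 table at " + properties.get("location");
        }
    }

    /** Stand-in for a data source that does not implement it (the Hudi case). */
    static class LegacySource {}

    /** Hypothetical lookup of the data source class registered for a format. */
    static Object lookupSource(String format) {
        return format.equalsIgnoreCase("hudi") ? new LegacySource() : new ProviderBackedSource();
    }

    /** Generic path: only works when the source implements TableProvider. */
    static String loadViaProvider(String format, Map<String, String> properties) {
        Object source = lookupSource(format);
        if (!(source instanceof TableProvider)) {
            throw new IllegalStateException(format + " source does not implement TableProvider");
        }
        return ((TableProvider) source).getTable(properties);
    }

    /** Format-specific path, playing the role of loadHudiSparkTable in this sketch. */
    static String loadHudiTable(Map<String, String> properties) {
        return "hudi table at " + properties.get("location");
    }

    /** Entry point: branch on the format before falling back to the generic path. */
    static String loadTable(String format, Map<String, String> properties) {
        if (format.equalsIgnoreCase("hudi")) {
            return loadHudiTable(properties);
        }
        return loadViaProvider(format, properties);
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("location", "/warehouse/t1");
        System.out.println(loadTable("delta", props));
        System.out.println(loadTable("hudi", props));
    }
}
```

Without the `hudi` branch in `loadTable`, the call would reach `loadViaProvider` and throw, which mirrors the failure described above.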