Hi all,

I am working with Iceberg 11.1 and Spark 3.0.1. When I run removeOrphanFiles
(via either the Actions or the SparkActions class), it works with the Hadoop
catalog when run locally, but I hit the exception below when run on EMR with
the Glue catalog. Could you please help me with what I am missing here?
Code snippet:
Actions.forTable(table).removeOrphanFiles().olderThan(removeOrphanFilesOlderThan).execute();
or
SparkActions.get().deleteOrphanFiles(table).olderThan(removeOrphanFilesOlderThan).execute();
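For reference, on EMR the Iceberg Glue catalog is registered as a Spark
catalog along these lines (a sketch, not my exact config; the catalog name
glue_catalog matches the one in the exception, and the warehouse path below
is a placeholder):

```
spark-submit \
  --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
  --conf spark.sql.catalog.glue_catalog.warehouse=s3://my-bucket/warehouse \
  --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  ...
```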
Issue (when run on EMR):
21/08/19 08:12:56 INFO RemoveOrphanFilesMaintenanceJob: Running RemoveOrphanFilesMaintenanceJob - removeOrphanFilesOlderThanTimestamp, Status: Started, tenant: 1, table: raghu3.cars, removeOrphanFilesOlderThan: {1629360476572}.
21/08/19 08:12:56 ERROR RemoveOrphanFilesMaintenanceJob: Error in RemoveOrphanFilesMaintenanceJob - removeOrphanFilesOlderThanTimestamp, Illegal Arguments in table properties - Can't parse null value from table properties, tenant: tenantId1, table: raghu3.cars, removeOrphanFilesOlderThan: 1629360476572, Status: Failed, Reason: {}.
java.lang.IllegalArgumentException: Cannot find the metadata table for glue_catalog.raghu3.cars of type ALL_MANIFESTS
    at org.apache.iceberg.spark.actions.BaseSparkAction.loadMetadataTable(BaseSparkAction.java:191)
    at org.apache.iceberg.spark.actions.BaseSparkAction.buildValidDataFileDF(BaseSparkAction.java:121)
    at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.doExecute(BaseDeleteOrphanFilesSparkAction.java:154)
    at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:101)
    at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:141)
    at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:76)
    at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFilesOlderThanTimestamp(RemoveOrphanFilesMaintenanceJob.java:274)
    at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFiles(RemoveOrphanFilesMaintenanceJob.java:133)
    at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.maintain(RemoveOrphanFilesMaintenanceJob.java:58)
    at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.run(LakeHouseTableMaintenanceJob.java:117)
    at com.salesforce.cdp.spark.core.job.SparkJob.submitAndRun(SparkJob.java:76)
    at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.main(LakeHouseTableMaintenanceJob.java:247)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
The table does exist.
[image: image.png]
Has anyone else faced this? What is the fix? Is it a bug, or am I missing
something here?
Thanks,
Raghu