Hi, even a simple "UPDATE" SQL statement is not supported yet in Iceberg.
spark.sql(""" UPDATE db1.iceberg_table2 t SET t.data = 'b1' WHERE t.id = 2 """).show() => error java.lang.UnsupportedOperationException: UPDATE TABLE is not supported temporarily. at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:794) It is not clear from Iceberg API or Doc site how to perform simple update / delete .. and how to convert from given sql to API if possible. I do see there is an api call "newRewrite()" but it looks very low level, requiring me to scan files, then rewrite new files content and remove previous files if modified. ... How to efficiently implement the "if-modified" condition when the where clause is a complex combination of "and-or-not-operators" ? For example, to rely on predicate-push-down wherever possible, and to avoid comparing previous/new file content for any effective change ? Example: UPDATE db1.iceberg_table2 t SET t.data = 'b1' WHERE ( t.id = 2) ... <= might use predicate-push-down from parquet, to blindly skip whole files, or blindly copy whole RowGroup UPDATE db1.iceberg_table2 t SET t.data = 'b1' WHERE ( t.id like '%abc%') .... <= no applyable predicate-push-down from parquet... but at runtime, some files might not be modified after scanning the "id" column content Is there at least some helper class to do simple file Parquet transformations, with copy by SQL "select" only, or spark DataSet.map() ? from previous example, the new file might be generated by this sql sql= " SELECT id, IF( t.id like '%abc%, 'b1', t.data) FROM tempTableForFileToUpdate " Is this (below code) the kind of API call that we are supposed to do to perform updates ?? FileScan ... map( (parquetFile) -> { if (noPredicatePushDownForFile( parquetFile, "id like %abc%" )) { val df = sqlContext.read.parquet( parquetFile ) df.registerTempTable("tempTableForFileToUpdate") spark.sql(sq).write().save( newTempFileToAdd ) if (compareFileEffectivelyDifferent( parquetFile , newTempFileToAdd )) { ... add to transaction newUpdate: fileToRemove=parquetFile, fileToAdd= newTempFileToAdd } else { ... delete temporarily created newFileToAdd } } } It looks scary and error prone to perform simple UPDATE like this, doesn't it ? I hope there is a better way, and I did not find it in the documentation Regards, Arnaud Le jeu. 15 oct. 2020 à 16:26, Ashish Mehta <mehta.ashis...@gmail.com> a écrit : > Hi, > > I am also trying to achieve something similar. The Merge INTO format has > multiple "when matched/not matched" condition, and usually, you can take > action like "delete" or "update" or "insert", can I do that by "overwrite > part of the destination table with the replacement"? Also, the > recommendation of overwriting, will it use UPSERT, or are you trying to > overwrite everything on the target table? > > Till now I have been able to use Iceberg API directly for UPSERT, I > believe there is no way I can do this via dataFrame operations, as a single > commit. > > Thanks, > Ashish > > > On Tue, Oct 13, 2020 at 12:24 PM Ryan Blue <rb...@netflix.com.invalid> > wrote: > >> Hi Arnaud, >> >> You're right that MERGE INTO isn't supported yet. What I've seen most >> people do is to implement the operation using SQL to select existing data >> and join it with new data, then overwrite part of the destination table >> with the replacement. 
>>
>> On Mon, Oct 12, 2020 at 2:10 PM Arnaud Nauwynck <
>> arnaud.nauwy...@gmail.com> wrote:
>>
>>> Hi Iceberg dev team,
>>>
>>> I am trying to use Iceberg to do an "upsert" based on an event table.
>>> In pure SQL, it is not supported yet.
>>>
>>> Here is my example with 2 tables: "iceberg_table" should contain the
>>> updated data, and "table_event" contains the event updates.
>>>
>>> scala> spark.sql(""" MERGE INTO db1.iceberg_table t USING
>>> db1.table_event e ON e.id = t.id WHEN MATCHED THEN UPDATE SET t.data
>>> = e.data WHEN NOT MATCHED THEN INSERT (id, data) VALUES (id, data)
>>> """).show()
>>> ...
>>> java.lang.UnsupportedOperationException: MERGE INTO TABLE is not
>>> supported temporarily.
>>>
>>> Is there a way to execute it programmatically using the Spark API or
>>> the Iceberg API?
>>> Any idea when the SQL feature might be available?
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Arnaud Nauwynck
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>
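
P.S.: if I apply the "select existing data, then overwrite" suggestion
quoted above to my simple UPDATE example, I suppose it would become
something like this (untested sketch; it rewrites the whole table unless
partitioning limits what INSERT OVERWRITE replaces):

  spark.sql("""
    INSERT OVERWRITE db1.iceberg_table2
    SELECT id,
           CASE WHEN id = 2 THEN 'b1' ELSE data END AS data
    FROM db1.iceberg_table2
  """)

But that still rewrites unmodified files, which is exactly why I was
asking about predicate push-down above.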