[ https://issues.apache.org/jira/browse/NIFI-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736060#comment-17736060 ]
Nandor Soma Abonyi commented on NIFI-11449:
-------------------------------------------

Hi [~Abdelrahimk]! We have a dedicated processor called PutIceberg ([https://github.com/apache/nifi/blob/adb8420b484a971103d4d5e5017cab228c5c56de/nifi-nar-bundles/nifi-iceberg-bundle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java#L79]). Any chance that you've tried it already?

> add autocommit property to PutDatabaseRecord processor
> ------------------------------------------------------
>
>                 Key: NIFI-11449
>                 URL: https://issues.apache.org/jira/browse/NIFI-11449
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 1.21.0
>         Environment: Any NiFi deployment
>            Reporter: Abdelrahim Ahmad
>            Priority: Blocker
>              Labels: Trino, autocommit, database, iceberg, putdatabaserecord
>
> The issue is with the {{PutDatabaseRecord}} processor in Apache NiFi. When the processor is used with the Trino JDBC driver or the Dremio JDBC driver to write to an Iceberg catalog, it disables autocommit on the connection. This leads to errors such as "Catalog only supports writes using autocommit: iceberg".
> An autocommit property needs to be added to the processor so that autocommit can be enabled or disabled.
> Enabling autocommit in the NiFi {{PutDatabaseRecord}} processor is important for Delta Lake, Iceberg, and Hudi, as it ensures data consistency and integrity by allowing atomic writes to be performed in the underlying database. This would allow the processor to be used with a wider range of databases.
> Improving this processor would allow NiFi to be the main tool for ingesting data into these technologies, so we would not have to deal with another tool to do so.
> BUT:
> I have reviewed the {{PutDatabaseRecord}} processor in NiFi. It inserts records one by one into the database using a prepared statement and commits the transaction at the end of the loop that processes the records. This approach can be inefficient and slow when inserting large volumes of data into tables that are optimized for bulk ingestion, such as Delta Lake, Iceberg, and Hudi tables.
> These tables use various techniques to optimize bulk ingestion, such as partitioning, clustering, and indexing. Inserting records one by one with a prepared statement can bypass these optimizations, leading to poor performance and potentially causing issues such as excessive disk usage, increased memory consumption, and decreased query performance.
> To avoid these issues, it is recommended to add a new processor, or a feature to the current one, that uses a bulk insert method with autocommit when inserting large volumes of data into Delta Lake, Iceberg, and Hudi tables.
>
> P.S.: using PutSQL is not an option either; it does not have an autocommit property and it has the same performance problem described above.
> Thanks and best regards :)
> Abdelrahim Ahmad
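For clarity, a minimal plain-JDBC sketch of the two write paths discussed above, assuming a connection obtained from a DBCPConnectionPool-style service; the class, method names, and table name are hypothetical illustrations, not the actual PutDatabaseRecord code:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class AutocommitSketch {

    // Roughly the behavior described above: autocommit is turned off, each record
    // is written with its own prepared-statement execution, and a single commit is
    // issued at the end. Trino's Iceberg connector rejects this pattern with
    // "Catalog only supports writes using autocommit: iceberg".
    static void writeWithExplicitCommit(Connection conn, List<String> names) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO iceberg.db.events (name) VALUES (?)")) {  // hypothetical table
            for (String name : names) {
                ps.setString(1, name);
                ps.executeUpdate();
            }
        }
        conn.commit();
    }

    // What the requested autocommit property would allow: leave autocommit enabled and
    // send the records as one batch, so the driver commits the statements itself
    // (whether it commits per statement or per batch is driver-dependent).
    static void writeWithAutocommit(Connection conn, List<String> names) throws SQLException {
        conn.setAutoCommit(true);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO iceberg.db.events (name) VALUES (?)")) {  // hypothetical table
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
{code}

The same two-mode idea could back a simple autocommit property on the processor; the property name and exact batching strategy above are only placeholders for the behavior the reporter is asking for.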