[jira] [Updated] (NIFI-11449) add autocommit property to PutDatabaseRecord processor

2023-04-14 Thread Abdelrahim Ahmad (Jira)


 [ https://issues.apache.org/jira/browse/NIFI-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abdelrahim Ahmad updated NIFI-11449:

Description: 
The issue is with the {{PutDatabaseRecord}} processor in Apache NiFi. When 
using the processor with the Trino JDBC driver or the Dremio JDBC driver to 
write to an Iceberg catalog, it disables the autocommit feature. This leads to 
errors such as "{*}Catalog only supports writes using autocommit: iceberg{*}".
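For illustration, here is a minimal JDBC sketch of the failing pattern, assuming a Trino connection to an Iceberg catalog; the URL, table, and credentials are placeholders, not taken from the issue:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AutocommitError {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: jdbc:trino://<host>:<port>/<catalog>/<schema>
        String url = "jdbc:trino://trino.example.com:8080/iceberg/demo";
        try (Connection conn = DriverManager.getConnection(url, "nifi", null)) {
            // PutDatabaseRecord turns autocommit off so it can commit per flow file ...
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO events (id, name) VALUES (?, ?)")) {
                ps.setLong(1, 1L);
                ps.setString(2, "example");
                // ... but Trino then rejects the write for Iceberg with:
                // "Catalog only supports writes using autocommit: iceberg"
                ps.executeUpdate();
            }
            conn.commit();
        }
    }
}
{code}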

The autocommit feature needs to be exposed on the processor so that it can be 
enabled or disabled.
Enabling auto-commit in the NiFi PutDatabaseRecord processor is important for 
Delta Lake, Iceberg, and Hudi, as it ensures data consistency and integrity by 
allowing atomic writes to be performed in the underlying database. This will 
allow the processor to be used with a wider range of databases.

_Improving this processor will allow NiFi to be the main tool for ingesting 
data into these new technologies, so we don't have to deal with another tool to 
do so._

+*_{color:#de350b}BUT:{color}_*+



I have reviewed the {{PutDatabaseRecord}} processor in NiFi. It inserts records 
one by one into the database using a prepared statement and commits the 
transaction at the end of the loop that processes the records. This approach 
can be inefficient and slow when inserting large volumes of data into tables 
that are optimized for bulk ingestion, such as Delta Lake, Iceberg, and Hudi 
tables.

These tables use various techniques to optimize the performance of bulk 
ingestion, such as partitioning, clustering, and indexing. Inserting records 
one by one using a prepared statement can bypass these optimizations, leading 
to poor performance and potentially causing issues such as excessive disk 
usage, increased memory consumption, and decreased query performance.
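To make the contrast concrete, here is a sketch of the per-record pattern described above next to a JDBC batch, assuming a hypothetical {{events}} table; neither snippet is taken from the processor's source:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class InsertPatterns {
    record Event(long id, String name) {}

    // Per-record: one round trip per row; with autocommit on, table
    // formats like Iceberg also produce one tiny data file per statement.
    static void rowByRow(Connection conn, List<Event> events) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO events (id, name) VALUES (?, ?)")) {
            for (Event e : events) {
                ps.setLong(1, e.id());
                ps.setString(2, e.name());
                ps.executeUpdate();
            }
        }
    }

    // Batched: buffer the bind values and send them together, letting the
    // driver and the table format write larger, better-organized chunks.
    static void batched(Connection conn, List<Event> events) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO events (id, name) VALUES (?, ?)")) {
            for (Event e : events) {
                ps.setLong(1, e.id());
                ps.setString(2, e.name());
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
{code}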

To avoid these issues, it is recommended to add a new processor, or a feature 
to the current one, that supports a bulk-insert method with an autocommit 
option when inserting large volumes of data into Delta Lake, Iceberg, and Hudi 
tables. A rough sketch of what such a property could look like follows.
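This is only a sketch using NiFi's {{PropertyDescriptor}} builder; the property name, display name, default, and description are hypothetical, not the actual implementation:

{code:java}
import org.apache.nifi.components.PropertyDescriptor;

public class AutoCommitPropertySketch {
    // Hypothetical autocommit toggle for PutDatabaseRecord;
    // all values here are illustrative.
    static final PropertyDescriptor AUTO_COMMIT = new PropertyDescriptor.Builder()
            .name("database-session-autocommit")
            .displayName("Database Session AutoCommit")
            .description("Whether to enable autocommit on the JDBC connection. "
                    + "Catalogs such as Iceberg behind Trino only support "
                    + "writes using autocommit.")
            .allowableValues("true", "false")
            .defaultValue("false")
            .required(true)
            .build();
}
{code}

In {{onTrigger}} the processor would then call {{connection.setAutoCommit(...)}} with the configured value and skip its explicit {{commit()}} when autocommit is enabled.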

 

P.S.: PutSQL is not a good alternative; it does have autocommit, but it 
suffers from the same performance problem described above.

Thanks and best regards :)
Abdelrahim Ahmad

  was:
The issue is with the {{PutDatabaseRecord}} processor in Apache NiFi. When 
using the processor with the Trino JDBC driver or the Dremio JDBC driver to 
write to an Iceberg catalog, it disables the autocommit feature. This leads to 
errors such as "{*}Catalog only supports writes using autocommit: iceberg{*}".

To fix this issue, the autocommit feature needs to be exposed on the processor 
so that it can be enabled or disabled.
Enabling auto-commit in the NiFi PutDatabaseRecord processor is important for 
Delta Lake, Iceberg, and Hudi, as it ensures data consistency and integrity by 
allowing atomic writes to be performed in the underlying database. This will 
allow the processor to be used with a wider range of databases.

_*Improving this processor will allow NiFi to be the main tool for ingesting 
data into these new technologies, so we don't have to deal with another tool to 
do so.*_

P.S.: using PutSQL is not a good option at all due to the sensitivity of these 
tables when dealing with small inserts.

Thanks and best regards
Abdelrahim Ahmad


> add autocommit property to PutDatabaseRecord processor
> --
>
> Key: NIFI-11449
> URL: https://issues.apache.org/jira/browse/NIFI-11449
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Affects Versions: 1.21.0
> Environment: Any NiFi deployment
> Reporter: Abdelrahim Ahmad
> Priority: Blocker
> Labels: Trino, autocommit, database, iceberg, putdatabaserecord
>

[jira] [Updated] (NIFI-11449) add autocommit property to PutDatabaseRecord processor

2023-04-14 Thread Abdelrahim Ahmad (Jira)


 [ https://issues.apache.org/jira/browse/NIFI-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abdelrahim Ahmad updated NIFI-11449:

Issue Type: New Feature  (was: Improvement)





[jira] [Updated] (NIFI-11449) add autocommit property to PutDatabaseRecord processor

2023-04-13 Thread Abdelrahim Ahmad (Jira)


 [ https://issues.apache.org/jira/browse/NIFI-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abdelrahim Ahmad updated NIFI-11449:

Summary: add autocommit property to PutDatabaseRecord processor  (was: add 
autocommit property to control commit in PutDatabaseRecord processor)




--
This message was sent by Atlassian Jira
(v8.20.10#820010)