[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675777#comment-16675777 ]
ASF GitHub Bot commented on NIFI-5788:
--------------------------------------

Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230917511

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
                     }
                 }
                 ps.addBatch();
    +            if (++currentBatchSize == batchSize) {
    --- End diff --

    Would it be beneficial to capture `currentBatchSize*batchIndex`, with `batchIndex` being incremented only after a successful call to `executeBatch()`, as an attribute? My thinking is: if you have a failure and only part of a batch was loaded, you could store how many rows were loaded successfully as an attribute.

> Introduce batch size limit in PutDatabaseRecord processor
> ---------------------------------------------------------
>
>                 Key: NIFI-5788
>                 URL: https://issues.apache.org/jira/browse/NIFI-5788
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>         Environment: Teradata DB
>            Reporter: Vadim
>            Priority: Major
>              Labels: pull-request-available
>
> Certain JDBC drivers do not support unlimited batch size in INSERT/UPDATE
> prepared SQL statements. Specifically, the Teradata JDBC driver
> ([https://downloads.teradata.com/download/connectivity/jdbc-driver]) fails
> the SQL statement when the batch overflows its internal limits.
> Dividing data into smaller chunks before PutDatabaseRecord is applied can
> work around the issue in certain scenarios, but in general this workaround
> is imperfect because the SQL statements would be executed in different
> transaction contexts and data integrity would not be preserved.
> The solution suggests the following:
> * introduce a new optional property on the *PutDatabaseRecord* processor,
>   *batch_size*, which defines the maximum size of a batch in an INSERT/UPDATE
>   statement; its default value of -1 (INFINITY) preserves the old behavior
> * divide the input into batches of the specified size and invoke
>   PreparedStatement.executeBatch() for each batch
> Pull request: [https://github.com/apache/nifi/pull/3128]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
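For illustration, the chunking loop proposed in the issue, combined with the reviewer's suggestion of tracking rows committed after each successful `executeBatch()`, can be sketched as follows. This is a minimal standalone sketch, not the actual PutDatabaseRecord code: the JDBC calls (`addBatch()`, `executeBatch()`) are replaced by comments, and the class and method names (`BatchLimitSketch`, `executeInBatches`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batch-size-limited execution loop proposed
// in NIFI-5788. JDBC interaction is stubbed out so the chunking and
// "rows loaded so far" bookkeeping can be shown in isolation.
public class BatchLimitSketch {

    /**
     * Processes totalRecords in batches of at most batchSize
     * (batchSize <= 0 means "unlimited", preserving the old behavior).
     * After each successful flush, the cumulative row count is appended
     * to loadedSoFar, mirroring the reviewer's idea of recording partial
     * progress as a FlowFile attribute. Returns the number of flushes.
     */
    static int executeInBatches(int totalRecords, int batchSize, List<Integer> loadedSoFar) {
        int currentBatchSize = 0;
        int executions = 0;
        int rowsLoaded = 0;
        for (int i = 0; i < totalRecords; i++) {
            // ps.addBatch() would be called here
            if (++currentBatchSize == batchSize) {
                // ps.executeBatch() would be called here; only after it
                // succeeds do we advance the committed-row counter, so a
                // later failure can report how many rows already loaded.
                rowsLoaded += currentBatchSize;
                loadedSoFar.add(rowsLoaded);
                currentBatchSize = 0;
                executions++;
            }
        }
        if (currentBatchSize > 0) {
            // Flush the final, possibly smaller, batch.
            rowsLoaded += currentBatchSize;
            loadedSoFar.add(rowsLoaded);
            executions++;
        }
        return executions;
    }

    public static void main(String[] args) {
        List<Integer> progress = new ArrayList<>();
        int flushes = executeInBatches(10, 4, progress);
        System.out.println(flushes);   // 3 flushes: batches of 4, 4, 2
        System.out.println(progress);  // [4, 8, 10]
    }
}
```

With batchSize <= 0 the inner condition never fires, so all rows accumulate into a single final `executeBatch()` call, which matches the documented -1 (INFINITY) default.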