[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675072#comment-16675072
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

GitHub user vadimar opened a pull request:

https://github.com/apache/nifi/pull/3128

NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vadimar/nifi-1 nifi-5788

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/3128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3128


commit 2f36c8b1a732e249238f5f6f53968e84c05b497c
Author: vadimar 
Date:   2018-11-05T11:15:12Z

NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor




> Introduce batch size limit in PutDatabaseRecord processor
> ---------------------------------------------------------
>
> Key: NIFI-5788
> URL: https://issues.apache.org/jira/browse/NIFI-5788
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.8.0
> Environment: Teradata DB
> Reporter: Vadim
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.8.0
>
>
> Certain JDBC drivers do not support unlimited batch size in INSERT/UPDATE
> prepared SQL statements. Specifically, the Teradata JDBC driver
> ([https://downloads.teradata.com/download/connectivity/jdbc-driver]) would
> fail the SQL statement when the batch overflows its internal limits.
> Dividing the data into smaller chunks before PutDatabaseRecord is applied can
> work around the issue in certain scenarios, but in general this solution is
> not sufficient, because the SQL statements would be executed in different
> transaction contexts and data integrity would not be preserved.
> The proposed solution is the following:
>  * introduce a new optional parameter in the *PutDatabaseRecord* processor,
> *batch_size*, which defines the maximum size of the batch in an INSERT/UPDATE
> statement; its default value of -1 (INFINITY) preserves the old behavior
>  * divide the input into batches of the specified size and invoke
> PreparedStatement.executeBatch() for each batch, as sketched below
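A minimal sketch of the proposed batching loop (a hypothetical helper, not the processor's actual code; it assumes a plain JDBC PreparedStatement and rows already converted to parameter values):

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchingSketch {
    // Flushes the JDBC batch every maxBatchSize rows; maxBatchSize <= 0 means
    // "no limit", which preserves the processor's old behavior.
    static void executeInBatches(PreparedStatement ps, List<Object[]> rows,
                                 int maxBatchSize) throws SQLException {
        int currentBatchSize = 0;
        for (Object[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                ps.setObject(i + 1, row[i]); // JDBC parameter indices are 1-based
            }
            ps.addBatch();
            if (maxBatchSize > 0 && ++currentBatchSize == maxBatchSize) {
                ps.executeBatch();           // send a full batch to the driver
                currentBatchSize = 0;
            }
        }
        if (currentBatchSize > 0) {
            ps.executeBatch();               // flush the remaining rows
        }
    }
}
```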





[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675387#comment-16675387
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r230811717
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -265,6 +265,17 @@
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .build();
 
+    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
--- End diff --

We should be consistent here with "batch size" and "bulk size" in the 
naming of variables, documentation, etc. Maybe "Maximum Batch Size"?




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675386#comment-16675386
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r230812123
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -265,6 +265,17 @@
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .build();
 
+    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
+            .name("put-db-record-batch-size")
+            .displayName("Bulk Size")
+            .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
+                    + " Non-positive value has the effect of infinite bulk size.")
+            .defaultValue("-1")
--- End diff --

What does a value of zero do? Would anyone ever use it? If not, perhaps 
zero is the best default to indicate infinite bulk size. If you do change it to 
zero, please change the validator to a NONNEGATIVE_INTEGER_VALIDATOR to match
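Applying these suggestions on top of the diff above (zero default, non-negative validation, plus the "Maximum Batch Size" display name from the naming thread), the descriptor might end up roughly like this; the validator constant is assumed to be NiFi's StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR:

```java
static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
        .name("put-db-record-batch-size")
        .displayName("Maximum Batch Size")
        .description("Specifies the maximum batch size for INSERT and UPDATE statements."
                + " This parameter has no effect for other statements specified in 'Statement Type'."
                + " Zero means the batch size is not limited.")
        .defaultValue("0")
        .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
        .build();
```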




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675775#comment-16675775
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r230916140
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -265,6 +265,17 @@
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .build();
 
+    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
+            .name("put-db-record-batch-size")
+            .displayName("Bulk Size")
+            .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
+                    + " Non-positive value has the effect of infinite bulk size.")
+            .defaultValue("-1")
--- End diff --

I agree that `0` should be the default; it would replicate the current 
behavior of the processor: all records in one batch.




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675777#comment-16675777
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r230917511
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
                     }
                 }
                 ps.addBatch();
+                if (++currentBatchSize == batchSize) {
--- End diff --

Would it be beneficial to capture `currentBatchSize*batchIndex` as an 
attribute, with `batchIndex` being incremented only after a successful call to 
`executeBatch()`? My thinking is: if you have a failure and only part of a 
batch was loaded, you could store how many rows were loaded successfully as an 
attribute.




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676638#comment-16676638
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r231086664
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -265,6 +265,17 @@
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .build();
 
+    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
--- End diff --

Agree regarding "Maximum Batch Size". Sounds better. What's "bulk size"? Is 
it relevant to this change?




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676639#comment-16676639
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r231087439
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -265,6 +265,17 @@
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .build();
 
+    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
+            .name("put-db-record-batch-size")
+            .displayName("Bulk Size")
+            .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
+                    + " Non-positive value has the effect of infinite bulk size.")
+            .defaultValue("-1")
--- End diff --

I'll change the default to be zero and the validator to 
NONNEGATIVE_INTEGER_VALIDATOR




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676649#comment-16676649
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r231088684
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
                     }
                 }
                 ps.addBatch();
+                if (++currentBatchSize == batchSize) {
--- End diff --

I'm not sure this would be beneficial. PutDatabaseRecord works without 
autoCommit. It's all or nothing.
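To illustrate the point (a sketch under the same assumptions as the batching loop earlier in this thread, not the processor's code): with autoCommit disabled, splitting the work into batches does not change the transaction boundary, so a mid-stream failure still rolls everything back.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class AtomicLoadSketch {
    // Intermediate executeBatch() calls only push statements to the driver;
    // rows become visible atomically at commit(), and a failure rolls back
    // every batch sent so far.
    static void loadAtomically(Connection con, PreparedStatement ps,
                               List<Object[]> rows, int maxBatchSize) throws SQLException {
        con.setAutoCommit(false);
        try {
            BatchingSketch.executeInBatches(ps, rows, maxBatchSize); // loop sketched earlier
            con.commit();    // all or nothing: the data becomes visible here
        } catch (SQLException e) {
            con.rollback();  // nothing is persisted
            throw e;
        }
    }
}
```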




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676660#comment-16676660
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r231089816
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -265,6 +265,17 @@
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .build();
 
+    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
--- End diff --

Oh. I see it now. The display label is "Bulk Size". I'll fix it to be 
"Maximum Batch Size". Thanks




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676856#comment-16676856
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3128#discussion_r231153599
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
@@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
                     }
                 }
                 ps.addBatch();
+                if (++currentBatchSize == batchSize) {
--- End diff --

True, I missed that override before, but I see it now. So it's definitely less 
valuable; the only thing it would provide is troubleshooting guidance 
("your bad data is roughly in this part of the file"). Probably not worth it. 
Thanks!


> Introduce batch size limit in PutDatabaseRecord processor
> ---------------------------------------------------------
>
> Key: NIFI-5788
> URL: https://issues.apache.org/jira/browse/NIFI-5788
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Environment: Teradata DB
> Reporter: Vadim
> Priority: Major
> Labels: pull-request-available
>
>
> Certain JDBC drivers do not support unlimited batch size in INSERT/UPDATE
> prepared SQL statements. Specifically, the Teradata JDBC driver
> ([https://downloads.teradata.com/download/connectivity/jdbc-driver]) would
> fail the SQL statement when the batch overflows its internal limits.
> Dividing the data into smaller chunks before PutDatabaseRecord is applied can
> work around the issue in certain scenarios, but in general this solution is
> not sufficient, because the SQL statements would be executed in different
> transaction contexts and data integrity would not be preserved.
> The proposed solution is the following:
>  * introduce a new optional parameter in the *PutDatabaseRecord* processor,
> *max_batch_size*, which defines the maximum batch size in an INSERT/UPDATE
> statement; the default value of zero (INFINITY) preserves the old behavior
>  * divide the input into batches of the specified size and invoke
> PreparedStatement.executeBatch() for each batch
> Pull request: [https://github.com/apache/nifi/pull/3128]
>
> [EDIT] Changed batch_size to max_batch_size. The default value would be zero
> (INFINITY)





[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679466#comment-16679466
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on the issue:

https://github.com/apache/nifi/pull/3128
  
Hi,
Can you please review the latest commits? I committed the changes that 
address all the issues raised by reviewers.
Thanks 




[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688243#comment-16688243
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/3128


> Introduce batch size limit in PutDatabaseRecord processor
> ---------------------------------------------------------
>
> Key: NIFI-5788
> URL: https://issues.apache.org/jira/browse/NIFI-5788
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Environment: Teradata DB
> Reporter: Vadim
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.0
>
>
> Certain JDBC drivers do not support unlimited batch size in INSERT/UPDATE
> prepared SQL statements. Specifically, the Teradata JDBC driver
> ([https://downloads.teradata.com/download/connectivity/jdbc-driver]) would
> fail the SQL statement when the batch overflows its internal limits.
> Dividing the data into smaller chunks before PutDatabaseRecord is applied can
> work around the issue in certain scenarios, but in general this solution is
> not sufficient, because the SQL statements would be executed in different
> transaction contexts and data integrity would not be preserved.
> The proposed solution is the following:
>  * introduce a new optional parameter in the *PutDatabaseRecord* processor,
> *max_batch_size*, which defines the maximum batch size in an INSERT/UPDATE
> statement; the default value of zero (INFINITY) preserves the old behavior
>  * divide the input into batches of the specified size and invoke
> PreparedStatement.executeBatch() for each batch
> Pull request: [https://github.com/apache/nifi/pull/3128]
>
> [EDIT] Changed batch_size to max_batch_size. The default value would be zero
> (INFINITY)





[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-15 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688239#comment-16688239
 ] 

ASF subversion and git services commented on NIFI-5788:
---

Commit d319a3ef2f14317f29a1be5a189bc34f8b3fdbd6 in nifi's branch 
refs/heads/master from vadimar
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=d319a3e ]

NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor

NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor

Renamed 'batch size' to 'Maximum Batch Size'.
Changed default value of max_batch_size to zero (INFINITE)
Fixed parameter validation.
Added unit tests

Signed-off-by: Matthew Burgess 

This closes #3128
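For reference, a rough sketch of a validation-focused unit test for the renamed property using NiFi's TestRunner; the BATCH_SIZE constant follows the PR diff above, and the required-property setup (record reader, DBCP service, statement type, table name) is an assumption elided as a comment:

```java
import org.apache.nifi.processors.standard.PutDatabaseRecord;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;

class MaxBatchSizeValidationSketch {
    void maxBatchSizeMustBeNonNegative() {
        TestRunner runner = TestRunners.newTestRunner(PutDatabaseRecord.class);
        // ... configure the processor's required properties here ...
        runner.setProperty(PutDatabaseRecord.BATCH_SIZE, "-1"); // negative value
        runner.assertNotValid();  // rejected by the non-negative validator
        runner.setProperty(PutDatabaseRecord.BATCH_SIZE, "0");  // zero = unlimited (default)
        runner.assertValid();
    }
}
```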






[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor

2018-11-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688248#comment-16688248
 ] 

ASF GitHub Bot commented on NIFI-5788:
--

Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/3128
  
+1 LGTM, tested with various batch sizes and ran unit tests. Thanks for 
this improvement! Merged to master





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)