[ 
https://issues.apache.org/jira/browse/NIFI-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024236#comment-16024236
 ] 

ASF GitHub Bot commented on NIFI-3484:
--------------------------------------

Github user patricker commented on the issue:

    https://github.com/apache/nifi/pull/1513
  
    @ilyatau Why do you feel that this request isn't finished?
    While I do have a newer version, the functionality it adds is a bit more 
advanced and I've had a hard time automating the testing.
    
    The additional functionality I have working will generate a premature right 
boundary to reduce the number of records brought back. You might think that 
this is already the point of GenerateTableFetch, but I found that even with 
an indexed timestamp column you still have to page the index when paging 
through data on some systems. In one case, on SAP HANA, we ran into internal 
limitations that did not allow paging past 2 billion rows of data. The table 
we were loading with GenerateTableFetch had ~6 billion rows. Using the 
uncommitted change you can provide a per-execution limit to 
GenerateTableFetch so that it will only generate pages of `x` size for the 
first `y` rows in the table.
    
    When I tried to test this on SQLite it did not work, though it is working 
on the other SQL systems we've tried it on.
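
The page arithmetic behind a per-execution limit can be sketched roughly as 
below. This is only an illustration of the idea, not the uncommitted change 
itself: `plan_execution`, `row_count`, `page_size` (the `x` above) and 
`max_rows` (the `y` above) are hypothetical names, and the pages are expressed 
as plain (offset, limit) pairs rather than generated SQL.

```python
def plan_execution(row_count, page_size, max_rows):
    """Return (offset, limit) pairs for one execution.

    Hypothetical sketch: only the first `max_rows` rows are paged in this
    execution, in pages of at most `page_size` rows each.
    """
    if page_size <= 0 or max_rows < 0:
        raise ValueError("page_size must be positive and max_rows non-negative")
    rows_this_run = min(row_count, max_rows)
    return [
        (offset, min(page_size, rows_this_run - offset))
        for offset in range(0, rows_this_run, page_size)
    ]


if __name__ == "__main__":
    # 4700 rows available, pages of 1000, but at most 2500 rows per execution.
    for offset, limit in plan_execution(4700, 1000, 2500):
        print(f"OFFSET {offset} LIMIT {limit}")
```

Capping the rows per execution keeps every offset below the cap, which is the 
property that matters when a database cannot page past a certain depth.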


> GenerateTableFetch Should Allow for Right Boundary
> --------------------------------------------------
>
>                 Key: NIFI-3484
>                 URL: https://issues.apache.org/jira/browse/NIFI-3484
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework
>    Affects Versions: 1.2.0
>            Reporter: Peter Wicks
>            Assignee: Peter Wicks
>            Priority: Minor
>
> When using GenerateTableFetch, it places no right-hand boundary on pages of 
> data.  This can lead to issues when the statement says to get the next 1000 
> records greater than a specific key, but records were added to the table 
> between the time the processor executed and when the SQL is being executed. 
> As a result it pulls in records that did not exist when the processor was 
> run.  On the next execution of the processor these records will be pulled in 
> a second time.
> Example:
> Partition Size = 1000
> First run (no state): Count(*)=4700 and MAX(ID)=4700.
> 5 FlowFiles are generated; the last one will say to fetch 1000, not 700. (But 
> I don't think this is really a bug, just an observation).
> 5 FlowFiles are now in queue to be executed by ExecuteSQL.  Before the 5th 
> file can execute, 400 new rows are added to the table.  When the final SQL 
> statement is executed, 300 extra records, with higher ID values, will also be 
> pulled into NiFi.
> Second run (state: ID=4700).  Count(*) ID>4700 = 400 and MAX(ID)=5100.
> 1 FlowFile is generated, but includes 300 records already pulled into NiFi.
> The solution is to have an optional property that will let users use the new 
> MAX(ID) as a right boundary when generating queries.
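
The scenario in the description above can be reproduced with a small 
simulation. This is a sketch under stated assumptions, not NiFi code: it uses 
Python's built-in `sqlite3` with a hypothetical table `t`, and the 
LIMIT/OFFSET query shape and the `page_queries` / `simulate` helpers are 
assumptions made for illustration only.

```python
import sqlite3

PAGE_SIZE = 1000


def page_queries(conn, last_max_id, use_right_boundary):
    """Build one (sql, params) pair per page of rows above last_max_id."""
    count, new_max = conn.execute(
        "SELECT COUNT(*), MAX(id) FROM t WHERE id > ?", (last_max_id,)
    ).fetchone()
    where, params = "id > ?", [last_max_id]
    if use_right_boundary:
        # Proposed option: cap every page at the MAX(id) seen at generation time.
        where += " AND id <= ?"
        params.append(new_max)
    pages = -(-count // PAGE_SIZE)  # ceiling division
    sql = f"SELECT id FROM t WHERE {where} ORDER BY id LIMIT ? OFFSET ?"
    queries = [(sql, params + [PAGE_SIZE, page * PAGE_SIZE]) for page in range(pages)]
    return queries, new_max


def simulate(use_right_boundary):
    """Return how many rows are fetched twice across the two runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 4701)])
    fetched = []

    # First run (no state): 5 pages are generated, 4 execute right away.
    queries, state = page_queries(conn, 0, use_right_boundary)
    for sql, params in queries[:-1]:
        fetched += [row[0] for row in conn.execute(sql, params)]

    # 400 new rows arrive before the 5th page executes.
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(4701, 5101)])
    sql, params = queries[-1]
    fetched += [row[0] for row in conn.execute(sql, params)]

    # Second run (state: ID=4700).
    queries, state = page_queries(conn, state, use_right_boundary)
    for sql, params in queries:
        fetched += [row[0] for row in conn.execute(sql, params)]

    return len(fetched) - len(set(fetched))


if __name__ == "__main__":
    print("duplicates without right boundary:", simulate(False))
    print("duplicates with right boundary:", simulate(True))
```

Without the boundary the fifth page returns IDs 4001 through 5000, so the 
second run re-fetches 4701 through 5000; with the boundary the fifth page 
stops at 4700 and nothing is fetched twice.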



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
