[ 
https://issues.apache.org/jira/browse/NIFI-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452184#comment-15452184
 ] 

ASF GitHub Bot commented on NIFI-2712:
--------------------------------------

GitHub user mattyb149 opened a pull request:

    https://github.com/apache/nifi/pull/976

    NIFI-2712: Fixed Fetch processors for multiple max-value columns

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mattyb149/nifi NIFI-2712

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/976.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #976
    
----
commit 6b99804d2057a03218a1d53e8c210d238d510c12
Author: Matt Burgess <mattyb...@apache.org>
Date:   2016-08-31T13:00:16Z

    NIFI-2712: Fixed Fetch processors for multiple max-value columns

----


> Database Fetch processors' max-value columns don't work as expected
> -------------------------------------------------------------------
>
>                 Key: NIFI-2712
>                 URL: https://issues.apache.org/jira/browse/NIFI-2712
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>
> Currently, for QueryDatabaseTable and GenerateTableFetch, the user can enter 
> any number of maximum-value columns, which are used to generate a SQL query 
> that will fetch all records whose values are greater than the last-observed 
> maximum values for those columns.
> However, this makes multiple max-value columns not very useful, since all of 
> them have to increase in lockstep or records will be lost/skipped. In such a 
> case, any one of the columns would suffice on its own, making multiple 
> max-value columns redundant.
> The more likely use case is that there are multiple columns whose values 
> only ever increase, but at different rates. This is common with very large 
> tables that have, for example, a "date_created" timestamp column and a 
> "bucket number" column that increases once a day. Queries for a day's worth 
> of data are more efficient if they can be filtered on "bucket" first (in this 
> case), then on the timestamp. However, the generated SQL queries would have 
> to reflect that "bucket" may remain the same while the timestamp is 
> increasing, but once the bucket value has increased, only the (new) 
> timestamps for that bucket should be fetched.
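
The difference described in the issue can be illustrated with a small sketch. It is not taken from the pull request: the table name `events`, the helper names, and the sample rows are all hypothetical, and the "lexicographic" predicate is only one possible reading of the desired behavior. The sketch compares a WHERE clause that requires every max-value column to exceed its last-observed maximum with one that treats the columns as an ordered tuple.

```python
import sqlite3


def lockstep_where(columns, last_max):
    """Every column must exceed its last-observed maximum (the behavior the issue describes)."""
    sql = " AND ".join(f"{col} > ?" for col in columns)
    return sql, list(last_max)


def lexicographic_where(columns, last_max):
    """Hypothetical alternative: compare the columns as an ordered tuple.

    Produces (c1 > ?) OR (c1 = ? AND c2 > ?) OR ... so that a later column
    only matters while all earlier columns are unchanged.
    """
    clauses, params = [], []
    for i, col in enumerate(columns):
        terms = [f"{prev} = ?" for prev in columns[:i]] + [f"{col} > ?"]
        clauses.append("(" + " AND ".join(terms) + ")")
        params.extend(last_max[: i + 1])
    return " OR ".join(clauses), params


def fetch_new(conn, where, params):
    # Column names are interpolated directly; this sketch assumes they are trusted.
    query = (
        "SELECT bucket, date_created FROM events "
        f"WHERE {where} ORDER BY bucket, date_created"
    )
    return conn.execute(query, params).fetchall()


# Sample data: the bucket increases slowly, the timestamp increases on every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (bucket INTEGER, date_created INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 100), (1, 200), (1, 300), (2, 400), (2, 500)],
)

columns = ["bucket", "date_created"]
last_max = [1, 200]  # state recorded after the previous fetch

lockstep_rows = fetch_new(conn, *lockstep_where(columns, last_max))
lexicographic_rows = fetch_new(conn, *lexicographic_where(columns, last_max))

print(lockstep_rows)       # the row (1, 300) is skipped
print(lexicographic_rows)  # includes (1, 300) as well as the bucket-2 rows
```

With the lockstep predicate, the row that arrived in bucket 1 after the last fetch is never returned, because its bucket value did not increase. The tuple-style predicate returns it, while still letting the database filter on the bucket column first.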



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
