[ 
https://issues.apache.org/jira/browse/NIFI-5812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683989#comment-16683989
 ] 

ASF GitHub Bot commented on NIFI-5812:
--------------------------------------

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    I did something like that for the Hive 1 version of PutHiveStreaming. The 
underlying library wasn't thread-safe when working on the same table, so I put 
in a "table lock" that prevents multiple threads from acting on the same table. 
QueryDatabaseTable (QDT) doesn't take incoming flow files, so its table name is 
effectively hard-coded. By adding PrimaryNodeOnly we can guarantee that only 
one instance of QDT runs (it is already TriggerSerially, so it can't have 
multiple threads). For GenerateTableFetch (GTF), if you run ListDatabaseTables 
on the primary node only (not sure whether we should force that with this 
annotation), then each flow file should carry a different table name, and with 
a load-balanced connection (or RPG -> Input Port) each instance of GTF should 
be working on a different table. The onus is on the user to set Max Concurrent 
Tasks for GTF to 1 to prevent multi-threaded execution.
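
    To illustrate the "table lock" idea mentioned above, here is a minimal 
sketch, not the actual PutHiveStreaming code. The class name TableLocks and its 
methods are hypothetical, and it assumes table names are compared as exact 
strings:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper (not NiFi code): serializes work per table name. */
final class TableLocks {
    // One lock per table name; entries are created lazily and never removed in this sketch.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the action while holding the lock for the given table and returns its result. */
    <T> T withTableLock(String tableName, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(tableName, name -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            // Always release, even if the action throws.
            lock.unlock();
        }
    }

    /** Returns true if some thread currently holds the lock for the given table. */
    boolean isLocked(String tableName) {
        ReentrantLock lock = locks.get(tableName);
        return lock != null && lock.isLocked();
    }
}
```

    Threads working on different tables proceed in parallel, while threads 
targeting the same table queue up behind one another. Note that a lock like 
this only coordinates threads inside one JVM; it does nothing across cluster 
nodes, which is the case PrimaryNodeOnly is meant to cover.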
    
    Perhaps for GTF, instead of forcing PrimaryNodeOnly, we can make it clear 
in the doc that if there are no incoming connections, it should probably be run 
on the primary node only. Alternatively, maybe during annotation processing we 
can enforce that processors annotated with PrimaryNodeOnly automatically have 
an InputRequirement of INPUT_FORBIDDEN? Otherwise flow files can get stalled if 
they are in the queue on a node that is not the primary.
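
    A rough sketch of what such a check could look like follows. It is only 
an illustration: the annotations and the Requirement enum are stand-ins 
declared locally so the example compiles without NiFi on the classpath, and the 
class name PrimaryNodeOnlyCheck and the method isValid are hypothetical. It 
also assumes that a missing InputRequirement does not count as 
INPUT_FORBIDDEN:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical validation sketch; the annotations below are local stand-ins, not NiFi's. */
final class PrimaryNodeOnlyCheck {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface PrimaryNodeOnly {}

    enum Requirement { INPUT_REQUIRED, INPUT_ALLOWED, INPUT_FORBIDDEN }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface InputRequirement {
        Requirement value();
    }

    /**
     * A class passes if it is not PrimaryNodeOnly, or if it is PrimaryNodeOnly
     * and explicitly declares INPUT_FORBIDDEN.
     */
    static boolean isValid(Class<?> processorClass) {
        if (!processorClass.isAnnotationPresent(PrimaryNodeOnly.class)) {
            return true;
        }
        InputRequirement requirement = processorClass.getAnnotation(InputRequirement.class);
        return requirement != null && requirement.value() == Requirement.INPUT_FORBIDDEN;
    }

    // Example classes standing in for processors.
    @PrimaryNodeOnly
    @InputRequirement(Requirement.INPUT_FORBIDDEN)
    static class SourceOnlyProcessor {}

    @PrimaryNodeOnly
    @InputRequirement(Requirement.INPUT_ALLOWED)
    static class AcceptsInputProcessor {}
}
```

    Here isValid accepts SourceOnlyProcessor and rejects 
AcceptsInputProcessor, since the latter could leave flow files queued on a 
non-primary node. Whether the framework should reject such a processor or 
merely warn is a separate design decision.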


> Make database processors as 'PrimaryNodeOnly'
> ---------------------------------------------
>
>                 Key: NIFI-5812
>                 URL: https://issues.apache.org/jira/browse/NIFI-5812
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>    Affects Versions: 1.7.0, 1.8.0, 1.7.1
>            Reporter: Sivaprasanna Sethuraman
>            Assignee: Sivaprasanna Sethuraman
>            Priority: Major
>
> With NIFI-543, we have introduced a behavior annotation to mark a particular 
> processor to run only on the Primary Node. It is recommended to mark the 
> following database-related processors as 'PrimaryNodeOnly':
>  * QueryDatabaseTable
>  * GenerateTableFetch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
