[ 
https://issues.apache.org/jira/browse/NIFI-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243257#comment-16243257
 ] 

Koji Kawamura commented on NIFI-2835:
-------------------------------------

Hi [~josephxsxn] [~Eulicny], are you still working on this? How is the 
implementation going? (NIFI-3681 is probably blocking progress on this.)

I wondered whether this JIRA could be done using EventProcessor Host instead 
of the low-level PartitionReceiver API that GetAzureEventHub uses now. Having 
read these Azure Event Hubs docs/blogs, I think the EventProcessor Host 
approach would make things simpler: NiFi would not have to implement leader 
election or partition/offset management itself.
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-java-get-started-receive-eph
https://blogs.biztalk360.com/understanding-consumer-side-of-azure-event-hubs-checkpoint-initialoffset-eventprocessorhost/

It would also resemble ConsumeKafka: EventProcessor Host stores consumer 
group information in Azure blob storage, much as Kafka stores it in a special 
topic (or previously in ZooKeeper), rather than in NiFi managed state. Since 
Event Hubs and Kafka are similar in architecture, storing consumer information 
on the broker side might work better.
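
To make the idea concrete, here is a minimal sketch of keeping consumer 
position outside the processor. It uses only the Java standard library. 
SharedCheckpointStore and StatelessConsumer are hypothetical names made up for 
illustration, and the in-memory map merely stands in for external storage such 
as blob storage or a broker-side topic. It is not the EventProcessor Host or 
Kafka API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an external checkpoint store (for example blob
// storage or a broker-side topic). This is NOT the Azure or Kafka API.
class SharedCheckpointStore {
    private final Map<String, Integer> nextIndexByKey = new HashMap<>();

    private static String key(String consumerGroup, String partitionId) {
        return consumerGroup + "/" + partitionId;
    }

    // Returns the index of the next unread event, or 0 if nothing was checkpointed.
    synchronized int load(String consumerGroup, String partitionId) {
        return nextIndexByKey.getOrDefault(key(consumerGroup, partitionId), 0);
    }

    synchronized void save(String consumerGroup, String partitionId, int nextIndex) {
        nextIndexByKey.put(key(consumerGroup, partitionId), nextIndex);
    }
}

// Hypothetical consumer that keeps no offset state of its own.
class StatelessConsumer {
    private final String consumerGroup;
    private final SharedCheckpointStore store;

    StatelessConsumer(String consumerGroup, SharedCheckpointStore store) {
        this.consumerGroup = consumerGroup;
        this.store = store;
    }

    // Reads up to maxEvents from the partition log, starting at the stored
    // checkpoint, then writes the new checkpoint back to the shared store.
    List<String> poll(String partitionId, List<String> partitionLog, int maxEvents) {
        int start = Math.min(store.load(consumerGroup, partitionId), partitionLog.size());
        int end = Math.min(partitionLog.size(), start + maxEvents);
        List<String> batch = new ArrayList<>(partitionLog.subList(start, end));
        store.save(consumerGroup, partitionId, end);
        return batch;
    }
}
```

Because the position lives in the shared store, a second StatelessConsumer in 
the same consumer group (say, on another NiFi node) continues from where the 
first one stopped, while a different consumer group starts from the beginning.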

I'm going to test using EventProcessor Host from a NiFi processor. It will be 
very different from the current GetAzureEventHub implementation, so it should 
be a separate processor, such as ConsumeAzureEventHub. I will share my findings 
later.

> GetAzureEventHub processor should leverage partition offset to better handle 
> restarts
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-2835
>                 URL: https://issues.apache.org/jira/browse/NIFI-2835
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Joseph Percivall
>            Assignee: Eric Ulicny
>
> The GetAzureEventHub processor utilizes the Azure client that consists of 
> receivers for each partition. The processor stores them in a map[1] that gets 
> cleared every time the processor is stopped[2]. Each receiver has a 
> partition offset that tracks which message it is currently on and which 
> it should receive next. So currently, when the processor is 
> stopped/restarted, any tracking of which message is next to be received is 
> lost.
> If, instead of clearing the map each time, we held onto the receivers or kept 
> track of the partitionId/offsets when stopping, then (barring any relevant 
> configuration changes) the processor would restart exactly where it left off 
> with no loss of data.
> This would work very well with NIFI-2826.
> [1]https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/eventhub/GetAzureEventHub.java#L122
> [2]https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/eventhub/GetAzureEventHub.java#L229
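
The suggestion in the issue description (remember partitionId/offset pairs 
when stopping) could look roughly like the sketch below. It is a plain-Java 
model, not the GetAzureEventHub code or the Azure client API: 
OffsetRetainingReceivers is a hypothetical class, offsets are simple counters, 
and starting an unseen partition at offset 0 is an assumption made only for 
illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-partition receivers; not the Azure client API.
class OffsetRetainingReceivers {
    // Positions of the live receivers, keyed by partition id; cleared on stop.
    private final Map<String, Long> activeOffsets = new HashMap<>();
    // Offsets remembered across stop/start (where these would be persisted
    // in a real processor is left open here).
    private final Map<String, Long> savedOffsets = new HashMap<>();

    // Creates a receiver position per partition, resuming from a saved offset
    // when one exists (assumption: unseen partitions start at 0).
    void start(Iterable<String> partitionIds) {
        for (String id : partitionIds) {
            activeOffsets.put(id, savedOffsets.getOrDefault(id, 0L));
        }
    }

    // Simulates receiving one message; returns the offset that was consumed.
    long receiveNext(String partitionId) {
        Long current = activeOffsets.get(partitionId);
        if (current == null) {
            throw new IllegalStateException("receiver not started for partition " + partitionId);
        }
        activeOffsets.put(partitionId, current + 1);
        return current;
    }

    // Saves each partition's position before dropping the live receivers.
    void stop() {
        savedOffsets.putAll(activeOffsets);
        activeOffsets.clear();
    }

    // Forgets saved positions, e.g. after a relevant configuration change.
    void clearSavedOffsets() {
        savedOffsets.clear();
    }
}
```

After stop() followed by start(), receiveNext continues from the saved 
position rather than from the default. clearSavedOffsets() covers the 
"barring any relevant configuration changes" caveat.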



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
