[ https://issues.apache.org/jira/browse/NIFI-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243257#comment-16243257 ]
Koji Kawamura commented on NIFI-2835:
-------------------------------------

Hi [~josephxsxn] [~Eulicny], are you still working on this? How is the implementation going? (NIFI-3681 is probably blocking progress here.)

I wondered whether this JIRA could be done using EventProcessorHost instead of the low-level PartitionReceiver API that GetAzureEventHub uses now. Having read the Azure Event Hubs doc and blog post below, I think the EventProcessorHost approach would make things simpler: NiFi would not have to implement leader election or partition/offset management itself.

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-java-get-started-receive-eph
https://blogs.biztalk360.com/understanding-consumer-side-of-azure-event-hubs-checkpoint-initialoffset-eventprocessorhost/

It would also look like ConsumeKafka: EventProcessorHost stores consumer group information in Azure Blob storage, much as Kafka stores it in a special topic (or previously in ZooKeeper), rather than in NiFi managed state. Since Event Hubs and Kafka are similar in architecture, storing consumer information on the broker side might work better.

I'm going to test using EventProcessorHost from a NiFi processor. It will differ substantially from the current GetAzureEventHub implementation, so it should be a separate processor, such as ConsumeAzureEventHub. I will share my findings later.

> GetAzureEventHub processor should leverage partition offset to better handle
> restarts
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-2835
>                 URL: https://issues.apache.org/jira/browse/NIFI-2835
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Joseph Percivall
>            Assignee: Eric Ulicny
>
> The GetAzureEventHub processor utilizes the Azure client that consists of
> receivers for each partition. The processor stores them in a map[1] that gets
> cleared every time the processor is stopped[2].
> Each receiver has a partition offset, which tracks which message it is
> currently on and which it should receive next. So currently, when the
> processor is stopped/restarted, any tracking of which message is next to be
> received is lost.
> If, instead of clearing the map each time, we held onto the receivers or kept
> track of the partition IDs/offsets when stopping, the processor would (barring
> any relevant configuration changes) restart exactly where it left off, with
> no loss of data.
> This would work very well with NIFI-2826.
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/eventhub/GetAzureEventHub.java#L122
> [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/eventhub/GetAzureEventHub.java#L229

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
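
The second option in the issue description (keeping track of partition IDs and offsets when stopping) could look roughly like the sketch below. It is a plain-Java illustration only: `OffsetTrackingSketch`, `Receiver`, `savedOffsets`, and the in-memory "partitions" are hypothetical stand-ins, not classes from NiFi or the Azure SDK, and offsets are modeled as simple list indexes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: none of these names come from NiFi or the Azure SDK. */
class OffsetTrackingSketch {

    /** Stand-in for a per-partition receiver that remembers its next offset. */
    static final class Receiver {
        private final List<String> partition;
        private int nextOffset;

        Receiver(List<String> partition, int startOffset) {
            this.partition = partition;
            this.nextOffset = startOffset;
        }

        /** Returns the next message, or null when the partition is drained. */
        String receive() {
            return nextOffset < partition.size() ? partition.get(nextOffset++) : null;
        }

        int nextOffset() {
            return nextOffset;
        }
    }

    private final Map<String, List<String>> partitions;                 // fake event hub
    private final Map<String, Receiver> receivers = new HashMap<>();    // cleared on stop
    private final Map<String, Integer> savedOffsets = new HashMap<>();  // survives stop

    OffsetTrackingSketch(Map<String, List<String>> partitions) {
        this.partitions = partitions;
    }

    /** Lazily creates a receiver, starting at the saved offset if one exists. */
    String receive(String partitionId) {
        Receiver receiver = receivers.computeIfAbsent(partitionId,
                id -> new Receiver(partitions.get(id), savedOffsets.getOrDefault(id, 0)));
        return receiver.receive();
    }

    /** On stop, record each partition's next offset before discarding the receivers. */
    void stop() {
        receivers.forEach((id, receiver) -> savedOffsets.put(id, receiver.nextOffset()));
        receivers.clear();
    }

    public static void main(String[] args) {
        Map<String, List<String>> hub = new HashMap<>();
        hub.put("0", Arrays.asList("a", "b", "c"));

        OffsetTrackingSketch processor = new OffsetTrackingSketch(hub);
        System.out.println(processor.receive("0")); // first message in the partition
        processor.stop();                           // receivers are dropped, offsets kept
        System.out.println(processor.receive("0")); // continues after the saved offset
    }
}
```

In this sketch the offsets live only in memory, so they survive a stop/start but not a process restart. A real processor would need to persist them somewhere durable, which is the bookkeeping the comment above suggests delegating to EventProcessorHost.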