[ 
https://issues.apache.org/jira/browse/NIFI-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanhao Zhu updated NIFI-13340:
-------------------------------
    Description: 
*Background:* We run NiFi as a standalone instance in a Docker container on 
all our stages; the state management provider used by our instance is the local 
WAL-backed provider.

{*}Description{*}: We observed that the flowfiles queued in front of one of our 
process groups occasionally stop being ingested, and this happens on all of 
our stages without a noticeable pattern. The FlowFile concurrency policy of the 
process group that stopped ingesting data is set to SINGLE_BATCH_PER_NODE and 
the outbound policy is set to BATCH_OUTPUT. Ingestion should have continued, 
since the process group had already sent out every flowfile in it, but it stopped.

 

The only workaround we have found is a brutal one: we manually switch the 
FlowFile concurrency to unbounded and then switch it back to make ingestion work 
again. The recurrence of this issue has impacted our data ingestion.
!image-2024-06-03-14-42-37-280.png!

!image-2024-06-03-14-44-15-590.png!

As you can see in the screenshots, the process group 'Delete before 
Insert' has no more flowfiles to output, yet it still does not ingest the data 
queued at its input port.

 

In the log file I found the following:

{code:java}
2024-06-03 11:34:01,772 TRACE [Timer-Driven Process Thread-15] o.apache.nifi.groups.StandardDataValve Will not allow data to flow into StandardProcessGroup[identifier=5eb0ad69-e8ed-3ba0-52da-af94fb9836cd,name=Delete before Insert] because Outbound Policy is Batch Output and valve is already open to allow data to flow out of group
{code}
Also in the diagnostics, I found the following for the 'Delete before Insert' 
process group:
{code:java}
Process Group e6510a87-aa78-3268-1b11-3c310f0ad144, Name = Search and Delete 
existing reports(This is the parent processor group of the Delete before Insert)
Currently Have Data Flowing In: []
Currently Have Data Flowing Out: 
[StandardProcessGroup[identifier=5eb0ad69-e8ed-3ba0-52da-af94fb9836cd,name=Delete
 before Insert]]
Reason for Not allowing data to flow in:
    Data Valve is already allowing data to flow out of group:
        
StandardProcessGroup[identifier=5eb0ad69-e8ed-3ba0-52da-af94fb9836cd,name=Delete
 before Insert]
Reason for Not allowing data to flow out:
    Output is Allowed:
        
StandardProcessGroup[identifier=5eb0ad69-e8ed-3ba0-52da-af94fb9836cd,name=Delete
 before Insert] {code}

This is clearly {*}not correct{*}, since there is currently no data 
flowing out of the 'Delete before Insert' process group.

We dug through the source code of StandardDataValve.java and found that the 
data valve's state is stored every time the valve is opened or closed. The 
likely cause of this issue is therefore that the process group id was put 
into the state map when data flowed in, but the removal of the entry somehow 
failed. We are aware that if the state map is not persisted before NiFi 
restarts, it could lead to such a situation, but in the recent occurrences of 
this issue, no NiFi restart was recorded or observed at the time when 
all those flowfiles started to queue in front of the process group.
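The suspected failure mode can be sketched with a minimal model (hypothetical class and method names, not the actual StandardDataValve implementation): the valve persists the set of group ids whose output side is open, and inbound flow is only allowed while the group's id is absent from that set. If the state write that removes the id is lost, the valve refuses inbound data indefinitely, which matches the TRACE message above.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal model of the suspected failure mode (hypothetical names; this is
// NOT the actual StandardDataValve code). The valve records the ids of
// groups whose output side is open; with Batch Output, data may only flow
// IN while the group's id is absent from that set.
class DataValveModel {
    private final Set<String> openFlowOutOfGroup = new HashSet<>();

    void openFlowOut(String groupId) {
        openFlowOutOfGroup.add(groupId);   // state stored when valve opens
    }

    // If the state update that removes the entry is lost (modeled here by
    // stateWriteSucceeds == false), the id stays behind forever.
    void closeFlowOut(String groupId, boolean stateWriteSucceeds) {
        if (stateWriteSucceeds) {
            openFlowOutOfGroup.remove(groupId);
        }
    }

    boolean allowFlowIn(String groupId) {
        // Batch Output policy: no inbound data while output side is open
        return !openFlowOutOfGroup.contains(groupId);
    }
}

public class ValveStuckDemo {
    public static void main(String[] args) {
        DataValveModel valve = new DataValveModel();
        String group = "5eb0ad69-e8ed-3ba0-52da-af94fb9836cd";

        valve.openFlowOut(group);
        valve.closeFlowOut(group, false);   // removal of the entry fails

        // Valve now permanently refuses inbound data for this group
        System.out.println(valve.allowFlowIn(group));   // prints false
    }
}
```

This would also explain why toggling the FlowFile concurrency to unbounded and back clears the condition: changing the policy presumably resets the valve's recorded state for the group.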

 


> Flowfiles stopped to be ingested before a processor group
> ---------------------------------------------------------
>
>                 Key: NIFI-13340
>                 URL: https://issues.apache.org/jira/browse/NIFI-13340
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.25.0
>         Environment: Host OS :ubuntu 20.04 in wsl
> CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
> RAM: 128GB
> ZFS ARC cache was configured to be maximum 4 GB
>            Reporter: Yuanhao Zhu
>            Priority: Critical
>              Labels: data-consistency, statestore
>         Attachments: image-2024-06-03-14-42-37-280.png, 
> image-2024-06-03-14-44-15-590.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
