[jira] [Updated] (NIFI-12498) The Prioritization description in the User Guide is different from the actual source code implementation.

2023-12-10 Thread Doin Cha (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doin Cha updated NIFI-12498:

Description: 
In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

(https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)

However, in the actual source code implementation, *there is no 
automatic default setting when prioritizers are not selected.*

In such cases, the sorting is done by comparing the *ContentClaim* of 
FlowFiles.

(https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90)

It looks like the user guide needs to be revised.

  was:
In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

_([https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)|https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization]_

 

 

However, in the actual source code implementation, {color:#ff}*there is no 
automatic default setting when prioritizers are not selected.* {color}

In such cases, the sorting is done by comparing the *ContentClaim* *of 
FlowFiles.*

_([https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90])_
 

 


> The Prioritization description in the User Guide is different from the actual 
> source code implementation.
> -
>
> Key: NIFI-12498
> URL: https://issues.apache.org/jira/browse/NIFI-12498
> Project: Apache NiFi
>  Issue Type: Bug
>Reporter: Doin Cha
>Priority: Minor
>
> In the prioritization explanation of the User Guide, it is stated that 
> *OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
> prioritizers are selected."_
> (https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)
>
> However, in the actual source code implementation, *there is 
> no automatic default setting when prioritizers are not selected.*
> In such cases, the sorting is done by comparing the *ContentClaim* of 
> FlowFiles.
> (https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90)
>
> It looks like the user guide needs to be revised.
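
To make the reported difference concrete, here is a minimal Java sketch. It is not the NiFi source: `SketchFlowFile`, `QueueOrderSketch`, and the `claimId`/`claimOffset` fields are hypothetical stand-ins, and the real comparison may involve more steps. It only illustrates the point of the report: with no prioritizer selected, the ordering falls through to the content-claim comparison, so the oldest FlowFile is not necessarily first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for a queued FlowFile (not a NiFi class).
final class SketchFlowFile {
    final String name;
    final long queuedAtMillis; // when the FlowFile entered the queue
    final String claimId;      // assumed identifier of its content claim
    final long claimOffset;    // assumed offset within that claim

    SketchFlowFile(String name, long queuedAtMillis, String claimId, long claimOffset) {
        this.name = name;
        this.queuedAtMillis = queuedAtMillis;
        this.claimId = claimId;
        this.claimOffset = claimOffset;
    }
}

final class QueueOrderSketch {
    // Applies the selected prioritizers in order; if none is selected (or all
    // of them tie), falls back to comparing the content claims.
    static Comparator<SketchFlowFile> ordering(List<Comparator<SketchFlowFile>> prioritizers) {
        return (a, b) -> {
            for (Comparator<SketchFlowFile> prioritizer : prioritizers) {
                int result = prioritizer.compare(a, b);
                if (result != 0) {
                    return result;
                }
            }
            int byClaim = a.claimId.compareTo(b.claimId);
            if (byClaim != 0) {
                return byClaim;
            }
            return Long.compare(a.claimOffset, b.claimOffset);
        };
    }

    // Returns the FlowFile names in the order this sketch would dequeue them.
    static List<String> order(List<SketchFlowFile> queued,
                              List<Comparator<SketchFlowFile>> prioritizers) {
        List<SketchFlowFile> sorted = new ArrayList<>(queued);
        sorted.sort(ordering(prioritizers));
        List<String> names = new ArrayList<>();
        for (SketchFlowFile flowFile : sorted) {
            names.add(flowFile.name);
        }
        return names;
    }
}
```

Passing an empty prioritizer list orders the FlowFiles by claim only, while passing an explicit oldest-first comparator on `queuedAtMillis` gives the behaviour the User Guide describes as the default.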



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12498) The Prioritization description in the User Guide is different from the actual source code implementation.

2023-12-10 Thread Doin Cha (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doin Cha updated NIFI-12498:

Description: 
In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

_([https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)|https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization]_

 

 

However, in the actual source code implementation, {color:#ff}*there is no 
automatic default setting when prioritizers are not selected.* {color}

In such cases, the sorting is done by comparing the *ContentClaim* *of 
FlowFiles.*

_([https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90])_
 

 

  was:
In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

{_}[user-guide#prioritization|{_}{_}[https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization|https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)]{_}{_}]{_}

 

 

However, in the actual source code implementation, {color:#ff}*there is no 
automatic default setting when prioritizers are not selected.* {color}

In such cases, the sorting is done by comparing the *ContentClaim* *of 
FlowFiles.*

{_}[source 
code|{_}{_}[https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90|https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90)]]{_}

 


> The Prioritization description in the User Guide is different from the actual 
> source code implementation.
> -
>
> Key: NIFI-12498
> URL: https://issues.apache.org/jira/browse/NIFI-12498
> Project: Apache NiFi
>  Issue Type: Bug
>Reporter: Doin Cha
>Priority: Minor
>
> In the prioritization explanation of the User Guide, it is stated that 
> *OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
> prioritizers are selected."_
> _([https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)|https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization]_
>  
>  
> However, in the actual source code implementation, {color:#ff}*there is 
> no automatic default setting when prioritizers are not selected.* {color}
> In such cases, the sorting is done by comparing the *ContentClaim* *of 
> FlowFiles.*
> _([https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90])_
>  
>  





[jira] [Updated] (NIFI-12498) The Prioritization description in the User Guide is different from the actual source code implementation.

2023-12-10 Thread Doin Cha (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doin Cha updated NIFI-12498:

Description: 
In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

{_}[user-guide#prioritization|{_}{_}[https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization|https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)]{_}{_}]{_}

 

 

However, in the actual source code implementation, {color:#ff}*there is no 
automatic default setting when prioritizers are not selected.* {color}

In such cases, the sorting is done by comparing the *ContentClaim* *of 
FlowFiles.*

{_}[source 
code|{_}{_}[https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90|https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90)]]{_}

 

  was:
In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

_([https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)]_

 

However, in the actual source code implementation, {color:#ff}*there is no 
automatic default setting when prioritizers are not selected.* {color}

In such cases, the sorting is done by comparing the *ContentClaim* *of 
FlowFiles.*

{_}([https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90)]{_}{*}{*}{*}{*}

 


> The Prioritization description in the User Guide is different from the actual 
> source code implementation.
> -
>
> Key: NIFI-12498
> URL: https://issues.apache.org/jira/browse/NIFI-12498
> Project: Apache NiFi
>  Issue Type: Bug
>Reporter: Doin Cha
>Priority: Minor
>
> In the prioritization explanation of the User Guide, it is stated that 
> *OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
> prioritizers are selected."_
> {_}[user-guide#prioritization|{_}{_}[https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization|https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)]{_}{_}]{_}
>  
>  
> However, in the actual source code implementation, {color:#ff}*there is 
> no automatic default setting when prioritizers are not selected.* {color}
> In such cases, the sorting is done by comparing the *ContentClaim* *of 
> FlowFiles.*
> {_}[source 
> code|{_}{_}[https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90|https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90)]]{_}
>  





[jira] [Updated] (NIFI-12498) The Prioritization description in the User Guide is different from the actual source code implementation.

2023-12-10 Thread Doin Cha (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doin Cha updated NIFI-12498:

Description: 
In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

_([https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)]_

 

However, in the actual source code implementation, {color:#ff}*there is no 
automatic default setting when prioritizers are not selected.* {color}

In such cases, the sorting is done by comparing the *ContentClaim* *of 
FlowFiles.*

{_}([https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90)]{_}{*}{*}{*}{*}

 

  was:
In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

_([https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)]_

 

However, in the actual source code implementation, {color:#FF}*there is no 
automatic default setting when prioritizers are not selected.* {color}

In such cases, the sorting is done by comparing the *ContentClaim* *of 
FlowFiles.*

 


> The Prioritization description in the User Guide is different from the actual 
> source code implementation.
> -
>
> Key: NIFI-12498
> URL: https://issues.apache.org/jira/browse/NIFI-12498
> Project: Apache NiFi
>  Issue Type: Bug
>Reporter: Doin Cha
>Priority: Minor
>
> In the prioritization explanation of the User Guide, it is stated that 
> *OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
> prioritizers are selected."_
> _([https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)]_
>  
> However, in the actual source code implementation, {color:#ff}*there is 
> no automatic default setting when prioritizers are not selected.* {color}
> In such cases, the sorting is done by comparing the *ContentClaim* *of 
> FlowFiles.*
> {_}([https://github.com/apache/nifi/blob/9a5ec83baa1b3593031f0917659a69e7a36bb0be/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/QueuePrioritizer.java#L39-L90)]{_}{*}{*}{*}{*}
>  





[jira] [Created] (NIFI-12498) The Prioritization description in the User Guide is different from the actual source code implementation.

2023-12-10 Thread Doin Cha (Jira)
Doin Cha created NIFI-12498:
---

 Summary: The Prioritization description in the User Guide is 
different from the actual source code implementation.
 Key: NIFI-12498
 URL: https://issues.apache.org/jira/browse/NIFI-12498
 Project: Apache NiFi
  Issue Type: Bug
Reporter: Doin Cha


In the prioritization explanation of the User Guide, it is stated that 
*OldestFlowFileFirstPrioritizer* is the _"default scheme that is used if no 
prioritizers are selected."_

_([https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization)]_

 

However, in the actual source code implementation, {color:#FF}*there is no 
automatic default setting when prioritizers are not selected.* {color}

In such cases, the sorting is done by comparing the *ContentClaim* *of 
FlowFiles.*

 





[jira] [Commented] (NIFI-12241) Efficient Parquet Splitting

2023-12-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795124#comment-17795124
 ] 

ASF subversion and git services commented on NIFI-12241:


Commit 387d263b3b15fa187bcac5bc4e0af00034f5f7ac in nifi's branch 
refs/heads/support/nifi-1.x from Rajmund Takacs
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=387d263b3b ]

NIFI-12241 Efficient Parquet Splitting

(cherry picked from commit 9a5ec83baa1b3593031f0917659a69e7a36bb0be)


> Efficient Parquet Splitting
> ---
>
> Key: NIFI-12241
> URL: https://issues.apache.org/jira/browse/NIFI-12241
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Rajmund Takacs
>Assignee: Rajmund Takacs
>Priority: Major
>  Labels: feature, performance, pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> SplitParquet processor that expects as input a FlowFile with Parquet content 
> and would take as parameter a number of records as the split configuration.
> The processor would generate X flow files with unmodified content and would 
> add attributes with the offsets required to read the group of rows in the 
> flowfile's content.
> Then the Parquet Reader would be improved to accept optional flow file 
> attributes containing the information so that the reader can only read the 
> required part of the data.
> Instead of having something like
> {noformat}
> X -> SplitRecord (Parquet / JSON) -> ...{noformat}
> It'd be something like
> {noformat}
> X -> SplitParquet -> ConvertRecord (Parquet / JSON) -> ...{noformat}
> The goal here is to increase the overall efficiency of this operation for 
> extremely large Parquet files (hundreds of GBs). With the second approach, it 
> could leverage multi-threading for processing a single file.
> SplitParquet processor should also have a property (true/false) to write 
> zero-content flow files. The existing FetchParquet processor should be 
> enhanced to accept the flow file attributes for giving offsets. It'd give 
> something like
> {noformat}
> X -> SplitParquet -> FetchParquet (JSON Writer) -> ...{noformat}
> This way, a load balanced connection could be used between SplitParquet and 
> FetchParquet in order to distribute the work across the nodes (without 
> transferring a lot of data across the nodes of the cluster).
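
The split planning described in the issue can be sketched in Java as follows. This is only an illustration of the record-count arithmetic, not the SplitParquet implementation: `ParquetSplitSketch` and the attribute names `sketch.record.offset` and `sketch.record.count` are hypothetical, and the real processor may use different attributes or align splits differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParquetSplitSketch {
    // Hypothetical attribute names, chosen for this sketch only.
    static final String OFFSET_ATTR = "sketch.record.offset";
    static final String COUNT_ATTR = "sketch.record.count";

    // For each split, produces the attributes a downstream reader would need
    // to read only its own group of rows from the unmodified content.
    static List<Map<String, String>> planSplits(long totalRecords, long recordsPerSplit) {
        if (totalRecords < 0 || recordsPerSplit <= 0) {
            throw new IllegalArgumentException("invalid record counts");
        }
        List<Map<String, String>> splits = new ArrayList<>();
        for (long offset = 0; offset < totalRecords; offset += recordsPerSplit) {
            // The last split may hold fewer records than the configured size.
            long count = Math.min(recordsPerSplit, totalRecords - offset);
            Map<String, String> attributes = new LinkedHashMap<>();
            attributes.put(OFFSET_ATTR, Long.toString(offset));
            attributes.put(COUNT_ATTR, Long.toString(count));
            splits.add(attributes);
        }
        return splits;
    }
}
```

Because each split carries only attributes and no copied content, the resulting flow files stay small, which is what makes distributing them over a load balanced connection cheap.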





[jira] [Updated] (NIFI-12241) Efficient Parquet Splitting

2023-12-10 Thread Tamas Palfy (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Palfy updated NIFI-12241:
---
Fix Version/s: 1.25.0
   2.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Efficient Parquet Splitting
> ---
>
> Key: NIFI-12241
> URL: https://issues.apache.org/jira/browse/NIFI-12241
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Rajmund Takacs
>Assignee: Rajmund Takacs
>Priority: Major
>  Labels: feature, performance, pull-request-available
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> SplitParquet processor that expects as input a FlowFile with Parquet content 
> and would take as parameter a number of records as the split configuration.
> The processor would generate X flow files with unmodified content and would 
> add attributes with the offsets required to read the group of rows in the 
> flowfile's content.
> Then the Parquet Reader would be improved to accept optional flow file 
> attributes containing the information so that the reader can only read the 
> required part of the data.
> Instead of having something like
> {noformat}
> X -> SplitRecord (Parquet / JSON) -> ...{noformat}
> It'd be something like
> {noformat}
> X -> SplitParquet -> ConvertRecord (Parquet / JSON) -> ...{noformat}
> The goal here is to increase the overall efficiency of this operation for 
> extremely large Parquet files (hundreds of GBs). With the second approach, it 
> could leverage multi-threading for processing a single file.
> SplitParquet processor should also have a property (true/false) to write 
> zero-content flow files. The existing FetchParquet processor should be 
> enhanced to accept the flow file attributes for giving offsets. It'd give 
> something like
> {noformat}
> X -> SplitParquet -> FetchParquet (JSON Writer) -> ...{noformat}
> This way, a load balanced connection could be used between SplitParquet and 
> FetchParquet in order to distribute the work across the nodes (without 
> transferring a lot of data across the nodes of the cluster).





Re: [PR] NIFI-12241 Efficient Parquet Splitting [nifi]

2023-12-10 Thread via GitHub


tpalfy commented on PR #7893:
URL: https://github.com/apache/nifi/pull/7893#issuecomment-1849105654

   Pushed to nifi-1.x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12241 Efficient Parquet Splitting [nifi]

2023-12-10 Thread via GitHub


tpalfy commented on PR #7893:
URL: https://github.com/apache/nifi/pull/7893#issuecomment-1849087854

   LGTM
   Thank you @takraj for your work.
   Merged to main and merging to nifi-1.x





[jira] [Commented] (NIFI-12241) Efficient Parquet Splitting

2023-12-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795116#comment-17795116
 ] 

ASF subversion and git services commented on NIFI-12241:


Commit 9a5ec83baa1b3593031f0917659a69e7a36bb0be in nifi's branch 
refs/heads/main from Rajmund Takacs
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=9a5ec83baa ]

NIFI-12241 Efficient Parquet Splitting

This closes #7893.

Signed-off-by: Tamas Palfy 


> Efficient Parquet Splitting
> ---
>
> Key: NIFI-12241
> URL: https://issues.apache.org/jira/browse/NIFI-12241
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Rajmund Takacs
>Assignee: Rajmund Takacs
>Priority: Major
>  Labels: feature, performance, pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> SplitParquet processor that expects as input a FlowFile with Parquet content 
> and would take as parameter a number of records as the split configuration.
> The processor would generate X flow files with unmodified content and would 
> add attributes with the offsets required to read the group of rows in the 
> flowfile's content.
> Then the Parquet Reader would be improved to accept optional flow file 
> attributes containing the information so that the reader can only read the 
> required part of the data.
> Instead of having something like
> {noformat}
> X -> SplitRecord (Parquet / JSON) -> ...{noformat}
> It'd be something like
> {noformat}
> X -> SplitParquet -> ConvertRecord (Parquet / JSON) -> ...{noformat}
> The goal here is to increase the overall efficiency of this operation for 
> extremely large Parquet files (hundreds of GBs). With the second approach, it 
> could leverage multi-threading for processing a single file.
> SplitParquet processor should also have a property (true/false) to write 
> zero-content flow files. The existing FetchParquet processor should be 
> enhanced to accept the flow file attributes for giving offsets. It'd give 
> something like
> {noformat}
> X -> SplitParquet -> FetchParquet (JSON Writer) -> ...{noformat}
> This way, a load balanced connection could be used between SplitParquet and 
> FetchParquet in order to distribute the work across the nodes (without 
> transferring a lot of data across the nodes of the cluster).





Re: [PR] NIFI-12241 Efficient Parquet Splitting [nifi]

2023-12-10 Thread via GitHub


asfgit closed pull request #7893: NIFI-12241 Efficient Parquet Splitting
URL: https://github.com/apache/nifi/pull/7893





Re: [PR] NIFI-11294 ConsumeAzureEventHub supports storing checkpoints in compo… [nifi]

2023-12-10 Thread via GitHub


turcsanyip commented on PR #8013:
URL: https://github.com/apache/nifi/pull/8013#issuecomment-1848993611

   Thanks for your review @exceptionfactory!
   
   Based on your suggestion, I extracted the conversion methods to a utility 
class and also reorganized the methods a bit.
   
   Also found a bug with the retry logic that has been fixed. With retry 
backoff, the retry calls are executed asynchronously and the caller would not 
wait for the result. Backoff is not really needed (it does not help with the 
race condition) so I simply removed it.
   
   I have one more open question: cleaning up the obsolete items from the state 
when the user changes the flow and configures e.g. a new event hub or consumer 
group on the processor. In this case the old ownership and checkpoint data 
remains persisted in the state currently. Clearing the state is the user's 
responsibility now. On the other hand, it also means that the user has the 
option to go back to the original settings and continue with those checkpoints.
   In general, I would opt for cleaning up the state after configuration 
changes and storing only the current checkpoints (state size, no outdated 
items). Unfortunately, there is no trivial way to clear the state 
(`StateManager.clear()` cannot be used due to the concurrent access in the 
cluster) and it was implemented without clean-up in the original PR so I did 
not change it. However, now it looks feasible to me and it may be worth 
implementing the clean-up logic. 
   What is your opinion?
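
   For illustration, a retry without backoff that runs on the caller's thread might look like the Java sketch below. It is not the code from this PR: `ImmediateRetrySketch` and `retry` are hypothetical names, and the real retry logic may differ. The point is only that the caller always waits for the final result, because nothing is scheduled asynchronously.

```java
import java.util.function.Supplier;

final class ImmediateRetrySketch {
    // Runs the operation on the calling thread and retries immediately on
    // failure, so the caller blocks until a result or the last failure.
    static <T> T retry(int maxAttempts, Supplier<T> operation) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try again at once
            }
        }
        throw lastFailure;
    }
}
```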
   





[jira] [Commented] (NIFI-12442) Adding support for RocksDB

2023-12-10 Thread Giovanni (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795080#comment-17795080
 ] 

Giovanni commented on NIFI-12442:
-

Dear David, 

As you suggested, I've implemented the DistributedMapCacheClient interface, 
while still keeping the old components within the project.

There's a pre-release version, feel free to try it.
When creating the services, you can add dynamic properties to pass RocksOptions 
to the RocksDb instance. I suggest using the option "setCreateIfMissing" with 
the value "true" to create the RocksDb instance automatically if it does not 
exist. 

I'm open to other suggestions. I'll release it to Maven Central soon.

Here's the link to the pre-release:
[Pre-release 
v1.0.1|https://github.com/kommpn/nifi-rocksdb-manager/releases/tag/v1.0.1]
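
One way a dynamic property such as "setCreateIfMissing" = "true" could be mapped onto an options object is sketched below in Java. This is an assumption about the general approach, not the code of the linked project: `SketchOptions` is a hypothetical stand-in that mirrors the `setCreateIfMissing(boolean)` setter of the RocksDB Java `Options` class, `DynamicOptionSketch` is a hypothetical name, and only boolean options are handled.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical stand-in for an options object with boolean setters.
final class SketchOptions {
    private boolean createIfMissing;

    public SketchOptions setCreateIfMissing(boolean value) {
        this.createIfMissing = value;
        return this;
    }

    public boolean createIfMissing() {
        return createIfMissing;
    }
}

final class DynamicOptionSketch {
    // Treats each property name as a setter name and each value as a boolean,
    // then invokes the matching setter reflectively.
    static void apply(Object options, Map<String, String> dynamicProperties) {
        for (Map.Entry<String, String> property : dynamicProperties.entrySet()) {
            try {
                Method setter = options.getClass().getMethod(property.getKey(), boolean.class);
                setter.invoke(options, Boolean.parseBoolean(property.getValue()));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(
                        "Unsupported option: " + property.getKey(), e);
            }
        }
    }
}
```

Unknown option names are rejected with an exception rather than ignored, so a typo in a dynamic property name surfaces when the service is configured.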

> Adding support for RocksDB
> --
>
> Key: NIFI-12442
> URL: https://issues.apache.org/jira/browse/NIFI-12442
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.23.2
>Reporter: Giovanni
>Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I would like to suggest the creation of 3 new components:
> The first one is a service, which opens an existing RocksDb or, optionally, 
> creates it from scratch using RocksOptions. It will manage all the open 
> options (classic, read/write, read-only, secondary).
> The second one is a RocksDbReader, which uses the service to communicate with 
> the RocksDb in order to retrieve information through a lookup. It can save 
> the retrieved content inside an attribute or inside the flowFile content. It 
> will be capable of using both APIs: a simple "db.get" and 
> RocksIterator.
> The last one is a RocksDbWriter, which uses the same service as the reader 
> but can write values into the RocksDb, either from a flowFile attribute or 
> from the flowFile content, using properties to determine the key to use.
>
> Feel free to express your opinions on whether you think this will be useful 
> or useless.





[jira] [Comment Edited] (NIFI-12442) Adding support for RocksDB

2023-12-10 Thread Giovanni (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795080#comment-17795080
 ] 

Giovanni edited comment on NIFI-12442 at 12/10/23 1:59 PM:
---

Dear David, 

As you suggested, I've implemented the DistributedMapCacheClient interface, 
while still keeping the old components within the project.

There's a pre-release version, feel free to try it.
When creating the services, you can add dynamic properties to pass RocksOptions 
to the RocksDb instance. I suggest using the option "setCreateIfMissing" with 
the value "true" to create the RocksDb instance automatically if it does not 
exist. 

I'm open to other suggestions. I'll release it to Maven Central soon.

Here's the link to the pre-release with the NAR file in it:
[Pre-release 
v1.0.1|https://github.com/kommpn/nifi-rocksdb-manager/releases/tag/v1.0.1]


was (Author: JIRAUSER301883):
Dear David, 

As you suggested, I've implemented the DistributedMapCacheClient interface, 
while still keeping the old components within the project.

There's a pre-release version, feel free to try it.
When creating the services, you can add dynamic properties to pass RocksOptions 
to the RocksDb instance. I suggest using the option "setCreateIfMissing" with 
the value "true" to create the RocksDb instance automatically if it does not 
exist. 

I'm open to other suggestions. I'll release it to Maven Central soon.

Here's the link to the pre-release:
[Pre-release 
v1.0.1|https://github.com/kommpn/nifi-rocksdb-manager/releases/tag/v1.0.1]

> Adding support for RocksDB
> --
>
> Key: NIFI-12442
> URL: https://issues.apache.org/jira/browse/NIFI-12442
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.23.2
>Reporter: Giovanni
>Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I would like to suggest the creation of 3 new components:
> The first one is a service, which opens an existing RocksDb or, optionally, 
> creates it from scratch using RocksOptions. It will manage all the open 
> options (classic, read/write, read-only, secondary).
> The second one is a RocksDbReader, which uses the service to communicate with 
> the RocksDb in order to retrieve information through a lookup. It can save 
> the retrieved content inside an attribute or inside the flowFile content. It 
> will be capable of using both APIs: a simple "db.get" and 
> RocksIterator.
> The last one is a RocksDbWriter, which uses the same service as the reader 
> but can write values into the RocksDb, either from a flowFile attribute or 
> from the flowFile content, using properties to determine the key to use.
>
> Feel free to express your opinions on whether you think this will be useful 
> or useless.


