takraj commented on PR #7893:
URL: https://github.com/apache/nifi/pull/7893#issuecomment-1782683835

   @pvillard31 Over the last couple of days I've been working on further 
performance improvements. I created a new variant of this processor that 
calculates the offsets of the row group boundaries, and I updated the reader to 
take these offsets and seek to those positions in the input file. This is 
achieved with a special configuration option of the Parquet reader called 
'File Range', which selects only the row groups that overlap the specified 
start and end offset range. Hopefully this brings a major improvement.
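   To illustrate the idea, here is a minimal sketch in plain Java. It does not use 
the actual Parquet or NiFi APIs. `RowGroupInfo`, `FileRange`, `splitOnBoundaries` 
and `selectOverlapping` are hypothetical names, and the offsets are made-up values. 
In practice they would come from the row group metadata in the file footer. The 
sketch cuts the file into byte ranges that start and end exactly on row group 
boundaries. It then applies the overlap rule described above to show which row 
groups a reader would pick for each range. The real reader's selection rule is 
not shown here and may differ from this simple overlap check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are NiFi or Parquet library APIs.
public class RowGroupRanges {

    /** Position of a row group in the file: start offset and total byte length. */
    record RowGroupInfo(long startOffset, long byteLength) {
        long endOffset() {
            return startOffset + byteLength;
        }
    }

    /** A half-open byte range [start, end). */
    record FileRange(long start, long end) {}

    /**
     * Groups consecutive row groups into ranges of at most maxRowGroupsPerRange,
     * so every range starts and ends exactly on a row group boundary.
     */
    static List<FileRange> splitOnBoundaries(List<RowGroupInfo> rowGroups, int maxRowGroupsPerRange) {
        if (maxRowGroupsPerRange < 1) {
            throw new IllegalArgumentException("maxRowGroupsPerRange must be >= 1");
        }
        List<FileRange> ranges = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i += maxRowGroupsPerRange) {
            int last = Math.min(i + maxRowGroupsPerRange, rowGroups.size()) - 1;
            ranges.add(new FileRange(rowGroups.get(i).startOffset(), rowGroups.get(last).endOffset()));
        }
        return ranges;
    }

    /** Indices of the row groups whose bytes overlap the half-open range [start, end). */
    static List<Integer> selectOverlapping(List<RowGroupInfo> rowGroups, FileRange range) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i++) {
            RowGroupInfo rg = rowGroups.get(i);
            // Two half-open intervals overlap iff each one starts before the other ends.
            if (rg.startOffset() < range.end() && range.start() < rg.endOffset()) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up layout: three row groups following the 4-byte "PAR1" magic header.
        List<RowGroupInfo> rowGroups = List.of(
                new RowGroupInfo(4, 100),
                new RowGroupInfo(104, 150),
                new RowGroupInfo(254, 120));
        for (FileRange range : splitOnBoundaries(rowGroups, 2)) {
            System.out.println(range + " -> row groups " + selectOverlapping(rowGroups, range));
        }
    }
}
```

   Because each range ends exactly where the next row group begins, the half-open 
overlap check never assigns a row group to two ranges. Every row group is read 
exactly once.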
   
   Could you share how you extracted the processing times from NiFi in your 
benchmarks? Did you simply take the 'Tasks/Time' indicator, or is there 
something else I can monitor?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
