takraj commented on PR #7893: URL: https://github.com/apache/nifi/pull/7893#issuecomment-1782683835
@pvillard31 I've been working on further performance improvements over the last couple of days. I created a new variant of this processor that calculates the offsets of the row group boundaries, and I updated the reader to take these offsets and seek to those positions in the input file. This relies on a special configuration option of the Parquet reader, called 'File Range', which selects only the row groups that overlap the specified start and end offsets. Hopefully this brings a major improvement.

Could you share how you extracted the processing times from NiFi in your benchmarks? Did you simply take the 'Tasks/Time' indicator, or is there something else I can monitor?
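The comment does not include code. The Python sketch below is a rough illustration of the idea it describes. It batches consecutive row groups into byte ranges whose boundaries fall on row group edges, and it keeps only the row groups that overlap a given start/end range. `RowGroup`, `split_ranges`, and `select_overlapping` are hypothetical names, not NiFi or Parquet APIs. The exact selection rule applied by the real Parquet reader may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RowGroup:
    # Hypothetical stand-in for row group metadata: the byte offset where the
    # group starts in the file and its total size in bytes.
    start: int
    size: int

    @property
    def end(self) -> int:
        # Exclusive end offset of the row group.
        return self.start + self.size


def split_ranges(row_groups: List[RowGroup], groups_per_split: int) -> List[Tuple[int, int]]:
    """Batch consecutive row groups into half-open [start, end) byte ranges.

    Because each range begins and ends exactly on row group boundaries,
    every row group ends up in exactly one range.
    """
    ranges = []
    for i in range(0, len(row_groups), groups_per_split):
        batch = row_groups[i:i + groups_per_split]
        ranges.append((batch[0].start, batch[-1].end))
    return ranges


def select_overlapping(row_groups: List[RowGroup], start: int, end: int) -> List[RowGroup]:
    """Keep only the row groups whose byte span overlaps [start, end)."""
    return [rg for rg in row_groups if rg.start < end and rg.end > start]
```

For example, with `groups = [RowGroup(4, 100), RowGroup(104, 120), RowGroup(224, 90)]` (the first group starts at offset 4, right after the 4-byte `PAR1` magic), `split_ranges(groups, 2)` yields `[(4, 224), (224, 314)]`. Passing each of those ranges to `select_overlapping` returns disjoint sets of row groups. A range that does not fall on row group edges, such as `(50, 150)`, would overlap two groups.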