markap14 commented on PR #7893: URL: https://github.com/apache/nifi/pull/7893#issuecomment-1768506240
Hey @takraj, I don't know much of anything about Parquet, so I'm probably not the best person to review this from a Parquet perspective. But looking at what's happening here, the processor does not split Parquet at all. Instead, it clones the input and adds 'count' and 'offset' attributes. So the naming is problematic. If I send in a 10 GB Parquet file to SplitParquet and get out 10 FlowFiles, I expect each to be 1 GB. Here, each one will be 10 GB because it's a clone of the original. This would lead to a lot of confusion. Perhaps a name like 'CalculateParquetOffsets' would be more appropriate?