dariuszseweryn commented on PR #10053: URL: https://github.com/apache/nifi/pull/10053#issuecomment-3046541832
Given that the schema is widened for each subsequent FlowFile produced from a batch (and a split may happen multiple times within a single aggregated Kinesis record), a subsequence number gives a better understanding of what exactly is in a FlowFile.

The writer schema is determined by the first record in a batch. A batch/shard is not necessarily composed of messages sharing the same schema, but we can test this scenario as an edge case; such setups would more likely use a strategy other than schema inference. Even if the messages share a schema, inference may assume a number is a 32-bit integer because the value in the first record fits that range, while subsequent messages may contain numbers that do not fit because the type is really a 64-bit integer.

Schema widening should happen only when the schema is inferred. Unfortunately, we cannot identify this case programmatically without enhancing the RecordSchema or RecordReader interfaces, which would raise even more questions. Or maybe this behaviour should be guarded by a new property?
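To make the 32-bit versus 64-bit concern concrete, here is a minimal, self-contained sketch. It does not use the NiFi RecordSchema or RecordReader APIs; the class `WideningSketch`, the enum `NumType`, and the methods `infer`, `widen`, and `firstMisfit` are all hypothetical names invented for illustration. It assumes a non-empty batch of integer values and only shows how a type inferred from the first record can turn out too narrow for a later one.

```java
import java.util.List;

public class WideningSketch {
    // Hypothetical numeric types, declared from narrowest to widest.
    enum NumType { INT, LONG }

    // Infer the narrowest type that can hold a single value,
    // mimicking what inference on one record might conclude.
    static NumType infer(long value) {
        return (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE)
                ? NumType.INT
                : NumType.LONG;
    }

    // Widening keeps whichever of the two types is wider.
    static NumType widen(NumType current, NumType observed) {
        return current.compareTo(observed) >= 0 ? current : observed;
    }

    // Returns the index of the first value that does not fit the type
    // inferred from the first value, or -1 if every value fits.
    // Assumes the list is non-empty.
    static int firstMisfit(List<Long> values) {
        NumType writerType = infer(values.get(0));
        for (int i = 1; i < values.size(); i++) {
            if (widen(writerType, infer(values.get(i))) != writerType) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The third value exceeds the 32-bit range.
        List<Long> batch = List.of(42L, 7L, 5_000_000_000L, 9L);
        System.out.println("type inferred from first record: " + infer(batch.get(0)));
        System.out.println("first record needing a wider schema: index " + firstMisfit(batch));
    }
}
```

In this sketch the index returned by `firstMisfit` marks the point at which a new FlowFile with a widened schema would have to begin, which is one way to read the role of the subsequence number discussed above.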
