dariuszseweryn commented on PR #10053:
URL: https://github.com/apache/nifi/pull/10053#issuecomment-3046541832

   Given that the schema is widened for each subsequent FlowFile produced from a 
batch (a split may happen multiple times within a single aggregated Kinesis 
record), a subsequence number gives a better understanding of what exactly is in 
a FlowFile.
   
   The writer schema is determined by the first record in a batch. A 
batch/shard is not necessarily composed of messages sharing the same schema, but 
we can treat this scenario as an edge case that would rather use a strategy 
other than schema inference. Even if the messages share a schema, inference may 
assume a number is a 32-bit integer if the value in the first record fits in 
that range, while subsequent messages may contain numbers that do not fit 
because the type is a 64-bit integer in reality.
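
   To illustrate the 32-bit vs 64-bit case, here is a minimal, self-contained 
sketch. It does not use NiFi's inference or RecordSchema code; the class 
`WideningSketch`, the enum `NumType`, and all method names are hypothetical and 
only model the idea of fixing a type from the first record versus widening it 
as later records are seen.

```java
import java.util.List;

// Hypothetical sketch: models the idea only, not NiFi's actual implementation.
final class WideningSketch {

    // Simplified stand-in for an inferred numeric field type.
    enum NumType { INT, LONG }

    // Infer the narrowest type that can hold a single observed value.
    static NumType infer(long value) {
        return (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE)
                ? NumType.INT
                : NumType.LONG;
    }

    // Widen: keep the wider of the current and the newly observed type.
    static NumType widen(NumType current, NumType observed) {
        return (current == NumType.LONG || observed == NumType.LONG)
                ? NumType.LONG
                : NumType.INT;
    }

    // Type as decided by the first record only (no widening).
    static NumType fromFirstRecord(List<Long> values) {
        return infer(values.get(0));
    }

    // Type after widening over every record in the batch.
    static NumType widenedOverBatch(List<Long> values) {
        NumType type = infer(values.get(0));
        for (long v : values) {
            type = widen(type, infer(v));
        }
        return type;
    }

    public static void main(String[] args) {
        // The third value does not fit in a 32-bit integer.
        List<Long> batch = List.of(42L, 7L, 5_000_000_000L);
        System.out.println(fromFirstRecord(batch));   // prints INT
        System.out.println(widenedOverBatch(batch));  // prints LONG
    }
}
```

   With the first-record type alone, the third value could not be written as 
an INT; widening resolves the field to LONG once that value is seen.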
   
   Schema widening should happen only when the schema is inferred. Unfortunately, 
we cannot identify this case programmatically without enhancing the RecordSchema 
or RecordReader interfaces, which would raise even more questions. Or maybe this 
behaviour should be guarded by a new property?

