scovich commented on PR #7092:
URL: https://github.com/apache/arrow-rs/pull/7092#issuecomment-2648951247

   > This makes sense to me, my understanding being this allows deserializing 
`StringArray` one value at a time, ensuring records are not split across value 
boundaries.
   
   That's a good description of what I hoped to achieve, yes.
   
   > Whilst this probably has some additional overheads, I'd be curious to see 
these quantified e.g. compared to the approach of not checking, I suspect these 
are low relative to the inherent costs of JSON decoding, and such an approach 
still benefits from the vectorised tape->array conversion.
   
   In the common case where every string contains a complete, valid JSON 
value, the check should be branch-predicted away. It ultimately just tests two 
variables that should already be hot in CPU cache, if not in registers, and 
both branches should almost always be not-taken.
   
   In any case, though, this lets the user of a `Decoder` express the 
correctness constraints they care about, and for that purpose the small 
performance overhead should be entirely acceptable. The change doesn't affect 
normal parsing at all.
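
   To make the per-value check concrete, here is a minimal, self-contained sketch. `ToyDecoder`, its `has_partial_record` and `len` methods, and `decode_values` are hypothetical stand-ins written for illustration only; they are not the arrow-json `Decoder` and make no claim about its real API or internals. The toy only tracks nesting depth and string state, and it assumes every record is a top-level JSON object or array.

```rust
/// Hypothetical stand-in for a streaming JSON decoder (NOT the arrow-json API).
/// It only tracks nesting depth and string state, which is enough to tell
/// where top-level object/array records begin and end.
#[derive(Default)]
struct ToyDecoder {
    depth: usize,     // current nesting depth of {} and []
    in_string: bool,  // currently inside a JSON string literal
    escaped: bool,    // previous byte inside a string was a backslash
    completed: usize, // number of fully decoded top-level records
}

impl ToyDecoder {
    /// Feed more bytes; state carries over between calls.
    fn decode(&mut self, buf: &[u8]) -> Result<(), String> {
        for &b in buf {
            if self.in_string {
                if self.escaped {
                    self.escaped = false;
                } else if b == b'\\' {
                    self.escaped = true;
                } else if b == b'"' {
                    self.in_string = false;
                }
                continue;
            }
            match b {
                b'"' => self.in_string = true,
                b'{' | b'[' => self.depth += 1,
                b'}' | b']' => {
                    if self.depth == 0 {
                        return Err("unbalanced closing delimiter".to_string());
                    }
                    self.depth -= 1;
                    if self.depth == 0 {
                        self.completed += 1;
                    }
                }
                _ => {}
            }
        }
        Ok(())
    }

    /// True if the bytes seen so far end in the middle of a record.
    fn has_partial_record(&self) -> bool {
        self.depth > 0 || self.in_string
    }

    /// Number of complete records decoded so far.
    fn len(&self) -> usize {
        self.completed
    }
}

/// Decode one string value at a time, rejecting any value that does not
/// hold exactly one complete record. The two `if`s are the two cheap
/// checks; for well-formed input neither branch is taken.
fn decode_values(values: &[&str]) -> Result<usize, String> {
    let mut decoder = ToyDecoder::default();
    for (i, value) in values.iter().enumerate() {
        let before = decoder.len();
        decoder.decode(value.as_bytes())?;
        if decoder.has_partial_record() {
            return Err(format!("value {} ends in the middle of a record", i));
        }
        if decoder.len() != before + 1 {
            return Err(format!("value {} does not hold exactly one record", i));
        }
    }
    Ok(decoder.len())
}
```

   Without the two checks, a value such as `{"a": 1` followed by a value `}` would be stitched together into a single record; with them, the first value is rejected as soon as it has been fed in.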


