scovich commented on code in PR #9097:
URL: https://github.com/apache/arrow-rs/pull/9097#discussion_r2670273139
##########
arrow-json/src/reader/tape.rs:
##########
@@ -237,8 +238,21 @@ enum DecoderState {
///
/// Consists of `(literal, decoded length)`
Literal(Literal, u8),
+ /// Skipping a value (for unprojected fields)
+ ///
+ /// Consists of:
+ /// - `depth`: Nesting level of objects/arrays being skipped (u32)
+ /// - `flags`: Bit-packed flags (in_string: bit 0, escape: bit 1)
+ SkipValue {
+ depth: u32,
+ flags: u8,
+ },
}
+// Bit flags for SkipValue state
+const SKIP_IN_STRING: u8 = 1 << 0; // 0x01
+const SKIP_ESCAPE: u8 = 1 << 1; // 0x02
Review Comment:
I threw an LLM at this whole situation during a boring meeting, and arrived
at a surprisingly different potential approach, if you're game to try it out?
The short version is:
* Keep the existing (highly optimized and efficient) decoding logic, but
factor it out to a helper method that is generic over `const SKIP: bool` that
says whether to actually store the parsed output.
* Wrap that helper in `decode` and `decode_skip` methods, with a clean
transition between the two: enter at the `:` match (like the PR does today),
and `decode_skip` breaks back out to `decode` when the stack length drops back
down.
* We need a new boolean `skipping` field to handle cases where input bytes
were exhausted while skipping (so the next call to `decode` can jump straight
to `decode_skip` when starting the next buffer of bytes)
* The state stack tracks everything related to skipping (small memory cost
but very efficient).
* No new tape decoder enum variants needed.
In theory, the approach should be simpler (less duplicated source code)
while also having friendlier branching (fewer and/or more predictable
branches).
Is that something you'd want me to put a bit more time into exploring
further?
Or something you'd prefer to dig into yourself?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]