scovich commented on PR #8006: URL: https://github.com/apache/arrow-rs/pull/8006#issuecomment-3152638873
> this PR has a lot of comments and quite a few refinements were needed. I also know it was way too big. Honestly, I think the "state machine" for avro decoding is just really complex. It felt like a lot of the questions and churn ultimately come from that underlying complexity. And my lack of familiarity with the avro spec. Not sure how easily those issues could be avoided merely by raising a smaller PR? > To move forward, would it be best to simply focus on getting the `schema.rs` changes in first (~500-600 LOC) and then follow-up with a `arrow-avro/src/reader/mod.rs` PR? Interesting. By "schema.rs changes" you mean adding the new `SchemaStore` and fingerprinting infrastructure? With unit tests but not yet integrated into the actual decoder? That does seem like a good idea. I don't think there's any outstanding controversy in that part of the code (just a couple of follow-on items)? > I do apologize for both the size of this PR and the quality issues. This is _NOT_ the kind of run of the mill logic that I would normally associate with "quality issues." For me, at least, this has been an invigorating PR to review -- not a frustrating one. Some hard core software engineering here with depth and subtlety to work through. > Let me know what I should do and I'll get on it immediately. Now that we know more, the split you propose seems quite reasonable; I'm not sure it was obvious two weeks ago tho? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org