alamb commented on PR #7783: URL: https://github.com/apache/arrow-rs/pull/7783#issuecomment-3025152962
> The `arbitrary_precision` feature also gets propagated and causes `test_serialize_decimal` and `test_serialize_timestamp` to fail. I tried fixing it for a while with no success. Let me know what you think @alamb I played around with this for a while and I concluded that if we use the serde_json parser we will potentially lose precision as internally it converts the JSON strings to `f64`. I suspect this is what lead you down the `arbitrary_precision` hole I think the only way we'll be able to properly handle / round trip values in JSON will be to use our own parser rather than than the serde_json one. Conveniently we already have such a think in arrow-json (the tape decoder that @scovich has played around with https://github.com/apache/arrow-rs/pull/7403) What I suggest is that we do this incrementally: 1. Implement basic JSON decoding (this PR) merged without proper decimal handling 2. Mark any failing tests as `#[ignore]` and file a ticket to fix the json decoding in a follow on PR 3. Merge this one in So that would mean in this PR, remove the added dependencies on serde_json and Rust decimal and do the best we can to convert from serde_json::Value --> Variant. A bunch of tests will fail that we can mark as `#[ignore]` and leave a reference to the ticket to fix. While maybe not as satisfying as getting it all working at first, I think it will unblock other development What I am thinking we can do is 1. Merge this PR 2. Move all the JSON handling into `parquet-variant-json` 3. Potentially use / reuse the tape decoder API from `parquet-variant-json` Does this make sense @harshmotw-db ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org