zeroshade opened a new pull request, #816: URL: https://github.com/apache/arrow-go/pull/816
## Summary Fixes #804. JSON decoding in ` + "`arrow/array`" + ` silently corrupted integer values larger than 2^53 because Go's ` + "`encoding/json`" + ` decodes numeric tokens as ` + "`float64`" + ` by default, losing precision for very large integers. This PR makes ` + "`UseNumber`" + ` the unconditional default across ` + "`FromJSON`" + `, ` + "`RecordFromJSON`" + `, and ` + "`NewJSONReader`" + ` so full ` + "`int64`" + ` precision is preserved without requiring callers to opt in. ## Changes - **` + "`RecordBuilder`" + ` gains ` + "`UnmarshalOne`" + ` and ` + "`Unmarshal`" + `** so caller-configured ` + "`json.Decoder`" + ` options propagate through nested field decoding. ` + "`RecordFromJSON`" + ` now calls ` + "`bldr.UnmarshalOne(dec)`" + ` instead of ` + "`dec.Decode(bldr)`" + `, avoiding the bytes-roundtrip through ` + "`json.Unmarshaler`" + ` that previously dropped the outer decoder configuration (root cause noted in [comment](https://github.com/apache/arrow-go/issues/804#issuecomment-4454244067)). - **All builder ` + "`UnmarshalJSON([]byte)`" + ` methods now enable ` + "`UseNumber`" + `** on their internal decoder so large integers also survive the standard ` + "`json.Unmarshal(data, &builder)`" + ` path. - **String-encoded integers now work for time-based types.** ` + "`Duration`" + `, ` + "`Timestamp`" + `, ` + "`Time32`" + `, ` + "`Time64`" + `, ` + "`Date32`" + `, and ` + "`Date64`" + ` builders accept stringified integers (e.g. ` + "`\"9223372036854775807\"`" + `) as a fallback before their format-specific parsers — addressing the second snippet from the issue. This matches how ` + "`Int64Builder`" + ` and the decimal builders already behaved. - **` + "`WithUseNumber`" + ` and ` + "`WithUseNumberJSONReader`" + ` are deprecated** (marked with ` + "`Deprecated:`" + ` per pkg.go.dev convention) since ` + "`UseNumber`" + ` is now always enabled. Both options remain functional no-ops for backward compatibility. ## Test Plan 15 new tests added in ` + "`arrow/array/util_test.go`" + ` and ` + "`arrow/array/json_reader_test.go`" + ` covering: - ` + "`RecordFromJSON`" + ` and ` + "`JSONReader`" + ` round-tripping ` + "`MaxInt64`" + `/` + "`MinInt64`" + ` with and without the deprecated options. - ` + "`RecordBuilder.UnmarshalJSON`" + ` and ` + "`UnmarshalOne`" + ` directly preserving large integers. - Stringified integer round-trip for ` + "`Duration`" + `, ` + "`Timestamp`" + `, ` + "`Time32`" + `, ` + "`Time64`" + `, ` + "`Date32`" + `, ` + "`Date64`" + `. - Regression: invalid duration strings (` + "`\"abc\"`" + `) still error; format-style duration strings (` + "`\"3h2m0.5s\"`" + `) still parse. Verified passing: - ` + "`./arrow/array/...`" + ` (full suite) - ` + "`./arrow/compute/...`" + ` (largest blast-radius consumer of ` + "`WithUseNumber`" + ` — 49 test sites) - ` + "`./arrow/ipc/... ./arrow/scalar/... ./arrow/extensions/... ./arrow/encoded/... ./arrow/util/...`" + ` - ` + "`./parquet/file/...`" + ` (the ` + "`./parquet/pqarrow/...`" + ` package fails identically on ` + "`main`" + ` due to a missing ` + "`PARQUET_TEST_DATA`" + ` env var — pre-existing, unrelated) All ` + "`lsp_diagnostics`" + ` clean across the 23 changed files. ## Behavior Change Notes The only observable behavior change for existing callers is: large integer values that previously decoded to a wrong ` + "`int64`" + ` (via float64 round-trip) now decode to the correct value. Existing tests in ` + "`arrow/compute`" + ` continue to pass, including those that explicitly opt in via ` + "`WithUseNumber`" + ` and the ones that don't. No production callers in this repo break (verified across ` + "`flight`" + `, ` + "`compute`" + `, ` + "`parquet`" + `, ` + "`ipc`" + `). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
