zeroshade opened a new pull request, #816:
URL: https://github.com/apache/arrow-go/pull/816

   ## Summary
   
   Fixes #804.
   
   JSON decoding in ` + "`arrow/array`" + ` silently corrupted integer values 
larger than 2^53 because Go's ` + "`encoding/json`" + ` decodes numeric tokens 
as ` + "`float64`" + ` by default, losing precision for very large integers. 
This PR makes ` + "`UseNumber`" + ` the unconditional default across ` + 
"`FromJSON`" + `, ` + "`RecordFromJSON`" + `, and ` + "`NewJSONReader`" + ` so 
full ` + "`int64`" + ` precision is preserved without requiring callers to opt 
in.
   
   ## Changes
   
   - **` + "`RecordBuilder`" + ` gains ` + "`UnmarshalOne`" + ` and ` + 
"`Unmarshal`" + `** so caller-configured ` + "`json.Decoder`" + ` options 
propagate through nested field decoding. ` + "`RecordFromJSON`" + ` now calls ` 
+ "`bldr.UnmarshalOne(dec)`" + ` instead of ` + "`dec.Decode(bldr)`" + `, 
avoiding the bytes-roundtrip through ` + "`json.Unmarshaler`" + ` that 
previously dropped the outer decoder configuration (root cause noted in 
[comment](https://github.com/apache/arrow-go/issues/804#issuecomment-4454244067)).
   
   - **All builder ` + "`UnmarshalJSON([]byte)`" + ` methods now enable ` + 
"`UseNumber`" + `** on their internal decoder so large integers also survive 
the standard ` + "`json.Unmarshal(data, &builder)`" + ` path.
   
   - **String-encoded integers now work for time-based types.** ` + 
"`Duration`" + `, ` + "`Timestamp`" + `, ` + "`Time32`" + `, ` + "`Time64`" + 
`, ` + "`Date32`" + `, and ` + "`Date64`" + ` builders accept stringified 
integers (e.g. ` + "`\"9223372036854775807\"`" + `) as a fallback before their 
format-specific parsers — addressing the second snippet from the issue. This 
matches how ` + "`Int64Builder`" + ` and the decimal builders already behaved.
   
   - **` + "`WithUseNumber`" + ` and ` + "`WithUseNumberJSONReader`" + ` are 
deprecated** (marked with ` + "`Deprecated:`" + ` per pkg.go.dev convention) 
since ` + "`UseNumber`" + ` is now always enabled. Both options remain 
functional no-ops for backward compatibility.
   
   ## Test Plan
   
   15 new tests added in ` + "`arrow/array/util_test.go`" + ` and ` + 
"`arrow/array/json_reader_test.go`" + ` covering:
   
   - ` + "`RecordFromJSON`" + ` and ` + "`JSONReader`" + ` round-tripping ` + 
"`MaxInt64`" + `/` + "`MinInt64`" + ` with and without the deprecated options.
   - ` + "`RecordBuilder.UnmarshalJSON`" + ` and ` + "`UnmarshalOne`" + ` 
directly preserving large integers.
   - Stringified integer round-trip for ` + "`Duration`" + `, ` + "`Timestamp`" 
+ `, ` + "`Time32`" + `, ` + "`Time64`" + `, ` + "`Date32`" + `, ` + "`Date64`" 
+ `.
   - Regression: invalid duration strings (` + "`\"abc\"`" + `) still error; 
format-style duration strings (` + "`\"3h2m0.5s\"`" + `) still parse.
   
   Verified passing:
   - ` + "`./arrow/array/...`" + ` (full suite)
   - ` + "`./arrow/compute/...`" + ` (largest blast-radius consumer of ` + 
"`WithUseNumber`" + ` — 49 test sites)
   - ` + "`./arrow/ipc/... ./arrow/scalar/... ./arrow/extensions/... 
./arrow/encoded/... ./arrow/util/...`" + `
   - ` + "`./parquet/file/...`" + ` (the ` + "`./parquet/pqarrow/...`" + ` 
package fails identically on ` + "`main`" + ` due to a missing ` + 
"`PARQUET_TEST_DATA`" + ` env var — pre-existing, unrelated)
   
   All ` + "`lsp_diagnostics`" + ` clean across the 23 changed files.
   
   ## Behavior Change Notes
   
   The only observable behavior change for existing callers is: large integer 
values that previously decoded to a wrong ` + "`int64`" + ` (via float64 
round-trip) now decode to the correct value. Existing tests in ` + 
"`arrow/compute`" + ` continue to pass, including those that explicitly opt in 
via ` + "`WithUseNumber`" + ` and the ones that don't. No production callers in 
this repo break (verified across ` + "`flight`" + `, ` + "`compute`" + `, ` + 
"`parquet`" + `, ` + "`ipc`" + `).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to