LantaoJin opened a new pull request, #47:
URL: https://github.com/apache/datafusion-java/pull/47
## Which issue does this PR close?
Closes #35
## Rationale for this change
DataFusion 53.x supports newline-delimited JSON via
`SessionContext::read_json` / `register_json`, but the Java bindings only
expose Parquet and CSV readers today. Users with NDJSON input have to fall back
to `CREATE EXTERNAL TABLE … STORED AS JSON` through `SessionContext.sql`, which
works but loses the typed-builder ergonomics the Parquet/CSV bindings already
provide. Issue #35 tracks closing that gap; this PR is the implementation.
## What changes are included in this PR?
- `proto/json_read_options.proto` — new `NdJsonReadOptionsProto` message.
Reuses `FileCompressionType` from `csv_read_options.proto` (CSV and JSON accept
the same compression set in DataFusion).
- `NdJsonReadOptions` Java builder with `fileExtension`,
`fileCompressionType`, `schemaInferMaxRecords`, and an explicit Arrow
`schema(Schema)`. Defaults match the Rust struct (`.json`, `UNCOMPRESSED`,
infer from data).
- `SessionContext.registerJson(name, path[, options])` and `readJson(path[,
options])` overloads, structurally identical to the Parquet/CSV entry points
(Java builds the proto, JNI hands a `byte[]` to native).
- `native/src/json.rs` — JNI module that decodes `NdJsonReadOptionsProto`,
constructs the upstream `JsonReadOptions`, and forwards to `register_json` /
`read_json`. Imports `prelude::JsonReadOptions` rather than the deprecated
`NdJsonReadOptions` alias; the user-facing Java/proto name still matches the
issue ask.
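To make the builder surface concrete, the sketch below compiles on its own and mirrors the option shape and defaults described above. It is not the shipped class: the stand-in enum, the nested `Builder`, and the 1000-record inference cap are assumptions (the PR only states `.json`, `UNCOMPRESSED`, and infer-from-data), and the real `NdJsonReadOptions` additionally holds an Arrow `Schema` and serializes to `NdJsonReadOptionsProto`.

```java
// Standalone sketch of the NdJsonReadOptions builder shape; stand-in types
// let it compile without datafusion-java or Arrow on the classpath.
class NdJsonReadOptionsSketch {
    // Stand-in for the FileCompressionType enum reused from the CSV proto.
    enum FileCompressionType { UNCOMPRESSED, GZIP, BZIP2, XZ, ZSTD }

    private final String fileExtension;
    private final FileCompressionType fileCompressionType;
    private final long schemaInferMaxRecords;

    private NdJsonReadOptionsSketch(Builder b) {
        this.fileExtension = b.fileExtension;
        this.fileCompressionType = b.fileCompressionType;
        this.schemaInferMaxRecords = b.schemaInferMaxRecords;
    }

    static Builder builder() { return new Builder(); }

    String fileExtension() { return fileExtension; }
    FileCompressionType fileCompressionType() { return fileCompressionType; }
    long schemaInferMaxRecords() { return schemaInferMaxRecords; }

    static final class Builder {
        // Defaults per the PR description: ".json", UNCOMPRESSED, infer from
        // data. The 1000-record cap is an assumed stand-in for "infer".
        private String fileExtension = ".json";
        private FileCompressionType fileCompressionType = FileCompressionType.UNCOMPRESSED;
        private long schemaInferMaxRecords = 1000;

        Builder fileExtension(String ext) { fileExtension = ext; return this; }
        Builder fileCompressionType(FileCompressionType t) { fileCompressionType = t; return this; }
        Builder schemaInferMaxRecords(long n) { schemaInferMaxRecords = n; return this; }
        NdJsonReadOptionsSketch build() { return new NdJsonReadOptionsSketch(this); }
    }

    public static void main(String[] args) {
        NdJsonReadOptionsSketch defaults = builder().build();
        System.out.println(defaults.fileExtension()); // prints ".json"
        NdJsonReadOptionsSketch gz = builder()
                .fileExtension(".ndjson")
                .fileCompressionType(FileCompressionType.GZIP)
                .build();
        System.out.println(gz.fileCompressionType()); // prints "GZIP"
    }
}
```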
Out of scope (kept for follow-ups so each PR stays small):
- `tablePartitionCols`, `fileSortOrder` — neither Parquet nor CSV exposes
these in the Java surface today; adding them only for JSON would diverge.
- `newline_delimited` — DataFusion 53.x exposes the knob, but the JSON-array
reader path is not yet stable upstream. Both the issue title and the Rust API
name (`NdJson`) imply newline-delimited.
- AVRO source — separate issue.
## Are these changes tested?
Yes.
- `NdJsonReadOptionsTest` (4 tests):
- defaults round-trip through proto,
- fully-configured options round-trip through proto,
- `schema(Schema)` is held by reference and not embedded in proto bytes,
- sweep over every `FileCompressionType` variant.
- `SessionContextJsonTest` (3 tests):
- `registerJson` + SQL `COUNT(*)` and projection on an inferred-schema
NDJSON file,
- `readJson` with an explicit Arrow schema,
- `registerJson` with a custom `.ndjson` file extension.
- `make test` is green: 68 tests, 0 failures, 0 errors. The 12 skipped
cases are pre-existing parquet/TPC-H data-dependent tests unaffected
by this PR.
- `cargo clippy --all-targets -- -D warnings`, `cargo fmt -- --check`,
and `./mvnw spotless:apply` are all clean.
## Are there any user-facing changes?
Yes — purely additive. New public API:
- `org.apache.datafusion.NdJsonReadOptions`
- `SessionContext.registerJson(String, String)`
- `SessionContext.registerJson(String, String, NdJsonReadOptions)`
- `SessionContext.readJson(String) → DataFrame`
- `SessionContext.readJson(String, NdJsonReadOptions) → DataFrame`
No existing API changes; no deprecations.
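For orientation, here is how the four overloads line up at the call site. Everything in this snippet is a local stub so it compiles standalone: the `SessionContext`, `DataFrame`, and `NdJsonReadOptions` classes below only echo the method shapes listed above and are not the real `org.apache.datafusion` bindings.

```java
// Sketch of the new overloads' call shapes. All types are local stubs so the
// snippet is self-contained; the real signatures live in org.apache.datafusion.
class JsonApiShapes {
    static class DataFrame {}
    static class NdJsonReadOptions {}

    static class SessionContext {
        // Register an NDJSON file under a table name for SQL access.
        void registerJson(String name, String path) {}
        void registerJson(String name, String path, NdJsonReadOptions options) {}
        // Read an NDJSON file straight into a DataFrame.
        DataFrame readJson(String path) { return new DataFrame(); }
        DataFrame readJson(String path, NdJsonReadOptions options) { return new DataFrame(); }
    }

    public static void main(String[] args) {
        SessionContext ctx = new SessionContext();
        ctx.registerJson("events", "data/events.json");
        DataFrame df = ctx.readJson("data/events.json", new NdJsonReadOptions());
        System.out.println(df != null); // prints "true"
    }
}
```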
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]