andygrove opened a new pull request, #29:
URL: https://github.com/apache/datafusion-java/pull/29

   ## Which issue does this PR close?
   
   No tracking issue — follow-up to #28.
   
   ## Rationale for this change
   
   PR #28 established protobuf-over-JNI as the transport for `SessionContext` 
configuration. This PR applies the same pattern to the CSV and Parquet read 
paths.
   
   Before this change, registering or reading a CSV file passed 14–16 raw JNI 
arguments: booleans, byte values, nullable fields encoded as `xxx_set` / 
`xxx_value` pairs, `-1L` sentinels for "unset" longs, and `FileCompressionType` 
shipped as its `name()` string. Parquet had the same shape with 7–9 args.
   
   After: each call takes a single serialized `CsvReadOptionsProto` / 
`ParquetReadOptionsProto` byte array plus an optional Arrow-IPC schema byte 
array. Nullability, enums, and field evolution are now native to the wire 
format. The contributor guide documents the proto-over-JNI convention so future 
structured JNI calls follow the same pattern.
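
To illustrate the difference in nullability handling described above, here is a minimal Rust sketch. It is not code from this PR: `flag_pair`, `sentinel_long`, and `CsvOptionsSketch` with its fields are hypothetical names, and the struct only approximates what a decoded message might look like if the proto marks such fields `optional`.

```rust
/// Old style (sketch): a nullable value crosses the boundary as a
/// `xxx_set` flag plus a `xxx_value` payload.
fn flag_pair<T>(is_set: bool, value: T) -> Option<T> {
    if is_set {
        Some(value)
    } else {
        None
    }
}

/// Old style (sketch): an "unset" long is signalled with the -1 sentinel.
fn sentinel_long(value: i64) -> Option<i64> {
    if value == -1 {
        None
    } else {
        Some(value)
    }
}

/// New style (sketch): a decoded options message where absence is already
/// part of the type. Field names are hypothetical, not the PR's schema.
#[derive(Debug, Default, PartialEq)]
struct CsvOptionsSketch {
    has_header: Option<bool>,
    quote: Option<u8>,
    schema_infer_max_records: Option<i64>,
}
```

With the flag-and-sentinel style, every nullable field needs its own reconstruction call such as `flag_pair(quote_set, quote_value)` on the native side. With a decoded message, a field that was never set is simply `None`, as in `CsvOptionsSketch::default()`.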
   
   ## What changes are included in this PR?
   
   - New `proto/csv_read_options.proto` and `proto/parquet_read_options.proto`, 
mirroring the structure of `session_options.proto`. `FileCompressionType` is a 
proto3 enum with prefixed values and a `_UNSPECIFIED = 0` sentinel.
   - `CsvReadOptions.toBytes()` and `ParquetReadOptions.toBytes()` serialize 
the Java options through the generated builders.
   - `with_csv_options` and `with_parquet_options` on the Rust side decode the 
proto via prost and fold the fields into DataFusion's option structs. The 
`Unspecified` compression arm returns an error rather than silently defaulting.
   - Four JNI methods collapse to 4 or 5 arguments each: `(handle, [name,] 
path, byte[] optionsBytes, byte[] schemaIpcBytesOrNull)`.
   - New `native/src/schema.rs::decode_optional_schema` replaces two copies of 
identical Arrow-IPC schema-decode logic.
   - Renamed Rust module `session_options` → `proto_gen` since the single 
generated file now contains the types for all three protos (they share `package 
datafusion_java;`).
   - New contributor-guide section `Passing structured options across the JNI 
boundary` documents the convention, including proto3 enum-prefix and 
`_UNSPECIFIED = 0` requirements.
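
The `_UNSPECIFIED = 0` convention and the erroring `Unspecified` arm mentioned in the list above can be sketched as follows. This is an illustrative, self-contained example rather than the PR's implementation: the `Compression` enum, the `compression_from_wire` function, and the numeric wire values 1 through 5 are assumptions. Only the zero sentinel and the error on it come from the description.

```rust
/// Stand-in for the native-side compression type (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Uncompressed,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
}

/// Map a raw proto3 enum value to the native type.
/// 0 is the `_UNSPECIFIED` sentinel and is rejected instead of being
/// silently treated as a default. Values 1..=5 are assumed numbering.
fn compression_from_wire(raw: i32) -> Result<Compression, String> {
    match raw {
        0 => Err("file compression type is unspecified".to_string()),
        1 => Ok(Compression::Uncompressed),
        2 => Ok(Compression::Gzip),
        3 => Ok(Compression::Bzip2),
        4 => Ok(Compression::Xz),
        5 => Ok(Compression::Zstd),
        other => Err(format!("unknown file compression type: {other}")),
    }
}
```

Rejecting the zero value means a caller that forgets to set the field gets an explicit error on the native side instead of an implicit fallback to some default compression.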
   
   The public Java API is unchanged: every public setter on `CsvReadOptions` / 
`ParquetReadOptions` and every `register*` / `read*` method on `SessionContext` 
keeps the same signature.
   
   ## Are these changes tested?
   
   Yes:
   
   - `CsvReadOptionsTest` (4 tests) and `ParquetReadOptionsTest` (3 tests) 
round-trip through `toBytes()` / `Proto.parseFrom(...)`, verifying every field, 
default presence/absence, and all five `FileCompressionType` values.
   - The existing `SessionContextCsvTest` and 
`SessionContextParquetOptionsTest` continue to exercise the public API 
end-to-end through JNI without modification — strong evidence that the new wire 
format reaches the Rust side correctly.
   - Full `./mvnw test` passes (49 run, 0 failed, 12 skipped — skips are 
pre-existing tpch-data integration tests).
   - `cd native && cargo build && cargo clippy --all-targets -- -D warnings && 
cargo fmt --check` clean.
   
   ## Are there any user-facing changes?
   
   No. Public Java API is unchanged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

