LantaoJin opened a new pull request, #21:
URL: https://github.com/apache/datafusion-java/pull/21
## Summary
Mirror the Parquet pattern (#18, #19): add a `CsvReadOptions` builder, four
`SessionContext` methods (`registerCsv` x2, `readCsv` x2), and JNI plumbing in
`native/src/csv.rs`. The builder exposes the subset of DataFusion's Rust
`CsvReadOptions` that already has a Parquet analog on the Java side: header,
delimiter, quote, escape, terminator, comment, newlinesInValues,
schemaInferMaxRecords, fileExtension, fileCompressionType, and an explicit
Arrow schema.
Tests cover: option-builder fluent API, header-inferred schema with SQL
round-trip, explicit-schema header-less file with custom delimiter, and a
custom file extension with a tab delimiter.
`table_partition_cols`, `file_sort_order`, `null_regex` and `truncated_rows`
are intentionally deferred -- they have no parquet-side counterpart yet on the
Java side.
## Changes
- Add `CsvReadOptions` builder mirroring the subset of Rust
  `datafusion::prelude::CsvReadOptions` that has a Parquet analog already
  exposed in this repo: header, delimiter, quote, escape, terminator,
  comment, newlinesInValues, schemaInferMaxRecords, fileExtension,
  fileCompressionType, and explicit Arrow `schema`.
- Add `SessionContext.registerCsv(...)` / `readCsv(...)` overloads,
  matching the shape of `registerParquet` / `readParquet`.
- Wire native methods through `native/src/lib.rs`, sharing the
  schema-IPC-serialization helper already used by the Parquet path.

`tablePartitionCols`, `fileSortOrder`, `nullRegex`, and `truncatedRows` are
intentionally deferred to a follow-up; they have no Parquet-side counterpart
yet on the Java side and were called out in the issue as out-of-scope.
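
For readers unfamiliar with the fluent-builder shape described above, here is a minimal, self-contained sketch. It is not the class added by this PR: `CsvReadOptionsSketch`, its accessor names, the `byte` delimiter type, and every default value are assumptions made for illustration, and only a few of the listed options are shown.

```java
// Hypothetical stand-in, NOT the CsvReadOptions class from this PR.
// Names, types, and defaults are assumptions chosen for illustration only.
final class CsvReadOptionsSketch {
    private final boolean hasHeader;
    private final byte delimiter;
    private final String fileExtension;
    private final int schemaInferMaxRecords;

    private CsvReadOptionsSketch(Builder b) {
        this.hasHeader = b.hasHeader;
        this.delimiter = b.delimiter;
        this.fileExtension = b.fileExtension;
        this.schemaInferMaxRecords = b.schemaInferMaxRecords;
    }

    static Builder builder() {
        return new Builder();
    }

    boolean hasHeader() { return hasHeader; }
    byte delimiter() { return delimiter; }
    String fileExtension() { return fileExtension; }
    int schemaInferMaxRecords() { return schemaInferMaxRecords; }

    static final class Builder {
        // Assumed defaults; the real builder may choose different ones.
        private boolean hasHeader = true;
        private byte delimiter = (byte) ',';
        private String fileExtension = ".csv";
        private int schemaInferMaxRecords = 1000;

        // Each setter returns the builder so calls can be chained.
        Builder hasHeader(boolean value) {
            this.hasHeader = value;
            return this;
        }

        Builder delimiter(byte value) {
            this.delimiter = value;
            return this;
        }

        Builder fileExtension(String value) {
            if (value == null) {
                throw new IllegalArgumentException("fileExtension must not be null");
            }
            this.fileExtension = value;
            return this;
        }

        Builder schemaInferMaxRecords(int value) {
            if (value < 0) {
                throw new IllegalArgumentException("schemaInferMaxRecords must be >= 0");
            }
            this.schemaInferMaxRecords = value;
            return this;
        }

        // Produces an immutable options object from the current builder state.
        CsvReadOptionsSketch build() {
            return new CsvReadOptionsSketch(this);
        }
    }

    public static void main(String[] args) {
        // Header-less, tab-delimited file with a custom extension.
        CsvReadOptionsSketch tsv = CsvReadOptionsSketch.builder()
                .hasHeader(false)
                .delimiter((byte) '\t')
                .fileExtension(".tsv")
                .build();
        System.out.println(tsv.fileExtension() + " header=" + tsv.hasHeader());
    }
}
```

In the actual change, an options object of this kind would presumably be passed to one of the `registerCsv` / `readCsv` overloads; those signatures are not reproduced here because they are not spelled out in this description.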
## Test plan
- [x] `./mvnw test` — new `CsvReadOptionsTest` (option-builder) passes.
- [x] `./mvnw test` — new `SessionContextCsvTest` passes:
  - `registerCsv` + SQL round-trip on an inferred-schema file
  - `readCsv` with explicit Arrow schema and `hasHeader(false)`
- [x] `make test` — full native + JVM build is green.
- [x] `cargo fmt` and `cargo clippy --all-targets -- -D warnings` clean
  under `native/`.
- [x] `./mvnw spotless:apply` clean.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]