gengziyand opened a new pull request, #793:
URL: https://github.com/apache/tsfile/pull/793
## Summary
- Add Parquet and Arrow format support to the TsFile import tool
(alongside existing CSV)
- Add auto mode (schema-less) import for all three formats
- Unify schema naming: `id_columns` → `tag_columns`, `csv_columns` →
`source_columns` (backward compatible)
- Add `parquet2tsfile.sh/.bat` and `arrow2tsfile.sh/.bat` scripts
- Add CLI options: `--format`, `--table_name`, `--time_precision`,
`--separator`
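
For reviewers, the backward-compatible renaming can be pictured as a small alias lookup. This is only a sketch: the class `SchemaKeyAliases` and its method `canonical` are hypothetical names used for illustration and are not taken from this PR's `ImportSchemaParser`; only the four key names come from the summary above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of this PR): maps legacy schema keys to the new names. */
final class SchemaKeyAliases {
  private static final Map<String, String> LEGACY_TO_CURRENT = new HashMap<>();

  static {
    // Old name -> new name, as listed in the PR summary.
    LEGACY_TO_CURRENT.put("id_columns", "tag_columns");
    LEGACY_TO_CURRENT.put("csv_columns", "source_columns");
  }

  private SchemaKeyAliases() {}

  /** Returns the current key name; keys that are not legacy aliases pass through unchanged. */
  static String canonical(String key) {
    String trimmed = key.trim();
    return LEGACY_TO_CURRENT.getOrDefault(trimmed, trimmed);
  }
}
```

With a lookup like this, a schema file that still uses `id_columns` would be read the same way as one that uses `tag_columns`.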
## Changes
- New classes: `ParquetSourceReader`, `ArrowSourceReader`,
`AutoSchemaInferer`, `ImportExecutor`, `ImportSchema`, `ImportSchemaParser`,
`SourceReader`, `SourceBatch`, `TabletBuilder`, `TimeConverter`,
`ValueConverter`
- Modified: `TsFileTool.java` (unified CLI entry point),
`CsvSourceReader.java` (auto mode support)
- New dependencies: `parquet-hadoop 1.14.4`, `hadoop-common 3.3.6`,
`arrow-vector 15.0.2`
- 189 automated tests across 11 test classes
- Updated README.md and README-zh.md
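
As a rough illustration of the job a class like `TimeConverter` has to do for `--time_precision`, the sketch below scales an epoch-millisecond timestamp to a target precision. It rests on assumptions: the class name `TimePrecisionSketch`, the method `fromEpochMillis`, and the accepted values `ms`, `us`, and `ns` are hypothetical and may not match the actual implementation or the values the option accepts.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch (not this PR's TimeConverter): scales epoch milliseconds to a target precision. */
final class TimePrecisionSketch {
  private TimePrecisionSketch() {}

  /**
   * Converts an epoch-millisecond timestamp to the given precision.
   * The precision strings "ms", "us", and "ns" are assumed values.
   */
  static long fromEpochMillis(long epochMillis, String precision) {
    switch (precision) {
      case "ms":
        return epochMillis;
      case "us":
        // TimeUnit conversions saturate at Long.MAX_VALUE / Long.MIN_VALUE instead of overflowing.
        return TimeUnit.MILLISECONDS.toMicros(epochMillis);
      case "ns":
        return TimeUnit.MILLISECONDS.toNanos(epochMillis);
      default:
        throw new IllegalArgumentException("Unsupported time precision: " + precision);
    }
  }
}
```

For example, `TimePrecisionSketch.fromEpochMillis(1500L, "us")` returns `1500000`.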
## Test plan
- [x] 189 unit and E2E tests passing (JDK 8)
- [x] Smoke test on Linux JDK 17 (Docker) - 9 scenarios
- [x] Smoke test on Windows JDK 8 and JDK 17 - 9 scenarios
- [x] Data correctness verification - 39 checks (table name, TAG/FIELD,
cross-format consistency)
- [x] Boundary tests - 8 scenarios (tab separator, null values,
SKIP/DEFAULT, error handling, fail_dir)