GitHub user v1gnesh closed a discussion: Import deeply nested Rust struct/enum
with custom types
In DuckDB, when I have an ndjson file, I can use `CREATE TABLE t AS SELECT *
FROM read_ndjson('file.ndjson');`.
I would like to avoid the intermediate ndjson step if I can, since the
objective is to go from deeply nested Rust structs/enums to the Arrow world as
transparently as possible.
Currently, I'm serializing `BigStruct` to ndjson `String`, one at a time, and
then writing out the ndjson file (8x the size of the source file). Then, using
DuckDB's SQL above, I'm able to automagically get the data types back from
plaintext (JSON).
It would be very desirable to go directly from Rust data types to the native
struct data type of DataFusion/DuckDB, etc., without the JSON round trip.
Note that I don't have a `Vec<BigStruct>`; `BigStruct`s are being produced in
an async Stream.
I understand this could be an ArrayOfStruct to StructOfArray 'problem', but I
don't have an ArrayOfStruct to begin with, as the structs are produced in a
streaming fashion (too many to keep them all in memory).
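To illustrate the streaming ArrayOfStruct to StructOfArray conversion described above, here is a minimal sketch using only the Rust standard library. The names `Reading`, `ReadingColumns`, and `batch_columns` are hypothetical, a plain iterator stands in for the async Stream, and the flat three-field struct is only a placeholder for the much more deeply nested `BigStruct`. It is not DataFusion's or DuckDB's API, just the general batching idea.

```rust
// Hypothetical stand-in for one streamed record; the real `BigStruct` is far more nested.
#[derive(Debug, Clone)]
struct Reading {
    id: u64,
    label: String,
    value: Option<f64>,
}

// Struct-of-arrays buffer: one Vec per field, analogous to one column per field.
#[derive(Debug, Default)]
struct ReadingColumns {
    id: Vec<u64>,
    label: Vec<String>,
    value: Vec<Option<f64>>,
}

impl ReadingColumns {
    // Split one record into its fields and append each to its column.
    fn push(&mut self, r: Reading) {
        self.id.push(r.id);
        self.label.push(r.label);
        self.value.push(r.value);
    }

    // Number of rows currently buffered.
    fn len(&self) -> usize {
        self.id.len()
    }
}

// Consume records one at a time and hand off a columnar batch every `batch_size`
// rows, so that at most one batch is buffered at any moment.
fn batch_columns<I, F>(records: I, batch_size: usize, mut on_batch: F)
where
    I: IntoIterator<Item = Reading>,
    F: FnMut(ReadingColumns),
{
    let mut cols = ReadingColumns::default();
    for r in records {
        cols.push(r);
        if cols.len() >= batch_size {
            // Hand off the full batch and start again with empty columns.
            on_batch(std::mem::take(&mut cols));
        }
    }
    // Flush the final, possibly partial, batch.
    if cols.len() > 0 {
        on_batch(cols);
    }
}

fn main() {
    // A lazy iterator plays the role of the stream: records are never collected
    // into a Vec<Reading>.
    let stream = (0..5u64).map(|i| Reading {
        id: i,
        label: format!("r{i}"),
        value: if i % 2 == 0 { Some(i as f64) } else { None },
    });

    let mut sizes = Vec::new();
    batch_columns(stream, 2, |batch| sizes.push(batch.len()));
    println!("{sizes:?}");
}
```

In a real pipeline, the `on_batch` callback is where each columnar batch would be handed to whatever Arrow or database writer is in use; that part, along with nested fields and schema inference, is deliberately left out of this sketch.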
On the DuckDB side, [this
example](https://github.com/duckdb/duckdb-rs/blob/main/src/types/serde_json.rs)
of writing JSON into DuckDB does not work for me (it just writes the hex bytes
in decimal), and I lose all type information (though `read_ndjson` via the CLI
recreates all of it). Native support for Rust data types there is still a work
in progress.
Do you think this is something that is good to have for DataFusion, and if so,
is it something in the works already?
Are there any examples I can look at?
Oh, and an inferred schema would be best. The `BigStruct`s are quite big, and
conceal a whole lot of variations. It would be a nightmare to write the schema
for all of them.
Thanks in advance.
GitHub link: https://github.com/apache/datafusion/discussions/7484