alamb commented on code in PR #7141:
URL: https://github.com/apache/arrow-datafusion/pull/7141#discussion_r1281147777
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -804,21 +804,25 @@ impl TableProvider for ListingTable {
.await?;
let file_groups = file_list_stream.try_collect::<Vec<_>>().await?;
-
- if file_groups.len() > 1 {
- return Err(DataFusionError::Plan(
- "Datafusion currently supports tables from single partition and/or file."
- .to_owned(),
- ));
+ let writer_mode;
+ //if we are writing a single output_partition to a table backed by a single file
+ //we can append to that file. Otherwise, we can write new files into the directory
+ //adding new files to the listing table in order to insert to the table.
+ let input_partitions = input.output_partitioning().partition_count();
+ if file_groups.len() == 1 && input_partitions == 1 {
Review Comment:
> Are you envisioning ListingTableWriteOptions as part of the ListingOptions
> struct (i.e. a property of the registered table itself)? So the user would do
> something like:
I guess I was thinking there could be a way to register the initial table
with WriteOptions for when it is written to via `INSERT ...` type queries.
However, I think the more interesting use case in my mind is passing the
options at write time, as you show
```rust
df.write_table("table", WriteOptions::new()...)
```
I am not quite sure how the code would look or how to make a nice API that
separates the two concerns (the act of writing and passing the write
options, versus the registered table).
It seems like there should also be a way to write directly to a target
output without having to register a table provider. I am sure there is a way; we
just need to come up with a clever API to do so.
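To make the two concerns concrete, here is a minimal sketch of one possible shape, not a proposal for the actual API. Everything in it is hypothetical: the `WriteOptions` fields, `or_defaults`, `SketchWriterMode`, and `choose_writer_mode` are illustrative names and do not exist in DataFusion. It assumes the registered table may carry default options, that options passed at write time take precedence, and it restates the append-vs-new-files condition from the diff above as a standalone function.

```rust
/// Hypothetical write options (illustrative only, not a DataFusion type).
/// `None` means "not specified", so a default can fill the gap.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct WriteOptions {
    pub overwrite: Option<bool>,
    pub single_file_output: Option<bool>,
}

impl WriteOptions {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_overwrite(mut self, value: bool) -> Self {
        self.overwrite = Some(value);
        self
    }

    pub fn with_single_file_output(mut self, value: bool) -> Self {
        self.single_file_output = Some(value);
        self
    }

    /// Options given at write time win; defaults stored with the
    /// registered table only fill in fields that were left unset.
    pub fn or_defaults(self, table_defaults: &WriteOptions) -> WriteOptions {
        WriteOptions {
            overwrite: self.overwrite.or(table_defaults.overwrite),
            single_file_output: self
                .single_file_output
                .or(table_defaults.single_file_output),
        }
    }
}

/// Hypothetical stand-in for the writer mode chosen in the diff.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SketchWriterMode {
    /// Append to the single existing file.
    Append,
    /// Write new files into the table's directory.
    NewFiles,
}

/// Mirrors the condition in the diff: append only when the table is backed
/// by one file group and the input has one partition.
pub fn choose_writer_mode(file_groups: usize, input_partitions: usize) -> SketchWriterMode {
    if file_groups == 1 && input_partitions == 1 {
        SketchWriterMode::Append
    } else {
        SketchWriterMode::NewFiles
    }
}
```

With this shape, an `INSERT ...` query would use only the table defaults, while a write-time call would build its own `WriteOptions` and merge them via `or_defaults`. Writing to a target without a registered table would simply skip the merge step.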
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.