cboettig commented on issue #38724:
URL: https://github.com/apache/arrow/issues/38724#issuecomment-1817590796
Thanks @amoeba! I agree that consistency across bindings would be great.

Good question about breaking changes. The two cases I can think of are if the
same column name already exists in the Parquet file itself, or if the user is
manually adding the additional column. At least in the R interface, neither of
these would be a breaking change, since arrow is already comfortable opening a
partitioned dataset that also has those columns hardcoded into the Parquet/CSV
files, and since reconstructing the partition columns with
`mutate(path = add_filename(), col1 = str_extract(...))` would still work.

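A minimal sketch of that manual workaround follows. It is written in plain Python rather than R so that it does not depend on any arrow API. The hypothetical helper `extract_partition_value()` plays the role of `str_extract()` applied to the path that `add_filename()` returns, pulling a Hive-style `key=value` segment from anywhere in the full path. The file paths are made up for illustration.

```python
import re
from typing import Optional


def extract_partition_value(path: str, key: str) -> Optional[str]:
    """Return the value of a Hive-style `key=value` segment anywhere in `path`.

    Hypothetical helper: a plain-Python analogue of running str_extract()
    over the file path column produced by add_filename().
    """
    # Match `key=` at the start of the string or right after a "/", then
    # capture everything up to the next "/" (or the end of the string).
    pattern = rf"(?:^|/){re.escape(key)}=([^/]+)"
    match = re.search(pattern, path)
    return match.group(1) if match else None


# Hypothetical file paths: "region=" sits above the dataset root ("data/"),
# "year=" sits below it.
paths = [
    "s3://bucket/region=eu/data/year=2023/part-0.parquet",
    "s3://bucket/region=us/data/year=2024/part-0.parquet",
]
for p in paths:
    print(extract_partition_value(p, "region"), extract_partition_value(p, "year"))
```

Anchoring the match at the start of the string or just after a `/` prevents a key such as `region` from matching inside a longer directory name like `myregion=...`.
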
But there is always a chance of breaking changes not directly related to this
(e.g. code that assumes a certain number or order of columns under the current
behavior), so I'd be happy if this were an opt-in argument to `hive_style`...
maybe? Though the existing documentation says:
> should partitioning be interpreted as Hive-style? Default is NA, which
> means to inspect the file paths for Hive-style partitioning and behave
> accordingly.

This is misleading, because in fact it only inspects the sub-paths below each
path given to `sources`, not the full path. Maybe values like
`hive_style = "relative_path"` or `hive_style = "full_path"` could distinguish
these two behaviors?

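The following rough sketch makes the proposed distinction concrete. It rests on several assumptions. `hive_columns()` is a hypothetical function, not arrow's implementation, and the `"relative_path"` / `"full_path"` mode names are only the values floated above. For a single file, it collects the `key=value` directory segments either below the `sources` directory only or along the whole path.

```python
from pathlib import PurePosixPath
from typing import Dict


def hive_columns(file_path: str, source: str, mode: str = "relative_path") -> Dict[str, str]:
    """Collect Hive-style `key=value` directory segments for one file.

    Hypothetical sketch of two possible `hive_style` modes (not arrow's code):
      - "relative_path": only directories below `source` are inspected
        (the current behavior described above);
      - "full_path": every directory in the file's path is inspected.
    """
    file = PurePosixPath(file_path)
    if mode == "relative_path":
        dirs = file.relative_to(source).parent.parts
    elif mode == "full_path":
        dirs = file.parent.parts
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    columns: Dict[str, str] = {}
    for segment in dirs:
        key, sep, value = segment.partition("=")
        if sep and key:  # skip directories that are not `key=value`
            columns[key] = value  # a deeper segment overwrites an earlier one
    return columns


f = "/data/region=eu/year=2023/part-0.parquet"
print(hive_columns(f, "/data/region=eu", "relative_path"))  # {'year': '2023'}
print(hive_columns(f, "/data/region=eu", "full_path"))      # {'region': 'eu', 'year': '2023'}
```

In the `"full_path"` mode, this sketch simply lets a deeper segment overwrite an earlier one with the same key. A real implementation would still need to decide how to resolve such duplicates, and how to handle conflicts with columns already stored in the files.
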
It may be worth considering making this the default behavior eventually.
On balance, I suspect more users already assume that arrow parses Hive
notation anywhere in the path than explicitly rely on the current behavior of
only looking at the relative path that comes after their given source path. In
other words, the current behavior arguably looks more like a bug relative to
the documented behavior than a missing feature.