jorisvandenbossche commented on code in PR #12530:
URL: https://github.com/apache/arrow/pull/12530#discussion_r841886153
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -569,6 +570,22 @@ def test_partitioning():
         with pytest.raises(pa.ArrowInvalid):
             partitioning.parse(shouldfail)
 
+    partitioning = ds.FilenamePartitioning(
+        pa.schema([
+            pa.field('group', pa.int64()),
+            pa.field('key', pa.float64())
+        ])
+    )
+    assert partitioning.dictionaries is None
Review Comment:
Ah, so it's for the case where only a subset of your fields is
dictionary encoded, I see.
Now, in that case returning a shorter list can also be a bit
confusing: to get the dictionary for a certain partition key, you would need
to count how many dictionary-encoded columns are present in the schema before
the specific key you are looking for.
Another option could be to always return a list of the same length as the
number of fields in the schema, with `None` entries for keys that are
not dictionary encoded? (e.g. `[None, pa.array(["first", "second", "third"])]`
for the specific case in the tests)
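
To make the difference concrete, here is a minimal sketch of the two layouts. It uses plain Python lists as stand-ins for pyarrow arrays, and the helper names (`compact_dictionaries`, `aligned_dictionaries`, `compact_lookup`) are hypothetical, not part of the pyarrow API; it also assumes, for illustration, that only `key` is dictionary encoded.

```python
# Plain-Python illustration; lists stand in for pyarrow arrays.
field_names = ["group", "key"]
# Assumption for this sketch: only "key" is dictionary encoded.
dictionaries_by_field = {"key": ["first", "second", "third"]}


def compact_dictionaries(names, by_field):
    """One entry per dictionary-encoded field only (the shorter list)."""
    return [by_field[name] for name in names if name in by_field]


def aligned_dictionaries(names, by_field):
    """One entry per schema field, with None for fields that are not encoded."""
    return [by_field.get(name) for name in names]


def compact_lookup(names, by_field, target):
    """Look up a field in the compact layout by counting the
    dictionary-encoded fields that come before it in the schema."""
    if target not in by_field:
        return None
    before = names[:names.index(target)]
    position = sum(1 for name in before if name in by_field)
    return compact_dictionaries(names, by_field)[position]


compact = compact_dictionaries(field_names, dictionaries_by_field)
aligned = aligned_dictionaries(field_names, dictionaries_by_field)

# Aligned layout: the field's schema index is also its index in the list.
key_dictionary = aligned[field_names.index("key")]
```

With the aligned layout the lookup is a direct index by field position, whereas the compact layout needs the extra counting step in `compact_lookup`.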