jonded94 opened a new pull request, #47601:
URL: https://github.com/apache/arrow/pull/47601
### Rationale for this change
In Python, a `pyarrow.Schema` was previously not hashable when it had
`metadata` set:
```
>>> import pyarrow
>>> pyarrow.schema([], metadata={b"1": b"1"})
-- schema metadata --
1: '1'
>>> schema = pyarrow.schema([], metadata={b"1": b"1"})
>>> hash(schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/types.pxi", line 2921, in pyarrow.lib.Schema.__hash__
TypeError: unhashable type: 'dict'
```
This happened because `Schema.__hash__` tried to hash the metadata (a `dict`)
as-is, and dicts are unhashable.
### What changes are included in this PR?
Slightly change how the hash of a `Schema` is computed: the metadata
`dict[bytes, bytes]` is converted to a `frozenset` of its `(key, value)`
tuples before hashing.
For reference, this is faster than hashing a sorted tuple of the
`(key, value)` tuples (https://stackoverflow.com/a/6014481/10070873).
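
As an illustration only, the idea can be sketched in plain Python without pyarrow. The helper name `metadata_hash` is hypothetical and is not the code added in this PR; it assumes the metadata is either `None` or a dict whose keys and values are hashable (such as `bytes`).

```python
def metadata_hash(metadata):
    """Hash a metadata mapping independently of insertion order.

    Hypothetical helper for illustration; not the actual pyarrow code.
    """
    if metadata is None:
        return hash(None)
    # A dict is unhashable, but a frozenset of its (key, value) pairs is,
    # as long as every key and value is itself hashable (bytes are).
    return hash(frozenset(metadata.items()))


a = {b"1": b"1", b"2": b"2"}
b = {b"2": b"2", b"1": b"1"}  # same pairs, different insertion order

# Equal dicts hash equally, regardless of insertion order.
assert a == b
assert metadata_hash(a) == metadata_hash(b)

# The sorted-tuple alternative gives the same guarantee,
# but has to sort the items first.
assert hash(tuple(sorted(a.items()))) == hash(tuple(sorted(b.items())))
```

Either form keeps the rule that equal objects have equal hashes; the frozenset form simply avoids the sort.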
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No, other than `Schema` now being hashable when `metadata` is set.