Fokko opened a new issue, #45557:
URL: https://github.com/apache/arrow/issues/45557
### Describe the enhancement requested
Consider the following code:
```python
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>>
>>> arrow_schema = pa.schema(
...     [
...         pa.field("city", pa.string(), nullable=False),
...         pa.field("population", pa.int32(), nullable=False),
...     ]
... )
>>>
>>> # Write some data
>>> df = pa.Table.from_pylist(
...     [
...         {"city": "Amsterdam", "population": 921402},
...         {"city": "San Francisco", "population": 808988},
...     ],
...     schema=arrow_schema,
... )
>>>
>>> joined = df.join(df, "city", join_type="inner")
>>>
>>> joined
pyarrow.Table
city: string
population: int32
population: int32
----
city: [["Amsterdam","San Francisco"]]
population: [[921402,808988]]
population: [[921402,808988]]
>>> df
pyarrow.Table
city: string not null
population: int32 not null
----
city: [["Amsterdam","San Francisco"]]
population: [[921402,808988]]
```
We do an inner join of two tables whose fields are all `not null`, but the
output fields are nullable. An inner join cannot introduce nulls, so if the
fields on both sides are not-null, the output fields can be marked as not null
too.
I would be happy to try adding this myself, given some pointers to the
relevant code.
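A rough sketch of the rule being requested, written in plain Python rather than against Arrow's actual join code. `FieldInfo` and `output_nullability` are hypothetical names used only for illustration, and the handling of key columns is left out. The assumption is that a join type can only add nulls to the side it pads for unmatched rows, so an inner join keeps the input nullability as-is:

```python
from dataclasses import dataclass

# Join types that can pad one side with nulls for unmatched rows.
# The strings mirror the join_type values accepted by pyarrow's Table.join.
PADS_LEFT = {"right outer", "full outer"}
PADS_RIGHT = {"left outer", "full outer"}
SUPPORTED = {"inner"} | PADS_LEFT | PADS_RIGHT


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a pyarrow field: just a name and a nullable flag."""
    name: str
    nullable: bool


def output_nullability(left, right, join_type):
    """Return (left_fields, right_fields) as they could appear in the join output.

    A field stays not-null unless it was already nullable on input or the
    join type may pad its side with nulls.
    """
    if join_type not in SUPPORTED:
        raise ValueError(f"unsupported join type: {join_type!r}")
    pad_left = join_type in PADS_LEFT
    pad_right = join_type in PADS_RIGHT
    left_out = [FieldInfo(f.name, f.nullable or pad_left) for f in left]
    right_out = [FieldInfo(f.name, f.nullable or pad_right) for f in right]
    return left_out, right_out


left = [FieldInfo("population", nullable=False)]
right = [FieldInfo("population", nullable=False)]

inner_left, inner_right = output_nullability(left, right, "inner")
outer_left, outer_right = output_nullability(left, right, "left outer")
print(inner_left[0].nullable, inner_right[0].nullable)
print(outer_left[0].nullable, outer_right[0].nullable)
```

Under these assumptions the inner join prints `False False` (both `population` columns stay not null), while the left outer join prints `False True`, because unmatched left rows would get a null on the right side.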
### Component(s)
Python