Pyspark DataFrame.drop wrong type hints

2024-06-21 Thread Oliver Beagley
Hi there,



I believe I have found an error in the type hints for `DataFrame.drop` in
pyspark. The first overload at
https://github.com/apache/spark/blob/0bc38acc615ad411a97779c6a1ff43d4391c0c3d/python/pyspark/sql/dataframe.py#L5559-L5568
is not declared as a `*args` argument, and therefore doesn't allow
specifying multiple `Column`s in `drop`. Additionally, according to the
Python docs, a type checker should ignore the type hints on the
non-overloaded definition, so the final `drop` definition, which allows
mixing `str` field names with `Column` expressions, is never used for type
checking. However, the code in both the connect and classic implementations
doesn't appear to have any issues with mixing `Column`s with `str`, nor
with specifying multiple `Column`s, so I think the overloads here are
unnecessary altogether and the final declaration is sufficient as is.
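
To illustrate, here is a minimal sketch of the overload pattern as I
understand it. `Column` and `Frame` are hypothetical stand-ins so that it
runs without Spark, and the signatures are my reading of the linked source
rather than a verbatim copy:

```python
from typing import List, Union, overload


class Column:
    """Hypothetical stand-in for pyspark.sql.Column."""

    def __init__(self, name: str) -> None:
        self.name = name


class Frame:
    """Hypothetical stand-in for DataFrame; it only tracks column names."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)

    # Overload 1: a single positional argument, either a Column or a str.
    @overload
    def drop(self, cols: Union[Column, str]) -> "Frame": ...

    # Overload 2: any number of str names, but no Column.
    @overload
    def drop(self, *cols: str) -> "Frame": ...

    # The implementation accepts any mix, but a type checker only
    # consults the two overloads above, not this signature.
    def drop(self, *cols: Union[Column, str]) -> "Frame":
        names = {c.name if isinstance(c, Column) else c for c in cols}
        return Frame([c for c in self.columns if c not in names])


df = Frame(["a", "b", "c", "d"])

# Both calls work at runtime, yet neither matches an overload, so I would
# expect a type checker to report an error on each of them.
mixed = df.drop("a", Column("b"))
many = df.drop(Column("a"), Column("b"))
```

Removing the two `@overload` declarations from this sketch leaves only the
final signature, which accepts both calls.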


Is my understanding correct? Or is there something more that I'm missing as
to why that is typed like that?


Thanks,

Olly

