Actually, the Scala API, too, only accepts a column name.

On Fri, May 29, 2015 at 11:23, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
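Until drop() accepts a Column, one workaround is to rename the right-hand join key before the join, so that the plain string left for drop() is unambiguous. A minimal sketch, reusing the df and only_the_best frames from your snippet below (the "pol_no_right" name is just illustrative):

    # Rename-before-join workaround: once the right-hand key has a
    # distinct name, drop() can be called with an unambiguous string.
    best = only_the_best.withColumnRenamed("pol_no", "pol_no_right")
    joined = df.join(best, best.pol_no_right == df.pol_no, "inner") \
               .drop("pol_no_right")  # plain string, no ambiguity left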
> Hi,
>
> Testing 1.4 a bit more, it seems that the .drop() method in PySpark
> doesn't accept a Column as input:
>
>     .join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)\
>
>   File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1225, in drop
>     jdf = self._jdf.drop(colName)
>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 523, in __call__
>     (new_args, temp_args) = self._get_args(args)
>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 510, in _get_args
>     temp_arg = converter.convert(arg, self.gateway_client)
>   File "/usr/local/lib/python2.7/site-packages/py4j/java_collections.py", line 490, in convert
>     for key in object.keys():
> TypeError: 'Column' object is not callable
>
> This doesn't seem very consistent with the rest of the API, and it is
> especially annoying when executing joins, because drop("my_key") is not a
> qualified reference to the column.
>
> What do you think about changing that? Or what is the best practice as a
> workaround?
>
> Regards,
>
> Olivier.
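If renaming isn't desirable, another workaround is to never call drop() at all and instead project only the columns you want after the join, using qualified Column references. A sketch under the same assumptions (the df and only_the_best frames from the snippet above):

    # Select-based workaround: build the projection from each parent
    # DataFrame's own columns, skipping the duplicate join key on the
    # right-hand side, so pol_no stays unambiguous throughout.
    joined = df.join(only_the_best, only_the_best.pol_no == df.pol_no, "inner")
    wanted = [df[c] for c in df.columns] + \
             [only_the_best[c] for c in only_the_best.columns if c != "pol_no"]
    result = joined.select(*wanted)

Both approaches sidestep the name-only signature of drop() by keeping the qualified reference out of it entirely.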