Hi,

Testing 1.4 a bit more, it seems that the .drop() method in PySpark doesn't accept a Column as input:

    .join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)

  File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1225, in drop
    jdf = self._jdf.drop(colName)
  File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 523, in __call__
    (new_args, temp_args) = self._get_args(args)
  File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 510, in _get_args
    temp_arg = converter.convert(arg, self.gateway_client)
  File "/usr/local/lib/python2.7/site-packages/py4j/java_collections.py", line 490, in convert
    for key in object.keys():
TypeError: 'Column' object is not callable

This doesn't seem very consistent with the rest of the APIs, and it is especially annoying when executing joins, because drop("my_key") is not a qualified reference to the column: after the join, both sides carry a column with that name.

What do you think about changing that? Or what is the best practice as a workaround?

Regards,

Olivier.
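
One possible workaround, offered as a sketch rather than an established best practice: instead of dropping the duplicate key after the join, select only the qualified columns you want to keep. The helper name columns_without_right_key is hypothetical. It relies only on the .columns attribute and frame[name] lookup that PySpark DataFrames expose, and it assumes the two DataFrames can be told apart after the join (which may not hold if one is derived directly from the other).

```python
def columns_without_right_key(left, right, key):
    """Return qualified columns of both frames, minus the right-hand copy of `key`.

    `left` and `right` only need a `.columns` list and `frame[name]` lookup,
    which PySpark DataFrames provide. The helper name is hypothetical.
    """
    # Keep every column of the left frame, including its copy of the key.
    kept = [left[name] for name in left.columns]
    # Keep the right frame's columns except its duplicate of the join key.
    kept += [right[name] for name in right.columns if name != key]
    return kept


# Intended use with the DataFrames from the message above
# (requires a running Spark context, so it is left commented out):
#
# joined = df.join(only_the_best, only_the_best.pol_no == df.pol_no, "inner")
# result = joined.select(*columns_without_right_key(df, only_the_best, "pol_no"))
```

Because each entry is built from df[...] or only_the_best[...], every reference stays qualified, which is exactly what drop("pol_no") cannot express.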