I'll try, thanks.

On Fri, Apr 24, 2015 at 00:09, Reynold Xin <r...@databricks.com> wrote:
> You can do it similar to the way countDistinct is done, can't you?
>
> https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78
>
> On Thu, Apr 23, 2015 at 1:59 PM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>
>> I found another way: setting a SPARK_HOME on a released version and launching an ipython to load the contexts.
>> I may need your insight, however. I found why it hasn't been done at the same time: this method (like some others) uses varargs in Scala, and for now, the way functions are called, only one parameter is supported.
>>
>> So at first I tried to just generalise the helper function "_" in the functions.py file to multiple arguments, but py4j's handling of varargs forces me to create an Array[Column] if the target method is expecting varargs.
>>
>> But from Python's perspective, we have no idea whether the target method will be expecting varargs or just multiple arguments (to un-tuple). I could special-case "coalesce", or any method that takes a list of columns as arguments, and assume they are varargs-based (and therefore need an Array[Column] instead of just a list of arguments).
>>
>> But this seems very specific and very prone to future mistakes. Is there any way in Py4j to know the signature of a method before calling it?
>>
>> On Thu, Apr 23, 2015 at 22:17, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>
>>> What is the way of testing/building the pyspark part of Spark?
>>>
>>> On Thu, Apr 23, 2015 at 22:06, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>
>>>> Yep :) I'll open the JIRA when I've got the time.
>>>> Thanks
>>>>
>>>> On Thu, Apr 23, 2015 at 19:31, Reynold Xin <r...@databricks.com> wrote:
>>>>
>>>>> Ah damn. We need to add it to the Python list. Would you like to give it a shot?
>>>>>
>>>>> On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>>
>>>>>> Yep, no problem, but I can't seem to find the coalesce function in pyspark.sql.{*, functions, types, or whatever :) }
>>>>>>
>>>>>> Olivier.
>>>>>>
>>>>>> On Mon, Apr 20, 2015 at 11:48, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>>>
>>>>>>> A UDF might be a good idea, no?
>>>>>>>
>>>>>>> On Mon, Apr 20, 2015 at 11:17, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>> let's assume I'm stuck on 1.3.0. How can I benefit from the *fillna* API in PySpark? Is there any efficient alternative to mapping the records myself?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Olivier.
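For the varargs problem discussed above, here is a minimal sketch of what the explicit Array[Column] call can look like through Py4J. It assumes an active SparkContext `sc` and a DataFrame `df` with columns "a" and "b" (both illustrative), and it relies on `functions.coalesce` carrying Scala's `@varargs` annotation so a `Column[]` overload exists on the JVM side; this is a sketch of the mechanism, not a confirmed recipe from the thread.

```python
from pyspark import SparkContext
from pyspark.sql import Column
from pyspark.sql.functions import col

sc = SparkContext._active_spark_context

# Py4J cannot infer that the Scala side expects varargs, so we build a
# genuine Java Array[Column] with gateway.new_array() before the call.
py_cols = [col("a"), col("b")]
arr = sc._gateway.new_array(sc._jvm.org.apache.spark.sql.Column, len(py_cols))
for i, c in enumerate(py_cols):
    arr[i] = c._jc  # unwrap each Python Column to its underlying Java Column

# The @varargs annotation on functions.coalesce generates a Column[]
# overload on the JVM side, which this Java array matches.
jc = sc._jvm.org.apache.spark.sql.functions.coalesce(arr)
result = df.select(Column(jc))  # df is assumed to exist with columns "a", "b"
```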
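On the question of knowing a method's signature before calling it: Py4J does not expose signatures directly, but plain Java reflection is reachable through the gateway. A sketch, again assuming an active SparkContext `sc`; note that a Scala `Column*` parameter compiles to `scala.collection.Seq` unless the method is annotated `@varargs`, so `isVarArgs()` only reports the annotation-generated overloads.

```python
# Inspect the JVM-side overloads of functions.coalesce via Java reflection.
jvm = sc._jvm
cls = jvm.java.lang.Class.forName("org.apache.spark.sql.functions")
for m in cls.getMethods():
    if m.getName() == "coalesce":
        params = [p.getName() for p in m.getParameterTypes()]
        print(m.getName(), "varargs:", m.isVarArgs(), "params:", params)
```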
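And for the original question at the bottom of the thread, being stuck on 1.3.0 without a Python-side fillna: the UDF route suggested above avoids mapping records by hand. A hedged sketch, assuming a DataFrame `df` with a nullable integer column "age" (the column name and fill value are illustrative):

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Replace nulls in "age" with 0 through a UDF, standing in for fillna.
fill_zero = udf(lambda v: v if v is not None else 0, IntegerType())
filled = df.withColumn("age", fill_zero(df["age"]))

# Alternatively, the SQL COALESCE function is reachable through selectExpr
# even when the Python-side wrapper is missing:
filled2 = df.selectExpr("coalesce(age, 0) AS age")
```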