Re: Dataframe.fillna from 1.3.0

2015-04-24 Thread Reynold Xin
The changes look good to me. Jenkins is somehow not responding. Will merge
once Jenkins comes back happy.




Re: Dataframe.fillna from 1.3.0

2015-04-24 Thread Olivier Girardot
Done: https://github.com/apache/spark/pull/5683 and
https://issues.apache.org/jira/browse/SPARK-7118
Thanks!



Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
I'll try, thanks.



Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Reynold Xin
You can do it similar to the way countDistinct is done, can't you?

https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78
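
For context, most of python/pyspark/sql/functions.py at the time was generated
by a small factory that wraps one-Column JVM functions, which is why only a
single argument is supported by default; countDistinct is one of the few
functions written out by hand because it forwards several Columns. A rough,
simplified sketch of that factory (an approximation, not the exact Spark
source):

from pyspark import SparkContext
from pyspark.sql import Column

def _create_function(name, doc=""):
    """Create a Python wrapper for a JVM SQL function that takes one Column."""
    def _(col):
        sc = SparkContext._active_spark_context
        jcol = col._jc if isinstance(col, Column) else col
        jc = getattr(sc._jvm.org.apache.spark.sql.functions, name)(jcol)
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _

# Example: generate a wrapper for the JVM function `lower`.
lower = _create_function("lower", "Converts a string column to lower case.")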




Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
I found another way: setting SPARK_HOME to a released version and
launching IPython to load the contexts.
I may need your insight, however. I found out why it wasn't done at the
same time: this method (like some others) takes varargs in Scala, and for
now the way functions are called from Python only supports a single parameter.

So at first I tried to simply generalise the helper function "_" in
functions.py to multiple arguments, but py4j's handling of varargs
forces me to create an Array[Column] when the target method expects
varargs.

But from Python's perspective, we have no idea whether the target
method expects varargs or just multiple arguments (to un-tuple).
I could special-case "coalesce", or any method that takes a list of
columns as arguments, on the assumption that they are varargs-based (and
therefore need an Array[Column] instead of just a list of arguments).

But this seems very specific and very prone to future mistakes.
Is there any way in Py4j to know a method's signature before calling it?
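
For illustration, a rough sketch of the Array[Column] approach described
above, applied to coalesce. This is a simplified, hypothetical version, not
the code from the eventual pull request:

from pyspark import SparkContext
from pyspark.sql import Column

def coalesce(*cols):
    """Return the first non-null value among the given columns (sketch)."""
    sc = SparkContext._active_spark_context
    # py4j cannot expand a Python list into Scala varargs on its own, so
    # build a real Java Array[Column]; Scala methods annotated with
    # @scala.annotation.varargs accept such an array.
    jcols = sc._gateway.new_array(sc._jvm.org.apache.spark.sql.Column, len(cols))
    for i, c in enumerate(cols):
        jcols[i] = c._jc if isinstance(c, Column) else c
    return Column(sc._jvm.org.apache.spark.sql.functions.coalesce(jcols))

With a wrapper like this in place, df.select(coalesce(df["a"], lit(0.0)))
becomes expressible from Python without mapping records by hand.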




Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Reynold Xin
You first need to build the Spark assembly jar with "sbt/sbt
assembly/assembly".

Then usually I go into python/run-tests and comment out the non-SQL tests:

#run_core_tests
run_sql_tests
#run_mllib_tests
#run_ml_tests
#run_streaming_tests

And then you can run "python/run-tests"






Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
What is the way to build and test the PySpark part of Spark?



Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
yep :) I'll open the jira when I've got the time.
Thanks



Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Reynold Xin
Ah damn. We need to add it to the Python list. Would you like to give it a
shot?




Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
Yep, no problem, but I can't seem to find the coalesce function in
pyspark.sql.{*, functions, types or whatever :) }

Olivier.



Re: Dataframe.fillna from 1.3.0

2015-04-22 Thread Reynold Xin
It is actually different.

The coalesce expression picks the first value that is not null:
https://msdn.microsoft.com/en-us/library/ms190349.aspx

It would be great to update the documentation for it (both Scala and Java) to
explain that it is different from the coalesce function on a DataFrame/RDD. Do
you want to submit a pull request?
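
To make the distinction concrete, a small PySpark illustration (assuming a
version where both are exposed in Python, e.g. 1.4+, and an existing
DataFrame df with columns a and b):

from pyspark.sql.functions import coalesce, lit

# Expression coalesce: per row, return the first non-null value.
df.select(coalesce(df["a"], df["b"], lit(0.0)))

# DataFrame/RDD coalesce: reduce the number of partitions; values are untouched.
df.coalesce(4)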





Re: Dataframe.fillna from 1.3.0

2015-04-22 Thread Olivier Girardot
I think I found the Coalesce you were talking about, but it is a Catalyst
class that I don't think is available from PySpark.

Regards,

Olivier.



Re: Dataframe.fillna from 1.3.0

2015-04-22 Thread Olivier Girardot
Where should this *coalesce* come from? Is it related to the
partition-manipulation coalesce method?
Thanks!



Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Reynold Xin
Ah ic. You can do something like


df.select(coalesce(df("a"), lit(0.0)))



Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Olivier Girardot
From PySpark it seems to me that fillna relies on Java/Scala code;
that's why I was wondering.
Thank you for answering :)



Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Reynold Xin
You can just create fillna function based on the 1.3.1 implementation of
fillna, no?




Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Olivier Girardot
A UDF might be a good idea, no?

On Mon, Apr 20, 2015 at 11:17 AM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:

> Hi everyone,
> Let's assume I'm stuck on 1.3.0: how can I benefit from the *fillna* API
> in PySpark? Is there any efficient alternative to mapping the records
> myself?
>
> Regards,
>
> Olivier.
>
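
For completeness, a minimal sketch of the UDF workaround suggested above for
1.3.0; the column name, type, and replacement value are illustrative
assumptions, and df is an existing DataFrame:

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# Replace nulls in column "a" with 0.0 via a Python UDF. Note that this still
# evaluates a Python function for every row.
fill_zero = udf(lambda v: v if v is not None else 0.0, DoubleType())
df_filled = df.withColumn("a", fill_zero(df["a"]))

Once coalesce (or fillna itself) is exposed on the JVM side and in
pyspark.sql.functions, the df.select(coalesce(...)) form suggested earlier in
the thread avoids that per-row Python call.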