Re: filter by dict() key in pySpark

2016-03-15 Thread Davies Liu
Another solution could be to use a left-semi join:

# The dict is named d here to avoid shadowing the built-in dict type.
keys = sqlContext.createDataFrame([(k,) for k in d.keys()], ["k"])
DF2 = DF1.join(keys, DF1.a == keys.k, "leftsemi")

On Wed, Feb 24, 2016 at 2:14 AM, Franc Carter  wrote:
>
> A colleague found how to do this; the approach was to use a udf().
>
> cheers
>
> On 21 February 2016 at 22:41, Franc Carter  wrote:
>>
>>
>> I have a DataFrame that has a Python dict() as one of the columns. I'd
>> like to filter the DataFrame for those Rows where the dict() contains a
>> specific key, e.g. something like this:
>>
>> DF2 = DF1.filter('name' in DF1.params)
>>
>> but that gives me this error
>>
>> ValueError: Cannot convert column into bool: please use '&' for 'and', '|'
>> for 'or', '~' for 'not' when building DataFrame boolean expressions.
>>
>> How do I express this correctly?
>>
>> thanks
>>
>> --
>> Franc
>
>
>
>
> --
> Franc




Re: filter by dict() key in pySpark

2016-02-24 Thread Franc Carter
A colleague found how to do this; the approach was to use a udf().

cheers
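
The message does not show the udf itself, so the following is only a minimal sketch of what such a udf() filter might look like, not the colleague's actual code. It assumes that DF1.params is a MapType column (which PySpark hands to a Python udf as a dict); the helper names has_key and filter_rows_with_key are hypothetical and introduced purely for illustration.

```python
def has_key(params, key):
    """Return True when params is a dict-like value that contains key."""
    # Spark passes None for null cells, so guard before the membership test.
    return params is not None and key in params


def filter_rows_with_key(df, column_name, key):
    """Keep only the rows of df whose map column contains key.

    Hypothetical helper: assumes df[column_name] is a MapType column.
    """
    # Imported lazily so the plain-Python predicate above can be tried
    # without a Spark installation.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    # Wrap the predicate in a udf that returns a boolean Column,
    # which is what DataFrame.filter() expects.
    contains_key = udf(lambda params: has_key(params, key), BooleanType())
    return df.filter(contains_key(df[column_name]))
```

With the names from the original question, this would be called as DF2 = filter_rows_with_key(DF1, "params", "name"). The membership test runs inside the udf on each row's dict, so Python never has to turn a Column into a bool.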

On 21 February 2016 at 22:41, Franc Carter  wrote:

>
> I have a DataFrame that has a Python dict() as one of the columns. I'd
> like to filter the DataFrame for those Rows where the dict() contains a
> specific key, e.g. something like this:
>
> DF2 = DF1.filter('name' in DF1.params)
>
> but that gives me this error
>
> ValueError: Cannot convert column into bool: please use '&' for 'and', '|'
> for 'or', '~' for 'not' when building DataFrame boolean expressions.
>
> How do I express this correctly?
>
> thanks
>
> --
> Franc
>



-- 
Franc


filter by dict() key in pySpark

2016-02-21 Thread Franc Carter
I have a DataFrame that has a Python dict() as one of the columns. I'd like
to filter the DataFrame for those Rows where the dict() contains a
specific key, e.g. something like this:

DF2 = DF1.filter('name' in DF1.params)

but that gives me this error

ValueError: Cannot convert column into bool: please use '&' for 'and', '|'
for 'or', '~' for 'not' when building DataFrame boolean expressions.

How do I express this correctly?

thanks

-- 
Franc