Re: filter by dict() key in pySpark
Another solution could be to use a left-semi join:

    # my_dict is a placeholder name for the Python dict whose keys DF1.a should match
    keys = sqlContext.createDataFrame([(k,) for k in my_dict.keys()], ["k"])
    DF2 = DF1.join(keys, DF1.a == keys.k, "leftsemi")

On Wed, Feb 24, 2016 at 2:14 AM, Franc Carter wrote:
> A colleague found how to do this: the approach was to use a udf().
>
> cheers
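To make the "leftsemi" semantics concrete, here is a plain-Python sketch of what such a join returns. This is not Spark code, and `left_semi_join`, `df1_rows` and `my_dict` are hypothetical names. Each DF1 row whose `a` value matches at least one key is kept, at most once, with only DF1's columns.

```python
from typing import Any, Dict, Iterable, List


def left_semi_join(left: List[Dict[str, Any]], key_field: str,
                   keys: Iterable[Any]) -> List[Dict[str, Any]]:
    """Keep each left row whose key_field value appears in keys.

    Mirrors left-semi semantics: only left columns are returned, and each
    left row appears at most once however many keys match it.
    """
    key_set = set(keys)  # duplicate keys cannot duplicate output rows
    return [row for row in left if row[key_field] in key_set]


# Hypothetical data standing in for DF1 and the dict from the reply.
df1_rows = [{"a": "x", "v": 1}, {"a": "y", "v": 2}, {"a": "z", "v": 3}]
my_dict = {"x": 10, "z": 30}

result = left_semi_join(df1_rows, "a", my_dict.keys())
print(result)
```

Unlike an inner join, no columns from the key table are added and a row matching several keys is not repeated.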
Re: filter by dict() key in pySpark
A colleague found how to do this: the approach was to use a udf().

cheers

On 21 February 2016 at 22:41, Franc Carter wrote:
> I have a DataFrame that has a Python dict() as one of the columns. [...]
>
>     DF2 = DF1.filter('name' in DF1.params)

--
Franc
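The thread doesn't show the colleague's udf(), so this is only a sketch of one way it might look, assuming `params` is a map column (a Python dict in a row is inferred as a MapType). `has_key` and `filter_by_key` are hypothetical helper names.

```python
def has_key(params, key):
    """Plain-Python predicate: True if the dict (map) `params` contains `key`."""
    # A map value reaches a Python UDF as a dict, or as None for a null map.
    return params is not None and key in params


def filter_by_key(df, column, key):
    """Return the rows of `df` whose map column `column` contains `key`."""
    # PySpark imports are deferred so has_key() can be used (and tested)
    # without a Spark installation.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    contains_key = udf(lambda params: has_key(params, key), BooleanType())
    return df.filter(contains_key(col(column)))
```

Usage would then be something like `DF2 = filter_by_key(DF1, "params", "name")`. Keeping the predicate as an ordinary function makes the membership logic easy to check on its own; the UDF just wraps it so Spark can evaluate it per row.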
filter by dict() key in pySpark
I have a DataFrame that has a Python dict() as one of the columns. I'd like to filter the DataFrame for those Rows where the dict() contains a specific key, e.g. something like this:

    DF2 = DF1.filter('name' in DF1.params)

but that gives me this error:

    ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

How do I express this correctly?

thanks

--
Franc
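The error comes from Python's `in` operator rather than from the filter itself: `x in obj` calls `obj.__contains__(x)` and always converts the result to a bool. Assuming the column returns a lazy expression from that call instead of True/False, the bool conversion is what raises. A toy stand-in class (hypothetical, not PySpark's Column) reproduces the mechanism:

```python
class ToyColumn:
    """Hypothetical stand-in for a lazy column expression (not PySpark's Column)."""

    def __init__(self, expr):
        self.expr = expr

    def __contains__(self, item):
        # Builds another expression instead of answering True/False.
        return ToyColumn(f"contains({self.expr}, {item!r})")

    def __bool__(self):
        # Refuses truth testing, like the error message in the question.
        raise ValueError("Cannot convert column into bool")


def check_in_operator():
    try:
        "name" in ToyColumn("params")  # `in` calls bool() on __contains__'s result
    except ValueError as exc:
        return str(exc)
    return "no error"


print(check_in_operator())  # prints: Cannot convert column into bool
```

This is why the membership test has to be expressed as a column-level operation (for example via a udf, as in the replies) instead of Python's `in`.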