Can we do dataframe.query like Pandas dataframe in spark?

2015-09-17 Thread Rex X
With Pandas dataframe, we can do query:

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')

This SQL-select-like query
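For reference, a minimal self-contained sketch of the pandas behaviour being asked about. It assumes a seeded NumPy generator (the seed value is arbitrary and not from the thread) so the run is reproducible, and it compares `query` against the equivalent boolean-mask selection:

```python
import numpy as np
import pandas as pd

# Seeded generator for reproducibility; the seed is an arbitrary assumption.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=list("ab"))

# String expression evaluated against the column names.
by_query = df.query("a > b")

# The same selection written as a boolean mask.
by_mask = df[df["a"] > df["b"]]

# Both forms should select the same rows.
print(by_query.equals(by_mask))
```

The string form is what makes `query` feel like a SQL WHERE clause: the predicate is passed as text rather than built from column objects.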

Re: Can we do dataframe.query like Pandas dataframe in spark?

2015-09-17 Thread Michael Armbrust
from pyspark.sql.functions import *

df = sqlContext.range(10).select(rand().alias("a"), rand().alias("b"))
df.where("a > b").show()

(2) Spark Jobs
+--+---+
| a| b|
+--+---+
|0.6697439215581628|0.23420961030968923|

Re: Can we do dataframe.query like Pandas dataframe in spark?

2015-09-17 Thread Rex X
very cool! Thank you, Michael.

On Thu, Sep 17, 2015 at 11:00 AM, Michael Armbrust wrote:
> from pyspark.sql.functions import *
>
> df = sqlContext.range(10).select(rand().alias("a"), rand().alias("b"))
>
> df.where("a > b").show()
>
> (2) Spark Jobs
>