Re: Can we do dataframe.query like Pandas dataframe in spark?

2015-09-17 Thread Rex X
very cool! Thank you, Michael.


On Thu, Sep 17, 2015 at 11:00 AM, Michael Armbrust wrote:

> from pyspark.sql.functions import *
>
>
> df = sqlContext.range(10).select(rand().alias("a"), rand().alias("b"))
>
> df.where("a > b").show()
>
> +------------------+-------------------+
> |                 a|                  b|
> +------------------+-------------------+
> |0.6697439215581628|0.23420961030968923|
> |0.9248996796756386| 0.4146647917936366|
> +------------------+-------------------+
>
> On Thu, Sep 17, 2015 at 9:32 AM, Rex X  wrote:
>
>> With a Pandas DataFrame, we can run a query:
>>
>> >>> from numpy.random import randn
>> >>> from pandas import DataFrame
>> >>> df = DataFrame(randn(10, 2), columns=list('ab'))
>> >>> df.query('a > b')
>>
>>
>> This SQL-select-like query is very convenient. Can we do a similar thing
>> with the new DataFrame in Spark?
>>
>>
>> Best,
>> Rex
>>
>
>


Re: Can we do dataframe.query like Pandas dataframe in spark?

2015-09-17 Thread Michael Armbrust
from pyspark.sql.functions import *


df = sqlContext.range(10).select(rand().alias("a"), rand().alias("b"))

df.where("a > b").show()

+------------------+-------------------+
|                 a|                  b|
+------------------+-------------------+
|0.6697439215581628|0.23420961030968923|
|0.9248996796756386| 0.4146647917936366|
+------------------+-------------------+
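
For completeness, the same predicate can also be written with Column expressions, or run as a plain SQL query. A minimal sketch against the same df, assuming the sqlContext from the example above (the table name "df_table" is just an illustrative choice):

from pyspark.sql.functions import col

# Equivalent filter using Column expressions instead of a SQL string;
# where() is an alias for filter() on DataFrame.
df.filter(col("a") > col("b")).show()

# Or register the DataFrame as a temporary table and use full SQL.
df.registerTempTable("df_table")
sqlContext.sql("SELECT a, b FROM df_table WHERE a > b").show()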

On Thu, Sep 17, 2015 at 9:32 AM, Rex X  wrote:

> With a Pandas DataFrame, we can run a query:
>
> >>> from numpy.random import randn
> >>> from pandas import DataFrame
> >>> df = DataFrame(randn(10, 2), columns=list('ab'))
> >>> df.query('a > b')
>
>
> This SQL-select-like query is very convenient. Can we do a similar thing
> with the new DataFrame in Spark?
>
>
> Best,
> Rex
>