Re: ordering over structs

2016-04-12 Thread Michael Armbrust
Does the data actually fit in memory? Check the web UI. If it doesn't, caching is not going to help you.

On Tue, Apr 12, 2016 at 9:00 AM, Imran Akbar wrote:
> thanks Michael,
>
> That worked.
> But what's puzzling is if I take the exact same code and run it off a temp

Re: ordering over structs

2016-04-12 Thread Imran Akbar
Thanks Michael,

That worked. But what's puzzling is that if I take the exact same code and run it off a cached table instead of a temp table created from parquet, it runs much slower: 5-10 seconds uncached vs. 47-60 seconds cached. Any ideas why? Here's my code snippet:

df = data.select("customer_id",

Re: ordering over structs

2016-04-08 Thread Michael Armbrust
You need to use the struct function (which creates an actual struct); you are trying to use the struct datatype (which just represents the schema of a struct).

On Thu, Apr 7, 2016 at 3:48 PM, Imran

Re: ordering over structs

2016-04-07 Thread Imran Akbar
thanks Michael,

I'm trying to implement the code in pyspark like so (where my dataframe has 3 columns - customer_id, dt, and product):

st = StructType().add("dt", DateType(), True).add("product", StringType(), True)
top = data.select("customer_id", st.alias('vs'))
  .groupBy("customer_id")

Re: ordering over structs

2016-04-06 Thread Michael Armbrust
> Ordering for a struct goes in order of the fields. So the max struct is
> the one with the highest TotalValue (and then the highest category
> if there are multiple entries with the same hour and total value).
>
> Is this due to "InterpretedOrdering" in StructType?

That is one
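The field-by-field ordering quoted above can be sketched in plain Python, where tuples compare the same way: first field first, later fields only as tie-breakers. This is an analogy and not Spark code. The `Purchase` type, the `max_per_group` helper, and the sample rows are all hypothetical, with the field names `dt` and `product` borrowed from the earlier message in this thread. Spark's handling of nulls is not modeled here.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark struct column: an ordered
# collection of named fields. Field order determines comparison order.
Purchase = namedtuple("Purchase", ["dt", "product"])

# Made-up sample rows of (customer_id, struct) pairs.
rows = [
    ("c1", Purchase("2016-04-01", "apple")),
    ("c1", Purchase("2016-04-03", "banana")),
    ("c1", Purchase("2016-04-03", "cherry")),
    ("c2", Purchase("2016-04-02", "apple")),
]


def max_per_group(pairs):
    """Keep the largest value per key, like groupBy(key) followed by max(struct)."""
    best = {}
    for key, value in pairs:
        # Tuple comparison is lexicographic: dt is compared first,
        # and product is consulted only when the dt values are equal.
        if key not in best or value > best[key]:
            best[key] = value
    return best


top = max_per_group(rows)
for customer_id, purchase in sorted(top.items()):
    print(customer_id, purchase.dt, purchase.product)
```

For customer c1, two rows share the latest `dt`, so the second field decides which one is the maximum. Putting the field you want to rank by first in the struct is what makes the "max of a struct" trick select the right row per group.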

RE: ordering over structs

2016-04-06 Thread Yong Zhang
1) Is a struct in Spark like a struct in C++?
Kinda. It's an ordered collection of data with known names/types.
2) What is an alias in this context?
It is assigning a name to the column, similar to doing AS in SQL.
3) How does this code even work?
Ordering

Re: ordering over structs

2016-04-06 Thread Michael Armbrust
> 1) Is a struct in Spark like a struct in C++?

Kinda. It's an ordered collection of data with known names/types.

> 2) What is an alias in this context?

It is assigning a name to the column, similar to doing AS in SQL.

> 3) How does this code even work?

Ordering for a struct

ordering over structs

2016-04-06 Thread Imran Akbar
I have a use case similar to this:
http://stackoverflow.com/questions/33878370/spark-dataframe-select-the-first-row-of-each-group
and I'm trying to understand the solution titled "ordering over structs":

1) Is a struct in Spark like a struct in C++?
2) What is an alias in this contex