RE: Nested DataFrames

2015-06-25 Thread Richard Catlin
I am looking to do something similar to this Postgres query in HiveQL.  If
I have a DataFrame student and a DataFrame grade, is this possible?

I read in Learning Spark: Lightning-Fast Big Data Analysis that it should
be possible.  It says in Chapter 9
SchemaRDDs can store several basic types, as well as structures and arrays
of these types.  They use the HiveQL syntax for type definitions.
Table-9-1 shown
and goes on to say
The last type, structures, is simply represented as other Rows in Spark
SQL.  All of these types can also be nested within each other; for example,
you can have arrays of structs, or maps that contain structs

Here is the url of the page that describes the Postgres:
http://stackoverflow.com/questions/10928210/postgresql-aggregate-array

SELECT s.name,  array_agg(g.Mark) as marks

FROM student s

LEFT JOIN Grade g ON g.Student_id = s.Id

GROUP BY s.Id

Thank you.

Richard Catlin


Re: Nested DataFrames

2015-06-25 Thread pawan kumar
May be you could try something like this using sparkSQL 1.4 and dataframes


student.join(Grade, Grade(student_id) === student(student_id), left)

   .groupBy(id)

   .agg(sum(grade(Marks)), avg(grade(Marks)))



You could refer to the following document :
https://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.DataFrame


On Thu, Jun 25, 2015 at 5:16 PM, Richard Catlin richard.m.cat...@gmail.com
wrote:

 I am looking to do something similar to this Postgres query in HiveQL.  If
 I have a DataFrame student and a DataFrame grade, is this possible?

 I read in Learning Spark: Lightning-Fast Big Data Analysis that it should
 be possible.  It says in Chapter 9
 SchemaRDDs can store several basic types, as well as structures and
 arrays of these types.  They use the HiveQL syntax for type definitions.
 Table-9-1 shown
 and goes on to say
 The last type, structures, is simply represented as other Rows in Spark
 SQL.  All of these types can also be nested within each other; for example,
 you can have arrays of structs, or maps that contain structs

 Here is the url of the page that describes the Postgres:
 http://stackoverflow.com/questions/10928210/postgresql-aggregate-array

 SELECT s.name,  array_agg(g.Mark) as marks

 FROM student s

 LEFT JOIN Grade g ON g.Student_id = s.Id

 GROUP BY s.Id

 Thank you.

 Richard Catlin