RE: Joining 2 dataframes, getting result as nested list/structure in dataframe

2017-08-24 Thread JG Perrin
Thanks Michael – this is a great article… very helpful

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, August 23, 2017 4:33 PM
To: JG Perrin <jper...@lumeris.com>
Cc: user@spark.apache.org
Subject: Re: Joining 2 dataframes, getting result as nested list/structure in dataframe

You can create a nested struct that contains multiple columns using struct().

Here's a pretty complete guide on working with nested data: 
https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html

On Wed, Aug 23, 2017 at 2:30 PM, JG Perrin <jper...@lumeris.com> wrote:
Hi folks,

I am trying to join 2 dataframes, but I would like to have the result as a list 
of rows of the right dataframe (dDf in the example) in a column of the left 
dataframe (cDf in the example). I made it work with one column, but I am having 
issues adding more columns/creating a row(?).

Seq<String> joinColumns = new Set2<>("c1", "c2").toSeq();
Dataset<Row> allDf = cDf.join(dDf, joinColumns, "inner");
allDf.printSchema();
allDf.show();

Dataset<Row> aggDf = allDf.groupBy(cDf.col("c1"), cDf.col("c2"))
    .agg(collect_list(col("c50")));
aggDf.show();

Output:
+----+-------+------------------+
|  c1|     c2| collect_list(c50)|
+----+-------+------------------+
|3744|1160242|[6, 5, 4, 3, 2, 1]|
|3739|1150097|               [1]|
|3780|1159902|   [5, 4, 3, 2, 1]|
| 132|1200743|      [4, 3, 2, 1]|
|3778|1183204|               [1]|
|3766|1132709|               [1]|
|3835|1146169|               [1]|
+----+-------+------------------+

Thanks,

jg



This electronic transmission and any documents accompanying this electronic 
transmission contain confidential information belonging to the sender. This 
information may contain confidential health information that is legally 
privileged. The information is intended only for the use of the individual or 
entity named above. The authorized recipient of this transmission is prohibited 
from disclosing this information to any other party unless required to do so by 
law or regulation and is required to delete or destroy the information after 
its stated need has been fulfilled. If you are not the intended recipient, you 
are hereby notified that any disclosure, copying, distribution or the taking of 
any action in reliance on or regarding the contents of this electronically 
transmitted information is strictly prohibited. If you have received this 
E-mail in error, please notify the sender and delete this message immediately.



Re: Joining 2 dataframes, getting result as nested list/structure in dataframe

2017-08-23 Thread Michael Armbrust
You can create a nested struct that contains multiple columns using
struct().

Here's a pretty complete guide on working with nested data:
https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
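
As a rough illustration of the suggestion above: in the aggregation from the original question, the single column passed to collect_list could be replaced by a struct over several columns, for example collect_list(struct(col("c50"), col("c51"))). Here c51 is a made-up second column of dDf, since only c50 appears in the thread. The plain-Java sketch below does not use Spark. It only models the shape such an aggregation is expected to produce: one list of nested records per (c1, c2) key. The class names, the c51 column, and the sample values are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedJoinSketch {

    /** One row of the joined data: the join keys plus two right-side columns. */
    static final class JoinedRow {
        final int c1;
        final int c2;
        final int c50;
        final String c51; // hypothetical second right-side column, not in the thread

        JoinedRow(int c1, int c2, int c50, String c51) {
            this.c1 = c1;
            this.c2 = c2;
            this.c50 = c50;
            this.c51 = c51;
        }
    }

    /** The nested value; plays the role of struct(c50, c51). */
    static final class Detail {
        final int c50;
        final String c51;

        Detail(int c50, String c51) {
            this.c50 = c50;
            this.c51 = c51;
        }

        @Override
        public String toString() {
            return "[" + c50 + ", " + c51 + "]";
        }
    }

    /**
     * Groups joined rows by (c1, c2) and collects one Detail per row,
     * mirroring groupBy(c1, c2).agg(collect_list(struct(c50, c51))).
     * A LinkedHashMap keeps the keys in first-seen order.
     */
    static Map<List<Integer>, List<Detail>> nestByKey(List<JoinedRow> joined) {
        return joined.stream().collect(Collectors.groupingBy(
                r -> Arrays.asList(r.c1, r.c2),
                LinkedHashMap::new,
                Collectors.mapping(r -> new Detail(r.c50, r.c51), Collectors.toList())));
    }

    public static void main(String[] args) {
        // Made-up sample rows; the c51 values are invented for the example.
        List<JoinedRow> joined = Arrays.asList(
                new JoinedRow(3744, 1160242, 2, "b"),
                new JoinedRow(3744, 1160242, 1, "a"),
                new JoinedRow(3739, 1150097, 1, "x"));

        nestByKey(joined).forEach((key, details) ->
                System.out.println(key + " -> " + details));
    }
}
```

In Spark itself the nested column comes back as an array of structs, so each element keeps its field names (c50, c51) rather than being flattened into a single value.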



Joining 2 dataframes, getting result as nested list/structure in dataframe

2017-08-23 Thread JG Perrin
Hi folks,

I am trying to join 2 dataframes, but I would like to have the result as a list 
of rows of the right dataframe (dDf in the example) in a column of the left 
dataframe (cDf in the example). I made it work with one column, but I am having 
issues adding more columns/creating a row(?).

Seq<String> joinColumns = new Set2<>("c1", "c2").toSeq();
Dataset<Row> allDf = cDf.join(dDf, joinColumns, "inner");
allDf.printSchema();
allDf.show();

Dataset<Row> aggDf = allDf.groupBy(cDf.col("c1"), cDf.col("c2"))
    .agg(collect_list(col("c50")));
aggDf.show();

Output:
+----+-------+------------------+
|  c1|     c2| collect_list(c50)|
+----+-------+------------------+
|3744|1160242|[6, 5, 4, 3, 2, 1]|
|3739|1150097|               [1]|
|3780|1159902|   [5, 4, 3, 2, 1]|
| 132|1200743|      [4, 3, 2, 1]|
|3778|1183204|               [1]|
|3766|1132709|               [1]|
|3835|1146169|               [1]|
+----+-------+------------------+

Thanks,

jg
