Re: DataFrame joins with Spark-Java

2017-11-29 Thread Rishi Mishra
Hi Sushma,
Can you try a left anti join, as below? In my example, name and id
together make up the key.

df1.alias("a")
    .join(df2.alias("b"),
        col("a.name").equalTo(col("b.name"))
            .and(col("a.id").equalTo(col("b.id"))),
        "left_anti")
    .selectExpr("name", "id")
    .show(10, false);

Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra
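
The join above hard-codes name and id. When the key columns arrive in a
list, as in the original question, one option is to build the join
condition from that list. Below is a minimal sketch in plain Java, under a
few assumptions: JoinConditionSketch and buildCondition are hypothetical
names (not part of Spark), the aliases a and b match the example above, and
the column names are simple identifiers that need no quoting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not Spark API): turns a list of key column names
// into a SQL join condition string.
class JoinConditionSketch {

    // For keys [name, id] and aliases a, b this returns
    // "a.name = b.name AND a.id = b.id".
    static String buildCondition(List<String> keys, String leftAlias, String rightAlias) {
        if (keys == null || keys.isEmpty()) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        return keys.stream()
                .map(k -> leftAlias + "." + k + " = " + rightAlias + "." + k)
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        // prints: a.name = b.name AND a.id = b.id
        System.out.println(buildCondition(Arrays.asList("name", "id"), "a", "b"));
    }
}
```

The resulting string could then be passed to Spark as
df1.alias("a").join(df2.alias("b"), expr(condition), "left_anti"), with expr
imported from org.apache.spark.sql.functions. Note that = does not match
rows whose key is NULL; Spark SQL's null-safe <=> operator is an alternative
if key columns can be null.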

On Thu, Nov 30, 2017 at 7:38 AM, sushma spark wrote:

> Dear Friends,
>
> I am new to Spark DataFrames. I have one DataFrame (dataframe1) that
> contains today's records and another (dataframe2) that contains
> yesterday's records. I need to compare today's records with yesterday's
> and find the new records that do not exist in yesterday's data, based on
> the primary key. The problem is that the primary key sometimes spans
> multiple columns.
>
> I am receiving primary key columns in a List.
>
> example:
>
> // single or multiple primary key columns
> List primaryKeyList = listOfPrimarykeys;
>
> // contains today's records
> DataFrame currentDataRecords = queryexecutor.getCurrentRecords();
> // contains yesterday's records
> DataFrame yesterdayRecords = queryexecutor.getYesterdayRecords();
>
> Can anyone help me join these two DataFrames and apply the WHERE
> conditions on the key columns dynamically in Spark Java code?
>
> Thanks
> Sushma
>
>


DataFrame joins with Spark-Java

2017-11-29 Thread sushma spark
Dear Friends,

I am new to Spark DataFrames. I have one DataFrame (dataframe1) that
contains today's records and another (dataframe2) that contains
yesterday's records. I need to compare today's records with yesterday's
and find the new records that do not exist in yesterday's data, based on
the primary key. The problem is that the primary key sometimes spans
multiple columns.

I am receiving primary key columns in a List.

example:

// single or multiple primary key columns
List primaryKeyList = listOfPrimarykeys;

// contains today's records
DataFrame currentDataRecords = queryexecutor.getCurrentRecords();
// contains yesterday's records
DataFrame yesterdayRecords = queryexecutor.getYesterdayRecords();

Can anyone help me join these two DataFrames and apply the WHERE
conditions on the key columns dynamically in Spark Java code?

Thanks
Sushma