Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread ayan guha
How about running this (the count alias and threshold are fixed here to match the original code's 75000 cutoff, and the "order by" is dropped so the window counts the whole partition rather than a running total):

    select * from (
      select *, count(*) over (partition by id) as cnt
      from filteredDS
    ) f where f.cnt <= 75000
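A minimal sketch of running that query through the SQL API from Java, as ayan suggests below. It assumes a SparkSession named spark is available; the class name, method name, and threshold parameter are illustrative, and the temp view name simply mirrors the Dataset variable:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class WindowCountFilter {
        // Registers the Dataset as a temp view, then keeps only rows whose
        // id occurs at most `threshold` times, using a windowed count.
        public static Dataset<Row> keepLowFrequencyIds(
                SparkSession spark, Dataset<Row> filteredDS, long threshold) {
            filteredDS.createOrReplaceTempView("filteredDS");
            return spark.sql(
                "select * from ("
              + "  select *, count(*) over (partition by id) as cnt"
              + "  from filteredDS"
              + ") f where f.cnt <= " + threshold);
        }
    }

With the thread's data this would be called as keepLowFrequencyIds(spark, filteredDS, 75000), producing the same result as the groupBy/filter/join pipeline in a single query.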

Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread Ankur Srivastava
Yes, every time I run this code with production-scale data it fails. A test case with a small dataset of 50 records runs fine on a local box.

Thanks
Ankur

Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread ayan guha
Just to be sure, can you reproduce the error using the SQL API?

Re: Spark 2.0 issue with left_outer join

2017-03-03 Thread Ankur Srivastava
Adding DEV.

Or is there any other way to do subtractByKey using the Dataset API?

Thanks
Ankur
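For reference, the closest Dataset-API analogue of RDD.subtractByKey is a left_anti join, which Spark 2.0 supports. A minimal sketch, where the class name, helper name, and key-column parameters are illustrative assumptions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class SubtractByKeyExample {
        // Dataset-API analogue of RDD.subtractByKey: keep only the rows of
        // `left` whose key has no match in `keys`. A left_anti join returns
        // exactly the unmatched left-side rows and adds no extra columns.
        public static Dataset<Row> subtractByKey(
                Dataset<Row> left, Dataset<Row> keys,
                String leftKey, String rightKey) {
            return left.join(
                keys,
                left.col(leftKey).equalTo(keys.col(rightKey)),
                "left_anti");
        }
    }

With the thread's datasets this would be subtractByKey(filteredDS, badIds, "id", "bid"), avoiding the left_outer join and null filter entirely.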

Spark 2.0 issue with left_outer join

2017-03-01 Thread Ankur Srivastava
Hi Users,

We are facing an issue with a left_outer join using the Spark Dataset API in the 2.0 Java API. Below is the code we have:

    Dataset<Row> badIds = filteredDS
        .groupBy(col("id").alias("bid"))
        .count()
        .filter((FilterFunction<Row>) row -> (Long) row.getAs("count") > 75000);
    _logger.info("Id count with
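The snippet above is cut off by the archive. A sketch of how the rest of the left_outer pattern named in the subject line might look, where the join condition, null filter, and dropped columns are assumptions rather than the poster's actual code (this is the pattern the thread reports as failing at production scale):

    import static org.apache.spark.sql.functions.col;

    import org.apache.spark.api.java.function.FilterFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class LeftOuterSubtract {
        // Finds ids with more than 75000 rows, then removes those rows by
        // left_outer joining back and keeping only rows with no match.
        public static Dataset<Row> removeBadIds(Dataset<Row> filteredDS) {
            Dataset<Row> badIds = filteredDS
                .groupBy(col("id").alias("bid"))
                .count()
                .filter((FilterFunction<Row>) row ->
                    (Long) row.getAs("count") > 75000L);
            return filteredDS
                .join(badIds,
                      filteredDS.col("id").equalTo(badIds.col("bid")),
                      "left_outer")
                .filter(badIds.col("bid").isNull())  // keep unmatched rows only
                .drop("bid", "count");
        }
    }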