Re: groupByKey does not work?

2016-01-05 Thread Sean Owen
I suspect this is another instance of case classes not working as expected between the driver and executor when used with spark-shell. Search JIRA for some back story. On Tue, Jan 5, 2016 at 12:42 AM, Arun Luthra wrote: > Spark 1.5.0 > > data: > >

Re: groupByKey does not work?

2016-01-04 Thread Ted Yu
Can you give a bit more information? Release of Spark you're using. Minimal dataset that shows the problem. Cheers. On Mon, Jan 4, 2016 at 3:55 PM, Arun Luthra wrote: > I tried groupByKey and noticed that it did not group all values into the > same group. > > In my test

Re: groupByKey does not work?

2016-01-04 Thread Daniel Imberman
Could you please post the associated code and output? On Mon, Jan 4, 2016 at 3:55 PM Arun Luthra wrote: > I tried groupByKey and noticed that it did not group all values into the > same group. > > In my test dataset (a Pair rdd) I have 16 records, where there are only 4 >

groupByKey does not work?

2016-01-04 Thread Arun Luthra
I tried groupByKey and noticed that it did not group all values into the same group. In my test dataset (a Pair rdd) I have 16 records, where there are only 4 distinct keys, so I expected there to be 4 records in the groupByKey object, but instead there were 8. Each of the 4 distinct keys appear

Re: groupByKey does not work?

2016-01-04 Thread Arun Luthra
Spark 1.5.0
data:
p1,lo1,8,0,4,0,5,20150901|5,1,1.0
p1,lo2,8,0,4,0,5,20150901|5,1,1.0
p1,lo3,8,0,4,0,5,20150901|5,1,1.0
p1,lo4,8,0,4,0,5,20150901|5,1,1.0
p1,lo1,8,0,4,0,5,20150901|5,1,1.0
p1,lo2,8,0,4,0,5,20150901|5,1,1.0

Re: groupByKey does not work?

2016-01-04 Thread Daniel Imberman
Could you try simplifying the key and seeing if that makes any difference? Make it just a string or an int so we can rule out any issues in object equality. On Mon, Jan 4, 2016 at 4:42 PM Arun Luthra wrote: > Spark 1.5.0 > > data: > >

Re: groupByKey does not work?

2016-01-04 Thread Arun Luthra
If I simplify the key to a String column with values lo1, lo2, lo3, lo4, it works correctly. On Mon, Jan 4, 2016 at 4:49 PM, Daniel Imberman wrote: > Could you try simplifying the key and seeing if that makes any difference? > Make it just a string or an int so we can

Re: groupByKey does not work?

2016-01-04 Thread Daniel Imberman
That's interesting. I would try:
case class Mykey(uname:String)
case class Mykey(uname:String, c1:Char)
case class Mykey(uname:String, lo:String, f1:Char, f2:Char, f3:Char, f4:Char, f5:Char, f6:String)
In that order. It seems like there is some issue with equality between keys. On Mon, Jan 4,
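
For reference, here is a minimal sketch of the equality check being discussed, written as plain Scala with no Spark dependency. It groups the six records visible earlier in the thread by a case-class key, using an ordinary collection groupBy as a local stand-in for groupByKey. The names MyKey, GroupByKeyCheck, parse and grouped are hypothetical, and the record layout (key fields before the '|', value after it) is an assumption read off the sample data, not something confirmed in the thread.

```scala
// Hypothetical key type, modeled on the widest case class suggested above.
case class MyKey(uname: String, lo: String, f1: Char, f2: Char, f3: Char,
                 f4: Char, f5: Char, f6: String)

object GroupByKeyCheck {
  // The six records visible in the thread; the rest of the dataset is not shown.
  val lines: Seq[String] = Seq(
    "p1,lo1,8,0,4,0,5,20150901|5,1,1.0",
    "p1,lo2,8,0,4,0,5,20150901|5,1,1.0",
    "p1,lo3,8,0,4,0,5,20150901|5,1,1.0",
    "p1,lo4,8,0,4,0,5,20150901|5,1,1.0",
    "p1,lo1,8,0,4,0,5,20150901|5,1,1.0",
    "p1,lo2,8,0,4,0,5,20150901|5,1,1.0"
  )

  // Assumed layout: eight comma-separated key fields, then '|', then the value.
  def parse(line: String): (MyKey, String) = {
    val Array(keyPart, value) = line.split('|')
    val k = keyPart.split(',')
    val key = MyKey(k(0), k(1), k(2).head, k(3).head, k(4).head,
                    k(5).head, k(6).head, k(7))
    (key, value)
  }

  // Local stand-in for groupByKey: relies on the case class's generated
  // equals and hashCode to decide which records share a key.
  def grouped: Map[MyKey, Seq[String]] =
    lines.map(parse).groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  def main(args: Array[String]): Unit = {
    println(s"distinct keys: ${grouped.size}")
    grouped.foreach { case (k, vs) => println(s"${k.lo} -> ${vs.size} value(s)") }
  }
}
```

Compiled and run outside the shell, this should report 4 distinct keys for the six records. It only gives a baseline for what case-class equality is expected to produce; it does not reproduce the spark-shell behavior described in the thread.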