Re: Spark join produces duplicate rows in result set

2023-10-27 Thread Meena Rajani
Thanks all: Patrick, selecting rev.* and I.* cleared the confusion. The item table actually brought 4 rows, hence the final result set had 4 rows. Regards, Meena On Sun, Oct 22, 2023 at 10:13 AM Bjørn Jørgensen wrote: > also remove the space in rev. scode > > On Sun, Oct 22, 2023 at 19:08, Sadha
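To make the resolution concrete, here is a minimal, self-contained sketch (table contents and column names invented for illustration, not taken from the thread) showing how a left join returns one output row per matching row in the right-hand table, which is exactly how a single rev row can fan out into 4 result rows:

    -- Toy data: one rev row and four item rows that all satisfy the join keys.
    create or replace temporary view rev as
      select 1 as sys, 100 as custumer_id, 'A' as scode;

    create or replace temporary view item as
      select 1 as sys, 100 as custumer_id, 'A' as scode, id
      from values (1), (2), (3), (4) as t(id);

    -- Returns 4 rows: the single rev row is repeated once per matching item row.
    select rev.*, item.id
    from rev
    left join item
      on rev.sys = item.sys
     and rev.custumer_id = item.custumer_id
     and rev.scode = item.scode;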

Re: Spark join produces duplicate rows in result set

2023-10-22 Thread Bjørn Jørgensen
also remove the space in rev. scode. On Sun, Oct 22, 2023 at 19:08, Sadha Chilukoori wrote: > Hi Meena, > > I'm asking to clarify: are the *on* & *and* keywords optional in the join > conditions? > > Please try this snippet, and see if it helps > > select rev.* from rev > inner join customer c >

Re: Spark join produces duplicate rows in result set

2023-10-22 Thread Sadha Chilukoori
Hi Meena, I'm asking to clarify: are the *on* & *and* keywords optional in the join conditions? Please try this snippet, and see if it helps: select rev.* from rev inner join customer c on rev.custumer_id =c.id inner join product p on rev.sys = p.sys and rev.prin = p.prin and rev.scode= p.bcode
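Sadha's snippet is cut off in the preview above; a possible completed form is sketched below. The join condition on item is an assumption pieced together from the sys/custumer_id/scode columns Patrick mentions further down the thread, so treat it as illustrative rather than the exact query that was posted:

    select rev.*
    from rev
    inner join customer c
      on rev.custumer_id = c.id
    inner join product p
      on rev.sys = p.sys
     and rev.prin = p.prin
     and rev.scode = p.bcode
    left join item i
      on rev.sys = i.sys                  -- item join keys assumed, not confirmed in the thread
     and rev.custumer_id = i.custumer_id
     and rev.scode = i.scode;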

Re: Spark join produces duplicate rows in result set

2023-10-22 Thread Patrick Tucci
Hi Meena, It's not impossible, but it's unlikely that there's a bug in Spark SQL randomly duplicating rows. The most likely explanation is that there are more records in the item table that match your sys/custumer_id/scode criteria than you expect. In your original query, try changing select rev.* to
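Patrick's suggestion (truncated above) is to widen the select list so that each duplicated output row also shows the item row that produced it, which Meena confirms at the top of the thread ("selecting rev.* and I.* cleared the confusion"). A sketch of that diagnostic query, reusing the assumed item join conditions from the snippet above:

    -- Each output row now carries the matching item columns, so duplicates
    -- can be traced back to the specific item records that caused them.
    select rev.*, I.*
    from rev
    inner join customer c
      on rev.custumer_id = c.id
    inner join product p
      on rev.sys = p.sys
     and rev.prin = p.prin
     and rev.scode = p.bcode
    left join item I
      on rev.sys = I.sys                  -- item join keys assumed
     and rev.custumer_id = I.custumer_id
     and rev.scode = I.scode;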

Spark join produces duplicate rows in result set

2023-10-21 Thread Meena Rajani
Hello all: I am using Spark SQL to join two tables. To my surprise, I am getting redundant rows. What could be the cause? select rev.* from rev inner join customer c on rev.custumer_id =c.id inner join product p rev.sys = p.sys rev.prin = p.prin rev.scode= p.bcode left join item I on rev.sys =