Re: Spark join produce duplicate rows in resultset
Thanks all: Patrick selected rev.* and I.* cleared the confusion. The Item actually brought 4 rows hence the final result set had 4 rows. Regards, Meena On Sun, Oct 22, 2023 at 10:13 AM Bjørn Jørgensen wrote: > alos remove the space in rev. scode > > søn. 22. okt. 2023 kl. 19:08 skrev Sadha Chilukoori < > sage.quoti...@gmail.com>: > >> Hi Meena, >> >> I'm asking to clarify, are the *on *& *and* keywords optional in the >> join conditions? >> >> Please try this snippet, and see if it helps >> >> select rev.* from rev >> inner join customer c >> on rev.custumer_id =c.id >> inner join product p >> on rev.sys = p.sys >> and rev.prin = p.prin >> and rev.scode= p.bcode >> >> left join item I >> on rev.sys = I.sys >> and rev.custumer_id = I.custumer_id >> and rev. scode = I.scode; >> >> Thanks, >> Sadha >> >> On Sat, Oct 21, 2023 at 3:21 PM Meena Rajani >> wrote: >> >>> Hello all: >>> >>> I am using spark sql to join two tables. To my surprise I am >>> getting redundant rows. What could be the cause. >>> >>> >>> select rev.* from rev >>> inner join customer c >>> on rev.custumer_id =c.id >>> inner join product p >>> rev.sys = p.sys >>> rev.prin = p.prin >>> rev.scode= p.bcode >>> >>> left join item I >>> on rev.sys = i.sys >>> rev.custumer_id = I.custumer_id >>> rev. scode = I.scode >>> >>> where rev.custumer_id = '123456789' >>> >>> The first part of the code brings one row >>> >>> select rev.* from rev >>> inner join customer c >>> on rev.custumer_id =c.id >>> inner join product p >>> rev.sys = p.sys >>> rev.prin = p.prin >>> rev.scode= p.bcode >>> >>> >>> The item has two rows which have common attributes and the* final >>> join should result in 2 rows. But I am seeing 4 rows instead.* >>> >>> left join item I >>> on rev.sys = i.sys >>> rev.custumer_id = I.custumer_id >>> rev. scode = I.scode >>> >>> >>> >>> Regards, >>> Meena >>> >>> >>> > > -- > Bjørn Jørgensen > Vestre Aspehaug 4, 6010 Ålesund > Norge > > +47 480 94 297 >
Re: Spark join produce duplicate rows in resultset
alos remove the space in rev. scode søn. 22. okt. 2023 kl. 19:08 skrev Sadha Chilukoori : > Hi Meena, > > I'm asking to clarify, are the *on *& *and* keywords optional in the join > conditions? > > Please try this snippet, and see if it helps > > select rev.* from rev > inner join customer c > on rev.custumer_id =c.id > inner join product p > on rev.sys = p.sys > and rev.prin = p.prin > and rev.scode= p.bcode > > left join item I > on rev.sys = I.sys > and rev.custumer_id = I.custumer_id > and rev. scode = I.scode; > > Thanks, > Sadha > > On Sat, Oct 21, 2023 at 3:21 PM Meena Rajani > wrote: > >> Hello all: >> >> I am using spark sql to join two tables. To my surprise I am >> getting redundant rows. What could be the cause. >> >> >> select rev.* from rev >> inner join customer c >> on rev.custumer_id =c.id >> inner join product p >> rev.sys = p.sys >> rev.prin = p.prin >> rev.scode= p.bcode >> >> left join item I >> on rev.sys = i.sys >> rev.custumer_id = I.custumer_id >> rev. scode = I.scode >> >> where rev.custumer_id = '123456789' >> >> The first part of the code brings one row >> >> select rev.* from rev >> inner join customer c >> on rev.custumer_id =c.id >> inner join product p >> rev.sys = p.sys >> rev.prin = p.prin >> rev.scode= p.bcode >> >> >> The item has two rows which have common attributes and the* final join >> should result in 2 rows. But I am seeing 4 rows instead.* >> >> left join item I >> on rev.sys = i.sys >> rev.custumer_id = I.custumer_id >> rev. scode = I.scode >> >> >> >> Regards, >> Meena >> >> >> -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297
Re: Spark join produce duplicate rows in resultset
Hi Meena, I'm asking to clarify, are the *on *& *and* keywords optional in the join conditions? Please try this snippet, and see if it helps select rev.* from rev inner join customer c on rev.custumer_id =c.id inner join product p on rev.sys = p.sys and rev.prin = p.prin and rev.scode= p.bcode left join item I on rev.sys = I.sys and rev.custumer_id = I.custumer_id and rev. scode = I.scode; Thanks, Sadha On Sat, Oct 21, 2023 at 3:21 PM Meena Rajani wrote: > Hello all: > > I am using spark sql to join two tables. To my surprise I am > getting redundant rows. What could be the cause. > > > select rev.* from rev > inner join customer c > on rev.custumer_id =c.id > inner join product p > rev.sys = p.sys > rev.prin = p.prin > rev.scode= p.bcode > > left join item I > on rev.sys = i.sys > rev.custumer_id = I.custumer_id > rev. scode = I.scode > > where rev.custumer_id = '123456789' > > The first part of the code brings one row > > select rev.* from rev > inner join customer c > on rev.custumer_id =c.id > inner join product p > rev.sys = p.sys > rev.prin = p.prin > rev.scode= p.bcode > > > The item has two rows which have common attributes and the* final join > should result in 2 rows. But I am seeing 4 rows instead.* > > left join item I > on rev.sys = i.sys > rev.custumer_id = I.custumer_id > rev. scode = I.scode > > > > Regards, > Meena > > >
Re: Spark join produce duplicate rows in resultset
Hi Meena, It's not impossible, but it's unlikely that there's a bug in Spark SQL randomly duplicating rows. The most likely explanation is there are more records in the item table that match your sys/custumer_id/scode criteria than you expect. In your original query, try changing select rev.* to select I.*. This will show you the records from item that the join produces. If the first part of the code only returns one record, I expect you will see 4 distinct records returned here. Thanks, Patrick On Sun, Oct 22, 2023 at 1:29 AM Meena Rajani wrote: > Hello all: > > I am using spark sql to join two tables. To my surprise I am > getting redundant rows. What could be the cause. > > > select rev.* from rev > inner join customer c > on rev.custumer_id =c.id > inner join product p > rev.sys = p.sys > rev.prin = p.prin > rev.scode= p.bcode > > left join item I > on rev.sys = i.sys > rev.custumer_id = I.custumer_id > rev. scode = I.scode > > where rev.custumer_id = '123456789' > > The first part of the code brings one row > > select rev.* from rev > inner join customer c > on rev.custumer_id =c.id > inner join product p > rev.sys = p.sys > rev.prin = p.prin > rev.scode= p.bcode > > > The item has two rows which have common attributes and the* final join > should result in 2 rows. But I am seeing 4 rows instead.* > > left join item I > on rev.sys = i.sys > rev.custumer_id = I.custumer_id > rev. scode = I.scode > > > > Regards, > Meena > > >
Spark join produce duplicate rows in resultset
Hello all: I am using spark sql to join two tables. To my surprise I am getting redundant rows. What could be the cause. select rev.* from rev inner join customer c on rev.custumer_id =c.id inner join product p rev.sys = p.sys rev.prin = p.prin rev.scode= p.bcode left join item I on rev.sys = i.sys rev.custumer_id = I.custumer_id rev. scode = I.scode where rev.custumer_id = '123456789' The first part of the code brings one row select rev.* from rev inner join customer c on rev.custumer_id =c.id inner join product p rev.sys = p.sys rev.prin = p.prin rev.scode= p.bcode The item has two rows which have common attributes and the* final join should result in 2 rows. But I am seeing 4 rows instead.* left join item I on rev.sys = i.sys rev.custumer_id = I.custumer_id rev. scode = I.scode Regards, Meena