I see, thank you for the explanation, Linyuxin.

On Wed, May 30, 2018 at 6:21 AM, Linyuxin <linyu...@huawei.com> wrote:
> Hi,
>
> Why not group by first, then join?
>
> BTW, I don't think there is any difference between 'distinct' and 'group by'.
>
> Source code of 2.1:
>
>     def distinct(): Dataset[T] = dropDuplicates()
>     ...
>     def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
>       ...
>       Aggregate(groupCols, aggCols, logicalPlan)
>     }
>
> From: Chetan Khatri [mailto:chetan.opensou...@gmail.com]
> Sent: May 30, 2018 2:52
> To: Irving Duran <irving.du...@gmail.com>
> Cc: Georg Heiler <georg.kf.hei...@gmail.com>; user <user@spark.apache.org>
> Subject: Re: GroupBy in Spark / Scala without Agg functions
>
> Georg, sorry for the dumb question. Help me to understand: if I do
> DF.select(A, B, C, D).distinct(), would that be the same as the above
> groupBy without agg in SQL?
>
> On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri
> <chetan.opensou...@gmail.com> wrote:
>
> I don't want to get any aggregation; I just want to know whether, rather
> than applying distinct to all columns, there is any better approach.
>
> On Wed, May 30, 2018 at 12:16 AM, Irving Duran <irving.du...@gmail.com>
> wrote:
>
> Unless you want to get a count, yes.
>
> Thank you,
> Irving Duran
>
> On Tue, May 29, 2018 at 1:44 PM Chetan Khatri
> <chetan.opensou...@gmail.com> wrote:
>
> Georg, I just want to double-check: someone wrote an MSSQL Server script
> where it groups by all columns. What is the best alternative way to do
> distinct on all columns?
>
> On Wed, May 30, 2018 at 12:08 AM, Georg Heiler
> <georg.kf.hei...@gmail.com> wrote:
>
> Why do you group if you do not want to aggregate?
> Isn't this the same as select distinct?
>
> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tue., May 29, 2018
> at 8:21 PM:
>
> All,
>
> I have a scenario in MSSQL Server SQL where I need to do a groupBy
> without an agg function:
>
> Pseudocode:
>
>     select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>     from student as m
>     inner join general_register g on m.student_id = g.student_id
>     group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>
> I tried doing this in Spark but I am not able to get a DataFrame as the
> return value. How can this kind of thing be done in Spark?
>
> Thanks
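[Editor's note] The point made in the thread — a GROUP BY over every selected column with no aggregate keeps exactly the same rows as DISTINCT, which is why Spark implements distinct() via dropDuplicates(), i.e. an Aggregate over all columns — can be illustrated with a minimal plain-Scala sketch (collections, not Spark). The case class, sample rows, and register contents below are hypothetical, loosely modeled on the tables in the pseudocode:

```scala
// Sketch: mirror the SQL with Scala collections.
// student  ~ the student table; generalRegister ~ the student_id column of
// general_register, possibly listing a student more than once — that is
// precisely why the original query needs GROUP BY (or DISTINCT) after the join.
case class Student(id: Int, name: String, std: String, group: String, dob: String)

val student = Seq(
  Student(1, "Asha", "10", "A", "2001-01-01"),
  Student(2, "Ravi", "10", "B", "2001-06-15")
)
val generalRegister = Seq(1, 1, 2) // student 1 registered twice

// inner join on student_id, projecting the student columns
val joined = for {
  s   <- student
  gid <- generalRegister
  if s.id == gid
} yield s // 3 rows: student 1 appears twice

// "group by all selected columns, no aggregate": one row per distinct key...
val viaGroupBy = joined.groupBy(identity).keys.toSet

// ...which is exactly what distinct produces.
val viaDistinct = joined.distinct.toSet

assert(viaGroupBy == viaDistinct)
println(viaDistinct.size) // 2 — the join duplicate is gone
```

In Spark itself the analogous shape would be along the lines of `studentDF.join(generalRegisterDF, "student_id").select("student_id", "student_name", "student_std", "student_group", "student_dob").distinct()` (DataFrame names assumed); since distinct() delegates to dropDuplicates(), this returns a DataFrame just as a groupBy-based query would.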