Re: Reply: GroupBy in Spark / Scala without Agg functions
I see. Thank you for the explanation, Linyuxin.

On Wed, May 30, 2018 at 6:21 AM, Linyuxin wrote:
> Hi,
>
> Why not group by first and then join?
>
> BTW, I don't think there is any difference between 'distinct' and 'group by'.
Reply: GroupBy in Spark / Scala without Agg functions
Hi,

Why not group by first and then join?

BTW, I don't think there is any difference between 'distinct' and 'group by'.

Source code of 2.1:

    def distinct(): Dataset[T] = dropDuplicates()
    …
    def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
      …
      Aggregate(groupCols, aggCols, logicalPlan)
    }

From: Chetan Khatri [mailto:chetan.opensou...@gmail.com]
Sent: May 30, 2018 2:52
To: Irving Duran
Cc: Georg Heiler; user <user@spark.apache.org>
Subject: Re: GroupBy in Spark / Scala without Agg functions

> Georg, sorry for the dumb question. Help me to understand: if I do
> DF.select(A, B, C, D).distinct(), would that be the same as the groupBy
> without agg in SQL above?
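Linyuxin's "group by first, then join" suggestion can be sketched as follows. This is a minimal illustration, and the DataFrame and function names are assumptions, not from the thread: de-duplicating the join key before the join means the join cannot multiply rows, so no GROUP BY is needed afterwards.

```scala
import org.apache.spark.sql.DataFrame

// Sketch of "group by first, then join" (names are illustrative):
// reduce general_register to one row per student_id before joining,
// so the join itself cannot produce duplicate student rows.
def joinOnceRegistered(student: DataFrame, generalRegister: DataFrame): DataFrame = {
  val registeredIds = generalRegister.select("student_id").distinct()
  student.join(registeredIds, "student_id")
}
```

This de-duplicates only the small key column rather than the full joined row, which is usually cheaper than a distinct over all output columns.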
Re: GroupBy in Spark / Scala without Agg functions
Georg, sorry for the dumb question. Help me to understand: if I do DF.select(A, B, C, D).distinct(), would that be the same as the groupBy without agg in SQL above?

On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri wrote:
> I don't want any aggregation; I just want to know whether there is a
> better approach than applying distinct to all columns.
Re: GroupBy in Spark / Scala without Agg functions
I don't want any aggregation; I just want to know whether there is a better approach than applying distinct to all columns.

On Wed, May 30, 2018 at 12:16 AM, Irving Duran wrote:
> Unless you want to get a count, yes.
Re: GroupBy in Spark / Scala without Agg functions
Unless you want to get a count, yes.

Thank You,

Irving Duran

On Tue, May 29, 2018 at 1:44 PM Chetan Khatri wrote:
> Georg, just to double-check: someone wrote an MSSQL Server script that
> groups by all columns. What is the best alternative way to get distinct
> rows across all columns?
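Irving's point is that groupBy earns its keep once an aggregate such as a count is actually wanted. A minimal sketch, with an assumed DataFrame and illustrative column names:

```scala
import org.apache.spark.sql.DataFrame

// With an aggregate, groupBy returns a DataFrame again:
// one row per (student_id, student_name) plus a `count` column.
def countPerStudent(df: DataFrame): DataFrame =
  df.groupBy("student_id", "student_name").count()
```

Without an aggregate call like `count()` or `agg(...)`, `groupBy` yields a `RelationalGroupedDataset` rather than a DataFrame, which is why grouping alone does not return a usable result.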
Re: GroupBy in Spark / Scala without Agg functions
Georg, just to double-check: someone wrote an MSSQL Server script that groups by all columns. What is the best alternative way to get distinct rows across all columns?

On Wed, May 30, 2018 at 12:08 AM, Georg Heiler wrote:
> Why do you group if you do not want to aggregate?
> Isn't this the same as select distinct?
Re: GroupBy in Spark / Scala without Agg functions
Why do you group if you do not want to aggregate?
Isn't this the same as select distinct?

Chetan Khatri wrote on Tue., May 29, 2018 at 20:21:
> All,
>
> I have a scenario like this in MSSQL Server SQL where I need to do a
> groupBy without an agg function.
GroupBy in Spark / Scala without Agg functions
All,

I have a scenario like this in MSSQL Server SQL where I need to do a groupBy without an agg function:

Pseudocode:

    select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
    from student as m
    inner join general_register g on m.student_id = g.student_id
    group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob

I tried doing this in Spark but I am not able to get a DataFrame as the return value. How could this kind of thing be done in Spark?

Thanks
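Since the GROUP BY above lists every selected column and carries no aggregate, it is just row de-duplication, so the query can be expressed in Spark as a join followed by select and distinct. A minimal sketch, assuming DataFrames with the column names from the query (the function name is illustrative):

```scala
import org.apache.spark.sql.DataFrame

// Equivalent of the SQL above: inner join, project the listed columns,
// then de-duplicate. distinct() compiles to the same Aggregate plan
// a no-agg GROUP BY would, and returns a DataFrame.
def distinctJoin(student: DataFrame, generalRegister: DataFrame): DataFrame = {
  val cols = Seq("student_id", "student_name", "student_std",
                 "student_group", "student_dob")
  student
    .join(generalRegister, student("student_id") === generalRegister("student_id"))
    .select(cols.map(student(_)): _*)  // keep only the grouped columns
    .distinct()                        // same rows as GROUP BY with no aggregates
}
```

Selecting the columns from the `student` side avoids the ambiguous duplicate `student_id` column that the join condition would otherwise leave in the output.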