I see. Thank you for the explanation, Linyuxin.

On Wed, May 30, 2018 at 6:21 AM, Linyuxin <linyu...@huawei.com> wrote:

> Hi,
>
> Why not group by first then join?
>
> BTW, I don't think there is any difference between 'distinct' and 'group by'.
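>
> For example, a minimal sketch of "group by first, then join" (the table
> and column names here are assumed, matching the pseudocode later in this
> thread):
>
>   // Deduplicate general_register down to the distinct student ids first,
>   // then join; assuming student_id is unique in student, the join output
>   // then needs no trailing group by / distinct over all columns.
>   val registeredIds = generalRegister.select("student_id").distinct()
>   val result = student.join(registeredIds, "student_id")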
>
> Source code of 2.1:
>
> def distinct(): Dataset[T] = dropDuplicates()
>
> …
>
> def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
>   …
>   Aggregate(groupCols, aggCols, logicalPlan)
> }
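>
> So on the Spark side the two spellings build the same Aggregate over the
> selected columns. A quick sketch (df and the column names A-D are assumed
> from the question below):
>
>   // distinct() is defined as dropDuplicates(), so both lines plan the
>   // same Aggregate over all four columns.
>   val viaDistinct = df.select("A", "B", "C", "D").distinct()
>   val viaDropDuplicates = df.select("A", "B", "C", "D").dropDuplicates()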
>
> From: Chetan Khatri [mailto:chetan.opensou...@gmail.com]
> Sent: 30 May 2018 2:52
> To: Irving Duran <irving.du...@gmail.com>
> Cc: Georg Heiler <georg.kf.hei...@gmail.com>; user <user@spark.apache.org>
> Subject: Re: GroupBy in Spark / Scala without Agg functions
>
> Georg, sorry for the dumb question. Help me understand: if I do
> DF.select(A, B, C, D).distinct(), would that be the same as the above
> group by without agg in SQL?
>
> On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri <
> chetan.opensou...@gmail.com> wrote:
>
> I don't want any aggregation; I just want to know whether there is a
> better approach than applying distinct across all columns.
>
> On Wed, May 30, 2018 at 12:16 AM, Irving Duran <irving.du...@gmail.com>
> wrote:
>
> Unless you want to get a count, yes.
>
>
> Thank You,
>
> Irving Duran
>
> On Tue, May 29, 2018 at 1:44 PM Chetan Khatri <chetan.opensou...@gmail.com>
> wrote:
>
> Georg, I just want to double-check: someone wrote an MSSQL Server script
> that groups by all columns. What is the best alternative way to do a
> distinct across all columns?
>
> On Wed, May 30, 2018 at 12:08 AM, Georg Heiler <georg.kf.hei...@gmail.com>
> wrote:
>
> Why do you group if you do not want to aggregate?
>
> Isn't this the same as select distinct?
>
> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tue., 29 May 2018
> at 20:21:
>
> All,
>
> I have a scenario like this in MSSQL Server SQL where I need to do a
> group by without an aggregate function:
>
> Pseudocode:
>
> select m.student_id, m.student_name, m.student_std, m.student_group,
>        m.student_dob
> from student as m
> inner join general_register g on m.student_id = g.student_id
> group by m.student_id, m.student_name, m.student_std, m.student_group,
>          m.student_dob
>
> I tried doing this in Spark but I am not able to get a DataFrame as the
> return value. How could this kind of thing be done in Spark?
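>
> A minimal Spark sketch of the query above (assuming the two tables are
> already loaded as DataFrames named student and generalRegister; the
> trailing distinct() plays the role of the agg-less group by, and the
> method returns a DataFrame):
>
>   import org.apache.spark.sql.DataFrame
>
>   // Inner join on student_id, keep only the student columns, then
>   // deduplicate the result.
>   def registeredStudents(student: DataFrame, generalRegister: DataFrame): DataFrame =
>     student.join(generalRegister, student("student_id") === generalRegister("student_id"))
>       .select(student("student_id"), student("student_name"), student("student_std"),
>         student("student_group"), student("student_dob"))
>       .distinct()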
>
> Thanks