Re: Reply: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
I see. Thank you for the explanation, Linyuxin.



Reply: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Linyuxin
Hi,
Why not group by first, then join?
BTW, I don't think there is any difference between 'distinct' and 'group by'.

Source code of 2.1:
def distinct(): Dataset[T] = dropDuplicates()
…
def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
…
Aggregate(groupCols, aggCols, logicalPlan)
}
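
As an editorial illustration of the point above (not something posted in the thread), here is a minimal sketch comparing distinct(), dropDuplicates(), and a groupBy over all columns. The object name, the helper names viaDistinct and viaGroupBy, and the sample rows are hypothetical; it assumes a local Spark 2.x or later session and a DataFrame that has no column already named "count".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DistinctVsGroupBy {
  // distinct() delegates to dropDuplicates() over all columns.
  def viaDistinct(df: DataFrame): DataFrame = df.distinct()

  // groupBy alone returns a RelationalGroupedDataset, not a DataFrame.
  // Applying an aggregate (count) and dropping its column gives a DataFrame back.
  // Assumption: the input has no column named "count".
  def viaGroupBy(df: DataFrame): DataFrame =
    df.groupBy(df.columns.map(col): _*).count().drop("count")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("distinct-vs-groupby")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data containing one duplicated row.
    val df = Seq((1, "a"), (1, "a"), (2, "b")).toDF("id", "label")

    viaDistinct(df).show()
    viaGroupBy(df).show()

    // Print both physical plans so they can be compared side by side.
    viaDistinct(df).explain()
    viaGroupBy(df).explain()

    spark.stop()
  }
}
```

Comparing the two explain() outputs is a quick way to check, on your own Spark version, whether both forms are executed as an aggregate.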









Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
Georg, sorry for the dumb question. Help me understand: if I do
DF.select(A,B,C,D).distinct(), would that be the same as the above groupBy
without agg in SQL?



Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
I don't want any aggregation. I just want to know whether there is a
better approach than calling distinct on all columns.



Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Irving Duran
Unless you want to get a count, yes.

Thank You,

Irving Duran




Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
Georg, I just want to double-check: someone wrote an MSSQL Server script
that groups by all columns. What is the best alternative way to do a
distinct on all columns?





Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Georg Heiler
Why do you group if you do not want to aggregate?
Isn't this the same as select distinct?



GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
All,

I have a scenario like this in MSSQL Server SQL where I need to do a groupBy
without an aggregate function:

Pseudocode:

select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
from student as m
inner join general_register g on m.student_id = g.student_id
group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob

I tried doing this in Spark, but I am not able to get a DataFrame as the
return value. How can this kind of thing be done in Spark?

Thanks
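
Editorial sketch, not part of the original post: one possible Spark translation of the query above, under the assumption that the goal is simply the de-duplicated student rows. The table and column names come from the question; the object name StudentQuery, the helper distinctStudents, and the sample rows are hypothetical. Note that in the DataFrame API groupBy on its own returns a RelationalGroupedDataset rather than a DataFrame, which may be why no DataFrame came back.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object StudentQuery {
  private val studentCols =
    Seq("student_id", "student_name", "student_std", "student_group", "student_dob")

  // DataFrame API version: inner join, keep the student columns, then de-duplicate.
  def distinctStudents(student: DataFrame, generalRegister: DataFrame): DataFrame =
    student.as("m")
      .join(generalRegister.as("g"), col("m.student_id") === col("g.student_id"))
      .select(studentCols.map(c => col(s"m.$c")): _*)
      .distinct()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("student-query")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows; student 1 appears twice in general_register, student 3 never.
    val student = Seq(
      (1, "Asha", "10", "A", "2002-01-15"),
      (2, "Ravi", "10", "B", "2002-03-02"),
      (3, "Mina", "11", "A", "2001-07-21")
    ).toDF(studentCols: _*)
    val generalRegister = Seq(1, 1, 2).toDF("student_id")

    distinctStudents(student, generalRegister).show()

    // The SQL from the question can also be run through spark.sql,
    // which returns a DataFrame directly.
    student.createOrReplaceTempView("student")
    generalRegister.createOrReplaceTempView("general_register")
    spark.sql(
      """select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
        |from student as m
        |inner join general_register g on m.student_id = g.student_id
        |group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
        |""".stripMargin
    ).show()

    spark.stop()
  }
}
```

Registering temporary views keeps the existing SQL intact, while the DataFrame version makes the de-duplication step explicit; which one fits better depends on how much of the MSSQL script is being ported.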