RE: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Linyuxin
Hi,
Why not group by first and then join?
BTW, I don't think there is any difference between 'distinct' and 'group by' on all columns.

Source code from Spark 2.1:
def distinct(): Dataset[T] = dropDuplicates()
…
def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
…
Aggregate(groupCols, aggCols, logicalPlan)
}
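
A minimal sketch of that equivalence, assuming a Spark 2.x session (names and data are illustrative): both statements below plan the same Aggregate over all selected columns.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("distinct-vs-groupby").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (1, "a"), (2, "b")).toDF("id", "name")
df.createOrReplaceTempView("t")

// distinct() delegates to dropDuplicates(), which plans an Aggregate
// over all columns of the Dataset...
df.distinct().explain()

// ...and GROUP BY over the same columns plans the same Aggregate.
spark.sql("SELECT id, name FROM t GROUP BY id, name").explain()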




From: Chetan Khatri [mailto:chetan.opensou...@gmail.com]
Sent: May 30, 2018 2:52
To: Irving Duran
Cc: Georg Heiler; user
Subject: Re: GroupBy in Spark / Scala without Agg functions

Georg, sorry for the dumb question. Help me understand: if I do
DF.select(A,B,C,D).distinct(), would that be the same as the above groupBy
without agg in SQL?

On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri
<chetan.opensou...@gmail.com> wrote:
I don't want any aggregation; I just want to know whether, rather than applying
distinct to all columns, there is any better approach.

On Wed, May 30, 2018 at 12:16 AM, Irving Duran
<irving.du...@gmail.com> wrote:
Unless you want to get a count, yes.

Thank You,

Irving Duran


On Tue, May 29, 2018 at 1:44 PM Chetan Khatri
<chetan.opensou...@gmail.com> wrote:
Georg, I just want to double-check: someone wrote an MSSQL Server script where
it groups by all columns. What is the best alternative way to do a distinct over
all columns?



On Wed, May 30, 2018 at 12:08 AM, Georg Heiler
<georg.kf.hei...@gmail.com> wrote:
Why do you group if you do not want to aggregate?
Isn't this the same as select distinct?

Chetan Khatri <chetan.opensou...@gmail.com>
wrote on Tue., May 29, 2018 at 20:21:
All,

I have a scenario like this in MSSQL Server SQL where I need to do a GROUP BY
without an aggregate function:

Pseudocode:


select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
from student as m
inner join general_register g on m.student_id = g.student_id
group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob

I tried doing this in Spark, but I am not able to get a DataFrame as the return
value. How can this kind of thing be done in Spark?

Thanks
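
A minimal Spark/Scala sketch of the equivalent query, assuming student and generalRegister are already-loaded DataFrames with the columns from the pseudocode (all names illustrative): an inner join plus a projection and distinct() returns a DataFrame and reproduces the GROUP-BY-on-all-columns result.

import org.apache.spark.sql.DataFrame

// Inner join, project the five grouped columns, then distinct(),
// which plans the same Aggregate as a GROUP BY on those columns.
def distinctStudents(student: DataFrame, generalRegister: DataFrame): DataFrame =
  student
    .join(generalRegister, student("student_id") === generalRegister("student_id"))
    .select(student("student_id"), student("student_name"), student("student_std"),
            student("student_group"), student("student_dob"))
    .distinct()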





RE: [SparkSQL] pre-check syntax before running spark job?

2017-02-21 Thread Linyuxin
Hi Gurdit Singh
Thanks. It is very helpful.

From: Gurdit Singh [mailto:gurdit.si...@bitwiseglobal.com]
Sent: February 22, 2017 13:31
To: Linyuxin <linyu...@huawei.com>; Irving Duran <irving.du...@gmail.com>;
Yong Zhang <java8...@hotmail.com>
Cc: Jacek Laskowski <ja...@japila.pl>; user <user@spark.apache.org>
Subject: RE: [SparkSQL] pre-check syntax before running spark job?

Hi, you can use the Spark SQL ANTLR grammar to pre-check your syntax.

https://github.com/apache/spark/blob/acf71c63cdde8dced8d108260cdd35e1cc992248/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
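
A minimal sketch of a standalone syntax check built on that grammar, assuming the Spark 2.x spark-catalyst jar is on the classpath (CatalystSqlParser wraps the generated ANTLR parser, so no Spark execution environment is needed); note it checks syntax only, not whether tables or columns exist:

import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

object SqlSyntaxCheck {
  // Returns the parse error message, or None if the statement parses cleanly.
  def check(sql: String): Option[String] =
    try {
      CatalystSqlParser.parsePlan(sql) // parses only; nothing is executed
      None
    } catch {
      case e: ParseException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    println(check("SELECT id, name FROM t WHERE id > 1")) // None
    println(check("SELEC id FROM"))                       // Some(<parse error>)
  }
}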


From: Linyuxin [mailto:linyu...@huawei.com]
Sent: Wednesday, February 22, 2017 7:34 AM
To: Irving Duran <irving.du...@gmail.com>; Yong Zhang <java8...@hotmail.com>
Cc: Jacek Laskowski <ja...@japila.pl>; user <user@spark.apache.org>
Subject: RE: [SparkSQL] pre-check syntax before running spark job?

Actually, I want a standalone jar so that I can check the syntax without a Spark
execution environment.

From: Irving Duran [mailto:irving.du...@gmail.com]
Sent: February 21, 2017 23:29
To: Yong Zhang <java8...@hotmail.com>
Cc: Jacek Laskowski <ja...@japila.pl>; Linyuxin <linyu...@huawei.com>; user
<user@spark.apache.org>
Subject: Re: [SparkSQL] pre-check syntax before running spark job?

You can also run it in the REPL and test whether you are getting the expected
result.


Thank You,

Irving Duran

On Tue, Feb 21, 2017 at 8:01 AM, Yong Zhang
<java8...@hotmail.com> wrote:

You can always use the explain method to validate your DF or SQL before any action.



Yong
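
A minimal sketch of this approach, assuming a Spark 2.x session (names are illustrative): sql() already parses and analyzes the statement, and explain() prints the plans, all without triggering a job.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("explain-check").getOrCreate()
import spark.implicits._

Seq((1, "a")).toDF("id", "name").createOrReplaceTempView("t")

// sql() parses and analyzes eagerly; a syntax or resolution error
// throws here, before any action runs.
val df = spark.sql("SELECT id, name FROM t WHERE id > 0")
df.explain(true) // prints parsed/analyzed/optimized/physical plans; no job runs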


From: Jacek Laskowski <ja...@japila.pl>
Sent: Tuesday, February 21, 2017 4:34 AM
To: Linyuxin
Cc: user
Subject: Re: [SparkSQL] pre-check syntex before running spark job?

Hi,

Never heard of such a tool before. You could use ANTLR to parse the SQL (just
as Spark SQL does while parsing queries). I think it's a one-hour project.

Jacek

On 21 Feb 2017 4:44 a.m., "Linyuxin"
<linyu...@huawei.com> wrote:
Hi All,
Is there any tool/API to check the SQL syntax without actually running a Spark
job?

Like SiddhiQL on Storm:
SiddhiManagerService.validateExecutionPlan
https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java
It can validate the syntax before running the SQL on Storm.

This is very useful for exposing a SQL string as a DSL of the platform.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




RE: [SparkSQL] pre-check syntax before running spark job?

2017-02-21 Thread Linyuxin
Actually, I want a standalone jar so that I can check the syntax without a Spark
execution environment.

From: Irving Duran [mailto:irving.du...@gmail.com]
Sent: February 21, 2017 23:29
To: Yong Zhang <java8...@hotmail.com>
Cc: Jacek Laskowski <ja...@japila.pl>; Linyuxin <linyu...@huawei.com>; user
<user@spark.apache.org>
Subject: Re: [SparkSQL] pre-check syntax before running spark job?

You can also run it in the REPL and test whether you are getting the expected
result.


Thank You,

Irving Duran

On Tue, Feb 21, 2017 at 8:01 AM, Yong Zhang
<java8...@hotmail.com> wrote:

You can always use the explain method to validate your DF or SQL before any action.



Yong


From: Jacek Laskowski <ja...@japila.pl>
Sent: Tuesday, February 21, 2017 4:34 AM
To: Linyuxin
Cc: user
Subject: Re: [SparkSQL] pre-check syntex before running spark job?

Hi,

Never heard of such a tool before. You could use ANTLR to parse the SQL (just
as Spark SQL does while parsing queries). I think it's a one-hour project.

Jacek

On 21 Feb 2017 4:44 a.m., "Linyuxin"
<linyu...@huawei.com> wrote:
Hi All,
Is there any tool/API to check the SQL syntax without actually running a Spark
job?

Like SiddhiQL on Storm:
SiddhiManagerService.validateExecutionPlan
https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java
It can validate the syntax before running the SQL on Storm.

This is very useful for exposing a SQL string as a DSL of the platform.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




[SparkSQL] pre-check syntax before running spark job?

2017-02-20 Thread Linyuxin
Hi All,
Is there any tool/API to check the SQL syntax without actually running a Spark
job?

Like SiddhiQL on Storm:
SiddhiManagerService.validateExecutionPlan
https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java
It can validate the syntax before running the SQL on Storm.

This is very useful for exposing a SQL string as a DSL of the platform.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



RE: RE: submit spark task on yarn asynchronously via java?

2016-12-25 Thread Linyuxin
Thanks.

From: Naveen [mailto:hadoopst...@gmail.com]
Sent: December 25, 2016 0:33
To: Linyuxin <linyu...@huawei.com>
Cc: user <user@spark.apache.org>
Subject: Re: RE: submit spark task on yarn asynchronously via java?

Hi,
Please use the SparkLauncher API class and invoke the threads using async calls
with Futures.
Using SparkLauncher, you can specify the class name, application resource,
arguments to be passed to the driver, deploy mode, etc.
I would suggest using Scala's Future, if Scala code is possible.

https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/launcher/SparkLauncher.html
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
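
A minimal sketch of this, assuming the spark-launcher artifact is on the classpath; every path and name below is illustrative. launch() spawns a spark-submit process without blocking, and a Future can monitor it asynchronously:

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import ExecutionContext.Implicits.global
import org.apache.spark.launcher.SparkLauncher

object AsyncSubmit {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setAppResource("/path/to/my-app.jar") // illustrative path
      .setMainClass("com.example.MyApp")     // illustrative class
      .setMaster("yarn-cluster")             // Spark 1.5.x-style master
      .addAppArgs("arg1", "arg2")
      .launch() // spawns spark-submit; does not block the caller

    // Monitor completion asynchronously on another thread.
    val exitCode: Future[Int] = Future(process.waitFor())
    exitCode.foreach(c => println(s"spark-submit exited with code $c"))

    // ... the caller is free to do other work while the job runs ...

    Await.ready(exitCode, Duration.Inf) // keep the JVM alive for the demo
  }
}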


On Fri, Dec 23, 2016 at 7:10 AM, Linyuxin
<linyu...@huawei.com> wrote:
Hi,
Could anybody help?

From: Linyuxin
Sent: December 22, 2016 14:18
To: user <user@spark.apache.org>
Subject: submit spark task on yarn asynchronously via java?

Hi All,

Version:
Spark 1.5.1
Hadoop 2.7.2

Is there any way to submit and monitor a Spark task on YARN via Java
asynchronously?





RE: submit spark task on yarn asynchronously via java?

2016-12-22 Thread Linyuxin
Hi,
Could anybody help?

From: Linyuxin
Sent: December 22, 2016 14:18
To: user <user@spark.apache.org>
Subject: submit spark task on yarn asynchronously via java?

Hi All,

Version:
Spark 1.5.1
Hadoop 2.7.2

Is there any way to submit and monitor a Spark task on YARN via Java
asynchronously?




submit spark task on yarn asynchronously via java?

2016-12-21 Thread Linyuxin
Hi All,

Version:
Spark 1.5.1
Hadoop 2.7.2

Is there any way to submit and monitor a Spark task on YARN via Java
asynchronously?




How to avoid sql injection on SparkSQL?

2016-08-04 Thread Linyuxin
Hi All,
I want to know how to avoid SQL injection in SparkSQL.
Is there any common pattern for this?
E.g. some useful tool or code segment,

or should I just create a "wheel" for SparkSQL myself?

Thanks.
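
One common pattern, sketched minimally below (all names illustrative): Spark versions of this era have no bind-parameter mechanism in sql(), so keep user input out of the SQL string entirely and pass it through the DataFrame API, where it stays data rather than being parsed as SQL.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object SafeFilter {
  // UNSAFE: concatenating user input into the SQL text allows injection,
  // e.g. userInput = "x' OR '1'='1".
  def unsafe(spark: SparkSession, userInput: String): DataFrame =
    spark.sql(s"SELECT * FROM users WHERE name = '$userInput'")

  // SAFER: the input becomes a Column literal and is never parsed as SQL.
  def safe(spark: SparkSession, userInput: String): DataFrame =
    spark.table("users").filter(col("name") === lit(userInput))
}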


Any reference of performance tuning on SparkSQL?

2016-07-28 Thread Linyuxin
Hi All,
Is there any reference for performance tuning of SparkSQL?
I can only find material about tuning Spark Core on http://spark.apache.org/
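
The Spark SQL programming guide has a short Performance Tuning section; beyond that, a few commonly tuned settings are sketched below (Spark 2.x API shown; the keys are real configuration names, the values illustrative and workload-dependent):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("sql-tuning-sketch")
  // Partitions used when shuffling for joins/aggregations (default 200).
  .config("spark.sql.shuffle.partitions", "400")
  // Max size in bytes of a table broadcast in a join (default 10 MB).
  .config("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
  .getOrCreate()

// Caching a hot table keeps it in memory in compressed columnar form.
spark.table("hot_table").cache()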


Where is the SparkSQL Specification?

2016-07-21 Thread Linyuxin
Hi All,
Newbie here.
My Spark version is 1.5.1.

And I want to know where I can find the specification of Spark SQL, to find out
whether syntax such as a LIKE '%b_xx' is supported.
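
Absent a formal specification for that era, a quick empirical check works. A minimal sketch (Spark 2.x API shown; data illustrative) confirming that LIKE with the % and _ wildcards is supported:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("like-check").getOrCreate()
import spark.implicits._

Seq("ab_xx", "b_xx", "zzz").toDF("a").createOrReplaceTempView("t")

// '%' matches any sequence of characters; '_' matches exactly one character.
spark.sql("SELECT a FROM t WHERE a LIKE '%b_xx'").show()
// keeps "ab_xx" and "b_xx", filters out "zzz"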