Re: GroupBy in Spark / Scala without Agg functions
Hi,

Why not group by first, then join? By the way, I don't think there is any difference between 'distinct' and 'group by'. From the Spark 2.1 source code:

  def distinct(): Dataset[T] = dropDuplicates()
  ...
  def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
    ...
    Aggregate(groupCols, aggCols, logicalPlan)
  }

From: Chetan Khatri [mailto:chetan.opensou...@gmail.com]
Sent: May 30, 2018 2:52
To: Irving Duran
Cc: Georg Heiler; user
Subject: Re: GroupBy in Spark / Scala without Agg functions

Georg, sorry for the dumb question. Help me to understand: if I do DF.select(A,B,C,D).distinct(), would that be the same as the groupBy-without-agg SQL above?

On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
I don't want any aggregation; I just want to know whether there is a better approach than applying distinct to all columns.

On Wed, May 30, 2018 at 12:16 AM, Irving Duran <irving.du...@gmail.com> wrote:
Unless you want to get a count, yes.

Thank You,
Irving Duran

On Tue, May 29, 2018 at 1:44 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
Georg, I just want to double-check: someone wrote an MS SQL Server script that groups by all columns. What is the best alternative way to do distinct on all columns?

On Wed, May 30, 2018 at 12:08 AM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
Why do you group if you do not want to aggregate? Isn't this the same as select distinct?

Chetan Khatri <chetan.opensou...@gmail.com> schrieb am Di., 29. Mai 2018 um 20:21 Uhr:
All, I have a scenario like this in MS SQL Server where I need to do a groupBy without an agg function. Pseudocode:

  select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
  from student as m
  inner join general_register g on m.student_id = g.student_id
  group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob

I tried doing this in Spark, but I am not able to get a DataFrame as the return value. How can this kind of thing be done in Spark? Thanks
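The join-then-distinct approach discussed above can be sketched in Spark as follows. This is a minimal illustration with made-up sample data (the table contents and column values are assumptions, not from the thread); the point is that a groupBy over all selected columns with no aggregate collapses to the same result as select + distinct():

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("distinct-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data mirroring the MSSQL tables in the thread.
val student = Seq(
  (1, "Ann", "X", "A", "2001-01-01"),
  (2, "Bob", "X", "B", "2001-06-15")
).toDF("student_id", "student_name", "student_std", "student_group", "student_dob")

// Duplicate registrations make the inner join emit duplicate rows.
val generalRegister = Seq(Tuple1(1), Tuple1(1), Tuple1(2)).toDF("student_id")

// The inner join can duplicate student rows; distinct() collapses them,
// matching the GROUP BY over all columns in the SQL version.
val result = student
  .join(generalRegister, "student_id")
  .select("student_id", "student_name", "student_std", "student_group", "student_dob")
  .distinct()

result.show()
```

dropDuplicates() with an explicit column list would also work here, and lets you deduplicate on a subset of columns while keeping the rest.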
Re: [SparkSQL] pre-check syntax before running spark job?
Hi Gurdit Singh,

Thanks. It is very helpful.

From: Gurdit Singh [mailto:gurdit.si...@bitwiseglobal.com]
Sent: February 22, 2017 13:31
To: Linyuxin <linyu...@huawei.com>; Irving Duran <irving.du...@gmail.com>; Yong Zhang <java8...@hotmail.com>
Cc: Jacek Laskowski <ja...@japila.pl>; user <user@spark.apache.org>
Subject: RE: [SparkSQL] pre-check syntax before running spark job?

Hi, you can use the Spark SQL ANTLR grammar to pre-check your syntax:
https://github.com/apache/spark/blob/acf71c63cdde8dced8d108260cdd35e1cc992248/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

From: Linyuxin [mailto:linyu...@huawei.com]
Sent: Wednesday, February 22, 2017 7:34 AM
To: Irving Duran <irving.du...@gmail.com>; Yong Zhang <java8...@hotmail.com>
Cc: Jacek Laskowski <ja...@japila.pl>; user <user@spark.apache.org>
Subject: Re: [SparkSQL] pre-check syntax before running spark job?

Actually, I want a standalone jar so that I can check the syntax without a Spark execution environment.

From: Irving Duran [mailto:irving.du...@gmail.com]
Sent: February 21, 2017 23:29
To: Yong Zhang <java8...@hotmail.com>
Cc: Jacek Laskowski <ja...@japila.pl>; Linyuxin <linyu...@huawei.com>; user <user@spark.apache.org>
Subject: Re: [SparkSQL] pre-check syntax before running spark job?

You can also run it in the REPL and test whether you get the expected result.

Thank You,
Irving Duran

On Tue, Feb 21, 2017 at 8:01 AM, Yong Zhang <java8...@hotmail.com> wrote:
You can always use the explain method to validate your DF or SQL before any action.

Yong

From: Jacek Laskowski <ja...@japila.pl>
Sent: Tuesday, February 21, 2017 4:34 AM
To: Linyuxin
Cc: user
Subject: Re: [SparkSQL] pre-check syntax before running spark job?

Hi, never heard of such a tool before. You could use ANTLR to parse the SQL (just as Spark SQL does while parsing queries). I think it's a one-hour project.

Jacek

On 21 Feb 2017 4:44 a.m., "Linyuxin" <linyu...@huawei.com> wrote:
Hi All, is there any tool/API to check SQL syntax without actually running a Spark job? Like SiddhiQL on Storm: SiddhiManagerService.validateExecutionPlan
https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java
It can validate the syntax before running the SQL on Storm. This is very useful for exposing a SQL string as a DSL of the platform.
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
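A lighter alternative to wiring up the ANTLR grammar yourself: Spark's Catalyst parser can be invoked directly. The sketch below assumes the spark-catalyst jar is on the classpath (Spark 2.x era; class names may vary across versions). CatalystSqlParser parses a SQL string into an unresolved logical plan without a SparkSession or cluster, so pure syntax errors surface as a ParseException before any job runs — close to the "standalone jar" Linyuxin asked for:

```scala
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

// Returns Right(()) if the SQL parses, Left(message) on a syntax error.
// Note: this checks grammar only; unresolved tables/columns are NOT
// detected here, since resolution happens later in the analyzer.
def checkSyntax(sql: String): Either[String, Unit] =
  try {
    CatalystSqlParser.parsePlan(sql)
    Right(())
  } catch {
    case e: ParseException => Left(e.getMessage)
  }

checkSyntax("SELECT a, b FROM t WHERE a > 1") // parses fine
checkSyntax("SELEC a FROM t")                 // reports a parse error
```

For validation against the catalog (do the tables and columns exist?), you would still need a SparkSession and something like `spark.sql(query).explain()`, as suggested earlier in the thread.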
Re: [SparkSQL] pre-check syntax before running spark job?
Actually, I want a standalone jar so that I can check the syntax without a Spark execution environment.

From: Irving Duran [mailto:irving.du...@gmail.com]
Sent: February 21, 2017 23:29
To: Yong Zhang <java8...@hotmail.com>
Cc: Jacek Laskowski <ja...@japila.pl>; Linyuxin <linyu...@huawei.com>; user <user@spark.apache.org>
Subject: Re: [SparkSQL] pre-check syntax before running spark job?

You can also run it in the REPL and test whether you get the expected result.

Thank You,
Irving Duran

On Tue, Feb 21, 2017 at 8:01 AM, Yong Zhang <java8...@hotmail.com> wrote:
You can always use the explain method to validate your DF or SQL before any action.

Yong

From: Jacek Laskowski <ja...@japila.pl>
Sent: Tuesday, February 21, 2017 4:34 AM
To: Linyuxin
Cc: user
Subject: Re: [SparkSQL] pre-check syntax before running spark job?

Hi, never heard of such a tool before. You could use ANTLR to parse the SQL (just as Spark SQL does while parsing queries). I think it's a one-hour project.

Jacek

On 21 Feb 2017 4:44 a.m., "Linyuxin" <linyu...@huawei.com> wrote:
Hi All, is there any tool/API to check SQL syntax without actually running a Spark job? Like SiddhiQL on Storm: SiddhiManagerService.validateExecutionPlan
https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java
It can validate the syntax before running the SQL on Storm. This is very useful for exposing a SQL string as a DSL of the platform.
[SparkSQL] pre-check syntax before running spark job?
Hi All,

Is there any tool/API to check SQL syntax without actually running a Spark job? Like SiddhiQL on Storm: SiddhiManagerService.validateExecutionPlan
https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java
It can validate the syntax before running the SQL on Storm. This is very useful for exposing a SQL string as a DSL of the platform.
Re: Re: submit spark task on yarn asynchronously via java?
Thanks.

From: Naveen [mailto:hadoopst...@gmail.com]
Sent: December 25, 2016 0:33
To: Linyuxin <linyu...@huawei.com>
Cc: user <user@spark.apache.org>
Subject: Re: Re: submit spark task on yarn asynchronously via java?

Hi,

Please use the SparkLauncher API class and invoke the threads with async calls using Futures. With SparkLauncher, you can specify the class name, application resource, arguments to be passed to the driver, deploy mode, etc. I would suggest using Scala's Future, if Scala code is possible.
https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/launcher/SparkLauncher.html
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html

On Fri, Dec 23, 2016 at 7:10 AM, Linyuxin <linyu...@huawei.com> wrote:
Hi, could anybody help?

From: Linyuxin
Sent: December 22, 2016 14:18
To: user <user@spark.apache.org>
Subject: submit spark task on yarn asynchronously via java?

Hi All,

Version: Spark 1.5.1, Hadoop 2.7.2

Is there any way to submit and monitor a Spark task on YARN via Java asynchronously?
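Naveen's suggestion can be sketched as below. This is an illustration only: the jar path and main class are hypothetical placeholders, and it assumes Spark/Hadoop client configuration is present on the submitting machine. It uses SparkAppHandle from `startApplication()`, which was added in Spark 1.6; on 1.5.x only `launch()` (returning a raw Process) is available, so monitoring there means watching the process and its output instead:

```scala
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}
import scala.concurrent.{Future, Promise}

// Submits the application asynchronously and completes the Future
// when the YARN application reaches a terminal state.
def submitAsync(): Future[SparkAppHandle.State] = {
  val done = Promise[SparkAppHandle.State]()
  new SparkLauncher()
    .setAppResource("/path/to/app.jar")     // hypothetical application jar
    .setMainClass("com.example.MainClass")  // hypothetical driver class
    .setMaster("yarn")
    .setDeployMode("cluster")
    .startApplication(new SparkAppHandle.Listener {
      // Fires on every state transition (SUBMITTED, RUNNING, FINISHED, ...).
      override def stateChanged(h: SparkAppHandle): Unit =
        if (h.getState.isFinal) done.trySuccess(h.getState)
      override def infoChanged(h: SparkAppHandle): Unit = ()
    })
  done.future
}
```

The returned Future can be composed with `map`/`onComplete` so the caller thread is never blocked while the YARN application runs.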
Re: submit spark task on yarn asynchronously via java?
Hi, could anybody help?

From: Linyuxin
Sent: December 22, 2016 14:18
To: user <user@spark.apache.org>
Subject: submit spark task on yarn asynchronously via java?

Hi All,

Version: Spark 1.5.1, Hadoop 2.7.2

Is there any way to submit and monitor a Spark task on YARN via Java asynchronously?
submit spark task on yarn asynchronously via java?
Hi All,

Version: Spark 1.5.1, Hadoop 2.7.2

Is there any way to submit and monitor a Spark task on YARN via Java asynchronously?
How to avoid sql injection on SparkSQL?
Hi All,

I want to know how to avoid SQL injection in Spark SQL. Is there any common pattern for this? E.g. a useful tool or code segment, or do I just have to invent a "wheel" on top of Spark SQL myself?

Thanks.
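One common pattern, sketched below under assumed sample data: the injection risk comes from splicing user input into a SQL string, so keep untrusted input out of the query text entirely by binding it as a typed literal through the DataFrame API. (Much later Spark versions, 3.4+, also added parameterized `spark.sql(query, args)`, but the literal-binding approach works across versions.)

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data for illustration.
val users = Seq(("alice", 1), ("bob", 2)).toDF("name", "id")
users.createOrReplaceTempView("users")

val userInput = "alice' OR '1'='1" // classic injection payload

// UNSAFE: the payload is parsed as part of the SQL text and
// changes the query's meaning.
// spark.sql(s"SELECT * FROM users WHERE name = '$userInput'")

// SAFE: the payload is bound as a string literal, compared as a
// value, and never parsed as SQL. The query here matches no rows.
val safe = users.filter($"name" === lit(userInput))
safe.show()
```

The same principle applies to column and table names: since those cannot be bound as literals, validate them against an allow-list before interpolating.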
Any reference of performance tuning on SparkSQL?
Hi All,

Is there any reference on performance tuning for Spark SQL? I can only find material on tuning Spark Core on http://spark.apache.org/
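For what it's worth, a few Spark SQL settings that commonly come up in tuning, as a sketch (these property names exist in the 1.x/2.x era, but defaults and availability vary by version, so check the docs for yours):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Number of partitions used for shuffles in joins and aggregations
// (default 200; often too high for small data, too low for large).
spark.conf.set("spark.sql.shuffle.partitions", "200")

// Broadcast-join threshold in bytes: tables smaller than this are
// broadcast to executors instead of shuffled.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)

// Cache a frequently reused table in the columnar in-memory format.
// spark.table("hot_table").cache()
```

The "Performance Tuning" section of the Spark SQL programming guide on spark.apache.org covers these and more (caching, join hints, file sizes).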
Where is the SparkSQL Specification?
Hi All,

Newbie here. My Spark version is 1.5.1, and I want to know where I can find the specification of Spark SQL, to find out whether syntax such as a LIKE '%b_xx' is supported.
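On the LIKE question specifically: Spark SQL follows standard SQL LIKE semantics ('%' matches any sequence of characters, '_' matches exactly one), so a pattern like '%b_xx' is supported. A small sketch with made-up data:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample values: "xxbyxx" matches '%b_xx'
// (anything, then 'b', then one character, then "xx").
val df = Seq("xxbyxx", "nomatch").toDF("a")
df.createOrReplaceTempView("t")

spark.sql("SELECT a FROM t WHERE a LIKE '%b_xx'").show() // keeps "xxbyxx"
df.filter($"a".like("%b_xx")).show()                     // DataFrame equivalent
```

There is no single formal "Spark SQL specification" document; the SQL programming guide on spark.apache.org plus the ANTLR grammar file in the Spark source tree are the closest things to one.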