Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Linyuxin
Hi, why not group by first and then join? BTW, I don’t think there is any difference between ‘distinct’ and ‘group by’. Source code of 2.1: def distinct(): Dataset[T] = dropDuplicates() … def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan { … Aggregate(groupCols, aggCols, logicalPlan)
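The equivalence claimed above can be illustrated outside Spark. The following is only a plain-Java analogy using java.util.stream, not the Spark API; the class name, method names and sample rows are hypothetical. De-duplicating rows and grouping by every column without an aggregate produce the same set of rows.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctVsGroupBy {
    // Analogy of "SELECT DISTINCT c1, c2": drop duplicate rows.
    static Set<List<String>> distinct(List<List<String>> rows) {
        return new LinkedHashSet<>(rows);
    }

    // Analogy of "SELECT c1, c2 ... GROUP BY c1, c2" with no aggregate:
    // group on the whole row and keep only the grouping keys.
    static Set<List<String>> groupByAllColumns(List<List<String>> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(Function.identity()))
                .keySet();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: each inner list is one row.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "1"),
                Arrays.asList("a", "1"),
                Arrays.asList("b", "2"));
        // Both approaches keep the same two unique rows.
        System.out.println(distinct(rows).equals(groupByAllColumns(rows)));
    }
}
```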

Re: [SparkSQL] pre-check syntex before running spark job?

2017-02-21 Thread Linyuxin
Hi Gurdit Singh, thanks. It is very helpful. From: Gurdit Singh [mailto:gurdit.si...@bitwiseglobal.com] Sent: 2017-02-22 13:31 To: Linyuxin <linyu...@huawei.com>; Irving Duran <irving.du...@gmail.com>; Yong Zhang <java8...@hotmail.com> Cc: Jacek Laskowski <ja...@j

Re: [SparkSQL] pre-check syntex before running spark job?

2017-02-21 Thread Linyuxin
Actually, I want a standalone jar so that I can check the syntax without a Spark execution environment. From: Irving Duran [mailto:irving.du...@gmail.com] Sent: 2017-02-21 23:29 To: Yong Zhang <java8...@hotmail.com> Cc: Jacek Laskowski <ja...@japila.pl>; Linyuxin <linyu...@huawei.co

[SparkSQL] pre-check syntex before running spark job?

2017-02-20 Thread Linyuxin
Hi All, is there any tool/API to check the SQL syntax without actually running a Spark job? Something like SiddhiQL on Storm, here: SiddhiManagerService.validateExecutionPlan https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java it

Re: Re: submit spark task on yarn asynchronously via java?

2016-12-25 Thread Linyuxin
Thanks. From: Naveen [mailto:hadoopst...@gmail.com] Sent: 2016-12-25 0:33 To: Linyuxin <linyu...@huawei.com> Cc: user <user@spark.apache.org> Subject: Re: Re: submit spark task on yarn asynchronously via java? Hi, Please use SparkLauncher API class and invoke the threads using async

Re: submit spark task on yarn asynchronously via java?

2016-12-22 Thread Linyuxin
Hi, could anybody help? From: Linyuxin Sent: 2016-12-22 14:18 To: user <user@spark.apache.org> Subject: submit spark task on yarn asynchronously via java? Hi All, Version: Spark 1.5.1, Hadoop 2.7.2. Is there any way to submit and monitor a Spark task on YARN asynchronously via Java?

submit spark task on yarn asynchronously via java?

2016-12-21 Thread Linyuxin
Hi All, Version: Spark 1.5.1, Hadoop 2.7.2. Is there any way to submit and monitor a Spark task on YARN asynchronously via Java?

How to avoid sql injection on SparkSQL?

2016-08-04 Thread Linyuxin
Hi All, I want to know how to avoid SQL injection in SparkSQL. Is there any common pattern for this, e.g. a useful tool or code segment, or should I just build a “wheel” on SparkSQL myself? Thanks.
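One pattern that does not depend on Spark is to never paste raw user input into a SQL string: restrict dynamic identifiers (column or table names) to an allow-list pattern, and parse values into typed data before they reach the query. The sketch below is plain Java with hypothetical names (`SafeSql`, `requireIdentifier`, `requireLong`, and a table called `events`). It is not a Spark API and not a complete defense.

```java
import java.util.regex.Pattern;

class SafeSql {
    // Allow-list: letters, digits and underscore only, not starting with a digit.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Reject anything that is not a simple identifier.
    static String requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Illegal identifier: " + name);
        }
        return name;
    }

    // Parse numeric input instead of copying the raw text into the query.
    // NumberFormatException is a subclass of IllegalArgumentException.
    static long requireLong(String value) {
        return Long.parseLong(value.trim());
    }

    // Hypothetical query builder: only validated pieces are concatenated.
    static String buildQuery(String column, String minValue) {
        String col = requireIdentifier(column);
        return "SELECT " + col + " FROM events WHERE " + col
                + " >= " + requireLong(minValue);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("user_id", "42"));
        try {
            buildQuery("user_id; DROP TABLE events", "42");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

String values are deliberately left out of this sketch, because escaping rules for string literals depend on the SQL dialect and parser settings.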

Any reference of performance tuning on SparkSQL?

2016-07-28 Thread Linyuxin
Hi all, is there any reference for performance tuning of SparkSQL? I can only find material about tuning Spark core on http://spark.apache.org/

Where is the SparkSQL Specification?

2016-07-21 Thread Linyuxin
Hi all, newbie here. My Spark version is 1.5.1, and I would like to know where I can find the Spark SQL specification, to check whether it supports ‘a like %b_xx’ or other SQL syntax.