Re: How to do efficient self join with Spark-SQL and Scala

2018-09-21 Thread hemant singh
You can use the Spark DataFrame 'when'/'otherwise' clauses to replace the SQL CASE statement. This piece will need to be calculated beforehand: 'select student_id from tbl_student where candidate_id = c.candidate_id and approval_id = 2 and academic_start_date is null'. Take the count of the above DF after
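
A minimal sketch of that suggestion, assuming Spark 2.x on the classpath (runnable as a script or in spark-shell) and assuming the count is then tested inside a CASE expression. The `tbl_student` columns come from the reply. The `candidates` DataFrame, the sample rows and the "PENDING"/"NONE" labels are hypothetical. The correlated subquery is computed once as a grouped count and joined back, and `when`/`otherwise` stands in for CASE WHEN ... ELSE ... END:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, lit, when}

val spark = SparkSession.builder().appName("case-when-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data; only the tbl_student column names come from the thread.
val candidates = Seq(1, 2, 3).toDF("candidate_id")
val students = Seq[(Int, Int, Int, Option[String])](
  (10, 1, 2, None),
  (11, 1, 2, None),
  (12, 2, 2, Some("2018-09-01")),
  (13, 3, 1, None)
).toDF("student_id", "candidate_id", "approval_id", "academic_start_date")

// Compute the correlated subquery once for all candidates: matching students per candidate_id.
val pendingCounts = students
  .filter(col("approval_id") === 2 && col("academic_start_date").isNull)
  .groupBy("candidate_id")
  .agg(count("student_id").as("pending_cnt"))

// Left join keeps candidates with no match (pending_cnt is null; a null condition falls to otherwise).
val result = candidates
  .join(pendingCounts, Seq("candidate_id"), "left")
  .withColumn("status", when(col("pending_cnt") > 0, lit("PENDING")).otherwise(lit("NONE")))
  .select("candidate_id", "status")
  .orderBy("candidate_id")

result.show()
```

Grouping first and joining once avoids re-evaluating the subquery per candidate row, which is usually the expensive part when such a query is ported literally.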

Re: Kafka Connector version support

2018-09-21 Thread Shixiong(Ryan) Zhu
-dev +user We don't backport new features to a maintenance branch. All new updates will only be in 2.4. Best Regards, Ryan On Fri, Sep 21, 2018 at 2:44 PM, Basil Hariri <basil.har...@microsoft.com.invalid> wrote: > Hi all, > Are there any plans to backport the recent (2.4) updates to

How to do efficient self join with Spark-SQL and Scala

2018-09-21 Thread Chetan Khatri
Dear Spark Users, I came across a slightly weird MSSQL query that I need to replace with Spark, and I have no clue how to do it efficiently with Scala + SparkSQL. Can someone please shed some light? I can create a view of the DataFrame and run it as spark.sql(query), but I would like to do it with Scala + Spark
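
The original query is elided here, so this is only a generic sketch of a self join through the Scala DataFrame API instead of spark.sql, assuming Spark 2.x. The `emp` table, its columns and the sample rows are hypothetical. Aliasing the same DataFrame twice with `.as(...)` lets the join condition and the projection refer to each side unambiguously:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("self-join-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical table: each row points at its manager's id (None for the top of the tree).
val emp = Seq[(Int, String, Option[Int])](
  (1, "Ann", None),
  (2, "Bob", Some(1)),
  (3, "Cid", Some(1))
).toDF("id", "name", "manager_id")

// Alias both sides of the self join so "e.*" and "m.*" resolve to different copies.
val e = emp.as("e")
val m = emp.as("m")

// Left join so rows without a manager are kept with a null manager name.
val withManager = e
  .join(m, col("e.manager_id") === col("m.id"), "left")
  .select(col("e.name").as("employee"), col("m.name").as("manager"))
  .orderBy("employee")

withManager.show()
```

If the self join only exists to attach a per-group aggregate back to every row, a window function (`Window.partitionBy(...)`) can often replace it without joining the table to itself; whether that applies depends on the actual query.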

Lightweight pipeline execution for single row

2018-09-21 Thread Jatin Puri
Hi. What tactics can I apply in such a scenario? I have a pipeline of 10 stages doing simple text processing. I train the pipeline on the data and, for the fitted data, do some modelling and store the results. I also have a web server where I receive requests. For each request (dataframe of
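
As a baseline for the question, a sketch assuming Spark 2.x ML: fit the pipeline once, keep the `PipelineModel`, and for each request wrap the incoming row in a one-row DataFrame and call `transform`. The two stages, the training texts and the `score` helper are hypothetical stand-ins for the 10-stage pipeline. This does not remove Spark's per-call job-planning overhead; it only shows that the fitted model is reused rather than refitted per request:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("single-row-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical training data and a two-stage text pipeline standing in for the real one.
val training = Seq("spark is fast", "scala on spark").toDF("text")
val pipeline = new Pipeline().setStages(Array[PipelineStage](
  new Tokenizer().setInputCol("text").setOutputCol("words"),
  new HashingTF().setInputCol("words").setOutputCol("features").setNumFeatures(1 << 10)
))

// Fit once at startup; the fitted model is then shared by all requests.
val model: PipelineModel = pipeline.fit(training)

// Hypothetical per-request helper: one incoming text becomes a one-row DataFrame.
def score(text: String): DataFrame = model.transform(Seq(text).toDF("text"))

val scored = score("hello spark")
scored.select("words", "features").show(false)
```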

unsubscribe

2018-09-21 Thread Mario Amatucci

Spark Use Case Analysis

2018-09-21 Thread Ambi, Aniket
Hi Team, I am trying out a use case with Spark Streaming and I am not sure if I can solve it using Spark. My Spark stream will listen to multiple Kafka topics, where each topic receives various counters with different values. I need to process multiple (around 200) KPI expressions using those
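
One way to handle many KPI formulas is to keep them as Spark SQL expression strings and turn each into a column with `functions.expr`. Below is a sketch under loud assumptions: the counters are already in columns, and the entity names, counter columns and the two KPI definitions are hypothetical, as is loading them from a map. It runs on a static DataFrame for simplicity. The same `select` of `expr(...)` columns can be applied to a Structured Streaming DataFrame read from Kafka:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

val spark = SparkSession.builder().appName("kpi-expr-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical counters, already one column per counter.
val counters = Seq(("cellA", 90L, 100L), ("cellB", 45L, 50L)).toDF("entity", "success", "attempts")

// Hypothetical KPI definitions (name -> Spark SQL expression), e.g. loaded from a config file.
val kpis = Map(
  "success_ratio" -> "success / attempts", // "/" on bigint columns yields a double in Spark SQL
  "failures" -> "attempts - success"
)

// Turn every definition into a named column and evaluate them all in a single select.
val kpiCols = kpis.toSeq.map { case (name, definition) => expr(definition).as(name) }
val result = counters.select(col("entity") +: kpiCols: _*)

result.show()
```

Evaluating all expressions in one `select` keeps them in a single projection, so around 200 definitions do not require 200 separate passes over the data.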

Re: Live Streamed Code Review today at 11am Pacific

2018-09-21 Thread Gourav Sengupta
Thanks a ton :) these are absolutely the best sessions that no one should miss. Regards, Gourav Sengupta On Fri, Sep 21, 2018 at 7:40 AM Holden Karau wrote: > I'm going to be doing this again tomorrow, Friday the 21st, at 9am - > https://www.youtube.com/watch?v=xb2FsHaozVQ /

Re: Spark2 DynamicAllocation doesn't release executors that used cache

2018-09-21 Thread Sergejs Andrejevs
Has anybody tried dynamic allocation with executors that hold cached data?
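
With dynamic allocation, idle executors are released after `spark.dynamicAllocation.executorIdleTimeout` (60s by default). Executors holding cached blocks follow `spark.dynamicAllocation.cachedExecutorIdleTimeout` instead, which defaults to infinity, so by default they are never released unless the data is unpersisted. A sketch of the relevant settings, assuming Spark 2.x, where dynamic allocation also needs the external shuffle service. The 600s value is only an illustrative choice:

```scala
import org.apache.spark.SparkConf

// false = do not pick up spark.* JVM system properties, so only these settings are present.
val conf = new SparkConf(false)
  .set("spark.dynamicAllocation.enabled", "true")
  // Spark 2.x dynamic allocation requires the external shuffle service.
  .set("spark.shuffle.service.enabled", "true")
  // Idle executors without cached data are released after this (default 60s).
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  // Idle executors WITH cached blocks: default is infinity (never released); 600s is illustrative.
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")

// Pass the same keys via SparkSession.builder().config(conf) or --conf on spark-submit.
conf.getAll.sorted.foreach { case (k, v) => println(s"$k=$v") }
```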

Re: Live Streamed Code Review today at 11am Pacific

2018-09-21 Thread Holden Karau
I'm going to be doing this again tomorrow, Friday the 21st, at 9am - https://www.youtube.com/watch?v=xb2FsHaozVQ / http://twitch.tv/holdenkarau :) As always if you have anything you want me to look at in particular send me a message. https://github.com/apache/spark/pull/22275 (Arrow out-of-order