Re: How to do efficient self join with Spark-SQL and Scala

2018-09-21 Thread hemant singh
You can use the Spark DataFrame 'when' / 'otherwise' functions to replace the
SQL CASE statement.

This subquery needs to be computed first:

'select student_id from tbl_student where candidate_id = c.candidate_id and
approval_id = 2
and academic_start_date is null'

Join the tbl_student and tbl_network DataFrames on the condition above and
take the count of the result.

Overall, you can join all three tables first and then apply the rest of the
query to the same DataFrame.
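
A minimal sketch of how this approach might look in Scala follows. It is an
illustration, not the poster's actual code: the object name IsCurrentSketch,
the function withIsCurrent, the helper column pending_candidate_id, and the
assumption that the three tables are already loaded as DataFrames are all
hypothetical. Instead of taking a count, the sketch precomputes the set of
candidate_ids matched by the subquery and left-joins it, so that NOT EXISTS
becomes a null check inside when/otherwise.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object IsCurrentSketch {

  // student, teacher and network stand for tbl_student, tbl_teacher and
  // tbl_network, assumed to be loaded elsewhere (hypothetical inputs).
  def withIsCurrent(student: DataFrame,
                    teacher: DataFrame,
                    network: DataFrame): DataFrame = {

    // The subquery, computed once: candidates that have at least one student
    // row with approval_id = 2 and no academic_start_date.
    val pending = student
      .filter(col("approval_id") === 2 && col("academic_start_date").isNull)
      .select(col("candidate_id").as("pending_candidate_id")) // hypothetical helper column
      .distinct()

    val a = student.as("a")
    val b = teacher.as("b")
    val c = network.as("c")

    a.join(b, col("a.candidate_id") === col("b.candidate_id"))
      .join(c, col("c.candidate_id") === col("a.candidate_id"))
      // Left join: pending_candidate_id is null when the subquery has no match,
      // which is the NOT EXISTS case.
      .join(pending, col("c.candidate_id") === col("pending_candidate_id"), "left_outer")
      .select(
        col("a.student_id"), col("a.candidate_id"), col("a.student_name"),
        col("a.student_standard"), col("a.student_city"), col("b.teacher_name"),
        col("a.student_status"), col("a.approval_id"),
        // CASE WHEN ... THEN 'Yes' ELSE 'No' END AS is_current
        when(
          col("a.approval_id") === 2 &&
            col("a.academic_start_date").isNull &&
            col("pending_candidate_id").isNull,
          "Yes"
        ).otherwise("No").as("is_current")
      )
  }
}
```

Note that, as far as I can tell, the query taken literally yields 'No' for
every row: any row that satisfies the first two conditions is itself returned
by the subquery, because it shares its candidate_id with c. The sketch mirrors
that behavior. If the real query excludes the current row, the pending set
would need to carry the extra key as well.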


On Sat, Sep 22, 2018 at 1:08 AM Chetan Khatri 
wrote:

> Dear Spark Users,
>
> I came across a slightly odd MSSQL query that I need to replace with Spark,
> and I have no clue how to do it efficiently with Scala + Spark SQL. Can
> someone please shed some light? I can create a view of the DataFrame and run
> it as spark.sql(query), but I would like to do it the Scala + Spark way.
>
> Sample:
>
> select a.student_id, a.candidate_id, a.student_name, a.student_standard,
>        a.student_city, b.teacher_name, a.student_status, a.approval_id,
>        case when a.approval_id = 2
>                  and (a.academic_start_date is null
>                       and not exists (select student_id
>                                       from tbl_student
>                                       where candidate_id = c.candidate_id
>                                         and approval_id = 2
>                                         and academic_start_date is null))
>             then 'Yes'
>             else 'No'
>        end as is_current
> from tbl_student a
> inner join tbl_teacher b on a.candidate_id = b.candidate_id
> inner join tbl_network c on c.candidate_id = a.candidate_id
>
> Thank you.
>
>


Re: Kafka Connector version support

2018-09-21 Thread Shixiong(Ryan) Zhu
-dev
+user

We don't backport new features to a maintenance branch. All the new updates
will be in 2.4 only.

Best Regards,
Ryan

On Fri, Sep 21, 2018 at 2:44 PM, Basil Hariri <
basil.har...@microsoft.com.invalid> wrote:

> Hi all,
>
>
>
> Are there any plans to backport the recent (2.4) updates to the
> Spark-Kafka adapter for use with Spark v2.3, or will the updates just be
> for v2.4+?
>
>
>
> Thanks,
>
> Basil
>


How to do efficient self join with Spark-SQL and Scala

2018-09-21 Thread Chetan Khatri
Dear Spark Users,

I came across a slightly odd MSSQL query that I need to replace with Spark,
and I have no clue how to do it efficiently with Scala + Spark SQL. Can
someone please shed some light? I can create a view of the DataFrame and run
it as spark.sql(query), but I would like to do it the Scala + Spark way.

Sample:

select a.student_id, a.candidate_id, a.student_name, a.student_standard,
       a.student_city, b.teacher_name, a.student_status, a.approval_id,
       case when a.approval_id = 2
                 and (a.academic_start_date is null
                      and not exists (select student_id
                                      from tbl_student
                                      where candidate_id = c.candidate_id
                                        and approval_id = 2
                                        and academic_start_date is null))
            then 'Yes'
            else 'No'
       end as is_current
from tbl_student a
inner join tbl_teacher b on a.candidate_id = b.candidate_id
inner join tbl_network c on c.candidate_id = a.candidate_id

Thank you.


Lightweight pipeline execution for single row

2018-09-21 Thread Jatin Puri
Hi.

What tactics can I apply in the following scenario?

I have a pipeline of 10 stages doing simple text processing. I fit the
pipeline on the training data, do some modelling on the fitted data, and
store the results.

I also have a web server where I receive requests. For each request (a
DataFrame with a single row), I transform it with the same fitted pipeline
and take the respective action. The problem is that calling Spark for a
single row takes less than 1 second, but under higher load Spark becomes a
major bottleneck.

One solution I can think of is to re-implement the same pipeline in Scala
and, with the help of the model generated above, process the requests. But
this results in duplicated code and hence extra maintenance.

Is there any way to call the same pipeline (transform) in a very lightweight
manner for just a single row, so that it works concurrently and Spark does
not remain a bottleneck?

Thanks
Jatin


unsubscribe

2018-09-21 Thread Mario Amatucci


Mario Amatucci
Senior Software Engineer

Office: +48 12 881 10 05 x 31463   
Email: mario_amatu...@epam.com
Gdansk, Poland   epam.com

~do more with less~


From: Ryan Adams 
Sent: Thursday, September 20, 2018 7:54 PM
To: user@spark.apache.org
Subject: unsubscribe

unsubscribe

Ryan Adams
radams...@gmail.com


Spark Use Case Analysis

2018-09-21 Thread Ambi, Aniket
Hi Team,
I am trying out a use case with Spark Streaming and I am not sure if I can
solve it using Spark.

My Spark stream will listen to multiple Kafka topics, where each topic
receives various counters with different values.
I need to evaluate multiple (around 200) KPI expressions using those counters
and publish the results back to Kafka.
My problem is that, for any particular KPI, I do not know in which batch the
counters it needs will be present. And I need to calculate 200 such KPIs.
I am thinking about using Structured Streaming to keep the state, but I am
unable to fit the solution into it.
Please help.




Thanks,
Aniket



Re: Live Streamed Code Review today at 11am Pacific

2018-09-21 Thread Gourav Sengupta
Thanks a ton :)

these are absolutely the best sessions that no one should miss.

Regards,
Gourav Sengupta

On Fri, Sep 21, 2018 at 7:40 AM Holden Karau  wrote:

> I'm going to be doing this again tomorrow, Friday the 21st, at 9am -
> https://www.youtube.com/watch?v=xb2FsHaozVQ / http://twitch.tv/holdenkarau
> :) As always if you have anything you want me to look at in particular send
> me a message. https://github.com/apache/spark/pull/22275 (Arrow
> out-of-order batches) is my current plan to start with :)

Re: Spark2 DynamicAllocation doesn't release executors that used cache

2018-09-21 Thread Sergejs Andrejevs
Has anybody tried dynamic allocation with executors that use the cache?






Re: Live Streamed Code Review today at 11am Pacific

2018-09-21 Thread Holden Karau
I'm going to be doing this again tomorrow, Friday the 21st, at 9am -
https://www.youtube.com/watch?v=xb2FsHaozVQ / http://twitch.tv/holdenkarau
:) As always if you have anything you want me to look at in particular send
me a message. https://github.com/apache/spark/pull/22275 (Arrow
out-of-order batches) is my current plan to start with :)

On Thu, Jul 19, 2018 at 11:38 PM Holden Karau  wrote:

> Heads up: tomorrow's Friday review is going to be at 8:30 am instead of 9:30
> am because I had to move some flights around.
>
> On Fri, Jul 13, 2018 at 12:03 PM, Holden Karau 
> wrote:
>
>> This afternoon @ 3pm pacific I'll be looking at review tooling for Spark
>> & Beam https://www.youtube.com/watch?v=ff8_jbzC8JI.
>>
>> Next week's regular Friday code (this time July 20th @ 9:30am pacific)
>> review will once again probably have more of an ML focus for folks
>> interested in watching Spark ML PRs be reviewed -
>>  https://www.youtube.com/watch?v=aG5h99yb6XE
>> 
>>
>> Next week I'll have a live coding session with more of a Beam focus if
>> you want to see something a bit different (but still related since Beam
>> runs on Spark) with a focus on Python dependency management (which is a
>> thing we are also exploring in Spark at the same time) -
>> https://www.youtube.com/watch?v=Sv0XhS2pYqA on July 19th at 2pm pacific.
>>
>> P.S.
>>
>> More generally, you can follow me as holdenkarau on YouTube and
>> holdenkarau on Twitch to be notified even when I forget to send out the
>> emails (which is pretty often).
>>
>> This morning I did another live review session I forgot to ping the
>> list about (
>> https://www.youtube.com/watch?v=M_lRFptcGTI&list=PLRLebp9QyZtYF46jlSnIu2x1NDBkKa2uw&index=31
>>  )
>> and yesterday I did some live coding using PySpark and working on Sparkling
>> ML -
>> https://www.youtube.com/watch?v=kCnBDpNce9A&list=PLRLebp9QyZtYF46jlSnIu2x1NDBkKa2uw&index=32
>>
>> On Wed, Jun 27, 2018 at 10:44 AM, Holden Karau 
>> wrote:
>>
>>> Today @ 1:30pm pacific I'll be looking at the current Spark 2.1.3 RC and
>>> see how we validate Spark releases -
>>> https://www.twitch.tv/events/VAg-5PKURQeH15UAawhBtw /
>>> https://www.youtube.com/watch?v=1_XLrlKS26o . Tomorrow @ 12:30 live PR
>>> reviews & Monday live coding - https://youtube.com/user/holdenkarau &
>>> https://www.twitch.tv/holdenkarau/events . Hopefully this can encourage
>>> more folks to help with RC validation & PR reviews :)
>>>
>>> On Thu, Jun 14, 2018 at 6:07 AM, Holden Karau 
>>> wrote:
>>>
>>>> Next week is pride in San Francisco but I'm still going to do two quick
>>>> sessions. One will be live coding with Apache Spark to collect ASF diversity
>>>> information ( https://www.youtube.com/watch?v=OirnFnsU37A /
>>>> https://www.twitch.tv/events/O1edDMkTRBGy0I0RCK-Afg ) on Monday at 9am
>>>> pacific and the other will be the regular Friday code review (
>>>> https://www.youtube.com/watch?v=IAWm4OLRoyY /
>>>> https://www.twitch.tv/events/v0qzXxnNQ_K7a8JYFsIiKQ ) also at 9am.
>>>>
>>>> On Thu, Jun 7, 2018 at 9:10 PM, Holden Karau
>>>> wrote:
>>>>
>>>>> I'll be doing another one tomorrow morning at 9am pacific focused on
>>>>> Python + K8s support & improved JSON support -
>>>>> https://www.youtube.com/watch?v=Z7ZEkvNwneU &
>>>>> https://www.twitch.tv/events/xU90q9RGRGSOgp2LoNsf6A :)
>>>>>
>>>>> On Fri, Mar 9, 2018 at 3:54 PM, Holden Karau
>>>>> wrote:
>>>>>
>>>>>> If anyone wants to watch the recording:
>>>>>> https://www.youtube.com/watch?v=lugG_2QU6YU
>>>>>>
>>>>>> I'll do one next week as well - March 16th @ 11am -
>>>>>> https://www.youtube.com/watch?v=pXzVtEUjrLc
>>>>>>
>>>>>> On Fri, Mar 9, 2018 at 9:28 AM, Holden Karau
>>>>>> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> If you're curious about how Spark is developed, I’m going to
>>>>>>> experiment with doing a live code review where folks can watch and
>>>>>>> see how that part of our process works. I have two volunteers already
>>>>>>> for having their PRs looked at live, and if you have a Spark PR you're
>>>>>>> working on that you’d like me to livestream a review of, please ping me.
>>>>>>>
>>>>>>> The livestream will be at
>>>>>>> https://www.youtube.com/watch?v=lugG_2QU6YU.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Holden :)
>>>>>>> --
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau

>>>
>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>


-- 
Twitter: