Re: Webinar 3: An introduction to the Spark runner for Apache Beam

2020-05-21 Thread Aizhamal Nurmamat kyzy
Hi all,

You can watch the recording from yesterday's webinar here:
https://www.youtube.com/watch?v=XI9Y85qks1w

You can also find links to the older recordings and presentation slides
from the series in this repo: https://github.com/aijamalnk/beam-learning-month

On Tue, May 19, 2020 at 10:44 PM Aizhamal Nurmamat kyzy 
wrote:

> Hi again!
>
> As has become tradition this month, I am sending you a reminder
> about tomorrow's webinar, 'Introduction to the Spark Runner for Apache
> Beam' by our dear Ismael Mejia. We start tomorrow at 10:00am PST / 5:00pm
> GMT / 1:00pm EST.
>
> You can join by signing up here:
> https://learn.xnextcon.com/event/eventdetails/W20052010
>
> If you cannot get into the meeting room on Zoom, you can go to this
> YouTube channel for the same livestream, but we encourage attendees to
> join Zoom to be able to ask speakers questions.
>
> The webinar will be recorded and posted on Beam's YT channel later on Wed,
> and all the resources used during the presentation will be shared on this
> repo:
> https://github.com/aijamalnk/beam-learning-month/blob/master/README.md
>
> Thanks,
> Aizhamal
>


Re: Best approach for Sql queries in Beam

2020-05-21 Thread bharghavi vajrala
Hi Brian,

Yes, it would be easy if I could use a single SQL query.

I did try, but it didn't work. I don't remember the error exactly, but it
was something related to windows. I am unable to reference more than 2
tables in one PTransform; each SqlTransform can hold only a single join,
nothing more, and not even a group by.
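
Something like the single combined query below is what I would want to run.
This is only a sketch: the nested join/aggregate shape and the `num` /
`tr_num` columns loosely follow my sample code and are illustrative, and
whether Beam SQL accepts this over session-windowed inputs is exactly what
failed for me.

```java
// Sketch of one combined query replacing the three chained SqlTransforms.
// Built as a plain string here so it can be inspected; in the pipeline it
// would be passed to SqlTransform.query(...) applied to a PCollectionTuple
// tagged TABLE1..TABLE4. Table and column names are hypothetical and are
// not tested against Beam SQL.
public class SingleQuerySketch {

    static final String COMBINED_QUERY =
        "select distinct m.col1, m.col2, m.D_col3, m.C_col3 from ( "
      + "  select t1.col1, t1.col2, t1.num, "
      + "    max(case when t1.col3 = 'D' then t1.col8 end) as D_col3, "
      + "    max(case when t1.col3 = 'C' then t1.col8 end) as C_col3 "
      + "  from TABLE1 t1 join TABLE2 t2 on t1.col5 = t2.col7 "
      + "  group by t1.col1, t1.col2, t1.num "
      + ") m join TABLE4 d2 on m.num = d2.tr_num";

    public static void main(String[] args) {
        // Print the query so it can be checked before wiring it into
        // SqlTransform.query(COMBINED_QUERY).
        System.out.println(COMBINED_QUERY);
    }
}
```

If Beam SQL does accept this shape, the three separately windowed transforms
would collapse into one, which is the simplification Brian suggests below.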

On Fri, 22 May 2020, 01:24 Brian Hulette,  wrote:

> It might help if you share what exactly you are hoping to improve upon. Do
> you want to avoid the separate sessionWindow transforms? Do it all with one
> SQL query? Or just have it be generally more concise/readable?
>
> I'd think it would be possible to do this with a single SQL query but I'm
> not sure, maybe the windowing would mess things up. Did you try joining all
> 5 streams with a single query and run into a problem?
>
> Brian
>
> On Tue, May 19, 2020 at 9:38 AM bharghavi vajrala 
> wrote:
>
>> Hi All,
>>
>> Need your inputs on the scenario below:
>>
>> Source: Kafka (the actual source is an Oracle DB; data is pushed to Kafka)
>> SDK: Java
>> Runner: Flink
>> Problem: Subscribe to 5 topics (tables), join them on different keys, and
>> group by a few columns.
>> Existing solution: Using a session window of 20 seconds, with a separate
>> SqlTransform for each two-table query, chaining the results.
>>
>> Below is the sample code:
>>
>> Sessions sessionWindow =
>>     Sessions.withGapDuration(Duration.standardSeconds(20));
>>
>> PCollection<Row> stream1 =
>>     PCollectionTuple
>>         .of(new TupleTag<>("TABLE1"), rows1)
>>         .and(new TupleTag<>("TABLE2"), rows2)
>>         .apply("rows1-rows2", SqlTransform.query(
>>             "select t1.col1, t1.col2, t2.col5 from " +
>>             "TABLE1 t1 join TABLE2 t2 " +
>>             "on t1.col5 = t2.col7"))
>>         .apply("window", Window.into(sessionWindow));
>>
>> PCollection<Row> mergedStream =
>>     PCollectionTuple
>>         .of(new TupleTag<>("MERGE-TABLE"), stream1)
>>         .apply("merge", SqlTransform.query(
>>             "select col1, col2, " +
>>             "max(case when col3='D' then col8 end) as D_col3, " +
>>             "max(case when col3='C' then col8 end) as C_col3, " +
>>             "max(case when col6='CP' then col10 end) as CP_col6, " +
>>             "max(case when col6='DP' then col10 end) as DP_col6 " +
>>             "from MERGE-TABLE " +
>>             "group by col1, col2"))
>>         .apply("merge-window", Window.into(sessionWindow));
>>
>> PCollection<Row> stream2 =
>>     PCollectionTuple
>>         .of(new TupleTag<>("TABLE3"), mergedStream)
>>         .and(new TupleTag<>("TABLE4"), stream22)
>>         .apply(SqlTransform.query(
>>             "select distinct c1, c2, c4 from " +
>>             "TABLE3 d1 join TABLE4 d2 " +
>>             "on d1.num = d2.tr_num"))
>>         .apply("e-window", Window.into(sessionWindow));
>>
>>
>> Is there any better approach?
>>
>> Looking forward to your suggestions.
>>
>> Thanks!!
>>
>>


Re: Best approach for Sql queries in Beam

2020-05-21 Thread Brian Hulette
It might help if you share what exactly you are hoping to improve upon. Do
you want to avoid the separate sessionWindow transforms? Do it all with one
SQL query? Or just have it be generally more concise/readable?

I'd think it would be possible to do this with a single SQL query but I'm
not sure, maybe the windowing would mess things up. Did you try joining all
5 streams with a single query and run into a problem?

Brian

On Tue, May 19, 2020 at 9:38 AM bharghavi vajrala 
wrote:

> Hi All,
>
> Need your inputs on the scenario below:
>
> Source: Kafka (the actual source is an Oracle DB; data is pushed to Kafka)
> SDK: Java
> Runner: Flink
> Problem: Subscribe to 5 topics (tables), join them on different keys, and
> group by a few columns.
> Existing solution: Using a session window of 20 seconds, with a separate
> SqlTransform for each two-table query, chaining the results.
>
> Below is the sample code:
>
> Sessions sessionWindow =
>     Sessions.withGapDuration(Duration.standardSeconds(20));
>
> PCollection<Row> stream1 =
>     PCollectionTuple
>         .of(new TupleTag<>("TABLE1"), rows1)
>         .and(new TupleTag<>("TABLE2"), rows2)
>         .apply("rows1-rows2", SqlTransform.query(
>             "select t1.col1, t1.col2, t2.col5 from " +
>             "TABLE1 t1 join TABLE2 t2 " +
>             "on t1.col5 = t2.col7"))
>         .apply("window", Window.into(sessionWindow));
>
> PCollection<Row> mergedStream =
>     PCollectionTuple
>         .of(new TupleTag<>("MERGE-TABLE"), stream1)
>         .apply("merge", SqlTransform.query(
>             "select col1, col2, " +
>             "max(case when col3='D' then col8 end) as D_col3, " +
>             "max(case when col3='C' then col8 end) as C_col3, " +
>             "max(case when col6='CP' then col10 end) as CP_col6, " +
>             "max(case when col6='DP' then col10 end) as DP_col6 " +
>             "from MERGE-TABLE " +
>             "group by col1, col2"))
>         .apply("merge-window", Window.into(sessionWindow));
>
> PCollection<Row> stream2 =
>     PCollectionTuple
>         .of(new TupleTag<>("TABLE3"), mergedStream)
>         .and(new TupleTag<>("TABLE4"), stream22)
>         .apply(SqlTransform.query(
>             "select distinct c1, c2, c4 from " +
>             "TABLE3 d1 join TABLE4 d2 " +
>             "on d1.num = d2.tr_num"))
>         .apply("e-window", Window.into(sessionWindow));
>
>
> Is there any better approach?
>
> Looking forward to your suggestions.
>
> Thanks!!
>
>


Re: New Dates For Beam Summit Digital 2020

2020-05-21 Thread Maximilian Michels
Thanks Matthias!

We realized we want this to be a much more community-driven process.
That's why we are planning to be more transparent to give everyone a
chance to get involved in the summit. Given that we now have more time,
this will be much more feasible.

Cheers,
Max

On 18.05.20 20:00, Matthias Baetens wrote:
> Dear Beam community,
> 
> A few weeks ago, we announced the dates for the Beam Digital Summit and
> we know the community received this news with excitement. This is a
> great opportunity to create and share content about streaming analytics
> and the solutions that teams around the world have created using Apache
> Beam and its ecosystem. 
> 
> We have chosen August 24-28th as the new dates. While this has been a
> difficult decision, we think it’s the right decision to ensure we
> produce the best possible event. We encourage you to send your talk
> proposals: anything from use cases and lightning talks to workshop ideas.
> 
> Based on this change, the CFP will remain open until June 15th. We would
> love to hear about what you are doing with Beam, how to improve it, and
> how to strengthen our community.
> 
> We thank you for your understanding! See you soon!
> 
> -Griselda Cuevas, Brittany Hermann, Maximilian Michels, Austin Bennett,
> Matthias Baetens, Alex Van Boxel
>