Re: Apache Beam UI job creator

2018-10-09 Thread Karan Kumar
Hi Andrew,
This is fantastic news. Yes, we can definitely collaborate, as we also plan
to open source whatever we are doing. We already have two people deep diving
into the streamline codebase. This would help them take the next leap.
I am reaching out to you from my official mail ID.

Thanks
Karan

On Wed, Oct 10, 2018 at 8:20 AM Andrew Psaltis wrote:

> Hi Karan,
> I'm familiar with the streamline codebase and have been a huge proponent
> of having an open source UI to allow the construction of real-time
> pipelines. I truly believe that this should be built on top of Beam. I was
> actually planning on forking the streamline codebase and pursuing this type
> of idea, but have been delayed due to other commitments.
>
> Is this something you would want to collaborate on?
>
> Best,
> Andrew
>
> On Tue, Oct 9, 2018 at 3:13 PM Jean-Baptiste Onofré wrote:
>
>> Hi,
>>
>> I don't know of any open source UI right now.
>>
>> Regards
>> JB
>>
>> On 09/10/2018 04:09, Karan Kumar wrote:
>> > Hi Juan and Jean,
>> > Thanks for the reply. We were looking to adopt an open source codebase.
>> > Any pointers in that direction?
>> >
>> >
>> > On Mon, Oct 8, 2018 at 9:05 PM Jean-Baptiste Onofré wrote:
>> >
>> > Hi
>> >
>> > We have such a tool at Talend (named datastreams), already available
>> > (beta) as an Amazon AMI.
>> >
>> > Regards
>> > JB
>> > On Oct 8, 2018, at 12:24, Karan Kumar wrote:
>> >
>> > Hello
>> >
>> > We want to expose a GUI for our engineers/business analysts to
>> > create real-time pipelines using drag-and-drop constructs.
>> > Projects such as https://github.com/TouK/nussknacker for Flink
>> > and https://github.com/hortonworks/streamline for Storm match
>> > our requirements.
>> >
>> > We wanted to understand whether a UI job creator is on the roadmap
>> > for the Beam community, or whether there are any projects that have
>> > taken a stab at solving this problem.
>> >
>> > --
>> > Thanks
>> > Karan
>> >
>> >
>> >
>> > --
>> > Thanks
>> > Karan
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

-- 
Thanks
Karan


Re: Apache Beam UI job creator

2018-10-09 Thread Andrew Psaltis
Hi Karan,
I'm familiar with the streamline codebase and have been a huge proponent of
having an open source UI to allow the construction of real-time pipelines.
I truly believe that this should be built on top of Beam. I was actually
planning on forking the streamline codebase and pursuing this type of idea,
but have been delayed due to other commitments.

Is this something you would want to collaborate on?

Best,
Andrew

On Tue, Oct 9, 2018 at 3:13 PM Jean-Baptiste Onofré  wrote:

> Hi,
>
> I don't know of any open source UI right now.
>
> Regards
> JB
>
> On 09/10/2018 04:09, Karan Kumar wrote:
> > Hi Juan and Jean,
> > Thanks for the reply. We were looking to adopt an open source codebase.
> > Any pointers in that direction?
> >
> >
> > On Mon, Oct 8, 2018 at 9:05 PM Jean-Baptiste Onofré wrote:
> >
> > Hi
> >
> > We have such a tool at Talend (named datastreams), already available
> > (beta) as an Amazon AMI.
> >
> > Regards
> > JB
> > On Oct 8, 2018, at 12:24, Karan Kumar wrote:
> >
> > Hello
> >
> > We want to expose a GUI for our engineers/business analysts to
> > create real-time pipelines using drag-and-drop constructs.
> > Projects such as https://github.com/TouK/nussknacker for Flink
> > and https://github.com/hortonworks/streamline for Storm match
> > our requirements.
> >
> > We wanted to understand whether a UI job creator is on the roadmap
> > for the Beam community, or whether there are any projects that have
> > taken a stab at solving this problem.
> >
> > --
> > Thanks
> > Karan
> >
> >
> >
> > --
> > Thanks
> > Karan
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Vishwas Bm
Hi Kenn,

We are using Beam 2.6 and using spark-submit to submit jobs to a Spark 2.2
cluster on Kubernetes.


On Tue, Oct 9, 2018, 9:29 PM Kenneth Knowles  wrote:

> Thanks for the report! I filed
> https://issues.apache.org/jira/browse/BEAM-5690 to track the issue.
>
> Can you share what version of Beam you are using?
>
> Kenn
>
> On Tue, Oct 9, 2018 at 3:18 AM Vishwas Bm  wrote:
>
>> We are trying to set up a pipeline using BeamSql with the default trigger
>> (AfterWatermark, which fires once the watermark passes the end of the window).
>> Below is the pipeline:
>>
>>    KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql ---> KafkaSink (KafkaIO)
>>
>> We are using the Spark Runner for this.
>> The BeamSql query is:
>>  select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3
>>
>> We are grouping by Col3, which is a string. It can hold the values string0
>> through string9.
>>
>> The records are emitted to the Kafka sink every minute, but the output
>> records in Kafka are not as expected.
>> Below is the observed output (WST and WET indicate the window start time
>> and window end time):
>>
>> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>>
>> *{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}*
>>
>> We ran the same pipeline with the Direct and Flink runners and did not see
>> any 0 entries for count_col1.
>>
>> According to the Beam capability matrix page (
>> https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what),
>> GroupByKey is not fully supported. Is this one of those cases?
>>
>> *Thanks & Regards,*
>>
>> *Vishwas *
>>
>>


Re: Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles
Thanks for the report! I filed
https://issues.apache.org/jira/browse/BEAM-5690 to track the issue.

Can you share what version of Beam you are using?

Kenn

On Tue, Oct 9, 2018 at 3:18 AM Vishwas Bm  wrote:

> We are trying to set up a pipeline using BeamSql with the default trigger
> (AfterWatermark, which fires once the watermark passes the end of the window).
> Below is the pipeline:
>
>    KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql ---> KafkaSink (KafkaIO)
>
> We are using the Spark Runner for this.
> The BeamSql query is:
>  select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3
>
> We are grouping by Col3, which is a string. It can hold the values string0
> through string9.
>
> The records are emitted to the Kafka sink every minute, but the output
> records in Kafka are not as expected.
> Below is the observed output (WST and WET indicate the window start time
> and window end time):
>
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
>
> *{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}*
>
> We ran the same pipeline with the Direct and Flink runners and did not see
> any 0 entries for count_col1.
>
> According to the Beam capability matrix page (
> https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what),
> GroupByKey is not fully supported. Is this one of those cases?
>
> *Thanks & Regards,*
>
> *Vishwas *
>
>


Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Vishwas Bm
We are trying to set up a pipeline using BeamSql with the default trigger
(AfterWatermark, which fires once the watermark passes the end of the window).
Below is the pipeline:

   KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql ---> KafkaSink (KafkaIO)

We are using the Spark Runner for this.
The BeamSql query is:
 select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3

We are grouping by Col3, which is a string. It can hold the values string0
through string9.
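
For reference, the per-window result this query is expected to produce can be sketched in plain Python, without Beam. This is only an illustration under assumptions: the event tuples and the `fixed_window_counts` helper are made up for the example and are not taken from the actual pipeline or from any Beam API.

```python
from collections import Counter

WINDOW_SECONDS = 60  # mirrors the 1-minute fixed window described above


def fixed_window_counts(events, window_seconds=WINDOW_SECONDS):
    """Count events per (window_start, Col3).

    `events` is an iterable of (epoch_seconds, col3) tuples. This shape is
    an assumption for the sketch, not the real Kafka record format.
    """
    counts = Counter()
    for ts, col3 in events:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        counts[(window_start, col3)] += 1
    return dict(counts)


# Hypothetical input: three events in the first window, one in the next.
events = [(0, "string6"), (10, "string7"), (59, "string7"), (60, "string6")]
result = fixed_window_counts(events)
print(result)
```

In this model each (window, Col3) pair appears exactly once and every count is at least 1, because a group only exists if at least one row fell into it.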

The records are emitted to the Kafka sink every minute, but the output
records in Kafka are not as expected.
Below is the observed output (WST and WET indicate the window start time
and window end time):

{"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}

*{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}*

We ran the same pipeline with the Direct and Flink runners and did not see
any 0 entries for count_col1.
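
One way to compare runners for this symptom is to scan the sink output for zero counts and for groups that were emitted more than once. The helper below is a rough sketch, not part of the pipeline: `find_anomalies` is a hypothetical name, and it assumes each Kafka record is a single JSON object with the field names shown in the output above. The window strings are treated as opaque values and copied as they appear in this mail.

```python
import json
from collections import defaultdict


def find_anomalies(lines):
    """Return (zero_count_rows, duplicated_groups) for JSON output lines.

    A group is identified by (Col3, WST, WET); the window fields are
    compared as plain strings, with no timestamp parsing.
    """
    zero_rows = []
    seen = defaultdict(int)
    for line in lines:
        row = json.loads(line)
        key = (row["Col3"], row["WST"], row["WET"])
        seen[key] += 1
        if row["count_col1"] == 0:
            zero_rows.append(row)
    duplicated = {k: n for k, n in seen.items() if n > 1}
    return zero_rows, duplicated


# A shortened sample shaped like the Spark runner output reported above.
sample = [
    '{"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}',
    '{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}',
    '{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}',
    '{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   +","WET":"2018-10-09  09-56-00   +"}',
]
zero_rows, duplicated = find_anomalies(sample)
print(len(zero_rows), duplicated)
```

On output from a runner that behaves as expected, both return values should be empty.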

According to the Beam capability matrix page (
https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what),
GroupByKey is not fully supported. Is this one of those cases?

*Thanks & Regards,*

*Vishwas *


Re: Apache Beam UI job creator

2018-10-09 Thread Jean-Baptiste Onofré
Hi,

I don't know of any open source UI right now.

Regards
JB

On 09/10/2018 04:09, Karan Kumar wrote:
> Hi Juan and Jean,
> Thanks for the reply. We were looking to adopt an open source codebase.
> Any pointers in that direction?
>
>
> On Mon, Oct 8, 2018 at 9:05 PM Jean-Baptiste Onofré wrote:
>
> Hi
>
> We have such a tool at Talend (named datastreams), already available
> (beta) as an Amazon AMI.
>
> Regards
> JB
> On Oct 8, 2018, at 12:24, Karan Kumar wrote:
> 
> Hello
> 
> We want to expose a GUI for our engineers/business analysts to
> create real-time pipelines using drag-and-drop constructs.
> Projects such as https://github.com/TouK/nussknacker for Flink
> and https://github.com/hortonworks/streamline for Storm match
> our requirements.
>
> We wanted to understand whether a UI job creator is on the roadmap
> for the Beam community, or whether there are any projects that have
> taken a stab at solving this problem.
> 
> -- 
> Thanks
> Karan 
> 
> 
> 
> -- 
> Thanks
> Karan

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com