Re: Apache Samza Meetup - March 4 @6PM hosted at LinkedIn's campus in Mountain View CA

2015-03-02 Thread Chris Riccomini
Hey all,

Replying to this message in case anyone missed it. Appears that GMail
thinks Ed is spam. :)

Cheers,
Chris

On Mon, Mar 2, 2015 at 9:58 AM, Ed Yakabosky <
eyakabo...@linkedin.com.invalid> wrote:

> Hi all -
>
> I would like to announce the first Bay Area Apache Samza Meetup<
> http://www.meetup.com/Bay-Area-Samza-Meetup/events/220354853/> hosted at
> LinkedIn in Mountain View, CA on March 4, 2015 @6PM.  We plan to host the
> event every 2-months to encourage knowledge sharing & collaboration in
> Samza’s usage and open source<
> http://samza.apache.org/> community.
>
> The agenda for the meetup is::
>
>   *   6:00 – 6:15PM: Doors open, sign NDAs, networking, food & drinks
>   *   6:15 - 6:45PM: Naveen Somasundaram (LinkedIn) – Getting Started with
> Apache Samza
>   *   6:45 - 7:30PM: Shekar Tippur (Intuit) – Powering Contextual Alerts
> for Intuit’s Operations Center with Apache Samza
>   *   7:30 - 8:00PM: Chris Riccomini (LinkedIn) – Where we’re headed next:
> Apache Samza Roadmap
>
> We plan to provide food & drinks so please RSVP here<
> http://www.meetup.com/Bay-Area-Samza-Meetup/events/220354853/> to help us
> with estimation.  Please let me know if you have any questions or ideas for
> future meet ups.
>
> We plan to announce a live stream the day of the event for remote
> attendance.
>
> Excited to see you there!
> Ed Yakabosky
>
> [BCC:
> Kafka Open Source
> Samza Open Source
> LinkedIn’s DDS and DAI teams
> Linkedin’s Samza customers
> Tech-Talk]
>


Review Request 31649: SAMZA-584

2015-03-02 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31649/
---

Review request for samza.


Bugs: SAMZA-584
https://issues.apache.org/jira/browse/SAMZA-584


Repository: samza


Description
---

fixing race condition


Diffs
-

  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumer.scala
 38117e2193dd99fe08f0be9d7736cb89575c0723 

Diff: https://reviews.apache.org/r/31649/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Handling defaults and windowed aggregates in stream queries

2015-03-02 Thread Milinda Pathirage
Hi Yi,

As I understand rules and re-writes basically do the same thing
(changing/re-writing the operator tree). But in case of rules this happens
during planning based on the query planner configuration. And re-writing is
done on the planner output, after the query goes through the planner. In
Calcite re-write is happening inside the interpreter and in our case it
will be inside the query plan to operator router conversion phase.

Thanks
Milinda

On Mon, Mar 2, 2015 at 2:31 PM, Yi Pan  wrote:

> Hi, Milinda,
>
> +1 on your default window idea. One question: what's the difference between
> a rule and a re-write?
>
> Thanks!
>
> On Mon, Mar 2, 2015 at 7:14 AM, Milinda Pathirage 
> wrote:
>
> > @Chris
> > Yes, I was referring to that mail. Actually I was wrong about the ‘Now’
> > window, it should be a ‘Unbounded’ window for most the default scenarios
> > (Section 6.4 of https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf).
> > Because
> > applying a ‘Now’ window with size of 1 will double the number of events
> > generated if we consider insert/delete streams. But ‘Unbounded’ will only
> > generate insert events.
> >
> > @Yi
> > 1. You are correct about Calcite.There is no stream-to-relation
> conversion
> > happening. But as I understand we don’t need Calcite to support this. We
> > can add it to our query planner as a rule or re-write. What I am not sure
> > is whether to use a rule or a re-write.
> > 2. There is a rule in Calcite which extract the Window out from the
> > Project. But I am not sure why that didn’t happen in my test. This rule
> is
> > added to the planner by default. I’ll ask about this in Calcite mailing
> > list.
> >
> > I think we can figure out a way to move the window to the input stream if
> > Calcite can move the window out from Project. I’ll see how we can do
> this.
> >
> > Also I’ll go ahead and implement default windows. We can change it later
> if
> > Julian or someone from Calcite comes up with a better suggestion.
> >
> > Thanks
> > Milinda
> >
> > On Sun, Mar 1, 2015 at 8:23 PM, Yi Pan  wrote:
> >
> > > Hi, Milinda,
> > >
> > > Sorry to reply late on this. Here are some of my comments:
> > > 1) In Calcite's model, it seems that there is no stream-to-relation
> > > conversion step. In the first example where the window specification is
> > > missing, I like your solution to add the default LogicalNowWindow
> > operator
> > > s.t. it makes the physical operator matches the query plan. However, if
> > > Calcite community does not agree to add the default LogicalNowWindow,
> it
> > > would be fine for us if we always insert a default "now" window on a
> > stream
> > > when we generate the Samza configuration.
> > > 2) I am more concerned on the other cases, where window operator is
> used
> > in
> > > aggregation and join. In your example of windowed aggregation in
> Calcite,
> > > window spec seems to be a decoration to the LogicalProject operator,
> > > instead of defining a data source to the LogicalProject operator. In
> the
> > > CQL model we followed, the window operator is considered as a query
> > > primitive that generate a data source for other relation operators to
> > > consume. How exactly is window operator used in Calcite planner? Isn't
> it
> > > much clear if the following is used?
> > > LogicalProject(EXPR$0=[CASE(>(COUNT($2), 0), CAST($SUM0($2)):INTEGER,
> > > null)])
> > >LogicalWindow(ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
> > >
> > > On Thu, Feb 26, 2015 at 12:18 PM, Milinda Pathirage <
> > mpath...@umail.iu.edu
> > > >
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I ask about $subject in calcite-dev. You can find the archived
> > discussion
> > > > at [1]. I think your thoughts are also valuable in this discussion in
> > > > calcite list.
> > > >
> > > > I discovered the requirement for a default window operator when I
> tried
> > > to
> > > > integrate streamscan (I was using tablescan prevously) into the
> > physical
> > > > plan generation logic. Because of the way we have written the
> > > > OperatorRouter API, we always need a stream-to-relation operator at
> the
> > > > input. But Calcite generates a query plan like following:
> > > >
> > > > LogicalDelta
> > > >   LogicalProject(id=[$0], product=[$1], quantity=[$2])
> > > > LogicalFilter(condition=[>($2, 5)])
> > > >
> > > >   StreamScan(table=[[KAFKA, ORDERS]], fields=[[0, 1, 2]])
> > > >
> > > > If we consider LogicalFilter as a relation operator, we need
> something
> > to
> > > > convert input stream to a relation before sending the tuples
> > downstream.
> > > > In addition to this, there is a optimization where we consider filter
> > > > operator as a tuple operator and have it between StreamScan and
> > > > stream-to-relation operator as a way of reducing the amount of
> messages
> > > > going downstream.
> > > >
> > > > Other scenario is windowed aggregates. Currently window spec is
> > attached
> > > to
> > > > the LogicalProject in query plan like following:
> > > >
>

Re: Handling defaults and windowed aggregates in stream queries

2015-03-02 Thread Yi Pan
Hi, Milinda,

+1 on your default window idea. One question: what's the difference between
a rule and a re-write?

Thanks!

On Mon, Mar 2, 2015 at 7:14 AM, Milinda Pathirage 
wrote:

> @Chris
> Yes, I was referring to that mail. Actually I was wrong about the ‘Now’
> window, it should be a ‘Unbounded’ window for most the default scenarios
> (Section 6.4 of https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf).
> Because
> applying a ‘Now’ window with size of 1 will double the number of events
> generated if we consider insert/delete streams. But ‘Unbounded’ will only
> generate insert events.
>
> @Yi
> 1. You are correct about Calcite.There is no stream-to-relation conversion
> happening. But as I understand we don’t need Calcite to support this. We
> can add it to our query planner as a rule or re-write. What I am not sure
> is whether to use a rule or a re-write.
> 2. There is a rule in Calcite which extract the Window out from the
> Project. But I am not sure why that didn’t happen in my test. This rule is
> added to the planner by default. I’ll ask about this in Calcite mailing
> list.
>
> I think we can figure out a way to move the window to the input stream if
> Calcite can move the window out from Project. I’ll see how we can do this.
>
> Also I’ll go ahead and implement default windows. We can change it later if
> Julian or someone from Calcite comes up with a better suggestion.
>
> Thanks
> Milinda
>
> On Sun, Mar 1, 2015 at 8:23 PM, Yi Pan  wrote:
>
> > Hi, Milinda,
> >
> > Sorry to reply late on this. Here are some of my comments:
> > 1) In Calcite's model, it seems that there is no stream-to-relation
> > conversion step. In the first example where the window specification is
> > missing, I like your solution to add the default LogicalNowWindow
> operator
> > s.t. it makes the physical operator matches the query plan. However, if
> > Calcite community does not agree to add the default LogicalNowWindow, it
> > would be fine for us if we always insert a default "now" window on a
> stream
> > when we generate the Samza configuration.
> > 2) I am more concerned on the other cases, where window operator is used
> in
> > aggregation and join. In your example of windowed aggregation in Calcite,
> > window spec seems to be a decoration to the LogicalProject operator,
> > instead of defining a data source to the LogicalProject operator. In the
> > CQL model we followed, the window operator is considered as a query
> > primitive that generate a data source for other relation operators to
> > consume. How exactly is window operator used in Calcite planner? Isn't it
> > much clear if the following is used?
> > LogicalProject(EXPR$0=[CASE(>(COUNT($2), 0), CAST($SUM0($2)):INTEGER,
> > null)])
> >LogicalWindow(ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
> >
> > On Thu, Feb 26, 2015 at 12:18 PM, Milinda Pathirage <
> mpath...@umail.iu.edu
> > >
> > wrote:
> >
> > > Hi devs,
> > >
> > > I ask about $subject in calcite-dev. You can find the archived
> discussion
> > > at [1]. I think your thoughts are also valuable in this discussion in
> > > calcite list.
> > >
> > > I discovered the requirement for a default window operator when I tried
> > to
> > > integrate streamscan (I was using tablescan prevously) into the
> physical
> > > plan generation logic. Because of the way we have written the
> > > OperatorRouter API, we always need a stream-to-relation operator at the
> > > input. But Calcite generates a query plan like following:
> > >
> > > LogicalDelta
> > >   LogicalProject(id=[$0], product=[$1], quantity=[$2])
> > > LogicalFilter(condition=[>($2, 5)])
> > >
> > >   StreamScan(table=[[KAFKA, ORDERS]], fields=[[0, 1, 2]])
> > >
> > > If we consider LogicalFilter as a relation operator, we need something
> to
> > > convert input stream to a relation before sending the tuples
> downstream.
> > > In addition to this, there is a optimization where we consider filter
> > > operator as a tuple operator and have it between StreamScan and
> > > stream-to-relation operator as a way of reducing the amount of messages
> > > going downstream.
> > >
> > > Other scenario is windowed aggregates. Currently window spec is
> attached
> > to
> > > the LogicalProject in query plan like following:
> > >
> > > LogicalProject(EXPR$0=[CASE(>(COUNT($2) OVER (ROWS BETWEEN 2 PRECEDING
> > AND
> > > 2 FOLLOWING), 0), CAST($SUM0($2) OVER (ROWS BETWEEN 2 PRECEDING AND 2
> > > FOLLOWING)):INTEGER, null)])
> > >
> > > I wanted to know from them whether it is possible to move window
> > operation
> > > just after the stream scan, so that it is compatible with our operator
> > > layer.
> > > May be there are better or easier ways to do this. So your comments are
> > > always welcome.
> > >
> > > Thanks
> > > Milinda
> > >
> > >
> > > [1]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-calcite-dev/201502.mbox/browser
> > >
> > > --
> > > Milinda Pathirage
> > >
> > > PhD Student | Research Assistant
> > > School of 

Re: Stream SQL for Samza Query Language Guide and Design

2015-03-02 Thread Chris Riccomini
Hey all,

Just closing the loop on this. I've migrated the wiki to:

  https://cwiki.apache.org/confluence/display/SAMZA/Apache+Samza

The website has been updated accordingly.

Cheers,
Chris

On Wed, Feb 18, 2015 at 2:45 PM, Chris Riccomini 
wrote:

> Hey all,
>
> I've setup a request with INFRA to migrate to CWiki:
>
> https://issues.apache.org/jira/browse/INFRA-9176
>
> Cheers,
> Chris
>
> On Tue, Feb 17, 2015 at 9:57 AM, Yi Pan  wrote:
>
>> I shared the same pain with wiki before. Either cwiki or Markdown sounds
>> good to me.
>>
>> -Yi
>>
>> On Tue, Feb 17, 2015 at 9:53 AM, Chris Riccomini 
>> wrote:
>>
>> > Hey Milinda,
>> >
>> > Yea, I agree. Confluence is better than Moin Moin. If others agree, I
>> think
>> > we should just switch to Confluence.
>> >
>> > So, shall we block on Wiki upgrade? Probably will take a couple of days.
>> > What do others thing about migrating to Confluence for wiki?
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Tue, Feb 17, 2015 at 9:48 AM, Milinda Pathirage <
>> mpath...@umail.iu.edu>
>> > wrote:
>> >
>> > > Hi Chris,
>> > >
>> > > I am okay with Markdown.
>> > >
>> > > But we have the option of using cwiki (https://cwiki.apache.org). I
>> have
>> > > used it in the past for Axis project. It's pretty flexible, with
>> respect
>> > to
>> > > user management and I think its better than wiki deployed at
>> > > wiki.apache.org.
>> > > For an example, you can give permission to a user who is not a
>> committer
>> > > (of the project that owns the space) to edit certain pages in a space
>> > (Each
>> > > project can have its own space). Also it has commenting.
>> > >
>> > > On the other hand, JIRA may be  better  because we can track
>> everything
>> > in
>> > > a single place.
>> > >
>> > > Thanks
>> > > Milinda
>> > >
>> > >
>> > >
>> > > On Tue, Feb 17, 2015 at 12:16 PM, Chris Riccomini <
>> criccom...@apache.org
>> > >
>> > > wrote:
>> > >
>> > > > Hey Milinda,
>> > > >
>> > > > We can do wiki, but Apache's wiki is pretty locked down, sadly.
>> You'll
>> > > have
>> > > > to get an account for it.
>> > > >
>> > > > What if we just convert it to Markdown (like SAMZA-516), and attach
>> to
>> > > the
>> > > > patch? This will make it versioned, and editable. Comments can
>> happen
>> > in
>> > > > the JIRA. I'm OK either way, but the wiki has been a little
>> cumbersome
>> > in
>> > > > the past.
>> > > >
>> > > > Cheers,
>> > > > Chris
>> > > >
>> > > > On Mon, Feb 16, 2015 at 1:59 PM, Milinda Pathirage <
>> > > mpath...@umail.iu.edu>
>> > > > wrote:
>> > > >
>> > > > > Hi Chris and Yi,
>> > > > >
>> > > > > Most of the query language definitions  and examples in
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/secure/attachment/12690583/StreamSQLforSAMZA-v0.1.docx.docx
>> > > > > (document attached to
>> > https://issues.apache.org/jira/browse/SAMZA-390)
>> > > > are
>> > > > > no longer valid because we moved to extended SQL  supported in
>> > Calcite.
>> > > > So
>> > > > > we need to change the document to reflect the changes.
>> > > > >
>> > > > > Also I found that, having a document with several basic stream SQL
>> > > > samples
>> > > > > will be useful when developing the query planning and physical
>> plan
>> > > > > generation logic.
>> > > > >
>> > > > > How about moving above document to a wiki page or somewhere we
>> can do
>> > > > > shared editing?
>> > > > >
>> > > > > Thanks
>> > > > > Milinda
>> > > > >
>> > > > > --
>> > > > > Milinda Pathirage
>> > > > >
>> > > > > PhD Student | Research Assistant
>> > > > > School of Informatics and Computing | Data to Insight Center
>> > > > > Indiana University
>> > > > >
>> > > > > twitter: milindalakmal
>> > > > > skype: milinda.pathirage
>> > > > > blog: http://milinda.pathirage.org
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Milinda Pathirage
>> > >
>> > > PhD Student | Research Assistant
>> > > School of Informatics and Computing | Data to Insight Center
>> > > Indiana University
>> > >
>> > > twitter: milindalakmal
>> > > skype: milinda.pathirage
>> > > blog: http://milinda.pathirage.org
>> > >
>> >
>>
>
>


Apache Samza Meetup - March 4 @6PM hosted at LinkedIn's campus in Mountain View CA

2015-03-02 Thread Ed Yakabosky
Hi all -

I would like to announce the first Bay Area Apache Samza 
Meetup hosted at 
LinkedIn in Mountain View, CA on March 4, 2015 @6PM.  We plan to host the event 
every 2-months to encourage knowledge sharing & collaboration in Samza’s 
usage and open 
source community.

The agenda for the meetup is::

  *   6:00 – 6:15PM: Doors open, sign NDAs, networking, food & drinks
  *   6:15 - 6:45PM: Naveen Somasundaram (LinkedIn) – Getting Started with 
Apache Samza
  *   6:45 - 7:30PM: Shekar Tippur (Intuit) – Powering Contextual Alerts for 
Intuit’s Operations Center with Apache Samza
  *   7:30 - 8:00PM: Chris Riccomini (LinkedIn) – Where we’re headed next: 
Apache Samza Roadmap

We plan to provide food & drinks so please RSVP 
here to help us 
with estimation.  Please let me know if you have any questions or ideas for 
future meet ups.

We plan to announce a live stream the day of the event for remote attendance.

Excited to see you there!
Ed Yakabosky

[BCC:
Kafka Open Source
Samza Open Source
LinkedIn’s DDS and DAI teams
Linkedin’s Samza customers
Tech-Talk]


Re: Handling defaults and windowed aggregates in stream queries

2015-03-02 Thread Milinda Pathirage
@Chris
Yes, I was referring to that mail. Actually I was wrong about the ‘Now’
window, it should be a ‘Unbounded’ window for most the default scenarios
(Section 6.4 of https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf). Because
applying a ‘Now’ window with size of 1 will double the number of events
generated if we consider insert/delete streams. But ‘Unbounded’ will only
generate insert events.

@Yi
1. You are correct about Calcite.There is no stream-to-relation conversion
happening. But as I understand we don’t need Calcite to support this. We
can add it to our query planner as a rule or re-write. What I am not sure
is whether to use a rule or a re-write.
2. There is a rule in Calcite which extract the Window out from the
Project. But I am not sure why that didn’t happen in my test. This rule is
added to the planner by default. I’ll ask about this in Calcite mailing
list.

I think we can figure out a way to move the window to the input stream if
Calcite can move the window out from Project. I’ll see how we can do this.

Also I’ll go ahead and implement default windows. We can change it later if
Julian or someone from Calcite comes up with a better suggestion.

Thanks
Milinda

On Sun, Mar 1, 2015 at 8:23 PM, Yi Pan  wrote:

> Hi, Milinda,
>
> Sorry to reply late on this. Here are some of my comments:
> 1) In Calcite's model, it seems that there is no stream-to-relation
> conversion step. In the first example where the window specification is
> missing, I like your solution to add the default LogicalNowWindow operator
> s.t. it makes the physical operator matches the query plan. However, if
> Calcite community does not agree to add the default LogicalNowWindow, it
> would be fine for us if we always insert a default "now" window on a stream
> when we generate the Samza configuration.
> 2) I am more concerned on the other cases, where window operator is used in
> aggregation and join. In your example of windowed aggregation in Calcite,
> window spec seems to be a decoration to the LogicalProject operator,
> instead of defining a data source to the LogicalProject operator. In the
> CQL model we followed, the window operator is considered as a query
> primitive that generate a data source for other relation operators to
> consume. How exactly is window operator used in Calcite planner? Isn't it
> much clear if the following is used?
> LogicalProject(EXPR$0=[CASE(>(COUNT($2), 0), CAST($SUM0($2)):INTEGER,
> null)])
>LogicalWindow(ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
>
> On Thu, Feb 26, 2015 at 12:18 PM, Milinda Pathirage  >
> wrote:
>
> > Hi devs,
> >
> > I ask about $subject in calcite-dev. You can find the archived discussion
> > at [1]. I think your thoughts are also valuable in this discussion in
> > calcite list.
> >
> > I discovered the requirement for a default window operator when I tried
> to
> > integrate streamscan (I was using tablescan prevously) into the physical
> > plan generation logic. Because of the way we have written the
> > OperatorRouter API, we always need a stream-to-relation operator at the
> > input. But Calcite generates a query plan like following:
> >
> > LogicalDelta
> >   LogicalProject(id=[$0], product=[$1], quantity=[$2])
> > LogicalFilter(condition=[>($2, 5)])
> >
> >   StreamScan(table=[[KAFKA, ORDERS]], fields=[[0, 1, 2]])
> >
> > If we consider LogicalFilter as a relation operator, we need something to
> > convert input stream to a relation before sending the tuples downstream.
> > In addition to this, there is a optimization where we consider filter
> > operator as a tuple operator and have it between StreamScan and
> > stream-to-relation operator as a way of reducing the amount of messages
> > going downstream.
> >
> > Other scenario is windowed aggregates. Currently window spec is attached
> to
> > the LogicalProject in query plan like following:
> >
> > LogicalProject(EXPR$0=[CASE(>(COUNT($2) OVER (ROWS BETWEEN 2 PRECEDING
> AND
> > 2 FOLLOWING), 0), CAST($SUM0($2) OVER (ROWS BETWEEN 2 PRECEDING AND 2
> > FOLLOWING)):INTEGER, null)])
> >
> > I wanted to know from them whether it is possible to move window
> operation
> > just after the stream scan, so that it is compatible with our operator
> > layer.
> > May be there are better or easier ways to do this. So your comments are
> > always welcome.
> >
> > Thanks
> > Milinda
> >
> >
> > [1]
> >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-calcite-dev/201502.mbox/browser
> >
> > --
> > Milinda Pathirage
> >
> > PhD Student | Research Assistant
> > School of Informatics and Computing | Data to Insight Center
> > Indiana University
> >
> > twitter: milindalakmal
> > skype: milinda.pathirage
> > blog: http://milinda.pathirage.org
> >
>



-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org