Re: [DISCUSS] Accept JW Player SQE Code Donation

P. Taylor Goetz Thu, 08 Sep 2016 09:34:45 -0700

One possibility I’m eager to explore is taking the SQL parsing capabilities in
storm-sql (i.e. Calcite) and applying them to SQE. Since SQE syntax is already
“SQL-like” it should(?) be relatively straightforward to do so. That might be
the fastest path to a production-ready SQL API.


Anyway, the discussion has died down, so I’ll initiate a VOTE.

-Taylor

> On Sep 6, 2016, at 12:33 AM, Satish Duggana <[email protected]> wrote:
> 
> Agree with Jungtaek on the below.
> 
>   - Better to support SQL instead of SQL-like (SQL-like creates
>   confusion). We are using Apache Calcite, and we should continue with that.
>   - Currently Trident is used, but we should move to windowing abstractions
>   later for specifying boundedness to run the queries.
> 
> We should not have two APIs, one for SQL (storm-sql) and one for SQL-like
> (SQE APIs). I am +1 for having this code integrated with the storm-sql
> project and exposing the respective APIs in Storm-SQL.
> 
> Thanks,
> Satish.
> 
> 
> On Tue, Sep 6, 2016 at 6:42 AM, Jungtaek Lim <[email protected]> wrote:
> 
>> Thanks to the JW Player folks for coming in and expressing your support. I
>> can see the sponsors of SQE, which makes me feel that SQE is nice enough. I
>> also agree that "production-ready" is a great point to value.
>> 
>> I have been positive about merging this in; I was just wondering how we merge
>> Storm SQL and SQE to expose a better interface to users.
>> Supporting SQL-like (DSL) is not the same as supporting SQL, and some
>> competitor projects already support SQL. I think this is not a thing
>> to compromise on. Since both Storm SQL and SQE are based on Trident, there
>> should be no notable hurdle to adding SQE features to Storm SQL. If making SQE
>> support SQL is easier, I'm also positive about going in that direction.
>> (Btw, my vision for Storm SQL is moving to core - tuple based - with
>> windowing, not relying on Trident. It might happen after Storm introduces a
>> higher-level API for Beam or other purposes.)
>> 
>> - Jungtaek Lim (HeartSaVioR)
>> 
>> On Sat, Sep 3, 2016 at 1:42 AM, Douglas Shore <[email protected]> wrote:
>> 
>>> We have benefited greatly from being downstream from SQE in powering our
>>> data driven solutions.
>>> 
>>> I am excited to see this repo grow in breadth and depth.
>>> 
>>> 
>>> On Fri, Sep 2, 2016 at 11:16 AM, Kamil Sindi <[email protected]> wrote:
>>> 
>>>> Our data science efforts rely on SQE to power our recommendations engine. I
>>>> am also excited to contribute to it especially as we continue to implement
>>>> predictive models at larger scales.
>>>> 
>>>> On Fri, Sep 2, 2016 at 10:57 AM, Sahil Shah <[email protected]> wrote:
>>>> 
>>>>> I would like to throw my support behind SQE. Having worked with it in a
>>>>> production environment, I have seen the many benefits in testing new
>>>>> topologies and quickly understanding what a topology is doing. As our data
>>>>> needs have grown, we have only increased our reliance on SQE, and it stands
>>>>> the test repeatedly. I am excited at the opportunity to contribute to this
>>>>> wonderful open source community.
>>>>> 
>>>>> On Fri, Sep 2, 2016 at 10:31 AM, Alex Halter <[email protected]> wrote:
>>>>> 
>>>>>> I too want to voice my support for SQE and our commitment to the initiative
>>>>>> going forward. We've been working on adapting Storm to our needs for most
>>>>>> of two years. It was thoughtfully designed and supports our production
>>>>>> needs. We have a long list of features we want to build out and we'd love
>>>>>> to work with the community.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Sep 2, 2016 at 10:19 AM, Rohit Garg <[email protected]> wrote:
>>>>>> 
>>>>>>> I am one of the developers who has been working on SQE for the past 1.5
>>>>>>> years. Over time, we have made it more stable and production ready.
>>>>>>> 
>>>>>>> As of now, one can easily scale SQE for more production data with easy
>>>>>>> config changes and a re-deploy, aggregate across different dimensions by
>>>>>>> writing JSON-like SQL, write to different state stores and, most
>>>>>>> importantly, address new feature requirements really quickly. (Since it's
>>>>>>> just writing a SQL-like JSON file and SQE handles everything for you!)
>>>>>>> 
>>>>>>> I think SQE can really help companies who want to set up a production-ready
>>>>>>> and well-tested framework within weeks (instead of months) for large-scale
>>>>>>> event stream processing, with minimum risk and limited resources. We
>>>>>>> are actively working on SQE to make it more awesome and are committed to
>>>>>>> making the experience of developing a highly scalable and fault-tolerant
>>>>>>> stream processing framework more seamless and less stressful!!!!
>>>>>>> 
>>>>>>> On Fri, Sep 2, 2016 at 9:49 AM, Lee Morris <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Hi, Storm Dev!
>>>>>>>> 
>>>>>>>> I wanted to chime in to show support for SQE and show how committed we are
>>>>>>>> to SQE. *StormSQL looks awesome and has some real potential!*
>>>>>>>> 
>>>>>>>> We use SQE in production. It has been tested, code reviewed, load tested,
>>>>>>>> and maintained, and has been processing an average of 8 million tuples per
>>>>>>>> minute or more for over a year now. The investment into this code base has
>>>>>>>> been significant.
>>>>>>>> 
>>>>>>>> Please take a look at the code itself. The production quality code is ready
>>>>>>>> to go. Developers with no experience with Storm or even streaming
>>>>>>>> successfully launch robust topologies using SQE. Our productivity in this
>>>>>>>> area went up by orders of magnitude.
>>>>>>>> 
>>>>>>>> Based on this experience we realized the value of querying Storm, and we
>>>>>>>> decided to give that value back to the Storm community.
>>>>>>>> 
>>>>>>>> Our data pipelines and real-time processing are very important to the
>>>>>>>> success of JW Player. SQE has been a foundation for that. We will continue
>>>>>>>> to invest into this technology for years to come. Unfortunately we wouldn't
>>>>>>>> be able to adopt StormSQL as is until it has been put through the crucible
>>>>>>>> of production level usage and has had the same rigor applied. It seems much
>>>>>>>> of the development has been over the last couple of weeks.
>>>>>>>> 
>>>>>>>> *Quick Gap Analysis (Not Exhaustive)*
>>>>>>>> *States*
>>>>>>>>  - SQE supports Redis and MongoDB as states in addition to Kafka. (Soon
>>>>>>>>    adding a Test/Monitor State)
>>>>>>>>  - SQE supports non-static field names for Redis state
>>>>>>>>  - Storm SQL supports Kafka
>>>>>>>>  - SQE supports replay filtering for Kafka
>>>>>>>> 
>>>>>>>> *Aggregations*
>>>>>>>>  - SQE supports stateful, exactly-once aggregations for states that
>>>>>>>>    support it
>>>>>>>>  - Storm SQL supports aggregations within each micro batch
>>>>>>>> 
>>>>>>>> *SQL*
>>>>>>>>  - StormSQL supports SQL
>>>>>>>>  - SQE supports SQL "like" JSON
>>>>>>>> 
>>>>>>>> *Scaling*
>>>>>>>>  - SQE has a mechanism for controlling parallelism or scaling
>>>>>>>>  - Could not find parallelism or scaling controls within StormSQL (May
>>>>>>>>    need to look harder)
>>>>>>>> 
>>>>>>>> *Support for SQE*
>>>>>>>> So far the SQE / JW Player developers have been watching this thread
>>>>>>>> without knowing if we should chime in. I call upon the devs at JW to chime
>>>>>>>> in because we are dedicated to the success of this SQL in Storm.
>>>>>>>> 
>>>>>>>> (Noticed I said "chime" three times in this email... well now four times)
>>>>>>>> 
>>>>>>>> Thanks for reading,
>>>>>>>> 
>>>>>>>> Lee Morris, Sr Principal Engineer, Data  |  JWPLAYER
>>>>>>>> 
>>>>>>>> O: 212.244.0140 <212.244.0140%20x999>  |  M: 215.920.1331
>>>>>>>> 
>>>>>>>> 2 Park Avenue, 10th Floor North, New York NY 10016
>>>>>>>> 
>>>>>>>> jwplayer.com  |  @jwplayer <http://twitter.com/jwplayer>
>>>>>>>> 
>>>>>>>> On Tue, Aug 30, 2016 at 5:46 PM, Jungtaek Lim <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi Morrigan,
>>>>>>>>> 
>>>>>>>>> Thanks for joining the discussion. I thought we needed to hear your goal
>>>>>>>>> in donating the SQE code, and your opinion on how to apply SQE to Storm
>>>>>>>>> SQL and work on further improvements.
>>>>>>>>> 
>>>>>>>>> Not sure when you took a look at the feature set of Storm SQL, but if you
>>>>>>>>> haven't recently, you may want to do that.
>>>>>>>>> I started working on improving Storm SQL several weeks ago, and many things
>>>>>>>>> have been addressed in recent weeks.
>>>>>>>>> 
>>>>>>>>> * STORM-1435 <https://issues.apache.org/jira/browse/STORM-1435>: You can
>>>>>>>>> easily launch the Storm SQL runner without worrying about dependencies for
>>>>>>>>> Storm SQL core and runtime. It wasn't easy to run before STORM-2016
>>>>>>>>> <http://issues.apache.org/jira/browse/STORM-2016> was introduced.
>>>>>>>>> * Refactored Storm SQL code for Trident to fit Trident operations. Storm
>>>>>>>>> SQL parsed SQL and generated topology code, but it was not easy to know how
>>>>>>>>> the topology code was generated, and it was also hard to determine how
>>>>>>>>> Trident optimizations were applied.
>>>>>>>>> * STORM-1434 <https://issues.apache.org/jira/browse/STORM-1434>,
>>>>>>>>> STORM-2050 <https://issues.apache.org/jira/browse/STORM-2050>: Addressed
>>>>>>>>> GROUP BY with UDAF (User Defined Aggregate Function) in Trident mode. Storm
>>>>>>>>> SQL already supported UDF in Trident mode.
>>>>>>>>> * STORM-2057 <https://issues.apache.org/jira/browse/STORM-2057>: The JOIN
>>>>>>>>> (inner, left outer, right outer, full outer) feature is now under review.
>>>>>>>>> Note that only equi-join is supported.
>>>>>>>>> 
>>>>>>>>> The changes are not included in an official release yet, but I expect Storm
>>>>>>>>> 1.1.0 will include them, and they are worth trying out for early adopters.
>>>>>>>>> 
>>>>>>>>> You can also refer to STORM-1433
>>>>>>>>> <https://issues.apache.org/jira/browse/STORM-1433> for the current phase of
>>>>>>>>> Storm SQL. We might need further phases (epics) for resolving other
>>>>>>>>> issues as well.
>>>>>>>>> 
>>>>>>>>> I only had a look at the SQE wiki, so I don't know the detailed features of
>>>>>>>>> SQE, but my feeling is that recent changes fill the gap between SQE and
>>>>>>>>> Storm SQL, and even address some TODOs of SQE. We might need to cross-check
>>>>>>>>> the feature set of each project to make the pros and cons of each
>>>>>>>>> project clear.
>>>>>>>>> 
>>>>>>>>> Btw, while Storm SQL has been implementing its missing features, the
>>>>>>>>> difficult part for Storm SQL is SQL optimizations. There seem to be lots of
>>>>>>>>> SQL optimizations (like filter pushdown), but I'm not an expert on that, and
>>>>>>>>> it apparently needs a deeper understanding of Calcite. Other parts also need
>>>>>>>>> contributors, but we strongly need contributors in this area.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>>>> 
>>>>>>>>> On Wed, Aug 31, 2016 at 12:47 AM, Morrigan Jones <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi, I'm the original creator and primary developer of SQE. Sorry for
>>>>>>>>>> the radio silence on my part, I was out on vacation the past two
>>>>>>>>>> weeks.
>>>>>>>>>> 
>>>>>>>>>> I'm glad to see the Storm SQL project chugging along. I started SQE
>>>>>>>>>> because I wanted better tools on top of Storm, particularly the
>>>>>>>>>> ability to query streams and build topologies using SQL. Our
>>>>>>>>>> philosophy is to quickly iterate on our production systems and provide
>>>>>>>>>> immediate value. We've been able to do this with SQE, which powers our
>>>>>>>>>> streaming systems. Work on SQE and adding functions is driven by our
>>>>>>>>>> current use cases. The big near term item on our road map is to add
>>>>>>>>>> SQL parsing. Calcite is very promising there and brings lots of
>>>>>>>>>> additional features, as I'm sure you know. Additionally, we're going
>>>>>>>>>> to improve our function, stream, and state support.
>>>>>>>>>> 
>>>>>>>>>> The difficulty I can see for us with Storm SQL is the amount of work
>>>>>>>>>> necessary to get from where we are now with SQE to integrating any
>>>>>>>>>> functionality and making sure Storm SQL can provide the functionality
>>>>>>>>>> we have now, assuming that is the path we would all go. We're super
>>>>>>>>>> excited to see support for Storm grow and mature, and we'd like to be
>>>>>>>>>> a part of that. But we also have to maintain our ability to iterate
>>>>>>>>>> quickly and provide immediate value.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Morrigan Jones
>>>>>>>>>> Principal Engineer
>>>>>>>>>> JWPLAYER  |  Your Way to Play
>>>>>>>>>> [email protected]  |  jwplayer.com
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> *Sahil Shah,* Data Engineer
>>>>> *JW*PLAYER  |  Your Way to Play
>>>>> P: 240.595.1169  |  jwplayer.com
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> *Doug Shore*
>>> Senior Data Engineer
>>> JW PLAYER | Your Way to Play
>>> 
>>
