Created proxy JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2089
On Wed, May 11, 2016 at 1:31 PM, Thomas Weise <[email protected]> wrote:

> SQL -> Beam is a longer term prospect that Julian Hyde is looking at. At
> this time, I see separate translations for SQL and Beam to the Apex DAG
> representation.
>
> Thanks,
> Thomas
>
> --
> sent from mobile
> On May 11, 2016 1:26 PM, "Bhat, Vijay (CONT)" <[email protected]>
> wrote:
>
> I think it's a great idea as well, and could play well with the Calcite /
> Streaming SQL discussion that’s also been going on. Brennon and I talked
> about this and we could envision something like Streaming SQL -> Beam
> representation -> Apex DAG, which will also buy us the trigger / watermark
> capabilities of the Beam model.
>
> On 5/11/16, 9:59 AM, "York, Brennon" <[email protected]> wrote:
>
> >+1 to add beam integration. This would be huge for the Apex community and
> >makes it that much easier for developers to come in and begin leveraging
> >the power of Apex.
> >
> >On 5/9/16, 11:44 PM, "Thomas Weise" <[email protected]> wrote:
> >
> >>I spoke to Davor from the Beam team about this today at the Apache Big
> >>Data.
> >>
> >>In the bigger picture multiple DSLs and language specific SDKs are
> >>translated into a language independent representation, which then is
> >>translated by the runner to the execution engine. It seems possible to
> >>pass hints or annotations that can be accessed at the runner level and
> >>used for optimizations. There is also the notion of hierarchical
> >>constructs similar to our modules.
> >>
> >>I have also contacted the Beam folks for a follow-up on how we can
> >>collaborate on this.
> >>
> >>Thanks,
> >>Thomas
> >>
> >>
> >>On Mon, May 9, 2016 at 1:08 PM, Siyuan Hua <[email protected]>
> >>wrote:
> >>
> >>> Hey Ilya,
> >>>
> >>> Since I'm working on the Java high-level API, I also looked at Apache Beam.
> >>> Some questions have come up, such as whether the high-level API is
> >>> replaceable by Apache Beam, or whether we can just follow the Apache
> >>> Beam API, which is based on the Google Dataflow Model. Here is what I
> >>> found:
> >>>
> >>> 1. Beam provides a whole bunch of classes to define a DAG and the
> >>> options for how to run it. There is no easy way to extend their DAG API
> >>> or implement them on your own.
> >>>
> >>> 2. The way to use the Beam API is to use whatever they have to construct
> >>> a DAG, get the graph data structure, convert it to an Apex DAG, and run
> >>> it with our engine. Beam follows the visitor design pattern, which is
> >>> similar to ASM. There are 2 core parts to running a Beam application in
> >>> Apex. One is the pipeline, which is the DAG structure in Beam:
> >>>
> >>> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
> >>>
> >>> The other is the Visitor interface, which defines the callback functions
> >>> invoked when you visit each node in the DAG. Here is an example of the
> >>> Flink translator (visitor):
> >>>
> >>> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingTransformTranslators.java
> >>>
> >>> 3. Although I think the dataflow model is a very good and complete model
> >>> for stream processing to follow, I don't think the Beam API is very
> >>> declarative and expressive. I still suggest we build a whole bunch of
> >>> APIs that could deliver the same features as the dataflow model but are
> >>> more Stream-like (Java stream) and SQL-like.
> >>>
> >>> In summary, I think the integration is just running a Beam DAG with a
> >>> different engine (Storm, Flink, Spark or Apex). But if you want to
> >>> mingle the Beam API with other ones, it is not very easy.
> >>>
> >>> I also think what we need to work on is not only translation but also
> >>> implementing some operators that provide the missing features in the
> >>> dataflow model. Those operators can also be used in the high-level API.
> >>>
> >>> Regards,
> >>> Siyuan
> >>>
> >>> On Mon, May 9, 2016 at 11:51 AM, Thomas Weise <[email protected]>
> >>> wrote:
> >>>
> >>> > Hi Ilya,
> >>> >
> >>> > Absolutely, this has been discussed and is "on the roadmap". A quick
> >>> > search reveals that a JIRA was already created for it:
> >>> > https://issues.apache.org/jira/browse/BEAM-261
> >>> >
> >>> > We are currently discussing the windowing semantics in the context of
> >>> > the high level stream API, perhaps Siyuan can post his notes here?
> >>> >
> >>> > Thanks,
> >>> > Thomas
> >>> >
> >>> >
> >>> > On Mon, May 9, 2016 at 11:25 AM, Ganelin, Ilya <[email protected]>
> >>> > wrote:
> >>> >
> >>> > > Hello all, Google has just published a new blog post announcing the
> >>> > > first complete integration of an open source project (Apache Flink)
> >>> > > with Apache Beam:
> >>> > >
> >>> > > https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
> >>> > >
> >>> > > http://data-artisans.com/why-apache-beam/
> >>> > >
> >>> > > Apache Beam is a unifying framework that allows users to leverage
> >>> > > disparate streaming computational frameworks such as Storm, DataFlow,
> >>> > > Flink, or Spark using a single API. This integration demands that the
> >>> > > framework conform to the Beam programming model:
> >>> > > http://vldb.org/pvldb/vol8/p1792-Akidau.pdf, and provide the
> >>> > > appropriate APIs.
> >>> > >
> >>> > > While cumbersome, the benefit of integrating with Beam is tremendous.
> >>> > > Since there is no single framework that solves all streaming problems
> >>> > > for all use cases, the ability to combine frameworks at will makes
> >>> > > developing end-state applications much more straightforward. I
> >>> > > believe that many projects will choose to leverage Apache Beam to
> >>> > > take advantage of this, and if Apex does not support Beam, it will
> >>> > > fall behind, replaced by those frameworks that fit the easy-to-use
> >>> > > model of Beam.
> >>> > >
> >>> > > If we become early adopters, we have a unique opportunity to become
> >>> > > part of what will quite possibly become a very large community of
> >>> > > users and to capitalize on the inherent name recognition of Google to
> >>> > > elevate the Apex project and expose it to many who would otherwise
> >>> > > not be aware of it.
> >>> > >
> >>> > > I think integration with Beam can pair with the recent work on
> >>> > > developing a high-level API for Apex and is a natural evolution
> >>> > > towards making Apex more accessible and more usable by a broader
> >>> > > technical community.
> >>> > >
> >>> > > If there is compelling interest around making this effort a reality,
> >>> > > I would love to get this conversation started and work on translating
> >>> > > this into a concrete plan of action.
> >>> > >
> >>> > >
> >>> > > ________________________________________________________
> >>> > >
> >>> > > The information contained in this e-mail is confidential and/or
> >>> > > proprietary to Capital One and/or its affiliates and may only be used
> >>> > > solely in performance of work or services for Capital One. The
> >>> > > information transmitted herewith is intended only for use by the
> >>> > > individual or entity to which it is addressed.
> >>> > > If the reader of this message is not the intended recipient, you are
> >>> > > hereby notified that any review, retransmission, dissemination,
> >>> > > distribution, copying or other use of, or taking of any action in
> >>> > > reliance upon this information is strictly prohibited. If you have
> >>> > > received this communication in error, please contact the sender and
> >>> > > delete the material from your computer.
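
For readers following the translation discussion, point 2 of Siyuan's message (build the pipeline, walk it with a visitor, emit an engine DAG) can be sketched roughly as below. This is only an illustration of the visitor pattern under simplifying assumptions: every class and method name here (SimplePipeline, PipelineVisitor, EngineDag, DagTranslator, translate) is hypothetical and is not the Beam SDK or the Apex API, and the pipeline is reduced to a linear chain of transforms.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: none of these types are Beam or Apex classes.
public class VisitorTranslationSketch {

    /** Callback invoked once per transform while the pipeline is walked. */
    interface PipelineVisitor {
        void visitTransform(String name, String kind);
    }

    /** Hypothetical stand-in for an SDK pipeline: an ordered chain of transforms. */
    static final class SimplePipeline {
        private final List<String[]> transforms = new ArrayList<>();

        SimplePipeline apply(String name, String kind) {
            transforms.add(new String[] {name, kind});
            return this;
        }

        /** Walks the transforms in order and hands each one to the visitor. */
        void traverse(PipelineVisitor visitor) {
            for (String[] t : transforms) {
                visitor.visitTransform(t[0], t[1]);
            }
        }
    }

    /** Hypothetical stand-in for the engine DAG: operators plus connecting streams. */
    static final class EngineDag {
        final List<String> operators = new ArrayList<>();
        final List<String> streams = new ArrayList<>();
    }

    /** The "runner" side: a visitor that emits one operator per visited transform. */
    static final class DagTranslator implements PipelineVisitor {
        private final EngineDag dag = new EngineDag();
        private String previous;

        @Override
        public void visitTransform(String name, String kind) {
            String operator = kind + ":" + name;
            dag.operators.add(operator);
            if (previous != null) {
                // Connect the previously visited operator to this one.
                dag.streams.add(previous + " -> " + operator);
            }
            previous = operator;
        }

        EngineDag result() {
            return dag;
        }
    }

    /** Translates a pipeline into an engine DAG by traversing it with the visitor. */
    static EngineDag translate(SimplePipeline pipeline) {
        DagTranslator translator = new DagTranslator();
        pipeline.traverse(translator);
        return translator.result();
    }

    public static void main(String[] args) {
        SimplePipeline pipeline = new SimplePipeline()
            .apply("lines", "Read")
            .apply("words", "ParDo")
            .apply("counts", "GroupByKey");
        EngineDag dag = translate(pipeline);
        System.out.println(dag.operators);
        System.out.println(dag.streams);
    }
}
```

A real runner would dispatch on the transform type to pick an engine operator and would handle branching graphs and composite transforms, which this sketch leaves out.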
