I think this approach makes sense in general. Euphoria can be an implementation detail of SQL, similar to the Join Library or core SDK Schemas.

I wonder, though, whether it would be better to bring Euphoria closer to the core SDK first, maybe even merge them. If you look at Reuven's recent work around schemas, there already seem to be similarities between it and Euphoria's approach, unless I'm missing the point (e.g. Filter transforms, FullJoin vs CoGroup... see [2]). And we're already switching parts of SQL to those transforms (e.g. SQL Aggregation is now implemented by core SDK's Group[3]).

Adding explicit Schema support to Euphoria would both bring it closer to the core SDK and make it natural to use for SQL. Can this be a first step towards this integration?

One question I have: does Euphoria bring dependencies that SQL does not need, or does it rely more or less only on the core SDK?

[1] https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Group.java#L73
[2] https://github.com/apache/beam/tree/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
[3] https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L179

On Fri, Nov 30, 2018 at 6:29 AM Jan Lukavský <[email protected]> wrote:

> Hi community,
>
> I'm part of Euphoria DSL team, and on behalf of this team, I'd like to
> discuss possible development of Java based DSLs currently present in
> Beam. In my knowledge, there are currently two DSLs based on Java SDK -
> Euphoria and SQL. These DSLs currently share only the SDK itself,
> although there might be room to share some more effort. We already know
> that both Euphoria and SQL have need for retractions, but there are
> probably many more features that these two could share.
>
> So, I'd like to open a discussion on what it would cost and what it
> would possibly bring, if instead of the current structure
>
>   Java SDK
>
>     |  ---- SQL
>
>     |  ---- Euphoria
>
> these DSLs would be structured as
>
>   Java SDK ---> Euphoria ---> SQL
>
> I'm absolutely sure that this would be a great investment and a huge
> change, but I'd like to gather some opinions and general feelings of the
> community about this. Some points to start the discussion from my side
> would be, that structuring DSLs like this has internal logical
> consistency, because each API layer further narrows completeness, but
> brings simpler API for simpler tasks, while adding additional high-level
> view of the data processing pipeline and thus enabling more
> optimizations. On Euphoria side, these are various implementations joins
> (most effective implementation depends on data), pipeline sampling and
> more. Some (or maybe most) of these optimizations would have to be
> implemented in both DSLs, so implementing them once is beneficial.
> Another benefit is that this would bring Euphoria "closer" to Beam core
> development (which would be good, it is part of the project anyway,
> right? :)) and help better drive features, that although currently
> needed mostly by SQL, might be needed by other Java users anyway.
>
> Thanks for discussion and looking forward to any opinions.
>
>   Jan
