Scala DSL

2016-06-23 Thread Neville Li
Hi all,

I'm the co-author of Scio and am in the
process of moving code to Beam (BEAM-302). Just wondering if
sdks/scala is the right place for this code or if something like dsls/scio
is a better choice? What do you think?

A little background: Scio was built as a high-level Scala API for Google
Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
and Scalding. It wraps around the Dataflow/Beam Java SDK while also
providing features comparable to other Scala data frameworks. We use Scio
on Dataflow for production extensively inside Spotify.

Cheers,
Neville


Re: Scala DSL

2016-06-23 Thread Neville Li
+folks in my team



Re: Scala DSL

2016-06-23 Thread Jean-Baptiste Onofré

Hi Neville,

Thanks for the update!

Since this adds support for another language, and to make the purpose
clear, I would say sdks/scala.


Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Scala DSL

2016-06-23 Thread Frances Perry
+Rafal & Andrew again

I am leaning DSL for two reasons: (1) scio uses the existing java execution
environment (and won't have a language-specific fn harness of its own), and
(2) it changes the abstractions that users interact with.

I recently saw a scio repl demo from Reuven -- there's some really cool
stuff in there. I'd love to dive into it a bit more and see what can be
generalized beyond scio. The repl-like interactive graph construction is
very similar to what we've seen with ipython, in that it doesn't always
play nicely with the graph construction / graph execution distinction. I
wonder what changes to Beam might more generally support this. The
materialize stuff looks similar to some functionality in FlumeJava we used
to support multi-segment pipelines with some shared intermediate
PCollections.



Re: Scala DSL

2016-06-23 Thread Jean-Baptiste Onofré

Good point about new Fn and the fact it's based on the Java SDK.

It's just that in terms of "marketing", it's a good message to provide a
Scala SDK even if technically it's more of a DSL.

For instance, a valid "marketing" DSL would be a Java fluent DSL on top
of the Java SDK, or a declarative XML DSL.

However, from a technical perspective, it can go into the dsl module.

My $0.02 ;)

Regards
JB



Re: Scala DSL

2016-06-24 Thread Dan Halperin
I don't think that sdks/scala is the right place -- scio is not a Beam
Scala SDK; it wraps the existing Java SDK.

Some options:
* sdks/java/extensions (Scio builds on the Java SDK) -- mentally vetoed
  since Scio isn't an extension for the Java SDK, but rather a wrapper
* dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
* dsls/scio (Scio is a Beam DSL that could eventually use multiple SDKs)
* extensions/java/scio (Scio is an extension of Beam that uses the Java
  SDK)
* extensions/scio (Scio is an extension of Beam that is not limited to one
  SDK)

I lean towards either dsls/java/scio or extensions/java/scio, since I don't
think there are plans for Scio to handle multiple different SDKs (in
different languages). The question between these two is whether we think
DSLs are "big enough" to be a top level concept.



Re: Scala DSL

2016-06-24 Thread Jean-Baptiste Onofré

Hi Dan,

Fair enough.

As I'm also working on new DSLs (XML, JSON), I already created the dsls
module.

So, I would say dsls/scala.

WDYT?

Regards
JB



Re: Scala DSL

2016-06-24 Thread Ismaël Mejía
Hello everyone,

Neville, thanks a lot for your contribution. Your work is amazing and I am
really happy that this scala integration is finally happening.
Congratulations to you and your team.

I *strongly* disagree with the DSL classification for scio, for one reason:
if you go to the root of the term, Domain Specific Languages are about a
specific domain, and the domain in this case is writing Beam pipelines,
which is a really broad domain.

I agree with Frances’ argument that scio is not an SDK, since it reuses the
existing Beam Java SDK. My proposal is that scio be called the Scala API,
because in the end this is what it is. I think the confusion comes from the
common definition of an SDK, which is normally an API plus a runtime. In
this case scio will share the runtime with what we call the Beam Java SDK.

One additional benefit of using the term API is that it sends the clear
message that Beam has a Scala API too (which is good for visibility, as JB
mentioned).

Regards,
Ismaël



Re: Scala DSL

2016-06-24 Thread Kenneth Knowles
My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
there might be other Scala-based DSLs.


Re: Scala DSL

2016-06-24 Thread Rafal Wojdyla
Hello. When it comes to SDK vs DSL, I fully agree with Frances. As for
dsls/java/scio vs dsls/scio: dsls/java/scio may cause confusion, since scio
is a Scala DSL but would live under a java directory (?). That only makes
sense once you realize that scio uses the Java SDK under the hood. Thus, +1
to dsls/scio.
- Rafal


Re: Scala DSL

2016-06-24 Thread Lukasz Cwik
+1 for dsls/scio for the already listed reasons


Re: Scala DSL

2016-06-24 Thread Jean-Baptiste Onofré

Agreed on dsls/scio.

Regards
JB

On 06/24/2016 10:22 PM, Lukasz Cwik wrote:

+1 for dsls/scio for the already listed reasons

On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla 
wrote:


Hello. When it comes to SDK vs DSL - I fully agree with Frances. About
dsls/java/scio or dsls/scio - dsls/java/scio may cause confusion, scio is a
scala DSL but lives under java directory (?) - that makes sense only once
you get that scio is using java SDK under the hood. Thus, +1 to dsls/scio.
- Rafal

On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles 
wrote:


My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
there might be other Scala-based DSLs.

On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía  wrote:


​Hello everyone,

Neville, thanks a lot for your contribution. Your work is amazing and I am
really happy that this scala integration is finally happening.
Congratulations to you and your team.

I *strongly* disagree about the DSL classification for scio for one reason,
if you go to the root of the term, Domain Specific Languages are about a
domain, and the domain in this case is writing Beam pipelines, which is a
really broad domain.

I agree with Frances’ argument that scio is not an SDK e.g. it reuses the
existing Beam java SDK. My proposition is that scio will be called the
Scala API because in the end this is what it is. I think the confusion
comes from the common definition of SDK which is normally an API + a
Runtime. In this case scio will share the runtime with what we call the
Beam Java SDK.

One additional point of using the term API is that it sends the clear
message that Beam has a Scala API too (which is good for visibility as JB
mentioned).

Regards,
Ismaël


On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré wrote:

Hi Dan,

fair enough.

As I'm also working on new DSLs (XML, JSON), I already created the dsls
module.

So, I would say dsls/scala.

WDYT ?

Regards
JB


On 06/24/2016 05:07 PM, Dan Halperin wrote:

I don't think that sdks/scala is the right place -- scio is not a Beam
Scala SDK; it wraps the existing Java SDK.

Some options:
* sdks/java/extensions  (Scio builds on the Java SDK) -- mentally vetoed
  since Scio isn't an extension for the Java SDK, but rather a wrapper
* dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
* dsls/scio  (Scio is a Beam DSL that could eventually use multiple SDKs)
* extensions/java/scio  (Scio is an extension of Beam that uses the Java
  SDK)
* extensions/scio  (Scio is an extension of Beam that is not limited to
  one SDK)

I lean towards either dsls/java/scio or extensions/java/scio, since I
don't think there are plans for Scio to handle multiple different SDKs (in
different languages). The question between these two is whether we think
DSLs are "big enough" to be a top level concept.

On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Good point about new Fn and the fact it's based on the Java SDK.

It's just that in term of "marketing", it's a good message to provide a
Scala SDK even if technically it's more a DSL.

For instance, a valid "marketing" DSL would be a Java fluent DSL on top of
the Java SDK, or a declarative XML DSL.

However, from a technical perspective, it can go into dsl module.

My $0.02 ;)

Regards
JB


On 06/24/2016 06:51 AM, Frances Perry wrote:

+Rafal & Andrew again

I am leaning DSL for two reasons: (1) scio uses the existing java execution
environment (and won't have a language-specific fn harness of its own), and
(2) it changes the abstractions that users interact with.

I recently saw a scio repl demo from Reuven -- there's some really cool
stuff in there. I'd love to dive into it a bit more and see what can be
generalized beyond scio. The repl-like interactive graph construction is
very similar to what we've seen with ipython, in that it doesn't always
play nicely with the graph construction / graph execution distinction. I
wonder what changes to Beam might more generally support this. The
materialize stuff looks similar to some functionality in FlumeJava we used
to support multi-segment pipelines with some shared intermediate
PCollections.

On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi Neville,

thanks for the update !

As it's another language support, and to clearly identify the purpose, I
would say sdks/scala.

Regards
JB

On 06/23/2016 11:56 PM, Neville Li wrote:

+folks in my team

On Thu, Jun 23, 2016 at 5:57 PM Neville Li <neville@gmail.com> wrote:

Hi all,

I'm the co-author of Scio <https://github.com/spotify/scio> and am in the
progress of moving code to Beam (BEAM-302
<https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
sdks/scala is the right place for this code or if something like dsls/s

Re: Scala DSL

2016-06-24 Thread Raghu Angadi
DSL is a pretty generic term.

The fact that scio uses the Java SDK is an implementation detail. I love the
name scio. But I think sdks/scala might be most appropriate and would make
it a first-class citizen for Beam.

Where would a future Python SDK reside?

On Fri, Jun 24, 2016 at 1:50 PM, Jean-Baptiste Onofré 
wrote:

> Agree for dsls/scio
>
> Regards
> JB
>

Re: Scala DSL

2016-06-24 Thread Dan Halperin
On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi 
wrote:

> DSL is a pretty generic term..
>

I agree and am not married to it. Neville?


> The fact that scio uses Java SDK is an implementation detail.


Reasonable, which is why I am also not pushing hard for '/java/scio' to be
in the path.


> I love the
> name scio. But I think sdks/scala might be most appropriate and would make
> it a first class citizen for Beam.
>

I am strongly against it being in the 'sdks/' top-level module -- it's not
a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.


> Where would a future python sdk reside?
>

The Python SDK is in the python-sdk branch on Apache already, and it lives
in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)

Thanks,
Dan


Re: Scala DSL

2016-06-24 Thread Dan Halperin
On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin  wrote:

> On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi 
> wrote:
>
>> DSL is a pretty generic term..
>>
>
> I agree and am not married to it. Neville?
>
>
>> The fact that scio uses Java SDK is an implementation detail.
>
>
> Reasonable, which is why I am also not pushing hard for '/java/scio' to be
> in the path.
>
>
>> I love the
>> name scio. But I think sdks/scala might be most appropriate and would make
>> it a first class citizen for Beam.
>>
>
> I am strongly against it being in the 'sdks/' top-level module -- it's not
> a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
>
>
>> Where would a future python sdk reside?
>>
>
> The Python SDK is in the python-sdk branch on Apache already, and it lives
> in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
>

Now with a link:
https://github.com/apache/incubator-beam/tree/python-sdk/sdks

>
> Thanks,
> Dan
>

Re: Scala DSL

2016-06-25 Thread Amit Sela
Just looked at some Scio examples - and saw Spark Scala code ;-)

For me, this made some sense - Spark is written in Scala (let's call it
Scala SDK ?) but it also provides a Java API. The new version has a unified
API (Java-Scala interop). So I see Scio in a similar way: it's a Scala API
because it's built on top of the Java SDK.
Having said that, Scio could offer more than just a Scala API over the Java
SDK (e.g., the REPL), so for lack of a natural fit, I'd go with DSL. And to
address the very valid points people made about being able to say "Hi, we
support Scala!", we can call it a Scala API, even if it lives under dsls/scio.

So +1 for dsls/scio

Thanks,
Amit

On Sat, Jun 25, 2016 at 5:06 AM Dan Halperin 
wrote:

> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin wrote:
>
> > On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi wrote:
> >
> >> DSL is a pretty generic term..
> >>
> >
> > I agree and am not married to it. Neville?
> >
> >
> >> The fact that scio uses Java SDK is an implementation detail.
> >
> >
> > Reasonable, which is why I am also not pushing hard for '/java/scio' to
> > be in the path.
> >
> >
> >> I love the
> >> name scio. But I think sdks/scala might be most appropriate and would
> >> make it a first class citizen for Beam.
> >>
> >
> > I am strongly against it being in the 'sdks/' top-level module -- it's
> > not a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> >
> >
> >> Where would a future python sdk reside?
> >>
> >
> > The Python SDK is in the python-sdk branch on Apache already, and it
> > lives in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
> >
>
> Now with a link:
> https://github.com/apache/incubator-beam/tree/python-sdk/sdks
>
> >
> > Thanks,
> > Dan
> >

Re: Scala DSL

2016-06-26 Thread Aljoscha Krettek
I'm also in favor of branding it a DSL rather than an SDK, mostly because
it uses the Java SDK and because it does not (necessarily) follow/implement
the Beam model the way the Java SDK does and the Python SDK is apparently
aiming to.
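
To make the distinction discussed in this thread concrete (a Scala layer that
wraps an existing Java-style SDK instead of bringing its own runtime), here is
a minimal sketch. All names in it (`Fn`, `JCollection`, `applyFn`,
`SCollection`) are hypothetical stand-ins invented for illustration; they are
not the Beam Java SDK or the Scio API, and the "execution" is just an
in-memory Seq.

```scala
// Hypothetical stand-ins for illustration only; NOT Beam or Scio classes.

// "SDK" layer: Java-style API where every transform is an explicit function object.
trait Fn[A, B] {
  def process(in: A): Iterable[B]
}

final class JCollection[A](val elements: Seq[A]) {
  // Apply a function object to every element and flatten the results.
  def applyFn[B](fn: Fn[A, B]): JCollection[B] =
    new JCollection[B](elements.flatMap(a => fn.process(a)))
}

// "DSL" layer: wraps the SDK collection and exposes Scala-style combinators.
// It has no execution logic of its own; every call delegates to JCollection.
final class SCollection[A](val underlying: JCollection[A]) {
  def flatMap[B](f: A => Iterable[B]): SCollection[B] =
    new SCollection[B](underlying.applyFn(new Fn[A, B] {
      def process(in: A): Iterable[B] = f(in)
    }))

  def map[B](f: A => B): SCollection[B] =
    flatMap[B](a => List(f(a)))

  def filter(p: A => Boolean): SCollection[A] =
    flatMap[A](a => if (p(a)) List(a) else Nil)
}

object WrapperSketch {
  def main(args: Array[String]): Unit = {
    val lines = new SCollection(new JCollection(Seq("a b", "b c")))
    val words = lines.flatMap(_.split(" ").toList).map(_.toUpperCase)
    println(words.underlying.elements.mkString(",")) // prints A,B,B,C
  }
}
```

In this sketch the user-facing abstractions change (`map`, `flatMap`,
`filter` instead of function objects), while the layer underneath stays the
same, which is the sense in which people here call such a wrapper a DSL or
API rather than an SDK.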

On Sat, 25 Jun 2016 at 10:04 Amit Sela  wrote:

> Just looked at some Scio examples - and saw Spark Scala code ;-)
>
> For me, this made some sense - Spark is written in Scala (let's call it
> Scala SDK ?) but it also provides Java API. New version has a unified API
> (Java-Scala interop.) So I see Scio in a similar way, It's Scala API
> because it's built on top of the Java SDK.
> Having said that, Scio could offer more than just Scala API over the Java
> SDK (i.e., repl) so in the lack of a native fit, I'd go with DSL.  And to
> relate to the very valid notes people had about saying "Hi, we support
> Scala!", we can call it Scala API, even if it's under dsls/scio.
>
> So +1 for dsls/scio
>
> Thanks,
> Amit

Re: Scala DSL

2016-06-26 Thread Raghu Angadi
On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin 
wrote:

> > I love the
> > name scio. But I think sdks/scala might be most appropriate and would
> > make it a first class citizen for Beam.
> >
>
> I am strongly against it being in the 'sdks/' top-level module -- it's not
> a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
>

+1. I agree, it is not a Beam SDK in that sense.

Raghu.


>
> > Where would a future python sdk reside?
> >
>
> The Python SDK is in the python-sdk branch on Apache already, and it lives
> in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)


Re: Scala DSL

2016-06-27 Thread Ismaël Mejía
Just to summarize, at this point:

- Everybody agrees about the fact that scio is not an SDK.
- Almost everybody agrees that given the current choice they would prefer
‘dsls/scio’
- Some of us are not particularly married to the DSL classification.

I have a proposal: we can define two concepts, each with its own directory
structure in the Beam repository:

1. Beam API: A set of abstractions to program the complete Beam Model in a
given programming language.

These are idiomatic versions of the Beam Model, and ideally should cover
the complete Beam Model; scio is one example. The directory structure
for Beam APIs could be:

apis/scala
apis/clojure
apis/groovy
...

2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
graphs, machine learning, etc.

These represent domain-specific idioms, e.g. a graph DSL would represent
graph concepts (edges, vertices, etc.) as first-class citizens. The directory
structure for Beam DSLs could be:

dsls/graph
dsls/ml
dsls/cep
...

Given these definitions for the concrete scio case I think the most
accurate directory would be:

apis/scala
or
apis/scala/scio

I personally prefer the first one (apis/scala) because we don’t have any
other Scala API for the moment, and because I think that we shouldn’t have
more than one API per language, to avoid confusion. For example, imagine
that someone creates apis/java/bcollections to represent Beam Pipelines as
distributed collections; that would be confusing. However, I understand the
arguments for the second directory, e.g. to support different APIs per
language and to preserve their original names (scio). Anyway, I would be OK
with either of the two.

I apologize for this long message, and for not choosing either of the two
structures proposed in this thread, but I think it is important to be clear
about the differences in scope between Beam APIs and DSLs, in particular if
we think about new users.

What do you think? Does my proposition make sense? Any suggestions?

Regards,
Ismaël

ps. One last thing, I found this text that in part corroborates my feeling
about scio being an API and not a DSL:

“… a Scala Dataflow API (a nascent open-source version of which already
exists, and which seems likely to flower into maturity in due time given
Dataflow's move to join the ASF).”
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison


On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi 
wrote:

> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin wrote:
>
> > > I love the
> > > name scio. But I think sdks/scala might be most appropriate and would
> > > make it a first class citizen for Beam.
> > >
> >
> > I am strongly against it being in the 'sdks/' top-level module -- it's
> > not a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> >
>
> +1. I agree, it is not Beam SDK in that sense.
>
> Raghu.
>
> > > Where would a future python sdk reside?
> > >
> >
> > The Python SDK is in the python-sdk branch on Apache already, and it
> > lives in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)


Re: Scala DSL

2016-07-01 Thread Neville Li
Looks like dsls/scio is the winner :)

I like it too, plus we get to keep the Scio name. This also leaves room for
other Scala wrappers of different flavors.
Scio is a DSL in the domain of functional-style data pipelines.

On Mon, Jun 27, 2016 at 3:55 AM Ismaël Mejía  wrote:

> Just to summarize, at this point:
>
> - Everybody agrees about the fact that scio is not an SDK.
> - Almost everybody agrees that given the current choice they would prefer
> ‘dsls/scio’
> - Some of us are not particularly married with the DSL classification.
>
> I have a proposition to make, we can define two concepts with their given
> structure in the Beam repository:
>
> 1. Beam API: A set of abstractions to program the complete Beam Model in a
> given programming language.
>
> These are idiomatic versions of the Beam Model, and ideally should cover
> the complete Beam Model e.g. scio is one example. The directory structure
> for Beam APIs could be:
>
> apis/scala
> apis/clojure
> apis/groovy
> ...
>
> 2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
> graphs, machine learning, etc
>
> These represent domain specific idioms, e.g. a graph DSL would represent
> graph concepts. e.g. edges, vertex, etc as first citizens. The directory
> structure for Beam DSLs could be:
>
> dsls/graph
> dsls/ml
> dsls/cep
> ...
>
> Given these definitions for the concrete scio case I think the most
> accurate directory would be:
>
> apis/scala
> or
> apis/scala/scio
>
> I personally prefer the first one (apis/scala) because we don’t have any
> other scala API for the moment and because I think that we shouldn’t have
> more than one API per language to avoid confusion e.g. imagine that someone
> creates apis/java/bcollections to represent Beam Pipelines as distributed
> collections, that would be confusing. However I understand the arguments
> for the second directory e.g. to support different APIs per language, and
> to preserve their original names (scio). Anyway I would be ok with any of
> the two.
>
> I excuse myself for this long message, and for not choosing any of the two
> structures proposed in this thread, but I think it is important to be clear
> about the differences in scope of both Beam APIs and DSLs in particular if
> we think about new users.
>
> What do you think, do you think my proposition makes sense, any suggestions
> ?
>
> Regards,
> Ismaël
>
> ps. One last thing, I found this text that in part corroborates my feeling
> about scio been an API and not a DSL:
>
> “… a Scala Dataflow API (a nascent open-source version of which already
> exists, and which seems likely to flower into maturity in due time given
> Dataflow's move to join the ASF).”
> https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
>


Re: Scala DSL

2016-07-02 Thread Jean-Baptiste Onofré

+1 for dsls/scio.

Let me know how I can help there!

Thanks
Regards
JB

On 07/01/2016 08:43 PM, Neville Li wrote:

Looks like dsls/scio is the winner :)

I like it too plus we get to keep the Scio name. This also leaves room for
other Scala wrappers of different flavor.
Scio is a DSL in the domain of functional style data pipelines.

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com