Re: Running direct runner test on windows

2020-11-30 Thread Haizhou Zhao
Hi Tobiasz,

It seems the cross-platform pipeline option string handling is only applied
to Gradle tasks that have "enableJavaPerformanceTesting" applied; it is
currently not applied to direct runner tests.
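
For anyone debugging this, note that the failure described below is a JSON
parsing failure: the test setup passes "beamTestPipelineOptions" as a system
property containing a JSON array, which the Java side reads back with a
Jackson ObjectMapper. A minimal sketch of why the unquoted form breaks
(assuming Jackson on the classpath; this is not the actual test-harness code):

import com.fasterxml.jackson.databind.ObjectMapper;

public class OptionsParsingDemo {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();

    // What the Gradle task intends to pass: a JSON array of strings.
    String onLinux = "[\"--runner=DirectRunner\", \"--runnerDeterminedSharding=false\"]";
    String[] options = mapper.readValue(onLinux, String[].class);
    System.out.println(options[0]); // --runner=DirectRunner

    // What arrives on Windows once the quotes are dropped: no longer valid
    // JSON, so readValue throws and pipeline construction fails.
    String onWindows = "[--runner=DirectRunner, --runnerDeterminedSharding=false]";
    try {
      mapper.readValue(onWindows, String[].class);
    } catch (Exception e) {
      System.out.println("Parsing failed: " + e.getClass().getSimpleName());
    }
  }
}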

I created this PR; do you mind reviewing it?
https://github.com/apache/beam/pull/13449

Thank you,
Haizhou

On Mon, Nov 30, 2020 at 2:21 AM Tobiasz Kędzierski <
tobiasz.kedzier...@polidea.com> wrote:

> Hi Haizhou,
>
> I suppose these quotation marks are handled here:
>
> https://github.com/apache/beam/blob/8cdb8075b7d28e84591b13fb01d0144d941a5ef2/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1518
>
> I was struggling with them while working on the GA workflows, which run on
> Linux/macOS/Windows:
> https://github.com/apache/beam/blob/master/.github/workflows/java_tests.yml
>
> BR
> Tobiasz
>
> On Sun, Nov 29, 2020 at 8:45 AM Haizhou Zhao 
> wrote:
>
>> Hello Folks,
>>
>> I'm Haizhou, and I'm new to the Beam code base. When I was running
>> 'needsRunnerTests' with 'DirectRunner' on Windows 10, I found that the
>> pipeline option[1] could not be parsed on Windows 10 but worked
>> perfectly on my Ubuntu desktop.
>>
>> It seems that after Groovy sets the system property and Java reads it back
>> at pipeline construction time, Windows drops the quotes, so that the string
>>
>> ["--runner=DirectRunner", "--runnerDeterminedSharding=false"]
>>
>> becomes
>>
>> [--runner=DirectRunner, --runnerDeterminedSharding=false]
>>
>> which fails the object mapper parsing. What solved the issue for me
>> on Windows 10 was adding single quotes around each element, like
>>
>> ['"--runner=DirectRunner"', '"--runnerDeterminedSharding=false"']
>>
>> But the above modification does not work on Ubuntu/Linux. I'm not an expert
>> on OS-level quoting, so I was wondering if anyone has run into the same
>> issue before, and whether there is a good way to support this test on both
>> operating systems.
>>
>> Thank you,
>> Haizhou Zhao
>>
>> [1]
>> https://github.com/apache/beam/blob/master/runners/direct-java/build.gradle#L104
>>
>>
>>


Re: Query regarding Array_Agg impl

2020-11-30 Thread Kyle Weaver
I'm not sure there's a reason to use generics here, since this class will
likely only ever be instantiated once. Have you tried using Object instead
of T?
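
For illustration, here is a minimal sketch of that suggestion (hedged: this is
not the actual Beam implementation, and it ignores coder/serialization
concerns), with the type parameter replaced by Object:

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.Combine.CombineFn;

public class ArrayAgg extends CombineFn<Object, List<Object>, List<Object>> {
  @Override
  public List<Object> createAccumulator() {
    return new ArrayList<>();
  }

  @Override
  public List<Object> addInput(List<Object> accumulator, Object input) {
    accumulator.add(input);
    return accumulator;
  }

  @Override
  public List<Object> mergeAccumulators(Iterable<List<Object>> accumulators) {
    List<Object> merged = new ArrayList<>();
    for (List<Object> accumulator : accumulators) {
      merged.addAll(accumulator);
    }
    return merged;
  }

  @Override
  public List<Object> extractOutput(List<Object> accumulator) {
    return accumulator;
  }
}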


On Wed, Nov 25, 2020 at 10:08 AM Sonam Ramchand <
sonam.ramch...@venturedive.com> wrote:

> Hi Devs,
> I am trying to implement Array_Agg(
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg
> ) for Beam SQL ZetaSQL dialect, as CombineFn.
>
> Rough pseudocode:
> public static class ArrayAgg<T> extends CombineFn<T, List<T>, List<T>> { // todo }
>
> But then I came to know that we cannot use generics for UDAFs, as per
> https://issues.apache.org/jira/browse/BEAM-11059. Is there any
> alternative?
>
> Thanks!
> --
>
> Regards,
> *Sonam*
> Software Engineer
> Mobile: +92 3088337296
>
>


Re: PTransform Annotations Proposal

2020-11-30 Thread Kenneth Knowles
On Wed, Nov 25, 2020 at 10:25 AM Robert Bradshaw 
wrote:

> On Wed, Nov 25, 2020 at 10:15 AM Robert Burke  wrote:
> >
> > It sounds like we've come to the position that non-correctness-affecting
> PTransform annotations are valuable at both leaf and composite levels, and
> don't remove the potential need for a similar feature on Environments to
> handle physical concerns/requirements for worker processes (such as RAM,
> CPU, or GPU requirements).
> >
> > Kenn, it's not clear what part of the solution (an annotation field on
> the Ptransform proto message) would need to change to satisfy your scope
> concern, beyond documenting unambiguously that these may not be used for
> physical concerns or things that affect correctness.
>
> I'll let Kenn answer as well, but from my point of view, explicitly
> having somewhere better to put these things would help.
>

I mean that the primary type of use case for PTransform-level annotations
(properties of a PTransform) was explicitly excluded from scope in the
initial email, while the primary type of use case for Environment (resource
hints / how to execute a DoFn) was used in the motivating examples. In
subsequent discussion, privacy properties have been used as a motivating
example. I honestly don't know which of these is hypothetical and which is
actually what Mirac is trying to get done.

I'm +1 generally on adding something to the model to indicate logical
properties of a PTransform/subgraph. I would be disappointed to see this
used for resource hints.

I do think that "annotation" being so generic makes it a tempting way to do
things that you should do a different way, or shouldn't do at all, while
also limiting the ability to analyze their structure. For example,
properties of a PTransform can be covariant ("output is stable
PCollection") or contravariant ("requires stable input") or
preservation/parametric (various forms of "privacy preserving"). This
structure is not reflected in the feature, precluding designs that take
advantage of it (IIRC Luke had a clever proposal for safely having
runner-opaque contravariant properties). That's OK. Sometimes it is a good
way forward to just add a generic escape hatch. It is analogous to the
difference between a CLI offering --foobizzle=3 versus a CLI offering
--options='{"foobizzle":"3"}'. The former is almost always preferable, once
you have sorted out some clear and stable idea, but starting with the
latter is also a fine way to discover the structure you couldn't identify
(or didn't seem that important) up front.
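
To make the variance point concrete, here is a purely hypothetical
illustration (the keys are invented, not Beam URNs) of the kinds of logical
properties at stake, expressed the way the proposal would carry them, as an
opaque string-to-bytes map on a transform:

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class AnnotationKinds {
  public static void main(String[] args) {
    Map<String, byte[]> annotations = new HashMap<>();
    // Covariant: a claim about the transform's output.
    annotations.put("example:annotation:stable_output:v1", new byte[0]);
    // Contravariant: a requirement the transform places on its input.
    annotations.put("example:annotation:requires_stable_input:v1", new byte[0]);
    // Preservation/parametric: a property of the subgraph as a whole.
    annotations.put("example:annotation:privacy_preserving:v1",
        "differential".getBytes(StandardCharsets.UTF_8));
    // To a runner, all three look alike; the structure lives only in
    // convention, which is exactly the trade-off described above.
    annotations.keySet().forEach(System.out::println);
  }
}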


>
> > I'm also unclear on your scope concern not being addressed, given the
> above. Your first paragraph reads as very supportive of logical annotations
> on PTransforms, and that matches 1-1 with the current proposed solution. Can
> you clarify your concern?
> >
> > I don't wish to scope-creep onto the physical requirements issue at this
> time. It seems we are agreed they should end up on environments, but I'm
> not seeing proposals on the right way to execute them at this time. They
> seem to be a fruitful topic of discussion, in particular
> unifying/consolidating them for efficient use of resources. I don't think
> we want to end up in a state where every distinct physical concern means a
> distinct environment.
>
> Why not? Assuming, of course, that runners are free to merge
> environments (merging those resource hints they understand and are
> otherwise compatible, and discarding those they don't) for efficient
> execution.
>

+1 the Environment proto is intended as a description of requirements. A
single VM+container configuration can meet many sets of requirements. Some
will need to be unioned (specialized hardware / available libraries) while
others may be additive (RAM, etc) and yet others may be simulated (using
SDK harness vs directly calling a function in process if you can). That's
one reason why we have a structured representation.
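
As a toy illustration of that merging (hypothetical types, not Beam's actual
API or proto): two transforms' requirements can be folded into a single
environment's requirements, with additive hints combined and set-like hints
unioned:

import java.util.HashSet;
import java.util.Set;

final class EnvRequirements {
  final long minRamBytes;          // additive-style hint: take the max
  final Set<String> capabilities;  // set-like hint (hardware, libraries): union

  EnvRequirements(long minRamBytes, Set<String> capabilities) {
    this.minRamBytes = minRamBytes;
    this.capabilities = capabilities;
  }

  // One VM/container configuration meeting both sets of requirements.
  static EnvRequirements merge(EnvRequirements a, EnvRequirements b) {
    Set<String> union = new HashSet<>(a.capabilities);
    union.addAll(b.capabilities);
    return new EnvRequirements(Math.max(a.minRamBytes, b.minRamBytes), union);
  }
}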

Kenn

> I for one am ready to see a PR.
>
> +1
>
> > On Mon, Nov 23, 2020, 6:02 PM Kenneth Knowles  wrote:
> >>
> >>
> >>
> >> On Mon, Nov 23, 2020 at 3:04 PM Robert Bradshaw 
> wrote:
> >>>
> >>> On Fri, Nov 20, 2020 at 11:08 AM Mirac Vuslat Basaran <
> mir...@google.com> wrote:
> >>> >
> >>> > Thanks everyone so much for their input and for the insightful
> discussion.
> >>> >
> >>> > Not being knowledgeable about Beam's internals, I have to say I am a
> bit lost on the PTransform vs. environment discussion.
> >>> >
> >>> > I do agree with Burke's notion that merge rules are very
> annotation-dependent; I don't think we can find a one-size-fits-all solution
> for that. So this might actually be an argument in favour of having
> annotations on PTransforms, since it avoids the conflation with
> environments.
> >>> >
> >>> > Also in general, I feel that having annotations per single transform
> (rather than composite) and on PTransforms could lead to a simpler design.
> >>>
> >>> If we want to use these for privacy, I don't see how attaching them to
> >>> leaf transforms alone could work. (Even CombinePerKey is a composite.)

Re: PTransform Annotations Proposal

2020-11-30 Thread Mirac Vuslat Basaran
Created a PR without unit tests at https://github.com/apache/beam/pull/13434. 
Please have a look.

Thanks,
Mirac


On 2020/11/25 18:50:19, Robert Burke  wrote: 
> Hmmm. Fair. I'm mostly concerned about the pathological case where we end
> up with a distinct Environment per transform, but there are likely
> practical cases where that's reasonable (high-mem, GPU, TPU, ARM).
> 
> On Wed, Nov 25, 2020, 10:42 AM Robert Bradshaw  wrote:
> 
> > I'd like to continue the discussion *and* see an implementation for
> > the part we've settled on. I was asking why not have "every distinct
> > physical concern means a distinct environment?"
> >
> > On Wed, Nov 25, 2020 at 10:38 AM Robert Burke  wrote:
> > >
> > > Mostly because perfect is the enemy of good enough. We have a proposal,
> > we have clear boundaries for it. It's fine if the discussion continues, but
> > I see no evidence of concerns that should prevent starting an
> > implementation, because it seems we'll need both anyway.
> > >
> > > On Wed, Nov 25, 2020, 10:25 AM Robert Bradshaw 
> > wrote:
> > >>
> > >> On Wed, Nov 25, 2020 at 10:15 AM Robert Burke 
> > wrote:
> > >> >
> > >> > It sounds like we've come to the position that non-correctness-
> > affecting PTransform annotations are valuable at both leaf and composite
> > levels, and don't remove the potential need for a similar feature on
> > Environments to handle physical concerns/requirements for worker
> > processes (such as RAM, CPU, or GPU requirements).
> > >> >
> > >> > Kenn, it's not clear what part of the solution (an annotation field
> > on the Ptransform proto message) would need to change to satisfy your scope
> > concern, beyond documenting unambiguously that these may not be used for
> > physical concerns or things that affect correctness.
> > >>
> > >> I'll let Kenn answer as well, but from my point of view, explicitly
> > >> having somewhere better to put these things would help.
> > >>
> > >> > I'm also unclear on your scope concern not being addressed, given the
> > above. Your first paragraph reads as very supportive of logical annotations
> > on PTransforms, and that matches 1-1 with the current proposed solution.
> > Can you clarify your concern?
> > >> >
> > >> > I don't wish to scope-creep onto the physical requirements issue at
> > this time. It seems we are agreed they should end up on environments, but
> > I'm not seeing proposals on the right way to execute them at this time.
> > They seem to be a fruitful topic of discussion, in particular
> > unifying/consolidating them for efficient use of resources. I don't think
> > we want to end up in a state where every distinct physical concern means a
> > distinct environment.
> > >>
> > >> Why not? Assuming, of course, that runners are free to merge
> > >> environments (merging those resource hints they understand and are
> > >> otherwise compatible, and discarding those they don't) for efficient
> > >> execution.
> > >>
> > >> > I for one am ready to see a PR.
> > >>
> > >> +1
> > >>
> > >> > On Mon, Nov 23, 2020, 6:02 PM Kenneth Knowles 
> > wrote:
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Mon, Nov 23, 2020 at 3:04 PM Robert Bradshaw 
> > wrote:
> > >> >>>
> > >> >>> On Fri, Nov 20, 2020 at 11:08 AM Mirac Vuslat Basaran <
> > mir...@google.com> wrote:
> > >> >>> >
> > >> >>> > Thanks everyone so much for their input and for the insightful
> > discussion.
> > >> >>> >
> > >> >>> > Not being knowledgeable about Beam's internals, I have to say I
> > am a bit lost on the PTransform vs. environment discussion.
> > >> >>> >
> > >> >>> > I do agree with Burke's notion that merge rules are very
> > annotation-dependent; I don't think we can find a one-size-fits-all
> > solution for that. So this might actually be an argument in favour of
> > having annotations on PTransforms, since it avoids the conflation with
> > environments.
> > >> >>> >
> > >> >>> > Also in general, I feel that having annotations per single
> > transform (rather than composite) and on PTransforms could lead to a
> > simpler design.
> > >> >>>
> > >> >>> If we want to use these for privacy, I don't see how attaching them
> > to
> > >> >>> leaf transforms alone could work. (Even CombinePerKey is a
> > composite.)
> > >> >>>
> > >> >>> > Seeing as there are valuable arguments in favour of both
> > (PTransform and environments) with no clear(?) "best solution", I would
> > propose moving forward with the initial (PTransform) design to ship the
> > feature and unblock teams asking for it. If it turns out that there was
> > indeed a need to have annotations in environments, we could always refactor
> > it.
> > >> >>>
> > >> >>> I have yet to see any arguments that resource-level hints, such as
> > >> >>> memory or GPU, don't better belong on the environment. But moving
> > >> >>> forward on PTransform-level ones for logical statements like privacy
> > >> >>> declarations makes sense.
> > >> >>
> > >> >>
> > >> >> Exactly this. Properties of transforms make sense. 

Re: Docker Development Environment

2020-11-30 Thread Alex Amato
If any of these are suitable for at least some development, I propose we
merge them and update them with fixes later, rather than trying to get
things 100% working in the first PR.

Looks like this one was opened in early September and never got merged, which
is a pretty long time. Perhaps it was abandoned in favor of the later one?
https://github.com/apache/beam/pull/12837

This one looks like it's failing on just a few tests (which may be
addressed soon, but the PR was opened 19 days ago).
https://github.com/apache/beam/pull/13308
(Can we set a deadline for this, and just say we merge it by the end of the
week, regardless of whether the last two tests can be fixed?)

(Would like to nudge this along, as it's been a pain point for many for a
while now).

Thanks for the work here Niels, Omar and Sam.
Looking forward to giving this a try :)


On Mon, Nov 30, 2020 at 11:32 AM Brian Hulette  wrote:

> I agree this is a good idea. I remember my first experience with Beam
> development - I ran through the steps at [1] and had `./gradlew check`
> fail. I don't think I ever got it working before moving on and just running
> more specific tasks.
> It would be great if we had a reliable way for new contributors to
> establish an environment that can successfully run `./gradlew check`.
>
> Niels Basjes' PR (https://github.com/apache/beam/pull/13308) seems to be
> close to that, so I think we should focus on getting that working and
> iterate from there. Omar concurred with that in
> https://github.com/apache/beam/pull/12837.
>
> [1] https://beam.apache.org/contribute/#development-setup
>
>
> On Wed, Nov 25, 2020 at 3:39 PM Ahmet Altay  wrote:
>
>> Thank you for doing this.
>>
>> I have seen a few related PRs. Connecting them here in case these efforts
>> could be combined:
>> - https://github.com/apache/beam/pull/12837 (/cc +Omar Ismail
>>  )
>> - https://github.com/apache/beam/pull/13308
>>
>> Ahmet
>>
>> On Wed, Nov 25, 2020 at 2:53 PM Sam Rohde  wrote:
>>
>>> Hi All,
>>>
>>> I got tired of my local dev environment being ruined by updates so I
>>> made a container for Apache Beam development work. What this does is create
>>> a Docker container from the Ubuntu Groovy image and load it up with all the
>>> necessary libraries/utilities for Apache Beam development. Then I run an
>>> interactive shell in the Docker container where I do my work.
>>>
>>> This is a nice way for new contributors to get started easily. However,
>>> with the container in its current form, I don't know if this will help
>>> other people because it is tied closely to my workflow (using Vim with
>>> YouCompleteMe, for Python). But I think it can be a nice starting point for
>>> improvements. For example:
>>>
>>>- Sharing the host display with Docker to start GUI applications
>>>(like IntelliJ) in the container
>>>- Adding Golang development support
>>>
>>> Here's a draft PR, let me
>>> know what you think, how it can be improved, and whether it's a good idea
>>> for us to have a dev container like this.
>>>
>>> Regards,
>>> Sam
>>>
>>>


Re: Docker Development Environment

2020-11-30 Thread Brian Hulette
I agree this is a good idea. I remember my first experience with Beam
development - I ran through the steps at [1] and had `./gradlew check`
fail. I don't think I ever got it working before moving on and just running
more specific tasks.
It would be great if we had a reliable way for new contributors to
establish an environment that can successfully run `./gradlew check`.

Niels Basjes' PR (https://github.com/apache/beam/pull/13308) seems to be
close to that, so I think we should focus on getting that working and
iterate from there. Omar concurred with that in
https://github.com/apache/beam/pull/12837.

[1] https://beam.apache.org/contribute/#development-setup


On Wed, Nov 25, 2020 at 3:39 PM Ahmet Altay  wrote:

> Thank you for doing this.
>
> I have seen a few related PRs. Connecting them here in case these efforts
> could be combined:
> - https://github.com/apache/beam/pull/12837 (/cc +Omar Ismail
>  )
> - https://github.com/apache/beam/pull/13308
>
> Ahmet
>
> On Wed, Nov 25, 2020 at 2:53 PM Sam Rohde  wrote:
>
>> Hi All,
>>
>> I got tired of my local dev environment being ruined by updates so I made
>> a container for Apache Beam development work. What this does is create a
>> Docker container from the Ubuntu Groovy image and load it up with all the
>> necessary libraries/utilities for Apache Beam development. Then I run an
>> interactive shell in the Docker container where I do my work.
>>
>> This is a nice way for new contributors to get started easily. However,
>> with the container in its current form, I don't know if this will help
>> other people because it is tied closely to my workflow (using Vim with
>> YouCompleteMe, for Python). But I think it can be a nice starting point for
>> improvements. For example:
>>
>>- Sharing the host display with Docker to start GUI applications
>>(like IntelliJ) in the container
>>- Adding Golang development support
>>
>> Here's a draft PR, let me
>> know what you think, how it can be improved, and whether it's a good idea
>> for us to have a dev container like this.
>>
>> Regards,
>> Sam
>>
>>


Re: Create External Transform with WindowFn

2020-11-30 Thread Chamikara Jayalath
Please follow https://issues.apache.org/jira/browse/BEAM-11360 instead.

Thanks,
Cham

On Mon, Nov 30, 2020 at 10:26 AM Steve Niemitz  wrote:

> alright, thank you.  Is BEAM-10507 the jira to watch for any progress on
> that?
>
> On Mon, Nov 30, 2020 at 12:55 PM Boyuan Zhang  wrote:
>
>> Hi Steve,
>>
>> Unfortunately I don't think there is a workaround before we have the
>> change that Cham mentions.
>>
>> On Mon, Nov 30, 2020 at 8:16 AM Steve Niemitz 
>> wrote:
>>
>>> I'm trying to write an xlang transform that uses Reshuffle internally,
>>> and ran into this as well.  Is there any workaround to this for now (other
>>> than removing the reshuffle), or do I just need to wait for what Chamikara
>>> mentioned?  I noticed the same issue was mentioned in the SnowflakeIO.Read
>>> PR as well [1].
>>>
>>> https://github.com/apache/beam/pull/12149#discussion_r463710165
>>>
>>> On Wed, Aug 26, 2020 at 10:55 PM Boyuan Zhang 
>>> wrote:
>>>
 That explains a lot. Thanks, Cham!

 On Wed, Aug 26, 2020 at 7:44 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Due to the proto -> object -> proto conversion we do today, Python
> needs to parse the full sub-graph from Java. We have hooks for PTransforms
> and Coders but not for windowing operations. This limitation will go away
> after we have direct Beam proto to Dataflow proto conversion in place.
>
> On Wed, Aug 26, 2020 at 7:03 PM Robert Burke 
> wrote:
>
>> Coders should only be checked over the language boundaries.
>>
>> On Wed, Aug 26, 2020, 6:24 PM Boyuan Zhang 
>> wrote:
>>
>>> Thanks Cham!
>>>
>>> I just realized that beam:window_fn:serialized_java:v1 is
>>> introduced by Java Reshuffle.viaRandomKey(). But
>>> Reshuffle.viaRandomKey() does rewindow into the original window
>>> strategy (which is GlobalWindows in my case). Is it expected that
>>> we also check intermediate PCollections rather than only the PCollection
>>> that crosses the language boundary?
>>>
>>> More about my Ptransform:
>>> MyExternalPTransform -- expands to --
>>>   ParDo() -+-> Reshuffle.viaRandomKey() -> ParDo() -> WindowInto(FixWindow) -> ParDo() -> output void
>>>            |
>>>            +-> ParDo() -> output PCollection to Python SDK
>>>
>>> On Tue, Aug 25, 2020 at 6:29 PM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
 Also it's strange that Java used (beam:window_fn:serialized_java:v1)
 for the URN here instead of "beam:window_fn:fixed_windows:v1" [1]
 which is what is being registered by Python [2]. This seems to be the
 immediate issue. Tracking bug for supporting custom windows is
 https://issues.apache.org/jira/browse/BEAM-10507.

 [1]
 https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/standard_window_fns.proto#L55
 [2]
 https://github.com/apache/beam/blob/bd4df94ae10a7e7b0763c1917746d2faf5aeed6c/sdks/python/apache_beam/transforms/window.py#L449

 On Tue, Aug 25, 2020 at 6:07 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Pipelines that use external WindowingStrategies might be failing
> during proto -> object -> proto conversion we do today. This 
> limitation
> will go away once Dataflow directly starts reading Beam protos. We are
> working on this now.
>
> Thanks,
> Cham
>
> On Tue, Aug 25, 2020 at 5:38 PM Boyuan Zhang 
> wrote:
>
>> Thanks, Robert! I want to add more details on my External
>> PTransform:
>>
>> MyExternalPTransform -- expands to --
>>   ParDo() -+-> WindowInto(FixWindow) -> ParDo() -> output void
>>            |
>>            +-> ParDo() -> output PCollection to Python SDK
>> The full stacktrace:
>>
>> INFO:root:Using Java SDK harness container image 
>> dataflow-dev.gcr.io/boyuanz/java:latest
>> Starting expansion service at localhost:53569
>> Aug 13, 2020 7:42:11 PM 
>> org.apache.beam.sdk.expansion.service.ExpansionService 
>> loadRegisteredTransforms
>> INFO: Registering external transforms: 
>> [beam:external:java:kafka:read:v1, 
>> beam:external:java:kafka:write:v1, 
>> beam:external:java:jdbc:read_rows:v1, 
>> beam:external:java:jdbc:write:v1, 
>> beam:external:java:generate_sequence:v1]
>>  beam:external:java:kafka:read:v1: 
>> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@4ac68d3e
>>  beam:external:java:kafka:write:v1: 
>> 

Re: Create External Transform with WindowFn

2020-11-30 Thread Steve Niemitz
alright, thank you.  Is BEAM-10507 the jira to watch for any progress on
that?

On Mon, Nov 30, 2020 at 12:55 PM Boyuan Zhang  wrote:

> Hi Steve,
>
> Unfortunately I don't think there is a workaround before we have the
> change that Cham mentions.
>
> On Mon, Nov 30, 2020 at 8:16 AM Steve Niemitz  wrote:
>
>> I'm trying to write an xlang transform that uses Reshuffle internally,
>> and ran into this as well.  Is there any workaround to this for now (other
>> than removing the reshuffle), or do I just need to wait for what Chamikara
>> mentioned?  I noticed the same issue was mentioned in the SnowflakeIO.Read
>> PR as well [1].
>>
>> https://github.com/apache/beam/pull/12149#discussion_r463710165
>>
>> On Wed, Aug 26, 2020 at 10:55 PM Boyuan Zhang  wrote:
>>
>>> That explains a lot. Thanks, Cham!
>>>
>>> On Wed, Aug 26, 2020 at 7:44 PM Chamikara Jayalath 
>>> wrote:
>>>
 Due to the proto -> object -> proto conversion we do today, Python
 needs to parse the full sub-graph from Java. We have hooks for PTransforms
 and Coders but not for windowing operations. This limitation will go away
 after we have direct Beam proto to Dataflow proto conversion in place.

 On Wed, Aug 26, 2020 at 7:03 PM Robert Burke 
 wrote:

> Coders should only be checked over the language boundaries.
>
> On Wed, Aug 26, 2020, 6:24 PM Boyuan Zhang  wrote:
>
>> Thanks Cham!
>>
>> I just realized that beam:window_fn:serialized_java:v1 is
>> introduced by Java Reshuffle.viaRandomKey(). But
>> Reshuffle.viaRandomKey() does rewindow into the original window
>> strategy (which is GlobalWindows in my case). Is it expected that
>> we also check intermediate PCollections rather than only the PCollection
>> that crosses the language boundary?
>>
>> More about my Ptransform:
>> MyExternalPTransform -- expands to --
>>   ParDo() -+-> Reshuffle.viaRandomKey() -> ParDo() -> WindowInto(FixWindow) -> ParDo() -> output void
>>            |
>>            +-> ParDo() -> output PCollection to Python SDK
>>
>> On Tue, Aug 25, 2020 at 6:29 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> Also it's strange that Java used (beam:window_fn:serialized_java:v1)
>>> for the URN here instead of "beam:window_fn:fixed_windows:v1" [1]
>>> which is what is being registered by Python [2]. This seems to be the
>>> immediate issue. Tracking bug for supporting custom windows is
>>> https://issues.apache.org/jira/browse/BEAM-10507.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/standard_window_fns.proto#L55
>>> [2]
>>> https://github.com/apache/beam/blob/bd4df94ae10a7e7b0763c1917746d2faf5aeed6c/sdks/python/apache_beam/transforms/window.py#L449
>>>
>>> On Tue, Aug 25, 2020 at 6:07 PM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
 Pipelines that use external WindowingStrategies might be failing
 during proto -> object -> proto conversion we do today. This limitation
 will go away once Dataflow directly starts reading Beam protos. We are
 working on this now.

 Thanks,
 Cham

 On Tue, Aug 25, 2020 at 5:38 PM Boyuan Zhang 
 wrote:

> Thanks, Robert! I want to add more details on my External
> PTransform:
>
> MyExternalPTransform -- expands to --
>   ParDo() -+-> WindowInto(FixWindow) -> ParDo() -> output void
>            |
>            +-> ParDo() -> output PCollection to Python SDK
> The full stacktrace:
>
> INFO:root:Using Java SDK harness container image 
> dataflow-dev.gcr.io/boyuanz/java:latest
> Starting expansion service at localhost:53569
> Aug 13, 2020 7:42:11 PM 
> org.apache.beam.sdk.expansion.service.ExpansionService 
> loadRegisteredTransforms
> INFO: Registering external transforms: 
> [beam:external:java:kafka:read:v1, beam:external:java:kafka:write:v1, 
> beam:external:java:jdbc:read_rows:v1, 
> beam:external:java:jdbc:write:v1, 
> beam:external:java:generate_sequence:v1]
>   beam:external:java:kafka:read:v1: 
> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@4ac68d3e
>   beam:external:java:kafka:write:v1: 
> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@277c0f21
>   beam:external:java:jdbc:read_rows:v1: 
> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@6073f712

Re: Proposal: Redis Stream Connector

2020-11-30 Thread Boyuan Zhang
In Splittable DoFn, the trySplit[1] API in RestrictionTracker is what performs
checkpointing: it keeps the primary as the current restriction and returns the
residual. In the DoFn, you can do a Splittable DoFn-initiated checkpoint by
returning ProcessContinuation.resume()[2]. The Beam programming guide[3] also
talks about Splittable DoFn-initiated and runner-initiated checkpoints.

[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java#L72-L108
[2]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L1297-L1333
[3]
https://beam.apache.org/documentation/programming-guide/#splittable-dofns
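
To make that concrete, here is a minimal sketch of a Splittable DoFn-initiated
checkpoint in a polling source (the readNext() helper is a hypothetical
stand-in for whatever the connector polls; this is not the Redis connector,
just the shape of the pattern):

import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.joda.time.Duration;

class PollingReadFn extends DoFn<String, String> {

  @GetInitialRestriction
  public OffsetRange initialRestriction(@Element String source) {
    return new OffsetRange(0, Long.MAX_VALUE);
  }

  @ProcessElement
  public ProcessContinuation process(
      @Element String source,
      RestrictionTracker<OffsetRange, Long> tracker,
      OutputReceiver<String> out) {
    long position = tracker.currentRestriction().getFrom();
    while (true) {
      String record = readNext(source, position); // hypothetical helper
      if (record == null) {
        // SDF-initiated checkpoint: everything after the last claimed offset
        // becomes the residual, and the runner re-invokes the DoFn on that
        // residual after the delay. No getCheckpointMark is needed.
        return ProcessContinuation.resume().withResumeDelay(Duration.standardSeconds(5));
      }
      if (!tracker.tryClaim(position)) {
        // Runner-initiated checkpoint/split: stop without emitting more.
        return ProcessContinuation.stop();
      }
      out.output(record); // only emit for successfully claimed positions
      position++;
    }
  }

  private String readNext(String source, long position) {
    return null; // placeholder; a real connector would poll the source here
  }
}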

On Sun, Nov 29, 2020 at 10:28 PM Vincent Marquez 
wrote:

> Regarding checkpointing:
>
> I'm confused how the Splittable DoFn can make use of checkpoints to resume
> and not have data loss.  Unlike the old API that had a very easy to
> understand method called 'getCheckpointMark' that allows me to return the
> completed work, I don't see where that is done with the current API.
>
> I tried looking at the OffsetRangeTracker and how it is used by Kafka, but
> I'm failing to understand it. The process method takes the
> RestrictionTracker, but I don't see a way for the OffsetRangeTracker
> to represent half-completed work (in the event of an exception/crash during
> a previous 'process' method run). Is there some documentation that could
> help me understand this part? Thanks in advance.
>
> *~Vincent*
>
>
> On Thu, Nov 26, 2020 at 2:01 PM Ismaël Mejía  wrote:
>
>> Just want to mention that we have been working with Vincent on the
>> ReadAll implementation for Cassandra based on a normal DoFn, and we
>> expect to get it merged for the next release of Beam. Vincent is now
>> familiar with DoFn-based IO composition, a first step towards
>> understanding SDF. Vincent, you can think of our Cassandra RingRange as
>> a Restriction in the context of SDF. Just for reference, it would be
>> good to read these two in advance:
>>
>> https://beam.apache.org/blog/splittable-do-fn/
>> https://beam.apache.org/documentation/programming-guide/#sdf-basics
>>
>> Thanks Boyuan for offering your help I think it is really needed
>> considering that we don't have many Unbounded SDF connectors to use as
>> reference.
>>
>> On Thu, Nov 19, 2020 at 11:16 PM Boyuan Zhang  wrote:
>> >
>> >
>> >
>> > On Thu, Nov 19, 2020 at 1:29 PM Vincent Marquez <
>> vincent.marq...@gmail.com> wrote:
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Nov 19, 2020 at 10:38 AM Boyuan Zhang 
>> wrote:
>> >>>
>> >>> Hi Vincent,
>> >>>
>> >>> Thanks for your contribution! I'm happy to work with you on this when
>> you contribute the code into Beam.
>> >>
>> >>
>> >> Should I write up a JIRA to start?  I have access, I've already been
>> in the process of contributing some big changes to the CassandraIO
>> connector.
>> >
>> >
>> > Yes, please create a JIRA and assign it to yourself.
>> >
>> >>
>> >>
>> >>>
>> >>>
>> >>> Another thing is that it would be preferable to use Splittable DoFn
>> instead of using UnboundedSource to write a new IO.
>> >>
>> >>
>> >> I would prefer to use the UnboundedSource connector (I've already
>> written most of it), but I also see some challenges using Splittable DoFn
>> for Redis streams.
>> >>
>> >> Unlike Kafka and Kinesis, Redis Streams offsets are not simply
>> monotonically increasing counters, so there is not a way  to just claim a
>> chunk of work and know that the chunk has any actual data in it.
>> >>
>> >> Since UnboundedSource is not yet deprecated, could I contribute that
>> after finishing up some test aspects, and then perhaps we can implement a
>> Splittable DoFn version?
>> >
>> >
>> > It would be nice not to build new IOs on top of UnboundedSource.
>> Currently we already have the wrapper class which translates the existing
>> UnboundedSource into Unbounded Splittable DoFn and executes the
>> UnboundedSource as the Splittable DoFn. How about you open a WIP PR and we
>> go through the UnboundedSource implementation together to figure out a
>> design for using Splittable DoFn?
>> >
>> >
>> >>
>> >>
>> >>
>> >>>
>> >>>
>> >>> On Thu, Nov 19, 2020 at 10:18 AM Vincent Marquez <
>> vincent.marq...@gmail.com> wrote:
>> 
>>  Currently, Redis offers a streaming queue functionality similar to
>> Kafka/Kinesis/Pubsub, but Beam does not currently have a connector for it.
>> 
>>  I've written an UnboundedSource connector that makes use of Redis
>> Streams as a POC and it seems to work well.
>> 
>>  If someone is willing to work with me, I could write up a JIRA
>> and/or open up a WIP pull request if there is interest in getting this as
>> an official connector.  I would mostly need guidance on naming/testing
>> aspects.
>> 
>>  https://redis.io/topics/streams-intro
>> 
>>  ~Vincent
>> >>
>> >>
>> >> ~Vincent
>>
>


Re: Create External Transform with WindowFn

2020-11-30 Thread Boyuan Zhang
Hi Steve,

Unfortunately I don't think there is a workaround before we have the change
that Cham mentions.

On Mon, Nov 30, 2020 at 8:16 AM Steve Niemitz  wrote:

> I'm trying to write an xlang transform that uses Reshuffle internally, and
> ran into this as well.  Is there any workaround to this for now (other than
> removing the reshuffle), or do I just need to wait for what Chamikara
> mentioned?  I noticed the same issue was mentioned in the SnowflakeIO.Read
> PR as well [1].
>
> https://github.com/apache/beam/pull/12149#discussion_r463710165
>
> On Wed, Aug 26, 2020 at 10:55 PM Boyuan Zhang  wrote:
>
>> That explains a lot. Thanks, Cham!
>>
>> On Wed, Aug 26, 2020 at 7:44 PM Chamikara Jayalath 
>> wrote:
>>
>>> Due to the proto -> object -> proto conversion we do today, Python needs
>>> to parse the full sub-graph from Java. We have hooks for PTransforms and
>>> Coders but not for windowing operations. This limitation will go away after
>>> we have direct Beam proto to Dataflow proto conversion in place.
>>>
>>> On Wed, Aug 26, 2020 at 7:03 PM Robert Burke  wrote:
>>>
 Coders should only be checked over the language boundaries.

 On Wed, Aug 26, 2020, 6:24 PM Boyuan Zhang  wrote:

> Thanks Cham!
>
> I just realized that beam:window_fn:serialized_java:v1 is
> introduced by Java Reshuffle.viaRandomKey(). But
> Reshuffle.viaRandomKey() does rewindow into the original window
> strategy (which is GlobalWindows in my case). Is it expected that we
> also check intermediate PCollections rather than only the PCollection that
> crosses the language boundary?
>
> More about my Ptransform:
> MyExternalPTransform -- expands to --
>   ParDo() -+-> Reshuffle.viaRandomKey() -> ParDo() -> WindowInto(FixWindow) -> ParDo() -> output void
>            |
>            +-> ParDo() -> output PCollection to Python SDK
>
> On Tue, Aug 25, 2020 at 6:29 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> Also it's strange that Java used (beam:window_fn:serialized_java:v1)
>> for the URN here instead of "beam:window_fn:fixed_windows:v1" [1]
>> which is what is being registered by Python [2]. This seems to be the
>> immediate issue. Tracking bug for supporting custom windows is
>> https://issues.apache.org/jira/browse/BEAM-10507.
>>
>> [1]
>> https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/standard_window_fns.proto#L55
>> [2]
>> https://github.com/apache/beam/blob/bd4df94ae10a7e7b0763c1917746d2faf5aeed6c/sdks/python/apache_beam/transforms/window.py#L449
>>
>> On Tue, Aug 25, 2020 at 6:07 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> Pipelines that use external WindowingStrategies might be failing
>>> during proto -> object -> proto conversion we do today. This limitation
>>> will go away once Dataflow directly starts reading Beam protos. We are
>>> working on this now.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Tue, Aug 25, 2020 at 5:38 PM Boyuan Zhang 
>>> wrote:
>>>
 Thanks, Robert! I want to add more details on my External
 PTransform:

 MyExternalPTransform -- expands to --
   ParDo() -+-> WindowInto(FixWindow) -> ParDo() -> output void
            |
            +-> ParDo() -> output PCollection to Python SDK
 The full stacktrace:

 INFO:root:Using Java SDK harness container image 
 dataflow-dev.gcr.io/boyuanz/java:latest
 Starting expansion service at localhost:53569
 Aug 13, 2020 7:42:11 PM 
 org.apache.beam.sdk.expansion.service.ExpansionService 
 loadRegisteredTransforms
 INFO: Registering external transforms: 
 [beam:external:java:kafka:read:v1, beam:external:java:kafka:write:v1, 
 beam:external:java:jdbc:read_rows:v1, 
 beam:external:java:jdbc:write:v1, 
 beam:external:java:generate_sequence:v1]
beam:external:java:kafka:read:v1: 
 org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@4ac68d3e
beam:external:java:kafka:write:v1: 
 org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@277c0f21
beam:external:java:jdbc:read_rows:v1: 
 org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@6073f712
beam:external:java:jdbc:write:v1: 
 org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@43556938
beam:external:java:generate_sequence:v1: 
 

Re: Documentation for Cross-Language Transforms

2020-11-30 Thread Chamikara Jayalath
On Wed, Nov 25, 2020 at 11:09 AM Alexey Romanenko 
wrote:

> Great job, it should be very helpful for users!
>
> Just a minor note - it would be great to add an example of how to actually
> run a cross-language pipeline with the Portable Runner since, IIRC, one is
> supposed to pass some additional arguments, like
> "--experiments=beam_fn_api".
>

+1. I haven't had time to fully test the current examples (SQL, Kafka) on
portable runners, but feel free to update if you have the relevant commands
at hand.

Thanks,
Cham


>
> On 21 Nov 2020, at 03:36, Chamikara Jayalath  wrote:
>
> PR went in and documentation is live now:
> https://beam.apache.org/documentation/programming-guide/#mulit-language-pipelines
>
> Thanks,
> Cham
>
> On Wed, Nov 18, 2020 at 10:05 AM Chamikara Jayalath 
> wrote:
>
>> This was mentioned in a separate thread but thought it would be good to
>> highlight here in case more folks wish to take a look before the PR is
>> merged.
>>
>> PR is https://github.com/apache/beam/pull/13317
>>
>> Thanks,
>> Cham
>>
>> On Thu, Nov 12, 2020 at 1:17 PM Chamikara Jayalath 
>> wrote:
>>
>>> Seems like a good place to promote this PR that adds documentation for
>>> cross-language transforms :)
>>> https://github.com/apache/beam/pull/13317
>>>
>>> This covers the following for both Java and Python SDKs.
>>> * Creating new cross-language transforms - primary audience will be
>>> transform authors who wish to make existing Java/Python transforms
>>> available to other SDKs.
>>> * Using cross-language transforms - primary audience will be pipeline
>>> authors that wish to use existing cross-language transforms with or without
>>> language specific wrappers.
>>>
>>> Also this introduces the term "Multi-Language Pipelines" to denote
>>> pipelines that use cross-language transforms (and hence utilize more than
>>> one SDK language).
>>>
>>> Thanks +Dave Wrede  for working on this.
>>>
>>> - Cham
>>>
>>> On Thu, Nov 12, 2020 at 4:56 AM Ismaël Mejía  wrote:
>>>
 I was not aware of these examples Brian, thanks for sharing. Maybe we
 should
 make these examples more discoverable on the website or as part of
 Beam's
 programming guide.

 It would be nice to have an example of the opposite too, calling a
 Python
 transform from Java.

 Additionally, Java users who want to integrate Python might be lost because
 External is NOT part of Beam's Java SDK (the transform is hidden inside of a
 different module, core-construction-java), so it does not even appear in the
 website SDK javadoc.
 https://issues.apache.org/jira/browse/BEAM-8546


 On Wed, Nov 11, 2020 at 8:41 PM Brian Hulette 
 wrote:
 >
 > Hi Ke,
 >
 > A cross-language pipeline looks a lot like a pipeline written
 natively in one of the Beam SDKs, the difference is that some of the
 transforms in the pipeline may be "external transforms" that actually have
 implementations in a different language. There are a few examples in the
 beam repo that use Java transforms from Python pipelines:
 > - kafkataxi [1]: Uses Java's KafkaIO from Python
 > - wordcount_xlang_sql [2] and sql_taxi [3]: Use Java's SqlTransform
 from Python
 >
 > To create your own cross-language pipeline, you'll need to decide
 which SDK you want to use primarily, and then create an expansion service
 to expose the transforms you want to use from the other SDK (if one doesn't
 exist already).
 >
 > [1]
 https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/kafkataxi
 > [2]
 https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang_sql.py
 > [3]
 https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/sql_taxi.py
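
As a concrete note on that last step, standing up Beam's Java expansion
service can be as small as this sketch (the port is arbitrary; the
expansion-service artifact and any transform jars are assumed to be on the
classpath):

import org.apache.beam.sdk.expansion.service.ExpansionService;

public class StartExpansionService {
  public static void main(String[] args) throws Exception {
    // Serves gRPC expansion requests on localhost:8097 until the process is
    // terminated; pipelines in other SDKs point their external transforms
    // at this address.
    ExpansionService.main(new String[] {"8097"});
  }
}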
 >
 > On Wed, Nov 11, 2020 at 11:07 AM Ke Wu  wrote:
 >>
 >> Hello,
 >>
 >> Is there an example demonstrating how a cross language pipeline look
 like? e.g. a pipeline where it is composes of Java and Python
 code/transforms.
 >>
 >> Best,
 >> Ke

>>>
>


Re: Implementing an IO Connector for Debezium

2020-11-30 Thread Bashir Sadjad
Thanks Boyuan for the pointers.

If you or anyone else here have any recommendations about the two
approaches, i.e., implementing a connector for Beam using the embedded
version of Debezium or relying on Kafka (even for the single node case),
that would be great too.
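
For reference, the embedded-engine side of that comparison is roughly the
following (a sketch based on Debezium's embedded engine API; the connector
properties are placeholders, and wiring this into a Beam UnboundedReader or
Splittable DoFn is the part that would still need to be designed):

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EmbeddedDebeziumDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("name", "engine");
    props.setProperty("connector.class",
        "io.debezium.connector.mysql.MySqlConnector");
    props.setProperty("offset.storage",
        "org.apache.kafka.connect.storage.FileOffsetBackingStore");
    props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
    // ... plus database.hostname, database.user, etc. (placeholders omitted)

    // Every binlog change event is handed to this callback; no Kafka,
    // ZooKeeper, or Kafka Connect cluster is involved.
    DebeziumEngine<ChangeEvent<String, String>> engine =
        DebeziumEngine.create(Json.class)
            .using(props)
            .notifying(record -> System.out.println(record.value()))
            .build();

    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.execute(engine); // DebeziumEngine implements Runnable
  }
}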

Regards

-B

On Wed, Nov 25, 2020 at 1:37 PM Boyuan Zhang  wrote:

> +dev 
>
> Hi Bashir,
>
> These days we recommend using Splittable DoFn[1] to build new
> IO connectors. We have several examples of that in our codebase:
> Java examples:
>
>    - Kafka - An I/O connector for Apache Kafka (an open-source distributed
>      event streaming platform).
>    - Watch - Uses a polling function producing a growing set of outputs for
>      each input until a per-input termination condition is met.
>    - Parquet - An I/O connector for Apache Parquet (an open-source columnar
>      storage format).
>    - HL7v2 - An I/O connector for HL7v2 messages (a clinical messaging
>      format that provides data about events that occur inside an
>      organization), part of Google's Cloud Healthcare API.
>    - BoundedSource wrapper - A wrapper which converts an existing
>      BoundedSource implementation to a splittable DoFn.
>    - UnboundedSource wrapper - A wrapper which converts an existing
>      UnboundedSource implementation to a splittable DoFn.
>
>
> Python examples:
>
>    - BoundedSourceWrapper - A wrapper which converts an existing
>      BoundedSource implementation to a splittable DoFn.
>
>
> [1]
> https://beam.apache.org/documentation/programming-guide/#splittable-dofns
>
> On Wed, Nov 25, 2020 at 8:19 AM Bashir Sadjad  wrote:
>
>> Hi,
>>
>> I have a scenario in which a streaming pipeline should read update
>> messages from MySQL binlog (through Debezium). To implement this pipeline
>> using Beam, I understand there is a KafkaIO which I can use. But I also
>> want to support a local mode in which there is no Kafka and the messages
>> are directly consumed using embedded Debezium because this is a much
>> simpler architecture (no Kafka, ZooKeeper, and Kafka Connect).
>>
>> I did a little bit of searching and it seems there is no IO connector for
>> Debezium, hence I have to implement one following this guide
>> . I wonder:
>>
>> 1) Does this approach make sense or is it better to rely on Kafka even
>> for the local single machine use case?
>>
>> 2) Besides the above guide, is there any simple example IO that I can
>> follow to implement the UnboundedSource/Reader? I have looked at some
>> examples here but
>> was wondering if there is a recommended/simple one as a tutorial.
>>
>> Thanks
>>
>> -B
>> P.S. If this is better suited for dev@, please feel free to move it to
>> that list.
>>
>


Re: Create External Transform with WindowFn

2020-11-30 Thread Steve Niemitz
I'm trying to write an xlang transform that uses Reshuffle internally, and
ran into this as well.  Is there any workaround to this for now (other than
removing the reshuffle), or do I just need to wait for what Chamikara
mentioned?  I noticed the same issue was mentioned in the SnowflakeIO.Read
PR as well [1].

https://github.com/apache/beam/pull/12149#discussion_r463710165
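
For anyone landing here with the same problem: the only mitigation discussed
below is restructuring the expansion so that every WindowFn crossing the
language boundary is a standard one (FixedWindows carries the well-known URN
beam:window_fn:fixed_windows:v1). A sketch of that shape, with hypothetical
stand-in DoFns rather than the real transform:

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class MyExternalTransform extends PTransform<PCollection<String>, PCollection<String>> {
  @Override
  public PCollection<String> expand(PCollection<String> input) {
    return input
        .apply("Read", ParDo.of(new PassThroughFn()))
        // Reshuffle.viaRandomKey() omitted: it introduces an intermediate
        // PCollection whose WindowFn is serialized Java
        // (beam:window_fn:serialized_java:v1), which the Python SDK cannot
        // parse during expansion.
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply("Emit", ParDo.of(new PassThroughFn()));
  }

  static class PassThroughFn extends DoFn<String, String> {
    @ProcessElement
    public void process(@Element String element, OutputReceiver<String> out) {
      out.output(element);
    }
  }
}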

On Wed, Aug 26, 2020 at 10:55 PM Boyuan Zhang  wrote:

> That explains a lot. Thanks, Cham!
>
> On Wed, Aug 26, 2020 at 7:44 PM Chamikara Jayalath 
> wrote:
>
>> Due to the proto -> object -> proto conversion we do today, Python needs
>> to parse the full sub-graph from Java. We have hooks for PTransforms and
>> Coders but not for windowing operations. This limitation will go away after
>> we have direct Beam proto to Dataflow proto conversion in place.
>>
>> On Wed, Aug 26, 2020 at 7:03 PM Robert Burke  wrote:
>>
>>> Coders should only be checked over the language boundaries.
>>>
>>> On Wed, Aug 26, 2020, 6:24 PM Boyuan Zhang  wrote:
>>>
 Thanks Cham!

 I just realized that beam:window_fn:serialized_java:v1 is
 introduced by Java Reshuffle.viaRandomKey(). But
 Reshuffle.viaRandomKey() does rewindow into the original window
 strategy (which is GlobalWindows in my case). Is it expected that we
 also check intermediate PCollections rather than only the PCollection that
 crosses the language boundary?

 More about my Ptransform:
 MyExternalPTransform -- expands to --
   ParDo() -+-> Reshuffle.viaRandomKey() -> ParDo() -> WindowInto(FixWindow) -> ParDo() -> output void
            |
            +-> ParDo() -> output PCollection to Python SDK

 On Tue, Aug 25, 2020 at 6:29 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Also it's strange that Java used (beam:window_fn:serialized_java:v1)
> for the URN here instead of "beam:window_fn:fixed_windows:v1" [1]
> which is what is being registered by Python [2]. This seems to be the
> immediate issue. Tracking bug for supporting custom windows is
> https://issues.apache.org/jira/browse/BEAM-10507.
>
> [1]
> https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/standard_window_fns.proto#L55
> [2]
> https://github.com/apache/beam/blob/bd4df94ae10a7e7b0763c1917746d2faf5aeed6c/sdks/python/apache_beam/transforms/window.py#L449
>
> On Tue, Aug 25, 2020 at 6:07 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> Pipelines that use external WindowingStrategies might be failing
>> during proto -> object -> proto conversion we do today. This limitation
>> will go away once Dataflow directly starts reading Beam protos. We are
>> working on this now.
>>
>> Thanks,
>> Cham
>>
>> On Tue, Aug 25, 2020 at 5:38 PM Boyuan Zhang 
>> wrote:
>>
>>> Thanks, Robert! I want to add more details on my External PTransform:
>>>
>>> MyExternalPTransform -- expands to --
>>>   ParDo() -+-> WindowInto(FixWindow) -> ParDo() -> output void
>>>            |
>>>            +-> ParDo() -> output PCollection to Python SDK
>>> The full stacktrace:
>>>
>>> INFO:root:Using Java SDK harness container image 
>>> dataflow-dev.gcr.io/boyuanz/java:latest
>>> Starting expansion service at localhost:53569
>>> Aug 13, 2020 7:42:11 PM 
>>> org.apache.beam.sdk.expansion.service.ExpansionService 
>>> loadRegisteredTransforms
>>> INFO: Registering external transforms: 
>>> [beam:external:java:kafka:read:v1, beam:external:java:kafka:write:v1, 
>>> beam:external:java:jdbc:read_rows:v1, beam:external:java:jdbc:write:v1, 
>>> beam:external:java:generate_sequence:v1]
>>> beam:external:java:kafka:read:v1: 
>>> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@4ac68d3e
>>> beam:external:java:kafka:write:v1: 
>>> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@277c0f21
>>> beam:external:java:jdbc:read_rows:v1: 
>>> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@6073f712
>>> beam:external:java:jdbc:write:v1: 
>>> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@43556938
>>> beam:external:java:generate_sequence:v1: 
>>> org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$$Lambda$8/0x000800b2a440@3d04a311
>>> WARNING:apache_beam.options.pipeline_options_validator:Option --zone is 
>>> deprecated. Please use 

Beam Dependency Check Report (2020-11-30)

2020-11-30 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:

Dependency Name | Current Version | Latest Version | Current Version Release Date | Latest Release Date | JIRA Issue
dill | 0.3.1.1 | 0.3.3 | 2019-10-07 | 2020-11-02 | BEAM-11167
google-cloud-bigquery | 1.28.0 | 2.4.0 | 2020-10-05 | 2020-11-23 | BEAM-5537
google-cloud-build | 2.0.0 | 3.0.0 | 2020-11-09 | 2020-11-09 | BEAM-11204
google-cloud-datastore | 1.15.3 | 2.0.1 | 2020-11-16 | 2020-11-23 | BEAM-8443
google-cloud-dlp | 1.0.0 | 2.0.0 | 2020-06-29 | 2020-10-05 | BEAM-10344
google-cloud-language | 1.3.0 | 2.0.0 | 2020-10-26 | 2020-10-26 | BEAM-8
google-cloud-pubsub | 1.7.0 | 2.1.0 | 2020-07-20 | 2020-10-05 | BEAM-5539
google-cloud-spanner | 1.19.1 | 2.0.0 | 2020-11-16 | 2020-11-16 | BEAM-10345
google-cloud-videointelligence | 1.16.1 | 2.0.0 | 2020-11-23 | 2020-11-23 | BEAM-11319
google-cloud-vision | 1.0.0 | 2.0.0 | 2020-03-24 | 2020-10-05 | BEAM-9581
grpcio-tools | 1.30.0 | 1.33.2 | 2020-06-29 | 2020-11-02 | BEAM-9582
mock | 2.0.0 | 4.0.2 | 2019-05-20 | 2020-10-05 | BEAM-7369
mypy-protobuf | 1.18 | 1.23 | 2020-03-24 | 2020-06-29 | BEAM-10346
nbconvert | 5.6.1 | 6.0.7 | 2020-10-05 | 2020-10-05 | BEAM-11007
Pillow | 7.2.0 | 8.0.1 | 2020-10-19 | 2020-10-26 | BEAM-11071
PyHamcrest | 1.10.1 | 2.0.2 | 2020-01-20 | 2020-07-08 | BEAM-9155
pytest | 4.6.11 | 6.1.2 | 2020-07-08 | 2020-11-02 | BEAM-8606
pytest-xdist | 1.34.0 | 2.1.0 | 2020-08-17 | 2020-08-28 | BEAM-10713
tenacity | 5.1.5 | 6.2.0 | 2019-11-11 | 2020-06-29 | BEAM-8607
High Priority Dependency Updates Of Beam Java SDK:

Dependency Name | Current Version | Latest Version | Current Version Release Date | Latest Release Date | JIRA Issue
com.datastax.cassandra:cassandra-driver-core | 3.10.2 | 4.0.0 | 2020-08-26 | 2019-03-18 | BEAM-8674
com.esotericsoftware:kryo | 4.0.2 | 5.0.1 | 2018-03-20 | 2020-11-22 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.33.0 | 0.36.0 | 2020-09-14 | 2020-11-09 | BEAM-6645
com.github.jk1.dependency-license-report:com.github.jk1.dependency-license-report.gradle.plugin | 1.13 | 1.16 | 2020-06-29 | 2020-10-26 | BEAM-11120
com.google.api.grpc:grpc-google-common-protos | 1.18.1 | 2.0.1 | 2020-08-11 | 2020-11-02 | BEAM-8633
com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1 | 2.0.2 | 3.0.4 | 2020-10-02 | 2020-11-17 | BEAM-8682
com.google.api.grpc:proto-google-common-protos | 1.18.1 | 2.0.1 | 2020-08-11 | 2020-11-02 | BEAM-6899
com.google.apis:google-api-services-bigquery | v2-rev20200916-1.30.10 | v2-rev20201030-1.30.10 | 2020-09-30 | 2020-11-06 | BEAM-8684
com.google.apis:google-api-services-clouddebugger | v2-rev20200501-1.30.10 | v2-rev20200807-1.30.10 | 2020-07-14 | 2020-08-17 | BEAM-8750
com.google.apis:google-api-services-cloudresourcemanager | v1-rev20200720-1.30.10 | v2-rev2020-1.30.10 | 2020-07-25 | 2020-11-12 | BEAM-8751
com.google.apis:google-api-services-dataflow | v1b3-rev20200713-1.30.10 | v1beta3-rev12-1.20.0 | 2020-07-25 | 2015-04-29 | BEAM-8752
com.google.apis:google-api-services-healthcare | v1beta1-rev20200713-1.30.10 | v1-rev20201110-1.30.10 | 2020-07-24 | 2020-11-18 | BEAM-10349
com.google.apis:google-api-services-pubsub | v1-rev20200713-1.30.10 | v1-rev20201110-1.30.10 | 2020-07-25 | 2020-11-19 | BEAM-8753
com.google.apis:google-api-services-storage | v1-rev20200927-1.30.10 | v1-rev20201112-1.30.10 | 2020-10-03 | 2020-11-19 | BEAM-8754
com.google.auto.service:auto-service | 1.0-rc6 | 1.0-rc7 | 2019-07-16 | 2020-05-13 | BEAM-5541
com.google.auto.service:auto-service-annotations | 1.0-rc6 | 1.0-rc7 | 2019-07-16 | 2020-05-13 | BEAM-10350
com.google.cloud:google-cloud-bigquery | 1.122.2 | 1.125.0 | 2020-10-09 | 2020-11-19 | BEAM-8687

Re: Running direct runner test on windows

2020-11-30 Thread Tobiasz Kędzierski
Hi Haizhou,

I suppose these quotation marks are handled here:
https://github.com/apache/beam/blob/8cdb8075b7d28e84591b13fb01d0144d941a5ef2/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1518

I was struggling with them while working on the GA workflows, which run on
Linux/macOS/Windows:
https://github.com/apache/beam/blob/master/.github/workflows/java_tests.yml

BR
Tobiasz

On Sun, Nov 29, 2020 at 8:45 AM Haizhou Zhao 
wrote:

> Hello Folks,
>
> I'm Haizhou, and I'm new to the Beam code base. When I was running
> 'needsRunnerTests' with 'DirectRunner' on Windows 10, I found that the
> pipeline option[1] could not be parsed on Windows 10 but worked
> perfectly on my Ubuntu desktop.
>
> It seems that after Groovy sets the system property and Java reads it back
> at pipeline construction time, Windows drops the quotes, so that the string
>
> ["--runner=DirectRunner", "--runnerDeterminedSharding=false"]
>
> becomes
>
> [--runner=DirectRunner, --runnerDeterminedSharding=false]
>
> which fails the object mapper parsing. What solved the issue for me on
> Windows 10 was adding single quotes around each element, like
>
> ['"--runner=DirectRunner"', '"--runnerDeterminedSharding=false"']
>
> But the above modification does not work on Ubuntu/Linux. I'm not an expert
> on OS-level quoting, so I was wondering if anyone has run into the same
> issue before, and whether there is a good way to support this test on both
> operating systems.
>
> Thank you,
> Haizhou Zhao
>
> [1]
> https://github.com/apache/beam/blob/master/runners/direct-java/build.gradle#L104
>
>
>