Unset / delete Timers

2020-01-07 Thread Reza Rokni
Hi,

Was exploring the ability to add unset / reset option for Timer, would this
be an expensive operation for runners to support?

More complex State and Timer use cases can require this operation and while
its possible to do today using a separate State Object, its heavy on boiler
plate and cumbersome.

Cheers
Reza







-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.


Re: RabbitMQ and CheckpointMark feasibility

2020-01-07 Thread Kenneth Knowles
I just took a look at the PR - it is, indeed, huge. But it is probably not
too hard to review as it is mostly fresh code. It is true that there hasn't
been a ton of work on RabbitMQ so maybe the reviewer isn't obvious. There's
3 committers on this thread who seem to have the expertise and interest...

Kenn

On Mon, Jan 6, 2020 at 2:25 PM Daniel Robert  wrote:

> Alright, a bit late but this took me a while.
>
> Thanks for all the input so far. I have rewritten much of the RabbitMq IO
> connector and have it ready to go in a draft pr:
> https://github.com/apache/beam/pull/10509
>
> This should incorporate a lot of what's been discussed here, in terms of
> watermarking, serialization, error handling, etc. It also clarifies/cleans
> up a lot of very confusing documentation/api settings pertaining to using
> 'queues vs exchanges' and adds clarifying documentation on various valid
> AMQP paradigms.
>
> Watermarking/timestamp management is mostly stolen from KafkaIO and
> modified as appropriate.
>
> This also does a lot to improve resource management in terms of Connection
> and Channel usage, largely modeled after JdbcIO's ConnectionHandlerProvider
> concept.
>
> I'm not entirely sure how best to proceed from here, hence the email. It's
> a huge PR, but it has no specific backing ticket (it should), and
> historically there haven't been many eyes on RabbitMq PRs.
>
> Thanks,
> -Danny
> On 11/14/19 4:13 PM, Jan Lukavský wrote:
>
> On 11/14/19 9:50 PM, Daniel Robert wrote:
>
> Alright, thanks everybody. I'm really appreciative of the conversation
> here. I think I see where my disconnect is and how this might all work
> together for me. There are some bugs in the current rabbit implementation
> that I think have confused my understanding of the intended semantics. I'm
> coming around to seeing how such a system with rabbit's restrictions can
> work properly in Beam (I'd totally forgotten about 'dedupe' support in
> Beam) but I want to clarify some implementation questions after pulling
> everyone's notes together.
>
> RabbitMQ reader should not bother accepting an existing CheckpointMark in
> its constructor (in 'ack-based' systems this is unnecessary per Eugene's
> original reply). It should construct its own CheckpointMark at construction
> time and use it throughout its lifecycle.
>
> At some point later, the CheckpointMark will be 'finalized'. If this
> CheckpointMark has been Serialized (via Coder or otherwise) or its
> underlying connection has been severed, this step will fail. This would
> mean at some point the messages are redelivered to Beam on some other
> Reader, so no data loss. If it has not been serialized, the acks will take
> place just fine, even if much later.
>
> If the system is using processing-time as event-time, however, the
> redelivery of these messages would effectively change the ordering and
> potentially the window they arrived in. I *believe* that Beam deduping
> seems to be managed per-window so if 'finalizeCheckpoint' is attempted (and
> fails) would these messages appear in a new window?
>
> This is very much likely to happen with any source, if it would assign
> something like *now* to event time. That is ill defined and if the source
> cannot provide some retry-persistent estimate of real event-time, than I'd
> suggest to force user to specify an UDF to extract event time from the
> payload. Everything else would probably break (at least if any
> timestamp-related windowing would be used in the pipeline).
>
> Perhaps my question are now:
> - how should a CheckpointMark should communicate failure to the Beam
>
> An exception thrown should fail the checkpoint and therefore retry
> everything from the last checkpoint.
>
> - how does Beam handle a CheckpointMark.finalizeCheckpoint failure, if the
> API dictates such a thing?
>
> See above.
>
> - is there a provision that would need to be made for processing-time
> sources that can fail a checkpointmark.finalizeCheckpoint call? (I'm
> nervous redelivered messages would appear in another window)
>
> You are nervous for a reason. :) I strongly believe processing time source
> should be considered anti-pattern, at least in situations where there is
> any time manipulation downstream (time-windows, stateful processing, ...).
>
> - What is the relationship lifecycle-wise between a CheckpointMark and a
> Reader? My understanding is a CheckpointMark may outlive a Reader, is that
> correct?
>
> Definitely. But the same instance bound to the lifecycle of the reader
> would be used to finalizeCheckpoint (if that ever happens).
>
> Thanks for bearing with me everyone. It feels a bit unfortunate my first
> foray into beam is reliant on this rabbit connector but I'm learning a lot
> and I'm very grateful for the help. PRs pending once I get this all
> straightened out in my head.
>
> -Danny
> On 11/14/19 2:35 PM, Eugene Kirpichov wrote:
>
> Hi Daniel,
>
>
> On Wed, Nov 13, 2019 at 8:26 PM Daniel Robert 
> wrote:
>
>> I believe I've nailed 

[DISCUSS] Python static type checkers

2020-01-07 Thread Udi Meiri
Hi,
We recently added mypy to the Jenkins Lint job for PRs (currently ignores
errors). Mypy is a static type checker.

There's a JIRA for adding another static type checker named pytype
https://issues.apache.org/jira/browse/BEAM-9064

I wanted to ask the community their thoughts on this. (see JIRA issue
comments as well)

- Should PRs have to pass more than 1 static type checker? (in pre-commit
tests)
- If not, should the remaining type checkers be run as a post-commit tests?
- How much effort should be put into supporting more than 1 type checker?
(i.e. making sure that they all pass)


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Mark Liu
Thank you Kenn. I'm also asking white-list for Beam jobs in INFRA-19670
 if possible.

Here are my experiences of job trigger in
https://github.com/apache/beam/pull/10051 (created by non-committer):
- committer's trigger ("Run XVR_Flink PostCommit") works, and job was
waiting in the queue immediately (job link

).
- there is no triggered job showing in the bottom of the PR page. Guess
something could go wrong in the Github plugin.
- sometimes manual trigger may miss jobs which could be hard to discover.

Mark

On Tue, Jan 7, 2020 at 3:08 PM Robert Bradshaw  wrote:

> I agree. If this can't be done, perhaps we could have a basic suite of
> smoke tests (at least) run on TravisCI.
>
> On Tue, Jan 7, 2020 at 2:53 PM Kenneth Knowles  wrote:
>
>> This new policy seems pretty unwelcoming. I would like to work with INFRA
>> to see if we can set up a sufficient sandbox that the security concern goes
>> away. Clearly this has been solved many times.
>>
>> Kenn
>>
>> On Tue, Jan 7, 2020 at 2:45 PM Valentyn Tymofieiev 
>> wrote:
>>
>>> jiangkai@ - done.
>>>
>>> I've been reviewing a few PRs, e.g. [1], from  contributors who are not
>>> committers. My experience is as follows:
>>> - tests are not triggered by default, but  trigger as soon as a
>>> committer leaves any comment on the PR . This happens only once, unless new
>>> commits are added to the PR.
>>> - sometimes committer's comment trigger only a subset of test suites,
>>> which creates an illusion that all test suites are passing while some were
>>> not triggered.
>>> - Some tests suites never trigger within a reasonable timeframe after
>>> "run suite X" command. For example, Run PythonLint precommit didn't trigger
>>> the suite after two requests, but did trigger it an hour later after yet
>>> another "Run PythonLint precommit".
>>>
>>> cc: @Mark Liu  @Alan Myrvold 
>>>
>>> [1] https://github.com/apache/beam/pull/10504
>>>
>>>
>>> On Tue, Jan 7, 2020 at 1:18 PM Kai Jiang  wrote:
>>>
 Hi Beam Committer,

 I appreciate if you could trigger precommit checks for
 https://github.com/apache/beam/pull/9903.

 Run Flink ValidatesRunner
 Run Flink Runner Nexmark Tests
 Run SQL Postcommit

 Best,
 Kai

 On Tue, Jan 7, 2020 at 8:17 AM Ismaël Mejía  wrote:

> Done
>
> On Tue, Jan 7, 2020 at 5:09 PM Tomo Suzuki  wrote:
>
>> Hi Ismaël and Beam committer,
>>
>> I appreciate the help! Would you trigger precommit checks for
>> https://github.com/apache/beam/pull/10508. I also want the following
>> checks.
>>
>> Run Java PostCommit
>> Run Java HadoopFormatIO Performance Test
>> Run BigQueryIO Streaming Performance Test Java
>> Run Dataflow ValidatesRunner
>> Run Spark ValidatesRunner
>> Run SQL Postcommit
>>
>> Regards,
>> Tomo
>>
>> On Tue, Jan 7, 2020 at 9:27 AM Ismaël Mejía 
>> wrote:
>>
>>> Until we address this we can maybe use this thread/list to send the
>>> link for the PR(s) you want to be triggered. and the command if a 
>>> special
>>> one is needed, so committers can help to manually do it.
>>>
>>> On Tue, Jan 7, 2020 at 3:00 PM Ismaël Mejía 
>>> wrote:
>>>
 Thanks for bringing this info Michał. I think the security goal of
 INFRA makes sense however it adds for committers the additional burden 
 of
 having to manually trigger the CI. I hoped that the PR will run the 
 basic
 precommit tests but it does not.
 We have to (1) discuss a possible workaround or (2) find a way to
 be notified of PRs that have not run its tests.
 Any ideas? This looks like a quite critical issue to address.


 On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia <
 michal.wale...@polidea.com> wrote:

> According to Daniel Gruno's comment in
> https://issues.apache.org/jira/browse/INFRA-19670 , there was a
> change in Jenkins job execution policy - non-committers can't run 
> Jenkins
> workflows now, as it would be a security flaw in terms of arbitrary 
> code
> execution.
> Does anyone know about this? When exactly was this changed for
> Beam? What are our options for testing our pull requests?
>
>
> On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang 
> wrote:
>
>> According to this comment
>> ,
>> it might be a Jenkins bug.
>> Meanwhile, I opened an infra ticket at
>> https://issues.apache.org/jira/browse/INFRA-19670 for Beam.
>>
>> On Mon, Jan 6, 2020 

Re: Custom window invariants and

2020-01-07 Thread Aaron Dixon
What I'm attempting is a variation on Session windows in which there may
exist a "terminal" element in the stream that immediately stops the session
(or perhaps after some configured delay.)

My implementation behaves just like Sessions until any such "terminal"
element is encountered in which case I mark the window as "terminal" and
all windows "merge down" such that any terminal windows get to dictate the
Interval.end()/Window.maxTimestamp().

So, trivial example, if I have windows W1 [0, 100) and W2 [50, 75, terminal
= true] then the merged result will be W3 [0, 75).

I've been successful doing this so far but I've been inferring some
invariants about windows that I'm not sure are official or documented
anywhere.

The invariants that I've inferred go like this:

(I) Definition. An element is "in" window W if it originated in W or in a
window that was merged into W (, recursively.)

(II) Invariant. Any element, e, in window W MUST have e.timestamp <=
W.maxTimestamp().

So far, I think this is obvious and true stuff (I hope). (It would actually
be better or great if there was a way for II to not have to hold, but that
is a whole other separate discussion I think.)

The main invariant I'm trying to formalize is one that allows me to "merge
down" -- i.e., to merge in such a way that the merged window's
(mergedResult's) maxTimestamp *is less than* one of the source's
(toBeMerged's) windows' maxTimestamp.

The (undocumented?) invariant I've been working from goes something like
this:

(III) Corollary. Windows W1 and W2 can merge such that either
maxTimestamp() is regressed (moved backward in time aka "merge down") in
the merged window -- however they cannot merge such that (II) is ever
violated.

Is this correct?

(If you can this can be confirmed, I'll go back and ensure I'm not
violating the merge() precondition and these invariants and post some code
if needed..) Thank you for assistance heere!


On Tue, Jan 7, 2020 at 4:21 PM Reuven Lax  wrote:

> Have you used Dataflow's update feature on this pipeline? Also, do
> you have the code for your WindowFn?
>
> On Tue, Jan 7, 2020 at 12:05 PM Aaron Dixon  wrote:
>
>> Dataflow. (See stacktrace)
>>
>> On Tue, Jan 7, 2020 at 1:50 PM Reuven Lax  wrote:
>>
>>> Which runner are you using?
>>>
>>> On Tue, Jan 7, 2020, 11:17 AM Aaron Dixon  wrote:
>>>
 I get an IllegalStateException " is in more than one state
 address window set" (stacktrace below).

 What does this mean? What invariant of custom window implementation
 & merging am I violating?

 Thank you for any advise.

 ```
 java.lang.IllegalStateException:
 {[2019-12-05T01:36:48.870Z..2019-12-05T01:36:48.871Z),terminal} is in more
 than one state address window set
 at
 org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState
 (Preconditions.java:588)
 at
 org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.checkInvariants
 (MergingActiveWindowSet.java:334)
 at
 org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.persist
 (MergingActiveWindowSet.java:88)
 at
 org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.persist
 (ReduceFnRunner.java:380)
 at
 org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement
 (StreamingGroupAlsoByWindowViaWindowSetFn.java:96)
 ...
 ```

>>>


Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Robert Bradshaw
I agree. If this can't be done, perhaps we could have a basic suite of
smoke tests (at least) run on TravisCI.

On Tue, Jan 7, 2020 at 2:53 PM Kenneth Knowles  wrote:

> This new policy seems pretty unwelcoming. I would like to work with INFRA
> to see if we can set up a sufficient sandbox that the security concern goes
> away. Clearly this has been solved many times.
>
> Kenn
>
> On Tue, Jan 7, 2020 at 2:45 PM Valentyn Tymofieiev 
> wrote:
>
>> jiangkai@ - done.
>>
>> I've been reviewing a few PRs, e.g. [1], from  contributors who are not
>> committers. My experience is as follows:
>> - tests are not triggered by default, but  trigger as soon as a committer
>> leaves any comment on the PR . This happens only once, unless new commits
>> are added to the PR.
>> - sometimes committer's comment trigger only a subset of test suites,
>> which creates an illusion that all test suites are passing while some were
>> not triggered.
>> - Some tests suites never trigger within a reasonable timeframe after
>> "run suite X" command. For example, Run PythonLint precommit didn't trigger
>> the suite after two requests, but did trigger it an hour later after yet
>> another "Run PythonLint precommit".
>>
>> cc: @Mark Liu  @Alan Myrvold 
>>
>> [1] https://github.com/apache/beam/pull/10504
>>
>>
>> On Tue, Jan 7, 2020 at 1:18 PM Kai Jiang  wrote:
>>
>>> Hi Beam Committer,
>>>
>>> I appreciate if you could trigger precommit checks for
>>> https://github.com/apache/beam/pull/9903.
>>>
>>> Run Flink ValidatesRunner
>>> Run Flink Runner Nexmark Tests
>>> Run SQL Postcommit
>>>
>>> Best,
>>> Kai
>>>
>>> On Tue, Jan 7, 2020 at 8:17 AM Ismaël Mejía  wrote:
>>>
 Done

 On Tue, Jan 7, 2020 at 5:09 PM Tomo Suzuki  wrote:

> Hi Ismaël and Beam committer,
>
> I appreciate the help! Would you trigger precommit checks for
> https://github.com/apache/beam/pull/10508. I also want the following
> checks.
>
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
> Regards,
> Tomo
>
> On Tue, Jan 7, 2020 at 9:27 AM Ismaël Mejía  wrote:
>
>> Until we address this we can maybe use this thread/list to send the
>> link for the PR(s) you want to be triggered. and the command if a special
>> one is needed, so committers can help to manually do it.
>>
>> On Tue, Jan 7, 2020 at 3:00 PM Ismaël Mejía 
>> wrote:
>>
>>> Thanks for bringing this info Michał. I think the security goal of
>>> INFRA makes sense however it adds for committers the additional burden 
>>> of
>>> having to manually trigger the CI. I hoped that the PR will run the 
>>> basic
>>> precommit tests but it does not.
>>> We have to (1) discuss a possible workaround or (2) find a way to be
>>> notified of PRs that have not run its tests.
>>> Any ideas? This looks like a quite critical issue to address.
>>>
>>>
>>> On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia <
>>> michal.wale...@polidea.com> wrote:
>>>
 According to Daniel Gruno's comment in
 https://issues.apache.org/jira/browse/INFRA-19670 , there was a
 change in Jenkins job execution policy - non-committers can't run 
 Jenkins
 workflows now, as it would be a security flaw in terms of arbitrary 
 code
 execution.
 Does anyone know about this? When exactly was this changed for
 Beam? What are our options for testing our pull requests?


 On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang 
 wrote:

> According to this comment
> ,
> it might be a Jenkins bug.
> Meanwhile, I opened an infra ticket at
> https://issues.apache.org/jira/browse/INFRA-19670 for Beam.
>
> On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud <
> apill...@google.com> wrote:
>
>> "Run precommits" seems to work sometimes:
>> https://github.com/apache/beam/pull/10455
>>
>> Has anyone opened a ticket with apache infra?
>>
>> On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
>> rehman.murad...@venturedive.com> wrote:
>>
>>> +1:  https://github.com/apache/beam/pull/10506
>>>
>>> any solution yet?
>>>
>>>
>>>
>>> *Thanks & Regards*
>>>
>>>
>>> 
>>>
>>> *Rehman Murad Ali*
>>> Software Engineer
>>> Mobile: +92 3452076766 <+92%20345%202076766>
>>> Skype: rehman.muradali
>>>
>>>
>>> On Sat, Jan 4, 2020 at 6:10 AM 

Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Kenneth Knowles
This new policy seems pretty unwelcoming. I would like to work with INFRA
to see if we can set up a sufficient sandbox that the security concern goes
away. Clearly this has been solved many times.

Kenn

On Tue, Jan 7, 2020 at 2:45 PM Valentyn Tymofieiev 
wrote:

> jiangkai@ - done.
>
> I've been reviewing a few PRs, e.g. [1], from  contributors who are not
> committers. My experience is as follows:
> - tests are not triggered by default, but  trigger as soon as a committer
> leaves any comment on the PR . This happens only once, unless new commits
> are added to the PR.
> - sometimes committer's comment trigger only a subset of test suites,
> which creates an illusion that all test suites are passing while some were
> not triggered.
> - Some tests suites never trigger within a reasonable timeframe after "run
> suite X" command. For example, Run PythonLint precommit didn't trigger the
> suite after two requests, but did trigger it an hour later after yet
> another "Run PythonLint precommit".
>
> cc: @Mark Liu  @Alan Myrvold 
>
> [1] https://github.com/apache/beam/pull/10504
>
>
> On Tue, Jan 7, 2020 at 1:18 PM Kai Jiang  wrote:
>
>> Hi Beam Committer,
>>
>> I appreciate if you could trigger precommit checks for
>> https://github.com/apache/beam/pull/9903.
>>
>> Run Flink ValidatesRunner
>> Run Flink Runner Nexmark Tests
>> Run SQL Postcommit
>>
>> Best,
>> Kai
>>
>> On Tue, Jan 7, 2020 at 8:17 AM Ismaël Mejía  wrote:
>>
>>> Done
>>>
>>> On Tue, Jan 7, 2020 at 5:09 PM Tomo Suzuki  wrote:
>>>
 Hi Ismaël and Beam committer,

 I appreciate the help! Would you trigger precommit checks for
 https://github.com/apache/beam/pull/10508. I also want the following
 checks.

 Run Java PostCommit
 Run Java HadoopFormatIO Performance Test
 Run BigQueryIO Streaming Performance Test Java
 Run Dataflow ValidatesRunner
 Run Spark ValidatesRunner
 Run SQL Postcommit

 Regards,
 Tomo

 On Tue, Jan 7, 2020 at 9:27 AM Ismaël Mejía  wrote:

> Until we address this we can maybe use this thread/list to send the
> link for the PR(s) you want to be triggered. and the command if a special
> one is needed, so committers can help to manually do it.
>
> On Tue, Jan 7, 2020 at 3:00 PM Ismaël Mejía  wrote:
>
>> Thanks for bringing this info Michał. I think the security goal of
>> INFRA makes sense however it adds for committers the additional burden of
>> having to manually trigger the CI. I hoped that the PR will run the basic
>> precommit tests but it does not.
>> We have to (1) discuss a possible workaround or (2) find a way to be
>> notified of PRs that have not run its tests.
>> Any ideas? This looks like a quite critical issue to address.
>>
>>
>> On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia <
>> michal.wale...@polidea.com> wrote:
>>
>>> According to Daniel Gruno's comment in
>>> https://issues.apache.org/jira/browse/INFRA-19670 , there was a
>>> change in Jenkins job execution policy - non-committers can't run 
>>> Jenkins
>>> workflows now, as it would be a security flaw in terms of arbitrary code
>>> execution.
>>> Does anyone know about this? When exactly was this changed for Beam?
>>> What are our options for testing our pull requests?
>>>
>>>
>>> On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang  wrote:
>>>
 According to this comment
 ,
 it might be a Jenkins bug.
 Meanwhile, I opened an infra ticket at
 https://issues.apache.org/jira/browse/INFRA-19670 for Beam.

 On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud 
 wrote:

> "Run precommits" seems to work sometimes:
> https://github.com/apache/beam/pull/10455
>
> Has anyone opened a ticket with apache infra?
>
> On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
> rehman.murad...@venturedive.com> wrote:
>
>> +1:  https://github.com/apache/beam/pull/10506
>>
>> any solution yet?
>>
>>
>>
>> *Thanks & Regards*
>>
>>
>> 
>>
>> *Rehman Murad Ali*
>> Software Engineer
>> Mobile: +92 3452076766 <+92%20345%202076766>
>> Skype: rehman.muradali
>>
>>
>> On Sat, Jan 4, 2020 at 6:10 AM Heejong Lee 
>> wrote:
>>
>>> +1: https://github.com/apache/beam/pull/10051
>>>
>>> force-pushing again. retest this please. nothing works :(
>>>
>>> On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia <
>>> michal.wale...@polidea.com> wrote:
>>>
 Hi,
 I'm also affected by this - I 

Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Valentyn Tymofieiev
jiangkai@ - done.

I've been reviewing a few PRs, e.g. [1], from  contributors who are not
committers. My experience is as follows:
- tests are not triggered by default, but  trigger as soon as a committer
leaves any comment on the PR . This happens only once, unless new commits
are added to the PR.
- sometimes committer's comment trigger only a subset of test suites, which
creates an illusion that all test suites are passing while some were not
triggered.
- Some tests suites never trigger within a reasonable timeframe after "run
suite X" command. For example, Run PythonLint precommit didn't trigger the
suite after two requests, but did trigger it an hour later after yet
another "Run PythonLint precommit".

cc: @Mark Liu  @Alan Myrvold 

[1] https://github.com/apache/beam/pull/10504


On Tue, Jan 7, 2020 at 1:18 PM Kai Jiang  wrote:

> Hi Beam Committer,
>
> I appreciate if you could trigger precommit checks for
> https://github.com/apache/beam/pull/9903.
>
> Run Flink ValidatesRunner
> Run Flink Runner Nexmark Tests
> Run SQL Postcommit
>
> Best,
> Kai
>
> On Tue, Jan 7, 2020 at 8:17 AM Ismaël Mejía  wrote:
>
>> Done
>>
>> On Tue, Jan 7, 2020 at 5:09 PM Tomo Suzuki  wrote:
>>
>>> Hi Ismaël and Beam committer,
>>>
>>> I appreciate the help! Would you trigger precommit checks for
>>> https://github.com/apache/beam/pull/10508. I also want the following
>>> checks.
>>>
>>> Run Java PostCommit
>>> Run Java HadoopFormatIO Performance Test
>>> Run BigQueryIO Streaming Performance Test Java
>>> Run Dataflow ValidatesRunner
>>> Run Spark ValidatesRunner
>>> Run SQL Postcommit
>>>
>>> Regards,
>>> Tomo
>>>
>>> On Tue, Jan 7, 2020 at 9:27 AM Ismaël Mejía  wrote:
>>>
 Until we address this we can maybe use this thread/list to send the
 link for the PR(s) you want to be triggered. and the command if a special
 one is needed, so committers can help to manually do it.

 On Tue, Jan 7, 2020 at 3:00 PM Ismaël Mejía  wrote:

> Thanks for bringing this info Michał. I think the security goal of
> INFRA makes sense however it adds for committers the additional burden of
> having to manually trigger the CI. I hoped that the PR will run the basic
> precommit tests but it does not.
> We have to (1) discuss a possible workaround or (2) find a way to be
> notified of PRs that have not run its tests.
> Any ideas? This looks like a quite critical issue to address.
>
>
> On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia <
> michal.wale...@polidea.com> wrote:
>
>> According to Daniel Gruno's comment in
>> https://issues.apache.org/jira/browse/INFRA-19670 , there was a
>> change in Jenkins job execution policy - non-committers can't run Jenkins
>> workflows now, as it would be a security flaw in terms of arbitrary code
>> execution.
>> Does anyone know about this? When exactly was this changed for Beam?
>> What are our options for testing our pull requests?
>>
>>
>> On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang  wrote:
>>
>>> According to this comment
>>> ,
>>> it might be a Jenkins bug.
>>> Meanwhile, I opened an infra ticket at
>>> https://issues.apache.org/jira/browse/INFRA-19670 for Beam.
>>>
>>> On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud 
>>> wrote:
>>>
 "Run precommits" seems to work sometimes:
 https://github.com/apache/beam/pull/10455

 Has anyone opened a ticket with apache infra?

 On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
 rehman.murad...@venturedive.com> wrote:

> +1:  https://github.com/apache/beam/pull/10506
>
> any solution yet?
>
>
>
> *Thanks & Regards*
>
>
> 
>
> *Rehman Murad Ali*
> Software Engineer
> Mobile: +92 3452076766 <+92%20345%202076766>
> Skype: rehman.muradali
>
>
> On Sat, Jan 4, 2020 at 6:10 AM Heejong Lee 
> wrote:
>
>> +1: https://github.com/apache/beam/pull/10051
>>
>> force-pushing again. retest this please. nothing works :(
>>
>> On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia <
>> michal.wale...@polidea.com> wrote:
>>
>>> Hi,
>>> I'm also affected by this - I touched my PRs opened before the
>>> holiday break and no jobs were triggered. Do we know what breaks
>>> Jenkins/fixes it when stuff like this happens?
>>> Happy new year,
>>> Michal
>>>
>>> On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang 
>>> wrote:
>>>
 Thanks Alan for checking this out! I closed PR 9903 and reopen
 it in pull/10493 

Re: Custom window invariants and

2020-01-07 Thread Reuven Lax
Have you used Dataflow's update feature on this pipeline? Also, do you have
the code for your WindowFn?

On Tue, Jan 7, 2020 at 12:05 PM Aaron Dixon  wrote:

> Dataflow. (See stacktrace)
>
> On Tue, Jan 7, 2020 at 1:50 PM Reuven Lax  wrote:
>
>> Which runner are you using?
>>
>> On Tue, Jan 7, 2020, 11:17 AM Aaron Dixon  wrote:
>>
>>> I get an IllegalStateException " is in more than one state
>>> address window set" (stacktrace below).
>>>
>>> What does this mean? What invariant of custom window implementation
>>> & merging am I violating?
>>>
>>> Thank you for any advise.
>>>
>>> ```
>>> java.lang.IllegalStateException:
>>> {[2019-12-05T01:36:48.870Z..2019-12-05T01:36:48.871Z),terminal} is in more
>>> than one state address window set
>>> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState
>>> (Preconditions.java:588)
>>> at
>>> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.checkInvariants
>>> (MergingActiveWindowSet.java:334)
>>> at
>>> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.persist
>>> (MergingActiveWindowSet.java:88)
>>> at
>>> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.persist
>>> (ReduceFnRunner.java:380)
>>> at
>>> org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement
>>> (StreamingGroupAlsoByWindowViaWindowSetFn.java:96)
>>> ...
>>> ```
>>>
>>


Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Kai Jiang
Hi Beam Committer,

I appreciate if you could trigger precommit checks for
https://github.com/apache/beam/pull/9903.

Run Flink ValidatesRunner
Run Flink Runner Nexmark Tests
Run SQL Postcommit

Best,
Kai

On Tue, Jan 7, 2020 at 8:17 AM Ismaël Mejía  wrote:

> Done
>
> On Tue, Jan 7, 2020 at 5:09 PM Tomo Suzuki  wrote:
>
>> Hi Ismaël and Beam committer,
>>
>> I appreciate the help! Would you trigger precommit checks for
>> https://github.com/apache/beam/pull/10508. I also want the following
>> checks.
>>
>> Run Java PostCommit
>> Run Java HadoopFormatIO Performance Test
>> Run BigQueryIO Streaming Performance Test Java
>> Run Dataflow ValidatesRunner
>> Run Spark ValidatesRunner
>> Run SQL Postcommit
>>
>> Regards,
>> Tomo
>>
>> On Tue, Jan 7, 2020 at 9:27 AM Ismaël Mejía  wrote:
>>
>>> Until we address this we can maybe use this thread/list to send the link
>>> for the PR(s) you want to be triggered. and the command if a special one is
>>> needed, so committers can help to manually do it.
>>>
>>> On Tue, Jan 7, 2020 at 3:00 PM Ismaël Mejía  wrote:
>>>
 Thanks for bringing this info Michał. I think the security goal of
 INFRA makes sense however it adds for committers the additional burden of
 having to manually trigger the CI. I hoped that the PR will run the basic
 precommit tests but it does not.
 We have to (1) discuss a possible workaround or (2) find a way to be
 notified of PRs that have not run its tests.
 Any ideas? This looks like a quite critical issue to address.


 On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia <
 michal.wale...@polidea.com> wrote:

> According to Daniel Gruno's comment in
> https://issues.apache.org/jira/browse/INFRA-19670 , there was a
> change in Jenkins job execution policy - non-committers can't run Jenkins
> workflows now, as it would be a security flaw in terms of arbitrary code
> execution.
> Does anyone know about this? When exactly was this changed for Beam?
> What are our options for testing our pull requests?
>
>
> On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang  wrote:
>
>> According to this comment
>> ,
>> it might be a Jenkins bug.
>> Meanwhile, I opened an infra ticket at
>> https://issues.apache.org/jira/browse/INFRA-19670 for Beam.
>>
>> On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud 
>> wrote:
>>
>>> "Run precommits" seems to work sometimes:
>>> https://github.com/apache/beam/pull/10455
>>>
>>> Has anyone opened a ticket with apache infra?
>>>
>>> On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
>>> rehman.murad...@venturedive.com> wrote:
>>>
 +1:  https://github.com/apache/beam/pull/10506

 any solution yet?



 *Thanks & Regards*


 

 *Rehman Murad Ali*
 Software Engineer
 Mobile: +92 3452076766 <+92%20345%202076766>
 Skype: rehman.muradali


 On Sat, Jan 4, 2020 at 6:10 AM Heejong Lee 
 wrote:

> +1: https://github.com/apache/beam/pull/10051
>
> force-pushing again. retest this please. nothing works :(
>
> On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia <
> michal.wale...@polidea.com> wrote:
>
>> Hi,
>> I'm also affected by this - I touched my PRs opened before the
>> holiday break and no jobs were triggered. Do we know what breaks
>> Jenkins/fixes it when stuff like this happens?
>> Happy new year,
>> Michal
>>
>> On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang 
>> wrote:
>>
>>> Thanks Alan for checking this out! I closed PR 9903 and reopen
>>> it in pull/10493 .
>>> It seems new PR still did not trigger jenkins jobs.
>>>
>>> On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold 
>>> wrote:
>>>
 Oh, the PR 9903 run is quite old; I don't see a recent one yet.

 On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold <
 amyrv...@google.com> wrote:

> For PR 10427, I see
> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
> For PR 9903, I see
> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/
>
> Maybe the PR status is not being updated when the jobs run?
>
>
> On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang 
> wrote:
>
>> same for https://github.com/apache/beam/pull/9903 as well
>>
>> On Thu, Jan 2, 2020 at 

Re: Dropping late data in DirectRunner

2020-01-07 Thread Luke Cwik
That is a really good way to describe my mental model as well.

On Tue, Jan 7, 2020 at 12:20 PM Kenneth Knowles  wrote:

>
>
> On Tue, Jan 7, 2020 at 1:39 AM Jan Lukavský  wrote:
>
>> Hi Kenn,
>>
>> I see that my terminology seems not to be 100% aligned with Beam's. I'll
>> work on that. :-)
>>
>> I agree with what you say, and by "late" I mostly meant "droppable"
>> (arriving too late after watermark).
>>
>> I'm definitely not proposing to get back to something like "out of order"
>> == "late" or anything like that. I'm also aware that stateful operation is
>> windowed operation, but the semantics of the windowing is different than of
>> a GBK. The difference is how time moves in GBK and how moves in stateful
>> DoFn. Throwing away some details (early triggers, late data triggers), the
>> main difference is that in GBK case, time hops just between window
>> boundaries, while in stateful DoFn time moves "smoothly" (with each
>> watermark update). Now, this difference brings the question about why the
>> definition of "droppable" data is the same for both types of operations,
>> when there is a difference in how users "perceive" time. As the more
>> generic operation, stateful DoFn might deserve a more general definition of
>> droppable data, which should degrade naturally to the one of GBK in
>> presence of "discrete time hops".
>>
>
> I understand what you mean. On the other hand, I encourage thinking of
> event time spatially, not as time passing. That is a big part of unifying
> batch/streaming real-time/archival processing. The event time window is a
> secondary key to partition the data (merging windows are slightly more
> complex). All event time windows exist simultaneously. So for both stateful
> ParDo and GBK, I find it helpful to consider this perspective where all
> windows are processed simultaneously / in an arbitrary order not assuming
> windows are ordered at all. Then you see that GBK and stateful ParDo do not
> really treat windows / watermark differently: both of them process a stream
> of data for each (key, window) pair until the watermark informs them that
> the stream is expired, then they GC the state associated with that (key,
> window) pair.
>
> Kenn
>
>> This might have some consequences on how the droppable data should be
>> handled in presence of (early) triggers, because triggerring is actually
>> what makes time to "hop", so we might arrive to a conclusion that we might
>> actually drop any data that has timestamp less than "last trigger time +
>> allowed lateness". This looks appealing to me, because IMO it has strong
>> internal logical consistency. Although it is possible that it would drop
>> more data, which is generally undesirable, I admit that.
>>
>> I'm looking for explanation why the current approach was chosen instead
>> of the other.
>>
>> Jan
>> On 1/7/20 12:52 AM, Kenneth Knowles wrote:
>>
>> This thread has a lot in it, so I am just top-posting.
>>
>>  - Stateful DoFn is a windowed operation; state is per-window. When the
>> window expires, any further inputs are dropped.
>>  - "Late" is not synonymous with out-of-order. It doesn't really have an
>> independent meaning.
>> - For a GBK/Combine "late" means "not included prior to the on-time
>> output", and "droppable" means "arriving after window expiry".
>> - For Stateful DoFn there is no real meaning to "late" except if one
>> is talking about "droppable", which still means "arriving after window
>> expiry". A user may have a special timer where they flip a flag and treat
>> elements after the timer differently.
>>
>> I think the definition of when data is droppable is very simple. We
>> explicitly moved to this definition, away from the "out of order == late",
>> because it is more robust and simpler to think about. Users saw lots of
>> confusing behavior when we had "out of order by allowed lateness ==
>> droppable" logic.
>>
>> Kenn
>>
>> On Mon, Jan 6, 2020 at 1:42 AM Jan Lukavský  wrote:
>>
>>> > Generally the watermark update can overtake elements, because runners
>>> can explicitly ignore late data in the watermark calculation (for good
>>> reason - those elements are already late, so no need to hold up the
>>> watermark advancing any more).
>>> This seems not to affect the decision of _not late_ vs. _late_, is it?
>>> If element is late and gets ignored from watermark calculation (whatever
>>> that includes in this context), than the watermark cannot move past
>>> elements that were not marked as _not late_ and thus nothing can make them
>>> _late_.
>>>
>>> > For GBK on-time data simply means the first pane marked as on time.
>>> For state+timers I don't think it makes sense for Beam to define on-time
>>> v.s. late, rather I think the user can come up with their own definition
>>> depending on their use case. For example, if you are buffering data into
>>> BagState and setting a timer to process it, it would be logical to say that
>>> any element that was buffered before the timer expired is on 

Re: Dropping late data in DirectRunner

2020-01-07 Thread Kenneth Knowles
On Tue, Jan 7, 2020 at 1:39 AM Jan Lukavský  wrote:

> Hi Kenn,
>
> I see that my terminology seems not to be 100% aligned with Beam's. I'll
> work on that. :-)
>
> I agree with what you say, and by "late" I mostly meant "droppable"
> (arriving too late after watermark).
>
> I'm definitely not proposing to get back to something like "out of order"
> == "late" or anything like that. I'm also aware that stateful operation is
> windowed operation, but the semantics of the windowing is different than of
> a GBK. The difference is how time moves in GBK and how moves in stateful
> DoFn. Throwing away some details (early triggers, late data triggers), the
> main difference is that in GBK case, time hops just between window
> boundaries, while in stateful DoFn time moves "smoothly" (with each
> watermark update). Now, this difference brings the question about why the
> definition of "droppable" data is the same for both types of operations,
> when there is a difference in how users "perceive" time. As the more
> generic operation, stateful DoFn might deserve a more general definition of
> droppable data, which should degrade naturally to the one of GBK in
> presence of "discrete time hops".
>

I understand what you mean. On the other hand, I encourage thinking of
event time spatially, not as time passing. That is a big part of unifying
batch/streaming real-time/archival processing. The event time window is a
secondary key to partition the data (merging windows are slightly more
complex). All event time windows exist simultaneously. So for both stateful
ParDo and GBK, I find it helpful to consider this perspective where all
windows are processed simultaneously / in an arbitrary order not assuming
windows are ordered at all. Then you see that GBK and stateful ParDo do not
really treat windows / watermark differently: both of them process a stream
of data for each (key, window) pair until the watermark informs them that
the stream is expired, then they GC the state associated with that (key,
window) pair.

Kenn

> This might have some consequences on how the droppable data should be
> handled in presence of (early) triggers, because triggerring is actually
> what makes time to "hop", so we might arrive to a conclusion that we might
> actually drop any data that has timestamp less than "last trigger time +
> allowed lateness". This looks appealing to me, because IMO it has strong
> internal logical consistency. Although it is possible that it would drop
> more data, which is generally undesirable, I admit that.
>
> I'm looking for explanation why the current approach was chosen instead of
> the other.
>
> Jan
> On 1/7/20 12:52 AM, Kenneth Knowles wrote:
>
> This thread has a lot in it, so I am just top-posting.
>
>  - Stateful DoFn is a windowed operation; state is per-window. When the
> window expires, any further inputs are dropped.
>  - "Late" is not synonymous with out-of-order. It doesn't really have an
> independent meaning.
> - For a GBK/Combine "late" means "not included prior to the on-time
> output", and "droppable" means "arriving after window expiry".
> - For Stateful DoFn there is no real meaning to "late" except if one
> is talking about "droppable", which still means "arriving after window
> expiry". A user may have a special timer where they flip a flag and treat
> elements after the timer differently.
>
> I think the definition of when data is droppable is very simple. We
> explicitly moved to this definition, away from the "out of order == late",
> because it is more robust and simpler to think about. Users saw lots of
> confusing behavior when we had "out of order by allowed lateness ==
> droppable" logic.
>
> Kenn
>
> On Mon, Jan 6, 2020 at 1:42 AM Jan Lukavský  wrote:
>
>> > Generally the watermark update can overtake elements, because runners
>> can explicitly ignore late data in the watermark calculation (for good
>> reason - those elements are already late, so no need to hold up the
>> watermark advancing any more).
>> This seems not to affect the decision of _not late_ vs. _late_, is it? If
>> element is late and gets ignored from watermark calculation (whatever that
>> includes in this context), than the watermark cannot move past elements
>> that were not marked as _not late_ and thus nothing can make them _late_.
>>
>> > For GBK on-time data simply means the first pane marked as on time. For
>> state+timers I don't think it makes sense for Beam to define on-time v.s.
>> late, rather I think the user can come up with their own definition
>> depending on their use case. For example, if you are buffering data into
>> BagState and setting a timer to process it, it would be logical to say that
>> any element that was buffered before the timer expired is on time, and any
>> data that showed up after the timer fired is late. This would roughly
>> correspond to what GBK does, and the answer would be very similar to simply
>> comparing against the watermark (as the timers fire when the 

Re: Custom window invariants and

2020-01-07 Thread Aaron Dixon
Dataflow. (See stacktrace)

On Tue, Jan 7, 2020 at 1:50 PM Reuven Lax  wrote:

> Which runner are you using?
>
> On Tue, Jan 7, 2020, 11:17 AM Aaron Dixon  wrote:
>
>> I get an IllegalStateException " is in more than one state
>> address window set" (stacktrace below).
>>
>> What does this mean? What invariant of custom window implementation
>> & merging am I violating?
>>
>> Thank you for any advise.
>>
>> ```
>> java.lang.IllegalStateException:
>> {[2019-12-05T01:36:48.870Z..2019-12-05T01:36:48.871Z),terminal} is in more
>> than one state address window set
>> at
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState
>> (Preconditions.java:588)
>> at
>> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.checkInvariants
>> (MergingActiveWindowSet.java:334)
>> at
>> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.persist
>> (MergingActiveWindowSet.java:88)
>> at
>> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.persist
>> (ReduceFnRunner.java:380)
>> at
>> org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement
>> (StreamingGroupAlsoByWindowViaWindowSetFn.java:96)
>> ...
>> ```
>>
>


Re: Custom window invariants and

2020-01-07 Thread Reuven Lax
Which runner are you using?

On Tue, Jan 7, 2020, 11:17 AM Aaron Dixon  wrote:

> I get an IllegalStateException " is in more than one state address
> window set" (stacktrace below).
>
> What does this mean? What invariant of custom window implementation
> & merging am I violating?
>
> Thank you for any advise.
>
> ```
> java.lang.IllegalStateException:
> {[2019-12-05T01:36:48.870Z..2019-12-05T01:36:48.871Z),terminal} is in more
> than one state address window set
> at
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState
> (Preconditions.java:588)
> at
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.checkInvariants
> (MergingActiveWindowSet.java:334)
> at
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.persist
> (MergingActiveWindowSet.java:88)
> at
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.persist
> (ReduceFnRunner.java:380)
> at
> org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement
> (StreamingGroupAlsoByWindowViaWindowSetFn.java:96)
> ...
> ```
>


Custom window invariants and

2020-01-07 Thread Aaron Dixon
I get an IllegalStateException " is in more than one state address
window set" (stacktrace below).

What does this mean? What invariant of custom window implementation
& merging am I violating?

Thank you for any advise.

```
java.lang.IllegalStateException:
{[2019-12-05T01:36:48.870Z..2019-12-05T01:36:48.871Z),terminal} is in more
than one state address window set
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState
(Preconditions.java:588)
at
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.checkInvariants
(MergingActiveWindowSet.java:334)
at
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.persist
(MergingActiveWindowSet.java:88)
at
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.persist
(ReduceFnRunner.java:380)
at
org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement
(StreamingGroupAlsoByWindowViaWindowSetFn.java:96)
...
```


Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Ismaël Mejía
Done

On Tue, Jan 7, 2020 at 5:09 PM Tomo Suzuki  wrote:

> Hi Ismaël and Beam committer,
>
> I appreciate the help! Would you trigger precommit checks for
> https://github.com/apache/beam/pull/10508. I also want the following
> checks.
>
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
> Regards,
> Tomo
>
> On Tue, Jan 7, 2020 at 9:27 AM Ismaël Mejía  wrote:
>
>> Until we address this we can maybe use this thread/list to send the link
>> for the PR(s) you want to be triggered. and the command if a special one is
>> needed, so committers can help to manually do it.
>>
>> On Tue, Jan 7, 2020 at 3:00 PM Ismaël Mejía  wrote:
>>
>>> Thanks for bringing this info Michał. I think the security goal of INFRA
>>> makes sense however it adds for committers the additional burden of having
>>> to manually trigger the CI. I hoped that the PR will run the basic
>>> precommit tests but it does not.
>>> We have to (1) discuss a possible workaround or (2) find a way to be
>>> notified of PRs that have not run its tests.
>>> Any ideas? This looks like a quite critical issue to address.
>>>
>>>
>>> On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia <
>>> michal.wale...@polidea.com> wrote:
>>>
 According to Daniel Gruno's comment in
 https://issues.apache.org/jira/browse/INFRA-19670 , there was a change
 in Jenkins job execution policy - non-committers can't run Jenkins
 workflows now, as it would be a security flaw in terms of arbitrary code
 execution.
 Does anyone know about this? When exactly was this changed for Beam?
 What are our options for testing our pull requests?


 On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang  wrote:

> According to this comment
> ,
> it might be a Jenkins bug.
> Meanwhile, I opened an infra ticket at
> https://issues.apache.org/jira/browse/INFRA-19670 for Beam.
>
> On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud 
> wrote:
>
>> "Run precommits" seems to work sometimes:
>> https://github.com/apache/beam/pull/10455
>>
>> Has anyone opened a ticket with apache infra?
>>
>> On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
>> rehman.murad...@venturedive.com> wrote:
>>
>>> +1:  https://github.com/apache/beam/pull/10506
>>>
>>> any solution yet?
>>>
>>>
>>>
>>> *Thanks & Regards*
>>>
>>>
>>> 
>>>
>>> *Rehman Murad Ali*
>>> Software Engineer
>>> Mobile: +92 3452076766 <+92%20345%202076766>
>>> Skype: rehman.muradali
>>>
>>>
>>> On Sat, Jan 4, 2020 at 6:10 AM Heejong Lee 
>>> wrote:
>>>
 +1: https://github.com/apache/beam/pull/10051

 force-pushing again. retest this please. nothing works :(

 On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia <
 michal.wale...@polidea.com> wrote:

> Hi,
> I'm also affected by this - I touched my PRs opened before the
> holiday break and no jobs were triggered. Do we know what breaks
> Jenkins/fixes it when stuff like this happens?
> Happy new year,
> Michal
>
> On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang 
> wrote:
>
>> Thanks Alan for checking this out! I closed PR 9903 and reopen it
>> in pull/10493 . It
>> seems new PR still did not trigger jenkins jobs.
>>
>> On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold 
>> wrote:
>>
>>> Oh, the PR 9903 run is quite old; I don't see a recent one yet.
>>>
>>> On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold 
>>> wrote:
>>>
 For PR 10427, I see
 https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
 For PR 9903, I see
 https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/

 Maybe the PR status is not being updated when the jobs run?


 On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang 
 wrote:

> same for https://github.com/apache/beam/pull/9903 as well
>
> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> Seems like Jenkins tests are not being triggered for this PR
>> as well: https://github.com/apache/beam/pull/10427
>>
>> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki <
>> suzt...@google.com> wrote:
>>
>>> Jenkins started working. Thank you for 

Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Tomo Suzuki
Hi Ismaël and Beam committer,

I appreciate the help! Would you trigger precommit checks for
https://github.com/apache/beam/pull/10508. I also want the following checks.

Run Java PostCommit
Run Java HadoopFormatIO Performance Test
Run BigQueryIO Streaming Performance Test Java
Run Dataflow ValidatesRunner
Run Spark ValidatesRunner
Run SQL Postcommit

Regards,
Tomo

On Tue, Jan 7, 2020 at 9:27 AM Ismaël Mejía  wrote:

> Until we address this we can maybe use this thread/list to send the link
> for the PR(s) you want to be triggered. and the command if a special one is
> needed, so committers can help to manually do it.
>
> On Tue, Jan 7, 2020 at 3:00 PM Ismaël Mejía  wrote:
>
>> Thanks for bringing this info Michał. I think the security goal of INFRA
>> makes sense however it adds for committers the additional burden of having
>> to manually trigger the CI. I hoped that the PR will run the basic
>> precommit tests but it does not.
>> We have to (1) discuss a possible workaround or (2) find a way to be
>> notified of PRs that have not run its tests.
>> Any ideas? This looks like a quite critical issue to address.
>>
>>
>> On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia <
>> michal.wale...@polidea.com> wrote:
>>
>>> According to Daniel Gruno's comment in
>>> https://issues.apache.org/jira/browse/INFRA-19670 , there was a change
>>> in Jenkins job execution policy - non-committers can't run Jenkins
>>> workflows now, as it would be a security flaw in terms of arbitrary code
>>> execution.
>>> Does anyone know about this? When exactly was this changed for Beam?
>>> What are our options for testing our pull requests?
>>>
>>>
>>> On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang  wrote:
>>>
 According to this comment
 ,
 it might be a Jenkins bug.
 Meanwhile, I opened an infra ticket at
 https://issues.apache.org/jira/browse/INFRA-19670 for Beam.

 On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud 
 wrote:

> "Run precommits" seems to work sometimes:
> https://github.com/apache/beam/pull/10455
>
> Has anyone opened a ticket with apache infra?
>
> On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
> rehman.murad...@venturedive.com> wrote:
>
>> +1:  https://github.com/apache/beam/pull/10506
>>
>> any solution yet?
>>
>>
>>
>> *Thanks & Regards*
>>
>>
>> 
>>
>> *Rehman Murad Ali*
>> Software Engineer
>> Mobile: +92 3452076766 <+92%20345%202076766>
>> Skype: rehman.muradali
>>
>>
>> On Sat, Jan 4, 2020 at 6:10 AM Heejong Lee 
>> wrote:
>>
>>> +1: https://github.com/apache/beam/pull/10051
>>>
>>> force-pushing again. retest this please. nothing works :(
>>>
>>> On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia <
>>> michal.wale...@polidea.com> wrote:
>>>
 Hi,
 I'm also affected by this - I touched my PRs opened before the
 holiday break and no jobs were triggered. Do we know what breaks
 Jenkins/fixes it when stuff like this happens?
 Happy new year,
 Michal

 On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang 
 wrote:

> Thanks Alan for checking this out! I closed PR 9903 and reopen it
> in pull/10493 . It
> seems new PR still did not trigger jenkins jobs.
>
> On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold 
> wrote:
>
>> Oh, the PR 9903 run is quite old; I don't see a recent one yet.
>>
>> On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold 
>> wrote:
>>
>>> For PR 10427, I see
>>> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
>>> For PR 9903, I see
>>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/
>>>
>>> Maybe the PR status is not being updated when the jobs run?
>>>
>>>
>>> On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang 
>>> wrote:
>>>
 same for https://github.com/apache/beam/pull/9903 as well

 On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Seems like Jenkins tests are not being triggered for this PR
> as well: https://github.com/apache/beam/pull/10427
>
> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki <
> suzt...@google.com> wrote:
>
>> Jenkins started working. Thank you for whoever fixed it.
>>
>> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang <
>> boyu...@google.com> wrote:
>> >
>> > Same here. Even the phrase trigger 

Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Ismaël Mejía
Until we address this we can maybe use this thread/list to send the link
for the PR(s) you want to be triggered. and the command if a special one is
needed, so committers can help to manually do it.

On Tue, Jan 7, 2020 at 3:00 PM Ismaël Mejía  wrote:

> Thanks for bringing this info Michał. I think the security goal of INFRA
> makes sense however it adds for committers the additional burden of having
> to manually trigger the CI. I hoped that the PR will run the basic
> precommit tests but it does not.
> We have to (1) discuss a possible workaround or (2) find a way to be
> notified of PRs that have not run its tests.
> Any ideas? This looks like a quite critical issue to address.
>
>
> On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia 
> wrote:
>
>> According to Daniel Gruno's comment in
>> https://issues.apache.org/jira/browse/INFRA-19670 , there was a change
>> in Jenkins job execution policy - non-committers can't run Jenkins
>> workflows now, as it would be a security flaw in terms of arbitrary code
>> execution.
>> Does anyone know about this? When exactly was this changed for Beam? What
>> are our options for testing our pull requests?
>>
>>
>> On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang  wrote:
>>
>>> According to this comment
>>> ,
>>> it might be a Jenkins bug.
>>> Meanwhile, I opened an infra ticket at
>>> https://issues.apache.org/jira/browse/INFRA-19670 for Beam.
>>>
>>> On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud 
>>> wrote:
>>>
 "Run precommits" seems to work sometimes:
 https://github.com/apache/beam/pull/10455

 Has anyone opened a ticket with apache infra?

 On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
 rehman.murad...@venturedive.com> wrote:

> +1:  https://github.com/apache/beam/pull/10506
>
> any solution yet?
>
>
>
> *Thanks & Regards*
>
>
> 
>
> *Rehman Murad Ali*
> Software Engineer
> Mobile: +92 3452076766 <+92%20345%202076766>
> Skype: rehman.muradali
>
>
> On Sat, Jan 4, 2020 at 6:10 AM Heejong Lee  wrote:
>
>> +1: https://github.com/apache/beam/pull/10051
>>
>> force-pushing again. retest this please. nothing works :(
>>
>> On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia <
>> michal.wale...@polidea.com> wrote:
>>
>>> Hi,
>>> I'm also affected by this - I touched my PRs opened before the
>>> holiday break and no jobs were triggered. Do we know what breaks
>>> Jenkins/fixes it when stuff like this happens?
>>> Happy new year,
>>> Michal
>>>
>>> On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang  wrote:
>>>
 Thanks Alan for checking this out! I closed PR 9903 and reopen it
 in pull/10493 . It
 seems new PR still did not trigger jenkins jobs.

 On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold 
 wrote:

> Oh, the PR 9903 run is quite old; I don't see a recent one yet.
>
> On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold 
> wrote:
>
>> For PR 10427, I see
>> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
>> For PR 9903, I see
>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/
>>
>> Maybe the PR status is not being updated when the jobs run?
>>
>>
>> On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang 
>> wrote:
>>
>>> same for https://github.com/apache/beam/pull/9903 as well
>>>
>>> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
 Seems like Jenkins tests are not being triggered for this PR as
 well: https://github.com/apache/beam/pull/10427

 On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki 
 wrote:

> Jenkins started working. Thank you for whoever fixed it.
>
> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang <
> boyu...@google.com> wrote:
> >
> > Same here. Even the phrase trigger doesn't work.
> >
> > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik 
> wrote:
> >>
> >> I'm also affected by this.
> >>
> >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki <
> suzt...@google.com> wrote:
> >>>
> >>> Hi Beam developers,
> >>>
> >>> Does anybody know why my PR does not trigger Jenkins jobs
> today?
> >>> https://github.com/apache/beam/pull/10438
> >>>
> >>> --
> >>> Regards,
> >>> Tomo

Re: Intellij Issue in Imports

2020-01-07 Thread Ismaël Mejía
Maybe worth to contribute this info to the Beam IntelliJ docs if you have
time to Zohaib.


On Tue, Dec 31, 2019 at 7:14 AM Zohaib Baig 
wrote:

> Thank you Kirill and Maximilan for your reply. Solved the issue using
> "./gradlew clean" and then refresh the Gradle.
>
> Zohaib
>
> On Sun, Dec 29, 2019 at 6:07 PM Maximilian Michels  wrote:
>
>> I have this issue from time to time when pulling in the latest master.
>> I've found that the only way to resolve this is to run "./gradlew clean",
>> then refresh the Gradle project inside IntelliJ and run "compileTestJava"
>> for the project which had the issues.
>>
>> Cheers,
>> Max
>>
>> On December 27, 2019 8:20:05 AM GMT+01:00, Zohaib Baig <
>> zohaib.b...@venturedive.com> wrote:
>>>
>>> Hi,
>>>
>>> According to the documentation, I have setup Beam project from scratch
>>> in  IntelliJ. Seems like some files have issues in imports and were not
>>> able to build, eventually, I wasn't able to test it through IDE (Working On
>>> Windows).
>>>
>>> Is there any other configuration that I am missing?
>>>
>>> Thank you.
>>>
>>> [image: image.png]
>>>
>>>
>>>
>
> --
>
> *Muhammad Zohaib Baig*
> Senior Software Engineer
> Mobile: +92 3443060266
> Skype: mzobii.baig
>
> 
>


Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Ismaël Mejía
Thanks for bringing this info Michał. I think the security goal of INFRA
makes sense however it adds for committers the additional burden of having
to manually trigger the CI. I hoped that the PR will run the basic
precommit tests but it does not.
We have to (1) discuss a possible workaround or (2) find a way to be
notified of PRs that have not run its tests.
Any ideas? This looks like a quite critical issue to address.


On Tue, Jan 7, 2020 at 10:16 AM Michał Walenia 
wrote:

> According to Daniel Gruno's comment in
> https://issues.apache.org/jira/browse/INFRA-19670 , there was a change in
> Jenkins job execution policy - non-committers can't run Jenkins workflows
> now, as it would be a security flaw in terms of arbitrary code execution.
> Does anyone know about this? When exactly was this changed for Beam? What
> are our options for testing our pull requests?
>
>
> On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang  wrote:
>
>> According to this comment
>> ,
>> it might be a Jenkins bug.
>> Meanwhile, I opened an infra ticket at
>> https://issues.apache.org/jira/browse/INFRA-19670 for Beam.
>>
>> On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud 
>> wrote:
>>
>>> "Run precommits" seems to work sometimes:
>>> https://github.com/apache/beam/pull/10455
>>>
>>> Has anyone opened a ticket with apache infra?
>>>
>>> On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
>>> rehman.murad...@venturedive.com> wrote:
>>>
 +1:  https://github.com/apache/beam/pull/10506

 any solution yet?



 *Thanks & Regards*


 

 *Rehman Murad Ali*
 Software Engineer
 Mobile: +92 3452076766 <+92%20345%202076766>
 Skype: rehman.muradali


 On Sat, Jan 4, 2020 at 6:10 AM Heejong Lee  wrote:

> +1: https://github.com/apache/beam/pull/10051
>
> force-pushing again. retest this please. nothing works :(
>
> On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia <
> michal.wale...@polidea.com> wrote:
>
>> Hi,
>> I'm also affected by this - I touched my PRs opened before the
>> holiday break and no jobs were triggered. Do we know what breaks
>> Jenkins/fixes it when stuff like this happens?
>> Happy new year,
>> Michal
>>
>> On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang  wrote:
>>
>>> Thanks Alan for checking this out! I closed PR 9903 and reopen it in
>>> pull/10493 . It seems
>>> new PR still did not trigger jenkins jobs.
>>>
>>> On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold 
>>> wrote:
>>>
 Oh, the PR 9903 run is quite old; I don't see a recent one yet.

 On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold 
 wrote:

> For PR 10427, I see
> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
> For PR 9903, I see
> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/
>
> Maybe the PR status is not being updated when the jobs run?
>
>
> On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang 
> wrote:
>
>> same for https://github.com/apache/beam/pull/9903 as well
>>
>> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> Seems like Jenkins tests are not being triggered for this PR as
>>> well: https://github.com/apache/beam/pull/10427
>>>
>>> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki 
>>> wrote:
>>>
 Jenkins started working. Thank you for whoever fixed it.

 On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang <
 boyu...@google.com> wrote:
 >
 > Same here. Even the phrase trigger doesn't work.
 >
 > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik 
 wrote:
 >>
 >> I'm also affected by this.
 >>
 >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki <
 suzt...@google.com> wrote:
 >>>
 >>> Hi Beam developers,
 >>>
 >>> Does anybody know why my PR does not trigger Jenkins jobs
 today?
 >>> https://github.com/apache/beam/pull/10438
 >>>
 >>> --
 >>> Regards,
 >>> Tomo



 --
 Regards,
 Tomo

>>>
>>
>> --
>>
>> Michał Walenia
>> Polidea  | Software Engineer
>>
>> M: +48 791 432 002 <+48791432002>
>> E: michal.wale...@polidea.com
>>
>> Unique Tech
>> Check out our projects! 

Re: Python IO Connector

2020-01-07 Thread Lucas Magalhães
Hi Peter.

Why don't you use this external library?
https://pypi.org/project/beam-nuggets/   They already use SQLAlchemy and is
pretty easy to use.


On Mon, Jan 6, 2020 at 10:17 PM Luke Cwik  wrote:

> Eugene, the JdbcIO output should be updated to support Beam's schema
> format which would allow for "rows" to cross the language boundaries.
>
> If the connector is easy to write and maintain then it makes sense for
> native. Maybe the Python version will have an easier time to support
> splitting and hence could overtake the Java implementation in useful
> features.
>
> On Mon, Jan 6, 2020 at 3:55 PM  wrote:
>
>> Apache Airflow went for the DB API approach as well and it seems like to
>> have worked well for them. We will likely need to add extra_requires for
>> each database engine Python package though, which adds some complexity but
>> not a lot
>>
>> On Jan 6, 2020, at 6:12 PM, Eugene Kirpichov  wrote:
>>
>> Agreed with above, it seems prudent to develop a pure-Python connector
>> for something as common as interacting with a database. It's likely easier
>> to achieve an idiomatic API, familiar to non-Beam Python SQL users, within
>> pure Python.
>>
>> Developing a cross-language connector here might be plain impossible,
>> because rows read from a database are (at least in JDBC) not encodable -
>> they require a user's callback to translate to an encodable user type, and
>> the callback can't be in Python because then you have to encode its input
>> before giving it to Python. Same holds for the write transform.
>>
>> Not sure about sqlalchemy though, maybe use plain DB-API
>> https://www.python.org/dev/peps/pep-0249/ instead? Seems like the Python
>> one is more friendly than JDBC in the sense that it actually returns rows
>> as tuples of simple data types.
>>
>> On Mon, Jan 6, 2020 at 1:42 PM Robert Bradshaw 
>> wrote:
>>
>>> On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath 
>>> wrote:
>>>
 Regarding cross-language transforms, we need to add better
 documentation, but for now you'll have to go with existing examples and
 tests. For example,


 https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py

 https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py

 Note that cross-language transforms feature is currently only available
 for Flink Runner. Dataflow support is in development.

>>>
>>> I think it works with all non-Dataflow runners, with the exception of
>>> the Java and Go Direct runners. (It does work with the Python direct
>>> runner.)
>>>
>>>
 I'm fine with developing this natively for Python as well. AFAIK Java
 JDBC IO connector is not a super-complicated connector and it should be
 fine to make relatively easy to maintain and widely usable connectors
 available in multiple SDKs.

>>>
>>> Yes, a case can certainly be made for having native connectors for
>>> particular common/simple sources. (We certainly don't call cross-language
>>> to read text files for example.)
>>>
>>>

 Thanks,
 Cham


 On Mon, Jan 6, 2020 at 10:56 AM Luke Cwik  wrote:

> +Chamikara Jayalath  +Heejong Lee
> 
>
> On Mon, Jan 6, 2020 at 10:20 AM  wrote:
>
>> How do I go about doing that? From the docs, it appears cross
>> language transforms are
>> currently undocumented.
>> https://beam.apache.org/roadmap/connectors-multi-sdk/
>> On Jan 6, 2020, at 12:55 PM, Luke Cwik  wrote:
>>
>> What about using a cross language transform between Python and the
>> already existing Java JdbcIO transform?
>>
>> On Sun, Jan 5, 2020 at 5:18 AM Peter Dannemann 
>> wrote:
>>
>>> I’d like to develop the Python SDK’s SQL IO connector. I was
>>> thinking it would be easiest to use sqlalchemy to achieve maximum 
>>> database
>>> engine support, but I suppose I could also create an ABC for databases 
>>> that
>>> follow the DB API and create subclasses for each database engine that
>>> override a connect method. What are your thoughts on the best way to do
>>> this?
>>>
>>

-- 
Lucas Magalhães,
CTO

Paralelo CS - Consultoria e Serviços
Tel: +55 (11) 3090-5557
Cel: +55 (11) 99420-4667
lucas.magalh...@paralelocs.com.br

www.paralelocs.com.br


Re: Dropping late data in DirectRunner

2020-01-07 Thread Jan Lukavský

Hi Kenn,

I see that my terminology seems not to be 100% aligned with Beam's. I'll 
work on that. :-)


I agree with what you say, and by "late" I mostly meant "droppable" 
(arriving too late after watermark).


I'm definitely not proposing to get back to something like "out of 
order" == "late" or anything like that. I'm also aware that stateful 
operation is windowed operation, but the semantics of the windowing is 
different than of a GBK. The difference is how time moves in GBK and how 
moves in stateful DoFn. Throwing away some details (early triggers, late 
data triggers), the main difference is that in GBK case, time hops just 
between window boundaries, while in stateful DoFn time moves "smoothly" 
(with each watermark update). Now, this difference brings the question 
about why the definition of "droppable" data is the same for both types 
of operations, when there is a difference in how users "perceive" time. 
As the more generic operation, stateful DoFn might deserve a more 
general definition of droppable data, which should degrade naturally to 
the one of GBK in presence of "discrete time hops".


This might have some consequences on how the droppable data should be 
handled in presence of (early) triggers, because triggerring is actually 
what makes time to "hop", so we might arrive to a conclusion that we 
might actually drop any data that has timestamp less than "last trigger 
time + allowed lateness". This looks appealing to me, because IMO it has 
strong internal logical consistency. Although it is possible that it 
would drop more data, which is generally undesirable, I admit that.


I'm looking for explanation why the current approach was chosen instead 
of the other.


Jan

On 1/7/20 12:52 AM, Kenneth Knowles wrote:

This thread has a lot in it, so I am just top-posting.

 - Stateful DoFn is a windowed operation; state is per-window. When 
the window expires, any further inputs are dropped.
 - "Late" is not synonymous with out-of-order. It doesn't really have 
an independent meaning.
    - For a GBK/Combine "late" means "not included prior to the 
on-time output", and "droppable" means "arriving after window expiry".
    - For Stateful DoFn there is no real meaning to "late" except if 
one is talking about "droppable", which still means "arriving after 
window expiry". A user may have a special timer where they flip a flag 
and treat elements after the timer differently.


I think the definition of when data is droppable is very simple. We 
explicitly moved to this definition, away from the "out of order == 
late", because it is more robust and simpler to think about. Users saw 
lots of confusing behavior when we had "out of order by allowed 
lateness == droppable" logic.


Kenn

On Mon, Jan 6, 2020 at 1:42 AM Jan Lukavský > wrote:


> Generally the watermark update can overtake elements, because
runners  can explicitly ignore late data in the watermark
calculation (for good reason - those elements are already late, so
no need to hold up the watermark advancing any more).

This seems not to affect the decision of _not late_ vs. _late_, is
it? If element is late and gets ignored from watermark calculation
(whatever that includes in this context), than the watermark
cannot move past elements that were not marked as _not late_ and
thus nothing can make them _late_.

> For GBK on-time data simply means the first pane marked as on
time. For state+timers I don't think it makes sense for Beam to
define on-time v.s. late, rather I think the user can come up with
their own definition depending on their use case. For example, if
you are buffering data into BagState and setting a timer to
process it, it would be logical to say that any element that was
buffered before the timer expired is on time, and any data that
showed up after the timer fired is late. This would roughly
correspond to what GBK does, and the answer would be very similar
to simply comparing against the watermark (as the timers fire when
the watermark advances).

Yes, I'd say that stateful DoFns don't have (well defined) concept
of pane, because that is related to concept of trigger and this is
a concept of GBK (or windowed operations in general). The only
semantic meaning of window in stateful DoFn is that it "scopes" state.

This discussion might have got a little off the original question,
so I'll try to rephrase it:

Should stateful DoFn drop *all* late data, not just data that
arrive after window boundary + allowed lateness? Some arguments
why I think it should:
 * in windowed operations (GBK), it is correct to drop data on
window boundaries only, because time (as seen by user) effectively
hops only on these discrete time points
 * in stateful dofn on the other hand time move "smoothly" (yes,
with some granularity, millisecond, nanosecond, whatever and with

Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Michał Walenia
According to Daniel Gruno's comment in
https://issues.apache.org/jira/browse/INFRA-19670 , there was a change in
Jenkins job execution policy - non-committers can't run Jenkins workflows
now, as it would be a security flaw in terms of arbitrary code execution.
Does anyone know about this? When exactly was this changed for Beam? What
are our options for testing our pull requests?


On Tue, Jan 7, 2020 at 3:26 AM Kai Jiang  wrote:

> According to this comment
> ,
> it might be a Jenkins bug.
> Meanwhile, I opened an infra ticket at
> https://issues.apache.org/jira/browse/INFRA-19670 for Beam.
>
> On Mon, Jan 6, 2020 at 12:01 PM Andrew Pilloud 
> wrote:
>
>> "Run precommits" seems to work sometimes:
>> https://github.com/apache/beam/pull/10455
>>
>> Has anyone opened a ticket with apache infra?
>>
>> On Mon, Jan 6, 2020 at 4:39 AM Rehman Murad Ali <
>> rehman.murad...@venturedive.com> wrote:
>>
>>> +1:  https://github.com/apache/beam/pull/10506
>>>
>>> any solution yet?
>>>
>>>
>>>
>>> *Thanks & Regards*
>>>
>>>
>>> 
>>>
>>> *Rehman Murad Ali*
>>> Software Engineer
>>> Mobile: +92 3452076766 <+92%20345%202076766>
>>> Skype: rehman.muradali
>>>
>>>
>>> On Sat, Jan 4, 2020 at 6:10 AM Heejong Lee  wrote:
>>>
 +1: https://github.com/apache/beam/pull/10051

 force-pushing again. retest this please. nothing works :(

 On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia <
 michal.wale...@polidea.com> wrote:

> Hi,
> I'm also affected by this - I touched my PRs opened before the holiday
> break and no jobs were triggered. Do we know what breaks Jenkins/fixes it
> when stuff like this happens?
> Happy new year,
> Michal
>
> On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang  wrote:
>
>> Thanks Alan for checking this out! I closed PR 9903 and reopen it in
>> pull/10493 . It seems new
>> PR still did not trigger jenkins jobs.
>>
>> On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold 
>> wrote:
>>
>>> Oh, the PR 9903 run is quite old; I don't see a recent one yet.
>>>
>>> On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold 
>>> wrote:
>>>
 For PR 10427, I see
 https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
 For PR 9903, I see
 https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/

 Maybe the PR status is not being updated when the jobs run?


 On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang 
 wrote:

> same for https://github.com/apache/beam/pull/9903 as well
>
> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> Seems like Jenkins tests are not being triggered for this PR as
>> well: https://github.com/apache/beam/pull/10427
>>
>> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki 
>> wrote:
>>
>>> Jenkins started working. Thank you for whoever fixed it.
>>>
>>> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang 
>>> wrote:
>>> >
>>> > Same here. Even the phrase trigger doesn't work.
>>> >
>>> > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik 
>>> wrote:
>>> >>
>>> >> I'm also affected by this.
>>> >>
>>> >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki <
>>> suzt...@google.com> wrote:
>>> >>>
>>> >>> Hi Beam developers,
>>> >>>
>>> >>> Does anybody know why my PR does not trigger Jenkins jobs
>>> today?
>>> >>> https://github.com/apache/beam/pull/10438
>>> >>>
>>> >>> --
>>> >>> Regards,
>>> >>> Tomo
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>
>
> --
>
> Michał Walenia
> Polidea  | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.wale...@polidea.com
>
> Unique Tech
> Check out our projects! 
>


-- 

Michał Walenia
Polidea  | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.wale...@polidea.com

Unique Tech
Check out our projects!