Issue with KafkaIO for list of topics

2020-02-27 Thread Maulik Soneji
*Observations:*
If we use KafkaIO to read a list of topics where one of the topics has
zero throughput, and KafkaIO is followed by a GroupByKey stage, then:
a. No data is output from the GroupByKey stage for any of the topics, not
just the zero-throughput topic.

If all topics have some throughput coming in, then it works fine and we get
some output from GroupByKey stage.

Is this an issue?

*Points:*
a. GroupByKey produces output only when all topics have some throughput.
b. This is a problem with KafkaIO + GroupByKey. With FileIO + GroupByKey
the issue doesn't arise: GroupByKey outputs some data even if there is no
data for one of the files.
c. It is not a runner issue, since I reproduced it with both FlinkRunner
and DataflowRunner.
d. Even if the lag is different for each topic in the list, we still get
some output from GroupByKey.


*Debugging:*
While debugging this issue I found that in the split function of
KafkaUnboundedSource we create KafkaUnboundedSource instances where the
partition list is one partition for each topic.

I am not sure if this is some issue with watermark, since watermark for the
topic with no throughput will not advance. But this looks like the most
likely cause to me.
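
The watermark suspicion above can be illustrated with a small stand-alone sketch. It assumes that the reader's output watermark is the minimum of the per-partition watermarks; the class and method names are hypothetical and are not Beam API. Under that assumption, a single partition that has never seen a record keeps the combined watermark at its initial value, so a window that waits for the watermark never fires for any topic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone model; none of these names are Beam API.
class WatermarkHoldSketch {
    // Stand-in for BoundedWindow.TIMESTAMP_MIN_VALUE: "no record seen yet".
    static final long MIN_TIMESTAMP = Long.MIN_VALUE;

    // Assumption: the combined watermark is the minimum over all partitions.
    static long combinedWatermark(List<Long> partitionWatermarks) {
        long min = Long.MAX_VALUE;
        for (long watermark : partitionWatermarks) {
            min = Math.min(min, watermark);
        }
        return min;
    }

    // A window ending at windowEnd fires only once the watermark reaches it.
    static boolean windowCanFire(long windowEnd, List<Long> partitionWatermarks) {
        return combinedWatermark(partitionWatermarks) >= windowEnd;
    }

    public static void main(String[] args) {
        long windowEnd = 600_000L; // end of the first 10-minute window, in millis
        List<Long> allActive = Arrays.asList(700_000L, 650_000L);
        List<Long> oneIdle = Arrays.asList(700_000L, MIN_TIMESTAMP);
        System.out.println("all topics active: " + windowCanFire(windowEnd, allActive));
        System.out.println("one topic idle:    " + windowCanFire(windowEnd, oneIdle));
    }
}
```

If this model matches what the runner does, it would also explain point d: topics with different lag still advance their watermarks, only more slowly, so windows eventually close.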

*Please help me in figuring out whether this is an issue or if there is
something wrong with my pipeline.*

Attaching detailed pipeline information for more details:

*Context:*
I am currently using KafkaIO to read data from kafka for a list of topics
with a custom timestamp policy.

Below is how I am constructing KafkaIO reader:

return KafkaIO.<byte[], byte[]>read()
    .withBootstrapServers(brokers)
    .withTopics(topics)
    .withKeyDeserializer(ByteArrayDeserializer.class)
    .withValueDeserializer(ByteArrayDeserializer.class)
    .withTimestampPolicyFactory((partition, previousWatermark) ->
        new EventTimestampPolicy(godataService, previousWatermark))
    .commitOffsetsInFinalize();

*Pipeline Information:*
The pipeline consists of seven steps:
a. Read from Kafka with a custom timestamp policy
b. Convert KafkaRecord to Message object
c. Window into FixedWindows of 10 minutes, triggering AfterWatermark
d. Convert the PCollection of Message into a PCollection of KV, where the
topic is the key
e. GroupByKey.create() to get a PCollection of KV with the values grouped
per key
f. Convert the grouped PCollection into a PCollection for each topic
g. Write output to Kafka

*Detailed Pipeline Information*
a. Read data from kafka to get KafkaRecord
Here I am using my own timestamp policy which looks like below:

public EventTimestampPolicy(MyService myService,
        Optional<Instant> previousWatermark) {
    this.myService = myService;
    // Start from the previous watermark if there is one.
    this.currentWatermark =
        previousWatermark.orElse(BoundedWindow.TIMESTAMP_MIN_VALUE);
}

@Override
public Instant getTimestampForRecord(PartitionContext context,
        KafkaRecord<byte[], byte[]> record) {
    Instant eventTimestamp;
    try {
        eventTimestamp = Deserializer.getEventTimestamp(record, myService);
    } catch (InvalidProtocolBufferException e) {
        statsClient.increment("io.proto.buffer.exception");
        throw new RuntimeException(e);
    }
    // The watermark only changes when a record arrives on this partition.
    this.currentWatermark = eventTimestamp;
    return this.currentWatermark;
}

@Override
public Instant getWatermark(PartitionContext ctx) {
    return this.currentWatermark;
}

Event timestamp is one of the fields in the kafka message. It is the time
when the event was pushed to kafka.

b. DoFn to transform KafkaRecord to the Message class. The
Message class contains properties like topic, partition,
offset and timestamp

c. Windowing on 10 minute fixed window triggering at
AfterWatermark.pastEndOfWindow()

d. PCollection of Message to PCollection of KV
Here the key is the Kafka topic.

e. GroupByKey to get a PCollection of KV with the values grouped per key

f. Grouped PCollection of KV to a
PCollection for each topic

g. Write output to kafka
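
One possible direction for the timestamp policy in step a, sketched as a stand-alone model rather than a drop-in fix: let the watermark advance on wall-clock time when a partition has no backlog. The class below is hypothetical and does not use Beam types. In a real policy the backlog and the current time would presumably come from PartitionContext (getMessageBacklog() and getBacklogCheckTime()), and Beam's CustomTimestampPolicyWithLimitedDelay appears to follow a similar idea. Unlike the policy shown above, this sketch also keeps the watermark from moving backwards when an older record arrives.

```java
// Hypothetical stand-alone model of an idle-aware watermark; not Beam API.
class IdleAwareWatermarkSketch {
    private final long maxDelayMillis;
    private long currentWatermark = Long.MIN_VALUE;

    IdleAwareWatermarkSketch(long maxDelayMillis) {
        this.maxDelayMillis = maxDelayMillis;
    }

    // Called for every record; the watermark never moves backwards.
    long onRecord(long eventTimestampMillis) {
        currentWatermark = Math.max(currentWatermark, eventTimestampMillis);
        return eventTimestampMillis;
    }

    // Called periodically by the reader. When the partition has no backlog,
    // advance the watermark towards wall-clock time minus the allowed delay.
    long getWatermark(long backlog, long nowMillis) {
        if (backlog == 0) {
            currentWatermark = Math.max(currentWatermark, nowMillis - maxDelayMillis);
        }
        return currentWatermark;
    }
}
```

The trade-off is that records arriving on a previously idle topic with event times older than the allowed delay would then be treated as late data.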


Re: Issue in Dataflow runner for apache beam - Python SDK

2020-02-27 Thread Kyle Weaver
Hi Taranbir,

I posted an answer to the Stack Overflow question. The summary is that you
should use a CombineFn instead of a plain DoFn.
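
For readers who have not used one, here is a rough stand-alone model of the CombineFn contract. It is written in Java to match the other code in this digest and does not use Beam types; the Python SDK's CombineFn has the analogous methods create_accumulator, add_input, merge_accumulators and extract_output. The point it illustrates is that a runner may split the input into several bundles, so a DoFn that collects rows and emits them in finish_bundle sees only one bundle at a time, whereas per-bundle accumulators can be merged into one result. The class name and the use of strings for rows are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone model of the CombineFn contract; not Beam API.
class CollectRowsSketch {
    static List<String> createAccumulator() {
        return new ArrayList<>();
    }

    static List<String> addInput(List<String> accumulator, String row) {
        accumulator.add(row);
        return accumulator;
    }

    // Partial results from different bundles or workers are merged here.
    static List<String> mergeAccumulators(List<List<String>> accumulators) {
        List<String> merged = createAccumulator();
        for (List<String> accumulator : accumulators) {
            merged.addAll(accumulator);
        }
        return merged;
    }

    static List<String> extractOutput(List<String> accumulator) {
        return accumulator;
    }

    // Simulates a runner that processes each bundle independently and then
    // merges the per-bundle accumulators into a single result.
    static List<String> combine(List<List<String>> bundles) {
        List<List<String>> partials = new ArrayList<>();
        for (List<String> bundle : bundles) {
            List<String> accumulator = createAccumulator();
            for (String row : bundle) {
                accumulator = addInput(accumulator, row);
            }
            partials.add(accumulator);
        }
        return extractOutput(mergeAccumulators(partials));
    }

    public static void main(String[] args) {
        List<List<String>> bundles = Arrays.asList(
            Arrays.asList("row1", "row2"),
            Arrays.asList("row3"));
        System.out.println(combine(bundles));
    }
}
```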

Hope that helps.

Kyle

On Thu, Feb 27, 2020 at 2:25 PM Taranbir Wraich 
wrote:

> Hi Team,
>
> I am writing to get a better understanding of an issue I am facing with
> the Apache Beam Python SDK.
>
> I am running a Python Dataflow pipeline which reads in data from BigQuery,
> then collates the data in a single pandas data frame and then processes it. I
> did not have any issues while testing using the Direct runner, but ran into
> some unexpected issues while using the Dataflow Runner. I have posted a
> question regarding the same at Stack Overflow. I was advised to contact you
> guys for better insight into this.
>
> I am adding the link for the question. I would really appreciate if
> someone could help me figure out this issue.
>
>
> https://stackoverflow.com/questions/60437931/python-dataflow-dofn-class-function-finish-bundle-running-multiple-times-and-giv
>
> Thanks and Regards
>
> Taranbir Wraich
>
> Cloud Analyst
>
> P: 613-366-7881 ext 13X
>
> P: 888-243-4619 ext 13X
>
> E: taranbir.wra...@napkyn.com
>
> Napkyn Inc. | 888.243.4619 | Twitter | LinkedIn | napkyn.com
>
> This email, including any attachments, is for the sole use of the intended
> recipient and may contain confidential information. If you are not the
> intended recipient, please immediately notify us by reply email or by
> telephone, delete this email and destroy any copies. Thank you.
>


Re: Beam Emitted Metrics Reference

2020-02-27 Thread Pablo Estrada
Hi Daniel!
I think +Alex Amato had tried to compile an inventory of metrics at some
point. Other than that, I don't think we have a document outlining them.

Can you talk about what you plan to do with them? Do you plan to export
them somehow? Do you plan to add your own?
Best
-P.

On Thu, Feb 27, 2020 at 11:33 AM Daniel Chen  wrote:

> Hi all,
>
> I have some questions about reference documentation for the framework
> metrics emitted by Beam. I would like to leverage these metrics to allow
> better monitoring of my Beam jobs but cannot find a description or a
> complete list of the emitted metrics.
>
> Do we have this information documented anywhere?
>
> Thanks,
> Daniel
>


Re: Java SplittableDoFn Watermark API

2020-02-27 Thread Kenneth Knowles
Great idea.

Are any of the methods optional or useful on their own? It seems like maybe
not? So then a single annotation to return an object that returns all the
methods might be more clear. Per Boyuan's work - WatermarkEstimatorProvider?

Kenn

On Thu, Feb 27, 2020 at 2:43 PM Luke Cwik  wrote:

> See this doc[1] and blog[2] for some context about SplittableDoFns.
>
> To support watermark reporting within the Java SDK for SplittableDoFns, we
> need a way for SDF authors to report watermark estimates over the
> element and restriction pair that they are processing.
>
> For UnboundedSources, it was found to be a pain point to ask each SDF
> author to write their own watermark estimation which typically prevented
> re-use. Therefore we would like to have a "library" of watermark estimators
> that help SDF authors perform this estimation similar to how there is a
> "library" of restrictions and restriction trackers that SDF authors can
> use. For SDF authors where the existing library doesn't work, they can add
> additional ones that observe timestamps of elements or choose to directly
> report the watermark through a "ManualWatermarkEstimator" parameter that
> can be supplied to @ProcessElement methods.
>
> The public facing portion of the DoFn changes adds three new annotations
> for new DoFn style methods:
> GetInitialWatermarkEstimatorState: Returns the initial watermark state,
> similar to GetInitialRestriction.
> GetWatermarkEstimatorStateCoder: Returns a coder compatible with the
> watermark state type, similar to GetRestrictionCoder for restrictions
> returned by GetInitialRestriction.
> NewWatermarkEstimator: Returns either a watermark estimator that the
> framework invokes, allowing it to observe the timestamps of output records,
> or a manual watermark estimator that can be explicitly invoked to update
> the watermark.
>
> See [3] for an initial PR with the public facing additions to the core
> Java API related to SplittableDoFn.
>
> This mirrors a bunch of work that was done by Boyuan within the Python SDK
> [4, 5] but in the style of new DoFn parameter/method invocation we have in
> the Java SDK.
>
> 1: https://s.apache.org/splittable-do-fn
> 2: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
> 3: https://github.com/apache/beam/pull/10992
> 4: https://github.com/apache/beam/pull/9794
> 5: https://github.com/apache/beam/pull/10375
>


Re: [PROPOSAL] Preparing for Beam 2.20.0 release

2020-02-27 Thread Rui Wang
Hi community,

Just fyi:

The 2.20.0 release branch should have been cut yesterday (02/26) per
schedule. However, our Python precommit was broken, so I didn't cut the
branch.

I am working closely with the owner of PR [1] to fix the Python precommit.
Once the fix is in, I will cut the release branch immediately.


[1]: https://github.com/apache/beam/pull/10982


-Rui

On Thu, Feb 20, 2020 at 7:06 AM Ismaël Mejía  wrote:

> Not yet, as of the last check nobody is tackling it, it is still unassigned.
> Let's not forget that the fix of this one requires an extra release of the
> grpc vendored dependency (the source of the issue).
>
> And yes this is a release blocker for the open source runners because people
> tend to package their projects with the respective runners in a jar and this
> is breaking at the moment.
>
> Kenn changed the priority of BEAM-9252 from Blocker to Critical to follow the
> conventions in [1], and from those definitions 'most critical bugs should
> block release'.
>
> [1] https://beam.apache.org/contribute/jira-priorities/
>
> On Thu, Feb 20, 2020 at 3:42 AM Ahmet Altay  wrote:
>
>> Curious, was there a resolution on BEAM-9252? Would it be a release
>> blocker?
>>
>> On Fri, Feb 14, 2020 at 12:42 AM Ismaël Mejía  wrote:
>>
>>> Thanks Rui for volunteering and for keeping the release pace!
>>>
>>> Since we are discussing the next release, I would like to highlight that
>>> apparently nobody is working on this blocker issue:
>>>
>>> BEAM-9252 Problem shading Beam pipeline with Beam 2.20.0-SNAPSHOT
>>> https://issues.apache.org/jira/browse/BEAM-9252
>>>
>>> This is a regression introduced by the move to vendored gRPC 1.26.0 and it
>>> probably will require an extra vendored gRPC release so better to give it
>>> some priority.
>>>
>>>
>>> On Wed, Feb 12, 2020 at 6:48 PM Ahmet Altay  wrote:
>>>
>>>> +1. Thank you.
>>>>
>>>> On Tue, Feb 11, 2020 at 11:01 PM Rui Wang wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> The next (2.20.0) release branch cut is scheduled for 02/26, according
>>>>> to the calendar.
>>>>> I would like to volunteer myself to do this release.
>>>>> The plan is to cut the branch on that date, and cherrypick
>>>>> release-blocking fixes afterwards if any.
>>>>>
>>>>> Any unresolved release blocking JIRA issues for 2.20.0 should have
>>>>> their "Fix Version/s" marked as "2.20.0".
>>>>>
>>>>> Any comments or objections?
>>>>>
>>>>>
>>>>> -Rui
>>>>>



Java SplittableDoFn Watermark API

2020-02-27 Thread Luke Cwik
See this doc[1] and blog[2] for some context about SplittableDoFns.

To support watermark reporting within the Java SDK for SplittableDoFns, we
need a way for SDF authors to report watermark estimates over the
element and restriction pair that they are processing.

For UnboundedSources, it was found to be a pain point to ask each SDF
author to write their own watermark estimation which typically prevented
re-use. Therefore we would like to have a "library" of watermark estimators
that help SDF authors perform this estimation similar to how there is a
"library" of restrictions and restriction trackers that SDF authors can
use. For SDF authors where the existing library doesn't work, they can add
additional ones that observe timestamps of elements or choose to directly
report the watermark through a "ManualWatermarkEstimator" parameter that
can be supplied to @ProcessElement methods.

The public facing portion of the DoFn changes adds three new annotations
for new DoFn style methods:
GetInitialWatermarkEstimatorState: Returns the initial watermark state,
similar to GetInitialRestriction.
GetWatermarkEstimatorStateCoder: Returns a coder compatible with the
watermark state type, similar to GetRestrictionCoder for restrictions
returned by GetInitialRestriction.
NewWatermarkEstimator: Returns either a watermark estimator that the
framework invokes, allowing it to observe the timestamps of output records,
or a manual watermark estimator that can be explicitly invoked to update
the watermark.

See [3] for an initial PR with the public facing additions to the core Java
API related to SplittableDoFn.

This mirrors a bunch of work that was done by Boyuan within the Python SDK
[4, 5] but in the style of new DoFn parameter/method invocation we have in
the Java SDK.

1: https://s.apache.org/splittable-do-fn
2: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
3: https://github.com/apache/beam/pull/10992
4: https://github.com/apache/beam/pull/9794
5: https://github.com/apache/beam/pull/10375
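
To make the two estimator flavours described above more concrete, here is a stand-alone sketch. Every name and signature in it is a hypothetical illustration, with timestamps simplified to long values; it is not the API from PR [3]. It shows a timestamp-observing estimator that the framework would feed with output timestamps, a manual estimator that the SDF author updates explicitly, and a state value that can be checkpointed and used to rebuild an estimator on resume.

```java
// Hypothetical shapes for illustration only; not the API proposed in the PR.
class WatermarkEstimatorSketch {
    interface WatermarkEstimator {
        long currentWatermark();

        // State to checkpoint and pass back when processing resumes.
        long getState();
    }

    // Estimator the framework would drive by showing it each output timestamp.
    static class TimestampObserving implements WatermarkEstimator {
        private long watermark;

        TimestampObserving(long initialState) {
            this.watermark = initialState;
        }

        void observeTimestamp(long timestampMillis) {
            watermark = Math.max(watermark, timestampMillis);
        }

        public long currentWatermark() {
            return watermark;
        }

        public long getState() {
            return watermark;
        }
    }

    // Estimator the SDF author updates explicitly while processing.
    static class Manual implements WatermarkEstimator {
        private long watermark;

        Manual(long initialState) {
            this.watermark = initialState;
        }

        void setWatermark(long watermarkMillis) {
            if (watermarkMillis < watermark) {
                throw new IllegalArgumentException("watermark must not move backwards");
            }
            watermark = watermarkMillis;
        }

        public long currentWatermark() {
            return watermark;
        }

        public long getState() {
            return watermark;
        }
    }
}
```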


Issue in Dataflow runner for apache beam - Python SDK

2020-02-27 Thread Taranbir Wraich
Hi Team,

I am writing to get a better understanding of an issue I am facing with
the Apache Beam Python SDK.

I am running a Python Dataflow pipeline which reads in data from BigQuery,
then collates the data in a single pandas data frame and then processes it. I
did not have any issues while testing using the Direct runner, but ran into
some unexpected issues while using the Dataflow Runner. I have posted a
question regarding the same at Stack Overflow. I was advised to contact you
guys for better insight into this.

I am adding the link for the question. I would really appreciate if
someone could help me figure out this issue.

https://stackoverflow.com/questions/60437931/python-dataflow-dofn-class-function-finish-bundle-running-multiple-times-and-giv

Thanks and Regards

Taranbir Wraich
Cloud Analyst
P: 613-366-7881 ext 13X
P: 888-243-4619 ext 13X
E: taranbir.wra...@napkyn.com

Napkyn Inc. | 888.243.4619 | Twitter | LinkedIn | napkyn.com

This email, including any attachments, is for the sole use of the intended
recipient and may contain confidential information. If you are not the
intended recipient, please immediately notify us by reply email or by
telephone, delete this email and destroy any copies. Thank you.


gcr.io images clean up

2020-02-27 Thread Ismaël Mejía
Michał and I ended up looking at the gcr.io images today because of an issue
with tests (caused by the prefix change in the Docker images, beam_), and we
noticed that there are literally hundreds of them. Do we have cleanup jobs
for those?

https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam_portability?gcrImageListsize=30

Can somebody take a look so we don't end up wasting resources there too?

Regards,
Ismaël

PS. It is probably worth checking not only the images in beam_portability
but also the others in the same group; it looks like there is a lot of
unused stuff there too.


Re: GroupIntoBatches not Working properly for Direct Runner Java

2020-02-27 Thread Kenneth Knowles
Can you share some more details? What is the expected output and what
output are you seeing?

On Thu, Feb 27, 2020 at 9:39 AM Vasu Gupta  wrote:

> Hey folks, I am using the Apache Beam framework in Java with the DirectRunner
> for local testing purposes. When using GroupIntoBatches with batch size 1
> it works perfectly fine, i.e. the output of the transform is consistent and
> as expected. But with batch size > 1 the output PCollection has
> less data than it should.
>
> Pipeline flow:
> 1. A Transform for reading from pubsub
> 2. Transform for making a KV out of the data
> 3. A Fixed Window transform of 1 second
> 4. Applying GroupIntoBatches transform
> 5. And last, Logging the resulting Iterables.
>
> The weird thing is that batch_size > 1 works great when running on
> DataflowRunner but not with DirectRunner. I think the issue might be with
> timer expiry, since GroupIntoBatches uses BagState internally.
>
> Any help will be much appreciated.
>


Beam Emitted Metrics Reference

2020-02-27 Thread Daniel Chen
Hi all,

I have some questions about reference documentation for the framework
metrics emitted by Beam. I would like to leverage these metrics to allow
better monitoring of my Beam jobs but cannot find a description or a
complete list of the emitted metrics.

Do we have this information documented anywhere?

Thanks,
Daniel


GroupIntoBatches not Working properly for Direct Runner Java

2020-02-27 Thread Vasu Gupta
Hey folks, I am using the Apache Beam framework in Java with the DirectRunner
for local testing purposes. When using GroupIntoBatches with batch size 1 it
works perfectly fine, i.e. the output of the transform is consistent and as
expected. But with batch size > 1 the output PCollection has less data than
it should.

Pipeline flow:
1. A Transform for reading from pubsub
2. Transform for making a KV out of the data
3. A Fixed Window transform of 1 second
4. Applying GroupIntoBatches transform
5. And last, Logging the resulting Iterables.

The weird thing is that batch_size > 1 works great when running on
DataflowRunner but not with DirectRunner. I think the issue might be with
timer expiry, since GroupIntoBatches uses BagState internally.
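
The timer hypothesis can be pictured with a small stand-alone model of per-key batching. It is a sketch under assumptions, not GroupIntoBatches itself: the class is hypothetical, uses no Beam types, and assumes that full batches are emitted as elements arrive while a partially filled buffer is only emitted when an end-of-window timer fires. Under that assumption, a batch size of 1 never depends on the timer, which would be consistent with the behaviour described above, while larger batch sizes lose the remainder if the flush never happens. It does not establish what the DirectRunner actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model of per-key batching; not Beam API.
class BatchingSketch {
    private final int batchSize;
    private final Map<String, List<String>> buffers = new HashMap<>();
    private final List<List<String>> emitted = new ArrayList<>();

    BatchingSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffer the value for its key and emit a batch once it is full.
    void process(String key, String value) {
        List<String> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buffer.add(value);
        if (buffer.size() >= batchSize) {
            emitted.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Stand-in for the end-of-window timer: flush partially filled buffers.
    void onWindowExpiry() {
        for (List<String> buffer : buffers.values()) {
            if (!buffer.isEmpty()) {
                emitted.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
    }

    // Total number of elements that have reached the output so far.
    int emittedElementCount() {
        int count = 0;
        for (List<String> batch : emitted) {
            count += batch.size();
        }
        return count;
    }
}
```

Comparing the number of logged elements with and without the final flush in a test like this is one way to check whether the missing data is exactly the per-key remainder.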

Any help will be much appreciated.


[BEAM-8078] Code review request

2020-02-27 Thread Алексей Высотин
Hi Team,

Could someone please take a look at the PR for [BEAM-8078]
"streaming_wordcount_debugging.py is missing a test"?

Thank you in advance.

Best Regards,

Alex Vysotin.


Re: Docker images are migrated to Apache org.

2020-02-27 Thread Kyle Weaver
Hi Ismael, thanks for bringing that up. Can you please start a new thread
for the image cleanup issue, including details on which images you mean? I
want to make sure it has sufficient visibility.

On Thu, Feb 27, 2020 at 7:23 AM Ismaël Mejía  wrote:

> Independent of this thread, but slightly related.
>
> I ended up looking at the gcr.io images we use for testing and noticed
> that there are literally hundreds of them. Do we have cleanup jobs for
> those?
> Can anyone take a look so we don't end up wasting resources there too?
>
>
>
> On Thu, Feb 27, 2020 at 2:30 PM Ismaël Mejía  wrote:
>
>> Great news, Thanks Hannah for all your attention to this issue, let's not
>> forget to thank the INFRA guys for their help.
>>
>> On Tue, Feb 25, 2020 at 2:03 AM Hannah Jiang 
>> wrote:
>>
>>> Thanks everyone for reporting issues and potential problems etc.
>>> Please feel free to let me know if you have any questions or see
>>> additional issues!
>>>
>>>
>>> On Mon, Feb 24, 2020 at 4:17 PM Pablo Estrada 
>>> wrote:
>>>
>>>> Thanks Hannah! This is great : )
>>>> -P.
>>>>
>>>> On Mon, Feb 24, 2020 at 12:47 PM Robert Burke wrote:
>>>>
>>>>> It was only merged an hour ago. That explains why I still saw it as
>>>>> broken before lunch. Thanks again!
>>>>>
>>>>> On Mon, Feb 24, 2020, 12:42 PM Robert Burke wrote:
>>>>>
>>>>>> NVM. I had stale pages for some reason. Hannah fixed them already. :D
>>>>>>
>>>>>> On Mon, Feb 24, 2020, 12:39 PM Robert Burke wrote:
>>>>>>
>>>>>>> Looks like the change broke the Go post commits. I've filed
>>>>>>> BEAM-9374 for it. I think I know how to fix it though.
>>>>>>>
>>>>>>> On Mon, Feb 24, 2020, 12:18 PM Kyle Weaver wrote:
>>>>>>>
>>>>>>>> I wonder if searching for "apache beam" (instead of "apache/beam")
>>>>>>>> will ever work on Docker hub once the new images start getting more
>>>>>>>> downloads. Currently no relevant results are included. Unfortunately
>>>>>>>> this might not be something we can control, as it seems Docker hub's
>>>>>>>> search is not as sophisticated as I would like.
>>>>>>>>
>>>>>>>> Anyway, this is still a great improvement. Thanks Hannah for making
>>>>>>>> this happen.
>>>>>>>>
>>>>>>>> On Fri, Feb 21, 2020 at 11:58 AM Ahmet Altay wrote:
>>>>>>>>
>>>>>>>>> Thank you, Hannah! This is great.
>>>>>>>>>
>>>>>>>>> On Fri, Feb 21, 2020 at 11:24 AM Hannah Jiang <
>>>>>>>>> hannahji...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello team
>>>>>>>>>>
>>>>>>>>>> Docker SDK images (Python, Java, Go) and Flink job server images
>>>>>>>>>> are migrated to Apache org[1].
>>>>>>>>>> I confirmed digests of all the images of the two repos are
>>>>>>>>>> exactly the same.
>>>>>>>>>> In addition, I updated readme pages of the new repos. The readme
>>>>>>>>>> pages match the ones at GitHub.
>>>>>>>>>>
>>>>>>>>>> New images will be deployed to Apache org from v2.20.0. I added
>>>>>>>>>> notices to the original repos[2] about the changes.
>>>>>>>>>> Spark job server images will be added from v2.20 as well, thanks
>>>>>>>>>> to @Kyle Weaver for making it happen[3].
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Hannah
>>>>>>>>>>
>>>>>>>>>> 1. https://hub.docker.com/search?q=apache%2Fbeam=image
>>>>>>>>>> 2. https://hub.docker.com/search?q=apachebeam/=image
>>>>>>>>>> 3. https://github.com/apache/beam/pull/10921
>>>>>>>>>>
>>>>>>>>>


Re: Jenkins problems: javaPreCommitPortabilityApiJava11 and No Space left

2020-02-27 Thread Alan Myrvold
The disk on jenkins5 was 100% full. I deleted some /tmp files owned by
jenkins that were more than 3 days old, bringing it down to 90%. There are
still quite a few large files in the jenkins workspaces. I can look into ways
to keep this cleaned up.

On Wed, Feb 26, 2020 at 5:38 PM Kyle Weaver  wrote:

> It looks like apache-beam-jenkins-5 is back to normal now:
> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>
> > Java PreCommit also seems to be failing due to a couple of errors in
> > "BigQueryIO" and "SpannerIO".
>
> Judging by the log output, the job failed because of a timeout, possibly
> (probably?) related to the RabbitMQ test:
>
> 10:52:08 > Task :sdks:java:testing:test-utils:buildDependents
> 12:11:39 Build timed out (after 120 minutes). Marking the build as aborted.
> 12:11:39 Build was aborted
> 12:11:39 Recording test results
> 12:11:41 > Task :sdks:java:io:rabbitmq:test FAILED
> 12:11:41
> 12:11:41 FAILURE: Build failed with an exception.
>
>
>
> On Wed, Feb 26, 2020 at 5:22 PM Pulasthi Supun Wickramasinghe <
> pulasthi...@gmail.com> wrote:
>
>> I got the same build issues for my pull request as well. Java PreCommit
>> also seems to be failing due to a couple of errors in "BigQueryIO" and
>> "SpannerIO".
>>
>> [1]
>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/10160/java/fixed/
>>
>> Best Regards,
>> Pulasthi
>>
>> On Wed, Feb 26, 2020 at 1:27 PM Alex Van Boxel  wrote:
>>
>>> I see 2 problems with Jenkins popping up. The root beam project is
>>> missing:
>>>
>>>
>>> JavaPortabilityApiJava11 ("Run JavaPortabilityApiJava11 PreCommit")  is
>>> running, but:
>>>
>>> Task 'javaPreCommitPortabilityApiJava11' not found in root project 'beam'
>>>
>>> And we have no space left on device:
>>>
>>> Example:
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/9287/console
>>>
>>>
>>>
>>>  _/
>>> _/ Alex Van Boxel
>>>
>>
>>
>> --
>> Pulasthi S. Wickramasinghe
>> PhD Candidate  | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> cell: 224-386-9035
>>
>


Re: Docker images are migrated to Apache org.

2020-02-27 Thread Ismaël Mejía
Independent of this thread, but slightly related.

I ended up looking at the gcr.io images we use for testing and noticed that
there are literally hundreds of them. Do we have cleanup jobs for those?
Can anyone take a look so we don't end up wasting resources there too?



Re: Docker images are migrated to Apache org.

2020-02-27 Thread Ismaël Mejía
Great news, Thanks Hannah for all your attention to this issue, let's not
forget to thank the INFRA guys for their help.



Re: [ANNOUNCE] New committer: Jincheng Sun

2020-02-27 Thread jincheng sun
Thanks for all of your warm welcomes. It is really a pleasure working with
you and the community!

On Wed, Feb 26, 2020 at 10:57 AM Valentyn Tymofieiev wrote:

> Congratulations, Jincheng!
>
> On Tue, Feb 25, 2020 at 5:02 PM Chamikara Jayalath 
> wrote:
>
>> Congrats Jincheng!
>>
>> On Tue, Feb 25, 2020 at 10:14 AM Rui Wang  wrote:
>>
>>> Congrats!
>>>
>>>
>>> -Rui
>>>
>>> On Mon, Feb 24, 2020 at 11:24 PM Austin Bennett <
>>> whatwouldausti...@gmail.com> wrote:
>>>
>>>> Congrats!
>>>>
>>>> On Mon, Feb 24, 2020, 11:22 PM Alex Van Boxel wrote:
>>>>
>>>>> Congrats!
>>>>>
>>>>>  _/
>>>>> _/ Alex Van Boxel
>>>>>
>>>>>
>>>>> On Mon, Feb 24, 2020 at 8:13 PM Kyle Weaver wrote:
>>>>>
>>>>>> Thanks Jincheng for all your work on Beam and Flink integration.
>>>>>>
>>>>>> On Mon, Feb 24, 2020 at 11:02 AM Yichi Zhang wrote:
>>>>>>
>>>>>>> Congrats, Jincheng!
>>>>>>>
>>>>>>> On Mon, Feb 24, 2020 at 9:45 AM Ahmet Altay wrote:
>>>>>>>
>>>>>>>> Congratulations!
>>>>>>>>
>>>>>>>> On Mon, Feb 24, 2020 at 6:48 AM Thomas Weise wrote:
>>>>>>>>
>>>>>>>>> Congratulations!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 24, 2020 at 6:45 AM Ismaël Mejía wrote:
>>>>>>>>>
>>>>>>>>>> Congrats Jincheng!
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 24, 2020 at 1:39 PM Gleb Kanterov wrote:
>>>>>>>>>>
>>>>>>>>>>> Congratulations!
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 24, 2020 at 1:18 PM Hequn Cheng wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Congratulations Jincheng, well deserved!
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Hequn
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 24, 2020 at 7:21 PM Reza Rokni wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Congrats!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Feb 24, 2020 at 7:15 PM Jan Lukavský wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Congrats Jincheng!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Jan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/24/20 11:55 AM, Maximilian Michels wrote:
>>>>>>>>>>>>>> > Hi everyone,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>>>>>>>>> > committer: Jincheng Sun
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Jincheng has worked on generalizing parts of Beam for Flink's Python
>>>>>>>>>>>>>> > API. He has also picked up other issues, like fixing documentation,
>>>>>>>>>>>>>> > implementing missing features, or cleaning up code [1].
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > In consideration of his contributions, the Beam PMC trusts him with
>>>>>>>>>>>>>> > the responsibilities of a Beam committer [2].
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thank you for your contributions Jincheng!
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > -Max, on behalf of the Apache Beam PMC
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > [1]
>>>>>>>>>>>>>> > https://jira.apache.org/jira/browse/BEAM-9299?jql=project%20%3D%20BEAM%20AND%20assignee%20in%20(sunjincheng121)
>>>>>>>>>>>>>> > [2]
>>>>>>>>>>>>>> > https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>