Re: Question about transformOverride

2021-04-20 Thread Boyuan Zhang
+1 to using pipeline options.

Alternatively, you can change your KafkaReadTransform to perform a
different expansion (override expand()) based on your pipeline options.

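For illustration, a minimal sketch of that approach, assuming a custom
option interface like the LocalTestOptions sketched under Reuven's reply
below; the file path, topic, and Kafka settings here are placeholders, not
part of the original transform:

```
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaReadTransform extends PTransform<PBegin, PCollection<String>> {
  @Override
  public PCollection<String> expand(PBegin input) {
    // Branch the expansion on a custom pipeline option (hypothetical
    // LocalTestOptions interface; see the sketch later in this thread).
    LocalTestOptions opts =
        input.getPipeline().getOptions().as(LocalTestOptions.class);
    if (opts.getLocalTest()) {
      // Local test mode: read the same payloads from a file.
      return input.apply(TextIO.read().from("/tmp/test-input.txt"));
    }
    // Production mode: read from Kafka and keep only the record values.
    return input
        .apply(
            KafkaIO.<String, String>read()
                .withBootstrapServers("localhost:9092")
                .withTopic("my-topic")
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())
        .apply(Values.create());
  }
}
```
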
On Tue, Apr 20, 2021 at 9:51 PM Reuven Lax  wrote:

> It would be simpler to create a custom pipeline option, and swap out the
> read transform in your code. For example:
>
> PCollection pc;
> if (options.getLocalTest()) {
>   pc = pipeline.apply(new ReadFromLocalFile());
> } else {
>   pc = pipeline.apply(new KafkaReadTransform());
> }
>
> pc.apply(/* rest of pipeline */);
>
> On Tue, Apr 20, 2021 at 9:41 PM Yuhong Cheng 
> wrote:
>
>> We want to support transform overrides when testing locally. For
>> example, in real pipelines we read from Kafka, but in local tests
>> we want to read from a local file to help verify that the
>> pipeline works correctly. So we want to override the Kafka read transform
>> directly instead of writing the pipeline twice.
>>
>> code example:
>>
>> public Pipeline createPipeline(Pipeline pipeline) {
>>   pipeline.apply(new KafkaReadTransform()).apply(/* other functions... */);
>>   return pipeline;
>> }
>>
>> In tests, we will use the same createPipeline() function to create the
>> pipeline, but we want to override KafkaReadTransform with another
>> transform to avoid reading from Kafka.
>>
>> Thanks,
>> Yuhong
>>
>> On Tue, Apr 20, 2021 at 9:02 PM Chamikara Jayalath 
>> wrote:
>>
>>> In general, TransformOverrides are expected to be per-runner
>>> implementation details and are not expected to be directly used by
>>> end-users.
>>> What is the exact use case you are trying to achieve? Are you running
>>> into a missing feature of an existing transform?
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Tue, Apr 20, 2021 at 5:58 PM Yuhong Cheng 
>>> wrote:
>>>
 Hi Beam,
 We have a use case where, when creating a pipeline, we want to replace
 the IO read/write transforms when testing, using
 `pipeline.replaceAll(overrides)`. However, we ran into some problems when
 doing tests:
 1. Is there any way we can avoid calling expand() on a transform when it
 is going to be replaced? The reason we want to override a transform is
 that its expand() is not available in some situations. It seems
 unreasonable to call expand() on the originalTransform and then call
 expand() on the overrideTransform again.
 2. When trying to implement `PTransformOverrideFactory`, we realized that
 the inputs are `TaggedPValue`, which can only make {TupleTag,
 PCollection} pairs. If we want to override a write transform whose output
 type is `PDone`, what's the best way to implement this factory?

 Thanks in advance for answers! This is quite important to our pipelines.

 Thanks,
 Yuhong

>>>


Re: Question about transformOverride

2021-04-20 Thread Reuven Lax
It would be simpler to create a custom pipeline option, and swap out the
read transform in your code. For example:

PCollection pc;
if (options.getLocalTest()) {
  pc = pipeline.apply(new ReadFromLocalFile());
} else {
  pc = pipeline.apply(new KafkaReadTransform());
}

pc.apply(/* rest of pipeline */);

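A minimal sketch of the custom option this assumes (the LocalTestOptions
name and the localTest flag are hypothetical):

```
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

public interface LocalTestOptions extends PipelineOptions {
  // Hypothetical flag controlling which read transform the pipeline uses.
  @Description("When true, read input from a local file instead of Kafka.")
  @Default.Boolean(false)
  boolean getLocalTest();

  void setLocalTest(boolean value);
}
```
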
On Tue, Apr 20, 2021 at 9:41 PM Yuhong Cheng 
wrote:

> We want to support transform overrides when testing locally. For
> example, in real pipelines we read from Kafka, but in local tests
> we want to read from a local file to help verify that the
> pipeline works correctly. So we want to override the Kafka read transform
> directly instead of writing the pipeline twice.
>
> code example:
>
> public Pipeline createPipeline(Pipeline pipeline) {
>   pipeline.apply(new KafkaReadTransform()).apply(/* other functions... */);
>   return pipeline;
> }
>
> In tests, we will use the same createPipeline() function to create the
> pipeline, but we want to override KafkaReadTransform with another
> transform to avoid reading from Kafka.
>
> Thanks,
> Yuhong
>
> On Tue, Apr 20, 2021 at 9:02 PM Chamikara Jayalath 
> wrote:
>
>> In general, TransformOverrides are expected to be per-runner
>> implementation details and are not expected to be directly used by
>> end-users.
>> What is the exact use case you are trying to achieve? Are you running
>> into a missing feature of an existing transform?
>>
>> Thanks,
>> Cham
>>
>> On Tue, Apr 20, 2021 at 5:58 PM Yuhong Cheng 
>> wrote:
>>
>>> Hi Beam,
>>> We have a use case where, when creating a pipeline, we want to replace
>>> the IO read/write transforms when testing, using
>>> `pipeline.replaceAll(overrides)`. However, we ran into some problems
>>> when doing tests:
>>> 1. Is there any way we can avoid calling expand() on a transform when
>>> it is going to be replaced? The reason we want to override a transform
>>> is that its expand() is not available in some situations. It seems
>>> unreasonable to call expand() on the originalTransform and then call
>>> expand() on the overrideTransform again.
>>> 2. When trying to implement `PTransformOverrideFactory`, we realized
>>> that the inputs are `TaggedPValue`, which can only make {TupleTag,
>>> PCollection} pairs. If we want to override a write transform whose
>>> output type is `PDone`, what's the best way to implement this factory?
>>>
>>> Thanks in advance for answers! This is quite important to our pipelines.
>>>
>>> Thanks,
>>> Yuhong
>>>
>>


Re: Question about transformOverride

2021-04-20 Thread Chamikara Jayalath
I don't know if TransformOverrides are the correct way to do this. Will one
of the following options work?

(1) Update the pipeline to have a "test" mode where you would read from a
file-based source instead of Kafka.
(2) Run a local Kafka instance for the test pipeline instead of using the
instance with production data.

Thanks,
Cham

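For illustration, a minimal sketch of option (2), assuming the
Testcontainers Kafka module (org.testcontainers:kafka) is on the test
classpath; the image tag is an arbitrary choice:

```
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

public class LocalKafkaExample {
  public static void main(String[] args) {
    // Start a throwaway single-node broker in Docker for the test run.
    try (KafkaContainer kafka =
        new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1"))) {
      kafka.start();
      // Point the pipeline's Kafka read at the local broker instead of
      // the cluster holding production data.
      String bootstrapServers = kafka.getBootstrapServers();
      // ... build and run the test pipeline against bootstrapServers ...
    }
  }
}
```
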
On Tue, Apr 20, 2021 at 9:41 PM Yuhong Cheng 
wrote:

> We want to support transform overrides when testing locally. For
> example, in real pipelines we read from Kafka, but in local tests
> we want to read from a local file to help verify that the
> pipeline works correctly. So we want to override the Kafka read transform
> directly instead of writing the pipeline twice.
>
> code example:
>
> public Pipeline createPipeline(Pipeline pipeline) {
>   pipeline.apply(new KafkaReadTransform()).apply(/* other functions... */);
>   return pipeline;
> }
>
> In tests, we will use the same createPipeline() function to create the
> pipeline, but we want to override KafkaReadTransform with another
> transform to avoid reading from Kafka.
>
> Thanks,
> Yuhong
>
> On Tue, Apr 20, 2021 at 9:02 PM Chamikara Jayalath 
> wrote:
>
>> In general, TransformOverrides are expected to be per-runner
>> implementation details and are not expected to be directly used by
>> end-users.
>> What is the exact use case you are trying to achieve? Are you running
>> into a missing feature of an existing transform?
>>
>> Thanks,
>> Cham
>>
>> On Tue, Apr 20, 2021 at 5:58 PM Yuhong Cheng 
>> wrote:
>>
>>> Hi Beam,
>>> We have a use case where, when creating a pipeline, we want to replace
>>> the IO read/write transforms when testing, using
>>> `pipeline.replaceAll(overrides)`. However, we ran into some problems
>>> when doing tests:
>>> 1. Is there any way we can avoid calling expand() on a transform when
>>> it is going to be replaced? The reason we want to override a transform
>>> is that its expand() is not available in some situations. It seems
>>> unreasonable to call expand() on the originalTransform and then call
>>> expand() on the overrideTransform again.
>>> 2. When trying to implement `PTransformOverrideFactory`, we realized
>>> that the inputs are `TaggedPValue`, which can only make {TupleTag,
>>> PCollection} pairs. If we want to override a write transform whose
>>> output type is `PDone`, what's the best way to implement this factory?
>>>
>>> Thanks in advance for answers! This is quite important to our pipelines.
>>>
>>> Thanks,
>>> Yuhong
>>>
>>


Re: Question about transformOverride

2021-04-20 Thread Yuhong Cheng
We want to support transform overrides when testing locally. For
example, in real pipelines we read from Kafka, but in local tests
we want to read from a local file to help verify that the
pipeline works correctly. So we want to override the Kafka read transform
directly instead of writing the pipeline twice.

code example:

public Pipeline createPipeline(Pipeline pipeline) {
  pipeline.apply(new KafkaReadTransform()).apply(/* other functions... */);
  return pipeline;
}

In tests, we will use the same createPipeline() function to create the
pipeline, but we want to override KafkaReadTransform with another
transform to avoid reading from Kafka.
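
For illustration, a minimal sketch of the override wiring in a test
(ReadFromLocalFile and LocalFileOverrideFactory are hypothetical, and the
factory body is elided because PTransformOverrideFactory's exact method
signatures vary across Beam versions):

```
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.PTransformOverride;

public class OverrideForTestExample {
  static void applyTestOverrides(Pipeline pipeline) {
    // Replace every application of KafkaReadTransform with the factory's
    // replacement transform before running the pipeline.
    pipeline.replaceAll(
        Collections.singletonList(
            PTransformOverride.of(
                application ->
                    application.getTransform() instanceof KafkaReadTransform,
                new LocalFileOverrideFactory())));
  }
}
```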

Thanks,
Yuhong

On Tue, Apr 20, 2021 at 9:02 PM Chamikara Jayalath 
wrote:

> In general, TransformOverrides are expected to be per-runner
> implementation details and are not expected to be directly used by
> end-users.
> What is the exact use case you are trying to achieve? Are you running
> into a missing feature of an existing transform?
>
> Thanks,
> Cham
>
> On Tue, Apr 20, 2021 at 5:58 PM Yuhong Cheng 
> wrote:
>
>> Hi Beam,
>> We have a use case where, when creating a pipeline, we want to replace
>> the IO read/write transforms when testing, using
>> `pipeline.replaceAll(overrides)`. However, we ran into some problems
>> when doing tests:
>> 1. Is there any way we can avoid calling expand() on a transform when
>> it is going to be replaced? The reason we want to override a transform
>> is that its expand() is not available in some situations. It seems
>> unreasonable to call expand() on the originalTransform and then call
>> expand() on the overrideTransform again.
>> 2. When trying to implement `PTransformOverrideFactory`, we realized that
>> the inputs are `TaggedPValue`, which can only make {TupleTag, PCollection}
>> pairs. If we want to override a write transform whose output type is
>> `PDone`, what's the best way to implement this factory?
>>
>> Thanks in advance for answers! This is quite important to our pipelines.
>>
>> Thanks,
>> Yuhong
>>
>


Re: Question about transformOverride

2021-04-20 Thread Chamikara Jayalath
In general, TransformOverrides are expected to be per-runner implementation
details and are not expected to be directly used by end-users.
What is the exact use case you are trying to achieve? Are you running into
a missing feature of an existing transform?

Thanks,
Cham

On Tue, Apr 20, 2021 at 5:58 PM Yuhong Cheng 
wrote:

> Hi Beam,
> We have a use case where, when creating a pipeline, we want to replace
> the IO read/write transforms when testing, using
> `pipeline.replaceAll(overrides)`. However, we ran into some problems
> when doing tests:
> 1. Is there any way we can avoid calling expand() on a transform when
> it is going to be replaced? The reason we want to override a transform
> is that its expand() is not available in some situations. It seems
> unreasonable to call expand() on the originalTransform and then call
> expand() on the overrideTransform again.
> 2. When trying to implement `PTransformOverrideFactory`, we realized that
> the inputs are `TaggedPValue`, which can only make {TupleTag, PCollection}
> pairs. If we want to override a write transform whose output type is
> `PDone`, what's the best way to implement this factory?
>
> Thanks in advance for answers! This is quite important to our pipelines.
>
> Thanks,
> Yuhong
>


Re: [PROPOSAL] Preparing for Beam 2.30.0 release

2021-04-20 Thread Ahmet Altay
+1 and thank you!

On Tue, Apr 20, 2021 at 4:55 PM Heejong Lee  wrote:

> Hi All,
>
> The Beam 2.30.0 release is scheduled to be cut on April 21 according to
> the release calendar [1].
>
> I'd like to volunteer myself to be the release manager for this release.
> I plan on cutting the release branch on the scheduled date.
>
> Any comments or objections?
>
> Thanks,
> Heejong
>
> [1]
> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles
>


Question about transformOverride

2021-04-20 Thread Yuhong Cheng
Hi Beam,
We have a use case where, when creating a pipeline, we want to replace the
IO read/write transforms when testing, using `pipeline.replaceAll(overrides)`.
However, we ran into some problems when doing tests:
1. Is there any way we can avoid calling expand() on a transform when it
is going to be replaced? The reason we want to override a transform is
that its expand() is not available in some situations. It seems
unreasonable to call expand() on the originalTransform and then call
expand() on the overrideTransform again.
2. When trying to implement `PTransformOverrideFactory`, we realized that
the inputs are `TaggedPValue`, which can only make {TupleTag, PCollection}
pairs. If we want to override a write transform whose output type is
`PDone`, what's the best way to implement this factory?

Thanks in advance for answers! This is quite important to our pipelines.

Thanks,
Yuhong

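For reference, a rough sketch of the shape such a factory could take (a
hypothetical LocalFileOverrideFactory swapping KafkaReadTransform for a
ReadFromLocalFile; the method signatures follow the Java SDK's @Internal
org.apache.beam.sdk.runners.PTransformOverrideFactory and
runners-core-construction's ReplacementOutputs, and may differ across Beam
versions). For a write transform whose output is `PDone`, there are no
output PCollections to remap, so mapOutputs can return an empty map.

```
import java.util.Map;
import org.apache.beam.runners.core.construction.ReplacementOutputs;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.runners.PTransformOverrideFactory;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;

// Hypothetical factory that swaps KafkaReadTransform for ReadFromLocalFile.
class LocalFileOverrideFactory
    implements PTransformOverrideFactory<
        PBegin, PCollection<String>, KafkaReadTransform> {

  @Override
  public PTransformReplacement<PBegin, PCollection<String>>
      getReplacementTransform(
          AppliedPTransform<PBegin, PCollection<String>, KafkaReadTransform>
              transform) {
    // Supply the replacement transform whose expand() the runner applies.
    return PTransformReplacement.of(
        transform.getPipeline().begin(), new ReadFromLocalFile());
  }

  @Override
  public Map<PCollection<?>, ReplacementOutput> mapOutputs(
      Map<TupleTag<?>, PCollection<?>> outputs, PCollection<String> newOutput) {
    // Re-point consumers of the original output at the replacement output.
    return ReplacementOutputs.singleton(outputs, newOutput);
  }
}
```
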

[PROPOSAL] Preparing for Beam 2.30.0 release

2021-04-20 Thread Heejong Lee
Hi All,

The Beam 2.30.0 release is scheduled to be cut on April 21 according to
the release calendar [1].

I'd like to volunteer myself to be the release manager for this release. I
plan on cutting the release branch on the scheduled date.

Any comments or objections?

Thanks,
Heejong

[1]
https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles


Flaky test issue report

2021-04-20 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests. These are P1 issues 
because they have a major negative impact on the community and make it hard to 
determine the quality of the software.

BEAM-12163: Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK 
harness startup (https://issues.apache.org/jira/browse/BEAM-12163)
BEAM-12061: beam_PostCommit_SQL failing on 
KafkaTableProviderIT.testFakeNested 
(https://issues.apache.org/jira/browse/BEAM-12061)
BEAM-12020: :sdks:java:container:java8:docker failing missing licenses 
(https://issues.apache.org/jira/browse/BEAM-12020)
BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (https://issues.apache.org/jira/browse/BEAM-12019)
BEAM-11792: Python precommit failed (flaked?) installing package  
(https://issues.apache.org/jira/browse/BEAM-11792)
BEAM-11733: [beam_PostCommit_Java] [testFhirIO_Import|export] flaky 
(https://issues.apache.org/jira/browse/BEAM-11733)
BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (https://issues.apache.org/jira/browse/BEAM-11666)
BEAM-11662: elasticsearch tests failing 
(https://issues.apache.org/jira/browse/BEAM-11662)
BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) 
(https://issues.apache.org/jira/browse/BEAM-11661)
BEAM-11646: beam_PostCommit_XVR_Spark failing 
(https://issues.apache.org/jira/browse/BEAM-11646)
BEAM-11645: beam_PostCommit_XVR_Flink failing 
(https://issues.apache.org/jira/browse/BEAM-11645)
BEAM-11541: testTeardownCalledAfterExceptionInProcessElement flakes on 
direct runner. (https://issues.apache.org/jira/browse/BEAM-11541)
BEAM-11540: Linter sometimes flakes on apache_beam.dataframe.frames_test 
(https://issues.apache.org/jira/browse/BEAM-11540)
BEAM-11493: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyAndWindows
 (https://issues.apache.org/jira/browse/BEAM-11493)
BEAM-11492: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMergingWindows
 (https://issues.apache.org/jira/browse/BEAM-11492)
BEAM-11491: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMultipleWindows
 (https://issues.apache.org/jira/browse/BEAM-11491)
BEAM-11490: Spark test failure: 
org.apache.beam.sdk.transforms.ReifyTimestampsTest.inValuesSucceeds 
(https://issues.apache.org/jira/browse/BEAM-11490)
BEAM-11489: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-11489)
BEAM-11488: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedCounterMetrics
 (https://issues.apache.org/jira/browse/BEAM-11488)
BEAM-11487: Spark test failure: 
org.apache.beam.sdk.transforms.WithTimestampsTest.withTimestampsShouldApplyTimestamps
 (https://issues.apache.org/jira/browse/BEAM-11487)
BEAM-11486: Spark test failure: 
org.apache.beam.sdk.testing.PAssertTest.testSerializablePredicate 
(https://issues.apache.org/jira/browse/BEAM-11486)
BEAM-11485: Spark test failure: 
org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineNullValues 
(https://issues.apache.org/jira/browse/BEAM-11485)
BEAM-11484: Spark test failure: 
org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics 
(https://issues.apache.org/jira/browse/BEAM-11484)
BEAM-11483: Spark portable streaming PostCommit Test Improvements 
(https://issues.apache.org/jira/browse/BEAM-11483)
BEAM-10995: Java + Universal Local Runner: 
WindowingTest.testWindowPreservation fails 
(https://issues.apache.org/jira/browse/BEAM-10995)
BEAM-10987: stager_test.py::StagerTest::test_with_main_session flaky on 
windows py3.6,3.7 (https://issues.apache.org/jira/browse/BEAM-10987)
BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-10968)
BEAM-10955: Flink Java Runner test flake: Could not find Flink job  
(https://issues.apache.org/jira/browse/BEAM-10955)
BEAM-10923: Python requirements installation in docker container is flaky 
(https://issues.apache.org/jira/browse/BEAM-10923)
BEAM-10899: test_FhirIO_exportFhirResourcesGcs flake with OOM 
(https://issues.apache.org/jira/browse/BEAM-10899)
BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations 
flaky on macOS (https://issues.apache.org/jira/browse/BEAM-10866)
BEAM-10763: Spotless flake (NullPointerException) 
(https://issues.apache.org/jira/browse/BEAM-10763)
BEAM-10590: BigQueryQueryToTableIT flaky: test_big_query_new_types 
(https://issues.apache.org/jira/browse/BEAM-10590)
BEAM-10519: 
MultipleInputsAndOutputTests.testParDoWithSi

P1 issues report

2021-04-20 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

BEAM-12195: Flink Runner 1.11 uses old Scala-Version 
(https://issues.apache.org/jira/browse/BEAM-12195)
BEAM-11959: Python Beam SDK Harness hangs when installing pip packages 
(https://issues.apache.org/jira/browse/BEAM-11959)
BEAM-11906: No trigger early repeatedly for session windows 
(https://issues.apache.org/jira/browse/BEAM-11906)
BEAM-11875: XmlIO.Read does not handle XML encoding per spec 
(https://issues.apache.org/jira/browse/BEAM-11875)
BEAM-11828: JmsIO is not acknowledging messages correctly 
(https://issues.apache.org/jira/browse/BEAM-11828)
BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly 
broken (at least on portable flink runner) 
(https://issues.apache.org/jira/browse/BEAM-11755)
BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int 
overflowing?) (https://issues.apache.org/jira/browse/BEAM-11578)
BEAM-11576: Go ValidatesRunner failure: TestFlattenDup on Dataflow Runner 
(https://issues.apache.org/jira/browse/BEAM-11576)
BEAM-11434: Expose Spanner admin/batch clients in Spanner Accessor 
(https://issues.apache.org/jira/browse/BEAM-11434)
BEAM-11227: Upgrade beam-vendor-grpc-1_26_0-0.3 to fix CVE-2020-27216 
(https://issues.apache.org/jira/browse/BEAM-11227)
BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink 
(https://issues.apache.org/jira/browse/BEAM-11148)
BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow 
runner) (https://issues.apache.org/jira/browse/BEAM-11017)
BEAM-10861: Adds URNs and payloads to PubSub transforms 
(https://issues.apache.org/jira/browse/BEAM-10861)
BEAM-10617: python CombineGlobally().with_fanout() cause duplicate combine 
results for sliding windows (https://issues.apache.org/jira/browse/BEAM-10617)
BEAM-10573: CSV files are loaded several times if they are too large 
(https://issues.apache.org/jira/browse/BEAM-10573)
BEAM-10569: SpannerIO tests don't actually assert anything. 
(https://issues.apache.org/jira/browse/BEAM-10569)
BEAM-10288: Quickstart documents are out of date 
(https://issues.apache.org/jira/browse/BEAM-10288)
BEAM-10244: Populate requirements cache fails on poetry-based packages 
(https://issues.apache.org/jira/browse/BEAM-10244)
BEAM-10100: FileIO writeDynamic with AvroIO.sink not writing all data 
(https://issues.apache.org/jira/browse/BEAM-10100)
BEAM-9564: Remove insecure ssl options from MongoDBIO 
(https://issues.apache.org/jira/browse/BEAM-9564)
BEAM-9455: Environment-sensitive provisioning for Dataflow 
(https://issues.apache.org/jira/browse/BEAM-9455)
BEAM-9293: Python direct runner doesn't emit empty pane when it should 
(https://issues.apache.org/jira/browse/BEAM-9293)
BEAM-8986: SortValues may not work correct for numerical types 
(https://issues.apache.org/jira/browse/BEAM-8986)
BEAM-8985: SortValues should fail if SecondaryKey coder is not 
deterministic (https://issues.apache.org/jira/browse/BEAM-8985)
BEAM-8407: [SQL] Some Hive tests throw NullPointerException, but get marked 
as passing (Direct Runner) (https://issues.apache.org/jira/browse/BEAM-8407)
BEAM-7717: PubsubIO watermark tracking hovers near start of epoch 
(https://issues.apache.org/jira/browse/BEAM-7717)
BEAM-7716: PubsubIO returns empty message bodies for all messages read 
(https://issues.apache.org/jira/browse/BEAM-7716)
BEAM-7195: BigQuery - 404 errors for 'table not found' when using dynamic 
destinations - sometimes, new table fails to get created 
(https://issues.apache.org/jira/browse/BEAM-7195)
BEAM-6839: User reports protobuf ClassChangeError running against 2.6.0 or 
above (https://issues.apache.org/jira/browse/BEAM-6839)
BEAM-6466: KafkaIO doesn't commit offsets while being used as bounded 
source (https://issues.apache.org/jira/browse/BEAM-6466)


Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-20 Thread Kenneth Knowles
On Tue, Apr 20, 2021 at 3:24 PM Robert Bradshaw  wrote:

> The artifacts and signatures look good to me. +1 (binding)
>
> (The release branch still has the .dev name, maybe you didn't push?
> https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py
> )
>

Good point. I'll highlight that I finally implemented the branching changes
from
https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E

The new guide with diagram is here:
https://beam.apache.org/contribute/release-guide/#tag-a-chosen-commit-for-the-rc

TL;DR:
 - the release branch continues to be dev/SNAPSHOT for 2.29.0 while the
main branch is now dev/SNAPSHOT for 2.30.0
 - the RC tag v2.29.0-RC1 no longer lies on the release branch. It is a
single tagged commit that removes the dev/SNAPSHOT suffix

Kenn


> On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles  wrote:
>
>> Please take another look.
>>
>>  - I re-ran the RC creation script so the source release and wheels are
>> new and built from the RC tag. I confirmed the source zip and wheels have
>> version 2.29.0 (not .dev or -SNAPSHOT).
>>  - I fixed and rebuilt Dataflow worker container images from exactly the
>> RC commit, added dataclasses, with internal changes to get the version to
>> match.
>>  - I confirmed that the staged jars already have version 2.29.0 (not
>> -SNAPSHOT).
>>  - I confirmed with `diff -r -q` that the source tarball matches the RC
>> tag (minus the .git* files and directories and gradlew)
>>
>> Kenn
>>
>> On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles  wrote:
>>
>>> At this point, the release train has just about come around to 2.30.0
>>> which will pick up that change. I don't think it makes sense to cherry-pick
>>> anything more into 2.29.0 unless it is nonfunctional. As it is, I think we
>>> have a good commit and just need to build the expected artifacts. Since it
>>> isn't all the artifacts, I was planning on just overwriting the RC1
>>> artifacts in question and re-verify. I could also roll a new RC2 from the
>>> same commit fairly easily.
>>>
>>> Kenn
>>>
>>> On Mon, Apr 19, 2021 at 8:57 PM Reuven Lax  wrote:
>>>
 Any chance we could include https://github.com/apache/beam/pull/14548?

 On Mon, Apr 19, 2021 at 8:54 PM Kenneth Knowles 
 wrote:

> To clarify: I am running and fixing the release scripts on the
> `master` branch. They work from fresh clones of the RC tag so this should
> work in most cases. The exception is the GitHub Actions configuration,
> which I cherrypicked
> to the release branch.
>
> Kenn
>
> On Mon, Apr 19, 2021 at 8:34 PM Kenneth Knowles 
> wrote:
>
>> OK it sounds like I need to re-roll the artifacts in question. I
>> don't think anything raised here indicates a problem with the tagged
>> commit, but with the state of the release scripts at the time I built the
>> earlier artifacts.
>>
>> On Mon, Apr 19, 2021 at 1:03 PM Robert Bradshaw 
>> wrote:
>>
>>> It looks like the wheels are also versioned "2.29.0.dev".
>>>
>>> Not sure if it's important, but the source tarball also seems to
>>> contain some release script changes that are not reflected in the github
>>> branch.
>>>
>>> On Mon, Apr 19, 2021 at 8:41 AM Kenneth Knowles 
>>> wrote:
>>>
 Thanks for the details, Valentyn & Cham. I will fix the Dataflow
 worker containers then update this thread.

 Kenn

 On Mon, Apr 19, 2021 at 8:36 AM Kenneth Knowles 
 wrote:

>
>
> On Fri, Apr 16, 2021 at 3:42 AM Elliotte Rusty Harold <
> elh...@ibiblio.org> wrote:
>
>> On Fri, Apr 16, 2021 at 4:02 AM Kenneth Knowles 
>> wrote:
>>
>> > The complete staging area is available for your review, which
>> includes:
>> > * JIRA release notes [1],
>> > * the official Apache source release to be deployed to
>> dist.apache.org [2], which is signed with the key with
>> fingerprint 03DBA3E6ABDD04BFD1558DC16ED551A8AE02461C [3],
>> > * all artifacts to be deployed to the Maven Central Repository
>> [4],
>> > * source code tag "v2.29.0-RC1" [5],
>> > * website pull request listing the release [6], publishing the
>> API reference manual [7], and the blog post [8].
>> > * Java artifacts were built with Maven MAVEN_VERSION and
>> OpenJDK/Oracle JDK JDK_VERSION.
>>
>> Are the MAVEN_VERSION and OpenJDK/Oracle JDK JDK_VERSION supposed
>> to
>> be filled in with numbers?
>>
>
> Yes, I missed that these were variables to be replaced.
>
> JDK_VERSION=8u181 (1.8) and the Gradle version is taken from the
> gradlew config so no need to include in the template, but it is 6.8
>
> Kenn
>>

Re: Naming! Dataflow Worker/SDK "Harness" image flag

2021-04-20 Thread Kenneth Knowles
+1 to dropping "harness". Even if it still occurs in code, we can remove it
from the user interface and that is a Good Thing.

Kenn

On Mon, Apr 19, 2021 at 3:42 PM Robert Burke  wrote:

> +1 to shorter flags without unnecessary words
>
> On Mon, Apr 19, 2021, 3:19 PM Robert Bradshaw  wrote:
>
>> I commented on the doc, but I'm also in favor of dropping "harness."
>>
>> On Mon, Apr 19, 2021 at 3:10 PM Tyson Hamilton 
>> wrote:
>>
>>> I'm in favor of dropping "harness" and going with "sdk_container_image".
>>> I don't feel like the word "harness" adds value or clarity.
>>>
>>> On Mon, Apr 19, 2021 at 11:34 AM Emily Ye  wrote:
>>>
 Hi all,

 *tl;dr*: keep harness in user-facing container image flag names?

 I have a few PRs in progress for "renaming" the
 workerHarnessContainerImage flag, i.e. adding a new flag and marking the
 old flag as deprecated. This is being done to better reflect the Portable
 framework. I wanted to create a flag with the same usage (i.e. passing in
 a single image) - see the proposal if you are curious [1].

 Python: https://github.com/apache/beam/pull/14575
 Java: https://github.com/apache/beam/pull/14557

 The names we are choosing between right now are:

- --sdk_container_image
   - "Harness" doesn't mean anything to most users and is already a
   confusing term; it mostly got carried over from legacy image names,
   where, as far as I can tell, we added "harness" to indicate the image
   started the SDK process / was different from (VM) worker images
   - The portable runner currently uses "docker_container_image" for the
   --environment_type=DOCKER --environment_config key
- --sdk_harness_container_image
   - "Harness" is baked into a bunch of different places (the other flag
   --sdk_harness_container_image_overrides provides multiple image
   overrides, e.g. for xlang; Dataflow API objects refer to
   workerHarnessContainerImage/sdkHarnessContainerImages)

 Right now the PRs are using sdk_container_image and we reached a small
 consensus about this in the proposal doc, but I wanted to see how strongly
 (within a reasonable time frame) people felt we should keep harness for
 consistency's sake. As mentioned on the Python PR, we can also alias the
 other flag to not have harness in the name, but the Dataflow API still
 refers to harness objects.

 [1] go/beam-sdk-container-image-flag
 


 Thanks!
 -Emily







Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-20 Thread Robert Bradshaw
The artifacts and signatures look good to me. +1 (binding)

(The release branch still has the .dev name, maybe you didn't push?
https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py
)

On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles  wrote:

> Please take another look.
>
>  - I re-ran the RC creation script so the source release and wheels are
> new and built from the RC tag. I confirmed the source zip and wheels have
> version 2.29.0 (not .dev or -SNAPSHOT).
>  - I fixed and rebuilt Dataflow worker container images from exactly the
> RC commit, added dataclasses, with internal changes to get the version to
> match.
>  - I confirmed that the staged jars already have version 2.29.0 (not
> -SNAPSHOT).
>  - I confirmed with `diff -r -q` that the source tarball matches the RC
> tag (minus the .git* files and directories and gradlew)
>
> Kenn
>
> On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles  wrote:
>
>> At this point, the release train has just about come around to 2.30.0
>> which will pick up that change. I don't think it makes sense to cherry-pick
>> anything more into 2.29.0 unless it is nonfunctional. As it is, I think we
>> have a good commit and just need to build the expected artifacts. Since it
>> isn't all the artifacts, I was planning on just overwriting the RC1
>> artifacts in question and re-verify. I could also roll a new RC2 from the
>> same commit fairly easily.
>>
>> Kenn
>>
>> On Mon, Apr 19, 2021 at 8:57 PM Reuven Lax  wrote:
>>
>>> Any chance we could include https://github.com/apache/beam/pull/14548?
>>>
>>> On Mon, Apr 19, 2021 at 8:54 PM Kenneth Knowles  wrote:
>>>
 To clarify: I am running and fixing the release scripts on the `master`
 branch. They work from fresh clones of the RC tag so this should work in
 most cases. The exception is the GitHub Actions configuration, which I
 cherry-picked
 to the release branch.

 Kenn

 On Mon, Apr 19, 2021 at 8:34 PM Kenneth Knowles 
 wrote:

> OK it sounds like I need to re-roll the artifacts in question. I don't
> think anything raised here indicates a problem with the tagged commit, but
> with the state of the release scripts at the time I built the earlier
> artifacts.
>
> On Mon, Apr 19, 2021 at 1:03 PM Robert Bradshaw 
> wrote:
>
>> It looks like the wheels are also versioned "2.29.0.dev".
>>
>> Not sure if it's important, but the source tarball also seems to
>> contain some release script changes that are not reflected in the github
>> branch.
>>
>> On Mon, Apr 19, 2021 at 8:41 AM Kenneth Knowles 
>> wrote:
>>
>>> Thanks for the details, Valentyn & Cham. I will fix the Dataflow
>>> worker containers then update this thread.
>>>
>>> Kenn
>>>
>>> On Mon, Apr 19, 2021 at 8:36 AM Kenneth Knowles 
>>> wrote:
>>>


 On Fri, Apr 16, 2021 at 3:42 AM Elliotte Rusty Harold <
 elh...@ibiblio.org> wrote:

> On Fri, Apr 16, 2021 at 4:02 AM Kenneth Knowles 
> wrote:
>
> > The complete staging area is available for your review, which
> includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to
> dist.apache.org [2], which is signed with the key with
> fingerprint 03DBA3E6ABDD04BFD1558DC16ED551A8AE02461C [3],
> > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > * source code tag "v2.29.0-RC1" [5],
> > * website pull request listing the release [6], publishing the
> API reference manual [7], and the blog post [8].
> > * Java artifacts were built with Maven MAVEN_VERSION and
> OpenJDK/Oracle JDK JDK_VERSION.
>
> Are the MAVEN_VERSION and OpenJDK/Oracle JDK JDK_VERSION supposed
> to
> be filled in with numbers?
>

 Yes, I missed that these were variables to be replaced.

 JDK_VERSION=8u181 (1.8) and the Gradle version is taken from the
 gradlew config so no need to include in the template, but it is 6.8

 Kenn


>
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>



Re: Error comparing flat schema against schema inferred from protoclass

2021-04-20 Thread Robert Burke
It looks like it doesn't consider TYPE to match TYPE NOT NULL. I don't know
how Beam Java handles that, but I'd guess you'd need to annotate the
fields somehow to make them match.

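For illustration, a minimal standalone sketch of that distinction in
Beam's Schema API (not the code path the test actually uses):

```
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;

public class NullabilityExample {
  public static void main(String[] args) {
    // Proto-inferred schemas mark every scalar field NOT NULL.
    Schema protoLike =
        Schema.builder().addField("knowsJavascript", FieldType.BOOLEAN).build();
    // A field declared nullable has a different type:
    // BOOLEAN vs BOOLEAN NOT NULL.
    Schema declared =
        Schema.builder()
            .addNullableField("knowsJavascript", FieldType.BOOLEAN)
            .build();
    // Nullability is part of the field type, so these schemas do not match.
    System.out.println(protoLike.equals(declared)); // false
  }
}
```
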
On Tue, Apr 20, 2021, 12:24 PM Fernando Morales Martinez <
fernando.mora...@wizeline.com> wrote:

> sure thing!
> This is the error from method *testSQLInsertRowsToPubsubFlat*
>
> Given message schema: 'Fields:
> Field{name=name, description=, type=STRING, options={{}}}
> Field{name=height, description=, type=INT32, options={{}}}
> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
> Options:{{}}'
> does not match schema inferred from protobuf class.
> Protobuf class:
> 'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
> Inferred schema: 'Fields:
> Field{name=name, description=, type=STRING NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=1
> Field{name=height, description=, type=INT32 NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=2
> Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=3
> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
> value=NameHeightKnowsJSMessage}}}'
>
> And this is the message from
> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*.
>
> Given message schema: 'Fields:
> Field{name=name, description=, type=STRING, options={{}}}
> Field{name=height, description=, type=INT32, options={{}}}
> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
> Options:{{}}'
> does not match schema inferred from protobuf class.
> Protobuf class:
> 'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
> Inferred schema: 'Fields:
> Field{name=name, description=, type=STRING NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=1
> Field{name=height, description=, type=INT32 NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=2
> Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=3
> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
> value=NameHeightKnowsJSMessage}}}'
>
> Maybe the proto field *knowsJs* should be called *knowsJavascript*?
>
>
> On Tue, Apr 20, 2021 at 12:26 PM Daniel Collins 
> wrote:
>
>> The error includes "NameHeightMessage". Can you provide an example error
>> from one of the other methods?
>>
>> On Tue, Apr 20, 2021 at 1:56 PM Fernando Morales Martinez <
>> fernando.mora...@wizeline.com> wrote:
>>
>>> Thanks for the heads up, Daniel!
>>>
>>> I missed changing that one, but even after making the change, I'm
>>> getting the same error.
>>>
>>> The other two methods, *testSQLInsertRowsToPubsubFlat* and
>>> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*, were already
>>> using NameHeightKnowsJSMessage class but are still throwing the same error.
>>>
>>> Any idea what else might be going on?
>>>
>>> On Tue, Apr 20, 2021 at 11:13 AM Daniel Collins 
>>> wrote:
>>>
 Thanks for working on this! It looks to me like the schemas don't
 match: you appear to be using NameHeightMessage defined as:

 ```
 message NameHeightMessage {
   string name = 1;
   int32 height = 2;
 }
 ```

 And expecting it to work with a table schema that has a "BOOL
 knowsJavascript" field. Did you mean to use the "NameHeightKnowsJSMessage"
 class?

 -Daniel

 On Tue, Apr 20, 2021 at 1:02 PM Fernando Morales Martinez <
 fernando.mora...@wizeline.com> wrote:

> Sorry for the spam, forgot to add the pertinent link to the code
> change.
>
>
> https://github.com/fernando-wizeline/beam/commit/abc17db41b6aabf3f337c7742526e5ae9655f40b
>
> Thanks!
>
> On Tue, Apr 20, 2021 at 10:17 AM Fernando Morales Martinez <
> fernando.mora...@wizeline.com> wrote:
>
>> Hi team,
>>
>> I'm working on adding tests to PubsubTableProviderIT class to test
>> the proto support added to Pubsub.
>>
>> The issue below occurs when running
>> *testSQLReadAndWriteWithSameFlatTableDefinition*,
>> *testSQLInsertRowsToPubsubFlat* and
>> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*
>>  of PubsubTableProviderIT.
>>
>> Right now I'm facing an issue when executing method
>> inferAndVerifySchema of class ProtoPayloadSerializerProvider. The 
>> expected
>> schema is set as
>>
>> 'Fields:
>> Field{name=name, description=, type=STRING, options={{}}}
>> Field{name=height, description=, type=INT32, options={{}}}
>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>> Options:{{}}'
>>
>> and the schema obtained from the protoclass is:
>>
>> 'Fields:
>> 

Re: Error comparing flat schema against schema inferred from protoclass

2021-04-20 Thread Fernando Morales Martinez
sure thing!
This is the error from method *testSQLInsertRowsToPubsubFlat*

Given message schema: 'Fields:
Field{name=name, description=, type=STRING, options={{}}}
Field{name=height, description=, type=INT32, options={{}}}
Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
Options:{{}}'
does not match schema inferred from protobuf class.
Protobuf class:
'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
Inferred schema: 'Fields:
Field{name=name, description=, type=STRING NOT NULL,
options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
value=1
Field{name=height, description=, type=INT32 NOT NULL,
options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
value=2
Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
value=3
Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
value=NameHeightKnowsJSMessage}}}'

And this is the message from
*testSQLInsertRowsToPubsubWithTimestampAttributeFlat*.

Given message schema: 'Fields:
Field{name=name, description=, type=STRING, options={{}}}
Field{name=height, description=, type=INT32, options={{}}}
Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
Options:{{}}'
does not match schema inferred from protobuf class.
Protobuf class:
'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
Inferred schema: 'Fields:
Field{name=name, description=, type=STRING NOT NULL,
options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
value=1
Field{name=height, description=, type=INT32 NOT NULL,
options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
value=2
Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
value=3
Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
value=NameHeightKnowsJSMessage}}}'

Maybe the proto field *knowsJs* should be called *knowsJavascript*?


On Tue, Apr 20, 2021 at 12:26 PM Daniel Collins 
wrote:

> The error includes "NameHeightMessage". Can you provide an example error
> from one of the other methods?
>
> On Tue, Apr 20, 2021 at 1:56 PM Fernando Morales Martinez <
> fernando.mora...@wizeline.com> wrote:
>
>> Thanks for the heads up, Daniel!
>>
>> I missed changing that one, but even after making the change, I'm getting
>> the same error.
>>
>> The other two methods, *testSQLInsertRowsToPubsubFlat* and
>> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*, were already
>> using NameHeightKnowsJSMessage class but are still throwing the same error.
>>
>> Any idea what else might be going on?
>>
>> On Tue, Apr 20, 2021 at 11:13 AM Daniel Collins 
>> wrote:
>>
>>> Thanks for working on this! It looks to me like the schemas don't match:
>>> you appear to be using NameHeightMessage defined as:
>>>
>>> ```
>>> message NameHeightMessage {
>>>   string name = 1;
>>>   int32 height = 2;
>>> }
>>> ```
>>>
>>> And expecting it to work with a table schema that has a "BOOL
>>> knowsJavascript" field. Did you mean to use the "NameHeightKnowsJSMessage"
>>> class?
>>>
>>> -Daniel
>>>
>>> On Tue, Apr 20, 2021 at 1:02 PM Fernando Morales Martinez <
>>> fernando.mora...@wizeline.com> wrote:
>>>
 Sorry for the spam, forgot to add the pertinent link to the code change.


 https://github.com/fernando-wizeline/beam/commit/abc17db41b6aabf3f337c7742526e5ae9655f40b

 Thanks!

 On Tue, Apr 20, 2021 at 10:17 AM Fernando Morales Martinez <
 fernando.mora...@wizeline.com> wrote:

> Hi team,
>
> I'm working on adding tests to PubsubTableProviderIT class to test the
> proto support added to Pubsub.
>
> The issue below occurs when running
> *testSQLReadAndWriteWithSameFlatTableDefinition*,
> *testSQLInsertRowsToPubsubFlat* and
> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*
>  of PubsubTableProviderIT.
>
> Right now I'm facing an issue when executing method
> inferAndVerifySchema of class ProtoPayloadSerializerProvider. The expected
> schema is set as
>
> 'Fields:
> Field{name=name, description=, type=STRING, options={{}}}
> Field{name=height, description=, type=INT32, options={{}}}
> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
> Options:{{}}'
>
> and the schema obtained from the protoclass is:
>
> 'Fields:
> Field{name=name, description=, type=STRING NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=1
> Field{name=height, description=, type=INT32 NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=2
> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT
> NULL, value=NameHeightMessage}}}'
> java.lang.IllegalArgumentException: Given message sc

Re: Codecov Bash Uploader Security Notice

2021-04-20 Thread Ahmet Altay
I received the same email too. I think it is related to Beam given that we
all received it. I cannot find any references to the bash uploader anymore
but I found some references from 2016 [1]. It looks like we used it at some
point in the past and maybe that is why we received the notifications. If I
understand correctly, we are not using the bash uploader any more and we do
not need to take any action.

[1]
https://github.com/apache/beam/commit/aed5e276726440cb3cfa04fe6d16985aa7d2fb4f

On Thu, Apr 15, 2021 at 12:59 PM Brian Hulette  wrote:

> I also got this email, it stated "Unfortunately, we can confirm that you
> were impacted by this security event," but it didn't specify _how_ I
> was impacted. I assumed it was through Beam, but perhaps it was through
> Arrow. It looks like they use the Bash uploader [1].
>
> The codecov notice states:
> > The Bash Uploader is also used in these related uploaders:
> Codecov-actions uploader for GitHub, the Codecov CircleCI Orb, and the
> Codecov Bitrise Step (together, the “Bash Uploaders”). Therefore, these
> related uploaders were also impacted by this event.
>
> Which would seem to confirm the Python codecov tool is not impacted.
>
> [1]
> https://github.com/apache/arrow/blob/13c334e976f09d4d896c26d4b5f470e36a46572b/.github/workflows/rust.yml#L337
>
>
>
>
>
> On Thu, Apr 15, 2021 at 12:50 PM Pablo Estrada  wrote:
>
>> I believe that the utility that we use is the Python codecov tool[1], not
>> the bash uploader[2].
>> Specifically, the upload seems to happen in Python here[3].
>>
>> Why do I think we use the Python tool? Because it seems to be installed
>> by tox around the link Udi shared[4]
>>
>> So it seems we're okay?
>>
>>
>> [1] https://github.com/codecov/codecov-python
>> [2] https://docs.codecov.io/docs/about-the-codecov-bash-uploader
>> [3]
>> https://github.com/codecov/codecov-python/blob/158a38eed7fd6f0d2f9c9f4c5258ab1f244b6e13/codecov/__init__.py#L1129-L1157
>> [4]
>> https://github.com/apache/beam/blob/39923d8f843ecfd3d89443dccc359c14aea8f26f/sdks/python/tox.ini#L105
>>
>>
>> On Thu, Apr 15, 2021 at 11:38 AM Udi Meiri  wrote:
>>
>>> From the notice: "We strongly recommend affected users immediately
>>> re-roll all of their credentials, tokens, or keys located in the
>>> environment variables in their CI processes that used one of Codecov’s Bash
>>> Uploaders."
>>>
>>>
>>> On Thu, Apr 15, 2021 at 11:35 AM Udi Meiri  wrote:
>>>
 I got this email: https://about.codecov.io/security-update/

 This is where we use codecov:

 https://github.com/apache/beam/blob/39923d8f843ecfd3d89443dccc359c14aea8f26f/sdks/python/tox.ini#L105

 I'm not sure if this runs the "bash uploader", but we do set
 a CODECOV_TOKEN environment variable.

>>>


Re: Error comparing flat schema against schema inferred from protoclass

2021-04-20 Thread Daniel Collins
The error includes "NameHeightMessage". Can you provide an example error
from one of the other methods?

On Tue, Apr 20, 2021 at 1:56 PM Fernando Morales Martinez <
fernando.mora...@wizeline.com> wrote:

> Thanks for the heads up, Daniel!
>
> I missed changing that one, but even after making the change, I'm getting
> the same error.
>
> The other two methods, *testSQLInsertRowsToPubsubFlat* and
> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*, were already using
> NameHeightKnowsJSMessage class but are still throwing the same error.
>
> Any idea what else might be going on?
>
> On Tue, Apr 20, 2021 at 11:13 AM Daniel Collins 
> wrote:
>
>> Thanks for working on this! It looks to me like the schemas don't match:
>> you appear to be using NameHeightMessage defined as:
>>
>> ```
>> message NameHeightMessage {
>>   string name = 1;
>>   int32 height = 2;
>> }
>> ```
>>
>> And expecting it to work with a table schema that has a "BOOL
>> knowsJavascript" field. Did you mean to use the "NameHeightKnowsJSMessage"
>> class?
>>
>> -Daniel
>>
>> On Tue, Apr 20, 2021 at 1:02 PM Fernando Morales Martinez <
>> fernando.mora...@wizeline.com> wrote:
>>
>>> Sorry for the spam, forgot to add the pertinent link to the code change.
>>>
>>>
>>> https://github.com/fernando-wizeline/beam/commit/abc17db41b6aabf3f337c7742526e5ae9655f40b
>>>
>>> Thanks!
>>>
>>> On Tue, Apr 20, 2021 at 10:17 AM Fernando Morales Martinez <
>>> fernando.mora...@wizeline.com> wrote:
>>>
 Hi team,

 I'm working on adding tests to PubsubTableProviderIT class to test the
 proto support added to Pubsub.

 The issue below occurs when running
 *testSQLReadAndWriteWithSameFlatTableDefinition*,
 *testSQLInsertRowsToPubsubFlat* and
 *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*
  of PubsubTableProviderIT.

 Right now I'm facing an issue when executing method
 inferAndVerifySchema of class ProtoPayloadSerializerProvider. The expected
 schema is set as

 'Fields:
 Field{name=name, description=, type=STRING, options={{}}}
 Field{name=height, description=, type=INT32, options={{}}}
 Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
 Options:{{}}'

 and the schema obtained from the protoclass is:

 'Fields:
 Field{name=name, description=, type=STRING NOT NULL,
 options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
 value=1
 Field{name=height, description=, type=INT32 NOT NULL,
 options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
 value=2
 Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
 value=NameHeightMessage}}}'
 java.lang.IllegalArgumentException: Given message schema: 'Fields:
 Field{name=name, description=, type=STRING, options={{}}}
 Field{name=height, description=, type=INT32, options={{}}}
 Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
 Options:{{}}'

 I'm guessing, by the name of the tests, the idea is to compare the
 payload (protoclass in this case) against the flat schema, but then the
 validation in inferAndVerifySchema fails.

 Should I be looking for a workaround for that and if so, can you shed
 some light on how to proceed?

 Thanks for the help!
 - Fernando Morales


>>
>>
>


P0 (outage) report

2021-04-20 Thread Beam Jira Bot
This is your daily summary of Beam's current outages. See 
https://beam.apache.org/contribute/jira-priorities/#p0-outage for the meaning 
and expectations around P0 issues.

BEAM-12196: Apache Beam Kafka Source Connector Idle Partition Issue with 
“CustomTimeStampPolicyWithLimitedDelay” 
(https://issues.apache.org/jira/browse/BEAM-12196)


Re: Error comparing flat schema against schema inferred from protoclass

2021-04-20 Thread Fernando Morales Martinez
Thanks for the heads up, Daniel!

I missed changing that one, but even after making the change, I'm getting
the same error.

The other two methods, *testSQLInsertRowsToPubsubFlat* and
*testSQLInsertRowsToPubsubWithTimestampAttributeFlat*, were already using
NameHeightKnowsJSMessage class but are still throwing the same error.

Any idea what else might be going on?

On Tue, Apr 20, 2021 at 11:13 AM Daniel Collins 
wrote:

> Thanks for working on this! It looks to me like the schemas don't match:
> you appear to be using NameHeightMessage defined as:
>
> ```
> message NameHeightMessage {
>   string name = 1;
>   int32 height = 2;
> }
> ```
>
> And expecting it to work with a table schema that has a "BOOL
> knowsJavascript" field. Did you mean to use the "NameHeightKnowsJSMessage"
> class?
>
> -Daniel
>
> On Tue, Apr 20, 2021 at 1:02 PM Fernando Morales Martinez <
> fernando.mora...@wizeline.com> wrote:
>
>> Sorry for the spam, forgot to add the pertinent link to the code change.
>>
>>
>> https://github.com/fernando-wizeline/beam/commit/abc17db41b6aabf3f337c7742526e5ae9655f40b
>>
>> Thanks!
>>
>> On Tue, Apr 20, 2021 at 10:17 AM Fernando Morales Martinez <
>> fernando.mora...@wizeline.com> wrote:
>>
>>> Hi team,
>>>
>>> I'm working on adding tests to PubsubTableProviderIT class to test the
>>> proto support added to Pubsub.
>>>
>>> The issue below occurs when running
>>> *testSQLReadAndWriteWithSameFlatTableDefinition*,
>>> *testSQLInsertRowsToPubsubFlat* and
>>> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*
>>>  of PubsubTableProviderIT.
>>>
>>> Right now I'm facing an issue when executing method inferAndVerifySchema
>>> of class ProtoPayloadSerializerProvider. The expected schema is set as
>>>
>>> 'Fields:
>>> Field{name=name, description=, type=STRING, options={{}}}
>>> Field{name=height, description=, type=INT32, options={{}}}
>>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>>> Options:{{}}'
>>>
>>> and the schema obtained from the protoclass is:
>>>
>>> 'Fields:
>>> Field{name=name, description=, type=STRING NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=1
>>> Field{name=height, description=, type=INT32 NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=2
>>> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
>>> value=NameHeightMessage}}}'
>>> java.lang.IllegalArgumentException: Given message schema: 'Fields:
>>> Field{name=name, description=, type=STRING, options={{}}}
>>> Field{name=height, description=, type=INT32, options={{}}}
>>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>>> Options:{{}}'
>>>
>>> I'm guessing, by the name of the tests, the idea is to compare the
>>> payload (protoclass in this case) against the flat schema, but then the
>>> validation in inferAndVerifySchema fails.
>>>
>>> Should I be looking for a workaround for that and if so, can you shed
>>> some light on how to proceed?
>>>
>>> Thanks for the help!
>>> - Fernando Morales
>>>
>>>
>>
>
>



Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-20 Thread Kenneth Knowles
Please take another look.

 - I re-ran the RC creation script so the source release and wheels are new
and built from the RC tag. I confirmed the source zip and wheels have
version 2.29.0 (not .dev or -SNAPSHOT).
 - I fixed and rebuilt Dataflow worker container images from exactly the RC
commit, added dataclasses, with internal changes to get the version to
match.
 - I confirmed that the staged jars already have version 2.29.0 (not
-SNAPSHOT).
 - I confirmed with `diff -r -q` that the source tarball matches the RC tag
(minus the .git* files and directories and gradlew)

Kenn

On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles  wrote:

> At this point, the release train has just about come around to 2.30.0
> which will pick up that change. I don't think it makes sense to cherry-pick
> anything more into 2.29.0 unless it is nonfunctional. As it is, I think we
> have a good commit and just need to build the expected artifacts. Since it
> isn't all the artifacts, I was planning on just overwriting the RC1
> artifacts in question and re-verify. I could also roll a new RC2 from the
> same commit fairly easily.
>
> Kenn
>
> On Mon, Apr 19, 2021 at 8:57 PM Reuven Lax  wrote:
>
>> Any chance we could include https://github.com/apache/beam/pull/14548?
>>
>> On Mon, Apr 19, 2021 at 8:54 PM Kenneth Knowles  wrote:
>>
>>> To clarify: I am running and fixing the release scripts on the `master`
>>> branch. They work from fresh clones of the RC tag so this should work in
>>> most cases. The exception is the GitHub Actions configuration, which I
>>> cherry-picked to the release branch.
>>>
>>> Kenn
>>>
>>> On Mon, Apr 19, 2021 at 8:34 PM Kenneth Knowles  wrote:
>>>
OK, it sounds like I need to re-roll the artifacts in question. I don't
think anything raised here indicates a problem with the tagged commit, but
rather with the state of the release scripts at the time I built the
earlier artifacts.

 On Mon, Apr 19, 2021 at 1:03 PM Robert Bradshaw 
 wrote:

> It looks like the wheels are also versioned "2.29.0.dev".
>
> Not sure if it's important, but the source tarball also seems to
> contain some release script changes that are not reflected in the GitHub
> branch.
>
> On Mon, Apr 19, 2021 at 8:41 AM Kenneth Knowles 
> wrote:
>
>> Thanks for the details, Valentyn & Cham. I will fix the Dataflow
>> worker containers then update this thread.
>>
>> Kenn
>>
>> On Mon, Apr 19, 2021 at 8:36 AM Kenneth Knowles 
>> wrote:
>>
>>>
>>>
>>> On Fri, Apr 16, 2021 at 3:42 AM Elliotte Rusty Harold <
>>> elh...@ibiblio.org> wrote:
>>>
 On Fri, Apr 16, 2021 at 4:02 AM Kenneth Knowles 
 wrote:

 > The complete staging area is available for your review, which
 includes:
 > * JIRA release notes [1],
 > * the official Apache source release to be deployed to
 dist.apache.org [2], which is signed with the key with fingerprint
 03DBA3E6ABDD04BFD1558DC16ED551A8AE02461C [3],
 > * all artifacts to be deployed to the Maven Central Repository
 [4],
 > * source code tag "v2.29.0-RC1" [5],
 > * website pull request listing the release [6], publishing the
 API reference manual [7], and the blog post [8].
 > * Java artifacts were built with Maven MAVEN_VERSION and
 OpenJDK/Oracle JDK JDK_VERSION.

 Are the MAVEN_VERSION and OpenJDK/Oracle JDK JDK_VERSION supposed to
 be filled in with numbers?

>>>
>>> Yes, I missed that these were variables to be replaced.
>>>
>>> JDK_VERSION=8u181 (1.8). Beam builds with Gradle rather than Maven; the
>>> Gradle version is taken from the gradlew config, so there is no need to
>>> include it in the template, but for the record it is 6.8.
>>>
>>> Kenn
>>>
>>>


 --
 Elliotte Rusty Harold
 elh...@ibiblio.org

>>>


Re: Error comparing flat schema against schema inferred from protoclass

2021-04-20 Thread Daniel Collins
Thanks for working on this! It looks to me like the schemas don't match:
you appear to be using NameHeightMessage defined as:

```
message NameHeightMessage {
  string name = 1;
  int32 height = 2;
}
```

And expecting it to work with a table schema that has a "BOOL
knowsJavascript" field. Did you mean to use the "NameHeightKnowsJSMessage"
class?
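
If so, a definition along these lines would line up with the three-field
table schema (a sketch only; the actual field number for knowsJavascript is
an assumption, so check the .proto in your branch):

```
message NameHeightKnowsJSMessage {
  string name = 1;
  int32 height = 2;
  bool knowsJavascript = 3;  // field number assumed
}
```

Note that the schema your dump infers from the protoclass only has name and
height, which is exactly what NameHeightMessage would produce.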

-Daniel

On Tue, Apr 20, 2021 at 1:02 PM Fernando Morales Martinez <
fernando.mora...@wizeline.com> wrote:

> Sorry for the spam; I forgot to add the pertinent link to the code change.
>
>
> https://github.com/fernando-wizeline/beam/commit/abc17db41b6aabf3f337c7742526e5ae9655f40b
>
> Thanks!
>
> On Tue, Apr 20, 2021 at 10:17 AM Fernando Morales Martinez <
> fernando.mora...@wizeline.com> wrote:
>
>> Hi team,
>>
>> I'm working on adding tests to the PubsubTableProviderIT class to exercise
>> the proto support added to Pubsub.
>>
>> The issue below occurs when running
>> *testSQLReadAndWriteWithSameFlatTableDefinition*,
>> *testSQLInsertRowsToPubsubFlat* and
>> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*
>>  of PubsubTableProviderIT.
>>
>> Right now I'm facing an issue when executing the inferAndVerifySchema
>> method of ProtoPayloadSerializerProvider. The expected schema is set to
>>
>> 'Fields:
>> Field{name=name, description=, type=STRING, options={{}}}
>> Field{name=height, description=, type=INT32, options={{}}}
>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>> Options:{{}}'
>>
>> and the schema obtained from the protoclass is:
>>
>> 'Fields:
>> Field{name=name, description=, type=STRING NOT NULL,
>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>> value=1}}}}
>> Field{name=height, description=, type=INT32 NOT NULL,
>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>> value=2}}}}
>> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
>> value=NameHeightMessage}}}'
>> java.lang.IllegalArgumentException: Given message schema: 'Fields:
>> Field{name=name, description=, type=STRING, options={{}}}
>> Field{name=height, description=, type=INT32, options={{}}}
>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>> Options:{{}}'
>>
>> I'm guessing, from the names of the tests, that the idea is to compare
>> the payload (the protoclass in this case) against the flat schema, but
>> then the validation in inferAndVerifySchema fails.
>>
>> Should I be looking for a workaround for that, and if so, can you shed
>> some light on how to proceed?
>>
>> Thanks for the help!
>> - Fernando Morales
>>
>>


Re: Error comparing flat schema against schema inferred from protoclass

2021-04-20 Thread Fernando Morales Martinez
Sorry for the spam; I forgot to add the pertinent link to the code change.

https://github.com/fernando-wizeline/beam/commit/abc17db41b6aabf3f337c7742526e5ae9655f40b

Thanks!

On Tue, Apr 20, 2021 at 10:17 AM Fernando Morales Martinez <
fernando.mora...@wizeline.com> wrote:

> Hi team,
>
> I'm working on adding tests to the PubsubTableProviderIT class to exercise
> the proto support added to Pubsub.
>
> The issue below occurs when running
> *testSQLReadAndWriteWithSameFlatTableDefinition*,
> *testSQLInsertRowsToPubsubFlat* and
> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*
>  of PubsubTableProviderIT.
>
> Right now I'm facing an issue when executing the inferAndVerifySchema
> method of ProtoPayloadSerializerProvider. The expected schema is set to
>
> 'Fields:
> Field{name=name, description=, type=STRING, options={{}}}
> Field{name=height, description=, type=INT32, options={{}}}
> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
> Options:{{}}'
>
> and the schema obtained from the protoclass is:
>
> 'Fields:
> Field{name=name, description=, type=STRING NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=1}}}}
> Field{name=height, description=, type=INT32 NOT NULL,
> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
> value=2}}}}
> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
> value=NameHeightMessage}}}'
> java.lang.IllegalArgumentException: Given message schema: 'Fields:
> Field{name=name, description=, type=STRING, options={{}}}
> Field{name=height, description=, type=INT32, options={{}}}
> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
> Options:{{}}'
>
> I'm guessing, from the names of the tests, that the idea is to compare the
> payload (the protoclass in this case) against the flat schema, but then
> the validation in inferAndVerifySchema fails.
>
> Should I be looking for a workaround for that, and if so, can you shed
> some light on how to proceed?
>
> Thanks for the help!
> - Fernando Morales
>
>



Error comparing flat schema against schema inferred from protoclass

2021-04-20 Thread Fernando Morales Martinez
Hi team,

I'm working on adding tests to the PubsubTableProviderIT class to exercise
the proto support added to Pubsub.

The issue below occurs when running
*testSQLReadAndWriteWithSameFlatTableDefinition*,
*testSQLInsertRowsToPubsubFlat* and
*testSQLInsertRowsToPubsubWithTimestampAttributeFlat*
 of PubsubTableProviderIT.

Right now I'm facing an issue when executing the inferAndVerifySchema
method of ProtoPayloadSerializerProvider. The expected schema is set to

'Fields:
Field{name=name, description=, type=STRING, options={{}}}
Field{name=height, description=, type=INT32, options={{}}}
Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
Options:{{}}'

and the schema obtained from the protoclass is:

'Fields:
Field{name=name, description=, type=STRING NOT NULL,
options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
value=1}}}}
Field{name=height, description=, type=INT32 NOT NULL,
options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
value=2}}}}
Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
value=NameHeightMessage}}}'
java.lang.IllegalArgumentException: Given message schema: 'Fields:
Field{name=name, description=, type=STRING, options={{}}}
Field{name=height, description=, type=INT32, options={{}}}
Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
Options:{{}}'

I'm guessing, from the names of the tests, that the idea is to compare the
payload (the protoclass in this case) against the flat schema, but then the
validation in inferAndVerifySchema fails.
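
To make the mismatch concrete, this is roughly how the two sides compare
when rebuilt with Beam's Schema builder (a sketch based on the dumps above,
not the code the test actually runs; the inferred side also carries the
beam:option:proto:meta:* options, which I've omitted here):

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;

public class SchemaMismatchSketch {
  public static void main(String[] args) {
    // The flat table schema from the test definition (fields nullable):
    Schema flat =
        Schema.builder()
            .addNullableField("name", FieldType.STRING)
            .addNullableField("height", FieldType.INT32)
            .addNullableField("knowsJavascript", FieldType.BOOLEAN)
            .build();

    // The shape inferred from NameHeightMessage: proto3 scalars come out
    // NOT NULL, and knowsJavascript is absent entirely:
    Schema inferred =
        Schema.builder()
            .addField("name", FieldType.STRING)
            .addField("height", FieldType.INT32)
            .build();

    // Fails on the missing field, and on nullability/options for the rest:
    System.out.println(flat.equals(inferred)); // prints: false
  }
}
```

So even if knowsJavascript were dropped from the table definition, the
nullability and options differences alone would still make the two schemas
differ, if the check amounts to a strict equals() comparison.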

Should I be looking for a workaround for that, and if so, can you shed some
light on how to proceed?

Thanks for the help!
- Fernando Morales
