Flaky test issue report

2021-04-06 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests. These are P1 issues 
because they have a major negative impact on the community and make it hard to 
determine the quality of the software.

BEAM-12096: Flake: test_progress_in_HTML_JS_when_in_notebook 
(https://issues.apache.org/jira/browse/BEAM-12096)
BEAM-12061: beam_PostCommit_SQL failing on 
KafkaTableProviderIT.testFakeNested 
(https://issues.apache.org/jira/browse/BEAM-12061)
BEAM-12020: :sdks:java:container:java8:docker failing missing licenses 
(https://issues.apache.org/jira/browse/BEAM-12020)
BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (https://issues.apache.org/jira/browse/BEAM-12019)
BEAM-11792: Python precommit failed (flaked?) installing package  
(https://issues.apache.org/jira/browse/BEAM-11792)
BEAM-11733: [beam_PostCommit_Java] [testFhirIO_Import|export] flaky 
(https://issues.apache.org/jira/browse/BEAM-11733)
BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (https://issues.apache.org/jira/browse/BEAM-11666)
BEAM-11662: elasticsearch tests failing 
(https://issues.apache.org/jira/browse/BEAM-11662)
BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) 
(https://issues.apache.org/jira/browse/BEAM-11661)
BEAM-11646: beam_PostCommit_XVR_Spark failing 
(https://issues.apache.org/jira/browse/BEAM-11646)
BEAM-11645: beam_PostCommit_XVR_Flink failing 
(https://issues.apache.org/jira/browse/BEAM-11645)
BEAM-11541: testTeardownCalledAfterExceptionInProcessElement flakes on 
direct runner. (https://issues.apache.org/jira/browse/BEAM-11541)
BEAM-11540: Linter sometimes flakes on apache_beam.dataframe.frames_test 
(https://issues.apache.org/jira/browse/BEAM-11540)
BEAM-11493: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyAndWindows
 (https://issues.apache.org/jira/browse/BEAM-11493)
BEAM-11492: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMergingWindows
 (https://issues.apache.org/jira/browse/BEAM-11492)
BEAM-11491: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMultipleWindows
 (https://issues.apache.org/jira/browse/BEAM-11491)
BEAM-11490: Spark test failure: 
org.apache.beam.sdk.transforms.ReifyTimestampsTest.inValuesSucceeds 
(https://issues.apache.org/jira/browse/BEAM-11490)
BEAM-11489: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-11489)
BEAM-11488: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedCounterMetrics
 (https://issues.apache.org/jira/browse/BEAM-11488)
BEAM-11487: Spark test failure: 
org.apache.beam.sdk.transforms.WithTimestampsTest.withTimestampsShouldApplyTimestamps
 (https://issues.apache.org/jira/browse/BEAM-11487)
BEAM-11486: Spark test failure: 
org.apache.beam.sdk.testing.PAssertTest.testSerializablePredicate 
(https://issues.apache.org/jira/browse/BEAM-11486)
BEAM-11485: Spark test failure: 
org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineNullValues 
(https://issues.apache.org/jira/browse/BEAM-11485)
BEAM-11484: Spark test failure: 
org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics 
(https://issues.apache.org/jira/browse/BEAM-11484)
BEAM-11483: Spark PostCommit Test Improvements 
(https://issues.apache.org/jira/browse/BEAM-11483)
BEAM-10995: Java + Universal Local Runner: 
WindowingTest.testWindowPreservation fails 
(https://issues.apache.org/jira/browse/BEAM-10995)
BEAM-10987: stager_test.py::StagerTest::test_with_main_session flaky on 
windows py3.6,3.7 (https://issues.apache.org/jira/browse/BEAM-10987)
BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-10968)
BEAM-10955: Flink Java Runner test flake: Could not find Flink job  
(https://issues.apache.org/jira/browse/BEAM-10955)
BEAM-10923: Python requirements installation in docker container is flaky 
(https://issues.apache.org/jira/browse/BEAM-10923)
BEAM-10901: Flaky test: 
PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
 (https://issues.apache.org/jira/browse/BEAM-10901)
BEAM-10899: test_FhirIO_exportFhirResourcesGcs flake with OOM 
(https://issues.apache.org/jira/browse/BEAM-10899)
BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations 
flaky on macOS (https://issues.apache.org/jira/browse/BEAM-10866)
BEAM-10763: Spotless flake (NullPointerException) 
(https://issues.apache.org/jira/browse/BEAM-10763)
BEAM-10590: BigQueryQueryToTableIT flaky: 
(https://issues.apache.org/jira/browse/BEAM-10590)

P1 issues report

2021-04-06 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

BEAM-12104: Java WordCountIT on Dataflow failing (GHA) 
(https://issues.apache.org/jira/browse/BEAM-12104)
BEAM-12095: spark_runner.py broken by Spark 3 upgrade. 
(https://issues.apache.org/jira/browse/BEAM-12095)
BEAM-12060: beam_PostCommit_Go_VR_Flink and beam_PostCommit_Go_VR_Spark 
failing since Mar 23, 2021 6:00:00 AM 
(https://issues.apache.org/jira/browse/BEAM-12060)
BEAM-12050: ParDoTest TimerTests that use TestStream failing for portable 
FlinkRunner (https://issues.apache.org/jira/browse/BEAM-12050)
BEAM-11965: testSplitQueryFnWithLargeDataset timeout failures 
(https://issues.apache.org/jira/browse/BEAM-11965)
BEAM-11961: InfluxDBIOIT failing with unauthorized error 
(https://issues.apache.org/jira/browse/BEAM-11961)
BEAM-11959: Python Beam SDK Harness hangs when installing pip packages 
(https://issues.apache.org/jira/browse/BEAM-11959)
BEAM-11922: 
org.apache.beam.examples.cookbook.MapClassIntegrationIT.testDataflowMapState 
has been failing in master (https://issues.apache.org/jira/browse/BEAM-11922)
BEAM-11906: No trigger early repeatedly for session windows 
(https://issues.apache.org/jira/browse/BEAM-11906)
BEAM-11875: XmlIO.Read does not handle XML encoding per spec 
(https://issues.apache.org/jira/browse/BEAM-11875)
BEAM-11828: JmsIO is not acknowledging messages correctly 
(https://issues.apache.org/jira/browse/BEAM-11828)
BEAM-11772: GCP BigQuery sink (file loads) uses runner determined sharding 
for unbounded data (https://issues.apache.org/jira/browse/BEAM-11772)
BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly 
broken (at least on portable flink runner) 
(https://issues.apache.org/jira/browse/BEAM-11755)
BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int 
overflowing?) (https://issues.apache.org/jira/browse/BEAM-11578)
BEAM-11576: Go ValidatesRunner failure: TestFlattenDup on Dataflow Runner 
(https://issues.apache.org/jira/browse/BEAM-11576)
BEAM-11434: Expose Spanner admin/batch clients in Spanner Accessor 
(https://issues.apache.org/jira/browse/BEAM-11434)
BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink 
(https://issues.apache.org/jira/browse/BEAM-11148)
BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow 
runner) (https://issues.apache.org/jira/browse/BEAM-11017)
BEAM-10883: XmlIO parsing of multibyte characters 
(https://issues.apache.org/jira/browse/BEAM-10883)
BEAM-10861: Adds URNs and payloads to PubSub transforms 
(https://issues.apache.org/jira/browse/BEAM-10861)
BEAM-10718: Mongo DB IO benchmark is failing 
(https://issues.apache.org/jira/browse/BEAM-10718)
BEAM-10617: python CombineGlobally().with_fanout() cause duplicate combine 
results for sliding windows (https://issues.apache.org/jira/browse/BEAM-10617)
BEAM-10573: CSV files are loaded several times if they are too large 
(https://issues.apache.org/jira/browse/BEAM-10573)
BEAM-10569: SpannerIO tests don't actually assert anything. 
(https://issues.apache.org/jira/browse/BEAM-10569)
BEAM-10288: Quickstart documents are out of date 
(https://issues.apache.org/jira/browse/BEAM-10288)
BEAM-10244: Populate requirements cache fails on poetry-based packages 
(https://issues.apache.org/jira/browse/BEAM-10244)
BEAM-10100: FileIO writeDynamic with AvroIO.sink not writing all data 
(https://issues.apache.org/jira/browse/BEAM-10100)
BEAM-9917: BigQueryBatchFileLoads dynamic destination 
(https://issues.apache.org/jira/browse/BEAM-9917)
BEAM-9564: Remove insecure ssl options from MongoDBIO 
(https://issues.apache.org/jira/browse/BEAM-9564)
BEAM-9455: Environment-sensitive provisioning for Dataflow 
(https://issues.apache.org/jira/browse/BEAM-9455)
BEAM-9293: Python direct runner doesn't emit empty pane when it should 
(https://issues.apache.org/jira/browse/BEAM-9293)
BEAM-9154: Move Chicago Taxi Example to Python 3 
(https://issues.apache.org/jira/browse/BEAM-9154)
BEAM-8986: SortValues may not work correct for numerical types 
(https://issues.apache.org/jira/browse/BEAM-8986)
BEAM-8985: SortValues should fail if SecondaryKey coder is not 
deterministic (https://issues.apache.org/jira/browse/BEAM-8985)
BEAM-8407: [SQL] Some Hive tests throw NullPointerException, but get marked 
as passing (Direct Runner) (https://issues.apache.org/jira/browse/BEAM-8407)
BEAM-7717: PubsubIO watermark tracking hovers near start of epoch 
(https://issues.apache.org/jira/browse/BEAM-7717)
BEAM-7716: PubsubIO returns empty message bodies for all messages read 
(https://issues.apache.org/jira/browse/BEAM-7716)
BEAM-7195: BigQuery - 404 errors for 'table not found' when using dynamic 
destinations - sometimes, new table fails 
(https://issues.apache.org/jira/browse/BEAM-7195)

Re: Long term support versions of Beam Java

2021-04-06 Thread Robert Bradshaw
We actually do try to maintain state (and pipeline shape)
compatibility between Beam versions (e.g. this is why we have multiple
distinct sketch implementations rather than "fix" the existing one). It's
true that this is easier said than done (and some people are more vigilant
about it than others). State upgrading, and a better upgrade story in
general, has been discussed from time to time, and it can get quite tricky,
but that doesn't mean we can't do better here. I know that Reuven has been
thinking about these ideas.

As for why people don't upgrade, I think the largest reason is the one
common to all aversion to software upgrades: trading what works (and has a
long track record) for something that merely should work is always a risk,
and even when things go well it involves a certain amount of toil (testing,
deployment, ...).


On Tue, Apr 6, 2021 at 2:43 PM Jan Lukavský  wrote:

> Hi,
> do we know what is the reason users stay on an older version of Beam? My
> guess would be that it is not related to API changes, but more likely to
> state incompatibility. Maybe if we could figure out a way which would
> enable a smooth migration of state (and timers) between Beam versions, that
> might help? The migration would probably have to be runner-dependent, but
> Beam might offer some tooling to make this easier. One example would be
> coder evolution, where we currently do not have the option of "reading old
> way, writing new way" with some "coder-version-registry". I suppose there
> might have been a discussion about this in the past, does anyone know of
> any conclusion?
>
>  Jan
> On 4/6/21 10:54 PM, Robert Bradshaw wrote:
>
> I do think there's value in having an LTS release, if there's sufficient
> interest to fund it (specifically, figuring out who would be backporting
> fixes and cutting the new releases).
>
> On Mon, Apr 5, 2021 at 1:14 PM Elliotte Rusty Harold 
> wrote:
>
>> Hi,
>>
>> I'd like to return to the discussion around a long term support
>> release that first came up here in 2018:
>>
>>
>> https://lists.apache.org/thread.html/6ec572d8edfe93225edebec18792cbcf44ef447ffe54ea35549cdafe%40%3Cdev.beam.apache.org%3E
>>
>> This is important to some Google Cloud Dataflow Java customers, and
>> likely others as well.
>>
>> Specifically, I'd like to propose cutting an LTS release off a branch
>> and maintaining it with critical bug fixes and security updates for 18
>> months. Right now we're finding that the current one-year support
>> period and six-week release cycle are a tad fast for some customers.
>>
>> There's some wiggle room in terms of what's "critical", but in
>> that category I include security fixes and data integrity issues.
>> Essentially, this is any bug so bad that, if found in a new release,
>> we'd recommend customers wait for the fix before upgrading to the
>> latest and greatest. The difference is we'd backport the patch to the
>> not-latest-and-greatest release.
>>
>> To run something up the flagpole, I propose:
>>
>> 1. 2.28.0 becomes the first LTS release.
>> 2. New patch versions are released as 2.28.1, 2.28.2, etc.
>> 3. Patch releases do not change API, at all, except in the unlikely
>> event this is absolutely required for a security fix.
>> 4. Dependencies are not upgraded in patch releases unless required to
>> fix a critical bug or security issue.
>> 5. In a year, cut a new LTS release from whatever is then current so
>> there's some overlap to give customers time to switch over.
>>
>> I picked 2.28.0 since it's the most recent release, and I prefer to
>> stay off the bleeding edge for long-term support. This would also
>> enable customers to develop on top of it sooner. However I understand
>> others may well prefer to pick a different release such as 2.29.0 or
>> 2.30.0. I'm OK with whatever recent version the community picks.
>>
>> Thoughts?
>>
>> --
>> Elliotte Rusty Harold
>> elh...@ibiblio.org
>>
>


Re: Long term support versions of Beam Java

2021-04-06 Thread Jan Lukavský

Hi,
do we know what is the reason users stay on an older version of Beam? My 
guess would be that it is not related to API changes, but more likely to 
state incompatibility. Maybe if we could figure out a way which would 
enable a smooth migration of state (and timers) between Beam versions, 
that might help? The migration would probably have to be 
runner-dependent, but Beam might offer some tooling to make this easier. 
One example would be coder evolution, where we currently do not have the 
option of "reading old way, writing new way" with some 
"coder-version-registry". I suppose there might have been a discussion 
about this in the past, does anyone know of any conclusion?


 Jan
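
A minimal sketch of what such "read the old way, write the new way" coder
evolution could look like (purely hypothetical; Beam has no coder-version
registry today, and the wire formats below are invented for illustration):

    import struct

    class VersionedCoder:
        """Tags the wire format with a version byte: decodes every version
        it knows about, but always encodes with the newest one, so state
        written by an old pipeline stays readable after an upgrade."""
        CURRENT_VERSION = 2

        def encode(self, value: int) -> bytes:
            # Write the new way: version tag + 8-byte big-endian long.
            return bytes([self.CURRENT_VERSION]) + struct.pack('>q', value)

        def decode(self, data: bytes) -> int:
            version = data[0]
            if version == 1:
                # Read the old way: legacy 4-byte big-endian int payload.
                return struct.unpack('>i', data[1:5])[0]
            if version == 2:
                return struct.unpack('>q', data[1:9])[0]
            raise ValueError('unknown coder version %d' % version)

A real registry would map (coder, version) pairs to decode functions so that
runners could migrate state lazily on read.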

On 4/6/21 10:54 PM, Robert Bradshaw wrote:
I do think there's value in having an LTS release, if there's 
sufficient interest to fund it (specifically, figuring out who would 
be backporting fixes and cutting the new releases).


On Mon, Apr 5, 2021 at 1:14 PM Elliotte Rusty Harold <elh...@ibiblio.org> wrote:


Hi,

I'd like to return to the discussion around a long term support
release that first came up here in 2018:


https://lists.apache.org/thread.html/6ec572d8edfe93225edebec18792cbcf44ef447ffe54ea35549cdafe%40%3Cdev.beam.apache.org%3E



This is important to some Google Cloud Dataflow Java customers, and
likely others as well.

Specifically, I'd like to propose cutting an LTS release off a branch
and maintaining it with critical bug fixes and security updates for 18
months. Right now we're finding that the current one-year support
period and six-week release cycle are a tad fast for some customers.

There's some wiggle room in terms of what's "critical", but in
that category I include security fixes and data integrity issues.
Essentially, this is any bug so bad that, if found in a new release,
we'd recommend customers wait for the fix before upgrading to the
latest and greatest. The difference is we'd backport the patch to the
not-latest-and-greatest release.

To run something up the flagpole, I propose:

1. 2.28.0 becomes the first LTS release.
2. New patch versions are released as 2.28.1, 2.28.2, etc.
3. Patch releases do not change API, at all, except in the unlikely
event this is absolutely required for a security fix.
4. Dependencies are not upgraded in patch releases unless required to
fix a critical bug or security issue.
5. In a year, cut a new LTS release from whatever is then current so
there's some overlap to give customers time to switch over.

I picked 2.28.0 since it's the most recent release, and I prefer to
stay off the bleeding edge for long-term support. This would also
enable customers to develop on top of it sooner. However I understand
others may well prefer to pick a different release such as 2.29.0 or
2.30.0. I'm OK with whatever recent version the community picks.

Thoughts?

-- 
Elliotte Rusty Harold

elh...@ibiblio.org 



Re: Flink runner configuration for closure cleaner

2021-04-06 Thread Kyle Weaver
I don't think this will require Beam to have its own configuration option.
You should be able to set the property "pipeline.closure-cleaner-level" in
your flink.conf and then pass it to Beam using Beam's "--flink-conf-dir"
pipeline option.
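
A minimal sketch of that setup from the Python SDK (hypothetical paths; the
option is spelled flink_conf_dir on the Python side, and its availability may
vary by Beam version, so treat the flag as an assumption to verify):

    # ./flink-conf/flink-conf.yaml (assumed local directory) would contain:
    #   pipeline.closure-cleaner-level: NONE
    # which turns off Flink's closure cleaner (the FLINK-15773 workaround).
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=FlinkRunner',
        '--flink_conf_dir=./flink-conf',  # hands the config directory to Flink
    ])

    with beam.Pipeline(options=options) as p:
        _ = p | beam.Create([1, 2, 3]) | beam.Map(print)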

On Tue, Apr 6, 2021 at 2:28 PM Raman Gupta  wrote:

> Hello all: I created https://issues.apache.org/jira/browse/BEAM-12055
> because I'm having an issue with using the Flink runner locally, due to
> https://issues.apache.org/jira/browse/FLINK-15773.
>
> Does anyone see any reason why Beam's Flink runner should not provide a
> configuration option that can disable the Flink closure cleaner? I'm not
> familiar with this myself, and the only reason I would use it is as a
> work-around for FLINK-15773 in my local dev environment.
>
> Regards,
> Raman
>
>


Flink runner configuration for closure cleaner

2021-04-06 Thread Raman Gupta
Hello all: I created https://issues.apache.org/jira/browse/BEAM-12055
because I'm having an issue with using the Flink runner locally, due to
https://issues.apache.org/jira/browse/FLINK-15773.

Does anyone see any reason why Beam's Flink runner should not provide a
configuration option that can disable the Flink closure cleaner? I'm not
familiar with this myself, and the only reason I would use it is as a
work-around for FLINK-15773 in my local dev environment.

Regards,
Raman


Re: Long term support versions of Beam Java

2021-04-06 Thread Robert Bradshaw
I do think there's value in having an LTS release, if there's sufficient
interest to fund it (specifically, figuring out who would be backporting
fixes and cutting the new releases).

On Mon, Apr 5, 2021 at 1:14 PM Elliotte Rusty Harold 
wrote:

> Hi,
>
> I'd like to return to the discussion around a long term support
> release that first came up here in 2018:
>
>
> https://lists.apache.org/thread.html/6ec572d8edfe93225edebec18792cbcf44ef447ffe54ea35549cdafe%40%3Cdev.beam.apache.org%3E
>
> This is important to some Google Cloud Dataflow Java customers, and
> likely others as well.
>
> Specifically, I'd like to propose cutting an LTS release off a branch
> and maintaining it with critical bug fixes and security updates for 18
> months. Right now we're finding that the current one-year support
> period and six-week release cycle are a tad fast for some customers.
>
> There's some wiggle room in terms of what's "critical", but in
> that category I include security fixes and data integrity issues.
> Essentially, this is any bug so bad that, if found in a new release,
> we'd recommend customers wait for the fix before upgrading to the
> latest and greatest. The difference is we'd backport the patch to the
> not-latest-and-greatest release.
>
> To run something up the flagpole, I propose:
>
> 1. 2.28.0 becomes the first LTS release.
> 2. New patch versions are released as 2.28.1, 2.28.2, etc.
> 3. Patch releases do not change API, at all, except in the unlikely
> event this is absolutely required for a security fix.
> 4. Dependencies are not upgraded in patch releases unless required to
> fix a critical bug or security issue.
> 5. In a year, cut a new LTS release from whatever is then current so
> there's some overlap to give customers time to switch over.
>
> I picked 2.28.0 since it's the most recent release, and I prefer to
> stay off the bleeding edge for long-term support. This would also
> enable customers to develop on top of it sooner. However I understand
> others may well prefer to pick a different release such as 2.29.0 or
> 2.30.0. I'm OK with whatever recent version the community picks.
>
> Thoughts?
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>


Re: [DISCUSS] Include inherited members in Python API Docs?

2021-04-06 Thread Brian Hulette
Sure, I can try cutting out PTransform.

We could also look into reducing noise by:
- removing undoc-members from the config [1] (this would make it so only
objects with a docstring are added to the generated docs)
- adding :meta private: to docstrings for objects we don't want publicly
visible (a sketch of this follows below)

[1]
https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/python/scripts/generate_pydoc.sh#L48
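
A minimal sketch of the :meta private: approach (class and method names are
hypothetical, not Beam's actual code); Sphinx's autodoc skips any member
whose docstring carries this field, even when inherited or undocumented
members are otherwise included:

    class ExampleTransform:
        def expand_internal(self):
            """Implementation detail, hidden from the generated API docs.

            :meta private:
            """

        def with_defaults(self):
            """Documented member; stays visible in the generated docs."""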

On Tue, Apr 6, 2021 at 1:17 PM Robert Bradshaw  wrote:

> Way too many things are inherited from PTransform, can we at least cut
> that out?
>
> On Tue, Apr 6, 2021 at 1:09 PM Brian Hulette  wrote:
>
>> Just wanted to bump this - does anyone have concerns with the way the API
>> docs look when inherited members are included?
>>
>> On Wed, Mar 31, 2021 at 5:23 PM Brian Hulette 
>> wrote:
>>
>>> I staged my current working copy built from head here [1], see
>>> CombinePerKey here [2]. Note it also has a few other changes, most notably
>>> I excluded several internal-only modules that are currently in our API docs
>>> (I will PR this soon regardless).
>>>
>>> > are these inherited members grouped in such a way that it makes it
>>> easy to ignore them once they get to "low" in the stack?
>>> There doesn't seem to be any grouping, but it does look like inherited
>>> members are added at the end.
>>>
>>> > If it can't be per-module, is there a "nice" set of ancestors to avoid
>>> (as it seems this option takes such an argument).
>>> Ah good point, I missed this. I suppose we could avoid basic constructs
>>> like PTransform, DoFn, etc. I'm not sure how realistic that is though. It
>>> would be nice if this argument worked the other way
>>>
>>> [1] https://theneuralbit.github.io/beam-site/pydoc/inherited-members
>>> [2]
>>> https://theneuralbit.github.io/beam-site/pydoc/inherited-members/apache_beam.transforms.core.html#apache_beam.transforms.core.CombinePerKey
>>>
>>> On Wed, Mar 31, 2021 at 4:45 PM Robert Bradshaw 
>>> wrote:
>>>
 +1 to an example. In particular, are these inherited members grouped in
 such a way that it makes it easy to ignore them once they get to "low" in
 the stack? If it can't be per-module, is there a "nice" set of ancestors to
 avoid (as it seems this option takes such an argument).

 On Wed, Mar 31, 2021 at 4:23 PM Pablo Estrada 
 wrote:

> Do you have an example of what it would look like when released?
>
> On Wed, Mar 31, 2021 at 4:16 PM Brian Hulette 
> wrote:
>
>> I'm working on generating useful API docs for the DataFrame API
>> (BEAM-12074). In doing so, one thing I've found would be very helpful is 
>> if
>> we could include docstrings for inherited members in the API docs. That 
>> way
>> docstrings for operations defined in DeferredDataFrameOrSeries [1], will 
>> be
>> propagated to DeferredDataFrame [2] and DeferredSeries, and the former 
>> can
>> be hidden entirely. This would be more consistent with the pandas
>> documentation [3].
>>
>> It looks like we can do this by specifying :inherited-members: [4],
>> but this will apply to _all_ of our API docs, there doesn't seem to be a
>> way to restrict it to a particular module. This seems generally useful to
>> me, but it would be a significant change, so I wanted to see if there are
>> any objections from dev@ before doing this.
>>
>> An example of the kind of change this would produce: any PTransform
>> sub-classes, e.g. CombinePerKey [5], would now include docstrings for 
>> every
>> PTransform member, e.g. with_input_types [6], and display_data [7].
>>
>> Would there be any objections to that?
>>
>> Thanks,
>> Brian
>>
>> [1]
>> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.dataframe.frames.html#apache_beam.dataframe.frames.DeferredDataFrameOrSeries
>> [2]
>> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.dataframe.frames.html#apache_beam.dataframe.frames.DeferredDataFrame
>> [3] https://pandas.pydata.org/docs/reference/frame.html
>> [4]
>> https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html
>> [5]
>> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.core.html?highlight=combineperkey#apache_beam.transforms.core.CombinePerKey
>> [6]
>> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.ptransform.html#apache_beam.transforms.ptransform.PTransform.with_input_types
>> [7]
>> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.display.html#apache_beam.transforms.display.HasDisplayData.display_data
>>
>
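
For reference, a sketch of the Sphinx configuration the proposal refers to
(hypothetical conf.py values; Beam's real settings live in the
generate_pydoc.sh script linked above):

    # Applies to every autodoc'd module: there is no per-module switch,
    # which is the limitation discussed in this thread.
    extensions = ['sphinx.ext.autodoc']
    autodoc_default_options = {
        'members': True,
        'inherited-members': True,
        # Per the thread, the option can instead take an ancestor class
        # name to control which inherited members get skipped, e.g.
        # 'inherited-members': 'PTransform',
    }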


Re: [DISCUSS] Include inherited members in Python API Docs?

2021-04-06 Thread Robert Bradshaw
Way too many things are inherited from PTransform, can we at least cut
that out?

On Tue, Apr 6, 2021 at 1:09 PM Brian Hulette  wrote:

> Just wanted to bump this - does anyone have concerns with the way the API
> docs look when inherited members are included?
>
> On Wed, Mar 31, 2021 at 5:23 PM Brian Hulette  wrote:
>
>> I staged my current working copy built from head here [1], see
>> CombinePerKey here [2]. Note it also has a few other changes, most notably
>> I excluded several internal-only modules that are currently in our API docs
>> (I will PR this soon regardless).
>>
>> > are these inherited members grouped in such a way that it makes it easy
>> to ignore them once they get to "low" in the stack?
>> There doesn't seem to be any grouping, but it does look like inherited
>> members are added at the end.
>>
>> > If it can't be per-module, is there a "nice" set of ancestors to avoid
>> (as it seems this option takes such an argument).
>> Ah good point, I missed this. I suppose we could avoid basic constructs
>> like PTransform, DoFn, etc. I'm not sure how realistic that is though. It
>> would be nice if this argument worked the other way
>>
>> [1] https://theneuralbit.github.io/beam-site/pydoc/inherited-members
>> [2]
>> https://theneuralbit.github.io/beam-site/pydoc/inherited-members/apache_beam.transforms.core.html#apache_beam.transforms.core.CombinePerKey
>>
>> On Wed, Mar 31, 2021 at 4:45 PM Robert Bradshaw 
>> wrote:
>>
>>> +1 to an example. In particular, are these inherited members grouped in
>>> such a way that it makes it easy to ignore them once they get to "low" in
>>> the stack? If it can't be per-module, is there a "nice" set of ancestors to
>>> avoid (as it seems this option takes such an argument).
>>>
>>> On Wed, Mar 31, 2021 at 4:23 PM Pablo Estrada 
>>> wrote:
>>>
 Do you have an example of what it would look like when released?

 On Wed, Mar 31, 2021 at 4:16 PM Brian Hulette 
 wrote:

> I'm working on generating useful API docs for the DataFrame API
> (BEAM-12074). In doing so, one thing I've found would be very helpful is 
> if
> we could include docstrings for inherited members in the API docs. That 
> way
> docstrings for operations defined in DeferredDataFrameOrSeries [1], will 
> be
> propagated to DeferredDataFrame [2] and DeferredSeries, and the former can
> be hidden entirely. This would be more consistent with the pandas
> documentation [3].
>
> It looks like we can do this by specifying :inherited-members: [4],
> but this will apply to _all_ of our API docs, there doesn't seem to be a
> way to restrict it to a particular module. This seems generally useful to
> me, but it would be a significant change, so I wanted to see if there are
> any objections from dev@ before doing this.
>
> An example of the kind of change this would produce: any PTransform
> sub-classes, e.g. CombinePerKey [5], would now include docstrings for 
> every
> PTransform member, e.g. with_input_types [6], and display_data [7].
>
> Would there be any objections to that?
>
> Thanks,
> Brian
>
> [1]
> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.dataframe.frames.html#apache_beam.dataframe.frames.DeferredDataFrameOrSeries
> [2]
> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.dataframe.frames.html#apache_beam.dataframe.frames.DeferredDataFrame
> [3] https://pandas.pydata.org/docs/reference/frame.html
> [4] https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html
> [5]
> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.core.html?highlight=combineperkey#apache_beam.transforms.core.CombinePerKey
> [6]
> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.ptransform.html#apache_beam.transforms.ptransform.PTransform.with_input_types
> [7]
> https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.display.html#apache_beam.transforms.display.HasDisplayData.display_data
>



Re: [DISCUSS] Include inherited members in Python API Docs?

2021-04-06 Thread Brian Hulette
Just wanted to bump this - does anyone have concerns with the way the API
docs look when inherited members are included?

On Wed, Mar 31, 2021 at 5:23 PM Brian Hulette  wrote:

> I staged my current working copy built from head here [1], see
> CombinePerKey here [2]. Note it also has a few other changes, most notably
> I excluded several internal-only modules that are currently in our API docs
> (I will PR this soon regardless).
>
> > are these inherited members grouped in such a way that it makes it easy
> to ignore them once they get to "low" in the stack?
> There doesn't seem to be any grouping, but it does look like inherited
> members are added at the end.
>
> > If it can't be per-module, is there a "nice" set of ancestors to avoid
> (as it seems this option takes such an argument).
> Ah good point, I missed this. I suppose we could avoid basic constructs
> like PTransform, DoFn, etc. I'm not sure how realistic that is though. It
> would be nice if this argument worked the other way
>
> [1] https://theneuralbit.github.io/beam-site/pydoc/inherited-members
> [2]
> https://theneuralbit.github.io/beam-site/pydoc/inherited-members/apache_beam.transforms.core.html#apache_beam.transforms.core.CombinePerKey
>
> On Wed, Mar 31, 2021 at 4:45 PM Robert Bradshaw 
> wrote:
>
>> +1 to an example. In particular, are these inherited members grouped in
>> such a way that it makes it easy to ignore them once they get to "low" in
>> the stack? If it can't be per-module, is there a "nice" set of ancestors to
>> avoid (as it seems this option takes such an argument).
>>
>> On Wed, Mar 31, 2021 at 4:23 PM Pablo Estrada  wrote:
>>
>>> Do you have an example of what it would look like when released?
>>>
>>> On Wed, Mar 31, 2021 at 4:16 PM Brian Hulette 
>>> wrote:
>>>
 I'm working on generating useful API docs for the DataFrame API
 (BEAM-12074). In doing so, one thing I've found would be very helpful is if
 we could include docstrings for inherited members in the API docs. That way
 docstrings for operations defined in DeferredDataFrameOrSeries [1], will be
 propagated to DeferredDataFrame [2] and DeferredSeries, and the former can
 be hidden entirely. This would be more consistent with the pandas
 documentation [3].

 It looks like we can do this by specifying :inherited-members: [4], but
 this will apply to _all_ of our API docs, there doesn't seem to be a way to
 restrict it to a particular module. This seems generally useful to me, but
 it would be a significant change, so I wanted to see if there are any
 objections from dev@ before doing this.

 An example of the kind of change this would produce: any PTransform
 sub-classes, e.g. CombinePerKey [5], would now include docstrings for every
 PTransform member, e.g. with_input_types [6], and display_data [7].

 Would there be any objections to that?

 Thanks,
 Brian

 [1]
 https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.dataframe.frames.html#apache_beam.dataframe.frames.DeferredDataFrameOrSeries
 [2]
 https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.dataframe.frames.html#apache_beam.dataframe.frames.DeferredDataFrame
 [3] https://pandas.pydata.org/docs/reference/frame.html
 [4] https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html
 [5]
 https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.core.html?highlight=combineperkey#apache_beam.transforms.core.CombinePerKey
 [6]
 https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.ptransform.html#apache_beam.transforms.ptransform.PTransform.with_input_types
 [7]
 https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.transforms.display.html#apache_beam.transforms.display.HasDisplayData.display_data

>>>


One day left: Last call for Beam College!

2021-04-06 Thread Mara Ruvalcaba


Hi Beam Community,



Don’t miss out on Beam College - book your spot today. It’s packed with 
tons of content designed to help you improve your data processing skills, 
taught by the industry’s top experts. And it’s FREE! The first day of Beam 
College starts tomorrow, April 7th. If you want to attend, we recommend 
booking your spot today.



 Here’s a sneak peek of the training topics for this week:


 * The distributed data processing landscape, and Apache Beam under the hood
 * Advanced distributed data processing
 * Beam features to scale and productionalize your use case


Meet our amazing lineup of speakers: https://beamcollege.dev/instructors 




The online training program is designed to be flexible for all 
experience levels.



Register now at: https://beamcollege.dev/forms-enrollment-workshop1/ 







--
Mara Ruvalcaba
COO, SG Software Guru & Nearshore Link
USA: 512 296 2884
MX: 55 5239 5502



Re: Null checking in Beam

2021-04-06 Thread Jan Lukavský
I agree that there are _some_ added annotations in _some_ places that are 
useful - most notably @NonNull on method arguments, and possibly return 
values. Adding @NonNull to an exception type being thrown seems awkward. 
The @UnknownKeyFor probably should not be there, as it brings no value. 
Did we raise the issue with the checkerframework project? It seems to me 
that the biggest problem lies there. It might have two modes of operation - 
after the check it could have a way of specifying which (and where) 
annotations should be kept in the compiled byte-code and which should be 
removed. Or can we post-process that with some different tool?


 Jan

On 4/5/21 6:03 PM, Kenneth Knowles wrote:


On Thu, Apr 1, 2021 at 9:57 AM Brian Hulette wrote:


What value does it add? Is it that it enables them to use
checkerframework with our interfaces?


Actually if they are also using checkerframework the defaults are the 
same, so it is not usually needed (though some defaults can be 
changed). Making defaults explicit is most useful for interfacing with 
other tools that have different defaults, such as Spotbugs [1], IDEs [2] 
[3], or JVM languages with null safety built-in [4] [5].


Kenn

[1] 
https://spotbugs.readthedocs.io/en/stable/annotations.html#edu-umd-cs-findbugs-annotations-nullable 

[2] 
https://www.jetbrains.com/help/idea/2021.1/nullable-and-notnull-annotations.html 

[3] https://wiki.eclipse.org/JDT_Core/Null_Analysis 

[4] https://kotlinlang.org/docs/null-safety.html 

[5] 
https://kotlinlang.org/docs/java-interop.html#null-safety-and-platform-types 



On Thu, Apr 1, 2021 at 8:54 AM Kenneth Knowles <k...@apache.org> wrote:

Thanks for filing that. Once it is fixed in IntelliJ, the
annotations actually add value for downstream users.

Kenn

On Thu, Apr 1, 2021 at 1:10 AM Jan Lukavský <je...@seznam.cz> wrote:

Hi,

I created the issue in the JetBrains tracker [1]. I'm still
not 100% convinced that it is correct for the checker to
actually modify the bytecode. An open question is: in
guava this does not happen. Does guava apply the check on
code being released? From what is in this thread it seems
to me that the answer is no.

 Jan

[1] https://youtrack.jetbrains.com/issue/IDEA-265658


On 4/1/21 6:15 AM, Kenneth Knowles wrote:

Hi all,

About the IntelliJ automatic method stub issue: I raised
it to the checkerframework list and got a helpful
response:

https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ



It eventually reached back to JetBrains and they would
appreciate a detailed report with steps to reproduce,
preferably a sample project. Would you - Jan or Ismaël or
Reuven - provide them with this issue report? It sounds
like you, Jan, have an example ready to go.

Kenn

On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský <je...@seznam.cz> wrote:

Yes, annotations that we add to the code base on
purpose (like @Nullable or @SuppressWarnings) are
absolutely fine. What is worse is that the checker is
not only a checker, but a code generator. :)

For example, when one wants to implement a Coder by
extending CustomCoder and auto-generate the overridden
methods, they look like:

@Override
public void encode(Long value,
    @UnknownKeyFor @NonNull @Initialized OutputStream outStream)
    throws @UnknownKeyFor @NonNull @Initialized CoderException,
        @UnknownKeyFor @NonNull @Initialized IOException {
}

Which is really ugly. :-(

 Jan

On 3/15/21 6:37 PM, Ismaël Mejía wrote:

+1

Even if I like the strictness for null checking, I also think that
this is adding too much extra time for builds (which I noticed locally
when enabled), and I also agree with Jan that the annotations are
really an undesired side effect. For reference, when you try to
auto-complete some method signatures in IntelliJ on downstream projects
with C-A-v, it generates some extra