Re: Beam Website Feedback

2022-09-09 Thread Ahmet Altay via dev
Thank you for the feedback Peter.

/cc @Reza Rokni  @David Cavazos  for
visibility.

On Fri, Sep 9, 2022 at 2:51 PM Austin Bennett 
wrote:

> Any chance you've seen a GH Issue for that?  If not, please feel free to
> file one :-)
>
> https://github.com/apache/beam/issues
>
> On Fri, Sep 9, 2022 at 12:09 PM Peter McArthur  wrote:
>
>> I wish there were a python example for "Slowly updating global window
>> side inputs" on this page
>> https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs
>>
>> thank you
>>
>>


Re: Beam Website Feedback

2022-09-09 Thread Austin Bennett
Any chance you've seen a GH Issue for that?  If not, please feel free to
file one :-)

https://github.com/apache/beam/issues

On Fri, Sep 9, 2022 at 12:09 PM Peter McArthur  wrote:

> I wish there were a python example for "Slowly updating global window side
> inputs" on this page
> https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs
>
> thank you
>
>


Beam Website Feedback

2022-09-09 Thread Peter McArthur
I wish there were a python example for "Slowly updating global window side
inputs" on this page
https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs

thank you
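
For reference, a minimal Python sketch of one way to get a periodically refreshed ("slowly updating") side input, assuming PeriodicImpulse from apache_beam.transforms.periodicsequence; fetch_lookup_table() and the Pub/Sub topic below are hypothetical placeholders for the actual refresh logic and main input, not part of the linked docs page:

import time

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window
from apache_beam.transforms.periodicsequence import PeriodicImpulse


def fetch_lookup_table(_):
    # Hypothetical refresh: re-read the slowly changing data
    # (a file, a BigQuery table, an external API, ...).
    return {'refreshed_at': time.time()}


def enrich(element, lookup):
    # Pair each main-input element with the freshest lookup value.
    return element, lookup


start = time.time()

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    side = (
        p
        # Fire once every 5 minutes for 4 hours; apply_windowing=True puts
        # each tick in its own fixed window so it can be matched against
        # the main input's windows.
        | 'PeriodicImpulse' >> PeriodicImpulse(
            start_timestamp=start,
            stop_timestamp=start + 4 * 60 * 60,
            fire_interval=5 * 60,
            apply_windowing=True)
        | 'Refresh' >> beam.Map(fetch_lookup_table))

    main = (
        p
        | 'ReadMain' >> beam.io.ReadFromPubSub(
            topic='projects/my-project/topics/my-topic')  # placeholder topic
        | 'WindowMain' >> beam.WindowInto(window.FixedWindows(5 * 60)))

    enriched = main | 'Enrich' >> beam.Map(
        enrich, lookup=beam.pvalue.AsSingleton(side))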



Re: Incomplete Beam Schema -> Avro Schema conversion

2022-09-09 Thread Balázs Németh
It is still better to have an asymmetric conversion that supports more data
types than to leave these unimplemented, right? This contribution seems
simple enough, but that's definitely not true for the other direction (...
and I'm also biased, I only need Beam->Avro).
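
For what it's worth, a rough sketch of what the forward (Beam -> Avro) mapping for the missing identifiers could look like, assuming the Avro date / time-millis / timestamp-millis logical types are the right targets. This is only an illustration of the idea, not the actual AvroUtils change:

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

// Illustration only: map the missing Beam logical type identifiers onto
// the corresponding Avro logical types.
class BeamLogicalToAvro {
  static Schema avroSchemaFor(String beamLogicalTypeId) {
    switch (beamLogicalTypeId) {
      case "beam:logical_type:date:v1":
        // Avro 'date': days since the epoch, carried on an INT base schema.
        return LogicalTypes.date().addToSchema(Schema.create(Schema.Type.INT));
      case "beam:logical_type:time:v1":
        // Avro 'time-millis': milliseconds since midnight, on an INT base schema.
        return LogicalTypes.timeMillis().addToSchema(Schema.create(Schema.Type.INT));
      case "beam:logical_type:datetime:v1":
        // Avro 'timestamp-millis': epoch milliseconds, on a LONG base schema.
        return LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG));
      default:
        throw new IllegalArgumentException("Unhandled logical type " + beamLogicalTypeId);
    }
  }
}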

Brian Hulette via dev  wrote (on Tue, Aug 23, 2022, 1:53):

> I don't think there's a reason for this, it's just that these logical
> types were defined after the Avro <-> Beam schema conversion. I think it
> would be worthwhile to add support for them, but we'd also need to look at
> the reverse (avro to beam) direction, which would map back to the catch-all
> DATETIME primitive type [1]. Changing that could break backwards
> compatibility.
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L771-L776
>
> On Wed, Aug 17, 2022 at 2:53 PM Balázs Németh 
> wrote:
>
>> java.lang.RuntimeException: Unhandled logical type
>> beam:logical_type:date:v1
>>   at
>> org.apache.beam.sdk.schemas.utils.AvroUtils.getFieldSchema(AvroUtils.java:943)
>>   at
>> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroField(AvroUtils.java:306)
>>   at
>> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java:341)
>>   at
>> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java
>>
>> In
>> https://github.com/apache/beam/blob/7bb755906c350d77ba175e1bd990096fbeaf8e44/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L902-L944
>> it seems to me there are some missing options.
>>
>> For example
>> - FixedBytes.IDENTIFIER,
>> - EnumerationType.IDENTIFIER,
>> - OneOfType.IDENTIFIER
>> are there, but:
>> - org.apache.beam.sdk.schemas.logicaltypes.Date.IDENTIFIER
>> ("beam:logical_type:date:v1")
>> - org.apache.beam.sdk.schemas.logicaltypes.DateTime.IDENTIFIER
>> ("beam:logical_type:datetime:v1")
>> - org.apache.beam.sdk.schemas.logicaltypes.Time.IDENTIFIER
>> ("beam:logical_type:time:v1")
>> are missing.
>>
>> This is an example that fails:
>>
>>> import java.time.LocalDate;
>>> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
>>> import org.apache.beam.sdk.schemas.Schema;
>>> import org.apache.beam.sdk.schemas.Schema.FieldType;
>>> import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes;
>>> import org.apache.beam.sdk.schemas.utils.AvroUtils;
>>> import org.apache.beam.sdk.values.Row;
>>
>> // ...
>>
>>> final Schema schema =
>>>     Schema.builder()
>>>         .addField("ymd", FieldType.logicalType(SqlTypes.DATE))
>>>         .build();
>>>
>>> final Row row =
>>>     Row.withSchema(schema)
>>>         .withFieldValue("ymd", LocalDate.now())
>>>         .build();
>>>
>>> System.out.println(BigQueryUtils.toTableSchema(schema)); // works
>>> System.out.println(BigQueryUtils.toTableRow(row)); // works
>>>
>>> System.out.println(AvroUtils.toAvroSchema(schema)); // fails
>>> System.out.println(AvroUtils.toGenericRecord(row)); // fails
>>
>>
>> Am I missing a reason for that, or is it simply not implemented yet? If
>> it's the latter, am I right to assume that they should be represented in
>> the Avro format the same way as the already existing cases?
>> "beam:logical_type:date:v1" vs "DATE"
>> "beam:logical_type:time:v1" vs "TIME"
>>
>>
>>


Beam High Priority Issue Report (69)

2022-09-09 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/23106 [Feature Request][Go SDK]: Support 
Slowly Changing Side Input Pattern
https://github.com/apache/beam/issues/23020 [Bug]: Kubernetes service not 
cleaned up and reaching quota causing multiple performance test failure
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is failing
https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update 
causes Invisible parameter type error
https://github.com/apache/beam/issues/22743 [Bug]: Test flake: 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow 
SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner 
test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in 
beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam 
PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being 
accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked 
Dataflow Pipeline 
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClien