Re: Forward StackOverflow questions with the apache-beam tag to a new mailing list

2022-08-22 Thread Chamikara Jayalath via dev
Hi folks,

Thanks for the comments here.

I created https://issues.apache.org/jira/browse/INFRA-23616 to request
Apache INFRA to create a new email list.
I'll update this thread when INFRA comments on the feasibility of this.

Thanks,
Cham

On Wed, Aug 17, 2022 at 6:35 PM Ahmet Altay  wrote:

> +1 to this idea.
>
> I think both options are reasonable. At the same time I also agree with
> checking with infra first on the feasibility of creating another mailing
> list.
>
> On Wed, Aug 17, 2022 at 10:00 AM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Wed, Aug 17, 2022 at 9:18 AM Robert Bradshaw 
>> wrote:
>>
>>> On Wed, Aug 17, 2022 at 5:15 AM Alexey Romanenko
>>>  wrote:
>>> >
>>> > Good point about unanswered SO questions. +1 that we need to improve a
>>> situation there.
>>> >
>>> > Yes, we may try to stream them to a new dedicated list but it will
>>> require people here to subscribe to and check regularly one more list which
>>> perhaps won’t be so efficient as well.
>>>
>>> I view the value of a new list as making it easy to choose to
>>> subscribe or not, and being able to respond to questions as they come
>>> in. You're right it's less useful for people who manually check lists
>>> rather than get them in their inbox.
>>>
>>> > I believe that a digest of the N latest unanswered questions to dev@
>>> (or to user@? or both?) every 3-4 days should be a better option.
>>>
>>> We could do this too. Ideally a list of all recent questions (answered
>>> or not, answers can be improved and voted on) if that's technically
>>> feasible.
>>>
>>> IIRC, the current rate is around 10-20 questions a week, so not huge,
>>> but still non-trivial.
>>>
>>
>> Yeah, ideally we can do both.
>>
>> (1) Individual questions at a high high frequency to
>> stackoverf...@beam.apache.org so that folks who are interested in
>> answering such questions (and subscribed to the list) get notified early.
>>
>
> As an anecdotal data point: Google as a company has an internal Q&A site
> and creating a similar mailing list had a noticeable impact on the number
> of questions answered. It was a good way to include the broader community
> as well. That said, I do not know if this would similarly apply to stack
> overflow or not.
>
>
>>
>> (2) Aggregated digests at a low frequency to the dev@ and user@ so that
>> we get more eyeballs into questions that go unanswered. Also this will be a
>> good way to get more upvotes for the correct answers.
>>
>> Thanks,
>> Cham
>>
>>
>>>
>>> > On 17 Aug 2022, at 04:42, Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > It seems like many of the questions posted to StackOverflow with the
>>> apache-beam tag [1] go unanswered or take more than they should to receive
>>> an acceptable answer.
>>> >
>>> > What do you all think about creating a new mailing list,
>>> stackoverf...@beam.apache.org (assuming Apache Infra is OK with this
>>> and it's technically feasible to do so), where notifications regarding such
>>> questions will be forwarded to ?
>>> >
>>> > This should allow folks who are interested in answering related
>>> questions to get notified early. Hopefully getting more eyeballs on these
>>> questions will increase the response rate (and the quality of the answers).
>>> >
>>> > Another option might be to post such notifications (or aggregations)
>>> to dev@ but this might unnecessarily spam all members of the dev list.
>>> >
>>> > Thanks,
>>> > Cham
>>> >
>>> > [1] https://stackoverflow.com/questions/tagged/apache-beam
>>> >
>>> >
>>>
>>


Re: Incomplete Beam Schema -> Avro Schema conversion

2022-08-22 Thread Brian Hulette via dev
I don't think there's a reason for this, it's just that these logical types
were defined after the Avro <-> Beam schema conversion. I think it would be
worthwhile to add support for them, but we'd also need to look at the
reverse (avro to beam) direction, which would map back to the catch-all
DATETIME primitive type [1]. Changing that could break backwards
compatibility.

[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L771-L776

On Wed, Aug 17, 2022 at 2:53 PM Balázs Németh  wrote:

> java.lang.RuntimeException: Unhandled logical type
> beam:logical_type:date:v1
>   at
> org.apache.beam.sdk.schemas.utils.AvroUtils.getFieldSchema(AvroUtils.java:943)
>   at
> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroField(AvroUtils.java:306)
>   at
> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java:341)
>   at
> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java
>
> In
> https://github.com/apache/beam/blob/7bb755906c350d77ba175e1bd990096fbeaf8e44/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L902-L944
> it seems to me there are some missing options.
>
> For example
> - FixedBytes.IDENTIFIER,
> - EnumerationType.IDENTIFIER,
> - OneOfType.IDENTIFIER
> is there, but:
> - org.apache.beam.sdk.schemas.logicaltypes.Date.IDENTIFIER
> ("beam:logical_type:date:v1")
> - org.apache.beam.sdk.schemas.logicaltypes.DateTime.IDENTIFIER
> ("beam:logical_type:datetime:v1")
> - org.apache.beam.sdk.schemas.logicaltypes.Time.IDENTIFIER
> ("beam:logical_type:time:v1")
> is missing.
>
> This in an example that fails:
>
>> import java.time.LocalDate;
>> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
>> import org.apache.beam.sdk.schemas.Schema;
>> import org.apache.beam.sdk.schemas.Schema.FieldType;
>> import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes;
>> import org.apache.beam.sdk.schemas.utils.AvroUtils;
>> import org.apache.beam.sdk.values.Row;
>
> // ...
>
> final Schema schema =
>> Schema.builder()
>> .addField("ymd",
>> FieldType.logicalType(SqlTypes.DATE))
>> .build();
>>
>> final Row row =
>> Row.withSchema(schema)
>> .withFieldValue("ymd", LocalDate.now())
>> .build();
>>
>> System.out.println(BigQueryUtils.toTableSchema(schema)); // works
>> System.out.println(BigQueryUtils.toTableRow(row)); // works
>>
>> System.out.println(AvroUtils.toAvroSchema(schema)); // fails
>> System.out.println(AvroUtils.toGenericRecord(row)); // fails
>
>
> Am I missing a reason for that or is it just not done properly yet? If
> this is the case, am I right to assume that they should be represented in
> the Avro format as the already existing cases?
> "beam:logical_type:date:v1" vs "DATE"
> "beam:logical_type:time:v1" vs "TIME"
>
>
>


Re: Beam Website Feedback

2022-08-22 Thread Ahmet Altay via dev
Thank you Tianzi.

Adding relevant folks: @John Casey 

On Mon, Aug 22, 2022 at 3:03 PM Tianzi Cai via dev 
wrote:

> According to this page,
> https://beam.apache.org/documentation/io/connectors/, KinesisIO doesn't
> have Python implementation, but I found
> https://beam.apache.org/releases/pydoc/current/apache_beam.io.kinesis.html#apache_beam.io.kinesis.ReadDataFromKinesis
> .
>
> The page is probably outdated.
>


Beam Website Feedback

2022-08-22 Thread Tianzi Cai via dev
According to this page, https://beam.apache.org/documentation/io/connectors/,
KinesisIO doesn't have Python implementation, but I found
https://beam.apache.org/releases/pydoc/current/apache_beam.io.kinesis.html#apache_beam.io.kinesis.ReadDataFromKinesis
.

The page is probably outdated.


Beam High Priority Issue Report (73)

2022-08-22 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/22808 [Bug]: Dataflow job stuckness 
related to big query storage api
https://github.com/apache/beam/issues/22779 [Bug]: SpannerIO.readChangeStream() 
stops forwarding change records and starts continuously throwing (large number) 
of Operation ongoing errors 
https://github.com/apache/beam/issues/22773 [Bug]: ElasticsearchIO.Write fails 
when calling outputWithTimestamp()
https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update 
causes Invisible parameter type error
https://github.com/apache/beam/issues/22743 [Bug]: Test flake: 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits
https://github.com/apache/beam/issues/22642 [Bug]: Dataflow fails to drain a 
job when using BigQuery (java sdk v.2.38)
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow 
SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner 
test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in 
beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam 
PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being 
accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked 
Dataflow Pipeline 
https://github.com/apache/beam/issues/21262 Python AfterAny, A