Final Reminder: Community Over Code call for presentations closing soon

2023-06-28 Thread Rich Bowen
[Note: You're receiving this email because you are subscribed to one or
more project dev@ mailing lists at the Apache Software Foundation.]

This is your final reminder that the Call for Presentations for
Community Over Code (formerly known as ApacheCon) is closing soon - on
Thursday, 13 July 2023 at 23:59:59 GMT.

https://communityovercode.org/call-for-presentations/

We are looking for talk proposals on all topics related to ASF projects
and open source software.

The event will be held in Halifax, Nova Scotia, October 7th through
10th. More details about the event may be found on the event website at
https://communityovercode.org/

Rich, for the event planners


Re: [Launch Announcement] Beam Quest

2023-06-28 Thread Svetak Sundhar via dev
Hi,

Could you elaborate on your issue here? Are you running into an error?

Could you try the following steps?

1) Click on the free access code here
2) Create an account
3) Navigate to "Getting Started with Apache Beam"
4) Launch the lab with credits.


Let me know if you have any questions or run into any issues,


Svetak Sundhar

  Data Engineer
svetaksund...@google.com



On Wed, Jun 28, 2023 at 2:15 AM Komi Marc ATSOU 
wrote:

> Is the free access code working for anyone? It is not working for
> me.
>
> On Fri, Jun 9, 2023 at 19:12, Ahmet Altay via user
> wrote:
>
>> Thank you Svetak! I would encourage everyone to try it out and get the
>> badges :)
>>
>> On Fri, Jun 9, 2023 at 7:03 AM Svetak Sundhar via user <
>> u...@beam.apache.org> wrote:
>>
>>> Hi Beam Community,
>>>
>>> We're excited to launch the "Getting Started with Apache Beam" Quest.
>>> This quest provides a completion badge that can be shared on social
>>> media (such as LinkedIn and Twitter) upon completion of four Qwiklabs.
>>>
>>> These labs cover various Beam concepts in the Java and Python SDKs
>>> (which many of you have developed), and the badge should take less than
>>> 7 hours to obtain. I've written about it in our Beam Blog; we are
>>> offering this free of charge until July 8.
>>>
>>> Please share the information with whomever you think may be interested,
>>> and please share on social media once you obtain your badge. Additionally,
>>> if you have any feedback on the labs, please contact me directly at
>>> svetaksund...@google.com-- we plan to have these labs evolve over time!
>>>
>>> I look forward to discussing this more at Beam Summit next week.
>>>
>>> As this was one of GCP's first OSS quests, there were many people
>>> instrumental in making this possible.
>>>
>>> Thanks to:
>>> -Danielle Syse
>>> -Ajay Hemnani
>>> -Joellen Saunders
>>> -Grzegorz Wierzchows
>>> -Ahmet Altay
>>> -XQ Hu
>>> -Jenny Palomino
>>> -Svetak Sundhar
>>> -Shunping Huang
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Svetak Sundhar
>>>
>>>   Data Engineer
>>> svetaksund...@google.com
>>>
>>>
>
> --
>
> *ATSOU Komi Marc*
> *Cel : +33 7 58 92 30 35*
> *Skype : bi-ayefo*
>


ETL: SQL to Avro type handling issues

2023-06-28 Thread Mikhail Khludnev
Hello,
I'm repeating Slack messages here.
I'm experimenting with ingesting JDBC data into Parquet, i.e. reproducing
spotify/dbeam with

   - JdbcIO.readRows()
   - AvroUtils.getAvroSchema(beamRows.getSchema())
   - AvroUtils.schemaCoder(avroSchema)
   - AvroUtils.getRowToGenericRecordFunction(avroSchema)

Here are the observed issues:
- DECIMAL(21,2) can't be handled because the scale parameter (2) is lost.
org.apache.avro.Conversions.DecimalConversion.validate() throws
AvroTypeException("Cannot encode decimal with scale 2 as scale 0 without rounding")

   - It can be fixed by changing the Beam Row schema field to
   FieldType.logicalType(FixedPrecisionNumeric.of(Integer.MAX_VALUE, 2)),
   which should then be passed to the Avro schema as
   LogicalTypes.decimal(Integer.MAX_VALUE, ((RowWithStorage)
   (field.getType().getLogicalType()).getArgument()).getValue("scale"))
   .addToSchema(Schema.create(Schema.Type.BYTES))
   (this might not be the best approach). I noticed
   https://github.com/apache/beam/issues/21226 and
   https://github.com/apache/beam/issues/20978, which might be related.
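
To make the failure mode concrete, here is a minimal stdlib-only sketch of a
scale check of the kind the error message suggests. It is not Avro's code:
the class DecimalScaleSketch and the method toScale are hypothetical names,
and the strict "value scale must not exceed schema scale" rule is an
assumption inferred from the message text.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Stdlib-only sketch of a decimal scale check (hypothetical, not Avro's code). */
final class DecimalScaleSketch {

    /** Rescales value to targetScale, failing if digits would have to be dropped. */
    static BigDecimal toScale(BigDecimal value, int targetScale) {
        if (value.scale() > targetScale) {
            throw new IllegalArgumentException(
                "Cannot encode decimal with scale " + value.scale()
                    + " as scale " + targetScale + " without rounding");
        }
        // Widening the scale only appends zeros, so no rounding is needed.
        return value.setScale(targetScale, RoundingMode.UNNECESSARY);
    }

    public static void main(String[] args) {
        BigDecimal amount = new BigDecimal("12345.67"); // scale 2, as in DECIMAL(21,2)

        // A schema that kept the scale: the value passes through unchanged.
        System.out.println(toScale(amount, 2));

        // A schema that lost the scale (so it is 0): the check rejects the value.
        try {
            toScale(amount, 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that a scale-2 value cannot fit a scale-0
schema without dropping digits, so the scale has to survive the conversion
from the Beam schema to the Avro schema.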

 - INT16 is represented in the Beam schema as-is, but it is a 32-bit INT in
Avro and a Java Short at runtime, which causes
ClassCastException: class java.lang.Short cannot be cast to class
java.lang.Integer (java.lang.Short and java.lang.Integer are in module
java.base of loader 'bootstrap')
at 
org.apache.beam.sdk.extensions.avro.schemas.utils.AvroUtils.convertAvroFieldStrict(AvroUtils.java:1299)

I suppose this method could accept a Number and then call intValue(). WDYT?
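
A minimal sketch of that idea, using only the JDK. The class
NumberWideningSketch and the helper toInt are hypothetical names for
illustration; this is not the actual AvroUtils code, just the cast problem
and the Number-based workaround in isolation.

```java
/** Sketch of widening a boxed integral value to int via Number (hypothetical helper). */
final class NumberWideningSketch {

    /** Hypothetical helper: accepts Short, Byte or Integer and returns an int. */
    static int toInt(Object value) {
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        throw new IllegalArgumentException("Expected a Number but got " + value);
    }

    public static void main(String[] args) {
        // An INT16 field arrives at runtime as java.lang.Short.
        Object fromRow = Short.valueOf((short) 42);

        try {
            Integer broken = (Integer) fromRow; // direct cast fails for a Short
            System.out.println(broken);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed");
        }

        // Going through Number works for any boxed integral type.
        System.out.println(toInt(fromRow));
    }
}
```

One caveat with this approach: intValue() silently truncates a Long or a
Double, so a real fix would probably restrict it to the types that are known
to fit into an int.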

-- 
Sincerely yours
Mikhail Khludnev


Beam High Priority Issue Report (35)

2023-06-28 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit 
is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26547 [Failing Test]: 
beam_PostCommit_Java_DataflowV2
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/26902 [Bug]: Images built not saved in 
the local image store
https://github.com/apache/beam/issues/26723 [Failing Test]: Tour of Beam 
Frontend Test suite is perma-red on master
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey