Re: Review request for 2 Python BigQuery IO related PRs

2021-03-03 Thread Pablo Estrada
Thanks! Yes, I'll take a look!



Re: Review request for 2 Python BigQuery IO related PRs

2021-03-03 Thread Ahmet Altay
+Pablo Estrada 



Review request for 2 Python BigQuery IO related PRs

2021-03-03 Thread Chuck Yang
Hi Beam devs,

Could I have reviews on the following two PRs?

* [BEAM-11884] Pass destination as str rather than TableReference
within BigQueryBatchFileLoads #14112 [1]
* [BEAM-11277] Respect schemaUpdateOptions during BigQuery load with
temporary tables #14113 [2]

The first PR fixes a bug I encountered in master around serialization
of BigQuery table references.

The second PR addresses a bug in which BigQuery schema update options
are not respected during WriteToBigQuery with temporary tables.
Ideally, BigQuery copy jobs that append data to a table could be configured
with schema update options; since that's not possible, I hope the PR
provides an acceptable workaround.

I'd like to get these in before the release cut next week if possible.

Thanks!
Chuck

[1] https://github.com/apache/beam/pull/14112
[2] https://github.com/apache/beam/pull/14113

-- 


*Confidentiality Note:* We care about protecting our proprietary 
information, confidential material, and trade secrets. This message may 
contain some or all of those things. Cruise will suffer material harm if 
anyone other than the intended recipient disseminates or takes any action 
based on this message. If you have received this message (including any 
attachments) in error, please delete it immediately and notify the sender 
promptly.


Re: [PROPOSAL] Preparing for Beam 2.29.0 release

2021-03-03 Thread Ahmet Altay
+1 and thank you!



[PROPOSAL] Preparing for Beam 2.29.0 release

2021-03-03 Thread Kenneth Knowles
Hi All,

Beam 2.29.0 release is scheduled to be cut in a week, on March 10 according
to the release calendar [1].

I'd like to volunteer myself to be the release manager for this release. I
plan on cutting the release branch on the scheduled date.

Any comments or objections?

Kenn

[1]
https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles


Re: Random outputs for ARRAY_CONCAT_AGG fn zetasql

2021-03-03 Thread Kyle Weaver
You will be able to access the actual value inside the function you pass to
satisfies. I just meant that you will have to go through a function call.
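The sort-and-compare idea raised earlier in the thread can be sketched in plain Java (outside Beam, so this is only illustrative; inside `satisfies`, the value returned by `row.getArray("array_field")` would play the role of `actual`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AnyOrderCheck {
  // Order-insensitive comparison: sort copies of both lists and compare.
  static boolean sameElements(List<Long> actual, List<Long> expected) {
    List<Long> a = new ArrayList<>(actual);
    List<Long> e = new ArrayList<>(expected);
    Collections.sort(a);
    Collections.sort(e);
    return a.equals(e);
  }

  public static void main(String[] args) {
    List<Long> expected = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L);
    // Two of the orderings observed in the failing test; both should pass.
    System.out.println(sameElements(Arrays.asList(5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L), expected)); // true
    System.out.println(sameElements(Arrays.asList(7L, 8L, 9L, 5L, 6L, 1L, 2L, 3L, 4L), expected)); // true
  }
}
```

Hamcrest's `containsInAnyOrder` performs the same kind of check, so the body of the function passed to `satisfies` can use either approach.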



Re: Random outputs for ARRAY_CONCAT_AGG fn zetasql

2021-03-03 Thread Sonam Ramchand
Yeah, that sounds right. But the only thing that confuses me is:

PAssert.that(stream).satisfies(row ->
    assertThat("array_agg_concat_field", actual,
        containsInAnyOrder(Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L))));

How come I can access *actual* here when the output is not materialized?

On Tue, Mar 2, 2021 at 10:49 PM Kyle Weaver  wrote:

> As you can see from existing tests, Beam doesn't materialize the output
> array directly. Instead you must use the PAssert API. I agree with Tyson's
> suggestion to use `satisfies`, which lets you do arbitrary assertions on
> the output data.
>
> On Tue, Mar 2, 2021 at 3:57 AM Sonam Ramchand <
> sonam.ramch...@venturedive.com> wrote:
>
>> Is there any way I can access the output array resulting from the sql
>> query? Then maybe I can sort and compare both *output array* and *expected
>> output array *for the test to pass.
>>
>> On Tue, Mar 2, 2021 at 12:24 AM Kenneth Knowles  wrote:
>>
>>> Yea, the reason is that SQL relations are not ordered. So any ordering
>>> of [1, 2, 3, 4] and [5, 6] and [7, 8, 9] is possible and correct.
>>>
>>> Kenn
>>>
>>> On Mon, Mar 1, 2021 at 11:01 AM Tyson Hamilton 
>>> wrote:
>>>
 I didn't find anything like that after a brief look. What you could do
 instead is something like:

 PAssert.thatSingleton(stream).satisfies( row -> assertThat("array_field
 containsInAnyOrder", row.getArray("array_field"),
 containsInAnyOrder(Arrays.asList(...))));

 using junit/hamcrest matchers. I didn't verify this works myself but it
 should give you an idea for some next steps.


 On Mon, Mar 1, 2021 at 12:37 AM Sonam Ramchand <
 sonam.ramch...@venturedive.com> wrote:

> Hi Devs,
> I have implemented the ARRAY_CONCAT_AGG function for the ZetaSQL dialect.
> I am trying to validate it with the following test:
>
> @Test
> public void testArrayConcatAggZetasql() {
>   String sql =
>       "SELECT ARRAY_CONCAT_AGG(x) AS array_concat_agg FROM (SELECT [1, 2, 3, 4] AS x UNION ALL SELECT [5, 6] UNION ALL SELECT [7, 8, 9])";
>
>   ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
>   BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
>   PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);
>
>   Schema schema = Schema.builder().addArrayField("array_field", FieldType.INT64).build();
>   PAssert.that(stream)
>       .containsInAnyOrder(
>           Row.withSchema(schema).addArray(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L).build());
>   pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
> }
>
> Expected Output is: 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L.
> But I am getting randomly different outputs:
> 1. 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L
> 2. 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L
> 3. 7L, 8L, 9L, 5L, 6L, 1L, 2L, 3L, 4L
>
> As per my understanding, this is because of the containsInAnyOrder function.
> Is there anything like:
>
>    PAssert.that(stream)
>        .containsAnyOfThem(
>            Row.withSchema(schema).addArray(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L).build(),
>            Row.withSchema(schema).addArray(5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L).build(),
>            Row.withSchema(schema).addArray(7L, 8L, 9L, 5L, 6L, 1L, 2L, 3L, 4L).build());
>
> I would really appreciate it if anyone could help me understand how to
> handle such a scenario in Beam.
>
> Thanks!


-- 

Regards,
*Sonam*
Software Engineer
Mobile: +92 3088337296