Re: Question about unbounded in-memory PCollection

2019-05-07 Thread Rui Wang
Does TestStream.java [1] satisfy your need?
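For example, here is a minimal sketch of turning in-memory elements into an
unbounded stream with TestStream (the element values and timings are made up
for illustration):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.testing.TestStream;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Build an unbounded source from in-memory elements, controlling the
// watermark and processing time between batches.
TestStream<String> events =
    TestStream.create(StringUtf8Coder.of())
        .addElements("a", "b")                    // emit two elements
        .advanceWatermarkTo(new Instant(10000L))  // move the watermark
        .addElements("c")
        .advanceProcessingTime(Duration.standardMinutes(1))
        .advanceWatermarkToInfinity();            // then close the stream

Pipeline p = Pipeline.create();
PCollection<String> input = p.apply(events);      // behaves as unbounded

TestStream is supported by the DirectRunner, so it works well for testing
unbounded behavior locally.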



-Rui

[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestStream.java

On Tue, May 7, 2019 at 2:47 PM Chengzhi Zhao 
wrote:

> Hi Beam Team,
>
> I am new here and have been studying the programming guide recently. I
> have a question about in-memory data:
> https://beam.apache.org/documentation/programming-guide/#creating-a-pcollection
>
> Is there a way to create an unbounded PCollection from an in-memory
> collection? I want to test an unbounded PCollection locally and don't know
> the easiest way to get one. Please let me know if I am doing something
> wrong or whether I should use a file system to do it.
>
> Thanks in advance!
>
> Best,
> Chengzhi
>


Re: Question about unbounded in-memory PCollection

2019-05-07 Thread Chengzhi Zhao
Hi Beam Team,

I am new here and have been studying the programming guide recently. I have
a question about in-memory data:
https://beam.apache.org/documentation/programming-guide/#creating-a-pcollection

Is there a way to create an unbounded PCollection from an in-memory
collection? I want to test an unbounded PCollection locally and don't know
the easiest way to get one. Please let me know if I am doing something wrong
or whether I should use a file system to do it.

Thanks in advance!

Best,
Chengzhi



Re: Streaming inserts BQ with Java SDK Beam

2019-05-07 Thread Andres Angel
Pablo, thanks so much. I will explore this method for STREAMING_INSERTS
then; this answer is worth a lot to us :)

Thanks.

On Tue, May 7, 2019 at 1:05 PM Pablo Estrada  wrote:

> Hi Andres!
> You can definitely do streaming inserts using the Java SDK. This is
> available with BigQueryIO.write(). Specifically, you can use the
> `withMethod`[1] call to specify whether you want batch loads or streaming
> inserts. If you specify streaming inserts, Beam should insert rows as they
> come in bundles.
> Hope that helps
> -P.
>
> [1]
> https://beam.apache.org/releases/javadoc/2.11.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withMethod-org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method-
>
> On Tue, May 7, 2019 at 9:58 AM Andres Angel <
> ingenieroandresan...@gmail.com> wrote:
>
>> Hello everyone,
>>
>> I need to use BigQuery inserts within my Beam pipeline. I know the
>> built-in IO options offer `BigQueryIO`, but this inserts into BQ in a
>> batch fashion, creating a BQ load job underneath. I instead need to
>> trigger streaming inserts into BQ, and from reviewing the Java SDK
>> documentation it seems like this is not possible.
>>
>> On the other hand, with the Python SDK I found GitHub code where they use
>> a method *InsertAll*, which is apparently what I need. If this is
>> official, I would like to know whether there is a native way to trigger
>> streaming inserts into BQ using the Java SDK.
>>
>> thanks so much for your feedback
>> AU
>>
>


Re: Streaming inserts BQ with Java SDK Beam

2019-05-07 Thread Alex Van Boxel
I think you really need a peculiar reason to force streaming inserts in a
batch job. Note that you will quickly hit the quota limit in batch mode:
"Maximum rows per second: 100,000 rows per second, per project", whereas
batch load can process a lot more information in a shorter time.

I know you can force batch loads in streaming mode; I don't know about the
other way around.
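For reference, forcing batch loads from a streaming pipeline looks roughly
like the sketch below (the table spec, schema, and frequency are
illustrative, not from this thread):

import org.joda.time.Duration;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

rows.apply(BigQueryIO.writeTableRows()
    .to("my-project:my_dataset.my_table")        // illustrative table spec
    .withSchema(schema)                          // your TableSchema
    .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
    // In streaming, FILE_LOADS needs a triggering frequency so Beam knows
    // how often to kick off a load job, plus an explicit shard count.
    .withTriggeringFrequency(Duration.standardMinutes(5))
    .withNumFileShards(1));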

_/
_/ Alex Van Boxel


On Tue, May 7, 2019 at 6:58 PM Andres Angel 
wrote:

> Hello everyone,
>
> I need to use BigQuery inserts within my Beam pipeline. I know the
> built-in IO options offer `BigQueryIO`, but this inserts into BQ in a
> batch fashion, creating a BQ load job underneath. I instead need to
> trigger streaming inserts into BQ, and from reviewing the Java SDK
> documentation it seems like this is not possible.
>
> On the other hand, with the Python SDK I found GitHub code where they use
> a method *InsertAll*, which is apparently what I need. If this is
> official, I would like to know whether there is a native way to trigger
> streaming inserts into BQ using the Java SDK.
>
> thanks so much for your feedback
> AU
>


Re: Is it safe to cache the value of a singleton view (with a global window) in a DoFn?

2019-05-07 Thread Lukasz Cwik
Keep your code simple and rely on the runner caching the value locally, so
it should be very cheap to access. If you hit a performance issue due to a
runner lacking caching, it would be good to hear about it so we can file a
JIRA.
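As a sketch of that pattern (the names here are hypothetical, not from the
pipeline below): read the side input on every element instead of caching it
yourself.

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollectionView;

// Read the singleton view on every call; the runner caches the value
// locally, so no hand-rolled cache is needed inside the DoFn.
class UsesSingletonFn extends DoFn<String, String> {
  private final PCollectionView<Long> startMillisView;

  UsesSingletonFn(PCollectionView<Long> startMillisView) {
    this.startMillisView = startMillisView;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    Long startMillis = c.sideInput(startMillisView);  // cheap local lookup
    c.output(c.element() + "@" + startMillis);
  }
}

// Wiring: the view must be declared on the ParDo, e.g.
// input.apply(ParDo.of(new UsesSingletonFn(view)).withSideInputs(view));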

On Mon, May 6, 2019 at 4:24 PM Kenneth Knowles  wrote:

> A singleton view in the global window and no triggering does have just a
> single immutable value. (It really ought to have an updated value in the
> presence of triggers, but I believe instead you will receive a crash. I
> haven't tested.)
>
> In general, a side input yields one value per window. Dataflow in batch
> will already do what you describe, but works with all windows. Dataflow in
> streaming has some caching but if you see a problem that is interesting
> information.
>
> Kenn
>
> On Sat, May 4, 2019 at 9:19 AM Steve Niemitz  wrote:
>
>> I have a singleton view in a global window that is read from a DoFn. I'm
>> curious whether it's "correct" to cache that value from the view, or if I
>> need to read it every time.
>>
>> As a (simplified) example, if I were to generate the view as such:
>>
>> input.getPipeline
>>   .apply(Create.of(Collections.singleton[Void](null)))
>>   .apply(MapElements.via(new SimpleFunction[Void, JLong]() {
>> override def apply(input: Void): JLong = {
>>   Instant.now().getMillis
>> }
>>   })).apply(View.asSingleton[JLong]())
>>
>> and then read it from a DoFn (using context.sideInput), is it guaranteed
>> that:
>> - every instance of the DoFn will read the same value?
>> - The value will never change?
>>
>> If so it seems like it'd be safe to cache the value inside the DoFn.  It
>> seems like this would be the case, but I've also seen cases in dataflow
>> where the UI indicates that the MapElements step above produced more than
>> one element, so I'm curious what people have to say.
>>
>> Thanks!
>>
>


Re: Streaming inserts BQ with Java SDK Beam

2019-05-07 Thread Pablo Estrada
Hi Andres!
You can definitely do streaming inserts using the Java SDK. This is
available with BigQueryIO.write(). Specifically, you can use the
`withMethod`[1] call to specify whether you want batch loads or streaming
inserts. If you specify streaming inserts, Beam should insert rows as they
come in bundles.
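A minimal sketch of that (the table spec and schema are illustrative):

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

rows.apply(BigQueryIO.writeTableRows()
    .to("my-project:my_dataset.my_table")  // illustrative table spec
    .withSchema(schema)                    // your TableSchema
    .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));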
Hope that helps
-P.

[1]
https://beam.apache.org/releases/javadoc/2.11.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withMethod-org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method-

On Tue, May 7, 2019 at 9:58 AM Andres Angel 
wrote:

> Hello everyone,
>
> I need to use BigQuery inserts within my Beam pipeline. I know the
> built-in IO options offer `BigQueryIO`, but this inserts into BQ in a
> batch fashion, creating a BQ load job underneath. I instead need to
> trigger streaming inserts into BQ, and from reviewing the Java SDK
> documentation it seems like this is not possible.
>
> On the other hand, with the Python SDK I found GitHub code where they use
> a method *InsertAll*, which is apparently what I need. If this is
> official, I would like to know whether there is a native way to trigger
> streaming inserts into BQ using the Java SDK.
>
> thanks so much for your feedback
> AU
>


Streaming inserts BQ with Java SDK Beam

2019-05-07 Thread Andres Angel
Hello everyone,

I need to use BigQuery inserts within my Beam pipeline. I know the built-in
IO options offer `BigQueryIO`, but this inserts into BQ in a batch fashion,
creating a BQ load job underneath. I instead need to trigger streaming
inserts into BQ, and from reviewing the Java SDK documentation it seems like
this is not possible.

On the other hand, with the Python SDK I found GitHub code where they use a
method *InsertAll*, which is apparently what I need. If this is official, I
would like to know whether there is a native way to trigger streaming
inserts into BQ using the Java SDK.

thanks so much for your feedback
AU


Re: Apache BEAM on Flink in production

2019-05-07 Thread Austin Bennett
On the Beam YouTube channel
(https://www.youtube.com/channel/UChNnb_YO_7B0HlW6FhAXZZQ) you can see two
talks from people at Lyft; they run Beam on Flink.

Other users can also chime in about how they are running it.

I would also suggest coming to BeamSummit.org in Berlin in June and/or
sharing your experiences, or coming to ApacheCon in September, where we will
have two tracks on each of two days focused on Beam:
https://www.apachecon.com/acna19/index.html




On Tue, May 7, 2019 at 6:52 AM  wrote:

> Hi all,
>
>
>
> We currently run Apache Flink-based data load processes (fairly simple
> streaming ETL jobs) and are looking at converting to Apache Beam to give
> more flexibility on the runner.
>
>
>
> Is anyone aware of any organisations running Apache Beam on Flink in
> production? Does anyone have any case studies they would be able to share?
>
>
>
> Many thanks,
>
>
>
> Steve
>


Apache BEAM on Flink in production

2019-05-07 Thread Stephen.Hesketh
Hi all,

We currently run Apache Flink-based data load processes (fairly simple
streaming ETL jobs) and are looking at converting to Apache Beam to give
more flexibility on the runner.

Is anyone aware of any organisations running Apache Beam on Flink in
production? Does anyone have any case studies they would be able to share?

Many thanks,

Steve

