Re: Apache Dataflow Template (Python)

2020-04-04 Thread Marco Mistroni
Hey,
Sure... it's a crap script :) just an ordinary Dataflow script:

https://github.com/mmistroni/GCP_Experiments/tree/master/dataflow/edgar_flow


What I meant to say, for your template question, is that you can write a
basic script which runs on Beam, something as simple as this:

https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/beam_test.py

and then you can create a template out of it by just running this:

python -m edgar_main --runner=dataflow --project=datascience-projets \
  --template_location=gs://mm_dataflow_bucket/templates/edgar_dataflow_template \
  --temp_location=gs://mm_dataflow_bucket/temp \
  --staging_location=gs://mm_dataflow_bucket/staging

That will create a template 'edgar_dataflow_template' which you can use in
the GCP Dataflow console to create your job.
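
For reference, a minimal pipeline of the kind described above might look
like the sketch below. It is not the edgar_main or beam_test.py script from
the links; the --greeting option, the function names, and the sample
elements are made up for illustration. It assumes the apache_beam package is
installed, and it leaves Beam-specific flags such as --runner and
--template_location to PipelineOptions.

```python
import argparse


def parse_args(argv=None):
    """Split script-specific options from the flags meant for Beam."""
    parser = argparse.ArgumentParser()
    # Hypothetical option, only here to show where custom flags go.
    parser.add_argument("--greeting", default="hello")
    # Unrecognised flags (--runner, --template_location, ...) come back
    # untouched in the second element of the returned tuple.
    return parser.parse_known_args(argv)


def run(argv=None):
    # Imported here so the argument handling above can be exercised
    # without Beam installed; assumes `pip install apache-beam[gcp]`.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    known_args, beam_args = parse_args(argv)
    options = PipelineOptions(beam_args)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create([known_args.greeting, "world"])
            | "Upper" >> beam.Map(lambda word: word.upper())
            | "Print" >> beam.Map(print)
        )


# To use this as a script, call run() under the usual
# `if __name__ == "__main__":` guard at the bottom of the file.
```

Invoked with the same flags as the command above, a script like this should
stage a template at --template_location instead of running the job. Without
those flags it should run locally on the default runner.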

Hth. I'm sort of a noob to Beam, having started writing code just over a
month ago. Feel free to ping me if you get stuck.

Kind regards,
 Marco

On Sat, Apr 4, 2020 at 6:01 PM Xander Song wrote:

> Hi Marco,
>
> Thanks for your response. Would you mind sending the edgar_main script so
> I can take a look?
>
> On Sat, Apr 4, 2020 at 2:25 AM Marco Mistroni wrote:
>
>> Hey
>>  As far as I know you can generate a dataflow template out of your beam
>> code by specifying an option on command line?
>> I am running this CMD and once template is generated I kick off a dflow
>> job via console by pointing at it
>>
>> python -m edgar_main --runner=dataflow --project=datascience-projets
>> --template_location=gs://
>>
>> Hth
>>
>>
>> On Sat, Apr 4, 2020, 9:52 AM Xander Song wrote:
>>
>>> I am attempting to write a custom Dataflow Template using the Apache
>>> Beam Python SDK, but am finding the documentation difficult to follow. Does
>>> anyone have a minimal working example of how to write and deploy such a
>>> template?
>>>
>>> Thanks in advance.
>>>
>>


Re: Apache Dataflow Template (Python)

2020-04-04 Thread Xander Song
Hi Marco,

Thanks for your response. Would you mind sending the edgar_main script so I
can take a look?

On Sat, Apr 4, 2020 at 2:25 AM Marco Mistroni wrote:

> Hey
>  As far as I know you can generate a dataflow template out of your beam
> code by specifying an option on command line?
> I am running this CMD and once template is generated I kick off a dflow
> job via console by pointing at it
>
> python -m edgar_main --runner=dataflow --project=datascience-projets
> --template_location=gs://
>
> Hth
>
>
> On Sat, Apr 4, 2020, 9:52 AM Xander Song wrote:
>
>> I am attempting to write a custom Dataflow Template using the Apache Beam
>> Python SDK, but am finding the documentation difficult to follow. Does
>> anyone have a minimal working example of how to write and deploy such a
>> template?
>>
>> Thanks in advance.
>>
>


Re: Apache Dataflow Template (Python)

2020-04-04 Thread Marco Mistroni
Hey,
As far as I know, you can generate a Dataflow template out of your Beam
code by specifying an option on the command line.
I am running this command, and once the template is generated I kick off a
Dataflow job via the console by pointing at it:

python -m edgar_main --runner=dataflow --project=datascience-projets
--template_location=gs://

Hth


On Sat, Apr 4, 2020, 9:52 AM Xander Song wrote:

> I am attempting to write a custom Dataflow Template using the Apache Beam
> Python SDK, but am finding the documentation difficult to follow. Does
> anyone have a minimal working example of how to write and deploy such a
> template?
>
> Thanks in advance.
>


Apache Dataflow Template (Python)

2020-04-04 Thread Xander Song
I am attempting to write a custom Dataflow Template using the Apache Beam
Python SDK, but am finding the documentation difficult to follow. Does
anyone have a minimal working example of how to write and deploy such a
template?

Thanks in advance.