Re: Apache Dataflow Template (Python)/ partially OT

2020-04-08 Thread Luke Cwik
Reach out to Google Cloud support.
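In the meantime, stray worker VMs usually belong to a Dataflow job that never reached a terminal state, so it is worth cancelling the jobs themselves rather than deleting the VMs directly (Dataflow tears its workers down once a job terminates). A hedged sketch with the gcloud CLI; the project, region, and job ID are placeholders:

```shell
# List Dataflow jobs that are still running (region is an assumption).
gcloud dataflow jobs list --project=my-project --region=us-central1 --status=active

# Cancelling a job tears down its worker VMs; JOB_ID comes from the list above.
gcloud dataflow jobs cancel JOB_ID --project=my-project --region=us-central1
```

These commands require an authenticated gcloud session against the project that owns the jobs.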

On Wed, Apr 8, 2020 at 1:51 AM Marco Mistroni  wrote:

> Hi all
>  Was wondering if anyone has experienced something similar.
> I kicked off 3 dataflow templates via a cloud function. It created 3 VMs
> which are still alive after the jobs completed, and I cannot delete them.
> Could anyone assist with this?
> Kind regards
>
> On Mon, Apr 6, 2020, 3:00 PM Marco Mistroni  wrote:
>
>> Hey
>>  Thanks, I created the template from the command line... was having issues
>> with the cloud function but I think I was not using auth correctly
>> Will try your sample and report back if I am stuck
>> Thanks a lot!
>>
>> On Mon, Apr 6, 2020, 2:20 PM André Rocha Silva <
>> a.si...@portaltelemedicina.com.br> wrote:
>>
>>> Were you able to create the template already?
>>>
>>> Have you read the article? In it I write the cloud function in JS. Here
>>> is an example of a cloud function in Python:
>>>
>>> import google.auth
>>> import random
>>> import logging
>>>
>>> from googleapiclient.discovery import build
>>>
>>> GCLOUD_PROJECT = 'project-id-123'
>>>
>>>
>>> def RunDataflow(event, context):
>>>
>>>     credentials, _ = google.auth.default()
>>>
>>>     service = build('dataflow', 'v1b3', credentials=credentials)
>>>
>>>     uri = 'gs://bucket/input/file'
>>>     output_file = 'gs://bucket/output/file'
>>>
>>>     template_path = 'gs://bucket/Dataflow_templates/template'
>>>     template_body = {
>>>         'jobName': 'cf-job-' + str(random.randint(1, 101000)),
>>>         'parameters': {
>>>             'input_file': uri,
>>>             'output_file': output_file,
>>>         },
>>>     }
>>>
>>>     request = service.projects().templates().launch(
>>>         projectId=GCLOUD_PROJECT,
>>>         gcsPath=template_path,
>>>         body=template_body)
>>>     response = request.execute()
>>>
>>>     logging.info(f'RunDataflow: got this response {response}')
>>>
>>>
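For completeness, the request body in the function above can be factored into a small, testable helper. The helper name is mine, not from the thread, and note that the random suffix only makes `jobName` collisions unlikely, not impossible:

```python
import random


def make_launch_body(input_file, output_file):
    # Body for service.projects().templates().launch(); Dataflow requires
    # a unique jobName per launch, hence the random suffix.
    return {
        'jobName': 'cf-job-' + str(random.randint(1, 101000)),
        'parameters': {
            'input_file': input_file,
            'output_file': output_file,
        },
    }
```

A variant using a timestamp (e.g. `time.strftime('%Y%m%d-%H%M%S')`) would make the job name both unique and easier to trace back to a launch time.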
>>> On Mon, Apr 6, 2020 at 10:13 AM Marco Mistroni 
>>> wrote:
>>>
 @andre sorry to hijack this. Are you able to send a working example of
 kicking off a dataflow template via a cloud function?

 Kind regards

 On Mon, Apr 6, 2020, 1:51 PM André Rocha Silva <
 a.si...@portaltelemedicina.com.br> wrote:

> Hey!
>
> Could you make it work? You can take a look at this post; it is a
> single-file template, easy peasy to create a template from:
>
> https://towardsdatascience.com/my-first-etl-job-google-cloud-dataflow-1fd773afa955
>
> If you want, we can schedule a google hangout and I'll help you, step by
> step.
> It is the least I can do after having had so much help from the
> community :)
>
> On Sat, Apr 4, 2020 at 4:52 PM Marco Mistroni 
> wrote:
>
>> Hey
>>  sure... it's  a crap script :).. just an ordinary dataflow script
>>
>>
>> https://github.com/mmistroni/GCP_Experiments/tree/master/dataflow/edgar_flow
>>
>>
>> What I meant to say, for your template question, is that you write
>> a basic script which runs on Beam... something as simple as this
>>
>>
>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/beam_test.py
>>
>> and then you can create a template out of it by just running this
>>
>> python -m edgar_main  --runner=dataflow --project=datascience-projets
>> --template_location=gs://mm_dataflow_bucket/templates/edgar_dataflow_template
>> --temp_location=gs://mm_dataflow_bucket/temp
>> --staging_location=gs://mm_dataflow_bucket/staging
>>
>> That will create a template 'edgar_dataflow_template' which you can
>> use in the GCP Dataflow console to create your job.
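The same template can also be launched from the command line instead of the console. A sketch; the job name and parameter values are illustrative, and the parameter names must match what the pipeline's options expect:

```shell
# Launch the template created by the python -m command above.
gcloud dataflow jobs run my-edgar-job \
  --gcs-location=gs://mm_dataflow_bucket/templates/edgar_dataflow_template \
  --parameters=input_file=gs://bucket/in,output_file=gs://bucket/out
```

This is the CLI equivalent of the "Create job from template" flow in the Dataflow console.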
>>
>> HTH, I'm sort of a noob to Beam, having started writing code just
>> over a month ago. Feel free to ping me if you get stuck
>>
>> kind regards
>>  Marco
>>
>>
>> On Sat, Apr 4, 2020 at 6:01 PM Xander Song 
>> wrote:
>>
>>> Hi Marco,
>>>
>>> Thanks for your response. Would you mind sending the edgar_main
>>> script so I can take a look?
>>>
>>> On Sat, Apr 4, 2020 at 2:25 AM Marco Mistroni 
>>> wrote:
>>>
 Hey
  As far as I know you can generate a dataflow template out of your
 Beam code by specifying an option on the command line.
 I am running this command, and once the template is generated I kick off a
 Dataflow job via the console by pointing at it:

 python -m edgar_main --runner=dataflow
 --project=datascience-projets --template_location=gs:// 
 Hth


 On Sat, Apr 4, 2020, 9:52 AM Xander Song 
 wrote:

> I am attempting to write a custom Dataflow Template using the
> Apache Beam Python SDK, but am finding the documentation difficult to
> follow. Does anyone have a minimal working example of how to write and
> deploy such a template?
>
> Thanks in advance.
>

>
> --
>
>    *ANDRÉ ROCHA SILVA*
>    * DATA ENGINEER*
>    (48) 3181-0611
>
>     /andre-rocha-silva/