[ https://issues.apache.org/jira/browse/BEAM-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965479#comment-16965479 ]

Kenneth Knowles commented on BEAM-8189:
---------------------------------------

[~tvalentyn] any thoughts on this?

> Python DataflowRunner fails when using a Shared VPC from another project
> ------------------------------------------------------------------------
>
>                 Key: BEAM-8189
>                 URL: https://issues.apache.org/jira/browse/BEAM-8189
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.15.0
>            Reporter: Miles Edwards
>            Priority: Major
>
> h1. The Setup:
> I have two Projects on the Google Cloud Platform
> 1) Service Project for my Dataflow jobs
> 2) Host Project for Shared VPC & Subnetworks
> The Host Project has firewall rules configured for the Dataflow job, e.g. 
> allow all traffic, allow all internal traffic, allow all traffic tagged with 
> 'dataflow', etc.
>  
> h1. The Args
> {code:java}
> --project <host project name>
> --network <shared vpc network name>
> --subnetwork "https://www.googleapis.com/compute/v1/projects/<shared vpc project name>/regions/<region the job runs in>/subnetworks/<name of subnetwork in shared vpc project>"
> --service_account_email=<service account with Compute Network User permission 
> for both projects, shared vpc network & subnetwork>
> {code}
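>
> A minimal sketch of a full launch command using these flags (the script name, 
> bucket, and all <...> values are placeholders, not taken from the report; 
> note that Google's Shared VPC documentation expects --project to be the 
> *service* project, with the host project appearing only in the subnetwork URL):
> {code:bash}
> # Hypothetical Dataflow launch with a Shared VPC subnetwork.
> # Every <...> value is a placeholder, not a real setting from this report.
> python my_pipeline.py \
>   --runner DataflowRunner \
>   --project <service project id> \
>   --region <region> \
>   --subnetwork "https://www.googleapis.com/compute/v1/projects/<host project id>/regions/<region>/subnetworks/<subnetwork name>" \
>   --service_account_email <service account email> \
>   --temp_location gs://<bucket>/tmp
> {code}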
> h1. The Problem
> The job hangs when performing shuffle operations, and I also see the 
> following warning:
> {code:java}
> The network miles-qa-vpc doesn't have rules that open TCP ports 1-65535 for 
> internal connection with other VMs. Only rules with a target tag 'dataflow' 
> or empty target tags set apply. If you don't specify such a rule, any 
> pipeline with more than one worker that shuffles data will hang. Causes: No 
> firewall rules associated with your network.
> {code}
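>
> The warning indicates that, from Dataflow's point of view, no firewall rule 
> on the workers' network opens intra-network TCP traffic. A hedged sketch of a 
> rule, created in the host project, that should satisfy the check (the rule 
> name and all <...> values are illustrative):
> {code:bash}
> # Hypothetical firewall rule in the host project letting Dataflow workers
> # (tagged 'dataflow') reach each other on all TCP ports.
> gcloud compute firewall-rules create allow-dataflow-shuffle \
>   --project <host project id> \
>   --network <shared vpc network name> \
>   --direction INGRESS \
>   --action ALLOW \
>   --rules tcp:1-65535 \
>   --source-tags dataflow \
>   --target-tags dataflow
> {code}
> Listing the rules afterwards with gcloud compute firewall-rules list 
> --project <host project id> confirms they are attached to the shared network 
> rather than to a network in the service project.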
>  
> h1. What I've Tried
> [StackOverflow|https://stackoverflow.com/questions/57868089/google-dataflow-warnings-when-using-service-host-projects-shared-vpcs-firew]
> 1. Passing only the "subnetwork" arg without "network"; that merely changes 
> the warning to say "default" instead of "miles-qa-vpc", which sounds like a 
> logging error to me.
> 2. Firewall rules have been configured to:
>  - allow all traffic
>  - allow all internal traffic
>  - allow all traffic with the source tag 'dataflow'
>  - allow all traffic with the target tag 'dataflow'
> 3. Service Account has been configured to have Compute Network User 
> permissions in both projects.
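>
> For reference, a grant like the following (all values are placeholders, not 
> the reporter's actual identities) is what that configuration amounts to; per 
> GCP's Shared VPC setup guidance, Dataflow's own service agent may also need 
> this role on the host project:
> {code:bash}
> # Hypothetical Compute Network User grant on the host project for the
> # worker service account; all <...> values are placeholders.
> gcloud projects add-iam-policy-binding <host project id> \
>   --member serviceAccount:<service account email> \
>   --role roles/compute.networkUser
> {code}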
> 4. Ensured subnetwork is in the same region as the job.
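>
> That region check can be made explicit with a command along these lines 
> (subnetwork name, project, and region are placeholders):
> {code:bash}
> # Hypothetical check that the shared subnetwork lives in the job's region.
> gcloud compute networks subnets describe <subnetwork name> \
>   --project <host project id> \
>   --region <region> \
>   --format "value(region)"
> {code}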
> 5. The shared network is already happily serving a dedicated cluster in the 
> host project for other purposes.
> It genuinely seems like the spawned Compute Engine instances are not picking 
> up the network configuration.
> I expect the Dataflow job not to report the firewall issue and to handle 
> shuffle operations (GroupBys etc.) successfully.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
