[ https://issues.apache.org/jira/browse/BEAM-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965479#comment-16965479 ]
Kenneth Knowles commented on BEAM-8189:
---------------------------------------

[~tvalentyn] any thoughts on this?

> Python DataflowRunner fails when using a Shared VPC from another project
> ------------------------------------------------------------------------
>
>                 Key: BEAM-8189
>                 URL: https://issues.apache.org/jira/browse/BEAM-8189
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.15.0
>            Reporter: Miles Edwards
>            Priority: Major
>
> h1. The Setup
> I have two projects on the Google Cloud Platform:
> 1) A service project for my Dataflow jobs
> 2) A host project for the Shared VPC & subnetworks
> The host project has firewall rules configured for the Dataflow job, e.g. allow all traffic, allow all internal traffic, allow all traffic tagged with 'dataflow', etc.
>
> h1. The Args
> {code:java}
> --project <host project name>
> --network <shared vpc project name>
> --subnetwork "https://www.googleapis.com/compute/v1/projects/<shared vpc project name>/regions/<region job is running in service project>/subnetworks/<name of subnetwork in shared vpc project>"
> --service_account_email=<service account with Compute Network User permission for both projects, shared vpc network & subnetwork>
> {code}
>
> h1. The Problem
> The job hangs when performing shuffle operations. I also see the following warning:
> {code:java}
> The network miles-qa-vpc doesn't have rules that open TCP ports 1-65535 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: No firewall rules associated with your network.
> {code}
>
> h1. What I've Tried
> [StackOverflow|https://stackoverflow.com/questions/57868089/google-dataflow-warnings-when-using-service-host-projects-shared-vpcs-firew]
> 1. Passing only the "subnetwork" arg without "network", but that only changes the warning to say "default" instead of "miles-qa-vpc", which sounds like a logging error to me.
> 2. Configuring firewall rules to:
> - allow all traffic
> - allow all internal traffic
> - allow all traffic with the source tag 'dataflow'
> - allow all traffic with the target tag 'dataflow'
> 3. Granting the service account Compute Network User permissions in both projects.
> 4. Ensuring the subnetwork is in the same region as the job.
> 5. The network in the service project is happily serving a dedicated cluster for other purposes in the host project.
> It genuinely seems like the spawned Compute Engine instances are not picking up this configuration.
> I expect the Dataflow job not to report the firewall issue and to successfully handle shuffling (GroupBys etc.).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
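For anyone trying to reproduce this: with a Shared VPC, the `--subnetwork` value Dataflow expects is the full compute URL of the subnet in the *host* project, and (per the Shared VPC model) firewall rules must exist on the host project's network, not the service project's. A minimal sketch of building that URL for the Python pipeline args; all project, region, and subnet names here are hypothetical placeholders, not values from this report:

```python
# Sketch: construct the full subnetwork URL for a Shared VPC subnet,
# as expected by the Dataflow --subnetwork pipeline option.
# "my-host-project", "us-central1", "my-subnet" are placeholder names.

def shared_vpc_subnetwork_url(host_project: str, region: str, subnet: str) -> str:
    """Return the full compute URL for a subnet in the Shared VPC host project."""
    return (
        "https://www.googleapis.com/compute/v1/projects/"
        f"{host_project}/regions/{region}/subnetworks/{subnet}"
    )

url = shared_vpc_subnetwork_url("my-host-project", "us-central1", "my-subnet")
print(url)
```

When only `--subnetwork` is given in this form, Dataflow can infer the network from the subnet, so omitting `--network` entirely is usually the safer choice for Shared VPC setups; the warning naming "default" in that case does look like a logging artifact rather than the actual network in use.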