Thanks Israel! I see Chamikara has reviewed it.

On Sun, Feb 2, 2020 at 7:55 AM Israel Herraiz <i...@google.com> wrote:
> Hi all,
>
> I have updated and polished a pull request I submitted some time ago, and
> I would like to bring it to the attention of this list, to see if I could
> get some feedback or review of the code.
>
> The PR is at https://github.com/apache/beam/pull/9852
>
> It adds a new option withQueryTempDataset to BigQueryIO.Read.
>
> Currently, if I want to read from a table with BigQueryIO, I need to
> assign the role bigquery.jobUser to the service account of Apache Beam
> (e.g. Dataflow).
>
> However, if I try to read from a view using the same role, the pipeline
> will fail, because it needs to create a temporary dataset and table. The
> name of this dataset is chosen by Apache Beam.
>
> This in practice requires giving the service account the permission to
> create datasets (e.g. assigning the role bigquery.user, not
> bigquery.jobUser), which is a very broad permission.
>
> With the submitted PR, you can specify the temporary dataset used to read
> from queries (e.g. reading from a view). Thus you can just keep the role
> bigquery.jobUser in the Beam service account, and just provide additional
> permissions in that dataset to create temporary tables (confining any
> potential write activity to that dataset only).
>
> The destination dataset can even be in a different project from the data
> you are reading (something that is not possible with the currently
> available options), so you don't need to give write permissions in the
> same project where the data resides. In situations where there is an
> "untouchable" data project with authorized views, it is currently
> impossible to read from those authorized views with BigQueryIO, unless you
> give write permissions to Beam in the "untouchable" project. With this PR,
> you could confine those writes to another project and dataset.
>
> I hope the need for this option makes sense. Any thoughts?
>
> Kind regards,
> Israel
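
For anyone following the thread, here is a rough sketch of how the option described above might be used from a Java pipeline. It is only an illustration under assumptions, not code taken from the PR: the view name `data-project.shared_views.my_view` and the dataset name `beam_temp` are hypothetical, and passing a bare dataset ID to withQueryTempDataset is my assumption about the argument format, so please check the PR for the exact form. The dataset is assumed to already exist, with the Beam service account allowed to create tables in it.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ReadViewWithTempDataset {
  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Reading a view goes through a query, so BigQueryIO needs somewhere to
    // write temporary results. The view below is a hypothetical name.
    PCollection<TableRow> rows =
        pipeline.apply(
            "ReadFromView",
            BigQueryIO.readTableRows()
                .fromQuery("SELECT * FROM `data-project.shared_views.my_view`")
                .usingStandardSql()
                // Hypothetical, pre-created dataset; the argument format is
                // an assumption. Temporary tables are confined to it instead
                // of a dataset created and named by Beam.
                .withQueryTempDataset("beam_temp"));

    pipeline.run().waitUntilFinish();
  }
}
```

With a setup like this, the service account would keep bigquery.jobUser at the project level and only need table-creation rights on the one temporary dataset, which is the confinement the proposal is after.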