kaxil commented on a change in pull request #3876: [AIRFLOW-2887] Add to BigQueryBaseCursor methods for insert (create empty) dataset
URL: https://github.com/apache/incubator-airflow/pull/3876#discussion_r217889602
##########
File path: airflow/contrib/hooks/bigquery_hook.py
##########
@@ -1354,6 +1354,67 @@ def run_grant_dataset_view_access(self,
                 view_project, view_dataset, view_table, source_project, source_dataset)
         return source_dataset_resource
 
+    def create_empty_dataset(self, dataset_id="", project_id="",
+                             dataset_reference=None):
+        """
+        Create a new empty dataset:
+        https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert
+
+        :param project_id: The name of the project where we want to create
+            an empty a dataset. Don't need to provide, if projectId in dataset_reference.
+        :type project_id: str
+        :param dataset_id: The id of dataset. Don't need to provide,
+            if datasetId in dataset_reference.
+        :type dataset_id: str
+        :param dataset_reference: Dataset resource that must be provided
+            with request body. More info:
+            https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource
+        :type dataset_reference: dict
+        """
+
+        if dataset_reference:
+            _validate_value('dataset_reference', dataset_reference, dict)
+        else:
+            dataset_reference = {}
+
+        if "datasetReference" not in dataset_reference:
+            dataset_reference["datasetReference"] = {}
+
+        if not dataset_reference["datasetReference"].get("datasetId") and not dataset_id:
+            raise ValueError("{} not provided datasetId. Impossible to create dataset")
+
+        dataset_required_params = [(dataset_id, "datasetId", ""),
+                                   (project_id, "projectId", self.project_id)]
+        for param_tuple in dataset_required_params:
+            param, param_name, param_default = param_tuple
+            if param_name not in dataset_reference['datasetReference']:
+                if param_default and not param:
+                    log = LoggingMixin().log
+                    log.info("{} was not specified. Will be used default "
+                             "value {}.".format(param_name, param_default))
+                    param = param_default
+                dataset_reference['datasetReference'].update({param_name: param})
+            elif param:
+                _api_resource_configs_duplication_check(
+                    param_name, param, dataset_reference['datasetReference'])

Review comment:
Using `_api_resource_configs_duplication_check` here would raise an error if it finds a duplicate value:

```
raise ValueError("Values of {param_name} param are duplicated. "
                 "`api_resource_configs` contained {param_name} param "
                 "in `query` config and {param_name} was also provided "
                 "with arg to run_query() method. Please remove duplicates."
```

However, we don't have an `api_resource_configs` parameter here, so the information given to the user would be incorrect.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
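
One possible way to address the review comment above, as a minimal sketch only: let the duplication check take the name of the config parameter so the error message matches the calling method. The function name `check_duplicated_param` and its `config_name` argument are hypothetical and are not taken from the pull request; the comparison logic is likewise an assumption about how such a helper might behave.

```python
def check_duplicated_param(key, value, config_dict,
                           config_name="api_resource_configs"):
    """Hypothetical helper: raise ValueError when `key` is passed as an
    argument and is also set, with a different value, in `config_dict`.

    `config_name` is the name of the dict parameter exposed by the calling
    method, so the error message refers to something the user actually passed.
    """
    if key in config_dict and value != config_dict[key]:
        raise ValueError(
            "Values of {param_name} param are duplicated. "
            "`{config_name}` contained {param_name} param and "
            "{param_name} was also provided as an argument. "
            "Please remove duplicates."
            .format(param_name=key, config_name=config_name))


# Usage: a conflicting datasetId is reported against `dataset_reference`.
dataset_ref = {"datasetId": "from_reference"}
message = None
try:
    check_duplicated_param("datasetId", "from_argument", dataset_ref,
                           config_name="dataset_reference")
except ValueError as err:
    message = str(err)
```

With a parameterized name, the same helper could serve both a query-style method and `create_empty_dataset` without naming a parameter the caller never supplied.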