kaxil commented on a change in pull request #3876: [AIRFLOW-2887] Add to BigQueryBaseCursor methods for insert (create empty) dataset
URL: https://github.com/apache/incubator-airflow/pull/3876#discussion_r217889602
 
 

 ##########
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##########
 @@ -1354,6 +1354,67 @@ def run_grant_dataset_view_access(self,
                 view_project, view_dataset, view_table, source_project, source_dataset)
             return source_dataset_resource
 
+    def create_empty_dataset(self, dataset_id="", project_id="", dataset_reference=None):
+        """
+        Create a new empty dataset:
+        https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert
+
+        :param project_id: The name of the project where we want to create
+               an empty a dataset. Don't need to provide, if projectId in dataset_reference.
+        :type project_id: str
+        :param dataset_id: The id of dataset. Don't need to provide,
+               if datasetId in dataset_reference.
+        :type dataset_id: str
+        :param dataset_reference: Dataset resource that must be provided
+               with request body. More info:
+               https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource
+        :type dataset_reference: dict
+        """
+
+        if dataset_reference:
+            _validate_value('dataset_reference', dataset_reference, dict)
+        else:
+            dataset_reference = {}
+
+        if "datasetReference" not in dataset_reference:
+            dataset_reference["datasetReference"] = {}
+
+        if not dataset_reference["datasetReference"].get("datasetId") and not dataset_id:
+            raise ValueError("{} not provided datasetId. Impossible to create dataset")
+
+        dataset_reference = [(dataset_id, "datasetId", ""),
+                             (project_id, "projectId", self.project_id)]
+        for param_tuple in dataset_reference:
+            param, param_name, param_default = param_tuple
+            if param_name not in dataset_reference['datasetReference']:
+                if param_default and not param:
+                    log = LoggingMixin().log
+                    log.info("{} was not specified. Will be used default "
+                             "value {}.".format(param_name, param_default))
+                    param = param_default
+                dataset_reference['datasetReference'].update({param_name: param})
+            elif param:
+                _api_resource_configs_duplication_check(
+                    param_name, param, dataset_reference['datasetReference'])
 
 Review comment:
   Calling `_api_resource_configs_duplication_check` here raises the following error when it finds a duplicate value:
   
   ```
   raise ValueError("Values of {param_name} param are duplicated. "
                    "`api_resource_configs` contained {param_name} param "
                    "in `query` config and {param_name} was also provided "
                    "with arg to run_query() method. Please remove duplicates."
   ```
   
   However, this method has no `api_resource_configs` parameter, so the message shown to the user is incorrect.
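
One possible direction, as a rough sketch only: let the caller tell the check which config and which method are involved, so the message matches the call site. The helper name `check_param_duplication` and its `config_name` and `method_name` arguments are hypothetical and not part of the hook. The condition (key present with a different value) is also an assumption about what counts as a duplicate.

```python
def check_param_duplication(param_name, value, config, config_name, method_name):
    """Raise ValueError if `param_name` is already set in `config` to another value.

    Hypothetical helper: `config_name` and `method_name` are illustrative
    arguments that let the caller describe where the conflict occurred.
    """
    if param_name in config and config[param_name] != value:
        raise ValueError(
            "Values of {param_name} param are duplicated. "
            "`{config_name}` contained {param_name} param "
            "and {param_name} was also provided "
            "with arg to {method_name}() method. Please remove duplicates."
            .format(param_name=param_name, config_name=config_name,
                    method_name=method_name))


# Usage: datasetId is passed both in the reference dict and as an argument.
reference = {"datasetId": "from_reference"}
try:
    check_param_duplication("datasetId", "from_argument", reference,
                            "dataset_reference", "create_empty_dataset")
except ValueError as err:
    message = str(err)
```

With this shape, the message for `create_empty_dataset` would name `dataset_reference` instead of `api_resource_configs`, and `run_query` could keep its current wording by passing its own names.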
