cgivre commented on code in PR #2585: URL: https://github.com/apache/drill/pull/2585#discussion_r924454323
########## contrib/storage-googlesheets/README.md: ########## @@ -0,0 +1,156 @@ +# Google Sheets Connector for Apache Drill +This connector enables you to query and write to Google Sheets. + +### Usage Notes: +This feature should be considered experimental as Google's API for Sheets is quite complex and amazingly +poorly documented. + +## Setup Step 1: Obtain Credential Information from Google +Ok... this is a pain. GoogleSheets uses OAuth2.0 (may it be quickly deprecated) for authorization. In order to query GoogleSheets, you will first need to obtain three artifacts: + +* Your `clientID`: This is an identifier which uniquely identifies your application to Google +* Your `client_secret`: You can think of this as your password for your application to access GoogleSheets +* Your redirect URL: This is the URL which Google will send the various access tokens which you will need later. For a local installation of Drill, it will be: + `http://localhost:8047/credentials/<plugin name>/update_oauth2_authtoken`. + +1. To obtain the `clientID` and `client_secret` you will need to obtain the Google keys, open the [Google Sheets API](https://console.cloud.google.com/apis/library/sheets.googleapis.com) and click on the `Enable` button. +2. Once you've enabled teh API, you will be taken to the API Manager. Either select a pre-existing project or create a new one. +3. Next, navigate to the `Credentials` in the left panel. +4. Click on `+Create Credentials` at the top of the page. Select `OAuth client ID` and select `Web Application` or `Desktop` as the type. Follow the instructions and download + the JSON file that Google provides. + +Drill does not use the JSON file, but you will be cutting and pasting values from the JSON file into the Drill configuration. + +## Setup Step 2: Configure Drill +Create a storage plugin following the normal procedure for doing so. You can use the example below as a template. Cut and paste the `clientID` and `client_secret` from the +JSON file into your Drill configuration as shown below. Once you've done that, save the configuration. + +```json +{ + "type": "googlesheets", + "allTextMode": true, + "extractHeaders": true, + "oAuthConfig": { + "callbackURL": "http://localhost:8047/credentials/googlesheets/update_oauth2_authtoken", + "authorizationURL": "https://accounts.google.com/o/oauth2/auth", + "authorizationParams": { + "response_type": "code", + "scope": "https://www.googleapis.com/auth/spreadsheets" + } + }, + "credentialsProvider": { + "credentialsProviderType": "PlainCredentialsProvider", + "credentials": { + "clientID": "<YOUR CLIENT ID>", + "clientSecret": "<YOUR CLIENT SECRET>", + "tokenURI": "https://oauth2.googleapis.com/token" + }, + "userCredentials": {} + }, + "enabled": true, + "authMode": "SHARED_USER" +} +``` + +With the exception of the clientID, client_secret and redirects, you should not have to modify any of the other parameters in the configuration. + +### Other Configuration Parameters + +There are two configuration parameters which you may want to adjust: +* `allTextMode`: This parameter when `true` disables Drill's data type inferencing for your files. If your data has inconsistent data types, set this to `true`. Default is + `true`. +* `extractHeaders`: When `true`, Drill will treat the first row of your data as headers. When `false` Drill will assign column names like `field_n` for each column. + +### Authenticating with Google +Once you have configured Drill to query GoogleSheets, there is one final step before you can access data. You must authenticate the application (Drill) with GoogleSheets. After you have saved your GoogleSheets configuration, navigate back to the configuration screen for your plugin and click on `Authorize`. A new window should appear which will prompt you to authenticate with Google services. Once you have done that, you should be able to query GoogleSheets! See, that wasn't so hard! + +### Authentication Modes: +The GoogleSheets plugin supports the `SHARED_USER` and `USER_TRANSLATION` authentication modes. `SHARED_USER` is as the name implies, one user for everyone. `USER_TRANSLATION` +uses different credentials for each individual user. In this case, the credentials are the OAuth2.0 access tokens. + +At the time of writing, we have not yet documented `USER_TRANSLATION` fully, however we will update this readme once that is complete. + +## Querying Data +Once you have configured Drill to query + +### Obtaining the SpreadsheetID +The URL below is a public spreadsheet hosted on GoogleSheets: +[https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/](https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/) + +In this URL, the portion `1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms` is the spreadsheetID. Thus, +if you wanted to query this sheet in Drill, after configuring Drill, you could do so with the following +query: + +```sql +SELECT * +FROM googlesheets.`1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms`.`Class Data` +``` + +The format for the `FROM` clause for GoogleSheets is: +```sql +FROM <plugin name>.<sheet ID>.<tab name> +``` +Note that you must specify the tab name to successfully query GoogleSheets. + +### Using Aliases +Since the sheet IDs from Google are not human readable, one way to make your life easier is to use Drill's aliasing features to provide a better name for the actual sheet name. + +### Data Types +Drill's Google Sheets reader will attempt to infer the data types of the incoming data. As with other connectors, this is an imperfect process since GoogleSheets does not +supply a schema or other information to allow Drill to identify the data types of a column. At present, here is how Drill will map your data: +* Numbers: All numeric columns will be mapped to `DOUBLE` data types +* Boolean: Columns containing `true/false` will be mapped to the `BOOLEAN` type +* Time, Date, Timestamp: Temporal fields will be mapped to the correct type. You can disable able temporal fields by setting the config option `XXX` to `false`. Review Comment: Removed... I was going to do this, but I think `allTextMode` will suffice. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org