[GitHub] drill pull request: S3 Plugin: Create 110-s3-storage-plugin.md

jschlesser Fri, 11 Dec 2015 09:38:42 -0800

Github user jschlesser commented on a diff in the pull request:

    https://github.com/apache/drill/pull/275#discussion_r47381866
  
    --- Diff: _docs/connect-a-data-source/plugins/110-s3-storage-plugin.md ---
    @@ -0,0 +1,100 @@
    +---
    +title: "S3 Storage Plugin"
    +parent: "Connect a Data Source"
    +---
    +Drill works with data stored in the cloud. With a few simple steps, you 
can configure the S3 storage plugin for Drill and be off to the races running 
queries.
    +
    +## Connecting Drill to S3
    +
    +Starting with version 1.3.0, Drill has the ability to query files stored 
on Amazon's S3 cloud storage using the S3a library. This is important, because 
S3a adds support for files bigger than 5 gigabytes (these were unsupported 
using Drill's previous S3n interface).
    +
    +There are two simple steps to follow: (1) provide your AWS credentials, and 
(2) configure the S3 storage plugin with your S3 bucket.
    +
    +#### (1) AWS credentials
    +
    +To enable Drill's S3a support, edit the file conf/core-site.xml in your 
Drill install directory, replacing the text ENTER_YOUR_ACCESSKEY and 
ENTER_YOUR_SECRETKEY with your AWS credentials.
    +
    +```
    +<configuration>
    +
    +  <property>
    +    <name>fs.s3a.access.key</name>
    +    <value>ENTER_YOUR_ACCESSKEY</value>
    +  </property>
    +
    +  <property>
    +    <name>fs.s3a.secret.key</name>
    +    <value>ENTER_YOUR_SECRETKEY</value>
    +  </property>
    +
    +</configuration>
    +```
    +
    +#### (2) Configure S3 Storage Plugin
    +
    +If you already have an S3 storage plugin configured, enable it. Otherwise, add 
a new plugin by following these steps:
    +
    +1. Point your browser to http://<host>:8047 and select the 'Storage' tab. 
(Note: on a single machine system, you'll need to run drill-embedded before you 
can access the web console site)
    +2. Duplicate the 'dfs' plugin. To do this, hit 'Update' next to 'dfs,' and 
then copy the JSON text that appears.
    +3. Create a new storage plugin, and paste in the 'dfs' text.
    +4. Replace file:/// with s3a://your.bucketname.
    +5. Name your new plugin, say s3-\<bucketname\>
    +
    +You should now be able to talk to data stored on S3 using the S3a library.
    +
    +## Example S3 Storage Plugin
    +
    +```
    +{
    +  "type": "file",
    +  "enabled": true,
    +  "connection": "s3a://apache.drill.cloud.bigdata/",
    +  "workspaces": {
    +    "root": {
    +      "location": "/",
    +      "writable": false,
    +      "defaultInputFormat": null
    +    },
    +    "tmp": {
    +      "location": "/tmp",
    +      "writable": true,
    --- End diff --
    
    Yeah, I checked that :). I checked that the keys have write permissions, and I 
checked that I can perform select queries against files in that S3 bucket (I 
can). I removed the jets3t lib from the jar directory and made sure that 
core-site.xml only had an entry for s3a, because I initially followed an old 
blog post for enabling S3 with jets3t for Drill 1.1.0. It would be great if 
this worked, because I'd love to write Parquet files back to S3. I'm running 
Drill 1.3.0 embedded on a standard AWS Linux install, if any of those things 
make a difference. If this is not the right spot for this conversation, let me 
know where to move it. I'm happy to upload any config files for inspection.
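
    For anyone trying to reproduce this, a write attempt of the kind described 
above might look roughly like the sketch below. It is not the exact query from 
this thread. It assumes Drill's REST query endpoint on the default web port, a 
plugin named s3-mybucket (hypothetical, following the s3-<bucketname> naming in 
the diff), the root and tmp workspaces from the example plugin, and a 
hypothetical source file name. Parquet is Drill's default store.format, so no 
session option is set here.

```python
import json
import urllib.request

# Assumption: Drill (embedded or not) is serving its web console on port 8047.
DRILL_URL = "http://localhost:8047/query.json"
# Hypothetical plugin name, following the s3-<bucketname> convention.
PLUGIN = "s3-mybucket"


def ctas_parquet(plugin, target, source):
    """Return a CTAS statement that reads from the read-only 'root' workspace
    and writes a new table into the writable 'tmp' workspace."""
    return (
        f"CREATE TABLE `{plugin}`.tmp.`{target}` AS "
        f"SELECT * FROM `{plugin}`.root.`{source}`"
    )


def build_query_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request carrying the SQL as JSON."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(sql):
    """Send the query to Drill and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(sql)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # 'out_parquet' and 'input.json' are placeholder names for illustration.
    print(run(ctas_parquet(PLUGIN, "out_parquet", "input.json")))
```

    If the SELECT half works on its own but the CREATE TABLE fails, that would 
point at the write path (workspace writability or S3 write permissions) rather 
than the credentials in core-site.xml.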



