Re: Drill on EMR hive with S3 backed storage

2016-03-20 Thread Andries Engelbrecht
With Drill 1.3 you probably want to use the s3a library to start with.
I don't believe you need the jets3t library for that anymore.
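
For reference, here is a minimal sketch of a file-type storage plugin that goes
through s3a instead of jets3t. Treat it as an illustration rather than a tested
configuration: it assumes the bucket name from the example below
(my-bucket-name), a single read-only workspace named root, and that
fs.s3a.access.key and fs.s3a.secret.key are already set in the core-site.xml in
Drill's conf directory.

```json
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://my-bucket-name",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
```

This only covers direct file access to the bucket; it does not by itself change
how the Hive plugin resolves table locations.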

However, your issue seems to be some confusion between the Hive plugin and the 
actual path to the s3 bucket. Perhaps someone with in-depth knowledge of the 
Hive plugin can provide advice here. Did you install the drillbits on the same 
nodes as the Hive cluster, or is it a separate cluster pointed at the same s3 
buckets?

--Andries
 

> On Mar 16, 2016, at 6:40 PM, Vincent Meng wrote:
> 
> Hi all,
> 
> I have an EMR Hive cluster and all the tables are external tables where
> files are stored on s3.
> 
> Following the Drill tutorials I've set up Drill in embedded mode on my local
> machine and I can successfully connect to the remote Hive cluster. I can list
> all the tables in the Hive cluster.
> 
> However, when I run `select .. from` type queries on those tables, Drill
> complains about "Error: SYSTEM ERROR: IOException:
> /path/to/hive/table/folder doesn't exist" (assume the actual s3 path is
> "s3://my-bucket-name/path/to/hive/table/folder"). I can see that
> /path/to/hive/table/folder is the correct path, but without the
> s3://my-bucket-name prefix.
> 
> My hive storage configuration is like this:
> 
> {
>   "type": "hive",
>   "enabled": true,
>   "configProps": {
>     "hive.metastore.uris": "thrift://ip-*.ec2.internal:9083",
>     "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
>     "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
>     "fs.default.name": "s3://", # I also tried s3a and s3n, none of them works.
>     "hive.metastore.sasl.enabled": "false"
>   }
> }
> 
> I'm using Drill 1.5.0 with jets3t-0.9.2 as a 3rd party library. I tried
> enabling s3 and it works fine, so my AWS creds are all right and configured
> correctly.
> 
> Any help will be appreciated! I've been stuck on this for two days and I
> don't have any clue how to debug it now.
> 
> Thank you very much
> 
> -Vincent



Drill on EMR hive with S3 backed storage

2016-03-18 Thread Vincent Meng
Hi all,

I have an EMR Hive cluster and all the tables are external tables where
files are stored on s3.

Following the Drill tutorials I've set up Drill in embedded mode on my local
machine and I can successfully connect to the remote Hive cluster. I can list
all the tables in the Hive cluster.

However, when I run `select .. from` type queries on those tables, Drill
complains about "Error: SYSTEM ERROR: IOException:
/path/to/hive/table/folder doesn't exist" (assume the actual s3 path is
"s3://my-bucket-name/path/to/hive/table/folder"). I can see that
/path/to/hive/table/folder is the correct path, but without the
s3://my-bucket-name prefix.

My hive storage configuration is like this:

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://ip-*.ec2.internal:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "s3://", # I also tried s3a and s3n, none of them works.
    "hive.metastore.sasl.enabled": "false"
  }
}

I'm using Drill 1.5.0 with jets3t-0.9.2 as a 3rd party library. I tried
enabling s3 and it works fine, so my AWS creds are all right and configured
correctly.

Any help will be appreciated! I've been stuck on this for two days and I
don't have any clue how to debug it now.

Thank you very much

-Vincent