Connecting to S3 bucket which does not seem to require a key

2017-06-08 Thread Jack Ingoldsby
Hi,
I'm trying to access the NYC Citibike S3 bucket, which seems to be publicly
available:

https://s3.amazonaws.com/tripdata/index.html
If I leave the Access Key & Secret Key empty, I get the following message

0: jdbc:drill:zk=local> !tables
Error: Failure getting metadata: Unable to load AWS credentials from any
provider in the chain (state=,code=0)

If I try entering random numbers as keys, I get the following message

Error: Failure getting metadata: Status Code: 403, AWS Service: Amazon S3,
AWS Request ID: 1C888A3A21D79F87, AWS Error Code: InvalidAccessKeyId, AWS
Error Message: The AWS Access Key Id you provided does not exist in our
records. (state=,code=0)

Is it possible to connect to a data source that does not seem to require a
key?

Thanks,
Jack
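
For reference, Drill can usually read public buckets if the S3 storage plugin
is pointed at Hadoop's anonymous credentials provider instead of access keys.
A minimal sketch of such a plugin definition (the workspace and format entries
are illustrative, not from this thread):

```
{
  "type": "file",
  "connection": "s3a://tripdata",
  "config": {
    "fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
  },
  "workspaces": {
    "root": { "location": "/", "writable": false }
  },
  "formats": {
    "csv": { "type": "text", "extensions": ["csv"] }
  },
  "enabled": true
}
```

With this in place, no access or secret key is needed for buckets that allow
anonymous reads.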


Re: Accessing json fields within CSV file

2017-06-08 Thread Andries Engelbrecht
You can use convert_from and JSON data type.

0: jdbc:drill:> select t.col1, t.col2, t.conv.key1 as key1, t.conv.key2 as 
key2, t.col4 from
. . . . . . . > (select columns[0] as col1, columns[1] as col2,
convert_from(columns[2], 'JSON') as conv, columns[3] as col4 from
`/flat/psv-json/json.tbl`) t;
+-------+-------+---------+---------+-------+
| col1  | col2  |  key1   |  key2   | col4  |
+-------+-------+---------+---------+-------+
| 1     | xyz   | value1  | value2  | abc   |
+-------+-------+---------+---------+-------+




If you want to use functions like flatten, you will need to make sure the JSON
is represented as an array,
i.e. [{"key":1, "value": 1},{"key":2, "value":2}]

0: jdbc:drill:> select t.col1, t.col2, t.conv.key as key, t.conv.`value` as 
`value`, t.col4 from
. . . . . . . > (select columns[0] as col1, columns[1] as col2,
flatten((convert_from(columns[2],'JSON'))) as conv,  columns[3] as col4 from 
`/flat/psv-json/json.tbl`) t;
+-------+-------+------+--------+-------+
| col1  | col2  | key  | value  | col4  |
+-------+-------+------+--------+-------+
| 1     | xyz   | 1    | 1      | abc   |
| 1     | xyz   | 2    | 2      | abc   |
+-------+-------+------+--------+-------+
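
Tying this back to the CTAS in the original question, the same convert_from
subquery can feed a CREATE TABLE so that key1 and key2 land in their own
parquet columns. A sketch, with paths and column names taken from the post
(untested here):

```
CREATE TABLE dfs.data.`/logs/logsp/` AS
SELECT CAST(t.col1 AS INT)  AS `id`,
       t.col2               AS `name`,
       t.conv.key1          AS `key1`,
       t.conv.key2          AS `key2`,
       t.col4               AS `description`
FROM (SELECT columns[0] AS col1,
             columns[1] AS col2,
             convert_from(columns[2], 'JSON') AS conv,
             columns[3] AS col4
      FROM dfs.data.`logs/events.tbl`) t;
```

This assumes every row carries the same key1/key2 keys; rows with different
keys would need the flatten approach instead.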



--Andries




On 6/8/17, 2:22 AM, "ankit jain"  wrote:

Hi,
I have a few psv files in which some of the columns are a JSON key-value map.
Example:

> 1|xyz|{"key1":"value1", "key2":"value2"}|abc|


I am converting these files to parquet format but want to split the JSON
keys and values into separate columns. How can I do that?

end product being:
id name key1 key2 description
1 xyz value1 value2 abc

Right now I am doing something like this, but the JSON column won't explode:

CREATE TABLE dfs.data.`/logs/logsp/`  AS SELECT
> CAST(columns[0] AS INT)  `id`,
> columns[1] AS `name`,
> columns[2] AS `json_column`,
> columns[3] AS `description`
> FROM dfs.data.`logs/events.tbl`;


And this is what I get

id name json_column description
1 xyz {"key1":"value1", "key2":"value2"} abc

Thanks in advance,
Ankit Jain




Accessing json fields within CSV file

2017-06-08 Thread ankit jain
Hi,
I have a few psv files in which some of the columns are a JSON key-value map.
Example:

> 1|xyz|{"key1":"value1", "key2":"value2"}|abc|


I am converting these files to parquet format but want to split the JSON
keys and values into separate columns. How can I do that?

end product being:
id name key1 key2 description
1 xyz value1 value2 abc

Right now I am doing something like this, but the JSON column won't explode:

CREATE TABLE dfs.data.`/logs/logsp/`  AS SELECT
> CAST(columns[0] AS INT)  `id`,
> columns[1] AS `name`,
> columns[2] AS `json_column`,
> columns[3] AS `description`
> FROM dfs.data.`logs/events.tbl`;


And this is what I get

id name json_column description
1 xyz {"key1":"value1", "key2":"value2"} abc

Thanks in advance,
Ankit Jain


Re: Column aliases are ignored when Storage Plugin is enabled

2017-06-08 Thread Kunal Khatua
It could be related to these as well:

https://issues.apache.org/jira/browse/DRILL-5537

https://issues.apache.org/jira/browse/DRILL-5538


Please go ahead and file a bug. If it is related, they'll be linked and 
resolved together.


~ Kunal


From: Rahul Raj 
Sent: Thursday, June 8, 2017 12:12:47 AM
To: user@drill.apache.org
Subject: Column aliases are ignored when Storage Plugin is enabled

Drill ignores column aliases when a JDBC storage plugin is enabled.

If I execute 'select destination as x from ...some.csv', the column name
appears as 'destination' instead of 'x' while the JDBC storage plugin is
enabled. On disabling the storage plugin, Drill returns the results with
the aliased name 'x'.

This could be related to https://issues.apache.org/jira/browse/DRILL-4903,
where results return the implicit columns (fqn, filepath, etc.) as well.

Should I go ahead and raise a JIRA on this?

Regards,
Rahul

--
 This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom it is
addressed. If you are not the named addressee then you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately and delete this e-mail from your system.


Re: CTAS to wait till the time table is created

2017-06-08 Thread Ted Dunning
On Thu, Jun 8, 2017 at 7:07 AM, Sing, Jasbir 
wrote:

> I am using the CTAS command to copy one parquet file from another. But my
> threads are not waiting for the task to complete and are moving forward. I
> want my thread to wait until my parquet file is created.
> How can I achieve this?
>

What threads?

How are you invoking the CTAS command?

Are you calling Drill via JDBC? Or what?
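
For what it's worth, a plain JDBC call is synchronous: Statement.execute()
does not return until the statement has finished, so the calling thread
waits for the CTAS automatically. A sketch (assumes the Drill JDBC driver
is on the classpath and a local Drillbit; the connection URL and table
names are illustrative):

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CtasWait {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:drill:zk=local");
             Statement stmt = conn.createStatement()) {
            // execute() blocks until Drill finishes writing the new table,
            // so code after this call only runs once the files exist.
            stmt.execute(
                "CREATE TABLE dfs.tmp.`copy` AS "
                + "SELECT * FROM dfs.tmp.`source`");
        }
    }
}
```

If threads are moving on before the file appears, the CTAS is probably
being submitted asynchronously elsewhere in the application, not by JDBC
itself.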


Column aliases are ignored when Storage Plugin is enabled

2017-06-08 Thread Rahul Raj
Drill ignores column aliases when a JDBC storage plugin is enabled.

If I execute 'select destination as x from ...some.csv', the column name
appears as 'destination' instead of 'x' while the JDBC storage plugin is
enabled. On disabling the storage plugin, Drill returns the results with
the aliased name 'x'.

This could be related to https://issues.apache.org/jira/browse/DRILL-4903,
where results return the implicit columns (fqn, filepath, etc.) as well.

Should I go ahead and raise a JIRA on this?

Regards,
Rahul
