Connecting to S3 bucket which does not seem to require a key
Hi,

I'm trying to access the NYC Citibike S3 bucket, which seems to be publicly available: https://s3.amazonaws.com/tripdata/index.html

If I leave the Access Key and Secret Key empty, I get the following message:

0: jdbc:drill:zk=local> !tables
Error: Failure getting metadata: Unable to load AWS credentials from any provider in the chain (state=,code=0)

If I try entering random numbers as keys, I get the following message:

Error: Failure getting metadata: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 1C888A3A21D79F87, AWS Error Code: InvalidAccessKeyId, AWS Error Message: The AWS Access Key Id you provided does not exist in our records. (state=,code=0)

Is it possible to connect to a data source that does not seem to require a key?

Thanks,
Jack
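[Editor's note: one approach worth trying is pointing the S3 storage plugin at Hadoop's anonymous credentials provider instead of supplying keys. The sketch below is untested and the exact property support depends on your Drill and hadoop-aws versions; `org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider` is a real hadoop-aws class, but the rest of the plugin config is an assumed example.]

```json
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://tripdata",
  "config": {
    "fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
  },
  "workspaces": {
    "root": { "location": "/", "writable": false }
  },
  "formats": {
    "csv": { "type": "text", "extensions": ["csv"], "delimiter": "," }
  }
}
```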
Re: Accessing json fields within CSV file
You can use convert_from and the JSON data type.

0: jdbc:drill:> select t.col1, t.col2, t.conv.key1 as key1, t.conv.key2 as key2, t.col4 from
. . . . . . . > (select columns[0] as col1, columns[1] as col2, convert_from(columns[2], 'JSON') as conv, columns[3] as col4 from `/flat/psv-json/json.tbl`) t;
+-------+-------+---------+---------+-------+
| col1  | col2  | key1    | key2    | col4  |
+-------+-------+---------+---------+-------+
| 1     | xyz   | value1  | value2  | abc   |
+-------+-------+---------+---------+-------+

If you want to use functions like flatten, you will need to make sure the JSON is represented in an array, i.e. [{"key":1, "value": 1},{"key":2, "value":2}]

0: jdbc:drill:> select t.col1, t.col2, t.conv.key as key, t.conv.`value` as `value`, t.col4 from
. . . . . . . > (select columns[0] as col1, columns[1] as col2, flatten(convert_from(columns[2], 'JSON')) as conv, columns[3] as col4 from `/flat/psv-json/json.tbl`) t;
+-------+-------+------+--------+-------+
| col1  | col2  | key  | value  | col4  |
+-------+-------+------+--------+-------+
| 1     | xyz   | 1    | 1      | abc   |
| 1     | xyz   | 2    | 2      | abc   |
+-------+-------+------+--------+-------+

--Andries

On 6/8/17, 2:22 AM, "ankit jain" wrote:

> Hi,
>
> I have a few psv files with a few of the columns being a JSON key-value map. Example:
>
> 1|xyz|{"key1":"value1", "key2":"value2"}|abc|
>
> I am converting these files to parquet format but want to convert the JSON keys and values to different columns. How is that possible? The end product being:
>
> id  name  key1    key2    description
> 1   xyz   value1  value2  abc
>
> Right now I am doing something like this, but the JSON column won't explode:
>
> CREATE TABLE dfs.data.`/logs/logsp/` AS SELECT
>     CAST(columns[0] AS INT) `id`,
>     columns[1] AS `name`,
>     columns[2] AS `json_column`,
>     columns[3] AS `description`
> FROM dfs.data.`logs/events.tbl`;
>
> And this is what I get:
>
> id  name  json_column                         description
> 1   xyz   {"key1":"value1", "key2":"value2"}  abc
>
> Thanks in advance,
> Ankit Jain
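[Editor's note: combining the convert_from pattern above with the original CTAS would look roughly like the following. This is an untested sketch; the paths and the key names (key1, key2) are taken from the example in the thread, and it assumes every row's JSON column contains those keys.]

```sql
-- Hedged sketch: convert the JSON column in a subquery (as above), then
-- project its keys as top-level columns when writing the parquet table.
CREATE TABLE dfs.data.`/logs/logsp/` AS
SELECT
    CAST(t.col1 AS INT)  AS `id`,
    t.col2               AS `name`,
    t.conv.key1          AS `key1`,
    t.conv.key2          AS `key2`,
    t.col4               AS `description`
FROM (
    SELECT columns[0] AS col1,
           columns[1] AS col2,
           convert_from(columns[2], 'JSON') AS conv,
           columns[3] AS col4
    FROM dfs.data.`logs/events.tbl`
) t;
```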
Accessing json fields within CSV file
Hi,

I have a few psv files with a few of the columns being a JSON key-value map. Example:

1|xyz|{"key1":"value1", "key2":"value2"}|abc|

I am converting these files to parquet format but want to convert the JSON keys and values to different columns. How is that possible? The end product being:

id  name  key1    key2    description
1   xyz   value1  value2  abc

Right now I am doing something like this, but the JSON column won't explode:

CREATE TABLE dfs.data.`/logs/logsp/` AS SELECT
    CAST(columns[0] AS INT) `id`,
    columns[1] AS `name`,
    columns[2] AS `json_column`,
    columns[3] AS `description`
FROM dfs.data.`logs/events.tbl`;

And this is what I get:

id  name  json_column                         description
1   xyz   {"key1":"value1", "key2":"value2"}  abc

Thanks in advance,
Ankit Jain
Re: Column alias are ignored when Storage Plugin is enabled
It could be related to these as well:
https://issues.apache.org/jira/browse/DRILL-5537
https://issues.apache.org/jira/browse/DRILL-5538

Please go ahead and file a bug. If it is related, they'll be linked and resolved together.

~ Kunal

From: Rahul Raj
Sent: Thursday, June 8, 2017 12:12:47 AM
To: user@drill.apache.org
Subject: Column alias are ignored when Storage Plugin is enabled

> Drill ignores column aliases when a JDBC storage plugin is enabled. If I execute 'select destination as x from ...some.csv', the column name appears as 'destination' instead of 'x' while the JDBC storage plugin is enabled. On disabling the storage plugin, Drill returns the results with the aliased name 'x'.
>
> This could be related to https://issues.apache.org/jira/browse/DRILL-4903, where results return the implicit columns (fqn, filepath, etc.) as well.
>
> Should I go ahead and raise a JIRA on this?
>
> Regards,
> Rahul
Re: CTAS to wait till the time table is created
On Thu, Jun 8, 2017 at 7:07 AM, Sing, Jasbir wrote:

> I am using the CTAS command to copy one parquet file from another. But my
> threads are not waiting for the task completion and are moving forward. I
> want my thread to wait till the time my parquet file is created.
> How can I achieve this?

What threads? How are you invoking the CTAS command? Are you calling Drill via JDBC? Or what?
Column alias are ignored when Storage Plugin is enabled
Drill ignores column aliases when a JDBC storage plugin is enabled. If I execute 'select destination as x from ...some.csv', the column name appears as 'destination' instead of 'x' while the JDBC storage plugin is enabled. On disabling the storage plugin, Drill returns the results with the aliased name 'x'.

This could be related to https://issues.apache.org/jira/browse/DRILL-4903, where results return the implicit columns (fqn, filepath, etc.) as well.

Should I go ahead and raise a JIRA on this?

Regards,
Rahul

--
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom it is addressed. If you are not the named addressee then you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and delete this e-mail from your system.