Re: setting an administrator

2017-05-05 Thread Sudheesh Katkam
There are system options that define the list of users and list of groups that 
are considered administrators. By default, the user running the drillbit is the 

System options can only be changed by administrators. So log in as an 
administrator through sqlline and run “ALTER SYSTEM SET …”, or log in as an 
administrator through the web UI and change system options there.
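
As a rough illustration of the sqlline route: the "No current connection" message quoted further down is what sqlline prints when no connection has been opened, so the session needs a JDBC URL first. The sketch below only builds the two strings involved. The ZooKeeper host is hypothetical, the `drill` root and `drillbits1` cluster ID are assumed to match the values in drill-override.conf, and the helper names are made up for this example.

```python
def drill_jdbc_url(zk_hosts, zk_root="drill", cluster_id="drillbits1"):
    """Build a Drill JDBC URL that locates drillbits through ZooKeeper.

    zk_root and cluster_id are assumptions; use the values from your
    own drill-override.conf.
    """
    return "jdbc:drill:zk={}/{}/{}".format(",".join(zk_hosts), zk_root, cluster_id)


def alter_system_set(option, value):
    """Render an ALTER SYSTEM statement for a string-valued system option."""
    escaped = value.replace("'", "''")  # double any single quotes in the value
    return "ALTER SYSTEM SET `{}` = '{}';".format(option, escaped)


# Hypothetical ZooKeeper host and user id, for illustration only.
print(drill_jdbc_url(["zk1.example.com:2181"]))
print(alter_system_set("security.admin.users", "my_id"))
```

The URL would be passed to sqlline when it starts (for example with its -u, -n and -p arguments, logging in as a user who is already an administrator), and the statement would then be run inside that session.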

Details are here:

I agree this should be better documented. So please open a ticket.

Also, system options are stored in ZooKeeper, which is why manually 
creating/editing that znode worked.

On May 5, 2017, at 7:47 AM, Knapp, Michael wrote:

After a lot of source code digging, and some trial and error, I discovered I 
can set admin users from the zookeeper CLI with this command:

create /drill/sys.options/security.admin.users 

Now, why the heck this is not in the documentation beats me.  I think the 
developers wanted me to use sqlline to set this, but they left no documentation 
whatsoever about how to establish a connection between sqlline and my ZooKeeper 
persistent store.

On 5/4/17, 6:27 PM, "Knapp, Michael" wrote:


   I am trying to set Drill administrators but it’s just not working.  I have 
set up a custom authenticator that uses a backend database for authentication, 
and that is working.  The only problem is I am a “user” not an administrator, 
leaving me essentially powerless and Drill useless.

   First, I think the 
 are not clear, it is not clear to me if I should be executing the SET 
statement from the web console or something else.  I have tried this:

   I updated my drill-override.conf, I have attempted setting 
“” and “security.admin.users”.  I have set them 
to single values and also attempted putting the values in brackets like a list. 
 None of these combinations have worked.

   It was unclear to me how I was supposed to run your SQL statements when I am 
not an administrator in the first place.  Then I guessed I should try it from 
the sqlline, but that also is not working.

   sqlline> ALTER SYSTEM SET `security.admin.users` = "my_id";
   No current connection

   Why is it saying that I have no current connection?  What am I missing here?

   Michael Knapp

   The information contained in this e-mail is confidential and/or proprietary 
to Capital One and/or its affiliates and may only be used solely in performance 
of work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.


Re: multiple users and passwords

2017-05-02 Thread Sudheesh Katkam
To clarify:

Passing *end user* credentials from one service to another *to reach the target 
service* is messy.

On May 2, 2017, at 1:06 PM, Sudheesh Katkam wrote:

Drill supports impersonation (“outbound”) to HDFS and Hive; this works because 
the client API allows for inbound impersonation.

In your use-case, does the backend database allow Drill to impersonate the end 
user (“Joe”) i.e. does the database support inbound impersonation? If so, there 
still need to be some changes made in Drill to support that; please open a 
ticket in that case.

Passing credentials from one service to another is messy. Instead, if service A 
supports inbound impersonation (or proxy users), then service A can verify 
service B’s credentials once, and allow service B to impersonate end users 
(maybe based on some policies, like Drill). This will avoid having to pass 
through the end user’s credentials.
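
A minimal sketch of the proxy-user idea, under stated assumptions: service A has already verified service B's own credentials, and only has to decide whether B may act for a given end user. The `ProxyPolicy` shape, the principal names and the `*` wildcard are hypothetical and are not taken from Drill's or any database's actual policy format.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyPolicy:
    """Which end users one proxy principal may impersonate (hypothetical shape)."""
    proxy: str
    allowed_targets: set = field(default_factory=set)  # "*" means any user


def may_impersonate(policies, proxy, target):
    """Return True if an already-authenticated proxy may act as target."""
    for policy in policies:
        if policy.proxy != proxy:
            continue
        if "*" in policy.allowed_targets or target in policy.allowed_targets:
            return True
    return False


# Service B ("drill_service") authenticates once with its own credentials;
# the end user's password is never passed along.
policies = [ProxyPolicy(proxy="drill_service", allowed_targets={"joe", "ann"})]
print(may_impersonate(policies, "drill_service", "joe"))   # True
print(may_impersonate(policies, "drill_service", "root"))  # False
print(may_impersonate(policies, "other_service", "joe"))   # False
```

The point of the design is that the policy check replaces credential forwarding: only the proxy's identity and the requested target name cross the service boundary.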

- Sudheesh

On May 2, 2017, at 11:55 AM, Knapp, Michael wrote:

Sorry I noticed that documentation/link after I sent the original message.  I 
also found the documentation on “Configuring User Impersonation” and 
“Configuring Inbound Impersonation” to be useful and relevant.

I am not sure that these will be adequate though.  Drill supports inbound 
impersonation, but I think I need the opposite, outbound impersonation.

For example, I can set up Drill to use LDAP, and “Joe” can log in to the machine. 
 He may do a query joining the database with another source.  Drill can use 
impersonation to execute these queries as Joe.  Unfortunately though, Joe’s 
credentials for the backend database may not be the same as his LDAP 
credentials, and they may be different for the other data sources.  Joe could 
configure the storage plugins to use his database username/password, but 
wouldn’t that also make his password visible to all users?

I guess I can summarize this with one question: Can Drill support separate 
storage plugin configurations per user?

On 5/2/17, 2:36 PM, "Kunal Khatua" wrote:

  Have you had a look at this link?

  Configuring User Authentication - Apache Drill
  Authentication is the process of establishing confidence of authenticity. A 
Drill client user is authenticated when a drillbit process running in a Drill 
cluster ...

  - Kunal

  From: Knapp, Michael 
  Sent: Tuesday, May 2, 2017 8:33:03 AM
  Cc: Chagani, Hassan; Swift, John
  Subject: multiple users and passwords

  Drill Developers and Supporters,

  I am hoping to use Drill to query a SQL database.  There will be many 
different users accessing the Drill web console, and each of them has separate 
credentials for accessing the database.  I have the requirement of supporting 
Drill queries to the database using the credentials provided by the current 
user.  I am struggling to find a way to do this in Drill because I noticed that:

  · The documentation instructs me to provide the username and password 
in the storage plugin, either in the ‘url’ field or as separate ‘username’ and 
‘password’ fields.

  · As far as I know, Drill does not support user logins or various 
permission models.

  So as I see it, if a person can reach the drill web console, then they can 
also see all of the storage plugin configurations.  That means they can see the 
passwords in clear text.  If I opened this up to multiple users, then each of 
them could see everybody else’s passwords.  I cannot simply create a system 
account to perform queries on behalf of others because we have auditing 
requirements.

  I also noticed that completed queries are logged in the “Profiles” tab on the 
console.  So if somehow I configure things such that credentials are passed in 
a query, they would still be visible to other users by viewing completed 
queries.  So I would also need to prevent that somehow.

  Does anybody know how I can provide drill with each user’s credentials 
without sharing them with every user?

  I don’t see any way to provide credentials in a select statement to my 
database, it looks like it can only be provided while forming a connection.

  I was thinking, maybe I can write a new storage plugin that wraps the RDBMS 
plugin, and consumes credentials by some other method.  I don’t see any 
documentation on how to write your own storage plugin.

  Any ideas or suggestions would be greatly appreciated.

  Michael Knapp


Re: multiple users and passwords

2017-05-02 Thread Sudheesh Katkam
Drill supports impersonation (“outbound”) to HDFS and Hive; this works because 
the client API allows for inbound impersonation.

In your use-case, does the backend database allow Drill to impersonate the end 
user (“Joe”) i.e. does the database support inbound impersonation? If so, there 
still need to be some changes made in Drill to support that; please open a 
ticket in that case.

Passing credentials from one service to another is messy. Instead if service A 
supports inbound impersonation (or proxy users), then service A can verify 
service B’s credentials once, and allow service B to impersonate end users 
(maybe based on some policies, like Drill). This will avoid having to pass 
through the end user’s credentials.

- Sudheesh

> On May 2, 2017, at 11:55 AM, Knapp, Michael  
> wrote:
> Sorry I noticed that documentation/link after I sent the original message.  I 
> also found the documentation on “Configuring User Impersonation” and 
> “Configuring Inbound Impersonation” to be useful and relevant.
> I am not sure that these will be adequate though.  Drill supports inbound 
> impersonation, but I think I need the opposite, outbound impersonation.  
> For example, I can setup Drill to use LDAP, and “Joe” can login to the 
> machine.  He may do a query joining the database with another source.  Drill 
> can use impersonation to execute these queries as Joe.  Unfortunately though, 
> Joe’s credentials for the backend database may not be the same as his LDAP 
> credentials, and they may be different for the other data sources.  Joe could 
> configure the storage plugins to use his database username/password, but 
> wouldn’t that also make his password visible to all users?
> I guess I can summarize this with one question: Can Drill support separate 
> storage plugin configurations per user?
> On 5/2/17, 2:36 PM, "Kunal Khatua"  wrote:
>Have you had a look at this link?
>Configuring User Authentication - Apache 
> Drill
>Authentication is the process of establishing confidence of authenticity. 
> A Drill client user is authenticated when a drillbit process running in a 
> Drill cluster ...
>- Kunal
>From: Knapp, Michael 
>Sent: Tuesday, May 2, 2017 8:33:03 AM
>Cc: Chagani, Hassan; Swift, John
>Subject: multiple users and passwords
>Drill Developers and Supporters,
>I am hoping to use drill to query a SQL database.  There will be many 
> different users accessing the drill web console, and each of them has 
> separate credentials for accessing the database.  I have the requirement of 
> supporting drill queries to the database using the credentials provided by 
> the current user.  I am struggling to find a way to do this in drill because 
> I noticed that:
>· The documentation instructs me to provide the username and 
> password in the storage plugin, either in the ‘url’ field or as separate 
> ‘username’ and ‘password’ fields.
>· As far as I know, Drill does not support user logins or various 
> permission models.
>So as I see it, if a person can reach the drill web console, then they can 
> also see all of the storage plugin configurations.  That means they can see 
> the passwords in clear text.  If I opened this up to multiple users, then 
> each of them could see everybody else’s passwords.  I cannot simply create a 
> system account to perform queries on behalf of others because we have 
> auditing requirements.
>I also noticed that completed queries are logged in the “Profiles” tab on 
> the console.  So if somehow I configure things such that credentials are 
> passed in a query, they would still be visible to other users by viewing 
> completed queries.  So I would also need to prevent that somehow.
>Does anybody know how I can provide drill with each user’s credentials 
> without sharing them with every user?
>I don’t see any way to provide credentials in a select statement to my 
> database, it looks like it can only be provided while forming a connection.
>I was thinking, maybe I can write a new storage plugin that wraps the 
> RDBMS plugin, and consumes credentials by some other method.  I don’t see any 
> documentation on how to write your own storage plugin.
>Any ideas or suggestions would be greatly appreciated.
>Michael Knapp

[HANGOUT] Topics for 04/18/17

2017-04-17 Thread Sudheesh Katkam
Hi drillers,

Our bi-weekly hangout is tomorrow (04/18/17, 10 AM PT). If you have any 
suggestions for hangout topics, you can add them to this thread. We will also 
ask around at the beginning of the hangout for topics.

Hangout link:

Thank you,

Re: Strange results with date_trunc 'QUARTER'

2017-04-03 Thread Sudheesh Katkam
Looks like a bug to me. Please open a ticket. A simple repro would be very 

- Sudheesh

On Apr 3, 2017, at 2:11 PM, Joel Wilsson wrote:


I'm seeing some strange results when trying to group by
date_trunc('QUARTER', ). I can work around it by doing
more or less the same thing as in DateTruncFunctions. Am I missing
something, or is this a bug?

0: jdbc:drill:> SELECT date_trunc('QUARTER',
`taxi_trips`.`dropoff_datetime`), COUNT(*) FROM `hive.default`.`taxi_trips`
GROUP BY date_trunc('QUARTER', `taxi_trips`.`dropoff_datetime`) ORDER BY
date_trunc('QUARTER', `taxi_trips`.`dropoff_datetime`);
| EXPR$0 |   EXPR$1   |
| 2012-01-01 00:00:00.0  | 21817  |
| 2013-01-01 00:00:00.0  | 173157926  |
| 2013-04-01 00:00:00.0  | 3  |
| 2013-07-01 00:00:00.0  | 2  |
| 2013-10-01 00:00:00.0  | 3  |
| 2014-01-01 00:00:00.0  | 8  |
| 2020-01-01 00:00:00.0  | 4  |
7 rows selected (12.734 seconds)

The data is spread out over all months of 2013:

0: jdbc:drill:> SELECT date_trunc('MONTH',
`taxi_trips`.`dropoff_datetime`), COUNT(*) FROM `hive.default`.`taxi_trips`
GROUP BY date_trunc('MONTH', `taxi_trips`.`dropoff_datetime`) ORDER BY
date_trunc('MONTH', `taxi_trips`.`dropoff_datetime`);
| EXPR$0 |  EXPR$1   |
| 2012-12-01 00:00:00.0  | 21817 |
| 2013-01-01 00:00:00.0  | 14772657  |
| 2013-02-01 00:00:00.0  | 13990803  |
| 2013-03-01 00:00:00.0  | 15744402  |
| 2013-04-01 00:00:00.0  | 15108210  |
| 2013-05-01 00:00:00.0  | 15313848  |
| 2013-06-01 00:00:00.0  | 14355098  |
| 2013-07-01 00:00:00.0  | 13830436  |
| 2013-08-01 00:00:00.0  | 12613596  |
| 2013-09-01 00:00:00.0  | 14080300  |
| 2013-10-01 00:00:00.0  | 15009363  |
| 2013-11-01 00:00:00.0  | 14388420  |
| 2013-12-01 00:00:00.0  | 13950801  |
| 2014-01-01 00:00:00.0  | 8 |
| 2020-05-01 00:00:00.0  | 4 |
15 rows selected (12.25 seconds)

This workaround gives the correct results:

0: jdbc:drill:> SELECT date_trunc('YEAR', `taxi_trips`.`dropoff_datetime`)
+ ((extract(month from `taxi_trips`.`dropoff_datetime`)-1)/3) * interval
'3' MONTH, COUNT(*) FROM `hive.default`.`taxi_trips` GROUP BY
date_trunc('YEAR', `taxi_trips`.`dropoff_datetime`) + ((extract(month from
`taxi_trips`.`dropoff_datetime`)-1)/3) * interval '3' MONTH ORDER BY
date_trunc('YEAR', `taxi_trips`.`dropoff_datetime`) + ((extract(month from
`taxi_trips`.`dropoff_datetime`)-1)/3) * interval '3' MONTH;
| EXPR$0 |  EXPR$1   |
| 2012-10-01 00:00:00.0  | 21817 |
| 2013-01-01 00:00:00.0  | 44507862  |
| 2013-04-01 00:00:00.0  | 44777156  |
| 2013-07-01 00:00:00.0  | 40524332  |
| 2013-10-01 00:00:00.0  | 43348584  |
| 2014-01-01 00:00:00.0  | 8 |
| 2020-04-01 00:00:00.0  | 4 |
7 rows selected (13.261 seconds)
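
The arithmetic behind the workaround above can be checked outside Drill. This is only a sketch of the expected quarter truncation (year start plus three months for every full quarter elapsed), assuming the `(month - 1) / 3` in the query is an integer division; the function name `quarter_start` is made up for the example and is not Drill's DateTruncFunctions code.

```python
from datetime import datetime


def quarter_start(ts):
    """Truncate a timestamp to the first instant of its calendar quarter."""
    # (month - 1) // 3 is the zero-based quarter index: 0 for Jan-Mar, 3 for Oct-Dec.
    first_month = ((ts.month - 1) // 3) * 3 + 1
    return datetime(ts.year, first_month, 1)


samples = (
    datetime(2012, 12, 31, 23, 59),  # expected quarter start: 2012-10-01
    datetime(2013, 5, 17, 8, 30),    # expected quarter start: 2013-04-01
    datetime(2020, 5, 1),            # expected quarter start: 2020-04-01
)
for sample in samples:
    print(sample.isoformat(), "->", quarter_start(sample).isoformat())
```

These values agree with the workaround's output (2012-10-01 and 2020-04-01), whereas the date_trunc('QUARTER', ...) query reported 2012-01-01 and 2020-01-01 for the same rows.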

The data is read from an external Parquet table:

0: jdbc:drill:> describe `hive.default`.`taxi_trips`;
| dropoff_datetime| TIMESTAMP  | YES  |
| dropoff_latitude| DOUBLE | YES  |
| dropoff_longitude   | DOUBLE | YES  |
| hack_license| CHARACTER VARYING  | YES  |
| medallion   | CHARACTER VARYING  | YES  |
| passenger_count | BIGINT | YES  |
| pickup_datetime | TIMESTAMP  | YES  |
| pickup_latitude | DOUBLE | YES  |
| pickup_longitude| DOUBLE | YES  |
| rate_code   | BIGINT | YES  |
| store_and_fwd_flag  | CHARACTER VARYING  | YES  |
| trip_distance   | DOUBLE | YES  |
| trip_time_in_secs   | BIGINT | YES  |
| vendor_id   | CHARACTER VARYING  | YES  |
14 rows selected (0.184 seconds)
0: jdbc:drill:>

Best regards,

Re: S3 using IAM roles

2017-04-03 Thread Sudheesh Katkam
Glad you could figure this out! Can you open a ticket with details?

- Sudheesh

On Apr 3, 2017, at 12:42 PM, Knapp, Michael wrote:


In case others are in the same situation as I am, I will tell you how I solved 
this.  After A LOT of digging through source code, I discovered the 
following facts:
• Drill is using Hadoop’s FileSystem to support S3 queries.  So any 
configuration items that work for that will also work if you place them in the 
core-site.xml file here.
• In the hadoop-aws jar/source code, it uses these classes to get credentials:
o S3AFileSystem
o S3AUtils
o [Default]S3ClientFactory
• If you configure nothing, then naturally credentials will be searched in this order:
o BasicAWSCredentialsProvider – looks for access and secret in the core-site 
xml file
o EnvironmentVariableCredentialsProvider – looks for access and secret in 
environment variables.
o SharedInstanceProfileCredentialsProvider – tries to get credentials from the 

So to solve this problem I had to do these steps:
1. Make sure that core-site.xml DOES NOT set the access and secret key
2. Make sure that your S3 Storage configuration DOES NOT set the access and 
secret key from the Apache Drill web UI, Storage tab
3. In my case, I also needed server side encryption to be supported, there is a 
property you can add to core-site.xml for that.
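
Why steps 1 and 2 matter can be seen from a small model of a provider chain: the first source that yields credentials wins, so any static key left in the configuration shadows the IAM role. This is a sketch of the lookup order described above, not the hadoop-aws implementation. The function names are hypothetical, the instance-profile lookup is replaced by a plain callable, and the dictionary keys are assumed to be the usual S3A property names and AWS environment variable names.

```python
def from_config(config):
    """Static keys as they would be set in core-site.xml (modelled as a dict)."""
    access, secret = config.get("fs.s3a.access.key"), config.get("fs.s3a.secret.key")
    return (access, secret) if access and secret else None


def from_environment(env):
    """Static keys taken from environment variables (modelled as a dict)."""
    access, secret = env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")
    return (access, secret) if access and secret else None


def resolve_credentials(config, env, fetch_role_credentials):
    """Try each source in order and return (source_name, credentials)."""
    providers = (
        ("config", lambda: from_config(config)),
        ("environment", lambda: from_environment(env)),
        # Stand-in for the EC2 instance-profile (IAM role) lookup.
        ("instance-profile", fetch_role_credentials),
    )
    for name, provider in providers:
        creds = provider()
        if creds is not None:
            return name, creds
    raise LookupError("no credentials found in any provider")


role = lambda: ("ROLE_KEY", "ROLE_SECRET")  # placeholder values
print(resolve_credentials({}, {}, role)[0])
print(resolve_credentials({"fs.s3a.access.key": "K", "fs.s3a.secret.key": "S"}, {}, role)[0])
```

With nothing configured the role is used; as soon as a key pair appears in the configuration, the chain stops there and the role is never consulted.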

Here is what my core-site.xml file eventually looked like:


When you query from drill, the format should look like this:
SELECT * FROM s3.`s3a://my-bucket/drill/nation.parquet` limit 3;

Also, if somebody needs to troubleshoot this, then modify the logback.xml, add 



Then you can see log entries for these things in drillbit.log

I hope this may help other people who need to use IAM and/or server side 
encryption with drill.

I also hope that somebody will update the Drill documentation to explain how to 
do this, it could have saved me a day of work!

Michael Knapp

On 4/3/17, 1:13 PM, "Knapp, Michael" wrote:

   Drill Developers,

   I am using IAM roles on EC2 instances, your documentation here:

   instructs me to provide an access key and secret key, which I do not have 
since I am using IAM roles.

   I have been reviewing the source code a few hours now and still have not 
found a point in the code where you connect with S3.  I was surprised to find 
that you do not use the AWS SDK.

   Can you please tell me:

   1.   Does Drill support using IAM roles to provide credentials for S3 

   2.   Where in the code does Drill establish a connection with S3?

   Michael Knapp


Re: [HANGOUT] Topics for 01/24/17

2017-01-24 Thread Sudheesh Katkam
Join us here:

On Jan 23, 2017, at 6:43 PM, Sudheesh Katkam wrote:

I meant 01/24/17, 10 AM PT.

On Jan 23, 2017, at 12:43 PM, Sudheesh Katkam wrote:

Hi drillers,

Our bi-weekly hangout is tomorrow (01/23/17, 10 AM PT). If you have any 
suggestions for hangout topics, you can add them to this thread. We will also 
ask around at the beginning of the hangout for topics.

Thank you,

Re: [HANGOUT] Topics for 01/24/17

2017-01-23 Thread Sudheesh Katkam
I meant 01/24/17, 10 AM PT.

> On Jan 23, 2017, at 12:43 PM, Sudheesh Katkam wrote:
> Hi drillers,
> Our bi-weekly hangout is tomorrow (01/23/17, 10 AM PT). If you have any 
> suggestions for hangout topics, you can add them to this thread. We will also 
> ask around at the beginning of the hangout for topics.
> Thank you,
> Sudheesh

Re: Impersonation with Drill Web Console or REST API

2016-12-21 Thread Sudheesh Katkam
Maybe the doc should say that Drill supports impersonation through the web console. 
These clients use the Java client library, just like JDBC.

Note that *inbound* impersonation is not supported yet because Drill does not 
expose an “impersonation_target” field through the web login form.

Thank you,

> On Dec 21, 2016, at 10:08 AM, Akihiko Kusanagi  wrote:
> Hi,
> The 'Impersonation Support' table In the following page says that
> impersonation
> is not supported with Drill Web Console or REST API.
> However, when authentication and impersonation are enabled, impersonation is
> in effect through Web UI.
> $ cat drill-override.conf
> ...
> drill.exec: {
> ...
> impersonation: {
>   enabled: true
> },
> ...
> Only mapr user has read permission for nation.parquet, and Drillbit is
> running as mapr user.
> $ hadoop fs -ls /sample-data
> ...
> drwx--   - mapr mapr   1210 2016-01-11 19:58 nation.parquet
> ...
> Then, login as the other user via Drill Web UI, and run this query:
> select * from dfs.`/sample-data/nation.parquet`
> This returns the following error, so it seems that impersonation is in
> effect.
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> IOException: 2049.177.8452826 /sample-data/nation.parquet (Input/output
> error) Fragment 0:0 [Error Id: 91684467-8a4f-4fb8-8ad7-6ee04b7f8f53 on
> node3:31010]
> When drill.exec.impersonation.enabled = false, the query above returns
> multiple rows.
> Is this expected behavior? Does the document need to be updated?
> Thanks,
> Aki

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

2016-12-21 Thread Sudheesh Katkam
Two more questions..

(1) How many nodes in your cluster?
(2) How many queries are running when the failure is seen?

If you have multiple large queries running at the same time, the load on the 
system could cause those failures (which are heartbeat related).

The two options I suggested decrease the parallelism of stages in a query; this 
implies lower load but slower execution.

System-level options affect all queries, and session-level options affect queries on a 
specific connection. I am not sure which is preferred in your environment.
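
To make the effect of the two options concrete, here is a deliberately simplified mental model, not Drill's planner: assume a stage gets roughly one fragment per `planner.slice_target` estimated rows, capped by `planner.width.max_per_node` times the number of nodes. The function name and all numbers are hypothetical.

```python
import math


def estimated_stage_width(estimated_rows, slice_target, max_per_node, node_count):
    """Rough upper bound on parallel fragments for one stage (simplified model)."""
    by_rows = max(1, math.ceil(estimated_rows / slice_target))  # row-driven demand
    by_cluster = max_per_node * node_count                      # per-node cap
    return min(by_rows, by_cluster)


# Hypothetical 4-node cluster and a stage estimated at 5 million rows.
print(estimated_stage_width(5_000_000, 100_000, 8, 4))    # 32: baseline
print(estimated_stage_width(5_000_000, 100_000, 2, 4))    # 8: lower max_per_node
print(estimated_stage_width(5_000_000, 1_000_000, 8, 4))  # 5: higher slice_target
```

Under this model, either change reduces the number of fragments competing for CPU at the same time, which is the trade of lower load for slower execution. The options would be applied with ALTER SESSION SET for one connection or ALTER SYSTEM SET for all queries.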

Also, you may be interested in metrics. More info here: 

Thank you,

> On Dec 21, 2016, at 4:31 AM, Anup Tiwari wrote:
> @sudheesh, yes drill bit is running on datanodeN/10.*.*.5:31010).
> Can you tell me how this will impact to query and do i have to set this at
> session level OR system level?
> Regards,
> *Anup Tiwari*
> On Tue, Dec 20, 2016 at 11:59 PM, Chun Chang wrote:
>> I am pretty sure this is the same as DRILL-4708.
>> On Tue, Dec 20, 2016 at 10:27 AM, Sudheesh Katkam wrote:
>>> Is the drillbit service (running on datanodeN/10.*.*.5:31010) actually
>>> down when the error is seen?
>>> If not, try lowering parallelism using these two session options, before
>>> running the queries:
>>> planner.width.max_per_node (decrease this)
>>> planner.slice_target (increase this)
>>> Thank you,
>>> Sudheesh
>>>> On Dec 20, 2016, at 12:28 AM, Anup Tiwari wrote:
>>>> Hi Team,
>>>> We are running some drill automation script on a daily basis and we
>> often
>>>> see that some query gets failed frequently by giving below error ,
>> Also i
>>>> came across DRILL-4708 <
>> jira/browse/DRILL-4708
>>>> which seems similar, Can anyone give me update on that OR workaround to
>>>> avoid such issue ?
>>>> *Stack Trace :-*
>>>> Error: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
>>>> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit
>>> down?
>>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ] (state=,code=0)
>>>> java.sql.SQLException: CONNECTION ERROR: Connection /10.*.*.1:41613
>> <-->
>>>> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillb
>>>> it down?
>>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
>>>>   at
>>>> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(
>>>>   at
>>>> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(
>>>>   at
>>>> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(
>>>>   at
>>>> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(
>>>>   at
>>>> org.apache.calcite.avatica.AvaticaConnection$1.execute(
>>>>   at
>>>> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(
>>>>   at
>>>> org.apache.calcite.avatica.AvaticaConnection.
>> prepareAndExecuteInternal(
>>>>   at
>>>> org.apache.drill.jdbc.impl.DrillConnectionImpl.
>>> prepareAndExecuteInternal(
>>>>   at
>>>> org.apache.calcite.avatica.AvaticaStatement.executeInternal(
>>>>   at
>>>> org.apache.calcite.avatica.AvaticaStatement.execute(
>>>>   at
>>>> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(
>>>>   at sqlline.Commands.execute(
>>>>   at sqlline.Commands.sql(
>>>>   at sqlline.SqlLine.dispatch(
>>>>   at sqlline.SqlLine.runCommands(
>>>>   at sqlline.Com

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

2016-12-20 Thread Sudheesh Katkam
Is the drillbit service (running on datanodeN/10.*.*.5:31010) actually down 
when the error is seen?

If not, try lowering parallelism using these two session options, before 
running the queries:

planner.width.max_per_node (decrease this)
planner.slice_target (increase this)

Thank you,

> On Dec 20, 2016, at 12:28 AM, Anup Tiwari  wrote:
> Hi Team,
> We are running some drill automation script on a daily basis and we often
> see that some query gets failed frequently by giving below error , Also i
> came across DRILL-4708 
> which seems similar, Can anyone give me update on that OR workaround to
> avoid such issue ?
> *Stack Trace :-*
> Error: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ] (state=,code=0)
> java.sql.SQLException: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillb
> it down?
> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(
> org.apache.calcite.avatica.AvaticaConnection$1.execute(
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(
> org.apache.calcite.avatica.AvaticaStatement.execute(
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(
>at sqlline.Commands.execute(
>at sqlline.Commands.sql(
>at sqlline.SqlLine.dispatch(
>at sqlline.SqlLine.runCommands(
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
>at java.lang.reflect.Method.invoke(
> sqlline.ReflectiveCommandHandler.execute(
>at sqlline.SqlLine.dispatch(
>at sqlline.SqlLine.initArgs(
>at sqlline.SqlLine.begin(
>at sqlline.SqlLine.start(
>at sqlline.SqlLine.main(
> Caused by: org.apache.drill.common.exceptions.UserException: CONNECTION
> ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user
> client) closed unexpectedly. Drillbit down?
> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> org.apache.drill.common.exceptions.UserException$
> org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(
> io.netty.util.concurrent.DefaultPromise.notifyListener0(
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(
> io.netty.util.concurrent.DefaultPromise.notifyListeners(
> io.netty.util.concurrent.DefaultPromise.trySuccess(

[ANNOUNCE] Apache Drill 1.9.0 Released

2016-11-29 Thread Sudheesh Katkam
On behalf of the Apache Drill community, I am happy to announce the release
of Apache Drill 1.9.0.

For information about Apache Drill, and to get involved, visit the project

This release introduces new features and enhancements, including
asynchronous Parquet reader, Parquet filter pushdown, dynamic UDF support
and HTTPD format plugin. In all, 70 issues have been resolved.

The binary and source artifacts are available here:

Review the release notes for a complete list of fixes and enhancements:

Thanks to everyone in the community who contributed to this release!

- Sudheesh

Re: Apache Drill Hangout Minutes - 11/1/16

2016-11-02 Thread Sudheesh Katkam
I am going to update the pull request so that both will be "ok".

This implies that username/password credentials will be sent to the server
twice, during handshake and during SASL exchange. And sending credentials
through handshake will be deprecated (and removed in a future release).

Thank you,

On Wed, Nov 2, 2016 at 2:58 PM, Jacques Nadeau wrote:

> Since I'm not that close to DRILL-4280, I wanted to clarify expectation:
> <1.9 Client  <==>  1.9 Server (ok)
>  1.9 Client  <==> <1.9 Server (fails)
> Is that correct?
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> On Tue, Nov 1, 2016 at 8:44 PM, Sudheesh Katkam wrote:
> > Hi Laurent,
> >
> > That's right; this was mentioned in the design document.
> >
> > I am piggybacking on previous changes that break the "newer clients talking
> > to older servers" compatibility. For example, as I understand, some
> > resolved sub-tasks of DRILL-4714 [1] *implicitly* break this compatibility;
> > say the "newer" API that was introduced is used by an application which is
> > talking to an older server. The older server drops the connection, unable
> > to handle the message.
> >
> > In DRILL-4280, there is an *explicit* break in that specific compatibility,
> > and the error message is much cleaner with a version mismatch message. The
> > difference is that the C++ client (unlike the Java client) checks for the
> > server version as well, which makes the compatibility break more visible.
> >
> > I am not sure about the plan of action in general about this compatibility.
> > However, I could work around the issue by advertising clients' SASL
> > capability to the server. What do you think?
> >
> > Thank you,
> > Sudheesh
> >
> > [1]
> >
> > On Nov 1, 2016, at 7:49 PM, Laurent Goujon wrote:
> >
> > Just for clarity, DRILL-4280 is a breaking-protocol change, so is the plan
> > to defer this change to a later release, or to defer bringing back
> > compatibility between newer clients and older servers to a later release?
> >
> > Laurent
> >
> > On Tue, Nov 1, 2016 at 3:43 PM, Zelaine Fong <> wrote:
> >
> > Oops, mistake in my notes.  For the second item, I meant DRILL-4280, not
> > DRILL-1950.
> >
> > On Tue, Nov 1, 2016 at 3:40 PM, Zelaine Fong <> wrote:
> >
> > Attendees: Paul, Padma, Sorabh, Boaz, Sudheesh, Vitalii, Roman, Dave O,
> > Arina, Laurent, Kunal, Zelaine
> >
> > I had to leave the hangout at 10:30, so my notes only cover the
> > discussion up till then.
> >
> > 1) Variable width decimal support - Dave O
> >
> > Currently Drill only supports fixed width byte array storage of decimals.
> > Dave has submitted a pull request for DRILL-4834 to add support for
> > storing decimals with variable width byte arrays.  Eventually, variable
> > width can replace fixed width, but the pull request doesn't cover that.
> > Dave would like someone in the community to review his pull request.
> >
> > 2) 1.9 release - Sudheesh
> >
> > Sudheesh is collecting pull requests for the release.  Some have been
> > reviewed and are waiting to be merged.  Sudheesh plans to commit a batch
> > this Wed and another this Friday.  He's targeting having a release
> > candidate build available next Monday.
> >
> > Laurent asked about Sudheesh's pull request for DRILL-1950.  He asked
> > whether thought had been given to supporting newer Drill clients with
> > older Drill servers.  Sudheesh indicated that doing this would entail a
> > breaking change in the protocol, and the plan was to defer doing this for
> > a later release where we may want to make other breaking changes like
> > this.
> >

Hangout starting now..

2016-10-04 Thread Sudheesh Katkam

Thank you,

Re: [HANGOUT] Topics for 10/04/16

2016-10-04 Thread Sudheesh Katkam
Join us at this link: 

> On Oct 3, 2016, at 9:26 PM, Shadi Khalifa <> wrote:
> Hi,
> I have been working on integrating WEKA into Drill to support building and 
> scoring classification models. I have been successful in supporting all WEKA 
> classifiers and making them run in a distributed fashion over Drill 1.2. The 
> classifier accuracy is not affected by running in a distributed fashion and 
> the training and scoring times are getting a huge boost using Drill. A paper 
> on this has been published in the IEEE symposium on Big Data in June 2016 
> [available: 
>] and 
> we are now in the process of publishing another paper in which QDrill 
> supports all WEKA algorithms. FYI, this can be easily extended to support 
> clustering and other types of WEKA algorithms. The architecture also allows 
> supporting other data mining libraries.
> The QDrill project website is, the 
> project downloadable version on it is a little bit old but I'm planning to 
> upload a more updated stable version within the next 10 days. I'm also using 
> an SVN repository and planning to move the project to GitHub to make it 
> easier to get the latest Drill versions and to maybe integrate with Drill at 
> some point. 
> Unfortunately, I have another meeting tomorrow at the same time as the 
> hangout, but I would love to know your opinion and to discuss the process of 
> evaluating this extension and maybe integrating it with Drill at some point. 
> Regards
> Shadi Khalifa, PhD Candidate, School of Computing, Queen's University, Canada
> I'm just a neuron in the society collective brain
>On Monday, October 3, 2016 10:52 PM, Laurent Goujon <> 
> wrote:
> Hi,
> I'm currently working on improving metadata support for both the JDBC
> driver and the C++ connector, more specifically the following JIRAs:
> DRILL-4853: Update C++ protobuf source files
> DRILL-4420: Server-side metadata and prepared-statement support for C++
> connector
> DRILL-4880: Support JDBC driver registration using ServiceLoader
> DRILL-4925: Add tableType filter to GetTables metadata query
> DRILL-4730: Update JDBC DatabaseMetaData implementation to use new Metadata
> APIs
> I  already opened multiple pull requests for those (the list is available
> at
> I'm planning to join tomorrow's hangout in case people have questions about
> those.
> Cheers,
> Laurent
> On Mon, Oct 3, 2016 at 10:28 AM, Subbu Srinivasan <>
> wrote:
>> Can we close on ?
>> On Mon, Oct 3, 2016 at 10:27 AM, Sudheesh Katkam <>
>> wrote:
>>> Hi drillers,
>>> Our bi-weekly hangout is tomorrow (10/04/16, 10 AM PT). If you have any
>>> suggestions for hangout topics, you can add them to this thread. We will
>>> also ask around at the beginning of the hangout for topics.
>>> Thank you,
>>> Sudheesh

[HANGOUT] Topics for 10/04/16

2016-10-03 Thread Sudheesh Katkam
Hi drillers,

Our bi-weekly hangout is tomorrow (10/04/16, 10 AM PT). If you have any
suggestions for hangout topics, you can add them to this thread. We will
also ask around at the beginning of the hangout for topics.

Thank you,

Re: Right outer join fails

2016-09-26 Thread Sudheesh Katkam
Hi Kathir,

I tried simple filter conditions with aliases.

This query did not return any result:
select city[0] as cityalias from dfs.tmp.`data.json` where cityalias = 1;

But, this query works fine:
select city[0] as cityalias from dfs.tmp.`data.json` where city[0] = 1;

So I suppose aliases are not supported in join or filter conditions. There is an 
enhancement request for aliases in group by conditions [1]; please open an 
enhancement ticket for this issue and link it to [1].
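
In the meantime, one possible workaround is to define the alias in a subquery, so that the outer query filters on it as an ordinary column. A minimal sketch of that pattern follows. It runs against SQLite through Python's sqlite3 module only so that it works without a drillbit. The table `data` and the column `first_city` are hypothetical stand-ins for dfs.tmp.`data.json` and city[0], and it has not been verified that Drill plans the equivalent query the same way.

```python
import sqlite3


def filter_by_alias(conn: sqlite3.Connection, value: int) -> list:
    """Filter on an alias by defining it in a derived table first."""
    # The inner query introduces `cityalias`; the outer WHERE then sees it
    # as a regular column of the derived table `t`.
    query = (
        "SELECT t.cityalias FROM "
        "(SELECT first_city AS cityalias FROM data) AS t "
        "WHERE t.cityalias = ?"
    )
    return conn.execute(query, (value,)).fetchall()


# Hypothetical stand-in for the sample file: one row whose first city is 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, first_city INTEGER)")
conn.execute("INSERT INTO data VALUES ('Jim', 1)")

print(filter_by_alias(conn, 1))  # one matching row
print(filter_by_alias(conn, 3))  # no matching rows
```

On Drill the same shape would presumably be: select t.cityalias from (select city[0] as cityalias from dfs.tmp.`data.json`) t where t.cityalias = 1; (untested).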

Thank you,


> On Sep 21, 2016, at 2:24 PM, Kathiresan S <> 
> wrote:
> Hi Sudheesh,
> There is another related issue around this.
> For the same data I've used for DRILL-4890
> <>, below query doesn't
> return any result (which is supposed to return one row)
> select city[0] as cityalias from dfs.tmp.`data.json` a join (select id as
> idalias from dfs.tmp.`cities.json`) b on a.cityalias = b.idalias
> However, the query below works fine
> select city[0] as cityalias from dfs.tmp.`data.json` a join (select id as
> idalias from dfs.tmp.`cities.json`) b on a.city[0] = b.idalias
> Using an alias for city[0] in the join condition makes it return no result.
> Any idea, is this a known issue (is there any JIRA issue already tracked
> for this) or should a separate JIRA issue be filed for this?
> *Files used for testing:*
> *Json file 1: data.json*
> { "name": "Jim","city" : [1,2]}
> *Json file 2: cities.json*
> {id:1,name:"Sendurai"}
> {id:2,name:"NYC"}
> Thanks,
> Kathir
> On Wed, Sep 14, 2016 at 8:23 AM, Kathiresan S <>
> wrote:
>> ​Hi Sudheesh,
>> I've filed a JIRA for this
>> Thanks,
>> Kathir
>> On Wed, Sep 14, 2016 at 8:09 AM, Kathiresan S <
>>> wrote:
>>> Hi Sudheesh,
>>> Thanks for checking this out.
>>> I do get the same error what you get, when i run Drillbit on my Eclipse
>>> and run the same query from WebUI pointing to my local instance, and on top
>>> of this error, i do get the "QueryDataBatch was released twice" error as
>>> well.
>>> But, in drillbit.log of one of the nodes on the cluster, where this
>>> failed, i don't see the IndexOutOfBoundsException. Somehow, the 
>>> IndexOutOfBoundsException
>>> log is getting suppressed and only the QueryDataBatch error is logged. But
>>> thats a separate issue.
>>> I did run it from WebUI and its in RUNNING state forever (actually i
>>> started one yesterday and left the tab, its still in RUNNING state)
>>> Sure, I'll file a JIRA and will provide the details here.
>>> Thanks again!
>>> Regards,
>>> Kathir
>>> On Tue, Sep 13, 2016 at 8:17 PM, Sudheesh Katkam <>
>>> wrote:
>>>> Hi Kathir,
>>>> I tried the same query in embedded mode, and I got a different error.
>>>> java.lang.IndexOutOfBoundsException: index: 0, length: 8 (expected:
>>>> range(0, 0))
>>>>at io.netty.buffer.DrillBuf.checkIndexD(
>>>>at io.netty.buffer.DrillBuf.chk(
>>>>at io.netty.buffer.DrillBuf.getLong(
>>>>at org.apache.drill.exec.vector.BigIntVector$Accessor.get(BigIn
>>>>at org.apache.drill.exec.vector.BigIntVector$Accessor.getObject
>>>> (
>>>>at org.apache.drill.exec.vector.RepeatedBigIntVector$Accessor.g
>>>> etObject(
>>>>at org.apache.drill.exec.vector.RepeatedBigIntVector$Accessor.g
>>>> etObject(
>>>>at org.apache.drill.exec.vector.accessor.GenericAccessor.getObj
>>>> ect(
>>>>at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.
>>>> getObject(
>>>>at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObje
>>>> ct(
>>>>at org.apache.drill.jdbc.impl.Ava

Re: drill rest api converts all data types to string

2016-09-16 Thread Sudheesh Katkam
This is different.

> Nested data however is returned with the original data types intact.

.. which makes sense, looking at the code.

Thank you,

> On Sep 16, 2016, at 11:18 AM, Jacques Nadeau <> wrote:
> FYI, it is already filed:
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> On Fri, Sep 16, 2016 at 8:56 AM, Sudheesh Katkam <>
> wrote:
>> Hi Niek,
>> That is a bug; thank you for digging out the exact location. Please open a
>> ticket <>. Let’s start
>> tracking the details of the fix in that ticket.
>> Thank you,
>> Sudheesh
>>> On Sep 15, 2016, at 2:49 AM, Niek Bartholomeus <>
>> wrote:
>>> Hi,
>>> I'm using the drill rest api to query my parquet files that were
>> generated by spark.
>>> I noticed that numeric and boolean data types are all converted to
>> string in the results. Nested data however is returned with the original
>> data types intact.
>>> Probably this is happening here:
>> 2d9f9abb4c47d08f8462599c8d6076a61a1708fe/exec/java-exec/src/
>> main/java/org/apache/drill/exec/server/rest/
>>> Is there any way how to fix this?
>>> I'm using the latest version of drill.
>>> Thanks in advance,
>>> Niek.

Re: Drill 1.8.0 Error: RESOURCE ERROR: Failed to create schema tree.

2016-09-16 Thread Sudheesh Katkam
This is how to get a verbose error:

First set the option:
> SET `exec.errors.verbose` = true;

And then run the query. The detailed output will point us to where the error
occurred.

Thank you,

> On Sep 15, 2016, at 9:12 PM, Abhishek Girish  
> wrote:
> Hi Kartik,
> Can you take a look at the logs (or turn on verbose errors) and share the
> relevant stack trace? Also what platform is this on?
> -Abhishek
> On Thu, Sep 15, 2016 at 4:26 PM, Kartik Bhatia  wrote:
>> When I run the following
>> 0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 5;
>> It gives me java expection with  Error: RESOURCE ERROR: Failed to create
>> schema tree.
>> ~~
>> This e-mail message from State Compensation Insurance Fund and all
>> attachments transmitted with it
>> may be privileged or confidential and protected from disclosure. If you
>> are not the intended recipient,
>> you are hereby notified that any dissemination, distribution, copying, or
>> taking any action based on it
>> is strictly prohibited and may have legal consequences. If you have
>> received this e-mail in error,
>> please notify the sender by reply e-mail and destroy the original message
>> and all copies.
>> ~~

Re: drill rest api converts all data types to string

2016-09-16 Thread Sudheesh Katkam
Hi Niek,

That is a bug; thank you for digging out the exact location. Please open a 
ticket . Let’s start tracking the 
details of the fix in that ticket.
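
Until a fix is available, a client can convert the values itself. A minimal sketch, assuming each result row arrives as a mapping from column name to string and that the caller knows the intended type of each column; `coerce_rows`, `parse_bool`, and the sample row are hypothetical and not part of Drill:

```python
from typing import Any, Callable, Dict, List

# Hypothetical per-column converters supplied by the caller; the REST API
# itself does not provide this mapping.
Converters = Dict[str, Callable[[str], Any]]


def parse_bool(text: str) -> bool:
    """Convert the strings 'true'/'false' (any case) to a bool."""
    lowered = text.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a boolean: {text!r}")
    return lowered == "true"


def coerce_rows(rows: List[Dict[str, Any]], converters: Converters) -> List[Dict[str, Any]]:
    """Return a copy of rows with string values converted per column."""
    result = []
    for row in rows:
        converted = {}
        for column, value in row.items():
            convert = converters.get(column)
            # Only touch string values that have a known converter; nested
            # data and nulls pass through unchanged.
            if convert is not None and isinstance(value, str):
                converted[column] = convert(value)
            else:
                converted[column] = value
        result.append(converted)
    return result


# Made-up row in the shape described above: scalars as strings, nested intact.
rows = [{"id": "42", "price": "9.5", "active": "true", "tags": ["a", "b"]}]
typed = coerce_rows(rows, {"id": int, "price": float, "active": parse_bool})
print(typed)
```

Columns without a converter are left as strings, so the mapping can be filled in gradually.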

Thank you,

> On Sep 15, 2016, at 2:49 AM, Niek Bartholomeus  wrote:
> Hi,
> I'm using the drill rest api to query my parquet files that were generated by 
> spark.
> I noticed that numeric and boolean data types are all converted to string in 
> the results. Nested data however is returned with the original data types 
> intact.
> Probably this is happening here: 
> Is there any way how to fix this?
> I'm using the latest version of drill.
> Thanks in advance,
> Niek.

Re: Quering Hbase with TimeRange, Version, Timestamp,etc

2016-09-14 Thread Sudheesh Katkam
AFAIK, not possible currently.

Thank you,

> On Sep 14, 2016, at 5:15 AM, Ulf Andreasson @ MapR  
> wrote:
> I am also curious regarding this, is it possible ?
> Ulf Andreasson | Ericsson Global Alliance Solution Engineer, | +46
> 72 700 2295
> On Fri, Mar 4, 2016 at 2:51 AM, Abhishek  wrote:
>> In hbase shell we could use command like
>> get '/test/service','rk1',{COLUMN=>'fm1:col1',TIMERANGE=>[0,TS1]}
>> to query the record versions that fall into [0,TS1), where TS1 is the timestamp
>> for each version (the ts1 we set when running the 'put' command).
>> Is there a way to do the same SQL query in Drill to query such data?
>> e.g. select * from hbase.`/test/table` as t where t.fm1.TIMESTAMP =
>> 1442X
>> or is there a native search mod in Drill that I could use hbase query
>> instead of sql?
>> --
>> If you tell the truth, you don't have to remember anything..
>> Regards,
>> Abhishek Agrawal

Re: Right outer join fails

2016-09-13 Thread Sudheesh Katkam
Hi Kathir,

I tried the same query in embedded mode, and I got a different error.

java.lang.IndexOutOfBoundsException: index: 0, length: 8 (expected: range(0, 0))
at io.netty.buffer.DrillBuf.checkIndexD(
at io.netty.buffer.DrillBuf.chk(
at io.netty.buffer.DrillBuf.getLong(

In this case, the Java client library is not able to consume the results sent 
from the server, and the query was CANCELLED (as seen in the query profile, on 
the web UI). Are you seeing the same?

I am not aware of any workarounds; this seems like a bug to me. Can you open a 
ticket ?
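
For the ticket, it may help to state what the query should return. A minimal sketch in plain Python of the expected right outer join on the sample data quoted below, assuming the join condition is a.city[0] = b.id (the quoted queries are partly garbled, so this is inferred from the sample results in the thread); `right_outer_join` is a hypothetical helper, not Drill code:

```python
from typing import Any, Dict, List, Optional, Tuple

# Sample records from the thread (cities.json rewritten with quoted keys).
data = [{"name": "Jim", "city": [1, 2]}]
cities = [{"id": 1, "name": "Sendurai"}, {"id": 2, "name": "NYC"}]

Row = Tuple[Optional[str], Optional[int], str]


def right_outer_join(left: List[Dict[str, Any]], right: List[Dict[str, Any]]) -> List[Row]:
    """Join on left.city[0] == right.id, keeping every right-side row."""
    joined: List[Row] = []
    for right_row in right:
        matches = [
            left_row
            for left_row in left
            if left_row["city"] and left_row["city"][0] == right_row["id"]
        ]
        if matches:
            for left_row in matches:
                joined.append((left_row["name"], left_row["city"][0], right_row["name"]))
        else:
            # No left match: the left-side columns are padded with nulls.
            joined.append((None, None, right_row["name"]))
    return joined


print(right_outer_join(data, cities))
```

Under that assumption, the second city has no match, so its left-side columns come back as nulls rather than causing a failure.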

Thank you,

> On Sep 13, 2016, at 7:10 AM, Kathiresan S  
> wrote:
> ​Hi,
> Additional info on this. Array column ('city' in the example) is the issue.
> 1. When i select the just the first occurrence of the array column, the
> query works fine
> select,[0], from dfs.tmp.`data.json` a right join
> dfs.tmp.`cities.json` b on[0]
> Result
> Jim 1 Sendurai
> null null NYC
> 2. And when i do a repeated_count on the array column, it returns -2 on the
> second row
> select,repeated_count(, from dfs.tmp.`data.json` a
> right join dfs.tmp.`cities.json` b on[0]
> Result
> Jim 2 Sendurai
> null -2 NYC
> Any idea/work around for this issue would be highly appreciated
> Thanks,
> Kathir
> ​
> On Sat, Sep 10, 2016 at 9:56 PM, Kathiresan S 
> wrote:
>> Hi,  A Query with right outer join fails while the inner and left outer
>> joins work for the same data. I've replicated the issue with some simple
>> data here and this happens in both 1.6.0 and 1.8.0
>> *Json file 1: data.json*
>> { "name": "Jim","city" : [1,2]}
>> *Json file 2: cities.json*
>> {id:1,name:"Sendurai"}
>> {id:2,name:"NYC"}
>> *Queries that work:*
>> 1.  select,,, from dfs.tmp.`data.json` a left
>> outer join dfs.tmp.`cities.json` b on[0]
>> 2. select,,, from dfs.tmp.`data.json` a join
>> dfs.tmp.`cities.json` b on[0]
>> *Query that fails:*
>> select,,, from dfs.tmp.`data.json` a right outer
>> join dfs.tmp.`cities.json` b on[0]
>> *On the server side, i see below error trace :*
>> java.lang.IllegalStateException: QueryDataBatch was released twice.
>> org.apache.drill.exec.rpc.user.QueryDataBatch.release(
>> [drill-java-exec-1.6.0.jar:1.6.0]
>> org.apache.drill.exec.rpc.user.QueryResultHandler.batchArrived(
>> [drill-java-exec-1.6.0.jar:1.6.0]
>> org.apache.drill.exec.rpc.user.UserClient.handleReponse(
>> ~[drill-java-exec-1.6.0.jar:1.6.0]
>>at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(
>> ~[drill-rpc-1.6.0.jar:1.6.0]
>>at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(
>> ~[drill-rpc-1.6.0.jar:1.6.0]
>>at org.apache.drill.exec.rpc.RpcBus.handle(
>> ~[drill-rpc-1.6.0.jar:1.6.0]
>>at org.apache.drill.exec.rpc.RpcBus$
>> ~[drill-rpc-1.6.0.jar:1.6.0]
>>at org.apache.drill.common.SerializedExecutor$
>> [drill-rpc-1.6.0.jar:1.6.0]
>> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(
>> [drill-rpc-1.6.0.jar:1.6.0]
>> org.apache.drill.common.SerializedExecutor.execute(
>> [drill-rpc-1.6.0.jar:1.6.0]
>> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(
>> [drill-rpc-1.6.0.jar:1.6.0]
>> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(
>> [drill-rpc-1.6.0.jar:1.6.0]
>>at io.netty.handler.codec.MessageToMessageDecoder.channelRead(

Re: Authentication with PAM fails with Centos 7

2016-09-13 Thread Sudheesh Katkam
Hi Pradeeban,

I am not entirely sure what the problem is. The important part from the stack 
trace is here:

> Invalid user credentials: PAM profile 'pkathi2' validation failed

From code 
 this error message says that verification against a PAM profile named 
“pkathi2” failed. 
+ So in your case, looks like both username and PAM profile name are the same?
+ We have tested the feature in Centos 6 against “login” and “sudo” profiles. 
So maybe some PAM configuration issue specific to “pkathi2” and/or Centos 7?
+ Drill internally uses JPAM  as bridge to PAM. 
So as a last resort, you may need to ensure JPAM is working on Centos 7. If you 
are comfortable running tests in Java, try this test 

Thank you,

> On Sep 12, 2016, at 10:04 AM, Pradeeban Kathiravelu  
> wrote:
> Hi,
> I have configured Drill with authentication using PAM successfully several
> times in Ubuntu (14.04 and 16.04). However, when I try to do the same with
> Centos 7 (I have sudo access to this), it fails.
> Please note that this is a simple embedded Drill instance, and it works
> without the authentication configured.
> I followed the same steps -
> for
> authentication. The credentials are correct (despite the error logs
> suggesting them to be wrong below).
> $DRILL_HOME/bin/drill-embedded -n pkathi2 -p *
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=512M; support was removed in 8.0
> set 12, 2016 12:58:40 PM org.glassfish.jersey.server.ApplicationHandler
> initialize
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29
> 01:25:26...
> set 12, 2016 12:58:41 PM org.glassfish.jersey.internal.Errors logErrors
> WARNING: The following warnings have been detected: HINT: A HTTP GET
> method, public void
> throws java.lang.Exception, returns a void type. It can be intentional and
> perfectly fine, but it is a little uncommon that GET method returns always
> "204 No Content".
> Error: Failure in connecting to Drill:
> org.apache.drill.exec.rpc.RpcException: HANDSHAKE_VALIDATION : Status:
> AUTH_FAILED, Error Id: 3f791614-a251-40b2-aa36-d02200f4cb6e, Error message:
> Invalid user credentials: PAM profile 'pkathi2' validation failed
> (state=,code=0)
> java.sql.SQLException: Failure in connecting to Drill:
> org.apache.drill.exec.rpc.RpcException: HANDSHAKE_VALIDATION : Status:
> AUTH_FAILED, Error Id: 3f791614-a251-40b2-aa36-d02200f4cb6e, Error message:
> Invalid user credentials: PAM profile 'pkathi2' validation failed
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(
> net.hydromatic.avatica.UnregisteredDriver.connect(
>at org.apache.drill.jdbc.Driver.connect(
>at sqlline.DatabaseConnection.connect(
>at sqlline.DatabaseConnection.getConnection(
>at sqlline.Commands.connect(
>at sqlline.Commands.connect(
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
>at java.lang.reflect.Method.invoke(
> sqlline.ReflectiveCommandHandler.execute(
>at sqlline.SqlLine.dispatch(
>at sqlline.SqlLine.initArgs(
>at sqlline.SqlLine.begin(
>at sqlline.SqlLine.start(
>at sqlline.SqlLine.main(
> Caused by: org.apache.drill.exec.rpc.RpcException: HANDSHAKE_VALIDATION :
> Status: AUTH_FAILED, Error Id: 3f791614-a251-40b2-aa36-d02200f4cb6e, Error
> message: Invalid user credentials: PAM profile 'pkathi2' validation failed
> org.apache.drill.exec.client.DrillClient$FutureHandler.connectionFailed(
> org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler.connectionFailed(
> org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$HandshakeSendHandler.success(

Re: Query hangs on planning

2016-09-01 Thread Sudheesh Katkam
That setting is for off-heap memory. The earlier case hit heap memory limit.

> On Sep 1, 2016, at 11:36 AM, Zelaine Fong  wrote:
> One other thing ... have you tried tuning the planner.memory_limit
> parameter?  Based on the earlier stack trace, you're hitting a memory limit
> during query planning.  So, tuning this parameter should help that.  The
> default is 256 MB.
> -- Zelaine
> On Thu, Sep 1, 2016 at 11:21 AM, rahul challapalli <
>> wrote:
>> While planning we use heap memory. 2GB of heap should be sufficient for
>> what you mentioned. This looks like a bug to me. Can you raise a jira for
>> the same? And it would be super helpful if you can also attach the data set
>> used.
>> Rahul
>> On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante 
>> wrote:
>>> Sure,
>>> This is what I remember:
>>> * Failure
>>>   - embedded mode on my laptop
>>>   - drill memory: 2Gb/4Gb (heap/direct)
>>>   - cpu: 4cores (+hyperthreading)
>>>   - `planner.width.max_per_node=6`
>>> * Success
>>>   - AWS Cluster 2x c3.8xlarge
>>>   - drill memory: 16Gb/32Gb
>>>   - cpu: limited by kubernetes to 24cores
>>>   - `planner.width.max_per_node=23`
>>> I'm very busy right now to test again, but I'll try to provide better
>> info
>>> as soon as I can.
>>> On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
>>>> Can you please share the number of cores on the setup where the query
>>>> hung as compared to the number of cores on the setup where the query went
>>>> through successfully.
>>>> And details of memory from the two scenarios.
>>>> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante wrote:
>>>>> For the record, I think this was just bad memory configuration after
>>>>> all.
>>>>> I retested on bigger machines and everything seems to be working fine.
> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
> Oscar, can you please report a JIRA with the required steps to
>> reproduce
>> the OOM error. That way someone from the Drill team will take a look
>> and
>> investigate.
>> For others interested here is the stack trace.
>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-
>> 378aaa4ce50e:foreman]
>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure
>> Occurred,
>> exiting. Information message: Unable to handle out of memory condition
>> in
>> Foreman.
>> java.lang.OutOfMemoryError: Java heap space
>>   at java.util.Arrays.copyOfRange(
>> ~[na:1.7.0_111]
>>   at java.lang.String.( ~[na:1.7.0_111]
>>   at java.lang.StringBuilder.toString(
>> ~[na:1.7.0_111]
>>   at org.apache.calcite.util.Util.newInternal(
>> ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>   at
>> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(
>> ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>   at
>> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(
>> ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>   at
>> ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>   at
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
>> .transform(
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>   at
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
>> .transform(
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>   at
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
>> .convertToDrel(
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>   at
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
>> .convertToDrel(
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>   at
>> tPlan(
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>   at
>> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(Dri
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>   at
>> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>   at
>> java:
>> 257)

Re: [Drill-Issues] Drill-1.6.0: Drillbit not starting

2016-08-04 Thread Sudheesh Katkam
Can you check if there are any errors in the drillbit.out file? This file 
should be in the same directory as the log file.

Thank you,

> On Aug 4, 2016, at 4:25 AM, Shankar Mane  wrote:
> I am getting this error infrequently. Most of the time drill starts
> normally and sometimes this gives below error. I am running drill 1.6.0 in
> cluster mode. ZK has also setup.
> Could some one please explain where the issue is ?
> 2016-08-04 03:45:15,870 [main] INFO  o.a.d.e.s.s.PersistentStoreRegistry -
> Using the configured PStoreProvider class: '
> 2016-08-04 03:45:16,430 [main] INFO  o.apache.drill.exec.server.Drillbit -
> Construction completed (1294 ms).
> 2016-08-04 03:45:28,250 [main] WARN  o.apache.drill.exec.server.Drillbit -
> Failure on close()
> java.lang.NullPointerException: null
> ~[drill-java-exec-1.6.0.jar:1.6.0]
> org.apache.drill.common.AutoCloseables.close(
> ~[drill-common-1.6.0.jar:1.6.0]
> org.apache.drill.common.AutoCloseables.close(
> ~[drill-common-1.6.0.jar:1.6.0]
>at org.apache.drill.exec.server.Drillbit.close(
> [drill-java-exec-1.6.0.jar:1.6.0]
>at org.apache.drill.exec.server.Drillbit.start(
> [drill-java-exec-1.6.0.jar:1.6.0]
>at org.apache.drill.exec.server.Drillbit.start(
> [drill-java-exec-1.6.0.jar:1.6.0]
>at org.apache.drill.exec.server.Drillbit.main(
> [drill-java-exec-1.6.0.jar:1.6.0]
> 2016-08-04 03:45:28,250 [main] INFO  o.apache.drill.exec.server.Drillbit -
> Shutdown completed (1819 ms).

Re: tmp noexec

2016-07-26 Thread Sudheesh Katkam
epoll was supposed to be disabled as part of this PR [1] pending perf. tests; 
IIRC test clusters were busy then, and I lost track of this change. I’ll post 
an update soon.

Thank you,


> On Jul 26, 2016, at 6:54 PM, Jacques Nadeau  wrote:
> I don't think this will fix your issue since this is a internal extraction.
> Try using: -Ddrill.exec.enable-epoll=false in your drill-env. That should
> (hopefully) disable the extraction of epoll drivers (which should actually
> be disabled by default I believe due to disconnection problems in heavy
> load cases).
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> On Tue, Jul 26, 2016 at 7:20 AM, scott  wrote:
>> Thanks Leon for the suggestion, but do you think this config change will
>> help with my startup problem? It looks like it changes operations for sort
>> after startup.
>> Scott
>> On Mon, Jul 25, 2016 at 3:55 PM, Leon Clayton 
>> wrote:
>>> I move the /tmp off local disk into the distributed FS on a node local
>>> volume on MapR. Other file systems can be inserted.
>>> Open up drill-override.conf on all of the nodes, and insert this :
>>> sort: {
>>>   purge.threshold : 100,
>>>   external: {
>>>     batch.size : 4000,
>>>     spill: {
>>>       batch.size : 4000,
>>>       group.size : 100,
>>>       threshold : 200,
>>>       directories : [ "/var/mapr/local/Hostname/drillspill" ],
>>>       fs : "maprfs:///"
>>>     }
>>>   }
>>> }
>>> On 25 Jul 2016, at 16:44, scott  wrote:
>>>> I've run into an issue where Drill will not start if mount permissions
>>>> are set on /tmp to noexec. The permissions were set to noexec due to
>>>> security concerns. I'm using Drill version 1.7. The error I get when
>>>> starting Drill
>>>> Exception in thread "main" java.lang.UnsatisfiedLinkError:
>>>> /tmp/ failed to map
>>>> segment from shared object: Operation not permitted
>>>> Does anyone know of a way to configure Drill to use a different tmp

Changes in Launch Scripts

2016-07-22 Thread Sudheesh Katkam
Hi all,

I just committed DRILL-4581 [1] that changes launch scripts.

The patch should be backward compatible. This email is just an FYI to start
using the new style of file. The major usability change is
that Drill defaults have been moved from conf/ to
bin/; changes to variables in will override the

See the ticket for the full list of changes.

Thank you,


[DISCUSS] New Feature: Kerberos Authentication

2016-07-22 Thread Sudheesh Katkam
Hi all,

I plan to work on DRILL-4280: Kerberos Authentication for Clients [1]. The
design document [2] is attached to the ticket. Please read and comment!

Thank you,


Re: Pushdown Capabilities with RDBMS

2016-07-15 Thread Sudheesh Katkam
Hi Marcus,

I am glad that you are exploring Drill! Per RDBMS storage plugin documentation 
[1], join pushdown is supported. So the scenario you described is likely a bug; 
can you open a ticket [2] with the details on how to reproduce the issue?

Thank you,



> On Jul 15, 2016, at 1:48 PM, Marcus Rehm  wrote:
> Hi all,
> I started to test Drill and I'm very excited about the possibilities.
> By now I'm trying to map our databases running on Oracle 11g. After trying
> some queries I realized that the amount of time Drill takes to complete is
> bigger than a general sql client takes. Looking at the execution plan I saw
> (or understood) that Drill is doing the join of tables and is not pushing
> it down to the database.
> Is there any configuration required for it? How can I tell Drill to send to
> Oracle the task of doing the join?
> Thanks in Advance.
> Best regards,
> Marcus Rehm

Re: Olingo plugin

2016-07-12 Thread Sudheesh Katkam
If you have any questions, please ask on the dev list: 

I am not an expert in this area, but other developers can help. And we welcome 
your contribution :)

Thank you,

> On Jul 12, 2016, at 11:18 AM, Steve Warren <> wrote:
> Thanks! I'll have a look, I had found the contrib in github.
> On Tue, Jul 12, 2016 at 11:12 AM, Sudheesh Katkam <>
> wrote:
>> There is some documentation available [1].
>> There are five implementations in the contrib directory for reference [2].
>> Thank you,
>> Sudheesh
>> [1]
>> [2]
>>> On Jul 12, 2016, at 11:03 AM, Steve Warren <> wrote:
>>> I considered that, but couldn't find documentation on writing plugins. Is
>>> there any available?
>>> On Tue, Jul 12, 2016 at 10:55 AM, Sudheesh Katkam <>
>>> wrote:
>>>> Hi Steve,
>>>> AFAIK, no such plans. Would you like to open a ticket, and work on it?
>>>> Thank you,
>>>> Sudheesh
>>>>> On Jul 12, 2016, at 10:10 AM, Steve Warren <> wrote:
>>>>> Are there plans to release an Olingo (odata) plugin?
>>>>> --
>>>>> Confidentiality Notice and Disclaimer:  The information contained in
>> this
>>>>> e-mail and any attachments, is not transmitted by secure means and may
>>>> also
>>>>> be legally privileged and confidential.  If you are not an intended
>>>>> recipient, you are hereby notified that any dissemination,
>> distribution,
>>>> or
>>>>> copying of this e-mail is strictly prohibited.  If you have received
>> this
>>>>> e-mail in error, please notify the sender and permanently delete the
>>>> e-mail
>>>>> and any attachments immediately. You should not retain, copy or use
>> this
>>>>> e-mail or any attachment for any purpose, nor disclose all or any part
>> of
>>>>> the contents to any other person. MyVest Corporation, MyVest Advisors
>> and
>>>>> their affiliates accept no responsibility for any unauthorized access
>>>>> and/or alteration or dissemination of this communication nor for any
>>>>> consequence based on or arising out of the use of information that may
>>>> have
>>>>> been illegitimately accessed or altered.

Re: Olingo plugin

2016-07-12 Thread Sudheesh Katkam
There is some documentation available [1].

There are five implementations in the contrib directory for reference [2].

Thank you,


> On Jul 12, 2016, at 11:03 AM, Steve Warren <> wrote:
> I considered that, but couldn't find documentation on writing plugins. Is
> there any available?
> On Tue, Jul 12, 2016 at 10:55 AM, Sudheesh Katkam <>
> wrote:
>> Hi Steve,
>> AFAIK, no such plans. Would you like to open a ticket, and work on it?
>> Thank you,
>> Sudheesh
>>> On Jul 12, 2016, at 10:10 AM, Steve Warren <> wrote:
>>> Are there plans to release an Olingo (odata) plugin?

Re: Olingo plugin

2016-07-12 Thread Sudheesh Katkam
Hi Steve,

AFAIK, no such plans. Would you like to open a ticket, and work on it?

Thank you,

> On Jul 12, 2016, at 10:10 AM, Steve Warren  wrote:
> Are there plans to release an Olingo (odata) plugin?

Re: Drill - Hive - Kerberos

2016-07-06 Thread Sudheesh Katkam
Can you set the following property to false in the Hive storage plugin 
configuration, and try again?

"hive.server2.enable.doAs": "false"

Thank you,
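
For readers applying the suggestion above, here is a minimal sketch of what the change amounts to on a plugin configuration. Only the property name `hive.server2.enable.doAs` comes from this thread; the helper name `with_doas_disabled`, the sample config, and the metastore URI are assumptions made up for illustration.

```python
import copy
import json


def with_doas_disabled(plugin_config):
    """Return a copy of a Hive storage plugin config with doAs set to "false"."""
    updated = copy.deepcopy(plugin_config)
    # configProps values are written as strings in the plugin JSON, so "false" is quoted.
    updated.setdefault("configProps", {})["hive.server2.enable.doAs"] = "false"
    return updated


# Hypothetical starting config; the metastore URI is a placeholder.
example_plugin = {
    "type": "hive",
    "enabled": True,
    "configProps": {
        "hive.metastore.uris": "thrift://metastore.example.com:9083",
        "hive.metastore.sasl.enabled": "true",
    },
}

patched_plugin = with_doas_disabled(example_plugin)
print(json.dumps(patched_plugin, indent=2))
```

The printed JSON could then be pasted into the storage plugin page of the web UI; the original dictionary is left untouched so the two versions can be compared.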

> On Jul 5, 2016, at 11:29 AM, Joseph Swingle  wrote:
> Yes. Impersonation is enabled:
> drill.exec: {
>  cluster-id: "hhe",
>  zk.connect: "zk1:2181,zk2:2181,zk3:2181",
>  impersonation: {
>  enabled: true,
>  max_chained_user_hops: 3
>  }
> }
> On Mon, Jun 20, 2016 at 6:22 PM, Chun Chang  wrote:
>> Did you enable impersonation? Check the drill-override.conf file to verify
>> that impersonation is enabled.
>> On Mon, Jun 20, 2016 at 5:17 AM, Joseph Swingle 
>> wrote:
>>> Yes secure cluster.  Strange that I can browse hdfs, and can get the
>>> metadata about hive database and tables.
>>> But every sql query to pull data from hive tables results in that error.
>>>> On Jun 17, 2016, at 6:24 PM, Chun Chang wrote:
>>>> Hi Joseph,
>>>> Are you running DRILL on a secure cluster? I had success with the
>>>> following storage plugin configuration with MapR distribution, SQL
>>>> standard authorization with Kerberos:
>>>> hive storage plugin:
>>>> "type": "hive",
>>>> "enabled": true,
>>>> "configProps": {
>>>>  "hive.metastore.uris": "thrift://",
>>>>  "": "maprfs:///",
>>>>  "hive.server2.enable.doAs": "false",
>>>>  "hive.metastore.sasl.enabled": "true",
>>>>  "hive/"
 On Fri, Jun 17, 2016 at 1:28 PM, Joseph Swingle 
> I have a Hive Storage plugin configured (bottom).   I am using HDP 2.4 w/
> Hive 1.2.1, Drill 1.6
> I can connect just fine with Drill Explorer.  I can browse, and view
> content on hdfs just fine with Drill Explorer.  The .csv files etc. display
> fine.
> I can browse to see the list of schemas in Hive just fine with Drill
> Explorer.  But every SQL query (for example "select * from foo") returns:
> Caused by: Failed to get numRows from HiveTable
>   at
> ~[drill-storage-hive-core-1.6.0.jar:1.6.0]
>   at
> ~[drill-storage-hive-core-1.6.0.jar:1.6.0]
>   ... 44 common frames omitted
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException:
> Failed to create input splits: Can't get Master Kerberos principal for
>>> use
> as renewer
>   at
> ~[drill-storage-hive-core-1.6.0.jar:1.6.0]
>   at
> ~[drill-storage-hive-core-1.6.0.jar:1.6.0]
>   at
> ~[drill-storage-hive-core-1.6.0.jar:1.6.0]
>   ... 45 common frames omitted
> Caused by: Can't get Master Kerberos principal
>> for
> use as renewer
>   at
> ~[hadoop-mapreduce-client-core-2.7.1.jar:na]
>   at
> ~[hadoop-mapreduce-client-core-2.7.1.jar:na]
>   at
> ~[hadoop-mapreduce-client-core-2.7.1.jar:na]
>   at
>> org.apache.hadoop.mapred.FileInputFormat.listStatus(
> ~[hadoop-mapreduce-client-core-2.7.1.jar:na]
>   at
>> org.apache.hadoop.mapred.FileInputFormat.getSplits(
> ~[hadoop-mapreduce-client-core-2.7.1.jar:na]
>   at
> ~[drill-storage-hive-core-1.6.0.jar:1.6.0]
>   at
> ~[drill-storage-hive-core-1.6.0.jar:1.6.0]
>   at Method)
> ~[na:1.8.0_45]
>   at
> ~[na:1.8.0_45]
>   at

Hangout Starting Now

2016-06-28 Thread Sudheesh Katkam
Hangout starting now: 

Join us!

Thank you,

Suggestions for Hangout Topics for 06/28/16

2016-06-27 Thread Sudheesh Katkam
If you have any suggestions for hangout topics for tomorrow, please add them to 
this thread. We will also ask for topics at the beginning of the hangout. The 
goal is to cover as many of them as possible during the hour.

Thank you,

Re: Oracle Query Problem

2016-05-31 Thread Sudheesh Katkam
Can you enable verbose logging and post the resulting error message? You can do 
this by executing the following statement, and then the failing query.

SET `exec.errors.verbose` = true;

Thank you,

> On May 31, 2016, at 11:27 AM, SanjiV SwaraJ  wrote:
> Hello, I have an Oracle query for selecting all the columns:
> SELECT tc.column_name, tc.owner, tc.table_name, tc.column_id, tc.nullable,
> tc.data_type, c.constraint_type, c.r_owner AS reference_owner,
> rcc.table_name AS reference_table, rcc.column_name AS reference_column_name
> tc.owner = cc.owner AND tc.table_name = cc.table_name AND tc.column_name =
> c.owner AND tc.table_name = c.table_name AND c.constraint_name =
> cc.constraint_name ) LEFT OUTER JOIN ALL_CONS_COLUMNS rcc ON ( c.r_owner =
> rcc.owner AND c.r_constraint_name = rcc.constraint_name ) WHERE
> tc.table_name = 'REPORTSETTING' AND tc.OWNER = 'NVN' ORDER BY tc.column_id;
> *This query works fine in Oracle DB, but the same query gives an error in
> Drill. The query for Drill is:*
> SELECT tc.column_name, tc.owner, tc.table_name,
> tc.column_id,tc.nullable,tc.data_type,c.constraint_type,c.r_owner AS
> reference_owner, rcc.table_name AS reference_table, rcc.column_name AS
> reference_column_name FROM OracleDB.SYS.ALL_TAB_COLUMNS tc LEFT OUTER JOIN
> OracleDB.SYS.ALL_CONS_COLUMNS cc ON ( tc.owner = cc.owner AND tc.table_name
> = cc.table_name AND tc.column_name = cc.COLUMN_NAME ) LEFT OUTER JOIN
> OracleDB.SYS.ALL_CONSTRAINTS c ON ( tc.owner = c.owner AND tc.table_name =
> c.table_name AND c.constraint_name = cc.constraint_name ) LEFT OUTER JOIN
> OracleDB.SYS.ALL_CONS_COLUMNS rcc ON ( c.r_owner = rcc.owner AND
> c.r_constraint_name = rcc.constraint_name ) WHERE tc.table_name =
> The following error is shown:
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR:
> The JDBC storage plugin failed while trying setup the SQL query. sql SELECT
> * FROM (SELECT "t1"."OWNER", "t1"."TABLE_NAME", "t1"."COLUMN_NAME",
> "t1"."DATA_TYPE", "t1"."NULLABLE", "t1"."COLUMN_ID",
> "t0"."TABLE_NAME", "t0"."COLUMN_NAME", "t0"."DATA_TYPE",
> "t0"."DATA_PRECISION", "t0"."DATA_SCALE", "t0"."NULLABLE",
> "t0"."NUM_DISTINCT", "t0"."LOW_VALUE", "t0"."HIGH_VALUE", "t0"."DENSITY",
> "t0"."NUM_NULLS", "t0"."NUM_BUCKETS", "t0"."LAST_ANALYZED",
> "t0"."GLOBAL_STATS", "t0"."USER_STATS", "t0"."AVG_COL_LEN",
> "t0"."CHAR_LENGTH", "t0"."CHAR_USED", "t0"."V80_FMT_IMAGE",
> "t0"."TABLE_NAME" = "ALL_CONS_COLUMNS"."TABLE_NAME" AND "t0"."$f31" =
> "t1"."$f36" = "ALL_CONSTRAINTS"."OWNER" AND "t1"."TABLE_NAME" =
> VARCHAR(120) CHARACTER SET "ISO-8859-1") "$f5" FROM
> "SYS"."ALL_CONS_COLUMNS") "t3" ON "t2"."R_OWNER" = "t3"."$f5" AND
> "t2"."R_CONSTRAINT_NAME" = "t3"."CONSTRAINT_NAME" plugin OracleDB Fragment
> 0:0 [Error Id: 2a11fed2-ec79-4ef1-9d29-781af21274f6
> *Please tell me what I am doing wrong in this query.*
> -- 
> Thanks & Regards.
> Sanjiv
> ​Swaraj​

Hangout Topics For 05/31/16

2016-05-27 Thread Sudheesh Katkam
Hey y’all,

Let’s use this thread to pre-populate a list of topics to discuss on Tuesday’s 
hangout (05/31/16), so people can attend if they are interested in the 
mentioned topics. I will also collect topics at the beginning of the hangout.

+ DRILL-4280: Kerberos Authentication (Sudheesh)

Thank you,

Re: Impersonation

2016-04-27 Thread Sudheesh Katkam
Hi Lunen,

Is the drillbit process running? If not, look for errors in logs in

In embedded mode, Zookeeper is not required. Try: *bin/sqlline -u
"jdbc:drill:zk=local" -n user1 -p user1PW*

This command starts and connects to a drillbit (which is embedded in the
sqlline process). Notice that *impersonation_target* is also not required.
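
One thing worth checking when copying sqlline commands out of email: mail clients often turn `-u` into an en dash and straight quotes into curly quotes, which the shell will not treat as options or quoting. The sketch below is only an illustration of cleaning up such a pasted command; the helper names are made up here, and the only parts taken from this thread are the `bin/sqlline` arguments themselves.

```python
import shlex

# Typographic characters that mail clients substitute for plain ASCII,
# mapped back to what a shell expects.
TYPOGRAPHIC_TO_ASCII = {
    "\u2013": "-",   # en dash
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def to_ascii_punctuation(text):
    """Replace typographic dashes and quotes with their ASCII equivalents."""
    for fancy, plain in TYPOGRAPHIC_TO_ASCII.items():
        text = text.replace(fancy, plain)
    return text


def embedded_sqlline_args(user, password, sqlline="bin/sqlline"):
    """Argument list for starting sqlline with an embedded drillbit."""
    return [sqlline, "-u", "jdbc:drill:zk=local", "-n", user, "-p", password]


# A command as it might look after a mail client has "helpfully" reformatted it.
pasted = "bin/sqlline \u2013u \u201cjdbc:drill:zk=local\u201d -n user1 -p user1PW"
cleaned_args = shlex.split(to_ascii_punctuation(pasted))
print(cleaned_args)
```

After cleaning, the pasted command splits into the same argument list that `embedded_sqlline_args("user1", "user1PW")` builds.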

Thank you,

On Wed, Apr 27, 2016 at 1:51 AM, Lunen de Lange 

> Hi All,
> I’m having issues getting impersonation to work.
> I'm trying to build in security on our Drill (1.6.0) system. I managed to
> get user authentication to work (JPam, as explained in the
> documentation), but impersonation does not seem to work. It seems to
> execute and fetch via the root user regardless of who has logged in via
> My drill-override.conf file is configured as follows:
>   drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   impersonation: {
> enabled: true,
> max_chained_user_hops: 3
>   },
>   security.user.auth {
>   enabled: true,
>  packages += "",
>   impl: "pam",
>   pam_profiles: [ "sudo", "login" ]
>   }
> }
> We are also only using Drill on one server, therefore I'm running
> drill-embedded to start things up.
> I’m starting up my Zookeeper separately. (I can see the drill instance
> connect to ZK)
> When I try sqlline with the following command, I get a "No
> DrillbitEndpoint can be found" error:
> root@machinename:/opt/apache-drill-1.6.0/bin# ./sqlline -u
> "jdbc:drill:schema=dfs;zk=localhost:2181;impersonation_target=user1" -n
> user1 -p user1PW
> I have also looked at doing my own built in security, but I'm not able to
> retrieve the username from a SQL query. I have tried the following without
> any luck:
> USER()
> Any ideas on this approach?
> Kind regards,
> *Lunen de Lange*
> *Big Data Developer/Project Manager*
> +44 (0)782 463 4516 | Tel: +44 (0)845 468 3632 | Direct: 01707 367 628
> |  http://www.intenda.net8
> IMPORTANT NOTICE: This communication contains information that is
> confidential and may also be privileged. It is for the exclusive use of the
> intended recipient(s). Any unauthorized use, alteration or dissemination is
> prohibited. If you have received this communication in error, please
> return it with the title "received in error" to
> * <*
> *>* then delete
> the email and destroy
> any copies of it. Any views expressed in this message are those of the
> individual sender and not necessarily those of Intenda UK Ltd. Intenda UK
> Ltd accepts no liability whatsoever for any loss whether it be direct,
> indirect or consequential, arising from information made available and
> actions resulting there from.
> P Think be4 u print

Re: high latency of querying hive datasource

2016-04-07 Thread Sudheesh Katkam
Can you gather the query profile from the web UI [1]?

This mailing list does not accept attachments; please put the profile in a 
shareable location (Dropbox?), and post a link.

Thank you,


> On Apr 7, 2016, at 2:16 PM, Tomek W  wrote:
> Hello,
> I configured a Hive datasource. It is a simple file system. The data are
> saved by a Spark application (/user/hive/warehaouse).
> I am able to query the data from the Drill command line. However, for a
> 1200-row table the query takes 20 s.
> To be honest, I am launching an embedded instance of Drill, but 20 s seems
> like a lot, given that this table is very small.
> How can I improve this result?
> First of all, is it possible to keep the table in memory (RAM)?
> Thanks in advance,
> Tom

Re: ConnectTimeoutException when starting drill

2016-03-19 Thread Sudheesh Katkam
Drill 1.6 introduces support for Java 1.8, which is generally available today. 
Download here 

Thank you,

> On Mar 3, 2016, at 1:40 PM, Rob Terpilowski  wrote:
> I am attempting to get Apache Drill running on an Ubuntu box with Java 
> 1.8.0_51 as root.
> I've downloaded and unzipped the drill tar.gz file.
> When I attempt to run the ./bin/drill-embedded  command.  (I've attempted to 
> run the drill-localhost command as well)
> The command sits for a little less than a minute at the line:
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 
> 01:25:26...
> The message is then followed by the exceptions below.  Any idea where I 
> can begin looking for the issue?  I changed the port number from 31010 to a 
> number of other ports, with the same result.  
> Any help would be appreciated.
> Thanks,
> -Rob
> Error: Failure in connecting to Drill: 
> org.apache.drill.exec.rpc.RpcException: CONNECTION : 
> connection timed out: 
> (state=,code=0)
> java.sql.SQLException: Failure in connecting to Drill: 
> org.apache.drill.exec.rpc.RpcException: CONNECTION : 
> connection timed out: 
>   at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(
>   at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(
>   at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(
>   at 
> net.hydromatic.avatica.UnregisteredDriver.connect(
>   at org.apache.drill.jdbc.Driver.connect(
>   at sqlline.DatabaseConnection.connect(
>   at sqlline.DatabaseConnection.getConnection(
>   at sqlline.Commands.connect(
>   at sqlline.Commands.connect(
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
>   at java.lang.reflect.Method.invoke(
>   at 
> sqlline.ReflectiveCommandHandler.execute(
>   at sqlline.SqlLine.dispatch(
>   at sqlline.SqlLine.initArgs(
>   at sqlline.SqlLine.begin(
>   at sqlline.SqlLine.start(
>   at sqlline.SqlLine.main(
> Caused by: org.apache.drill.exec.rpc.RpcException: CONNECTION : 
> connection timed out: 
>   at 
> org.apache.drill.exec.client.DrillClient$FutureHandler.connectionFailed(
>   at 
> org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(
>   at 
> org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(
>   at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(
>   at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(
>   at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(
>   at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(
>   at 
>   at 
> io.netty.util.concurrent.PromiseTask$
>   at 
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(
>   at
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$
>   at
> Caused by: java.util.concurrent.ExecutionException: 
> connection timed out: 
>   at io.netty.util.concurrent.AbstractFuture.get(
>   at 
> org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(
>   ... 12 more
> Caused by: connection timed out: 
>   at 

Re: [DISCUSS] New Feature: Drill Client Impersonation

2016-02-23 Thread Sudheesh Katkam
> Do you have an interface proposal? I didn't see that.

Are you referring to the Drill client interface to be used by applications?

> Also, what do you think about my comment and Keys response about moving 
> pooling to the Driver and then making "connection" lightweight.

An API to change the user on a connection can easily be added later (for now, 
we use a connection property). Since Drill connections are already lightweight, 
this is not an immediate problem. Unlike OracleConnection, JDBC and ODBC do not 
have a provision for proxy sessions in their specifications, so I am not 
entirely clear how we would expose “change user on connection” to applications 
using these APIs.
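
To make the connection-property approach concrete, here is a minimal sketch of an application opening one lightweight connection per end user by varying a property in the JDBC URL. It assumes the property is named `impersonation_target`, the name that appears in a sqlline URL elsewhere in this archive; the helper `drill_jdbc_url`, the ZooKeeper address, and the user names are hypothetical.

```python
def drill_jdbc_url(zk_connect, end_user=None):
    """Build a Drill JDBC URL, optionally naming the end user to impersonate."""
    parts = ["jdbc:drill:zk=" + zk_connect]
    if end_user is not None:
        # Assumed property name; the proxy (application) user still
        # authenticates with its own credentials.
        parts.append("impersonation_target=" + end_user)
    return ";".join(parts)


# One URL per end user, instead of switching users on a pooled connection.
urls = {user: drill_jdbc_url("localhost:2181", user) for user in ("alice", "bob")}
for user, url in urls.items():
    print(user, url)
```

This keeps identity tied to the connection, which is only reasonable if connections stay as cheap as described above.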

> Connection level identity setting is only viable if the scalability concerns 
> I raised in the doc and Jacques indirectly raised are addressed.
> Historically DB connections have been so expensive that most applications 
> created pools of connections and reused them across users. That model doesn't 
> work if each connection is tied to a single user. That's why the typical 
> implementation has provided for changing the identity on an existing 
> connection.
> Now, if the Drill connection is a very lightweight object (possibly mapping 
> to a single heavier weight hidden process level object), then tying identity 
> to the connection is fine. I don't know enough about the Drill architecture 
> to comment on that but I think a good rule of thumb would be "is it 
> reasonable to keep 50+ Drill connections open where each has a different user 
> identity?" If the answer is no, then the design needs to consider the scale. 
> I'll also add that much further in the future if/when Drill takes on more 
> operational types of access that 50 connections will rise to a much larger 
> number.

Thank you,

> On Feb 22, 2016, at 2:27 PM, Jacques Nadeau <> wrote:
> Got it, makes sense.
> Do you have an interface proposal? I didn't see that.
> Also, what do you think about my comment and Keys response about moving
> pooling to the Driver and then making "connection" lightweight.
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> On Mon, Feb 22, 2016 at 9:59 AM, Sudheesh Katkam <>
> wrote:
>> “… when creating this connection, as part of the connection properties
>> (JDBC, C++ Client), the application passes the end user’s identity (e.g.
>> username) …”
>> I had written the change user as a session option as part of the
>> enhancement only, where you’ve pointed out a better way. I addressed your
>> comments on the doc.
>> Thank you,
>> Sudheesh
>>> On Feb 22, 2016, at 9:49 AM, Jacques Nadeau <> wrote:
>>> Maybe I misunderstood the design document.
>>> I thought this was how the user would be changed: "Provide a way to
>> change
>>> the user after the connection is made (details) through a session option"
>>> Did I miss something?
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>> On Mon, Feb 22, 2016 at 9:06 AM, Neeraja Rentachintala <
>>>> wrote:
>>>> Jacques,
>>>> I think the current proposal by Sudheesh is an API level change to pass
>>>> this additional end user id during the connection establishment.
>>>> Can you elaborate what you mean by random query.
>>>> -Neeraja
>>>> On Sun, Feb 21, 2016 at 5:07 PM, Jacques Nadeau <>
>>>> wrote:
>>>>> Sudheesh, thanks for putting this together. Reviewing Oracle
>>>> documentation,
>>>>> they expose this at the API level rather than through a random query. I
>>>>> think we should probably model after that rather than invent a new
>>>>> mechanism. This also means we can avoid things like query parsing,
>>>>> execution roundtrip, query profiles, etc to provide this functionality.
>>>>> See here:
>>>>> --
>>>>> Jacques Nadeau
>>>>> CTO and Co-Founder, Dremio
>>>>> On Fri, Feb 19, 2016 at 2:18 PM, Keys B

Re: [DISCUSS] New Feature: Drill Client Impersonation

2016-02-22 Thread Sudheesh Katkam
“… when creating this connection, as part of the connection properties (JDBC, 
C++ Client), the application passes the end user’s identity (e.g. username) …”

I had written the change user as a session option as part of the enhancement 
only, where you’ve pointed out a better way. I addressed your comments on the 
doc.
Thank you,

> On Feb 22, 2016, at 9:49 AM, Jacques Nadeau <> wrote:
> Maybe I misunderstood the design document.
> I thought this was how the user would be changed: "Provide a way to change
> the user after the connection is made (details) through a session option"
> Did I miss something?
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> On Mon, Feb 22, 2016 at 9:06 AM, Neeraja Rentachintala <
>> wrote:
>> Jacques,
>> I think the current proposal by Sudheesh is an API level change to pass
>> this additional end user id during the connection establishment.
>> Can you elaborate what you mean by random query.
>> -Neeraja
>> On Sun, Feb 21, 2016 at 5:07 PM, Jacques Nadeau <>
>> wrote:
>>> Sudheesh, thanks for putting this together. Reviewing Oracle
>> documentation,
>>> they expose this at the API level rather than through a random query. I
>>> think we should probably model after that rather than invent a new
>>> mechanism. This also means we can avoid things like query parsing,
>>> execution roundtrip, query profiles, etc to provide this functionality.
>>> See here:
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>> On Fri, Feb 19, 2016 at 2:18 PM, Keys Botzum <>
>> wrote:
>>>> This is a great feature to add to Drill and I'm excited to see design on
>>>> it starting.
>>>> The ability for an intermediate server that is likely already
>>>> authenticating end users, to send end user identity down to Drill adds a
>>>> key element into an end to end secure design by enabling Drill and the
>>>> back end systems to see the real user and thus perform meaningful
>>>> authorization.
>>>> Back when I was building many JEE applications I know the DBAs were very
>>>> frustrated that the application servers blinded them to the identity of
>>>> the end user accessing important corporate data. When JEE application
>>>> servers and databases finally added the ability to impersonate, that
>>>> addressed a lot of security concerns. Of course this isn't a perfect
>>>> solution and I'm sure others will recognize that in some scenarios
>>>> impersonation isn't the best approach, but having that as an option in
>>>> Drill is very valuable.
>>>> Keys
>>>> ___
>>>> Keys Botzum
>>>> Senior Principal Technologist
>>>> <>
>>>> 443-718-0098
>>>> MapR Technologies
>>>> <>
>>>>> On Feb 19, 2016, at 4:49 PM, Sudheesh Katkam wrote:
>>>>> Hey y’all,
>>>>> I plan to work on DRILL-4281: support for inbound/client impersonation.
>>>>> Please review the design document, which is open for comments. There is
>>>>> also a link to proof-of-concept (slightly hacky).
>>>>> Thank you,
>>>>> Sudheesh

[DISCUSS] New Feature: Drill Client Impersonation

2016-02-19 Thread Sudheesh Katkam
Hey y’all,

I plan to work on DRILL-4281: support for inbound/client impersonation. Please 
review the design document, which is open for comments. There is also a link 
to a proof-of-concept (slightly hacky).

Thank you,

Re: Rest & web authentication

2015-12-07 Thread Sudheesh Katkam
Hi Niko,

Web authentication is not available yet; DRILL-3201 is being reviewed. The doc 
pages are ahead.

Thank you,

> On Dec 7, 2015, at 7:06 AM, Keys Botzum  wrote:
> I think this answers your question (the answer is yes according to the docs):
> Keys
> ___
> Keys Botzum 
> Senior Principal Technologist
> 443-718-0098
> MapR Technologies 
>> On Dec 7, 2015, at 10:04 AM, Niko Arvilommi  wrote:
>> Hi
>> Is it possible to secure the web UI and REST API with a password or similar 
>> methods, such as shared keys?
>> It seems too open if that is not possible.
>> Best Niko Arvilommi

Re: Rest & web authentication

2015-12-07 Thread Sudheesh Katkam
Hi John,

MapR Drill 1.3 has the web authentication feature (DRILL-3201), whereas Apache 
Drill 1.3 does not.

Thank you,

> On Dec 7, 2015, at 11:16 AM, John Omernik <> wrote:
> This is really odd to me. I have 1.3 (Compiled by MapR) running on my
> cluster right now.  I added to my java.library.path, and I have
> ldap configured on the nodes in pam.  When I hit the Web UI (via HTTPS,
> HTTP is disabled) I get a prompt for a user name, and the authentication goes
> against pam on the system, allowing me in.  Only the user running the
> drillbits has "admin" functions in Drill; other users only see a subset of
> tabs (Query, Profiles, Metrics), while the drillbit user has Query, Profiles,
> Storage, Metrics, and Threads.
> This is also true with the REST API.  I have to authenticate to use it at
> this point.   I just do forms authentication using the python requests
> module. (I feel like it should accept basic auth over SSL for API
> goodness, but my hack works.)
> Thus, I feel like it's been secured fairly well (SSL, authentication, users,
> etc.).  So I do not understand why DRILL-3201 says 1.5, or why it's
> stated that web authentication is not available yet...
> On Mon, Dec 7, 2015 at 11:13 AM, Andries Engelbrecht <
>> wrote:
>> This topic came up a while ago and there was a bit of confusion.
>> See the discussion thread here for more info.
>> <
>> Sudheesh listed the JIRA to track it.
>> --Andries
>>> On Dec 7, 2015, at 8:32 AM, Sudheesh Katkam <>
>> wrote:
>>> Hi Niko,
>>> Web authentication is not available yet; DRILL-3201 <
>>> is being reviewed. The
>> doc pages are ahead.
>>> Thank you,
>>> Sudheesh
>>>> On Dec 7, 2015, at 7:06 AM, Keys Botzum <> wrote:
>>>> I think this answers your question (the answer is yes according to the
>> docs):
>> <
>>>> Keys
>>>> ___
>>>> Keys Botzum
>>>> Senior Principal Technologist
>>>> <>
>>>> 443-718-0098
>>>> MapR Technologies
>>>> <>
>>>>> On Dec 7, 2015, at 10:04 AM, Niko Arvilommi <>
>> wrote:
>>>>> Hi
>>>>> Is it possible to secure the web UI and REST API with a password or
>>>>> similar methods, such as shared keys?
>>>>> It seems too open if that is not possible.
>>>>> Best Niko Arvilommi

Re: Announcing new committer: Kristine Hahn

2015-12-04 Thread Sudheesh Katkam
Congratulations and welcome, Kris!

> On Dec 4, 2015, at 9:19 AM, Jacques Nadeau  wrote:
> The Apache Drill PMC is very pleased to announce Kristine Hahn as a new
> committer.
> Kris has worked tirelessly on creating and improving the Drill
> documentation. She has been extraordinary in her engagement with the
> community and has greatly accelerated the speed to resolution of doc issues
> and improvements.
> Welcome Kristine!

Re: How to unsubscribe from the mail group ?

2015-11-16 Thread Sudheesh Katkam
You can get digests; see 

> On Nov 15, 2015, at 6:24 PM, Kim Chew  wrote:
> On Sun, Nov 15, 2015 at 4:03 PM, ganesh  wrote:
>> Hi,
>> I would like to unsubscribe from the group. Can someone tell me the
>> procedure?
>> Also, in case I want a daily mail instead of individual mails from the
>> group, is there any such option?
>> --
>> *Name: Ganesh Semalty*
>> *Location: Gurgaon,Haryana(India)*
>> *Email Id: *
>> P
>> *Please consider the environment before printing this e-mail - SAVE TREE.*

Resetting an option

2015-08-10 Thread Sudheesh Katkam
Hey y’all,

Re DRILL-1065, at system level (ALTER system RESET …), resetting an option 
would mean changing the value to the default provided by Drill. But, at session 
level (ALTER session RESET …), would resetting an option mean:
(a) changing the value to the default provided by Drill? or,
(b) changing the value to the system value, that an admin could’ve changed?

(b) would not allow non-admin users to know what the default is (easily). 
However, for a given option, (a) would allow a non-admin user to know what the 
default is (by resetting) and what the system setting is (from sys.options). 
Opinions?
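
To make the two interpretations easier to compare, here is a toy model of a three-level option lookup (session over system over default). This is not Drill's option manager, only a sketch of the semantics being asked about; the class, the method names, the option name `example.option`, and its values are all made up for illustration.

```python
class OptionStore:
    """Toy three-level option lookup: session overrides system overrides default."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)
        self.system = {}
        self.session = {}

    def effective(self, name):
        """Value a query in this session would see."""
        if name in self.session:
            return self.session[name]
        if name in self.system:
            return self.system[name]
        return self.defaults[name]

    def reset_session_a(self, name):
        """Interpretation (a): the session value becomes the built-in default."""
        self.session[name] = self.defaults[name]

    def reset_session_b(self, name):
        """Interpretation (b): drop the session value so the system value shows."""
        self.session.pop(name, None)


def demo(reset):
    store = OptionStore({"example.option": 10})  # hypothetical option and default
    store.system["example.option"] = 20          # changed by an admin
    store.session["example.option"] = 30         # changed by the user
    reset(store, "example.option")
    return store.effective("example.option")


result_a = demo(OptionStore.reset_session_a)
result_b = demo(OptionStore.reset_session_b)
print(result_a, result_b)  # 10 20
```

Under (a) the user ends up with the built-in default even though an admin changed the system value; under (b) the user falls back to whatever the admin set.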

Thank you,

Re: Resetting an option

2015-08-10 Thread Sudheesh Katkam
Correction: currently any user can SET or RESET an option for session and 

> On Aug 10, 2015, at 2:20 PM, Sudheesh Katkam wrote:
> Hey y’all,
> Re DRILL-1065, at system 
> level (ALTER system RESET …), resetting an option would mean changing the 
> value to the default provided by Drill. But, at session level (ALTER session 
> RESET …), would resetting an option mean:
> (a) changing the value to the default provided by Drill? or,
> (b) changing the value to the system value, that an admin could’ve changed?
> (b) would not allow non-admin users to know what the default is (easily). 
> However, for a given option, (a) would allow a non-admin user to know what 
> the default is (by resetting) and what the system setting is (from 
> sys.options). Opinions?
> Thank you,

Re: Help with optimizing a query

2015-08-10 Thread Sudheesh Katkam
Hi Yousef,

If possible, could you put the profile in a publicly accessible location 
(Dropbox, etc) and post the link here?

Thank you,

> On Aug 10, 2015, at 3:43 PM, Yousef Lasi wrote:
> We're running a 4 file join on a set of parquet files, the largest of which 
> is about 20 GB in size. The query plan seems to indicate that most, if not 
> all the time for the query (30 minutes) is spent on the first two major 
> fragments. The physical plan looks like the output for these 2 fragments 
> below. I am not sure how to best interpret the results to optimize the query. 
> It's pretty clear based on the plan output as well as the actual system 
> resources utilized during execution that we are not CPU, Memory or I/O bound. 
> That doesn't leave a whole lot left to chase down. Any suggestions on where 
> to look?
00-00Screen : rowType = RecordType(ANY ACTION, ANY ID_TRAN_COLL, 
 DMT_SETT_MV_PSN, ANY CD_CL_OMNI): rowcount = 4.061288430004E7, cumulative 
 cost = {7.33055744629E8 rows, 1.5711522516974182E10 cpu, 0.0 io, 
 2.5162863282995203E13 network, 1.723515700962E10 memory}, id = 117134 
 00-01  Project(ACTION=[$0], ID_TRAN_COLL=[$1], ID_ACC=[$2], 
 ID_IMNT_STD=[$3], CD_TYP_PSN=[$4], DT_ACQS=[$5], CD_TX_ACQS=[$6], 
 DMT_TRD_UGL_LT=[$39], DMT_TRD_UGL_LT_LCL=[$40], ID_ACC_ALT=[$41], 
 CD_MTH_CSTNG=[$42], CD_CCY_TRD_PRIM=[$43], RT_SPOT=[$44], 
 CD_CL_OMNI=[$51]) : rowType = RecordType(ANY ACTION, ANY ID_TRAN_COLL, ANY 
 DMT_SETT_MV_PSN, ANY CD_CL_OMNI): rowcount = 4.061288430004E7, cumulative 
 cost = {7.28994456199E8 rows, 

Re: pending queries jamming the system

2015-08-03 Thread Sudheesh Katkam
Hi Stefan,

Can you create a JIRA for this? Please attach to the JIRA:
(1) thread dumps of the three Drillbits (you can get this using jstack), and 
(2) JSON query profiles of the PENDING queries (you can get these from the Full 
JSON Profile at the bottom of the profile page).

Thank you,

 On Aug 3, 2015, at 9:00 AM, Stefán Baxter wrote:
 I have a small cluster of 3 drillbits running. It's been working just fine
 until it stopped working altogether. I notice a few pending queries and
 when I try to cancel them, via the admin, they either report that they
 don't know where they are running or the cancelling process freezes.
 What is a) the easiest way to delete all pending tasks? and b) make sure
 that this does not happen?
 All the best,
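
A rough sketch of collecting both attachments from a script, in case it saves some clicking. The helper names are made up for illustration. It also assumes the Drillbit web server listens on port 8047 and serves a profile as JSON at /profiles/<query id>.json; please check both against your installation. The Drillbit process id has to be looked up separately.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple
from urllib.request import urlopen


def jstack_command(pid: int) -> List[str]:
    """Command line that asks jstack for a thread dump of one JVM."""
    return ["jstack", str(pid)]


def profile_json_url(host: str, query_id: str, port: int = 8047) -> str:
    """URL of the JSON profile for one query (assumed endpoint and port)."""
    return f"http://{host}:{port}/profiles/{query_id}.json"


def collect_diagnostics(host: str, pid: int, query_id: str) -> Tuple[str, Dict[str, Any]]:
    """Return (thread dump text, query profile) for one Drillbit and one query."""
    dump = subprocess.run(
        jstack_command(pid), capture_output=True, text=True, check=True
    ).stdout
    with urlopen(profile_json_url(host, query_id)) as response:
        profile = json.load(response)
    return dump, profile
```

jstack has to run on the Drillbit's own machine, as a user allowed to attach to that JVM.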

Re: using the REST API

2015-07-16 Thread Sudheesh Katkam
See inline.

 On Jul 16, 2015, at 4:36 AM, Stefán Baxter wrote:
 I have a few questions regarding the rest API.
   - Is it possible that the rest api (query.json) should return numeric
   values as strings?

There is a ticket for this:

   - count(*)  being an example of that
   - calls for conversion on the browser side
   - I find no obvious setting for this
   - Is there any other serialization/encoding available apart from JSON?
   (like Protobuf)

Not currently.

   - naming the REST endpoint after a serialization may be a hint here
   - Is it possible to use gzipped response to minimize the total delivery

That could be an enhancement.

 I'm using this interface instead of the JDBC driver and would like to do
 everything needed to speed it up.

AFAIK, there is not much you can do to speed the REST API up.
Just in case, here’s a link on how to use Drill using ODBC:


Thank you,
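
Until numeric values come back as numbers, the conversion can also be done on the client side. Below is a minimal sketch. It assumes the query.json endpoint accepts a POST body with "queryType" and "query" fields and returns an object with a "rows" list. The function names and the base URL in the usage note are placeholders.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request, urlopen


def build_query_request(base_url: str, sql: str) -> Request:
    """Build the POST request for a single SQL query."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return Request(
        f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def coerce_numeric(value: Any) -> Any:
    """Turn a numeric-looking string into int or float; leave the rest alone."""
    if not isinstance(value, str):
        return value
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def run_query(base_url: str, sql: str) -> List[Dict[str, Any]]:
    """Run one query and return its rows with numeric strings converted."""
    with urlopen(build_query_request(base_url, sql)) as response:
        payload = json.load(response)
    return [
        {column: coerce_numeric(value) for column, value in row.items()}
        for row in payload.get("rows", [])
    ]
```

Usage would look like run_query("http://localhost:8047", "SELECT COUNT(*) AS n FROM ..."). Note that coerce_numeric converts every string that parses as a number, so a column that legitimately holds digit-only strings (zip codes, ids) should be excluded by the caller.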

Re: Rest API

2015-07-16 Thread Sudheesh Katkam
See inline.

 On Jul 16, 2015, at 4:03 AM, Preetham Nadig wrote:
 I am planning to use DRILL for one of my projects and I have been working 
 with it for a couple of weeks

That’s awesome!

 One of the things I would like to do is access DRILL over a REST API, I have 
 successfully queried over a web client.
 But is it possible to send more than one query in a REST call, or is there 
 a user-defined function kind of functionality?

It’s one query per REST call. Here’s the REST API reference:

You should be able to reference a UDF from your query over the REST API. 
Reference to writing custom UDFs:
 If not, what alternative options exist to use Drill programmatically so that 
 it acts as the interface between a data source and an end-user application?

You can use the Drill JDBC and ODBC drivers. Here’s a simple example using 


Thank you,
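
Since it is one query per REST call, a client that has several statements simply prepares one request body per statement and sends them in turn. A small sketch of the preparation step follows. The helper name is hypothetical, and the body shape ("queryType" and "query" fields) is an assumption to verify against the REST API reference.

```python
import json
from typing import Iterable, List


def payloads_for(statements: Iterable[str]) -> List[bytes]:
    """Build one JSON request body per non-empty SQL statement."""
    bodies = []
    for sql in statements:
        sql = sql.strip()
        if sql:  # skip blank entries
            body = json.dumps({"queryType": "SQL", "query": sql})
            bodies.append(body.encode("utf-8"))
    return bodies


bodies = payloads_for(["SELECT 1", "   ", "SELECT 2"])
```

Each element of bodies is then POSTed as its own call, so a failure in one query does not affect the others.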

Re: Flatten Output Json

2015-07-16 Thread Sudheesh Katkam
Does this help?

Thank you,

 On Jul 15, 2015, at 10:55 PM, Usman Ali wrote:
  Drill sqlline displays output in a nice format. I am guessing it must
 be flattening the output json before printing it. Is there any function
 available in source code of drill to flatten the response json?
 Usman Ali
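
If the flattening has to happen on the client side, a generic helper is short to write. The sketch below is not taken from the Drill source. The flatten function and its dotted/indexed key scheme are just one possible convention.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single dict with path-like keys."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value  # scalar leaf
    return flat


record = {"a": {"b": 1}, "c": [2, {"d": 3}]}
flat_record = flatten(record)
```

Empty maps and empty lists contribute no keys in this version, which may or may not be what you want for display.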

Re: Set Drill Response Format to CSV Through Rest APIs

2015-07-16 Thread Sudheesh Katkam
Currently we support only JSON through REST API.

Thank you,

 On Jul 15, 2015, at 9:26 PM, Usman Ali wrote:
 Is there any way to set response format of drill to csv  instead of
 json using Rest APIs? If yes, then what other response formats are
 available in drill.
 Usman Ali
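
As long as only JSON is available, CSV can be produced on the client from the JSON response. A minimal sketch follows. It assumes the response object carries a "columns" list and a "rows" list of objects, and the function name is made up.

```python
import csv
import io
from typing import Any, Dict


def response_to_csv(payload: Dict[str, Any]) -> str:
    """Render a query response (columns + rows) as CSV text."""
    columns = payload["columns"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in payload["rows"]:
        # Missing columns become empty cells; the column order is fixed.
        writer.writerow({column: row.get(column, "") for column in columns})
    return buffer.getvalue()


sample = {"columns": ["id", "name"], "rows": [{"id": "1", "name": "a,b"}]}
csv_text = response_to_csv(sample)
```

The csv module takes care of quoting values that contain commas or quotes.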

Re: Operation category READ is not supported in state standby at org.apache.hadoop.hdfs.server.namenode.ha.Standby

2015-07-16 Thread Sudheesh Katkam
Can you try just “thrift://nn2:9083” (and not include the failover namenode) 
for the “hive.metastore.uris” property?

Thank you,

 On Jul 16, 2015, at 1:43 AM, Arthur Chan wrote:
 Does anyone have an idea what I might be doing wrong in setting up Drill?
 On Tue, Jul 14, 2015 at 4:21 PM, Arthur Chan
 I have HDFS HA with two namenodes (nn1 and nn2 respectively)
 When namenode nn1 fails over to nn2 and I query Hive, I get the
 following error:
 Query Failed: An Error Occurred
 org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
 org.apache.hadoop.ipc.RemoteException: Operation category READ is not
 supported in state standby at
 at org.apache.hadoop.ipc.RPC$ at
 org.apache.hadoop.ipc.Server$Handler$ at
 org.apache.hadoop.ipc.Server$Handler$ at Method) at at
 at org.apache.hadoop.ipc.Server$
  type: hive,
  enabled: true,
  configProps: {
    hive.metastore.uris: thrift://nn1:9083,thrift://nn2:9083,
    hive.metastore.sasl.enabled: false
 Any idea to resolve the issue? Please help!
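
For reference, the suggested change amounts to a storage plugin configuration with a single metastore URI. The sketch below only generates that JSON. The helper name is hypothetical, the keys are the ones from the configuration quoted above, and representing the SASL flag as the string "false" is an assumption.

```python
import json
from typing import Any, Dict


def hive_plugin_config(metastore_uri: str) -> Dict[str, Any]:
    """Build a Hive storage plugin config that points at one metastore URI."""
    return {
        "type": "hive",
        "enabled": True,
        "configProps": {
            "hive.metastore.uris": metastore_uri,
            "hive.metastore.sasl.enabled": "false",  # assumed string form
        },
    }


config = hive_plugin_config("thrift://nn2:9083")
config_json = json.dumps(config, indent=2)
```

The resulting config_json can be pasted into the storage plugin page of the web UI.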

Re: JAVA API for Drill

2015-06-01 Thread Sudheesh Katkam
Adding to Hanifi’s comment. Look at the QueryWrapper#run method and 

- Sudheesh

 On Jun 1, 2015, at 2:29 PM, Hanifi Gunes wrote:
 Have a look at QuerySubmitter
 It does the boilerplate for posting queries on top of DrillClient. All that
 remains is to attach a result listener to perform your custom logic.
 On Mon, Jun 1, 2015 at 2:19 PM, Norris Lee wrote:
 Hi Nishith,
 As far as I know, I don't think there is any documentation on that.
 Hopefully the function names are relatively self-explanatory. If not, feel
 free to ask on this list for clarification.
 -Original Message-
 From: Nishith Maheshwari
 Sent: Monday, June 01, 2015 4:19 AM
 Subject: Re: JAVA API for Drill
 Thanks Norris
 Is there any documentation regarding the usage of these libraries and
 functions? As in which function does what.
 On Wed, May 27, 2015 at 10:06 PM, Norris Lee wrote:
 Hi Nishith,
 Take a look at the and .cpp/.hpp classes of the
 project for the  Java and C++ libraries respectively.
 -Original Message-
 From: Nishith Maheshwari
 Sent: Wednesday, May 27, 2015 1:45 AM
 Subject: Re: JAVA API for Drill
 Thank you Martin and Rajkumar for your prompt responses.
 I am actually looking if some API is available which provides this
 functionality. In the documentation it is mentioned in : -
 *You can connect to Apache Drill through the following interfaces:*
   - *Drill shell*
   - *Drill Web UI*
   - *ODBC*
   - *JDBC*
   - *C++ API*
 and in -
 *What clients are supported?*
   - *BI tools via the ODBC and JDBC drivers (eg, Tableau, Excel,
   MicroStrategy, Spotfire, QlikView, Business Objects)*
   - *Custom applications via the REST API*
   - *Java and C applications via the dedicated Java and C libraries*
 It would be great if you/somebody can point me to the C++ api or the
 dedicated JAVA library or API as mentioned in the documentation.
 Thanks and regards,
 Nishith Maheshwari
 On Wed, May 27, 2015 at 12:44 PM, Rajkumar Singh
 Did you try the drill-jdbc driver? I would suggest using Java JDBC
 connectivity to query Drill through the drill-jdbc driver. I have not
 tried this to query HBase using Drill, but it should work if you have
 correctly configured the HBase storage plugin in Drill.
 Rajkumar Singh
 On May 27, 2015, at 12:09 PM, Nishith Maheshwari
 I wanted to create a java application to connect and query over a
 HBase database using Drill, but was unable to find any
 documentation regarding this.
 Is there a JAVA api through which Drill can be accessed? I did see
 mention of C++ and JAVA api in the documentation but there was no
 other link or information regarding the same.
 Nishith Maheshwari

Re: Drill logical plan optimization

2015-05-28 Thread Sudheesh Katkam
Hi Rajkumar,

Here are some links: (Performance 
Tuning Guide)

Did you mean optimize physical plan?


 On May 27, 2015, at 11:14 PM, Rajkumar Singh wrote:
 I am looking for measures/parameters to look at when optimizing a Drill 
 logical query plan that I want to resubmit through the Drill UI. Could you 
 please point me to some docs that I can go through?
 Rajkumar Singh
 MapR Technologies

Re: Announcing new committer: Hanifi Gunes

2015-04-16 Thread Sudheesh Katkam
Congratulations, Hanifi!

 On Apr 16, 2015, at 2:29 PM, Jacques Nadeau wrote:
 The Apache Drill PMC is very pleased to announce Hanifi Gunes as a new
 committer.
 He has been providing great contributions over the past six months.
 Additionally, he has quickly become the go-to expert for value vectors.
 Welcome Hanifi!