Re: Tableau connect hbase with Drill

2018-10-15 Thread Andries Engelbrecht
With Tableau and many BI tools it is recommended that you use a Drill view. 
Also, since HBase stores data in binary format, you will need to use the convert 
or cast functions in Drill to convert the data types to something meaningful 
for these tools.
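
For example, a view along these lines exposes typed columns on top of the binary 
HBase bytes (a sketch only; the `hbase` storage plugin name, table, column family 
and qualifiers are hypothetical):

CREATE OR REPLACE VIEW dfs.tmp.customers_view AS
SELECT
  CONVERT_FROM(row_key, 'UTF8')                      AS cust_id,
  CONVERT_FROM(t.`info`.`name`, 'UTF8')              AS name,
  -- assumes the bytes hold a string; use 'INT_BE' instead of UTF8+CAST
  -- if the value was stored as a binary-encoded integer
  CAST(CONVERT_FROM(t.`info`.`age`, 'UTF8') AS INT)  AS age
FROM hbase.`customers` t;

Pointing Tableau at the view instead of the raw HBase table avoids exposing the 
binary columns to the tool.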

Make sure you can run the same query on the drillbit with sqlline. This way you 
can confirm you have the correct path to the data/table and catch any syntax issues.

Regards

Andries 

> On Oct 13, 2018, at 12:04 AM, 贺 峰  wrote:
> 
> hello
> 
> I am glad to use Drill, but I have run into a problem
> when I use Tableau to connect to HBase with Drill and the SQL is like this:
>
> select name from table group by name
>
> then I get a connection error
> 
> 
> 
> 
> 
> 
> Sent from Outlook



Re: Drill Tableau Live connection issue

2018-09-13 Thread Andries Engelbrecht
What version of Drill are you using?

I would recommend using ODBC v1.3.16 

Did you define data types in your view? Tableau with the ODBC driver may 
execute a LIMIT 0 query when trying to collect metadata; with a poorly defined 
view this can cause issues depending on the underlying data.
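
As a sketch of what defining the data types can look like (hypothetical view 
name, column names and path), casting every column in the view gives the LIMIT 0 
metadata query a stable schema to report:

CREATE OR REPLACE VIEW dfs.tmp.sales_typed AS
SELECT
  CAST(`id` AS INTEGER)         AS id,
  CAST(`amount` AS DOUBLE)      AS amount,
  CAST(`sale_date` AS DATE)     AS sale_date,
  CAST(`region` AS VARCHAR(50)) AS region
FROM dfs.`/data/sales`;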





Andries Engelbrecht
Senior Solution Architect
Alliances and Channels Engineering 
MapR

aengelbre...@mapr.com



> On Sep 12, 2018, at 9:31 PM, Divya Gehlot  wrote:
> 
> Hi,
> I am facing a strange issue with Tableau Apache Drill live connection
> Tableau Version : 10.4
> MapR Drill Driver : MapRDrill_odbc_v1.3.13.1037
> OS : Windows Server 2012
> 
> I have a few numeric values in the view I connected to, and when I describe it
> I get "The Single value is null", even for all the computations using that field.
> When I create an extract with the same connection, it all works fine.
> 
> If anybody uses Tableau for Drill visualisation, I would appreciate the help!
> 
> Thanks,
> Divya



Re: Apache Drill and Tableau connectivity issue

2018-05-16 Thread Andries Engelbrecht
I have worked with the following combinations and not had an issue.

ODBC v1.3.16 - you may want to consider upgrading to it.

Apache Drill on MapR v1.10, 1.11 & 1.12 (I have only done a little with 1.13, 
but have not had issues).

All the Drill storage plugins are healthy and working properly. All the Drill 
Views are well defined and working properly.
You can do a quick test by running "show schemas" to see if it is responsive 
and returns the results quickly.

Tableau 10.2, 10.5 and 2018.1 using the Apache Drill data connection.
All of them work fine in my case with the above Drill versions and ODBC driver 
mentioned.

Make sure nothing got flaky in your Drill environment. I did a quick guide for 
Drill with BI tools that has been helpful for many users.
https://community.mapr.com/community/exchange/blog/2017/01/25/drill-best-practices-for-bi-and-analytical-tools


--Andries




On 5/15/18, 9:22 PM, "Divya Gehlot"  wrote:

Hi,
I am facing a Tableau and Apache Drill connectivity issue recently, which
was all working fine for a long time (more than 9 months).


I tested a couple of scenarios, shared below:

*Driver Version* - MapR Driver Version 1.3.0

*Scenario 1 : Tableau 10.2 - Apache Drill Connection - handshake timeout 180
in system dsn*
*Error :*
The connection to the data source might have been lost.
ExternalProtocol::PipeMessageSource::ReadByte s: Communication with the
Tableau Protocol Server process was lost.


* Scenario 2 :  Tableau 10.2 - Other ODBC Driver - Driver - advanced
properties empty *
*Error :*
The protocol is disconnected!
Unable to connect to the server "MapR Drill ODBC Driver". Check that the
server is running and that you have access privileges to the requested
database.


* Scenario 3 :Tableau 10.2 - Other ODBC Driver - DSN - handshake timeout
=180*
*Error :*
Able to connect


* Scenario 4 : Tableau 10.4 - Apache Drill Connector - Direct Connection-
data.xxx.com 
*
*Error*:
The protocol is disconnected!
Unable to connect to the server "data.jll.com". Check that the server is
running and that you have access privileges to the requested database.


* Scenario 5  : Tableau 10.4 - Apache Drill Connector - Direct Connection -
xx.xx.xx.x(IP address)*
*Error Message :*
The protocol is disconnected!
Unable to connect to the server "10.xx.xx.x". Check that the server is
running and that you have access privileges to the requested database.


*Scenario 6 : Tableau 10.4 - Apache Drill Connector - Zookeeper Quorum*
[MapR][Drill] (10) Failure occurred while trying to connect to
zk=10.xx.xx:5181,10.xx.xx.x:5181,10.xx.xx.x:5181/drill/mapr_prod-drillbits
Unable to connect to the server
"10.xx.xx.x:5181,10.xx.xx.x:5181,10.xx.xx.x:5181". Check that the server is
running and that you have access privileges to the requested database.


* Scenario 7 : Tableau 10.4 - Other ODBC Driver -DSN connection - handshake
timeout = 180 *
able to Connect !


* Scenario 8 : Tableau 10.4 - Other ODBC Driver - Driver - advanced
properties empty *

The protocol is disconnected!
Generic ODBC requires additional configuration. The driver and DSN (data
source name) must be installed and configured to match the connection.
Unable to connect to the server "MapR Drill ODBC Driver". Check that the
server is running and that you have access privileges to the requested
database.



* Scenario 9 : Tableau 2018.1 - Apache Drill Connector - Direct connection
-data.xxx.com 
*
An error occurred while communicating with Apache Drill.

The connection to the data source might have been lost.
The protocol is disconnected!
Unable to connect to the server "data.jll.com". Check that the server is
running and that you have access privileges to the requested database.



* Scenario 10 : Tableau 2018.1 - Apache Drill Connector - Zookeeper Quorum*

An error occurred while communicating with Apache Drill.

The connection to the data source might have been lost.
The protocol is disconnected!
Unable to connect to the server
"10.xx.xx.x:5181,10.xx.xx.5:5181,10.xx.xx.x:5181". Check that the server is
running and that you have access privileges to the requested database.



* Scenario 11:  Tableau 2018.1 

Re: is there any way to download the data through Drill Web UI?

2018-05-14 Thread Andries Engelbrecht
You can look at using the REST API to send a query and receive the results back 
through http(s) (depending on how you configured Drill).
https://drill.apache.org/docs/rest-api/

--Andries



On 5/14/18, 1:20 AM, "Divya Gehlot"  wrote:

Hi,
Can I download the select query data through Drill Web UI or any other
means ?


Thanks,
Divya




Re: Handshake Error

2018-05-10 Thread Andries Engelbrecht
Also, I have noticed strange behavior when connecting to Drill if the system 
Drill is running on is overloaded (especially lots of memory pressure).

Normally I clean up some processes and restart Drill (with the appropriate 
memory configuration).


--Andries


On 5/10/18, 8:23 AM, "Andries Engelbrecht" <aengelbre...@mapr.com> wrote:

In the ODBC admin, when you configure a DSN you will see an SSL Options 
button in the Authentication section.
Once you click it, you will see a window with a list of options; make sure 
to uncheck Enable SSL.

For more info on the ODBC driver see

https://maprdocs.mapr.com/home/attachments/JDBC_ODBC_drivers/DrillODBCInstallandConfigurationGuide.pdf



--Andries


On 5/10/18, 8:06 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:

How to disable ssl on driver


-Original Message-
    From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Thursday, May 10, 2018 4:03 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

My apologies, after a bit more digging I seem to have found the issue in my 
environment.

My ODBC driver kept enabling SSL for the connection causing the 
handshake error. I had to verify and turn SSL off manually. Verify that SSL is 
disabled on your ODBC driver connection or the DSN.

Then for Tableau to work properly I had to turn off both authentication 
as well as impersonation. See the settings in drill-override.conf
drill.exec: {
  security.user.auth.enabled: false,
  impersonation.enabled: false,
  security.user.encryption.sasl.enabled: false,
  security.user.encryption.ssl.enabled: false
}


This obviously is not very secure, but works in my environment. Drill 
1.12 with ODBC 1.3.16.



--Andries



    
On 5/10/18, 7:54 AM, "Andries Engelbrecht" <aengelbre...@mapr.com> 
wrote:

Interesting, I did a quick test and it seems to be an issue with 
Authentication disabled on the drillbit.

In my drill-override.conf I turn off authentication and 
impersonation

drill.exec: {
  security.user.auth.enabled: false,
  impersonation.enabled: false,
  security.user.encryption.sasl.enabled: false,
  security.user.encryption.ssl.enabled: false
}


FYI, you may also want to make sure SSL is not enabled.

When I restart the drillbit and try to connect with the ODBC driver 
I get the following error

[MapR][Drill] (30)  Handshake failure occurred while trying to 
connect to local=ip-172-31-47-166:31010. Server message: handshake: short read

And in the drillbit.log I see

2018-05-10 14:45:30,553 [UserServer-1] ERROR 
o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
Connection: /172.31.47.166:31010 <--> /172.31.7.24:49715 (user server).  
Closing connection.


This looks like a bug, and related to similar user issues on this 
list. Perhaps someone else can verify as well and log a bug. I noticed other 
issues when connecting via Tableau with the metadata queries failing as well as 
it was expecting a username to use. Very unfortunate error as the channel gets 
closed, leading to a handshake error for the user on a user authentication 
issue.

As a workaround I would suggest you enable at least PLAIN 
authentication on the drillbit and use user/pass to access.

--Andries



On 5/10/18, 7:18 AM, "Peter Edike" 
<peter.ed...@interswitchgroup.com> wrote:

That’s the exact location from which I downloaded the odbc 
driver.

There is no authentication and yes drillbit is listening on 
31010. I am connecting to drillbit directly 

Best regards,
Peter Edike

        
-Original Message-
From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Thursday, May 10, 2018 3:08 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

Make sure to use v 1.3.16 of the ODBC driver 
http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.3.16.1049/

What authentication mechanism are you using?
Also are you connecting to Drillbit directly or through 
Zookeeper?
 

Re: Handshake Error

2018-05-10 Thread Andries Engelbrecht
In the ODBC admin, when you configure a DSN you will see an SSL Options button in 
the Authentication section.
Once you click it, you will see a window with a list of options; make sure to 
uncheck Enable SSL.

For more info on the ODBC driver see
https://maprdocs.mapr.com/home/attachments/JDBC_ODBC_drivers/DrillODBCInstallandConfigurationGuide.pdf



--Andries


On 5/10/18, 8:06 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:

How to disable ssl on driver


-Original Message-
    From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Thursday, May 10, 2018 4:03 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

My apologies, after a bit more digging I seem to have found the issue in my 
environment.

My ODBC driver kept enabling SSL for the connection causing the handshake 
error. I had to verify and turn SSL off manually. Verify that SSL is disabled 
on your ODBC driver connection or the DSN.

Then for Tableau to work properly I had to turn off both authentication as 
well as impersonation. See the settings in drill-override.conf
drill.exec: {
  security.user.auth.enabled: false,
  impersonation.enabled: false,
  security.user.encryption.sasl.enabled: false,
  security.user.encryption.ssl.enabled: false
}


This obviously is not very secure, but works in my environment. Drill 1.12 
with ODBC 1.3.16.



--Andries




On 5/10/18, 7:54 AM, "Andries Engelbrecht" <aengelbre...@mapr.com> wrote:

Interesting, I did a quick test and it seems to be an issue with 
Authentication disabled on the drillbit.

In my drill-override.conf I turn off authentication and impersonation

drill.exec: {
  security.user.auth.enabled: false,
  impersonation.enabled: false,
  security.user.encryption.sasl.enabled: false,
  security.user.encryption.ssl.enabled: false
}


FYI, you may also want to make sure SSL is not enabled.

When I restart the drillbit and try to connect with the ODBC driver I 
get the following error

[MapR][Drill] (30)  Handshake failure occurred while trying to connect 
to local=ip-172-31-47-166:31010. Server message: handshake: short read

And in the drillbit.log I see

2018-05-10 14:45:30,553 [UserServer-1] ERROR 
o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
Connection: /172.31.47.166:31010 <--> /172.31.7.24:49715 (user server).  
Closing connection.


This looks like a bug, and related to similar user issues on this list. 
Perhaps someone else can verify as well and log a bug. I noticed other issues 
when connecting via Tableau with the metadata queries failing as well as it was 
expecting a username to use. Very unfortunate error as the channel gets closed 
leading to a handshake error for the user on a user authentication issue.

As a workaround I would suggest you enable at least PLAIN 
authentication on the drillbit and use user/pass to access.

--Andries



On 5/10/18, 7:18 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> 
wrote:

That’s the exact location from which I downloaded the odbc driver.

There is no authentication and yes drillbit is listening on 31010. 
I am connecting to drillbit directly 

Best regards,
Peter Edike

    
    -Original Message-
From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Thursday, May 10, 2018 3:08 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

Make sure to use v 1.3.16 of the ODBC driver 
http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.3.16.1049/

What authentication mechanism are you using?
Also are you connecting to Drillbit directly or through Zookeeper?
Try to connect to drillbit directly on port 31010 (if the drillbit 
config is the default ports), before trying zookeeper connections.


--Andries

On 5/10/18, 4:52 AM, "Peter Edike" 
<peter.ed...@interswitchgroup.com> wrote:


The error I am getting occurs whilst I am trying to configure a 
DSN in the ODBC Admin. I am unable to do a connection test successfully;

The version of ODBC Driver I am using is : MapR_Drill 1.3
My version of Drill is also: 1.12

Kind Regards

    
    -Orig

Re: Handshake Error

2018-05-10 Thread Andries Engelbrecht
My apologies, after a bit more digging I seem to have found the issue in my 
environment.

My ODBC driver kept enabling SSL for the connection causing the handshake 
error. I had to verify and turn SSL off manually. Verify that SSL is disabled 
on your ODBC driver connection or the DSN.

Then for Tableau to work properly I had to turn off both authentication as well 
as impersonation. See the settings in drill-override.conf
drill.exec: {
  security.user.auth.enabled: false,
  impersonation.enabled: false,
  security.user.encryption.sasl.enabled: false,
  security.user.encryption.ssl.enabled: false
}


This obviously is not very secure, but works in my environment. Drill 1.12 with 
ODBC 1.3.16.



--Andries




On 5/10/18, 7:54 AM, "Andries Engelbrecht" <aengelbre...@mapr.com> wrote:

Interesting, I did a quick test and it seems to be an issue with 
Authentication disabled on the drillbit.

In my drill-override.conf I turn off authentication and impersonation

drill.exec: {
  security.user.auth.enabled: false,
  impersonation.enabled: false,
  security.user.encryption.sasl.enabled: false,
  security.user.encryption.ssl.enabled: false
}


FYI, you may also want to make sure SSL is not enabled.

When I restart the drillbit and try to connect with the ODBC driver I get 
the following error

[MapR][Drill] (30)  Handshake failure occurred while trying to connect to 
local=ip-172-31-47-166:31010. Server message: handshake: short read

And in the drillbit.log I see

2018-05-10 14:45:30,553 [UserServer-1] ERROR 
o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
Connection: /172.31.47.166:31010 <--> /172.31.7.24:49715 (user server).  
Closing connection.


This looks like a bug, and related to similar user issues on this list. 
Perhaps someone else can verify as well and log a bug. I noticed other issues 
when connecting via Tableau with the metadata queries failing as well as it was 
expecting a username to use. Very unfortunate error as the channel gets closed 
leading to a handshake error for the user on a user authentication issue.

As a workaround I would suggest you enable at least PLAIN authentication on 
the drillbit and use user/pass to access.

--Andries



On 5/10/18, 7:18 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:

That’s the exact location from which I downloaded the odbc driver.

There is no authentication and yes drillbit is listening on 31010. I am 
connecting to drillbit directly 

Best regards,
Peter Edike


-----Original Message-
From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Thursday, May 10, 2018 3:08 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

Make sure to use v 1.3.16 of the ODBC driver 
http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.3.16.1049/

What authentication mechanism are you using?
Also are you connecting to Drillbit directly or through Zookeeper?
Try to connect to drillbit directly on port 31010 (if the drillbit 
config is the default ports), before trying zookeeper connections.


--Andries

On 5/10/18, 4:52 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> 
wrote:


The error I am getting occurs whilst I am trying to configure a DSN 
in the ODBC Admin. I am unable to do a connection test successfully;

The version of ODBC Driver I am using is : MapR_Drill 1.3
My version of Drill is also: 1.12

Kind Regards

    
    -Original Message-
From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Wednesday, May 9, 2018 9:09 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

You stated you can connect with Drill Explorer on the client 
system, can you also configure a DSN in the ODBC admin and do a connection test 
successfully?
You mentioned you used the No Authentication option to connect to 
Drill? What version of the ODBC driver are you using? Also which version of 
Drill?

I assume you are running Windows?

On Tableau are you using the Generic ODBC data source or the Apache 
Drill data source to connect to Drill? Try both and see if either works. The 
Apache Drill data source connection can only support No and Basic (user/pass 
authentication).

--Andries

On 5/9/18, 11:12 AM, "Peter Edike" 
<peter.ed...@interswitchgroup.com> wrote:

I h

Re: Handshake Error

2018-05-10 Thread Andries Engelbrecht
Interesting, I did a quick test and it seems to be an issue with Authentication 
disabled on the drillbit.

In my drill-override.conf I turn off authentication and impersonation

drill.exec: {
  security.user.auth.enabled: false,
  impersonation.enabled: false,
  security.user.encryption.sasl.enabled: false,
  security.user.encryption.ssl.enabled: false
}


FYI, you may also want to make sure SSL is not enabled.

When I restart the drillbit and try to connect with the ODBC driver I get the 
following error

[MapR][Drill] (30)  Handshake failure occurred while trying to connect to 
local=ip-172-31-47-166:31010. Server message: handshake: short read

And in the drillbit.log I see

2018-05-10 14:45:30,553 [UserServer-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /172.31.47.166:31010 <--> 
/172.31.7.24:49715 (user server).  Closing connection.


This looks like a bug, and related to similar user issues on this list. Perhaps 
someone else can verify as well and log a bug. I noticed other issues when 
connecting via Tableau with the metadata queries failing as well as it was 
expecting a username to use. Very unfortunate error as the channel gets closed 
leading to a handshake error for the user on a user authentication issue.

As a workaround I would suggest you enable at least PLAIN authentication on the 
drillbit and use user/pass to access.

--Andries



On 5/10/18, 7:18 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:

That’s the exact location from which I downloaded the odbc driver.

There is no authentication and yes drillbit is listening on 31010. I am 
connecting to drillbit directly 

Best regards,
Peter Edike


-Original Message-----
From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Thursday, May 10, 2018 3:08 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

Make sure to use v 1.3.16 of the ODBC driver 
http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.3.16.1049/

What authentication mechanism are you using?
Also are you connecting to Drillbit directly or through Zookeeper?
Try to connect to drillbit directly on port 31010 (if the drillbit config 
is the default ports), before trying zookeeper connections.


--Andries

On 5/10/18, 4:52 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:


The error I am getting occurs whilst I am trying to configure a DSN in 
the ODBC Admin. I am unable to do a connection test successfully;

The version of ODBC Driver I am using is : MapR_Drill 1.3
My version of Drill is also: 1.12

Kind Regards


    -Original Message-
From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Wednesday, May 9, 2018 9:09 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

You stated you can connect with Drill Explorer on the client system, 
can you also configure a DSN in the ODBC admin and do a connection test 
successfully?
You mentioned you used the No Authentication option to connect to 
Drill? What version of the ODBC driver are you using? Also which version of 
Drill?

I assume you are running Windows?

On Tableau are you using the Generic ODBC data source or the Apache 
Drill data source to connect to Drill? Try both and see if either works. The 
Apache Drill data source connection can only support No and Basic (user/pass 
authentication).

--Andries

On 5/9/18, 11:12 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> 
wrote:

I have increased the timeout and I am quite sure there are no 
authentication modules enabled. Is there anyway I can check if the request is 
getting to the server in the first place

On May 9, 2018 7:09 PM, Parth Chandra <par...@apache.org> wrote:
If you haven't tried it already, try increasing the handshake 
timeout. Do
you have any security/authentication settings turned on? One 
possibility is
that an authentication module is being accessed by the server 
during the
handshake and the server is taking too long to reply to the 
handshake
causing the timeout.





On Wed, May 9, 2018 at 3:05 AM, Peter Edike <
peter.ed...@interswitchgroup.com> wrote:

>
> Hello everyone
>
> I am trying to setup a datasource to connect to a drillbit 
running on a
> host
>  I am using the Direct To DrillBit Option And have specified the
> ip-address of the server on which the dril

Re: Handshake Error

2018-05-10 Thread Andries Engelbrecht
Make sure to use v 1.3.16 of the ODBC driver
http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.3.16.1049/

What authentication mechanism are you using?
Also are you connecting to Drillbit directly or through Zookeeper?
Try to connect to drillbit directly on port 31010 (if the drillbit config is 
the default ports), before trying zookeeper connections.


--Andries

On 5/10/18, 4:52 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:


The error I am getting occurs whilst I am trying to configure a DSN in the 
ODBC Admin. I am unable to do a connection test successfully;

The version of ODBC Driver I am using is : MapR_Drill 1.3
My version of Drill is also: 1.12

Kind Regards


-Original Message-
    From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Wednesday, May 9, 2018 9:09 PM
To: user@drill.apache.org
Subject: Re: Handshake Error

You stated you can connect with Drill Explorer on the client system, can 
you also configure a DSN in the ODBC admin and do a connection test 
successfully?
You mentioned you used the No Authentication option to connect to Drill? 
What version of the ODBC driver are you using? Also which version of Drill?

I assume you are running Windows?

On Tableau are you using the Generic ODBC data source or the Apache Drill 
data source to connect to Drill? Try both and see if either works. The Apache 
Drill data source connection can only support No and Basic (user/pass 
authentication).

--Andries

On 5/9/18, 11:12 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:

I have increased the timeout and I am quite sure there are no 
authentication modules enabled. Is there anyway I can check if the request is 
getting to the server in the first place

On May 9, 2018 7:09 PM, Parth Chandra <par...@apache.org> wrote:
If you haven't tried it already, try increasing the handshake timeout. 
Do
you have any security/authentication settings turned on? One 
possibility is
that an authentication module is being accessed by the server during the
handshake and the server is taking too long to reply to the handshake
causing the timeout.





On Wed, May 9, 2018 at 3:05 AM, Peter Edike <
peter.ed...@interswitchgroup.com> wrote:

>
> Hello everyone
>
> I am trying to setup a datasource to connect to a drillbit running on 
a
> host
>  I am using the Direct To DrillBit Option And have specified the
> ip-address of the server on which the drill bit service is running as 
well
> as the appropriate ports
> I can telnet from my windows pc into this port via the telnet command 
but
> all attempts to initiate a connection from DSN Setup Dialog Box Fails 
with
> Handshake Error.
>
> FAILED!
>
> [MapR][Drill] (1010) Error occurred while trying to connect: 
[MapR][Drill]
> (40)  Handshake timed out (HandshakeTimeout=5) while trying to 
connect to
> local=172.x.x.x:31010. Check whether Drillbit is running in a healthy 
state
> or increase the timeout.
>
> Warm Regards
> Peter E
> 
>
> This message has been marked as CONFIDENTIAL on Wednesday, May 9, 
2018 @
> 11:05:23 AM
>
>






Re: Handshake Error

2018-05-09 Thread Andries Engelbrecht
You stated you can connect with Drill Explorer on the client system, can you 
also configure a DSN in the ODBC admin and do a connection test successfully?
You mentioned you used the No Authentication option to connect to Drill? What 
version of the ODBC driver are you using? Also which version of Drill?

I assume you are running Windows?

On Tableau are you using the Generic ODBC data source or the Apache Drill data 
source to connect to Drill? Try both and see if either works. The Apache Drill 
data source connection can only support No and Basic (user/pass authentication).

--Andries

On 5/9/18, 11:12 AM, "Peter Edike"  wrote:

I have increased the timeout and I am quite sure there are no 
authentication modules enabled. Is there anyway I can check if the request is 
getting to the server in the first place

On May 9, 2018 7:09 PM, Parth Chandra  wrote:
If you haven't tried it already, try increasing the handshake timeout. Do
you have any security/authentication settings turned on? One possibility is
that an authentication module is being accessed by the server during the
handshake and the server is taking too long to reply to the handshake
causing the timeout.





On Wed, May 9, 2018 at 3:05 AM, Peter Edike <
peter.ed...@interswitchgroup.com> wrote:

>
> Hello everyone
>
> I am trying to setup a datasource to connect to a drillbit running on a
> host
>  I am using the Direct To DrillBit Option And have specified the
> ip-address of the server on which the drill bit service is running as well
> as the appropriate ports
> I can telnet from my windows pc into this port via the telnet command but
> all attempts to initiate a connection from DSN Setup Dialog Box Fails with
> Handshake Error.
>
> FAILED!
>
> [MapR][Drill] (1010) Error occurred while trying to connect: [MapR][Drill]
> (40)  Handshake timed out (HandshakeTimeout=5) while trying to connect to
> local=172.x.x.x:31010. Check whether Drillbit is running in a healthy 
state
> or increase the timeout.
>
> Warm Regards
> Peter E
> 
>
> This message has been marked as CONFIDENTIAL on Wednesday, May 9, 2018 @
> 11:05:23 AM
>
>




Re: Exception While Querying Decimal Fields in Apache Drill

2018-04-27 Thread Andries Engelbrecht
What version of Drill are you using?
Also is planner.enable_decimal_data_type set to true on the system?
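
For example, you can check and, if needed, enable it from sqlline or the Web UI 
(a sketch; option name as used in Drill 1.x):

SELECT name, status FROM sys.options WHERE name = 'planner.enable_decimal_data_type';
ALTER SYSTEM SET `planner.enable_decimal_data_type` = true;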

--Andries

On 4/27/18, 7:24 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:

Tried that, it did not work. It still fails with the exception. Let me even 
add that even if the query is a simple SELECT statement, as long as any 
of the fields is a decimal type, the statement will fail with the stated exception.

Please Help a lot depends on this

Best regards,
Peter Edike

This message has been marked as CONFIDENTIAL on Friday, April 27, 2018 @ 
3:24:36 PM

-Original Message-
    From: Andries Engelbrecht <aengelbre...@mapr.com> 
Sent: Friday, April 27, 2018 3:07 PM
To: user@drill.apache.org
Subject: Re: Exception While Querying Decimal Fields in Apache Drill

Perhaps try to convert the predicate and select operations involving the 
decimal types to float or similar.

i.e tran_completed = 1.0   and ((cast(SETTLE_AMOUNT_IMPACT as double) 
*(-1.0))/100.0)

Alternatively you may have to cast the decimals as float, but that will be 
more cumbersome.

--Andries

On 4/27/18, 5:18 AM, "Peter Edike" <peter.ed...@interswitchgroup.com> wrote:

I am trying to run the following query in Apache Drill. I am querying 
data stored in parquet files using the following query:


select pan, count(*) as number_of_transactions ,
terminal_id,SUM((cast(SETTLE_AMOUNT_IMPACT as double) *-1)/100) AS 
settle_amount_impact


from dfs.`/iswdata/storage/products/superswitch/parquet/transactions`

where pan like '506126%' and terminal_id like '1%' and
sink_node_name like ('SWTDB%')
and source_node_name not like ('SWTDBLsrc')
and tran_completed=1
and tran_reversed = 0
and tran_postilion_originated = 1
AND tran_type = '01'
--and pan like '506126%0011'
group by pan,terminal_id

The Schema for the data I am querying is as follows


post_tran_id LONG

 post_tran_cust_id :LONG

 settle_entity_id :INTEGER

 batch_nr : INTEGER

 prev_post_tran_id : LONG

 next_post_tran_id : LONG

 sink_node_name : STRING

 tran_postilion_originated : DECIMAL

 tran_completed : DECIMAL

 tran_amount_req : DECIMAL

 tran_amount_rsp : DECIMAL

 settle_amount_impact : DECIMAL

 tran_cash_req : DECIMAL

 tran_cash_rsp : DECIMAL

tran_currency_code : STRING

tran_tran_fee_req : DECIMAL

tran_tran_fee_rsp : DECIMAL

tran_tran_fee_currency_code : STRING

tran_proc_fee_req : DECIMAL

tran_proc_fee_rsp : DECIMAL

tran_proc_fee_currency_code : STRING

settle_amount_req : DECIMAL

settle_amount_rsp : DECIMAL

settle_cash_req : DECIMAL

settle_cash_rsp : DECIMAL

settle_tran_fee_req : DECIMAL

settle_tran_fee_rsp : DECIMAL

settle_proc_fee_req : DECIMAL

settle_proc_fee_rsp : DECIMAL

settle_currency_code : STRING

However When I run the query against the dataset, I get the following 
exception


SYSTEM ERROR: ClassCastException: 
org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast to 
org.apache.drill.exec.vector.VariableWidthVector


Moreover, the same error occurs when I include a decimal field in the 
select clause. Please, is there something I am missing or doing wrong? Any 
pointer will be deeply appreciated.

Kind Regards

Peter








Re: Exception While Querying Decimal Fields in Apache Drill

2018-04-27 Thread Andries Engelbrecht
Perhaps try to convert the predicate and select operations involving the 
decimal types to float or similar.

i.e tran_completed = 1.0   and ((cast(SETTLE_AMOUNT_IMPACT as double) 
*(-1.0))/100.0)

Alternatively you may have to cast the decimals as float, but that will be more 
cumbersome.
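
Applied to the query quoted below, that suggestion would look roughly like this 
(a sketch only, untested against this data, with some of the predicates trimmed; 
column names and the path are taken from the original query):

SELECT pan,
       COUNT(*) AS number_of_transactions,
       terminal_id,
       SUM((CAST(settle_amount_impact AS DOUBLE) * (-1.0)) / 100.0) AS settle_amount_impact
FROM dfs.`/iswdata/storage/products/superswitch/parquet/transactions`
WHERE pan LIKE '506126%'
  AND terminal_id LIKE '1%'
  AND tran_completed = 1.0
  AND tran_reversed = 0.0
  AND tran_postilion_originated = 1.0
  AND tran_type = '01'
GROUP BY pan, terminal_id;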

--Andries

On 4/27/18, 5:18 AM, "Peter Edike"  wrote:

I am trying to run the following query in Apache Drill. I am querying data 
stored in parquet files using the following query:


select pan, count(*) as number_of_transactions ,
terminal_id,SUM((cast(SETTLE_AMOUNT_IMPACT as double) *-1)/100) AS 
settle_amount_impact


from dfs.`/iswdata/storage/products/superswitch/parquet/transactions`

where pan like '506126%' and terminal_id like '1%' and
sink_node_name like ('SWTDB%')
and source_node_name not like ('SWTDBLsrc')
and tran_completed=1
and tran_reversed = 0
and tran_postilion_originated = 1
AND tran_type = '01'
--and pan like '506126%0011'
group by pan,terminal_id

The Schema for the data I am querying is as follows


post_tran_id LONG

 post_tran_cust_id :LONG

 settle_entity_id :INTEGER

 batch_nr : INTEGER

 prev_post_tran_id : LONG

 next_post_tran_id : LONG

 sink_node_name : STRING

 tran_postilion_originated : DECIMAL

 tran_completed : DECIMAL

 tran_amount_req : DECIMAL

 tran_amount_rsp : DECIMAL

 settle_amount_impact : DECIMAL

 tran_cash_req : DECIMAL

 tran_cash_rsp : DECIMAL

tran_currency_code : STRING

tran_tran_fee_req : DECIMAL

tran_tran_fee_rsp : DECIMAL

tran_tran_fee_currency_code : STRING

tran_proc_fee_req : DECIMAL

tran_proc_fee_rsp : DECIMAL

tran_proc_fee_currency_code : STRING

settle_amount_req : DECIMAL

settle_amount_rsp : DECIMAL

settle_cash_req : DECIMAL

settle_cash_rsp : DECIMAL

settle_tran_fee_req : DECIMAL

settle_tran_fee_rsp : DECIMAL

settle_proc_fee_req : DECIMAL

settle_proc_fee_rsp : DECIMAL

settle_currency_code : STRING

However When I run the query against the dataset, I get the following 
exception


SYSTEM ERROR: ClassCastException: 
org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast to 
org.apache.drill.exec.vector.VariableWidthVector


Moreover, the same error occurs when I include a decimal field in the select 
clause. Please, is there something I am missing or doing wrong? Any pointer 
will be deeply appreciated.

Kind Regards

Peter






Re: create table in mysql db

2018-03-28 Thread Andries Engelbrecht
Unfortunately Drill cannot write to MySQL, only read from it. Perhaps write to a 
CSV file on HDFS and then import the CSV file into MySQL.
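
For example (a sketch only; the table, path and column names are hypothetical, 
MySQL.dummy is the plugin/schema name from your configuration, and dfs.tmp must 
point to a writable workspace):

ALTER SESSION SET `store.format` = 'csv';

CREATE TABLE dfs.tmp.`mysql_export` AS
SELECT m.id,
       m.name,
       c.columns[1] AS extra_col
FROM MySQL.dummy.`my_table` m
JOIN dfs.`/data/lookup.csv` c
  ON m.id = CAST(c.columns[0] AS INT);

The CSV files written under the dfs.tmp location can then be bulk loaded into 
MySQL (for example with LOAD DATA INFILE or mysqlimport).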

Regards
--Andries


On 3/28/18, 5:50 AM, "Shrinivas Deshmukh"  
wrote:

Hi,

This is Shrinivas.

I am joining a mysql db table with a csv file in hdfs. I want to store the
results in a new mysql db table. I get an error :  Schema [MySQL.dummy] is
immutable.

MySQL is my mysql plugin name and dummy is the database name.

Here is my MySQL plugin configuration :
{
  "type": "jdbc",
  "driver": "com.mysql.jdbc.Driver",
  "url": "jdbc:mysql://host:3306",
  "username": "root",
  "password": "",
  "enabled": true
}

Please help me.

Thank you.

Regards,
Shrinivas Deshmukh.




Re: Way to "pivot"

2018-03-06 Thread Andries Engelbrecht
If the X, Y and Z values are unique for each timestamp you can perhaps use group 
by (dt, X, Y, Z) and CASE to make the X, Y, Z columns. It may be worth looking 
into, but it is going to be expensive to execute. Just an idea, but I have not 
tested it.
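
One common way to sketch that (untested; the path is hypothetical and it assumes 
each (source, dt) pair appears once) is to group on dt and build each output 
column with CASE inside an aggregate:

SELECT dt,
       MAX(CASE WHEN source = 'X' THEN `value` END) AS X,
       MAX(CASE WHEN source = 'Y' THEN `value` END) AS Y,
       MAX(CASE WHEN source = 'Z' THEN `value` END) AS Z
FROM dfs.`/data/measurements`
GROUP BY dt
ORDER BY dt;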

--Andries


On 3/6/18, 6:46 AM, "John Omernik"  wrote:

I am not sure if this is the right thing for what I am trying to do, but I
have data in this format:


source   dt                 value
X        2018-03-06 11:00   0.31
X        2018-03-06 12:00   0.94
X        2018-03-06 13:00   0.89
X        2018-03-06 14:00   0.01
X        2018-03-06 15:00   0.43
Y        2018-03-06 11:00   1.43
Y        2018-03-06 12:00   0.50
Y        2018-03-06 13:00   0.10
Y        2018-03-06 14:00   0.42
Y        2018-03-06 15:00   0.41
Z        2018-03-06 11:00   5.34
Z        2018-03-06 12:00   4.32
Z        2018-03-06 13:00   4.20
Z        2018-03-06 14:00   0.89
Z        2018-03-06 15:00   0.01

I'd like to graph it as three lines (X, Y and Z) over time, so the graph
tool I am using asks for it in this format:



dt   X Y Z

2018-03-06 11:00 0.31   1.43 5.34
2018-03-06 12:00 0.94   0.50 4.32
2018-03-06 13:00 0.89   0.10 4.20
2018-03-06 14:00 0.01   0.42 0.89
2018-03-06 15:00 0.43   0.41 0.01


So I think that would be a PIVOT-like function, right (which I don't think
Drill has)? Is there a way to "fake" this in Drill using some other built-in
functions?

Thanks!

John




Re: Schema problems trying to convert JSON to Parquet

2018-02-23 Thread Andries Engelbrecht
This is a challenge when dealing with JSON. You can either force the data type 
in the CTAS statement (likely better option) or deal with the data type change 
in parquet table(s) by using CAST, etc. In the case of zip codes you need to 
consider if it will be 5 digits or the extended 5-4 digits to decide if the 
data type should be INT or VARCHAR.

Also look into the TYPEOF function, which you can use with CASE to deal with 
these types of issues.

I prefer to deal with data issues as soon as possible in the pipeline, so the 
tables you create are consistent and clean.
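
For example, forcing the type in the CTAS could look like this (a sketch; the 
source path is hypothetical, and VARCHAR is chosen so the extended 5-4 zip codes 
also fit):

CREATE TABLE dfs.tmp.`cities_parquet` AS
SELECT
  CAST(city AS VARCHAR(100)) AS city,
  CAST(zip  AS VARCHAR(10))  AS zip   -- files where zip is all NULL no longer default to INT
FROM dfs.`/data/Cities`;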

--Andries


On 2/23/18, 12:04 PM, "Lee, David"  wrote:

Using Drill's CTAS statements I've run into a schema inconsistency issue 
and I'm not sure how to solve it..

CREATE TABLE name [ (column list) ] AS query;  

If I have a directory called Cities which have JSON files which look like:

a.json:
{ "city":"San Francisco", "zip":"94105"}
{ "city":"San Jose", "zip":"94088"}

b.json:
{ "city":"Toronto ", "zip": null}
{ "city":"Montreal", "zip": null}

If I create a parquet file out of the Cities directory I will end up with 
files called:

1_0_0.parquet through 1_5_1.parquet

Now I got a problem:

Most of the parquet files have a column type of char for zip.
Some of the parquet files have a column type of int for zip because the zip 
value for a group of records was NULL..

This produces schema change errors later when trying to query the parquet 
directory.

Is it possible for Drill to do a better job learning schemas across all 
json files in a directory before creating parquet?









Re: Fixed-width files

2018-02-20 Thread Andries Engelbrecht
You can also try and see if you can just use the CSV plugin to read a line as 
columns[0] and then use the substr function to pull out the fields in the line.
http://drill.apache.org/docs/string-manipulation/#substr

Here is a simple example

Simple csv file

[test]$ cat test.csv
col1col2col3


jdbc:drill:zk=localhost:5181> select substr(columns[0],1,4), 
substr(columns[0],5,4), substr(columns[0],9,4) from  
dfs.root.`/data/csv/test/test.csv`;
+-+-+-+
| EXPR$0  | EXPR$1  | EXPR$2  |
+-+-+-+
| col1| col2| col3|
+-+-+-+



--Andries




On 2/20/18, 1:17 AM, "Flavio Pompermaier"  wrote:

For the moment I've created an improvement issue about this:

https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6170=DwIBaQ=cskdkSMqhcnjZxdQVpwTXg=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU=69ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ=ajRYDHKrMFgV1AMW2Q8weYDZtzb7-U5CqR9fML7ihno=

On Tue, Feb 20, 2018 at 9:23 AM, Flavio Pompermaier 
wrote:

> Thanks Paul for this suggestion, I think I'm going to give it a try.
> Once I've created my EasyFormatPlugin where should I put the produced jar?
> in which folder within jars directory?
>
> On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers 
> wrote:
>
>> It may be that by "fixed width text", Flavio means a file in which the
>> text columns are of fixed width: kind of like old-school punch cards.
>> Drill has no reader for this use case, but if you are a Java programmer,
>> you can create one. See Drill Pull Request #1114 [1] for one example of a
>> regex reader along with pointers to a second example I'm building for a
>> book. Should be easy to adopt this code to take a list of column widths 
in
>> place of the regex. Actually, you could use the regex with a pattern that
>> just picks out a fixed number of characters.
>> Thanks,
>> - Paul
>>
>> [1]  
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_drill_pull_1114=DwIBaQ=cskdkSMqhcnjZxdQVpwTXg=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU=69ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ=-0LdlBnmAXaipanP87yJezn5HPEHQIQVX5izxnNTYFY=
>>
>>
>>
>>
>> On Monday, February 19, 2018, 12:52:42 PM PST, Kunal Khatua <
>> kkha...@mapr.com> wrote:
>>
>>  As long as you have delimiters, you should be able to import it as a
>> regular CSV file. Using views that define the fixed-width nature should
>> help operators downstream work more efficiently.
>>
>> -Original Message-
>> From: Flavio Pompermaier [mailto:pomperma...@okkam.it]
>> Sent: Monday, February 19, 2018 6:50 AM
>> To: user@drill.apache.org
>> Subject: Fixed-width files
>>
>> Hi to all,
>> I'm currently looking for the best solution to load a fixed-width text
>> file into Drill.
>> Is there any way right now to do that? Is there anyone that already have
>> a working connector?
>> Is it better to implement a brand new FormatPluginConfig or
>> StoragePluginConfig?
>>
>> Best,
>> Flavio
>>
>>
>




Re: Creating a Tableau extracts with Drill 1.12 uses unlimited memory

2018-01-29 Thread Andries Engelbrecht
Also a couple of other things to consider (may not directly fix your current 
issue, but good practices).

1) In the S3 bucket how many parquet files are present that you are trying to 
query with Drill? (Others have asked for the Drill plans as well, as it will 
also be good to see how many minor fragments the Drill cluster is using to read 
the S3 data).
2) The View with Select * from x is typically a bad idea when dealing with 
BI tools. Drill is based on Schema discovery and by doing this you are forcing 
Drill to discover the Schema every time a tool requests metadata, and you also 
make query planning much harder and more expensive.
3) For parquet data you may consider updating the metadata cache (see the 
example after this list); it helps improve query planning speed, metadata 
operations, etc. It is also a good check to see if the data is healthy.
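
For point 3, refreshing the parquet metadata cache is a single command 
(hypothetical path):

REFRESH TABLE METADATA dfs.`/path/to/parquet/table`;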

You may consider looking at some pointers here .
https://community.mapr.com/community/exchange/blog/2017/01/25/drill-best-practices-for-bi-and-analytical-tools


--Andries




On 1/28/18, 6:18 PM, "Kunal Khatua"  wrote:

Hi Francis



Looks like the logs didn’t get attached.



As for the physical plan, ignore the “memory = 0.0” ; because it is a 
physical plan and not the actual executed query’s profile.



What you want to do to debug the possible area where the memory is being 
consumed is in the query’s profile page.



This will tell you which is the longest running major fragment:


http://<drillbit-host>:8047/profiles/<query_id>#fragment-overview

You’re looking for the Max Peak Memory column to see which ones are holding 
the most memory.




http://<drillbit-host>:8047/profiles/<query_id>#operator-overview

And this will tell you the statistics for the various operators’ 
consumption of memory.



For running such a query through the WebUI is generally not recommended, 
because (by design), the WebUI creates a ResultSetListener in the Drillbit 
where it holds all the records before sending anything back. 
(https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6050=DwIGaQ=cskdkSMqhcnjZxdQVpwTXg=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU=tIURVi72-_36So0nsCefh5MA-mOIGlH1cc7sltau_LE=cgP4XNEuVrjWyzrt2dbKEP6oC2dDKEYdr8jnGO-kJkc=)



However, for Tableau, the DrillClient should have been consuming the 
records, so I’m wondering why Drillbit should run out of memory .



Could you share the following for both scenarios (the Web UI and when 
running via Tableau)?



  1.  The profiles

  2.  The Drillbit logs (drillbit.out, drillbit.log, etc)



Thanks

Kunal



From: Francis McGregor-Macdonald [mailto:fran...@mc-mac.com]

Sent: Sunday, January 28, 2018 2:49 PM

To: user@drill.apache.org

Subject: Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses 
unlimited memory



Hi all,



A physical plan attached ... all memory appears to be 0.0 which seems odd?



Thanks



On Sun, Jan 28, 2018 at 10:37 PM, Francis McGregor-Macdonald 
> wrote:

And with logs as attachments.



On Sun, Jan 28, 2018 at 9:40 PM, Francis McGregor-Macdonald 
> wrote:

Thanks Paul and Kunal,

I think I have the right information now. With Paul's changes (and fixing 
up a zoo.cfg error) it isn't crashing, rather failing. Logs attached, still 
blowing past memory limits. It does the same thing when re-running the query 
from the web console so presumably its not actually Tableau related despite me 
first generating it that way.



Thanks.



On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald 
> wrote:

Thanks Paul,



I will update with your suggested memory allocations also and retry.



Zookeeper crashed too which might explain more? I have attached the logs 
from Zookeeper too.



Thanks



On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers 
> wrote:

Hi Francis,



Thanks much for the 

Re: Drill Push to Tableau, Error -

2017-12-18 Thread Andries Engelbrecht
Does Tableau automatically generate the DML or is it user generated?
Is the table used for multiple sessions or only for a single session?
Does other session create the same/similar tables for use?

As Kunal mentioned temporary tables may the way to go.

--Andries


On 12/17/17, 9:36 AM, "Kunal Khatua"  wrote:

It looks like your system is trying to create the attributes table 
concurrently. Based on the name of the table, my hunch is that your workflow(s) 
is/are trying to generate tables with the same name (attributes) for use in 
reporting. 

The suggested way to work around this is to use a naming convention that 
hints to the report(s) that will leverage the tables.

So, instead of 2 or more reports trying to create the same 'attributes' 
table, how about you try something like .. 'attr_report1' .. 'attr_report2' , 
etc. ?

Also, if these tables are transient in nature, you can consider using the 
'create temp table as ...'  SQL. This will create temporary tables with the 
lifespan of the connection you're working on. The moment you close that 
connection, Drill will clean up. So, if you have multiple connections trying to 
create a temporary 'attributes' table, they will all be isolated. 
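
For example, a temporary version of the 'attributes' table (from the CTAS in the 
quoted message below) could be created along these lines (a sketch only, trimming 
the original column list; the temporary table is written to Drill's temporary 
workspace and is dropped automatically when the connection closes):

CREATE TEMPORARY TABLE attributes AS
SELECT DISTINCT channels.marketingname AS channelmarketingname,
                channels.channelguid   AS channelGuid,
                channels.channelid     AS channelId
FROM dfs.root.`/SXM/archive/parsed/Channel-parsed-type2/2017-12-14*` channels;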

Hope that helps. Let us know how else you are using Drill.

Thanks
~ Kunal


-Original Message-
From: Kunal Khatua [mailto:kkha...@mapr.com] 
Sent: Sunday, December 17, 2017 9:30 AM
To: user@drill.apache.org
Subject: FW: Drill Push to Tableau, Error - 

Forwarded from d...@drill.apache.org

From: Spinn, Brandi [mailto:brandi.sp...@siriusxm.com]
Sent: Friday, December 15, 2017 1:46 PM
To: d...@drill.apache.org
Subject: Drill Push to Tableau, Error -

Hello,

We are currently running a project which is utilizing the Drill push to 
Tableau function to be able to work with our data sets. We are already working 
with Tableau regarding our needs and determined that this is our best course of 
action considering how large our data sets are - over 2 million rows per day.

At the moment we have several visualizations we have published, but we are 
running into an issue each morning where some of them are not updating according 
to our schedules, and when we review the logs we find some of the same type of 
"fatal" errors that do not always allow the visualizations to update.

I have reached out to our account rep through Tableau for possible 
guidance, however, you might be a more appropriate resource; any help you could 
provide would be appreciated, I have not been able to find much via the 
internets.

Below is a sample of the error codes we are seeing; please let me know if you 
are able to assist or if you need any additional information. Thank you!

Auto_AOD_Content:288 - create table dfs.tmp.attributes as select distinct 
channels.marketingname as channelmarketingname, channels.streamingname as 
channelstreamingname, channels.channelguid as channelGuid, channels.channelid 
as channelId, categories.category_name as channelcategory, CASE when 
music.channel_guid is not null then 'Music' else null end as genre_Music, CASE 
when news.channel_guid is not null then 'News' else null end as genre_News, 
CASE when sports.channel_guid is not null then 'Sports' else null end as 
genre_Sports,  CASE when talk.channel_guid is not null then 'Talk' else null 
end as genre_Talk, CASE when howard.channel_guid is not null then 'Howard' else 
null end as genre_Howard, categories.channel_name channelName from 
dfs.root.`/SXM/archive/parsed/Channel-parsed-type2/2017-12-14*` channels join 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` categories on 
categories.channel_guid=channels.channelguid left join (select channel_guid 
from dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'Music') music on channels.channelguid = 
music.channel_guid left join (select channel_guid from 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'News') news on channels.channelguid = news.channel_guid 
left join (select channel_guid from 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'Sports') sports on channels.channelguid = 
sports.channel_guid left join (select channel_guid from 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'Talk') talk on channels.channelguid = talk.channel_guid 
left join (select channel_guid from 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'Howard') howard on channels.channelguid = 
howard.channel_guid

Auto_AOD_Content:304 - There was a SQL ERROR on DRILL side

Auto_AOD_Content:305 - VALIDATION ERROR: A table or view with given name 
[attributes] already exists in schema [dfs.tmp]





Re: sqlline parquet to tsv filesize imbalance causing slow sqoop export to MS sql server

2017-11-17 Thread Andries Engelbrecht
You can also try to add an order by at the end of the CTAS.
I.e. 
create table dfs.tmp.`mytable_export`
as select
ID, NAME, STATUS, GROUP, from_unixtime(etl_date/1000) as etl_date from 
dfs.root.`/location/of/table/to/convert/mytable` ORDER BY ID;

Only use this as needed, as the sort is expensive and mostly used for tables 
that will be read frequently. First try Kunal’s suggestion as it should be much 
less expensive.

--Andries


On 11/16/17, 10:28 PM, "Kunal Khatua"  wrote:

It might be that your parallelization is causing it to generate 4 files, 
where only <= 3  files are sufficient. 

Try experimenting with the planner.width.max_per_query option, setting it to a 
value of 3 ... that might help.

https://drill.apache.org/docs/configuration-options-introduction/
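
For example, at session scope that would be:

ALTER SESSION SET `planner.width.max_per_query` = 3;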


-Original Message-
From: Reed Villanueva [mailto:rvillanu...@ucera.org] 
Sent: Thursday, November 16, 2017 5:29 PM
To: user@drill.apache.org
Subject: sqlline parquet to tsv filesize imbalance causing slow sqoop 
export to MS sql server

I am new to using drill and am trying to convert a table stored on hadoop 
dfs as .parquet to .tsv format using sqlline that came with the drill package. 
The problem is that when doing this, the tsv files are poorly 'balanced'. When 
checking the sizes of the converted files, I see:

+---++

| Fragment  | Number of records written  |

+---++

| 1_3   | 1  |

| 1_1   | 306955 |

| 1_2   | 493009 |

| 1_0   | 698228 |

+---++

[mapr@mnode02 mytable_export]$ ls -l
total 486470
-rwxr-xr-x 1 mapr mapr 105581719 Oct 19 10:25 1_0_0.tsv -rwxr-xr-x 1 mapr 
mapr 155385226 Oct 19 10:25 1_1_0.tsv -rwxr-xr-x 1 mapr mapr 237176680 Oct 19 
10:25 1_2_0.tsv
-rwxr-xr-x 1 mapr mapr   279 Oct 19 10:25 1_3_0.tsv

So when trying to export the files from hadoop using sqoop export, (I
think) the imbalance is causing the export progress to go very slowly for 
some of the job mappers (when checking the hadoop applications web interface).

My question, then, is whether there is a way to control the size and amount 
of the tsv files being created? Ultimately I am trying to export the table in 
tsv format to a Microsoft SQL Server DB(and can't alleviate the slowdown with 
--batch or --direct because sqoop apparently does not support those options for 
MS SQL server, nor is there a way to sqoop export parquet to sql server).

The sql I am using to use to convert the table is mytable.sql:

alter session set `store.format`='tsv';
create table dfs.tmp.`mytable_export`
as select
ID, NAME, STATUS, GROUP, from_unixtime(etl_date/1000) as etl_date from 
dfs.root.`/location/of/table/to/convert/mytable`;

with this script bash myscript.sh mytable:

#!/bin/bash
...
tablename=$1
sqldir="/path/to/sql/to/run/"$tablename".sql"
echo $sqldir
...
#write table to tsv
/opt/mapr/drill/drill-1.8.0/bin/sqlline \
-u jdbc:drill:zk=mnode01:5181,mnode02:5181,mnode03:5181 \
-n username\
-p password \
--run=$sqldir
...

Any suggestions or advice would be appreciated, thanks.





Re: RPC error when accesing Drill through tableau

2017-11-07 Thread Andries Engelbrecht
You may want to enable logging on the ODBC driver. Which version of the ODBC 
driver are you using?

Are you using Windows or OSX?

For windows edit the registry entry HKEY_LOCAL_MACHINE/SOFTWARE/MapR/MapR Drill 
ODBC Driver/Driver

On OSX copy file mapr.drillodbc.ini in /Library/mapr/drill/Setup/ to 
/Library/mapr/drill/lib

Set the log level and location in the above for either OS.

Log levels:

0: Disables all logging.
1: Logs severe error events that lead the driver to abort.
2: Logs error events that might allow the driver to continue running.
3: Logs events that might result in an error if action is not taken.
4: Logs general information that describes the progress of the driver.
5: Logs detailed information that is useful for debugging the driver.
6: Logs all driver activity.


Does it take a long time for the dashboards to refresh in Tableau?
You may also want to look at the query profiles in Drill as well.



--Andries




On 11/6/17, 5:16 PM, "Divya Gehlot" <divya.htco...@gmail.com> wrote:

Hi Andries,
Thanks for the advice.
I am using Drill 1.10 and Tableau version 10.4.
I am connecting to Drill through the Apache Drill connector.
By posting in the community I wanted to know whether the issue is at the Tableau
server or at Apache Drill?


Thanks,
Divya

On 3 November 2017 at 22:05, Andries Engelbrecht <aengelbre...@mapr.com>
wrote:

> Use Drill 1.10 with the current published ODBC driver, there may be some
> issues with Drill 1.11.
> Also use Tableau 10.2+ with the Apache Drill connector. This works much
> better than generic ODBC.
> Make sure you can connect to Drill via ODBC, use Drill Explorer as 
example.
>
> Also look at some Drill Best Practices for BI Tools
> https://community.mapr.com/community/exchange/blog/2017/
> 01/25/drill-best-practices-for-bi-and-analytical-tools
>
>
> —Andries
>
>
>
>
> On Nov 3, 2017, at 12:57 AM, Divya Gehlot <divya.htco...@gmail.com divya.htco...@gmail.com>> wrote:
>
> Hi,
> I am getting protocol disconnected error in tableau.
> When checked the drill logs I could see below
>
> 2017-11-03 07:51:52,798 [BitServer-4] WARN
> o.a.drill.exec.work.foreman.Foreman - Dropping request to move to
> COMPLETED
> state as query is already at CANCELED state (which is terminal).
> 2017-11-03 07:51:52,798 [BitServer-4] WARN
> o.a.d.e.w.b.ControlMessageHandler - Dropping request to cancel fragment.
> 2603e022-e3c1-e5a7-d54f-08e1399b4936:0:0 does not exist.
> 2017-11-03 07:51:52,805 [BitServer-4] INFO
> o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate
> query result to initiating client. This would happen if a client is
> disconnected before response notice can be sent.
> org.apache.drill.exec.rpc.RpcException: Failure sending message.
>at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:124)
> [drill-rpc-1.10.0.jar:1.10.0]
>at
> org.apache.drill.exec.rpc.user.UserServer$BitToUserConnection.
> sendResult(UserServer.java:199)
> [drill-java-exec-1.10.0.jar:1.10.0]
>at
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.
> close(Foreman.java:868)
> [drill-java-exec-1.10.0.jar:1.10.0]
>at
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:1001)
> [drill-java-exec-1.10.0.jar:1.10.0]
>at
> org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:116)
> [drill-java-exec-1.10.0.jar:1.10.0]
>at
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.
> processEvent(Foreman.java:1027)
> [drill-java-exec-1.10.0.jar:1.10.0]
>
>
> As suggested by one of the Drill user I set the
> rpc: {
>user: {
>  timeout: 30
>}
>  },
>
> When I set above conf then could see the the dashboard is still in loading
> state while  no error log in drill.
>
> Appreciate the help to resolve the issue!
>
> Thanks,
> Divya
>




Re: RPC error when accessing Drill through tableau

2017-11-03 Thread Andries Engelbrecht
Use Drill 1.10 with the current published ODBC driver, there may be some issues 
with Drill 1.11.
Also use Tableau 10.2+ with the Apache Drill connector. This works much better 
than generic ODBC.
Make sure you can connect to Drill via ODBC, use Drill Explorer as example.

Also look at some Drill Best Practices for BI Tools
https://community.mapr.com/community/exchange/blog/2017/01/25/drill-best-practices-for-bi-and-analytical-tools


—Andries




On Nov 3, 2017, at 12:57 AM, Divya Gehlot 
> wrote:

Hi,
I am getting protocol disconnected error in tableau.
When checked the drill logs I could see below

2017-11-03 07:51:52,798 [BitServer-4] WARN
o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED
state as query is already at CANCELED state (which is terminal).
2017-11-03 07:51:52,798 [BitServer-4] WARN
o.a.d.e.w.b.ControlMessageHandler - Dropping request to cancel fragment.
2603e022-e3c1-e5a7-d54f-08e1399b4936:0:0 does not exist.
2017-11-03 07:51:52,805 [BitServer-4] INFO
o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate
query result to initiating client. This would happen if a client is
disconnected before response notice can be sent.
org.apache.drill.exec.rpc.RpcException: Failure sending message.
   at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:124)
[drill-rpc-1.10.0.jar:1.10.0]
   at
org.apache.drill.exec.rpc.user.UserServer$BitToUserConnection.sendResult(UserServer.java:199)
[drill-java-exec-1.10.0.jar:1.10.0]
   at
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:868)
[drill-java-exec-1.10.0.jar:1.10.0]
   at
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:1001)
[drill-java-exec-1.10.0.jar:1.10.0]
   at
org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:116)
[drill-java-exec-1.10.0.jar:1.10.0]
   at
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1027)
[drill-java-exec-1.10.0.jar:1.10.0]


As suggested by one of the Drill user I set the
rpc: {
   user: {
 timeout: 30
   }
 },

When I set above conf then could see the the dashboard is still in loading
state while  no error log in drill.

Appreciate the help to resolve the issue!

Thanks,
Divya


Re: Drill Capacity

2017-11-02 Thread Andries Engelbrecht
How much memory is allocated to the Drill environment?
Embedded or in a cluster?

I don’t think there is a particular limit, but a single JSON file will be read 
by a single minor fragment; in general it is better to match the number/size of 
files to the Drill environment.

In the short term try to bump up planner.memory.max_query_memory_per_node in 
the options and see if that works for you.
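
For example, something along these lines (just a sketch; the 4 GB value below is 
only an illustration and should be sized for your environment):

alter session set `planner.memory.max_query_memory_per_node` = 4294967296;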

--Andries



On 11/2/17, 7:46 AM, "Yun Liu"  wrote:

Hi,

I've been using Apache Drill actively and just wondering what is the 
capacity of Drill? I have a json file which is 390MB and it keeps throwing me 
a DATA_READ ERROR. I have another json file with the exact same format but only 
150MB and it's processing fine. When I did a *select* on the large json, it 
returns successfully for some of the fields. None of these errors really apply 
to me. So I am trying to understand the maximum size of JSON file that Drill 
supports. Or if there's something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
Join the discussion http://blog.castsoftware.com/





Re: Drill performance question

2017-10-30 Thread Andries Engelbrecht
As Ted touched on - CSV requires a lot of text parsing and data type 
conversion, while parquet doesn’t. This typically saves a ton of CPU when you 
query, even if you can’t leverage partitioning or columnar advantages.

--Andries



On 10/30/17, 10:57 AM, "Saurabh Mahapatra"  wrote:

As Kunal points out, Parquet and partitioning should help. The concept of
partitioning is not new to warehousing type of queries. To get performance,
you aggregate and partition by the granularity of the query you are
seeking. For example, by day, by week, by month or even quarters. The goal
is to avoid on the fly aggregation for speed.

It is not always true that a columnar format will outperform a row-based
format in all situations. If you are doing point-access based analytics
where the selectivity of your filter is very very high and you want access
to all columns, row-based formats may actually do better.

Data layout and query characteristics are the yin-yang of performance for
any query engine on massive scale data. They need to match for the problem
you are solving.

Best,
Saurabh

On Mon, Oct 30, 2017 at 10:46 AM, Kunal Khatua  wrote:

> I second Ted's suggestion!
>
> Since we haven't seen what your profile's operator overview, we can't say
> for sure why the performance isn't good.
>
> On the top of my head ,these are most likely things happening that make
> your performance so bad:
>
> 1. All the CSV files are being read and rows rejected because there is no
> way for Drill to understand the which segments of data have the relevant
> time ranges that you might be looking at.
> 2. Your CSV data has many columns, but you only care about a few... CSV
> readers will need to process the irrelevant ones too.
> 3. There is a cost to reading and casting/converting the data into a
> date/time format.
>
> So, as Ted suggested, writing as a parquet file will give you the most
> bang for the buck.
> Partitioning on, say, a date helps.. but you also don't want it too
> granular.
>
> Last but not the least, if you are doing a query of the form..
> select X,Y,Z where time between  and 
> you will benefit immensely from the data being sorted with that time 
field.
>
> Hope that helps.
>
> ~ Kunal
>
> -Original Message-
> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> Sent: Monday, October 30, 2017 9:34 AM
> To: user 
> Subject: Re: Drill performance question
>
> Also, on a practical note, Parquet will likely crush CSV on performance.
> Columnar. Compressed. Binary.  All that.
>
>
>
> On Mon, Oct 30, 2017 at 9:30 AM, Saurabh Mahapatra <
> saurabhmahapatr...@gmail.com> wrote:
>
> > Hi Charles,
> >
> > Can you share some query patterns on this data? More specifically, the
> > number of columns you retrieving out of the total, the filter on the
> > time dimension itself (ranges and granularities)
> >
> > How much is ad hoc and how much is not.
> >
> > Best,
> > Saurabh
> >
> > On Mon, Oct 30, 2017 at 9:27 AM, Charles Givre  wrote:
> >
> > > Hello all,
> > > I have a dataset consisting of about 16 GB of CSV files.  I am
> > > looking to do some time series analysis of this data, and created a
> > > view but when I started doing aggregate queries using components of
> > > the date, the performance was disappointing.  Would it be better to
> > > do a CTAS and partition by components of the date?  If so, would
> > > parquet be the best format?
> > > Would anyone have other suggestions of things I could do to improve
> > > performance?
> > > Thanks,
> > > — C
> >
>




Re: Time out error when query from BI tool like tableau

2017-10-13 Thread Andries Engelbrecht
With Tableau try to use Tableau 10.2+ and use the Apache Drill Data Source vs 
generic ODBC.

Also see this in regards to Drill Metadata management and Views that will help 
a lot to speed up prepared statement speedup. Created this a while ago since 
many users run into some of these issues.

https://community.mapr.com/community/exchange/blog/2017/01/25/drill-best-practices-for-bi-and-analytical-tools
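
As a quick test you can also bump the timeout named in the error message (a 
sketch only; the option name is taken from the quoted error and 30000 ms is an 
arbitrary example value):

alter system set `prepare.statement.create_timeout_ms` = 30000;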

--Andries


On 10/13/17, 12:49 AM, "Divya Gehlot"  wrote:

Hi,
I am accessing Drill multi directory parquet tables in Tableau server and
when I try to access it I am getting below error :

 prepared statement took longer than 1 ms. Query cancellation
> requested. Retry after changing the option
> 'prepare.statement.create_timeout_ms' to a higher value. ]


But same query when I exceute through Drill terminal or web console I dont
any errors .

Appreciate the help !!

Thanks,
Divya




Querying MapR-DB JSON Tables not returning results when specifying columns or CF's

2017-09-11 Thread Andries Engelbrecht
Created a MapR-DB JSON table, but not able to query data specifying column or 
CF’s.

When doing a select * the data is returned.

i.e.

0: jdbc:drill:> select * from dfs.maprdb.`/sdc/nycbike` b limit 1;
++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
|_id |  age  |  arc  | avg_speed_mph  | bikeid  | birth 
year  | end station id  | end station latitude  | end station longitude  | end 
station name  | gender  | start station id  | start station latitude  | start 
station longitude  | start station name  | start_date  |  starttime   | 
  stoptime   | tripduration  |   tripid   |  usertype   
|  station  |
++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
| 2017-04-01 00:00:58-25454  | 51.0  | 0.39  | 7.2| 25454   | 
1966.0  | 430 | 40.7014851| -73.98656928   
| York St & Jay St  | M   | 217   | 40.70277159 | 
-73.99383605 | Old Fulton St   | 2017-04-01  | 2017-04-01 
00:00:58  | 2017-04-01 00:04:14  | 195   | 2017-04-01 00:00:58-25454  | 
Subscriber  | {"end station id":"430"}  |
++---+---++-+-+-+---++---+-+---+-+--+-+-+--+--+---++-+---+
1 row selected (0.191 seconds)


However trying to specify a column or CF name nothing is returned.

Specify a column name

0: jdbc:drill:> select bikeid from dfs.maprdb.`/sdc/nycbike` b limit 10;
+--+
|  |
+--+
+--+
No rows selected (0.067 seconds)

0: jdbc:drill:> select b.bikeid from dfs.maprdb.`/sdc/nycbike` b limit 1;
+--+
|  |
+--+
+--+
No rows selected (0.062 seconds)


Specify a CF name the same result.

0: jdbc:drill:> select b.station from dfs.maprdb.`/sdc/nycbike` b limit 1;
+--+
|  |
+--+
+--+
No rows selected (0.063 seconds)


Drill 1.10 and the user has full read/write/traverse permissions on the table.




Thanks

Andries


Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Andries Engelbrecht
Typically when you use the MapR-FS plugin you don’t need to specify the cluster 
root path in the dfs workspace.

Instead of "location": "/mapr/cluster3",   use "location": "/",

"connection": "maprfs:///", already points to the default MapR cluster root.

--Andries



On 9/11/17, 2:23 AM, "Takeo Ogawara"  wrote:

Dear all,

I’m using PCAP storage plugin over MapR FS(5.2.0) with Drill(1.11.0) 
compiled as follows.
$ mvn clean install -DskipTests -Pmapr

Some queries caused errors as following.
Does anyone know how to solve these errors?

1. Query error when cluster-name is not specified
Storage “mfs” setting is this.

> "type": "file",
>   "enabled": true,
>   "connection": "maprfs:///",
>   "config": null,
>   "workspaces": {
> "root": {
>   "location": "/mapr/cluster3",
>   "writable": false,
>   "defaultInputFormat": null
> }
>   }


With this setting, the following query failed.
> select * from mfs.`x.pcap` ;
> Error: DATA_READ ERROR: /x.pcap (No such file or directory)
> 
> File name: /x.pcap
> Fragment 0:0
> 
> [Error Id: 70b73062-c3ed-4a10-9a88-034b4e6d039a on node21:31010] 
(state=,code=0)

But these queries passed.
> select * from mfs.root.`x.pcap` ;
> select * from mfs.`x.csv`;
> select * from mfs.root.`x.csv`;

2. Large PCAP file
Query on very large PCAP file (larger than 100GB) failed with following 
error message.
> Error: SYSTEM ERROR: IllegalStateException: Bad magic number = 0a0d0d0a
> 
> Fragment 1:169
> 
> [Error Id: 8882c359-c253-40c0-866c-417ef1ce5aa3 on node22:31010] 
(state=,code=0)

This happens even on Linux FS not MapR FS

Thank you.








Re: Merge and save parquet files in Drill

2017-08-17 Thread Andries Engelbrecht
Do you partition the table?
You may want to sort (order by) on the columns you partition, or just order by 
in any case on the column(s) you are most likely going to use for predicates. 
It increases the CTAS time, but normally will improve the query performance 
quite a bit.

Yes, a large number of files does affect the query performance; using metadata 
caching helps improve the query planning time a lot.
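
A rough sketch of the pattern (the table, column and path names here are made up 
for illustration):

create table dfs.tmp.`trips_merged`
partition by (trip_date)
as select * from dfs.tmp.`trips_small_files`
order by trip_date;

refresh table metadata dfs.tmp.`trips_merged`;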

--Andries


On 8/16/17, 11:12 PM, "Divya Gehlot"  wrote:

Hi,
I have CTAS with partition on 4 columns and when I save it it creates lots
of small files ~ 102290 where size of each file is in KBs .

My queries are :
1.Does the lots of small files reduce the performance while reading the
data in Drill ?
2.If yes ,How can I merge the small parquet files ?



Thanks,
Divya




Re: Drill JDBC connection on windows in embedded node

2017-08-07 Thread Andries Engelbrecht
Try jdbc:drill:drillbit=localhost

Since embedded mode doesn’t use ZK you need to connect directly to the drillbit 
itself.

--Andries

On 8/6/17, 7:37 PM, "Divya Gehlot"  wrote:

Hi,
I followed this link
 to
connect to Drill in JDBC mode.

*Have installed drill 1.11 on windows 10 in embedded mode .*

Configurations which I used :

*JDBC Driver* :

apache-drill-1.11.0.tar\apache-drill-1.11.0\jars\jdbc-driver\drill-jdbc-all-1.11.0.jar

JDBC URL options which I tried :

*Option1 :* jdbc:drill:zk=localhost:2181/drill/drillbits1
*Option 2* :   jdbc:drill:zk=localhost:2181
*Option 3 *: jdbc:drill:zk=:2181

I am getting below error with all the  the options

*ERROR* :
Unexpected Error occurred attempting to open an SQL connection.
class java.io.IOException: Failure to connect to the zookeeper cluster
service within the allotted time of 1 milliseconds

Appreciate the help .


Thanks ,
Divya




Re: Visibility of Workspaces or Views

2017-07-13 Thread Andries Engelbrecht
Normally workspaces in Drill DFS plugin should be tied to a directory in the 
underlying DFS.

If the user/group that logged in does not have read/write/exec permissions on 
the directory, it shouldn’t show up in the show schemas nor should the user be 
able to select the workspace in Drill.

Did a quick test and the “enduser” (not part of analyst group and not the 
analyst user) was not able to see masterviews workspace tied to directory 
views_master in the DFS. The masterviews workspace did not show with show 
schemas.

drwxr-x---. 2 analyst analyst 6 Jul  6 20:45 views_master

Plugin info

"masterviews": {
  "location": "/data/views_master",
  "writable": true,
  "defaultInputFormat": null
},


If I try to use it as the “enduser”

use dfs.masterviews;

Error: [MapR][DrillJDBCDriver](500165) Query execution error: 
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
Schema [dfs.masterviews] is not valid with respect to either root schema or 
current default schema.

Current default schema:  No default schema selected

Also if I try to use a view in the directory directly from Drill it reports 
back table not found.



Similarly the file permissions will limit access to use the view, but in my 
experience the views do show up for the user (which causes other issues). However if 
the directory permissions are set correctly the workspace does not come up for 
the user or group without the correct permissions. I would still recommend to 
set view file permissions correctly as well.

In this case I used POSIX file/dir permissions, but some more advanced DFS 
systems can handle ACEs, etc which makes it much more workable in large complex 
user/schema environments.

See if this solves your issue.

--Andries


On 7/13/17, 5:04 PM, "Paul Rogers"  wrote:

Hi Francis,

I don’t believe that Drill currently has a way to do this: workspaces are 
global resources shared across all users. The admin (only) can edit all plugin 
and workspace definitions.

We’d need some form of security tags on each definition, and a way to map 
users to tags to make this work, but Drill does not yet have this feature.

That said, I believe that, while the workspace itself is public, the files 
in the workspace can be restricted using file permissions when you enable 
impersonation in Drill. [1]

- Paul

[1] https://drill.apache.org/docs/securing-drill-introduction/

> On Jul 13, 2017, at 1:09 PM, Francis McGregor-Macdonald 
 wrote:
> 
> Hi,
> 
> I have a situation where I would like to restrict access to workspaces
> based on the user. I have an instance where I would like to allow some
> third-party access to a subset of views. I can't find a standard method
> here.
> 
> The only similar issue I could find was this:
> https://issues.apache.org/jira/browse/DRILL-3467
> 
> Is there a standard practice here to limit workspaces for users?
> 
> Thanks,
> Francis





Re: Help on hive connection with Drill

2017-07-06 Thread Andries Engelbrecht
I assume you looked at the basic Hive Storage plugin configuration already.
http://drill.apache.org/docs/hive-storage-plugin/

Also see this link for Hive Authorization
http://drill.apache.org/docs/configuring-user-impersonation-with-hive-authorization/

You will need to enable user authentication and impersonation for Drill as well.
http://drill.apache.org/docs/securing-drill/

--Andries


On 7/6/17, 7:36 AM, "Sunil Kumar Thimmapuram"  
wrote:

Hi Team:

Greetings!!

I am new to Drill and started exploring the tool and capabilities. As part
of it, I am looking for a help to establish the connection to Hive from
Drill.

Our Hadoop Eco System has a hive2 server and I have been give access to
hive server with user credentials for me to login to Hive.

With following the documentation of Drill, I tried using the hive server
but not able to successfully establish the connection. I know I have never
provided the credentials anywhere.

Can any of our team who did established the connection to hive successfully
can help me with requisites that I need to get from Hadoop/Hive team and
the hive connection details.

Any help on my request is much appreciated and will help me to proceed
further with my work.

Thanks & Regards
Sunil Kumar T.S.




Re: ODBC querying issues- Can only see files in Drill Explorer, not with other client

2017-06-16 Thread Andries Engelbrecht
With tools like Excel you will either have to figure out how to enter custom 
SQL, or if you want the data to be more visible to these tools you will have to 
create Drill Views and then reference these views from the tool via ODBC/JDBC. 
Properly define the column name and data types in the Views to make it easier 
for the end user/tool to process the data (this way you push the work to Drill).
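
A minimal sketch of such a view (assuming a writable dfs.tmp workspace; the 
column positions, names and types below are hypothetical, not taken from the 
actual citibike file):

create view dfs.tmp.`tripdata_v` as
select cast(columns[0] as int) as tripduration,
       columns[1]              as starttime,
       columns[3]              as start_station_name
from s3citibike.`default`.`./201307-citibike-tripdata.csv`;

A tool like Excel can then see and query dfs.tmp.`tripdata_v` over ODBC without 
any custom SQL.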

--Andries


On 6/16/17, 8:16 AM, "Jack Ingoldsby"  wrote:

Hi,
Sorry to bump this, but I was just asked by one of my sales guys if I could get
this working for a customer meeting in a couple of hours, where an inability
to query S3 via Apache Drill ODBC or JDBC is a dealbreaker.

If anyone has any thoughts would be greatly appreciated. Possibly an Amazon
token might find its way to you...
Regards,
Jack

On Thu, Jun 15, 2017 at 6:42 PM, Jack Ingoldsby 
wrote:

> Hi,
>
> I'm using Windows embedded to connect to S3, but am having querying using
> ODBC
>
> The ODBC connection works (connection string below)
> CastAnyToVarchar=true;Catalog=s3citibike;Schema=default;
> HandshakeTimeout=5;QueryTimeout=180;TimestampTZDisplayTimezone=
> local;NumberOfPrefetchBuffers=5;StringColumnLength=1024;
> ConvertToCast=false
>
> Using Drill Explorer (direct to Drillbit), I can see the files in
> s3citibike.default, and view the data (see attached image) but  for some
> reason I cannot see my files when using ODBC with another client such as
> Excel.
>
>
>
>  I can query using sqline, for example the bellow returns the dataset
>
> SELECT * FROM `s3citibike`.`default`.`./201307-citibike-tripdata.csv`
> LIMIT 100;
>
> I'm kind of guessing I'm just not specifying the folder path correctly,
> but I've been looking around for a while, tried Catalog = DRILL, schema =
> s3citibike.default, no avail.
>
>
> I'd try the drill-jdbc-all-1.10.0.jar JDBC driver for my client, but
> understand it doesn't work with embedded Windows
>
> Can anyone see where I'm going wrong?
>
> Thanks and regards,
> Jack
>
>
>




Re: Connecting to S3 bucket which does not seem to require a key

2017-06-12 Thread Andries Engelbrecht
You may be better off downloading the NYC bike data set locally and converting to 
parquet.
Converting from csv.zip to parquet will result in large improvements in 
performance if you do various queries on the data set.

--Andries

On 6/11/17, 10:48 PM, "Abhishek Girish"  wrote:

Drill connects to S3 buckets (AWS) via the S3a library. And the storage
plugin configuration requires the access & secret keys [1].

I'm not sure if Drill can access S3 without the credentials. It might be
possible via custom authenticators [2]. Hopefully others who have tried
this will comment.


[1] https://drill.apache.org/docs/s3-storage-plugin/
[2] http://docs.aws.amazon.com/AmazonS3/latest/API/sig-
v4-authenticating-requests.html

On Wed, Jun 7, 2017 at 3:02 PM, Jack Ingoldsby 
wrote:

> Hi,
> I'm trying to access the NYC Citibike S3 bucket, which seems to publicly
> available
>
> https://s3.amazonaws.com/tripdata/index.html
> If I leave the Access Key & Secret Key empty, I get the following message
>
> 0: jdbc:drill:zk=local> !tables
> Error: Failure getting metadata: Unable to load AWS credentials from any
> provider in the chain (state=,code=0)
>
> If I try entering random numbers as keys, I get the following message
>
> Error: Failure getting metadata: Status Code: 403, AWS Service: Amazon S3,
> AWS Request ID: 1C888A3A21D79F87, AWS Error Code: InvalidAccessKeyId, AWS
> Error Message: The AWS Access Key Id you provided does not exist in our
> records. (state=,code=0)
>
> Is it possible to connect to a data source that does not seem to require a
> key?
>
> Thanks,
> Jack
>




Re: Accessing json fields within CSV file

2017-06-08 Thread Andries Engelbrecht
You can use convert_from and JSON data type.

0: jdbc:drill:> select t.col1, t.col2, t.conv.key1 as key1, t.conv.key2 as 
key2, t.col4 from
. . . . . . . > (select columns[0] as col1 , columns[1] as col2 , 
convert_from(columns[2], 'JSON') as conv  , columns[3] as col4 from 
`/flat/psv-json/json.tbl`) t;
+---+---+-+-+---+
| col1  | col2  |  key1   |  key2   | col4  |
+---+---+-+-+---+
| 1 | xyz   | value1  | value2  | abc   |




If you want to use functions like flatten you will need to make sure the JSON 
is represented as an array.
i.e. [{"key":1, "value": 1},{"key":2, "value":2}]

0: jdbc:drill:> select t.col1, t.col2, t.conv.key as key, t.conv.`value` as 
`value`, t.col4 from
. . . . . . . > (select columns[0] as col1, columns[1]as col2, 
flatten((convert_from(columns[2],'JSON'))) as conv,  columns[3] as col4 from 
`/flat/psv-json/json.tbl`) t;
+---+---+--++---+
| col1  | col2  | key  | value  | col4  |
+---+---+--++---+
| 1 | xyz   | 1| 1  | abc   |
| 1 | xyz   | 2| 2  | abc   |
+---+---+--++---+



--Andries




On 6/8/17, 2:22 AM, "ankit jain"  wrote:

Hi,
I have a few psv file with a few of the columns being a json key value map.
Example:

> 1|xyz|{"key1":"value1", "key2":"value2"}|abc|


I am converting these files to parquet format but want to convert the json
key and values to different columns. How is that possible?

end product being:
id name key1 key2 description
1 xyz value1 value2 abc

Right now am doing something like this but the json column wont explode:

CREATE TABLE dfs.data.`/logs/logsp/`  AS SELECT
> CAST(columns[0] AS INT)  `id`,
> columns[1] AS `name`,
> columns[2] AS `json_column`,
> columns[3] AS `description`,
> from dfs.data.`logs/events.tbl`;


And this is what I get

id name json_column description
1 xyz {"key1":"value1", "key2":"value2"} abc

Thanks in advance,
Ankit Jain




Re: Partitioning for parquet

2017-06-01 Thread Andries Engelbrecht
Sorting the data by the partition column in the CTAS is normally a good plan: 
not only does it sort the output by the most likely filter column, but it also 
limits the number of parquet files, since the data is written as a single stream per 
partition. Without a sort operator, Drill can write data per fragment per 
partition.

And as Jinfeng mentioned metadata caching is very helpful on large data sets.

There is some info available on partition strategies for Drill on parquet to 
optimize performance.

--Andries


On 6/1/17, 6:55 AM, "yousef.l...@gmail.com on behalf of Raz Baluchi" 
 wrote:

I guess there is such a thing as over partitioning...

The query on the table partitioned by date spends most of the elapsed time
on the 'planning' phase, with the execution being roughly equal to the one
on the table partitioned by year and month.

Based on these results, I've added a third table which is partitioned
simply by year. I've also added an ORDER BY to the CTAS in an attempt to
sort the table by date.

This third table seems to have the fastest query times so far with the
least amount of 'planning'.  My take away from this exercise is to limit
the partitioning to the minimum required to obtain parquet files in the
range of 100 MB or so. Is that a valid lesson learned?

On Thu, Jun 1, 2017 at 1:05 AM, Jinfeng Ni  wrote:

> You may want to check if query on the second table is slower because of
> planning time or execution time. That could be determined by looking at 
the
> query profile in web-UI.
>
> Two factors might impact the planning time for second table having 11837:
> 1. Reading parquet metadata from those parquet files.  Parquet metadata
> cache file might help for the cases of large number of small files.
> 2. Filter expression evaluation cost : query second would evaluate
> expression 11837 times, vs just 410 times for first table.
>
> In general, if you have 100M rows in 11837 files, ==> that's about 8500
> rows per file. Performance-wise, this does not seem to be a good choice 
for
> parquet format.
>
>
>
> On Wed, May 31, 2017 at 9:33 PM, Padma Penumarthy 
> wrote:
>
> > Are you running same query on both tables ? What is the filter condition
> ?
> > Since they are partitioned differently, same filter may prune the files
> > differently.
> > If possible, can you share query profiles ?
> > You can check query profiles to see how many rows are being read from
> disk
> > in both cases.
> >
> > Thanks,
> > Padma
> >
> >
> > > On May 31, 2017, at 6:15 PM, Raz Baluchi 
> wrote:
> > >
> > > As an experiment, I created an event file will 100 million entries
> > spanning
> > > 25 years. I then created tables both ways, one partitioned by year and
> > > month and the other by date. The first table created 410 parquet files
> > and
> > > the second 11837.
> > >
> > > Querying the first table is consistently faster by a factor of 2x to
> 10x,
> > >
> > > Is this because drill is not very efficient at querying a large number
> of
> > > small(ish) parquet files?
> > >
> > > On Wed, May 31, 2017 at 6:42 PM, rahul challapalli <
> > > challapallira...@gmail.com> wrote:
> > >
> > >> If most of your queries use date column in the filter condition, I
> would
> > >> partition the data on the date column. Then you can simply say
> > >>
> > >> select * from events where `date` between '2016-11-11' and
> '2017-01-23';
> > >>
> > >> - Rahul
> > >>
> > >> On Wed, May 31, 2017 at 3:22 PM, Raz Baluchi 
> > >> wrote:
> > >>
> > >>> So, if I understand you correctly, I would have to include the 'yr'
> and
> > >>> 'mnth' columns in addition to the 'date' column in the query?
> > >>>
> > >>> e.g.
> > >>>
> > >>> select * from events where yr in (2016, 2017)  and mnth in (11,12,1)
> > and
> > >>> date between '2016-11-11' and '2017-01-23';
> > >>>
> > >>> Is that correct?
> > >>>
> > >>> On Wed, May 31, 2017 at 4:49 PM, rahul challapalli <
> > >>> challapallira...@gmail.com> wrote:
> > >>>
> >  How to partition data is dependent on how you want to access your
> > data.
> > >>> If
> >  you can foresee that most of the queries use year and month, then
> > >>> go-ahead
> >  and partition the data on those 2 columns. You can do that like
> below
> > 
> >  create table partitioned_data partition by (yr, mnth) as select
> >  extract(year from `date`) yr, extract(month from `date`) mnth,
> `date`,
> >   from mydata;
> > 
> > 

Re: Wrong alias name in Window function

2017-03-31 Thread Andries Engelbrecht
I see a similar error in OSX Sierra when using CTTAS. Haven’t looked at Drill 
on Sierra yet, or the /tmp for that matter on OSX for a while, but I noticed it is 
a link to /private/tmp now. Perhaps someone who has tested CTTAS on Sierra can 
give some insight, don’t have time to look into it.

--Andries

On 3/31/17, 2:16 AM, "Arina Yelchiyeva"  wrote:

Hi Amir,

I am not Mac user, so I can't exactly tell you what the problem is. Maybe
other Mac users will recognize this error.
Generally it seems Drill can't create table in dfs.tmp schema. By default
dfs.tmp points to /tmp directory.
Maybe you don't have write access in this directory. Directory for dfs.tmp
can be changed on Web UI in Storage Plugin tab.

Kind regards
Arina


On Thu, Mar 30, 2017 at 11:42 PM, Amir Kafri  wrote:

> Thank you Arina.
>
> I tried your query, and the create query statement fails with the 
following
> stack (on a fresh 1.10 installation, mac), any ideas?
> Regarding my issue, I'm working on re-creating it.
>
> SYSTEM ERROR: URISyntaxException: Relative path in
> absolute URI: wifi-03-30-2017__00:03:49.log
>
>
>
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected
> exception during fragment initialization: Failure while trying to
> check if a table or view with given name [t] already exists in schema
> [dfs.tmp]: java.net.URISyntaxException: Relative path in absolute URI:
> wifi-03-30-2017__00:03:49.log
> org.apache.drill.exec.work.foreman.Foreman.run():298
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (org.apache.drill.common.exceptions.DrillRuntimeException)
> Failure while trying to check if a table or view with given name [t]
> already exists in schema [dfs.tmp]: java.net.URISyntaxException:
> Relative path in absolute URI: wifi-03-30-2017__00:03:49.log
> org.apache.drill.exec.planner.sql.handlers.SqlHandlerUtil.ge
> tTableFromSchema():215
> org.apache.drill.exec.planner.sql.handlers.CreateTableHandle
> r.checkDuplicatedObjectExistence():307
> org.apache.drill.exec.planner.sql.handlers.CreateTableHandle
> r.getPlan():86
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():131
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():79
> org.apache.drill.exec.work.foreman.Foreman.runSQL():1050
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.IllegalArgumentException)
> java.net.URISyntaxException: Relative path in absolute URI:
> wifi-03-30-2017__00:03:49.log
> org.apache.hadoop.fs.Path.initialize():205
> org.apache.hadoop.fs.Path.():171
> org.apache.hadoop.fs.Path.():93
> org.apache.hadoop.fs.Globber.glob():241
> org.apache.hadoop.fs.FileSystem.globStatus():1655
> org.apache.drill.exec.store.dfs.DrillFileSystem.globStatus():548
> org.apache.drill.exec.dotdrill.DotDrillUtil.getDotDrills():61
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$Works
> paceSchema.getTable():493
> org.apache.drill.exec.planner.sql.handlers.SqlHandlerUtil.ge
> tTableFromSchema():212
> org.apache.drill.exec.planner.sql.handlers.CreateTableHandle
> r.checkDuplicatedObjectExistence():307
> org.apache.drill.exec.planner.sql.handlers.CreateTableHandle
> r.getPlan():86
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():131
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():79
> org.apache.drill.exec.work.foreman.Foreman.runSQL():1050
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.net.URISyntaxException) Relative path in absolute
> URI: wifi-03-30-2017__00:03:49.log
> java.net.URI.checkPath():1823
> java.net.URI.():745
> org.apache.hadoop.fs.Path.initialize():202
> org.apache.hadoop.fs.Path.():171
> org.apache.hadoop.fs.Path.():93
> org.apache.hadoop.fs.Globber.glob():241
> org.apache.hadoop.fs.FileSystem.globStatus():1655
> org.apache.drill.exec.store.dfs.DrillFileSystem.globStatus():548
> org.apache.drill.exec.dotdrill.DotDrillUtil.getDotDrills():61
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$Works
> paceSchema.getTable():493
> 

Re: Explain Plan for Parquet data is taking a lot of timre

2017-02-24 Thread Andries Engelbrecht
Looks like the metadata cache is being used  "usedMetadataFile=true, ". But to 
be sure did you perform a REFRESH TABLE METADATA  on the parquet 
data?
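
For example (a sketch; substitute your actual storage plugin name for dfs below, 
since the plugin name is not shown in your query):

refresh table metadata dfs.root.`/testdata`;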


However it looks like it is reading a full batch " rowcount = 32600.0, 
cumulative cost = {32600.0 rows, 32600.0"


Didn't the limit operator get pushed down to the parquet reader in 1.9?

Perhaps try 1.9 and see if in the ParquetGroupScan the number of rows gets 
reduced to 100.


Can you look in the query profile where time is spend, also how long it takes 
before the query starts to run in the WebUI profile.


Best Regards


Andries Engelbrecht


Senior Solutions Architect

MapR Alliances and Channels Engineering


aengelbre...@mapr.com




From: Jinfeng Ni <j...@apache.org>
Sent: Thursday, February 23, 2017 4:53:34 PM
To: user
Subject: Re: Explain Plan for Parquet data is taking a lot of timre

The reason the plan shows only one single parquet file is because
"LIMIT 100" is applied and filter out the rest of them.

Agreed that parquet metadata caching might help reduce planning time,
when there are large number of parquet files.

On Thu, Feb 23, 2017 at 4:44 PM, rahul challapalli
<challapallira...@gmail.com> wrote:
> You said there are 2144 parquet files but the plan suggests that you only
> have a single parquet file. In any case its a long time to plan the query.
> Did you try the metadata caching feature [1]?
>
> Also how many rowgroups and columns are present in the parquet file?
>
> [1] https://drill.apache.org/docs/optimizing-parquet-metadata-reading/
>
> - Rahul
>
> On Thu, Feb 23, 2017 at 4:24 PM, Jeena Vinod <jeena.vi...@oracle.com> wrote:
>
>> Hi,
>>
>>
>>
>> Drill is taking 23 minutes for a simple select * query with limit 100 on
>> 1GB uncompressed parquet data. EXPLAIN PLAN for this query is also taking
>> that long(~23 minutes).
>>
>> Query: select * from .root.`testdata` limit 100;
>>
>> Query  Plan:
>>
>> 00-00Screen : rowType = RecordType(ANY *): rowcount = 100.0,
>> cumulative cost = {32810.0 rows, 33110.0 cpu, 0.0 io, 0.0 network, 0.0
>> memory}, id = 1429
>>
>> 00-01  Project(*=[$0]) : rowType = RecordType(ANY *): rowcount =
>> 100.0, cumulative cost = {32800.0 rows, 33100.0 cpu, 0.0 io, 0.0 network,
>> 0.0 memory}, id = 1428
>>
>> 00-02SelectionVectorRemover : rowType = (DrillRecordRow[*]):
>> rowcount = 100.0, cumulative cost = {32800.0 rows, 33100.0 cpu, 0.0 io, 0.0
>> network, 0.0 memory}, id = 1427
>>
>> 00-03  Limit(fetch=[100]) : rowType = (DrillRecordRow[*]):
>> rowcount = 100.0, cumulative cost = {32700.0 rows, 33000.0 cpu, 0.0 io, 0.0
>> network, 0.0 memory}, id = 1426
>>
>> 00-04Scan(groupscan=[ParquetGroupScan
>> [entries=[ReadEntryWithPath [path=/testdata/part-r-0-
>> 097f7399-7bfb-4e93-b883-3348655fc658.parquet]], selectionRoot=/testdata,
>> numFiles=1, usedMetadataFile=true, cacheFileRoot=/testdata,
>> columns=[`*`]]]) : rowType = (DrillRecordRow[*]): rowcount = 32600.0,
>> cumulative cost = {32600.0 rows, 32600.0 cpu, 0.0 io, 0.0 network, 0.0
>> memory}, id = 1425
>>
>>
>>
>> I am using Drill1.8 and it is setup on 5 node 32GB cluster and the data is
>> in Oracle Storage Cloud Service. When I run the same query on 1GB TSV file
>> in this location it is taking only 38 seconds .
>>
>> Also testdata contains around 2144 .parquet files each around 500KB.
>>
>>
>>
>> Is there any additional configuration required for parquet?
>>
>> Kindly suggest how to improve the response time here.
>>
>>
>>
>> Regards
>> Jeena
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>


Re: Storage Plugin for accessing Hive ORC Table from Drill

2017-01-20 Thread Andries Engelbrecht
-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.planner.logical.DrillScanRel.computeSelfCost(DrillScanRel.java:159)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:165)
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
... 42 common frames omitted
Caused by: java.io.IOException: Failed to get numRows from HiveTable
at
org.apache.drill.exec.store.hive.HiveMetadataProvider.getStats(HiveMetadataProvider.java:113)
~[drill-storage-hive-core-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.store.hive.HiveScan.getScanStats(HiveScan.java:224)
~[drill-storage-hive-core-1.9.0.jar:1.9.0]
... 45 common frames omitted
Caused by: java.lang.RuntimeException: serious problem
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.store.hive.HiveMetadataProvider$1.run(HiveMetadataProvider.java:253)
~[drill-storage-hive-core-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.store.hive.HiveMetadataProvider$1.run(HiveMetadataProvider.java:241)
~[drill-storage-hive-core-1.9.0.jar:1.9.0]
at java.security.AccessController.doPrivileged(Native Method)
~[na:1.8.0_72]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_72]
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
~[hadoop-common-2.7.1.jar:na]
at
org.apache.drill.exec.store.hive.HiveMetadataProvider.splitInputWithUGI(HiveMetadataProvider.java:241)
~[drill-storage-hive-core-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.store.hive.HiveMetadataProvider.getPartitionInputSplits(HiveMetadataProvider.java:142)
~[drill-storage-hive-core-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.store.hive.HiveMetadataProvider.getStats(HiveMetadataProvider.java:105)
~[drill-storage-hive-core-1.9.0.jar:1.9.0]
... 46 common frames omitted
Caused by: java.util.concurrent.ExecutionException:
java.lang.NumberFormatException: For input string: "004_"
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
~[na:1.8.0_72]
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
~[na:1.8.0_72]
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
... 55 common frames omitted
Caused by: java.lang.NumberFormatException: For input string: "004_"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
~[na:1.8.0_72]
at java.lang.Long.parseLong(Long.java:589) ~[na:1.8.0_72]
at java.lang.Long.parseLong(Long.java:631) ~[na:1.8.0_72]
at
org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:310)
~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
at
org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:379)
~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[na:1.8.0_72]
... 3 common frames omitted




Regards,
*Anup Tiwari*

On Thu, Jan 19, 2017 at 9:18 PM, Andries Engelbrecht <aengelbre...@mapr.com>
wrote:

> I have not seen issues reading Hive ORC data with Drill.
>
>
> What is the DDL for the table in Hive?
>
>
> --Andries
>
> 
> From: Anup Tiwari <anup.tiw...@games24x7.com>
> Sent: Thursday, January 19, 2017 12:49:20 AM
> To: user@drill.apache.org
> Cc: d...@drill.apache.org
> Subject: Re: Storage Plugin for accessing Hive ORC Table from Drill
>
> We have created a ORC format table in hive and we were trying to read it in
> drill through hive plugin, but it is giving us error. But with same hive
> plugin, we are able to read parquet table created in hive.
>
> So after searching a bit, i found a drill documentation link
> <https://drill.apache.org/docs/apache-drill-contribution-ideas/> which
> says
> that we have to create custom storage plugin to read ORC format tables. So
> can you tell me how to create custom storage plugin in this case?
>
>
>
> Regards,
> *Anup Tiwari*
>
> On Thu, Jan 19, 2017 at 1:55 PM, Nitin Pawar <nitinpawar...@gmail.com>
> wrote:
>
> > you want to use the ORC files created by hive directly in drill or you
> want
> > to use them through hive?
> >
> > On Thu, Jan 19, 2017 at 1:40 PM, Anup Tiwari <anup.tiw...@games24x7.com>
> > wrote:
> >
> > > +Dev
> > >

Re: Storage Plugin for accessing Hive ORC Table from Drill

2017-01-19 Thread Andries Engelbrecht
I have not seen issues reading Hive ORC data with Drill.


What is the DDL for the table in Hive?


--Andries


From: Anup Tiwari 
Sent: Thursday, January 19, 2017 12:49:20 AM
To: user@drill.apache.org
Cc: d...@drill.apache.org
Subject: Re: Storage Plugin for accessing Hive ORC Table from Drill

We have created a ORC format table in hive and we were trying to read it in
drill through hive plugin, but it is giving us error. But with same hive
plugin, we are able to read parquet table created in hive.

So after searching a bit, i found a drill documentation link
 which says
that we have to create custom storage plugin to read ORC format tables. So
can you tell me how to create custom storage plugin in this case?



Regards,
*Anup Tiwari*

On Thu, Jan 19, 2017 at 1:55 PM, Nitin Pawar 
wrote:

> you want to use the ORC files created by hive directly in drill or you want
> to use them through hive?
>
> On Thu, Jan 19, 2017 at 1:40 PM, Anup Tiwari 
> wrote:
>
> > +Dev
> >
> > Can someone help me in this?
> >
> > Regards,
> > *Anup Tiwari*
> >
> > On Sun, Jan 15, 2017 at 2:21 PM, Anup Tiwari 
> > wrote:
> >
> > > Hi Team,
> > >
> > > Can someone tell me how to configure custom storage plugin in Drill for
> > > accessing hive ORC tables?
> > >
> > > Thanks in advance!!
> > >
> > > Regards,
> > > *Anup Tiwari*
> > >
> >
>
>
>
> --
> Nitin Pawar
>


Re: Issue Drill querying S3 recursive

2017-01-12 Thread Andries Engelbrecht
You may want to upgrade to the latest Drill version (1.9).


Also what is the storage plugin called dfs, fs,...?


If it is called dfs then you can thry the following.


use dfs;

show files;


This will show you the files and directories in the plugin root, you can 
navigate from there.


If it shows you t as a directory you can then see which files are in t;


show files in `/t`;


See if the csv file shows up, if so this should work.


select * from dfs.`/t`;



--Andries


From: Magesh Waran 
Sent: Thursday, January 12, 2017 6:31:22 AM
To: user@drill.apache.org
Subject: Issue Drill querying S3 recursive

Hi Team,

I am trying to query a file under folder 't/atms-csv.csv', which I can 
do successfully.

[Inline image 1]


There is another file in that location which has additional data from another 
day (both files have the same column model). When I try to query recursively using 
folder name 't', I get the error below:
[Inline image 2]
Tried these queries (none of them worked):
1. select * from `t/`;
2. select * from `/t/`;
3. select * from dfs.`t/`;
4. select * from dfs`/t/`;
5. select * from fs.`t/`;
6. select * from fs`/t/`;


I am using below versions,
Drill - 1.6.0
S3 protool: S3n (also tried S3, S3a)
Sample Query String:
 {
 "connection": "s3a://x-bucket",
  "config": {
"fs.s3a.access.key": "",
"fs.s3a.secret.key": ""
  },
  "workspaces": {
"root": {
  "location": "/",
  "writable": true,
  "defaultInputFormat": null
}
}


Thanks & Regards
Mageshwaran M
Contact: +91-9944667832


Re: exception when connecting via jdbc i.e. squirrell windows embedded mode

2016-12-16 Thread Andries Engelbrecht
Connect directly to the drillbit when using an embedded drillbit. 

URL will be something like this

jdbc:drill:drillbit=localhost
or
jdbc:drill:drillbit=:31010


Make sure to add the JDBC driver package jar(s) to the CP of the application.


--Andries




> On Dec 16, 2016, at 12:48 AM, ignazio deiudicibus  
> wrote:
> 
> Hi all,
> 
> I've just installed drill as described here:
> 
> 
> https://drill.apache.org/docs/drill-in-10-minutes/
> 
> and started drill using
> sqlline.bat -u "jdbc:drill:zk=local"
> 
> everything is ok, I can query.
> But, I can't connect via jdbc
> 
> here it is the exception
> 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> java.sql.SQLException: Failure in connecting to Drill:
> oadd.org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for
> client.
> 
> ..
> 
> 
> Caused by: java.io.IOException: Failure to connect to the zookeeper cluster
> service within the allotted time of 1 milliseconds.
> at
> oadd.org.apache.drill.exec.coord.zk.ZKClusterCoordinator.start(ZKClusterCoordinator.java:123)
> at
> oadd.org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:243)
> 
> 
> I use this jdbc string:
> 
> jdbc:drill:zk=local:2181
> 
> tried also
> 
> jdbc:drill:zk=localhost:2181
> 
> jdbc:drill:zk=127.0.0.1:2181
> 
> drill-override is:
> 
> drill.exec: {
>  cluster-id: "drillbits1",
>  zk.connect: "locahost:2181"
> }
> 
> what's wrong with it?
> 
> Thank a lot
> regards
> I
> 
> -- 
> ignazio de iudicibus
> mobile +39 3280842455
> 
> 



Re: cast not working in adding two columns

2016-12-15 Thread Andries Engelbrecht
What are you trying to achieve?
It seems you are trying to add an integer to a string column numerically.

Can you actually cast the name column to integers?

--Andries

> On Dec 15, 2016, at 10:05 AM, Sanjiv Kumar  wrote:
> 
> Hello
>   I am using drill 1.9 version in embedded mode and in Window 10.
> My question is when i  am adding two integer column, its working fine.
> e.g:- select P.`costprice` + P.`initialprice` from
> testplugin.dbo.testtable P;
> NOTE:- both columns costprice and initialprice are integer.
> 
> But when I am adding an integer column with a varchar column, then it is not working.
>   e.g:-  select P.`costprice` + P.`Name` from testplugin.dbo.testtable P;
>   or
>   select P.`costprice` + cast(P.`Name` as int) from
> testplugin.dbo.testtable P;
> Both are not working.Please suggest some way how to add two different
> datatype columns.
> 
> 
> 
> 
> 
> 
> 
> -- 
> Thanks & Regards.
> Sanjiv Kumar.


Re: Connecting Tableau 10.1 to Drill cluster issue

2016-12-01 Thread Andries Engelbrecht
There is an issue with Tableau 10.1 and the generic ODBC connection to Drill.

The workaround is to start Tableau 10.1 with the following parameter
-DProtocolServerReconnect=1


--Andries



> On Dec 1, 2016, at 3:55 AM, Tomislav Novosel  
> wrote:
> 
> Dear team,
> 
> I am having issue with connecting Tableau 10.1 to Apache Drill (1.8.0) 6 node 
> cluster. I installed
> MapR ODBC Driver 32-bit on my Windows 64-bit machine, configured driver and 
> test connection
> with ZooKeeper quorum and without quorum (direct connection). Connection 
> succeeded and Drill Explorer
> connection succeeded.
> 
> The problem is connecting Tableau to Drill cluster. After selecting ODBC 
> connection from Tableau and providing all
> the connection data for ZK quorum or direct drillbit connection, Tableau is 
> raising error:
> 
> "
> # The protocol is disconnected!
> # Unable to connect using the DSN named "MapR Drill". Check that the DSN 
> exists and is a valid connection.
> "
> 
> DSN name exists and works well. Tableau TDC file is also installed. ZK 
> hostnames and all the other Drill cluster hostnames, including their IP 
> addresses are saved in drivers/etc/hosts file on local machine. Drill cluster 
> is running
> smoothly.
> 
> I will appreciate any help.
> 
> Regards,
> Tomislav
> 
> -- 
> 
> 
> *- Disclaimer -*
> 
> *This e-mail message and its attachments may contain privileged and/or 
> confidential information. Please do not read the message if you are not its 
> designated recipient. If you have received this message by mistake, please 
> inform its sender and destroy the original message and its attachments 
> without reading or storing of any kind. Any unauthorized use, distribution, 
> reproduction or publication of this message is forbidden. Poslovna 
> inteligencija d.o.o. is neither responsible for the contents of this message, 
> nor for the consequences arising from actions based on the forwarded 
> information, nor do opinions contained within this message necessarily 
> reflect the official opinions of Poslovna inteligencija d.o.o. Considering 
> the lack of complete security of e-mail communication, Poslovna inteligencija 
> d.o.o. is not responsible for the potential damage created due to infection 
> of an e-mail message with a virus or other malicious program, unauthorized 
> interference, erroneous or delayed delivery of the message due to technical 
> problems. Poslovna inteligencija d.o.o. reserves the right to supervise and 
> store both incoming and outgoing  e-mail messages.*



Re: Unable to connect Tableau 9.2 to Drill cluster using zookeeper quorum

2016-12-01 Thread Andries Engelbrecht
When using ZK connection string with either JDBC or ODBC always make sure that 
the hostnames can be resolved.

See http://drill.apache.org/docs/odbc-configuration-reference/ 


Also make sure that hostnames can be resolved for all Drillbit nodes.

A short explanation of the mechanics.

- When the drillbits start up they register with ZK
- ZK keeps the list of available drillbits that registered and HOSTNAMES of the 
nodes (not IPs) - use zkCli.sh to check the registered nodes and actual 
resolution with GET in ZK
- When the the client uses ODBC or JDBC to connect to ZK it gets the list of 
available drillbits and chooses one drillbit to connect to by using the HOSTNAME
- The client then tries to connect to the chosen drillbit using the ZK info 
(which is the hostname)

If the client is unable to resolve the hostname of the drillbit that was chosen 
it will fail.

Using ZK connection allows for HA in terms of new connections as well as 
balances the connection management and foreman duties for multiple connections 
by spreading it over the registered drillbits. Connecting directly to a 
drillbit is good for testing purposes, but not ideal for larger scale and 
production environments.


--Andries


> On Dec 1, 2016, at 5:01 AM, Anup Tiwari  wrote:
> 
> Hi Team,
> 
> I am trying to connect to my drill cluster from tableau using MapR Drill
> ODBC Driver.
> 
> I followed steps given in
> https://drill.apache.org/docs/using-apache-drill-with-tableau-9-server/ and
> subsequent links and successfully connected to individual "direct drillbit"
> reading docs. But when i am trying to connect to "zookeeper quorum" instead
> of "direct drillbit", i am getting below error on MapR interface :
> 
> FAILED!
> [MapR][Drill] (1010) Error occurred while trying to connect: [MapR][Drill]
> (20) The hostname of '10.x.x.x' cannot be resolved. Please check your DNS
> setup or connect directly to Drillbit.
> 
> Please note that since i am giving directly IP(Drill Hosts which are on
> AWS) so i believe i don't have to maintain DNS entries in host file.
> 
> Also corresponding zookeeper logs are as follows :-
> 
> 2016-12-01 18:08:42,541 [myid:3] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection
> from /192.*.*.*:53159
> 2016-12-01 18:08:42,543 [myid:3] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@854] - Connection request from old
> client /192.*.*.*:53159; will be dropped if server is in r-o mode
> 2016-12-01 18:08:42,543 [myid:3] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish
> new session at /192.*.*.*:53159
> 2016-12-01 18:08:42,546 [myid:3] - INFO
> [CommitProcessor:3:ZooKeeperServer@645] - Established session
> 0x358ba2951720006 with negotiated timeout 3 for client /192.*.*.*:53159
> 2016-12-01 18:08:42,793 [myid:3] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x358ba2951720006, likely client has closed socket
>at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
>at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
>at java.lang.Thread.run(Thread.java:745)
> 2016-12-01 18:08:42,794 [myid:3] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
> client /192.*.*.*:53159 which had sessionid 0x358ba2951720006
> 2016-12-01 18:08:42,795 [myid:3] - ERROR
> [CommitProcessor:3:NIOServerCnxn@178] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
>at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>at
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
>at
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
>at
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
>at
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:77)
> 
> 
> I have gone through this link but this doesn't helped me :-
> http://stackoverflow.com/questions/30940981/zookeeper-error-cannot-open-channel-to-x-at-election-address
> 
> Regards,
> *Anup Tiwari*



Re: Setting up HDFS access on EMR

2016-10-13 Thread Andries Engelbrecht
See

http://drill.apache.org/docs/file-system-storage-plugin/ 


"connection": "hdfs://:/"


As Ted stated you need to point to the name node for HDFS.


--Andries


> On Oct 13, 2016, at 5:04 AM, David Kincaid  wrote:
> 
> Thanks, Ted. The full URL I was using was http://. I'll give your
> suggestion a try when I'm able to work on this again tonight. I guess I
> took the documentation too literally when it said "To query a file on HDFS
> from a node on the cluster, you can simply change the connection from
> file:/// to hdfs:// in the dfs storage plugin."
> Thanks again,
> Dave
> 
> 
> On Thu, Oct 13, 2016 at 12:39 AM, Ted Dunning  wrote:
> 
>> What is the full URL you used?
>> 
>> With hdfs://, you need to supply a name node address.
>> 
>> With file://, you don't.
>> 
>> Contrarily, with maprfs:// you don't need an address since it is implied in
>> the client connection.
>> 
>> 
>> 
>> On Wed, Oct 12, 2016 at 6:29 PM, David Kincaid 
>> wrote:
>> 
>>> I have an Amazon EMR cluster launched with Drill loaded. I'm trying to
>>> configure the dfs storage plugin to use HDFS. The docs say that I should
>>> simply need to change the "connection" setting from "file:///" to
>> "hdfs://"
>>> in order to use HDFS on the cluster that Drill is running on. However,
>> when
>>> I do this and try to run a query I get an error that says
>>> "org.apache.drill.common.exceptions.UserRemoteException:
>>> SYSTEM ERROR: URISyntaxException: Expected authority at index 7: hdfs://
>>> [Error Id: f9e6c674-4dd7-4c5d-b9a8-95b64b9dbaa3"
>>> 
>>> Am I doing something wrong or is there an issue here?
>>> 
>>> Thanks,
>>> 
>>> Dave
>>> 
>> 



Re: Error parsing JSON

2016-10-12 Thread Andries Engelbrecht
Look into the JSON data model
http://drill.apache.org/docs/json-data-model/ 


You can set the union type to true for the session and then use the typeof 
function to determine the data types of the fields. Perhaps do a group by typeof.
exec.enable_union_type

Do note that the all text mode setting needs to be turned off before working on this.
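
As a rough illustration (the field name and file path are placeholders - substitute a
field and file from your own data), something along these lines shows which types
actually occur in a column:

ALTER SESSION SET `store.json.all_text_mode` = false;
ALTER SESSION SET `exec.enable_union_type` = true;

SELECT typeof(t.some_field) AS field_type, count(*) AS cnt
FROM dfs.`/path/to/test.json` t
GROUP BY typeof(t.some_field);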


--Andries


> On Oct 12, 2016, at 11:17 AM, Dan Blondowski  
> wrote:
> 
> Yes, I understand that part.
> I wasn’t clear enough.  By trial and error, I meant - me trying to find
> which field was bad.   I’m going to ask the guys that provide the source
> file, to make sure data types for each field are consistent across records.
> 
> 
> 
> 
> 
> On 10/12/16, 12:53 PM, "Khurram Faraaz"  wrote:
> 
>> Since your JSON file has different types of data (line int, boolean,
>> strings etc) we need to tell Drill to consider all data in that JSON file
>> as text. This is by design, and this is not trial and error.
>> Such occurrence of different types of data in a JSON file is called
>> SchemaChange and to avoid that we set `store.json.all_text_mode` = true
>> 
>> On Wed, Oct 12, 2016 at 9:31 PM, Dan Blondowski <
>> dan.blondow...@dhigroupinc.com> wrote:
>> 
>>> Hello.  You were correct.
>>> I was able to set `store.json.all_text_mode` = true; and run the query.
>>> 
>>> I also found the data that needed changing.  There were actually 3
>>> fields.
>>> 
>>> Is there a way for Drill to display which field is bad, so I don't have
>>> to
>>> do the trial & error?
>>> 
>>> 
>>> 
>>> 
>>> On 10/12/16, 10:12 AM, "Abhishek Girish" 
>>> wrote:
>>> 
 Hey Dan,
 
 This usually happens when there is a schema change across records. It
 could
 be intentional (records do need different types for some fields) or bad
 formatting (1 vs "1").
 
 Can you try setting this session option and retry your query? Let us
>>> know
 if it helps. You could attempt to use explicit casts in the query to
>>> get
 types you want.
 
 set `store.json.all_text_mode` = true;
 
 Regards,
 Abhishek
 
 On Wednesday, October 12, 2016, Dan Blondowski <
 dan.blondow...@dhigroupinc.com> wrote:
 
> Hello.
> 
> I have a json file with 2 records (see attached).
> 
> When I run a query against it I get this error:
> 
> *Error: DATA_READ ERROR: Error parsing JSON - You tried to write a
> BigInt
> type when you are using a ValueWriter of type
> NullableVarCharWriterImpl.*
> 
> 
> *File  /test.json*
> 
> *Record  2*
> 
> *Fragment 0:0*
> 
> 
> *[Error Id: 8b5166cc-28b6-488c-893f-f0265d483e13 on
> ip-10-3-48-183:31010]
> (state=,code=0)*
> 
> If I break it up into individual files, I can run the query against
>>> both
> records just fine. Any clues?
> 
> Thanks!
> 
> Daniel Blondowski
> 
> Big Data Architecture/Engineering
> DHI Group, Inc.
> 
> dan.blondow...@dice.com
> 
> 515-313-2137
> 
> 
> --
> This email has been scanned for email related threats by Mimecast
> 
> --
> 
>>> 
>>> ---
>>> This email has been scanned for email related threats and delivered
>>> safely by Mimecast.
>>> For more information please visit http://www.mimecast.com
>>> 
>>> ---
>>> 
> ---
> This email has been scanned for email related threats and delivered safely by 
> Mimecast.
> For more information please visit http://www.mimecast.com
> ---



Re: storage plugin for simple web server

2016-10-10 Thread Andries Engelbrecht
Can you do a NFS connection to the webserver?

Then maybe just use a local fs storage plugin with the NFS mount as the 
workspace.

I have not tried it myself, but it may be an option to test in your case.
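
As a rough sketch of that approach, assuming the web server folder is NFS-mounted at
/mnt/webcsv (a hypothetical path), the workspace in a file-based storage plugin could
look like:

"webcsv": {
  "location": "/mnt/webcsv",
  "writable": false,
  "defaultInputFormat": "csv"
}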

--Andries


> On Oct 7, 2016, at 11:39 AM, Di Pe  wrote:
> 
> Hi,
> 
> I have a couple of 100 csv files on a web server that I can just pull down
> via https without any credentials, I wonder how I can write a storage
> plugin for drill that pull these files directly from the web web server
> without having to download them to the local file system.
> 
> I have a couple of options:
> 
> 1) the plugin could just do to a simple http directory listing to get these
> files
> 2) I could provide a text file with the urls of the files, simply like
>https://mywebserver.com/myfolder/myfile1.csv
>https://mywebserver.com/myfolder/myfile2.csv
> 3) the web server supports json file listing like this
>curl -s https://mywebserver.com/myfolder?format=json | python -m
> json.tool
> [
>{
>"hash": "e5f62378c79ec9c491aa130374dba93b",
>"last_modified": "2016-09-30T19:15:45.730950",
>"bytes": 211169,
>"name": "myfile1.csv",
>"content_type": "text/csv"
>},
>{
> 
> Option 3 would be the most elegant to me
> 
> 
> does something like this already exist or would I duplicate the s3 plugin
> and modify it?
> 
> like this ?
> 
> Thanks for your help!
> dipe
> 
> 
> {
>  "type": "file",
>  "enabled": true,
>  "connection": "https://mywebserver.com/myfolder?format=json;,
>  "config": null,
>  "workspaces": {
>"root": {
>  "location": "/",
>  "writable": false,
>  "defaultInputFormat": null
>},
>"tmp": {
>  "location": "/tmp",
>  "writable": true,
>  "defaultInputFormat": null
>}
>  },
>  "formats": {
>"psv": {
>  "type": "text",
>  "extensions": [
>"tbl"
>  ],
>  "delimiter": "|"
>},
>"csv": {
>  "type": "text",
>  "extensions": [
>"csv"
>  ],
>  "delimiter": ","
>},
>"tsv": {
>  "type": "text",
>  "extensions": [
>"tsv"
>  ],
>  "delimiter": "\t"
>},
>"parquet": {
>  "type": "parquet"
>},
>"json": {
>  "type": "json",
>  "extensions": [
>"json"
>  ]
>},
>"avro": {
>  "type": "avro"
>},
>"sequencefile": {
>  "type": "sequencefile",
>  "extensions": [
>"seq"
>  ]
>},
>"csvh": {
>  "type": "text",
>  "extensions": [
>"csvh"
>  ],
>  "extractHeader": true,
>  "delimiter": ","
>}
>  }
> }



Re: IN operator can take how many inputs ?

2016-10-10 Thread Andries Engelbrecht
Here is the link to upgrading Drill on MapR 5.1
http://maprdocs.mapr.com/51/Drill/upgrading_drill_5.1.html 


Drill 1.8 is supported on MapR 5.1
http://maprdocs.mapr.com/home/InteropMatrix/r_eco_matrix.html 
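
For reference, once you are on a Drill version that has the option (it is not present in
Drill 1.6, as noted further down this thread), the IN-list threshold can be raised per
session; the value 100 below is only an example:

ALTER SESSION SET `planner.in_subquery_threshold` = 100;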


--Andries


> On Oct 8, 2016, at 1:35 AM, Nicolas Paris  wrote:
> 
> On Sat, Oct 8, 2016 at 05:55, Jinfeng Ni  > wrote:
> 
>> With larger value for such option, a big IN-list may not be converted
>> into a subquery. As such, the query performance might be suboptimal.
>> 
> 
> My use case is a predicate push-down on an external jdbc database.
> 
> 
>> On Fri, Oct 7, 2016 at 6:54 PM, Gautam Parai > > wrote:
>>> Yes, this was fixed in Drill 1.8 - please let us know if this does not
>> work
>>> in 1.8. As Jinfeng mentioned earlier, the option can be set to a very
>> large
>>> value (LONG type)
>> 
> 
> I am not sure I am able to upgrade drill version on mapr 5.1 distribution.
> I didn't find any documentation. Have you got such ?
> 
> 
>>> Gautam
>>> 
>>> On Fri, Oct 7, 2016 at 2:29 AM, Tushar Pathare 
>> wrote:
>>> 
 Hello,
 
 I think it is fixed in 1.8 .try it on
 
 Get Outlook for iOS
 
 
 
 
 On Fri, Oct 7, 2016 at 12:18 PM +0300, "Nicolas Paris" <
 nipari...@gmail.com> wrote:
 
 Hi,
 
 I am in 1.6 Drill version with mapr distribution 5.1.
 I get this error :
 Error: VALIDATION ERROR: The option 'planner.in_subquery_threshold' does
 not exist
 
 
 On Mon, Oct 3, 2016 at 17:18, Jinfeng Ni  wrote:
 
> You can modify option `planner.in_subquery_threshold`. By default,
> it's set to be 20. That's the threshold when planner decides to
> convert IN-list to a subquery.
> 
> select * from sys.options where name like '%in_subquery%';
> 
> +--------------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> |              name              | kind  |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +--------------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | planner.in_subquery_threshold  | LONG  | SYSTEM  | DEFAULT  | 20       | null        | null      | null       |
> +--------------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> 
> On Sun, Oct 2, 2016 at 12:56 AM, Tushar Pathare 
> wrote:
>> Hello Team,
>> 
>> A select clause with IN operator creates a issue if the count of
>> goes
> beyond 19 for params.Is this a tunable or this is a drawback.
>> If the number of params is made less that 19 the same select
>> statement
> works
>> 
>> Select something……
>> 
>> IN ( '94479 ', '296979 ', '219579 ', '109179 ', '97179 ', '223179 ',
> '96279 ', '224979 ', '282879 ', '33279 ', '277179 ', '177879 ',
>> '272049
 ',
> '49179 ', '104049 ','177879 ', '272049 ', '49179 ', '104049 ',
>> '104049 ')
>> 
>> Error thrown
>> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ
 ERROR:
> The JDBC storage plugin failed while trying setup the SQL query
>> 
>> 
>> 
>> 
>> Tushar B Pathare
>> High Performance Computing (HPC) Administrator
>> General Parallel File System
>> Scientific Computing
>> Bioinformatics Division
>> Research
>> 
>> Sidra Medical and Research Centre
>> Sidra OPC Building
>> PO Box 26999  |  Doha, Qatar
>> Near QNCC,5th Floor
>> Office 4003  ext 37443 | M +974 74793547 <+974%207479%203547>
>> <+974%207479%203547>
>> tpath...@sidra.org | www.sidra.org <
 http://www.sidra.org/>
>> 
>> 
>> 
>> 
>> 
>> 
>> Disclaimer: This email and its attachments may be confidential and
>> are
> intended solely for the use of the individual to whom it is
>> addressed. If
> you are not the intended recipient, any reading, printing, storage,
> disclosure, copying or any other action taken in respect of this
>> e-mail
 is
> prohibited and may be unlawful. If you are not the intended recipient,
> please notify the sender immediately by using the reply function and
>> then
> permanently delete what you have received. Any views or opinions
 expressed
> are solely those of the author and do not necessarily represent those
>> of
> Sidra Medical and Research Center.
> 
 Disclaimer: This email and its attachments may be confidential and are
 intended solely for the use of the 

Re: Excluding HDFS .tmp file from multi-file query?

2016-09-22 Thread Andries Engelbrecht
I noticed if you specifically use * for file matching it will still read hidden 
files. However if you only point Drill at a directory it will read the 
directory and sub structure without reading any hidden files.

select * from `/dir1/*`  - will read hidden files
select * from `/dir1` will not read hidden files

So it depends on whether you need to use file name pattern matching or not. Most of the 
time it is a good idea not to mix different data in the same directory 
structure, but rather to use directory structures to separate different data 
and types, as that makes matching and management easier down the road.

At any rate you found a solution for your needs. I have not looked into whether there 
are exclusion parameters for pattern matching.

--Andries

> On Sep 22, 2016, at 4:11 AM, Robin Moffatt <robin.moff...@rittmanmead.com> 
> wrote:
> 
> Hi,
> 
> It still tried to read it, even with a . prefix:
> 
> 0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
> table(`hdfs`.`/user/flume/incoming/twitter/2016/09/22/*`(type => 'json'));
> Error: DATA_READ ERROR: Failure reading JSON file - Cannot obtain block
> length for
> LocatedBlock{BP-478416316-192.168.10.112-1466151126376:blk_1074005711_265071;
> getBlockSize()=39945; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[192.168.10.115:50010,DS-a0e97909-3d40-4f49-b67f-636e9f10928a,DISK],
> DatanodeInfoWithStorage[192.168.10.114:50010,DS-6c2cd5a6-22c5-4445-9018-ca0f2549a6cf,DISK],
> DatanodeInfoWithStorage[192.168.10.117:50010
> ,DS-70946f9d-95d9-4f35-b19d-97b8dc01cb88,DISK]]}
> 
> File  /user/flume/incoming/twitter/2016/09/22/.FlumeData.1474530954642.tmp
> Record  1
> Fragment 0:0
> 
> [Error Id: 04d44b5d-6d02-4062-9f4a-ebf3831d9ba1 on
> cdh57-01-node-01.moffatt.me:31010] (state=,code=0)
> 
> 
> ​However - good call on the Flume hdfs.inUsePrefix - by using that I can
> then set my Drill file pattern sufficiently so that it doesn't match on the
> in-use file:
> 
> 0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
> table(`hdfs`.`/user/flume/incoming/twitter/2016/09/22/Flume*`(type =>
> 'json'));
> +-+
> | EXPR$0  |
> +-+
> | 12652   |
> +-+
> 1 row selected (6.34 seconds)
> 
> 
> So problem sidestepped for now - but would be good to understand if I
> couldn't modify the producer, if there's a way to get Drill to omit certain
> patterns from its file matching?
> 
> thanks, Robin.​
> 
> 
> On 21 September 2016 at 23:15, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
> 
>> Add a . prefix to the Flume temp files. Drill will ignore the hidden files
>> when you query the directory structure.
>> 
>> --Andries
>> 
>>> On Sep 21, 2016, at 2:36 PM, Robin Moffatt <
>> robin.moff...@rittmanmead.com> wrote:
>>> 
>>> Hi,
>>> I have a stream of data from Flume landing in HDFS in files of a set
>> size.
>>> I can query these files individually just fine, and across multiple ones
>>> too - except if the wildcard encompasses the *currently open HDFS file
>> that
>>> Flume is writing to*. When this happens, Drill understandably barfs.
>>> 
>>> 0: jdbc:drill:drillbit=localhost> show files in
>>> `hdfs`.`/user/flume/incoming/twitter/2016/09/21/`;
>>> +--+--+-+---
>> ---++-+--+--
>> +--+
>>> | name | isDirectory  | isFile  |  length  |
>> owner
>>> |group| permissions  |accessTime|
>>> modificationTime |
>>> +--+--+-+---
>> ---++-+--+--
>> +--+
>>> [...]
>>> | FlumeData.1474467815652  | false| true| 1055490  |
>> flume
>>> | supergroup  | rw-r--r--| 2016-09-21 21:52:07.219  | 2016-09-21
>>> 21:58:58.28   |
>>> | FlumeData.1474467815653  | false| true| 1050470  |
>> flume
>>> | supergroup  | rw-r--r--| 2016-09-21 21:58:58.556  | 2016-09-21
>>> 22:06:28.636  |
>>> | FlumeData.1474467815654  | false| true| 1051043  |
>> flume
>>> | supergroup  | rw-r--r--| 2016-09-21 22:06:29.564  | 2016-09-21
>>> 22:13:40.808  |
>>> | FlumeData.1474467815655  | false| true| 1052657  |
>> flume
>>> | supergroup  | rw-r--r--| 2016-09-21 22:13:40.978  | 2016-09-21
>>> 22:23:00.409  |
>>> | FlumeData.1474467815656.tmp  | false   

Re: Excluding HDFS .tmp file from multi-file query?

2016-09-21 Thread Andries Engelbrecht
Add a . prefix to the Flume temp files. Drill will ignore the hidden files when 
you query the directory structure.
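
For example, in the Flume agent configuration something along these lines should do it
(the agent and sink names are placeholders for whatever yours are called):

agent.sinks.hdfs-sink.hdfs.inUsePrefix = .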

--Andries

> On Sep 21, 2016, at 2:36 PM, Robin Moffatt  
> wrote:
> 
> Hi,
> I have a stream of data from Flume landing in HDFS in files of a set size.
> I can query these files individually just fine, and across multiple ones
> too - except if the wildcard encompasses the *currently open HDFS file that
> Flume is writing to*. When this happens, Drill understandably barfs.
> 
> 0: jdbc:drill:drillbit=localhost> show files in
> `hdfs`.`/user/flume/incoming/twitter/2016/09/21/`;
> +------------------------------+--------------+---------+----------+--------+-------------+--------------+--------------------------+--------------------------+
> |             name             | isDirectory  | isFile  |  length  | owner  |    group    | permissions  |        accessTime        |     modificationTime     |
> +------------------------------+--------------+---------+----------+--------+-------------+--------------+--------------------------+--------------------------+
> [...]
> | FlumeData.1474467815652      | false        | true    | 1055490  | flume  | supergroup  | rw-r--r--    | 2016-09-21 21:52:07.219  | 2016-09-21 21:58:58.28   |
> | FlumeData.1474467815653      | false        | true    | 1050470  | flume  | supergroup  | rw-r--r--    | 2016-09-21 21:58:58.556  | 2016-09-21 22:06:28.636  |
> | FlumeData.1474467815654      | false        | true    | 1051043  | flume  | supergroup  | rw-r--r--    | 2016-09-21 22:06:29.564  | 2016-09-21 22:13:40.808  |
> | FlumeData.1474467815655      | false        | true    | 1052657  | flume  | supergroup  | rw-r--r--    | 2016-09-21 22:13:40.978  | 2016-09-21 22:23:00.409  |
> | FlumeData.1474467815656.tmp  | false        | true    | 9447     | flume  | supergroup  | rw-r--r--    | 2016-09-21 22:23:00.788  | 2016-09-21 22:23:00.788  |
> +------------------------------+--------------+---------+----------+--------+-------------+--------------+--------------------------+--------------------------+
> 59 rows selected (0.265 seconds)
> 
> Note the .tmp file as the last one in the folder
> 
> Querying a single file works :
> 
> 0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
> table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/FlumeData.1474467815655`(type
> => 'json'));
> +-+
> | EXPR$0  |
> +-+
> | 221 |
> +-+
> 1 row selected (0.685 seconds)
> 
> 
> As does across multiple files where the wildcard pattern would exclude the
> .tmp file:
> 
> 0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
> table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/FlumeData.147446781564*`(type
> => 'json'));
> +-+
> | EXPR$0  |
> +-+
> | 2178|
> +-+
> 1 row selected (1.24 seconds)
> 
> 
> But if I try to query all the files, Drill includes the .tmp file and
> errors:
> 
> 0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
> table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/*`(type => 'json'));
> Error: DATA_READ ERROR: Failure reading JSON file - Cannot obtain block
> length for
> LocatedBlock{BP-478416316-192.168.10.112-1466151126376:blk_1074004983_264343;
> getBlockSize()=9447; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[
> 192.168.10.116:50010,DS-39bf5e74-3eec-4447-9cd2-f17b5cc259b8,DISK],
> DatanodeInfoWithStorage[192.168.10.113:50010,DS-845945e7-0bc8-44aa-945c-a140ad1f55ab,DISK],
> DatanodeInfoWithStorage[192.168.10.115:50010
> ,DS-a0e97909-3d40-4f49-b67f-636e9f10928a,DISK]]}
> 
> File  /user/flume/incoming/twitter/2016/09/21/FlumeData.1474467815656.tmp
> Record  1
> Fragment 0:0
> 
> [Error Id: d3f322cb-c64d-43c8-9231-fb2c96e8589d on
> cdh57-01-node-01.moffatt.me:31010] (state=,code=0)
> 0: jdbc:drill:drillbit=localhost>
> 
> 
> Is there a way around this with Drill? For example, can I use a regex in
> the path? I've tried, but just hit
> Error: VALIDATION ERROR: null
> 
> thanks, Robin.



Re: Drill upgrade Questions

2016-09-12 Thread Andries Engelbrecht
Ramya,

The MapR Drill 1.8 package has not been released yet. The latest MapR Drill 
Sandbox is Drill 1.6 on MapR 5.1.

Which is available from here
http://maprdocs.mapr.com/home/Drill/Installing-the-Apache-Dri_27689260.html 


Once the MapR Drill 1.8 packaging is released you can upgrade to Drill 1.8 on 
the Sandbox or see if a newer version is available with Drill 1.8 on it.

Do you need Drill 1.8 now or is 1.6 sufficient in the short term?

--Andries



> On Sep 12, 2016, at 2:24 PM, Ramya Vasudevan  wrote:
> 
> 
> Hi
> 
> I used MapR-Sandbox-For-Apache-Drill-1.4.0-5.0.0-vmware.ova to install Mapr 
> 5.0 with Drill 1.6.0 on a CentOS 6.7 VM in the past.
> 
> I am now trying to upgrade drill to 1.8.0 and downloaded 
> apache-drill-1.8.0.tar.gz from 
> https://drill.apache.org/blog/2016/08/30/drill-1.8-released/
> 
> Looking at 
> http://doc.mapr.com/display/MapR/Upgrading+to+the+Latest+Version+of+Drill, I 
> gather that a yum upgrade to 1.8.0 rpm is not possible. Is that correct?
> -   If not, can you please point me to the rpm download location and/or 
> instructions for it?
> 
> Or, can I follow https://drill.apache.org/docs/drill-in-10-minutes/ and 
> extract apache-drill-1.8.0.tar.gz to location and restore all the .conf files 
> from 1.6.0 location?
> -   If so, what are the complete set of instructions for the upgrade?
> 
> Thank You,
> Ramya
> 
> 
> 



Re: Date Formatting Question

2016-09-08 Thread Andries Engelbrecht
Looks like Jason just beat me to it :-)



> On Sep 8, 2016, at 3:15 PM, Andries Engelbrecht <aengelbre...@maprtech.com> 
> wrote:
> 
> You are likely looking for the to_char function to convert date to a 
> different string format. You can always convert back to a date by using 
> to_date.
> 
> select current_date, to_char(current_date,'MM/dd/yyyy') from (values(1));
> +---------------+-------------+
> | current_date  |   EXPR$1    |
> +---------------+-------------+
> | 2016-09-08    | 09/08/2016  |
> +---------------+-------------+
> 1 row selected (0.376 seconds)
> 
> See conversion functions here
> 
> http://drill.apache.org/docs/data-type-conversion/#to_char 
> <http://drill.apache.org/docs/data-type-conversion/#to_char>
> 
> 
> --Andries
> 
> 
> 
> 
>> On Sep 8, 2016, at 3:10 PM, Kunal Khatua <kkha...@maprtech.com 
>> <mailto:kkha...@maprtech.com>> wrote:
>> 
>> Hi Charles,
>> 
>> There isn't a single function that would convert to the format you want, 
>> since that would be a user-specific format and not native to the internal 
>> Drill representation of the date format. 
>> 
>> Treating the output format of 'mm/dd/yyyy' as a string is a solution, using 
>> the date-time functions that were shared in the link, by tokenizing and 
>> concatenating the date tokens of month, date and year.
>> 
>> ~ Kunal
>> On Thu 8-Sep-2016 12:20:43 PM, Charles Givre <cgi...@gmail.com 
>> <mailto:cgi...@gmail.com>> wrote:
>> Hi Khurram,
>> I looked through all that already and I didn't see anything that did what I
>> was wanting to do, or am I missing it?
>> Thanks,
>> -- C
>> 
>> On Thu, Sep 8, 2016 at 3:14 PM, Khurram Faraaz wrote:
>> 
>>> Here is the link to currently supported datetime functions in Drill
>>> 
>>> https://drill.apache.org/docs/date-time-functions-and-arithmetic/#extract 
>>> <https://drill.apache.org/docs/date-time-functions-and-arithmetic/#extract>
>>> 
>>> On Fri, Sep 9, 2016 at 12:32 AM, Charles Givre wrote:
>>> 
>>>> Hello everyone,
>>>> I have a question about formatting dates. Let's say that I have some
>>> data
>>>> which has dates in format of yyyy-mm-dd but I would like to convert them
>>> to
>>>> mm/dd/yyyy format (or whatever). Does Drill have something that is
>>> roughly
>>>> equivalent to MySQL's DATE_FORMAT(date, format) function
>>>> whereby you can change the formatting of a date to whatever you want?
>>>> Thanks,
>>>> -- Charles
>>>> 
>>> 
> 



Re: Date Formatting Question

2016-09-08 Thread Andries Engelbrecht
You are likely looking for the to_char function to convert date to a different 
string format. You can always convert back to a date by using to_date.

select current_date, to_char(current_date,'MM/dd/yyyy') from (values(1));
+---------------+-------------+
| current_date  |   EXPR$1    |
+---------------+-------------+
| 2016-09-08    | 09/08/2016  |
+---------------+-------------+
1 row selected (0.376 seconds)
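
And to go back the other way, to_date takes the same kind of pattern; a quick sketch:

select to_date('09/08/2016', 'MM/dd/yyyy') from (values(1));

which returns the date 2016-09-08.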

See conversion functions here

http://drill.apache.org/docs/data-type-conversion/#to_char 



--Andries




> On Sep 8, 2016, at 3:10 PM, Kunal Khatua  wrote:
> 
> Hi Charles,
> 
> There isn't a single function that would convert to the format you want, 
> since that would be a user-specific format and not native to the internal 
> Drill representation of the date format. 
> 
> Treating the output format of 'mm/dd/yyyy' as a string is a solution, using 
> the date-time functions that were shared in the link, by tokenizing and 
> concatenating the date tokens of month, date and year.
> 
> ~ Kunal
> On Thu 8-Sep-2016 12:20:43 PM, Charles Givre  wrote:
> Hi Khurram,
> I looked through all that already and I didn't see anything that did what I
> was wanting to do, or am I missing it?
> Thanks,
> -- C
> 
> On Thu, Sep 8, 2016 at 3:14 PM, Khurram Faraaz wrote:
> 
>> Here is the link to currently supported datetime functions in Drill
>> 
>> https://drill.apache.org/docs/date-time-functions-and-arithmetic/#extract
>> 
>> On Fri, Sep 9, 2016 at 12:32 AM, Charles Givre wrote:
>> 
>>> Hello everyone,
>>> I have a question about formatting dates. Let's say that I have some
>> data
>>> which has dates in format of -mm-dd but I would like to convert them
>> to
>>> mm/dd/ format (or whatever). Does Drill have something that is
>> roughly
>>> equivalent to MySQL's DATE_FORMAT( , ) function
>>> whereby you can change the formatting of a date to whatever you want?
>>> Thanks,
>>> -- Charles
>>> 
>> 



Re: How to save output in any format

2016-09-06 Thread Andries Engelbrecht
You may also look at using the storage plugins to create output by using CTAS.
You then need to set the session/system to use the appropriate storage format 
(store.format).
http://drill.apache.org/docs/configuration-options-introduction/ 


However in your case it is mostly limited to delimited text formats. (No excel, 
etc)
http://drill.apache.org/docs/data-sources-and-file-formats-introduction/ 
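
A rough sketch of the CTAS route (the dfs.tmp workspace and output table name are just
examples - the target workspace must be writable):

alter session set `store.format` = 'csv';
create table dfs.tmp.`attribute_out` as select * from DemoSQLServer..Attribute;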



--Andries



> On Sep 6, 2016, at 9:19 AM, Abhishek Girish  wrote:
> 
> You can specify this using the outputformat option in sqlline.
> 
> sqlline> !record ~/output.csv
> sqlline> !outputformat csv
> 
> Another way I usually do is via sqlline variables and scripts.
> 
> 
> $DRILL_HOME/bin/sqlline -u jdbc:drill:zk=: --outputformat=csv
> < ~/input.sql > ~/output.csv
> 
> 
> On Tue, Sep 6, 2016 at 12:57 AM,  wrote:
> 
>> I am using drill in embedded mode in window having 1.8 version.
>> Suppose i have fire query :- select * from DemoSQLServer..Attribute; It
>> will return some output.
>> Now my question is :- Is it possible to save the output ?  I know in drill
>> documnet  !record  is there:- https://drill.apache.
>> org/docs/configuring-the-drill-shell/
>> While using record it will save the output in bin folder in file format.
>> But I want to save the output in any format  (e.g:- csv, txt, excel, etc)
>> and in any where (i.e:- in any directory or folder). OR:- Can i give file
>> extension(e.g:- csv, excel, txt, etc) and directory name( e.g:-
>> d://) which i want to save the output in query itself.?
>> Is it Possible in Drill.? or any other option is there.?
>> 
>> Sent from Yahoo Mail. Get the app



Re: move drill log directory to HDFS

2016-08-26 Thread Andries Engelbrecht
https://drill.apache.org/docs/persistent-configuration-storage/ 



--Andries

> On Aug 26, 2016, at 4:25 AM, Anup Tiwari  wrote:
> 
> Also please note that I have tried below in all node's drill-env.sh but its
> not working.
> 
> export DRILL_LOG_DIR="hdfs://namenode:9000/tmp/drilllogs/"
> 
> 
> Regards,
> *Anup Tiwari*
> 
> 
> On Fri, Aug 26, 2016 at 4:06 PM, Anup Tiwari 
> wrote:
> 
>> Hi All,
>> 
>> We are trying to move drill logs directory from local file system to HDFS
>> so that we can refer only one location rather than each node's log
>> directory.
>> 
>> Can anyone help me on this?
>> 
>> Regards,
>> *Anup Tiwari*
>> 
>> 



Re: Array data type in Where Clause

2016-08-19 Thread Andries Engelbrecht
Use REPEATED_CONTAINS.
http://drill.apache.org/docs/repeated-contains/ 
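
For the example data below that would look something like this (cp just mirrors the
example - use whichever schema the file actually lives in):

select id from cp.`cities.json` where repeated_contains(city, 'la');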



--Andries



> On Aug 19, 2016, at 3:15 PM, Kathiresan S  
> wrote:
> 
> Hi,
> 
> We have a json where one of the columns is of type array (of Strings) and
> we would like to run a query to get the rows where any of the values in the
> array matches the given value.
> 
> *(Eg): cities.json*
> {"id":"1","city":["nyc","la"]}
> {"id":"2","city":["la"]}
> {"id":"3","city":["nyc"]}
> {"id":"4","city":["la","nyc"]}
> 
> *Query we would like to run (something like below)*
> 
> 
> *select id from cp.`cities.json` where any[city] = 'la'*
> 
> Or
> 
> *select id from cp.`cities.json` where city contains 'la'*
> 
> which is expected to return rows 1,2 & 4
> 
> Is there any way we can get this done without using FLATTEN?
> 
> Drill version that we use : 1.6.0
> 
> Thanks,
> Kathir



Re: JDBC metadata from storage plugin

2016-08-11 Thread Andries Engelbrecht
I configured mysql using the JDBC plugin and the metadata is available in 
INFORMATION_SCHEMA.

0: jdbc:drill:> select * from INFORMATION_SCHEMA.`TABLES` where TABLE_SCHEMA = 
'mysql';
++---++-+
| TABLE_CATALOG  | TABLE_SCHEMA  | TABLE_NAME | TABLE_TYPE  |
++---++-+
| DRILL  | mysql | BUCKETING_COLS | TABLE   |
| DRILL  | mysql | CDS| TABLE   |
| DRILL  | mysql | COLUMNS_V2 | TABLE   |
| DRILL  | mysql | DATABASE_PARAMS| TABLE   |
| DRILL  | mysql | DBS| TABLE   |
| DRILL  | mysql | DB_PRIVS   | TABLE   |
| DRILL  | mysql | FUNCS  | TABLE   |
| DRILL  | mysql | FUNC_RU| TABLE   |
| DRILL  | mysql | GLOBAL_PRIVS   | TABLE   |
| DRILL  | mysql | IDXS   | TABLE   |
| DRILL  | mysql | INDEX_PARAMS   | TABLE   |
| DRILL  | mysql | PARTITIONS | TABLE   |
| DRILL  | mysql | PARTITION_KEYS | TABLE   |
| DRILL  | mysql | PARTITION_KEY_VALS | TABLE   |
| DRILL  | mysql | PARTITION_PARAMS   | TABLE   |
| DRILL  | mysql | PART_COL_PRIVS | TABLE   |
| DRILL  | mysql | PART_COL_STATS | TABLE   |
| DRILL  | mysql | PART_PRIVS | TABLE   |
| DRILL  | mysql | ROLES  | TABLE   |
| DRILL  | mysql | SDS| TABLE   |
| DRILL  | mysql | SD_PARAMS  | TABLE   |
| DRILL  | mysql | SEQUENCE_TABLE | TABLE   |
| DRILL  | mysql | SERDES | TABLE   |
| DRILL  | mysql | SERDE_PARAMS   | TABLE   |
| DRILL  | mysql | SKEWED_COL_NAMES   | TABLE   |
| DRILL  | mysql | SKEWED_COL_VALUE_LOC_MAP   | TABLE   |
| DRILL  | mysql | SKEWED_STRING_LIST | TABLE   |
| DRILL  | mysql | SKEWED_STRING_LIST_VALUES  | TABLE   |
| DRILL  | mysql | SKEWED_VALUES  | TABLE   |
| DRILL  | mysql | SORT_COLS  | TABLE   |
| DRILL  | mysql | TABLE_PARAMS   | TABLE   |
| DRILL  | mysql | TAB_COL_STATS  | TABLE   |
| DRILL  | mysql | TBLS   | TABLE   |
| DRILL  | mysql | TBL_COL_PRIVS  | TABLE   |
| DRILL  | mysql | TBL_PRIVS  | TABLE   |
| DRILL  | mysql | VERSION| TABLE   |
++---++-+
36 rows selected (0.128 seconds)


0: jdbc:drill:> select * from INFORMATION_SCHEMA.`COLUMNS` where TABLE_SCHEMA = 
'mysql';
++---+-+-+---+-+--++---+-++--++-++-+
| TABLE_CATALOG  | TABLE_SCHEMA  | TABLE_NAME  |   COLUMN_NAME  
 | ORDINAL_POSITION  | COLUMN_DEFAULT  | IS_NULLABLE  | DATA_TYPE  | 
CHARACTER_MAXIMUM_LENGTH  | CHARACTER_OCTET_LENGTH  | NUMERIC_PRECISION  | 
NUMERIC_PRECISION_RADIX  | NUMERIC_SCALE  | DATETIME_PRECISION  | INTERVAL_TYPE 
 | INTERVAL_PRECISION  |
++---+-+-+---+-+--++---+-++--++-++-+
| DRILL  | mysql | BUCKETING_COLS  | SD_ID  
 | 1 | null| NO   | BIGINT | 
null  | null| 64 | 2
| 0  | null| null   | 
null|
| DRILL  | mysql | BUCKETING_COLS  | BUCKET_COL_NAME
 | 2 | null| YES  | CHARACTER VARYING  | 
255   | 1020| 

Re: Need help in connecting/retrieving Mongo DB data using Apache drill from Spotfire

2016-08-08 Thread Andries Engelbrecht
http://drill.apache.org/docs/create-view/ 
<http://drill.apache.org/docs/create-view/>

http://drill.apache.org/docs/sql-extensions/ 
<http://drill.apache.org/docs/sql-extensions/>

http://drill.apache.org/docs/mongodb-storage-plugin/ 
<http://drill.apache.org/docs/mongodb-storage-plugin/>


--Andries



> On Aug 8, 2016, at 8:47 AM, Sandhya Modupalli <sandhya_2...@yahoo.com> wrote:
> 
> Hi Andries,
> 
> Thanks for the quick response.
> 
> Could you please point me to the right place where I can find examples for 
> creating drill views or custom sql in Spotfire which can be read by drill.
> 
> Thanks & Regards,
> Sandhya
> From: Andries Engelbrecht <mailto:aengelbre...@maprtech.com>
> Sent: ‎08-‎08-‎2016 04:46 PM
> To: user@drill.apache.org <mailto:user@drill.apache.org>; Modupalli Sandhya 
> <mailto:sandhya_2...@yahoo.com>
> Cc: tshi...@apache.org <mailto:tshi...@apache.org>; jnad...@apache.org 
> <mailto:jnad...@apache.org>
> Subject: Re: Need help in connecting/retrieving Mongo DB data using Apache 
> drill from Spotfire
> 
> You will probably need to create Drill Views on the MongoDB tables for it to 
> display successfully. Tools like Spotfire will query Drill's INFORMATION 
> _SCHEMA for metadata, but complex data sources like Mongo will not provide 
> all the metadata in a consumable manner.
> 
> The advantage of a View is that you can use it to handle complex data 
> structures with Drill extended SQL, and make it available in a tabular format 
> that is more easily consumed by Spotfire.
> 
> Alternatively you can create custom SQL data structure in Spotfire to gain 
> access to Mongo.
> 
> --Andries
> 
> 
>> On Aug 8, 2016, at 1:07 AM, Modupalli Sandhya 
>> <sandhya_2...@yahoo.com.INVALID <mailto:sandhya_2...@yahoo.com.invalid>> 
>> wrote:
>> 
>> Dear team,
>> 
>> We have a usecase where we need to conenct to mongoDB from spotfire and have 
>> to retrieve all data for visualizations. Here we have taken the step to go 
>> with Apache drill which inturn connects to mongo DB.
>> 
>> We have followed the steps given in documentation and have setup the 
>> connection to apache drill from spotfire successfully.
>> 
>> https://drill.apache.org/docs/configuring-tibco-spotfire-server-with-drill/ 
>> <https://drill.apache.org/docs/configuring-tibco-spotfire-server-with-drill/>
>> 
>> When we expand the database connection in spotfire using Drill JDBC/ODBC 
>> driver, we are not able to see the list of columns/attributes under the 
>> collection but the same columns we are able to see when we use the progress 
>> odbc driver for mongo db in spotfire.
>> 
>> We have tried with both ODBC and JDBC driver for apache drill, its the same 
>> issue with both drivers.
>> 
>> When I expand users collection in information designer, am not finding 
>> users(collection).id/name/age, I see only ???* under users collection.
>> 
>> Please find attached screenshots which gives a clear picture that we are not 
>> able to see id/name/age columns of user collection and kindly help us in 
>> fixing this issue at the earliest.
>> 
>> Many Thanks in advance,
>> Sandhya Modupalli
> 



Re: Request in Embedded mode

2016-08-08 Thread Andries Engelbrecht
Drill has queueing, see if this helps for your use case.
http://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/ 


The issue is that a Drillbit can run a certain number of parallel fragments per 
query, and you will have to balance that with the number of incoming requests. 
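
A sketch of turning queueing on (the queue sizes are placeholders to tune for your
machine and workload):

ALTER SYSTEM SET `exec.queue.enable` = true;
ALTER SYSTEM SET `exec.queue.small` = 4;
ALTER SYSTEM SET `exec.queue.large` = 2;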

--Andries


> On Aug 4, 2016, at 11:20 PM,  
>  wrote:
> 
> Hello 
> I am running Apache Drill (1.7) in embedded mode on Windows 10.
> My question is: how many requests does Drill accept at a time? Suppose I have 
> Drill installed on my system and it is running in embedded mode. At a time, how 
> many requests does my Drill accept from other systems?
> 
> Sent from Yahoo Mail. Get the app



Re: Need help in connecting/retrieving Mongo DB data using Apache drill from Spotfire

2016-08-08 Thread Andries Engelbrecht
You will probably need to create Drill Views on the MongoDB tables for it to 
display successfully. Tools like Spotfire will query Drill's INFORMATION 
_SCHEMA for metadata, but complex data sources like Mongo will not provide all 
the metadata in a consumable manner.

The advantage of a View is that you can use it to handle complex data 
structures with Drill extended SQL, and make it available in a tabular format 
that is more easily consumed by Spotfire.

Alternatively you can create custom SQL data structure in Spotfire to gain 
access to Mongo.
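
A minimal sketch of the View approach, assuming a Mongo storage plugin named mongo, a
database mydb, and the id/name/age fields mentioned below (adjust names and types to
your data; the view must be created in a writable workspace such as dfs.tmp):

CREATE VIEW dfs.tmp.`users_vw` AS
SELECT CAST(t.`id` AS VARCHAR(64)) AS id,
       CAST(t.`name` AS VARCHAR(255)) AS name,
       CAST(t.`age` AS INT) AS age
FROM mongo.mydb.`users` t;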

--Andries


> On Aug 8, 2016, at 1:07 AM, Modupalli Sandhya 
>  wrote:
> 
> Dear team,
> 
> We have a use case where we need to connect to MongoDB from Spotfire and have 
> to retrieve all data for visualizations. Here we have taken the step to go 
> with Apache Drill, which in turn connects to Mongo DB.
> 
> We have followed the steps given in documentation and have setup the 
> connection to apache drill from spotfire successfully.
> 
> https://drill.apache.org/docs/configuring-tibco-spotfire-server-with-drill/ 
> 
> 
> When we expand the database connection in spotfire using Drill JDBC/ODBC 
> driver, we are not able to see the list of columns/attributes under the 
> collection but the same columns we are able to see when we use the progress 
> odbc driver for mongo db in spotfire.
> 
> We have tried with both ODBC and JDBC driver for apache drill, its the same 
> issue with both drivers.
> 
> When I expand users collection in information designer, am not finding 
> users(collection).id/name/age, I see only ???* under users collection.
> 
> Please find attached screenshots which gives a clear picture that we are not 
> able to see id/name/age columns of user collection and kindly help us in 
> fixing this issue at the earliest.
> 
> Many Thanks in advance,
> Sandhya Modupalli



Re: S3 bucket configuration security error on the storage plugin

2016-08-04 Thread Andries Engelbrecht
Also if you want to put the key in the plugin config the syntax is as follows

  "config": {
"fs.s3a.access.key": "",
"fs.s3a.secret.key": ""
  },


--Andries


> On Aug 4, 2016, at 7:55 AM, Andries Engelbrecht <aengelbre...@maprtech.com> 
> wrote:
> 
> You have to put the actual ID and KEY in the storage plugin (not recommended 
> per se if you don't have the WebUI secured).
> 
> Or you can edit the core-site.xml file in the drill conf directory to add the 
> s3a ID and KEY there.
> 
> Also use s3a not s3 in the later versions of Drill.
> 
> http://drill.apache.org/docs/s3-storage-plugin/ 
> <http://drill.apache.org/docs/s3-storage-plugin/>
> 
> --Andries
> 
> 
> 
> 
>> On Aug 3, 2016, at 5:48 PM, <mdhr...@yahoo.com.INVALID 
>> <mailto:mdhr...@yahoo.com.invalid>> <mdhr...@yahoo.com.INVALID 
>> <mailto:mdhr...@yahoo.com.invalid>> wrote:
>> 
>> 
>> Hi,
>> I am trying to configure an s3 bucket and I am getting this error. Can any 
>> one help me
>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
>> IllegalArgumentException: AWS Access Key ID and Secret Access Key must be 
>> specified as the username or password (respectively) of a s3 URL, or by 
>> setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties 
>> (respectively).[Error Id: 02956e73-87a8-4279-b6db-cb89d11821b8 on 
>> 10.64.86.175:31010] 
>> 
>> The bucket name is itgs-hiss-reporting
>>  are the sub folder in it.
>> /staging/oneems
>> The storage plugin  is{
>>   "type": "file",
>>   "enabled": true,
>>   "connection": "s3://itgs-hiss-reporting ",
>>   "config": {
>> "fs.s3a.awsAccessKeyId": "id",
>> "fs.s3a.awsSecretAccessKey": "key"
>>   },
>>   "workspaces": {
>> "root": {
>>   "location": "/staging",
>>   "writable": false,
>>   "defaultInputFormat": "csv"
>> },
>> the core-site.xml config is
>> 
>> <configuration>
>>   <property>
>>     <name>fs.s3a.access.key</name>
>>     <value>ID</value>
>>   </property>
>>   <property>
>>     <name>fs.s3a.secret.key</name>
>>     <value>Secret Key</value>
>>   </property>
>> </configuration>
>> 
>> When I got to drill ip:8047 
>> 
>> and then do a show databases I get this error
>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
>> IllegalArgumentException: AWS Access Key ID and Secret Access Key must be 
>> specified as the username or password (respectively) of a s3 URL, or by 
>> setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties 
>> (respectively).[Error Id: 352382c5-76ec-4253-b323-35e167f14db6 
>> 
>> 
>> 
>> 
>> 
> 



Re: S3 bucket configuration security error on the storage plugin

2016-08-04 Thread Andries Engelbrecht
You have to put the actual ID and KEY in the storage plugin (not recommended 
per se if you don't have the WebUI secured).

Or you can edit the core-site.xml file in the drill conf directory to add the 
s3a ID and KEY there.

Also use s3a not s3 in the later versions of Drill.

http://drill.apache.org/docs/s3-storage-plugin/ 


--Andries




> On Aug 3, 2016, at 5:48 PM,  
>  wrote:
> 
> 
> Hi,
> I am trying to configure an s3 bucket and I am getting this error. Can any 
> one help me
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalArgumentException: AWS Access Key ID and Secret Access Key must be 
> specified as the username or password (respectively) of a s3 URL, or by 
> setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties 
> (respectively).[Error Id: 02956e73-87a8-4279-b6db-cb89d11821b8 on 
> 10.64.86.175:31010] 
> 
> The bucket name is itgs-hiss-reporting
>  are the sub folder in it.
> /staging/oneems
> The storage plugin  is{
>   "type": "file",
>   "enabled": true,
>   "connection": "s3://itgs-hiss-reporting",
>   "config": {
> "fs.s3a.awsAccessKeyId": "id",
> "fs.s3a.awsSecretAccessKey": "key"
>   },
>   "workspaces": {
> "root": {
>   "location": "/staging",
>   "writable": false,
>   "defaultInputFormat": "csv"
> },
> the core-site.xml config is
> 
> <configuration>
>   <property>
>     <name>fs.s3a.access.key</name>
>     <value>ID</value>
>   </property>
>   <property>
>     <name>fs.s3a.secret.key</name>
>     <value>Secret Key</value>
>   </property>
> </configuration>
> 
> When I got to drill ip:8047 
> 
> and then do a show databases I get this error
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalArgumentException: AWS Access Key ID and Secret Access Key must be 
> specified as the username or password (respectively) of a s3 URL, or by 
> setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties 
> (respectively).[Error Id: 352382c5-76ec-4253-b323-35e167f14db6 
> 
> 
> 
> 
> 



Re: coaxing hour of day from a timestamp

2016-08-02 Thread Andries Engelbrecht
To simplify Vince's query

0: jdbc:drill:> select date_part('hour', to_timestamp('28/04/16 2:00', 
'dd/MM/yy HH:mm')) from (values(1));
+---------+
| EXPR$0  |
+---------+
| 2       |
+---------+

But basically to_timestamp allows you to specify the format.
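
extract() also works on the same converted value, e.g.:

select extract(hour from to_timestamp('28/04/16 2:00', 'dd/MM/yy HH:mm')) from (values(1));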

--Andries


> On Aug 2, 2016, at 5:14 PM, Vince Gonzalez <vgonza...@mapr.com> wrote:
> 
> How about this?
> 
> 0: jdbc:drill:> select date_part('hour', t.ts) from (select
> to_timestamp('28/04/16 2:00', 'dd/MM/yy HH:mm') ts from sys.version) t;
> +-+
> | EXPR$0  |
> +-+
> | 2   |
> +-+
> 1 row selected (0.442 seconds)
> 
> 
> 
> Vince Gonzalez
> Systems Engineer
> 212.694.3879
> 
> mapr.com
> 
> On Tue, Aug 2, 2016 at 5:11 PM, Joseph Blue <jb...@maprtech.com> wrote:
> 
>> *Query:*
>> select
>> Datatime_start,
>> date_part('day',Datatime_Start) `day`,
>> date_part('month',Datatime_Start) `month`,
>> date_part('year',Datatime_Start) `year`,
>> date_part('hour',Datatime_Start) `hour`,
>> date_part('minute',Datatime_Start) `minute`
>> from dfs.tmp.tv
>> limit 1
>> 
>> The question is = how do I get hour = 2?
>> 
>> *Output:*
>> Datatime_start  day month year hour minute second
>> 28/04/16 2:00   16  4 2028 00  0
>> 
>> On Tue, Aug 2, 2016 at 2:08 PM, Andries Engelbrecht <
>> aengelbre...@maprtech.com> wrote:
>> 
>>> Attachments do not show on the mail list, perhaps just type out an
>> example.
>>> 
>>> --Andries
>>> 
>>>> On Aug 2, 2016, at 1:56 PM, Joseph Blue <jb...@maprtech.com> wrote:
>>>> 
>>>> My bad on the formatting. Here is a screen shot of the query. Note bad
>>> m/d/y and hour=min=sec=0.0
>>>> 
>>>> 
>>>> On Tue, Aug 2, 2016 at 1:46 PM, Joseph Blue <jb...@maprtech.com
>> >> jb...@maprtech.com>> wrote:
>>>> The field I have is a timestamp. The date is obviously in wrong order
>> in
>>> the time stamp (I can break it up and reassemble to get a good date, so
>> no
>>> problem there).
>>>> I do not seem to be able to get the hour of the day using the
>>> date_parts, so that data seems obscured.
>>>> Any ideas how to get the 2 o'clock time out of this field?
>>>> 
>>>> Query...
>>>> select
>>>> Datatime_start,
>>>> date_part('day',Datatime_Start) `day`,
>>>> date_part('month',Datatime_Start) `month`,
>>>> date_part('year',Datatime_Start) `year`,
>>>> date_part('hour',Datatime_Start) `hour`,
>>>> date_part('minute',Datatime_Start) `minute`
>>>> from dfs.tmp.tv <http://dfs.tmp.tv/>
>>>> limit 5
>>>> 
>>>> Results
>>>> 
>>>> Datatime_start
>>>> day
>>>> month
>>>> year
>>>> hour
>>>> minute
>>>> Datatime_start
>>>> day
>>>> month
>>>> year
>>>> hour
>>>> minute
>>>> 28/04/16 2:00 16 4 2028 0 0 28/04/16 2:00 16 4 2028 0 0 28/04/16 2:00
>> 16
>>> 4 2028 0 0 28/04/16 2:00 16 4 2028 0 0
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Joseph Blue
>>>> Data Scientist
>>>> jb...@maprtech.com <mailto:jb...@maprtech.com> | www.mapr.com <
>>> http://www.mapr.com/>
>>>> Mobile: 858-357-4926 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Joseph Blue
>>>> Data Scientist
>>>> jb...@maprtech.com <mailto:jb...@maprtech.com> | www.mapr.com <
>>> http://www.mapr.com/>
>>>> Mobile: 858-357-4926
>>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Joseph Blue
>> Data Scientist
>> jb...@maprtech.com | www.mapr.com
>> *Mobile: 858-357-4926*
>> 



Re: coaxing hour of day from a timestamp

2016-08-02 Thread Andries Engelbrecht
Attachments do not show on the mail list, perhaps just type out an example.

--Andries

> On Aug 2, 2016, at 1:56 PM, Joseph Blue  wrote:
> 
> My bad on the formatting. Here is a screen shot of the query. Note bad m/d/y 
> and hour=min=sec=0.0
> 
> 
> On Tue, Aug 2, 2016 at 1:46 PM, Joseph Blue  > wrote:
> The field I have is a timestamp. The date is obviously in wrong order in the 
> time stamp (I can break it up and reassemble to get a good date, so no 
> problem there).
> I do not seem to be able to get the hour of the day using the date_parts, so 
> that data seems obscured. 
> Any ideas how to get the 2 o'clock time out of this field?
> 
> Query...
> select 
> Datatime_start,
> date_part('day',Datatime_Start) `day`, 
> date_part('month',Datatime_Start) `month`,
> date_part('year',Datatime_Start) `year`,
> date_part('hour',Datatime_Start) `hour`,
> date_part('minute',Datatime_Start) `minute`
> from dfs.tmp.tv 
> limit 5
> 
> Results
> 
> Datatime_start
> day
> month
> year
> hour
> minute
> Datatime_start
> day
> month
> year
> hour
> minute
> 28/04/16 2:00 16 4 2028 0 0 28/04/16 2:00 16 4 2028 0 0 28/04/16 2:00 16 4 
> 2028 0 0 28/04/16 2:00 16 4 2028 0 0 
> 
> 
> 
> 
> 
> 
> -- 
> Joseph Blue
> Data Scientist
> jb...@maprtech.com  | www.mapr.com 
> 
> Mobile: 858-357-4926 
> 
> 
> 
> 
> 
> 
> -- 
> Joseph Blue
> Data Scientist
> jb...@maprtech.com  | www.mapr.com 
> 
> Mobile: 858-357-4926
> 



Re: Extrapolate performances standalone -> cluster

2016-07-30 Thread Andries Engelbrecht
A couple of quick things to check.

Look at the query plans of the queries you are interested in. In general expect 
the query times to improve (not quite linear though) for larger data sets, and 
where you may have lots of concurrency as the planning duties, execution, etc 
will be spread out on the cluster.

Expect query planning time to increase for a cluster
- You can offset that by using metadata caching
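
For Parquet data that means running something like the following once per table directory
(the path is a placeholder):

REFRESH TABLE METADATA dfs.`/data/my_parquet_table`;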

For major query fragments that are broken up into many minor fragments you can 
expect close to linear improvement, due to the larger number of parallel minor 
fragments on the bigger cluster (5 vs 1 node). Make sure that queries and data 
sets are optimized to parallelize in a cluster; see the Drill Best Practices links below.

For exchange operators you should expect an increase in time due to network 
overhead on a cluster vs a single node.


These links will help a lot for some best practices to optimize your cluster.
https://community.mapr.com/docs/DOC-1497 

https://community.mapr.com/thread/18549 


--Andries



> On Jul 30, 2016, at 6:38 AM, Nicolas Paris  wrote:
> 
> Hey,
> 
> I have run tests on drill on a standalone installation (1 computer 8
> core/32GO ram).
> I will get soon a 5 computer cluster (8 core/96GO ram each).
> Is it possible to get an estimation of the performance gain ?
> Is it linear ? Will the performance get better ? Worst ?
> 
> I just want an estimation/extrapolation.
> 
> Thanks !



Re: Drill view in shell mode

2016-07-28 Thread Andries Engelbrecht
Alternatively to see all the VIEWS on the Drill cluster you can run the 
following query.

select * from INFORMATION_SCHEMA.`TABLES` where TABLE_TYPE='VIEW';

This will give you 4 columns.
TABLE_CATALOG - Self explanatory.
TABLE_SCHEMA - which gives you the <storage plugin>.<workspace> where the View 
is located
TABLE_NAME - The name of the View
TABLE_TYPE - where VIEW indicates a View; you can remove the predicate to see 
other Tables and System Tables registered in INFORMATION_SCHEMA

--Andries


> On Jul 28, 2016, at 6:02 AM, Neeraja Rentachintala 
>  wrote:
> 
> If you want to see a list of views created, you can use 'show tables' in
> the workspace.
> 
> On Thu, Jul 28, 2016 at 5:05 AM, Santosh Kulkarni <
> santoshskulkarn...@gmail.com> wrote:
> 
>> How to see a view created in Drill Explorer thru Drill shell? Is there any
>> any command for Drill views?
>> 
>> Thanks,
>> 
>> Santosh
>> 



Re: Drill JDBC Interpreter: Connection Refused

2016-07-12 Thread Andries Engelbrecht
Perhaps try to create a separate interpreter.

Create a new interpreter and call it Drill or another name than just jdbc.

Below an example I did for a MapR cluster - adjust as needed
Also make sure all hostnames can be properly resolved in the network!! (seen 
way too many issues due to this)
Perhaps add a user and password even if security is not configured.

Name : Drill
Interpreter : jdbc

zeppelin.interpreter.localRepo  
common.max_count  1000
drill.user  drill
drill.password  password
drill.driver   org.apache.drill.jdbc.Driver
drill.url  jdbc:drill:zk=<zk host>:5181/drill/cluster-drillbits


Dependencies

Artifacts
/opt/mapr/drill/drill-1.6.0/jars/jdbc-driver/drill-jdbc-all-1.6.0.jar


--Andries



> On Jul 12, 2016, at 10:48 AM, Krishnaprasad A S <krishna.pra...@flytxt.com> 
> wrote:
> 
> I have restarted zeppelin each time. Also zeppelin runs on a drillbit node.
> 
> Please find below properties for jdbc interpreter,
> 
> name value
> common.max_count 1000d
> default.driver org.postgresql.Driver
> default.password
> default.url jdbc:postgresql://localhost:5432/
> default.user gpadmin
> drill.driver org.apache.drill.jdbc.Driver
> drill.password
> drill.url jdbc:drill:schema=dfs;drillbit=dk-slv8
> hive.driver org.apache.hive.jdbc.HiveDriver
> hive.password
> hive.url jdbc:hive2://localhost:1
> hive.user hive
> phoenix.driver org.apache.phoenix.jdbc.PhoenixDriver
> phoenix.password
> phoenix.url jdbc:phoenix:localhost:2181:/hbase-unsecure
> phoenix.user phoenixuser
> psql.driver org.postgresql.Driver
> psql.password
> psql.url jdbc:postgresql://localhost:5432/
> psql.user phoenixuser
> tajo.driver org.apache.tajo.jdbc.TajoDriver
> tajo.url jdbc:tajo://localhost:26002/default
> zeppelin.interpreter.localRepo
> /data/installables/zeppelin-0.6.0-bin-all/local-repo/2BQT8WFCV
> 
> zeppelin.jdbc.concurrent.max_connection 10
> zeppelin.jdbc.concurrent.use true
> Dependencies
> artifact exclude
> /data/installables/apache-drill-1.7.0/jars/drill-jdbc-1.7.0.jar
> 
> 
> 
> 
> Thanks,
> Krishnaprasad
> 
> 
> On Tue, Jul 12, 2016 at 8:06 PM, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
> 
>> Perhaps share the Zeppelin interpreter config line by line so we can see.
>> Properties and Dependencies.
>> 
>> Also did you restart the interpreter after you made the config changes?
>> 
>> I assume you are running Zeppelin on one of the Drill nodes, where is ZK
>> located?
>> 
>> --Andries
>> 
>> 
>>> On Jul 12, 2016, at 5:31 AM, Krishnaprasad A S <
>> krishna.pra...@flytxt.com> wrote:
>>> 
>>> My zookeeper runs on port 2181, thats why I used the same port.
>>> I think drill won't start zookeeper by its own.
>>> I also tried using url "jdbc:drill:schema=dfs.tmp;drillbit="
>>> same error. Is there something I'm missing,
>>> 
>>> 
>>> On Tue, Jul 12, 2016 at 1:11 PM, Khurram Faraaz <kfar...@maprtech.com>
>>> wrote:
>>> 
>>>> Also try using port number 5181 in your drill-override.conf, instead of
>>>> 2181, and then stop and start Drillbit and re-run your program.
>>>> 
>>>> On Tue, Jul 12, 2016 at 1:08 PM, Khurram Faraaz <kfar...@maprtech.com>
>>>> wrote:
>>>> 
>>>>> This one works just fine for me
>>>>> 
>>>>> final String URL_STRING =
>>>> "jdbc:drill:schema=dfs.tmp;drillbit=";
>>>>> 
>>>>> replace IPADDRESS with your IP address in the above line.
>>>>> 
>>>>> On Tue, Jul 12, 2016 at 12:08 PM, Krishnaprasad A S <
>>>>> krishna.pra...@flytxt.com> wrote:
>>>>> 
>>>>>> Yes, I tried with url *jdbc:drill:drillbit=:31010* but same error,
>>>>>> Error in zeppelin console,
>>>>>> java.net.ConnectException: Connection refused at
>>>>>> java.net.PlainSocketImpl.socketConnect(Native Method) at
>>>>>> 
>>>>>> 
>>>> 
>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>>> at
>>>>>> 
>>>>>> 
>>>> 
>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>>> 
>>>>>> 
>>>>>> *drill-override.conf *
>>>>>> drill.exec: {
>>>>>> cluster-id: "drillbits1",
>>>>>> zk.connect: "dk-slv8:2181"
>>>>>> }
>>>>>> 
>

Re: Drill JDBC Interpreter: Connection Refused

2016-07-12 Thread Andries Engelbrecht
Perhaps share the Zeppelin interpreter config line by line so we can see. 
Properties and Dependencies.

Also did you restart the interpreter after you made the config changes?

I assume you are running Zeppelin on one of the Drill nodes, where is ZK 
located?

--Andries


> On Jul 12, 2016, at 5:31 AM, Krishnaprasad A S <krishna.pra...@flytxt.com> 
> wrote:
> 
> My zookeeper runs on port 2181, thats why I used the same port.
> I think drill won't start zookeeper by its own.
> I also tried using url "jdbc:drill:schema=dfs.tmp;drillbit=<drillbit host>"
> same error. Is there something I'm missing,
> 
> 
> On Tue, Jul 12, 2016 at 1:11 PM, Khurram Faraaz <kfar...@maprtech.com>
> wrote:
> 
>> Also try using port number 5181 in your drill-override.conf, instead of
>> 2181, and then stop and start Drillbit and re-run your program.
>> 
>> On Tue, Jul 12, 2016 at 1:08 PM, Khurram Faraaz <kfar...@maprtech.com>
>> wrote:
>> 
>>> This one works just fine for me
>>> 
>>> final String URL_STRING =
>> "jdbc:drill:schema=dfs.tmp;drillbit=";
>>> 
>>> replace IPADDRESS with your IP address in the above line.
>>> 
>>> On Tue, Jul 12, 2016 at 12:08 PM, Krishnaprasad A S <
>>> krishna.pra...@flytxt.com> wrote:
>>> 
>>>> Yes, I tried with url *jdbc:drill:drillbit=:31010* but same error,
>>>> Error in zeppelin console,
>>>> java.net.ConnectException: Connection refused at
>>>> java.net.PlainSocketImpl.socketConnect(Native Method) at
>>>> 
>>>> 
>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>> at
>>>> 
>>>> 
>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>> 
>>>> 
>>>> *drill-override.conf *
>>>> drill.exec: {
>>>>  cluster-id: "drillbits1",
>>>>  zk.connect: "dk-slv8:2181"
>>>> }
>>>> 
>>>> Some more details,
>>>> I'm querying from hdfs (select count(*) from
>> `dfs`.`/POC/Sample1.csv`)
>>>> the same query works with Drill Web UI.
>>>> Also the errors are logged in  'zeppelin-hadoop-dk-slv8.log' and not in
>>>> 'zeppelin-interpreter-jdbc-hadoop-dk-slv8.log'
>>>> 
>>>> I think there is something I'm missing because I tried removing the
>>>> drill.url and kept it blank again the same error is coming.
>>>> But there is no host or port mentioned for the ConnectException:
>>>> Connection
>>>> refused.
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jul 12, 2016 at 3:31 AM, Andries Engelbrecht <
>>>> aengelbre...@maprtech.com> wrote:
>>>> 
>>>>> Have you tried to connect to a drillbit directly from Zeppelin using
>> the
>>>>> jdbc url jdbc:drill:drillbit=:31010?
>>>>> 
>>>>> What does your drill-override.conf file look like?
>>>>> 
>>>>> 
>>>>>> On Jul 11, 2016, at 2:33 PM, Krishnaprasad A S <
>>>>> krishna.pra...@flytxt.com> wrote:
>>>>>> 
>>>>>> I can see all the 4 drillbits in ui. Aso I tried the query in web ui
>>>>> before
>>>>>> running it in zeppelin. In web ui it works currently. Then what may
>> be
>>>>> the
>>>>>> issue.?
>>>>>> On Jul 12, 2016 3:00 AM, "Andries Engelbrecht" <
>>>>> aengelbre...@maprtech.com>
>>>>>> wrote:
>>>>>> 
>>>>>> What happens if you try to connect to a drillbit directly?
>>>>>> 
>>>>>> This will help to see if it is a zk connection issue.
>>>>>> 
>>>>>> Also I assume the dill cluster is up and running, and if you go to
>> the
>>>>>> webUI it shows all drillbits in the cluster connected and running.
>>>>>> 
>>>>>> 
>>>>>>> On Jul 11, 2016, at 2:27 PM, Krishnaprasad A S <
>>>>> krishna.pra...@flytxt.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> I went through the same link before configuring the interpreter,
>> also
>>>>>> there
>>>>>>> is no security configured.
>>>>>>> On Jul 12, 2016 2:48 AM, "Andries Engelbrecht" <
>>>>> aengelbre...@maprtec

Re: Drill JDBC Interpreter: Connection Refused

2016-07-11 Thread Andries Engelbrecht
Have you tried to connect to a drillbit directly from Zeppelin using the jdbc 
url jdbc:drill:drillbit=<drillbit host>:31010?

What does your drill-override.conf file look like?


> On Jul 11, 2016, at 2:33 PM, Krishnaprasad A S <krishna.pra...@flytxt.com> 
> wrote:
> 
> I can see all the 4 drillbits in ui. Aso I tried the query in web ui before
> running it in zeppelin. In web ui it works currently. Then what may be the
> issue.?
> On Jul 12, 2016 3:00 AM, "Andries Engelbrecht" <aengelbre...@maprtech.com>
> wrote:
> 
> What happens if you try to connect to a drillbit directly?
> 
> This will help to see if it is a zk connection issue.
> 
> Also I assume the dill cluster is up and running, and if you go to the
> webUI it shows all drillbits in the cluster connected and running.
> 
> 
>> On Jul 11, 2016, at 2:27 PM, Krishnaprasad A S <krishna.pra...@flytxt.com>
> wrote:
>> 
>> I went through the same link before configuring the interpreter, also
> there
>> is no security configured.
>> On Jul 12, 2016 2:48 AM, "Andries Engelbrecht" <aengelbre...@maprtech.com>
>> wrote:
>> 
>>> Do you have security configured on the Drill cluster? If so make sure to
>>> add the user and password info for the Drill connection.
>>> 
>>> Some good info for configuring Zeppelin with Drill here
>>> 
>>> https://community.mapr.com/docs/DOC-1493 <
>>> https://community.mapr.com/docs/DOC-1493>
>>> 
>>> --Andries
>>> 
>>> 
>>>> On Jul 11, 2016, at 2:13 PM, Krishnaprasad A S <
>>> krishna.pra...@flytxt.com> wrote:
>>>> 
>>>> My drill runs in clustered mode, with 4 drillbits running in 4 nodes. I
>>>> started it using drillbit.sh start command.
>>>> On Jul 12, 2016 2:18 AM, "Andries Engelbrecht" <
>>> aengelbre...@maprtech.com>
>>>> wrote:
>>>> 
>>>>> Are you running Drill in embedded mode or clustered mode?
>>>>> 
>>>>> If in embedded mode you may want to try to connect directly to the
>>> drillbit
>>>>> jdbc:drill:drillbit=:31010
>>>>> 
>>>>> It looks like you are trying to connect to a zk with a drill cluster,
>>> and
>>>>> your setup may just be embedded mode.
>>>>> 
>>>>> --Andries
>>>>> 
>>>>> 
>>>>>> On Jul 11, 2016, at 12:55 PM, Krishnaprasad A S <
>>>>> krishna.pra...@flytxt.com> wrote:
>>>>>> 
>>>>>> hi,
>>>>>> I'm trying to create a drill interpreter in zeppelin using the
> existing
>>>>>> jdbc interpreter.
>>>>>> 
>>>>>> *drill.url = jdbc:drill:zk=:2181/drill/drillbits1drill.driver =
>>>>>> org.apache.drill.jdbc.Driver*
>>>>>> my drillbit runs on the same server as of zeppelin.
>>>>>> 
>>>>>> Added the dependency
> 'apache-drill-1.7.0/jars/drill-jdbc-all-1.7.0.jar'
>>>>> in
>>>>>> zeppelin
>>>>>> I'm the getting the following error while running a sample SQL from
>>>>>> zeppelin,
>>>>>> 
>>>>>> ERROR [2016-07-12 01:11:41,946] ({pool-1-thread-4}
>>>>>> NotebookServer.java[afterStatusChange]:1135) - Error
>>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>>> org.apache.thrift.transport.TTransportException:
>>>>> java.net.ConnectException:
>>>>>> Connection refused
>>>>>> at
>>>>>> 
>>>>> 
>>> 
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:165)
>>>>>> at
>>>>>> 
>>>>> 
>>> 
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:328)
>>>>>> at
>>>>>> 
>>>>> 
>>> 
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
>>>>>> at
>>>>> org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:260)
>>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
>>>>>> at
>>>>>> 
>>>>> 
>>> 
> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.

Re: Drill JDBC Interpreter: Connection Refused

2016-07-11 Thread Andries Engelbrecht
What happens if you try to connect to a drillbit directly?

This will help to see if it is a zk connection issue.

Also I assume the Drill cluster is up and running, and if you go to the webUI it 
shows all drillbits in the cluster connected and running.


> On Jul 11, 2016, at 2:27 PM, Krishnaprasad A S <krishna.pra...@flytxt.com> 
> wrote:
> 
> I went through the same link before configuring the interpreter, also there
> is no security configured.
> On Jul 12, 2016 2:48 AM, "Andries Engelbrecht" <aengelbre...@maprtech.com>
> wrote:
> 
>> Do you have security configured on the Drill cluster? If so make sure to
>> add the user and password info for the Drill connection.
>> 
>> Some good info for configuring Zeppelin with Drill here
>> 
>> https://community.mapr.com/docs/DOC-1493 <
>> https://community.mapr.com/docs/DOC-1493>
>> 
>> --Andries
>> 
>> 
>>> On Jul 11, 2016, at 2:13 PM, Krishnaprasad A S <
>> krishna.pra...@flytxt.com> wrote:
>>> 
>>> My drill runs in clustered mode, with 4 drillbits running in 4 nodes. I
>>> started it using drillbit.sh start command.
>>> On Jul 12, 2016 2:18 AM, "Andries Engelbrecht" <
>> aengelbre...@maprtech.com>
>>> wrote:
>>> 
>>>> Are you running Drill in embedded mode or clustered mode?
>>>> 
>>>> If in embedded mode you may want to try to connect directly to the
>> drillbit
>>>> jdbc:drill:drillbit=:31010
>>>> 
>>>> It looks like you are trying to connect to a zk with a drill cluster,
>> and
>>>> your setup may just be embedded mode.
>>>> 
>>>> --Andries
>>>> 
>>>> 
>>>>> On Jul 11, 2016, at 12:55 PM, Krishnaprasad A S <
>>>> krishna.pra...@flytxt.com> wrote:
>>>>> 
>>>>> hi,
>>>>> I'm trying to create a drill interpreter in zeppelin using the existing
>>>>> jdbc interpreter.
>>>>> 
>>>>> *drill.url = jdbc:drill:zk=:2181/drill/drillbits1drill.driver =
>>>>> org.apache.drill.jdbc.Driver*
>>>>> my drillbit runs on the same server as of zeppelin.
>>>>> 
>>>>> Added the dependency 'apache-drill-1.7.0/jars/drill-jdbc-all-1.7.0.jar'
>>>> in
>>>>> zeppelin
>>>>> I'm the getting the following error while running a sample SQL from
>>>>> zeppelin,
>>>>> 
>>>>> ERROR [2016-07-12 01:11:41,946] ({pool-1-thread-4}
>>>>> NotebookServer.java[afterStatusChange]:1135) - Error
>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.thrift.transport.TTransportException:
>>>> java.net.ConnectException:
>>>>> Connection refused
>>>>>  at
>>>>> 
>>>> 
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:165)
>>>>>  at
>>>>> 
>>>> 
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:328)
>>>>>  at
>>>>> 
>>>> 
>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
>>>>>  at
>>>> org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:260)
>>>>>  at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
>>>>>  at
>>>>> 
>>>> 
>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
>>>>>  at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>  at
>>>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>  at
>>>>> 
>>>> 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>>  at
>>>>> 
>>>> 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>>  at
>>>>> 
>>>> 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>&

Re: Drill JDBC Interpreter: Connection Refused

2016-07-11 Thread Andries Engelbrecht
Do you have security configured on the Drill cluster? If so make sure to add 
the user and password info for the Drill connection.

Some good info for configuring Zeppelin with Drill here

https://community.mapr.com/docs/DOC-1493 
<https://community.mapr.com/docs/DOC-1493>

--Andries


> On Jul 11, 2016, at 2:13 PM, Krishnaprasad A S <krishna.pra...@flytxt.com> 
> wrote:
> 
> My drill runs in clustered mode, with 4 drillbits running in 4 nodes. I
> started it using drillbit.sh start command.
> On Jul 12, 2016 2:18 AM, "Andries Engelbrecht" <aengelbre...@maprtech.com>
> wrote:
> 
>> Are you running Drill in embedded mode or clustered mode?
>> 
>> If in embedded mode you may want to try to connect directly to the drillbit
>> jdbc:drill:drillbit=:31010
>> 
>> It looks like you are trying to connect to a zk with a drill cluster, and
>> your setup may just be embedded mode.
>> 
>> --Andries
>> 
>> 
>>> On Jul 11, 2016, at 12:55 PM, Krishnaprasad A S <
>> krishna.pra...@flytxt.com> wrote:
>>> 
>>> hi,
>>> I'm trying to create a drill interpreter in zeppelin using the existing
>>> jdbc interpreter.
>>> 
>>> *drill.url = jdbc:drill:zk=:2181/drill/drillbits1drill.driver =
>>> org.apache.drill.jdbc.Driver*
>>> my drillbit runs on the same server as of zeppelin.
>>> 
>>> Added the dependency 'apache-drill-1.7.0/jars/drill-jdbc-all-1.7.0.jar'
>> in
>>> zeppelin
>>> I'm the getting the following error while running a sample SQL from
>>> zeppelin,
>>> 
>>> ERROR [2016-07-12 01:11:41,946] ({pool-1-thread-4}
>>> NotebookServer.java[afterStatusChange]:1135) - Error
>>> org.apache.zeppelin.interpreter.InterpreterException:
>>> org.apache.zeppelin.interpreter.InterpreterException:
>>> org.apache.thrift.transport.TTransportException:
>> java.net.ConnectException:
>>> Connection refused
>>>   at
>>> 
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:165)
>>>   at
>>> 
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:328)
>>>   at
>>> 
>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
>>>   at
>> org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:260)
>>>   at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
>>>   at
>>> 
>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
>>>   at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>   at
>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>   at
>>> 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>   at
>>> 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>   at
>>> 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>   at
>>> 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>   at java.lang.Thread.run(Thread.java:722)
>>> Caused by: org.apache.zeppelin.interpreter.InterpreterException:
>>> org.apache.thrift.transport.TTransportException:
>> java.net.ConnectException:
>>> Connection refused
>>>   at
>>> 
>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
>>>   at
>>> 
>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
>>>   at
>>> 
>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
>>>   at
>>> 
>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
>>>   at
>>> 
>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
>>>   at
>>> 
>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
>>>   at
>>> 
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:184)
>>>   at
>>> 
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:163)
>>>   ... 13 more
>>> Caused by: org.apache.thrift.transport.TTransportException:
>>> java.net.ConnectException: Connection refused
>>>   at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
>>>   at
>>> 
>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
>>>   ... 20 more
>>> Caused by: java.net.ConnectException: Connection refused
>> 
>> 



Re: Drill JDBC Interpreter: Connection Refused

2016-07-11 Thread Andries Engelbrecht
Are you running Drill in embedded mode or clustered mode?

If in embedded mode you may want to try to connect directly to the drillbit
jdbc:drill:drillbit=<drillbit host>:31010

It looks like you are trying to connect to a zk with a drill cluster, and your 
setup may just be embedded mode.

--Andries


> On Jul 11, 2016, at 12:55 PM, Krishnaprasad A S  
> wrote:
> 
> hi,
> I'm trying to create a drill interpreter in zeppelin using the existing
> jdbc interpreter.
> 
> *drill.url = jdbc:drill:zk=:2181/drill/drillbits1drill.driver =
> org.apache.drill.jdbc.Driver*
> my drillbit runs on the same server as of zeppelin.
> 
> Added the dependency 'apache-drill-1.7.0/jars/drill-jdbc-all-1.7.0.jar' in
> zeppelin
> I'm the getting the following error while running a sample SQL from
> zeppelin,
> 
> ERROR [2016-07-12 01:11:41,946] ({pool-1-thread-4}
> NotebookServer.java[afterStatusChange]:1135) - Error
> org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
> Connection refused
>at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:165)
>at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:328)
>at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
>at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:260)
>at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
>at
> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
>at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
> Connection refused
>at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
>at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
>at
> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
>at
> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
>at
> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
>at
> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
>at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:184)
>at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:163)
>... 13 more
> Caused by: org.apache.thrift.transport.TTransportException:
> java.net.ConnectException: Connection refused
>at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
>at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
>... 20 more
> Caused by: java.net.ConnectException: Connection refused



Re: how convert mongodb _id into string

2016-07-07 Thread Andries Engelbrecht
See what is returned when you query _id from sqlline.
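
For example, from sqlline (hypothetical database/collection names):
select t._id from mongo.mydb.`mycollection` t limit 5;
That shows whether the mongo plugin already returns a readable string or the raw ObjectId.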

Also you can use Drill Explorer that ships with the ODBC driver and see what is 
returned from the ODBC driver.

This may help you to troubleshoot where the conversion is done, mongo plugin, 
ODBC driver or Tableau.

--Andries



> On Jul 7, 2016, at 7:57 AM, Vladimir Morozov  wrote:
> 
> Is it possible to get string representation of mongodb object id (_id)
> inside Drill sql statement? I have problems with representing _id in
> Tableau where it is  converted into uppercase string , e.g. MongoDb
> "_id" : ObjectId("577e4a2e081d4b0fb59ac96e")
> become '577E4A2E081D4B0FB59AC96E'  Tableau value ,not sure if it is ODBC or
> Tableau conversion bug
> 
> Thanks
> Vlad



Re: question about drill

2016-07-06 Thread Andries Engelbrecht
You can use wildcards to query directories and subdirectories.

Simple example with a number of csv files in a directory structure:
The dfs.orders workspace root (/) has 10 files
/dir1 has another csv file
/dir2 has a csv file
/subdir/dir3 has another csv file

Below are a couple of different examples

0: jdbc:drill:> select count(*) from dfs.orders.`./`;
+-+
| EXPR$0  |
+-+
| 159000  |
+-+
1 row selected (0.268 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*.csv`;
+-+
| EXPR$0  |
+-+
| 122000  |
+-+
1 row selected (0.137 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./dir1/*.csv`;
+-+
| EXPR$0  |
+-+
| 9000|
+-+
1 row selected (0.099 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./dir2/*.csv`;
+-+
| EXPR$0  |
+-+
| 12000   |
+-+
1 row selected (0.092 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./subdir/dir3/*.csv`;
+-+
| EXPR$0  |
+-+
| 16000   |
+-+
1 row selected (0.1 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/*.csv`;
+-+
| EXPR$0  |
+-+
| 21000   |
+-+
1 row selected (0.12 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/*.csv`;
+-+
| EXPR$0  |
+-+
| 16000   |
+-+
1 row selected (0.106 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*`;
+-+
| EXPR$0  |
+-+
| 159000  |
+-+
1 row selected (0.173 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/*`;
+-+
| EXPR$0  |
+-+
| 37000   |
+-+
1 row selected (0.123 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/.`;
+-+
| EXPR$0  |
+-+
| 159000  |
+-+
1 row selected (0.182 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/.`;
+-+
| EXPR$0  |
+-+
| 37000   |
+-+
1 row selected (0.123 seconds)


Also see
https://drill.apache.org/docs/querying-directories/ 


https://drill.apache.org/docs/query-directory-functions/ 



--Andries



> On Jul 5, 2016, at 11:20 PM, 伍晋博  wrote:
> 
> Hi, I am a master in china and have learned drill for a long time. Now drill 
> has provided functions to query file in a same directory. Drill scans all 
> files in a same directory firstly, then executes other operations for a 
> query. But I have a requirement that require to query multiple files in 
> different directories once. I don't want to move all files in a same 
> directory that will lead some I/O cost. Now I have a idea is to add function 
> in source data to support this function, but I don't have enough ability to 
> understand source data. Can you give some advice on this problem, thank you !
> Jinbo Wu



Re: Drillbit endpoint not found

2016-06-09 Thread Andries Engelbrecht
You have to specify the zk port in the JDBC URL with sqlline:
sqlline -u jdbc:drill:zk=127.0.0.1:2181

If you use a zk quorum you can specify all the zk nodes in similar fashion.
Also if you use JDBC connectivity with other tools you will need to add the 
drill cluster ID.
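
For example (hypothetical hostnames, default cluster ID):
sqlline -u jdbc:drill:zk=zk1:2181,zk2:2181,zk3:2181/drill/drillbits1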

For more info and examples see

https://drill.apache.org/docs/using-the-jdbc-driver/ 


--Andries



> On Jun 9, 2016, at 7:25 AM, Sebastian Hamann  wrote:
> 
> Hi,
> First: Thanks for all the help for my last question.
> Now I got another problem:
> I was running drill in embedded mode and my autosetup(download everything 
> from java to drill and set it up and run queries) works perfectly for it. The 
> next step is to bring this to a cluster.
> For testing purposes I want to run zookeeper, drillbit and the drill-client 
> all on my main node for node. If that works the next step would be to start 
> drillbits on all nodes and find a good configuration for the zookeeper, but 
> Iam not so far yet. (just mentioning here if someone has a better idea I am 
> always welcoming new ideas).
> So now to the actual problem:
> So the zookeeper is running on the localhost/127.0.0.1:2181 and I can connect 
> to it/check if its running
> (cfg consists only of tickTime=2000 dataDir=/tmp/zookeeper clientPort=2181).
> Then I start the drillbit with the drillbit.sh on the same node and when I 
> check the status it says that its running.
> When I know try to start the drill client, either via drill-localhost, 
> drill-conf (drill.exec: {cluster-id: "drillbits1",zk.connect: 
> "127.0.0.1:2181"}) or sqlline -u jdbc:drill:zk:127.0.0.1 (which if I 
> understood it right should have all the same effect in my configuration), I 
> get the error message at the bottom (currently drill is always supposed to 
> execute that test query).
> Again, thanks in advance!
> Error:
> No DrillbitEndpoint can be found
> apache drill 1.6.0
> "drill baby drill"
> 0: jdbc:drill:> select count(*) from (values(1));
> 0: jdbc:drill:> No current connection
> java.lang.IllegalStateException: No DrillbitEndpoint can be found
>at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:173)
>at 
> org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:205)
>at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnectionImpl.java:151)
>at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:64)
>at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69)
>at 
> net.hydromatic.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:126)
>at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
>at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
>at 
> sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:213)
>at sqlline.Commands.close(Commands.java:925)
>at sqlline.Commands.closeall(Commands.java:899)
>at sqlline.SqlLine.begin(SqlLine.java:649)
>at sqlline.SqlLine.start(SqlLine.java:375)
>at sqlline.SqlLine.main(SqlLine.java:268)



Re: Converting INTERVAL to Number

2016-05-25 Thread Andries Engelbrecht
Great!

Answering your own questions on email lists can be therapeutic ;-)

Why not just use
EXTRACT(year from age(dob))

--Andries


> On May 25, 2016, at 7:39 AM, John Omernik  wrote:
> 
> Well I need to include the Staples "That was Easy" button here... I tried:
> 
> EXTRACT(year from cast(age(dob) as INTERVAL YEAR)) as yr_age
> 
> And it worked!
> 
> Self Answering question is self answering...
> 
> 
> 
> 
> On Wed, May 25, 2016 at 9:35 AM, John Omernik  wrote:
> 
>> Hey all, simple question, I have a field, dob, I want to get the current
>> age from...
>> 
>> I have:
>> 
>> cast(age(dob) as INTERVAL YEAR) as yr_age
>> 
>> Which works pretty well, as you can see below, however, I'd like a column
>> that is just the integer age, no months, no P/Y etc.  Now, I can play with
>> string manipulation for a really ugly query, but I was hoping there may be
>> a way to just convert the INTERVAL YEAR (or heck even the return of the age
>> function) to the number of years...
>> 
>> So instead of P25Y, it would be 25, P52Y1M would be 52, P21Y8M would be 21
>> etc.
>> 
>> 
>> Any easy way to do this? If I have to go the string route, is there an
>> easy way to do it in drill as well (I miss the Hive "regexp_extract" in
>> this case)
>> 
>> 
>> John
>> 
>> 
>> select dob, age(dob) as cur_age, cast(age(dob) as INTERVAL YEAR) as
>> yr_age  from am_joined where substr(dob, 6, 2) <> '00' limit 10;
>> 
>> 
>> +-+--+--+
>> 
>> | dob | cur_age  |  yr_age  |
>> 
>> +-+--+--+
>> 
>> | 1991-09-29  | P300M5DT18000S   | P25Y |
>> 
>> | 1965-01-06  | P625M17DT18000S  | P52Y1M   |
>> 
>> | 1995-01-12  | P260M4DT18000S   | P21Y8M   |
>> 
>> | 1988-08-01  | P338M19DT18000S  | P28Y2M   |
>> 
>> | 1984-03-05  | P392M9DT18000S   | P32Y8M   |
>> 
>> | 1980-12-13  | P431M17DT18000S  | P35Y11M  |
>> 
>> | 1976-11-28  | P480M23DT18000S  | P40Y |
>> 
>> | 1984-11-12  | P383M27DT18000S  | P31Y11M  |
>> 
>> | 1965-01-20  | P625M3DT18000S   | P52Y1M   |
>> 
>> | 1984-04-19  | P390M24DT18000S  | P32Y6M
>> 



Re: Discussion - "Hidden" Workspaces

2016-05-25 Thread Andries Engelbrecht
It is an interesting idea, but it may warrant more discussion in the context of overall 
Drill metadata management.

For example, how will it affect other storage plugins (SPs) that are not DFS?
How will it be represented/managed in INFORMATION_SCHEMA when tools are used to 
work with Drill metadata?
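(Many tools discover schemas with something like: select SCHEMA_NAME from INFORMATION_SCHEMA.`SCHEMATA`; a hidden flag would presumably have to filter there as well as in show schemas.)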

I agree this is a good idea, but we need to take all aspects into consideration, since 
Drill is a very powerful tool for data discovery and the overall ecosystem needs to be 
considered.

--Andries


> On May 25, 2016, at 5:05 AM, John Omernik  wrote:
> 
> Prior to opening a JIRA on this, I was curious what the community thought.
>  I'd like to have a setting for workspaces that would indicate "hidden".
> (Defaulting to false if not specified to not break any already implemented
> workspace definitions)
> 
> For example:
> 
> "workspaces" {
>   "dev": {
>   "location": "/mydev",
>   "writable": true,
>   "defaultInputFormat": null,
>   "hidden": true
>  }
> }
> 
> This would have the effect that when running "show schemas" this workspace
> would not show up in the list.
> 
> Reasoning:  When organizing a large enterprise data
> lake/ocean/cistern/swamp, limited "functional" options provided to the user
> are better then "all" the options.   For example, as an administrator, I
> may want to define workspaces to help clarify ETL processes, or service
> loads that if the user HAS filesystem access they CAN access, however, they
> will never want to, instead, the user would focused on cleaned/enriched
> data.  My users would rarely use the "cp" plugin, however, I don't want to
> eliminate it.  Basically, it doesn't show in show schema, but it can still
> be used both directly in queries, and through the use command.
> 
> Another example: I create home schemas based on a home directory of every
> user.  Users's will know it's there, and can easily access it, however,
> showing up in "show schemas" doesn't provide value, and just clutters the
> data returned in the response.  I want to attempt to provide a clean
> interface and depiction of valuable schemas to my user via workspaces, and
> this small flag, I believe would be a low impact way to do that.
> 
> I would love discussion on this, if others would find this valuable, I will
> happily make a JIRA.
> 
> John



Re: Reading Parquet Files Created Elsewhere

2016-05-23 Thread Andries Engelbrecht
John,

See if convert_from helps in this regard; I believe it is supposed to be faster 
than casting to varchar.

This is likely what will work on your data:
CONVERT_FROM(<column>, 'UTF8')
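
For example, something along these lines (hypothetical column and path names):
select CONVERT_FROM(name_col, 'UTF8') as name_col from dfs.`/data/parquet_table` limit 10;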

Hopefully someone with more in depth knowledge of the Drill Parquet reader can 
comment.

--Andries



> On May 23, 2016, at 7:35 AM, John Omernik  wrote:
> 
> I am learning more about my data here, the data was created in a CDH
> version of the apache parquet-mr library. (Not sure version yet, getting
> that soon).  They used snappy and version 1.0 of the Parquet spec due to
> Impala needing it.  They are also using setEnableDictionary on the write.
> 
> Trying to figure things out right now
> 
> If I make a view and cast all string fields to a VARCHAR drill shows the
> right result, but it's slow.
> 
> (10 row select from raw = 1.9 seconds, 10 row select with CAST in a view =
> 25 seconds)
> 
> I've resigned myself to converting the table once for performance, which
> isn't an issue however I am getting different issues on that front  (I'll
> open a new thread for that)
> 
> Other than the cast(field AS VARCHAR) as field  is there any other (perhaps
> more performant) way to handle this situation?
> 
> 
> 
> 
> 
> On Mon, May 23, 2016 at 8:31 AM, Todd  wrote:
> 
>> 
>> Looks like Impala encoded string as binary data, I think there is some
>> configuration in Drill(I know spark has) that helps do the conversion.
>> 
>> 
>> 
>> 
>> 
>> At 2016-05-23 21:25:17, "John Omernik"  wrote:
>>> Hey all, I have some Parquet files that I believe were made in a Map
>> Reduce
>>> job and work well in Impala, however, when I read them in Drill, the
>> fields
>>> that are strings come through as [B@25ddbb etc. The exact string
>>> represented as regex would be /\[B@[a-f0-9]{8}/  (Pointers maybe?)
>>> 
>>> Well, I found I  can cast those fields as Varchar... and get the right
>>> data... is this the right approach?  Why is this happening? Performance
>>> wise am I hurting something by doing the cast to Varchar?
>>> 
>>> 
>>> Any thoughts would be helpful...
>>> 
>>> John
>> 



Re: Drill Views and Typed Columns

2016-05-16 Thread Andries Engelbrecht
John,

Using a simple tool like SQuirreL and running a select * ... limit 0 against 
parquet, it is able to retrieve the column names, and if you look at the 
returned metadata it identifies the data types. Obviously schema change will be 
a challenge to deal with in limit 0 queries. This is where views can be used 
to force data types for the end tool, skip rows, substitute values, skip added 
columns, etc., which makes it much easier for most tools to work with the data.
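
A minimal sketch of such a view (hypothetical workspace, path and column names):
create view dfs.tmp.`orders_v` as
select cast(order_id as int) as order_id,
       cast(order_date as date) as order_date,
       cast(amount as double) as amount
from dfs.`/data/orders`;
Tools that issue limit 0 queries against the view then get stable column names and types back.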

A function like typeof does require a row to be returned to identify the data 
type.


Ideally it would be great to do initial schema discovery with a tool, associate 
the metadata with the data sources and then make it available for wider use. A 
typeof histogram would be brilliant on data structures with schema changes; that 
way it would be possible to see how many records are affected by a certain 
type/change/etc. An endless list of possibilities. It should be feasible to 
utilize Drill as the execution engine with a smart tool on top of it to process.

--Andries


> On May 16, 2016, at 4:08 PM, John Omernik  wrote:
> 
> So how does a limit0 query return the type if I may ask?  I can use limit 0
> queries with SqlAlchemy and Caravel, but in looking at it, I wasn't sure
> how where the types were returned in the results (unless it's in the
> underlying metadata not displayed to the user).  I can dig more if you tell
> me it's there :)
> 
> On Mon, May 16, 2016 at 4:51 PM, Neeraja Rentachintala <
> nrentachint...@maprtech.com> wrote:
> 
>> Both are options (and thats how the bI tools work with Drill today)
>> 
>> - Views with explicit Casts - Will return schema definitions as part show
>> schemas/describe table/show columns queries.
>> - Limit 0 queries - This will be good as well if we can modify Caravel to
>> issue such queries (this is what Tableau does)
>> 
>> For now, I think returning metadata to Caravel using the above options will
>> be the solution. The ideal approach would be actually to have a  data
>> exploration experience on raw data (without curation) within Caravel itself
>> to create this metadata as needed.
>> 
>> -Neeraja
>> 
>> On Mon, May 16, 2016 at 2:45 PM, Ted Dunning 
>> wrote:
>> 
>>> As you suggest, views are a critical way to lock down that kind of
>>> information.
>>> 
>>> Select with limit 0 is often used for meta-data exploration. This is more
>>> robust than asking about tables since not everything is necessarily
>> really
>>> in a single table.
>>> 
>>> On Mon, May 16, 2016 at 2:12 PM, John Omernik  wrote:
>>> 
 Hey all, as part of my exploration of Caravel,  I realized knowing the
 types of columns can be valuable... I can say create a view of a
>>> directory
 of parquet allowing the "show tables" to work well, however, the type
>> for
 every column is "ANY" which may work (need to tweak some things) but I
>> am
 guessing may make certain down stream things in Caravel more difficult.
 
 So, just thinking aloud here, would it be possible to "cast" in Views
>> to
 allow the view definition to pass along type information?  Even if it
>>> means
 a more verbose view definition, it would be done once, and then down
>>> stream
 tools like Caravel would know the types...
 
 Thoughts?
 
 John
 
>>> 
>> 



Re: MapR ODBC Issue

2016-05-16 Thread Andries Engelbrecht
For windows the GUI doesn't allow the extended ZK connection string.

So the best option is (as Krystal mentioned)

ZKQuorum= <zk node>:5181 or 2181
ZKClusterID = /<zk root>/<drill cluster-id>

ex.

ZKQuorum= drill-dev:5181
ZKClusterID = /drill2/awsdrill-drillbits


Works on both Windows and OSX.

--Andries

> On May 16, 2016, at 8:54 AM, Andries Engelbrecht <aengelbre...@maprtech.com> 
> wrote:
> 
> Or as Krystal mentioned just specify the whole cluster ID from the ZK root
> 
> ZKQuorum= :5181 or 2181
> ZKClusterID =/ /
> 
> ex.
> 
> ZKQuorum= drill-dev:5181
> ZKClusterID = /drill2/awsdrill-drillbits
> 
> 
> Both works when I tested in OSX.
> 
> --Andries
> 
>> On May 16, 2016, at 8:50 AM, Andries Engelbrecht <aengelbre...@maprtech.com> 
>> wrote:
>> 
>> Hi John,
>> 
>> It seems the ODBC driver is adding in the default /drill path the ZK 
>> connection string.
>> 
>> I looked at it and here is a workaround.
>> 
>> For OSX/Linux in the odbc.ini file the following works
>> 
>> ZKQuorum= :5181 or 2181/
>> ZKClusterID = /
>> 
>> ex.
>> 
>> ZKQuorum= drill-dev:5181/drill2
>> ZKClusterID = /awsdrill-drillbits
>> 
>> This allows you to add the drill root at the end of the ZK quorum and then 
>> step back up one level on the ClusterID.
>> 
>> See if this works for you.
>> 
>> I will check on Windows in a sec.
>> 
>> 
>> --Andries
>> 
>> 
>> 
>> 
>>> On May 14, 2016, at 9:15 AM, John Omernik <j...@omernik.com> wrote:
>>> 
>>> All -
>>> 
>>> I am using the MapR ODBC driver. I can get it to connect in direct mode,
>>> however, I can't get it to connect in Zookeeper mode.  I think I know why.
>>> 
>>> To start off, I am using a different zk.root in my drill-override. This
>>> allows me to have truly unique drill clusters on the same physical
>>> clusters. This works well for most things, however, what I have found, is I
>>> believe the ODBC driver assumes a hard coded zk.root. (drill).
>>> 
>>> For example, my cluster name is "drillprod" and my zk.root is "drillprod"
>>> So, to connect via JDBC I use URL=
>>> "jdbc:drill:zk:hadoopmapr4:5181,hadoopmapr5:5181,hadoopmapr6:5181/drillprod"
>>> 
>>> This works
>>> 
>>> However, I set the ZKClusterID  to be drillprod in ODBC and I get
>>> 
>>> Failure occurred while trying to connect to zk=
>>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
>>> hadoopmapr6.brewingintel.com:5181/drill/drillprod
>>> 
>>> 
>>> If set the ZKQuorum to be "hadoopmapr4.brewingintel.com:5181,
>>> hadoopmapr5.brewingintel.com:5181,
>>> hadoopmapr6.brewingintel.com:5181/drillprod"
>>> 
>>> 
>>> Then I get this error:
>>> 
>>> 
>>> Failure occurred while trying to connect to zk=
>>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
>>> hadoopmapr6.brewingintel.com:5181/drillprod/drill/drillprod
>>> 
>>> 
>>> The Znode: /drill/drillprod doesn't exist, neither does
>>> /drillprod/drill/drillprod. The correct one, for this use case is
>>> "/drillprod/drillprod"
>>> 
>>> 
>>> But I can't seem to get the ODBC driver to check there, as it appears to
>>> hard code a /drill rather than allow me to specify the zk.root as the
>>> drill-override.conf does.
>>> 
>>> 
>>> Please advise
>>> 
>>> 
>>> John
>> 
> 



Re: MapR ODBC Issue

2016-05-16 Thread Andries Engelbrecht
Or as Krystal mentioned just specify the whole cluster ID from the ZK root

ZKQuorum= <zk node>:5181 or 2181
ZKClusterID = /<zk root>/<drill cluster-id>

ex.

ZKQuorum= drill-dev:5181
ZKClusterID = /drill2/awsdrill-drillbits


Both work when I tested on OSX.

--Andries

> On May 16, 2016, at 8:50 AM, Andries Engelbrecht <aengelbre...@maprtech.com> 
> wrote:
> 
> Hi John,
> 
> It seems the ODBC driver is adding in the default /drill path the ZK 
> connection string.
> 
> I looked at it and here is a workaround.
> 
> For OSX/Linux in the odbc.ini file the following works
> 
> ZKQuorum= :5181 or 2181/
> ZKClusterID = /
> 
> ex.
> 
> ZKQuorum= drill-dev:5181/drill2
> ZKClusterID = /awsdrill-drillbits
> 
> This allows you to add the drill root at the end of the ZK quorum and then 
> step back up one level on the ClusterID.
> 
> See if this works for you.
> 
> I will check on Windows in a sec.
> 
> 
> --Andries
> 
> 
> 
> 
>> On May 14, 2016, at 9:15 AM, John Omernik <j...@omernik.com> wrote:
>> 
>> All -
>> 
>> I am using the MapR ODBC driver. I can get it to connect in direct mode,
>> however, I can't get it to connect in Zookeeper mode.  I think I know why.
>> 
>> To start off, I am using a different zk.root in my drill-override. This
>> allows me to have truly unique drill clusters on the same physical
>> clusters. This works well for most things, however, what I have found, is I
>> believe the ODBC driver assumes a hard coded zk.root. (drill).
>> 
>> For example, my cluster name is "drillprod" and my zk.root is "drillprod"
>> So, to connect via JDBC I use URL=
>> "jdbc:drill:zk:hadoopmapr4:5181,hadoopmapr5:5181,hadoopmapr6:5181/drillprod"
>> 
>> This works
>> 
>> However, I set the ZKClusterID  to be drillprod in ODBC and I get
>> 
>> Failure occurred while trying to connect to zk=
>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
>> hadoopmapr6.brewingintel.com:5181/drill/drillprod
>> 
>> 
>> If set the ZKQuorum to be "hadoopmapr4.brewingintel.com:5181,
>> hadoopmapr5.brewingintel.com:5181,
>> hadoopmapr6.brewingintel.com:5181/drillprod"
>> 
>> 
>> Then I get this error:
>> 
>> 
>> Failure occurred while trying to connect to zk=
>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
>> hadoopmapr6.brewingintel.com:5181/drillprod/drill/drillprod
>> 
>> 
>> The Znode: /drill/drillprod doesn't exist, neither does
>> /drillprod/drill/drillprod. The correct one, for this use case is
>> "/drillprod/drillprod"
>> 
>> 
>> But I can't seem to get the ODBC driver to check there, as it appears to
>> hard code a /drill rather than allow me to specify the zk.root as the
>> drill-override.conf does.
>> 
>> 
>> Please advise
>> 
>> 
>> John
> 



Re: MapR ODBC Issue

2016-05-16 Thread Andries Engelbrecht
Hi John,

It seems the ODBC driver is adding the default /drill path to the ZK connection 
string.

I looked at it and here is a workaround.

For OSX/Linux in the odbc.ini file the following works

ZKQuorum= <zk node>:5181 or 2181/<zk root>
ZKClusterID = /<drill cluster-id>

ex.

ZKQuorum= drill-dev:5181/drill2
ZKClusterID = /awsdrill-drillbits

This allows you to add the drill root at the end of the ZK quorum and then step 
back up one level on the ClusterID.

See if this works for you.

I will check on Windows in a sec.


--Andries




> On May 14, 2016, at 9:15 AM, John Omernik  wrote:
> 
> All -
> 
> I am using the MapR ODBC driver. I can get it to connect in direct mode,
> however, I can't get it to connect in Zookeeper mode.  I think I know why.
> 
> To start off, I am using a different zk.root in my drill-override. This
> allows me to have truly unique drill clusters on the same physical
> clusters. This works well for most things, however, what I have found, is I
> believe the ODBC driver assumes a hard coded zk.root. (drill).
> 
> For example, my cluster name is "drillprod" and my zk.root is "drillprod"
> So, to connect via JDBC I use URL=
> "jdbc:drill:zk:hadoopmapr4:5181,hadoopmapr5:5181,hadoopmapr6:5181/drillprod"
> 
> This works
> 
> However, I set the ZKClusterID  to be drillprod in ODBC and I get
> 
> Failure occurred while trying to connect to zk=
> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
> hadoopmapr6.brewingintel.com:5181/drill/drillprod
> 
> 
> If set the ZKQuorum to be "hadoopmapr4.brewingintel.com:5181,
> hadoopmapr5.brewingintel.com:5181,
> hadoopmapr6.brewingintel.com:5181/drillprod"
> 
> 
> Then I get this error:
> 
> 
> Failure occurred while trying to connect to zk=
> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
> hadoopmapr6.brewingintel.com:5181/drillprod/drill/drillprod
> 
> 
> The Znode: /drill/drillprod doesn't exist, neither does
> /drillprod/drill/drillprod. The correct one, for this use case is
> "/drillprod/drillprod"
> 
> 
> But I can't seem to get the ODBC driver to check there, as it appears to
> hard code a /drill rather than allow me to specify the zk.root as the
> drill-override.conf does.
> 
> 
> Please advise
> 
> 
> John



Re: workspaces

2016-05-13 Thread Andries Engelbrecht
Alternatively you can just create the storage plugin json file, delete the old 
one and post the new one using the REST API.
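
A quick sketch with curl (assuming no authentication, a plugin named dfs, and the new definition saved in dfs.json):
curl -X DELETE http://localhost:8047/storage/dfs.json
curl -X POST -H "Content-Type: application/json" -d @dfs.json http://localhost:8047/storage/dfs.json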

See
https://drill.apache.org/docs/rest-api/#storage 
<https://drill.apache.org/docs/rest-api/#storage>



> On May 13, 2016, at 10:05 AM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:
> 
> I have used a pipeline involving jq and curl to modify storage plugins via
> the rest interface.  The one below adds a workspace to the dfs plugin:
> 
> curl -s localhost:8047/storage.json | jq '.[] | select(.name == "dfs")
> | .config.workspaces |= . + { "nypdmvc": { "location":
> "/Users/vince/data/nyc/nypdmvc", "writable": true,
> "defaultInputFormat": null}  }' | curl -s -X POST -H "Content-Type:
> application/json" -d @- http://localhost:8047/storage/dfs.json 
> <http://localhost:8047/storage/dfs.json>
> 
> Note that this won't work as is if you have authentication enabled.
> 
> On Friday, May 13, 2016, Odin Guillermo Caudillo Gallegos <
> odin.guille...@gmail.com <mailto:odin.guille...@gmail.com>> wrote:
> 
>> I have the restriction to not configure it via web console, so is there a
>> way to configure them on the terminal?
>> Cause in embed mode, i only create the files on the /tmp/ directory via
>> terminal, also on the drill-override.conf file i use another path for the
>> plugins (with sys.store.provider.local.path)
>> 
>> Thanks.
>> 
>> 2016-05-13 11:33 GMT-05:00 Andries Engelbrecht <aengelbre...@maprtech.com 
>> <mailto:aengelbre...@maprtech.com>
>> <javascript:;>>:
>> 
>>> You should start drill in distributed mode first and then configure the
>>> storage plugins.
>>> If you configure the storage plugins in embedded mode the information is
>>> stored in the tmp space instead of registered with ZK for the cluster to
>>> use.
>>> 
>>> --Andries
>>> 
>>>> On May 13, 2016, at 9:08 AM, Odin Guillermo Caudillo Gallegos <
>>> odin.guille...@gmail.com <javascript:;>> wrote:
>>>> 
>>>> The plugins are working fine in the embbed mode, but when i start the
>>>> drillbit on each server and connect via drill-conf i don't see them.
>>>> Do i need to configure another parameter apart from the zookeeper
>> servers
>>>> in the drill-override.conf file?
>>>> 
>>>> 2016-05-13 11:01 GMT-05:00 Andries Engelbrecht <
>>> aengelbre...@maprtech.com <javascript:;>>:
>>>> 
>>>>> If Drill was correctly installed in distributed mode the storage
>> plugin
>>>>> and workspaces will be used by the Drill cluster.
>>>>> 
>>>>> Make sure the plugin and workspace was correctly configured and
>>> accepted.
>>>>> 
>>>>> Are you using the WebUI or REST to configure the storage plugins?
>>>>> 
>>>>> --Andries
>>>>> 
>>>>>> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
>>>>> odin.guille...@gmail.com <javascript:;>> wrote:
>>>>>> 
>>>>>> Is there a way to configure workspaces on a distributed installation?
>>>>>> Cause i only see the default plugin configuration but not the one
>> that
>>> i
>>>>>> created.
>>>>>> 
>>>>>> Thanks
>>>>> 
>>>>> 
>>> 
>>> 
>> 
> 
> 
> --



Re: workspaces

2016-05-13 Thread Andries Engelbrecht
You should start drill in distributed mode first and then configure the storage 
plugins.
If you configure the storage plugins in embedded mode the information is stored 
in the tmp space instead of being registered with ZK for the cluster to use.

--Andries

> On May 13, 2016, at 9:08 AM, Odin Guillermo Caudillo Gallegos 
> <odin.guille...@gmail.com> wrote:
> 
> The plugins are working fine in the embbed mode, but when i start the
> drillbit on each server and connect via drill-conf i don't see them.
> Do i need to configure another parameter apart from the zookeeper servers
> in the drill-override.conf file?
> 
> 2016-05-13 11:01 GMT-05:00 Andries Engelbrecht <aengelbre...@maprtech.com>:
> 
>> If Drill was correctly installed in distributed mode the storage plugin
>> and workspaces will be used by the Drill cluster.
>> 
>> Make sure the plugin and workspace was correctly configured and accepted.
>> 
>> Are you using the WebUI or REST to configure the storage plugins?
>> 
>> --Andries
>> 
>>> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
>> odin.guille...@gmail.com> wrote:
>>> 
>>> Is there a way to configure workspaces on a distributed installation?
>>> Cause i only see the default plugin configuration but not the one that i
>>> created.
>>> 
>>> Thanks
>> 
>> 



Re: workspaces

2016-05-13 Thread Andries Engelbrecht
If Drill was correctly installed in distributed mode the storage plugin and 
workspaces will be used by the Drill cluster.

Make sure the plugin and workspace was correctly configured and accepted.

Are you using the WebUI or REST to configure the storage plugins?

--Andries

> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos 
>  wrote:
> 
> Is there a way to configure workspaces on a distributed installation?
> Cause i only see the default plugin configuration but not the one that i
> created.
> 
> Thanks



Re: join fail

2016-05-10 Thread Andries Engelbrecht
See if increasing planner.memory.max_query_memory_per_node helps. It is set to 
2G by default.
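
For example (value is in bytes, roughly 4GB here):
alter session set `planner.memory.max_query_memory_per_node` = 4294967296;
Use alter system instead of alter session to make it the default for all sessions.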

Also see

https://community.mapr.com/thread/10329 

https://community.mapr.com/docs/DOC-1497 



--Andries


> On May 10, 2016, at 12:06 AM, leezy  wrote:
> 
> 
> 
> I am sorry the picture cannot be read. here is the text:
> 
> 
> 
> 
> | Minor Fragment ID | Host Name | Start | End | Runtime | Max Records | Max 
> Batches | Last Update | Last Progress | Peak Memory | State |
> | 05-00-xx | bigdata2 | 1.529s | 24m46s | 24m45s | 71,555,857 | 21,985 | 
> 12:16:02 | 12:16:02 | 7.68GB | CANCELLATION_REQUESTED |
> | 05-01-xx | bigdata4 | 1.566s | 24m47s | 24m45s | 71,567,860 | 21,990 | 
> 12:16:02 | 12:16:02 | 7.69GB | CANCELLATION_REQUESTED |
> | 05-02-xx | bigdata3 | 1.526s | 22m15s | 22m13s | 71,551,338 | 21,981 | 
> 12:13:30 | 12:13:25 | 7.68GB | RUNNING |
> | 05-03-xx | bigdata2 | 1.530s | 25m16s | 25m15s | 71,565,965 | 21,982 | 
> 12:16:32 | 12:16:32 | 7.70GB | CANCELLATION_REQUESTED |
> | 05-04-xx | bigdata4 | 1.567s | 25m17s | 25m15s | 71,556,216 | 21,988 | 
> 12:16:32 | 12:16:32 | 7.70GB | CANCELLATION_REQUESTED |
> | 05-05-xx | bigdata3 | 1.527s | 22m15s | 22m13s | 71,550,821 | 21,983 | 
> 12:13:30 | 12:13:25 | 7.68GB | RUNNING |
> | 05-06-xx | bigdata2 | 1.531s | 25m46s | 25m45s | 71,558,873 | 21,983 | 
> 12:17:02 | 12:17:02 | 7.70GB | CANCELLATION_REQUESTED |
> | 05-07-xx | bigdata4 | 1.568s | 25m47s | 25m45s | 71,558,657 | 21,982 | 
> 12:17:02 | 12:17:02 | 7.67GB | CANCELLATION_REQUESTED |
> | 05-08-xx | bigdata3 | 1.528s | 22m15s | 22m13s | 71,558,109 | 21,989 | 
> 12:13:30 | 12:13:25 | 7.69GB | RUNNING |
> | 05-09-xx | bigdata2 | 1.532s | 26m17s | 26m15s | 71,558,226 | 21,983 | 
> 12:17:32 | 12:17:32 | 7.68GB | CANCELLATION_REQUESTED |
> | 05-10-xx | bigdata4 | 1.568s | 26m17s | 26m15s | 71,558,359 | 21,980 | 
> 12:17:32 | 12:17:32 | 7.68GB | CANCELLATION_REQUESTED |
> | 05-11-xx | bigdata3 | 1.529s | 22m15s | 22m13s | 71,537,409 | 21,975 | 
> 12:13:30 | 12:13:30 | 7.69GB | RUNNING |
> | 05-12-xx | bigdata2 | 1.533s | 26m47s | 26m45s | 71,037,058 | 21,800 | 
> 12:18:02 | 12:18:02 | 7.63GB | CANCELLATION_REQUESTED |
> | 05-13-xx | bigdata4 | 1.569s | 26m47s | 26m45s | 71,040,788 | 21,805 | 
> 12:18:02 | 12:18:02 | 7.64GB | CANCELLATION_REQUESTED |
> | 05-14-xx | bigdata3 | 1.529s | 22m15s | 22m13s | 71,001,401 | 21,791 | 
> 12:13:30 | 12:13:25 | 7.65GB | RUNNING |
> | 05-15-xx | bigdata2 | 1.533s | 27m17s | 27m15s | 71,007,623 | 21,793 | 
> 12:18:32 | 12:18:32 | 7.62GB | CANCELLATION_REQUESTED |
> | 05-16-xx | bigdata4 | 1.570s | 27m17s | 27m15s | 71,029,560 | 21,793 | 
> 12:18:32 | 12:18:32 | 7.63GB | CANCELLATION_REQUESTED |
> | 05-17-xx | bigdata3 | 1.530s | 22m15s | 22m13s | 71,057,938 | 21,806 | 
> 12:13:30 | 12:13:30 | 7.64GB | RUNNING |
> 
> 
> 
> --
> 
> 
> thanks for your regards.
> 
> 
> At 2016-05-10 15:01:14, "leezy"  wrote:
> 
> here is the failure profiles. And i see the memory is not exceed 55G in each 
> node.
> 
> 
> 
> 
> 
> 
> 
> --
> 
> 
> thanks for your regards.
> 
> 
> 
> 
> At 2016-05-10 14:47:10, "leezy"  wrote:
>> Leon,thank you for your reply, yes, i have set the driver memory to 55G and 
>> the java heap memory to 8G. And this user case is run successfully in the 
>> impala that installed on the same cluster. But in drill , the cmd always 
>> show the follow errors:
>> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
>> the query.
>> 
>> Failure allocating buffer.
>> Fragment 3:6
>> 
>> [Error Id: d623b6b2-279d-4d24-af4d-0b62554b440c on bigdata2:31010] 
>> (state=,code=0)
>> 
>> 
>> i think i miss some configuration.
>> 
>> 
>> 
>> 
>> --
>> 
>> 
>> thanks for your regards.
>> 
>> 
>> 
>> 
>> At 2016-05-10 13:51:12, "Leon Clayton"  wrote:
>>> did you increase the memory setting for Drill from the default?
>>> 
>>> https://drill.apache.org/docs/configuring-drill-memory/ 
>>> 
>>> 
>>> 
 On 10 May 2016, at 02:25, lizhenm...@163.com wrote:
 
 
 hi:
 i run join operation in the drill, i use broadcast and put the small table 
 in the right. The small table has 3200 rows. I have set the 
 planner.broadcast_threshold to 1. The cluster has three nodes and 
 every node has 64G memory. when join is running, the memory is increasing 
 untill the driilbit process exit. But the same query is run successful in 
 the impala and they are in the same cluster.
 here is the query plan.
 
 00-00Screen : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE 
 totalRevenue, ANY avgPageRank): rowcount = 1.0, cumulative cost = 
 {5.29400561759E8 rows, 6.356723058846001E10 cpu, 0.0 io, 
 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5015
 00-01  

Re: How to query from hbase where table name is in : format

2016-05-04 Thread Andries Engelbrecht
Missed that part.

Have you tried hbase.namespace.table in Drill?
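
i.e. something like this untested sketch, using the table from your example:
select convert_from(row_key, 'UTF8') as rk from hbase.school.students limit 5;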

--Andries

> On May 4, 2016, at 2:09 PM, Chandrashekhar Shaw 
> <sha...@yahoo-inc.com.INVALID> wrote:
> 
> It does not talk anything about table with : qualifier> format. Apparently drill is yet to support this. 
> 
> ~
> Chandrashekhar 
> 
>    On Wednesday, 4 May 2016 11:55 AM, Andries Engelbrecht 
> <aengelbre...@maprtech.com> wrote:
> 
> 
> See https://drill.apache.org/docs/querying-hbase/ 
> <https://drill.apache.org/docs/querying-hbase/>
> 
> Should cover your question and some additional info that will be useful in 
> querying HBase.
> 
> --Andries
> 
> 
>> On May 4, 2016, at 11:01 AM, Chandrashekhar Shaw 
>> <sha...@yahoo-inc.com.INVALID> wrote:
>> 
>> Hi,
>> What will be the corresponding command in sql for hbase command> get 
>> 'school:students', 'student1'
>> Where school is namespace, students is table name, student1 is row key.
>> 
>> ~Thanks
>> Chandrashekhar
> 
> 



Re: How to query from hbase where table name is in : format

2016-05-04 Thread Andries Engelbrecht
See https://drill.apache.org/docs/querying-hbase/ 


It should cover your question, plus some additional info that will be useful when 
querying HBase.

--Andries


> On May 4, 2016, at 11:01 AM, Chandrashekhar Shaw 
>  wrote:
> 
> Hi,
> What will be the corresponding command in sql for hbase command> get 
> 'school:students', 'student1'
> Where school is namespace, students is table name, student1 is row key.
> 
> ~Thanks
> Chandrashekhar



Re: remotely using drill - is this possible ?

2016-04-19 Thread Andries Engelbrecht
You have to use the Drill ODBC driver. Download it and then configure it.
The Drill ODBC driver comes with a very nice tool called Drill Explorer that is 
pretty handy for viewing and working with Drill data sources.

https://drill.apache.org/docs/installing-the-driver-on-windows/ 


https://drill.apache.org/docs/configuring-odbc-on-windows/ 


--Andries


 
> On Apr 19, 2016, at 6:31 AM, M Gates  wrote:
> 
> Thanks for the info.
> One thing I am trying to do is on Windows connect drill through the the 
> Hortonworks Hive ODBC driver. Bit stuck on how to configure this in drill, 
> any how to’s on available ?
> 
> Mark.
> 
>> On Apr 18, 2016, at 1:31 PM, Jim Bates  wrote:
>> 
>> Absolutely. Use the JDBC/ODBC interfaces and a workspace / storage plugin
>> writing the data into a distributed file system.
>> 
>> https://drill.apache.org/docs/odbc-jdbc-interfaces/
>> 
>> 
>> 
>> On Mon, Apr 18, 2016 at 12:25 PM, M Gates  wrote:
>> 
>>> 
>>> 
>>> Question on drill, completely new to this tool and it looks awesome.
>>> 
>>> Is there a way to run drill remotely and over say over ODBC/JDBC driver
>>> send a query to my hadoop cluster ? i.e.: drill is on my client computer
>>> but the write stays in my hadoop workspace.
>>> 
>>> Thanks,
>>> Mark.
> 



Re: am I using table functions incorrectly?

2016-04-13 Thread Andries Engelbrecht
Vince,

Issue #1
You have to specify type => 'text'; this seems to work for that issue.
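
e.g. adapting the first query below (the tab delimiter problem in issue #2 is separate):
select columns[0] as a from table(dfs.tmp.`sample_cast.tsv`(type => 'text', skipFirstLine => true));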

Issue #2

I could not find a workaround or resolution for using tab in the fieldDelimiter.


--Andries

> On Apr 13, 2016, at 5:56 AM, Vince Gonzalez  wrote:
> 
> Issue #1 - I can't just specify some of the options - the issue appears to
> be that I didn't also include some other options:
> 
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint)
> as b from table(dfs.tmp.`sample_cast.tsv`(skipFirstLine => true));
> Apr 13, 2016 8:48:51 AM
> org.apache.calcite.sql.validate.SqlValidatorException 
> *SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: DEFAULT is
> only allowed for optional parameters*
> Apr 13, 2016 8:48:51 AM org.apache.calcite.runtime.CalciteException 
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> column 76 to line 1, column 115: DEFAULT is only allowed for optional
> parameters
> Error: VALIDATION ERROR: From line 1, column 76 to line 1, column 115:
> DEFAULT is only allowed for optional parameters
> 
> SQL Query null
> 
> [Error Id: 0e17e5a8-9b34-46a4-a029-441d1a305fd1 on 172.30.1.144:31010]
> (state=,code=0)
> 
> 
> Issue #2 - I can't specify a tab delimiter in the table function because it
> maybe counts the characters rather than trying to interpret as a character
> escape code?
> 
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint)
> as b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter =>
> '\t', skipFirstLine => true));
> *Error: PARSE ERROR: Expected single character but was String: \t*
> 
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> 
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010]
> (state=,code=0)
> 
> 
> 
> 
> Vince Gonzalez
> Systems Engineer
> 212.694.3879
> 
> mapr.com



Re: high latency of querying hive datasource

2016-04-07 Thread Andries Engelbrecht
In addition to what Sudheesh mentioned about posting the query profile:

You also mention you are using Drill in embedded mode. Is Drill on the same 
system as the Hive warehouse, or is Hive on a remote system/cluster?

A bit more information about your setup will be helpful.

--Andries


> On Apr 7, 2016, at 2:42 PM, Sudheesh Katkam  wrote:
> 
> Can you gather the query profile from the web UI [1]?
> 
> This mailing list does not accept attachments; please put the profile in a 
> shareable location (Dropbox?), and post a link.
> 
> Thank you,
> Sudheesh
> 
> [1] https://drill.apache.org/docs/query-profiles/ 
> 
> 
>> On Apr 7, 2016, at 2:16 PM, Tomek W  wrote:
>> 
>> Hello,
>> I configured hive datasource. It is simple file system. These data are
>> saved by spark application (/user/hive/warehaouse).
>> 
>> I am able from command line of drill query data. However, for 1200 rows
>> table it consumes 20s.
>> 
>> Be honestly, I am launching embeded instance of your drill, but 20s. seems
>> to be very much. Given that this table is very small.
>> 
>> How to improve this result?
>> First of all, is it possible to  keep table in memory (RAM)
>> 
>> Thanks in advance,
>> 
>> Tom
> 



Re: Making a Drill plugin as "runnable jar"

2016-03-21 Thread Andries Engelbrecht
Magnus, 

This question may be better posted on the dev list than the user list.

--Andries

> On Mar 21, 2016, at 8:32 AM, Magnus Pierre  wrote:
> 
> I am messing about with refining my XML parser for Apache Drill,  and I would 
> like to make a simple cmd-line interface towards the XML parser i built 
> (which is not dependent on any Drill code, but used by Drill in my plugin), 
> in the same jar, by providing a simple main making it possible to test-run 
> the parser individually from Apache Drill. In order to do this I need to make 
> the module jar built from Apache Drill runnable which seems to be very easy 
> to achieve but I can not get it to take effect.
> 
> How to achieve this with the Drill project pom’s? Trying all kinds of 
> combinations (based on my limited Maven understanding), but it seems that I 
> am not able to add a manifest file with main class for a module.
> 
> Any ideas appreciated.
> 
> Regards,
> Magnus
> 



Re: Drill on EMR hive with S3 backed storage

2016-03-20 Thread Andries Engelbrecht
With Drill 1.3 you probably want to use the s3a library to start with.
I don't believe you need the jets3t library for that anymore.

However your issue seems to be some confusion between the Hive plugin and the 
actual path to the s3 bucket. Perhaps someone with in-depth knowledge of the 
Hive plugin can provide advice here. Did you install the drillbits on the same 
nodes as the Hive cluster, or is it a separate cluster pointed at the same s3 
buckets?

--Andries
 

> On Mar 16, 2016, at 6:40 PM, Vincent Meng  wrote:
> 
> Hi all,
> 
> I have a EMR hive cluster and all the tables are external tables where
> files are stored on s3.
> 
> Following drill tutorials I've setup drill embed on my local and I can
> successfully connect to remove hive cluster. I can list all the tables in
> the hive cluster.
> 
> However when I do `select .. from` type of queries on those tables drill
> complains about "Error: SYSTEM ERROR: IOException:
> /path/to/hive/table/folder doesn't exist" (assume the actual s3 path is
> "s3://my-bucket-name/path/to/hive/table/folder"). I can see
> /path/to/hive/table/folder is the correct path (but without the
> s3://my-bucket-name prefix).
> 
> My hive storage configuration is like this:
> 
> {
>  "type": "hive",
>  "enabled": true,
>  "configProps": {
>"hive.metastore.uris": "thrift://ip-*.ec2.internal:9083",
>"javax.jdo.option.ConnectionURL":
> "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
>"hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
>"fs.default.name": "s3://", # I also tried s3a and s3n,
> none of them works..
>"hive.metastore.sasl.enabled": "false"
>  }
> }
> 
> I'm using drill 1.5.0 and jets3t-0.9.2 as 3rd party library. I tried to
> enable s3 and it works fine, so my aws creds is all right and configured
> correct.
> 
> Any help will be appreciated! I'm stuck on this for two days. I don't have
> any clue to debug this now.
> 
> Thank you very much
> 
> -Vincent



Re: Strange error with hive null data

2016-03-18 Thread Andries Engelbrecht
Try using is null instead of = 'null'.

--Andries
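
For illustration only, a minimal sketch of the IS NULL form, reusing two of the column names from the CTAS quoted below:

-- IS NULL avoids comparing a numeric column with the string literal 'null'
select
  case when `CURR_FISCAL_WEEK` is null then null
       else cast(`CURR_FISCAL_WEEK` as int)
  end as `CURR_FISCAL_WEEK`,
  case when `LOAD_DTS` is null then null
       else cast(`LOAD_DTS` as timestamp)
  end as `LOAD_DTS`
from hive.dim.`YEAR_DATE_DIM`;

Note that casting a NULL value simply yields NULL, so for columns that are already typed in Hive the CASE wrapper may not be needed at all.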

> On Mar 18, 2016, at 7:36 PM, Jason Altekruse  wrote:
> 
> I get that, but you are trying to compare your integers (curr_fiscal_week)
> with the string 'null' in your case statement. Comparison between a string
> and an int results in casting the string as an int. The string that is
> failing to parse is your literal, not data appearing in the file.
> 
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
> 
>> On Fri, Mar 18, 2016 at 7:33 PM, Christopher Matta  wrote:
>> 
>> Jason,
>> The first column is stored as an INT on disk, but there are no NULL values
>> in it, the only NULL values are in the timestamp column (the last two
>> records).
>> 
>> Chris Matta
>> cma...@mapr.com
>> 215-701-3146
>> 
>> On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekruse 
>> wrote:
>> 
>>> Does the first column have a type of int? In that case what might be
>>> failing here is your string literal 'null' failing to be implicitly cast
>> to
>>> match the column.
>>> 
>>> Jason Altekruse
>>> Software Engineer at Dremio
>>> Apache Drill Committer
>>> 
>>> On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta 
>>> wrote:
>>> 
 I have some Hive data loaded with sqoop that looks like this:
 
 0: jdbc:drill:> select curr_fiscal_week, load_dts from
 hive.dim.`YEAR_DATE_DIM`;
 +-------------------+------------------------+
 | curr_fiscal_week  |        load_dts        |
 +-------------------+------------------------+
 | 12.0              | 2014-02-07 15:11:35.0  |
 | 12.0              | 2014-02-07 15:11:35.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2012-11-08 00:00:00.0  |
 | 12.0              | 2015-01-13 00:00:00.0  |
 | 12.0              | 2015-01-13 00:00:00.0  |
 | 12.0              | null                   |
 | 12.0              | null                   |
 +-------------------+------------------------+
 26 rows selected (0.229 seconds)
 
 When I’m trying to create a drill parquet table by using a CTAS with a
>>> CASE
 statement to handle the nulls I get an error:
 
 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT
 . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`,
 . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE
 CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP)
 . . . . . . . > END as `FISCAL_YR_BEG_DT`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE
 CAST(`FISCAL_YR_END_DT` as TIMESTAMP)
 . . . . . . . > END as `FISCAL_YR_END_DT`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_YR` as INT)
 . . . . . . . > END as `CURR_FISCAL_YR`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_HALF` as INT)
 . . . . . . . > END as `CURR_FISCAL_HALF`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_QTR` as INT)
 . . . . . . . > END as `CURR_FISCAL_QTR`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_MO` as INT)
 . . . . . . . > END as `CURR_FISCAL_MO`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_WEEK` as INT)
 . . . . . . . > END as `CURR_FISCAL_WEEK`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE
 CAST(`LOAD_DTS` as TIMESTAMP)
 . . . . . . . > END as `LOAD_DTS`
 . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`;
 Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
 
 Fragment 0:0
 
 [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on 

Re: Encountered [DRILL-962]

2016-03-08 Thread Andries Engelbrecht
Tableau can generate different SQL syntax for a given function, and that is 
controlled by the TDC file.

What version of the ODBC driver are you using?

Also make sure the TDC file shipped with the ODBC driver is being used by 
Tableau. If you are unsure see the documentation.
https://drill.apache.org/docs/using-apache-drill-with-tableau-9-desktop/ 

https://drill.apache.org/docs/using-apache-drill-with-tableau-9-server/ 



--Andries
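
If rewriting the filter is an option, the same seven-day window can usually be written with plain interval arithmetic instead of the multiplied-interval form Tableau generated; whether that sidesteps DRILL-962 depends on what exactly triggers it, so treat this as a sketch (the view name is hypothetical):

-- plain interval arithmetic instead of the "-6 * INTERVAL '1' DAY" form
select *
from dfs.tmp.`my_view`
where `End Time` >= current_date - interval '6' day
  and `End Time` <  current_date + interval '1' day;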


> On Mar 8, 2016, at 7:43 AM, Kashmar, Ali  wrote:
> 
> Hello,
> 
> I’m trying to visualize data in HDFS using Tableau via Drill. I managed to 
> set it up and everything looks good, except that I ran into the issue 
> described in https://issues.apache.org/jira/browse/DRILL-962. The sql 
> fragment in question is this and it’s generated by Tableau :
> 
> WHERE ((`Custom_SQL_Query`.`End Time` >= (CAST(CURRENT_TIMESTAMP AS DATE) + 
> -6 * INTERVAL '1' DAY)) AND (`Custom_SQL_Query`.`End Time` < 
> (CAST(CURRENT_TIMESTAMP AS DATE) + 1 * INTERVAL '1' DAY)))
> 
> Are there any workarounds for this? Or should I just avoid using Date filters 
> in my dashboards for now?
> 
> Regards,
> 
> Ali Kashmar
> EMC| Open Innovation Lab
> O: 613.321.1334 | M: 613.219.9361



Re: Parallelism / data locality in HDFS/MapRFS

2016-03-07 Thread Andries Engelbrecht
You may want to compare the query plans between the 3 scenarios to see which 
operators the time is spent on and how well they are parallelized. 

The expectation would be that Parquet will perform better than JSON.

--Andries
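
To compare the three scenarios, the plan for the full-scan query can be pulled with EXPLAIN; a sketch with hypothetical path and field names:

-- pulls the physical plan so operator parallelization can be compared
explain plan for
select t.group1.field1
from dfs.`/data/json` t;

The number of minor fragments per major fragment in the query profile, together with the planner.width.max_per_node session option, indicates how widely each scan is being spread across the drillbits.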


> On Mar 7, 2016, at 3:02 PM, Eric Pederson  wrote:
> 
> We are using MapR M3 and are querying multiple JSON files - around 250
> files at 1.5 GB per file.   We have a small cluster of three machines
> running Drill 1.4.  The JSON is nested three-four levels deep, in a format
> like:
> {
>  { "group1":
> { "field1": 42,
>   "field2: [ "a", "b", "c" ],
>   ...
> }
>   { "group2":
>  
>   }
>   ...
> }
> 
> There are about 500 objects like this in each JSON file.
> 
> I've been testing a set of queries that scan all of the data (we're
> investigating a partitioning strategy but haven't settled on one that will
> fit all of our queries which are fairly ad-hoc).   These full-scan queries
> typically take 1 minute, 20 seconds using the default settings  If I limit
> the query to a single file the query takes a few seconds.
> 
> I wanted to see how the number of Drillbits would impact the query time, to
> try to extrapolate to the number of servers needed to reach a performance
> number.   Here are the numbers that we saw:
> - 1 Drillbit: 3:45
> - 2 Drillbits: 1:56
> - 3 Drillbits: 1:20
> 
> The performance flattens out between two and three Drillbits.   I was
> surprised to see that, given the single file query performance.  I was
> hoping to throw hardware at the performance a bit more.   Is that
> surprising to you?
> 
> A somewhat related question.  Does Drill take advantage of HDFS locality?
> That is, will it send certain fragments to certain boxes because it knows
> those boxes have the data replicated locally?  Actually in our setup (3
> servers) that might be a moot point assuming every box has all blocks.  I'm
> not sure if MapRFS changes that.
> 
> I also tried converting the JSON files to Parquet using CTAS.  The Parquet
> queries took much longer than the JSON queries.  Is that expected as well?
> 
> Thanks,
> 
> 
> 
> -- 
> Sent from Gmail Mobile



Re: Question Regarding How to search excel file using drill.

2016-02-26 Thread Andries Engelbrecht
You can use the Drill ODBC driver, and then query Drill using MS Query or the 
same way you would query another DB engine through an ODBC DSN.

--Andries

> On Feb 26, 2016, at 2:09 AM, Sanjiv Kumar C  wrote:
> 
> Hello
>   I want to know how to query from excel file(.xslx) and ms access
> file using drill.
> 
> -- 
> Thanks & Regards
>  * Sanjiv Kumar*



Re: Apache Drill - Read Java Objects

2016-02-24 Thread Andries Engelbrecht
I don't believe the current reader can take a text field in a JSON object and 
read it as TSV/CSV to get the columns. As Neeraja mentioned, it will help if you 
can provide more specific information and perhaps a sample of what you are 
trying to achieve. Is the text field with tab-delimited values structured exactly 
the same in each JSON object, etc.?

A couple of quick thoughts based on the limited information.

- Look at creating a UDF that allows you to extract the tab-delimited data as 
columns from the text field.

- If the tab data is not very wide you can use existing SQL string functions to 
extract the data into columns and cast as needed.
E.g. a quick SQL hack to decompose an IP address (this can get expensive and very 
cumbersome if you have a lot of fields in the TSV data):

select p2.ip_address, p2.part1, p2.part2, substr(p2.rest2, 1, 
locate('.',p2.rest2)-1) as part3,
substr(rest2, locate('.',rest2)+1) as part4
from
(select p1.ip_address, p1.part1, substr(rest1, 1, locate('.',rest1)-1) as part2,
substr(rest1, locate('.',rest1)+1) as rest2
from
(select ip_address, substr(ip_address, 1, locate('.',ip_address)-1) as part1,
substr(ip_address, locate('.',ip_address)+1) as rest1 from `/ip`) as p1) as p2

+---------------+--------+--------+--------+--------+
|  ip_address   | part1  | part2  | part3  | part4  |
+---------------+--------+--------+--------+--------+
| 172.16.254.1  | 172    | 16     | 254    | 1      |
+---------------+--------+--------+--------+--------+

- Write the tab-delimited data out with a key to a CSV/TSV file with Drill and 
see if you can join it back to the original JSON data.


If writing out the data is not an option and you have a lot of TSV fields, the 
UDF option is likely your best bet, unless there is an option in the reader I'm 
missing.


--Andries

> On Feb 24, 2016, at 12:41 PM, Neeraja Rentachintala 
>  wrote:
> 
> Jorge
> can you give an example of what you are looking to accomplish here.
> Based on your description, it seems to me that you might be able to use the
> functions listed here.
> https://drill.apache.org/docs/supported-data-types/#data-types-for-convert_to-and-convert_from-functions
> 
> 
> 
> On Wed, Feb 24, 2016 at 12:14 PM, jorge_...@yahoo.com.INVALID <
> jorge_...@yahoo.com.invalid> wrote:
> 
>> Can you please reply to my question below? We need to know if it is
>> possible.. The company I work for is probably MapR's largest customer and I
>> would appreciate your help.
>> 
>> Thanks,
>> Jorge
>> 
>> 
>> Sent from my iPhone
>> 
>>> On Feb 18, 2016, at 9:48 PM, jorge gonzalez  wrote:
>>> 
>>> Hello,
>>> 
>>> The company I currently work for stores it's data in the form of java
>> objects in several MapR clusters. These java objects have a string field
>> with tab delimited data. They are looking to start using Apache Drill to
>> first load the java objects and then read the tab delimited data/string
>> field.
>>> 
>>> Is this too difficult to accomplish? What are the necessary steps?
>>> 
>>> Thanks in advance for your help.
>>> 
>>> Regards,
>>> Jorge
>> 



Re: what am I missing?

2016-02-22 Thread Andries Engelbrecht
When running in clustered mode you need ZK running, and you need to point Drill to it appropriately.

It seems you don't have ZK running in your environment.

You can, however, connect multiple sessions to Drill in embedded mode.
Simply point directly to the drillbit as opposed to using ZK.

I.e. start Drill with drill-embedded; this will open a sqlline session. From 
another terminal window you can then run ./sqlline -u 
jdbc:drill:drillbit=localhost if it is on the same machine, or just point to 
the hostname from another machine. If you use ODBC or JDBC, the drillbit listens on 
port 31010 for client connections by default.

--Andries
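
Once a connection does succeed (embedded or distributed), a quick sanity check is to list the registered drillbits:

-- lists the drillbits currently registered with the cluster
select * from sys.drillbits;

In distributed mode this should show every node started with drillbit.sh; if ZooKeeper is not running, the drillbits cannot register in the first place.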


> On Feb 22, 2016, at 8:33 AM, Ted Schwartz  wrote:
> 
> I'm new to drill and trying to get up and running. My goal is to access 
> drill from a JDBC client. I'm a bit confused when starting drill. If I use 
> drill-embedded, it appears it only allows one connection and that 
> connection is started along with drill.  So if instead I try to run in 
> distributed mode, I have modified drill-override.conf like this:
> 
> drill.exec: {
>  cluster-id: "drillbits1",
>  zk.connect: "myhost:2181"
> }
> 
> and start drillbit(?)  with this:
> 
> drillbit.sh start
> 
> I get a "starting drillbit..." message, but cannot connect in any way to 
> it.
> 
> If I try to connect using sqlline, it fails with "Connection timed out":
> 
> sqlline -u jdbc:drill:zk=myhost:2181
> 
> Same problems if I try to connect using the drill-conf utility (which 
> appears to be nothing other than an invocation of sqlline)
> 
> No matter how I start drill, it doesn't appear any ports are opened for 
> the process. For example, `netstat -a | grep 2181` doesn't yield any 
> results, and the http port 8047 that works when I start in embedded mode 
> doesn't work in distributed mode.
> 
> I feel like I am missing something fundamental to all of this, although 
> I'm trying to follow the Getting Started documentation.  I've seen 
> references to "making sure you start Zookeeper". How do I do that? I find 
> lots of details about starting drillbit, but nothing about starting 
> Zookeeper.
> 
> Thanks in advance for any clue that can help me move forward.



Re: Apache Drill and missing add Criteria on Microsoft Query

2016-02-14 Thread Andries Engelbrecht
Pictures don't show on the email list; you may want to post a link instead.


In regards to the original question: the application manages the criteria to be 
inserted into the SQL that is sent to Drill via the ODBC driver. For example, 
MicroStrategy allows you to do FreeForm SQL, where you can add criteria that are 
added to the query before it is sent to Drill. In this case you may want to look 
into the MS Query/Excel features available. This really needs to be managed by 
the application and not the ODBC driver.

The ODBC driver does ship with Drill Explorer to help you do data discovery 
from a GUI interface, but that is likely not exactly what you are looking for.

--Andries

> On Feb 14, 2016, at 2:56 AM, Paolo Spanevello  wrote:
> 
> Dear All,
> 
> this feature could be really helpful.
> 
> I add this image to help to understand what I would like to say.
> 
> 
> 
> 
> Best,
> Paolo
> 
> 2016-02-06 17:54 GMT+01:00 Paolo Spanevello  >:
> Dear All,
> 
> nobody as an answer about it. this feature could be really helpful.
> 
> Best,
> Paolo
> 
> 
> Dear all,
> i'm drilling with ODBC on microsoft Excel and I need to add criteria like in 
> the link above:
> 
> https://support.office.com/en-us/article/Use-parameters-to-ask-for-input-when-running-a-query-c2806d3d-d500-45a8-8507-ec6af351b6ed
>  
> 
> 
> How is this possible enable?
> 
> Best regards,
> Paolo
> 
> 2016-01-23 21:01 GMT+01:00 Paolo Spanevello  >:
> Dear all,
> i'm drilling with ODBC on microsoft Excel and I need to add criteria like in 
> the link above:
> 
> https://support.office.com/en-us/article/Use-parameters-to-ask-for-input-when-running-a-query-c2806d3d-d500-45a8-8507-ec6af351b6ed
>  
> 
> 
> How is this possible enable?
> 
> Best regards,
> Paolo
> 
> 



Re: handling Date

2016-02-12 Thread Andries Engelbrecht
You can use to_timestamp to get the timestamp and then just cast the timestamp 
to a date or a time.

0: jdbc:drill:> select CAST(to_timestamp('2016/01/16 09:44:28 UTC', 'yyyy/MM/dd 
HH:mm:ss z') as date) from (values(1));
+-------------+
|   EXPR$0    |
+-------------+
| 2016-01-16  |
+-------------+
1 row selected (0.148 seconds)
0: jdbc:drill:> select CAST(to_timestamp('2016/01/16 09:44:28 UTC', 'yyyy/MM/dd 
HH:mm:ss z') as time) from (values(1));
+-----------+
|  EXPR$0   |
+-----------+
| 09:44:28  |
+-----------+
1 row selected (0.15 seconds)

--Andries
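
Putting the two together against the source column, a sketch that assumes the field is named `Date` and is exposed as a string (the table path is hypothetical):

-- splits the string field into separate date and time columns
select cast(to_timestamp(`Date`, 'yyyy/MM/dd HH:mm:ss z') as date) as event_date,
       cast(to_timestamp(`Date`, 'yyyy/MM/dd HH:mm:ss z') as time) as event_time
from dfs.tmp.`mydata`;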

> On Feb 12, 2016, at 7:15 AM, Paolo Spanevello  wrote:
> 
> Dear All,
> 
> I have a field with a date like this:
> 
> 
> Date
> 
> 2016/01/16 09:44:28 UTC
> I would like to split it in Date, Time and remove "UTC".
> 
> Somebody can support me?
> 
> Best,
> Paolo


