Connecting Tableau 10.1 to Drill cluster issue

2016-12-01 Thread Tomislav Novosel

Dear team,

I am having an issue connecting Tableau 10.1 to a six-node Apache Drill
(1.8.0) cluster. I installed the MapR ODBC Driver (32-bit) on my 64-bit
Windows machine, configured the driver, and tested the connection both with
the ZooKeeper quorum and without it (direct connection). The connection
succeeded, and the Drill Explorer connection succeeded.

The problem is connecting Tableau to the Drill cluster. After selecting the
ODBC connection in Tableau and providing all the connection data for the ZK
quorum or a direct drillbit connection, Tableau raises the error:


"
# The protocol is disconnected!
# Unable to connect using the DSN named "MapR Drill". Check that the DSN exists and is a valid connection.
"

The DSN exists and works well. The Tableau TDC file is also installed. The
ZK hostnames and all the other Drill cluster hostnames, including their IP
addresses, are saved in the drivers/etc/hosts file on the local machine.
The Drill cluster is running smoothly.

I would appreciate any help.

Regards,
Tomislav

--



*- Disclaimer -*

*This e-mail message and its attachments may contain privileged and/or 
confidential information. Please do not read the message if you are not its 
designated recipient. If you have received this message by mistake, please 
inform its sender and destroy the original message and its attachments 
without reading or storing of any kind. Any unauthorized use, distribution, 
reproduction or publication of this message is forbidden. Poslovna 
inteligencija d.o.o. is neither responsible for the contents of this 
message, nor for the consequences arising from actions based on the 
forwarded information, nor do opinions contained within this message 
necessarily reflect the official opinions of Poslovna inteligencija d.o.o. 
Considering the lack of complete security of e-mail communication, Poslovna 
inteligencija d.o.o. is not responsible for the potential damage created 
due to infection of an e-mail message with a virus or other malicious 
program, unauthorized interference, erroneous or delayed delivery of the 
message due to technical problems. Poslovna inteligencija d.o.o. reserves 
the right to supervise and store both incoming and outgoing  e-mail 
messages.*


Unable to connect Tableau 9.2 to Drill cluster using zookeeper quorum

2016-12-01 Thread Anup Tiwari
Hi Team,

I am trying to connect to my drill cluster from tableau using MapR Drill
ODBC Driver.

I followed the steps given in
https://drill.apache.org/docs/using-apache-drill-with-tableau-9-server/ and
the subsequent links, and successfully connected to an individual "direct
drillbit" as described in the docs. But when I try to connect via the
"zookeeper quorum" instead of a "direct drillbit", I get the error below in
the MapR interface:

FAILED!
[MapR][Drill] (1010) Error occurred while trying to connect: [MapR][Drill]
(20) The hostname of '10.x.x.x' cannot be resolved. Please check your DNS
setup or connect directly to Drillbit.

Please note that I am giving the IPs directly (the Drill hosts are on AWS),
so I believe I don't have to maintain DNS entries in the hosts file.

Also, the corresponding ZooKeeper logs are as follows:

2016-12-01 18:08:42,541 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection
from /192.*.*.*:53159
2016-12-01 18:08:42,543 [myid:3] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@854] - Connection request from old
client /192.*.*.*:53159; will be dropped if server is in r-o mode
2016-12-01 18:08:42,543 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish
new session at /192.*.*.*:53159
2016-12-01 18:08:42,546 [myid:3] - INFO
[CommitProcessor:3:ZooKeeperServer@645] - Established session
0x358ba2951720006 with negotiated timeout 3 for client /192.*.*.*:53159
2016-12-01 18:08:42,793 [myid:3] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x358ba2951720006, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:745)
2016-12-01 18:08:42,794 [myid:3] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
client /192.*.*.*:53159 which had sessionid 0x358ba2951720006
2016-12-01 18:08:42,795 [myid:3] - ERROR
[CommitProcessor:3:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at
org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at
org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
at
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at
org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:77)


I have gone through this link, but it didn't help me:
http://stackoverflow.com/questions/30940981/zookeeper-error-cannot-open-channel-to-x-at-election-address

Regards,
*Anup Tiwari*


Re: Slow query on parquet imported from SQL Server while the external SQL server is down.

2016-12-01 Thread John Omernik
@Abhishek,

Do you think the issue is related to any storage plugin that is enabled but
not available, given that it applies to all queries?  I guess if it's an
issue where all queries are slow because the foreman is waiting to
initialize ALL storage plugins, regardless of their applicability to the
queried data, then that is a more general issue (which should still be
resolved; does the foreman need to initialize all plugins before querying
specific data?)  However, I am still concerned that the query on the CTAS
parquet data is specifically slower because of its source.  @Rahul, could
you test a different Parquet table, NOT loaded from the SQL server, to see
if enabling or disabling the JDBC storage plugin (with the server
unavailable) has any impact?  Basically, I want to ensure that data created
as a Parquet table via CTAS is 100% free of any links to the source data.
This is EXTREMELY important.

John



On Thu, Dec 1, 2016 at 12:46 AM, Abhishek Girish 
wrote:

> Thanks for the update, Rahul!
>
> On Wed, Nov 30, 2016 at 9:45 PM Rahul Raj  >
> wrote:
>
> > Abhishek,
> >
> > Your observation is correct, we just verified that:
> >
> > >1. The queries run as expected (faster) with the Jdbc plugin disabled.
> > >2. Queries run as expected when the plugin's datasource is running.
> > >3. With the datasource down, queries run very slowly, waiting for the
> > >connection to fail
> >
> > Rahul
> >
> > On Thu, Dec 1, 2016 at 10:07 AM, Abhishek Girish <
> > abhishek.gir...@gmail.com>
> > wrote:
> >
> > > @John,
> > >
> > > I agree that this should work. While I am not certain, I don't think
> > > the issue is specific to a particular plugin, but rather to the way
> > > that, in a query's lifecycle, the foreman attempts to initialize every
> > > enabled storage plugin before proceeding to execute the query. So when
> > > a particular plugin isn't configured correctly, or the underlying
> > > datasource is not up, this can drastically slow down query execution.
> > >
> > > I'll look up to see if we have a JIRA for this already - if not will
> file
> > > one.
> > >
> > > On Wed, Nov 30, 2016 at 8:12 AM, John Omernik 
> wrote:
> > >
> > > > So just my opinion in reading this thread.  (Sorry for swooping in
> > > > and opining.)
> > > >
> > > > If a CTAS is done from any data source into Parquet files, there
> > > > should be NO dependency on the original data source to query the
> > > > resultant Parquet files.  As a Drill user and a Drill admin, this
> > > > breaks the principle of least surprise.  If I take data from one
> > > > source and create Parquet files in a distributed file system, it
> > > > should just work.  If there are "issues" with the JDBC plugin or the
> > > > HBase/Hive plugins in a similar manner, these need to be hunted down
> > > > by a large group of villagers with pitchforks and torches.  I just
> > > > can't see how this could be acceptable at any level.  The whole idea
> > > > of Parquet files is that they are self-describing, schema-included
> > > > files; thus a read of a directory of Parquet files should have NO
> > > > dependencies on anything but the parquet files... even the Parquet
> > > > "additions" (such as the metadata cache) should be a fail-open
> > > > thing... if it exists, great, use it and speed things up, but if it
> > > > doesn't, read the parquet files as normal (which I believe is how it
> > > > operates).
> > > >
> > > > John
> > > >
> > > > On Wed, Nov 30, 2016 at 12:12 AM, Abhishek Girish <
> > > > abhishek.gir...@gmail.com
> > > > > wrote:
> > > >
> > > > > Can you attempt to disable the jdbc plugin (configured with
> SQLServer)
> > > and
> > > > > try the query (on parquet) when SQL Server is offline?
> > > > >
> > > > > I've seen a similar issue previously when the HBase / Hive plugin
> was
> > > > > enabled but either the plugin configuration was wrong or the
> > underlying
> > > > > data source was down.
> > > > >
> > > > > On Fri, Nov 25, 2016 at 3:21 AM, Rahul Raj
> > >  > > > > com>
> > > > > wrote:
> > > > >
> > > > > > I have created a parquet file using CTAS from an MS SQL Server.
> > > > > > The query on parquet gets stuck in the STARTING state for a long
> > > > > > time before returning the results.
> > > > > >
> > > > > > We could see that Drill was trying to connect to the MS SQL
> > > > > > Server from which the data was imported. The MSSQL server was
> > > > > > down; Drill threw the exception "Failure while attempting to load
> > > > > > JDBC schema" and then returned the results. While the SQL Server
> > > > > > is running, the query executes without issues.
> > > > > >
> > > > > > Why is Drill querying the DB metadata externally, and not the
> > > > > > imported parquet files?
> > > > > >
> > > > > > Rahul.
> > > > > >

Re: Unable to connect Tableau 9.2 to Drill cluster using zookeeper quorum

2016-12-01 Thread Andries Engelbrecht
When using a ZK connection string with either JDBC or ODBC, always make
sure that the hostnames can be resolved.

See http://drill.apache.org/docs/odbc-configuration-reference/ 


Also make sure that hostnames can be resolved for all Drillbit nodes.

A short explanation of the mechanics:

- When the drillbits start up, they register with ZK.
- ZK keeps the list of registered drillbits by the HOSTNAMES of the nodes
(not their IPs). Use zkCli.sh to check the registered nodes and the actual
resolution with GET in ZK.
- When the client uses ODBC or JDBC to connect to ZK, it gets the list of
available drillbits and chooses one to connect to by its HOSTNAME.
- The client then tries to connect to the chosen drillbit using the ZK info
(which is the hostname).

If the client is unable to resolve the hostname of the chosen drillbit, the
connection will fail.

Using a ZK connection allows for HA for new connections, and it balances
the connection-management and foreman duties across multiple connections by
spreading them over the registered drillbits. Connecting directly to a
drillbit is good for testing, but it is not ideal for larger-scale and
production environments.
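Since every hostname in the ZK-returned drillbit list must resolve on the client machine, a quick pre-flight check can save debugging time. Here is a minimal sketch; the drillbit hostnames below are hypothetical placeholders, so substitute the names zkCli.sh reports for your cluster:

```python
import socket

def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver maps hostname to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# Hypothetical drillbit hostnames, as ZK would return them:
for host in ["drillbit1.example.internal", "drillbit2.example.internal"]:
    status = "ok" if can_resolve(host) else "UNRESOLVED: add to hosts file or DNS"
    print(f"{host}: {status}")
```

Any hostname that prints as unresolved needs an entry in the client's hosts file (or in DNS) before the ZK connection string will work.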


--Andries


> On Dec 1, 2016, at 5:01 AM, Anup Tiwari  wrote:
> [quoted message trimmed; see the original above]



Re: Connecting Tableau 10.1 to Drill cluster issue

2016-12-01 Thread Andries Engelbrecht
There is an issue with Tableau 10.1 and the generic ODBC connection to Drill.

The workaround is to start Tableau 10.1 with the following command-line
parameter (for example, appended to the Tableau shortcut target on Windows):

-DProtocolServerReconnect=1


--Andries



> On Dec 1, 2016, at 3:55 AM, Tomislav Novosel  
> wrote:
> [quoted message trimmed; see the original above]



Re: Slow query on parquet imported from SQL Server while the external SQL server is down.

2016-12-01 Thread Abhishek Girish
AFAIK, it should apply to all queries, irrespective of the source of the
data or the plugins involved in the query. So when this issue occurs, I
would expect any query to take long to execute.

On Thu, Dec 1, 2016 at 5:47 AM John Omernik  wrote:

> [quoted message trimmed; see the original above]

Building a LogicalPlan

2016-12-01 Thread Chris Baynes
Hi,

We have a use case in which we want to construct queries programmatically,
which Drill would then execute.

So far I've been able to configure a JDBC StorageEngine and initialize a
LogicalPlan (using the builder) with it. I am having difficulty trying to
configure the scan, project, and filter operators. I know I need to
construct a LogicalOperator; these are implemented in
org.apache.drill.common.logical.data.

However to construct an operator instance I need one or more of:
JSONOptions, NamedExpression, LogicalExpression, FieldReference.

So right now I have something like this, but don't know how to fill in the
missing pieces...

// scan
JSONOptions opts = new JSONOptions(...);
builder.addLogicalOperator(Scan.builder().storageEngine("pg").selection(opts).build());

// project
NamedExpression nex = new NamedExpression(...);
builder.addLogicalOperator(Project.builder().addExpr(nex));

LogicalPlan plan = builder.build();

Is there any documentation yet, similar to what calcite has for the
RelBuilder (https://calcite.apache.org/docs/algebra.html)?

I suppose I could also construct a RelNode and then try to convert that to
a LogicalPlan; would I be missing out on any Drill features this way?

Thanks in advance


Re: Slow query on parquet imported from SQL Server while the external SQL server is down.

2016-12-01 Thread Padma Penumarthy
Yes, for every query we build the schema tree by trying to initialize all
storage plugins and the workspaces in them, regardless of the schema
configuration and/or their applicability to the data being queried. Go
ahead and file a JIRA; we are looking into fixing this.

Thanks,
Padma
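
[Editor's note] The behavior described above can be illustrated with a toy model. This is purely illustrative, not Drill's actual implementation: every enabled plugin is initialized for every query, and each unreachable datasource costs a full connect timeout, so one dead JDBC source slows even queries that never touch it.

```python
import time

CONNECT_TIMEOUT_S = 0.2  # stand-in for a JDBC connect timeout


def init_plugin(name: str, reachable: bool) -> str:
    """Toy plugin init: an unreachable datasource burns the whole timeout."""
    if not reachable:
        time.sleep(CONNECT_TIMEOUT_S)  # simulated wait for the dead server
    return name


def build_schema_tree(plugins):
    # Modeled on the report above: the foreman initializes every *enabled*
    # plugin per query, even ones the query never touches.
    return [init_plugin(name, up) for name, up in plugins]


plugins = [("dfs", True), ("jdbc_mssql", False)]  # SQL Server is down
start = time.monotonic()
tree = build_schema_tree(plugins)
elapsed = time.monotonic() - start
print(f"schema tree {tree} built in ~{elapsed:.2f}s")  # dominated by the dead plugin
```

Disabling the unreachable plugin (or bringing its datasource back up) removes the per-query stall, which matches Rahul's three observations earlier in the thread.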


> On Dec 1, 2016, at 8:48 AM, Abhishek Girish  wrote:
> [quoted message trimmed; see the original above]

Re: Slow query on parquet imported from SQL Server while the external SQL server is down.

2016-12-01 Thread Abhishek Girish
Thanks for confirming Padma.

I've filed DRILL-5089 to track this issue.

On Thu, Dec 1, 2016 at 9:50 AM, Padma Penumarthy 
wrote:

> Yes, for every query, we build schema tree by trying to initialize
> all storage plugins and workspaces in them, regardless of schema
> configuration
> and/or applicability to data being queried. Go ahead and file a JIRA.
> We are looking into fixing this.
>
> Thanks,
> Padma
>
>
> > On Dec 1, 2016, at 8:48 AM, Abhishek Girish 
> wrote:
> >
> > AFAIK, this should apply to all queries, irrespective of the source of
> > the data or the plugins involved in the query. So when this issue
> > occurs, I would expect any query to take a long time to execute.
> >
> > On Thu, Dec 1, 2016 at 5:47 AM John Omernik  wrote:
> >
> >> @Abhishek,
> >>
> >> Do you think the issue is related to any storage plugin that is
> >> enabled but not available, since it applies to all queries? I guess
> >> if it's an issue where all queries are slow because the foreman is
> >> waiting to initialize ALL storage plugins, regardless of their
> >> applicability to the queried data, then that is a more general issue
> >> (which should still be resolved; does the foreman need to initialize
> >> all plugins before querying specific data?). However, I am still
> >> concerned that the query on the CTAS parquet data is specifically
> >> slower because of its source. @Rahul, could you test a different
> >> Parquet table, NOT loaded from the SQL Server, to see whether
> >> enabling or disabling the JDBC storage plugin (with the server
> >> unavailable) has any impact? Basically, I want to ensure that data
> >> created as a Parquet table via CTAS is 100% free of any links to the
> >> source data. This is EXTREMELY important.
> >>
> >> John
> >>
> >>
> >>
> >> On Thu, Dec 1, 2016 at 12:46 AM, Abhishek Girish <
> >> abhishek.gir...@gmail.com>
> >> wrote:
> >>
> >>> Thanks for the update, Rahul!
> >>>
> >>> On Wed, Nov 30, 2016 at 9:45 PM Rahul Raj <
> >> rahul@option3consulting.com
> 
> >>> wrote:
> >>>
>  Abhishek,
> 
>  Your observation is correct; we just verified that:
> 
>    1. The queries run as expected (faster) with the JDBC plugin disabled.
>    2. Queries run as expected when the plugin's data source is running.
>    3. With the data source down, queries run very slowly, waiting for
>       the connection to fail.
> 
>  Rahul
> 
>  On Thu, Dec 1, 2016 at 10:07 AM, Abhishek Girish <
>  abhishek.gir...@gmail.com>
>  wrote:
> 
> > @John,
> >
> > I agree that this should work. While I am not certain, I don't think
> > the issue is specific to a particular plugin, but rather to the way,
> > in a query's lifecycle, the foreman attempts to initialize every
> > enabled storage plugin before proceeding to execute the query. So when
> > a particular plugin isn't configured correctly or the underlying data
> > source is not up, this could drastically slow down the query
> > execution time.
> >
> > I'll look up to see if we have a JIRA for this already; if not, I
> > will file one.
> >
> > On Wed, Nov 30, 2016 at 8:12 AM, John Omernik 
> >>> wrote:
> >
> >> So, just my opinion in reading this thread (sorry for swooping in
> >> and opining):
> >>
> >> If a CTAS is done from any data source into Parquet files, there
> >> should be NO dependency on the original data source to query the
> >> resultant Parquet files. As a Drill user and a Drill admin, this
> >> breaks the concept of least surprise. If I take data from one source
> >> and create Parquet files in a distributed file system, it should just
> >> work. If there are "issues" with the JDBC plugin or the HBase/Hive
> >> plugins in a similar manner, these need to be hunted down by a large
> >> group of villagers with pitchforks and torches. I just can't see how
> >> this could be acceptable at any level. The whole idea of Parquet
> >> files is that they are self-describing, schema-included files; thus a
> >> read of a directory of Parquet files should have NO dependencies on
> >> anything but the Parquet files. Even the Parquet "additions" (such as
> >> the metadata cache) should be a fail-open thing: if it exists, great,
> >> use it and speed things up, but if it doesn't, read the Parquet files
> >> as normal (which I believe is how it operates).
> >>
> >> John
> >>
> >> On Wed, Nov 30, 2016 at 12:12 AM, Abhishek Girish <
> >> abhishek.gir...@gmail.com
> >>> wrote:
> >>
> >>> Can you attempt to disable the JDBC plugin (configured with SQL
> >>> Server) and try the query (on parquet) when SQL Server is offline?
> >>>
> >>> I've seen a similar issue previously when the HBase /
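For anyone reproducing the CTAS scenario in this thread, a rough sketch (not taken from the thread) of building the request body for Drill's REST query endpoint follows; the `dfs.tmp` workspace, both table names, and the `sqlserver` plugin name are assumptions:

```python
# Hedged sketch: construct the JSON body that Drill's POST /query.json
# endpoint expects for a CTAS. All names below are placeholders.
import json

def ctas_payload(target_table, source_query):
    """Build the JSON body for submitting a CTAS statement over REST."""
    sql = "CREATE TABLE {} AS {}".format(target_table, source_query)
    return json.dumps({"queryType": "SQL", "query": sql})

# Write the SQL Server rows out as self-describing Parquet; per the
# discussion above, later queries on dfs.tmp.`orders_parquet` should
# not depend on SQL Server being reachable.
payload = ctas_payload("dfs.tmp.`orders_parquet`",
                       "SELECT * FROM sqlserver.dbo.orders")
```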