Drill Support for Sybase Plugin

2016-09-21 Thread Lee, David
I’ve gone through multiple web posts and this mail archive and I’m experiencing 
similar problems with Sybase ASE which other users have had with Microsoft SQL 
Server..

Both products use Transact SQL which introduces the concept of a schema (or 
user) owned tables.

I can get use, show tables and describe working, but SQL statements fail 
because I don’t think the SQL parser understands the schema / user dot 
notation..

Here’s my setup and test scripts:


A. Storage Plugin named SYBASE:

{
  "type": "jdbc",
  "driver": "com.sybase.jdbc4.jdbc.SybDriver",
  "url": "jdbc:sybase:Tds:my_server_address:4100/tempdb",
  "username": "my_login",
  "password": "my_password",
  "enabled": true
}


B. Sybase SQL Script executed in ISQL to create test tables:

use tempdb
go

create table phone_book
(
first_name varchar(10),
last_name varchar(20),
phone_number varchar(12)
)
go

insert phone_book values ('Bob','Marley','555-555-')
insert phone_book values ('Mary','Jane','111-111-')
insert phone_book values ('Bat','Man','911-911-')
go


C.From Apache Drill here’s my output:

INFO: Initiating Jersey application, version Jersey
apache drill 1.8.0
"drill baby drill"

0: jdbc:drill:zk=local> use SYBASE.tempdb;
+---+--
|  ok   |  summary
+---+--
| true  | Default schema changed to [SYBASE.tempdb]
+---+--
1 row selected (5.938 seconds)

0: jdbc:drill:zk=local> show tables;
++--+
| TABLE_SCHEMA   |  TABLE_NAME  |
++--+
| SYBASE.tempdb  | phone_book   |
| SYBASE.tempdb  | sysalternates|
| SYBASE.tempdb  | sysattributes|
| SYBASE.tempdb  | syscolumns   |
| SYBASE.tempdb  | syscomments  |
| SYBASE.tempdb  | sysconstraints   |
etc.. etc.. etc..
++--+
42 rows selected (7.037 seconds)

0: jdbc:drill:zk=local> describe phone_book;
+---++--+
|  COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+---++--+
| first_name| CHARACTER VARYING  | NO   |
| last_name | CHARACTER VARYING  | NO   |
| phone_number  | CHARACTER VARYING  | NO   |
+---++--+
3 rows selected (4.901 seconds)

0: jdbc:drill:zk=local> select * from phone_book;
Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
SQL query.

sql SELECT *
FROM "tempdb"."phone_book"
plugin SYBASE
Fragment 0:0

[Error Id: db3ae9fe-1214-4590-bef0-cc299c9871d9 on .xxx.x.com:31010] 
(state=,code=0)


D.My connection and plug is fine, but I can’t get any SQL to work.. I’ve 
tried all of the following SQL statements which work in ISQL, but are not valid 
in DRILL:

ISQL:

select * from phone_book
select * from tempdb..phone_book
select * from tempdb.guest.phone_book

DRILL:

select * from phone_book;
select * from SYBASE.tempdb..phone_book;
select * from SYBASE.tempdb.guest.phone_book;

Anyone have any solutions before I start digging through the github for Drill 
and Calcite??

David Lee
Vice President | BlackRock
Phone: +1.415.670.2744 | Mobile: +1.415.706.6874


This message may contain information that is confidential or privileged. If you 
are not the intended recipient, please advise the sender immediately and delete 
this message. See 
http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for 
further information.  Please refer to 
http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more 
information about BlackRock’s Privacy Policy.
For a list of BlackRock's office addresses worldwide, see 
http://www.blackrock.com/corporate/en-us/about-us/contacts-locations.

© 2016 BlackRock, Inc. All rights reserved.


Re: Excluding HDFS .tmp file from multi-file query?

2016-09-21 Thread Andries Engelbrecht
Add a . prefix to the Flume temp files. Drill will ignore the hidden files when 
you query the directory structure.

--Andries

> On Sep 21, 2016, at 2:36 PM, Robin Moffatt  
> wrote:
> 
> Hi,
> I have a stream of data from Flume landing in HDFS in files of a set size.
> I can query these files individually just fine, and across multiple ones
> too - except if the wildcard encompasses the *currently open HDFS file that
> Flume is writing to*. When this happens, Drill understandably barfs.
> 
> 0: jdbc:drill:drillbit=localhost> show files in
> `hdfs`.`/user/flume/incoming/twitter/2016/09/21/`;
> +--+--+-+--++-+--+--+--+
> | name | isDirectory  | isFile  |  length  | owner
> |group| permissions  |accessTime|
> modificationTime |
> +--+--+-+--++-+--+--+--+
> [...]
> | FlumeData.1474467815652  | false| true| 1055490  | flume
> | supergroup  | rw-r--r--| 2016-09-21 21:52:07.219  | 2016-09-21
> 21:58:58.28   |
> | FlumeData.1474467815653  | false| true| 1050470  | flume
> | supergroup  | rw-r--r--| 2016-09-21 21:58:58.556  | 2016-09-21
> 22:06:28.636  |
> | FlumeData.1474467815654  | false| true| 1051043  | flume
> | supergroup  | rw-r--r--| 2016-09-21 22:06:29.564  | 2016-09-21
> 22:13:40.808  |
> | FlumeData.1474467815655  | false| true| 1052657  | flume
> | supergroup  | rw-r--r--| 2016-09-21 22:13:40.978  | 2016-09-21
> 22:23:00.409  |
> | FlumeData.1474467815656.tmp  | false| true| 9447 | flume
> | supergroup  | rw-r--r--| 2016-09-21 22:23:00.788  | 2016-09-21
> 22:23:00.788  |
> +--+--+-+--++-+--+--+--+
> 59 rows selected (0.265 seconds)
> 
> Note the .tmp file as the last one in the folder
> 
> Querying a single file works :
> 
> 0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
> table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/FlumeData.1474467815655`(type
> => 'json'));
> +-+
> | EXPR$0  |
> +-+
> | 221 |
> +-+
> 1 row selected (0.685 seconds)
> 
> 
> As does across multiple files where the wildcard pattern would exclude the
> .tmp file:
> 
> 0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
> table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/FlumeData.147446781564*`(type
> => 'json'));
> +-+
> | EXPR$0  |
> +-+
> | 2178|
> +-+
> 1 row selected (1.24 seconds)
> 
> 
> But if I try to query all the files, Drill includes the .tmp file and
> errors:
> 
> 0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
> table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/*`(type => 'json'));
> Error: DATA_READ ERROR: Failure reading JSON file - Cannot obtain block
> length for
> LocatedBlock{BP-478416316-192.168.10.112-1466151126376:blk_1074004983_264343;
> getBlockSize()=9447; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[
> 192.168.10.116:50010,DS-39bf5e74-3eec-4447-9cd2-f17b5cc259b8,DISK],
> DatanodeInfoWithStorage[192.168.10.113:50010,DS-845945e7-0bc8-44aa-945c-a140ad1f55ab,DISK],
> DatanodeInfoWithStorage[192.168.10.115:50010
> ,DS-a0e97909-3d40-4f49-b67f-636e9f10928a,DISK]]}
> 
> File  /user/flume/incoming/twitter/2016/09/21/FlumeData.1474467815656.tmp
> Record  1
> Fragment 0:0
> 
> [Error Id: d3f322cb-c64d-43c8-9231-fb2c96e8589d on
> cdh57-01-node-01.moffatt.me:31010] (state=,code=0)
> 0: jdbc:drill:drillbit=localhost>
> 
> 
> Is there a way around this with Drill? For example, can I use a regex in
> the path? I've tried, but just hit
> Error: VALIDATION ERROR: null
> 
> thanks, Robin.



Excluding HDFS .tmp file from multi-file query?

2016-09-21 Thread Robin Moffatt
Hi,
I have a stream of data from Flume landing in HDFS in files of a set size.
I can query these files individually just fine, and across multiple ones
too - except if the wildcard encompasses the *currently open HDFS file that
Flume is writing to*. When this happens, Drill understandably barfs.

0: jdbc:drill:drillbit=localhost> show files in
`hdfs`.`/user/flume/incoming/twitter/2016/09/21/`;
+--+--+-+--++-+--+--+--+
| name | isDirectory  | isFile  |  length  | owner
 |group| permissions  |accessTime|
modificationTime |
+--+--+-+--++-+--+--+--+
[...]
| FlumeData.1474467815652  | false| true| 1055490  | flume
 | supergroup  | rw-r--r--| 2016-09-21 21:52:07.219  | 2016-09-21
21:58:58.28   |
| FlumeData.1474467815653  | false| true| 1050470  | flume
 | supergroup  | rw-r--r--| 2016-09-21 21:58:58.556  | 2016-09-21
22:06:28.636  |
| FlumeData.1474467815654  | false| true| 1051043  | flume
 | supergroup  | rw-r--r--| 2016-09-21 22:06:29.564  | 2016-09-21
22:13:40.808  |
| FlumeData.1474467815655  | false| true| 1052657  | flume
 | supergroup  | rw-r--r--| 2016-09-21 22:13:40.978  | 2016-09-21
22:23:00.409  |
| FlumeData.1474467815656.tmp  | false| true| 9447 | flume
 | supergroup  | rw-r--r--| 2016-09-21 22:23:00.788  | 2016-09-21
22:23:00.788  |
+--+--+-+--++-+--+--+--+
59 rows selected (0.265 seconds)

Note the .tmp file as the last one in the folder

Querying a single file works :

0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/FlumeData.1474467815655`(type
=> 'json'));
+-+
| EXPR$0  |
+-+
| 221 |
+-+
1 row selected (0.685 seconds)


As does across multiple files where the wildcard pattern would exclude the
.tmp file:

0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/FlumeData.147446781564*`(type
=> 'json'));
+-+
| EXPR$0  |
+-+
| 2178|
+-+
1 row selected (1.24 seconds)


But if I try to query all the files, Drill includes the .tmp file and
errors:

0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM
table(`hdfs`.`/user/flume/incoming/twitter/2016/09/21/*`(type => 'json'));
Error: DATA_READ ERROR: Failure reading JSON file - Cannot obtain block
length for
LocatedBlock{BP-478416316-192.168.10.112-1466151126376:blk_1074004983_264343;
getBlockSize()=9447; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[
192.168.10.116:50010,DS-39bf5e74-3eec-4447-9cd2-f17b5cc259b8,DISK],
DatanodeInfoWithStorage[192.168.10.113:50010,DS-845945e7-0bc8-44aa-945c-a140ad1f55ab,DISK],
DatanodeInfoWithStorage[192.168.10.115:50010
,DS-a0e97909-3d40-4f49-b67f-636e9f10928a,DISK]]}

File  /user/flume/incoming/twitter/2016/09/21/FlumeData.1474467815656.tmp
Record  1
Fragment 0:0

[Error Id: d3f322cb-c64d-43c8-9231-fb2c96e8589d on
cdh57-01-node-01.moffatt.me:31010] (state=,code=0)
0: jdbc:drill:drillbit=localhost>


Is there a way around this with Drill? For example, can I use a regex in
the path? I've tried, but just hit
Error: VALIDATION ERROR: null

thanks, Robin.


Re: Apache Drill Error

2016-09-21 Thread Robin Moffatt
Thanks, that fixed the problem for me on Mac with Drill 1.8. I had the same
error as OP (Unable to locate an executable at
"/System/Library/Frameworks/JavaVM.framework/Versions/A/bin/java" (-1))

$ diff drill-config.sh drill-config.sh.bak
395c395
< #export JAVA_HOME
---
> export JAVA_HOME

On 21 September 2016 at 18:08, Padma Penumarthy 
wrote:

> Trying commenting out the following line in
> distribution/target/apache-drill-1.9.0-SNAPSHOT/apache-
> drill-1.9.0-SNAPSHOT/bin/drill-config.sh
>
> export JAVA_HOME
>
> Thanks,
> Padma
>
>
> > On Sep 20, 2016, at 7:17 PM, Rajasimman Selvaraj 
> wrote:
> >
> > Hi ,
> >   I am getting the following error while starting the drill in Mac OS
> for the first time.
> > Unable to locate an executable at "/System/Library/Frameworks/
> JavaVM.framework/Versions/A/bin/java" (-1)
> >
> > I have the java JDK installed in my machine as listed below.
> > RajaSimmans-MBP:apache-drill-1.8.0 Hadoop$ java -version
> > java version "1.8.0_101"
> > Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
> > RajaSimmans-MBP:apache-drill-1.8.0 Hadoop$
> >
> > Please help me to get the drill start and running in my Mac OS.
> >
> > Regards,
> > S.RajaSimman.
>
>


Re: Apache Drill Error

2016-09-21 Thread Padma Penumarthy
Sorry, I meant Try (not Trying)

> On Sep 21, 2016, at 10:08 AM, Padma Penumarthy  
> wrote:
> 
> Trying commenting out the following line in 
> distribution/target/apache-drill-1.9.0-SNAPSHOT/apache-drill-1.9.0-SNAPSHOT/bin/drill-config.sh
> 
> export JAVA_HOME
> 
> Thanks,
> Padma
> 
> 
>> On Sep 20, 2016, at 7:17 PM, Rajasimman Selvaraj > > wrote:
>> 
>> Hi ,
>>   I am getting the following error while starting the drill in Mac OS for 
>> the first time.
>> Unable to locate an executable at 
>> "/System/Library/Frameworks/JavaVM.framework/Versions/A/bin/java" (-1)
>> 
>> I have the java JDK installed in my machine as listed below.
>> RajaSimmans-MBP:apache-drill-1.8.0 Hadoop$ java -version
>> java version "1.8.0_101"
>> Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
>> RajaSimmans-MBP:apache-drill-1.8.0 Hadoop$ 
>> 
>> Please help me to get the drill start and running in my Mac OS.
>> 
>> Regards,
>> S.RajaSimman.
> 



Re: Apache Drill Error

2016-09-21 Thread Padma Penumarthy
Trying commenting out the following line in 
distribution/target/apache-drill-1.9.0-SNAPSHOT/apache-drill-1.9.0-SNAPSHOT/bin/drill-config.sh

export JAVA_HOME

Thanks,
Padma


> On Sep 20, 2016, at 7:17 PM, Rajasimman Selvaraj  
> wrote:
> 
> Hi ,
>   I am getting the following error while starting the drill in Mac OS for the 
> first time.
> Unable to locate an executable at 
> "/System/Library/Frameworks/JavaVM.framework/Versions/A/bin/java" (-1)
> 
> I have the java JDK installed in my machine as listed below.
> RajaSimmans-MBP:apache-drill-1.8.0 Hadoop$ java -version
> java version "1.8.0_101"
> Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
> RajaSimmans-MBP:apache-drill-1.8.0 Hadoop$ 
> 
> Please help me to get the drill start and running in my Mac OS.
> 
> Regards,
> S.RajaSimman.



Apache Drill Error

2016-09-21 Thread Rajasimman Selvaraj
Hi ,
   I am getting the following error while starting the drill in Mac OS for the 
first time.
Unable to locate an executable at 
"/System/Library/Frameworks/JavaVM.framework/Versions/A/bin/java" (-1)

I have the java JDK installed in my machine as listed below.
RajaSimmans-MBP:apache-drill-1.8.0 Hadoop$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
RajaSimmans-MBP:apache-drill-1.8.0 Hadoop$ 

Please help me to get the drill start and running in my Mac OS.

Regards,
S.RajaSimman.