Re: Drill Views and Typed Columns

2016-05-16 Thread Andries Engelbrecht
John,

Using a simple tool like squirrel and running a select * limit 0 against 
parquet it is able to retrieve the column names, and if you look at the 
returned metadata it identified the data type. Obviously schema change will be 
a challenge to deal with with limit 0 queries. This is where views can be used 
to force data types for the end tool, skip rows, substitute values, skip added 
columns, etc, which makes it much better for most tools to work with the data.

A function like typeof does require a row to be returned to identify the data 
type.


Ideally it will be great to do initial schema discovery with a tool, associate 
the metadata with the data sources and then make it available for wider use. A 
typeof histogram will be brilliant on data structures with schema changes, that 
way it will be possible to see how many records are affected of a certain 
type/change/etc. A endless list of possibilities. It should be feasible to 
utilize Drill as the execution engine with a smart tool on top of it to process.

--Andries


> On May 16, 2016, at 4:08 PM, John Omernik  wrote:
> 
> So how does a limit0 query return the type if I may ask?  I can use limit 0
> queries with SqlAlchemy and Caravel, but in looking at it, I wasn't sure
> how where the types were returned in the results (unless it's in the
> underlying metadata not displayed to the user).  I can dig more if you tell
> me it's there :)
> 
> On Mon, May 16, 2016 at 4:51 PM, Neeraja Rentachintala <
> nrentachint...@maprtech.com> wrote:
> 
>> Both are options (and thats how the bI tools work with Drill today)
>> 
>> - Views with explicit Casts - Will return schema definitions as part show
>> schemas/describe table/show columns queries.
>> - Limit 0 queries - This will be good as well if we can modify Caravel to
>> issue such queries (this is what Tableau does)
>> 
>> For now, I think returning metadata to Caravel using the above options will
>> be the solution. The ideal approach would be actually to have a  data
>> exploration experience on raw data (without curation) within Caravel itself
>> to create this metadata as needed.
>> 
>> -Neeraja
>> 
>> On Mon, May 16, 2016 at 2:45 PM, Ted Dunning 
>> wrote:
>> 
>>> As you suggest, views are a critical way to lock down that kind of
>>> information.
>>> 
>>> Select with limit 0 is often used for meta-data exploration. This is more
>>> robust than asking about tables since not everything is necessarily
>> really
>>> in a single table.
>>> 
>>> On Mon, May 16, 2016 at 2:12 PM, John Omernik  wrote:
>>> 
 Hey all, as part of my exploration of Caravel,  I realized knowing the
 types of columns can be valuable... I can say create a view of a
>>> directory
 of parquet allowing the "show tables" to work well, however, the type
>> for
 every column is "ANY" which may work (need to tweak some things) but I
>> am
 guessing may make certain down stream things in Caravel more difficult.
 
 So, just thinking aloud here, would it be possible to "cast" in Views
>> to
 allow the view definition to pass along type information?  Even if it
>>> means
 a more verbose view definition, it would be done once, and then down
>>> stream
 tools like Caravel would know the types...
 
 Thoughts?
 
 John
 
>>> 
>> 



Re: Drill Views and Typed Columns

2016-05-16 Thread John Omernik
So how does a limit0 query return the type if I may ask?  I can use limit 0
queries with SqlAlchemy and Caravel, but in looking at it, I wasn't sure
how where the types were returned in the results (unless it's in the
underlying metadata not displayed to the user).  I can dig more if you tell
me it's there :)

On Mon, May 16, 2016 at 4:51 PM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:

> Both are options (and thats how the bI tools work with Drill today)
>
>  - Views with explicit Casts - Will return schema definitions as part show
> schemas/describe table/show columns queries.
> - Limit 0 queries - This will be good as well if we can modify Caravel to
> issue such queries (this is what Tableau does)
>
> For now, I think returning metadata to Caravel using the above options will
> be the solution. The ideal approach would be actually to have a  data
> exploration experience on raw data (without curation) within Caravel itself
> to create this metadata as needed.
>
> -Neeraja
>
> On Mon, May 16, 2016 at 2:45 PM, Ted Dunning 
> wrote:
>
> > As you suggest, views are a critical way to lock down that kind of
> > information.
> >
> > Select with limit 0 is often used for meta-data exploration. This is more
> > robust than asking about tables since not everything is necessarily
> really
> > in a single table.
> >
> > On Mon, May 16, 2016 at 2:12 PM, John Omernik  wrote:
> >
> > > Hey all, as part of my exploration of Caravel,  I realized knowing the
> > > types of columns can be valuable... I can say create a view of a
> > directory
> > > of parquet allowing the "show tables" to work well, however, the type
> for
> > > every column is "ANY" which may work (need to tweak some things) but I
> am
> > > guessing may make certain down stream things in Caravel more difficult.
> > >
> > > So, just thinking aloud here, would it be possible to "cast" in Views
> to
> > > allow the view definition to pass along type information?  Even if it
> > means
> > > a more verbose view definition, it would be done once, and then down
> > stream
> > > tools like Caravel would know the types...
> > >
> > > Thoughts?
> > >
> > > John
> > >
> >
>


Re: Drill Views and Typed Columns

2016-05-16 Thread Neeraja Rentachintala
Both are options (and thats how the bI tools work with Drill today)

 - Views with explicit Casts - Will return schema definitions as part show
schemas/describe table/show columns queries.
- Limit 0 queries - This will be good as well if we can modify Caravel to
issue such queries (this is what Tableau does)

For now, I think returning metadata to Caravel using the above options will
be the solution. The ideal approach would be actually to have a  data
exploration experience on raw data (without curation) within Caravel itself
to create this metadata as needed.

-Neeraja

On Mon, May 16, 2016 at 2:45 PM, Ted Dunning  wrote:

> As you suggest, views are a critical way to lock down that kind of
> information.
>
> Select with limit 0 is often used for meta-data exploration. This is more
> robust than asking about tables since not everything is necessarily really
> in a single table.
>
> On Mon, May 16, 2016 at 2:12 PM, John Omernik  wrote:
>
> > Hey all, as part of my exploration of Caravel,  I realized knowing the
> > types of columns can be valuable... I can say create a view of a
> directory
> > of parquet allowing the "show tables" to work well, however, the type for
> > every column is "ANY" which may work (need to tweak some things) but I am
> > guessing may make certain down stream things in Caravel more difficult.
> >
> > So, just thinking aloud here, would it be possible to "cast" in Views to
> > allow the view definition to pass along type information?  Even if it
> means
> > a more verbose view definition, it would be done once, and then down
> stream
> > tools like Caravel would know the types...
> >
> > Thoughts?
> >
> > John
> >
>


Re: Drill Views and Typed Columns

2016-05-16 Thread Ted Dunning
As you suggest, views are a critical way to lock down that kind of
information.

Select with limit 0 is often used for meta-data exploration. This is more
robust than asking about tables since not everything is necessarily really
in a single table.

On Mon, May 16, 2016 at 2:12 PM, John Omernik  wrote:

> Hey all, as part of my exploration of Caravel,  I realized knowing the
> types of columns can be valuable... I can say create a view of a directory
> of parquet allowing the "show tables" to work well, however, the type for
> every column is "ANY" which may work (need to tweak some things) but I am
> guessing may make certain down stream things in Caravel more difficult.
>
> So, just thinking aloud here, would it be possible to "cast" in Views to
> allow the view definition to pass along type information?  Even if it means
> a more verbose view definition, it would be done once, and then down stream
> tools like Caravel would know the types...
>
> Thoughts?
>
> John
>


Drill Views and Typed Columns

2016-05-16 Thread John Omernik
Hey all, as part of my exploration of Caravel,  I realized knowing the
types of columns can be valuable... I can say create a view of a directory
of parquet allowing the "show tables" to work well, however, the type for
every column is "ANY" which may work (need to tweak some things) but I am
guessing may make certain down stream things in Caravel more difficult.

So, just thinking aloud here, would it be possible to "cast" in Views to
allow the view definition to pass along type information?  Even if it means
a more verbose view definition, it would be done once, and then down stream
tools like Caravel would know the types...

Thoughts?

John


Re: Drill & Caravel

2016-05-16 Thread John Omernik
Ya happy to join the hangout tomorrow.

On Mon, May 16, 2016 at 3:09 PM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:

> John
> Great. Can we briefly look at this during hang out tomorrow.
>
> On Mon, May 16, 2016 at 12:26 PM, John Omernik  wrote:
>
> > AWESOME! Yep, that works and I like it better than using sys.options. Now
> > to dive in and play with the Dialect.
> >
> > Note, I will be uploading an unfinished dialect in my caraveldrill
> repo...
> > the goal isn't a production ready thing, but a skeleton (based on the
> > access one) of what a dialect is... The purpose is to evolve things as
> put
> > it all in one area for people to work with. As of now, I have no clue how
> > to trap statements with no FROM, but that's my first thing to work on :)
> >
> > Thanks for the help Veera!
> >
> > John
> >
> > On Mon, May 16, 2016 at 2:20 PM, Veera Naranammalpuram <
> > vnaranammalpu...@maprtech.com> wrote:
> >
> > > Does this work?
> > >
> > > 0: jdbc:drill:zk=local> SELECT 'x' AS some_label from (values(1));
> > > +-+
> > > | some_label  |
> > > +-+
> > > | x   |
> > > +-+
> > > 1 row selected (1.41 seconds)
> > > 0: jdbc:drill:zk=local>
> > >
> > > -Veera
> > >
> > > On Mon, May 16, 2016 at 3:19 PM, John Omernik 
> wrote:
> > >
> > > > I suppose I could do select 'x' AS some_label from sys.options limit
> 1;
> > > >
> > > > any reason not to? Any other options?
> > > >
> > > > On Mon, May 16, 2016 at 2:18 PM, John Omernik 
> > wrote:
> > > >
> > > > > Does Drill have a "dummy" table (like dual) that we could test
> > against?
> > > > If
> > > > > we had that I could replace the that in a dialect (I think)
> > > > >
> > > > > SELECT 'test plain returns' AS anon_1
> > > > >
> > > > >
> > > > > SELECT 'x' AS some_label
> > > > >
> > > > >
> > > > > SELECT 'test unicode returns' AS anon_1
> > > > >
> > > > >
> > > > > SELECT 'x' AS some_label
> > > > >
> > > > >
> > > > > Drill is looking for a "FROM" :)
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Veera Naranammalpuram
> > > Product Specialist - SQL on Hadoop
> > > *MapR Technologies (www.mapr.com )*
> > > *(Email) vnaranammalpu...@maprtech.com *
> > > *(Mobile) 917 683 8116 - can text *
> > > *Timezone: ET (UTC -5:00 / -4:00)*
> > >
> >
>


Querying a file prefix in a directory or in S3

2016-05-16 Thread Ashish Goel
Hi,

I have a bunch of files in a common S3 prefix all of them are CSV but with
3 different schemas. The same schema files follow a common naming prefix
convention i.e. all files with schema s1 are named as file1*, all files
with schema s2 are named as file2* etc. Is it possible for me to query such
files from Drill. I couldn't find a reference to specify regex in S3
prefix.

Though my use case is specific to S3, I think the same would apply for DFS.
Wondering if anyone had a similar use case before and how were they able to
work around this.

-- 
Thanks,
Ashish


Re: Drill & Caravel

2016-05-16 Thread John Omernik
I've updated the https://github.com/JohnOmernik/caraveldrill

At this point, adding a database with Drill works! You can now press test
connection and we get a "Seems Ok"  As we prune out old stuff, and add in
Drill stuff, I am using comments to identify what we added and what has
been there, if a class in the dialect has no comments, assume it's all
MSAccess dialect code... (I know this is a horrible way to code, but I am
learning and like sharing my learnings :)

John




On Mon, May 16, 2016 at 2:26 PM, John Omernik  wrote:

> AWESOME! Yep, that works and I like it better than using sys.options. Now
> to dive in and play with the Dialect.
>
> Note, I will be uploading an unfinished dialect in my caraveldrill repo...
> the goal isn't a production ready thing, but a skeleton (based on the
> access one) of what a dialect is... The purpose is to evolve things as put
> it all in one area for people to work with. As of now, I have no clue how
> to trap statements with no FROM, but that's my first thing to work on :)
>
> Thanks for the help Veera!
>
> John
>
> On Mon, May 16, 2016 at 2:20 PM, Veera Naranammalpuram <
> vnaranammalpu...@maprtech.com> wrote:
>
>> Does this work?
>>
>> 0: jdbc:drill:zk=local> SELECT 'x' AS some_label from (values(1));
>> +-+
>> | some_label  |
>> +-+
>> | x   |
>> +-+
>> 1 row selected (1.41 seconds)
>> 0: jdbc:drill:zk=local>
>>
>> -Veera
>>
>> On Mon, May 16, 2016 at 3:19 PM, John Omernik  wrote:
>>
>> > I suppose I could do select 'x' AS some_label from sys.options limit 1;
>> >
>> > any reason not to? Any other options?
>> >
>> > On Mon, May 16, 2016 at 2:18 PM, John Omernik  wrote:
>> >
>> > > Does Drill have a "dummy" table (like dual) that we could test
>> against?
>> > If
>> > > we had that I could replace the that in a dialect (I think)
>> > >
>> > > SELECT 'test plain returns' AS anon_1
>> > >
>> > >
>> > > SELECT 'x' AS some_label
>> > >
>> > >
>> > > SELECT 'test unicode returns' AS anon_1
>> > >
>> > >
>> > > SELECT 'x' AS some_label
>> > >
>> > >
>> > > Drill is looking for a "FROM" :)
>> > >
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> Veera Naranammalpuram
>> Product Specialist - SQL on Hadoop
>> *MapR Technologies (www.mapr.com )*
>> *(Email) vnaranammalpu...@maprtech.com *
>> *(Mobile) 917 683 8116 - can text *
>> *Timezone: ET (UTC -5:00 / -4:00)*
>>
>
>


Re: Drill & Caravel

2016-05-16 Thread Neeraja Rentachintala
John
Great. Can we briefly look at this during hang out tomorrow.

On Mon, May 16, 2016 at 12:26 PM, John Omernik  wrote:

> AWESOME! Yep, that works and I like it better than using sys.options. Now
> to dive in and play with the Dialect.
>
> Note, I will be uploading an unfinished dialect in my caraveldrill repo...
> the goal isn't a production ready thing, but a skeleton (based on the
> access one) of what a dialect is... The purpose is to evolve things as put
> it all in one area for people to work with. As of now, I have no clue how
> to trap statements with no FROM, but that's my first thing to work on :)
>
> Thanks for the help Veera!
>
> John
>
> On Mon, May 16, 2016 at 2:20 PM, Veera Naranammalpuram <
> vnaranammalpu...@maprtech.com> wrote:
>
> > Does this work?
> >
> > 0: jdbc:drill:zk=local> SELECT 'x' AS some_label from (values(1));
> > +-+
> > | some_label  |
> > +-+
> > | x   |
> > +-+
> > 1 row selected (1.41 seconds)
> > 0: jdbc:drill:zk=local>
> >
> > -Veera
> >
> > On Mon, May 16, 2016 at 3:19 PM, John Omernik  wrote:
> >
> > > I suppose I could do select 'x' AS some_label from sys.options limit 1;
> > >
> > > any reason not to? Any other options?
> > >
> > > On Mon, May 16, 2016 at 2:18 PM, John Omernik 
> wrote:
> > >
> > > > Does Drill have a "dummy" table (like dual) that we could test
> against?
> > > If
> > > > we had that I could replace the that in a dialect (I think)
> > > >
> > > > SELECT 'test plain returns' AS anon_1
> > > >
> > > >
> > > > SELECT 'x' AS some_label
> > > >
> > > >
> > > > SELECT 'test unicode returns' AS anon_1
> > > >
> > > >
> > > > SELECT 'x' AS some_label
> > > >
> > > >
> > > > Drill is looking for a "FROM" :)
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Veera Naranammalpuram
> > Product Specialist - SQL on Hadoop
> > *MapR Technologies (www.mapr.com )*
> > *(Email) vnaranammalpu...@maprtech.com *
> > *(Mobile) 917 683 8116 - can text *
> > *Timezone: ET (UTC -5:00 / -4:00)*
> >
>


Re: Drill & Caravel

2016-05-16 Thread John Omernik
AWESOME! Yep, that works and I like it better than using sys.options. Now
to dive in and play with the Dialect.

Note, I will be uploading an unfinished dialect in my caraveldrill repo...
the goal isn't a production ready thing, but a skeleton (based on the
access one) of what a dialect is... The purpose is to evolve things as put
it all in one area for people to work with. As of now, I have no clue how
to trap statements with no FROM, but that's my first thing to work on :)

Thanks for the help Veera!

John

On Mon, May 16, 2016 at 2:20 PM, Veera Naranammalpuram <
vnaranammalpu...@maprtech.com> wrote:

> Does this work?
>
> 0: jdbc:drill:zk=local> SELECT 'x' AS some_label from (values(1));
> +-+
> | some_label  |
> +-+
> | x   |
> +-+
> 1 row selected (1.41 seconds)
> 0: jdbc:drill:zk=local>
>
> -Veera
>
> On Mon, May 16, 2016 at 3:19 PM, John Omernik  wrote:
>
> > I suppose I could do select 'x' AS some_label from sys.options limit 1;
> >
> > any reason not to? Any other options?
> >
> > On Mon, May 16, 2016 at 2:18 PM, John Omernik  wrote:
> >
> > > Does Drill have a "dummy" table (like dual) that we could test against?
> > If
> > > we had that I could replace the that in a dialect (I think)
> > >
> > > SELECT 'test plain returns' AS anon_1
> > >
> > >
> > > SELECT 'x' AS some_label
> > >
> > >
> > > SELECT 'test unicode returns' AS anon_1
> > >
> > >
> > > SELECT 'x' AS some_label
> > >
> > >
> > > Drill is looking for a "FROM" :)
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Veera Naranammalpuram
> Product Specialist - SQL on Hadoop
> *MapR Technologies (www.mapr.com )*
> *(Email) vnaranammalpu...@maprtech.com *
> *(Mobile) 917 683 8116 - can text *
> *Timezone: ET (UTC -5:00 / -4:00)*
>


Re: Drill & Caravel

2016-05-16 Thread Veera Naranammalpuram
Does this work?

0: jdbc:drill:zk=local> SELECT 'x' AS some_label from (values(1));
+-+
| some_label  |
+-+
| x   |
+-+
1 row selected (1.41 seconds)
0: jdbc:drill:zk=local>

-Veera

On Mon, May 16, 2016 at 3:19 PM, John Omernik  wrote:

> I suppose I could do select 'x' AS some_label from sys.options limit 1;
>
> any reason not to? Any other options?
>
> On Mon, May 16, 2016 at 2:18 PM, John Omernik  wrote:
>
> > Does Drill have a "dummy" table (like dual) that we could test against?
> If
> > we had that I could replace the that in a dialect (I think)
> >
> > SELECT 'test plain returns' AS anon_1
> >
> >
> > SELECT 'x' AS some_label
> >
> >
> > SELECT 'test unicode returns' AS anon_1
> >
> >
> > SELECT 'x' AS some_label
> >
> >
> > Drill is looking for a "FROM" :)
> >
> >
> >
> >
>



-- 
Veera Naranammalpuram
Product Specialist - SQL on Hadoop
*MapR Technologies (www.mapr.com )*
*(Email) vnaranammalpu...@maprtech.com *
*(Mobile) 917 683 8116 - can text *
*Timezone: ET (UTC -5:00 / -4:00)*


Re: Drill & Caravel

2016-05-16 Thread John Omernik
I suppose I could do select 'x' AS some_label from sys.options limit 1;

any reason not to? Any other options?

On Mon, May 16, 2016 at 2:18 PM, John Omernik  wrote:

> Does Drill have a "dummy" table (like dual) that we could test against? If
> we had that I could replace the that in a dialect (I think)
>
> SELECT 'test plain returns' AS anon_1
>
>
> SELECT 'x' AS some_label
>
>
> SELECT 'test unicode returns' AS anon_1
>
>
> SELECT 'x' AS some_label
>
>
> Drill is looking for a "FROM" :)
>
>
>
>


Re: Drill & Caravel

2016-05-16 Thread John Omernik
I managed to get a real sql statement from Caravel to Drill !  It failed,
but at least it wasn't a blob of Unicode text. I had to switch UTF-32 to
UTF-16 in mapr.drillodbc.ini. I now do this directly in my dockerfile for
caraveldrill https://github.com/JohnOmernik/caraveldrill

Now I need to look at the SQL Alchemy side of things.  Drill didn't like
some of the queries that got sent when I did test connections etc.  So now
to figure out how to handle that in the dialect...

(Example of Queries Drill didn't like that Caravel/SQLAlchemy Sent when
adding data connections)

SELECT 'test plain returns' AS anon_1


SELECT 'x' AS some_label


SELECT 'test unicode returns' AS anon_1


SELECT 'x' AS some_label


On Sun, May 15, 2016 at 2:30 PM, John Omernik  wrote:

> I managed to get a skeleton drill dialect to connect to Drill.  That said,
> when I "skeletonized" the access Dialect I likely removed to much, in that
> when I tested the connection from Caravel to Drill, this is the error I got
> (note the query was attempted and I saw it in my drill profiles!!)
>
> Basically I used the URL drill+pyodbc:///?%CONNSTRING%  wtih %CONNSTRING
> being the uq variable in my pyodbc_example.py that I outputted in my github
> above. That worked except that I needed to set
>
> {
> "metadata_params": {},
> "engine_params": {"connect_args": {"autocommit":1}}
> }
>
> in the Caravel data source page as well as the URL to make it work.
>
> Anywho, I think I am out of my depth at this point, I got it to connect
> and submit "something" to drill, but that's about it... if anyone needs
> help getting to where I got to take it from there, I'd be happy to help.
>
> John
>
>
>
> DBAPIError: (pyodbc.Error) ('HY000', '[HY000] [MapR][Drill] (1040) Drill
> failed to execute the query:
> \x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\n[30027]Query
> execution error. Details:[ \nPARSE ERROR: Lexical error at line 1, column
> 1.  Encountered: "\\ufffd" (65533), after : ""\n\nSQL Query
> \x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\n^\n\n\n[Error
> Id: 9e67eab5-3a39-4527-ab22-763e950dcd86 on
> hadoopmapr5.brewingintel.com:20001]\n] (1040) (SQLExecDirectW)')
>
> On Sun, May 15, 2016 at 10:24 AM, John Omernik  wrote:
>
>> Based on some of the SQL Alchemy documentation, they recommend using the
>> pyodbc-access dialect as a good starting point. (Fairly simple, easy to
>> understand how things play out)
>>
>> https://bitbucket.org/zzzeek/sqlalchemy-access
>>
>> I am looking through the code, but my time to dedicate to this is limited
>> do to an implementation at work that is finally happening.  I will continue
>> to post updates as come across them.
>>
>> As a side, I created a git with a simple dev env for Caravel and pyodbc
>> built in a docker container. This includes the MapR ODBC driver, the unix
>> ODBC, and pyodbc. (And has an example python script showing the connection
>> to drill works).
>>
>> Basically, it's everything except the SQL Alchemy Dialect work, Caravel
>> is working with the test data. (And it's persistent, so when you shut down
>> the container, it actually persists the setup work in caravel, so the next
>> time you start, you can just work again).   It's not production ready, but
>> it allows people to play around with things, and shows off using python
>> with Drill via pyodbc, using caravel, and allows folks who may know sql
>> alchemy a starting off point.
>>
>> https://github.com/JohnOmernik/caraveldrill
>>
>>
>> John
>>
>>
>>
>>
>> On Sun, May 15, 2016 at 12:34 AM, Ted Dunning 
>> wrote:
>>
>>> On Sat, May 14, 2016 at 6:31 PM, John Omernik  wrote:
>>>
>>> > Thoughts on approaching writing a dialect for Drill for SQL alchemy?
>>> > Anyone here done that before?
>>> >
>>>
>>> Nobody has. It is moderately involved, but doesn't look complicated ...
>>> just lengthy.
>>>
>>> Neeraja is coordinating efforts on this.
>>>
>>
>>
>


Re: MapR ODBC Issue

2016-05-16 Thread John Omernik
Perfect thanks! By putting the zk root in the ZKClusterID (instead the ZK
Quorum ) it worked!

ZKQuorum   = node1:5181,node2:5181,node3:5181
ZKClusterID= drillprod

[unixODBC][MapR][Drill] (10) Failure occurred while trying to connect to
zk=node1:5181,node2:5181,node3:5181/drill/drillprod (10)
(SQLDriverConnect)')



ZKQuorum   = node1:5181,node2:5181,node3:5181/drillprod
ZKClusterID= drillprod

[unixODBC][MapR][Drill] (10) Failure occurred while trying to connect to
zk=node1:5181,node2:5181,node3:5181/drillprod/drill/drillprod (10)
(SQLDriverConnect)')



ZKQuorum   = node1:5181,node2:5181,node3:5181
ZKClusterID= /drillprod/drillprod

Works!

On Mon, May 16, 2016 at 10:59 AM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> For windows the GUI doesn't allow the extended ZK connection string.
>
> So the best option is (as Krystal mentioned)
>
> ZKQuorum= :5181 or 2181
> ZKClusterID =/ /
>
> ex.
>
> ZKQuorum= drill-dev:5181
> ZKClusterID = /drill2/awsdrill-drillbits
>
>
> Works on both Windows and OSX.
>
> --Andries
>
> > On May 16, 2016, at 8:54 AM, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
> >
> > Or as Krystal mentioned just specify the whole cluster ID from the ZK
> root
> >
> > ZKQuorum= :5181 or 2181
> > ZKClusterID =/ /
> >
> > ex.
> >
> > ZKQuorum= drill-dev:5181
> > ZKClusterID = /drill2/awsdrill-drillbits
> >
> >
> > Both works when I tested in OSX.
> >
> > --Andries
> >
> >> On May 16, 2016, at 8:50 AM, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
> >>
> >> Hi John,
> >>
> >> It seems the ODBC driver is adding in the default /drill path the ZK
> connection string.
> >>
> >> I looked at it and here is a workaround.
> >>
> >> For OSX/Linux in the odbc.ini file the following works
> >>
> >> ZKQuorum= :5181 or 2181/
> >> ZKClusterID = /
> >>
> >> ex.
> >>
> >> ZKQuorum= drill-dev:5181/drill2
> >> ZKClusterID = /awsdrill-drillbits
> >>
> >> This allows you to add the drill root at the end of the ZK quorum and
> then step back up one level on the ClusterID.
> >>
> >> See if this works for you.
> >>
> >> I will check on Windows in a sec.
> >>
> >>
> >> --Andries
> >>
> >>
> >>
> >>
> >>> On May 14, 2016, at 9:15 AM, John Omernik  wrote:
> >>>
> >>> All -
> >>>
> >>> I am using the MapR ODBC driver. I can get it to connect in direct
> mode,
> >>> however, I can't get it to connect in Zookeeper mode.  I think I know
> why.
> >>>
> >>> To start off, I am using a different zk.root in my drill-override. This
> >>> allows me to have truly unique drill clusters on the same physical
> >>> clusters. This works well for most things, however, what I have found,
> is I
> >>> believe the ODBC driver assumes a hard coded zk.root. (drill).
> >>>
> >>> For example, my cluster name is "drillprod" and my zk.root is
> "drillprod"
> >>> So, to connect via JDBC I use URL=
> >>>
> "jdbc:drill:zk:hadoopmapr4:5181,hadoopmapr5:5181,hadoopmapr6:5181/drillprod"
> >>>
> >>> This works
> >>>
> >>> However, I set the ZKClusterID  to be drillprod in ODBC and I get
> >>>
> >>> Failure occurred while trying to connect to zk=
> >>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
> >>> hadoopmapr6.brewingintel.com:5181/drill/drillprod
> >>>
> >>>
> >>> If set the ZKQuorum to be "hadoopmapr4.brewingintel.com:5181,
> >>> hadoopmapr5.brewingintel.com:5181,
> >>> hadoopmapr6.brewingintel.com:5181/drillprod"
> >>>
> >>>
> >>> Then I get this error:
> >>>
> >>>
> >>> Failure occurred while trying to connect to zk=
> >>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
> >>> hadoopmapr6.brewingintel.com:5181/drillprod/drill/drillprod
> >>>
> >>>
> >>> The Znode: /drill/drillprod doesn't exist, neither does
> >>> /drillprod/drill/drillprod. The correct one, for this use case is
> >>> "/drillprod/drillprod"
> >>>
> >>>
> >>> But I can't seem to get the ODBC driver to check there, as it appears
> to
> >>> hard code a /drill rather than allow me to specify the zk.root as the
> >>> drill-override.conf does.
> >>>
> >>>
> >>> Please advise
> >>>
> >>>
> >>> John
> >>
> >
>
>


Re: CTAS Out of Memory

2016-05-16 Thread Stefan Sedich
Jacques,

I will look into it at some point this week, as it is a side project I
might not get time to do it until later in the week, and as I am not overly
familiar with Java profiling I will need to work out that part too!

Will get back to you when I know more.



--
Stefan

On Sat, May 14, 2016 at 4:49 PM Jacques Nadeau  wrote:

> Any chance you can do a jmap/jhat combo and take a look at what is holding
> the most memory? I'm guessing we're not managing the Postgres JDBC cursor
> or backpressure correctly.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich 
> wrote:
>
> > Just trying to do a CTAS on a postgres table, it is not huge and only has
> > 16 odd million rows, I end up with an out of memory after a while.
> >
> > Unable to handle out of memory condition in FragmentExecutor.
> >
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> >
> >
> > Is there a way to avoid this without needing to do the CTAS on a subset
> of
> > my table?
> >
>


Re: MapR ODBC Issue

2016-05-16 Thread Andries Engelbrecht
For windows the GUI doesn't allow the extended ZK connection string.

So the best option is (as Krystal mentioned)

ZKQuorum= :5181 or 2181
ZKClusterID =/ /

ex.

ZKQuorum= drill-dev:5181
ZKClusterID = /drill2/awsdrill-drillbits


Works on both Windows and OSX.

--Andries

> On May 16, 2016, at 8:54 AM, Andries Engelbrecht  
> wrote:
> 
> Or as Krystal mentioned just specify the whole cluster ID from the ZK root
> 
> ZKQuorum= :5181 or 2181
> ZKClusterID =/ /
> 
> ex.
> 
> ZKQuorum= drill-dev:5181
> ZKClusterID = /drill2/awsdrill-drillbits
> 
> 
> Both works when I tested in OSX.
> 
> --Andries
> 
>> On May 16, 2016, at 8:50 AM, Andries Engelbrecht  
>> wrote:
>> 
>> Hi John,
>> 
>> It seems the ODBC driver is adding in the default /drill path the ZK 
>> connection string.
>> 
>> I looked at it and here is a workaround.
>> 
>> For OSX/Linux in the odbc.ini file the following works
>> 
>> ZKQuorum= :5181 or 2181/
>> ZKClusterID = /
>> 
>> ex.
>> 
>> ZKQuorum= drill-dev:5181/drill2
>> ZKClusterID = /awsdrill-drillbits
>> 
>> This allows you to add the drill root at the end of the ZK quorum and then 
>> step back up one level on the ClusterID.
>> 
>> See if this works for you.
>> 
>> I will check on Windows in a sec.
>> 
>> 
>> --Andries
>> 
>> 
>> 
>> 
>>> On May 14, 2016, at 9:15 AM, John Omernik  wrote:
>>> 
>>> All -
>>> 
>>> I am using the MapR ODBC driver. I can get it to connect in direct mode,
>>> however, I can't get it to connect in Zookeeper mode.  I think I know why.
>>> 
>>> To start off, I am using a different zk.root in my drill-override. This
>>> allows me to have truly unique drill clusters on the same physical
>>> clusters. This works well for most things, however, what I have found, is I
>>> believe the ODBC driver assumes a hard coded zk.root. (drill).
>>> 
>>> For example, my cluster name is "drillprod" and my zk.root is "drillprod"
>>> So, to connect via JDBC I use URL=
>>> "jdbc:drill:zk:hadoopmapr4:5181,hadoopmapr5:5181,hadoopmapr6:5181/drillprod"
>>> 
>>> This works
>>> 
>>> However, I set the ZKClusterID  to be drillprod in ODBC and I get
>>> 
>>> Failure occurred while trying to connect to zk=
>>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
>>> hadoopmapr6.brewingintel.com:5181/drill/drillprod
>>> 
>>> 
>>> If set the ZKQuorum to be "hadoopmapr4.brewingintel.com:5181,
>>> hadoopmapr5.brewingintel.com:5181,
>>> hadoopmapr6.brewingintel.com:5181/drillprod"
>>> 
>>> 
>>> Then I get this error:
>>> 
>>> 
>>> Failure occurred while trying to connect to zk=
>>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
>>> hadoopmapr6.brewingintel.com:5181/drillprod/drill/drillprod
>>> 
>>> 
>>> The Znode: /drill/drillprod doesn't exist, neither does
>>> /drillprod/drill/drillprod. The correct one, for this use case is
>>> "/drillprod/drillprod"
>>> 
>>> 
>>> But I can't seem to get the ODBC driver to check there, as it appears to
>>> hard code a /drill rather than allow me to specify the zk.root as the
>>> drill-override.conf does.
>>> 
>>> 
>>> Please advise
>>> 
>>> 
>>> John
>> 
> 



Re: MapR ODBC Issue

2016-05-16 Thread Andries Engelbrecht
Or as Krystal mentioned just specify the whole cluster ID from the ZK root

ZKQuorum= :5181 or 2181
ZKClusterID =/ /

ex.

ZKQuorum= drill-dev:5181
ZKClusterID = /drill2/awsdrill-drillbits


Both works when I tested in OSX.

--Andries

> On May 16, 2016, at 8:50 AM, Andries Engelbrecht  
> wrote:
> 
> Hi John,
> 
> It seems the ODBC driver is adding in the default /drill path the ZK 
> connection string.
> 
> I looked at it and here is a workaround.
> 
> For OSX/Linux in the odbc.ini file the following works
> 
> ZKQuorum= :5181 or 2181/
> ZKClusterID = /
> 
> ex.
> 
> ZKQuorum= drill-dev:5181/drill2
> ZKClusterID = /awsdrill-drillbits
> 
> This allows you to add the drill root at the end of the ZK quorum and then 
> step back up one level on the ClusterID.
> 
> See if this works for you.
> 
> I will check on Windows in a sec.
> 
> 
> --Andries
> 
> 
> 
> 
>> On May 14, 2016, at 9:15 AM, John Omernik  wrote:
>> 
>> All -
>> 
>> I am using the MapR ODBC driver. I can get it to connect in direct mode,
>> however, I can't get it to connect in Zookeeper mode.  I think I know why.
>> 
>> To start off, I am using a different zk.root in my drill-override. This
>> allows me to have truly unique drill clusters on the same physical
>> clusters. This works well for most things, however, what I have found, is I
>> believe the ODBC driver assumes a hard coded zk.root. (drill).
>> 
>> For example, my cluster name is "drillprod" and my zk.root is "drillprod"
>> So, to connect via JDBC I use URL=
>> "jdbc:drill:zk:hadoopmapr4:5181,hadoopmapr5:5181,hadoopmapr6:5181/drillprod"
>> 
>> This works
>> 
>> However, I set the ZKClusterID  to be drillprod in ODBC and I get
>> 
>> Failure occurred while trying to connect to zk=
>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
>> hadoopmapr6.brewingintel.com:5181/drill/drillprod
>> 
>> 
>> If set the ZKQuorum to be "hadoopmapr4.brewingintel.com:5181,
>> hadoopmapr5.brewingintel.com:5181,
>> hadoopmapr6.brewingintel.com:5181/drillprod"
>> 
>> 
>> Then I get this error:
>> 
>> 
>> Failure occurred while trying to connect to zk=
>> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
>> hadoopmapr6.brewingintel.com:5181/drillprod/drill/drillprod
>> 
>> 
>> The Znode: /drill/drillprod doesn't exist, neither does
>> /drillprod/drill/drillprod. The correct one, for this use case is
>> "/drillprod/drillprod"
>> 
>> 
>> But I can't seem to get the ODBC driver to check there, as it appears to
>> hard code a /drill rather than allow me to specify the zk.root as the
>> drill-override.conf does.
>> 
>> 
>> Please advise
>> 
>> 
>> John
> 



Re: MapR ODBC Issue

2016-05-16 Thread Andries Engelbrecht
Hi John,

It seems the ODBC driver is adding in the default /drill path the ZK connection 
string.

I looked at it and here is a workaround.

For OSX/Linux in the odbc.ini file the following works

ZKQuorum= :5181 or 2181/
ZKClusterID = /

ex.

ZKQuorum= drill-dev:5181/drill2
ZKClusterID = /awsdrill-drillbits

This allows you to add the drill root at the end of the ZK quorum and then step 
back up one level on the ClusterID.

See if this works for you.

I will check on Windows in a sec.


--Andries




> On May 14, 2016, at 9:15 AM, John Omernik  wrote:
> 
> All -
> 
> I am using the MapR ODBC driver. I can get it to connect in direct mode,
> however, I can't get it to connect in Zookeeper mode.  I think I know why.
> 
> To start off, I am using a different zk.root in my drill-override. This
> allows me to have truly unique drill clusters on the same physical
> clusters. This works well for most things, however, what I have found, is I
> believe the ODBC driver assumes a hard coded zk.root. (drill).
> 
> For example, my cluster name is "drillprod" and my zk.root is "drillprod"
> So, to connect via JDBC I use URL=
> "jdbc:drill:zk:hadoopmapr4:5181,hadoopmapr5:5181,hadoopmapr6:5181/drillprod"
> 
> This works
> 
> However, I set the ZKClusterID  to be drillprod in ODBC and I get
> 
> Failure occurred while trying to connect to zk=
> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
> hadoopmapr6.brewingintel.com:5181/drill/drillprod
> 
> 
> If set the ZKQuorum to be "hadoopmapr4.brewingintel.com:5181,
> hadoopmapr5.brewingintel.com:5181,
> hadoopmapr6.brewingintel.com:5181/drillprod"
> 
> 
> Then I get this error:
> 
> 
> Failure occurred while trying to connect to zk=
> hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
> hadoopmapr6.brewingintel.com:5181/drillprod/drill/drillprod
> 
> 
> The Znode: /drill/drillprod doesn't exist, neither does
> /drillprod/drill/drillprod. The correct one, for this use case is
> "/drillprod/drillprod"
> 
> 
> But I can't seem to get the ODBC driver to check there, as it appears to
> hard code a /drill rather than allow me to specify the zk.root as the
> drill-override.conf does.
> 
> 
> Please advise
> 
> 
> John



Re: MapR ODBC Issue

2016-05-16 Thread Krystal Nguyen
Hi John,

For the ZKClusterID, can you try setting it like this: /drillprod/drillprod

Thanks,
Krystal

On Mon, May 16, 2016 at 8:14 AM, Parth Chandra 
wrote:

> Hi John
>
>   Can you try passing the ZKClusterID parameter to the driver. See example
> here:
>
> https://drill.apache.org/docs/using-a-connection-string/
>
>DRIVER=MapR Drill ODBC Driver;AdvancedProperties=
> {HandshakeTimeout=0;QueryTimeout=0;
> TimestampTZDisplayTimezone=utc;ExcludedSchemas=sys,
> INFORMATION_SCHEMA;};Catalog=DRILL;Schema=;
> ConnectionType=ZooKeeper;ZKQuorum=192.168.39.43:5181;
> ZKClusterID=drillbits1
> Thanks
>
> Parth
>
> On Sat, May 14, 2016 at 9:15 AM, John Omernik  wrote:
>
> > All -
> >
> > I am using the MapR ODBC driver. I can get it to connect in direct mode,
> > however, I can't get it to connect in Zookeeper mode.  I think I know
> why.
> >
> > To start off, I am using a different zk.root in my drill-override. This
> > allows me to have truly unique drill clusters on the same physical
> > clusters. This works well for most things, however, what I have found,
> is I
> > believe the ODBC driver assumes a hard coded zk.root. (drill).
> >
> > For example, my cluster name is "drillprod" and my zk.root is "drillprod"
> > So, to connect via JDBC I use URL=
> >
> >
> "jdbc:drill:zk:hadoopmapr4:5181,hadoopmapr5:5181,hadoopmapr6:5181/drillprod"
> >
> > This works
> >
> > However, I set the ZKClusterID  to be drillprod in ODBC and I get
> >
> > Failure occurred while trying to connect to zk=
> > hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
> > hadoopmapr6.brewingintel.com:5181/drill/drillprod
> >
> >
> > If set the ZKQuorum to be "hadoopmapr4.brewingintel.com:5181,
> > hadoopmapr5.brewingintel.com:5181,
> > hadoopmapr6.brewingintel.com:5181/drillprod"
> >
> >
> > Then I get this error:
> >
> >
> > Failure occurred while trying to connect to zk=
> > hadoopmapr4.brewingintel.com:5181,hadoopmapr5.brewingintel.com:5181,
> > hadoopmapr6.brewingintel.com:5181/drillprod/drill/drillprod
> >
> >
> > The Znode: /drill/drillprod doesn't exist, neither does
> > /drillprod/drill/drillprod. The correct one, for this use case is
> > "/drillprod/drillprod"
> >
> >
> > But I can't seem to get the ODBC driver to check there, as it appears to
> > hard code a /drill rather than allow me to specify the zk.root as the
> > drill-override.conf does.
> >
> >
> > Please advise
> >
> >
> > John
> >
>