Re: Drill & Caravel

2016-05-13 Thread Ted Dunning
The limit 0 form is much to be preferred, since it will give you
information about the columns of a view.

This can be set in the dialect for SQLAlchemy.
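
The limit 0 trick is easy to see with any Python DBAPI driver: after executing a `LIMIT 0` query, the driver populates `cursor.description` without fetching any rows. Below is a minimal sketch using the stdlib sqlite3 module as a stand-in for a Drill connection; with pyodbc the same function should work against a Drill DSN, though that is an assumption, not something tested here.

```python
import sqlite3

def discover_columns(conn, table_or_view):
    """Fetch column names for a table or view by running a LIMIT 0 query.

    No rows are materialized; the driver still fills in
    cursor.description with one entry per column.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table_or_view} LIMIT 0")
    return [d[0] for d in cur.description]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.execute("CREATE VIEW v AS SELECT id, name FROM t")
    print(discover_columns(conn, "v"))  # ['id', 'name']
```

A SQLAlchemy dialect's get_columns() could presumably be built on the same query shape, which is the metadata mechanism Tableau is described as using.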



On Fri, May 13, 2016 at 11:03 AM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:

> Great, thanks John. I will look forward to an update on how the Drill queuing
> part goes :)
> Btw with regards to metadata queries, Drill already supports metadata (both
> the limit 0 form and also show tables/show schemas which are served from
> the Information_schema).
>
> On Friday, May 13, 2016, John Omernik  wrote:
>
> > So with that Docker file, I got caravel working easily with test data (no
> > drill yet) that will be weekend fun (and the pyodbc is already installed
> in
> > the container, so now it's time to play!)
> >
> > So I started my docker image with:
> >
> > sudo docker run -it --rm --net=host
> > -v=/mapr/brewpot/apps/prod/caravel/working:/app/working:rw
> > -v=/mapr/brewpot/apps/prod/caravel/cache:/app/cache:rw zeta/caravel
> > /bin/bash
> >
> >
> > Now, I passed through a couple of volumes that I am not sure I will
> need; I
> > want to play so that my "state" and initialization are saved in those
> > directories in the running container (this is just early testing). I just
> > run bash, and then run the commands below and it works. I was lazy here
> and
> > just did --net=host; it would likely work with bridged mode, but I am in an
> > airport and wanted to see if I could get it working... the fun part will
> be
> > working with Drill over the weekend. Thanks again Neeraja for sharing
> this!
> >
> >
> >
> >
> > Then I ran these commands (per the docs) and could explore... pretty easy
> > actually!
> >
> > # Create an admin user
> > fabmanager create-admin --app caravel
> > # Initialize the database
> > caravel db upgrade
> > # Create default roles and permissions
> > caravel init
> > # Load some data to play with
> > caravel load_examples
> > # Start the development web server
> > caravel runserver -d
> >
> >
> >
> >
> > On Fri, May 13, 2016 at 11:27 AM, John Omernik  wrote:
> >
> > > So, without running this (though it builds successfully), this seems
> > > like a good place to start: it has caravel and pyodbc installed
> > here.
> > > I will be playing more this weekend
> > >
> > > FROM ubuntu
> > >
> > > RUN apt-get update && apt-get install -y build-essential libssl-dev
> > > libffi-dev python-dev python-pip
> > >
> > > RUN apt-get install -y unixodbc-dev unixodbc-bin
> > >
> > > RUN pip install pyodbc
> > >
> > > RUN pip install caravel
> > >
> > > CMD ["python", "-v"]
> > >
> > > On Fri, May 13, 2016 at 10:44 AM, John Omernik 
> wrote:
> > >
> > >> A little more googling and I found the pyodbc, that looks promising.
> > >>
> > >> On Fri, May 13, 2016 at 10:41 AM, John Omernik 
> > wrote:
> > >>
> > >>> "SQL Alchemy already understands Drill" I was just looking for that,
> is
> > >>> there already some docs/blogs on that? I was going to start there as
> > well
> > >>> to determine how it worked and then look into the dialect writing and
> > see
> > >>> how big that project was.  I didn't find much on the Drill + Alchemy,
> > but I
> > >>> am in an airport and I blame wifi gremlins.
> > >>>
> > >>>
> > >>>
> > >>> On Fri, May 13, 2016 at 10:25 AM, Ted Dunning  >
> > >>> wrote:
> > >>>
> >  SQLAlchemy generates SQL queries and passes them on to Drill. Since
> >  SQLAlchemy already understands Drill, most of what will be needed is
> >  slight
> >  tuning for SQL dialect and providing a mechanism for SQLAlchemy to
> get
> >  meta-data from views.  Tableau does the meta-data discovery using
> > limit
> >  0
> >  queries to get column names. We would hope that similar methods
> would
> >  work.
> > 
> > 
> >  On Fri, May 13, 2016 at 6:13 AM, Erik Antelman  >
> >  wrote:
> > 
> >  > Isn't this a matter of Drill<->SQLAlchemy? Such support could
> > likely
> >  > enable other frameworks.
> >  >
> >  > Would one think that adaptation of SQLAlchemy to Drill is specific
> > to
> >  > Caravel? What subset of features from an RDBMS ORM is meaningful,
> >  feasible
> >  > and useful to map to Drill. This sounds like a broad general
> >  question. I
> >  > am sure there are orms from other language camps that might want
> > Drill
> >  > backends.
> >  > On May 13, 2016 7:33 AM, "John Omernik"  wrote:
> >  >
> >  > > I will be looking into this as well, thanks for sharing!
> >  > > On May 13, 2016 2:01 AM, "Nirav Shah"  >
> >  wrote:
> >  > >
> >  > > > Hi Neeraja,
> >  > > >
> >  > > > I am interested in contributing if integration is not
> available.
> >  > > > Kindly let me know
> >  > > >
> >  > > > Regards,
> >  > > > Nirav
> >  > > >
> >  > > > On 

Drill & Caravel

2016-05-13 Thread Neeraja Rentachintala
Great, thanks John. I will look forward to an update on how the Drill queuing
part goes :)
Btw with regards to metadata queries, Drill already supports metadata (both
the limit 0 form and also show tables/show schemas which are served from
the Information_schema).

On Friday, May 13, 2016, John Omernik  wrote:

> So with that Docker file, I got caravel working easily with test data (no
> drill yet) that will be weekend fun (and the pyodbc is already installed in
> the container, so now it's time to play!)
>
> So I started my docker image with:
>
> sudo docker run -it --rm --net=host
> -v=/mapr/brewpot/apps/prod/caravel/working:/app/working:rw
> -v=/mapr/brewpot/apps/prod/caravel/cache:/app/cache:rw zeta/caravel
> /bin/bash
>
>
> Now, I passed through a couple of volumes that I am not sure I will need; I
> want to play so that my "state" and initialization are saved in those
> directories in the running container (this is just early testing). I just
> run bash, and then run the commands below and it works. I was lazy here and
> just did --net=host; it would likely work with bridged mode, but I am in an
> airport and wanted to see if I could get it working... the fun part will be
> working with Drill over the weekend. Thanks again Neeraja for sharing this!
>
>
>
>
> Then I ran these commands (per the docs) and could explore... pretty easy
> actually!
>
> # Create an admin user
> fabmanager create-admin --app caravel
> # Initialize the database
> caravel db upgrade
> # Create default roles and permissions
> caravel init
> # Load some data to play with
> caravel load_examples
> # Start the development web server
> caravel runserver -d
>
>
>
>
> On Fri, May 13, 2016 at 11:27 AM, John Omernik  wrote:
>
> > So, without running this (though it builds successfully), this seems
> > like a good place to start: it has caravel and pyodbc installed
> here.
> > I will be playing more this weekend
> >
> > FROM ubuntu
> >
> > RUN apt-get update && apt-get install -y build-essential libssl-dev
> > libffi-dev python-dev python-pip
> >
> > RUN apt-get install -y unixodbc-dev unixodbc-bin
> >
> > RUN pip install pyodbc
> >
> > RUN pip install caravel
> >
> > CMD ["python", "-v"]
> >
> > On Fri, May 13, 2016 at 10:44 AM, John Omernik  wrote:
> >
> >> A little more googling and I found the pyodbc, that looks promising.
> >>
> >> On Fri, May 13, 2016 at 10:41 AM, John Omernik 
> wrote:
> >>
> >>> "SQL Alchemy already understands Drill" I was just looking for that, is
> >>> there already some docs/blogs on that? I was going to start there as
> well
> >>> to determine how it worked and then look into the dialect writing and
> see
> >>> how big that project was.  I didn't find much on the Drill + Alchemy,
> but I
> >>> am in an airport and I blame wifi gremlins.
> >>>
> >>>
> >>>
> >>> On Fri, May 13, 2016 at 10:25 AM, Ted Dunning 
> >>> wrote:
> >>>
>  SQLAlchemy generates SQL queries and passes them on to Drill. Since
>  SQLAlchemy already understands Drill, most of what will be needed is
>  slight
>  tuning for SQL dialect and providing a mechanism for SQLAlchemy to get
>  meta-data from views.  Tableau does the meta-data discovery using
> limit
>  0
>  queries to get column names. We would hope that similar methods would
>  work.
> 
> 
>  On Fri, May 13, 2016 at 6:13 AM, Erik Antelman 
>  wrote:
> 
>  > Isn't this a matter of Drill<->SQLAlchemy? Such support could
> likely
>  > enable other frameworks.
>  >
>  > Would one think that adaptation of SQLAlchemy to Drill is specific
> to
>  > Caravel? What subset of features from an RDBMS ORM is meaningful,
>  feasible
>  > and useful to map to Drill. This sounds like a broad general
>  question. I
>  > am sure there are orms from other language camps that might want
> Drill
>  > backends.
>  > On May 13, 2016 7:33 AM, "John Omernik"  wrote:
>  >
>  > > I will be looking into this as well, thanks for sharing!
>  > > On May 13, 2016 2:01 AM, "Nirav Shah" 
>  wrote:
>  > >
>  > > > Hi Neeraja,
>  > > >
>  > > > I am interested in contributing if integration is not available.
>  > > > Kindly let me know
>  > > >
>  > > > Regards,
>  > > > Nirav
>  > > >
>  > > > On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
>  > > > nrentachint...@maprtech.com> wrote:
>  > > >
>  > > > > Hi Folks
>  > > > >
>  > > > > Caravel is a nice visualization tool recently open sourced by
>  airbnb.
>  > Did
>  > > > > anyone try to integrate Drill and/or interested in
> contributing
>  to
>  > > making
>  > > > > this work with Drill.
>  > > > >
>  > > > > https://github.com/airbnb/caravel
>  > > > >
>  > > > >
>  > > > > -Thanks
>  > > > 

Re: CTAS Out of Memory

2016-05-13 Thread Stefan Sedich
Interesting,

I wonder if it is related to the varchar issue Zelaine mentioned above. Even
with specific columns listed, the query plan shows a SELECT * being
pushed down to Postgres; does Drill not push down the specific columns?

I will create another table with only the columns I want and try again to
see if it is in fact due to the varchar columns.


Thanks

On Fri, May 13, 2016 at 10:54 AM Stefan Sedich 
wrote:

> Jason.
>
> Ran the following:
>
> alter session set `store.format`='csv';
> create table dfs.tmp.foo as select * from my_large_table;
>
>
> Same end result: it chews memory until it exhausts my heap and eventually
> hits the OOM. This table has a number of varchar columns, but I only
> selected a couple of columns in my select, so I was hoping it would avoid the
> issue mentioned above with varchar columns. I will create some other test
> tables later with only the values I need and see how that works out.
>
>
>
> Thanks
>
> On Fri, May 13, 2016 at 10:38 AM Jason Altekruse  wrote:
>
>> I am curious if this is a bug in the JDBC plugin. Can you try to change
>> the
>> output format to CSV? In that case we don't do any large buffering.
>>
>> Jason Altekruse
>> Software Engineer at Dremio
>> Apache Drill Committer
>>
>> On Fri, May 13, 2016 at 10:35 AM, Stefan Sedich 
>> wrote:
>>
>> > Seems like it just ran out of memory again and was not hanging. I tried
>> to
>> > append a limit 100 to the select query and it still runs out of memory.
>> > I just ran the CTAS against some other smaller tables and it works fine.
>> >
>> > I will play around with this some more on the weekend, I can only
>> assume I
>> > am messing something up here, I have in the past created parquet files
>> from
>> > large tables without any issue, will report back.
>> >
>> >
>> >
>> > Thanks
>> >
>> > On Fri, May 13, 2016 at 10:05 AM Abdel Hakim Deneche <
>> > adene...@maprtech.com>
>> > wrote:
>> >
>> > > Stefan,
>> > >
>> > > Can you share the query profile for the query that seems to be running
>> > > forever ? you won't find it on disk but you can append .json to the
>> > profile
>> > > web url and save the file.
>> > >
>> > > Thanks
>> > >
>> > > On Fri, May 13, 2016 at 9:55 AM, Stefan Sedich <
>> stefan.sed...@gmail.com>
>> > > wrote:
>> > >
>> > > > Zelaine,
>> > > >
>> > > > It does, I forgot about those ones, I will do a test where I filter
>> > those
>> > > > out and see how I go, in my test with a 12GB heap size it seemed to
>> > just
>> > > > sit there forever and not finish.
>> > > >
>> > > >
>> > > > Thanks
>> > > >
>> > > > On Fri, May 13, 2016 at 9:50 AM Zelaine Fong 
>> > wrote:
>> > > >
>> > > > > Stefan,
>> > > > >
>> > > > > Does your source data contain varchar columns?  We've seen
>> instances
>> > > > where
>> > > > > Drill isn't as efficient as it can be when Parquet is dealing with
>> > > > variable
>> > > > > length columns.
>> > > > >
>> > > > > -- Zelaine
>> > > > >
>> > > > > On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich <
>> > > stefan.sed...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Thanks for getting back to me so fast!
>> > > > > >
>> > > > > > I was just playing with that now, went up to 8GB and still ran
>> into
>> > > it,
>> > > > > > trying to go higher to see if I can find the sweet spot, only
>> got
>> > > 16GB
>> > > > > > total RAM on this laptop :)
>> > > > > >
>> > > > > > Is this an expected amount of memory for not an overly huge
>> table
>> > (16
>> > > > > > million rows, 6 columns of integers), even now at a 12GB heap
>> seems
>> > > to
>> > > > > have
>> > > > > > filled up again.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Thanks
>> > > > > >
>> > > > > > On Fri, May 13, 2016 at 9:20 AM Jason Altekruse <
>> ja...@dremio.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > I could not find anywhere this is mentioned in the docs, but
>> it
>> > has
>> > > > > come
>> > > > > > up
> > > > > > a few times on the list. While we made a number of efforts to
>> > move
>> > > > our
>> > > > > > > interactions with the Parquet library to the off-heap memory
>> > (which
>> > > > we
>> > > > > > use
>> > > > > > > everywhere else in the engine during processing) the version
>> of
>> > the
>> > > > > > writer
>> > > > > > > we are using still buffers a non-trivial amount of data into
>> heap
>> > > > > memory
>> > > > > > > when writing parquet files. Try raising your JVM heap memory
>> in
>> > > > > > > drill-env.sh on startup and see if that prevents the out of
>> > memory
>> > > > > issue.
>> > > > > > >
>> > > > > > > Jason Altekruse
>> > > > > > > Software Engineer at Dremio
>> > > > > > > Apache Drill Committer
>> > > > > > >
>> > > > > > > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <
>> > > > > stefan.sed...@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Just trying to do a CTAS on a postgres table, it is not huge
>> > and
>> > > > only
>> > > > > > has

Re: CTAS Out of Memory

2016-05-13 Thread Jason Altekruse
I am curious if this is a bug in the JDBC plugin. Can you try to change the
output format to CSV? In that case we don't do any large buffering.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Fri, May 13, 2016 at 10:35 AM, Stefan Sedich 
wrote:

> Seems like it just ran out of memory again and was not hanging. I tried to
> append a limit 100 to the select query and it still runs out of memory.
> I just ran the CTAS against some other smaller tables and it works fine.
>
> I will play around with this some more on the weekend, I can only assume I
> am messing something up here, I have in the past created parquet files from
> large tables without any issue, will report back.
>
>
>
> Thanks
>
> On Fri, May 13, 2016 at 10:05 AM Abdel Hakim Deneche <
> adene...@maprtech.com>
> wrote:
>
> > Stefan,
> >
> > Can you share the query profile for the query that seems to be running
> > forever ? you won't find it on disk but you can append .json to the
> profile
> > web url and save the file.
> >
> > Thanks
> >
> > On Fri, May 13, 2016 at 9:55 AM, Stefan Sedich 
> > wrote:
> >
> > > Zelaine,
> > >
> > > It does, I forgot about those ones, I will do a test where I filter
> those
> > > out and see how I go, in my test with a 12GB heap size it seemed to
> just
> > > sit there forever and not finish.
> > >
> > >
> > > Thanks
> > >
> > > On Fri, May 13, 2016 at 9:50 AM Zelaine Fong 
> wrote:
> > >
> > > > Stefan,
> > > >
> > > > Does your source data contain varchar columns?  We've seen instances
> > > where
> > > > Drill isn't as efficient as it can be when Parquet is dealing with
> > > variable
> > > > length columns.
> > > >
> > > > -- Zelaine
> > > >
> > > > On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich <
> > stefan.sed...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks for getting back to me so fast!
> > > > >
> > > > > I was just playing with that now, went up to 8GB and still ran into
> > it,
> > > > > trying to go higher to see if I can find the sweet spot, only got
> > 16GB
> > > > > total RAM on this laptop :)
> > > > >
> > > > > Is this an expected amount of memory for not an overly huge table
> (16
> > > > > million rows, 6 columns of integers), even now at a 12GB heap seems
> > to
> > > > have
> > > > > filled up again.
> > > > >
> > > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Fri, May 13, 2016 at 9:20 AM Jason Altekruse 
> > > > wrote:
> > > > >
> > > > > > I could not find anywhere this is mentioned in the docs, but it
> has
> > > > come
> > > > > up
> > > > > > a few times on the list. While we made a number of efforts to
> move
> > > our
> > > > > > interactions with the Parquet library to the off-heap memory
> (which
> > > we
> > > > > use
> > > > > > everywhere else in the engine during processing) the version of
> the
> > > > > writer
> > > > > > we are using still buffers a non-trivial amount of data into heap
> > > > memory
> > > > > > when writing parquet files. Try raising your JVM heap memory in
> > > > > > drill-env.sh on startup and see if that prevents the out of
> memory
> > > > issue.
> > > > > >
> > > > > > Jason Altekruse
> > > > > > Software Engineer at Dremio
> > > > > > Apache Drill Committer
> > > > > >
> > > > > > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <
> > > > stefan.sed...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Just trying to do a CTAS on a postgres table, it is not huge
> and
> > > only
> > > > > has
> > > > > > > 16 odd million rows, I end up with an out of memory after a
> > while.
> > > > > > >
> > > > > > > Unable to handle out of memory condition in FragmentExecutor.
> > > > > > >
> > > > > > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > > > >
> > > > > > >
> > > > > > > Is there a way to avoid this without needing to do the CTAS on
> a
> > > > subset
> > > > > > of
> > > > > > > my table?
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> > >
> >
>


Re: CTAS Out of Memory

2016-05-13 Thread Stefan Sedich
Seems like it just ran out of memory again and was not hanging. I tried to
append a limit 100 to the select query and it still runs out of memory.
I just ran the CTAS against some other smaller tables and it works fine.

I will play around with this some more on the weekend, I can only assume I
am messing something up here, I have in the past created parquet files from
large tables without any issue, will report back.



Thanks

On Fri, May 13, 2016 at 10:05 AM Abdel Hakim Deneche 
wrote:

> Stefan,
>
> Can you share the query profile for the query that seems to be running
> forever ? you won't find it on disk but you can append .json to the profile
> web url and save the file.
>
> Thanks
>
> On Fri, May 13, 2016 at 9:55 AM, Stefan Sedich 
> wrote:
>
> > Zelaine,
> >
> > It does, I forgot about those ones, I will do a test where I filter those
> > out and see how I go, in my test with a 12GB heap size it seemed to just
> > sit there forever and not finish.
> >
> >
> > Thanks
> >
> > On Fri, May 13, 2016 at 9:50 AM Zelaine Fong  wrote:
> >
> > > Stefan,
> > >
> > > Does your source data contain varchar columns?  We've seen instances
> > where
> > > Drill isn't as efficient as it can be when Parquet is dealing with
> > variable
> > > length columns.
> > >
> > > -- Zelaine
> > >
> > > On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich <
> stefan.sed...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for getting back to me so fast!
> > > >
> > > > I was just playing with that now, went up to 8GB and still ran into
> it,
> > > > trying to go higher to see if I can find the sweet spot, only got
> 16GB
> > > > total RAM on this laptop :)
> > > >
> > > > Is this an expected amount of memory for not an overly huge table (16
> > > > million rows, 6 columns of integers), even now at a 12GB heap seems
> to
> > > have
> > > > filled up again.
> > > >
> > > >
> > > >
> > > > Thanks
> > > >
> > > > On Fri, May 13, 2016 at 9:20 AM Jason Altekruse 
> > > wrote:
> > > >
> > > > > I could not find anywhere this is mentioned in the docs, but it has
> > > come
> > > > up
> > > > > a few times on the list. While we made a number of efforts to move
> > our
> > > > > interactions with the Parquet library to the off-heap memory (which
> > we
> > > > use
> > > > > everywhere else in the engine during processing) the version of the
> > > > writer
> > > > > we are using still buffers a non-trivial amount of data into heap
> > > memory
> > > > > when writing parquet files. Try raising your JVM heap memory in
> > > > > drill-env.sh on startup and see if that prevents the out of memory
> > > issue.
> > > > >
> > > > > Jason Altekruse
> > > > > Software Engineer at Dremio
> > > > > Apache Drill Committer
> > > > >
> > > > > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <
> > > stefan.sed...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Just trying to do a CTAS on a postgres table, it is not huge and
> > only
> > > > has
> > > > > > 16 odd million rows, I end up with an out of memory after a
> while.
> > > > > >
> > > > > > Unable to handle out of memory condition in FragmentExecutor.
> > > > > >
> > > > > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > > >
> > > > > >
> > > > > > Is there a way to avoid this without needing to do the CTAS on a
> > > subset
> > > > > of
> > > > > > my table?
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> >
>


Re: workspaces

2016-05-13 Thread Andries Engelbrecht
Alternatively you can just create the storage plugin json file, delete the old 
one and post the new one using the REST API.

See
https://drill.apache.org/docs/rest-api/#storage 
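
For anyone who would rather avoid jq, the same fetch-patch-repost flow can be sketched with nothing but the Python standard library. This is a sketch, not a tested client: it assumes the /storage/dfs.json GET/POST endpoints from the REST docs above, no authentication, and a hypothetical workspace name and path.

```python
import json
import urllib.request

DRILL = "http://localhost:8047"  # assumed Drill web UI address

def add_workspace(config, name, location, writable=True):
    """Return a copy of a storage-plugin 'config' dict with one extra workspace."""
    updated = json.loads(json.dumps(config))  # cheap deep copy
    updated.setdefault("workspaces", {})[name] = {
        "location": location,
        "writable": writable,
        "defaultInputFormat": None,
    }
    return updated

def update_dfs_plugin(name, location):
    """GET the current dfs plugin config, patch in a workspace, POST it back."""
    with urllib.request.urlopen(f"{DRILL}/storage/dfs.json") as resp:
        plugin = json.load(resp)
    plugin["config"] = add_workspace(plugin["config"], name, location)
    req = urllib.request.Request(
        f"{DRILL}/storage/dfs.json",
        data=json.dumps(plugin).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping the patch step as a pure function (add_workspace) makes it easy to inspect the new config before posting it, which is handy when a bad plugin body can leave the storage plugin unusable.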




> On May 13, 2016, at 10:05 AM, Vince Gonzalez  wrote:
> 
> I have used a pipeline involving jq and curl to modify storage plugins via
> the rest interface.  The one below adds a workspace to the dfs plugin:
> 
> curl -s localhost:8047/storage.json | jq '.[] | select(.name == "dfs")
> | .config.workspaces |= . + { "nypdmvc": { "location":
> "/Users/vince/data/nyc/nypdmvc", "writable": true,
> "defaultInputFormat": null}  }' | curl -s -X POST -H "Content-Type:
> application/json" -d @- http://localhost:8047/storage/dfs.json 
> 
> 
> Note that this won't work as is if you have authentication enabled.
> 
> On Friday, May 13, 2016, Odin Guillermo Caudillo Gallegos <
> odin.guille...@gmail.com > wrote:
> 
>> I am restricted from configuring it via the web console, so is there a
>> way to configure them from the terminal?
>> In embedded mode I only create the files in the /tmp/ directory via the
>> terminal; also, in the drill-override.conf file I use another path for the
>> plugins (with sys.store.provider.local.path).
>> 
>> Thanks.
>> 
>> 2016-05-13 11:33 GMT-05:00 Andries Engelbrecht > 
>> >:
>> 
>>> You should start drill in distributed mode first and then configure the
>>> storage plugins.
>>> If you configure the storage plugins in embedded mode the information is
>>> stored in the tmp space instead of registered with ZK for the cluster to
>>> use.
>>> 
>>> --Andries
>>> 
 On May 13, 2016, at 9:08 AM, Odin Guillermo Caudillo Gallegos <
>>> odin.guille...@gmail.com > wrote:
 
 The plugins are working fine in embedded mode, but when I start the
 drillbit on each server and connect via drill-conf I don't see them.
 Do I need to configure another parameter apart from the zookeeper
>> servers
 in the drill-override.conf file?
 
 2016-05-13 11:01 GMT-05:00 Andries Engelbrecht <
>>> aengelbre...@maprtech.com >:
 
> If Drill was correctly installed in distributed mode the storage
>> plugin
> and workspaces will be used by the Drill cluster.
> 
> Make sure the plugin and workspace was correctly configured and
>>> accepted.
> 
> Are you using the WebUI or REST to configure the storage plugins?
> 
> --Andries
> 
>> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
> odin.guille...@gmail.com > wrote:
>> 
>> Is there a way to configure workspaces on a distributed installation?
>> I only see the default plugin configuration but not the one that I
>> created.
>> 
>> Thanks
> 
> 
>>> 
>>> 
>> 
> 
> 
> --



Re: workspaces

2016-05-13 Thread Odin Guillermo Caudillo Gallegos
Ok, I'll give it a try, thanks!

2016-05-13 12:05 GMT-05:00 Vince Gonzalez :

> I have used a pipeline involving jq and curl to modify storage plugins via
> the rest interface.  The one below adds a workspace to the dfs plugin:
>
> curl -s localhost:8047/storage.json | jq '.[] | select(.name == "dfs")
> | .config.workspaces |= . + { "nypdmvc": { "location":
> "/Users/vince/data/nyc/nypdmvc", "writable": true,
> "defaultInputFormat": null}  }' | curl -s -X POST -H "Content-Type:
> application/json" -d @- http://localhost:8047/storage/dfs.json
>
> Note that this won't work as is if you have authentication enabled.
>
> On Friday, May 13, 2016, Odin Guillermo Caudillo Gallegos <
> odin.guille...@gmail.com> wrote:
>
> > I am restricted from configuring it via the web console, so is there a
> > way to configure them from the terminal?
> > In embedded mode I only create the files in the /tmp/ directory via the
> > terminal; also, in the drill-override.conf file I use another path for the
> > plugins (with sys.store.provider.local.path).
> >
> > Thanks.
> >
> > 2016-05-13 11:33 GMT-05:00 Andries Engelbrecht <
> aengelbre...@maprtech.com
> > >:
> >
> > > You should start drill in distributed mode first and then configure the
> > > storage plugins.
> > > If you configure the storage plugins in embedded mode the information
> is
> > > stored in the tmp space instead of registered with ZK for the cluster
> to
> > > use.
> > >
> > > --Andries
> > >
> > > > On May 13, 2016, at 9:08 AM, Odin Guillermo Caudillo Gallegos <
> > > odin.guille...@gmail.com > wrote:
> > > >
> > > > The plugins are working fine in embedded mode, but when I start the
> > > > drillbit on each server and connect via drill-conf I don't see them.
> > > > Do I need to configure another parameter apart from the zookeeper
> > servers
> > > > in the drill-override.conf file?
> > > >
> > > > 2016-05-13 11:01 GMT-05:00 Andries Engelbrecht <
> > > aengelbre...@maprtech.com >:
> > > >
> > > >> If Drill was correctly installed in distributed mode the storage
> > plugin
> > > >> and workspaces will be used by the Drill cluster.
> > > >>
> > > >> Make sure the plugin and workspace was correctly configured and
> > > accepted.
> > > >>
> > > >> Are you using the WebUI or REST to configure the storage plugins?
> > > >>
> > > >> --Andries
> > > >>
> > > >>> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
> > > >> odin.guille...@gmail.com > wrote:
> > > >>>
> > > >>> Is there a way to configure workspaces on a distributed
> installation?
> > > >>> I only see the default plugin configuration but not the one that I
> > > >>> created.
> > > >>>
> > > >>> Thanks
> > > >>
> > > >>
> > >
> > >
> >
>
>
> --
>


Re: workspaces

2016-05-13 Thread Vince Gonzalez
I have used a pipeline involving jq and curl to modify storage plugins via
the rest interface.  The one below adds a workspace to the dfs plugin:

curl -s localhost:8047/storage.json | jq '.[] | select(.name == "dfs")
| .config.workspaces |= . + { "nypdmvc": { "location":
"/Users/vince/data/nyc/nypdmvc", "writable": true,
"defaultInputFormat": null}  }' | curl -s -X POST -H "Content-Type:
application/json" -d @- http://localhost:8047/storage/dfs.json

Note that this won't work as is if you have authentication enabled.

On Friday, May 13, 2016, Odin Guillermo Caudillo Gallegos <
odin.guille...@gmail.com> wrote:

> I am restricted from configuring it via the web console, so is there a
> way to configure them from the terminal?
> In embedded mode I only create the files in the /tmp/ directory via the
> terminal; also, in the drill-override.conf file I use another path for the
> plugins (with sys.store.provider.local.path).
>
> Thanks.
>
> 2016-05-13 11:33 GMT-05:00 Andries Engelbrecht  >:
>
> > You should start drill in distributed mode first and then configure the
> > storage plugins.
> > If you configure the storage plugins in embedded mode the information is
> > stored in the tmp space instead of registered with ZK for the cluster to
> > use.
> >
> > --Andries
> >
> > > On May 13, 2016, at 9:08 AM, Odin Guillermo Caudillo Gallegos <
> > odin.guille...@gmail.com > wrote:
> > >
> > > The plugins are working fine in embedded mode, but when I start the
> > > drillbit on each server and connect via drill-conf I don't see them.
> > > Do I need to configure another parameter apart from the zookeeper
> servers
> > > in the drill-override.conf file?
> > >
> > > 2016-05-13 11:01 GMT-05:00 Andries Engelbrecht <
> > aengelbre...@maprtech.com >:
> > >
> > >> If Drill was correctly installed in distributed mode the storage
> plugin
> > >> and workspaces will be used by the Drill cluster.
> > >>
> > >> Make sure the plugin and workspace was correctly configured and
> > accepted.
> > >>
> > >> Are you using the WebUI or REST to configure the storage plugins?
> > >>
> > >> --Andries
> > >>
> > >>> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
> > >> odin.guille...@gmail.com > wrote:
> > >>>
> > >>> Is there a way to configure workspaces on a distributed installation?
> > >>> I only see the default plugin configuration but not the one that I
> > >>> created.
> > >>>
> > >>> Thanks
> > >>
> > >>
> >
> >
>


--


Re: CTAS Out of Memory

2016-05-13 Thread Abdel Hakim Deneche
Stefan,

Can you share the query profile for the query that seems to be running
forever? You won't find it on disk, but you can append .json to the profile
web URL and save the file.

Thanks
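
Concretely, the .json trick above amounts to a one-line URL rewrite. A stdlib-only sketch (the profile URL format is an assumption based on Drill's web UI, and no authentication is assumed):

```python
import json
import urllib.request

def profile_json_url(profile_url):
    """Turn a profile page URL into its JSON form by appending .json."""
    return profile_url if profile_url.endswith(".json") else profile_url + ".json"

def save_profile(profile_url, path):
    # Fetch the JSON form of the query profile and write it to disk.
    with urllib.request.urlopen(profile_json_url(profile_url)) as resp:
        data = json.load(resp)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return data
```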

On Fri, May 13, 2016 at 9:55 AM, Stefan Sedich 
wrote:

> Zelaine,
>
> It does, I forgot about those ones, I will do a test where I filter those
> out and see how I go, in my test with a 12GB heap size it seemed to just
> sit there forever and not finish.
>
>
> Thanks
>
> On Fri, May 13, 2016 at 9:50 AM Zelaine Fong  wrote:
>
> > Stefan,
> >
> > Does your source data contain varchar columns?  We've seen instances
> where
> > Drill isn't as efficient as it can be when Parquet is dealing with
> variable
> > length columns.
> >
> > -- Zelaine
> >
> > On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich 
> > wrote:
> >
> > > Thanks for getting back to me so fast!
> > >
> > > I was just playing with that now, went up to 8GB and still ran into it,
> > > trying to go higher to see if I can find the sweet spot, only got 16GB
> > > total RAM on this laptop :)
> > >
> > > Is this an expected amount of memory for not an overly huge table (16
> > > million rows, 6 columns of integers), even now at a 12GB heap seems to
> > have
> > > filled up again.
> > >
> > >
> > >
> > > Thanks
> > >
> > > On Fri, May 13, 2016 at 9:20 AM Jason Altekruse 
> > wrote:
> > >
> > > > I could not find anywhere this is mentioned in the docs, but it has
> > come
> > > up
> > > > a few times on the list. While we made a number of efforts to move
> our
> > > > interactions with the Parquet library to the off-heap memory (which
> we
> > > use
> > > > everywhere else in the engine during processing) the version of the
> > > writer
> > > > we are using still buffers a non-trivial amount of data into heap
> > memory
> > > > when writing parquet files. Try raising your JVM heap memory in
> > > > drill-env.sh on startup and see if that prevents the out of memory
> > issue.
> > > >
> > > > Jason Altekruse
> > > > Software Engineer at Dremio
> > > > Apache Drill Committer
> > > >
> > > > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <
> > stefan.sed...@gmail.com>
> > > > wrote:
> > > >
> > > > > Just trying to do a CTAS on a postgres table, it is not huge and
> only
> > > has
> > > > > 16 odd million rows, I end up with an out of memory after a while.
> > > > >
> > > > > Unable to handle out of memory condition in FragmentExecutor.
> > > > >
> > > > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > >
> > > > >
> > > > > Is there a way to avoid this without needing to do the CTAS on a
> > subset
> > > > of
> > > > > my table?
> > > > >
> > > >
> > >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  


Now Available - Free Hadoop On-Demand Training



Re: CTAS Out of Memory

2016-05-13 Thread Stefan Sedich
Zelaine,

It does; I forgot about those. I will run a test where I filter them out
and see how it goes. In my test with a 12GB heap size it seemed to just
sit there forever and never finish.


Thanks

On Fri, May 13, 2016 at 9:50 AM Zelaine Fong  wrote:

> Stefan,
>
> Does your source data contain varchar columns?  We've seen instances where
> Drill isn't as efficient as it can be when Parquet is dealing with variable
> length columns.
>
> -- Zelaine
>
> On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich 
> wrote:
>
> > Thanks for getting back to me so fast!
> >
> > I was just playing with that now, went up to 8GB and still ran into it,
> > trying to go higher to see if I can find the sweet spot, only got 16GB
> > total RAM on this laptop :)
> >
> > Is this an expected amount of memory for not an overly huge table (16
> > million rows, 6 columns of integers), even now at a 12GB heap seems to
> have
> > filled up again.
> >
> >
> >
> > Thanks
> >
> > On Fri, May 13, 2016 at 9:20 AM Jason Altekruse 
> wrote:
> >
> > > I could not find anywhere this is mentioned in the docs, but it has
> come
> > up
> > > a few times one the list. While we made a number of efforts to move our
> > > interactions with the Parquet library to the off-heap memory (which we
> > use
> > > everywhere else in the engine during processing) the version of the
> > writer
> > > we are using still buffers a non-trivial amount of data into heap
> memory
> > > when writing parquet files. Try raising your JVM heap memory in
> > > drill-env.sh on startup and see if that prevents the out of memory
> issue.
> > >
> > > Jason Altekruse
> > > Software Engineer at Dremio
> > > Apache Drill Committer
> > >
> > > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <
> stefan.sed...@gmail.com>
> > > wrote:
> > >
> > > > Just trying to do a CTAS on a postgres table, it is not huge and only
> > has
> > > > 16 odd million rows, I end up with an out of memory after a while.
> > > >
> > > > Unable to handle out of memory condition in FragmentExecutor.
> > > >
> > > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > >
> > > >
> > > > Is there a way to avoid this without needing to do the CTAS on a
> subset
> > > of
> > > > my table?
> > > >
> > >
> >
>


Re: Drill & Caravel

2016-05-13 Thread John Omernik
So with that Dockerfile, I got Caravel working easily with test data (no
Drill yet; that will be weekend fun), and pyodbc is already installed in
the container, so now it's time to play!

So I started my docker image with:

sudo docker run -it --rm --net=host
-v=/mapr/brewpot/apps/prod/caravel/working:/app/working:rw
-v=/mapr/brewpot/apps/prod/caravel/cache:/app/cache:rw zeta/caravel
/bin/bash


Now, I passed through a couple of volumes that I am not sure I will need; I
want my "state" and initialization saved in those directories while the
container runs (this is just early testing). I just run bash and then the
commands below, and it works. I was lazy here and just used host
networking; it would likely work with bridged mode, but I am in an airport
and wanted to see if I could get it working... the fun part will be
working with Drill over the weekend. Thanks again Neeraja for sharing this!




Then I ran these commands (per the docs) and could explore... pretty easy
actually!

# Create an admin user
fabmanager create-admin --app caravel
# Initialize the database
caravel db upgrade
# Create default roles and permissions
caravel init
# Load some data to play with
caravel load_examples
# Start the development web server
caravel runserver -d




On Fri, May 13, 2016 at 11:27 AM, John Omernik  wrote:

> So, without running this, but having it build successfully, this seems
> like a good place to start, it has caravel, and pyodbc all installed here.
> I will be playing more this weekend
>
> FROM ubuntu
>
> RUN apt-get update && apt-get install -y build-essential libssl-dev
> libffi-dev python-dev python-pip
>
> RUN apt-get install -y unixodbc-dev unixodbc-bin
>
> RUN pip install pyodbc
>
> RUN pip install caravel
>
> CMD ["python -v"]
>
> On Fri, May 13, 2016 at 10:44 AM, John Omernik  wrote:
>
>> A little more googling and I found the pyodbc, that looks promising.
>>
>> On Fri, May 13, 2016 at 10:41 AM, John Omernik  wrote:
>>
>>> "SQL Alchemy already understands Drill" I was just looking for that, is
>>> there already some docs/blogs on that? I was going to start there as well
>>> to determine how it worked and then look into the dialect writing and see
>>> how big that project was.  I didn't find much on the Drill + Alchemy, but I
>>> am in an airport and I blame wifi gremlins.
>>>
>>>
>>>
>>> On Fri, May 13, 2016 at 10:25 AM, Ted Dunning 
>>> wrote:
>>>
 SQLAlchemy generates SQL queries and passes them on to Drill. Since
 SQLAlchemy already understands Drill, most of what will be needed is
 slight
 tuning for SQL dialect and providing a mechanism for SQLAlchemy to get
 meta-data from views.  Tableau does the meta-data discovery using limit
 0
 queries to get column names. We would hope that similar methods would
 work.


 On Fri, May 13, 2016 at 6:13 AM, Erik Antelman 
 wrote:

 > Isn't this a matter of Drill<->SQLAlchemy. Such a support could likely
 > enable other frameworks.
 >
 > Would one think that adaptation of SQLAlchemy to Drill is specific to
 > Caravel? What subset of features from a RDBMS ORM is meaningfull,
 feasable
 > and usefull to map to Drill. This sounds like a broad general
 question. I
 > am sure there are orms from other language camps that might want Drill
 > backends.
 > On May 13, 2016 7:33 AM, "John Omernik"  wrote:
 >
 > > I will be looking into this as well, thanks for sharing!
 > > On May 13, 2016 2:01 AM, "Nirav Shah" 
 wrote:
 > >
 > > > I Hi Neeraja,
 > > >
 > > > I am interested in contributing if integration is not available.
 > > > Kindly let me know
 > > >
 > > > Regards,
 > > > Nirav
 > > >
 > > > On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
 > > > nrentachint...@maprtech.com> wrote:
 > > >
 > > > > Hi Folks
 > > > >
 > > > > Caravel is nice visualization tool recently open sourced by
 airbnb.
 > Did
 > > > > anyone try to integrate Drill and/or interested in contributing
 to
 > > making
 > > > > this work with Drill.
 > > > >
 > > > > https://github.com/airbnb/caravel
 > > > >
 > > > >
 > > > > -Thanks
 > > > > Neeraja
 > > > >
 > > >
 > >
 >

>>>
>>>
>>
>


Re: workspaces

2016-05-13 Thread Odin Guillermo Caudillo Gallegos
I have a restriction that prevents configuring it via the web console, so
is there a way to configure them from the terminal?
In embedded mode I only create the files in the /tmp/ directory via the
terminal, and in the drill-override.conf file I use another path for the
plugins (with sys.store.provider.local.path).

Thanks.

2016-05-13 11:33 GMT-05:00 Andries Engelbrecht :

> You should start drill in distributed mode first and then configure the
> storage plugins.
> If you configure the storage plugins in embedded mode the information is
> stored in the tmp space instead of registered with ZK for the cluster to
> use.
>
> --Andries
>
> > On May 13, 2016, at 9:08 AM, Odin Guillermo Caudillo Gallegos <
> odin.guille...@gmail.com> wrote:
> >
> > The plugins are working fine in the embbed mode, but when i start the
> > drillbit on each server and connect via drill-conf i don't see them.
> > Do i need to configure another parameter apart from the zookeeper servers
> > in the drill-override.conf file?
> >
> > 2016-05-13 11:01 GMT-05:00 Andries Engelbrecht <
> aengelbre...@maprtech.com>:
> >
> >> If Drill was correctly installed in distributed mode the storage plugin
> >> and workspaces will be used by the Drill cluster.
> >>
> >> Make sure the plugin and workspace was correctly configured and
> accepted.
> >>
> >> Are you using the WebUI or REST to configure the storage plugins?
> >>
> >> --Andries
> >>
> >>> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
> >> odin.guille...@gmail.com> wrote:
> >>>
> >>> Is there a way to configure workspaces on a distributed installation?
> >>> Cause i only see the default plugin configuration but not the one that
> i
> >>> created.
> >>>
> >>> Thanks
> >>
> >>
>
>
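
Since the web console is off-limits here, Drill's REST API (port 8047,
POST /storage/<name>.json) is the usual terminal-friendly alternative to
the WebUI for registering a storage plugin. A minimal sketch of building
such a payload; the plugin name, paths, and workspace below are
hypothetical placeholders:

```python
import json

# Hypothetical dfs-style plugin with one custom workspace; adjust
# connection and paths for your cluster.
plugin_name = "myfs"
payload = {
    "name": plugin_name,
    "config": {
        "type": "file",
        "enabled": True,
        "connection": "file:///",
        "workspaces": {
            "staging": {
                "location": "/data/staging",
                "writable": True,
                "defaultInputFormat": None,
            }
        },
        "formats": {"parquet": {"type": "parquet"}},
    },
}

body = json.dumps(payload)
# The payload would then be POSTed from the terminal, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        -d @plugin.json http://<drillbit>:8047/storage/myfs.json
print(body)
```

In distributed mode the plugin registered this way lands in ZooKeeper, so
every drillbit in the cluster sees it.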


Re: CTAS Out of Memory

2016-05-13 Thread Zelaine Fong
Stefan,

Does your source data contain varchar columns?  We've seen instances where
Drill isn't as efficient as it could be when Parquet is dealing with
variable-length columns.

-- Zelaine

On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich 
wrote:

> Thanks for getting back to me so fast!
>
> I was just playing with that now, went up to 8GB and still ran into it,
> trying to go higher to see if I can find the sweet spot, only got 16GB
> total RAM on this laptop :)
>
> Is this an expected amount of memory for not an overly huge table (16
> million rows, 6 columns of integers), even now at a 12GB heap seems to have
> filled up again.
>
>
>
> Thanks
>
> On Fri, May 13, 2016 at 9:20 AM Jason Altekruse  wrote:
>
> > I could not find anywhere this is mentioned in the docs, but it has come
> up
> > a few times one the list. While we made a number of efforts to move our
> > interactions with the Parquet library to the off-heap memory (which we
> use
> > everywhere else in the engine during processing) the version of the
> writer
> > we are using still buffers a non-trivial amount of data into heap memory
> > when writing parquet files. Try raising your JVM heap memory in
> > drill-env.sh on startup and see if that prevents the out of memory issue.
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich 
> > wrote:
> >
> > > Just trying to do a CTAS on a postgres table, it is not huge and only
> has
> > > 16 odd million rows, I end up with an out of memory after a while.
> > >
> > > Unable to handle out of memory condition in FragmentExecutor.
> > >
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > >
> > >
> > > Is there a way to avoid this without needing to do the CTAS on a subset
> > of
> > > my table?
> > >
> >
>


Re: Queries and Timeout

2016-05-13 Thread Abdel Hakim Deneche
Long-running queries shouldn't time out; this is most likely a bug.

Is it reproducible? Can you give more details about the query?

Thanks

On Mon, May 9, 2016 at 12:30 PM, Subbu Srinivasan 
wrote:

> What is the best way to implement queries that are long running? If queries
> take a long
> time I get this error.
>
> I understand that setting query timeouts are not yet supported in the JDBC
> interface.
> I get this error even if I run the query from the drill console (and  !set
> timeout -1)
>
> Error: SYSTEM ERROR: ConnectTimeoutException: connection timed out:
>
>
>
> --
> Pardon me for typos or  if I do not start with a hi or address you by name.
> Want to make sure
> my carpel tunnel syndrome does not get worse.
>
> Subbu
>



-- 

Abdelhakim Deneche

Software Engineer

  


Now Available - Free Hadoop On-Demand Training



Re: CTAS Out of Memory

2016-05-13 Thread Stefan Sedich
Thanks for getting back to me so fast!

I was just playing with that now; I went up to 8GB and still ran into it.
I am trying to go higher to see if I can find the sweet spot, but I only
have 16GB of total RAM on this laptop :)

Is this an expected amount of memory for a table that isn't overly huge
(16 million rows, 6 columns of integers)? Even now a 12GB heap seems to
have filled up again.



Thanks

On Fri, May 13, 2016 at 9:20 AM Jason Altekruse  wrote:

> I could not find anywhere this is mentioned in the docs, but it has come up
> a few times one the list. While we made a number of efforts to move our
> interactions with the Parquet library to the off-heap memory (which we use
> everywhere else in the engine during processing) the version of the writer
> we are using still buffers a non-trivial amount of data into heap memory
> when writing parquet files. Try raising your JVM heap memory in
> drill-env.sh on startup and see if that prevents the out of memory issue.
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich 
> wrote:
>
> > Just trying to do a CTAS on a postgres table, it is not huge and only has
> > 16 odd million rows, I end up with an out of memory after a while.
> >
> > Unable to handle out of memory condition in FragmentExecutor.
> >
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> >
> >
> > Is there a way to avoid this without needing to do the CTAS on a subset
> of
> > my table?
> >
>


Re: workspaces

2016-05-13 Thread Abdel Hakim Deneche
I believe Drill stores storage plugin configurations in different places
depending on the mode: embedded mode uses the local disk, while
distributed mode uses ZooKeeper.

On Fri, May 13, 2016 at 9:08 AM, Odin Guillermo Caudillo Gallegos <
odin.guille...@gmail.com> wrote:

> The plugins are working fine in the embbed mode, but when i start the
> drillbit on each server and connect via drill-conf i don't see them.
> Do i need to configure another parameter apart from the zookeeper servers
> in the drill-override.conf file?
>
> 2016-05-13 11:01 GMT-05:00 Andries Engelbrecht  >:
>
> > If Drill was correctly installed in distributed mode the storage plugin
> > and workspaces will be used by the Drill cluster.
> >
> > Make sure the plugin and workspace was correctly configured and accepted.
> >
> > Are you using the WebUI or REST to configure the storage plugins?
> >
> > --Andries
> >
> > > On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
> > odin.guille...@gmail.com> wrote:
> > >
> > > Is there a way to configure workspaces on a distributed installation?
> > > Cause i only see the default plugin configuration but not the one that
> i
> > > created.
> > >
> > > Thanks
> >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  


Now Available - Free Hadoop On-Demand Training
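
For reference, the storage locations above are governed by
drill-override.conf; a sketch with placeholder values (the cluster-id and
ZooKeeper hosts are illustrative, and sys.store.provider.local.path only
takes effect when the local, embedded-mode plugin store is in use):

```
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "zk1:2181,zk2:2181,zk3:2181"

  # Only honored by the local (embedded-mode) plugin store;
  # in distributed mode the plugins live in ZooKeeper instead:
  # sys.store.provider.local.path: "/opt/drill/plugins"
}
```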



Re: Drill & Caravel

2016-05-13 Thread John Omernik
So, without running this, but having it build successfully, this seems like
a good place to start; it has caravel and pyodbc installed. I will be
playing more this weekend.

FROM ubuntu

RUN apt-get update && apt-get install -y build-essential libssl-dev
libffi-dev python-dev python-pip

RUN apt-get install -y unixodbc-dev unixodbc-bin

RUN pip install pyodbc

RUN pip install caravel

CMD ["python", "-v"]

On Fri, May 13, 2016 at 10:44 AM, John Omernik  wrote:

> A little more googling and I found the pyodbc, that looks promising.
>
> On Fri, May 13, 2016 at 10:41 AM, John Omernik  wrote:
>
>> "SQL Alchemy already understands Drill" I was just looking for that, is
>> there already some docs/blogs on that? I was going to start there as well
>> to determine how it worked and then look into the dialect writing and see
>> how big that project was.  I didn't find much on the Drill + Alchemy, but I
>> am in an airport and I blame wifi gremlins.
>>
>>
>>
>> On Fri, May 13, 2016 at 10:25 AM, Ted Dunning 
>> wrote:
>>
>>> SQLAlchemy generates SQL queries and passes them on to Drill. Since
>>> SQLAlchemy already understands Drill, most of what will be needed is
>>> slight
>>> tuning for SQL dialect and providing a mechanism for SQLAlchemy to get
>>> meta-data from views.  Tableau does the meta-data discovery using limit 0
>>> queries to get column names. We would hope that similar methods would
>>> work.
>>>
>>>
>>> On Fri, May 13, 2016 at 6:13 AM, Erik Antelman 
>>> wrote:
>>>
>>> > Isn't this a matter of Drill<->SQLAlchemy. Such a support could likely
>>> > enable other frameworks.
>>> >
>>> > Would one think that adaptation of SQLAlchemy to Drill is specific to
>>> > Caravel? What subset of features from a RDBMS ORM is meaningfull,
>>> feasable
>>> > and usefull to map to Drill. This sounds like a broad general
>>> question. I
>>> > am sure there are orms from other language camps that might want Drill
>>> > backends.
>>> > On May 13, 2016 7:33 AM, "John Omernik"  wrote:
>>> >
>>> > > I will be looking into this as well, thanks for sharing!
>>> > > On May 13, 2016 2:01 AM, "Nirav Shah" 
>>> wrote:
>>> > >
>>> > > > I Hi Neeraja,
>>> > > >
>>> > > > I am interested in contributing if integration is not available.
>>> > > > Kindly let me know
>>> > > >
>>> > > > Regards,
>>> > > > Nirav
>>> > > >
>>> > > > On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
>>> > > > nrentachint...@maprtech.com> wrote:
>>> > > >
>>> > > > > Hi Folks
>>> > > > >
>>> > > > > Caravel is nice visualization tool recently open sourced by
>>> airbnb.
>>> > Did
>>> > > > > anyone try to integrate Drill and/or interested in contributing
>>> to
>>> > > making
>>> > > > > this work with Drill.
>>> > > > >
>>> > > > > https://github.com/airbnb/caravel
>>> > > > >
>>> > > > >
>>> > > > > -Thanks
>>> > > > > Neeraja
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
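
The LIMIT 0 technique quoted above relies on standard driver behavior:
the query returns no rows, yet the cursor still reports column metadata,
which is how a tool can discover the schema of a view without scanning
it. A minimal sketch using sqlite3 as a stand-in for a Drill connection
(the table and view names are made up for illustration):

```python
import sqlite3

# sqlite3 stands in for a Drill ODBC/JDBC connection; the technique is
# the same for any DB-API driver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, created TEXT)")
conn.execute("CREATE VIEW active_users AS SELECT id, name FROM users")

# LIMIT 0 fetches no data, but the driver still populates
# cursor.description with per-column metadata.
cur = conn.execute("SELECT * FROM active_users LIMIT 0")
columns = [d[0] for d in cur.description]
print(columns)  # ['id', 'name']
```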


Re: workspaces

2016-05-13 Thread Andries Engelbrecht
You should start drill in distributed mode first and then configure the storage 
plugins.
If you configure the storage plugins in embedded mode the information is stored 
in the tmp space instead of registered with ZK for the cluster to use.

--Andries

> On May 13, 2016, at 9:08 AM, Odin Guillermo Caudillo Gallegos 
>  wrote:
> 
> The plugins are working fine in the embbed mode, but when i start the
> drillbit on each server and connect via drill-conf i don't see them.
> Do i need to configure another parameter apart from the zookeeper servers
> in the drill-override.conf file?
> 
> 2016-05-13 11:01 GMT-05:00 Andries Engelbrecht :
> 
>> If Drill was correctly installed in distributed mode the storage plugin
>> and workspaces will be used by the Drill cluster.
>> 
>> Make sure the plugin and workspace was correctly configured and accepted.
>> 
>> Are you using the WebUI or REST to configure the storage plugins?
>> 
>> --Andries
>> 
>>> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
>> odin.guille...@gmail.com> wrote:
>>> 
>>> Is there a way to configure workspaces on a distributed installation?
>>> Cause i only see the default plugin configuration but not the one that i
>>> created.
>>> 
>>> Thanks
>> 
>> 



Re: CTAS Out of Memory

2016-05-13 Thread Jason Altekruse
I could not find anywhere this is mentioned in the docs, but it has come up
a few times on the list. While we made a number of efforts to move our
interactions with the Parquet library to off-heap memory (which we use
everywhere else in the engine during processing), the version of the writer
we are using still buffers a non-trivial amount of data into heap memory
when writing Parquet files. Try raising your JVM heap memory in
drill-env.sh on startup and see if that prevents the out-of-memory issue.
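
Concretely, the heap setting lives in conf/drill-env.sh; a sketch
(variable names as in Drill 1.x, the 8G values are illustrative, and the
drillbit must be restarted afterwards):

```shell
# conf/drill-env.sh -- illustrative values; restart the drillbit after editing.
export DRILL_HEAP="8G"                # JVM heap; the Parquet writer buffers here
export DRILL_MAX_DIRECT_MEMORY="8G"   # off-heap memory used by most operators
```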

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich 
wrote:

> Just trying to do a CTAS on a postgres table, it is not huge and only has
> 16 odd million rows, I end up with an out of memory after a while.
>
> Unable to handle out of memory condition in FragmentExecutor.
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>
> Is there a way to avoid this without needing to do the CTAS on a subset of
> my table?
>


Re: workspaces

2016-05-13 Thread Odin Guillermo Caudillo Gallegos
The plugins are working fine in embedded mode, but when I start the
drillbit on each server and connect via drill-conf I don't see them.
Do I need to configure another parameter, apart from the ZooKeeper servers,
in the drill-override.conf file?

2016-05-13 11:01 GMT-05:00 Andries Engelbrecht :

> If Drill was correctly installed in distributed mode the storage plugin
> and workspaces will be used by the Drill cluster.
>
> Make sure the plugin and workspace was correctly configured and accepted.
>
> Are you using the WebUI or REST to configure the storage plugins?
>
> --Andries
>
> > On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos <
> odin.guille...@gmail.com> wrote:
> >
> > Is there a way to configure workspaces on a distributed installation?
> > Cause i only see the default plugin configuration but not the one that i
> > created.
> >
> > Thanks
>
>


CTAS Out of Memory

2016-05-13 Thread Stefan Sedich
Just trying to do a CTAS on a Postgres table. It is not huge, only 16-odd
million rows, but I end up with an out-of-memory error after a while.

Unable to handle out of memory condition in FragmentExecutor.

java.lang.OutOfMemoryError: GC overhead limit exceeded


Is there a way to avoid this without needing to do the CTAS on a subset of
my table?


Re: workspaces

2016-05-13 Thread Andries Engelbrecht
If Drill was correctly installed in distributed mode the storage plugin and 
workspaces will be used by the Drill cluster.

Make sure the plugin and workspace was correctly configured and accepted.

Are you using the WebUI or REST to configure the storage plugins?

--Andries

> On May 13, 2016, at 8:48 AM, Odin Guillermo Caudillo Gallegos 
>  wrote:
> 
> Is there a way to configure workspaces on a distributed installation?
> Cause i only see the default plugin configuration but not the one that i
> created.
> 
> Thanks



workspaces

2016-05-13 Thread Odin Guillermo Caudillo Gallegos
Is there a way to configure workspaces on a distributed installation?
I only see the default plugin configuration, but not the one that I
created.

Thanks


Re: Drill & Caravel

2016-05-13 Thread John Omernik
Lots of potential here.
On May 13, 2016 9:42 AM, "Neeraja Rentachintala" <
nrentachint...@maprtech.com> wrote:

> Yes, the key thing is the SQL Alchemy layer.
> I can see it more broadly being used than just Caravel.
>
> On Fri, May 13, 2016 at 6:13 AM, Erik Antelman 
> wrote:
>
> > Isn't this a matter of Drill<->SQLAlchemy. Such a support could likely
> > enable other frameworks.
> >
> > Would one think that adaptation of SQLAlchemy to Drill is specific to
> > Caravel? What subset of features from a RDBMS ORM is meaningfull,
> feasable
> > and usefull to map to Drill. This sounds like a broad general question. I
> > am sure there are orms from other language camps that might want Drill
> > backends.
> > On May 13, 2016 7:33 AM, "John Omernik"  wrote:
> >
> > > I will be looking into this as well, thanks for sharing!
> > > On May 13, 2016 2:01 AM, "Nirav Shah" 
> wrote:
> > >
> > > > I Hi Neeraja,
> > > >
> > > > I am interested in contributing if integration is not available.
> > > > Kindly let me know
> > > >
> > > > Regards,
> > > > Nirav
> > > >
> > > > On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
> > > > nrentachint...@maprtech.com> wrote:
> > > >
> > > > > Hi Folks
> > > > >
> > > > > Caravel is nice visualization tool recently open sourced by airbnb.
> > Did
> > > > > anyone try to integrate Drill and/or interested in contributing to
> > > making
> > > > > this work with Drill.
> > > > >
> > > > > https://github.com/airbnb/caravel
> > > > >
> > > > >
> > > > > -Thanks
> > > > > Neeraja
> > > > >
> > > >
> > >
> >
>


Re: Drill & Caravel

2016-05-13 Thread Neeraja Rentachintala
Yes, the key thing is the SQLAlchemy layer.
I can see it being used more broadly than just Caravel.

On Fri, May 13, 2016 at 6:13 AM, Erik Antelman  wrote:

> Isn't this a matter of Drill<->SQLAlchemy. Such a support could likely
> enable other frameworks.
>
> Would one think that adaptation of SQLAlchemy to Drill is specific to
> Caravel? What subset of features from a RDBMS ORM is meaningfull, feasable
> and usefull to map to Drill. This sounds like a broad general question. I
> am sure there are orms from other language camps that might want Drill
> backends.
> On May 13, 2016 7:33 AM, "John Omernik"  wrote:
>
> > I will be looking into this as well, thanks for sharing!
> > On May 13, 2016 2:01 AM, "Nirav Shah"  wrote:
> >
> > > I Hi Neeraja,
> > >
> > > I am interested in contributing if integration is not available.
> > > Kindly let me know
> > >
> > > Regards,
> > > Nirav
> > >
> > > On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
> > > nrentachint...@maprtech.com> wrote:
> > >
> > > > Hi Folks
> > > >
> > > > Caravel is nice visualization tool recently open sourced by airbnb.
> Did
> > > > anyone try to integrate Drill and/or interested in contributing to
> > making
> > > > this work with Drill.
> > > >
> > > > https://github.com/airbnb/caravel
> > > >
> > > >
> > > > -Thanks
> > > > Neeraja
> > > >
> > >
> >
>


Re: Drill & Caravel

2016-05-13 Thread Erik Antelman
Isn't this a matter of Drill<->SQLAlchemy? Such support could likely
enable other frameworks.

Would one think that adapting SQLAlchemy to Drill is specific to
Caravel? What subset of features from an RDBMS ORM is meaningful, feasible,
and useful to map to Drill? This sounds like a broad, general question. I
am sure there are ORMs from other language camps that might want Drill
backends.
On May 13, 2016 7:33 AM, "John Omernik"  wrote:

> I will be looking into this as well, thanks for sharing!
> On May 13, 2016 2:01 AM, "Nirav Shah"  wrote:
>
> > I Hi Neeraja,
> >
> > I am interested in contributing if integration is not available.
> > Kindly let me know
> >
> > Regards,
> > Nirav
> >
> > On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> >
> > > Hi Folks
> > >
> > > Caravel is nice visualization tool recently open sourced by airbnb. Did
> > > anyone try to integrate Drill and/or interested in contributing to
> making
> > > this work with Drill.
> > >
> > > https://github.com/airbnb/caravel
> > >
> > >
> > > -Thanks
> > > Neeraja
> > >
> >
>


Run multiple login

2016-05-13 Thread ankit beohar
Hi Team,

I have one data node running one drillbit, but if one person logs in via
sqlline, another person cannot log in on the same machine.
Can you please suggest how I can enable this?

Best Regards,
ANKIT BEOHAR


Re: Drill & Caravel

2016-05-13 Thread John Omernik
I will be looking into this as well, thanks for sharing!
On May 13, 2016 2:01 AM, "Nirav Shah"  wrote:

> I Hi Neeraja,
>
> I am interested in contributing if integration is not available.
> Kindly let me know
>
> Regards,
> Nirav
>
> On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
> nrentachint...@maprtech.com> wrote:
>
> > Hi Folks
> >
> > Caravel is nice visualization tool recently open sourced by airbnb. Did
> > anyone try to integrate Drill and/or interested in contributing to making
> > this work with Drill.
> >
> > https://github.com/airbnb/caravel
> >
> >
> > -Thanks
> > Neeraja
> >
>


Re: Drill & Caravel

2016-05-13 Thread Nirav Shah
Hi Neeraja,

I am interested in contributing if integration is not available.
Kindly let me know

Regards,
Nirav

On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:

> Hi Folks
>
> Caravel is nice visualization tool recently open sourced by airbnb. Did
> anyone try to integrate Drill and/or interested in contributing to making
> this work with Drill.
>
> https://github.com/airbnb/caravel
>
>
> -Thanks
> Neeraja
>