Re: Article on how to use Apache Drill with Oracle's BI tool, OBIEE

2016-08-22 Thread Neeraja Rentachintala
Hi Robin
This is great. Thanks for putting it together.
Bridget Bevens takes care of the docs and can help publish this on the
site.

thanks
Neeraja

On Mon, Aug 22, 2016 at 4:08 AM, Robin Moffatt <
robin.moff...@rittmanmead.com> wrote:

> Hi,
> I've written an article on how to use Apache Drill with Oracle's BI tool,
> OBIEE:
> http://www.rittmanmead.com/blog/2016/08/using-apache-drill-with-obiee-12c/
> Is there a way to get this added to the documentation list here?
> https://drill.apache.org/docs/using-drill-with-bi-tools/
>
> thanks,
>
> Robin Moffatt
>


Drill on Azure

2016-08-02 Thread Neeraja Rentachintala
Just happened to see this great set of comprehensive blog/tutorials on how
to deploy Drill on Azure and use it with a variety of sources on Azure.
Wanted to share the link with the other users that might be interested in
this topic.

https://blogs.msdn.microsoft.com/data_otaku/2016/05/27/deploying-apache-drill-on-azure/


Re: Drill view in shell mode

2016-07-28 Thread Neeraja Rentachintala
If you want to see a list of views created, you can use 'show tables' in
the workspace.
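For example, a minimal sketch, assuming the view was saved to the dfs.tmp
workspace (the workspace name here is only illustrative):

USE dfs.tmp;
SHOW TABLES;

Views created from Drill Explorer show up in the table list of the workspace
they were saved to.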

On Thu, Jul 28, 2016 at 5:05 AM, Santosh Kulkarni <
santoshskulkarn...@gmail.com> wrote:

> How to see a view created in Drill Explorer through the Drill shell? Is there
> any command for Drill views?
>
> Thanks,
>
> Santosh
>


Re: Connecting to Drill ODBC DSN takes exceptionally long time

2016-07-25 Thread Neeraja Rentachintala
Andries did a great job putting together the material below on this
topic. This info will help you optimize the metadata access
experience from Tableau.
Additionally, make sure you are using the Tableau TDC file that ships with
the Drill ODBC driver.

https://community.mapr.com/community/answers/blog/2016/07/20/drill-best-practices-for-bi-and-analytical-tools

-Neeraja

On Mon, Jul 25, 2016 at 9:17 PM, Santosh Kulkarni <
santoshskulkarn...@gmail.com> wrote:

> While connecting Tableau to a Drill ODBC DSN, it takes almost 5 minutes to
> connect to Drill. I created 2 DSNs, one for the ZooKeeper quorum and the other
> for Direct to Drillbit. Both take a very long time to connect successfully to
> Drill.
>
> Also, after the connection just to open Schema and the tables within the
> schema it takes another few minutes. Underlying datasource is Hive.
>
> Any thoughts on what causes this issue?
>
> Thanks,
>
> Santosh
>


Re: MapR ODBC driver

2016-07-19 Thread Neeraja Rentachintala
Steve

The Apache distribution of Drill does not have an open source version of
the ODBC driver.  There is, however, a JDBC driver available that you can
use to connect to Drill from BI tools. The ODBC driver you mentioned below
is provided by Simba, and the license restrictions indicate that it
is licensed by MapR.
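For example, a minimal sketch of connecting with the bundled sqlline client
over JDBC (the ZooKeeper hostnames and cluster-id here are placeholders):

$ sqlline -u "jdbc:drill:zk=zk1:2181,zk2:2181,zk3:2181/drill/drillbits1"

The same jdbc:drill: URL format works from any BI tool that can load the
Drill JDBC driver jar.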

Regarding your other question, yes, Drill ships with the MapR distribution of
Hadoop.
For more info refer to https://www.mapr.com/products/apache-drill


thanks
Neeraja


On Tue, Jul 19, 2016 at 3:27 PM, Steve Warren  wrote:

> I noticed the MapR ODBC driver contains the following restriction in their
> license agreement.
>
> *"Restrictions*. Customer shall only use the Software in conjunction with
> the MapR Product and not on a standalone basis.  For avoidance of doubt,
> Customer is not authorized to use the Software with other distributions for
> Apache Hadoop. For purposes of this Agreement, the “MapR Product” shall
> mean the MapR Distribution for Apache Hadoop."
>
> This seems problematic. Is there another ODBC driver that works with drill?
> Does MapR intend to loosen the restrictions on this product and/or open
> source it?
>
> Does Drill even ship with MapR's distribution of Apache Hadoop?
>


Re: How to query array in Hbase

2016-07-04 Thread Neeraja Rentachintala
You can use the CONVERT_TO/CONVERT_FROM functions with JSON encoding if it is
JSON data.

Once you get hold of the parsed JSON array, you can use REPEATED_CONTAINS to
check for the existence of an element.
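A rough sketch, assuming a column family f with a qualifier ips holding the
JSON array (the table, family, and qualifier names are illustrative):

SELECT t.ip_list
FROM (
  SELECT convert_from(h.f.ips, 'JSON') AS ip_list
  FROM hbase.`mytable` h
) t
WHERE repeated_contains(t.ip_list, '192.168.0.1');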

On Sun, Jul 3, 2016 at 9:47 PM, GameboyNO1 <7304...@qq.com> wrote:

> Hi,
>
>
> I put an array in an HBase column (currently as a JSON string, but that is not
> set in stone if Drill requires a change). Is it possible to query whether an
> item is in the array?
>
>
> For example, I put the string "["192.168.0.1","192.168.0.2",...]" in an HBase
> column, and want to query the rows that contain 192.168.0.1 in the array.
>
>
> Thanks!
>
>
> Alfie


Re: Drill taking way too long to plan query

2016-06-23 Thread Neeraja Rentachintala
You might want to enable metadata caching and see if it helps.

 https://drill.apache.org/docs/optimizing-parquet-metadata-reading/
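For example, against the workspace in the quoted query, the one-time cache
build would look like:

REFRESH TABLE METADATA s3.`tables/stats/iad`;

After that, planning should read the generated metadata cache file instead of
opening every parquet footer.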

On Thu, Jun 23, 2016 at 1:36 PM, Tanmay Solanki  wrote:

> Below is the plan. The amount of files is ~213000 files of parquet data.
>
> 0: jdbc:drill:> explain plan for select count(*) from
> s3.`tables/stats/iad/201604*/`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@7cb1a0e8[columns = null, isStarQuery = false, isSkipQuery = false]])
>  | {
>   "head" : {
> "version" : 1,
> "generator" : {
>   "type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "DirectGroupScan",
> "@id" : 3,
> "cost" : 20.0
>   }, {
> "pop" : "project",
> "@id" : 2,
> "exprs" : [ {
>   "ref" : "`EXPR$0`",
>   "expr" : "`count`"
> } ],
> "child" : 3,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 20.0
>   }, {
> "pop" : "project",
> "@id" : 1,
> "exprs" : [ {
>   "ref" : "`EXPR$0`",
>   "expr" : "`EXPR$0`"
> } ],
> "child" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 20.0
>   }, {
> "pop" : "screen",
> "@id" : 0,
> "child" : 1,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 20.0
>   } ]
> } |
> +--+--+
> 1 row selected (7493.869 seconds)
> Additionally I have the drillbit.log for this query which I will post
> below:
> 2016-06-23 18:25:16,417 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:foreman]
> INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
> 2893d673-3dad-dd21-d5e6-8ef28e0f81c9: explain plan for select count(*) from
> s3.`tables/stats/iad/201604*/`
> 2016-06-23 20:29:45,446 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:foreman]
> INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed
> 218474 out of 218474 using 16 threads. Time: 3474817ms total, 254.452884ms
> avg, 50344ms max.
> 2016-06-23 20:29:45,446 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:foreman]
> INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed
> 218474 out of 218474 using 16 threads. Earliest start: 431.101000 µs,
> Latest start: 3474340355.187000 µs, Average start: 1753982685.665761 µs .
> 2016-06-23 20:30:10,211 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:frag:0:0]
> INFO  o.a.d.e.w.fragment.FragmentExecutor -
> 2893d673-3dad-dd21-d5e6-8ef28e0f81c9:0:0: State change requested
> AWAITING_ALLOCATION --> RUNNING
> 2016-06-23 20:30:10,211 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:frag:0:0]
> INFO  o.a.d.e.w.f.FragmentStatusReporter -
> 2893d673-3dad-dd21-d5e6-8ef28e0f81c9:0:0: State to report: RUNNING
> 2016-06-23 20:30:10,226 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:frag:0:0]
> INFO  o.a.d.e.w.fragment.FragmentExecutor -
> 2893d673-3dad-dd21-d5e6-8ef28e0f81c9:0:0: State change requested RUNNING
> --> FINISHED
> 2016-06-23 20:30:10,226 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:frag:0:0]
> INFO  o.a.d.e.w.f.FragmentStatusReporter -
> 2893d673-3dad-dd21-d5e6-8ef28e0f81c9:0:0: State to report: FINISHED
>
>
>
> On Thursday, 23 June 2016 11:22 AM, Ted Dunning 
> wrote:
>
>
>  Also, how many files?  What format?
>
> Being so slow is an anomaly.
>
>
>
>
> On Thu, Jun 23, 2016 at 11:15 AM, Khurram Faraaz 
> wrote:
>
> > Can you please share the query plan for that long running query here ?
> >
> > On Thu, Jun 23, 2016 at 11:40 PM, Tanmay Solanki <
> > tsolank...@yahoo.in.invalid> wrote:
> >
> > > I am trying to run a query on Apache Drill to simply count the number of
> > > rows in a table stored in parquet format in S3. I am running this on a 20
> > > node r3.8xlarge EC2 instance cluster and I have my direct memory set to
> > > 80GB, heap memory set to 32GB, and set the
> > > planner.memory.max_memory_per_node to a very high value. However, counting
> > > the rows in this table takes around 7662 seconds, or around 2 hours, for
> > > Drill to finish the query on a 9.93TB, 56 billion row, 174 column
> > > dataset. It seems, from the logs and the web console, that query
> > > planning itself is taking nearly 99% of the time and actual query execution
> > > is almost taking no time. I ran the same query on PrestoDB with a similar
> > > setup (20 node r3.8xlarge) and found that it completed in 137 seconds, or
> > > just over 2 minutes. Is there something wrong with my configuration of
> > > Drill possibly, or is this what is expected for Drill?
> > >
> >
>
>
>
>


Re: Is this normal view behavior?

2016-06-23 Thread Neeraja Rentachintala
This is a bug.

On Thu, Jun 23, 2016 at 1:32 PM, rahul challapalli <
challapallira...@gmail.com> wrote:

> This looks like a bug. If you renamed the dir0 column as p_day, then you
> should see that in sqlline as well. And I have never seen
> "_DEFAULT_COL_TO_READ_"
> before. Can you file a jira?
>
> - Rahul
>
> On Thu, Jun 23, 2016 at 12:33 PM, John Omernik  wrote:
>
> > I have a table that is a directory of parquet files, each row had say 3
> > columns, and the table is split into subdirectories that allow me to use
> > dir0 partitioning.
> >
> > so if I select * from `table`
> >
> > I get col1, col2, col3, and dir0 as my fields returned.
> >
> > So if I create a view
> >
> > CREATE VIEW view_myview as
> > select dir0 as `p_day`, col1, col2, col3 from `path/to/table`
> >
> > and run
> > select * from view_myview
> >
> > why, in sqlline, isn't the first column named "p_day"?
> >
> > I can reference things in my query by p_day; however, the returned results
> > still say dir0?
> >
> > | dir0 | col1 | col2 | col3 |
> >
> > If I do select p_day, col1 then I get
> >
> > | dir0 | col1|
> >
> > if I do select p_day then I get
> >
> > | _DEFAULT_COL_TO_READ_ | dir0 |
> >
> > where the first column (DEFAULT_COL_TO_READ) is always null.
> >
> > If I do select dir0 from view I get "dir0" not found.
> >
> > I guess the "expected" (principle of least surprise) behavior would be to
> > have it just be a column that is always labeled p_day, and if I only select
> > that, I get the dir0 value repeated for each row.
> >
> > Am I over thinking minutia again? :)
> >
>


Re: Drill on Phoenix

2016-06-16 Thread Neeraja Rentachintala
Alex
Can you briefly describe your use case for using Drill with Phoenix?

On Thu, Jun 16, 2016 at 10:42 AM, James Taylor 
wrote:

> Yes, we've created a new Phoenix storage plugin for Drill here[1], and
> there's a good presentation put together by Jacques on here[2] that covers
> Drillix (that's our initiative name) plus Drill and Arrow. This is
> definitely a work in progress at the POC level, but IMHO is very promising.
> We need a couple of things to happen to make it real: 1) finish the
> Phoenix/Calcite integration - this is on the calcite branch in Phoenix and
> progressing nicely (there'll be a good Hadoop Summit talk on this by Julian
> and Maryann[3]), and 2) get Drill on the latest Calcite.
>
> Thanks,
> James
>
> [1] https://github.com/jacques-n/drill/tree/phoenix_plugin
> [2] https://t.co/ZBhmAirswW
> [3] Thu, June 30 @ 12:20PM - 1:00PM
> How We Re-Engineered Phoenix with a Cost-Based Optimizer Based on Calcite
>
> On Thu, Jun 16, 2016 at 9:41 AM, Tom Barber 
> wrote:
>
> > You should be able to use the JDBC connectivity to connect like other JDBC
> > datasources.
> > On 16 Jun 2016 17:35, "Alex Kamil"  wrote:
> >
> >> Can Drill be integrated with Apache Phoenix?
> >>
> >> Thanks
> >> Alex
> >>
> >
>


Re: Apache Drill, Query PostgreSQL text or Jsonb as if it were from a json storage type?

2016-05-25 Thread Neeraja Rentachintala
There is the ability to retrieve JSON fields using the CONVERT_TO/CONVERT_FROM
functions in Drill. Check the following doc.

https://drill.apache.org/docs/data-type-conversion/#convert_to-and-convert_from
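A minimal sketch, assuming the jsonb column comes through the jdbc storage
plugin as a string (the plugin, schema, and column names here are
illustrative):

SELECT t.pkey, t.doc
FROM (
  SELECT pkey, convert_from(`data`, 'JSON') AS doc
  FROM postgres.public.mytable
) t;

Individual fields can then be addressed with paths like
t.doc.loadCaseResult.`case`.`type`, though Drill will not automatically split
unknown keys into separate columns.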


On Wed, May 25, 2016 at 9:26 AM, Andrew Evans  wrote:

> Drill Members,
>
> I have an intriguing problem where I have hundreds of thousands or even
> millions of records stored in jsonb format in uneven JSON objects in a
> PostgreSQL database. I would like to be able to implicitly grab column
> names and data types using your tool, since neither PostgreSQL nor Pentaho
> has this function. My current tool for this task is Scala, but I am not
> dedicated to writing and keeping up with the JSON format specs, etc., in
> real time.
>
> Is it possible by conversion or otherwise to read the jsonb or a text
> string from PostgreSQL as if it were being queried from a "json" type
> storage instead of a "jdbc" type storage? If so, could I pull in different
> columns from PostgreSQL without some sort of key (with the original query)?
> Is there the ability to do some thing like SELECT pkey,
> split_to_columns(convert_to(field,'JSON')) FROM postgres.mytable?
>
> For context, I have posted an example record below:
>
> pkey   ,  data
> 1423234, "{"loadCaseResult": {"case": {"type": "Adoption (Family)",
> "judge": null, "style": "Confidential", "caseId": "12", "events": {"event":
> [{"date": "08/06/2014", "type": "Request: Action", "comment":
> "900125-Request for Action\n Filed/Issued By"}}}
>
> Thank you for your time,
>
> Andrew Evans
> Java/Scala/Python Programmer at Hygenics Data,LLC
>


Re: Drill Views and Typed Columns

2016-05-16 Thread Neeraja Rentachintala
Both are options (and that's how the BI tools work with Drill today):

- Views with explicit casts - will return schema definitions as part of show
schemas/describe table/show columns queries.
- Limit 0 queries - this will be good as well if we can modify Caravel to
issue such queries (this is what Tableau does).

For now, I think returning metadata to Caravel using the above options will
be the solution. The ideal approach would actually be to have a data
exploration experience on raw data (without curation) within Caravel itself
to create this metadata as needed.
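A quick sketch of the two options (the paths, view name, and types are
illustrative):

CREATE OR REPLACE VIEW dfs.tmp.`typed_view` AS
SELECT CAST(col1 AS INTEGER) AS col1,
       CAST(col2 AS VARCHAR(255)) AS col2,
       CAST(col3 AS DOUBLE) AS col3
FROM dfs.`/path/to/parquet`;

-- and the limit 0 style metadata probe:
SELECT * FROM dfs.tmp.`typed_view` LIMIT 0;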

-Neeraja

On Mon, May 16, 2016 at 2:45 PM, Ted Dunning  wrote:

> As you suggest, views are a critical way to lock down that kind of
> information.
>
> Select with limit 0 is often used for meta-data exploration. This is more
> robust than asking about tables since not everything is necessarily really
> in a single table.
>
> On Mon, May 16, 2016 at 2:12 PM, John Omernik  wrote:
>
> > Hey all, as part of my exploration of Caravel,  I realized knowing the
> > types of columns can be valuable... I can say create a view of a
> directory
> > of parquet allowing the "show tables" to work well, however, the type for
> > every column is "ANY" which may work (need to tweak some things) but I am
> > guessing may make certain down stream things in Caravel more difficult.
> >
> > So, just thinking aloud here, would it be possible to "cast" in Views to
> > allow the view definition to pass along type information?  Even if it
> means
> > a more verbose view definition, it would be done once, and then down
> stream
> > tools like Caravel would know the types...
> >
> > Thoughts?
> >
> > John
> >
>


Re: Drill & Caravel

2016-05-16 Thread Neeraja Rentachintala
John
Great. Can we briefly look at this during the hangout tomorrow?

On Mon, May 16, 2016 at 12:26 PM, John Omernik  wrote:

> AWESOME! Yep, that works and I like it better than using sys.options. Now
> to dive in and play with the Dialect.
>
> Note, I will be uploading an unfinished dialect in my caraveldrill repo...
> the goal isn't a production ready thing, but a skeleton (based on the
> access one) of what a dialect is... The purpose is to evolve things as put
> it all in one area for people to work with. As of now, I have no clue how
> to trap statements with no FROM, but that's my first thing to work on :)
>
> Thanks for the help Veera!
>
> John
>
> On Mon, May 16, 2016 at 2:20 PM, Veera Naranammalpuram <
> vnaranammalpu...@maprtech.com> wrote:
>
> > Does this work?
> >
> > 0: jdbc:drill:zk=local> SELECT 'x' AS some_label from (values(1));
> > +-+
> > | some_label  |
> > +-+
> > | x   |
> > +-+
> > 1 row selected (1.41 seconds)
> > 0: jdbc:drill:zk=local>
> >
> > -Veera
> >
> > On Mon, May 16, 2016 at 3:19 PM, John Omernik  wrote:
> >
> > > I suppose I could do select 'x' AS some_label from sys.options limit 1;
> > >
> > > any reason not to? Any other options?
> > >
> > > On Mon, May 16, 2016 at 2:18 PM, John Omernik 
> wrote:
> > >
> > > > Does Drill have a "dummy" table (like dual) that we could test
> against?
> > > If
> > > > we had that I could replace the that in a dialect (I think)
> > > >
> > > > SELECT 'test plain returns' AS anon_1
> > > >
> > > >
> > > > SELECT 'x' AS some_label
> > > >
> > > >
> > > > SELECT 'test unicode returns' AS anon_1
> > > >
> > > >
> > > > SELECT 'x' AS some_label
> > > >
> > > >
> > > > Drill is looking for a "FROM" :)
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Veera Naranammalpuram
> > Product Specialist - SQL on Hadoop
> > *MapR Technologies (www.mapr.com )*
> > *(Email) vnaranammalpu...@maprtech.com *
> > *(Mobile) 917 683 8116 - can text *
> > *Timezone: ET (UTC -5:00 / -4:00)*
> >
>


Drill & Caravel

2016-05-13 Thread Neeraja Rentachintala
Great, thanks John. I will look forward to an update on how the Drill
querying part goes :)
Btw, with regards to metadata queries, Drill already supports metadata (both
the limit 0 form and also show tables/show schemas, which are served from
the INFORMATION_SCHEMA).
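For instance, the forms in question look like this (the view name is
illustrative):

SHOW SCHEMAS;
SELECT * FROM INFORMATION_SCHEMA.`TABLES`;
SELECT * FROM dfs.tmp.`some_view` LIMIT 0;  -- returns column metadata, no rows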

On Friday, May 13, 2016, John Omernik <j...@omernik.com> wrote:

> So with that Dockerfile, I got caravel working easily with test data (no
> drill yet; that will be weekend fun), and pyodbc is already installed in
> the container, so now it's time to play!
>
> So I started my docker image with:
>
> sudo docker run -it --rm --net=host
> -v=/mapr/brewpot/apps/prod/caravel/working:/app/working:rw
> -v=/mapr/brewpot/apps/prod/caravel/cache:/app/cache:rw zeta/caravel
> /bin/bash
>
>
> Now, I passed through a couple of volumes that I am not sure I will need; I
> want to play so that my "state" and initialization are saved in those
> directories in the running container (this is just early testing). I just
> run bash, and then run the commands below, and it works. I was lazy here and
> just did net host; it would likely work with bridged mode, but I am in an
> airport and wanted to see if I could get it working... the fun part will be
> working with Drill over the weekend. Thanks again, Neeraja, for sharing this!
>
>
>
>
> Then I ran these commands (per the docs) and could explore... pretty easy
> actually!
>
> # Create an admin user
> fabmanager create-admin --app caravel
> # Initialize the database
> caravel db upgrade
> # Create default roles and permissions
> caravel init
> # Load some data to play with
> caravel load_examples
> # Start the development web server
> caravel runserver -d
>
>
>
>
> On Fri, May 13, 2016 at 11:27 AM, John Omernik <j...@omernik.com> wrote:
>
> > So, without running this, but having it build successfully, this seems
> > like a good place to start; it has caravel and pyodbc all installed here.
> > I will be playing more this weekend
> >
> > FROM ubuntu
> >
> > RUN apt-get update && apt-get install -y build-essential libssl-dev
> > libffi-dev python-dev python-pip
> >
> > RUN apt-get install -y unixodbc-dev unixodbc-bin
> >
> > RUN pip install pyodbc
> >
> > RUN pip install caravel
> >
> > CMD ["python", "-v"]
> >
> > On Fri, May 13, 2016 at 10:44 AM, John Omernik <j...@omernik.com> wrote:
> >
> >> A little more googling and I found pyodbc; that looks promising.
> >>
> >> On Fri, May 13, 2016 at 10:41 AM, John Omernik <j...@omernik.com>
> wrote:
> >>
> >>> "SQL Alchemy already understands Drill" I was just looking for that, is
> >>> there already some docs/blogs on that? I was going to start there as
> well
> >>> to determine how it worked and then look into the dialect writing and
> see
> >>> how big that project was.  I didn't find much on the Drill + Alchemy,
> but I
> >>> am in an airport and I blame wifi gremlins.
> >>>
> >>>
> >>>
> >>> On Fri, May 13, 2016 at 10:25 AM, Ted Dunning <ted.dunn...@gmail.com>
> >>> wrote:
> >>>
> >>>> SQLAlchemy generates SQL queries and passes them on to Drill. Since
> >>>> SQLAlchemy already understands Drill, most of what will be needed is
> >>>> slight
> >>>> tuning for SQL dialect and providing a mechanism for SQLAlchemy to get
> >>>> meta-data from views.  Tableau does the meta-data discovery using
> limit
> >>>> 0
> >>>> queries to get column names. We would hope that similar methods would
> >>>> work.
> >>>>
> >>>>
> >>>> On Fri, May 13, 2016 at 6:13 AM, Erik Antelman <eantel...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Isn't this a matter of Drill<->SQLAlchemy. Such a support could
> likely
> >>>> > enable other frameworks.
> >>>> >
> >>>> > Would one think that adaptation of SQLAlchemy to Drill is specific
> to
> >>>> > Caravel? What subset of features from a RDBMS ORM is meaningfull,
> >>>> feasable
> >>>> > and usefull to map to Drill. This sounds like a broad general
> >>>> question. I
> >>>> > am sure there are orms from other language camps that might want
> Drill
> >>>> > backends.
> >>>> > On May 13, 2016 7:33 AM, "John Omernik" <j...@omernik.com> wrote:
> >>>> >

Re: Drill & Caravel

2016-05-13 Thread Neeraja Rentachintala
Yes, the key thing is the SQL Alchemy layer.
I can see it being used more broadly than just Caravel.

On Fri, May 13, 2016 at 6:13 AM, Erik Antelman <eantel...@gmail.com> wrote:

> Isn't this a matter of Drill<->SQLAlchemy. Such a support could likely
> enable other frameworks.
>
> Would one think that adaptation of SQLAlchemy to Drill is specific to
> Caravel? What subset of features from a RDBMS ORM is meaningfull, feasable
> and usefull to map to Drill. This sounds like a broad general question. I
> am sure there are orms from other language camps that might want Drill
> backends.
> On May 13, 2016 7:33 AM, "John Omernik" <j...@omernik.com> wrote:
>
> > I will be looking into this as well, thanks for sharing!
> > On May 13, 2016 2:01 AM, "Nirav Shah" <nirav.s...@games24x7.com> wrote:
> >
> > > Hi Neeraja,
> > >
> > > I am interested in contributing if integration is not available.
> > > Kindly let me know
> > >
> > > Regards,
> > > Nirav
> > >
> > > On Thu, May 12, 2016 at 9:19 PM, Neeraja Rentachintala <
> > > nrentachint...@maprtech.com> wrote:
> > >
> > > > Hi Folks
> > > >
> > > > Caravel is a nice visualization tool recently open sourced by Airbnb. Did
> > > > anyone try to integrate Drill, and/or is anyone interested in contributing
> > > > to making this work with Drill?
> > > >
> > > > https://github.com/airbnb/caravel
> > > >
> > > >
> > > > -Thanks
> > > > Neeraja
> > > >
> > >
> >
>


Drill & Caravel

2016-05-12 Thread Neeraja Rentachintala
Hi Folks

Caravel is a nice visualization tool recently open sourced by Airbnb. Did
anyone try to integrate Drill, and/or is anyone interested in contributing to
making this work with Drill?

https://github.com/airbnb/caravel


-Thanks
Neeraja


Re: Question on nested JSON behavior

2016-03-10 Thread Neeraja Rentachintala
Actually I agree with Jiang. The result does seem unintuitive. If it is a
file with just a list, it still makes sense to return the ids in that
list as an array, unless the user has configured Drill to automatically
flatten the first level.
Does anyone know how other systems behave for this use case? (for
ex: Mongo)
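If the goal is one row per array element rather than one array per row, a
flatten-based sketch (reusing the sample path from the message below) would
be:

SELECT b.batter_entry.id, b.batter_entry.`type`
FROM (
  SELECT flatten(t.batters.batter) AS batter_entry
  FROM dfs.`/path/to/sample.json` t
) b;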



On Thu, Mar 10, 2016 at 4:21 PM, Nathan Griffith 
wrote:

> Hi Jiang,
>
> Think of it this way: If you had a file that was just the list:
>
> {"id":"1001","type":"Regular"}
> {"id":"1002","type":"Chocolate"}
> {"id":"1003","type":"Blueberry"}
> {"id":"1004","type":"Devil's Food"}
>
> What would you like it to return when you query:
>
> select id from dfs.`/path/to/sample_file.json`;
>
> ?
>
> When you enter the query that you're asking about, you're indicating
> exactly that structure of data. Does this explanation make sense?
>
> Best,
> Nathan
>
> On Thu, Mar 10, 2016 at 4:07 PM, Jiang Wu  wrote:
> > Drill version: 1.4.0.  Assuming 3 JSON objects with the following
> structure:
> >
> >   {  ...
> > "batters":
> >   {
> > "batter":
> >   [
> > { "id": "1001", "type": "Regular" },
> > { "id": "1002", "type": "Chocolate" },
> > { "id": "1003", "type": "Blueberry" },
> > { "id": "1004", "type": "Devil's Food" }
> >   ]
> >   },
> > ...
> >   }
> >
> > Now running a few sample queries against the above data:
> >
> >
> > A)  select "batters" returns expected results, which are the values
> of "batters" from each row.
> >
> > 0: jdbc:drill:zk=local> select batters from dfs.`c:\tmp\sample.json`;
> > +-+
> > | batters |
> > +-+
> > |
> {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> Food"}]} |
> > | {"batter":[{"id":"1001","type":"Regular"}]} |
> > |
> {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]}
> |
> > +-+
> > 3 rows selected (0.243 seconds)
> >
> >
> > B)  select "batters.batter" also returns the expected results, which
> are the array values for "batters.batter" from each row.
> >
> >
> > 0: jdbc:drill:zk=local> select t.batters.batter from
> dfs.`c:\tmp\sample.json` t;
> > ++
> > | EXPR$0 |
> > ++
> > |
> [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> Food"}] |
> > | [{"id":"1001","type":"Regular"}] |
> > | [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}] |
> > ++
> > 3 rows selected (0.198 seconds)
> >
> >
> > C)  select "batters.batter.id" returns something unexpected:
> >
> > 0: jdbc:drill:zk=local> select t.batters.batter.id from
> dfs.`c:\tmp\sample.json` t;
> > +-+
> > | EXPR$0  |
> > +-+
> > | 1001|
> > | 1002|
> > | 1003|
> > +-+
> >
> > The above result doesn't make sense.  The result looks like the 3 values
> from row 1. Should the result be the following instead?
> >
> > +-+
> > | EXPR$0  |
> > +-+
> > | [1001, 1002, 1003, 1004]|
> > | [1001]|
> > | [1001, 1002]|
> > +-+
> >
> > Any hints on what is happening here?  Thanks.
> >
> > -- Jiang
> >
>


Re: Apache Drill - Read Java Objects

2016-02-24 Thread Neeraja Rentachintala
Jorge
can you give an example of what you are looking to accomplish here.
Based on your description, it seems to me that you might be able to use the
functions listed here.
https://drill.apache.org/docs/supported-data-types/#data-types-for-convert_to-and-convert_from-functions
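If the string field can be reached as bytes, a minimal sketch of pulling it
out as text (the path and field name are hypothetical):

SELECT convert_from(t.payload, 'UTF8') AS line
FROM dfs.`/path/to/objects` t;

Splitting the tab-delimited line into columns would still need a follow-on
step; serialized Java objects themselves are not a format Drill reads
natively.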



On Wed, Feb 24, 2016 at 12:14 PM, jorge_...@yahoo.com.INVALID <
jorge_...@yahoo.com.invalid> wrote:

> Can you please reply to my question below? We need to know if it is
> possible. The company I work for is probably MapR's largest customer and I
> would appreciate your help.
>
> Thanks,
> Jorge
>
>
> Sent from my iPhone
>
> > On Feb 18, 2016, at 9:48 PM, jorge gonzalez  wrote:
> >
> > Hello,
> >
> > The company I currently work for stores its data in the form of Java
> > objects in several MapR clusters. These Java objects have a string field
> > with tab delimited data. They are looking to start using Apache Drill to
> > first load the java objects and then read the tab delimited data/string
> > field.
> >
> > Is this too difficult to accomplish? What are the necessary steps?
> >
> > Thanks in advance for your help.
> >
> > Regards,
> > Jorge
>


Re: [DISCUSS] New Feature: Drill Client Impersonation

2016-02-23 Thread Neeraja Rentachintala
Norris
Quick comment on your point below. The username/password currently passed
on the connection string is for authentication purposes and is also used for
impersonation in the case of a direct connection from a BI tool to a Drillbit.
That continues to exist, but now the driver needs to be extended to pass an
*'additional'* user name as part of the connection, and this represents the end
user identity on behalf of which Drill will execute queries (there is an
intermediate hop via the BI server which we are trying to support).
Sudheesh's doc has specifics on the proposal.
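To make the shape of that concrete, a hypothetical connection (the property
name here is only illustrative of the proposal, not a settled option):

$ sqlline -u "jdbc:drill:zk=zk1:2181;impersonation_target=enduser" -n proxyuser -p proxypass

Here proxyuser is the authenticated BI server identity, and enduser is the
additional identity on whose behalf queries would run.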

On Tue, Feb 23, 2016 at 11:52 AM, Norris Lee <norr...@simba.com> wrote:

> ODBC does not have any standard way to change the user for a connection,
> so like Sudheesh mentioned, I'm not sure how this would be exposed to the
> application. I believe some other databases like SQLServer let you change
> the user via SQL.
>
> With regards to interfacing the impersonation feature, it looks like all
> you need is the username, which is already being passed down from the
> application to the client via the driver.
>
> Norris
>
> -Original Message-
> From: Sudheesh Katkam [mailto:skat...@maprtech.com]
> Sent: Tuesday, February 23, 2016 8:49 AM
> To: user@drill.apache.org
> Cc: dev <d...@drill.apache.org>
> Subject: Re: [DISCUSS] New Feature: Drill Client Impersonation
>
> > Do you have an interface proposal? I didn't see that.
>
> Are you referring to the Drill client interface to used by applications?
>
> > Also, what do you think about my comment and Keys response about moving
> pooling to the Driver and then making "connection" lightweight.
>
> An API to change the user on a connection can be easily added later (for
> now, we use a connection property). Since Drill connections are already
> lightweight, this is not an immediate problem. Unlike OracleConnection <
> https://docs.oracle.com/cd/B28359_01/java.111/b31224/proxya.htm#BABEJEIA>,
> JDBC/ ODBC do not have a provision for proxy sessions in their
> specification, so I am not entirely clear how we would expose “change user
> on connection” to applications using these API.
>
> > Connection level identity setting is only viable if the scalability
> concerns I raised in the doc and Jacques indirectly raised are addressed.
> >
> > Historically DB connections have been so expensive that most
> applications created pools of connections and reused them across users.
> That model doesn't work if each connection is tied to a single user. That's
> why the typical implementation has provided for changing the identity on an
> existing connection.
> >
> > Now, if the Drill connection is a very lightweight object (possibly
> mapping to a single heavier weight hidden process level object), then tying
> identity to the connection is fine. I don't know enough about the Drill
> architecture to comment on that but I think a good rule of thumb would be
> "is it reasonable to keep 50+ Drill connections open where each has a
> different user identity?" If the answer is no, then the design needs to
> consider the scale. I'll also add that much further in the future if/when
> Drill takes on more operational types of access that 50 connections will
> rise to a much larger number.
>
>
> Thank you,
> Sudheesh
>
> > On Feb 22, 2016, at 2:27 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> >
> > Got it, makes sense.
> >
> > Do you have an interface proposal? I didn't see that.
> >
> > Also, what do you think about my comment and Keys response about
> > moving pooling to the Driver and then making "connection" lightweight.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Feb 22, 2016 at 9:59 AM, Sudheesh Katkam
> > <skat...@maprtech.com>
> > wrote:
> >
> >> “… when creating this connection, as part of the connection
> >> properties (JDBC, C++ Client), the application passes the end user’s
> identity (e.g.
> >> username) …”
> >>
> >> I had written the change user as a session option as part of the
> >> enhancement only, where you’ve pointed out a better way. I addressed
> >> your comments on the doc.
> >>
> >> Thank you,
> >> Sudheesh
> >>
> >>> On Feb 22, 2016, at 9:49 AM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
> >>>
> >>> Maybe I misunderstood the design document.
> >>>
> >>> I thought this was how the

Re: [DISCUSS] New Feature: Drill Client Impersonation

2016-02-22 Thread Neeraja Rentachintala
It seems to me that for phase 1, we should only have this as a connection-level
property and have the list of proxy users as a static bootstrap
option. Drill doesn't have a very granular privilege model other than
admins vs non-admins, so until then exposing this via system options seems
like a risk to me from a security standpoint.

-Neeraja

On Mon, Feb 22, 2016 at 9:59 AM, Sudheesh Katkam <skat...@maprtech.com>
wrote:

> “… when creating this connection, as part of the connection properties
> (JDBC, C++ Client), the application passes the end user’s identity (e.g.
> username) …”
>
> I had written the change user as a session option as part of the
> enhancement only, where you’ve pointed out a better way. I addressed your
> comments on the doc.
>
> Thank you,
> Sudheesh
>
> > On Feb 22, 2016, at 9:49 AM, Jacques Nadeau <jacq...@dremio.com> wrote:
> >
> > Maybe I misunderstood the design document.
> >
> > I thought this was how the user would be changed: "Provide a way to
> change
> > the user after the connection is made (details) through a session option"
> >
> > Did I miss something?
> >
> >
> >
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Feb 22, 2016 at 9:06 AM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> >
> >> Jacques,
> >> I think the current proposal by Sudheesh is an API level change to pass
> >> this additional end user id during connection establishment.
> >> Can you elaborate on what you mean by random query?
> >>
> >> -Neeraja
> >>
> >> On Sun, Feb 21, 2016 at 5:07 PM, Jacques Nadeau <jacq...@dremio.com>
> >> wrote:
> >>
> >>> Sudheesh, thanks for putting this together. Reviewing Oracle
> >> documentation,
> >>> they expose this at the API level rather than through a random query. I
> >>> think we should probably model after that rather than invent a new
> >>> mechanism. This also means we can avoid things like query parsing,
> >>> execution roundtrip, query profiles, etc to provide this functionality.
> >>>
> >>> See here:
> >>>
> >>>
> https://docs.oracle.com/cd/B28359_01/java.111/b31224/proxya.htm#BABEJEIA
> >>>
> >>> --
> >>> Jacques Nadeau
> >>> CTO and Co-Founder, Dremio
> >>>
> >>> On Fri, Feb 19, 2016 at 2:18 PM, Keys Botzum <kbot...@maprtech.com>
> >> wrote:
> >>>
> >>>> This is a great feature to add to Drill and I'm excited to see design
> >> on
> >>>> it starting.
> >>>>
> >>>> The ability for an intermediate server that is likely already
> >>>> authenticating end users, to send end user identity down to Drill adds
> >> a
> >>>> key element into an end to end secure design by enabling Drill and the
> >>> back
> >>>> end systems to see the real user and thus perform meaningful
> >>> authorization.
> >>>>
> >>>> Back when I was building many JEE applications I know the DBAs where
> >> very
> >>>> frustrated that the application servers blinded them to the identity
> of
> >>> the
> >>>> end user accessing important corporate data. When JEE application
> >> servers
> >>>> and databases finally added the ability to impersonate that addressed
> a
> >>> lot
> >>>> of security concerns. Of course this isn't a perfect solution and I'm
> >>> sure
> >>>> others will recognize that in some scenarios impersonation isn't the
> >> best
> >>>> approach, but having that as an option in Drill is very valuable.
> >>>>
> >>>> Keys
> >>>> ___
> >>>> Keys Botzum
> >>>> Senior Principal Technologist
> >>>> kbot...@maprtech.com <mailto:kbot...@maprtech.com>
> >>>> 443-718-0098
> >>>> MapR Technologies
> >>>> http://www.mapr.com <http://www.mapr.com/>
> >>>>> On Feb 19, 2016, at 4:49 PM, Sudheesh Katkam <skat...@maprtech.com>
> >>>> wrote:
> >>>>>
> >>>>> Hey y’all,
> >>>>>
> >>>>> I plan to work on DRILL-4281 <
> >>>> https://issues.apache.org/jira/browse/DRILL-4281>: support for
> >>>> inbound/client impersonation. Please review the design document <
> >>>>
> >>>
> >>
> https://docs.google.com/document/d/1g0KgugVdRbbIxxZrSCtO1PEHlvwczTLDb38k-npvwjA
> >>>> ,
> >>>> which is open for comments. There is also a link to proof-of-concept
> >>>> (slightly hacky).
> >>>>>
> >>>>> Thank you,
> >>>>> Sudheesh
> >>>>
> >>>>
> >>>
> >>
>
>


Re: REFRESH TABLE METADATA - Access Denied

2016-02-15 Thread Neeraja Rentachintala
John
What is the JIRA# where you are adding more info?

-thanks

On Mon, Feb 15, 2016 at 11:10 AM, John Omernik  wrote:

> Arg, this problem is crazy. (I'll put this in the JIRA too)  So after
> waiting a while, and loading more data. I tried to refresh table metadata
> on the table, using the dataadm user (basically the user who owns the
> data). Note all directories and files are owned by dataadm:dataadm and the
> permissions are 770.  This worked before, but this time, when I ran
>
> REFRESH TABLE METADATA mytable;
>
> I get
>
> "false| Error: 2126.29602.2546226
> /data/prod/mytable/2015-011-12/.drill.parquet_metadata (Permission
> denied)12:44
>
> This is the SAME shell where I ran it before, and I loaded more data (note
> the directory in question was already loaded, that was no touched).
>
> Then I use the find command to remove all the .drill.parquet_metadata
> files. and run the REFRESH TABLE METADATA command again:
>
> This time the command works. Great.
>
> If I run it again, right after: It runs successfully again.
>
> 12:35  Ran it a third time, and it worked.
> 12:37 Ran it a fourth time: and it worked. (Note all the parquet_metadata
> files are owned by my drillbituser: drillbitgroup (in this case, mapr:mapr)
> despite the meta operation being done by the data owner.
> 12:39 Another process *running as dataadm* loaded a new day of data
> (2016-02-12)  No other data was altered here.
> 12:40 Ran REFRESH TABLE METADATA a fifth time: Got the error. Maybe it has
> to do with adding data? Error on 2015-11-12 again
> 12:41 A new Process loaded more data.  (2016-02-11, and 2016-02-10 loaded)
> Process completes succesfully, disabled at this time. for troubleshooting
> (not more data being loaded)
> 12:42 Attempt REFRESH TABLE METADATA again, same error on 2015-11-12
> 12:43 Removed all .drill.parquet_metadata files using find command
> 12:44 Ran REFRESH TABLE METADATA - This time ran with success.  Will now
> run and check without data loading. May have to do with data loading...
> 12:52 Ran REFRESH: Success
> 12:58 Ran REFRESH: Success
> 1:00 Forced Reload of 2016-02-15.  Basically making it so the folder
> "2016-02-15" did not have a .drill.parquet_metadata file (while the other
> days did)
> 1:01 Ran REFRESH : Error: 2126.27460.2555888
> /data/prod/mytable/2015-11-12/.drill.parquet_metadata (Permission denied)
> (Same file, not sure why it picks on this file, nothing is changed there)
> (Even validated, no files modifed since 12:58 when the parquet_metadata
> file was modified, all parquet files still have the same modified times of
> when they were loaded, Feb 9th)
>
>
>
> So thoughts:
>
> 1. When running REFRESH TABLE METADATA, it checks to see if all the files
> in the subdirectories exist, if they don't it starts to "do things"
> 2. The date 2015-11-12 probably keeps coming up because it's first in
> .drill.parquet_metadata located in /mytable (not in the individual
> directories)
> 3. After the REFRESH failed, I checked some files.
> 2015-11-12/.drill.parquet_metadata was a 0 size files. (Like it was
> attempted to be rewritten and failed) Looking in 2016-11-13, the
> .drill.parquet_metadata file has data in it.
> 4. To test #3, I rm .drill.parquet_metadata from 2015-11-12, and run the
> refresh command again. Interesting... when I do that, I get permissioned
> denied on the 2015-11-12 directory again, this time, intead of the file
> owned by the driillbit user (and having the drillbit user group, in this
> case mapr)  I have a file of 0 bytes, with "dataadm:datareaders"  as the
> owner. That's interesting... shouldn't it be mapr:mapr (the drillbit user?)
>
> So this seems to be the crux of the issue... what should happen here? Should
> all metadata operations be checked to see if the user issuing them has
> permissions, with writes then happening as the drillbit user?  Any other
> thoughts here?
>
>
>
>
>
>
>
>
>
> On Mon, Feb 15, 2016 at 10:20 AM, John Omernik  wrote:
>
> > So I am not sure what's happened here. The JIRA isn't filled out, but I
> > can't seem to reproduce the problem. Was this stealth fixed? Based on
> some
> > testing, even when the data directory is owned by a different user than
> the
> > drillbit, the .parquet_metadata files are created as mapr:mapr with 755
> > permissions.  And when it refreshes now, there are no errors.  So Maybe
> all
> > fixed?
> >
> > Thanks
> >
> > On Sun, Feb 14, 2016 at 2:20 PM, John Omernik  wrote:
> >
> >> I'd like to revive this thread. Specifically, what should the expect
> >> behavior of the refresh metadata be when running with impersonation?
> >>
> >> Drill Bit User: mapr
> >> Data User (owner): jdoe
> >> Authenticated User: jdoe
> >>
> >> So if a base folder, mytable, has subdirectories of dates, 2015-01-01,
> >> 2015-01-02 etc. And all the data is owned by jdoe:datareaders, and the
> >> permissions are 750 on all directories and files, how SHOULD the REFRESH
> >> METADATA command be 

Re: Query Planning and Directory Pruning

2016-02-09 Thread Neeraja Rentachintala
Yes, DRILL-3759 covers it.
This is a high priority enhancement that we are trying to get to in the
next couple of releases.

-Neeraja

On Tue, Feb 9, 2016 at 7:32 AM, John Omernik  wrote:

> This one seems to cover it:
>
> https://issues.apache.org/jira/browse/DRILL-3759
>
>
>
> On Tue, Feb 9, 2016 at 9:25 AM, Abdel Hakim Deneche  >
> wrote:
>
> > Hi John,
> >
> > Sorry I didn't get back to you (I thought I did).
> >
> > No, I don't need the plan, I just wanted to confirm what was taking most
> of
> > the time and you already confirmed it's the planning.
> >
> > Can you open a JIRA for this ? this may be a known issue, but I'm not
> sure.
> >
> > Thanks
> >
> > On Tue, Feb 9, 2016 at 6:08 AM, John Omernik  wrote:
> >
> > > Abdel, do you still need the plans? As I said, if your table has any decent
> > > amount of directories and files, it looks like the planning is touching all
> > > the directories even though you are pruning.  I can post plans; however, I
> > > think in this case you'll find they are exactly the same, and the only
> > > difference is that the longer query is planning much more because it has
> > > more files to read.
> > >
> > >
> > > On Thu, Feb 4, 2016 at 10:46 AM, John Omernik 
> wrote:
> > >
> > > > I can package up both plans for you if you need them (let me know if you
> > > > still want them), but I can tell you the plans were EXACTLY the same;
> > > > however, the data_sum table took 0.932 seconds to plan the query, and the
> > > > data table (the one with all the extra data) took 11.379 seconds to
> > > > plan the query. That indicates to me the issue isn't in the plan that was
> > > > created, but the actual planning process. (Let me know if you disagree or
> > > > still need to see the plan; like I said, the actual plans were exactly the
> > > > same.)
> > > >
> > > >
> > > > John.
> > > >
> > > >
> > > > On Thu, Feb 4, 2016 at 10:31 AM, Abdel Hakim Deneche <
> > > > adene...@maprtech.com> wrote:
> > > >
> > > >> Hey John, can you try an explain plan for both queries and see how
> > much
> > > >> times it takes ?
> > > >>
> > > >> for example, for the first query you would run:
> > > >>
> > > >> *explain plan for* select count(1) from `data/2016-02-03`;
> > > >>
> > > >> It can also be helpful if you could share the query profiles for
> both
> > > >> queries.
> > > >>
> > > >> Thanks
> > > >>
> > > >> On Thu, Feb 4, 2016 at 8:15 AM, John Omernik 
> > wrote:
> > > >>
> > > >> > Hey all, I think am I seeing an issue related to
> > > >> > https://issues.apache.org/jira/browse/DRILL-3759 but I want to
> > > >> describe it
> > > >> > out here, see if it's really the case, and then determine what the
> > > >> blockers
> > > >> > may be to resolution.
> > > >> >
> > > >> > I am using the MapR Developer Release 1.4, and I have a directory with
> > > >> > subdirectories by date.
> > > >> >
> > > >> > data/2015-01-01
> > > >> > data/2015-01-02
> > > >> > data/2015-01-03
> > > >> >
> > > >> > These are stored as Parquet files.  At this point each date averages
> > > >> > about 1 GB of data, and has roughly 75 parquet files in it.
> > > >> >
> > > >> > When I run
> > > >> >
> > > >> > select count(1) from `data/2016-02-03` it takes roughly 11
> seconds.
> > > >> >
> > > >> > If I copy the 2016-02-03 directory to a new base (date-sum) and
> run
> > > >> >
> > > >> > select count(1) from `data_sum/2016-02-03` it runs in 0.874
> seconds.
> > > >> >
> > > >> > Same data, same structure; the only difference is the data_sum
> > > >> > directory only has a few directories, and data has dates going back
> > > >> > to Nov 2015. It seems like it is getting file names for all files in
> > > >> > each directory prior to pruning, which seems to me to be adding a lot
> > > >> > of latency to queries that doesn't need to be there (thus I think I
> > > >> > am seeing 3759). But I wanted to confirm, and then I wanted to see
> > > >> > how we can address this, in that the directory prune should be fast,
> > > >> > and on large data sets it's just going to get worse and worse.
> > > >> >
> > > >> >
> > > >> >
> > > >> > John
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Abdelhakim Deneche
> > > >>
> > > >> Software Engineer
> > > >>
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >
>


Re: Analyse web server logs in Drill

2016-02-06 Thread Neeraja Rentachintala
Can you share a sample of the file?
If it is JSON, you can query it directly.

On Sat, Feb 6, 2016 at 4:25 PM, David Asfaha  wrote:

> Hi,
>
> I would like to analyse web server logs using Drill. Is there any way to
> analyse them directly as if they were json files?
>
> While researching  an answer to this question, I looked for a way to search
> the mailing list archives but couldn't find it.
>
> Thanks,
> David
>


Re: Bug or Feature?

2016-02-04 Thread Neeraja Rentachintala
John
What happens if you do the select query with no filter?

The scenario you explained does seem like unexpected behavior.

-Neeraja

On Thu, Feb 4, 2016 at 8:21 AM, John Omernik  wrote:

> Prior to posting a JIRA, I thought I'd toss this here:
>
> If I have a directory: data with subdirectories with parquet files in it
>
>
> data/2016-01-01
> data/2016-01-02
>
> (Seem familiar? This came up in my other testing)
>
>
> If I have MORE then one subdirectory,
>
> then
>
> select count(1) from `data/` where dir0='2016-01-01'
>
>  Works fine.
>
> However, if I have EXACTLY one subdirectory, then
>
> select count(1) from `data/` where dir0 = '2016-01-01'
>
> Takes 15 seconds (instead of returning almost instantly) and reports 0
> records for count.
> Note, this directory DOES exist, so that is not the issue.
>
> If I add a second directory, then the exact query returns almost instantly,
> and reports the correct number of records.
>
> In addition, when there is only one directory, select count(1) from `data/`
> returns instant and the correct count.
>
> To me, it appears that if there is ONE and only ONE subdirectory, then dir0=
>  doesn't work as I think people would expect it to. I can't think of a real
> reason to have it behave this way, and to me it violates the principle of "least
> surprise", but I am not up on the internals of Drill, so I thought I'd post
> here first.
>
> John
>


Re: directory create CTAS behaviour

2016-01-14 Thread Neeraja Rentachintala
What would you like to see instead of a directory?

On Thursday, January 14, 2016, Leon Clayton  wrote:

> Hello All
>
> Is it possible to change this behaviour? By default, a directory is
> created, using the exact table name specified in the CTAS statement. I
> don’t want a directory.
>
> http://drill.apache.org/docs/create-table-as-ctas-command/
>
> Regards
>
> Leon Clayton
>
>
>


Re: query plan caching?

2016-01-14 Thread Neeraja Rentachintala
Query plan caching is typically very useful if you have reporting-type
queries, where the query patterns are fixed and vary mostly on filter
conditions. For ad hoc queries or data exploration use cases, where there may
not be fixed query patterns, this technique is not as useful.

Similar to what Andries mentioned, what is the issue you are seeing. The
team might have ideas to help specifically.

-Neeraja

On Thu, Jan 14, 2016 at 7:34 AM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> Most of the effort has been going into metadata caching and improvements
> in this area, specifically DFS parquet and Hive, and also improved
> directory pruning at planning time. This helps to reduce the planning time
> significantly. These seem to have been the big time consumers for query
> plans.
>
> I have not seen any specific effort to cache query plans. However, since Drill
> is used for analytical purposes, the query volumes are substantially lower than
> in a transactional RDBMS, where query plan caching can be critical.
>
> Are you potentially seeing an issue with query volume or query planning
> time?
>
> --Andries
>
> > On Jan 14, 2016, at 7:21 AM, Jason Altekruse 
> wrote:
> >
> > Currently not to my knowledge. Are there queries you are seeing that are
> > taking an abnormally long time to plan?
> >
> > On Thu, Jan 14, 2016 at 6:25 AM, Vince Gonzalez <
> vince.gonza...@gmail.com>
> > wrote:
> >
> >> Does Drill do any caching of query plans?
> >>
>
>


Re: Your Webinar is Starting - It's Time to Log In!

2015-11-13 Thread Neeraja Rentachintala
Yes, we will share the slides and the recording.
-thanks

On Fri, Nov 13, 2015 at 9:31 AM, Sun, Zhan  wrote:

> Hi,
>
>
>
> Thank you for providing this great seminar on Drill today. Could you
> please help share the slides?
>
>
>
> Thanks,
>
> Jonathan
>


Re: REFRESH TABLE METADATA - Access Denied

2015-11-11 Thread Neeraja Rentachintala
> >>> Now, one thing I did notice is my mapr user was not in the mapradm group,
> >>> therefore, it didn't have write permissions anywhere... when I fixed that on
> >>> all nodes, and then I manually deleted the metadata files, things seem to be
> >>> working. I wonder if that was my issue?
> >>>
> >>> Basically, the user running the drillbits needs to be able to write files
> >>> (the .drill.parquet_metadata) or something bad will happen :) I will do
> >>> more testing. This may be a good candidate for some documentation work to
> >>> understand what permissions are required to be able to query these.
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <
> vince.gonza...@gmail.com
> >>>> wrote:
> >>>
> >>>> Hi John, I tried this and didn't find any issues. Let me know if I
> didn't
> >>>> follow your reproduction faithfully.
> >>>>
> >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> >>>> apache drill 1.2.0
> >>>> "drill baby drill"
> >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> >>>> +---+--+
> >>>> |  ok   |   summary|
> >>>> +---+--+
> >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> >>>> +---+--+
> >>>> 1 row selected (32.27 seconds)
> >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> >>>> +---+---+
> >>>> | srcIP | dstIP |
> >>>> +---+---+
> >>>> | 172.16.2.152  | 172.16.1.58   |
> >>>> | 172.16.1.58   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> +---+---+
> >>>> 12 rows selected (5.654 seconds)
> >>>>
> >>>> And here's what my table structure looks like (as seen via MapR NFS):
> >>>>
> >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> >>>> /mapr/vgonzalez.drill/tmp/flows/
> >>>> └── 2015
> >>>>└── 11
> >>>>├── 10
> >>>>│   ├── 21
> >>>>│   │   ├── 39
> >>>>│   │   │   ├── 03
> >>>>│   │   │   │   ├── _common_metadata
> >>>>│   │   │   │   ├── _metadata
> >>>>│   │   │   │   ├──
> >>>> part-r-0-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> >>>>│   │   │   │   └── _SUCCESS
> >>>>│   │   │   └── 20
> >>>>│   │   │   ├── _common_metadata
> >>>>│   │   │   ├── _metadata
> >>>>│   │   │   ├──
> >>>> part-r-0-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> >>>>
> >>>> My parquet was created in Spark, not Drill. Not sure if that's
> relevant.
> >>>>
> >>>> I have authentication and impersonation turned on, and the files are
> >>>> owned
> >>>> by mapr:mapr. Here's my drill-override.conf:
> >>>>
> >>>> drill.exec: {
> >>>>  cluster-id: "vgonzalez_drill-drillbits",
> >>>> zk.connect:
> >>>>
> >>>>
> "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> >>>> }
> >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> >>>> drill.exec { security.user.auth { enabled: true, packages +=
> >>>> "org.apache.drill.exec.rpc.user.security", impl: "pam"

Re: REFRESH TABLE METADATA - Access Denied

2015-11-06 Thread Neeraja Rentachintala
This doesn't make sense and seems like a bug.
I think the right behavior is for the Drillbit to access the cache as the
Drillbit user at query time (there is no user-level metadata cache in
Drill at this point).



On Fri, Nov 6, 2015 at 6:57 AM, John Omernik  wrote:

> I ran REFRESH TABLE METADATA on a table, it completed successfully.
>
> When I tried a subsequent query, I get a IOException: Permission Denied on
> .drill.parquet_metadata.
>
> I am running drill with authentication.  I ran the REFRESH TABLE METADATA
> as user X, it appears the .drill.parquet_metadata was created and owned by
> the user the drill bits are running as as is created with -rwxr-x-r-x
>
> My question is this: So, I can see why the file is owned by the drill bit
> user, and the file is created with all can read permissions, but why am I
> getting a permission denied when user X is trying to run a query?
>
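For anyone trying to reproduce this, the minimal sequence is sketched below;
the path is illustrative, and Drill authentication/impersonation are assumed
to be enabled with user X distinct from the drillbit user:

    -- run as user X; the resulting .drill.parquet_metadata cache file
    -- ends up owned by the drillbit user
    REFRESH TABLE METADATA dfs.`/data/mytable`;

    -- this subsequent query, also run as user X, fails with
    -- IOException: Permission Denied on .drill.parquet_metadata
    SELECT COUNT(*) FROM dfs.`/data/mytable`;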


Re: Drill ETL: Views are Less than impressive

2015-11-04 Thread Neeraja Rentachintala
John, you mentioned the following.
If you are trying to incrementally load one day's worth of data into an
existing Parquet table, wouldn't that be a new partition/directory for every
day rather than overwriting an existing partition?

INSERT OVERWRITE TABLE mytable (partition=2015-01-01)
select field1, field2,field3, field4, '11:40' as loadtime
 from mytable where partition = '2015-01-01'
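In Drill terms, that per-day pattern is a CTAS into a new dated subdirectory
rather than an overwrite; a minimal sketch, reusing the table and field names
from John's message below (the date is illustrative):

    ALTER SESSION SET `store.json.all_text_mode` = true;
    -- each day lands in its own subdirectory of the Parquet table
    CREATE TABLE `parquetfinal/2015-11-04` AS
    SELECT field1, field2, field3, field4
    FROM `streaminglocation`
    WHERE dir0 = '2015-11-04';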

On Wed, Nov 4, 2015 at 8:00 AM, John Omernik  wrote:

> I am trying to manage ETL of some data.  Many of my questions over the last
> few days revolve around this data set, so here I am trying to put a
> manifesto post together. (Manifesto may be too strong of a word, but I have
> a feeling it will be a large post :)
>
> So here some points:
>
> 1. I am pulling data from an Amazon SQS queue using python and writing
> files to a MapRFS file location. This is working well, including writing
> files to filenames prefaced by a . until fully loaded so Drill ignores them.
> (thanks Andries E for the tip there!)
>
> 2. I CAN directly query the JSON if I issue ALTER SESSION SET
> `store.json.all_text_mode` = true.  This is because there are some array
> fields that use JSON nulls,
> and I've accepted that I have to read them as strings. (Note my questions
> on list about setting this in a view rather than for the system (all
> queries, all users) or for the session (all queries for that user)
>
> 3.  My initial idea was to take data for a day, and then load it into a
> parquet table with directory based partitions.  i.e. I'd take the json from
> a complete day of data
>
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE table `parquetfinal/2015-11-03` as
> select field1, field2, field3, field4 from `streaminglocation` where dir0 =
> '2015-11-03'
>
> This works great, I have all my data for the previous day going back in
> time in a efficient storage format that does not require my users to set
> the store.json.all_text_mode setting in order to query it.  Mission
> accomplished on day -1 and older data.
>
> 4. I then needed a way to "join" that data (yesterday and going backwards)
> with the current data.  My initial thought was to use a view to just union
> the parquet table to the json table.  The  json table has the same date
> format in directories (i.e. dir0 = '2015-11-04')   However, we ran into a
> snag because I really don't want to have my users run the alter session
> command.
>
> I know this is a small point, but if a user is running a query and gets an
> error, that's a situation where my user either has to know about ALTER
> SESSION, or they call the helpdesk, or they just give up and say "well this
> isn't ready" and don't use the data.  Not all my users will live in the
> Drill docs and do so happily.  Yet, I don't want to set that at a system
> level or even by default at a session level due to unforeseen consequences
> of having that setting set and querying a different data set and getting
> inconsistent results.   So just joining the two as is wouldn't work.
>
> 5. My next ideas was to have a parquet backed incremental load table.
> Basically it takes around 20 seconds to load the json to parquet. I'd
> create a temp location outside the final parquet location. Load a whole
> days worth of data every 10 minutes into a new uniquely named directory and
> update a view.  Then I would create another view that would take the
> "final" location backed parquet table and UNION ALL it with the
> "incremental_load" parquet table (that has only one day).  This worked, but
> I am getting no optimizations.  A select count(*) on either of the two parts
> of the UNION ALL returns in 0.5 seconds, while a select count(*) on
> the unionized view returns in 20 seconds. That is rough.
>
> I thought about having the temp data sit directly in the parquet backed
> final table, but I don't see a way to create my Parquet files and then
> replace the partition that exists in the table with the new data. (This is
> something I used to do in hive  with
> INSERT OVERWRITE TABLE mytable (partition=2015-01-01)
> select field1, field2,field3, field4, '11:40' as loadtime
>  from mytable where partition = '2015-01-01'
>
> (Basically updating the partition with a new loadtime on every record in
> this example)
>
> So I am now in a pickle: I want to be able to provide the streaming data to
> my users in as timely of a manner as possible.  I have the issue with ALTER
> SESSION statement and the law of unintended consequences from changing a
> setting that will alter more than the table I am querying, and my UNIONs are
> not being optimized by Drill, causing slow queries.  I am interested in
> feedback on how to address this ETL concern in the best way with Drill.
>
> And on the plus side, people can use my long email as a sleep aid... I'll
> call that my contribution to the devs who had too much caffeine :)
>
> John
>
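For reference, the union view described in step 5 would look roughly like the
sketch below (workspace and view names are hypothetical); the final SELECT is
the query that takes 20 seconds instead of the expected ~1 second:

    CREATE OR REPLACE VIEW dfs.tmp.`alldata_vw` AS
    SELECT field1, field2, field3, field4 FROM dfs.tmp.`parquetfinal`
    UNION ALL
    SELECT field1, field2, field3, field4 FROM dfs.tmp.`incremental_load`;

    -- each branch alone counts in ~0.5 seconds; through the view, ~20 seconds
    SELECT COUNT(*) FROM dfs.tmp.`alldata_vw`;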


Re: Security with Storage Plugins

2015-10-26 Thread Neeraja Rentachintala
Hi John

Can you please elaborate on what you mean by "authentication to the source
system is going to be wide open" in Drill?

Drill is a query layer over data sources; through user impersonation it
respects the underlying storage permissions, without having to manage
centralized security permissions at the Drill layer. Users can use Drill
views if they need more granular access to the data.

I would be interested in learning more about your use case to secure the
storage plugin/connections.

thanks

On Mon, Oct 26, 2015 at 6:33 AM, John Omernik  wrote:

> Hey all -
>
> On file system based storage plugins, security is straightforward with
> filesystem permissions etc.  How do we secure storage plugins?  It would
> seem we would want a situation where people could not access certain
> storage plugins especially since authentication to the source system is
> going to be "wide open" (i.e. there is no pass through auth to a backend
> Mongo server, JDBC system, or Hbase setup)  thus how do we wrap security
> around these?
>
> Even basic security... i.e. only user X can use them, so we can use them to
> load parquet tables or something where we can apply security.
>
> Thoughts?
>
> John
>


Re: CTAS over empty file throws NPE

2015-10-22 Thread Neeraja Rentachintala
Hsuan
Is there a JIRA for this?

On Thu, Oct 22, 2015 at 10:11 AM, Hsuan Yi Chu  wrote:

> Hi,
> This is a known issue. It is because there is no schema for an empty .tsv file.
>
> I think Daniel might be trying to address this issue.
>
> Thanks for bringing this up.
>
> On Wed, Oct 21, 2015 at 8:10 PM, chandan prakash <
> chandanbaran...@gmail.com>
> wrote:
>
> > Hi,
> > I have to run CTAS on a tsv file which might in some cases be empty.
> > In those cases it gives an NPE.
> >
> > java.sql.SQLException: SYSTEM ERROR: NullPointerException
> >
> > Fragment 0:0
> >
> > [Error Id: 4aa5a127-b2dd-41a0-ac49-fc2058e9564f on 192.168.0.104:31010]
> >
> > at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(
> > DrillCursor.java:214)
> >
> > at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(
> > DrillCursor.java:257)
> >
> > I saw a similar bug report: https://issues.apache.org/jira/browse/DRILL-3539
> >
> > Can anyone help with a fix or some workaround?
> > Any lead will be appreciated.
> >
> > Thanks,
> > Chandan
> >
> >
> > --
> > Chandan Prakash
> >
>
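A minimal reproduction, assuming an empty file named empty.tsv in the dfs.tmp
workspace (names are illustrative):

    -- empty.tsv contains no data, so no schema can be inferred and the
    -- CTAS fails with the NullPointerException shown above
    CREATE TABLE dfs.tmp.`out_parquet` AS
    SELECT * FROM dfs.tmp.`empty.tsv`;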


Re: MapR Drill 1.2 Package

2015-10-21 Thread Neeraja Rentachintala
, MapR released a comprehensive SQL test framework to the open
> source community.  With over 10,000 tests developed over the course of
> several months, this framework is available for developers in the community
> to continue to maintain the enterprise quality of the Apache Drill project
> and accelerate community-driven innovation.
>
> “Releasing the test frameworks demonstrates our continued commitment in
> building a strong community to drive the innovation and quality of the
> Apache Drill OSS project,”
> said Neeraja Rentachintala, director, product management, MapR
> Technologies. “Drill users are getting value from their relational
> structured data in Hadoop as well as enabling a broader set of users in an
> organization to leverage new types of semi-structured data sources such as
> JSON. As the only schema-free SQL engine for big data, Drill brings
> unprecedented flexibility and performance, rapid time to insights, granular
> security, scale in all dimensions and integration with existing tools.”
>
>
> DRILL RESOURCES
> •   Take advantage of free MapR On-Demand Hadoop training to get
> started on Drill
> •   To learn more about the Drill 1.2 product and its key features,
> visit here
> •   To experience Drill in action by downloading the software or to
> find more information, visit here.
>
>
> Tweet this:  Apache Drill 1.2 and Data Exploration Quick Start Solution
> now available from MapR. http://bit.ly/1GPfPg3
>
>
>
> About MapR Technologies
> MapR provides the industry's only big data platform that combines the
> processing power of the top-ranked Hadoop with web-scale enterprise storage
> and real-time database capabilities, enabling customers to harness the
> enormous power of their data. Organizations with the most demanding
> production needs, including sub-second response for fraud prevention,
> secure and highly available data-driven insights for better healthcare,
> petabyte analysis for threat detection, and integrated operational and
> analytic processing for improved customer experiences, run on MapR. A
> majority of customers achieve payback in fewer than 12 months and realize
> greater than 5X ROI. MapR ensures customer success through world-class
> professional services and with free on-demand training that 40,000
> developers, data analysts and administrators have used to close the big
> data skills gap. Amazon, Cisco, Google, HP, SAP, and Teradata are part of
> the worldwide MapR partner ecosystem. Investors include Google Capital,
> Lightspeed Venture Partners, Mayfield Fund, NEA, Qualcomm Ventures and
> Redpoint Ventures. Connect with MapR on Twitter, LinkedIn, and Facebook
>
> # # #
>
>
> Media Contacts:
> Beth Winkowski
> MapR Technologies, Inc.
> (978) 649-7189
> bwinkow...@maprtech.com
>
> Kim Pegnato
> MapR Technologies, Inc.
> (781) 620-0016
> kpegn...@maprtech.com
>
> -Original Message-
> From: Christopher Matta [mailto:cma...@mapr.com]
> Sent: Wednesday, October 21, 2015 10:45 AM
> To: user@drill.apache.org
> Subject: Re: MapR Drill 1.2 Package
>
> Good question, I'd like to know as well.
>
> I know that there were some JDBC fixes added to the official Drill 1.2
> release, are those going to get back ported into the MapR release?
>
> Chris Matta
> cma...@mapr.com
> 215-701-3146
>
> On Tue, Oct 20, 2015 at 2:36 PM, John Omernik <j...@omernik.com> wrote:
>
> > Hey, quick question for the @maprtech folks. I am using the MapR
> > package for Drill dated Oct 7 (Drill 1.2.0.20151007103 in MapR terms).
> > With the release announcement on Oct 17, has anything changed in the
> > official 1.2 release compared to the MapR release 10 days earlier?
> >
> > Thanks!
> >
> > John
> >
>


Re: [VOTE] Release Apache Drill 1.2.0 RC2

2015-10-09 Thread Neeraja Rentachintala
+1 (non-binding)

Tested embedded mode using yelp JSON tutorials from
https://drill.apache.org/docs/tutorials-introduction/

thanks
Neeraja
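One of the sanity queries from that tutorial looks roughly like this (the
dataset path is illustrative; business.json provides the state, city, and
review_count fields):

    SELECT state, city, COUNT(*) AS businesses
    FROM dfs.`/tmp/yelp/business.json`
    GROUP BY state, city
    ORDER BY businesses DESC
    LIMIT 10;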

On Thu, Oct 8, 2015 at 2:08 PM, Abdel Hakim Deneche 
wrote:

> Hi all,
>
> I'm enjoying the release management so much that I decided to propose a
> third RC of Apache Drill 1.2.0
>
> The tarball artifacts are hosted at [1] and the maven artifacts are hosted
> at [2].
>
> The vote will be open for the next 72 hours ending at 2PM Pacific, October
> 11, 2015.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> thanks,
> Hakim
>
> [1] http://people.apache.org/~adeneche/apache-drill-1.2.0-rc2/
> [2] https://repository.apache.org/content/repositories/orgapachedrill-1008
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>


Re: JDBC driver for MySQL - storage plugin config and push down ?

2015-10-07 Thread Neeraja Rentachintala
Andrew
thanks for the update. Also, is it possible for you to share a brief set of
instructions on how to configure/use this plugin with a database, let's say
mysql (or postgres or ..)?
There have been a few questions on the threads around this, and they could
benefit from a quick summary.


-Neeraja
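Until that summary exists, a plugin definition along the lines Ulf was
attempting further down this thread might look like the sketch below,
registered through the Storage tab of the Drill web UI; host, port, database,
and credentials are placeholders, and the MySQL Connector/J jar must already
be on the Drill classpath. Note that the curly quotes around “mapr" in Ulf's
original snippet would by themselves make the JSON invalid:

    {
      "type": "jdbc",
      "driver": "com.mysql.jdbc.Driver",
      "url": "jdbc:mysql://localhost:3306/test",
      "username": "mapr",
      "password": "mapr",
      "enabled": true
    }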

On Wed, Oct 7, 2015 at 10:42 AM, andrew  wrote:

> Hi Ulf,
>
> There are 2 issues with the latest build that make the JDBC storage plugin
> unusable. I will be publishing a fix for both later today.
>
> The first issue is that the assembly phase wasn’t including the
> drill-module.conf for the JDBC plugin, so you could never actually register
> a JDBC source.
>
> The second issue is that the bootstrap-storage-plugins.json provided is
> wrong.
>
> I’ll send a note to the list once I’ve published a PR.
>
> - Andrew
>
> > On Oct 6, 2015, at 9:09 AM, Ulf Andreasson @ MapR <
> uandreas...@maprtech.com> wrote:
> >
> > Jacques et al,
> >
> > The classpath includes /usr/share/java/mysql-connector-java.jar, but no difference.
> >
> > We are now running on a single node and have been monitoring /opt/drill/log,
> > but the sqlline.log doesn't give away anything when creating a new plugin.
> > The web-ui only says "error (invalid JSON mapping)". I have also tried using
> > only {"type": "jdbc"}, but got the same error, which surprises me ...
> >
> > I checked
> > /drill/contrib/storage-jdbc/target/classes/bootstrap-storage-plugins.json
> > for guidance (below), and the only difference is username/password
> >
> >"jdbc" : {
> >  type:"jdbc",
> >  enabled: false,
> >  driver:"org.apache.derby.jdbc.ClientDriver",
> >  url:"jdbc:derby://localhost:2/memory:testDB;"
> >}
> >
> > I am curious about the exact syntax for connecting to mysql, and does a
> > failed connection attempt result in a failure when creating/updating the
> > plugin?
> >
> > reg//ulf
> >
> >
> > reg//ulf
> >
> > 
> > Ulf Andreasson | Ericsson Global Alliance Solution Engineer, MapR.com |
> +46
> > 72 700 2295
> >
> >
> > On Mon, Oct 5, 2015 at 5:10 PM, Jacques Nadeau 
> wrote:
> >
> >> Have you added the MySQL jdbc driver to the Drill classpath?
> >>
> >> For better debugging of the issue: drop down to one node and then
> provide
> >> the server side log when you're trying to save/update the jdbc plugin.
> >>
> >> --
> >> Jacques Nadeau
> >> CTO and Co-Founder, Dremio
> >>
> >> On Mon, Oct 5, 2015 at 5:14 AM, Ulf Andreasson @ MapR <
> >> uandreas...@maprtech.com> wrote:
> >>
> >>> Got some questions wrt the JDBC mySQL storage plugin
> >>>
> >>> 1) Given that we are getting close to 1.2 and a full release of the plugin,
> >>> what is the status of the push-down function? Which clauses are
> >>> supported by push-down?
> >>>
> >>> 2) Creation of my MySQLDB storage plugin gives me the error "(invalid JSON
> >>> mapping)".
> >>> When trying to connect I get the error whatever I define, even if I
> >>> insert {type:jdbc} or a longer version like below. From reading the code I
> >>> seem to have the right properties. I understand the error will also be
> >>> given if I can't connect to the MySQL db at the creation of the storage
> >>> plugin, correct? Any hints on what I should use?
> >>>
> >>> {
> >>>  "type":"jdbc",
> >>>  "driver":"com.mysql.jdbc.Driver",
> >>>  "url":"jdbc:mysql://node03:3306/test",
> >>>  "username":“mapr",
> >>>  "password":“mapr",
> >>>  "enabled”:true
> >>> }
> >>>
> >>> 3) In another system we could use the storage plugin although it showed
> >>> "null" in the drill web-ui ... the configuration could be seen in
> >>> zookeeper.
> >>>
> >>> 4) When canceling a query while it's running against a big table (~8M
> >>> records), Drill crashes and we need to restart it.
> >>>
> >>> 5) When running "SELECT COUNT(*) from mysql.test;" it is very slow / no
> >>> response for a very long time (more than 45 seconds).
> >>>
> >>>
> >>> reg//ulf
> >>>
> >>
>
>


Drill custom aggregate functions

2015-09-23 Thread Neeraja Rentachintala
https://drill.apache.org/docs/developing-an-aggregate-function/
Note that the custom aggregate functions are marked as alpha and for
experimental usage only.
What features or aspects are missing to make this a 'ready to deploy in
production' capability?
I would appreciate a response.

thanks
-Neeraja


Drill WITH clause syntax

2015-09-23 Thread Neeraja Rentachintala
Team
Is this valid Drill syntax (i.e. two tables in the WITH clause)?

WITH X1
AS
(SELECT city,
AVG(review_count) AS city_reviews_avg
FROM `business.json`
GROUP BY city),
X2
AS
(SELECT X1.city, X1.city_reviews_avg,
MAX(X1.city_reviews_avg)
OVER () AS city_reviews_avg_max
FROM X1)
SELECT X2.city, X2.city_reviews_avg
FROM X2
WHERE X2.city_reviews_avg_max = X2.city_reviews_avg;

I am hitting this error: Error: SYSTEM ERROR: NoSuchFieldError: constants

A single table specification in WITH clause works fine.

-Neeraja
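One way to keep the same logic while sidestepping the chained WITH clause is
to inline X1 as a subquery; a sketch (whether it avoids this particular error
is untested):

    SELECT X2.city, X2.city_reviews_avg
    FROM (
      SELECT X1.city, X1.city_reviews_avg,
             MAX(X1.city_reviews_avg) OVER () AS city_reviews_avg_max
      FROM (
        SELECT city, AVG(review_count) AS city_reviews_avg
        FROM `business.json`
        GROUP BY city
      ) AS X1
    ) AS X2
    WHERE X2.city_reviews_avg_max = X2.city_reviews_avg;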


Re: Parquet Partitions

2015-08-05 Thread Neeraja Rentachintala
John,
Both would work, i.e. query the partitioned directories directly using the
file system storage plugin, or go via the Hive table.

On Wed, Aug 5, 2015 at 8:58 AM, John Omernik j...@omernik.com wrote:

 After reading about Parquet Partition Pruning in Drill 1.1, I was wondering
 if there is still partitioning based on Hive-like partitions, i.e. I have
 a process that is making a hive table with Parquet files.  It's using
 Partitions (Directories).  Do I need Drill to read that data using the Hive
 Plugin so it's aware of the partitions and can prune, or can I just use the
 DFS plugin, point it at the root of the table in Hive, and let it go,
 inferring Schema and partitions based on the directories that exist?

 John
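Concretely, the two options look like this (table, path, and partition names
are hypothetical); with the dfs plugin, Hive-style partition directories
surface as the synthetic dir0, dir1, ... columns:

    -- option 1: via the Hive storage plugin, filtering on the partition column
    SELECT COUNT(*) FROM hive.`web_logs` WHERE log_date = '2015-08-01';

    -- option 2: via the dfs plugin, pointing at the table root
    SELECT COUNT(*) FROM dfs.`/user/hive/warehouse/web_logs`
    WHERE dir0 = 'log_date=2015-08-01';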



Re: Drill 1.1 on Hive 1.2? (No results on hive tables!)

2015-07-29 Thread Neeraja Rentachintala
Yes, Drill 1.1 supports hive 1.0.
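For reference, a typical Hive storage plugin configuration is sketched below
(the metastore URI is a placeholder). Drill 1.1 ships with a Hive 1.0 client,
which is one plausible reason a newer metastore can list tables and schemas
yet return no rows:

    {
      "type": "hive",
      "enabled": true,
      "configProps": {
        "hive.metastore.uris": "thrift://metastore-host:9083"
      }
    }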

On Wednesday, July 29, 2015, Adrià Vilà av...@datknosys.com wrote:

 Hi everyone,

  I have installed and configured the hive storage plugin correctly, and it's
 possible to list hive databases and view hive table schemas (describe table
 returns the right result), but any (basic) SELECT query returns no rows:
 "No rows selected".

  Cluster versions:
  - google cloud platform
  - multi node cluster
  - hive 1.2.1.2.3
  - zookeeper 3.4.6.2.3
  - hdfs 2.7.1.2.3
  - drill 1.1 (commit e3fc7e97bfe712dc09d43a8a055a5135c96b7344)

  Is it possible that hive 1.2 is not supported yet by drill 1.1,
 and that's why I can't get any results on hive tables? I get no error, just
 no results. I have been looking through all the logs and files I could!
  Is there any commit that fixes this?

  Thank you in advance!

  Adrià



 






Re: Querying Apache Spark Generated Parquet

2015-07-22 Thread Neeraja Rentachintala
Hi
Do you still see this issue? Can you share a sample parquet file where you
see the problem?

On Thursday, July 16, 2015, Jacques Nadeau jacq...@dremio.com wrote:

 Can you create a JIRA and post a small sample file that illustrates the
 problem?

 On Thu, Jul 16, 2015 at 2:12 AM, Usman Ali usman@platalytics.com
 javascript:;
 wrote:

  Hi,
   I am having trouble querying a parquet file generated using Apache
  Spark.
  Select * from `file.parquet` works fine, but when I try to select only
  some fields of the parquet file it returns Null values, i.e. Select field1
  from `file.parquet` returns only Null values. No, field1 does not
  contain any Null values.
  Note that this is not an issue when I query a parquet file which is not
  generated by Apache Spark.
  Any insights?
 
  Regards,
  Usman |Ali
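A quick pair of checks when a projected column comes back all NULL (the path
is illustrative): confirm the exact column names Drill reports, then retry the
projection with the name quoted verbatim, since a name Drill cannot match in
the file is simply projected as NULL:

    SELECT * FROM dfs.`/data/file.parquet` LIMIT 1;
    SELECT `field1` FROM dfs.`/data/file.parquet` LIMIT 5;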
 



Re: Querying multiple TSV or CSV files at once?

2015-02-08 Thread Neeraja Rentachintala
What is the error that you are seeing?
Can you simply point it at the directory (without *.csv) to see if that helps?
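Pointing the query at the directory makes Drill scan every file inside it,
provided the files share a format; with header-less CSV the fields then come
back through the synthetic columns array (the directory name is illustrative):

    SELECT * FROM dfs.`/dir`;
    SELECT columns[0], columns[1] FROM dfs.`/dir` LIMIT 10;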


On Sun, Feb 8, 2015 at 10:33 AM, Minnow Noir minnown...@gmail.com wrote:

 I'm trying to do ad-hoc exploration/analysis over multiple files without
 having to concatenate them.  New files show up on a regular basis, and
 creating large, redundant concatenated files seems inelegant for data
 exploration.  I've tried the obvious (... from dfs.`/dir/*.csv`) but that
 only returns lines from the first file it finds, and then an error for the
 next file.

 Is there any current way to do this?


 Thanks



Re: Looking For Drill REST API Doc.

2014-12-10 Thread Neeraja Rentachintala
Refer to the JIRA below:
https://issues.apache.org/jira/browse/DRILL-77
The Google doc in this JIRA has info on the REST APIs.
-Neeraja
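As a taste of what that doc covers, submitting a query over REST looks
roughly like this (the default web port 8047 is assumed; the endpoint shape
follows the design doc and may have evolved since):

    POST http://localhost:8047/query.json
    Content-Type: application/json

    {
      "queryType": "SQL",
      "query": "SELECT * FROM sys.version"
    }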

On Wed, Dec 10, 2014 at 9:42 PM, mufy mufeed.us...@gmail.com wrote:

 Is there a formal Drill REST APIs reference document that can be shared
 with customers?

 ---
 Mufeed Usman
 My LinkedIn http://www.linkedin.com/pub/mufeed-usman/28/254/400 | My
 Social Cause http://www.vision2016.org.in/ | My Blogs : LiveJournal
 http://mufeed.livejournal.com