Re: [ANNOUNCE] New Committer: Padma Penumarthy

2018-06-15 Thread Neeraja Rentachintala
Congrats Padma!

On 6/15/18, 9:36 AM, "Aman Sinha"  wrote:

The Project Management Committee (PMC) for Apache Drill has invited Padma
Penumarthy to become a committer, and we are pleased to announce that she
has accepted.

Padma has been contributing to Drill for about 1 1/2 years.  She has made
improvements to work-unit assignment in the parallelizer, the performance of
the filter operator for pattern matching, and (more recently) batch
sizing for several operators: Flatten, MergeJoin, HashJoin, UnionAll.

Welcome Padma, and thank you for your contributions.  Keep up the good work!

-Aman
(on behalf of Drill PMC)




Re: Issue with OBIEE 12c

2017-03-16 Thread Neeraja Rentachintala
You might want to refer to this.
https://www.rittmanmead.com/blog/2016/08/using-apache-drill-with-obiee-12c/ 

On 3/16/17, 8:43 AM, "Di Camillo, Fabrizio"  
wrote:

Hi guru,
we are using Apache Drill (1.9) driver for OBIEE 12c (64 bit).

Some queries failed with this error:
[nQSError: 16011] ODBC error occurred while executing SQLExtendedFetch to 
retrieve the results of a SQL

This is an error message from OBIEE, but we cannot find any details from 
Apache Drill.

Please can you help on this?

Thanks

Fabrizio

Fabrizio Di Camillo / Capgemini Italia / Roma

Mob.: +39 3451769879 / www.capgemini.com
Via di Torre Spaccata, 140 - 00169

This message contains information that may be privileged or confidential 
and is the property of the Capgemini Group. It is intended only for the person 
to whom it is addressed. If you are not the intended recipient, you are not 
authorized to read, print, retain, copy, disseminate, distribute, or use this 
message or any part thereof. If you receive this message in error, please 
notify the sender immediately and delete all copies of this message.




Re: [DISCUSS] Apache Drill Version after 1.9.0, etc.

2016-11-29 Thread Neeraja Rentachintala
+1 on continuing to 1.10 version after 1.9 release.


On Mon, Nov 28, 2016 at 7:49 PM, Aman Sinha  wrote:

> (A) I am leaning to 1.10 for the reasons already mentioned in your email.
> (B) sounds good.
> (C) Does it matter if there are a few commits in master branch already ?
> What's the implication of just updating the pom files (not force-push).
>
> On Mon, Nov 28, 2016 at 3:25 PM, Sudheesh Katkam 
> wrote:
>
> > Hi all,
> >
> > -
> >
> > (A) I had asked the question about what the release version should be
> > after 1.9.0. Since this is part of the next release plan, a vote is
> > required based on the discussion. For approval, the vote requires a lazy
> > majority of active committers over 3 days.
> >
> > Here are some comments from that thread:
> >
> > Quoting Paul:
> >
> > > For release numbers, 1.10 (then 1.11, 1.12, …) seems like a good idea.
> > >
> > > At first it may seem odd to go to 1.10 from 1.9. Might people get
> > confused between 1.10 and 1.1.0? But, there is precedent. Tomcat’s
> latest
> > 7-series release is 7.0.72. Java is on 8u112. And so on.
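The numbering point above can be made concrete: dotted release versions compare component by component as integers, so 1.10 sorts after 1.9 and is distinct from 1.1.0. A toy sketch (illustrative only, not Drill code):

```python
# Dotted version strings compare as tuples of integers, not as plain
# strings, so "1.10" > "1.9" and "1.10" != "1.1.0".
def version_key(v):
    return tuple(int(part) for part in v.split("."))

releases = ["1.9", "1.1.0", "1.10", "1.2"]
ordered = sorted(releases, key=version_key)
print(ordered)  # ['1.1.0', '1.2', '1.9', '1.10']
```

String comparison would put "1.10" before "1.9", which is exactly the confusion the thread anticipates.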
> > >
> > > I like the idea of moving to 2.0 later when the team introduces a major
> > change, rather than by default just because the numbers roll around. For
> > example, Hadoop went to 2.x when YARN was introduced. Impala appears to
> > have moved to 2.0 when they added Spill to disk for some (all?)
> operators.
> >
> >
> > Quoting Parth:
> >
> > > Specifically what did you want to discuss about the release number
> after
> > 1.9?  Ordinarily you would just go to 2.0. The only reason for holding
> off
> > on 2.0, that I can think of, is if you want to make breaking changes in
> the
> > 2.0 release and those are not going to be ready for the next release
> cycle.
> > Are any dev's planning on such breaking changes? If so we should discuss
> > that (or any other reason we might have for deferring 2.0) in a separate
> > thread?
> > > I'm +0 on any version number we chose.
> >
> >
> > I am +1 on Paul’s suggestion for 1.10.0, unless, as Parth noted, we plan
> > to make breaking changes in the next release cycle.
> >
> > @Jacques, any comments? You had mentioned about this a while back [1].
> >
> > -
> >
> > (B) Until discussion on (A) is complete, which may take a while, I
> propose
> > we move the master to 1.10.0-SNAPSHOT to unblock committing to master
> > branch. If there are no objections, I will do this tomorrow, once 1.9.0
> > release artifacts are propagated.
> >
> > -
> >
> > (C) I noticed there are some changes committed to master branch before
> the
> > commit that moves to the next snapshot version. Did we face this issue in
> > the past? If so, how did we resolve the issue? Is 'force push' an option?
> >
> > -
> >
> > Thank you,
> > Sudheesh
> >
> > [1] http://mail-archives.apache.org/mod_mbox/drill-dev/201604.mbox/%
> > 3CCAJrw0OTiXLnmW25K0aQtsVmh3A4vxfwZzvHntxeYJjPdd-PnYQ%40mail.gmail.com
> %3E
> >  > 3ccajrw0otixlnmw25k0aqtsvmh3a4vxfwzzvhntxeyjjpdd-p...@mail.gmail.com%3E>
>


Re: Dynamic UDFs support

2016-07-21 Thread Neeraja Rentachintala
It seems like we are reaching a conclusion here in terms of starting with a
simpler implementation, i.e. being able to deploy UDFs dynamically, without
Drillbit restarts, based off jars in a DFS location.  Dropping functions
dynamically is out of scope for version 1 of this feature (we assume
development of UDFs is happening on a user laptop or a dev cluster where it's
OK to have restarts).

-Neeraja

On Thu, Jul 21, 2016 at 11:56 AM, Keys Botzum <kbot...@maprtech.com> wrote:

> Recognize the difficulty. Not suggesting this be addressed in first
> version. Just suggesting some thought about how a real user will
> workaround. Maybe some doc and/or small changes can make this easier.
>
> Keys
> ___
> Keys Botzum
> Senior Principal Technologist
> kbot...@maprtech.com
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
> On Jul 21, 2016 1:45 PM, "Paul Rogers" <prog...@maprtech.com> wrote:
>
> > Hi All,
> >
> > Adding a dynamic DROP would, of course, be a great addition! The reason
> > for suggesting we skip that was to control project scope.
> >
> > Dynamic DROP requires a synchronization step. Here’s the scenario:
> >
> > * Foreman A starts a query using UDF U.
> > * Foreman B receives a request to drop UDF U, followed by a request to
> add
> > a new version of U, U’.
> >
> > How do we drop a function that may be in use? There are some tricky bits
> > to work out, which seemed too overwhelming to consider all in one go.
> >
> > Clearly just dropping U and adding a new version of U with the same name
> > leads to issues if not synchronized. If a Drillbit D is running a query
> > with U when it receives notice to drop U, should D complete the query or
> > fail it? If the query completes, then how does D deal with the request to
> > register U’, which has the same name?
> >
> > Do we globally synchronize function deletion? (The foreman B that
> receives
> > the drop request waits for all queries using U to finish.) But, how do we
> > know which queries use U?
> >
> > An eventually consistent approach is to track the age of the oldest
> > running query. Suppose B drops U at time T. Any query received after T
> that
> > uses U will fail in planning. A new U’ can’t be registered until all
> > queries that started before T complete.
> >
> > The primary challenge we face in both the CREATE and DROP cases is that
> > Drill is distributed with little central coordination. That’s great for
> > scale, but makes it hard to design features that require coordination.
> Some
> > other tools solve this problem with a data dictionary (or “metastore").
> > Alas, Drill does not have such a concept. So a seemingly simple feature
> > like dynamic UDF becomes a major design challenge to get right.
> >
> > Thanks,
> >
> > - Paul
> >
> > > On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> > >
> > > The whole point of this feature is to avoid Drill cluster restarts as
> the
> > > name indicates 'Dynamic' UDFs.
> > > So any design that requires restarts would, I think, defeat the
> > purpose.
> > >
> > > I also think this is an example of a feature we start with a simple
> > design
> > > to serve the purpose, take feedback on how it is being deployed/used in
> > > real user situations and improve it in subsequent releases.
> > >
> > > -thanks
> > > Neeraja
> > >
> > > On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum <kbot...@maprtech.com>
> > wrote:
> > >
> > >> I think there are a lot of great ideas here. My one concern is the
> lack
> > of
> > >> unload and thus presumably replace functionality. I'm just thinking
> > about
> > >> typical actual usage.
> > >>
> > >> In a typical development cycle someone writes something, tries it,
> > learns,
> > >> changes it, and tries again. Assuming I understand the design that
> > change
> > >> step requires a full Drill cluster restart. That is going to be very
> > >> disruptive and will make UDF work nearly impossible without a
> dedicated
> > >> "private" cluster for Drill. I realize that people should have access
> to
> > >> the data they need and Drill in a development cluster but even then
> > >> restarts can be hard since development clusters are often shared - and
> > >> that's assuming such a cluster exists. I realize of course Drill can

Re: Dynamic UDFs support

2016-07-21 Thread Neeraja Rentachintala
already do
> something
> >>>>>>>>> similar.)
> >>>>>>>>>
> >>>>>>>>> 4. Registry check is “forced” when processing a query with a
> >>>> function
> >>>>>>>> that
> >>>>>>>>> is not currently registered. (Doing so resolves any possible race
> >>>>>>>>> conditions.)
> >>>>>>>>>
> >>>>>>>>> 5. Some process (perhaps time based) removes old, unregistered
> jar
> >>>>>> files.
> >>>>>>>>> (Or, we could get fancy and use reference counts. The reference
> >>>> count
> >>>>>>>> would
> >>>>>>>>> be required if the user wants to delete, then recreate, the same
> >>>>>> function
> >>>>>>>>> and jar to avoid conflict with in-flight queries.)
> >>>>>>>>>
> >>>>>>>>> We can build security on this as follows:
> >>>>>>>>>
> >>>>>>>>> 1. Define permissions for who can write to the DFS location. Or,
> >>>>>> indeed,
> >>>>>>>>> have subdirectories by user and grant each user permission only
> on
> >>>>>> their
> >>>>>>>>> own UDF directory.
> >>>>>>>>>
> >>>>>>>>> 2. Provide separate registries for per-user functions (private)
> and
> >>>>>>>> global
> >>>>>>>>> functions (public). Only the admin can add global functions. But,
> >>>> only
> >>>>>>>> the
> >>>>>>>>> user that uploads a private function can use it.
> >>>>>>>>>
> >>>>>>>>> 3. Leverage the Java class loader to isolate UDFs in their own
> name
> >>>>>> space
> >>>>>>>>> (see Eclipse & Tomcat for examples). That is, Drill can call
> into a
> >>>>>> UDF,
> >>>>>>>>> UDFs can call selected Drill code, but UDFs can’t shadow Drill
> >>>> classes
> >>>>>>>>> (accidentally or maliciously.) Plus, my function Foo won’t clash
> >>>> with
> >>>>>>>> your
> >>>>>>>>> function Foo if both are private.
> >>>>>>>>>
> >>>>>>>>> Sorry that this has wandered a bit far from the original simple
> >>>> design,
> >>>>>>>>> but the above may capture much of what folks expect in modern
> >>>>>> distributed
> >>>>>>>>> big data systems.
> >>>>>>>>>
> >>>>>>>>> I wonder if a good next step might be to review the notes in the
> >>>> design
> >>>>>>>>> doc, in the JIRA, and in this e-mail chain and to prepare a
> summary
> >>>> of
> >>>>>>>>> technical requirements, and a proposed design. Postpone, at least
> >>>> for
> >>>>>>>> now,
> >>>>>>>>> concerns about the amount of work; we can worry about that once
> >>>> folks
> >>>>>>>> agree
> >>>>>>>>> on your revised design.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> - Paul
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Jun 21, 2016, at 9:48 AM, Arina Yelchiyeva <
> >>>>>>>>> arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> 4. Authorization model mentioned by Julia and John
> >>>>>>>>>> If user won't have rights to copy jars to UDF classpath, which
> can
> >>>> be
> >>>>>>>>>> restricted by file system, he won't be able to do much harm by
> >>>> running
> >>>>>>>>>> CREATE command. If UDFs from jar were already registered, CREATE
> >>>>>>>>> statement
> >>>>>>>>>>

Re: Drill plugin for Kafka

2016-07-13 Thread Neeraja Rentachintala
Great Anil. Thanks.

On Wed, Jul 13, 2016 at 9:02 PM, AnilKumar B  wrote:

> Thanks Parth. Created the https://issues.apache.org/jira/browse/DRILL-4779
>
> Like the Mongo storage plugin contribution, Kamesh and I would like to start
> working on this feature.
>
> Thanks & Regards,
> B Anil Kumar.
>
> On Wed, Jul 13, 2016 at 7:53 PM, Parth Chandra  wrote:
>
> > That would be great.
> >
> > I don't think anyone is working on one. So please go ahead and create a
> > JIRA and assign it to yourself.
> >
> > And thank you for contributing.
> >
> >
> >
> > On Wed, Jul 13, 2016 at 3:22 PM, AnilKumar B 
> > wrote:
> >
> > > Hi All,
> > >
> > > Is it useful to have a Drill plugin for Kafka? If so, is anyone
> > > already working on it? Otherwise, we are planning to start working on it.
> > >
> > >
> > > Thanks & Regards,
> > > B Anil Kumar.
> > >
> >
>


Re: Implement "DROP TABLE IIF EXISTS" statement

2016-07-01 Thread Neeraja Rentachintala
sounds good.
We need to clearly document this - specifically the Hive UDF IF syntax issue.

-Neeraja

On Fri, Jul 1, 2016 at 11:07 AM, Vitalii Diravka 
wrote:

> Agree with the last decision to use "IF EXISTS" statement and `if` udf with
> backticks.
> It is acceptable option.
>
> Thank you for valuable advices.
>
> Kind regards
> Vitalii
>
> 2016-06-30 17:22 GMT+00:00 John Omernik :
>
> > I agree with Julian. If we can backtick quote Hive's if and have an
> option
> > for Hive users, it would be nice. But Hive made a mess, and there is
> > precedent for IF.  This makes sense from a cluster administration perspective,
> > and even being a Hive user, as long as I had an option (with backticks)
> to
> > allow me to move forward, I'd understand and accept the required changes.
> >
> > John
> >
> >
> > On Thu, Jun 30, 2016 at 12:00 PM, Julian Hyde  wrote:
> >
> > > Even though it’s not standard, several other databases have DROP TABLE
> …
> > > IF EXISTS (MySQL [1]; Postgres [2] and SQL Server 2016 [3] put the “IF
> > > EXISTS” before the table name). I know there are problems with the IF
> > > keyword clashing with the Hive “IF” function, but I think it would be
> > crazy
> > > to do “IIF EXISTS”.
> > >
> > > I’d block Hive’s “IF” function, frankly. They screwed up. No need to
> > > propagate their mess into Drill.
> > >
> > > Julian
> > >
> > > [1] http://dev.mysql.com/doc/refman/5.7/en/drop-table.html <
> > > http://dev.mysql.com/doc/refman/5.7/en/drop-table.html>
> > >
> > > [2] https://www.postgresql.org/docs/8.2/static/sql-droptable.html <
> > > https://www.postgresql.org/docs/8.2/static/sql-droptable.html>
> > >
> > > [3]
> > >
> >
> https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/11/03/drop-if-exists-new-thing-in-sql-server-2016/
> > > <
> > >
> >
> https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/11/03/drop-if-exists-new-thing-in-sql-server-2016/
> > > >
> > >
> > > > On Jun 30, 2016, at 5:06 AM, Khurram Faraaz 
> > > wrote:
> > > >
> > > > I looked at the SQL standard and I did not find that IF EXISTS is a
> > part
> > > of
> > > > DROP TABLE syntax, please see below.
> > > >
> > > > INTERNATIONAL STANDARD
> > > > ISO/IEC 9075-2
> > > > Fourth edition 2011-12-15
> > > >
> > > >
> > > > Format
> > > > <drop table statement> ::=
> > > >  DROP TABLE <table name> <drop behavior>
> > > >
> > > > <drop behavior> ::=
> > > >    CASCADE
> > > >  | RESTRICT
> > > >
> > > > On Thu, Jun 30, 2016 at 3:44 PM, Arina Yelchiyeva <
> > > > arina.yelchiy...@gmail.com> wrote:
> > > >
> > > >> To sum up currently we are facing two options:
> > > >>
> > > >> 1. Add IF as keyword.
> > > >> Pros:
> > > >> DROP TABLE / VIEW IF EXISTS will work
> > > >> Cons:
> > > >> if function (loaded from Hive) will stop working. In this case users
> > > will
> > > >> have two options:
> > > >> a) surround if with backticks (ex: select `if`(condition,option1,
> > > option2)
> > > >> from table)
> > > >> b) replace if function with case statement
> > > >>
> > > >> 2. Use IIF instead of IF
> > > >> Pros:
> > > >> if function will work, no backward compatibility issues.
> > > >> Cons:
> > > >> uncommon syntax for IF EXISTS statement
> > > >>
> > > >> So far none of this options seems to be ideal.
> > > >>
> > > >> Kind regards
> > > >> Arina
> > > >>
> > > >>
> > > >> On Wed, Jun 29, 2016 at 8:56 PM Paul Rogers 
> > > wrote:
> > > >>
> > > >>> Hi Vitalii,
> > > >>>
> > > >>> This will be a nice improvement. Your question about “IIF” vs. “IF”
> > is
> > > in
> > > >>> the context of one small enhancement. But, it raises a larger
> > question
> > > >>> (which is beyond the scope of your project, but is worth discussing
> > > >> anyway.)
> > > >>>
> > > >>> That larger issue is that we really should modify the Drill SQL
> > parser
> > > to
> > > >>> better handle keywords vs. identifiers. That is, the following
> > > >>> “pathological” statement should be valid:
> > > >>>
> > > >>> SELECT select, from FROM from, where WHERE from.select =
> where.from;
> > > >>>
> > > >>> This seems very confusing to us humans. But, to the SQL grammar the
> > > above
> > > >>> is unambiguous. SQL syntax determines where a keyword is valid. All
> > > other
> > > >>> uses of that keyword can easily be interpreted as an identifier.
> > > Further,
> > > >>> the location of the identifier determines whether to interpreted it
> > as
> > > a
> > > >>> column, table, schema, function, etc. For example, a keyword will
> > never
> > > >>> appear in a select list, from list or where expression.
> Technically,
> > we
> > > >>> could introduce distinct name spaces for keywords, columns, tables,
> > > >>> functions and so on.
> > > >>>
> > > >>> Without this change we run two risks:
> > > >>>
> > > >>> 1. We can’t use proper SQL syntax when we need it (as in your
> > project.)
> > > >>> 2. We risk breaking queries when we add new keywords (as in the
> > dynamic
> > > >>> UDF project.)
> > > >>>
> > > >>> This is 

Re: -ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query

2016-06-29 Thread Neeraja Rentachintala
Drill doesn't support the SELECT INTO syntax currently.
You need to use CREATE TABLE AS instead.

On Tue, Jun 28, 2016 at 11:30 PM, Omkar Pathallapalli 
wrote:

> Hi team,
>
> When I issue a query in Apache Drill I am getting this error; how can I
> resolve it?
>
> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> select distinct ani,calldate,count(distinct(did))
> Count_of_did,destcode,trffclass,sum(talktime)/60.mo_talktime,custcode into
> rpttmp_final  from dfs.tmp.r_fa_cdr_tmp where  destcode not like
> '%voicemail%'
> and destcode not like '%customerservice%' and destcode not
> like '%topup%'
> group by ani,calldatetrffclass,destcode,custcode
>
>
> [30027]Query execution error. Details:[
> PARSE ERROR: Encountered "into" at line 1, column 121.
> Was expecting one of:
> "FROM" ...
> "," ...
> "AS" ...
> <IDENTIFIER> ...
> <QUOTED_IDENTIFIER> ...
> <BACK_QUOTED_IDENTIFIER> ...
> <BRACKET_QUOTED_IDENTIFIER> ...
> <UNICODE_QUOTED_IDENTIFIER> ...
> "." ...
> "[" ...
> "(" ...
> "NOT" ...
> "IN" ...
> "BETWEEN" ...
> "LIKE" ...
> "SIMILAR" ...
> "=" ...
> ">" ...
> "<" ...
> "<=" ...
> ">=" ...
> "<>" ...
> "+" ...
> "-" ...
> "*" ...
> "/" ...
> "||" ...
> ...
>
>
> Regards,
> omkar
>
>


Re: [VOTE] Release Apache Drill 1.7.0 rc0

2016-06-26 Thread Neeraja Rentachintala
+1
Downloaded the binary and tried few tutorials.

On Saturday, June 25, 2016, Jinfeng Ni  wrote:

> +1
>
> Did a full maven build from source code.
> Run yelp tutorial queries against sample json dataset
> Run ~10 tpcds queries and checked query profile through webui
> Test maven artifacts with a simple application.
>
> All looks good.
>
>
>
> On Sat, Jun 25, 2016 at 4:52 PM, Parth Chandra  > wrote:
> > +1
> > Built from source.
> > Confirmed checksums.
> > Ran a few complex queries.
> >
> > All looks good.
> >
> >
> >
> > On Fri, Jun 24, 2016 at 5:45 PM, Aman Sinha  > wrote:
> >
> >> 1.7 Release notes:
> >> [1]
> >>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334767==12313820
> >>
> >> On Thu, Jun 23, 2016 at 5:13 PM, Zelaine Fong  > wrote:
> >>
> >> > I believe DRILL-4658 should be excluded from the 1.7 fix list as it's
> the
> >> > same fix as DRILL-3149, which got backed out.
> >> >
> >> > -- Zelaine
> >> >
> >> > On Thu, Jun 23, 2016 at 4:45 PM, Aman Sinha  >
> >> wrote:
> >> >
> >> > > It looks like link [2] is wrong.   Here's the correct link:
> >> > > [2]  http://home.apache.org/~amansinha/drill/1.7.0/rc0/
> >> > >
> >> > > On Thu, Jun 23, 2016 at 4:41 PM, Aman Sinha  >
> >> > wrote:
> >> > >
> >> > > >
> >> > > > Hello all,
> >> > > >
> >> > > > I'd like to propose the zeroth release candidate (rc0) of Apache
> >> Drill,
> >> > > > version 1.7.0.
> >> > > > It covers a total of 58 resolved JIRAs [1].
> >> > > > Thanks to everyone who contributed to this release.
> >> > > >
> >> > > > The tarball artifacts are hosted at [2] and the maven artifacts
> are
> >> > > hosted
> >> > > > at [3].
> >> > > >
> >> > > > This release candidate is based on commit
> >> > > >  552129fa67102b1013aafafcd98ff585c9288005  located at [4].
> >> > > >
> >> > > > The vote will be open for the next ~72 hours ending at 5 pm
> Pacific,
> >> > June
> >> > > > 26 , 2016.
> >> > > >
> >> > > > [ ] +1
> >> > > > [ ] +0
> >> > > > [ ] -1
> >> > > >
> >> > > > Here's my vote: +1 (binding)
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Aman
> >> > > >
> >> > > > [1]
> >> > > >
> >> > >
> >> >
> >>
> https://issues.apache.org/jira/browse/DRILL-3763?jql=project%20%3D%20DRILL%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.7.0
> >> > > > (note: this is a search result; I will convert to a JIRA report
> later
> >> > > once
> >> > > > the release tags are updated for the open JIRAs).
> >> > > > [2] http://home.apache.org/~amansinha/drill/1.7.0/rc0/
> >> > > > 
> >> > > > [3]
> >> > > >
> >> >
> https://repository.apache.org/content/repositories/orgapachedrill-1031/
> >> > > > [4] https://github.com/amansinha100/incubator-drill/tree/1.7.0
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
>


Re: Dynamic UDFs support

2016-06-21 Thread Neeraja Rentachintala
 Register the udf into, say, ZK. When a new Drillbit
> starts,
> > it looks for new udf jars in ZK, copies the file to a temporary location,
> > and launches. An existing Drill is notified of the change and does the
> same
> > download process. Clean-up is needed at some point to remove ZK entries
> if
> > the udf jar becomes statically available on the next launch. That needs
> > more thought.
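The ZK-based flow described above can be sketched with a dict standing in for ZooKeeper and a file copy standing in for the jar download; all names here are illustrative, not real Drill or ZooKeeper APIs:

```python
# Toy model: CREATE FUNCTION publishes the jar to a "ZK" registry; each
# Drillbit, on startup or on a change notification, copies any jar it has
# not yet seen into its local directory.
import os
import shutil

zk_udf_jars = {}  # stand-in for ZK: jar name -> source path

def register_jar(name, path):
    zk_udf_jars[name] = path

def sync_local_jars(local_dir, seen):
    """Download jars this node has not seen; return the names fetched."""
    fetched = []
    for name, src in zk_udf_jars.items():
        if name not in seen:
            shutil.copy(src, os.path.join(local_dir, name))
            seen.add(name)
            fetched.append(name)
    return fetched
```

The clean-up Paul mentions (removing ZK entries once the jar is statically available) would be a further pass over `zk_udf_jars`, which this sketch deliberately omits.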
> >
> > We’d still need the phases mentioned earlier to ensure consistency.
> >
> > Suggestions anyone as to how to do this super simply & still get it to
> > work with DoY?
> >
> > Thanks,
> >
> > - Paul
> >
> > > On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> > >
> > > This will need to work with YARN (Once Drill is YARN enabled, I would
> > > expect a lot of users using it in conjunction with YARN).
> > > Paul, I am not clear why this wouldn't work with YARN. Can you
> elaborate.
> > >
> > > -Neeraja
> > >
> > > On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers <prog...@maprtech.com>
> > wrote:
> > >
> > >> Good enough, as long as we document the limitation that this feature
> > can’t
> > >> work with YARN deployment as users generally do not have access to the
> > >> temporary “localization” directories where the Drill code is placed by
> > YARN.
> > >>
> > >> Note that the jar distribution race condition issue occurs with the
> > >> proposed design: I believe I sketched out a scenario in one of the
> > earlier
> > >> comments. Drillbit A receives the CREATE FUNCTION command. It tells
> > >> Drillbit B. While informing the other Drillbits, Drillbit B plans and
> > >> launches a query that uses the function. Drillbit Z starts execution
> of
> > the
> > >> query before it learns from A about the new function. This will be
> rare
> > —
> > >> just rare enough to create very hard to reproduce bugs.
> > >>
> > >> The only reliable solution is to do the work in multiple passes:
> > >>
> > >> Pass 1: Ask each node to load the function, but not make it available
> to
> > >> the planner. (it would be available to the execution engine.)
> > >> Pass 2: Await confirmation from each node that this is done.
> > >> Pass 3: Alert every node that it is now free to plan queries with the
> > >> function.
> > >>
> > >> Finally, I wonder if we should design the SQL syntax based on a
> > long-term
> > >> design, even if the feature itself is a short-term work-around.
> Changing
> > >> the syntax later might break scripts that users might write.
> > >>
> > >> So, the question for the group is this: is the value of semi-complete
> > >> feature sufficient to justify the potential problems?
> > >>
> > >> - Paul
> > >>
> > >>> On Jun 20, 2016, at 6:15 PM, Parth Chandra <pchan...@maprtech.com>
> > >> wrote:
> > >>>
> > >>> Moving discussion to dev.
> > >>>
> > >>> I believe the aim is to do a simple implementation without the
> > complexity
> > >>> of distributing the UDF. I think the document should make this
> > limitation
> > >>> clear.
> > >>>
> > >>> Per Paul's point on there being a simpler solution of just having
> each
> > >>> drillbit detect if a UDF is present, I think the problem is if a
> > UDF
> > >>> gets deployed to some but not all drillbits. A query can then start
> > >>> executing but not run successfully. The intent of the create commands
> > >> would
> > >>> be to ensure that all drillbits have the UDF or none would.
> > >>>
> > >>> I think Jacques' point about ownership conflicts is not addressed
> > >> clearly.
> > >>> Also, the unloading is not clear. The delete command should probably
> > >> remove
> > >>> the UDF and unload it.
> > >>>
> > >>>
> > >>> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <prog...@maprtech.com>
> > >> wrote:
> > >>>
> > >>>> Reviewed the spec; many comments posted. Three primary comments for
> > the
> > >>>> community to consider.
> > >>>>
> > >>>> 1. The desig

Re: Dynamic UDFs support

2016-06-20 Thread Neeraja Rentachintala
This will need to work with YARN (Once Drill is YARN enabled, I would
expect a lot of users using it in conjunction with YARN).
Paul, I am not clear why this wouldn't work with YARN. Can you elaborate?

-Neeraja

On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers  wrote:

> Good enough, as long as we document the limitation that this feature can’t
> work with YARN deployment as users generally do not have access to the
> temporary “localization” directories where the Drill code is placed by YARN.
>
> Note that the jar distribution race condition issue occurs with the
> proposed design: I believe I sketched out a scenario in one of the earlier
> comments. Drillbit A receives the CREATE FUNCTION command. It tells
> Drillbit B. While informing the other Drillbits, Drillbit B plans and
> launches a query that uses the function. Drillbit Z starts execution of the
> query before it learns from A about the new function. This will be rare —
> just rare enough to create very hard to reproduce bugs.
>
> The only reliable solution is to do the work in multiple passes:
>
> Pass 1: Ask each node to load the function, but not make it available to
> the planner. (it would be available to the execution engine.)
> Pass 2: Await confirmation from each node that this is done.
> Pass 3: Alert every node that it is now free to plan queries with the
> function.
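Paul's three-pass protocol can be modeled with a list of dicts standing in for the cluster's Drillbits (a hypothetical sketch, not Drill's actual registration code):

```python
# Pass 1 makes the function executable everywhere; pass 2 confirms that;
# only in pass 3 do planners see it. This closes the race where a foreman
# plans with a function some node cannot yet execute.
def register_function(drillbits, fn):
    # Pass 1: each node loads the function, execution-only.
    for bit in drillbits:
        bit["executable"].add(fn)
    # Pass 2: await confirmation from every node.
    assert all(fn in bit["executable"] for bit in drillbits)
    # Pass 3: alert every node that it may now plan with the function.
    for bit in drillbits:
        bit["plannable"].add(fn)

bits = [{"executable": set(), "plannable": set()} for _ in range(3)]
register_function(bits, "my_udf")
```

In the real distributed setting, passes 1 and 3 would be broadcasts and pass 2 a wait on acknowledgements, but the ordering invariant is the same.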
>
> Finally, I wonder if we should design the SQL syntax based on a long-term
> design, even if the feature itself is a short-term work-around. Changing
> the syntax later might break scripts that users might write.
>
> So, the question for the group is this: is the value of semi-complete
> feature sufficient to justify the potential problems?
>
> - Paul
>
> > On Jun 20, 2016, at 6:15 PM, Parth Chandra 
> wrote:
> >
> > Moving discussion to dev.
> >
> > I believe the aim is to do a simple implementation without the complexity
> > of distributing the UDF. I think the document should make this limitation
> > clear.
> >
> > Per Paul's point on there being a simpler solution of just having each
> > drillbit detect if a UDF is present, I think the problem is if a UDF
> > gets deployed to some but not all drillbits. A query can then start
> > executing but not run successfully. The intent of the create commands
> would
> > be to ensure that all drillbits have the UDF or none would.
> >
> > I think Jacques' point about ownership conflicts is not addressed
> clearly.
> > Also, the unloading is not clear. The delete command should probably
> remove
> > the UDF and unload it.
> >
> >
> > On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers 
> wrote:
> >
> >> Reviewed the spec; many comments posted. Three primary comments for the
> >> community to consider.
> >>
> >> 1. The design conflicts with the Drill-on-YARN project. Is this a
> specific
> >> fix for one unique problem, or is it worth expanding the solution to
> work
> >> with Drill-on-YARN deployments? Might be hard to make the two work
> together
> >> later. See comments in docs for details.
> >>
> >> 2. Have we, by chance, looked at how other projects handle code
> >> distribution? Spark, Storm and others automatically deploy code across
> the
> >> cluster; no manual distribution to each node. The key difference between
> >> Drill and others is that, for Storm, say, code is associated with a job
> >> (“topology” in Storm terms.) But, in Drill, functions are global and
> have
> >> no obvious life cycle that suggests when the code can be unloaded.
> >>
> >> 3. Have considered the class loader, dependency and name space isolation
> >> issues addressed by such products as Tomcat (web apps) or Eclipse
> >> (plugins)? Putting user code in the same namespace as Drill code  is
> quick
> >> & dirty. It turns out, however, that doing so leads to problems that
> >> require long, frustrating debugging sessions to resolve.
> >>
> >> Addressing item 1 might expand scope a bit. Addressing items 2 and 3
> are a
> >> big increase in scope, so I won’t be surprised if we leave those issues
> for
> >> later. (Though, addressing item 2 might be the best way to address item
> 1.)
> >>
> >> If we want a very simple solution that requires minimal change, perhaps
> we
> >> can use an even simpler solution. In the proposed design, the user still
> >> must distribute code to all the nodes. The primary change is to tell
> Drill
> >> to load (or unload) that code. Can accomplish the same result easier
> simply
> >> by having Drill periodically scan certain directories looking for new
> (or
> >> removed) jars? Still won’t work with YARN, or solve the name space
> issues,
> >> but will work for existing non-YARN Drill users without new SQL syntax.
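The "periodically scan certain directories" alternative amounts to diffing the directory contents against the last scan; a rough sketch (illustrative only — Drill's actual class loading works differently):

```python
# Diff a UDF directory against the previously known jar set to find
# newly added and removed jars since the last scan.
import os

def scan_udf_dir(udf_dir, known):
    """Return (added, removed, current) jar-name sets for this scan."""
    current = {f for f in os.listdir(udf_dir) if f.endswith(".jar")}
    added = current - known
    removed = known - current
    return added, removed, current
```

A timer would call this on each Drillbit, loading `added` jars and unloading `removed` ones; as the thread notes, this still leaves the YARN-localization and namespace problems open.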
> >>
> >> Thanks,
> >>
> >> - Paul
> >>
> >>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau 
> wrote:
> >>>
> >>> Two quick thoughts:
> >>>
> >>> - (user) In the design document I didn't see any discussion of
> >>> ownership/conflicts 

Drill & Caravel

2016-05-12 Thread Neeraja Rentachintala
Hi Folks

Caravel is a nice visualization tool recently open-sourced by Airbnb. Has
anyone tried to integrate it with Drill, and/or is anyone interested in
contributing to make this work with Drill?

https://github.com/airbnb/caravel


-Thanks
Neeraja


Re: Proposal: Create v2 branch to work on breaking changes

2016-04-12 Thread Neeraja Rentachintala
Makes sense to postpone the debate : )
Will look forward to the proposal.

On Tuesday, April 12, 2016, Zelaine Fong <zf...@maprtech.com> wrote:

> As we discussed at this morning's hangout, Jacques took the action to put
> together a strawman compatibility points document.  Would it be better to
> wait for that document before we debate this further?
>
> -- Zelaine
>
> On Tue, Apr 12, 2016 at 4:39 PM, Jacques Nadeau <jacq...@dremio.com
> <javascript:;>> wrote:
>
> > I agree with Paul, too. Perfect compatibility would be great. I recognize
> > the issues that a version break could cause.  These are some of the
> issues
> > that I believe require a version break to address:
> > - Support nulls in lists.
> > - Distinguish null maps from empty maps.
> > - Distinguish null arrays from empty arrays.
> > - Support sparse maps (analogous to Parquet maps instead of our current
> > approach analogous to structs in Parquet lingo).
> > - Clean up decimal and enable it by default.
> > - Support full Avro <> Parquet roundtrip (and Parquet files generated by
> > other tools).
> > - Enable union type by default.
> > - Improve execution performance of nullable values.
> >
> > I think these things need to be addressed in the 2.x line (let's say that
> > is ~12 months). This is all about tradeoffs which is why I keep asking
> > people to provide concrete impact. If you think at least one of these
> > should be resolved, you're arguing for breaking wire compatibility
> between
> > 1.x and 2.x.
> >
> > So let's get concrete:
> >
> > - How many users are running multiple clusters and using a single client
> to
> > connect to them?
> > - What BI tools are most users using? What is the primary driver they are
> > using?
> > - What BI tools are packaging a Drill driver? If any, what is the update
> > process and lead time?
> > - How many users are skipping multiple Drill versions (e.g. going from
> 1.2
> > to 1.6)? (Beyond the MapR tick-tock pattern)
> > - How many users are delaying driver upgrade substantially? Are there
> > customers using the 1.0 driver?
> > - What is the average number of deployed clients per Drillbit cluster?
> >
> > These are some of the things that need to be evaluated to determine
> whether
> > we choose to implement a compatibility layer or simply make a full break.
> > (And in reality, I'm not sure we have the resources to build and carry a
> > complex compatibility layer for these changes.)
> >
> > Whatever the policy we agree upon for future commitments to the user
> base,
> > we're in a situation where there are very important reasons to move the
> > codebase forward and change the wire protocol for 2.x.
> >
> > I think it is noble to strive towards backwards compatibility. We should
> > always do this. However, I also think that--especially early in a
> product's
> > life--it is better to resolve technical debt issues and break a few eggs
> > than defer and carry a bunch of extra code around.
> >
> > Yes, it can suck for users. Luckily, we should also be giving users a
> bunch
> > of positive reasons that it is worth upgrading and dealing with this
> > version break. These include better perf, better compatibility with other
> tools, union type support, faster BI tool behaviors, and a number of other
> > things.
> >
> > I for one vote for moving forward and making sure that the 2.x branch is
> > the highest quality and best version of Drill yet rather than focusing on
> > minimizing the upgrade cost. All upgrades are a cost/benefit analysis.
> > Drill is too young to focus on only minimizing the cost. We should be
> > working to make sure the other part of the equation (benefit) is where
> > we're spending the vast majority of our time.
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Tue, Apr 12, 2016 at 3:38 PM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com <javascript:;>> wrote:
> >
> > > I agree with Paul. Great points.
> > > I would also add the partners aspect to it. Majority of Drill users use
> > it
> > > in conjunction with a BI tool.
> > >
> > >
> > > -Neeraja
> > >
> > > On Tue, Apr 12, 2016 at 3:34 PM, Paul Rogers <prog...@maprtech.com
> <javascript:;>>
> > wrote:
> > >
> > > > Hi Jacques,
> > > >
> > > > My two cents…
> > > >
> > > > The unfortunate reality is that

Re: Proposal: Create v2 branch to work on breaking changes

2016-04-12 Thread Neeraja Rentachintala
I agree with Paul. Great points.
I would also add the partners aspect to it. Majority of Drill users use it
in conjunction with a BI tool.


-Neeraja

On Tue, Apr 12, 2016 at 3:34 PM, Paul Rogers <prog...@maprtech.com> wrote:

> Hi Jacques,
>
> My two cents…
>
> The unfortunate reality is that enterprise customers move slowly. There is
> a delay in the time it takes for end users to upgrade to a new release.
> When a third-party tool must also upgrade, the delay becomes even longer.
>
> At a high level, we need to provide a window of time in which old/new
> clients work with old/new servers. I may have a 1.6 client. The cluster
> upgrades to 1.8. I need time to upgrade my client to 1.8 — especially if I
> have to wait for the vendor to provide a new package.
>
> If I connect to two clusters, I may upgrade my client to 1.8 for one, but
> I still need to connect to 1.6 for the other if they upgrade on different
> schedules.
>
> This is exactly why we need to figure out a policy: how do we give users a
> sufficient window of time to complete upgrades, even across the 1.x/2.x
> boundary?
>
> The cost of not providing such a window? Broken production systems,
> unpleasant escalations and unhappy customers.
>
> Thanks,
>
> - Paul
>
> > On Apr 12, 2016, at 3:14 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> >
> >>> What I am suggesting is that we need to maintain backward
> compatibility with
> > a defined set of 1.x version clients when Drill 2.0 version is out.
> >
> > I'm asking you to be concrete on why. There is definitely a cost to
> > maintaining this compatibility. What are the real costs if we don't?
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Wed, Apr 6, 2016 at 9:21 AM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> >
> >> Jacques
> >> can you elaborate on what you mean by 'internal' implementation changes
> but
> >> maintain external API.
> >> I thought that changes that are being discussed here are the Drill
> client
> >> library changes.
> >> What I am suggesting is that we need to maintain backward compatibility
> >> with a defined set of 1.x version clients when Drill 2.0 version is out.
> >>
> >> Neeraja
> >>
> >> On Tue, Apr 5, 2016 at 12:12 PM, Jacques Nadeau <jacq...@dremio.com>
> >> wrote:
> >>
> >>> Thanks for bringing this up. BI compatibility is super important.
> >>>
> >>> The discussions here are primarily about internal implementation
> changes
> >> as
> >>> opposed to external API changes. From a BI perspective, I think (hope)
> >>> everyone shares the goal of having zero (to minimal) changes in terms
> of
> >>> ODBC and JDBC behaviors in v2. The items outlined in DRILL-4417 are
> also
> >>> critical to strong BI adoption as numerous patterns right now are
> >>> suboptimal and we need to get them improved.
> >>>
> >>> In terms of your request of the community, it makes sense to have a
> >>> strategy around this. It sounds like you have a bunch of considerations
> >>> that should be weighed but your presentation doesn't actually share
> what
> >>> the concrete details. To date, there has been no formal consensus or
> >>> commitment to any particular compatibility behavior. We've had an
> >> informal
> >>> "don't change wire compatibility within a major version". If we are
> going
> >>> to have a rich dialog about pros and cons of different approaches, we
> >> need
> >>> to make sure that everybody has the same understanding of the dynamics.
> >> For
> >>> example:
> >>>
> >>> Are you saying that someone has packaged the Apache Drill drivers in
> >> their
> >>> BI solution? If so, what version? Is this the Apache release artifact
> or
> >> a
> >>> custom version? Has someone certified them? Did anyone commit a
> >> particular
> >>> compatibility pattern to a BI vendor on behalf of the community?
> >>>
> >>> To date, I'm not aware of any of these types of decisions being
> discussed
> >>> in the community so it is hard to evaluate how important they are
> versus
> >>> other things. Knowing that DRILL-4417 is outstanding and critical to
> the
> >>> best BI experience, I think we should be very cautious of requiring
> >>> long-term support of the existing (internal) 

Re: Proposal: Create v2 branch to work on breaking changes

2016-04-06 Thread Neeraja Rentachintala
Jacques
Can you elaborate on what you mean by 'internal' implementation changes that
maintain the external API?
I thought that changes that are being discussed here are the Drill client
library changes.
What I am suggesting is that we need to maintain backward compatibility
with a defined set of 1.x version clients when Drill 2.0 version is out.

Neeraja

On Tue, Apr 5, 2016 at 12:12 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

> Thanks for bringing this up. BI compatibility is super important.
>
> The discussions here are primarily about internal implementation changes as
> opposed to external API changes. From a BI perspective, I think (hope)
> everyone shares the goal of having zero (to minimal) changes in terms of
> ODBC and JDBC behaviors in v2. The items outlined in DRILL-4417 are also
> critical to strong BI adoption as numerous patterns right now are
> suboptimal and we need to get them improved.
>
> In terms of your request of the community, it makes sense to have a
> strategy around this. It sounds like you have a bunch of considerations
> that should be weighed but your presentation doesn't actually share what
> the concrete details. To date, there has been no formal consensus or
> commitment to any particular compatibility behavior. We've had an informal
> "don't change wire compatibility within a major version". If we are going
> to have a rich dialog about pros and cons of different approaches, we need
> to make sure that everybody has the same understanding of the dynamics. For
> example:
>
> Are you saying that someone has packaged the Apache Drill drivers in their
> BI solution? If so, what version? Is this the Apache release artifact or a
> custom version? Has someone certified them? Did anyone commit a particular
> compatibility pattern to a BI vendor on behalf of the community?
>
> To date, I'm not aware of any of these types of decisions being discussed
> in the community so it is hard to evaluate how important they are versus
> other things. Knowing that DRILL-4417 is outstanding and critical to the
> best BI experience, I think we should be very cautious of requiring
> long-term support of the existing (internal) implementation. Guaranteeing
> ODBC and JDBC behaviors should be satisfactory for the vast majority of
> situations. Anything beyond this needs to have a very public cost/benefit
> tradeoff. In other words, please expose your thinking 100x more so that we
> can all understand the ramifications of different strategies.
>
> thanks!
> Jacques
>
>
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Apr 5, 2016 at 10:01 AM, Neeraja Rentachintala <
> nrentachint...@maprtech.com> wrote:
>
> > Sorry for coming back to this thread late.
> > I have some feedback on the compatibility aspects of 2.0.
> >
> > We are working with a variety of BI vendors to certify Drill and provide
> > native connectors for Drill. Having native access from BI tools helps
> with
> > seamless experience for the users with performance and functionality.
> This
> > work is in progress and they are (and will be) working with 1.x versions
> of
> > Drill as part of the development because that's what we have now. Some of
> > these connectors will be available before 2.0 and some of them can come
> in
> > post 2.0 as certification is a long process. We don't want to be in a
> > situation where the native connectors are just released by certain BI
> > vendor and the connector is immediately obsolete or doesn't work because
> we
> > have 2.0 release out now.
> > So the general requirement should be that we maintain backward
> > compatibility with a certain number of prior releases. This is very
> important
> > for the success of the project and adoption by eco system. I am happy to
> > discuss further.
> >
> > -Neeraja
> >
> > On Tue, Apr 5, 2016 at 8:44 AM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
> >
> > > I'm going to take this as lazy consensus. I'll create the branch.
> > >
> > > Once created, all merges to the master (1.x branch) should also go to
> the
> > > v2 branch unless we have a discussion here that they aren't applicable.
> > > When committing, please make sure to commit to both locations.
> > >
> > > thanks,
> > > Jacques
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Sat, Mar 26, 2016 at 7:26 PM, Jacques Nadeau <jacq...@dremio.com>
> > > wrote:
> > >
> > > > Re Compatibility:
> > > >
> > > > I actually don't even think 1.0 cli

Re: Proposal: Create v2 branch to work on breaking changes

2016-04-05 Thread Neeraja Rentachintala
Sorry for coming back to this thread late.
I have some feedback on the compatibility aspects of 2.0.

We are working with a variety of BI vendors to certify Drill and provide
native connectors for Drill. Having native access from BI tools helps with
a seamless experience for users, with performance and functionality. This
work is in progress and they are (and will be) working with 1.x versions of
Drill as part of the development because that's what we have now. Some of
these connectors will be available before 2.0 and some of them can come in
post 2.0 as certification is a long process. We don't want to be in a
situation where the native connectors are just released by certain BI
vendor and the connector is immediately obsolete or doesn't work because we
have 2.0 release out now.
So the general requirement should be that we maintain backward
compatibility with certain number of prior releases. This is very important
for the success of the project and adoption by eco system. I am happy to
discuss further.

-Neeraja

On Tue, Apr 5, 2016 at 8:44 AM, Jacques Nadeau  wrote:

> I'm going to take this as lazy consensus. I'll create the branch.
>
> Once created, all merges to the master (1.x branch) should also go to the
> v2 branch unless we have a discussion here that they aren't applicable.
> When committing, please make sure to commit to both locations.
>
> thanks,
> Jacques
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Sat, Mar 26, 2016 at 7:26 PM, Jacques Nadeau 
> wrote:
>
> > Re Compatibility:
> >
> > I actually don't even think 1.0 clients work with 1.6 server, do they?
> >
> > I would probably decrease the cross-compatibility requirement burden. A
> > nice goal would be cross compatibility across an extended series of
> > releases. However, given all the things we've learned in the last year,
> we
> > shouldn't try to maintain more legacy than is necessary. As such, I
> propose
> > that we consider the requirement of 2.0 to be:
> >
> > 1.lastX works with 2.firstX. (For example, if 1.8 is the last minor
> > release of the 1.x series, 1.8 would work with 2.0.)
> >
> > This simplifies testing (we don't have to worry about things like does
> 1.1
> > work with 2.3, etc) and gives people an upgrade path as they desire. This
> > also allows us to decide what pieces of the compatibility shim go in the
> > 2.0 server versus the 1.lastX client. (I actually lean towards allowing a
> > full break between v1 and v2 server/client but understand that that level
> > or coordination is hard in many organizations since analysts are separate
> > from IT). Hopefully, what I'm proposing can be a good compromise between
> > progress and deployment ease.
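The proposed rule ("1.lastX works with 2.firstX", plus the informal "no wire breaks within a major version") can be written as a small predicate. This is just a sketch of the policy as described; the names are hypothetical:

```java
// Sketch of the proposed compatibility rule. Within a major version,
// clients and servers are wire-compatible; across the 1.x/2.x boundary,
// only the last 1.x minor release pairs with 2.0. "lastMinorOf1x" is
// whichever minor turns out to be the final 1.x release (1.8 in the
// example above).
class WireCompat {
    static boolean compatible(int clientMajor, int clientMinor,
                              int serverMajor, int serverMinor,
                              int lastMinorOf1x) {
        if (clientMajor == serverMajor) {
            return true;  // informal rule: no wire breaks within a major
        }
        int olderMajor = Math.min(clientMajor, serverMajor);
        int newerMajor = Math.max(clientMajor, serverMajor);
        int olderMinor = clientMajor < serverMajor ? clientMinor : serverMinor;
        int newerMinor = clientMajor < serverMajor ? serverMinor : clientMinor;
        return olderMajor == 1 && newerMajor == 2
            && olderMinor == lastMinorOf1x && newerMinor == 0;
    }
}
```

This keeps the testing matrix small: only same-major pairs and the single 1.lastX/2.0 pair ever need certification.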
> >
> > Thoughts?
> >
> > Re: Branches/Dangers
> >
> > Good points on this Julian.
> >
> > How about this:
> >
> > - small fixes and enhancements PRs should be made against v1
> > - new feature PRs should be made against v2
> > - v2 should continue to always pass all precommit tests during its life
> > - v2 becomes master in two months
> >
> > I definitely don't want to create instability in the v2 branch.
> >
> > The other option I see is we can only do bug fix releases and branch the
> > current master into a maintenance branch and treat master as v2.
> >
> > Other ideas?
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Sat, Mar 26, 2016 at 6:07 PM, Julian Hyde  wrote:
> >
> >> Do you plan to be doing significant development on both the v1 and v2
> >> branches, and if so, for how long? I have been bitten badly by that
> pattern
> >> in the past. Developers put lots of unrelated, destabilizing changes
> into
> >> v2, it look longer than expected to stabilize v2, product management
> lost
> >> confidence in v2 and shifted resources back to v1, and v2 never caught
> up
> >> with v1.
> >>
> >> One important question: Which branch will you ask people to target for
> >> pull requests? v1, v2 or both? If they submit to v2, and v2 is broken,
> how
> >> will you know whether the patches are good?
> >>
> >> My recommendation is to choose one of the following: (1) put a strict
> >> time limit of say 2 months after which v2 would become the master branch
> >> (and v1 master would become a maintenance branch), or (2) make v2
> focused
> >> on a particular architectural feature; create multiple independent
> feature
> >> branches with breaking API changes if you need to.
> >>
> >> Julian
> >>
> >>
> >> > On Mar 26, 2016, at 1:41 PM, Paul Rogers 
> wrote:
> >> >
> >> > Hi All,
> >> >
> >> > 2.0 is a good opportunity to enhance our ZK information. See
> >> DRILL-4543: Advertise Drill-bit ports, status, capabilities in
> ZooKeeper.
> >> This change will simplify YARN integration.
> >> >
> >> > This enhancement will change the “public API” in ZK. To Parth’s point,
> >> we can do so in a way that old clients work - as long as a Drill-bit
> uses
> >> default ports.
> >> >

Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Neeraja Rentachintala
How is this handled for the MongoDB storage plugin, which I believe is a
case-sensitive DB as well?

On Mon, Mar 14, 2016 at 4:27 PM, Jacques Nadeau  wrote:

> I don't think it is that simple since there are some types of things that
> we can't push down that will cause inconsistent results.
>
> For example, assuming that all values of x are positive, the following two
> queries should return the same result
>
> select * from hbase where x = 5
> select * from hbase where abs(x) = 5
>
> However, if the field x is sometimes 'x' and sometimes 'X', we're going to
> different results between the first query and the second. That is why I
> think we need to guarantee that even when optimization rules fails, we have
> the same plan meaning. In essence, all plans should be valid. If you get to
> a place where a rule changes the data, then the original plan was
> effectively invalid.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Mar 14, 2016 at 3:46 PM, Jinfeng Ni  wrote:
>
> > Project pushdown should always happen. If you see project pushdown
> > does not happen for your HBase query, then it's a bug.
> >
> > However, if you submit two physical plans, one with project pushdown,
> > another one without project pushdown, but they return different
> > results for HBase query. I'll not call this a bug.
> >
> >
> >
> > On Mon, Mar 14, 2016 at 2:54 PM, Jacques Nadeau 
> > wrote:
> > > Agree with Zelaine, plan changes/optimizations shouldn't change
> results.
> > > This is a bug.
> > >
> > > Drill is focused on being case-insensitive, case-preserving. Each
> storage
> > > plugin implements its own case sensitivity policy when working with
> > > columns/fields and should be documented. It isn't practical to make
> HBase
> > > case-insensitive, so it behaves case-sensitively. DFS formats (as
> > > opposed to HBase) are entirely under Drill's control and thus target
> > > case-insensitive, case-preserving operation.
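To illustrate the "case-insensitive, case-preserving" behavior described here, a minimal sketch (not Drill's actual code): lookups ignore case, but the first-seen spelling of a column name is what is kept and reported back.

```java
import java.util.TreeMap;

// Sketch of case-insensitive, case-preserving column-name resolution.
// Illustration only; not Drill's actual implementation.
class ColumnNames {
    private final TreeMap<String, Integer> byName =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Register a column; the first spelling seen wins.
    void add(String name, int index) {
        byName.putIfAbsent(name, index);
    }

    // Case-insensitive lookup; returns -1 if absent.
    int indexOf(String name) {
        Integer i = byName.get(name);
        return i == null ? -1 : i;
    }

    // Returns the spelling the column was first registered with, or null.
    String preservedSpelling(String name) {
        String stored = byName.ceilingKey(name);
        return (stored != null && byName.comparator().compare(stored, name) == 0)
            ? stored : null;
    }
}
```

A case-sensitive source like HBase would instead do exact-match lookups, which is exactly the divergence discussed in this thread.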
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Mar 14, 2016 at 2:43 PM, Jinfeng Ni 
> > wrote:
> > >
> > >> Abhishek
> > >>
> > >> Great question. Here is what I understand regarding the case sensitive
> > >> policy.
> > >>
> > >> Drill's case sensitivity policy (case insensitive and case preserving)
> > >> applies to the execution engine in Drill; it does not enforce the case
> > >> sensitivity policy on all storage plugins. A storage plugin could
> > >> decide and implement its own policy.
> > >>
> > >> Why would the pushdown impact the case sensitivity when query HBase?
> > >> Without project pushdown, HBase storage plugin will return all the
> > >> data, and it's up to Drill's execution Project operator to apply the
> > >> case insensitive policy.  With the project pushdown, Drill will pass
> > >> the list of column names to HBase storage plugin, and HBase decides to
> > >> apply its case sensitivity policy when scanning the data.
> > >>
> > >> Adding an option to make case sensitive storage plugin honor case
> > >> insensitive policy seems to be a good idea. The question is whether
> > >> the underneath storage (like HBase) will support such mode.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Mon, Mar 14, 2016 at 2:09 PM, Zelaine Fong 
> > wrote:
> > >> > Abhishek,
> > >> >
> > >> > I guess you're arguing that Drill's current behavior of honoring the
> > case
> > >> > sensitive nature of the underlying data source (in this case, HBase
> > and
> > >> > MapR-DB) will be confusing for Drill users who are accustomed to
> > Drill's
> > >> > case insensitive behavior.
> > >> >
> > >> > I can see arguments both ways.
> > >> >
> > >> > But the part I think is confusing is that the behavior differs
> > depending
> > >> on
> > >> > whether or not projections and filters are pushed down to the data
> > >> source.
> > >> > If the push down is done, then the behavior is case sensitive
> > >> > (corresponding to the data source).  But if pushdown doesn't happen,
> > then
> > >> > the behavior is case insensitive.  That difference seems
> inconsistent
> > and
> > >> > undesirable -- unless you argue that there are instances where you
> > would
> > >> > want one behavior vs the other.  But it seems like that should be
> > >> > orthogonal and separate from whether pushdowns are applied.
> > >> >
> > >> > -- Zelaine
> > >> >
> > >> > On Mon, Mar 14, 2016 at 1:40 AM, Abhishek Girish 
> > >> wrote:
> > >> >
> > >> >> Hello all,
> > >> >>
> > >> >> As I understand, Drill by design is case-insensitive, w.r.t column
> > names
> > >> >> within a table or file [1]. While this provides great flexibility
> and
> > >> works
> > >> >> well with many data-sources, there are issues when working with
> > >> >> case-sensitive data-sources such as HBase / MapR-DB.
> > >> >>
> > >> >> Consider the following JSON file:
> > >> >>
> > >> >> {"_id": "ID1",
> > 

Re: Parallelization & Threading

2016-02-29 Thread Neeraja Rentachintala
Jacques
Can you provide more context on what user/customer problem the changes you &
Hanifi discussed are trying to solve?
Is it part of better resource utilization, concurrency/multi-tenancy
handling, or both?
It will help to understand that as background for the discussion.

-Neeraja

On Mon, Feb 29, 2016 at 9:36 PM, Jacques Nadeau  wrote:

> Hanifi and I had a great conversation late last week about how Drill
> currently provides parallelization. Hanifi suggested we move to a model
> whereby there is a fixed threadpool for all Drill work and we treat all
> operator and/or fragment operations as tasks that can be scheduled within
> that pool. This would serve the following purposes:
>
> 1. reduce the number of threads that Drill creates
> 2. Decrease wasteful context switching (especially in high concurrency
> scenarios)
> 3. Provide more predictable slas for Drill infrastructure tasks such as
> heartbeats/rpc and cancellations/planning and queue management/etc (a key
> hot-button for Vicki :)
>
> For reference, this is already the threading model we use for the RPC
> threads and is a fairly standard asynchronous programming model. When
> Hanifi and I met, we brainstormed on what types of changes might need to be
> done and ultimately thought that in order to do this, we'd realistically
> want to move iterator trees from a pull model to a push model within a
> node.
>
> After spending more time thinking about this idea, I had the following
> thoughts:
>
> - We could probably accomplish the same behavior staying with a pull model
> and using IterOutcome.NOT_YET to return.
> - In order for this to work effectively, all code would need to be
> non-blocking (including reading from disk, writing to socket, waiting for
> zookeeper responses, etc)
> - Task length (or coarseness) would need to be quantized appropriately.
> While operating at the RootExec.next() level might be attractive, it is too
> coarse to get reasonable sharing and we'd need to figure out ways to have
> time-based exit within operators.
> - With this approach, one of the biggest challenges would be reworking all
> the operators to be able to unwind the stack to exit execution (to yield
> their thread).
>
> Given those challenges, I think there may be another, simpler solution that
> could cover items 2 & 3 above without dealing with all the issues that we
> would have to deal with in the proposal that Hanifi suggested. At its core,
> I see the biggest issue is dealing with the unwinding/rewinding that would
> be required to move between threads. This is very similar to how we needed
> to unwind in the case of memory allocation before we supported realloc and
> causes substantial extra code complexity. As such, I suggest we use a pause
> approach that uses something similar to a semaphore for the number of
> active threads we allow. This could be done using the existing
> shouldContinue() mechanism where we suspend or reacquire thread use as we
> pass through this method. We'd also create some alternative shouldContinue
> methods such as shouldContinue(Lock toLock) and shouldContinue(Queue
> queueToTakeFrom), etc so that shouldContinue would naturally wrap blocking
> calls with the right logic. This would be a fairly simple set of changes
> and we could see how well it improves issues 2 & 3 above.
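A minimal sketch of the suggested semaphore-style gate follows. The class and method names are hypothetical; the point is only the shape of the idea: bound the number of actively-running fragment threads, and have blocking calls give up their slot while they wait.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: a semaphore bounds how many fragment threads run
// CPU work at once; blocking calls yield the slot while they wait.
class ThreadGate {
    private final Semaphore active;

    ThreadGate(int maxActiveThreads) {
        this.active = new Semaphore(maxActiveThreads);
    }

    // Acquire a slot before doing CPU work (analogous to shouldContinue()).
    void enter() {
        active.acquireUninterruptibly();
    }

    // Release the slot when the fragment finishes.
    void exit() {
        active.release();
    }

    // Wrap a blocking call (lock wait, queue take, disk read): yield our
    // slot while blocked so another fragment can run, then take it back
    // before resuming CPU work.
    <T> T whileBlocked(Supplier<T> blockingCall) {
        active.release();
        try {
            return blockingCall.get();
        } finally {
            active.acquireUninterruptibly();
        }
    }
}
```

This mirrors the shouldContinue(Lock), shouldContinue(Queue) variants proposed above: each wraps a specific blocking primitive with the release/reacquire pattern.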
>
> On top of this, I think we still need to implement automatic
> parallelization scaling of the cluster. Even a rudimentary monitoring of
> cluster load and parallel reduction of max_width_per_node would
> substantially improve the behavior of the cluster under heavy concurrent
> loads. (And note, I think that this is required no matter what we implement
> above.)
>
> Thoughts?
> Jacques
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>


Re: [DISCUSS] New Feature: Drill Client Impersonation

2016-02-23 Thread Neeraja Rentachintala
Norris
Quick comment on your point below. The username/password passed currently
on the connection string is for authentication purposes and also used for
impersonation in the case of a direct connection from a BI tool to a Drillbit.
That continues to exist, but now the driver needs to be extended to pass an
*additional* user name as part of the connection; this represents the end
user's identity, on behalf of which Drill will execute queries (there is an
intermediate hop via the BI server which we are trying to support).
Sudheesh's doc has specifics on the proposal.
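On the client side, the proposal amounts to something like the JDBC sketch below. The property name carrying the end user's identity ("impersonation_target" here) is an assumption, since the API was still under design in this thread.

```java
import java.util.Properties;

// Sketch of the proposed connection setup: the BI server authenticates as
// itself (the proxy user) and passes the end user's identity as an extra
// connection property. "impersonation_target" is an assumed property name,
// not settled API at the time of this thread.
class ImpersonatedConnectionProps {
    static Properties forEndUser(String proxyUser, String proxyPassword,
                                 String endUser) {
        Properties props = new Properties();
        props.setProperty("user", proxyUser);          // authenticated identity
        props.setProperty("password", proxyPassword);
        props.setProperty("impersonation_target", endUser);  // queries run as this user
        return props;
    }

    // Usage against a live cluster would look like:
    // Connection conn = DriverManager.getConnection(
    //     "jdbc:drill:zk=zk1:2181/drill/cluster1",
    //     forEndUser("biserver", "secret", "alice"));
}
```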

On Tue, Feb 23, 2016 at 11:52 AM, Norris Lee <norr...@simba.com> wrote:

> ODBC does not have any standard way to change the user for a connection,
> so like Sudheesh mentioned, I'm not sure how this would be exposed to the
> application. I believe some other databases like SQLServer let you change
> the user via SQL.
>
> With regards to interfacing the impersonation feature, it looks like all
> you need is the username, which is already being passed down from the
> application to the client via the driver.
>
> Norris
>
> -Original Message-
> From: Sudheesh Katkam [mailto:skat...@maprtech.com]
> Sent: Tuesday, February 23, 2016 8:49 AM
> To: u...@drill.apache.org
> Cc: dev <dev@drill.apache.org>
> Subject: Re: [DISCUSS] New Feature: Drill Client Impersonation
>
> > Do you have an interface proposal? I didn't see that.
>
> Are you referring to the Drill client interface to used by applications?
>
> > Also, what do you think about my comment and Keys response about moving
> pooling to the Driver and then making "connection" lightweight.
>
> An API to change the user on a connection can be easily added later (for
> now, we use a connection property). Since Drill connections are already
> lightweight, this is not an immediate problem. Unlike OracleConnection <
> https://docs.oracle.com/cd/B28359_01/java.111/b31224/proxya.htm#BABEJEIA>,
> JDBC/ ODBC do not have a provision for proxy sessions in their
> specification, so I am not entirely clear how we would expose “change user
> on connection” to applications using these API.
>
> > Connection level identity setting is only viable if the scalability
> concerns I raised in the doc and Jacques indirectly raised are addressed.
> >
> > Historically DB connections have been so expensive that most
> applications created pools of connections and reused them across users.
> That model doesn't work if each connection is tied to a single user. That's
> why the typical implementation has provided for changing the identity on an
> existing connection.
> >
> > Now, if the Drill connection is a very lightweight object (possibly
> mapping to a single heavier weight hidden process level object), then tying
> identity to the connection is fine. I don't know enough about the Drill
> architecture to comment on that but I think a good rule of thumb would be
> "is it reasonable to keep 50+ Drill connections open where each has a
> different user identity?" If the answer is no, then the design needs to
> consider the scale. I'll also add that much further in the future if/when
> Drill takes on more operational types of access that 50 connections will
> rise to a much larger number.
>
>
> Thank you,
> Sudheesh
>
> > On Feb 22, 2016, at 2:27 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> >
> > Got it, makes sense.
> >
> > Do you have an interface proposal? I didn't see that.
> >
> > Also, what do you think about my comment and Keys response about
> > moving pooling to the Driver and then making "connection" lightweight.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Feb 22, 2016 at 9:59 AM, Sudheesh Katkam
> > <skat...@maprtech.com>
> > wrote:
> >
> >> “… when creating this connection, as part of the connection
> >> properties (JDBC, C++ Client), the application passes the end user’s
> identity (e.g.
> >> username) …”
> >>
> >> I had written the change user as a session option as part of the
> >> enhancement only, where you’ve pointed out a better way. I addressed
> >> your comments on the doc.
> >>
> >> Thank you,
> >> Sudheesh
> >>
> >>> On Feb 22, 2016, at 9:49 AM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
> >>>
> >>> Maybe I misunderstood the design document.
> >>>
> >>> I thought this was how the

Re: [DISCUSS] New Feature: Drill Client Impersonation

2016-02-22 Thread Neeraja Rentachintala
It seems to me that for phase 1, we should only have this as a
connection-level property and have the list of proxy users as a static
bootstrap option. Drill doesn't have a very granular privilege model other
than admins vs. non-admins, so until then exposing this via system options seems
like a risk to me from a security standpoint.

-Neeraja

On Mon, Feb 22, 2016 at 9:59 AM, Sudheesh Katkam <skat...@maprtech.com>
wrote:

> “… when creating this connection, as part of the connection properties
> (JDBC, C++ Client), the application passes the end user’s identity (e.g.
> username) …”
>
> I had written the change user as a session option as part of the
> enhancement only, where you’ve pointed out a better way. I addressed your
> comments on the doc.
>
> Thank you,
> Sudheesh
>
> > On Feb 22, 2016, at 9:49 AM, Jacques Nadeau <jacq...@dremio.com> wrote:
> >
> > Maybe I misunderstood the design document.
> >
> > I thought this was how the user would be changed: "Provide a way to
> change
> > the user after the connection is made (details) through a session option"
> >
> > Did I miss something?
> >
> >
> >
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Feb 22, 2016 at 9:06 AM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> >
> >> Jacques,
> >> I think the current proposal by Sudheesh is an API level change to pass
> >> this additional end user id during the connection establishment.
> >> Can you elaborate what you mean by random query.
> >>
> >> -Neeraja
> >>
> >> On Sun, Feb 21, 2016 at 5:07 PM, Jacques Nadeau <jacq...@dremio.com>
> >> wrote:
> >>
> >>> Sudheesh, thanks for putting this together. Reviewing Oracle
> >> documentation,
> >>> they expose this at the API level rather than through a random query. I
> >>> think we should probably model after that rather than invent a new
> >>> mechanism. This also means we can avoid things like query parsing,
> >>> execution roundtrip, query profiles, etc to provide this functionality.
> >>>
> >>> See here:
> >>>
> >>>
> https://docs.oracle.com/cd/B28359_01/java.111/b31224/proxya.htm#BABEJEIA
> >>>
> >>> --
> >>> Jacques Nadeau
> >>> CTO and Co-Founder, Dremio
> >>>
> >>> On Fri, Feb 19, 2016 at 2:18 PM, Keys Botzum <kbot...@maprtech.com>
> >> wrote:
> >>>
> >>>> This is a great feature to add to Drill and I'm excited to see design
> >> on
> >>>> it starting.
> >>>>
> >>>> The ability for an intermediate server that is likely already
> >>>> authenticating end users, to send end user identity down to Drill adds
> >> a
> >>>> key element into an end to end secure design by enabling Drill and the
> >>> back
> >>>> end systems to see the real user and thus perform meaningful
> >>> authorization.
> >>>>
> >>>> Back when I was building many JEE applications I know the DBAs were
> >> very
> >>>> frustrated that the application servers blinded them to the identity
> of
> >>> the
> >>>> end user accessing important corporate data. When JEE application
> >> servers
> >>>> and databases finally added the ability to impersonate that addressed
> a
> >>> lot
> >>>> of security concerns. Of course this isn't a perfect solution and I'm
> >>> sure
> >>>> others will recognize that in some scenarios impersonation isn't the
> >> best
> >>>> approach, but having that as an option in Drill is very valuable.
> >>>>
> >>>> Keys
> >>>> ___
> >>>> Keys Botzum
> >>>> Senior Principal Technologist
> >>>> kbot...@maprtech.com <mailto:kbot...@maprtech.com>
> >>>> 443-718-0098
> >>>> MapR Technologies
> >>>> http://www.mapr.com <http://www.mapr.com/>
> >>>>> On Feb 19, 2016, at 4:49 PM, Sudheesh Katkam <skat...@maprtech.com>
> >>>> wrote:
> >>>>>
> >>>>> Hey y’all,
> >>>>>
> >>>>> I plan to work on DRILL-4281 <
> >>>> https://issues.apache.org/jira/browse/DRILL-4281>: support for
> >>>> inbound/client impersonation. Please review the design document <
> >>>>
> >>>
> >>
> https://docs.google.com/document/d/1g0KgugVdRbbIxxZrSCtO1PEHlvwczTLDb38k-npvwjA
> >>>> ,
> >>>> which is open for comments. There is also a link to proof-of-concept
> >>>> (slightly hacky).
> >>>>>
> >>>>> Thank you,
> >>>>> Sudheesh
> >>>>
> >>>>
> >>>
> >>
>
>
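The phase-1 proposal above (a connection-level property plus a static bootstrap list of authorized proxy users) amounts to a simple admission check at connection time. A minimal sketch of that check, with illustrative names only, not Drill's actual implementation:

```python
# Illustrative sketch of a static proxy-user whitelist, as proposed for
# phase 1. Names and structure are assumptions, not Drill's API.

AUTHORIZED_PROXY_USERS = {"bi_server", "app_gateway"}  # set at bootstrap

def resolve_effective_user(service_user, requested_user=None):
    """Return the identity queries should run as.

    The connecting (service) user may act on behalf of requested_user
    only if it appears on the static whitelist; otherwise the
    impersonation attempt is rejected.
    """
    if requested_user is None or requested_user == service_user:
        return service_user
    if service_user not in AUTHORIZED_PROXY_USERS:
        raise PermissionError(
            "%s is not authorized to impersonate %s"
            % (service_user, requested_user))
    return requested_user
```

A real implementation would also record both identities (service user and effective end user) for auditing.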


Re: [DISCUSS] New Feature: Drill Client Impersonation

2016-02-22 Thread Neeraja Rentachintala
Jacques,
I think the current proposal by Sudheesh is an API-level change to pass
this additional end-user id during connection establishment.
Can you elaborate on what you mean by 'random query'?

-Neeraja

On Sun, Feb 21, 2016 at 5:07 PM, Jacques Nadeau  wrote:

> Sudheesh, thanks for putting this together. Reviewing Oracle documentation,
> they expose this at the API level rather than through a random query. I
> think we should probably model after that rather than invent a new
> mechanism. This also means we can avoid things like query parsing,
> execution roundtrip, query profiles, etc to provide this functionality.
>
> See here:
>
> https://docs.oracle.com/cd/B28359_01/java.111/b31224/proxya.htm#BABEJEIA
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Feb 19, 2016 at 2:18 PM, Keys Botzum  wrote:
>
> > This is a great feature to add to Drill and I'm excited to see design on
> > it starting.
> >
> > The ability for an intermediate server that is likely already
> > authenticating end users, to send end user identity down to Drill adds a
> > key element into an end to end secure design by enabling Drill and the
> back
> > end systems to see the real user and thus perform meaningful
> authorization.
> >
> > Back when I was building many JEE applications I know the DBAs were very
> > frustrated that the application servers blinded them to the identity of
> the
> > end user accessing important corporate data. When JEE application servers
> > and databases finally added the ability to impersonate that addressed a
> lot
> > of security concerns. Of course this isn't a perfect solution and I'm
> sure
> > others will recognize that in some scenarios impersonation isn't the best
> > approach, but having that as an option in Drill is very valuable.
> >
> > Keys
> > ___
> > Keys Botzum
> > Senior Principal Technologist
> > kbot...@maprtech.com 
> > 443-718-0098
> > MapR Technologies
> > http://www.mapr.com 
> > > On Feb 19, 2016, at 4:49 PM, Sudheesh Katkam 
> > wrote:
> > >
> > > Hey y’all,
> > >
> > > I plan to work on DRILL-4281 <
> > https://issues.apache.org/jira/browse/DRILL-4281>: support for
> > inbound/client impersonation. Please review the design document <
> >
> https://docs.google.com/document/d/1g0KgugVdRbbIxxZrSCtO1PEHlvwczTLDb38k-npvwjA
> >,
> > which is open for comments. There is also a link to proof-of-concept
> > (slightly hacky).
> > >
> > > Thank you,
> > > Sudheesh
> >
> >
>


Re: resource manager proposal -- initial write-ups

2016-01-20 Thread Neeraja Rentachintala
Hanifi,
thanks for putting this together.
Seems like the doc has only view rights.
Can you share it so that we can add comments?

-thanks

On Wed, Jan 20, 2016 at 10:59 AM, Hanifi GUNES 
wrote:

> Folks,
>
> I have been working on designing the new resource manager. I have a
> *moving*
> design document living at [1]. Note that this is purely a work-in-progress,
> has a lot of incomplete pieces. In the meantime, however, you can get the
> broad idea of what we are targeting there as well as dropping your feedback
> on the side for further clarification, suggestions etc.
>
>
> Cheers.
> -Hanifi
>
> 1: https://goo.gl/rpcjVR
>


Re: [VOTE] Release Apache Drill 1.3.0 (rc2)

2015-11-12 Thread Neeraja Rentachintala
Same question as Steven. What is the impact? It seemed from the bug that the
impact is both wrong results and partition pruning not working. Drill is a
generally available project now, and there are users who created millions of
Parquet files using previous versions of Drill.
Hence I think this is a release blocker unless there is a clean path and
proper testing on how users migrate to new versions.

-thanks

On Thu, Nov 12, 2015 at 10:43 AM, Steven Phillips  wrote:

> Does DRILL-4070 cause incorrect results? Or just prevent partition pruning?
>
> On Thu, Nov 12, 2015 at 10:32 AM, Jason Altekruse <
> altekruseja...@gmail.com>
> wrote:
>
> > I just commented on the JIRA, we are behaving correctly for newly created
> > parquet files. I did confirm the failure to prune on auto-partitioned
> files
> > created by 1.2. I do not think this is a release blocker, because I do
> not
> > think we can solve this in Drill code without risking wrong results over
> > parquet files written by other tools. I do support the creation of a
> > migration utility for existing files written by Drill 1.2, but this can
> be
> > released independent of 1.3.
> >
> >
> > On Thu, Nov 12, 2015 at 10:26 AM, Jinfeng Ni 
> > wrote:
> >
> > > Agree with Aman that DRILL-4070 is a show stopper. Parquet is the
> > > major data source Drill uses. If this release candidate breaks the
> > > backward compatibility of partitioning pruning for the parquet files
> > > created with prior release of Drill, it could cause serious problem
> > > for the current Drill user.
> > >
> > > -1
> > >
> > >
> > >
> > > On Thu, Nov 12, 2015 at 10:10 AM, rahul challapalli
> > >  wrote:
> > > > -1 (non-binding)
> > > > The nature of the issue (DRILL-4070) demands adequate testing even
> > with a
> > > > workaround in place.
> > > >
> > > > On Thu, Nov 12, 2015 at 9:32 AM, Aman Sinha 
> > > wrote:
> > > >
> > > >> Given this issue, I would be a -1  unfortunately.
> > > >>
> > > >> On Thu, Nov 12, 2015 at 8:42 AM, Aman Sinha 
> > > wrote:
> > > >>
> > > >> > Can someone familiar with the parquet changes take a look at
> > > DRILL-4070 ?
> > > >> > It seems to break backward compatibility.
> > > >> >
> > > >> > On Tue, Nov 10, 2015 at 9:51 PM, Jacques Nadeau <
> jacq...@dremio.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> >> Hey Everybody,
> > > >> >>
> > > >> >> I'd like to propose a new release candidate of Apache Drill,
> > version
> > > >> >> 1.3.0.  This is the third release candidate (rc2).  This
> addresses
> > > some
> > > >> >> issues identified in the the second release candidate including
> > some
> > > >> test
> > > >> >> issues & rpc concurrency issues.
> > > >> >>
> > > >> >> The tarball artifacts are hosted at [2] and the maven artifacts
> are
> > > >> hosted
> > > >> >> at [3]. This release candidate is based on commit
> > > >> >> 13ab6b1f9897ebcf9179407ffaf84b79b0ee95a1 located at [4].
> > > >> >> The vote will be open for 72 hours ending at 10PM Pacific,
> November
> > > 13,
> > > >> >> 2015.
> > > >> >>
> > > >> >> [ ] +1
> > > >> >> [ ] +0
> > > >> >> [ ] -1
> > > >> >>
> > > >> >> thanks,
> > > >> >> Jacques
> > > >> >>
> > > >> >> [1]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
> > > >> >> [2]http://people.apache.org/~jacques/apache-drill-1.3.0.rc2/
> > > >> >> [3]
> > > >> >>
> > >
> https://repository.apache.org/content/repositories/orgapachedrill-1013/
> > > >> >> [4] https://github.com/jacques-n/drill/tree/drill-1.3.0
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> Jacques Nadeau
> > > >> >> CTO and Co-Founder, Dremio
> > > >> >>
> > > >> >
> > > >> >
> > > >>
> > >
> >
>


Re: Request for more feedback on "Support the Ability to Identify And Skip Records" design

2015-10-27 Thread Neeraja Rentachintala
Jacques
Thanks for the details.
I am trying to understand what the difference is between 3 & 4.
Here is how I am thinking of the scenario. It's probably better to discuss
this in the hangout.

I have some data coming from an external system and I expect it to be in a
certain format. I checked the first couple of rows and they seem to stick to
the format. I have written a Drill query or a view to interpret the data
(for example, converting certain fields to a date or timestamp, casting to a
specific type, etc.). However, certain records seem to be corrupted, such as
being prefixed with non-printable characters. I need special handling for
these records (for example, I want to identify what these are so I can either
fix them or choose to skip them). I am still in the data exploration phase at
this point.

There is an extension use case of ETL/Data import where I probably have
millions of text files coming in and I am using Drill to convert all of
this to Parquet using CTAS. Some of the records in these files could be
corrupted and I need special handling for these (potentially skip them or
move them to a separate file) without interrupting the whole data
conversion.

-Neeraja

On Tue, Oct 27, 2015 at 8:52 AM, Jacques Nadeau <jacq...@dremio.com> wrote:

> There seem to be multiple user requirements that are being considered in
> Hsuan & Juliens' proposals:
>
> 1. Drill doesn't have enough information to parse my data, I want to give
> Drill help. (Examples might be: the field delimiter is "|", the proto IDL
> encoding for a protobuf file is "...", provide an external avro schema )
> 2. While Drill can parse my data, the structure output is incomplete. It
> may be missing field types and/or field names. I want to tell Drill how to
> interpret that data since the format itself doesn't provide an adequate way
> to express this (typically text files as opposed to json, parquet)
> 3. I've defined an expected structure to my data files. If some records
> don't match that, I want to have special handling to manage those records
> (e.g. drop, warn number of drops, create separate file with provenance of
> each failing record)
> 4. I have an arbitrary query and I want any data-specific execution
> failures to be squelched to allow the query to complete with whatever data
> remains.
>
> My recommendation is that we have three new features:
>
> A. table with options (what julien is working on)
> B. .drill files (https://issues.apache.org/jira/browse/DRILL-3572)
> C. alter table ascribe metadata (to create a .drill file through sql)
> D. Support using table with options (A) to override settings in .drill (B)
>
> I believe that A & B (and C since it is simply a derivative of B) should
> provide the capability to achieve requirements 1-3 above.
>
> When Neeraja talks of the exploration use case, feature A is probably the
> most common way that people will do this. In the case of use case 3 above,
> if someone wants to use a "recordPositionAndError" behavior (see
> DRILL-3572), they will most likely want to do that in the context of a
> query (as opposed to a view or .drill).  As such, you would probably create
> a .drill file that did warn or ignore. Then layer over the top (via feature
> D) a recordPositionAndError if you want that for a certain situation.
>
> My main thought on Hsuan's initial proposal is it seems to try to provide
> an incomplete resolution of #4 above. It isn't clear to me that use case #4
> is a critical use case for most users. If it is, can we get some concrete
> examples of it as opposed to use cases 1-3? If it is a critical use case, I
> think we should solve it in a more general way (for example I don't think
> we should try to maintain file-based record provenance in that context).
> Among other things, the current proposal has the weird problem of not being
> consistent in how the user experiences the behavior (depending on what plan
> Drill decides to execute.)
>
> Note, there were some questions about how 1-3 could be solved using B so
> I've provided an example in the Jira:
> https://issues.apache.org/jira/browse/DRILL-3572
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Oct 26, 2015 at 4:09 PM, Zelaine Fong <zf...@maprtech.com> wrote:
>
> > My understanding of Jacques' proposal is that he suggests we use .drill
> > instead of requiring the user to do an explicit cast in their select
> > query.  That way, the changes for enhancement would be restricted to the
> > scanner.
> >
> > Did I interpret the alternative approach correctly?
> >
> > -- Zelaine
> >
> > On Mon, Oct 26, 2015 at 4:05 PM, Hsuan Yi Chu <hyi...@maprtech.com>
> wrote:
> >
> > > Hi,
> > >
> > > Luckily, we

Re: Request for more feedback on "Support the Ability to Identify And Skip Records" design

2015-10-26 Thread Neeraja Rentachintala
Jacques
I have responded to one of your comments on the doc.
Can you please review and comment? I am not clear on the approach you are
suggesting using .drill, and what that would mean for the user experience. It
would be great if you could add an example.

Similar to the other thread (initiated by Julien) about being able to
provide file-parsing hints from the query itself for self-service data
exploration, we need this feature to be fairly lightweight from a
user-experience point of view. That is, if I as a business user get hold of
some external data and want to take a look by running ad hoc queries in
Drill, I should be able to do it without having to go through the whole setup
of .drill etc., which will come later as the data is 'operationalized'.

thanks
-Neeraja

On Mon, Oct 26, 2015 at 2:49 PM, Jacques Nadeau  wrote:

> Hsuan was kind enough to put together a provocative discussion on the
> mailing list about skipping records. I've started a way too long thread in
> the comments discussion but would like to get other feedback from the
> community. The main point of contention I have is that the big goal of this
> design is to provide "data import" like capabilities for Drill. In that
> context, I suggested a scan based approach to schema enforcement (and bad
> record capture/storage). I think it is a simpler approach and solves the
> vast majority of user needs. Hsuan's initial proposal was a much broader
> reaching proposal that supports an arbitrary number of expression types
> within project and filter (assuming they are proximate to the scan).
>
> Would love to get others feedback and thoughts on the doc to what the MVP
> for this feature really is.
>
>
> https://docs.google.com/document/d/1jCeYW924_SFwf-nOqtXrO68eixmAitM-tLngezzXw3Y/edit
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
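The record-handling behaviors discussed above (drop, fail, or capture failing records with provenance) can be sketched as a reader loop. This is an illustrative sketch only; the mode names and structure are assumptions, not the proposed Drill design:

```python
# Illustrative sketch of "identify and skip records" behaviors:
#   mode="fail":    raise on the first bad record (abort the query)
#   mode="skip":    silently drop bad records
#   mode="capture": keep going, returning bad records with provenance

def read_records(lines, parse, mode="fail"):
    """Parse each line; handle parse failures according to `mode`."""
    good, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            good.append(parse(line))
        except Exception as e:
            if mode == "fail":
                raise
            if mode == "capture":
                bad.append((lineno, line, str(e)))  # provenance of the failure
    return good, bad

# Hypothetical pipe-delimited input with one corrupted record.
rows = ["1|a", "2|b", "oops", "4|d"]
parse = lambda s: (int(s.split("|")[0]), s.split("|")[1])
```

Note that the captured tuples keep the provenance (line number, raw record, error) that the "create a separate file with provenance of each failing record" behavior would need.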


Re: [VOTE] Release Apache Drill 1.2.0 RC3

2015-10-14 Thread Neeraja Rentachintala
+1 (non-binding)
Downloaded the tar and exercised the yelp tutorials.

On Wed, Oct 14, 2015 at 3:33 PM, Jinfeng Ni  wrote:

> +1
>
> 1. Download source and binary tar ball. Do a full maven build from source.
> 2. Run yelp queries listed at Apache Drill doc [1].
> 3. Checked profiles through Web UI.
> 4. Verify checksum for both source and binary tar ball files.
>
> All look good.
>
> Thanks,
>
> Jinfeng
>
> [1]. https://drill.apache.org/docs/analyzing-the-yelp-academic-dataset/
>
>
> On Wed, Oct 14, 2015 at 1:08 PM, Venki Korukanti
>  wrote:
> > +1.
> >
> > Built from source
> > Installed on 3 node cluster
> > Ran few queries from sqlline and WebUI
> > Ran few checks and queries to verify HTTPS on Web UI and Hive native
> > parquet reader are working.
> >
> > Thanks
> > Venki
> >
> > On Wed, Oct 14, 2015 at 1:04 PM, Parth Chandra 
> wrote:
> >
> >> +1.
> >>
> >> Downloaded source. Verified checksums
> >> Built from source (MacOS)
> >> Built C++ client from source (MacOS)
> >> Tested multiple parallel queries (both sync and async APIs) via C++
> client
> >> query submitter. Tested cancel from the c++client.
> >>
> >> Looks good.
> >>
> >>
> >>
> >>
> >> On Mon, Oct 12, 2015 at 7:28 AM, Abdel Hakim Deneche <
> >> adene...@maprtech.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I propose a fourth release candidate of Apache Drill 1.2.0
> >> >
> >> > The tarball artifacts are hosted at [1] and the maven artifacts are
> >> hosted
> >> > at [2].
> >> >
> >> > The vote will be open for the next 72 hours ending at 8AM Pacific,
> >> October
> >> > 15, 2015.
> >> >
> >> > [ ] +1
> >> > [ ] +0
> >> > [ ] -1
> >> >
> >> > Here is my vote:
> >> >
> >> > +1
> >> >
> >> > thanks,
> >> > Hakim
> >> >
> >> > [1] http://people.apache.org/~adeneche/apache-drill-1.2.0-rc3/
> >> > [2]
> >> https://repository.apache.org/content/repositories/orgapachedrill-1009
> >> >
> >> > --
> >> >
> >> > Abdelhakim Deneche
> >> >
> >> > Software Engineer
> >> >
> >> >   
> >> >
> >> >
> >> > Now Available - Free Hadoop On-Demand Training
> >> > <
> >> >
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >> > >
> >> >
> >>
>


Re: [VOTE] Release Apache Drill 1.2.0 RC2

2015-10-09 Thread Neeraja Rentachintala
+1 (non-binding)

Tested embedded mode using yelp JSON tutorials from
https://drill.apache.org/docs/tutorials-introduction/

thanks
Neeraja

On Thu, Oct 8, 2015 at 2:08 PM, Abdel Hakim Deneche 
wrote:

> Hi all,
>
> I'm enjoying the release management so much that I decided to propose a
> third RC of Apache Drill 1.2.0
>
> The tarball artifacts are hosted at [1] and the maven artifacts are hosted
> at [2].
>
> The vote will be open for the next 72 hours ending at 2PM Pacific, October
> 11, 2015.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> thanks,
> Hakim
>
> [1] http://people.apache.org/~adeneche/apache-drill-1.2.0-rc2/
> [2] https://repository.apache.org/content/repositories/orgapachedrill-1008
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>


Re: Refresh Table Metadata : Cache file owner

2015-09-21 Thread Neeraja Rentachintala
Yes, having the Drill process be the owner of the metadata cache makes sense,
since the goal of the cache is to speed up planning time.
However, Drill should not leak any information the user doesn't have access
to as a result of this.
Specifically, we need to ensure that metadata queries (and queries in
general) with impersonation enabled return consistent results whether the
cache is enabled or not.

-Neeraja


On Mon, Sep 21, 2015 at 10:05 PM, Jacques Nadeau  wrote:

> I don't know if this is how it was specified but this makes sense to me.
> The metadata cache is Drill's information, not the user's.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Sep 21, 2015 at 9:52 PM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
> > This is with impersonation enabled.
> >
> > On Mon, Sep 21, 2015 at 7:39 PM, mehant baid 
> > wrote:
> >
> > > Is impersonation enabled when you perform the refresh?
> > >
> > > On Monday, September 21, 2015, rahul challapalli <
> > > challapallira...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > With the newly checked-in refresh metadata cache feature, I see that
> > the
> > > > cache file is always created as the user who started the drillbit
> > process
> > > > and has nothing to do with the user who has issued the "refresh table
> > > > metadata" command. Can someone from the dev verify this?
> > > >
> > > > - Rahul
> > > >
> > >
> >
>
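The consistency requirement above can be stated as an invariant: under impersonation, planning from the drillbit-owned cache must yield exactly the files the user could see without the cache. A minimal, purely illustrative sketch of that invariant (not Drill's implementation):

```python
# Illustrative sketch: the metadata cache is owned by the drillbit and may
# list files the end user cannot read, so the planner must always intersect
# it with the user's own visible set to avoid leaking information.

def plan_files(all_table_files, user_visible, cache=None):
    """Return the file list the planner would use for this user.

    `cache` is a drillbit-owned snapshot of the table's files; with or
    without it, only files visible to the user may be planned.
    """
    source = cache if cache is not None else all_table_files
    return sorted(f for f in source if f in user_visible)

# Hypothetical table with one file the impersonated user cannot read.
table = ["a.parquet", "b.parquet", "secret.parquet"]
alice_sees = {"a.parquet", "b.parquet"}
```

The key property is that the cache never widens what a user can see; it only speeds up enumerating it.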


Re: [DISCUSS] Drop table support

2015-08-05 Thread Neeraja Rentachintala
I think enabling drop only when security is enabled is too restrictive.

On Wed, Aug 5, 2015 at 12:46 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com
 wrote:

  To prevent any catastrophic drops checking for homogenous file formats
  makes sure that at least the directory being dropped is something that
 can
  be read by Drill.


 Or we could just disable drop unless permissions can be enforced.



Re: [DISCUSS] Drop table support

2015-08-05 Thread Neeraja Rentachintala
Another question/comment.

Does Drill need to manage concurrency for DROP TABLE, i.e., how do you
deal with users trying to read the data while somebody is dropping it? Does
it need to implement some kind of locking?

I have some thoughts on that, but would like to know what others think.
Drill is not (yet) a transactional system but rather an interactive query
layer on a variety of stores. The two most common use cases I can think of
in this context are: a user doing analytics/exploration who, as part of it,
creates some intermediate tables, inserts data into them, and drops the
tables; or BI tools generating these intermediate tables while processing
queries. Neither of these has the concurrency issue.
Additionally, given that the data is externally managed, there could always
be other processes adding and deleting files, and Drill doesn't even have
control over them.
Overall, I think the first phase of the DROP implementation might be OK
without these locking/concurrency checks.

Thoughts?

-Neeraja





On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com wrote:

 What you are suggesting makes sense in the case when security is enabled.
 So when Drill is accessing the file system it will impersonate the user who
 issued the command and drop will happen if the user has sufficient
 permissions.

 However when security isn't enabled, Drill will be accessing the file
 system as the Drill user itself which is most likely to be a super user who
 has permissions to delete most files. To prevent any catastrophic drops
 checking for homogenous file formats makes sure that at least the directory
 being dropped is something that can be read by Drill. This will prevent any
 accidental drops (like dropping the home directory etc, because its likely
 to have file formats that cannot be read by Drill). This will not prevent
 against malicious behavior (for handling this security should be enabled).

 Thanks
 Mehant

 On 8/5/15 11:43 AM, Ted Dunning wrote:

 Is any check really necessary?

 Can't we just say that for data sources that are file-like that drop is a
 rough synonym for rm? If you have permission to remove files and
 directories, you can do it.  If you don't, it will fail, possibly half
 done. I have never seen a bug filed against rm to add more elaborate
 semantics, so why is it so necessary for Drill to have elaborate semantics
 here?



 On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote:

 The homogeneous check: will it just check that the types are homogeneous, or
 that they are actually types that can be read by Drill?
 Also, is there a good way to determine whether a file can be read by Drill?
 And will there be a perf hit if there are a large number of files?

 Regards
 Ramana


 On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com
 wrote:

 I agree, it is definitely restrictive. We can lift the restriction for
 being able to drop a table (when security is off) only if the Drill user
 owns it. I think the check for homogenous files should give us enough
 confidence that we are not deleting a non Drill directory.

 Thanks
 Mehant


 On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:

 Ted, thats fair point on the recovery part.

 Regarding the other point by Mehant (copied below) ,there is an
 implication
 that user can drop only Drill managed tables (i.e created as Drill
 user)
 when security is not enabled. I think this check is too restrictive

 (also

 unintuitive). Drill doesn't have the concept of external/managed tables
 and
 a user (impersonated user if security is enabled or Drillbit service

 user

 if no security is enabled) should be able to drop the table if they have
 permissions to do so. The above design proposes a check to verify if
 the
 files that need to be deleted are readable by Drill and I believe is a
 good
 validation to have.

 /The above check is in the case when security is not enabled. Meaning
 we
 are executing as the Drill user. If we are running as the Drill user
 (which
 might be root or a super user) its likely that this user has
 permissions
 to
 delete most files and checking for permissions might not suffice. So

 when

 security isn't enabled the proposal is to delete only those files that

 are

 owned (created) by the Drill user./


 On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:

 On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 

 nrentachint...@maprtech.com wrote:

 Also will there any mechanism to recover once you accidentally drop?

 yes.  Snapshots 
 https://www.mapr.com/resources/videos/mapr-snapshots

 .

 Seriously, recovery of data due to user error is a platform thing.  How
 can
 we recover from turning off the cluster?  From removing a disk on an
 Oracle
 node?

 I don't think that this is Drill's business.






Re: [DISCUSS] Drop table support

2015-08-04 Thread Neeraja Rentachintala
Ted, that's a fair point on the recovery part.

Regarding the other point by Mehant (copied below), there is an implication
that a user can drop only Drill-managed tables (i.e., created as the Drill
user) when security is not enabled. I think this check is too restrictive
(also unintuitive). Drill doesn't have the concept of external/managed
tables, and a user (the impersonated user if security is enabled, or the
Drillbit service user if no security is enabled) should be able to drop the
table if they have permissions to do so. The above design proposes a check to
verify that the files that need to be deleted are readable by Drill, and I
believe that is a good validation to have.

/The above check is in the case when security is not enabled. Meaning we
are executing as the Drill user. If we are running as the Drill user (which
might be root or a super user) its likely that this user has permissions to
delete most files and checking for permissions might not suffice. So when
security isn't enabled the proposal is to delete only those files that are
owned (created) by the Drill user./


On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 
 nrentachint...@maprtech.com wrote:

  Also, will there be any mechanism to recover once you accidentally drop?
 

 yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots.

 Seriously, recovery of data due to user error is a platform thing.  How can
 we recover from turning off the cluster?  From removing a disk on an Oracle
 node?

 I don't think that this is Drill's business.



Re: [DISCUSS] Drop table support

2015-07-30 Thread Neeraja Rentachintala
Few questions/comments inline.

On Thu, Jul 30, 2015 at 2:53 PM, mehant baid baid.meh...@gmail.com wrote:

  Based on the discussion in the hangout I wanted to start a thread around
 Drop table support.

 Couple of high level points about what is planned to be supported

 1. In the first iteration Drop table will only support dropping tables in
 the file system and not dropping tables in Hive/ Hbase or other storage
 plugins.
 2. Since Drop table is potentially risky we want to be pessimistic about
 dropping tables.

 There are two broad scenarios while dealing with Drop table - Security
 enabled and Security Disabled. In both cases we would like to follow the
 below workflow

 1. Check if the table being dropped can be consumed by Drill.

[Neeraja] I am assuming that if security is enabled, this is done with the
impersonated user's identity. Is this accurate?

 * Meaning do all the files in the directories conform to a format that
 Drill can read (parquet, json, csv etc). Jacques pointed out that if there
 is a bug in this logic where if one of the files in the directory conforms
 to a format that Drill can read we create a DrillTable and error out if we
 encounter other files we cannot read.

[Neeraja] What does it mean to create a DrillTable here?

 * The above point can in the worst case entail reading the entire file
 system, if a user issues a drop table command on the root of the file
 system. But its more likely that we will encounter a file that Drill cannot
 read soon and abort the Drop with an error.
 * Another minor clarification is we consider only those directories to
 be consumable by Drill if they contain file formats that are homogenous and
 can be read by Drill. For eg: we should fail if a user is trying to delete
 a directory that contains both JSON and Parquet files.


 2. Once we have confirmed that the table requested to be dropped contains
 homogenous files which can be read by Drill, we delve into the file
 permissions.
 * If security is enabled, we impersonate the user issuing the command
 and drop the directory (succeeds if FS allows and user has correct
 permissions).
 * If security is not enabled, we only drop the directory if all the
 files are owned by the user Drillbit is running as (being pessimistic about
 drop). We should collect this information when checking for homogenous
 files.

[Neeraja] Why do we need this check? How is this different from the
impersonated-user scenario?


 Open Questions:

 Views: How do we handle views that were created on top of the dropped
 table? Following are a couple of scenarios we might want to explore:
 * Views are treated as a different entity, and it is useful for the user
 to still have the view definition in place: the dropped table will be
 replaced with a new set of files with the exact same schema, so the
 existing view definition suffices. AFAIK, Oracle and SQL Server have this
 model and don't drop the views if the base table is dropped.
 * Once the table is dropped, the view definition is no longer needed
 and hence should be dropped automatically. We can probably punt on this
 till we have dotdrill files. With dotdrill files we can maintain some
 information to indicate the views on this table and can drop the views
 implicitly. But given that some of the popular databases don't do this, we
 might want to conform to the standard behavior.

[Neeraja] Agree with the recommendation here. It seems we can go with the
simpler approach here, i.e., treat views as a separate entity.

Also, will there be any mechanism to recover once you accidentally drop a
table?

 Thanks
 Mehant



Re: Unable to run drill...

2015-06-12 Thread Neeraja Rentachintala
What error are you getting?

On Friday, June 12, 2015, Mehul Trivedi trivedimeh...@gmail.com wrote:

 + dev@drill.apache.org

 Dev team - can anybody help?

 Mehul Trivedi
 trivedimeh...@gmail.com
 +91.999.875.1555

 On Fri, Jun 12, 2015 at 8:20 PM, Mehul Trivedi trivedimeh...@gmail.com
 wrote:

  Hello Jacques
 
  Sorry to bring you in this but I could not find any help on net for the
  issue I am facing.
 
  I am working on a research project where we want a schema free engine to
  talk to cloud big data behind some application logic interfaces.
 
  Drill was my first choice when I heard it got out. I downloaded and tried
  to run on WIN but its giving me an exception while starting itself.
 
  Can you guide me right direction to whom I can talk to or email or phone
  call or chat with to have this sorted out?
 
  Thanks
 
 
  Mehul Trivedi
   trivedimeh...@gmail.com
  +91.999.875.1555
 



Re: question about correlated arrays and flatten

2015-05-29 Thread Neeraja Rentachintala
Ted
Can you please give an example with a few data elements in a and b, and the
expected output you are looking for from the query?

-Neeraja

On Fri, May 29, 2015 at 6:43 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 I have two arrays.  Their elements are correlated times and values.  I
 would like to flatten them into rows, each with two elements.

 The query

select flatten(a), flatten(b) from ...

 doesn't work because I get the cartesian product (of course).  The query

select flatten(a, b) from ...

 also doesn't work because flatten doesn't have a multi-argument form.

 Going crazy, this query kind of sort of almost works, but not really:

  select r.x.`key`, flatten(r.x.`value`)  from (

  select flatten(kvgen(x)) as x from ...) r;

 What I really want to see is something like this:
select zip(flatten(a), flatten(b)) from ...

 Any pointers?  Is my next step to write a UDF?
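For reference, the zip-of-flattens semantics being asked for can be illustrated outside SQL. This is not Drill syntax, just a Python sketch of the desired output; the function name and column defaults are made up.

```python
def zip_flatten(rows, a_col="a", b_col="b"):
    """Illustration of the desired zip(flatten(a), flatten(b)) semantics:
    pair correlated array elements by index, one output row per pair."""
    out = []
    for row in rows:
        # Element-wise pairing, not a Cartesian product.
        for x, y in zip(row[a_col], row[b_col]):
            out.append({a_col: x, b_col: y})
    return out
```

So a row like {a: [t1, t2], b: [v1, v2]} yields two output rows, (t1, v1) and (t2, v2), rather than the four-row Cartesian product that two independent flattens produce.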



Re: [VOTE][RESULT] Release Apache Drill 1.0.0 (rc1)

2015-05-18 Thread Neeraja Rentachintala
great achievement. Congrats team.

On Mon, May 18, 2015 at 9:42 PM, Aditya adityakish...@gmail.com wrote:

 Great milestone!!! One of many more to come!!!

 On Mon, May 18, 2015 at 9:27 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:

 
  Awesome work all round.
 
  Sent from my iPhone
 
   On May 18, 2015, at 20:49, Abhishek Girish agir...@mapr.com wrote:
  
   Congrats all!
  
   On Monday, May 18, 2015, Jim Bates jba...@maprtech.com wrote:
  
   Congrats!
  
   On Mon, May 18, 2015 at 10:36 PM, Julien Le Dem
  jul...@twitter.com.invalid
  
   wrote:
  
   congrats!
  
   On Mon, May 18, 2015 at 8:01 PM, Jacques Nadeau jacq...@apache.org
   wrote:
  
   Congrats everybody, looks like our 1.0.0 (rc1) release passed.  I'll
   get
   artifacts propagated and send a release email out tomorrow.
  
   Final Tally:
  
   12 x +1 (binding)
   Aman, Hanifi, Bridget, Mehant, Aditya, Parth, Venki, Jinfeng, Tomer,
   Jason,
   Steven, Jacques
  
   7 x +1 (non-binding)
   Neeraja, Hakim, Sudheesh, Ramana, Abhishek, Hsuan, Chun
  
   Thanks Everyone!
  
   Jacques
  
  
   On Mon, May 18, 2015 at 5:46 PM, Chun Chang cch...@maprtech.com
    wrote:
  
   +1 (non-binding)
  
    This release candidate is a lot more stable in terms of resource
   handling. I
   bombarded a four node cluster with queries from multiple concurrent
   threads
   for 72 hours, DRILL is stable and performed well. No performance
   degradation noticed.
  
   On Mon, May 18, 2015 at 2:28 PM, Steven Phillips 
    sphill...@maprtech.com
  
   wrote:
  
   +1 (binding)
  
   On Sun, May 17, 2015 at 11:31 PM, Hsuan Yi Chu 
    hyi...@maprtech.com
   wrote:
  
   +1 (non-binding)
   1. Downloaded the source to run mvn clean install
   2. Deployed 2-node Drill and tried a some queries, simple and
   nested
   ones
  
   Things all worked fine. :)
  
  
   On Sun, May 17, 2015 at 8:50 PM, Abhishek Girish 
    agir...@mapr.com
  
   wrote:
  
   +1 (non-binding)
  
   - Tried out various queries on sqlline. Looked good.
   - Ran TPC-DS SF 100 queries successfully using the test
   framework.
   Saw
   no
   regressions.
   - Drill UI is responsive. Profile page shows updated
   information.
   Saw
   an
   issue, but I'm guessing not a blocker.
  
   Regards,
   Abhishek
  
   On Sun, May 17, 2015 at 6:40 PM, Jason Altekruse 
    altekruseja...@gmail.com
  
   wrote:
  
   +1 binding
  
   Downloaded the source and binary tarballs, built the source
   and
   all
   unit
   tests passed. Ran a few queries in embedded mode and checked
   a
   few
   pages
   on
   the web UI.
  
   Great work everyone! Even Cyanide and Happiness got into the
   drill
   spirit
   to celebrate the release!
  
   http://explosm.net/comics/3929
  
  
   On Sun, May 17, 2015 at 3:17 PM, Ramana Inukonda 
    rinuko...@maprtech.com
  
   wrote:
  
   +1(non binding)
   Did the following things:
  
 1. Built from downloaded source tar, skipped tests as I
   see
   plenty
   of
  unit test successes.
 2. Deployed built tar on 2 centos clusters- 8 node and 3
   node.
 3. Ran some queries against various sources- hive,
   hbase,
   parquet,
   csv
 and json. Verified results.
 4. Ran TPCH queries against a SF100 dataset and verified
   results
   there.
  5. Tried to cancel long-running queries from sqlline
   before
   and
   after
 returning results. sqlline still seems fine and able to
   execute
   queries.
 6. Changed system and session options from sqlline and
   verified
   if
   they
 take effect.
 7. Checked profiles page and storage plugin page from
   the
   web
   UI
   and
 updated storage plugins and verified as updated
   successfully.
  
   Congrats guys, looks like a good release!
  
   On Sun, May 17, 2015 at 9:45 AM, Tomer Shiran 
    tshi...@gmail.com
  
   wrote:
  
   +1 (binding)
  
   Downloaded binary
   Ran Drill in embedded mode with the new alias
   (bin/drill-embedded)
   Ran some queries on the Yelp dataset on local files (Mac)
   and
   MongoDB
   Found that David is the most popular name on Yelp and
   that
   reviews
   are
   more
   often useful than funny or cool (based on Yelp votes)
  
   Congrats!
  
   On Fri, May 15, 2015 at 8:03 PM, Jacques Nadeau 
    jacq...@apache.org
   wrote:
  
   Hey Everybody,
  
   I'm happy to propose a new release of Apache Drill,
   version
   1.0.0.
   This
   is
   the second release candidate (rc1).  It includes a few
   issues
   found
   earlier
   today and covers a total of 228 JIRAs*.
  
   The vote will be open for 72 hours ending at 8pm
   Pacific,
   May
   18,
   2015.
  
   [ ] +1
   [ ] +0
   [ ] -1
  
   thanks,
   Jacques
  
    [1]
  https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12325568
   [2]
   http://people.apache.org/~jacques/apache-drill-1.0.0.rc1/
  
  
   *Note, 

Re: [VOTE] Release Apache Drill 1.0.0

2015-05-15 Thread Neeraja Rentachintala
+1 (non-binding)
Downloaded the tar and tried few tutorials from
drill.apache.org/docs/tutorials.
Worked like a charm. Congratulations.

-Neeraja

On Fri, May 15, 2015 at 1:33 AM, Ramana Inukonda rinuko...@maprtech.com
wrote:

 +1(Non-Binding)

1. Downloaded source, built using mvn clean install -U -DskipTests
2. Deployed on a small 3 node cluster, verified drillbits came up fine
and ran against various sources and various types of queries.
3. Deployed on a 8 node cluster and ran TPCH queries against SF100 data.
4. Checked UI pages such as profiles, storage etc.

 Regards
 Ramana


 On Fri, May 15, 2015 at 12:17 AM, Jacques Nadeau jacq...@apache.org
 wrote:

  It is my absolute pleasure to propose the release of Apache Drill,
 version
  1.0.0.  This is a massive release with a large number of performance and
  stability fixes totaling 197 resolved JIRAs [1].  Please download this
  release at [2] and cast your vote.
 
  The vote will be open for 72 hours, ending at 23:59 Pacific, May 17,
 2015.
 
  [ ] +1
  [ ] +0
  [ ] -1
 
  thanks,
  Jacques
 
  [1]
 
 
  https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12325568
  [2] http://people.apache.org/~jacques/apache-drill-1.0.0.rc0/
 



Re: [VOTE] Release Apache Drill 1.0.0 (rc1)

2015-05-15 Thread Neeraja Rentachintala
+1 (non-binding)
- Downloaded the tar
- Tried tutorials from drill.apache.org/docs/tutorials

-thanks

On Fri, May 15, 2015 at 8:03 PM, Jacques Nadeau jacq...@apache.org wrote:

 Hey Everybody,

 I'm happy to propose a new release of Apache Drill, version 1.0.0.  This is
 the second release candidate (rc1).  It includes a few issues found earlier
 today and covers a total of 228 JIRAs*.

 The vote will be open for 72 hours ending at 8pm Pacific, May 18, 2015.

 [ ] +1
 [ ] +0
 [ ] -1

 thanks,
 Jacques

 [1]

  https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12325568
 [2] http://people.apache.org/~jacques/apache-drill-1.0.0.rc1/


  *Note, the previous rc0 vote email undercounted the number of closed
 JIRAs.



Re: Things are looking pretty good, let's do a 1.0 vote soon

2015-05-13 Thread Neeraja Rentachintala
awesome. Congratulations to the team.

On Tue, May 12, 2015 at 5:54 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 This is huge!  (Premature) congratulations to the team.

 Real congrats on account when it gets out the door.

 Pom poms all around.



 On Tue, May 12, 2015 at 9:56 PM, Jacques Nadeau jacq...@apache.org
 wrote:

  It looks like we've gotten a number of stability and performance
  improvements into the codebase since 0.9.  I'd like to suggest we start a
  1.0 vote soon.  I'm happy to be release manager again.  If everybody
 thinks
  this sounds good, I'll try to cut a release candidate in the next day or
  two.
 
  thanks,
  Jacques
 



Re: Should we make dir* columns only exist when requested?

2015-04-23 Thread Neeraja Rentachintala
What do you mean by "alter data returned based on how you select the
directory"? Can you give an example?


On Thu, Apr 23, 2015 at 2:34 PM, Jacques Nadeau jacq...@apache.org wrote:

 Hey guys,

 I've been thinking that always showing dir# columns seems to alter data
 returned from Drill depending on how you select the directory.  I'd propose
 that we make it so that we only return dir# columns when they are
 explicitly requested.

 Thoughts?



Re: Should we make dir* columns only exist when requested?

2015-04-23 Thread Neeraja Rentachintala
Exposing directories in select * queries enables data discovery rather than
assuming knowledge on the user's part.
Making dir as an array could be a good option to avoid the multi column
issue.

-Neeraja

On Thu, Apr 23, 2015 at 3:57 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 I would propose that dir be an array that contains all of the directories
 rather than having multiple values.

 The multiple names are particularly inconvenient if files are at different
 depths.



 On Thu, Apr 23, 2015 at 5:56 PM, Jacques Nadeau jacq...@apache.org
 wrote:

  I'm specifically arguing that SELECT * doesn't return the columns.
 
  Here is current behavior:
 
  /mytdir/mysdir/myfile.json
  {a:1,b:2,c:3}
  {a:4,b:5,c:6}
 
  select * from `myfile.json`
 
  a, b, c
  1, 2, 3
  4, 5, 6
 
  select * from `/mysdir/myfile.json`
 
   dir0, a, b, c
  mysdir, 1, 2, 3
  mysdir, 4, 5, 6
 
  select * from `/mytdir/mysdir/myfile.json`
 
   dir0, dir1, a, b, c
  mytdir, mysdir, 1, 2, 3
  mytdir, mysdir, 4, 5, 6
 
 
  
  My proposal:
 
  select * from `myfile.json`
  select * from `/mysdir/myfile.json`
  select * from `/mytdir/mysdir/myfile.json`
  ::all produce::
  a, b, c
  1, 2, 3
  4, 5, 6
 
  select dir0, a, b, c from `/mysdir/myfile.json`
 
   dir0, a, b, c
  mysdir, 1, 2, 3
  mysdir, 4, 5, 6
 
  select dir0, a, b, c from `/mytdir/mysdir/myfile.json`
 
   dir0, a, b, c
  mytdir, 1, 2, 3
  mytdir, 4, 5, 6
 
 
 
 
  On Thu, Apr 23, 2015 at 5:42 PM, Aman Sinha asi...@maprtech.com wrote:
 
   Seems reasonable, as long as SELECT * also returns the dir# columns.
  
   On Thu, Apr 23, 2015 at 2:34 PM, Jacques Nadeau jacq...@apache.org
   wrote:
  
Hey guys,
   
I've been thinking that always showing dir# columns seems to alter
 data
returned from Drill depending on how you select the directory.  I'd
   propose
that we make it so that we only return dir# columns when they are
explicitly requested.
   
Thoughts?
   
  
 

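The dir# scheme behind the examples in this thread boils down to: each dirN column is the N-th directory component of a file's path below the queried root, so the same file yields different dir0 values depending on how deep the queried path is. A sketch, where the function name and exact root handling are assumptions:

```python
import os

def dir_columns(query_root, file_path):
    """Derive dirN partition columns: the directory components of
    file_path below query_root, in order."""
    rel = os.path.relpath(file_path, query_root)
    parts = rel.split(os.sep)[:-1]  # drop the file name itself
    return {"dir%d" % i: p for i, p in enumerate(parts)}
```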


Re: [VOTE] Release Apache Drill 0.7.0 (rc1)

2014-12-18 Thread Neeraja Rentachintala
+1 (non-binding)
Tried the tar with few simple queries on JSON.

-Neeraja

On Thu, Dec 18, 2014 at 7:24 PM, Julian Hyde julianh...@gmail.com wrote:

 Downloaded source, built on JDK 1.7 and Mac OS, started command line and
 ran some queries. Logged a minor bug [
 https://issues.apache.org/jira/browse/DRILL-1898 ], not a show-stopper.

 +1

 On Dec 18, 2014, at 4:27 PM, Ted Dunning ted.dunn...@gmail.com wrote:

  Did you test it?  If so, it is good to say what you did so that others
 can
  avoid duplication of effort.
 
  If not, it is best to not vote.
 
 
 
  On Thu, Dec 18, 2014 at 3:38 PM, Tomer Shiran tshi...@maprtech.com
 wrote:
 
  +1
 
  On Dec 18, 2014, at 12:06 PM, Jacques Nadeau jacq...@apache.org
 wrote:
 
  Good morning,
 
  I would like to propose the release of Apache Drill, version 0.7.0.
 This
  is the second release candidate (zero-index rc1) and includes fixes
 for a
  few issues identified as part of the first candidate.
 
  This release includes 228 resolved JIRAs [1].
 
  The artifacts are hosted at [2].
 
  The vote will be open for 72 hours, ending Noon Pacific, December 21,
  2014.
 
  [ ] +1
  [ ] +0
  [ ] -1
 
 
  Thank you,
  Jacques
 
  [1]
 
 
  https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12327473
  [2] http://people.apache.org/~jacques/apache-drill-0.7.0.rc1/
 




Re: [VOTE] Release Apache Drill 0.7.0

2014-12-16 Thread Neeraja Rentachintala
+1(non-binding)

On Tue, Dec 16, 2014 at 12:15 AM, Mehdi Chitforoosh 
mehdichitforo...@gmail.com wrote:

 +1

 On Tue, Dec 16, 2014 at 10:14 AM, Yash Sharma yash...@gmail.com wrote:
 
  +1
 
  - Source and binaries present
  - Source and Binaries contain README, NOTICE, LICENSE Files.
  - Source contains INSTALL.md
  - Verified Checksums for src and binaries
  - Able to launch drill from binary distribution (Embeded mode).
  - Able to build from source
  - Able to fire sample queries on sqlline
 
  On Mon, Dec 15, 2014 at 11:56 PM, Tomer Shiran tshi...@gmail.com
 wrote:
  
   +1
  
   On Mon, Dec 15, 2014 at 9:39 AM, Jacques Nadeau jacq...@apache.org
   wrote:
   
Good morning,
   
I would like to propose the release of Apache Drill, version 0.7.0.
   
This release includes 223 resolved JIRAs [1].
   
The artifacts are hosted at [2].
   
The vote will be open for 72 hours, ending 10 AM Pacific, December
 18,
2014.
   
[ ] +1
[ ] +0
[ ] -1
   
   
Thank you,
Jacques
   
[1]
   
   
  
 
  https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12327473
[2] http://people.apache.org/~jacques/apache-drill-0.7.0.rc0/
   
  
 


 --
 Mehdi Chitforoosh