date:20160314

Re: Calcite: Trait propagation using relset iteration versus remove extraneous trait creation

2016-03-14 Thread Aman Sinha

I think the goal should be to achieve trait propagation without relying
on the add-followed-by-remove strategy.  Consider a simple query with a
two-table join followed by a group-by.  If I want to use merge join and
streaming aggregate, one {hash-distribute, sort} pair will be added on
each side of the merge join and another such pair will be added for the
2-phase streaming aggregate.

So, altogether 6 enforcer nodes (3 for sort, 3 for distribution) could
potentially be added for a simple query and then removed later if the
required traits are already available from the input.  This adds planning
overhead.  Ideally, this should be done through a requirements-driven
mechanism where the parent operator asks the child to satisfy the collation
or distribution requirement and the child recursively asks its descendants
to satisfy it.  It should be the responsibility of the child to decide
whether to add the enforcer (if it cannot provide the trait natively), not
that of the parent.
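
To make the requirements-driven idea concrete, here is a minimal sketch in
Python.  It is only an illustration: Node, satisfy_collation and count_sorts
are hypothetical names, not Calcite or Drill classes, and only collation
(sort order) is modelled, not distribution.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical plan node; not a Calcite or Drill class."""
    name: str
    inputs: List["Node"] = field(default_factory=list)
    collation: Optional[Tuple[str, ...]] = None  # sort order delivered natively
    passes_order: bool = False  # True if output keeps the first input's order


def satisfy_collation(node: Node, keys: Tuple[str, ...]) -> Node:
    """Return a plan rooted at `node` whose output is sorted on `keys`."""
    # Case 1: the node already delivers the order, so no enforcer is needed.
    if node.collation is not None and node.collation[:len(keys)] == keys:
        return node
    # Case 2: an order-preserving node forwards the request to its input.
    if node.passes_order and node.inputs:
        new_inputs = [satisfy_collation(node.inputs[0], keys)] + node.inputs[1:]
        return Node(node.name, new_inputs, collation=keys, passes_order=True)
    # Case 3: the node itself adds the Sort enforcer as a last resort.
    return Node("Sort", [node], collation=keys)


def count_sorts(node: Node) -> int:
    """Count Sort enforcers in a plan tree."""
    return (node.name == "Sort") + sum(count_sorts(i) for i in node.inputs)
```

With this shape, a Filter over an already-sorted scan ends up with no Sort at
all, so nothing has to be added first and removed afterwards.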

Aman

On Mon, Mar 14, 2016 at 3:09 PM, Jacques Nadeau  wrote:

> Hey All,
>
> I've been thinking about the SubsetTransformer pattern [1] that we use in
> Drill to ensure trait propagation. It was discussed here in Calcite [2].
>
> Julian felt that the correct solution (and the patch he ultimately
> applied) was to use a create and then remove behavior. Take a look at his
> revision to my test here [3] where he adds the SortRemoveRule in order to
> remove an extraneous Sort operation.
>
> It seems like we need to either introduce a new mechanism in Calcite to
> accomplish this or we need to adopt the removal behavior. (I also believe
> there are a small set of situations where we insert distribution for
> parallelization purposes as opposed to a requirement for a particular
> operation... we'll need to determine how those work and figure out how to
> express correctly in this removal pattern.)
>
> Thoughts?
>
> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
> [2] https://issues.apache.org/jira/browse/CALCITE-606
> [3] https://github.com/julianhyde/calcite/commit/fb203dc4b9aea89bfed839c22ae3e285044df400#diff-9494b27dde1061ef95e3853cb6222b5bR103
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>

Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Aditya

>
> However, if the field x is sometimes 'x' and sometimes 'X', we're going to
> get different results between the first query and the second. That is why I
> think we need to guarantee that even when optimization rules fail, we have
> the same plan meaning. In essence, all plans should be valid. If you get to
> a place where a rule changes the data, then the original plan was
> effectively invalid.
>

This would require that the Vector field names be able to switch between
case-sensitive and case-insensitive modes, and that the mode be controllable
by storage plugins.
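
As a rough illustration of what a switchable mode could look like, here is a
small Python sketch.  FieldNameMap and its methods are hypothetical names made
up for this example; they are not Drill's vector classes, and the
case_sensitive flag stands in for whatever a storage plugin would configure.

```python
from typing import Any, Dict, List, Optional, Tuple


class FieldNameMap:
    """Hypothetical field lookup with a switchable case mode (case-preserving)."""

    def __init__(self, case_sensitive: bool) -> None:
        self.case_sensitive = case_sensitive
        # lookup key -> (name as originally written, value)
        self._fields: Dict[str, Tuple[str, Any]] = {}

    def _key(self, name: str) -> str:
        # In case-insensitive mode all lookups go through the lower-cased name.
        return name if self.case_sensitive else name.lower()

    def put(self, name: str, value: Any) -> None:
        self._fields[self._key(name)] = (name, value)

    def get(self, name: str) -> Optional[Any]:
        entry = self._fields.get(self._key(name))
        return None if entry is None else entry[1]

    def names(self) -> List[str]:
        # The original spelling is preserved in both modes.
        return [original for original, _ in self._fields.values()]
```

In case-sensitive mode 'x' and 'X' are two distinct fields; in
case-insensitive mode they resolve to the same field while the stored
spelling is kept.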

On Mon, Mar 14, 2016 at 4:27 PM, Jacques Nadeau  wrote:

> I don't think it is that simple since there are some types of things that
> we can't pushdown that will cause inconsistent results.
>
> For example, assuming that all values of x are positive, the following two
> queries should return the same result
>
> select * from hbase where x = 5
> select * from hbase where abs(x) = 5
>
> However, if the field x is sometimes 'x' and sometimes 'X', we're going to
> get different results between the first query and the second. That is why I
> think we need to guarantee that even when optimization rules fail, we have
> the same plan meaning. In essence, all plans should be valid. If you get to
> a place where a rule changes the data, then the original plan was
> effectively invalid.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio

Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Jacques Nadeau

I believe it also suffers from the same issues.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Mar 14, 2016 at 4:29 PM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:

> How is this handled for the MongoDB storage plugin, which I believe is a
> case-sensitive DB as well?

Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Neeraja Rentachintala

How is this handled for the MongoDB storage plugin, which I believe is a
case-sensitive DB as well?

On Mon, Mar 14, 2016 at 4:27 PM, Jacques Nadeau  wrote:

> I don't think it is that simple since there are some types of things that
> we can't pushdown that will cause inconsistent results.
>
> For example, assuming that all values of x are positive, the following two
> queries should return the same result
>
> select * from hbase where x = 5
> select * from hbase where abs(x) = 5
>
> However, if the field x is sometimes 'x' and sometimes 'X', we're going to
> get different results between the first query and the second. That is why I
> think we need to guarantee that even when optimization rules fail, we have
> the same plan meaning. In essence, all plans should be valid. If you get to
> a place where a rule changes the data, then the original plan was
> effectively invalid.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio

Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Jacques Nadeau

I don't think it is that simple, since there are some types of things that
we can't push down that will cause inconsistent results.

For example, assuming that all values of x are positive, the following two
queries should return the same result:

select * from hbase where x = 5
select * from hbase where abs(x) = 5

However, if the field x is sometimes 'x' and sometimes 'X', we're going to
get different results between the first query and the second. That is why I
think we need to guarantee that even when optimization rules fail, we have
the same plan meaning. In essence, all plans should be valid. If you get to
a place where a rule changes the data, then the original plan was
effectively invalid.
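
The discrepancy can be reproduced with a toy model.  The Python sketch below
is only an illustration under stated assumptions: the rows, the lookup helper
and the two filter functions are made up for this example, the pushed-down
filter is assumed to be evaluated by the source with exact-case field names,
and the non-pushed filter is assumed to be evaluated by the engine with
case-insensitive field names.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical rows where the same logical field is spelled 'x' or 'X'.
ROWS: List[Row] = [{"x": 5}, {"X": 5}, {"x": 7}]


def lookup(row: Row, name: str, case_sensitive: bool) -> Optional[Any]:
    """Resolve a field name in a row under the given case policy."""
    if case_sensitive:
        return row.get(name)
    for key, value in row.items():
        if key.lower() == name.lower():
            return value
    return None


def filter_pushed_down(rows: List[Row]) -> List[Row]:
    # "where x = 5": evaluated by the case-sensitive source.
    return [r for r in rows if lookup(r, "x", case_sensitive=True) == 5]


def filter_in_engine(rows: List[Row]) -> List[Row]:
    # "where abs(x) = 5": not pushed down, evaluated case-insensitively.
    out = []
    for r in rows:
        value = lookup(r, "x", case_sensitive=False)
        if value is not None and abs(value) == 5:
            out.append(r)
    return out
```

On these rows the pushed-down filter keeps one row and the engine-side filter
keeps two, even though every value of x is positive.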

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Mar 14, 2016 at 3:46 PM, Jinfeng Ni  wrote:

> Project pushdown should always happen. If you see project pushdown
> does not happen for your HBase query, then it's a bug.
>
> However, if you submit two physical plans, one with project pushdown and
> one without, and they return different results for an HBase query, I would
> not call that a bug.

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Jinfeng Ni

+1 (binding)

- Download src tgz and do a full Maven build on CentOS
- Run Yelp tutorial queries
- Verify query profiles in the Web UI
- Run a couple of partition-pruning queries

All look good.

Jinfeng


On Mon, Mar 14, 2016 at 2:48 PM, Jacques Nadeau  wrote:
> +1 (binding)
>
> - Download src tgz and build and test
> - Download binary tgz, test execution of a number of queries and verify
> profiles
> - Enable socket level logging and confirm new planning phase + time logging
>
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Mar 14, 2016 at 1:45 PM, Chun Chang  wrote:
>
>> +1 (non-binding)
>>
>> -ran functional and advanced automation
>>
>> On Mon, Mar 14, 2016 at 1:09 PM, Sudheesh Katkam 
>> wrote:
>>
>> > +1 (non-binding)
>> >
>> > * downloaded and built from source tar-ball; ran unit tests successfully
>> > on Ubuntu
>> > * ran simple queries (including cancellations) in embedded mode on Mac;
>> > verified states in web UI
>> > * ran simple queries (including cancellations) on a 3 node cluster;
>> > verified states in web UI
>> >
>> > * tested maven artifacts (drill-jdbc) using a sample application
>> > <https://github.com/sudheeshkatkam/drill-example>.
>> > This application is based on DrillClient, and not JDBC API. I had to make
>> > two changes for this application to work (i.e. not backward compatible).
>> > However, these changes are not related to this release (commits
>> > responsible: 1fde9bb
>> > <https://github.com/apache/drill/commit/1fde9bb1505f04e0b0a1afb542a1aa5dfd20ed1b>
>> > and de00881
>> > <https://github.com/apache/drill/commit/de008810c815e46e6f6e5d13ad0b9a23e705b13a>).
>> > We should have a conversation about what constitutes public API and changes
>> > to this API on a separate thread.
>> >
>> > Thank you,
>> > Sudheesh
>> >
>> > > On Mar 14, 2016, at 12:04 PM, Abhishek Girish <
>> abhishek.gir...@gmail.com>
>> > wrote:
>> > >
>> > > +1 (non-binding)
>> > >
>> > > - Tested Drill in distributed mode (built with MapR profile).
>> > > - Ran functional tests from Drill-Test-Framework [1]
>> > > - Tested Web UI (basic sanity)
>> > > - Tested Sqlline
>> > >
>> > > Looks good.
>> > >
>> > >
>> > > [1] https://github.com/mapr/drill-test-framework
>> > >
>> > > On Mon, Mar 14, 2016 at 11:23 AM, Venki Korukanti <
>> > venki.koruka...@gmail.com
>> > >> wrote:
>> > >
>> > >> +1
>> > >>
>> > >> Installed tar.gz on a 3 node cluster.
>> > >> Ran queries on data located in HDFS
>> > >> Enabled auth in WebUI, ran few queries and, verified auth and querying
>> > >> works fine
>> > >> Logged bugs for 2 minor issues/improvements (DRILL-4508 & DRILL-4509)
>> > >>
>> > >> Thanks
>> > >> Venki
>> > >>
>> > >> On Mon, Mar 14, 2016 at 10:56 AM, Norris Lee 
>> wrote:
>> > >>
>> > >>> +1 (Non-binding)
>> > >>>
>> > >>> Build from source on CentOS. Tested the ODBC driver with queries against
>> > >>> hive and DFS (json, parquet, tsv, csv, directories).
>> > >>>
>> > >>> Norris
>> > >>>
>> > >>> -Original Message-
>> > >>> From: Hsuan Yi Chu [mailto:hyi...@maprtech.com]
>> > >>> Sent: Monday, March 14, 2016 10:42 AM
>> > >>> To: dev@drill.apache.org; adityakish...@gmail.com
>> > >>> Subject: Re: [VOTE] Release Apache Drill 1.6.0 - rc0
>> > >>>
>> > >>> +1
>> > >>> mvn clean install on linux vm; Tried some queries; Looks good.
>> > >>>
>> > >>> On Mon, Mar 14, 2016 at 9:58 AM, Aditya 
>> > wrote:
>> > >>>
>> > >>>> While I did verify the signature and structure of the maven artifacts,
>> > >>>> I think Jacques was referring to verifying the functionality, which I
>> > >>>> have not.
>> > >>>>
>> > >>>> On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra wrote:
>> > >>>>
>> > >>>>> Aditya has verified the maven artifacts. Would it make sense to extend
>> > >>>>> the vote by another day to let more people verify the release?
>> > >>>>>
>> > >>>>> On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau <jacq...@dremio.com>
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>>> I haven't had a chance to validate yet.  Has anyone checked the
>> > >>>>>> maven artifacts yet?
>> > >>>>>> On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
>> > >>>>>>
>> > >>>>>>> +1 (binding).
>> > >>>>>>>
>> > >>>>>>> * Verified checksum and signature of all release artifacts in [1] and
>> > >>>>>>> maven artifacts in [2] and the artifacts are signed using Parth's
>> > >>>>>>> public key (ID 9BAA73B0).
>> > >>>>>>> * Verified that build and tests pass using the source artifact.
>> > >>>>>>> * Verified that Drill can be launched in embedded mode using the
>> > >>>>>>> convenience binary release.
>> > >>>>>>> * Ran sample queries using classpath storage plugin.
>> > >>>>>>>
>> > >>>>>>> p.s. Have enhanced the release verification script [3] to allow
>> > >>>>>>> automatic

Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Jinfeng Ni

Project pushdown should always happen. If you see that project pushdown
does not happen for your HBase query, then it's a bug.

However, if you submit two physical plans, one with project pushdown and
one without, and they return different results for an HBase query, I would
not call that a bug.
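
For readers following the thread, the two plan shapes can be mimicked with a
short Python sketch.  It is an illustration only: DOCS reuses the mixed-case
documents from the original question further down, and scan_with_pushdown and
scan_then_project are hypothetical helpers that assume the source matches
column names exactly while the engine's Project matches them ignoring case.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

# The mixed-case documents from the original question in this thread.
DOCS: List[Doc] = [
    {"_id": "ID1", "Name": "ABC"},
    {"_id": "ID2", "name": "PQR"},
    {"_id": "ID3", "NAME": "XYZ"},
]


def scan_with_pushdown(docs: List[Doc], column: str) -> List[Any]:
    """Projection pushed to the source: exact-case match on the column name."""
    return [doc[column] for doc in docs if column in doc]


def scan_then_project(docs: List[Doc], column: str) -> List[Any]:
    """Source returns everything; the engine projects ignoring case."""
    wanted = column.lower()
    return [value for doc in docs for key, value in doc.items()
            if key.lower() == wanted]
```

Asking for Name returns one value with pushdown and all three values without
it, which is the difference being debated here.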



On Mon, Mar 14, 2016 at 2:54 PM, Jacques Nadeau  wrote:
> Agree with Zelaine, plan changes/optimizations shouldn't change results.
> This is a bug.
>
> Drill is focused on being case-insensitive, case-preserving. Each storage
> plugin implements its own case sensitivity policy when working with
> columns/fields and should be documented. It isn't practical to make HBase
> case-insensitive so it should behave case-sensitively. DFS formats (as
> opposed to HBase) are entirely under Drill's control and thus target
> case-insensitive, case-preserving operation.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Mar 14, 2016 at 2:43 PM, Jinfeng Ni  wrote:
>
>> Abhishek
>>
>> Great question. Here is what I understand regarding the case sensitive
>> policy.
>>
>> Drill's case sensitivity policy (case insensitive and case preserving)
>> applies to the execution engine in Drill; it does not enforce the case
>> sensitivity policy on all storage plugins. A storage plugin could
>> decide on and implement its own policy.
>>
>> Why would the pushdown impact the case sensitivity when querying HBase?
>> Without project pushdown, the HBase storage plugin will return all the
>> data, and it's up to Drill's execution Project operator to apply the
>> case insensitive policy.  With the project pushdown, Drill will pass
>> the list of column names to the HBase storage plugin, and HBase decides to
>> apply its case sensitivity policy when scanning the data.
>>
>> Adding an option to make a case sensitive storage plugin honor the case
>> insensitive policy seems to be a good idea. The question is whether
>> the underlying storage (like HBase) will support such a mode.
>>
>>
>>
>>
>>
>>
>> On Mon, Mar 14, 2016 at 2:09 PM, Zelaine Fong  wrote:
>> > Abhishek,
>> >
>> > I guess you're arguing that Drill's current behavior of honoring the case
>> > sensitive nature of the underlying data source (in this case, HBase and
>> > MapR-DB) will be confusing for Drill users who are accustomed to Drill's
>> > case insensitive behavior.
>> >
>> > I can see arguments both ways.
>> >
>> > But the part I think is confusing is that the behavior differs depending on
>> > whether or not projections and filters are pushed down to the data source.
>> > If the push down is done, then the behavior is case sensitive
>> > (corresponding to the data source).  But if pushdown doesn't happen, then
>> > the behavior is case insensitive.  That difference seems inconsistent and
>> > undesirable -- unless you argue that there are instances where you would
>> > want one behavior vs the other.  But it seems like that should be
>> > orthogonal and separate from whether pushdowns are applied.
>> >
>> > -- Zelaine
>> >
>> > On Mon, Mar 14, 2016 at 1:40 AM, Abhishek Girish 
>> wrote:
>> >
>> >> Hello all,
>> >>
>> >> As I understand, Drill by design is case-insensitive w.r.t. column names
>> >> within a table or file [1]. While this provides great flexibility and works
>> >> well with many data-sources, there are issues when working with
>> >> case-sensitive data-sources such as HBase / MapR-DB.
>> >>
>> >> Consider the following JSON file:
>> >>
>> >> {"_id": "ID1",
>> >>  *"Name"* : "ABC",
>> >>  "Age" : "25",
>> >>  "Phone" : null
>> >> }
>> >> {"_id": "ID2",
>> >>  *"name"* : "PQR",
>> >>  "Age" : "30",
>> >>  "Phone" : "408-123-456"
>> >> }
>> >> {"_id": "ID3",
>> >>  *"NAME"* : "XYZ",
>> >>  "Phone" : ""
>> >> }
>> >>
>> >> Note that the name field within the JSON file is of mixed case.
>> >>
>> >> From Drill, while querying the JSON file directly (or corresponding content
>> >> in Parquet or Text formats), we get results which we as Drill users have
>> >> come to expect:
>> >>
>> >> > select NAME from mfs.`/tmp/json/a.json`;
>> >> +---+
>> >> | NAME  |
>> >> +---+
>> >> | ABC   |
>> >> | PQR   |
>> >> | XYZ   |
>> >> +---+
>> >>
>> >>
>> >> However, while querying a case-sensitive datasource (*with pushdown
>> >> enabled*), the following results are returned. The case provided in the
>> >> query text is honored and determines the results. This could come as a
>> >> *slight surprise to certain Drill users* exploring/migrating to new
>> >> databases (using new Storage / Format plugins within Drill).
>> >>
>> >> > select *Name* from mfs.`/tmp/json/a`;
>> >> +---+
>> >> | Name  |
>> >> +---+
>> >> | ABC   |
>> >> +---+
>> >>
>> >> > select *name* from mfs.`/tmp/json/a`;
>> >> +---+
>> >> | name  |
>> >> +---+
>> >> | PQR   |
>> >> +---+
>> >>
>> >> > select *NAME* from mfs.`/tmp/json/a`;
>> >> +---+
>> >> | NAME  |
>> >> +---+
>> >> | XYZ   |
>> >> +---+
>> >>
>> >>
>> >> > select *

[jira] [Resolved] (DRILL-4050) Add zip archives to the list of artifacts in verify_release.sh

2016-03-14 Thread Aditya Kishore (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore resolved DRILL-4050.
---
   Resolution: Fixed
Fix Version/s: 1.7.0

This has been merged into master.

> Add zip archives to the list of artifacts in verify_release.sh
> --
>
> Key: DRILL-4050
> URL: https://issues.apache.org/jira/browse/DRILL-4050
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Minor
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[GitHub] drill pull request: DRILL-4050: Add zip archives to the list of ar...

2016-03-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/249


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Calcite: Trait propagation using relset iteration versus remove extraneous trait creation

2016-03-14 Thread Jacques Nadeau

Hey All,

I've been thinking about the SubsetTransformer pattern [1] that we use in
Drill to ensure trait propagation. It was discussed here in Calcite [2]

Julian felt that the correct solution (and the patch he ultimately
applied) was to use a create-then-remove behavior. Take a look at his
revision to my test here [3], where he adds the SortRemoveRule in order to
remove an extraneous Sort operation.

It seems like we need to either introduce a new mechanism in Calcite to
accomplish this or adopt the removal behavior. (I also believe
there is a small set of situations where we insert distribution for
parallelization purposes as opposed to a requirement for a particular
operation... we'll need to determine how those work and figure out how to
express them correctly in this removal pattern.)
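
To make the two-step behavior concrete, here is a minimal Java sketch of the
create-then-remove idea. It is a toy model under stated assumptions, not
Calcite's or Drill's planner API: `Node`, `addSort` and `removeRedundantSort`
are hypothetical names invented for illustration, and a real rule such as
SortRemoveRule works on planner nodes and trait sets rather than on a single
string key.

```java
import java.util.Objects;

final class EnforcerSketch {
    /** Toy plan node (hypothetical): operator name, the key it is sorted on (null if unsorted), one optional input. */
    static final class Node {
        final String name;
        final String sortedOn;
        final Node input;

        Node(String name, String sortedOn, Node input) {
            this.name = name;
            this.sortedOn = sortedOn;
            this.input = input;
        }
    }

    /** "Create" step: always wrap the input in a Sort enforcer on the required key. */
    static Node addSort(Node input, String key) {
        return new Node("Sort", key, input);
    }

    /** "Remove" step: drop a Sort whose input already delivers the same ordering. */
    static Node removeRedundantSort(Node node) {
        boolean isSort = node.name.equals("Sort") && node.input != null;
        if (isSort && Objects.equals(node.sortedOn, node.input.sortedOn)) {
            return node.input;
        }
        return node;
    }

    public static void main(String[] args) {
        Node sortedScan = new Node("IndexScan", "a", null);
        Node plainScan = new Node("Scan", null, null);
        // The enforcer over an already-sorted input is removed again.
        System.out.println(removeRedundantSort(addSort(sortedScan, "a")).name);
        // The enforcer over an unsorted input has to stay.
        System.out.println(removeRedundantSort(addSort(plainScan, "a")).name);
    }
}
```

The cost being debated is visible even here: every requirement first
allocates an enforcer node, and a second pass is needed to discover that it
was unnecessary.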

Thoughts?

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
[2] https://issues.apache.org/jira/browse/CALCITE-606
[3]
https://github.com/julianhyde/calcite/commit/fb203dc4b9aea89bfed839c22ae3e285044df400#diff-9494b27dde1061ef95e3853cb6222b5bR103
--
Jacques Nadeau
CTO and Co-Founder, Dremio

[GitHub] drill pull request: DRILL-4050: Add zip archives to the list of ar...

2016-03-14 Thread adityakishore

Github user adityakishore commented on the pull request:

https://github.com/apache/drill/pull/249#issuecomment-196542189
  
This enhanced version of the script allows integrated download and 
verification of a Drill release. It can be used to verify both the main release 
artifacts and maven repository artifacts.

For example, to verify the 1.6 rc0 release artifacts, I ran
```
./verify_release.sh https://repository.apache.org/content/repositories/orgapachedrill-1030/ /tmp/drill-1.6/maven/
./verify_release.sh http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/ /tmp/drill-1.6/main/
```
If I had pre-downloaded the files in the respective folders, I'd run
```
./verify_release.sh /tmp/drill-1.6/maven/
./verify_release.sh /tmp/drill-1.6/main/
```
Finally, run with the `-nv` option to reduce the verbosity of the output.



Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Jacques Nadeau

Agree with Zelaine: plan changes/optimizations shouldn't change results.
This is a bug.

Drill is focused on being case-insensitive, case-preserving. Each storage
plugin implements its own case sensitivity policy when working with
columns/fields, and that policy should be documented. It isn't practical to
make HBase case-insensitive, so it should behave case-sensitively. DFS
formats (as opposed to HBase) are entirely under Drill's control and thus
target case-insensitive, case-preserving operation.
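
For readers less familiar with the term, "case-insensitive, case-preserving"
can be illustrated with a small Java sketch. This only shows the semantics
using the standard library; it is not how Drill stores or resolves columns,
and `CasePreservingColumns` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

final class CasePreservingColumns {
    /** Builds a record whose column names compare case-insensitively but keep their original spelling. */
    static Map<String, String> newRecord() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        Map<String, String> record = newRecord();
        record.put("Name", "ABC");
        // Lookup ignores case ...
        System.out.println(record.get("NAME"));
        // ... while the stored key keeps the case it was written with.
        System.out.println(record.keySet());
    }
}
```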

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Mar 14, 2016 at 2:43 PM, Jinfeng Ni  wrote:

> Abhishek
>
> Great question. Here is what I understand regarding the case sensitive
> policy.
>
> Drill's case sensitivity policy (case insensitive and case preserving)
> applies to the execution engine in Drill; it does not impose that
> policy on every storage plugin. A storage plugin can
> decide on and implement its own policy.
>
> Why would pushdown affect case sensitivity when querying HBase?
> Without project pushdown, the HBase storage plugin returns all the
> data, and it's up to Drill's Project operator to apply the
> case insensitive policy.  With project pushdown, Drill passes
> the list of column names to the HBase storage plugin, and HBase applies
> its own case sensitivity policy when scanning the data.
>
> Adding an option to make a case sensitive storage plugin honor the case
> insensitive policy seems to be a good idea. The question is whether
> the underlying storage (like HBase) will support such a mode.
>
>
>
>
>
>
> On Mon, Mar 14, 2016 at 2:09 PM, Zelaine Fong  wrote:
> > Abhishek,
> >
> > I guess you're arguing that Drill's current behavior of honoring the case
> > sensitive nature of the underlying data source (in this case, HBase and
> > MapR-DB) will be confusing for Drill users who are accustomed to  Drill's
> > case insensitive behavior.
> >
> > I can see arguments both ways.
> >
> > But the part I think is confusing is that the behavior differs depending
> on
> > whether or not projections and filters are pushed down to the data
> source.
> > If the push down is done, then the behavior is case sensitive
> > (corresponding to the data source).  But if pushdown doesn't happen, then
> > the behavior is case insensitive.  That difference seems inconsistent and
> > undesirable -- unless you argue that there are instances where you would
> > want one behavior vs the other.  But it seems like that should be
> > orthogonal and separate from whether pushdowns are applied.
> >
> > -- Zelaine
> >
> > On Mon, Mar 14, 2016 at 1:40 AM, Abhishek Girish 
> wrote:
> >
> >> Hello all,
> >>
> >> As I understand, Drill by design is case-insensitive, w.r.t column names
> >> within a table or file [1]. While this provides great flexibility and
> works
> >> well with many data-sources, there are issues when working with
> >> case-sensitive data-sources such as HBase / MapR-DB.
> >>
> >> Consider the following JSON file:
> >>
> >> {"_id": "ID1",
> >>  *"Name"* : "ABC",
> >>  "Age" : "25",
> >>  "Phone" : null
> >> }
> >> {"_id": "ID2",
> >>  *"name"* : "PQR",
> >>  "Age" : "30",
> >>  "Phone" : "408-123-456"
> >> }
> >> {"_id": "ID3",
> >>  *"NAME"* : "XYZ",
> >>  "Phone" : ""
> >> }
> >>
> >> Note that the case of the name field within the JSON file is of
> mixed-case.
> >>
> >> From Drill, while querying the JSON file directly (or corresponding
> content
> >> in Parquet or Text formats), we get results which we as Drill users have
> >> come to expect:
> >>
> >> > select NAME from mfs.`/tmp/json/a.json`;
> >> +---+
> >> | NAME  |
> >> +---+
> >> | ABC   |
> >> | PQR   |
> >> | XYZ   |
> >> +---+
> >>
> >>
> >> However, while querying a case-sensitive datasource (*with pushdown
> >> enabled*)
> >> the following results are returned. The case provided in the query text
> is
> >> honored and would determine the results. This could come as a *slight
> >> surprise to certain Drill users* exploring/migrating to new Databases
> >> (using new Storage / Format plugins within Drill)
> >>
> >> > select *Name* from mfs.`/tmp/json/a`;
> >> +---+
> >> | Name  |
> >> +---+
> >> | ABC   |
> >> +---+
> >>
> >> > select *name* from mfs.`/tmp/json/a`;
> >> +---+
> >> | name  |
> >> +---+
> >> | PQR   |
> >> +---+
> >>
> >> > select *NAME* from mfs.`/tmp/json/a`;
> >> +---+
> >> | NAME  |
> >> +---+
> >> | XYZ   |
> >> +---+
> >>
> >>
> >> > select *nAME* from mfs.`/tmp/json/a`;
> >> +---+
> >> | nAME  |
> >> +---+
> >> +---+
> >> No rows selected
> >>
> >> There is no easy way to get all matching rows (irrespective of the case
> of
> >> the column name). In the above example, the first row matching the
> provided
> >> case is returned.
> >>
> >>
> >> > select *Name, name, NAME* from mfs.`/tmp/json/a`;
> >> +---+++
> >> | Name  | name0  | NAME1  |
> >> +---+++
> >> | ABC   | ABC| ABC|
> >> +---+---

[GitHub] drill pull request: DRILL-4504: Create an event loop for each of [...

2016-03-14 Thread jacques-n

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/429#discussion_r56082440
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -74,73 +74,148 @@
 /**
  * Thin wrapper around a UserClient that handles connect/close and transforms
  * String into ByteBuf.
+ *
+ * Use the builder class ({@link DrillClient.Builder}) to build objects of this class.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection bypasses zookeeper and connects directly to a drillbit
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // if the following variables are set during construction, they are not overridden during or after #connect call
+  // otherwise, they are set to defaults during #connect call
   private EventLoopGroup eventLoopGroup;
   private ExecutorService executor;
+  private boolean supportComplexTypes;
+
+  // the following variables are set during connection, and must not be overridden later
+  private UserClient client;
+  private UserProperties props;
+  private volatile ClusterCoordinator clusterCoordinator;
--- End diff --

The query interface is expected to be threadsafe. Connect and close are not.
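
As a rough illustration of the ownership comments in the diff above ("checks
if this client owns these resources", "set to defaults during #connect
call"), here is a minimal Java sketch. `OwnedResourceClient` is a
hypothetical stand-in, not DrillClient, and it only models the executor. Like
the class under review, its connect and close are not meant to be called
concurrently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OwnedResourceClient implements AutoCloseable {
    private final boolean ownsExecutor;   // true when the caller did not supply an executor
    private ExecutorService executor;     // supplied at construction, or defaulted in connect()

    OwnedResourceClient(ExecutorService suppliedOrNull) {
        this.executor = suppliedOrNull;
        this.ownsExecutor = (suppliedOrNull == null);
    }

    /** Fills in a default executor only when none was supplied; never replaces a supplied one. */
    void connect() {
        if (executor == null) {
            executor = Executors.newSingleThreadExecutor();
        }
    }

    ExecutorService executor() {
        return executor;
    }

    /** Shuts down the executor only if this client created it. */
    @Override
    public void close() {
        if (ownsExecutor && executor != null) {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        OwnedResourceClient client = new OwnedResourceClient(shared);
        client.connect();
        client.close();
        // The caller-supplied executor is left running for its real owner.
        System.out.println(shared.isShutdown());
        shared.shutdown();
    }
}
```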



Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Jacques Nadeau

+1 (binding)

- Download src tgz and build and test
- Download binary tgz, test execution of a number of queries and verify
profiles
- Enable socket level logging and confirm new planning phase + time logging




--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Mar 14, 2016 at 1:45 PM, Chun Chang  wrote:

> +1 (non-binding)
>
> -ran functional and advanced automation
>
> On Mon, Mar 14, 2016 at 1:09 PM, Sudheesh Katkam 
> wrote:
>
> > +1 (non-binding)
> >
> > * downloaded and built from source tar-ball; ran unit tests successfully
> > on Ubuntu
> > * ran simple queries (including cancellations) in embedded mode on Mac;
> > verified states in web UI
> > * ran simple queries (including cancellations) on a 3 node cluster;
> > verified states in web UI
> >
> > * tested maven artifacts (drill-jdbc) using a sample application <
> > https://github.com/sudheeshkatkam/drill-example>.
> > This application is based on DrillClient, and not JDBC API. I had to make
> > two changes for this application to work (i.e. not backward compatible).
> > However, these changes are not related to this release (commits
> > responsible: 1fde9bb <
> >
> https://github.com/apache/drill/commit/1fde9bb1505f04e0b0a1afb542a1aa5dfd20ed1b
> >
> > and de00881 <
> >
> https://github.com/apache/drill/commit/de008810c815e46e6f6e5d13ad0b9a23e705b13a
> >).
> > We should have a conversation about what constitutes public API and
> changes
> > to this API on a separate thread.
> >
> > Thank you,
> > Sudheesh
> >
> > > On Mar 14, 2016, at 12:04 PM, Abhishek Girish <
> abhishek.gir...@gmail.com>
> > wrote:
> > >
> > > +1 (non-binding)
> > >
> > > - Tested Drill in distributed mode (built with MapR profile).
> > > - Ran functional tests from Drill-Test-Framework [1]
> > > - Tested Web UI (basic sanity)
> > > - Tested Sqlline
> > >
> > > Looks good.
> > >
> > >
> > > [1] https://github.com/mapr/drill-test-framework
> > >
> > > On Mon, Mar 14, 2016 at 11:23 AM, Venki Korukanti <
> > venki.koruka...@gmail.com
> > >> wrote:
> > >
> > >> +1
> > >>
> > >> Installed tar.gz on a 3 node cluster.
> > >> Ran queries on data located in HDFS
> > >> Enabled auth in WebUI, ran few queries and, verified auth and querying
> > >> works fine
> > >> Logged bugs for 2 minor issues/improvements (DRILL-4508 & DRILL-4509)
> > >>
> > >> Thanks
> > >> Venki
> > >>
> > >> On Mon, Mar 14, 2016 at 10:56 AM, Norris Lee 
> wrote:
> > >>
> > >>> +1 (Non-binding)
> > >>>
> > >>> Build from source on CentOS. Tested the ODBC driver with queries
> > against
> > >>> hive and DFS (json, parquet, tsv, csv, directories).
> > >>>
> > >>> Norris
> > >>>
> > >>> -Original Message-
> > >>> From: Hsuan Yi Chu [mailto:hyi...@maprtech.com]
> > >>> Sent: Monday, March 14, 2016 10:42 AM
> > >>> To: dev@drill.apache.org; adityakish...@gmail.com
> > >>> Subject: Re: [VOTE] Release Apache Drill 1.6.0 - rc0
> > >>>
> > >>> +1
> > >>> mvn clean install on linux vm; Tried some queries; Looks good.
> > >>>
> > >>> On Mon, Mar 14, 2016 at 9:58 AM, Aditya 
> > wrote:
> > >>>
> >  While I did verify the signature and structure of the maven
> artifacts,
> >  I think Jacques was referring to verify the functionality, which I
> > have
> > >>> not.
> > 
> >  On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra 
> > >>> wrote:
> > 
> > > Aditya has verified the maven artifacts. Would it make sense to
> > > extend
> >  the
> > > vote by another day to let more people verify the release?
> > >
> > >
> > >
> > > On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau <
> jacq...@dremio.com>
> > > wrote:
> > >
> > >> I haven't had a chance to validate yet.  Has anyone checked the
> > >> maven artifacts yet?
> > >> On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
> > >>
> > >>> +1 (binding).
> > >>>
> > >>> * Verified checksum and signature of all release artifacts in[1]
> > >>> and
> > >> maven
> > >>> artifacts in [2] and the artifacts are signed using Parth's
> > >>> public
> >  key
> > >> (ID
> > >>> 9BAA73B0).
> > >>> * Verified that build and tests pass using the source artifact.
> > >>> * Verified that Drill can be launched in embedded mode using the
> > >>> convenience binary release.
> > >>> * Ran sample queries using classpath storage plugin.
> > >>>
> > >>> p.s. Have enhanced the release verification script [3] to allow
> > > automatic
> > >>> download and verification of release artifacts through the pull
> >  request
> > >>> 249[4]. Will merge if someone can review it.
> > >>>
> > >>> [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > >>> [2]
> > >>
> https://repository.apache.org/content/repositories/orgapachedrill-
> > >> 1030
> > >>> [3]
> > >
> https://github.com/apache/drill/blob/master/tools/verify_re

Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Jinfeng Ni

Abhishek

Great question. Here is what I understand regarding the case sensitive policy.

Drill's case sensitivity policy (case insensitive and case preserving)
applies to the execution engine in Drill; it does not impose that
policy on every storage plugin. A storage plugin can
decide on and implement its own policy.

Why would pushdown affect case sensitivity when querying HBase?
Without project pushdown, the HBase storage plugin returns all the
data, and it's up to Drill's Project operator to apply the
case insensitive policy.  With project pushdown, Drill passes
the list of column names to the HBase storage plugin, and HBase applies
its own case sensitivity policy when scanning the data.

Adding an option to make a case sensitive storage plugin honor the case
insensitive policy seems to be a good idea. The question is whether
the underlying storage (like HBase) will support such a mode.
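
The difference between the two paths can be sketched in a few lines of Java.
This is only a toy model of the explanation above, using the three rows from
Abhishek's JSON example; `scanWithPushdown` and `projectAfterFullScan` are
hypothetical names, and nothing here reflects the actual HBase plugin or
Project operator code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PushdownCaseSketch {
    /** Builds one row from alternating column-name/value arguments. */
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put(kv[i], kv[i + 1]);
        }
        return r;
    }

    /** The three rows from the JSON example in this thread, reduced to _id and the name field. */
    static List<Map<String, String>> sampleRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("_id", "ID1", "Name", "ABC"));
        rows.add(row("_id", "ID2", "name", "PQR"));
        rows.add(row("_id", "ID3", "NAME", "XYZ"));
        return rows;
    }

    /** With pushdown: the column name goes to the store, which matches it exactly. */
    static List<String> scanWithPushdown(List<Map<String, String>> rows, String column) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> r : rows) {
            if (r.containsKey(column)) {
                out.add(r.get(column));
            }
        }
        return out;
    }

    /** Without pushdown: the store returns everything and the engine matches names ignoring case. */
    static List<String> projectAfterFullScan(List<Map<String, String>> rows, String column) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> r : rows) {
            for (Map.Entry<String, String> e : r.entrySet()) {
                if (e.getKey().equalsIgnoreCase(column)) {
                    out.add(e.getValue());
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = sampleRows();
        System.out.println(scanWithPushdown(rows, "Name"));     // [ABC]
        System.out.println(scanWithPushdown(rows, "nAME"));     // []
        System.out.println(projectAfterFullScan(rows, "nAME")); // [ABC, PQR, XYZ]
    }
}
```

The same query text yields different rows depending only on where the name
matching happens, which is the inconsistency Zelaine points out.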






On Mon, Mar 14, 2016 at 2:09 PM, Zelaine Fong  wrote:
> Abhishek,
>
> I guess you're arguing that Drill's current behavior of honoring the case
> sensitive nature of the underlying data source (in this case, HBase and
> MapR-DB) will be confusing for Drill users who are accustomed to  Drill's
> case insensitive behavior.
>
> I can see arguments both ways.
>
> But the part I think is confusing is that the behavior differs depending on
> whether or not projections and filters are pushed down to the data source.
> If the push down is done, then the behavior is case sensitive
> (corresponding to the data source).  But if pushdown doesn't happen, then
> the behavior is case insensitive.  That difference seems inconsistent and
> undesirable -- unless you argue that there are instances where you would
> want one behavior vs the other.  But it seems like that should be
> orthogonal and separate from whether pushdowns are applied.
>
> -- Zelaine
>
> On Mon, Mar 14, 2016 at 1:40 AM, Abhishek Girish  wrote:
>
>> Hello all,
>>
>> As I understand, Drill by design is case-insensitive, w.r.t column names
>> within a table or file [1]. While this provides great flexibility and works
>> well with many data-sources, there are issues when working with
>> case-sensitive data-sources such as HBase / MapR-DB.
>>
>> Consider the following JSON file:
>>
>> {"_id": "ID1",
>>  *"Name"* : "ABC",
>>  "Age" : "25",
>>  "Phone" : null
>> }
>> {"_id": "ID2",
>>  *"name"* : "PQR",
>>  "Age" : "30",
>>  "Phone" : "408-123-456"
>> }
>> {"_id": "ID3",
>>  *"NAME"* : "XYZ",
>>  "Phone" : ""
>> }
>>
>> Note that the case of the name field within the JSON file is of mixed-case.
>>
>> From Drill, while querying the JSON file directly (or corresponding content
>> in Parquet or Text formats), we get results which we as Drill users have
>> come to expect:
>>
>> > select NAME from mfs.`/tmp/json/a.json`;
>> +---+
>> | NAME  |
>> +---+
>> | ABC   |
>> | PQR   |
>> | XYZ   |
>> +---+
>>
>>
>> However, while querying a case-sensitive datasource (*with pushdown
>> enabled*)
>> the following results are returned. The case provided in the query text is
>> honored and would determine the results. This could come as a *slight
>> surprise to certain Drill users* exploring/migrating to new Databases
>> (using new Storage / Format plugins within Drill)
>>
>> > select *Name* from mfs.`/tmp/json/a`;
>> +---+
>> | Name  |
>> +---+
>> | ABC   |
>> +---+
>>
>> > select *name* from mfs.`/tmp/json/a`;
>> +---+
>> | name  |
>> +---+
>> | PQR   |
>> +---+
>>
>> > select *NAME* from mfs.`/tmp/json/a`;
>> +---+
>> | NAME  |
>> +---+
>> | XYZ   |
>> +---+
>>
>>
>> > select *nAME* from mfs.`/tmp/json/a`;
>> +---+
>> | nAME  |
>> +---+
>> +---+
>> No rows selected
>>
>> There is no easy way to get all matching rows (irrespective of the case of
>> the column name). In the above example, the first row matching the provided
>> case is returned.
>>
>>
>> > select *Name, name, NAME* from mfs.`/tmp/json/a`;
>> +---+++
>> | Name  | name0  | NAME1  |
>> +---+++
>> | ABC   | ABC| ABC|
>> +---+++
>>
>> > select *NAME, Name, name* from mfs.`/tmp/json/a`;
>> +---+++
>> | NAME  | Name0  | name1  |
>> +---+++
>> | XYZ   | XYZ| XYZ|
>> +---+++
>>
>>
>> If Pushdown features are disabled, the behavior seen above would indeed
>> match JSON files. However, this could come at a cost of not fully utilizing
>> the power of the underlying data-source, and could lead to performance
>> issues.
>>
>> *In-consistent Results can happen when:*
>>
>> (1) Dataset has mixed-cases for fields. Example seen above. While this
>> might not be very common, the concerns are still valid*, *since substantial
>> Drill users are exploring Drill for ETL cases where Data is not completely
>> sanitized.
>>
>> (2) Data is consistent w.r.t case, but the query text has non-matching
>> case. While some coul

[GitHub] drill pull request: DRILL-4504: Create an event loop for each of [...

2016-03-14 Thread sudheeshkatkam

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/429#discussion_r56080621
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -74,73 +74,148 @@
 /**
  * Thin wrapper around a UserClient that handles connect/close and transforms
  * String into ByteBuf.
+ *
+ * Use the builder class ({@link DrillClient.Builder}) to build objects of this class.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection bypasses zookeeper and connects directly to a drillbit
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // if the following variables are set during construction, they are not overridden during or after #connect call
+  // otherwise, they are set to defaults during #connect call
   private EventLoopGroup eventLoopGroup;
   private ExecutorService executor;
+  private boolean supportComplexTypes;
+
+  // the following variables are set during connection, and must not be overridden later
+  private UserClient client;
+  private UserProperties props;
+  private volatile ClusterCoordinator clusterCoordinator;
--- End diff --

I'll remove the modifier and document that the class is not thread-safe.



Re: Working with Case-Sensitive Data-sources

2016-03-14 Thread Zelaine Fong

Abhishek,

I guess you're arguing that Drill's current behavior of honoring the
case-sensitive nature of the underlying data source (in this case, HBase and
MapR-DB) will be confusing for Drill users who are accustomed to Drill's
case-insensitive behavior.

I can see arguments both ways.

But the part I think is confusing is that the behavior differs depending on
whether or not projections and filters are pushed down to the data source.
If the push down is done, then the behavior is case sensitive
(corresponding to the data source).  But if pushdown doesn't happen, then
the behavior is case insensitive.  That difference seems inconsistent and
undesirable -- unless you argue that there are instances where you would
want one behavior vs the other.  But it seems like that should be
orthogonal and separate from whether pushdowns are applied.

-- Zelaine

On Mon, Mar 14, 2016 at 1:40 AM, Abhishek Girish  wrote:

> Hello all,
>
> As I understand, Drill by design is case-insensitive, w.r.t column names
> within a table or file [1]. While this provides great flexibility and works
> well with many data-sources, there are issues when working with
> case-sensitive data-sources such as HBase / MapR-DB.
>
> Consider the following JSON file:
>
> {"_id": "ID1",
>  *"Name"* : "ABC",
>  "Age" : "25",
>  "Phone" : null
> }
> {"_id": "ID2",
>  *"name"* : "PQR",
>  "Age" : "30",
>  "Phone" : "408-123-456"
> }
> {"_id": "ID3",
>  *"NAME"* : "XYZ",
>  "Phone" : ""
> }
>
> Note that the case of the name field within the JSON file is of mixed-case.
>
> From Drill, while querying the JSON file directly (or corresponding content
> in Parquet or Text formats), we get results which we as Drill users have
> come to expect:
>
> > select NAME from mfs.`/tmp/json/a.json`;
> +---+
> | NAME  |
> +---+
> | ABC   |
> | PQR   |
> | XYZ   |
> +---+
>
>
> However, while querying a case-sensitive datasource (*with pushdown
> enabled*)
> the following results are returned. The case provided in the query text is
> honored and would determine the results. This could come as a *slight
> surprise to certain Drill users* exploring/migrating to new Databases
> (using new Storage / Format plugins within Drill)
>
> > select *Name* from mfs.`/tmp/json/a`;
> +---+
> | Name  |
> +---+
> | ABC   |
> +---+
>
> > select *name* from mfs.`/tmp/json/a`;
> +---+
> | name  |
> +---+
> | PQR   |
> +---+
>
> > select *NAME* from mfs.`/tmp/json/a`;
> +---+
> | NAME  |
> +---+
> | XYZ   |
> +---+
>
>
> > select *nAME* from mfs.`/tmp/json/a`;
> +---+
> | nAME  |
> +---+
> +---+
> No rows selected
>
> There is no easy way to get all matching rows (irrespective of the case of
> the column name). In the above example, the first row matching the provided
> case is returned.
>
>
> > select *Name, name, NAME* from mfs.`/tmp/json/a`;
> +---+++
> | Name  | name0  | NAME1  |
> +---+++
> | ABC   | ABC| ABC|
> +---+++
>
> > select *NAME, Name, name* from mfs.`/tmp/json/a`;
> +---+++
> | NAME  | Name0  | name1  |
> +---+++
> | XYZ   | XYZ| XYZ|
> +---+++
>
>
> If Pushdown features are disabled, the behavior seen above would indeed
> match JSON files. However, this could come at a cost of not fully utilizing
> the power of the underlying data-source, and could lead to performance
> issues.
>
> *In-consistent Results can happen when:*
>
> (1) Dataset has mixed-cases for fields. Example seen above. While this
> might not be very common, the concerns are still valid*, *since substantial
> Drill users are exploring Drill for ETL cases where Data is not completely
> sanitized.
>
> (2) Data is consistent w.r.t case, but the query text has non-matching
> case. While some could term this as user error, it could still cause issues
> when users, applications or the underlying datasources change.
>
> In both the above cases, Drill would silently perform the query and return
> results which could be either *none, partial, complete/correct or entirely*
> *wrong*.
>
> Some specific questions:
>
> (1) *Supporting Case-In-sensitive Behavior for Case-Sensitive Data-sources.
> *For users who prefer the flexibility, how can Drill ensure that the
> underlying data-source can return case-insensitive results.
>
> (2) *Supporting Case-Sensitive Behavior. *How can Drill OPTIONALLY support
> case-sensitive behavior for data-sources. Users coming from case-sensitive
> databases might want results matching the provided case. Example using the
> above data:
>
> > select _id, *Name, name, NAME* from mfs.`/tmp/json/a`;
> +--+---+++
>
> | _id  | Name  | name   | NAME   |
> +--+---+++
> | ID1  | ABC   | null   | null   |
> +--+---+++
> | ID2  | null  | PQR| null   |
> +--+---+++
> | ID3  | null  | null   | XYZ|
> +

[GitHub] drill pull request: DRILL-4479: Use varchar for default column whe...

2016-03-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/420



Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Chun Chang

+1 (non-binding)

-ran functional and advanced automation

On Mon, Mar 14, 2016 at 1:09 PM, Sudheesh Katkam 
wrote:

> +1 (non-binding)
>
> * downloaded and built from source tar-ball; ran unit tests successfully
> on Ubuntu
> * ran simple queries (including cancellations) in embedded mode on Mac;
> verified states in web UI
> * ran simple queries (including cancellations) on a 3 node cluster;
> verified states in web UI
>
> * tested maven artifacts (drill-jdbc) using a sample application <
> https://github.com/sudheeshkatkam/drill-example>.
> This application is based on DrillClient, and not JDBC API. I had to make
> two changes for this application to work (i.e. not backward compatible).
> However, these changes are not related to this release (commits
> responsible: 1fde9bb <
> https://github.com/apache/drill/commit/1fde9bb1505f04e0b0a1afb542a1aa5dfd20ed1b>
> and de00881 <
> https://github.com/apache/drill/commit/de008810c815e46e6f6e5d13ad0b9a23e705b13a>).
> We should have a conversation about what constitutes public API and changes
> to this API on a separate thread.
>
> Thank you,
> Sudheesh
>
> > On Mar 14, 2016, at 12:04 PM, Abhishek Girish 
> wrote:
> >
> > +1 (non-binding)
> >
> > - Tested Drill in distributed mode (built with MapR profile).
> > - Ran functional tests from Drill-Test-Framework [1]
> > - Tested Web UI (basic sanity)
> > - Tested Sqlline
> >
> > Looks good.
> >
> >
> > [1] https://github.com/mapr/drill-test-framework
> >
> > On Mon, Mar 14, 2016 at 11:23 AM, Venki Korukanti <
> venki.koruka...@gmail.com
> >> wrote:
> >
> >> +1
> >>
> >> Installed tar.gz on a 3 node cluster.
> >> Ran queries on data located in HDFS
> >> Enabled auth in WebUI, ran few queries and, verified auth and querying
> >> works fine
> >> Logged bugs for 2 minor issues/improvements (DRILL-4508 & DRILL-4509)
> >>
> >> Thanks
> >> Venki
> >>
> >> On Mon, Mar 14, 2016 at 10:56 AM, Norris Lee  wrote:
> >>
> >>> +1 (Non-binding)
> >>>
> >>> Build from source on CentOS. Tested the ODBC driver with queries
> against
> >>> hive and DFS (json, parquet, tsv, csv, directories).
> >>>
> >>> Norris
> >>>
> >>> -Original Message-
> >>> From: Hsuan Yi Chu [mailto:hyi...@maprtech.com]
> >>> Sent: Monday, March 14, 2016 10:42 AM
> >>> To: dev@drill.apache.org; adityakish...@gmail.com
> >>> Subject: Re: [VOTE] Release Apache Drill 1.6.0 - rc0
> >>>
> >>> +1
> >>> mvn clean install on linux vm; Tried some queries; Looks good.
> >>>
> >>> On Mon, Mar 14, 2016 at 9:58 AM, Aditya 
> wrote:
> >>>
> >>>> While I did verify the signature and structure of the maven artifacts,
> >>>> I think Jacques was referring to verify the functionality, which I
> have
> >>> not.
> >>>>
> >>>> On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra 
> >>> wrote:
> >>>>
> >>>>> Aditya has verified the maven artifacts. Would it make sense to
> >>>>> extend
> >>>> the
> >>>>> vote by another day to let more people verify the release?
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau 
> >>>>> wrote:
> >>>>>
> >>>>>> I haven't had a chance to validate yet.  Has anyone checked the
> >>>>>> maven artifacts yet?
> >>>>>> On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
> >>>>>>
> >>>>>>> +1 (binding).
> >>>>>>>
> >>>>>>> * Verified checksum and signature of all release artifacts in[1]
> >>>>>>> and
> >>>>>> maven
> >>>>>>> artifacts in [2] and the artifacts are signed using Parth's
> >>>>>>> public
> >>>> key
> >>>>>> (ID
> >>>>>>> 9BAA73B0).
> >>>>>>> * Verified that build and tests pass using the source artifact.
> >>>>>>> * Verified that Drill can be launched in embedded mode using the
> >>>>>>> convenience binary release.
> >>>>>>> * Ran sample queries using classpath storage plugin.
> >>>>>>>
> >>>>>>> p.s. Have enhanced the release verification script [3] to allow
> >>>>> automatic
> >>>>>>> download and verification of release artifacts through the pull
> >>>> request
> >>>>>>> 249[4]. Will merge if someone can review it.
> >>>>>>>
> >>>>>>> [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> >>>>>>> [2]
> >>>>>> https://repository.apache.org/content/repositories/orgapachedrill-
> >>>>>> 1030
> >>>>>>> [3]
> >>>>> https://github.com/apache/drill/blob/master/tools/verify_release.sh
> >>>>>>> [4] https://github.com/apache/drill/pull/249
> >>>>>>>
> >>>>>>> On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
> >>>>>>> adene...@maprtech.com
> >>>>>>>> wrote:
> >>>>>>>
> >>>>>>>> +1
> >>>>>>>>
> >>>>>>>> built from source with mapr profile and deployed on 2 nodes,
> >>>>>>>> then
> >>>> run
> >>>>>>>> window functions from Drill's test framework. Also took a
> >>>>>>>> quick
> >>>> look
> >>>>> at
> >>>>>>> the
> >>>>>>>> WebUI. Everything looks fine
> >>>>>>>>
> >>>>>>>> On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra
> >>>>>>>>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Added GPG key
> >>>>>>>>>
> >>>>>>>>> On S

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Sudheesh Katkam

+1 (non-binding)

* downloaded and built from source tar-ball; ran unit tests successfully on 
Ubuntu
* ran simple queries (including cancellations) in embedded mode on Mac; 
verified states in web UI
* ran simple queries (including cancellations) on a 3 node cluster; verified 
states in web UI

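* tested maven artifacts (drill-jdbc) using a sample application 
<https://github.com/sudheeshkatkam/drill-example>.
This application is based on DrillClient, and not the JDBC API. I had to make two 
changes for this application to work (i.e. not backward compatible). However, 
these changes are not related to this release (commits responsible: 1fde9bb 
<https://github.com/apache/drill/commit/1fde9bb1505f04e0b0a1afb542a1aa5dfd20ed1b>
and de00881 
<https://github.com/apache/drill/commit/de008810c815e46e6f6e5d13ad0b9a23e705b13a>).
We should have a conversation about what constitutes public API and changes to 
this API on a separate thread.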

Thank you,
Sudheesh

> On Mar 14, 2016, at 12:04 PM, Abhishek Girish  
> wrote:
> 
> +1 (non-binding)
> 
> - Tested Drill in distributed mode (built with MapR profile).
> - Ran functional tests from Drill-Test-Framework [1]
> - Tested Web UI (basic sanity)
> - Tested Sqlline
> 
> Looks good.
> 
> 
> [1] https://github.com/mapr/drill-test-framework
> 
> On Mon, Mar 14, 2016 at 11:23 AM, Venki Korukanti <venki.koruka...@gmail.com> wrote:
> 
>> +1
>> 
>> Installed tar.gz on a 3 node cluster.
>> Ran queries on data located in HDFS
>> Enabled auth in WebUI, ran few queries and, verified auth and querying
>> works fine
>> Logged bugs for 2 minor issues/improvements (DRILL-4508
>>  & DRILL-4509
>> )
>> 
>> Thanks
>> Venki
>> 
>> On Mon, Mar 14, 2016 at 10:56 AM, Norris Lee  wrote:
>> 
>>> +1 (Non-binding)
>>> 
>>> Build from source on CentOS. Tested the ODBC driver with queries against
>>> hive and DFS (json, parquet, tsv, csv, directories).
>>> 
>>> Norris
>>> 
>>> -Original Message-
>>> From: Hsuan Yi Chu [mailto:hyi...@maprtech.com]
>>> Sent: Monday, March 14, 2016 10:42 AM
>>> To: dev@drill.apache.org; adityakish...@gmail.com
>>> Subject: Re: [VOTE] Release Apache Drill 1.6.0 - rc0
>>> 
>>> +1
>>> mvn clean install on linux vm; Tried some queries; Looks good.
>>> 
>>> On Mon, Mar 14, 2016 at 9:58 AM, Aditya  wrote:
>>> 
>>>> While I did verify the signature and structure of the maven artifacts,
>>>> I think Jacques was referring to verify the functionality, which I have
>>> not.
>>>>
>>>> On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra 
>>> wrote:
>>>>
>>>>> Aditya has verified the maven artifacts. Would it make sense to
>>>>> extend
>>>> the
>>>>> vote by another day to let more people verify the release?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau 
>>>>> wrote:
>>>>>
>>>>>> I haven't had a chance to validate yet.  Has anyone checked the
>>>>>> maven artifacts yet?
>>>>>> On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
>>>>>>
>>>>>>> +1 (binding).
>>>>>>>
>>>>>>> * Verified checksum and signature of all release artifacts in[1]
>>>>>>> and
>>>>>> maven
>>>>>>> artifacts in [2] and the artifacts are signed using Parth's
>>>>>>> public
>>>> key
>>>>>> (ID
>>>>>>> 9BAA73B0).
>>>>>>> * Verified that build and tests pass using the source artifact.
>>>>>>> * Verified that Drill can be launched in embedded mode using the
>>>>>>> convenience binary release.
>>>>>>> * Ran sample queries using classpath storage plugin.
>>>>>>>
>>>>>>> p.s. Have enhanced the release verification script [3] to allow
>>>>> automatic
>>>>>>> download and verification of release artifacts through the pull
>>>> request
>>>>>>> 249[4]. Will merge if someone can review it.
>>>>>>>
>>>>>>> [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
>>>>>>> [2]
>>>>>> https://repository.apache.org/content/repositories/orgapachedrill-
>>>>>> 1030
>>>>>>> [3]
>>>>> https://github.com/apache/drill/blob/master/tools/verify_release.sh
>>>>>>> [4] https://github.com/apache/drill/pull/249
>>>>>>>
>>>>>>> On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
>>>>>>> adene...@maprtech.com
>>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> built from source with mapr profile and deployed on 2 nodes,
>>>>>>>> then
>>>> run
>>>>>>>> window functions from Drill's test framework. Also took a
>>>>>>>> quick
>>>> look
>>>>> at
>>>>>>> the
>>>>>>>> WebUI. Everything looks fine
>>>>>>>>
>>>>>>>> On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra
>>>>>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Added GPG key
>>>>>>>>>
>>>>>>>>> On Sat, Mar 12, 2016 at 6:48 PM, Aditya
>>>>>>>>>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I couldn't find your signing keys[1].
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/apache/drill/blob/master/KEYS
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra <
>>>> par...@apache.org
>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello all,
>>>>>>>>>>>
>>>>>>>>>>> I'd like to propose the

[GitHub] drill pull request: DRILL-4504: Create an event loop for each of [...

2016-03-14 Thread hnfgns

Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/429#discussion_r56060382
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -74,73 +74,148 @@
 /**
  * Thin wrapper around a UserClient that handles connect/close and 
transforms
  * String into ByteBuf.
+ *
+ * Use the builder class ({@link DrillClient.Builder}) to build objects of 
this class.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection 
bypasses zookeeper and connects directly to a drillbit
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // if the following variables are set during construction, they are not 
overridden during or after #connect call
+  // otherwise, they are set to defaults during #connect call
   private EventLoopGroup eventLoopGroup;
   private ExecutorService executor;
+  private boolean supportComplexTypes;
+
+  // the following variables are set during connection, and must not be 
overridden later
+  private UserClient client;
+  private UserProperties props;
+  private volatile ClusterCoordinator clusterCoordinator;
+  private volatile boolean connected; // = false
 
-  public DrillClient() throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient() {
 this(DrillConfig.create(), false);
   }
 
-  public DrillClient(boolean isDirect) throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(boolean isDirect) {
 this(DrillConfig.create(), isDirect);
   }
 
-  public DrillClient(String fileName) throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(String fileName) {
 this(DrillConfig.create(fileName), false);
   }
 
-  public DrillClient(DrillConfig config) throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config) {
 this(config, null, false);
   }
 
-  public DrillClient(DrillConfig config, boolean isDirect)
-  throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config, boolean isDirect) {
 this(config, null, isDirect);
   }
 
-  public DrillClient(DrillConfig config, ClusterCoordinator coordinator)
-throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config, ClusterCoordinator coordinator) {
 this(config, coordinator, null, false);
   }
 
-  public DrillClient(DrillConfig config, ClusterCoordinator coordinator, 
boolean isDirect)
-throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config, ClusterCoordinator coordinator, 
boolean isDirect) {
 this(config, coordinator, null, isDirect);
   }
 
-  public DrillClient(DrillConfig config, ClusterCoordinator coordinator, 
BufferAllocator allocator)
-  throws OutOfMemoryException {
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */

[GitHub] drill pull request: DRILL-4504: Create an event loop for each of [...

2016-03-14 Thread hnfgns

Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/429#discussion_r56058208
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -74,73 +74,148 @@
 /**
  * Thin wrapper around a UserClient that handles connect/close and 
transforms
  * String into ByteBuf.
+ *
+ * Use the builder class ({@link DrillClient.Builder}) to build objects of 
this class.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection 
bypasses zookeeper and connects directly to a drillbit
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // if the following variables are set during construction, they are not 
overridden during or after #connect call
+  // otherwise, they are set to defaults during #connect call
   private EventLoopGroup eventLoopGroup;
   private ExecutorService executor;
+  private boolean supportComplexTypes;
+
+  // the following variables are set during connection, and must not be 
overridden later
+  private UserClient client;
+  private UserProperties props;
+  private volatile ClusterCoordinator clusterCoordinator;
--- End diff --

-0. 

Why volatile here? Is DrillClient meant to be thread safe? If so, we seem 
to have more work to do: #close for instance. Otherwise volatile seems totally 
irrelevant.
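
To make the concern above concrete, here is a minimal sketch of why a volatile
flag by itself would not make #close safe under concurrent callers, and one
possible alternative. The class SketchClient and its fields are hypothetical
stand-ins, not the actual DrillClient code; the sketch only assumes that close
must release resources exactly once.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a client; not the actual DrillClient code. */
final class SketchClient implements AutoCloseable {
  // A volatile boolean only guarantees visibility. A check-then-set such as
  // "if (!closed) { closed = true; release(); }" can still run twice when two
  // threads interleave. compareAndSet makes the decision atomic.
  private final AtomicBoolean closed = new AtomicBoolean(false);

  // Counts how often resources were released; stands in for closing the
  // allocator, event loop group, and so on.
  private final AtomicInteger releases = new AtomicInteger();

  @Override
  public void close() {
    if (closed.compareAndSet(false, true)) {
      releases.incrementAndGet();
    }
  }

  int releaseCount() {
    return releases.get();
  }
}
```

If the class is only ever used from one thread, neither volatile nor the atomic
flag is needed, which is the other half of the question.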


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Abhishek Girish

+1 (non-binding)

- Tested Drill in distributed mode (built with MapR profile).
- Ran functional tests from Drill-Test-Framework [1]
- Tested Web UI (basic sanity)
- Tested Sqlline

Looks good.


[1] https://github.com/mapr/drill-test-framework

On Mon, Mar 14, 2016 at 11:23 AM, Venki Korukanti  wrote:

> +1
>
> Installed tar.gz on a 3 node cluster.
> Ran queries on data located in HDFS
> Enabled auth in WebUI, ran few queries and, verified auth and querying
> works fine
> Logged bugs for 2 minor issues/improvements (DRILL-4508
>  & DRILL-4509
> )
>
> Thanks
> Venki
>
> On Mon, Mar 14, 2016 at 10:56 AM, Norris Lee  wrote:
>
> > +1 (Non-binding)
> >
> > Build from source on CentOS. Tested the ODBC driver with queries against
> > hive and DFS (json, parquet, tsv, csv, directories).
> >
> > Norris
> >
> > -Original Message-
> > From: Hsuan Yi Chu [mailto:hyi...@maprtech.com]
> > Sent: Monday, March 14, 2016 10:42 AM
> > To: dev@drill.apache.org; adityakish...@gmail.com
> > Subject: Re: [VOTE] Release Apache Drill 1.6.0 - rc0
> >
> > +1
> > mvn clean install on linux vm; Tried some queries; Looks good.
> >
> > On Mon, Mar 14, 2016 at 9:58 AM, Aditya  wrote:
> >
> > > While I did verify the signature and structure of the maven artifacts,
> > > I think Jacques was referring to verify the functionality, which I have
> > not.
> > >
> > > On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra 
> > wrote:
> > >
> > > > Aditya has verified the maven artifacts. Would it make sense to
> > > > extend
> > > the
> > > > vote by another day to let more people verify the release?
> > > >
> > > >
> > > >
> > > > On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau 
> > > > wrote:
> > > >
> > > > > I haven't had a chance to validate yet.  Has anyone checked the
> > > > > maven artifacts yet?
> > > > > On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
> > > > >
> > > > > > +1 (binding).
> > > > > >
> > > > > > * Verified checksum and signature of all release artifacts in[1]
> > > > > > and
> > > > > maven
> > > > > > artifacts in [2] and the artifacts are signed using Parth's
> > > > > > public
> > > key
> > > > > (ID
> > > > > > 9BAA73B0).
> > > > > > * Verified that build and tests pass using the source artifact.
> > > > > > * Verified that Drill can be launched in embedded mode using the
> > > > > > convenience binary release.
> > > > > > * Ran sample queries using classpath storage plugin.
> > > > > >
> > > > > > p.s. Have enhanced the release verification script [3] to allow
> > > > automatic
> > > > > > download and verification of release artifacts through the pull
> > > request
> > > > > > 249[4]. Will merge if someone can review it.
> > > > > >
> > > > > > [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > > > > [2]
> > > > > https://repository.apache.org/content/repositories/orgapachedrill-
> > > > > 1030
> > > > > > [3]
> > > > https://github.com/apache/drill/blob/master/tools/verify_release.sh
> > > > > > [4] https://github.com/apache/drill/pull/249
> > > > > >
> > > > > > On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
> > > > > > adene...@maprtech.com
> > > > > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > built from source with mapr profile and deployed on 2 nodes,
> > > > > > > then
> > > run
> > > > > > > window functions from Drill's test framework. Also took a
> > > > > > > quick
> > > look
> > > > at
> > > > > > the
> > > > > > > WebUI. Everything looks fine
> > > > > > >
> > > > > > > On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra
> > > > > > > 
> > > > > > wrote:
> > > > > > >
> > > > > > >> Added GPG key
> > > > > > >>
> > > > > > >> On Sat, Mar 12, 2016 at 6:48 PM, Aditya
> > > > > > >> 
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > I couldn't find your signing keys[1].
> > > > > > >> >
> > > > > > >> > [1] https://github.com/apache/drill/blob/master/KEYS
> > > > > > >> >
> > > > > > >> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra <
> > > par...@apache.org
> > > > >
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > > Hello all,
> > > > > > >> > >
> > > > > > >> > > I'd like to propose the zeroth release candidate (rc0) of
> > > Apache
> > > > > > >> Drill,
> > > > > > >> > > version 1.6.0.
> > > > > > >> > > It covers a total of 44 resolved JIRAs [1].
> > > > > > >> > > Thanks to everyone who contributed to this release.
> > > > > > >> > >
> > > > > > >> > > The tarball artifacts are hosted at [2] and the maven
> > > artifacts
> > > > > are
> > > > > > >> > hosted
> > > > > > >> > > at [3].
> > > > > > >> > >
> > > > > > >> > > This release candidate is based on commit
> > > > > > >> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
> > > > > > >> > >
> > > > > > >> > > The vote will be open for the next ~72 hours ending at
> > > > > > >> > > 7:10 AM
> > > > > > >> Pacific,
> > > > > > >> > > March
> > > > > > >> > > 14, 2016.
> > > > > > >> > >
>

[jira] [Created] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dril

2016-03-14 Thread Chun Chang (JIRA)

Chun Chang created DRILL-4510:
-

             Summary: IllegalStateException: Failure while reading vector.  
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was 
holding vector class org.apache.drill.exec.vector.NullableVarCharVector
                 Key: DRILL-4510
                 URL: https://issues.apache.org/jira/browse/DRILL-4510
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types
            Reporter: Chun Chang
            Priority: Critical


Hit the following regression while running advanced automation. The regression 
was introduced between commits b979bebe83d7017880b0763adcbf8eb80acfcee8 and 
1f23b89623c72808f2ee866cec9b4b8a48929d68.

{noformat}
Execution Failures:
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
Query: 
-- start query 66 in stream 0 using template query66.tpl 
SELECT w_warehouse_name, 
   w_warehouse_sq_ft, 
   w_city, 
   w_county, 
   w_state, 
   w_country, 
   ship_carriers, 
   year1,
   Sum(jan_sales) AS jan_sales, 
   Sum(feb_sales) AS feb_sales, 
   Sum(mar_sales) AS mar_sales, 
   Sum(apr_sales) AS apr_sales, 
   Sum(may_sales) AS may_sales, 
   Sum(jun_sales) AS jun_sales, 
   Sum(jul_sales) AS jul_sales, 
   Sum(aug_sales) AS aug_sales, 
   Sum(sep_sales) AS sep_sales, 
   Sum(oct_sales) AS oct_sales, 
   Sum(nov_sales) AS nov_sales, 
   Sum(dec_sales) AS dec_sales, 
   Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, 
   Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, 
   Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, 
   Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, 
   Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, 
   Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, 
   Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, 
   Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, 
   Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, 
   Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, 
   Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, 
   Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, 
   Sum(jan_net)   AS jan_net, 
   Sum(feb_net)   AS feb_net, 
   Sum(mar_net)   AS mar_net, 
   Sum(apr_net)   AS apr_net, 
   Sum(may_net)   AS may_net, 
   Sum(jun_net)   AS jun_net, 
   Sum(jul_net)   AS jul_net, 
   Sum(aug_net)   AS aug_net, 
   Sum(sep_net)   AS sep_net, 
   Sum(oct_net)   AS oct_net, 
   Sum(nov_net)   AS nov_net, 
   Sum(dec_net)   AS dec_net 
FROM   (SELECT w_warehouse_name, 
   w_warehouse_sq_ft, 
   w_city, 
   w_county, 
   w_state, 
   w_country, 
   'ZOUROS' 
   || ',' 
   || 'ZHOU' AS ship_carriers, 
   d_year AS year1, 
   Sum(CASE 
 WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS jan_sales, 
   Sum(CASE 
 WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS feb_sales, 
   Sum(CASE 
 WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS mar_sales, 
   Sum(CASE 
 WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS apr_sales, 
   Sum(CASE 
 WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS may_sales, 
   Sum(CASE 
 WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity 
 ELSE 0 
   END)  AS jun_sales, 
   Sum(CASE 
 WHEN d_

[GitHub] drill pull request: DRILL-4479: Use varchar for default column whe...

2016-03-14 Thread hnfgns

Github user hnfgns commented on the pull request:

https://github.com/apache/drill/pull/420#issuecomment-196464641
  
+1



[jira] [Resolved] (DRILL-4490) Count(*) function returns as optional instead of required

2016-03-14 Thread Jinfeng Ni (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4490.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in commit: 46e3de790da8f9c6d2d18e7e40fd37c01b3b1681


> Count(*) function returns as optional instead of required
> -
>
> Key: DRILL-4490
> URL: https://issues.apache.org/jira/browse/DRILL-4490
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Krystal
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.6.0
>
>
> git.commit.id.abbrev=c8a7840
> I have the following CTAS query:
> create table test as select count(*) as col1 from cp.`tpch/orders.parquet`;
> The schema of the test table shows col1 as optional:
> message root {
>   optional int64 col1;
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Venki Korukanti

+1

Installed tar.gz on a 3 node cluster.
Ran queries on data located in HDFS
Enabled auth in WebUI, ran a few queries, and verified that auth and
querying work fine
Logged bugs for 2 minor issues/improvements (DRILL-4508 & DRILL-4509)

Thanks
Venki

On Mon, Mar 14, 2016 at 10:56 AM, Norris Lee  wrote:

> +1 (Non-binding)
>
> Build from source on CentOS. Tested the ODBC driver with queries against
> hive and DFS (json, parquet, tsv, csv, directories).
>
> Norris
>
> -Original Message-
> From: Hsuan Yi Chu [mailto:hyi...@maprtech.com]
> Sent: Monday, March 14, 2016 10:42 AM
> To: dev@drill.apache.org; adityakish...@gmail.com
> Subject: Re: [VOTE] Release Apache Drill 1.6.0 - rc0
>
> +1
> mvn clean install on linux vm; Tried some queries; Looks good.
>
> On Mon, Mar 14, 2016 at 9:58 AM, Aditya  wrote:
>
> > While I did verify the signature and structure of the maven artifacts,
> > I think Jacques was referring to verify the functionality, which I have
> not.
> >
> > On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra 
> wrote:
> >
> > > Aditya has verified the maven artifacts. Would it make sense to
> > > extend
> > the
> > > vote by another day to let more people verify the release?
> > >
> > >
> > >
> > > On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau 
> > > wrote:
> > >
> > > > I haven't had a chance to validate yet.  Has anyone checked the
> > > > maven artifacts yet?
> > > > On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
> > > >
> > > > > +1 (binding).
> > > > >
> > > > > * Verified checksum and signature of all release artifacts in[1]
> > > > > and
> > > > maven
> > > > > artifacts in [2] and the artifacts are signed using Parth's
> > > > > public
> > key
> > > > (ID
> > > > > 9BAA73B0).
> > > > > * Verified that build and tests pass using the source artifact.
> > > > > * Verified that Drill can be launched in embedded mode using the
> > > > > convenience binary release.
> > > > > * Ran sample queries using classpath storage plugin.
> > > > >
> > > > > p.s. Have enhanced the release verification script [3] to allow
> > > automatic
> > > > > download and verification of release artifacts through the pull
> > request
> > > > > 249[4]. Will merge if someone can review it.
> > > > >
> > > > > [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > > > [2]
> > > > https://repository.apache.org/content/repositories/orgapachedrill-
> > > > 1030
> > > > > [3]
> > > https://github.com/apache/drill/blob/master/tools/verify_release.sh
> > > > > [4] https://github.com/apache/drill/pull/249
> > > > >
> > > > > On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
> > > > > adene...@maprtech.com
> > > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > built from source with mapr profile and deployed on 2 nodes,
> > > > > > then
> > run
> > > > > > window functions from Drill's test framework. Also took a
> > > > > > quick
> > look
> > > at
> > > > > the
> > > > > > WebUI. Everything looks fine
> > > > > >
> > > > > > On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra
> > > > > > 
> > > > > wrote:
> > > > > >
> > > > > >> Added GPG key
> > > > > >>
> > > > > >> On Sat, Mar 12, 2016 at 6:48 PM, Aditya
> > > > > >> 
> > > > > wrote:
> > > > > >>
> > > > > >> > I couldn't find your signing keys[1].
> > > > > >> >
> > > > > >> > [1] https://github.com/apache/drill/blob/master/KEYS
> > > > > >> >
> > > > > >> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra <
> > par...@apache.org
> > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hello all,
> > > > > >> > >
> > > > > >> > > I'd like to propose the zeroth release candidate (rc0) of
> > Apache
> > > > > >> Drill,
> > > > > >> > > version 1.6.0.
> > > > > >> > > It covers a total of 44 resolved JIRAs [1].
> > > > > >> > > Thanks to everyone who contributed to this release.
> > > > > >> > >
> > > > > >> > > The tarball artifacts are hosted at [2] and the maven
> > artifacts
> > > > are
> > > > > >> > hosted
> > > > > >> > > at [3].
> > > > > >> > >
> > > > > >> > > This release candidate is based on commit
> > > > > >> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
> > > > > >> > >
> > > > > >> > > The vote will be open for the next ~72 hours ending at
> > > > > >> > > 7:10 AM
> > > > > >> Pacific,
> > > > > >> > > March
> > > > > >> > > 14, 2016.
> > > > > >> > >
> > > > > >> > > [ ] +1
> > > > > >> > > [ ] +0
> > > > > >> > > [ ] -1
> > > > > >> > >
> > > > > >> > > Here's my vote: +1
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Parth
> > > > > >> > >
> > > > > >> > > [1]
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> > https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill
> > %22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D
> > 1.6.0
> > > > > >> > > [2]
> > > > > >> > > http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/

[jira] [Created] (DRILL-4509) Ignore unknown storage plugin configs while starting Drillbit

2016-03-14 Thread Venki Korukanti (JIRA)

Venki Korukanti created DRILL-4509:
--

             Summary: Ignore unknown storage plugin configs while starting Drillbit
                 Key: DRILL-4509
                 URL: https://issues.apache.org/jira/browse/DRILL-4509
             Project: Apache Drill
          Issue Type: Bug
          Components: Server
    Affects Versions: 1.5.0
            Reporter: Venki Korukanti
            Priority: Minor
             Fix For: 1.7.0


If ZooKeeper contains a storage plugin configuration whose implementation 
cannot be found while the Drillbit is starting, the Drillbit throws an error 
and fails to restart:

{code}
Could not resolve type id 'newPlugin' into a subtype of [simple type, class 
org.apache.drill.common.logical.StoragePluginConfig]: known type ids = 
[InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file, hbase, 
hive, jdbc, kudu, mock, mongo, named]
{code}

Should we ignore such plugins with a warning in logs and continue starting 
Drillbit?
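
The proposed behavior could look roughly like the sketch below. Everything in
it is an assumption made for illustration: the class PluginLoaderSketch, the
method loadKnown, and the plain name-to-type-id map are hypothetical and are
not Drill's actual plugin registry code. It only shows the "warn and continue"
idea instead of failing on the first unknown type id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical sketch; not Drill's actual plugin registry code. */
final class PluginLoaderSketch {
  private static final Logger logger =
      Logger.getLogger(PluginLoaderSketch.class.getName());

  /**
   * @param stored     plugin name to type id, as read from the persistent store
   * @param knownTypes type ids for which an implementation is available
   * @return names of the plugins that were accepted, in iteration order
   */
  static List<String> loadKnown(Map<String, String> stored, Set<String> knownTypes) {
    List<String> loaded = new ArrayList<>();
    for (Map.Entry<String, String> entry : stored.entrySet()) {
      if (!knownTypes.contains(entry.getValue())) {
        // Skip the entry instead of aborting startup, but leave a trace in the logs.
        logger.warning("Ignoring storage plugin '" + entry.getKey()
            + "': unknown type id '" + entry.getValue() + "'");
        continue;
      }
      loaded.add(entry.getKey());
    }
    return loaded;
  }
}
```

With a stored config of type 'newPlugin' and known types 'file' and 'hive',
the unknown entry would be logged and skipped while the other plugins load.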




RE: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Norris Lee

+1 (Non-binding)

Built from source on CentOS. Tested the ODBC driver with queries against Hive 
and DFS (json, parquet, tsv, csv, directories).

Norris

-----Original Message-----
From: Hsuan Yi Chu [mailto:hyi...@maprtech.com] 
Sent: Monday, March 14, 2016 10:42 AM
To: dev@drill.apache.org; adityakish...@gmail.com
Subject: Re: [VOTE] Release Apache Drill 1.6.0 - rc0

+1
mvn clean install on linux vm; Tried some queries; Looks good.

On Mon, Mar 14, 2016 at 9:58 AM, Aditya  wrote:

> While I did verify the signature and structure of the maven artifacts, 
> I think Jacques was referring to verify the functionality, which I have not.
>
> On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra  wrote:
>
> > Aditya has verified the maven artifacts. Would it make sense to 
> > extend
> the
> > vote by another day to let more people verify the release?
> >
> >
> >
> > On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau 
> > wrote:
> >
> > > I haven't had a chance to validate yet.  Has anyone checked the 
> > > maven artifacts yet?
> > > On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
> > >
> > > > +1 (binding).
> > > >
> > > > * Verified checksum and signature of all release artifacts in[1] 
> > > > and
> > > maven
> > > > artifacts in [2] and the artifacts are signed using Parth's 
> > > > public
> key
> > > (ID
> > > > 9BAA73B0).
> > > > * Verified that build and tests pass using the source artifact.
> > > > * Verified that Drill can be launched in embedded mode using the 
> > > > convenience binary release.
> > > > * Ran sample queries using classpath storage plugin.
> > > >
> > > > p.s. Have enhanced the release verification script [3] to allow
> > automatic
> > > > download and verification of release artifacts through the pull
> request
> > > > 249[4]. Will merge if someone can review it.
> > > >
> > > > [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > > [2]
> > > > https://repository.apache.org/content/repositories/orgapachedrill-1030
> > > > [3]
> > https://github.com/apache/drill/blob/master/tools/verify_release.sh
> > > > [4] https://github.com/apache/drill/pull/249
> > > >
> > > > On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche < 
> > > > adene...@maprtech.com
> > > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > built from source with mapr profile and deployed on 2 nodes, 
> > > > > then
> run
> > > > > window functions from Drill's test framework. Also took a 
> > > > > quick
> look
> > at
> > > > the
> > > > > WebUI. Everything looks fine
> > > > >
> > > > > On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra 
> > > > > 
> > > > wrote:
> > > > >
> > > > >> Added GPG key
> > > > >>
> > > > >> On Sat, Mar 12, 2016 at 6:48 PM, Aditya 
> > > > >> 
> > > > wrote:
> > > > >>
> > > > >> > I couldn't find your signing keys[1].
> > > > >> >
> > > > >> > [1] https://github.com/apache/drill/blob/master/KEYS
> > > > >> >
> > > > >> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra <
> par...@apache.org
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hello all,
> > > > >> > >
> > > > >> > > I'd like to propose the zeroth release candidate (rc0) of
> Apache
> > > > >> Drill,
> > > > >> > > version 1.6.0.
> > > > >> > > It covers a total of 44 resolved JIRAs [1].
> > > > >> > > Thanks to everyone who contributed to this release.
> > > > >> > >
> > > > >> > > The tarball artifacts are hosted at [2] and the maven
> artifacts
> > > are
> > > > >> > hosted
> > > > >> > > at [3].
> > > > >> > >
> > > > >> > > This release candidate is based on commit 
> > > > >> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
> > > > >> > >
> > > > >> > > The vote will be open for the next ~72 hours ending at 
> > > > >> > > 7:10 AM
> > > > >> Pacific,
> > > > >> > > March
> > > > >> > > 14, 2016.
> > > > >> > >
> > > > >> > > [ ] +1
> > > > >> > > [ ] +0
> > > > >> > > [ ] -1
> > > > >> > >
> > > > >> > > Here's my vote: +1
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > >
> > > > >> > > Parth
> > > > >> > >
> > > > >> > > [1]
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill%22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D1.6.0
> > > > >> > > [2] 
> > > > >> > > http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > > >> > > [3]
> > > > >> >
> > > >
> https://repository.apache.org/content/repositories/orgapachedrill-1030
> > > > >> > > [4]
> > > > https://github.com/parthchandra/incubator-drill/tree/drill-1.6.0
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training <
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Hsuan Yi Chu

+1
mvn clean install on linux vm; Tried some queries; Looks good.

On Mon, Mar 14, 2016 at 9:58 AM, Aditya  wrote:

> While I did verify the signature and structure of the maven artifacts, I
> think Jacques was referring to verify the functionality, which I have not.
>
> On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra  wrote:
>
> > Aditya has verified the maven artifacts. Would it make sense to extend
> the
> > vote by another day to let more people verify the release?
> >
> >
> >
> > On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau 
> > wrote:
> >
> > > I haven't had a chance to validate yet.  Has anyone checked the maven
> > > artifacts yet?
> > > On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
> > >
> > > > +1 (binding).
> > > >
> > > > * Verified checksum and signature of all release artifacts in[1] and
> > > maven
> > > > artifacts in [2] and the artifacts are signed using Parth's public
> key
> > > (ID
> > > > 9BAA73B0).
> > > > * Verified that build and tests pass using the source artifact.
> > > > * Verified that Drill can be launched in embedded mode using the
> > > > convenience binary release.
> > > > * Ran sample queries using classpath storage plugin.
> > > >
> > > > p.s. Have enhanced the release verification script [3] to allow
> > automatic
> > > > download and verification of release artifacts through the pull
> request
> > > > 249[4]. Will merge if someone can review it.
> > > >
> > > > [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > > [2]
> > > https://repository.apache.org/content/repositories/orgapachedrill-1030
> > > > [3]
> > https://github.com/apache/drill/blob/master/tools/verify_release.sh
> > > > [4] https://github.com/apache/drill/pull/249
> > > >
> > > > On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
> > > > adene...@maprtech.com
> > > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > built from source with mapr profile and deployed on 2 nodes, then
> run
> > > > > window functions from Drill's test framework. Also took a quick
> look
> > at
> > > > the
> > > > > WebUI. Everything looks fine
> > > > >
> > > > > On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra 
> > > > wrote:
> > > > >
> > > > >> Added GPG key
> > > > >>
> > > > >> On Sat, Mar 12, 2016 at 6:48 PM, Aditya 
> > > > wrote:
> > > > >>
> > > > >> > I couldn't find your signing keys[1].
> > > > >> >
> > > > >> > [1] https://github.com/apache/drill/blob/master/KEYS
> > > > >> >
> > > > >> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra <
> par...@apache.org
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hello all,
> > > > >> > >
> > > > >> > > I'd like to propose the zeroth release candidate (rc0) of
> Apache
> > > > >> Drill,
> > > > >> > > version 1.6.0.
> > > > >> > > It covers a total of 44 resolved JIRAs [1].
> > > > >> > > Thanks to everyone who contributed to this release.
> > > > >> > >
> > > > >> > > The tarball artifacts are hosted at [2] and the maven
> artifacts
> > > are
> > > > >> > hosted
> > > > >> > > at [3].
> > > > >> > >
> > > > >> > > This release candidate is based on commit
> > > > >> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
> > > > >> > >
> > > > >> > > The vote will be open for the next ~72 hours ending at 7:10 AM
> > > > >> Pacific,
> > > > >> > > March
> > > > >> > > 14, 2016.
> > > > >> > >
> > > > >> > > [ ] +1
> > > > >> > > [ ] +0
> > > > >> > > [ ] -1
> > > > >> > >
> > > > >> > > Here's my vote: +1
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > >
> > > > >> > > Parth
> > > > >> > >
> > > > >> > > [1]
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill%22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D1.6.0
> > > > >> > > [2] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > > >> > > [3]
> > > > >> >
> > > >
> https://repository.apache.org/content/repositories/orgapachedrill-1030
> > > > >> > > [4]
> > > > https://github.com/parthchandra/incubator-drill/tree/drill-1.6.0
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > > <
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > >
> > > > >
> > > >
> > >
> >
>

[jira] [Created] (DRILL-4508) Null proof all AutoCloseable.close() methods

2016-03-14 Thread Venki Korukanti (JIRA)

Venki Korukanti created DRILL-4508:
--------------------------------------

             Summary: Null proof all AutoCloseable.close() methods
                 Key: DRILL-4508
                 URL: https://issues.apache.org/jira/browse/DRILL-4508
             Project: Apache Drill
          Issue Type: Bug
          Components: Server
    Affects Versions: 1.5.0
            Reporter: Venki Korukanti
            Priority: Minor
             Fix For: 1.7.0


If a Drillbit fails to start (for example, because of incorrect configuration 
or missing storage plugin information), we end up calling close() on various 
components such as WebServer and Drillbit. Some of these components may not 
have been initialized, so their fields may still be null. The close() methods 
do not check for null before dereferencing those fields. One example:

{code}
java.lang.NullPointerException: null
    at org.apache.drill.exec.server.options.SystemOptionManager.close(SystemOptionManager.java:280) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.DrillbitContext.close(DrillbitContext.java:185) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:157) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) ~[drill-common-1.6.0.jar:1.6.0]
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) ~[drill-common-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:149) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:283) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257) [drill-java-exec-1.6.0.jar:1.6.0]
{code}

This masks the actual error (incorrect configuration) and it is hard to know 
what went wrong.
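
One possible shape for the fix, as a minimal sketch only: route each close() 
through a helper that skips null members, so cleanup after a failed startup 
cannot raise a NullPointerException of its own. The class NullSafeClose and 
its closeAll method are hypothetical names used for illustration. They are 
not Drill's AutoCloseables API and not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Drill.
final class NullSafeClose {

    /**
     * Closes every non-null resource in order. Null entries are skipped,
     * so a component that never finished initializing cannot trigger a
     * NullPointerException during cleanup. The first failure is rethrown
     * after all resources have been attempted; later failures are attached
     * to it as suppressed exceptions.
     */
    static void closeAll(AutoCloseable... resources) throws Exception {
        if (resources == null) {
            return;
        }
        Exception first = null;
        for (AutoCloseable resource : resources) {
            if (resource == null) {
                continue; // never initialized, nothing to release
            }
            try {
                resource.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> closed = new ArrayList<>();
        // Stand-ins for components; the first was never created because
        // startup failed early.
        AutoCloseable optionManager = null;
        AutoCloseable workManager = () -> closed.add("workManager");
        closeAll(optionManager, workManager);
        System.out.println("closed: " + closed);
    }
}
```

With something like this, the original startup exception stays the one that 
reaches the log, because cleanup no longer fails on uninitialized members.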




[jira] [Resolved] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-14 Thread Jinfeng Ni (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4474.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in commit: 49ae6d363efe78df4e89f7913d1d560e9627b325

> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
>                 Key: DRILL-4474
>                 URL: https://issues.apache.org/jira/browse/DRILL-4474
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.5.0
>         Environment: m3.xlarge AWS instances (3 nodes)
> CentOS6.5 x64
>            Reporter: Shankar
>            Assignee: Jacques Nadeau
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> {quote}
> * We are using Drill to retrieve business data from game analytics. 
> * We are running the queries below on a 50 GB table (Parquet).
> * We have found major inconsistencies in the data when we use the COUNT function.
> * Below are the case-by-case queries and their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has 
> already been created). 
> * I hope this is fixed in a later version. If not, please take the necessary action.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+
> |   count   |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+-------+
> |  EXPR$0   |  cnt  |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---------+-----------+
> | EXPR$0  |    cnt    |
> +---------+-----------+
> | 201941  | 37772844  |
> +---------+-----------+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-4 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +------+
> | cnt  |
> +------+
> | 525  |
> +------+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-5 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid),
> . . . . . . . >

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Aditya

While I did verify the signature and structure of the maven artifacts, I
think Jacques was referring to verifying the functionality, which I have not done.

On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra  wrote:

> Aditya has verified the maven artifacts. Would it make sense to extend the
> vote by another day to let more people verify the release?
>
>
>
> On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau 
> wrote:
>
> > I haven't had a chance to validate yet.  Has anyone checked the maven
> > artifacts yet?
> > On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
> >
> > > +1 (binding).
> > >
> > > * Verified checksum and signature of all release artifacts in[1] and
> > maven
> > > artifacts in [2] and the artifacts are signed using Parth's public key
> > (ID
> > > 9BAA73B0).
> > > * Verified that build and tests pass using the source artifact.
> > > * Verified that Drill can be launched in embedded mode using the
> > > convenience binary release.
> > > * Ran sample queries using classpath storage plugin.
> > >
> > > p.s. Have enhanced the release verification script [3] to allow
> automatic
> > > download and verification of release artifacts through the pull request
> > > 249[4]. Will merge if someone can review it.
> > >
> > > [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > [2]
> > https://repository.apache.org/content/repositories/orgapachedrill-1030
> > > [3]
> https://github.com/apache/drill/blob/master/tools/verify_release.sh
> > > [4] https://github.com/apache/drill/pull/249
> > >
> > > On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
> > > adene...@maprtech.com
> > > > wrote:
> > >
> > > > +1
> > > >
> > > > built from source with mapr profile and deployed on 2 nodes, then run
> > > > window functions from Drill's test framework. Also took a quick look
> at
> > > the
> > > > WebUI. Everything looks fine
> > > >
> > > > On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra 
> > > wrote:
> > > >
> > > >> Added GPG key
> > > >>
> > > >> On Sat, Mar 12, 2016 at 6:48 PM, Aditya 
> > > wrote:
> > > >>
> > > >> > I couldn't find your signing keys[1].
> > > >> >
> > > >> > [1] https://github.com/apache/drill/blob/master/KEYS
> > > >> >
> > > >> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra  >
> > > >> wrote:
> > > >> >
> > > >> > > Hello all,
> > > >> > >
> > > >> > > I'd like to propose the zeroth release candidate (rc0) of Apache
> > > >> Drill,
> > > >> > > version 1.6.0.
> > > >> > > It covers a total of 44 resolved JIRAs [1].
> > > >> > > Thanks to everyone who contributed to this release.
> > > >> > >
> > > >> > > The tarball artifacts are hosted at [2] and the maven artifacts
> > are
> > > >> > hosted
> > > >> > > at [3].
> > > >> > >
> > > >> > > This release candidate is based on commit
> > > >> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
> > > >> > >
> > > >> > > The vote will be open for the next ~72 hours ending at 7:10 AM
> > > >> Pacific,
> > > >> > > March
> > > >> > > 14, 2016.
> > > >> > >
> > > >> > > [ ] +1
> > > >> > > [ ] +0
> > > >> > > [ ] -1
> > > >> > >
> > > >> > > Here's my vote: +1
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Parth
> > > >> > >
> > > >> > > [1]
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill%22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D1.6.0
> > > >> > > [2] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > >> > > [3]
> > > >> >
> > > https://repository.apache.org/content/repositories/orgapachedrill-1030
> > > >> > > [4]
> > > https://github.com/parthchandra/incubator-drill/tree/drill-1.6.0
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Abdelhakim Deneche
> > > >
> > > > Software Engineer
> > > >
> > > >   
> > > >
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > >
> > > >
> > >
> >
>

Re: [VOTE] Extended - Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Parth Chandra

Hello everyone,

  I'm extending the deadline to verify the release candidate. While there
are enough votes to pass rc0, I'm inclined to allow additional time to let
the broader community get a chance.

Voting will now end at *8:10 am PDT, March 15th, 2016*.

Thanks

Parth

On Mon, Mar 14, 2016 at 8:12 AM, Parth Chandra  wrote:

> Aditya has verified the maven artifacts. Would it make sense to extend the
> vote by another day to let more people verify the release?
>
>
>
> On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau 
> wrote:
>
>> I haven't had a chance to validate yet.  Has anyone checked the maven
>> artifacts yet?
>> On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
>>
>> > +1 (binding).
>> >
>> > * Verified checksum and signature of all release artifacts in[1] and
>> maven
>> > artifacts in [2] and the artifacts are signed using Parth's public key
>> (ID
>> > 9BAA73B0).
>> > * Verified that build and tests pass using the source artifact.
>> > * Verified that Drill can be launched in embedded mode using the
>> > convenience binary release.
>> > * Ran sample queries using classpath storage plugin.
>> >
>> > p.s. Have enhanced the release verification script [3] to allow
>> automatic
>> > download and verification of release artifacts through the pull request
>> > 249[4]. Will merge if someone can review it.
>> >
>> > [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
>> > [2]
>> https://repository.apache.org/content/repositories/orgapachedrill-1030
>> > [3] https://github.com/apache/drill/blob/master/tools/verify_release.sh
>> > [4] https://github.com/apache/drill/pull/249
>> >
>> > On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
>> > adene...@maprtech.com
>> > > wrote:
>> >
>> > > +1
>> > >
>> > > built from source with mapr profile and deployed on 2 nodes, then run
>> > > window functions from Drill's test framework. Also took a quick look
>> at
>> > the
>> > > WebUI. Everything looks fine
>> > >
>> > > On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra 
>> > wrote:
>> > >
>> > >> Added GPG key
>> > >>
>> > >> On Sat, Mar 12, 2016 at 6:48 PM, Aditya 
>> > wrote:
>> > >>
>> > >> > I couldn't find your signing keys[1].
>> > >> >
>> > >> > [1] https://github.com/apache/drill/blob/master/KEYS
>> > >> >
>> > >> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra 
>> > >> wrote:
>> > >> >
>> > >> > > Hello all,
>> > >> > >
>> > >> > > I'd like to propose the zeroth release candidate (rc0) of Apache
>> > >> Drill,
>> > >> > > version 1.6.0.
>> > >> > > It covers a total of 44 resolved JIRAs [1].
>> > >> > > Thanks to everyone who contributed to this release.
>> > >> > >
>> > >> > > The tarball artifacts are hosted at [2] and the maven artifacts
>> are
>> > >> > hosted
>> > >> > > at [3].
>> > >> > >
>> > >> > > This release candidate is based on commit
>> > >> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
>> > >> > >
>> > >> > > The vote will be open for the next ~72 hours ending at 7:10 AM
>> > >> Pacific,
>> > >> > > March
>> > >> > > 14, 2016.
>> > >> > >
>> > >> > > [ ] +1
>> > >> > > [ ] +0
>> > >> > > [ ] -1
>> > >> > >
>> > >> > > Here's my vote: +1
>> > >> > >
>> > >> > > Thanks,
>> > >> > >
>> > >> > > Parth
>> > >> > >
>> > >> > > [1]
>> > >> > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill%22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D1.6.0
>> > >> > > [2] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
>> > >> > > [3]
>> > >> >
>> > https://repository.apache.org/content/repositories/orgapachedrill-1030
>> > >> > > [4]
>> > https://github.com/parthchandra/incubator-drill/tree/drill-1.6.0
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > >
>> > > Abdelhakim Deneche
>> > >
>> > > Software Engineer
>> > >
>> > >   
>> > >
>> > >
>> > > Now Available - Free Hadoop On-Demand Training
>> > > <
>> >
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>> > >
>> > >
>> >
>>
>
>

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Parth Chandra

Aditya has verified the maven artifacts. Would it make sense to extend the
vote by another day to let more people verify the release?



On Mon, Mar 14, 2016 at 7:08 AM, Jacques Nadeau  wrote:

> I haven't had a chance to validate yet.  Has anyone checked the maven
> artifacts yet?
> On Mar 14, 2016 6:37 AM, "Aditya"  wrote:
>
> > +1 (binding).
> >
> > * Verified checksum and signature of all release artifacts in[1] and
> maven
> > artifacts in [2] and the artifacts are signed using Parth's public key
> (ID
> > 9BAA73B0).
> > * Verified that build and tests pass using the source artifact.
> > * Verified that Drill can be launched in embedded mode using the
> > convenience binary release.
> > * Ran sample queries using classpath storage plugin.
> >
> > p.s. Have enhanced the release verification script [3] to allow automatic
> > download and verification of release artifacts through the pull request
> > 249[4]. Will merge if someone can review it.
> >
> > [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > [2]
> https://repository.apache.org/content/repositories/orgapachedrill-1030
> > [3] https://github.com/apache/drill/blob/master/tools/verify_release.sh
> > [4] https://github.com/apache/drill/pull/249
> >
> > On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
> > adene...@maprtech.com
> > > wrote:
> >
> > > +1
> > >
> > > built from source with mapr profile and deployed on 2 nodes, then run
> > > window functions from Drill's test framework. Also took a quick look at
> > the
> > > WebUI. Everything looks fine
> > >
> > > On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra 
> > wrote:
> > >
> > >> Added GPG key
> > >>
> > >> On Sat, Mar 12, 2016 at 6:48 PM, Aditya 
> > wrote:
> > >>
> > >> > I couldn't find your signing keys[1].
> > >> >
> > >> > [1] https://github.com/apache/drill/blob/master/KEYS
> > >> >
> > >> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra 
> > >> wrote:
> > >> >
> > >> > > Hello all,
> > >> > >
> > >> > > I'd like to propose the zeroth release candidate (rc0) of Apache
> > >> Drill,
> > >> > > version 1.6.0.
> > >> > > It covers a total of 44 resolved JIRAs [1].
> > >> > > Thanks to everyone who contributed to this release.
> > >> > >
> > >> > > The tarball artifacts are hosted at [2] and the maven artifacts
> are
> > >> > hosted
> > >> > > at [3].
> > >> > >
> > >> > > This release candidate is based on commit
> > >> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
> > >> > >
> > >> > > The vote will be open for the next ~72 hours ending at 7:10 AM
> > >> Pacific,
> > >> > > March
> > >> > > 14, 2016.
> > >> > >
> > >> > > [ ] +1
> > >> > > [ ] +0
> > >> > > [ ] -1
> > >> > >
> > >> > > Here's my vote: +1
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Parth
> > >> > >
> > >> > > [1]
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill%22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D1.6.0
> > >> > > [2] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > >> > > [3]
> > >> >
> > https://repository.apache.org/content/repositories/orgapachedrill-1030
> > >> > > [4]
> > https://github.com/parthchandra/incubator-drill/tree/drill-1.6.0
> > >> > >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > Abdelhakim Deneche
> > >
> > > Software Engineer
> > >
> > >   
> > >
> > >
> > > Now Available - Free Hadoop On-Demand Training
> > > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> > >
> >
>

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Jacques Nadeau

I haven't had a chance to validate yet.  Has anyone checked the maven
artifacts yet?
On Mar 14, 2016 6:37 AM, "Aditya"  wrote:

> +1 (binding).
>
> * Verified checksum and signature of all release artifacts in[1] and maven
> artifacts in [2] and the artifacts are signed using Parth's public key (ID
> 9BAA73B0).
> * Verified that build and tests pass using the source artifact.
> * Verified that Drill can be launched in embedded mode using the
> convenience binary release.
> * Ran sample queries using classpath storage plugin.
>
> p.s. Have enhanced the release verification script [3] to allow automatic
> download and verification of release artifacts through the pull request
> 249[4]. Will merge if someone can review it.
>
> [1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> [2] https://repository.apache.org/content/repositories/orgapachedrill-1030
> [3] https://github.com/apache/drill/blob/master/tools/verify_release.sh
> [4] https://github.com/apache/drill/pull/249
>
> On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche <
> adene...@maprtech.com
> > wrote:
>
> > +1
> >
> > built from source with mapr profile and deployed on 2 nodes, then run
> > window functions from Drill's test framework. Also took a quick look at
> the
> > WebUI. Everything looks fine
> >
> > On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra 
> wrote:
> >
> >> Added GPG key
> >>
> >> On Sat, Mar 12, 2016 at 6:48 PM, Aditya 
> wrote:
> >>
> >> > I couldn't find your signing keys[1].
> >> >
> >> > [1] https://github.com/apache/drill/blob/master/KEYS
> >> >
> >> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra 
> >> wrote:
> >> >
> >> > > Hello all,
> >> > >
> >> > > I'd like to propose the zeroth release candidate (rc0) of Apache
> >> Drill,
> >> > > version 1.6.0.
> >> > > It covers a total of 44 resolved JIRAs [1].
> >> > > Thanks to everyone who contributed to this release.
> >> > >
> >> > > The tarball artifacts are hosted at [2] and the maven artifacts are
> >> > hosted
> >> > > at [3].
> >> > >
> >> > > This release candidate is based on commit
> >> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
> >> > >
> >> > > The vote will be open for the next ~72 hours ending at 7:10 AM
> >> Pacific,
> >> > > March
> >> > > 14, 2016.
> >> > >
> >> > > [ ] +1
> >> > > [ ] +0
> >> > > [ ] -1
> >> > >
> >> > > Here's my vote: +1
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Parth
> >> > >
> >> > > [1]
> >> > >
> >> > >
> >> >
> >>
> https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill%22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D1.6.0
> >> > > [2] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> >> > > [3]
> >> >
> https://repository.apache.org/content/repositories/orgapachedrill-1030
> >> > > [4]
> https://github.com/parthchandra/incubator-drill/tree/drill-1.6.0
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
> >
>

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Aditya

+1 (binding).

* Verified the checksum and signature of all release artifacts in [1] and the
maven artifacts in [2]; the artifacts are signed using Parth's public key (ID
9BAA73B0).
* Verified that build and tests pass using the source artifact.
* Verified that Drill can be launched in embedded mode using the
convenience binary release.
* Ran sample queries using classpath storage plugin.

P.S. I have enhanced the release verification script [3] to allow automatic
download and verification of release artifacts, through pull request
249 [4]. I will merge it if someone can review it.

[1] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
[2] https://repository.apache.org/content/repositories/orgapachedrill-1030
[3] https://github.com/apache/drill/blob/master/tools/verify_release.sh
[4] https://github.com/apache/drill/pull/249

On Mon, Mar 14, 2016 at 12:51 AM, Abdel Hakim Deneche  wrote:

> +1
>
> built from source with mapr profile and deployed on 2 nodes, then run
> window functions from Drill's test framework. Also took a quick look at the
> WebUI. Everything looks fine
>
> On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra  wrote:
>
>> Added GPG key
>>
>> On Sat, Mar 12, 2016 at 6:48 PM, Aditya  wrote:
>>
>> > I couldn't find your signing keys[1].
>> >
>> > [1] https://github.com/apache/drill/blob/master/KEYS
>> >
>> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra 
>> wrote:
>> >
>> > > Hello all,
>> > >
>> > > I'd like to propose the zeroth release candidate (rc0) of Apache
>> Drill,
>> > > version 1.6.0.
>> > > It covers a total of 44 resolved JIRAs [1].
>> > > Thanks to everyone who contributed to this release.
>> > >
>> > > The tarball artifacts are hosted at [2] and the maven artifacts are
>> > hosted
>> > > at [3].
>> > >
>> > > This release candidate is based on commit
>> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
>> > >
>> > > The vote will be open for the next ~72 hours ending at 7:10 AM
>> Pacific,
>> > > March
>> > > 14, 2016.
>> > >
>> > > [ ] +1
>> > > [ ] +0
>> > > [ ] -1
>> > >
>> > > Here's my vote: +1
>> > >
>> > > Thanks,
>> > >
>> > > Parth
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill%22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D1.6.0
>> > > [2] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
>> > > [3]
>> > https://repository.apache.org/content/repositories/orgapachedrill-1030
>> > > [4] https://github.com/parthchandra/incubator-drill/tree/drill-1.6.0
>> > >
>> >
>>
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   
>
>
> Now Available - Free Hadoop On-Demand Training
> 
>

[jira] [Created] (DRILL-4507) TO_TIMESTAMP does not generate TIMESTAMP data type in metadata

2016-03-14 Thread Ian Hellstrom (JIRA)

Ian Hellstrom created DRILL-4507:


             Summary: TO_TIMESTAMP does not generate TIMESTAMP data type in metadata
                 Key: DRILL-4507
                 URL: https://issues.apache.org/jira/browse/DRILL-4507
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types
    Affects Versions: 1.5.0
            Reporter: Ian Hellstrom


When creating a view that contains the `TO_TIMESTAMP()` casting function, the 
resulting column does not show up as a `TIMESTAMP` but rather as `ANY`:

{code}
 CREATE VIEW timestamp_test AS SELECT TO_TIMESTAMP('2008-2-23 12:00:00', 
'-MM-dd HH:mm:ss') FROM (VALUES(1));
DESCRIBE timestamp_test;
{code}

yields:

{code}
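+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | ANY        | YES          |
+--------------+------------+--------------+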
{code}

The same is true when using `SUBSTR`, which ought to return strings, but in 
reality shows up as `ANY` in the description.




Working with Case-Sensitive Data-sources

2016-03-14 Thread Abhishek Girish

Hello all,

As I understand it, Drill is case-insensitive by design with respect to
column names within a table or file [1]. While this provides great
flexibility and works well with many data-sources, there are issues when
working with case-sensitive data-sources such as HBase / MapR-DB.

Consider the following JSON file:

{"_id": "ID1",
 *"Name"* : "ABC",
 "Age" : "25",
 "Phone" : null
}
{"_id": "ID2",
 *"name"* : "PQR",
 "Age" : "30",
 "Phone" : "408-123-456"
}
{"_id": "ID3",
 *"NAME"* : "XYZ",
 "Phone" : ""
}

Note that the name field appears in a different case in each record.

From Drill, when querying the JSON file directly (or the corresponding
content in Parquet or Text formats), we get the results that we as Drill
users have come to expect:

> select NAME from mfs.`/tmp/json/a.json`;
+-------+
| NAME  |
+-------+
| ABC   |
| PQR   |
| XYZ   |
+-------+
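
For illustration, the case-insensitive behavior above can be modeled in a
few lines of Python. This is only a sketch of the observed results, not
Drill's implementation; `records` and `select_case_insensitive` are
hypothetical names introduced here.

```python
# Hypothetical model of case-insensitive column lookup (not Drill code).
records = [
    {"_id": "ID1", "Name": "ABC", "Age": "25", "Phone": None},
    {"_id": "ID2", "name": "PQR", "Age": "30", "Phone": "408-123-456"},
    {"_id": "ID3", "NAME": "XYZ", "Phone": ""},
]


def select_case_insensitive(rows, column):
    """Return one value per row, matching the column name regardless of case."""
    wanted = column.lower()
    result = []
    for row in rows:
        # Fold every key to lower case before comparing; None if absent.
        folded = {key.lower(): value for key, value in row.items()}
        result.append(folded.get(wanted))
    return result


print(select_case_insensitive(records, "NAME"))  # ['ABC', 'PQR', 'XYZ']
```

Any spelling of the column name (NAME, Name, nAME) gives the same three
values in this model, which is what the direct JSON query does.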


However, when querying a case-sensitive data-source (*with pushdown enabled*),
the results below are returned. The case used in the query text is honored
and determines the results. This could come as a *slight surprise to certain
Drill users* exploring or migrating to new databases (using new Storage /
Format plugins within Drill).

> select *Name* from mfs.`/tmp/json/a`;
+-------+
| Name  |
+-------+
| ABC   |
+-------+

> select *name* from mfs.`/tmp/json/a`;
+-------+
| name  |
+-------+
| PQR   |
+-------+

> select *NAME* from mfs.`/tmp/json/a`;
+-------+
| NAME  |
+-------+
| XYZ   |
+-------+


> select *nAME* from mfs.`/tmp/json/a`;
+-------+
| nAME  |
+-------+
+-------+
No rows selected

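There is no easy way to get all matching rows (irrespective of the case of
the column name). In the examples below, where several case variants are
selected, only the row matching the case of the first column listed is
returned, and its value is repeated in every column.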


> select *Name, name, NAME* from mfs.`/tmp/json/a`;
+-------+--------+--------+
| Name  | name0  | NAME1  |
+-------+--------+--------+
| ABC   | ABC    | ABC    |
+-------+--------+--------+

> select *NAME, Name, name* from mfs.`/tmp/json/a`;
+-------+--------+--------+
| NAME  | Name0  | name1  |
+-------+--------+--------+
| XYZ   | XYZ    | XYZ    |
+-------+--------+--------+


If pushdown features are disabled, the behavior would indeed match that seen
with JSON files. However, this comes at the cost of not fully utilizing the
power of the underlying data-source, and could lead to performance issues.

*Inconsistent results can happen when:*

(1) The dataset has mixed-case field names, as in the example above. While
this might not be very common, the concern is still valid, since a
substantial number of Drill users are exploring Drill for ETL cases where
the data is not completely sanitized.

(2) Data is consistent w.r.t case, but the query text has non-matching
case. While some could term this as user error, it could still cause issues
when users, applications or the underlying datasources change.

In both of the above cases, Drill would silently run the query and return
results that could be *none, partial, complete/correct or entirely wrong*.

Some specific questions:

(1) *Supporting Case-Insensitive Behavior for Case-Sensitive Data-sources.*
For users who prefer the flexibility, how can Drill ensure that the
underlying data-source returns case-insensitive results?

(2) *Supporting Case-Sensitive Behavior.* How can Drill OPTIONALLY support
case-sensitive behavior for data-sources? Users coming from case-sensitive
databases might want results matching the provided case. Example using the
above data:

> select _id, *Name, name, NAME* from mfs.`/tmp/json/a`;
+------+-------+--------+--------+
| _id  | Name  | name   | NAME   |
+------+-------+--------+--------+
| ID1  | ABC   | null   | null   |
| ID2  | null  | PQR    | null   |
| ID3  | null  | null   | XYZ    |
+------+-------+--------+--------+


(3) How does Drill currently work with *MongoDB*, which I believe is a
case-sensitive database? Have these issues ever been discussed previously?
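
To make the result requested in question (2) concrete, here is a minimal
Python sketch of a strictly case-sensitive projection over the same three
records. It only models the desired output; `records` and
`select_case_sensitive` are hypothetical names, not part of Drill or of any
storage plugin.

```python
# Hypothetical model of a case-sensitive projection (not Drill code).
records = [
    {"_id": "ID1", "Name": "ABC", "Age": "25", "Phone": None},
    {"_id": "ID2", "name": "PQR", "Age": "30", "Phone": "408-123-456"},
    {"_id": "ID3", "NAME": "XYZ", "Phone": ""},
]


def select_case_sensitive(rows, columns):
    """Project each row onto the exact column names; missing ones become None."""
    return [tuple(row.get(column) for column in columns) for row in rows]


for projected in select_case_sensitive(records, ["_id", "Name", "name", "NAME"]):
    print(projected)
# ('ID1', 'ABC', None, None)
# ('ID2', None, 'PQR', None)
# ('ID3', None, None, 'XYZ')
```

Every row is kept, and each case variant is treated as a distinct column
that is null wherever the record does not carry that exact spelling.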


Thanks in advance. I'd appreciate any helpful response.

Regards,
Abhishek


[1] https://drill.apache.org/docs/lexical-structure/#case-sensitivity

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-14 Thread Abdel Hakim Deneche

+1

Built from source with the mapr profile and deployed on 2 nodes, then ran
the window function tests from Drill's test framework. Also took a quick
look at the Web UI. Everything looks fine.

On Sun, Mar 13, 2016 at 5:53 PM, Parth Chandra  wrote:

> Added GPG key
>
> On Sat, Mar 12, 2016 at 6:48 PM, Aditya  wrote:
>
> > I couldn't find your signing keys[1].
> >
> > [1] https://github.com/apache/drill/blob/master/KEYS
> >
> > On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra 
> wrote:
> >
> > > Hello all,
> > >
> > > I'd like to propose the zeroth release candidate (rc0) of Apache Drill,
> > > version 1.6.0.
> > > It covers a total of 44 resolved JIRAs [1].
> > > Thanks to everyone who contributed to this release.
> > >
> > > The tarball artifacts are hosted at [2] and the maven artifacts are
> > hosted
> > > at [3].
> > >
> > > This release candidate is based on commit
> > > d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb located at [4].
> > >
> > > The vote will be open for the next ~72 hours ending at 7:10 AM Pacific,
> > > March
> > > 14, 2016.
> > >
> > > [ ] +1
> > > [ ] +0
> > > [ ] -1
> > >
> > > Here's my vote: +1
> > >
> > > Thanks,
> > >
> > > Parth
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%3D%22Apache%20Drill%22%20and%20status%20in%20(resolved%2C%20closed)%20and%20fixVersion%3D1.6.0
> > > [2] http://home.apache.org/~parthc/drill/releases/1.6.0/rc0/
> > > [3]
> > https://repository.apache.org/content/repositories/orgapachedrill-1030
> > > [4] https://github.com/parthchandra/incubator-drill/tree/drill-1.6.0
> > >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  

