Re: [DISCUSS] 1.14.0 release

2018-06-13 Thread Padma Penumarthy
I am planning to open a couple of batch sizing PRs this week that I would like to 
get in.

Thanks
Padma


> On Jun 13, 2018, at 11:59 AM, Vlad Rozov  wrote:
> 
> DRILL-6422: Update guava to 23.0 and shade it (PR in review)
> DRILL-6353: Upgrade Parquet MR dependencies (ready-to-commit)
> 
> Thank you,
> 
> Vlad
> 
> On 6/12/18 17:16, Boaz Ben-Zvi wrote:
>> Hello Drillers,
>>   Nearly three months have passed since the 1.13.0 release, and it is time 
>> to start planning for 1.14.0; I volunteer to manage the new release.
>>   If there is any ongoing work not yet committed into the Apache Drill 
>> master, that you strongly feel MUST be included in the 1.14 release, please 
>> reply to this thread. There are quite a few pending PRs, and we should 
>> prioritize and close the needed ones soon enough.
>> Thanks,
>>Boaz
>> 
> 



[jira] [Created] (DRILL-6495) Fragment error message profile dumped into log file. Why?

2018-06-13 Thread Chun Chang (JIRA)
Chun Chang created DRILL-6495:
-

 Summary: Fragment error message profile dumped into log file. Why?
 Key: DRILL-6495
 URL: https://issues.apache.org/jira/browse/DRILL-6495
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.13.0
Reporter: Chun Chang


When a query fails for some reason, we dump the following gigantic JSON profile 
into drillbit.log. Why do we do this? Has anyone found this information useful? It 
completely clutters the log file. If the profile contains crucial information, 
I recommend finding an alternative way to record it.

{noformat}
2018-06-13 14:47:31,094 [24de6f10-0e19-1769-a136-76b15ce5b832:frag:2:7] INFO 
o.a.d.e.w.fragment.FragmentExecutor - 24de6f10-0e19-1769-a136-76b15ce5b832:2:7: 
State change requested CANCELLATION_REQUESTED --> FINISHED
2018-06-13 14:47:31,094 [24de6f10-0e19-1769-a136-76b15ce5b832:frag:2:7] INFO 
o.a.d.e.w.f.FragmentStatusReporter - 24de6f10-0e19-1769-a136-76b15ce5b832:2:7: 
State to report: CANCELLED
2018-06-13 14:47:31,095 [24de6f10-0e19-1769-a136-76b15ce5b832:frag:2:7] WARN 
o.a.d.exec.rpc.control.WorkEventBus - A fragment message arrived but there was 
no registered listener for that message: profile {
state: CANCELLED
minor_fragment_id: 7
operator_profile {
input_profile {
records: 4096
batches: 1
schemas: 1
}
operator_id: 10
operator_type: 29
setup_nanos: 0
process_nanos: 2786560384
peak_local_memory_allocated: 54050816
wait_nanos: 47930232
}
operator_profile {
input_profile {
records: 4096
batches: 1
schemas: 1
}
operator_id: 8
operator_type: 10
setup_nanos: 1705434
process_nanos: 379744084
peak_local_memory_allocated: 105226240
wait_nanos: 0
}
operator_profile {
input_profile {
records: 0
batches: 0
schemas: 0
}
operator_id: 12
operator_type: 42
...
...

{noformat}
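The reporter asks for an alternative to dumping the whole profile. One option, sketched here under assumptions, is to log a one-line summary at warning level and emit the full profile only when debug-level logging is enabled. `OperatorStats`, `FragmentSummary` and `UnroutedMessageLog` are hypothetical types that mirror a few fields visible in the log above (they are not Drill's protobuf profile classes or its `WorkEventBus` code), and `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.List;
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for one operator_profile entry (not Drill's protobuf class).
record OperatorStats(int operatorId, int operatorType, long processNanos, long peakLocalMemory) {}

// Hypothetical stand-in for a minor-fragment profile.
record FragmentSummary(String state, int minorFragmentId, List<OperatorStats> operators) {

  // Collapses the multi-line profile into a single log-friendly line.
  String oneLine() {
    long totalProcessNanos = operators.stream().mapToLong(OperatorStats::processNanos).sum();
    long maxPeakMemory = operators.stream().mapToLong(OperatorStats::peakLocalMemory).max().orElse(0L);
    return String.format(Locale.ROOT,
        "state=%s minor_fragment_id=%d operators=%d process_nanos=%d peak_memory=%d",
        state, minorFragmentId, operators.size(), totalProcessNanos, maxPeakMemory);
  }
}

// Hypothetical logging helper: summary at WARNING, full dump only at FINE (debug).
final class UnroutedMessageLog {
  private static final Logger LOG = Logger.getLogger(UnroutedMessageLog.class.getName());

  private UnroutedMessageLog() {}

  static void warnNoListener(FragmentSummary summary, String fullProfileText) {
    LOG.warning("A fragment message arrived but there was no registered listener: "
        + summary.oneLine());
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(fullProfileText);
    }
  }
}
```

With only the first two operators shown in the log, `oneLine()` yields `state=CANCELLED minor_fragment_id=7 operators=2 process_nanos=3166304468 peak_memory=105226240`, which fits on one line of drillbit.log.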



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6494) Drill Plugins Handler

2018-06-13 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6494:
--

 Summary: Drill Plugins Handler
 Key: DRILL-6494
 URL: https://issues.apache.org/jira/browse/DRILL-6494
 Project: Apache Drill
  Issue Type: New Feature
  Components: Tools, Build & Test
Affects Versions: 1.13.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: Future, 1.14.0










Re: [DISCUSS] case insensitive storage plugin and workspaces names

2018-06-13 Thread Abhishek Girish
The issue is that for customers who do have such storage plugin names, it's
too late to rename them after an offline upgrade, as there is no easy way to
access the storage plugin configurations while the Drillbits are down
(because Drillbit startup fails). This might be okay if admins perform a
rolling upgrade (the newer Drillbits would fail, but the older Drillbits could
be used to update the storage plugin configs), but that's not fully supported.
Ideally, we'd find a way not to fail startup and instead disable the plugins
that have issues. If that's a complex, separate task, then for now we should
clearly document that this is a breaking change on upgrade, so users should
fix their plugins before they proceed.
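Abhishek's preferred option, disabling the problematic plugins instead of failing startup, could start from a check like the following. This is a minimal sketch under assumptions: `PluginNameConflicts` and `namesToDisable` are hypothetical names, and the policy of disabling every member of a colliding group (rather than keeping the first one) is just one possible choice, picked so that no plugin silently wins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Drill: finds plugin names that collide case-insensitively.
final class PluginNameConflicts {
  private PluginNameConflicts() {}

  // Returns every configured name whose lower-cased form appears more than once,
  // in the original order, so the caller can disable them instead of aborting startup.
  static List<String> namesToDisable(List<String> configuredNames) {
    Map<String, Integer> counts = new HashMap<>();
    for (String name : configuredNames) {
      counts.merge(name.toLowerCase(Locale.ROOT), 1, Integer::sum);
    }
    List<String> toDisable = new ArrayList<>();
    for (String name : configuredNames) {
      if (counts.get(name.toLowerCase(Locale.ROOT)) > 1) {
        toDisable.add(name);
      }
    }
    return toDisable;
  }
}
```

For example, `namesToDisable(List.of("dfs", "cp", "DFS", "hive"))` returns `[dfs, DFS]`, leaving `cp` and `hive` usable while the admin renames one of the colliding plugins.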

On Wed, Jun 13, 2018 at 3:42 AM Arina Yelchiyeva 
wrote:

> From the Drill code, workspaces are already case insensitive (though the
> documentation states the opposite). Since there have been no complaints from
> users so far, I believe there are not many (if any) who use the same names
> differing only in case.
> Regarding users who already have duplicate storage plugin names: after the
> change, Drill startup will fail with an appropriate error message, and they
> will have to rename those storage plugins.
>
> Kind regards,
> Arina

[jira] [Created] (DRILL-6493) Replace BitVector with Uint1Vector

2018-06-13 Thread Karthikeyan Manivannan (JIRA)
Karthikeyan Manivannan created DRILL-6493:
-

 Summary: Replace BitVector with Uint1Vector
 Key: DRILL-6493
 URL: https://issues.apache.org/jira/browse/DRILL-6493
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Karthikeyan Manivannan


BitVector stores each value as a single bit of storage space. UInt1 is an 
alternative implementation that uses a whole byte to store each bit. Recently 
discovered bugs in BitVector, along with anecdotal evidence of performance 
issues, seem to suggest that this code is slow and buggy. I am opening this 
ticket to analyze the impact of replacing BitVector with UInt1.
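To make the trade-off concrete, here is a minimal sketch of the two storage layouts. `BitPackedBooleans` and `BytePerValueBooleans` are hypothetical classes, not Drill's `BitVector` or `UInt1Vector`: they only show that bit packing needs one eighth of the memory but requires shift-and-mask arithmetic and a read-modify-write of a shared byte on every update, whereas one byte per value allows direct indexing.

```java
// Hypothetical bit-packed layout: value i lives in bit (i % 8) of byte (i / 8).
final class BitPackedBooleans {
  private final byte[] data;

  BitPackedBooleans(int valueCount) {
    data = new byte[(valueCount + 7) / 8]; // round up to whole bytes
  }

  void set(int index, boolean value) {
    int byteIndex = index >>> 3;
    int mask = 1 << (index & 7);
    // Read-modify-write: neighbouring values share this byte.
    if (value) {
      data[byteIndex] |= mask;
    } else {
      data[byteIndex] &= ~mask;
    }
  }

  boolean get(int index) {
    return (data[index >>> 3] & (1 << (index & 7))) != 0;
  }

  int bytesUsed() {
    return data.length;
  }
}

// Hypothetical byte-per-value layout: value i lives in byte i (0 or 1).
final class BytePerValueBooleans {
  private final byte[] data;

  BytePerValueBooleans(int valueCount) {
    data = new byte[valueCount];
  }

  void set(int index, boolean value) {
    data[index] = (byte) (value ? 1 : 0); // no masking, no shared bytes
  }

  boolean get(int index) {
    return data[index] != 0;
  }

  int bytesUsed() {
    return data.length;
  }
}
```

For 4,096 values, `bytesUsed()` is 512 for the packed layout and 4,096 for the byte-per-value layout; whether the simpler access path outweighs the 8x memory cost is what the proposed analysis would need to measure.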





[jira] [Created] (DRILL-6492) Make storage plugins names case insensitive

2018-06-13 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-6492:
---

 Summary: Make storage plugins names case insensitive
 Key: DRILL-6492
 URL: https://issues.apache.org/jira/browse/DRILL-6492
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.13.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.14.0


Storage plugin names should be case insensitive (DFS vs dfs, INFORMATION_SCHEMA 
vs information_schema).
Workspace (schema) names should be case insensitive (ROOT vs root, TMP vs tmp). 
Even if a user has two directories, /TMP and /tmp, they can create two workspaces 
for them, but not both named tmp; for example, tmp and tmp_u.
Table-name case sensitivity is handled per plugin. For example, table names 
(views, tables) in system plugins (information_schema, sys) should be case 
insensitive. Currently, sys table names are case insensitive, while 
information_schema table names are case sensitive; this needs to be made 
consistent. For file system plugins, table names must be case sensitive, since 
a table name refers to a directory or file name, whose case sensitivity depends 
on the file system.
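A minimal sketch of case-insensitive plugin-name lookup that rejects names differing only in case, assuming a hypothetical `CaseInsensitivePluginRegistry` (not Drill's actual storage plugin registry). Using `String.CASE_INSENSITIVE_ORDER` as the `TreeMap` comparator makes `dfs`, `DFS` and `Dfs` the same key; workspace names could be handled the same way, while table-name lookups would stay plugin-specific.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical registry, not Drill's API: plugin names are compared ignoring case.
final class CaseInsensitivePluginRegistry<C> {
  // CASE_INSENSITIVE_ORDER makes "dfs", "DFS" and "Dfs" the same map key.
  private final Map<String, C> plugins = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

  // Fails fast when a name collides case-insensitively with an existing plugin.
  void register(String name, C config) {
    if (plugins.containsKey(name)) {
      throw new IllegalStateException("Storage plugin name '" + name
          + "' conflicts (ignoring case) with an existing plugin; rename one of them.");
    }
    plugins.put(name, config);
  }

  Optional<C> lookup(String name) {
    return Optional.ofNullable(plugins.get(name));
  }
}
```

Registering `dfs` and then looking up `DFS` finds the same configuration, while registering `DFS` a second time throws, which mirrors the fail-at-startup behavior described in the discussion thread.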





Re: [DISCUSS] case insensitive storage plugin and workspaces names

2018-06-13 Thread Arina Yelchiyeva
From the Drill code, workspaces are already case insensitive (though the
documentation states the opposite). Since there have been no complaints from
users so far, I believe there are not many (if any) who use the same names
differing only in case.
Regarding users who already have duplicate storage plugin names: after the
change, Drill startup will fail with an appropriate error message, and they
will have to rename those storage plugins.

Kind regards,
Arina
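Paul Rogers' proposal, quoted in this thread, is to carry case sensitivity with each symbol as metadata rather than switching Drill wholesale. A minimal sketch of that idea, assuming hypothetical `NameSource` and `Identifier` types (not part of Drill) built from the three examples in his message: DB-schema workspaces and AS-clause names are case insensitive, HDFS-directory workspaces are case sensitive.

```java
// Hypothetical origins of a name, taken from the examples in Paul's proposal.
enum NameSource { RDBMS_SCHEMA, HDFS_DIRECTORY, SQL_ALIAS }

// Hypothetical identifier that carries its own case-sensitivity as metadata.
record Identifier(String name, boolean caseSensitive) {

  // Only names backed by a case-sensitive store (here, an HDFS directory) keep
  // exact-case matching; RDBMS schemas and SQL aliases match case-insensitively.
  static Identifier of(String name, NameSource source) {
    return new Identifier(name, source == NameSource.HDFS_DIRECTORY);
  }

  boolean matches(String candidate) {
    return caseSensitive ? name.equals(candidate) : name.equalsIgnoreCase(candidate);
  }
}
```

Here `Identifier.of("TPCH", NameSource.HDFS_DIRECTORY).matches("tpch")` is false, so Aman's /tmp/TPCH and /tmp/tpch stay distinct, while `Identifier.of("root", NameSource.RDBMS_SCHEMA).matches("ROOT")` is true.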


On Tue, Jun 12, 2018 at 8:45 PM Abhishek Girish  wrote:

> Paul, I think this proposal was specific to storage plugin and workspace
> *names*. And not for the whole of Drill.
>
> I agree it makes sense to have these names case insensitive, to improve
> user experience. The only impact to current users I can think of is if
> someone created two storage plugins dfs and DFS. Or configured workspaces
> tmp and TMP. In this case, they'd need to rename those. One thing I'm not
> clear on is how we'll handle upgrades in these cases.
>
> On Tue, Jun 12, 2018 at 10:31 AM Paul Rogers 
> wrote:
>
> > Hi All,
> >
> > As it turns out, this topic has been discussed, in depth, previously.
> > Can't recall if it was on this list, or in a JIRA.
> >
> > We face a number of constraints:
> >
> > * As was noted, for some data sources, the data source itself has case
> > insensitive names. (Windows file systems, RDBMSs, etc.)
> > * In other cases, the data source itself has case sensitive names. (HDFS
> > file system, Linux file systems, JSON, etc.)
> > * SQL is defined to be case insensitive.
> > * We now have several years of user queries, in production, based on the
> > current semantics.
> >
> > Given all this, it is very likely that simply shifting to case-sensitive
> > will break existing applications.
> >
> > Perhaps a more subtle solution is to make the case-sensitivity a property
> > of the symbol that is carried through the query pipeline as another piece
> > of metadata.
> >
> > Thus, a workspace that corresponds to a DB schema would be labeled as case
> > insensitive. A workspace that corresponds to an HDFS directory would be
> > case sensitive. Names defined within Drill (as part of an AS clause) would
> > follow SQL rules and be case insensitive.
> >
> > I believe that, if we sit down and work out exactly what users would
> > expect, and what is required to handle both case sensitive and case
> > insensitive names, we'll end up with a solution not far from the above --
> > out of simple necessity.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> > On Tuesday, June 12, 2018, 8:36:01 AM PDT, Arina Yelchiyeva <
> > arina.yelchiy...@gmail.com> wrote:
> >
> > To make it clear, we have three notions here: storage plugin name,
> > workspace (schema) and table name (dfs.root.`/tmp/t`).
> > My suggestion is the following:
> > Storage plugin names to be case insensitive (DFS vs dfs,
> > INFORMATION_SCHEMA vs information_schema).
> > Workspace (schema) names to be case insensitive (ROOT vs root, TMP vs
> > tmp). Even if a user has two directories, /TMP and /tmp, they can create
> > two workspaces, but not both named tmp; for example, tmp and tmp_u.
> > Table-name case sensitivity is handled per plugin. For example, table
> > names (views, tables) in system plugins (information_schema, sys) should
> > be case insensitive. Currently, sys table names are case insensitive,
> > while information_schema table names are case sensitive; this needs to be
> > made consistent. For file system plugins, table names must be case
> > sensitive, since a table name refers to a directory or file name, whose
> > case sensitivity depends on the file system.
> >
> > Kind regards,
> > Arina
> >
> > On Tue, Jun 12, 2018 at 6:13 PM Aman Sinha  wrote:
> >
> > > Drill is dependent on the underlying file system's case sensitivity. On
> > > HDFS one can create /tmp/TPCH (e.g. 'hadoop fs -mkdir /tmp/TPCH') and
> > > /tmp/tpch, which are separate directories.
> > > These could be set as workspaces in Drill's storage plugin configuration,
> > > and we would want the ability to query both. If we change the current
> > > behavior, we would want some way to support that, either using
> > > back-quotes (`) or some other way.
> > >
> > > RDBMSs seem to have vendor-specific behavior.
> > > In MySQL [1] database and schema names are case-sensitive on Linux and
> > > case-insensitive on Windows, whereas Postgres converts database and
> > > schema names to lower case by default, but one can use double quotes to
> > > make them case-sensitive [2].
> > >
> > > [1] https://dev.mysql.com/doc/refman/8.0/en/identifier-case-sensitivity.html
> > > [2] http://www.postgresqlforbeginners.com/2010/11/gotcha-case-sensitivity.html
> > >
> > >
> > >
> > > On Tue, Jun 12, 2018 at 5:01 AM, Arina Yelchiyeva <
> > > arina.yelchiy...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Currently in Drill we treat storage plugin names and workspaces as
> > > > case-sensitive