[Drill] Doubt in code flow

2015-09-04 Thread Sudip Mukherjee
Hi Devs,

I added some logging to the Drill code (an excerpt of the logs is copied
below). I am seeing some repetitive log lines when I do a "show databases".
I am wondering why the schema scan would happen multiple times, or is there
something I am missing? Could you please help? [Below are the log lines that
I added.]

2015-09-04 16:14:50,108 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.SchemaFactory - registering schema for 
plugin:org.apache.drill.exec.store.dfs.FileSystemPlugin@3cc861f9
2015-09-04 16:14:50,108 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs]
2015-09-04 16:14:50,108 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs, root]
2015-09-04 16:14:50,109 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs, tmp]
2015-09-04 16:14:50,109 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs, donuts]
2015-09-04 16:14:50,110 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs, default]

2015-09-04 16:14:50,145 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs.default]
2015-09-04 16:14:50,145 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs.donuts]
2015-09-04 16:14:50,145 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs.root]
2015-09-04 16:14:50,145 [2a1689de-9581-f912-3dc6-f4b61fc2e676:frag:0:0] INFO  
o.a.drill.exec.store.AbstractSchema - schemapath is...[dfs.tmp]


Thanks,
Sudip




[jira] [Created] (DRILL-3740) TopNBatch and OrderedPartitionRecordBatch call SortRecordBatchBuilder.add(VectorAccessible) but don't check its return value

2015-09-04 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-3740:
---

 Summary: TopNBatch and OrderedPartitionRecordBatch call 
SortRecordBatchBuilder.add(VectorAccessible) but don't check its return value
 Key: DRILL-3740
 URL: https://issues.apache.org/jira/browse/DRILL-3740
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Deneche A. Hakim
Assignee: Chris Westin


SortRecordBatchBuilder.add(VectorAccessible) checks various internal limits and 
may return false if those limits are exceeded. In that case the passed batch 
is not added to the batchBuilder and is effectively lost, because neither 
caller checks the return value.
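
As an illustration only, the caller-side check this report asks for could look
like the sketch below. LimitedBatchBuilder, its byte limit, and the exception
thrown are hypothetical stand-ins, not Drill's actual SortRecordBatchBuilder
API; the only behaviour taken from the report is that add(...) returns false
when a limit would be exceeded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a builder whose add() can refuse a batch.
class LimitedBatchBuilder {
  private final long maxBytes;
  private long usedBytes;
  private final List<byte[]> batches = new ArrayList<>();

  LimitedBatchBuilder(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  // Returns false (and keeps nothing) when the batch would exceed the limit.
  boolean add(byte[] batch) {
    if (usedBytes + batch.length > maxBytes) {
      return false;
    }
    usedBytes += batch.length;
    batches.add(batch);
    return true;
  }

  int size() {
    return batches.size();
  }
}

class CheckedAddExample {
  // Caller-side fix: never ignore the boolean result of add().
  static void addOrFail(LimitedBatchBuilder builder, byte[] batch) {
    if (!builder.add(batch)) {
      throw new IllegalStateException(
          "batch of " + batch.length + " bytes rejected: builder limit exceeded");
    }
  }

  public static void main(String[] args) {
    LimitedBatchBuilder builder = new LimitedBatchBuilder(16);
    addOrFail(builder, new byte[10]); // fits within the 16-byte limit
    try {
      addOrFail(builder, new byte[10]); // would exceed the limit
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Whether the right reaction is to fail the query, spill, or retry is a design
decision for the operators involved; the sketch only shows that the rejected
batch is noticed instead of being dropped silently.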



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3739) NPE on select from Hive for HBase table

2015-09-04 Thread ckran (JIRA)
ckran created DRILL-3739:


 Summary: NPE on select from Hive for HBase table
 Key: DRILL-3739
 URL: https://issues.apache.org/jira/browse/DRILL-3739
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: ckran


For a table in HBase or MapR-DB with metadata created in Hive so that it can be 
accessed through Beeline or Hue, queries from Drill fail with:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
NullPointerException [Error Id: 1cfd2a36-bc73-4a36-83ee-ac317b8e6cdb]





Re: [drill] Web UI Security (0a01dfd)

2015-09-04 Thread Jacques Nadeau
Adding Dev, which somehow got dropped off.

On Fri, Sep 4, 2015 at 8:56 AM, Jacques Nadeau  wrote:

> My main concern is that we're recreating an authorization management
> model. It looks like Jetty has a LoginService approach which would then
> integrate with its authorization capabilities around roles/realms/etc. We
> can then also use Servlet 3.x's annotation-based security to ensure
> compliance.
>
> On Thu, Aug 27, 2015 at 9:18 AM, Venki Korukanti <
> venki.koruka...@gmail.com> wrote:
>
>> Hi Jacques,
>>
>> I am looking into the details of the built-in security capabilities in
>> Jetty and Jersey. If I understand correctly, the following are the
>> approaches I found:
>>
>>1. Add a SecurityHandler (which in turn could include
>>FormAuthenticator, LoginService and ConstraintMapping) to
>>ServletContextHolder.
>>2. Add custom RolesAllowedResourceFilterFactory and ResourceFilter
>>implementations, which create a custom SecurityContext implementation
>>and User details (isUserInRole methods). We could add roles annotations
>>to REST methods to control authorization.
>>
>> These approaches again seem to be reimplementing the same authentication
>> or authorization code we use in DrillClient.
>>
>> I am wondering whether you have some time (it should take about 15
>> minutes) to discuss these over a hangout.
>>
>> Thanks
>> Venki
>>
>>
>> On Fri, Aug 21, 2015 at 6:05 PM, Jacques Nadeau wrote:
>>
>>> It really seems like this should be three separate proposed commits.
>>> Here are some high level comments on each:
>>>
>>>-
>>>
>>>Add SSL to WebUI
>>>I'm inclined to switch Drill's default to always doing SSL. We can
>>>do this by using a self-signed cert rather than having to configure a
>>>trust store. If someone wants to control the server cert, then they
>>>would have to configure a trust store.
>>>-
>>>
>>>Make Rest API methods all use DrillClient
>>>As discussed above, I don't think we should be creating a large
>>>number of additional RPC wire protocol items. We should continue the
>>>existing pattern of minimizing API surface and using SQL for operations.
>>>These operations are all things that people have also requested via SQL.
>>>This allows us to have only one entry point for that code instead of
>>>multiple.
>>>-
>>>
>>>Add WebUI Authentication
>>>We need to have a design discussion on JIRA around the method for
>>>securing the WebUI calls. It seems like we're implementing a custom way
>>>to manage security around web requests (including the use of a custom
>>>filter and BaseModel). There are a number of standard ways to provide
>>>this functionality that will probably be much easier to maintain and
>>>more secure. For example, Jersey has some built-in capabilities. There
>>>are also a bunch of capabilities built into servlets and Jetty around
>>>security constraints and context. Especially for security, I'm inclined
>>>to use pre-existing solutions rather than implementing custom solutions.
>>>
>>>
>>
>>
>


[GitHub] drill pull request: DRILL-3566: Fix: PreparedStatement.executeQuer...

2015-09-04 Thread dsbos
Github user dsbos commented on the pull request:

https://github.com/apache/drill/pull/143#issuecomment-137794418
  
> can you confirm that this patch passes all our "jdbc" tests ?

Yes, it passes our regular tests.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-2304: Case sensitivity - system and sess...

2015-09-04 Thread jaltekruse
Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/90#issuecomment-13717
  
+1





[GitHub] drill pull request: DRILL-2304: Case sensitivity - system and sess...

2015-09-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/90




[GitHub] drill pull request: DRILL-3589: Update JDBC driver to shade and mi...

2015-09-04 Thread dsbos
Github user dsbos commented on the pull request:

https://github.com/apache/drill/pull/116#issuecomment-137811250
  
Would it be better if the JDBC-all Jar file didn't contain a logback.xml 
file?

If the JDBC-all Jar doesn't already have that file, then to configure 
logback differently with their own logback.xml file, users only have to get 
their own file on the classpath _somewhere_.

However, if the JDBC-all Jar does already have that file, it seems that 
users will have to get their logback.xml file on the class path _before_ the 
JDBC-all Jar file (right?).  Although sometimes that's trivial, when the class 
path is set by, say, listing all .jar files in a lib/ directory (e.g., as in 
Tomcat for Spotfire), controlling the order might not be possible or easy.

(One way to keep the contents of the logback.xml file in the JDBC-all Jar 
file, as a convenient starting point for the user to copy out and modify, 
without having an active, interfering logback.xml file would be to use a 
modified name (e.g., logback-example.xml).)
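
To check whether more than one logback.xml is visible on a given class path,
something along these lines could be used. This is a small diagnostic sketch,
not part of the patch; the class and method names are made up for
illustration, and only the standard ClassLoader.getResources call is relied
on. The order of the results generally follows the class loader's search
order, so the first entry is usually the one that wins.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists every copy of a resource the class loader can see.
class ClasspathResourceLister {

  static List<URL> findAll(String resourceName) throws IOException {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = ClassLoader.getSystemClassLoader();
    }
    // getResources returns all matches, not just the first one found.
    return Collections.list(loader.getResources(resourceName));
  }

  public static void main(String[] args) throws IOException {
    List<URL> hits = findAll("logback.xml");
    if (hits.isEmpty()) {
      System.out.println("no logback.xml on the class path");
    }
    for (URL url : hits) {
      System.out.println("found: " + url);
    }
  }
}
```

If the JDBC-all Jar shipped logback-example.xml instead, a user's own
logback.xml would be the only match regardless of Jar ordering.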




Re: [drill] Web UI Security (0a01dfd)

2015-09-04 Thread Venki Korukanti
Some background on the thread:
Currently the Web UI has no authentication or authorization mechanism. Any
user can access/update/delete storage profiles or cancel other users' queries.
DRILL-3201 was logged to add authentication and authorization controls to
the Web UI to make it usable in a multi-user cluster environment. The JIRA
has a sheet listing access control for each provided Web API.

Currently JDBC/ODBC has authentication, but no authorization. Currently
users can't access or update storage plugin config or query profiles
through JDBC/ODBC. We don't have the support yet. Users can cancel queries
through JDBC/ODBC, but we don't check for authorization (whether the user
who is canceling the query has authorization to do so).

Issues we are trying to address apart from the authentication and
authorization in Web UI are:
1) Enhance Drill User API to support access or updating storage plugin
config and query profiles. Jacques suggested we should add two new sys
tables (sys.storage and sys.profiles). Access to these is controlled in
UserServer/UserWorker (or at the place where we try to create the sys
tables).
2) Update UserWorker.cancelQuery to check for authorization
3) Currently the Web UI accesses Drill's internal data structures directly
for storage config and profiles. We want to change this to use DrillClient
to access storage or profile resources through queries on sys.storage or
sys.profiles, just like JDBC/ODBC use DrillClient for accessing the Drill
cluster.
4) Maintain a DrillClient for each user session through Web UI. Currently
if a user executes a 'set option' or 'use schema', it is lost and not
available for subsequent queries.


@Jacques:
The reasons I prefer not to use LoginService and Jetty/Jersey's
authorization model are:
1) We already have, or are going to add, authorization in
UserServer/UserWorker. WebServer is going to be a wrapper around DrillClient
(just like JDBC), which gets/updates data through UserServer/UserWorker.
2) If we add LoginService, it needs its own configuration. We could
construct a JAAS config from the existing user authentication config in the
boot config, or create a new LoginService implementation, but these options
are again hard to maintain as authentication support changes. As we are
going through DrillClient, which already has authentication support, we can
use the same auth settings for JDBC/ODBC/WebUI.

The AuthFilter I created in the patch currently has two tasks:
1) Check that each web request has a valid user session (through the
cookies). If not, the request is forwarded to the login page.
2) Check that the logged-in user has privileges to access restricted
resources. As we are not planning to implement sys.storage/sys.profiles in
v1.2, I need to add this check here. It can be removed once we add
sys.storage/sys.profiles.
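
The order of the two checks can be sketched in plain Java as below. This only
illustrates the decision logic: Decision, decide, the session map, and the
admin set are hypothetical names, and the actual patch works against servlet
requests and cookies rather than these stand-ins.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

class AuthFilterSketch {
  enum Decision { REDIRECT_TO_LOGIN, FORBIDDEN, ALLOW }

  // sessions: session-cookie value -> user name (hypothetical session store)
  // admins:   users allowed to touch restricted resources (hypothetical)
  static Decision decide(String sessionCookie, boolean restrictedResource,
                         Map<String, String> sessions, Set<String> admins) {
    // Task 1: the request must carry a valid user session.
    String user = sessionCookie == null ? null : sessions.get(sessionCookie);
    if (user == null) {
      return Decision.REDIRECT_TO_LOGIN;
    }
    // Task 2: restricted resources need extra privileges.
    if (restrictedResource && !admins.contains(user)) {
      return Decision.FORBIDDEN;
    }
    return Decision.ALLOW;
  }

  public static void main(String[] args) {
    Map<String, String> sessions = Collections.singletonMap("cookie-1", "alice");
    Set<String> admins = Collections.singleton("root");
    System.out.println(decide(null, false, sessions, admins));
    System.out.println(decide("cookie-1", true, sessions, admins));
    System.out.println(decide("cookie-1", false, sessions, admins));
  }
}
```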

Thanks
Venki



[GitHub] drill pull request: DRILL-3589: Update JDBC driver to shade and mi...

2015-09-04 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/116#issuecomment-137827475
  
Agree on the logback. That shouldn't be in there. An oversight on my part.  We 
should be purely logging agnostic.




[GitHub] drill pull request: DRILL-3589: Update JDBC driver to shade and mi...

2015-09-04 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/116#issuecomment-137797991
  
I'm confused too.  I got this comment in my email:

"Something seems to be broken.

After rebasing your branch on my branch with my DRILL-3347 (Hadoop 
Test) and DRILL-3566 (Prep.Stmt.) fixes, I tried installing the resulting 
JDBC-all Jar file on Spotfire, but Spotfire's getting 
IndexOutOfBoundsExceptions somewhere within ResultSet.next()."

But I don't see it here. I was saying, let's make sure that this patch works 
as is before rebasing on 3347 and 3566.




Re: Who is coming to Strata NYC?

2015-09-04 Thread Tomer Shiran
I'll be at Strata. Jacques Nadeau will be there too. (We're actually doing
an "Apache Drill Bootcamp" on the 29th.)

Would be great to get together. Maybe dinner? Let's see how many people
want to join and we can figure it out.

Thanks,
Tomer

On Fri, Sep 4, 2015 at 12:57 PM, Edmon Begoli  wrote:

> I am planning on going.
>
> Maybe we can have a little gathering.
>



-- 
Tomer Shiran
CEO and Co-Founder, Dremio


Who is coming to Strata NYC?

2015-09-04 Thread Edmon Begoli
I am planning on going.

Maybe we can have a little gathering.


Re: Potential resource for large scale testing

2015-09-04 Thread Edmon Begoli
I can work with my institution and the NSF so that we commit time on the
Beacon supercomputing cluster to Apache and the Drill project. Maybe 20
hours a month for 4-5 nodes.

I have discretionary hours that I can put in, and I can, with our
HPC admins, create deploy scripts on a few clustered machines (these are all
very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
a local 1 TB SSD each). There is also the Medusa 10 PB filesystem attached,
but HDFS over local drives would probably be better.
They are otherwise just regular machines, and run regular JVMs on Linux.

We can also get Rahul access with a secure token to set up
and run stress/performance/integration tests for Drill. I can actually help
there as well. This can be automated to run tests and collect results.

I think that the only requirement would be that the JICS team be named for
commitment because both NSF/XSEDE and UT like to see the resources
being officially used and acknowledged. They are there to support open and
academic research; open source projects fit well.

If this sounds OK with the project PMCs, I can start the process of
allocation, accounts creation, setup.

I would also, as CDO of JICS, sign whatever standard papers are needed with
the Apache organization.

With all this being said, let me know please if this is something we want
to pursue.

Thank you,
Edmon

On Tuesday, September 1, 2015, Jacques Nadeau  wrote:

> I spent a bunch of time looking at the Phi coprocessors and forgot to get
> back to the thread. I'd love it if someone spent some time looking at
> leveraging them (since Drill is frequently processor bound).  Any takers?
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra wrote:
>
> > Hi Edmon,
> >   Sorry no one seems to have got back to you on this.
> >   We are in the process of publishing a test suite for regression testing
> > Drill and the cluster you have (even a few nodes ) would be a great
> > resource for folks to run the test suite. Rahul, et al are working on
> this
> > and I would suggest watching out for Rahul's posts on the topic.
> >
> > Parth
> >
> > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli wrote:
> >
> > > Hey folks,
> > >
> > > As we discussed today on a hangout, this is a machine that we have at
> > > JICS/NICS
> > > where I have Drill installed and where I could set up a test cluster
> over
> > > few nodes.
> > >
> > >
> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > >
> > > Note that each node is:
> > > - 2x8-core Intel® Xeon® E5-2670 processors
> > > - 256 GB of memory
> > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory each
> > > - 960 GB of SSD storage
> > >
> > > Would someone advise on what would be an interesting test setup?
> > >
> > > Thank you,
> > > Edmon
> > >
> >
>


[jira] [Resolved] (DRILL-2190) Failure to order by function if DISTINCT clause is present

2015-09-04 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-2190.
--
Resolution: Fixed

> Failure to order by function if DISTINCT clause is present
> --
>
> Key: DRILL-2190
> URL: https://issues.apache.org/jira/browse/DRILL-2190
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.8.0
>Reporter: Victoria Markman
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.2.0
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +-------+-------+-------------+
> | a1    | b1    | c1          |
> +-------+-------+-------------+
> | 1     | a     | 2015-01-01  |
> | 2     | b     | 2015-01-02  |
> | 3     | c     | 2015-01-03  |
> | 4     | null  | 2015-01-04  |
> | 5     | e     | 2015-01-05  |
> | 6     | f     | 2015-01-06  |
> | 7     | g     | 2015-01-07  |
> | null  | h     | 2015-01-08  |
> | 9     | i     | null        |
> | 10    | j     | 2015-01-10  |
> +-------+-------+-------------+
> 10 rows selected (0.092 seconds)
> 0: jdbc:drill:schema=dfs> select distinct count(distinct a1) from t1 group by 
> b1 order by 1;
> Query failed: SqlValidatorException: Expression 'COUNT(`t1`.`a1`)' is not in 
> the select clause
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> Replaced ordinal with function: fails
> {code}
> 0: jdbc:drill:schema=dfs> select distinct count(distinct a1) from t1 group by 
> b1 order by count(distinct a1);
> Query failed: SqlValidatorException: Expression 'COUNT(DISTINCT `a1`)' is not 
> in the select clause
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> Different aggregate function without DISTINCT clause: fails
> {code}
> 0: jdbc:drill:schema=dfs> select  distinct sum(a1) from t1 group by b1 order 
> by 1;
> Query failed: SqlValidatorException: Expression 'SUM(`t1`.`a1`)' is not in 
> the select clause
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> Added alias to the function and order by alias: fails
> {code}
> 0: jdbc:drill:schema=dfs> select  distinct sum(a1) as x from t1 group by b1 
> order by x;
> Query failed: SqlValidatorException: Expression 'SUM(`t1`.`a1`)' is not in 
> the select clause
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}





[GitHub] drill pull request: DRILL-3589: Update JDBC driver to shade and mi...

2015-09-04 Thread dsbos
Github user dsbos commented on the pull request:

https://github.com/apache/drill/pull/116#issuecomment-137805930
  
Okay, now I think I see what happened.

First of all, you can now ignore that "Something ... broken ..." comment; 
it's obsolete.  (Your patch no longer seems broken.)

(At first I thought things didn't work, and added that comment.  Then I 
noticed that I had a local version/build mismatch, and amended the comment to 
say "hold on; I'm checking again," and tested again.  Then everything worked, 
so I just deleted the comment from the GitHub review.  Next time I'll amend 
rather than delete.)

And now I recognize the "rebasing" reference:  I didn't mean rebasing as in 
rebasing on the latest version of master.  I was just mentioning that rebasing 
was what I happened to use (as opposed to cherry-picking, merging, or other 
patching) to apply the patches for DRILL-3347 and DRILL-3566, in case my 
choice there caused the apparent breakage. I applied them so I could try your 
patch with Spotfire (which would otherwise have hit the DRILL-3347 and 
DRILL-3566 bugs).

So ...

Your patch seems good; Spotfire ran fine with it.




[GitHub] drill pull request: DRILL-3732: Drill leaks memory if external sor...

2015-09-04 Thread adeneche
GitHub user adeneche opened a pull request:

https://github.com/apache/drill/pull/147

DRILL-3732: Drill leaks memory if external sort hits out of disk space 
exception

- ExternalSort.mergeAndSpill() cleans up all its data in case an error 
occurs while it is spilling to disk
- made BatchGroup AutoCloseable so it can easily be closed with 
AutoCloseables.close() if an error occurs
- added injection site while External sort is spilling to disk
- added unit test that forces a 2 batch query to spill to disk and injects 
an exception while it does so
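
The cleanup pattern described in these bullets can be sketched roughly as
follows. This is not the code from the pull request: Group, closeAll, and the
simulated disk-full flag are hypothetical stand-ins (closeAll plays the role
the description gives to AutoCloseables.close()), shown only to illustrate
releasing every resource when a spill fails.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SpillCleanupSketch {
  // Hypothetical stand-in for a group of buffered batches that holds memory.
  static class Group implements AutoCloseable {
    boolean closed;

    @Override
    public void close() {
      closed = true; // a real implementation would release its buffers here
    }
  }

  // Close every resource, even if an earlier close() throws.
  static void closeAll(List<? extends AutoCloseable> resources) throws Exception {
    Exception first = null;
    for (AutoCloseable resource : resources) {
      try {
        resource.close();
      } catch (Exception e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  // Simulated spill: on failure, release all groups before rethrowing.
  static void mergeAndSpill(List<Group> groups, boolean diskFull) throws Exception {
    try {
      if (diskFull) {
        throw new IOException("No space left on device");
      }
      // ... the groups would be written to disk here ...
    } catch (Exception e) {
      try {
        closeAll(groups);
      } catch (Exception closeError) {
        e.addSuppressed(closeError);
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    List<Group> groups = new ArrayList<>();
    groups.add(new Group());
    groups.add(new Group());
    try {
      mergeAndSpill(groups, true);
    } catch (Exception e) {
      System.out.println("spill failed: " + e.getMessage());
    }
    for (Group g : groups) {
      System.out.println("closed = " + g.closed);
    }
  }
}
```

Attaching close failures as suppressed exceptions keeps the original
out-of-disk-space error as the one that is reported.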

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adeneche/incubator-drill DRILL-3732

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #147


commit a113d9a5e5d434715159c01e97ff7ffa4bad9b38
Author: adeneche 
Date:   2015-09-05T00:10:17Z

DRILL-3732: Drill leaks memory if external sort hits out of disk space 
exception

- ExternalSort.mergeAndSpill() cleans up all its data in case an error 
occurs while it is spilling to disk
- made BatchGroup AutoCloseable so it can easily be closed with 
AutoCloseables.close() if an error occurs
- added injection site while External sort is spilling to disk
- added unit test that forces a 2 batch query to spill to disk and injects 
an exception while it does so






Deploying Apache Storm on Google Cloud

2015-09-04 Thread Ankur Garg
Hi,

I wish to deploy a Storm cluster on Google Cloud.

Browsing the internet, I could find this:
http://datadventures.ghost.io/2013/12/29/deploying-storm-on-gce/


Now, the above was written in 2013. I read somewhere that Storm no longer
needs ZeroMQ and uses Netty instead.

Can someone here confirm whether the above is a good source? (I will try it
anyway.)

In case any of you have some other document where this is explained, please
share it.

PS: I am new to Apache Storm and Google Cloud.

Any input is appreciated.

Thanks
Ankur


[GitHub] drill pull request: Classpath scanning

2015-09-04 Thread julienledem
GitHub user julienledem opened a pull request:

https://github.com/apache/drill/pull/148

Classpath scanning



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/julienledem/drill classpath_scanning

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #148


commit c29ee03ce0341f020e42830bc4de381362b476bc
Author: Julien Le Dem 
Date:   2015-09-03T23:06:28Z

initial_version

commit e66654cf02f537dbb48aa5b8b975efc9675f89cc
Author: Julien Le Dem 
Date:   2015-09-04T23:31:46Z

cleanup

commit 291b220c555e738e27b239d0bfa5af892e3dd10a
Author: Julien Le Dem 
Date:   2015-09-05T00:21:15Z

finalize






[GitHub] drill pull request: DRILL-3455: If fragments on unregistered Drill...

2015-09-04 Thread sudheeshkatkam
Github user sudheeshkatkam closed the pull request at:

https://github.com/apache/drill/pull/145




Re: Who is coming to Strata NYC?

2015-09-04 Thread Jim Scott
I'm speaking on Drill here on the 28th:
http://www.meetup.com/Hadoop-NYC/events/224931207/




-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal 




Re: Who is coming to Strata NYC?

2015-09-04 Thread Ellen Friedman
I will be at Strata NYC, and Ted Dunning will be there from midday on Wed
the 30th and on Thur.

Ellen



[jira] [Resolved] (DRILL-3455) If a drillbit, that contains fragments for the current query, dies the QueryManager will fail the query even if those fragments already finished successfully

2015-09-04 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam resolved DRILL-3455.

Resolution: Fixed

> If a drillbit, that contains fragments for the current query, dies the 
> QueryManager will fail the query even if those fragments already finished 
> successfully
> -
>
> Key: DRILL-3455
> URL: https://issues.apache.org/jira/browse/DRILL-3455
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Deneche A. Hakim
>Assignee: Sudheesh Katkam
> Fix For: 1.2.0
>
> Attachments: DRILL-3455.1.patch.txt, DRILL-3455.2.patch.txt
>
>
> Once DRILL-3448 is fixed we need to update 
> QueryManager.DrillbitStatusListener to not fail the query when no fragment 
> is still running on the dead node.





[jira] [Created] (DRILL-3742) Improve classpath scanning to reduce the time it takes

2015-09-04 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-3742:


 Summary: Improve classpath scanning to reduce the time it takes
 Key: DRILL-3742
 URL: https://issues.apache.org/jira/browse/DRILL-3742
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem


Classpath scanning and the function registry take a long time (seconds, every 
time). We'd want to avoid loading the classes (use bytecode inspection 
instead) and have a build-time cache to avoid doing the scanning at startup.





[jira] [Created] (DRILL-3743) query hangs on sqlline once Drillbit on foreman node is killed

2015-09-04 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3743:
-

Summary: query hangs on sqlline once Drillbit on foreman node is killed
Key: DRILL-3743
URL: https://issues.apache.org/jira/browse/DRILL-3743
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.2.0
Environment: 4 node cluster CentOS
Reporter: Khurram Faraaz
Assignee: Chris Westin


sqlline (and the query) hangs once the Drillbit on the Foreman node is killed 
(kill -9 ). The query was issued from the Foreman node; it returns many records 
and is long-running.

Steps to reproduce the problem.

set planner.slice_target=1

1.  clush -g khurram service mapr-warden stop
2.  clush -g khurram service mapr-warden start
3.  ./sqlline -u "jdbc:drill:schema=dfs.tmp"
0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 200;

4.  Immediately, from another console, run jps and kill the Drillbit process (in 
this case the Foreman) while the query is still running in sqlline. sqlline 
simply hangs: no exceptions or errors are reported at the sqlline prompt, in 
drillbit.log, or in drillbit.out.

I do see this exception in sqlline.log on the node where sqlline was 
started:

{code}
2015-09-04 18:45:12,069 [Client-1] INFO  o.a.d.e.rpc.user.QueryResultHandler - 
User Error Occurred
org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: Connection 
/10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) closed 
unexpectedly.


[Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
 ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
 [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
2015-09-04 18:45:12,069 [Client-1] INFO  
o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#7] Query failed:
org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: Connection 
/10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) closed 
unexpectedly.


[Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
 ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
 [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
2015-09-04 18:45:12,071 [Client-1] ERROR o.a.d.e.rpc.user.QueryResultHandler - 
SYSTEM ERROR: ChannelClosedException


[Error Id: c53c477f-f1cf-4458-8620-b1e11ba31701 ]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
ChannelClosedException


[Error Id: c53c477f-f1cf-4458-8620-b1e11ba31701 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
 

Re: The meaning of the methods in StoragePlugin and EasyFormatPlugin

2015-09-04 Thread Edmon Begoli
Thanks,  Daniel.

I'll ask questions here and expand these, put them into a Markdown and run
them by you all for review.

On Thursday, September 3, 2015, Daniel Barclay 
wrote:

> I wrote:
>
> ... Below are some notes on the detailed requirements I had extracted from
> the code.  ...
>
> I found a later copy of my (still rough) notes.
>
> See the Google Docs document at
> [Notes for] Instructions on Creating Storage Plug-ins
> 
> .
>
> Daniel
>
> --
> Daniel Barclay
> MapR Technologies
>
>


Re: The meaning of the methods in StoragePlugin and EasyFormatPlugin

2015-09-04 Thread Edmon Begoli
Which gives me an idea: a storage plugin for Excel using POI or something
like that, with read/write support for Excel spreadsheets.

Not sexy, but when it comes to data, covers the most widely used tabular
data format.

On Thursday, September 3, 2015, Daniel Barclay 
wrote:

> I wrote:
>
> ... Below are some notes on the detailed requirements I had extracted from
> the code.  ...
>
> I found a later copy of my (still rough) notes.
>
> See the Google Docs document at
> [Notes for] Instructions on Creating Storage Plug-ins
> 
> .
>
> Daniel
>
> --
> Daniel Barclay
> MapR Technologies
>
>


Re: The meaning of the methods in StoragePlugin and EasyFormatPlugin

2015-09-04 Thread Edmon Begoli
I.e., we would add support for .xls and .xlsx files to Drill.

Who can mentor me to do this?

I think it would be a great new feature for Drill to add this support.

What do you think?

On Friday, September 4, 2015, Edmon Begoli  wrote:

> Which gives me an idea - storage plug in for Excel using POI or something
> like that - a r/w for Excel spreadsheets.
>
> Not sexy, but when it comes to data, covers the most widely used tabular
> data format.
>
> On Thursday, September 3, 2015, Daniel Barclay wrote:
>
>> I wrote:
>>
>> ... Below are some notes on the detailed requirements I had extracted
>> from
>> the code.  ...
>>
>> I found a later copy of my (still rough) notes.
>>
>> See the Google Docs document at
>> [Notes for] Instructions on Creating Storage Plug-ins
>> 
>> .
>>
>> Daniel
>>
>> --
>> Daniel Barclay
>> MapR Technologies
>>
>>


[jira] [Created] (DRILL-3738) Create StoragePlugin for Excel files (.xlsx or possibly .xls)

2015-09-04 Thread Edmon Begoli (JIRA)
Edmon Begoli created DRILL-3738:
---

Summary: Create StoragePlugin for Excel files (.xlsx or possibly .xls)
Key: DRILL-3738
URL: https://issues.apache.org/jira/browse/DRILL-3738
Project: Apache Drill
Issue Type: Improvement
Affects Versions: Future
Reporter: Edmon Begoli


I would like to implement a new storage plugin for Excel files which would 
support reading and writing of these. 

I would most likely use Apache POI to do this.
https://poi.apache.org/spreadsheet/

I would eventually support an Apache Parquet or start with it, but that would 
be supported through that project.


