Apache Drill Plan... - Delete: "}l"

2016-02-25 Thread Dhruv Shah (Google Docs)
Dhruv Shah added a suggestion to Apache Drill Plan Syntax  
(https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit?disco=AkGD0dE=comment_email_discussion)


Dhruv Shah
Delete: "}l"


You received this email because you are subscribed to all comments on  
Apache Drill Plan Syntax.
Change  
(https://docs.google.com/document/docos/notify?id=1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I=Apache+Drill+Plan+Syntax)  
what Google sends you.

You can reply to this email to reply to the comment.


Apache Drill Plan... - Add paragraph (11 times)

2016-02-25 Thread Dhruv Shah (Google Docs)
Dhruv Shah added a suggestion to Apache Drill Plan Syntax  
(https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit?disco=AkGWQAg=comment_email_discussion)


Dhruv Shah
Add paragraph (11 times)




Re: The praises for Drill

2016-02-25 Thread cchang
So good to hear Drill is useful in real life.

Chun

> On Feb 25, 2016, at 7:27 PM, Edmon Begoli wrote:
> 
> Hello fellow Drillers,
> 
> I have been inactive on the development side of the project, as we got busy
> being heavy/power users of Drill in the last few months.
> 
> I just want to share some great experiences with the latest versions of
> Drill.
> 
> Just tonight, as we were scrambling to meet the deadline, we were able to
> query two years of flat psv files of claims/billing and clinical data in
> Drill in less than 60 seconds.
> 
> No ETL, no warehousing - just plain SQL against tons of files. Run SQL, get
> results.
> 
> Amazing!
> 
> We have also done some much more important things, and we had a paper
> accepted to Big Data Services about the experiences. The co-author of the
> paper is Drill's own Dr. Ted Dunning :-)
> I will share it once it is published.
> 
> Anyway, cheers to all, and hope to re-join the dev activities soon.
> 
> Best,
> Edmon


The praises for Drill

2016-02-25 Thread Edmon Begoli
Hello fellow Drillers,

I have been inactive on the development side of the project, as we got busy
being heavy/power users of Drill in the last few months.

I just want to share some great experiences with the latest versions of
Drill.

Just tonight, as we were scrambling to meet the deadline, we were able to
query two years of flat psv files of claims/billing and clinical data in
Drill in less than 60 seconds.

No ETL, no warehousing - just plain SQL against tons of files. Run SQL, get
results.

Amazing!

We have also done some much more important things, and we had a paper
accepted to Big Data Services about the experiences. The co-author of the
paper is Drill's own Dr. Ted Dunning :-)
I will share it once it is published.

Anyway, cheers to all, and hope to re-join the dev activities soon.

Best,
Edmon


[jira] [Created] (DRILL-4439) Improve new unit operator tests to handle operators that expect RawBatchBuffers off of the wire, such as the UnorderedReciever and MergingReciever

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4439:
--

 Summary: Improve new unit operator tests to handle operators that 
expect RawBatchBuffers off of the wire, such as the UnorderedReciever and 
MergingReciever
 Key: DRILL-4439
 URL: https://issues.apache.org/jira/browse/DRILL-4439
 Project: Apache Drill
  Issue Type: Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4437) Implement framework for testing operators in isolation

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4437:
--

 Summary: Implement framework for testing operators in isolation
 Key: DRILL-4437
 URL: https://issues.apache.org/jira/browse/DRILL-4437
 Project: Apache Drill
  Issue Type: Test
  Components: Tools, Build & Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse
 Fix For: 1.6.0


Most of the tests written for Drill are end-to-end. We spin up a full instance 
of the server, submit one or more SQL queries and check the results.

While integration tests like this are useful for ensuring that features do not 
break end-user functionality, overuse of this approach has caused a number of 
pain points.

Overall, the tests end up running a lot of the exact same code, parsing and 
planning many similar queries.

Creating consistent reproductions of issues, especially edge cases found in 
clustered environments, can be extremely difficult. Even the simpler case of 
testing that operators can handle a particular series of incoming batches of 
records has required hacks like generating files large enough that the 
scanners happen to break them up into separate batches. These tests are 
brittle, as they make assumptions about how the scanners will work in the 
future. For example, a perf evaluation might show that we should be producing 
larger batches in some cases. Existing tests that try to exercise multiple 
batches by producing a few more records than the current batch-size threshold 
would then no longer be testing the same code paths.

We need to make more parts of the system testable without initializing the 
entire Drill server, and to make the different internal settings and state of 
the server configurable for tests.

This is a first effort to enable testing the physical operators in Drill by 
mocking the components of the system necessary to enable operators to 
initialize and execute.
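
As a rough illustration of the idea only (not the framework proposed in this 
issue, and not Drill code), the sketch below drives a toy operator directly 
with hand-built batches. The names limit_operator and run_operator, and the 
use of plain lists as batches, are assumptions made up for this example.

```python
from typing import Iterable, Iterator, List

Batch = List[int]  # hypothetical stand-in for a batch of records


def limit_operator(incoming: Iterable[Batch], limit: int) -> Iterator[Batch]:
    """Toy operator: pass through at most `limit` records across all batches."""
    remaining = limit
    for batch in incoming:
        if remaining <= 0:
            break
        out = batch[:remaining]
        remaining -= len(out)
        yield out


def run_operator(batches: List[Batch], limit: int) -> List[Batch]:
    """Drive the operator with explicit batches: no server, parser, or planner."""
    return list(limit_operator(batches, limit))


# The test picks the batch boundaries itself (including an empty batch), so it
# does not depend on how a scanner would happen to split a file.
single = run_operator([[1, 2, 3, 4, 5]], limit=4)
split = run_operator([[1, 2], [], [3, 4, 5]], limit=4)
print(single)  # [[1, 2, 3, 4]]
print(split)   # [[1, 2], [], [3, 4]]
```

Because the batch boundaries are part of the test input, a later change to 
batch sizing elsewhere would not silently change which code paths this kind of 
test exercises.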





[jira] [Created] (DRILL-4438) Fix out of memory failure identified by new operator unit tests

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4438:
--

 Summary: Fix out of memory failure identified by new operator unit 
tests
 Key: DRILL-4438
 URL: https://issues.apache.org/jira/browse/DRILL-4438
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse
Assignee: Jason Altekruse
Priority: Critical








[jira] [Resolved] (DRILL-3930) Remove direct references to TopLevelAllocator from unit tests

2016-02-25 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-3930.

   Resolution: Fixed
 Assignee: (was: Chris Westin)
Fix Version/s: 1.3.0

> Remove direct references to TopLevelAllocator from unit tests
> -
>
> Key: DRILL-3930
> URL: https://issues.apache.org/jira/browse/DRILL-3930
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Chris Westin
> Fix For: 1.3.0
>
>
> The RootAllocatorFactory should be used throughout the code to allow us to 
> change allocators via configuration or other software choices. Some unit 
> tests still reference TopLevelAllocator directly. We also need to do a better 
> job of handling exceptions that can be handled by close()ing an allocator 
> that isn't in the proper state (remaining open child allocators, outstanding 
> buffers, etc).





[jira] [Created] (DRILL-4436) Result data gets mixed up when various tables have a column "label"

2016-02-25 Thread Vincent Uribe (JIRA)
Vincent Uribe created DRILL-4436:


 Summary: Result data gets mixed up when various tables have a 
column "label"
 Key: DRILL-4436
 URL: https://issues.apache.org/jira/browse/DRILL-4436
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.5.0
 Environment: Drill 1.5.0 with Zookeeper on CentOS 7.0 
Reporter: Vincent Uribe


We have two tables in a MySQL database:
CREATE TABLE `Gender` (
  `genderId` bigint(20) NOT NULL AUTO_INCREMENT,
  `label` varchar(15) NOT NULL,
  PRIMARY KEY (`genderId`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;

CREATE TABLE `Civility` (
  `civilityId` bigint(20) NOT NULL AUTO_INCREMENT,
  `abbreviation` varchar(15) NOT NULL,
  `label` varchar(60) DEFAULT NULL,
  PRIMARY KEY (`civilityId`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;

With a query on these two tables that selects Gender.label as 'gender' and 
Civility.label as 'civility', we obtain, depending on the query:
* gender in civility
* civility in the gender
* NULL in the other column (gender or civility)

If we drop the table Gender and recreate it like this:
CREATE TABLE `Gender` (
  `genderId` bigint(20) NOT NULL AUTO_INCREMENT,
  `label2` varchar(15) NOT NULL,
  PRIMARY KEY (`genderId`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;

Everything is fine.

I guess something is wrong with the metadata...
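
The report does not include the failing query, so the following is only a 
sketch of the expected behaviour for this kind of aliased query. It runs 
against an in-memory SQLite database rather than Drill or MySQL, and the 
Person table, the join, and all sample rows are assumptions invented for the 
example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified versions of the two tables from the report.
CREATE TABLE Gender (genderId INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE Civility (civilityId INTEGER PRIMARY KEY,
                       abbreviation TEXT NOT NULL,
                       label TEXT);
-- Hypothetical table linking the two; not part of the original report.
CREATE TABLE Person (personId INTEGER PRIMARY KEY,
                     genderId INTEGER,
                     civilityId INTEGER);
INSERT INTO Gender VALUES (1, 'Female'), (2, 'Male');
INSERT INTO Civility VALUES (1, 'Mrs', 'Madam'), (2, 'Mr', 'Mister');
INSERT INTO Person VALUES (1, 1, 1), (2, 2, 2);
""")

# Both source columns are named "label"; the aliases should keep them apart.
rows = conn.execute("""
SELECT g.label AS gender, c.label AS civility
FROM Person p
JOIN Gender g ON g.genderId = p.genderId
JOIN Civility c ON c.civilityId = p.civilityId
ORDER BY p.personId
""").fetchall()
print(rows)  # [('Female', 'Madam'), ('Male', 'Mister')]
conn.close()
```

Each alias should carry the value from its own table. The behaviour described 
above (values swapped between the two columns, or NULL in one of them) is what 
differs from this.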


