Crate via Apache Drill

2017-08-24 Thread charuta.rajopadhye
Hi,

I was able to connect to Crate via Apache Drill using the following 
configuration:
{
"type": "jdbc",
"driver": "io.crate.client.jdbc.CrateDriver",
"url": "jdbc:crate://localhost:5432/",
"username": "crate",
"password": null,
"enabled": true
}

and the jar: crate-jdbc-standalone-2.2.0.jar
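
For reference, a configuration like the one above can also be assembled programmatically. The following is a minimal Python sketch, assuming Drill's REST API accepts a storage plugin definition as a JSON body with `name` and `config` fields at `/storage/<name>.json`. The plugin name `crate`, the helper name `build_plugin_payload`, and the host and port mentioned in the comment are illustrative assumptions, not something confirmed in this thread. The sketch only builds and prints the payload; it does not contact a server.

```python
import json


def build_plugin_payload(name, url, username, password=None):
    """Build the JSON body for registering a JDBC storage plugin.

    The field names inside "config" mirror the configuration in the message.
    """
    config = {
        "type": "jdbc",
        "driver": "io.crate.client.jdbc.CrateDriver",
        "url": url,
        "username": username,
        "password": password,  # None serializes to JSON null
        "enabled": True,
    }
    return {"name": name, "config": config}


if __name__ == "__main__":
    payload = build_plugin_payload("crate", "jdbc:crate://localhost:5432/", "crate")
    # Assumed target: POST to http://localhost:8047/storage/crate.json
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from any HTTP call makes it easy to inspect the JSON before sending it anywhere.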

Querying tables with simple data types returns results, but Crate is capable of 
storing dynamic data (objects) that Drill is unable to understand.
Please let me know how to get around this issue.

I read in the Apache Drill documentation about creating a custom storage 
plugin (What datastores does Drill support?: A new datastore can be added by 
developing a storage plugin), but could not find any pertinent information, 
tutorials, or references on how to do so.
Please guide me in this regard.

My ultimate objective is to be able to query all types of data from Crate via 
Apache Drill.

Thanks and Regards,
Charuta Rajopadhye

[GitHub] drill issue #904: DRILL-5717: change some date time test cases with specific...

2017-08-24 Thread weijietong
Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/904
  
@vvysotskyi please review the updated ones


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Drill 2.0 (design) hackathon

2017-08-24 Thread Aman Sinha
Drill Developers,

In order to kick-start the Drill 2.0 release discussions, I would like to
propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day ™ :-) ).

As I mentioned in the hangout on Tuesday,  MapR has offered to host it on
Sept 18th at their offices at 350 Holger Way, San Jose.   Hope that works
for most of you!

The goal is to get the community together for a day-long technical
discussion on key topics in preparation for a Drill 2.0 release as well as
potential improvements in upcoming 1.xx releases.  Depending on the
interest areas, we could form groups and have a volunteer lead each group.

 Based on prior discussions on the dev list, hangouts and existing JIRAs,
there is already a substantial set of topics and I have summarized a few of
them below.   What other topics do folks want to talk about?   Feel free to
respond to this thread and I will create a google doc to consolidate.
Understandably, the list would be long but we will use the hackathon to get
a sense of a reasonable feature set for 1.xx and 2.0 releases.


1. Metadata management.

  1a: Defining an abstraction layer for various types of metadata: views,
schema, statistics, security

  1b: Underlying storage for metadata: what are the options and their
trade-offs?

  - Hive metastore

  - Parquet metadata cache (parquet specific)

  - An embedded DBMS

  - A distributed key-value store

  - Others..



2. Drill integration with Apache Arrow

  2a: Evaluate the choices and tradeoffs



3. Resource management

  3a: Memory limits per query

  3b: Spilling

  3c: Resource management with Drill on Yarn/Mesos/Kubernetes

  3d: Local vs. global resource management

  3e: Aligning with admission control/queueing



4. TPC-DS coverage and related planner/operator enhancements

  4a: Additional set operations: INTERSECT, EXCEPT

  4b: GROUPING SETS, ROLLUP, CUBE support

  4c: Handling inequality joins and cartesian joins of non-scalar inputs
(via Nested Loop Join)

  4d: Remaining gaps in correlated subquery

  4e: Statistics: Number of Distinct Values, Histograms



5. Schema handling

  5a: Creation, management of schema

  5b: Handling schema changes in certain common cases

  5c: Schema-awareness

  5d: Others TBD



6. Concurrency

  6a: What are the bottlenecks to achieving higher concurrency?

  6b: Ideas to address these, e.g. async execution?



7. Storage plugin and REST API related enhancements





8. Performance improvements

  8a: Filter pushdown

  8b: Vectorized Parquet reader

  8c: Code-gen improvements

  8d: Others TBD


Re: Drill 2.0 (design) hackathon

2017-08-24 Thread Charles Givre
Hi Aman, 
Would you consider doing some sort of livestream so that those of us who 
couldn’t be there in person can participate?
Thanks,
— C




Re: Drill 2.0 (design) hackathon

2017-08-24 Thread Aman Sinha
Hi Charles,
Yes, it would be great if remote folks could participate. I will look into
the options for livestreaming.


On Thu, Aug 24, 2017 at 8:42 AM, Charles Givre  wrote:

> Hi Aman,
> Would you consider doing some sort of livestream so that those of us who
> couldn’t be there in person can participate?
> Thanks,
> — C


DrillRestServer Tests

2017-08-24 Thread Timothy Farkas
Hi All,

I want to add some unit tests for a new REST endpoint I added to the 
DrillRestServer. I've been looking, but I couldn't find an existing test for 
REST endpoints or a REST API client. Does anyone have any pointers to where 
the REST API tests are kept, and is there a pre-existing REST API client?

Thanks,
Tim



[GitHub] drill issue #913: - DRILL-5729 Fix Travis Build

2017-08-24 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/913
  
@ilooner-mapr Do you know why it fails on Trusty (the new default)? Going 
back to using Precise is OK for a while, but my guess is that it will 
eventually become obsolete.




[GitHub] drill issue #913: - DRILL-5729 Fix Travis Build

2017-08-24 Thread ilooner-mapr
Github user ilooner-mapr commented on the issue:

https://github.com/apache/drill/pull/913
  
@vrozov It looks like the default jdk for Trusty on travis is jdk8. I'll 
check to see if explicitly telling travis to use jdk7 with Trusty is also a 
possible fix.




[GitHub] drill issue #913: - DRILL-5729 Fix Travis Build

2017-08-24 Thread ilooner-mapr
Github user ilooner-mapr commented on the issue:

https://github.com/apache/drill/pull/913
  
@vrozov I got it working with Trusty by explicitly configuring openjdk7. 
One thing to note is that Trusty was running out of memory during the build 
with the default 4 GB container size, so I had to add the **sudo: required** 
option in order to increase the VM size 
(https://docs.travis-ci.com/user/reference/overview/).




[GitHub] drill issue #913: - DRILL-5729 Fix Travis Build

2017-08-24 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/913
  
@ilooner-mapr I tested it on my branch and it works with the default jdk 
and without *sudo*. Add `MAVEN_OPTS="-Xms1G -Xmx1G"` before `mvn`.
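
As a rough illustration of the suggestion above: `MAVEN_OPTS` is an environment variable that the `mvn` launcher reads to pass JVM options to Maven. The Python sketch below builds such an environment and a command invocation; the helper names and the way the build is launched are assumptions for illustration, and the actual `.travis.yml` change may look different.

```python
import os
import subprocess


def maven_env(heap="1G", base_env=None):
    """Return a copy of the environment with MAVEN_OPTS fixing the JVM heap size."""
    env = dict(os.environ if base_env is None else base_env)
    env["MAVEN_OPTS"] = f"-Xms{heap} -Xmx{heap}"
    return env


def run_maven(args, heap="1G"):
    """Run mvn with the given arguments under the capped heap (requires mvn on PATH)."""
    return subprocess.run(["mvn", *args], env=maven_env(heap), check=True)


if __name__ == "__main__":
    # Only print the setting here; call run_maven([...]) to actually start a build.
    print(maven_env()["MAVEN_OPTS"])
```

Setting the minimum and maximum heap to the same value keeps the JVM's memory use predictable inside a fixed-size build container.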

Can you remove the leading dash from the commit message as part of the rebase?




[GitHub] drill pull request #913: - DRILL-5729 Fix Travis Build

2017-08-24 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/913#discussion_r135143313
  
--- Diff: .travis.yml ---
@@ -13,8 +13,10 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-sudo: false
+sudo: required
 before_install: git fetch --unshallow
 language: java
+jdk:
+  - openjdk7
 install: mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true > mvn_install.log || (cat mvn_install.log && false)
--- End diff --

@ilooner-mapr Can you remove the redirection to mvn_install.log as part of 
the same PR? It is not necessary.




[GitHub] drill pull request #913: - DRILL-5729 Fix Travis Build

2017-08-24 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/913#discussion_r135148431
  
--- Diff: .travis.yml ---
@@ -13,8 +13,10 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-sudo: false
+sudo: required
 before_install: git fetch --unshallow
 language: java
+jdk:
+  - openjdk7
 install: mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true > mvn_install.log || (cat mvn_install.log && false)
 script: mvn package -DskipTests=true
--- End diff --

`install` includes the `package` phase, and I don't think that the javadoc or 
source plugins are enabled in the default profile, so `script` just repeats 
the `install`. Also, if the goal is to skip source jar generation, the proper 
property is maven.source.skip, I believe.
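
The reasoning above relies on Maven's lifecycle being ordered: invoking a phase runs every earlier phase of the same lifecycle first. Here is a small Python sketch of that rule, using an abridged list of the default lifecycle's phases (the full lifecycle has more intermediate phases, and the function name is made up for illustration):

```python
# Abridged default lifecycle, in execution order.
DEFAULT_LIFECYCLE = ["validate", "compile", "test", "package", "verify", "install", "deploy"]


def phases_run(target):
    """Return the phases Maven executes when asked to run `target`."""
    index = DEFAULT_LIFECYCLE.index(target)  # raises ValueError for unknown phases
    return DEFAULT_LIFECYCLE[: index + 1]


if __name__ == "__main__":
    print(phases_run("install"))
    print("package" in phases_run("install"))
```

Because `package` always appears in the result for `install`, a later `mvn package` step repeats work the `install` step has already done.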




[GitHub] drill issue #907: DRILL-5697: Improve performance of filter operator for pat...

2017-08-24 Thread ppadma
Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/907
  
@paul-rogers Paul, thanks a lot for the review. I made changes as per your 
comments. Please review updated diffs. 




[GitHub] drill issue #907: DRILL-5697: Improve performance of filter operator for pat...

2017-08-24 Thread ppadma
Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/907
  
@kkhatua Kunal, we can add more patterns later if we want. For now, let us 
get the simplest cases done first. 

