Re: Gradle?

2015-09-13 Thread Ted Dunning
Not particularly.

Maven handles the needs so far pretty easily and has the considerable
benefit of being bog standard.

What benefit would you foresee with gradle?




On Sat, Sep 12, 2015 at 9:02 PM, Edmon Begoli  wrote:

> Hey guys - has there been any consideration given to using Gradle in the
> future instead of Maven?
>


Gradle?

2015-09-13 Thread Edmon Begoli
Ted,

I will tell you my opinion, not some deeply researched engineered position.
Right now, I am looking into Gradle to be my primary build/release tool. I
am not 100% set on it, but I see plenty of benefits, the main ones being
conciseness, configurability, and robustness of the build process.
In the context of the Drill project, here are a couple of things I could
recommend:

- you can still use the same POMs, so one can immediately use it without
breaking backward compatibility. You can go forever with POMs

- much smaller build definition file (very subjective here because I do not
like XML for human use)

- you can break the build up into multiple independent tasks. I think
this would help with making tests more modular etc.

- continues execution after failures. This saves a lot of time.

- much more expressive and feature-full task design and execution
   -- API automatic detection of build dependencies
   -- a complete DAG for dependencies - one task can depend on multiple
others, and any of the dependencies can be of any depth

- dry run feature - you can see what will compile without having to
actually build it

- it can support and produce multiple versions, supports multiple profiles,
etc.

There are many others to mention, but I just wanted to share these few that
could be of interest.
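
To make the DAG and dry-run points a bit more concrete, here is a minimal
sketch in Python. It is not Gradle's API; the task names and the helpers
`run_order` and `dry_run` are made up for illustration, and it assumes
Python 3.9+ for `graphlib`. It only shows how a build tool can order tasks
whose dependencies form a DAG of arbitrary depth, and how it can list what
would run without running it.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS = {
    "compile": set(),
    "generate_sources": set(),
    "unit_test": {"compile"},
    "package": {"compile", "generate_sources"},
    "integration_test": {"package", "unit_test"},
}


def run_order(tasks):
    """Return one valid execution order; graphlib raises CycleError on cycles."""
    return list(TopologicalSorter(tasks).static_order())


def dry_run(tasks, target):
    """List the tasks that would run for `target`, without executing anything."""
    needed, stack = set(), [target]
    while stack:
        task = stack.pop()
        if task not in needed:
            needed.add(task)
            stack.extend(tasks[task])  # walk dependencies to any depth
    return [t for t in run_order(tasks) if t in needed]


if __name__ == "__main__":
    print(dry_run(TASKS, "package"))
```

Asking for "package" here lists only "compile", "generate_sources" and
"package"; the test tasks are left out because nothing on that path depends
on them.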

Edmon

On Sun, Sep 13, 2015 at 2:46 AM, Ted Dunning wrote:

> Not particularly.
>
> Maven handles the needs so far pretty easily and has the considerable
> benefit of being bog standard.
>
> What benefit would you foresee with gradle?
>
>
>
>
> On Sat, Sep 12, 2015 at 9:02 PM, Edmon Begoli wrote:
>
> > Hey guys - has there been any consideration given to using Gradle in the
> > future instead of Maven?
> >
>


[GitHub] drill pull request: DRILL-1942-readers:

2015-09-13 Thread amansinha100
Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/154#issuecomment-139897179
  
+1  LGTM. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Update on EDI support for Drill - repo and design collaboratory

2015-09-13 Thread Edmon Begoli
Ted, Matt, et al.,

I have created a temporary repository for the design and development of
support for the EDI format in Drill.
At this point, it is not a fork of Drill, but rather a collaboration space
and code repository for exploratory code.

Wiki:
https://github.com/ebegoli/edi-drill-store/wiki

Repo:
https://github.com/ebegoli/edi-drill-store

Once the difficult parts specific to EDI (logical nesting, record
representation) are figured out, and generic code written for I/O and
translation,
I will look to merge this with Drill and blend it into Drill-specific
patterns.

*If you wish, I will add you to the repo, so you can edit Wiki.*

Let me know please.

Edmon


On Sun, Sep 6, 2015 at 7:16 AM, Edmon Begoli  wrote:

> Matt - that is fantastic. Having good, liberally licensed format
> converters probably takes care of 50% of the problem. The other 50%
> will be in figuring out the logical mapping.
>
> Let me think a little bit and propose how we can best set up a
> collaboration platform. Any suggestions for this are welcome.
>
> I personally like Google stuff, Hangouts, docs, and Github, of course.
>
>
> On Saturday, September 5, 2015, Matthew Burgess 
> wrote:
>
>> Edmon,
>>
>> All our Data Integration (file-format parsing, e.g.) code is Apache-2.0
>> licensed, we have parsers/processors
>> <https://github.com/pentaho/pentaho-kettle/tree/master/engine/src/org/pentaho/di/trans/steps>
>> for EDI / XML(StaX) / HL7 / YAML, etc. I have a plugin (also Apache-2.0)
>> using Tika to extract metadata, this could be refactored as a Drill
>> plugin.
>>
>> The (semi-)structured-to-tabular conversion will be an issue that most
>> Drill extenders will have to deal with, although with powerful functions
>> like KVGEN() and FLATTEN() it should be less daunting. For graphs
>> (highly-structured but non-tabular data sources), I'm also looking into a
>> Gremlin plugin, which could connect Graph Databases with Drill. Again, the
>> problem is representing non-tabular data in a SQL environment as you
>> mentioned.
>>
>> Regards,
>> Matt
>>
>> From:  Edmon Begoli 
>> Reply-To:  
>> Date:  Saturday, September 5, 2015 at 8:46 PM
>> To:  
>> Subject:  Re: Data representation and conversation - translating nested
>> hierarchies into a tabular/queriable format
>>
>> Matt - any contribution of your time is welcome! Thank you.
>>
>> These problems that we are wanting to look into are not easy problems; I
>> would not expect quick solutions, but any good idea, contribution of time,
>> or code will help us advance the state of the capabilities.
>>
>> I might create a branch or separate Github repo, so that we just use its
>> wiki for documentation and collaboration, and then later for scratch pad
>> development.
>>
>> Regarding existing tools you might have - *do you think you could bring
>> this code under the Apache 2 license?*
>> Knowing what you told me before, I think that contributing this code would
>> help advance the state of Drill's format support tremendously.
>>
>> I see two major challenges related to what I am proposing:
>>
>> 1. (greater challenge) How to bring heterogeneously structured data
>> logically and semantically into the tabular orientation of a typical SQL
>> query processing engine.
>> I think that some problems will not be completely implementable, so we'll
>> need to either approximate or make some limiting/bounding design choices.
>>
>> 2. How to support these new formats through the Drill API. This is more of
>> just an API study, design and programming effort. Nothing contradictory.
>>
>> Edmon
>>
>>
>>
>>
>> On Sat, Sep 5, 2015 at 8:12 PM, Matt Burgess  wrote:
>>
>> >  Challenge accepted! :) are we talking about things like XML, Jsonnet,
>> >  Yaml, etc.? And/or binary file formats that are (semi-)structured in
>> >  nature like XLSX?
>> >
>> >  If we want to go more unstructured we could look at Apache Tika to at
>> >  least pull out metadata on things like image and video files, and I'm
>> >  tinkering with the idea of a UDF called topics() for human-generated
>> >  text using Apache OpenNLP, the problem being a well-trained model for
>> >  the target data.
>> >
>> >  Edmon, I admire your ambition and would like to help out where/when I
>> >  can. Having said that, so far my amount of available time for Drill has
>> >  been embarrassingly lower than my amount of interest.
>> >
>> >  For well-known file formats, I may be able to help with some of our
>> >  open-source tools for parsing such files.
>> >
>> >  Regards,
>> >  Matt
>> >
>> >  Sent from my iPhone
>> >
>> >>  > On Sep 5, 2015, at 7:44 PM, Edmon Begoli  wrote:
>> >>  >
>> >>  > 

[GitHub] drill pull request: DRILL-3735: For partition pruning divide up th...

2015-09-13 Thread amansinha100
GitHub user amansinha100 opened a pull request:

https://github.com/apache/drill/pull/156

DRILL-3735: For partition pruning divide up the partition lists into
sublists of 64K each and iterate over each sublist.

Add abstract base class for various partition descriptors.  Add logging 
messages in PruneScanRule for better debuggability.
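
For readers skimming the archive, the sublist idea can be sketched as below.
This is an illustration in Python, not the code in the pull request: the
names `sublists`, `prune` and the `keep` predicate are hypothetical, and
"64K" is assumed here to mean 65,536 entries.

```python
from itertools import islice

SUBLIST_SIZE = 64 * 1024  # assumption: "64K" read as 65,536 entries


def sublists(partitions, size=SUBLIST_SIZE):
    """Yield consecutive sublists of at most `size` partitions."""
    it = iter(partitions)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def prune(partitions, keep, size=SUBLIST_SIZE):
    """Apply the (hypothetical) filter `keep` one sublist at a time."""
    survivors = []
    for chunk in sublists(partitions, size):
        # Only one sublist is materialized at a time, which bounds the
        # per-iteration working set regardless of the total partition count.
        survivors.extend(p for p in chunk if keep(p))
    return survivors
```

The point of the design is that the cost of each iteration is capped by the
sublist size rather than by the total number of partitions.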

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amansinha100/incubator-drill partition9

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #156


commit dc079ad2cfa2813817564cc8bdd66356d0c6e59c
Author: Aman Sinha 
Date:   2015-09-12T19:57:12Z

DRILL-3735: For partition pruning divide up the partition lists into 
sublists of 64K each and iterate over each sublist.

Add abstract base class for various partition descriptors.  Add logging 
messages in PruneScanRule for better debuggability.






[jira] [Created] (DRILL-3774) The Test TestExampleQueries.testTextPartitions fails consistently on fresh install

2015-09-13 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3774:
-

 Summary: The Test TestExampleQueries.testTextPartitions fails 
consistently on fresh install
 Key: DRILL-3774
 URL: https://issues.apache.org/jira/browse/DRILL-3774
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau


I continue to see failures for this test on a brand new machine. Marking it as 
ignored until a resolution can be determined.  Machine Info:

{code}
uname -a
Linux ip-172-31-6-19 3.14.48-33.39.amzn1.x86_64 #1 SMP Tue Jul 14 23:43:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
{code}

{code}
java -version
java version "1.7.0_85"
OpenJDK Runtime Environment (amzn-2.6.1.3.61.amzn1-x86_64 u85-b01)
OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)
{code}

{code}
mvn -version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-14T17:29:23+00:00)
Maven home: /usr/share/apache-maven
Java version: 1.7.0_85, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.14.48-33.39.amzn1.x86_64", arch: "amd64", family: "unix"
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Gradle?

2015-09-13 Thread Matt Burgess
I'm a huge Gradle fanatic and proponent; I use it for all of my projects, large 
and small. However, in this case I tend towards Ted's arguments. Maven is 
working well and it demands a little extra discipline that supports the ASF 
style. I expect and hope this changes in the future, but I've got to go with 
the "not yet" sentiment.

Sent from my iPhone

> On Sep 13, 2015, at 5:40 PM, Ted Dunning  wrote:
> 
> OK.  To explain some of the reasons that the case for moving has little
> force, here are some answers in-line.
> 
> The summary is that I think that you are claiming that gradle offers a
> combination of readability, expressiveness and flexibility. My thought in
> response is that the readability difference is probably real, but not very
> important, that the expressiveness and flexibility are actually
> mis-features, at least potentially.
> 
> Mostly, though, I feel like this is in the intersection of "ain't broke"
> and "religious" categories.
> 
> 
>> On Sun, Sep 13, 2015 at 8:56 AM, Edmon Begoli  wrote:
>> 
>> - you can still use the same POMs, so one can immediately use it without
>> breaking backward compatibility. You can go forever with POMs
> 
> Nice.
> 
> 
>> - much smaller build definition file (very subjective here because I do not
>> like XML for human use)
> 
> I hear you here, but this has much less impact than it used to.
> This has happened because tools like IDEA do most of the editing of the
> POMs and they also assist in reading them by folding the text.
> 
> As much as XML is a pain to work with manually, once the task becomes
> semi-automated, it really doesn't matter that much.
> 
> 
>> - you can break up the build up into multiple independent tasks. I think
>> this would help with making tests more modular etc.
> 
> Not sure how this is different from what we have now. I am very dubious of
> having too much more flexibility since much of the value in a build system
> is extreme standardization of the life-cycle.
> 
> 
>> - continues execution after failures. This saves time a lot.
> 
> Yes. It can be helpful. On the other hand, how often does the build fail?
> My gut feeling is that modern builds fail far less often than builds used
> to. This is probably partly due to continuous integration and partly due
> to more standardization in the various build tasks, so I don't have to worry
> about where things go, nor do I have to worry that somebody has
> implemented a standard build step poorly in some ant or make build script
> somewhere.
> 
> 
>> - much more expressive and feature-full task design and execution
> 
> I am actually somewhat of an opponent of expressiveness in builds. I prefer
> that they be bog standard and boring. Creativity belongs elsewhere.
> 
> 
>>   -- API automatic detection of build dependencies
> 
> Maven does this (at least with IDEA).
> 
> 
>>   -- a complete DAG for dependencies - one task can depend on multiple
>> others, and any of the dependencies can be of any depth
> 
> Maven has this.
> 
> 
>> - dry run feature - you can see what will compile without having to
>> actually build it
> 
> Maven has this.
> 
> 
>> - it can support and produce multiple versions, supports multiple profiles,
>> etc.
> 
> Maven has this.


[jira] [Created] (DRILL-3773) Mongo RecordReader projection pushdown doesn't work past first level paths

2015-09-13 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3773:
-

 Summary: Mongo RecordReader projection pushdown doesn't work past 
first level paths
 Key: DRILL-3773
 URL: https://issues.apache.org/jira/browse/DRILL-3773
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - MongoDB
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau
 Fix For: 1.2.0








Re: Gradle?

2015-09-13 Thread Matt Burgess
In that case ;) Gradle is faster, more attuned to incremental builds, has a 
daemon to remove startup time, and as of 2.6 has continuous builds so you don't 
even have to kick it off after you make changes to source :)

Of course, with great power comes great responsibility. Because the build is an 
embedded DSL rather than a declarative markup document, you have to keep a 
closer eye on what will actually be performed when builds are invoked. At least 
that's what "they" tell me :)

Sent from my iPhone

> On Sep 13, 2015, at 8:28 PM, Ted Dunning  wrote:
> 
>> On Sun, Sep 13, 2015 at 3:43 PM, Matt Burgess  wrote:
>> 
>> but I've got to go with the "not yet"
> 
> 
> Regardless of the decision, the discussion is still useful.  These things
> have a habit of changing over time. I know that I have gone through quite a
> number of build systems in my time. Capabilities and requirements change
> over time which eventually can change what the right answer is.


Re: Update on EDI support for Drill - repo and design collaboratory

2015-09-13 Thread Ted Dunning
I doubt that I will be able to produce significant amounts of code. If I do
produce much of anything, I would be happy to contribute via pull requests.

So I don't need to be on the repo as a contributor.

On Sun, Sep 13, 2015 at 1:42 PM, Edmon Begoli  wrote:

> Ted, Matt, et al.,
>
> I have created a temporary repository for the design and development of
> support for the EDI format in Drill.
> At this point, it is not a fork of Drill, but rather a collaboration space
> and code repository for exploratory code.
>
> Wiki:
> https://github.com/ebegoli/edi-drill-store/wiki
>
> Repo:
> https://github.com/ebegoli/edi-drill-store
>
> Once the difficult parts specific to EDI (logical nesting, record
> representation) are figured out, and generic code written for I/O and
> translation,
> I will look to merge this with Drill and blend it into Drill-specific
> patterns.
>
> *If you wish, I will add you to the repo, so you can edit Wiki.*
>
> Let me know please.
>
> Edmon

Re: Gradle?

2015-09-13 Thread Ted Dunning
OK.  To explain some of the reasons that the case for moving has little
force, here are some answers in-line.

The summary is that I think that you are claiming that gradle offers a
combination of readability, expressiveness and flexibility. My thought in
response is that the readability difference is probably real, but not very
important, that the expressiveness and flexibility are actually
mis-features, at least potentially.

Mostly, though, I feel like this is in the intersection of "ain't broke"
and "religious" categories.


On Sun, Sep 13, 2015 at 8:56 AM, Edmon Begoli  wrote:

> - you can still use the same POMs, so one can immediately use it without
> breaking backward compatibility. You can go forever with POMs
>

Nice.


> - much smaller build definition file (very subjective here because I do not
> like XML for human use)
>

I hear you here, but this has much less impact than it used to.
This has happened because tools like IDEA do most of the editing of the
POMs and they also assist in reading them by folding the text.

As much as XML is a pain to work with manually, once the task becomes
semi-automated, it really doesn't matter that much.


> - you can break up the build up into multiple independent tasks. I think
> this would help with making tests more modular etc.
>

Not sure how this is different from what we have now. I am very dubious of
having too much more flexibility since much of the value in a build system
is extreme standardization of the life-cycle.


> - continues execution after failures. This saves time a lot.
>

Yes. It can be helpful. On the other hand, how often does the build fail?
My gut feeling is that modern builds fail far less often than builds used
to. This is probably partly due to continuous integration and partly due
to more standardization in the various build tasks, so I don't have to worry
about where things go, nor do I have to worry that somebody has
implemented a standard build step poorly in some ant or make build script
somewhere.


> - much more expressive and feature-full task design and execution
>

I am actually somewhat of an opponent of expressiveness in builds. I prefer
that they be bog standard and boring. Creativity belongs elsewhere.


>-- API automatic detection of build dependencies
>

Maven does this (at least with IDEA).


>-- a complete DAG for dependencies - one task can depend on multiple
> others, and any of the dependencies can be of any depth
>

Maven has this.


> - dry run feature - you can see what will compile without having to
> actually build it
>

Maven has this.


> - it can support and produce multiple versions, supports multiple profiles,
> etc.
>

Maven has this.


Re: Gradle?

2015-09-13 Thread Ted Dunning
On Sun, Sep 13, 2015 at 3:43 PM, Matt Burgess  wrote:

> but I've got to go with the "not yet"


Regardless of the decision, the discussion is still useful.  These things
have a habit of changing over time. I know that I have gone through quite a
number of build systems in my time. Capabilities and requirements change
over time which eventually can change what the right answer is.


[jira] [Created] (DRILL-3775) Fix issues with TestMongoProjectPushDown

2015-09-13 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3775:
-

 Summary: Fix issues with TestMongoProjectPushDown
 Key: DRILL-3775
 URL: https://issues.apache.org/jira/browse/DRILL-3775
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - MongoDB
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau
 Fix For: 1.2.0


Currently it fails on Linux. It needs to be fixed before it can be re-enabled.





[jira] [Resolved] (DRILL-3160) Make JDBC Javadoc documentation available to users

2015-09-13 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3160.
---
Resolution: Fixed

Merged in e43155d

> Make JDBC Javadoc documentation available to users
> --
>
> Key: DRILL-3160
> URL: https://issues.apache.org/jira/browse/DRILL-3160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Daniel Barclay (Drill)
>Assignee: Aditya Kishore
> Fix For: 1.2.0
>
> Attachments: 
> DRILL-3160-Make-JDBC-Javadoc-documentation-available.patch
>
>
> The existing Javadoc documentation (source) for Drill JDBC classes/interfaces 
> such as org.apache.drill.jdbc.Driver and org.apache.drill.jdbc.DrillResultSet 
> is not generated into documentation pages and packaged for users by the 
> current Maven build scripts.





[jira] [Resolved] (DRILL-3458) Avro file format's support for map and nullable union data types.

2015-09-13 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3458.
---
Resolution: Fixed

Merged in fe07b6c

> Avro file format's support for map and nullable union data types.
> -
>
> Key: DRILL-3458
> URL: https://issues.apache.org/jira/browse/DRILL-3458
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.0.0
>Reporter: Bhallamudi Venkata Siva Kamesh
>Assignee: Jacques Nadeau
>
> Avro file format as of now does not support union and map datatypes.
> For union datatypes, like 
> [Pig|https://cwiki.apache.org/confluence/display/PIG/AvroStorage] and 
> [Hive|https://cwiki.apache.org/confluence/display/Hive/AvroSerDe], I think, 
> we can support nullable union like ["null", "some-type"].
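
The nullable-union idea in the description can be sketched roughly as
follows. This is an illustration in Python, not Drill's Avro reader: the
function name `unwrap_nullable_union` is hypothetical, and the sketch only
relies on the fact that an Avro schema writes a union as a JSON array such
as ["null", "string"].

```python
def unwrap_nullable_union(schema):
    """Return (inner_type, nullable) for an Avro field type.

    Only two-branch unions that include "null" are accepted; any other
    union raises ValueError, mirroring the "nullable union only" proposal.
    """
    if not isinstance(schema, list):
        # Not a union at all: the type is used as-is and is not nullable.
        return schema, False
    non_null = [branch for branch in schema if branch != "null"]
    if len(schema) == 2 and len(non_null) == 1:
        return non_null[0], True
    raise ValueError(f"unsupported union: {schema!r}")
```

With this shape, a field typed ["null", "long"] would be treated as an
optional long, while a general union such as ["int", "string"] would be
rejected rather than guessed at.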





[jira] [Resolved] (DRILL-3720) Avro Record Reader should process Avro files by per block basis

2015-09-13 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3720.
---
Resolution: Fixed

Merged in 8f4ca6e

> Avro Record Reader should process Avro files by per block basis
> ---
>
> Key: DRILL-3720
> URL: https://issues.apache.org/jira/browse/DRILL-3720
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Bhallamudi Venkata Siva Kamesh
>Assignee: Jacques Nadeau
> Fix For: 1.2.0
>
>
> Currently the Avro Record Reader processes an entire file in a single Drillbit. 
> This is fine as long as files are about the size of an HDFS block. However, if 
> the file is *large*, this causes a lot of performance issues. 
> To address this, the Avro Record Reader should process Avro files on a 
> per-block basis.
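
As a rough illustration of the per-block idea (not the change merged for this
issue), the sketch below cuts a file into block-sized byte ranges that could
each be handed to a different reader. The function name `block_splits` and
the sizes are assumptions. A real Avro reader would also have to align each
range to the sync markers that separate data blocks in an Avro container
file, which is omitted here.

```python
def block_splits(file_length, block_size):
    """Return (offset, length) byte ranges covering the file, one per block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    splits = []
    offset = 0
    while offset < file_length:
        length = min(block_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits
```

For example, a 300 MB file with a 128 MB block size would yield three ranges,
so three readers could work on it in parallel instead of one.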





[jira] [Resolved] (DRILL-3589) JDBC driver maven artifact includes a lot of unnecessary dependencies

2015-09-13 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3589.
---
Resolution: Fixed

Resolved in 4e3b7dc

> JDBC driver maven artifact includes a lot of unnecessary dependencies
> -
>
> Key: DRILL-3589
> URL: https://issues.apache.org/jira/browse/DRILL-3589
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Joseph Barefoot
>Assignee: Jacques Nadeau
> Fix For: 1.2.0
>
>
> The Drill JDBC POM file pulls in so many unused transitive dependencies that 
> it takes quite a while to exclude all the unnecessary ones when using it from 
> within a Java project.  This is similar to DRILL-3581 in that you can work 
> around it via exclusions of transitive dependencies, but since it makes 
> interoperability with other open-source projects problematic, this will keep 
> coming up for anyone using the JDBC driver from within any serious java app.
> Considering the pom:
> http://repo1.maven.org/maven2/org/apache/drill/exec/drill-jdbc/1.1.0/drill-jdbc-1.1.0.pom
> ...it seems that most of the unused dependencies are transitive from 
> drill-common and perhaps also drill-java-exec.  Here's an example of some 
> dependencies that the JDBC driver shouldn't need (and we excluded in our 
> project):
> parquet-*
> jetty-server
> javassist
> commons-daemon
> hibernate-validator
> xalan
> xercesImpl
> For the record we are now able to use the JDBC driver fine from within our 
> project, but it did take some dependency tree analysis (and a little 
> trial-and-error) to figure out what to exclude.  We would like to save future 
> developers that time.





Re: Update on EDI support for Drill - repo and design collaboratory

2015-09-13 Thread Edmon Begoli
I understand. I hope you and the rest will help me with design guidance as
I start translating EDI format into a Drill-amenable one.

On Sunday, September 13, 2015, Ted Dunning  wrote:

> I doubt that I will be able to produce significant amounts of code. If I do
> produce much of anything, I would be happy to contribute via pull requests.
>
> So I don't need to be on the repo as a contributor.
>
> On Sun, Sep 13, 2015 at 1:42 PM, Edmon Begoli wrote:
>
> > Ted, Matt, et al.,
> >
> > I have created a temporary repository for the design and development of
> > support for the EDI format in Drill.
> > At this point, it is not a fork of Drill, but rather a collaboration
> > space and code repository for exploratory code.
> >
> > Wiki:
> > https://github.com/ebegoli/edi-drill-store/wiki
> >
> > Repo:
> > https://github.com/ebegoli/edi-drill-store
> >
> > Once the difficult parts specific to EDI (logical nesting, record
> > representation) are figured out, and generic code written for I/O and
> > translation,
> > I will look to merge this with Drill and blend it into Drill-specific
> > patterns.
> >
> > *If you wish, I will add you to the repo, so you can edit Wiki.*
> >
> > Let me know please.
> >
> > Edmon
> >
> >
> > On Sun, Sep 6, 2015 at 7:16 AM, Edmon Begoli  > wrote:
> >
> > > Matt - that is fantastic. Having good, liberally licensed format
> > > converters probably takes care of the 50% of the problem. The other 50%
> > > will be in figuring out the logical mapping.
> > >
> > > Let me think a little bit and propose how can we best set up a
> > > collaboration platform. Any suggestion for this welcome.
> > >
> > > I personally like Google stuff, Hangouts, docs, and Github, of course.
> > >
> > >
> > > On Saturday, September 5, 2015, Matthew Burgess  >
> > > wrote:
> > >
> > >> Edmon,
> > >>
> > >> All our Data Integration (file-format parsing, e.g.) code is
> Apache-2.0
> > >> licensed, we have parsers/processors
> > >> <
> > >>
> >
> https://github.com/pentaho/pentaho-kettle/tree/master/engine/src/org/pentah
> > >> o/di/trans/steps
> > >> <
> >
> https://github.com/pentaho/pentaho-kettle/tree/master/engine/src/org/pentaho/di/trans/steps
> > >>
> > >> for EDI / XML(StaX) / HL7 / YAML, etc. I have a plugin
> > >>   (also
> > >> Apache-2.0)
> > >> using Tika to extract metadata, this could be refactored as a Drill
> > >> plugin.
> > >>
> > >> The (semi-)structured-to-tabular conversion will be an issue that most
> > >> Drill
> > >> extenders will have to deal with, although with powerful functions
> like
> > >> KVGEN() and FLATTEN() it should be less daunting. For graphs
> > >> (highly-structured but non-tabular data sources), I'm also looking
> into
> > a
> > >> Gremlin   plugin, which could
> > >> connect Graph Databases with Drill. Again, the problem is representing
> > >> non-tabular data in a SQL environment as you mentioned.
> > >>
> > >> Regards,
> > >> Matt
> > >>
> > >> From:  Edmon Begoli >
> > >> Reply-To:  >
> > >> Date:  Saturday, September 5, 2015 at 8:46 PM
> > >> To:  >
> > >> Subject:  Re: Data representation and conversation - translating
> nested
> > >> hierarchies into a tabular/queriable format
> > >>
> > >> Matt - any contribution of your time is welcome! Thank you.
> > >>
> > >> The problems that we want to look into are not easy; I would not
> > >> expect quick solutions, but any good idea, contribution of time,
> > >> or code will help us advance the state of the capabilities.
> > >>
> > >> I might create a branch or a separate GitHub repo, so that we can use
> > >> its wiki for documentation and collaboration, and later for scratch-pad
> > >> development.
> > >>
> > >> Regarding existing tools you might have - *do you think you could bring
> > >> this code under the Apache 2 license?*
> > >> Knowing what you told me before, I think that contributing this code
> > >> would help advance the state of Drill's format support tremendously.
> > >>
> > >> I see two major challenges related to what I am proposing:
> > >>
> > >> 1. (greater challenge) How to bring heterogeneously structured data
> > >> logically and semantically into the tabular orientation of a typical SQL
> > >> query processing engine.
> > >> I think that some problems will not be completely implementable, so
> > >> we'll need to either approximate or make some limiting/bounding design
> > >> choices.
> > >>
> > >> 2. How to support these new formats through the Drill API. This is more
> > >> of an API study, design, and programming effort. Nothing contradictory.
> > >>
> > >> Edmon
> > >>
> > >>
> > >>
> > >>
> > >> On Sat, Sep 5, 2015 at 8:12 

[GitHub] drill pull request: Drill 3180: Implement JDBC Storage Plugin

2015-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/115


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (DRILL-3773) Mongo RecordReader projection pushdown doesn't work past first level paths

2015-09-13 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3773.
---
Resolution: Fixed

Merged in 97615e5

> Mongo RecordReader projection pushdown doesn't work past first level paths
> --
>
> Key: DRILL-3773
> URL: https://issues.apache.org/jira/browse/DRILL-3773
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MongoDB
>Reporter: Jacques Nadeau
>Assignee: Jason Altekruse
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-1666) Provide Test cases for Mongo Storage plugin

2015-09-13 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-1666.
---
Resolution: Fixed

Merged in 197d972

> Provide Test cases for Mongo Storage plugin
> ---
>
> Key: DRILL-1666
> URL: https://issues.apache.org/jira/browse/DRILL-1666
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - MongoDB
>Affects Versions: 1.1.0
>Reporter: Bhallamudi Venkata Siva Kamesh
>Assignee: Jacques Nadeau
> Fix For: 1.2.0
>
>






[jira] [Resolved] (DRILL-3180) Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill

2015-09-13 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3180.
---
Resolution: Fixed
Fix Version/s: 1.2.0 (was: 1.3.0)

Merged in e12cd470e4ab57b025840fdfa200a051a01df029

> Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and 
> Netezza from Apache Drill
> ---
>
> Key: DRILL-3180
> URL: https://issues.apache.org/jira/browse/DRILL-3180
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.0.0
>Reporter: Magnus Pierre
>Assignee: Jacques Nadeau
>  Labels: Drill, JDBC, plugin
> Fix For: 1.2.0
>
> Attachments: patch.diff, pom.xml, storage-mpjdbc.zip
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I have developed the base code for a JDBC storage-plugin for Apache Drill. 
> The code is primitive but constitutes a good starting point for further 
> coding. Today it provides primitive support for SELECT against RDBMS with 
> JDBC. 
> The goal is to provide complete SELECT support against RDBMS with push down 
> capabilities.
> Currently the code is using standard JDBC classes.





Drill- Query execution plan

2015-09-13 Thread Sudip Mukherjee
Hi,
I need some help understanding the steps of a query execution plan. The query
below is broken down into these steps.
If you could explain them briefly or point me to a documentation link, that
would be great, as I am trying to dig into the Drill code and logic.

SELECT SUM(1) AS `COL` FROM mydb.testtable HAVING COUNT(1)>0

[The query is sent from Tableau.]
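For reference, one way to see how Drill breaks a query like the one above into steps is its EXPLAIN command. This is a minimal sketch, assuming mydb.testtable is reachable from the current session; the first statement requests the physical plan and the second the logical plan only.

```sql
-- Physical plan: the operators Drill would actually execute.
EXPLAIN PLAN FOR
SELECT SUM(1) AS `COL` FROM mydb.testtable HAVING COUNT(1) > 0;

-- Logical plan only, without the physical implementation details.
EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR
SELECT SUM(1) AS `COL` FROM mydb.testtable HAVING COUNT(1) > 0;
```

The plan comes back as an indented tree of operators. It is usually read from the innermost operator (typically the table scan) outward to the root, which follows the direction the data flows.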




Sudip

[inline image: image001.png]




[GitHub] drill pull request: DRILL-3724 - added javadoc for core classes fo...

2015-09-13 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/139#issuecomment-139965734
  
This has been merged. However, I forgot to include the "merged in" message,
so @ebegoli, can you please close it?

