Re: Drill on YARN

2016-03-28 Thread John Omernik
Great summary.  I'll fill in some "non-technical" explanations of some
challenges with memory as I see them. Drill devs, please keep Paul and me
accurate in our understanding.

First, memory is already set at the drillbit level... sorta. It's set via
ENV in drill-env, and is not a cluster-specific thing. However, I believe
there are some challenges that come into play when you have bits of
different sizes. Drill "may" assume that bits are all the same size, and
thus, if you run a query, then depending on which bit is the foreman and
where the fragments land, the query may succeed or fail. That's not an
ideal situation. I think a holistic discussion on memory needs some
definitive answers on how Drill handles memory, especially with
different-sized nodes, and on what changes would be needed for bits of
different sizes to work well together on a production cluster.
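For reference, the per-node limits are set today roughly like this in
conf/drill-env.sh; the variable names and values below are illustrative, so
check your own distribution:

```shell
# conf/drill-env.sh -- per-drillbit memory limits (illustrative values)
export DRILL_HEAP="4G"                 # JVM heap for the drillbit process
export DRILL_MAX_DIRECT_MEMORY="8G"    # ceiling for off-heap (direct) memory
```

Because this is a per-node environment file, every node can legitimately end
up with a different size, which is exactly the heterogeneous-bits situation
described above.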

This discussion forms the basis of almost all work around memory
management. If, in its current form, we can realistically only have bits of
one size, then static allocations are where we are going to be for the
initial YARN work. I love the idea of scaling up and down, but it will be
difficult to scale an entire cluster's worth of bits up and down, so
heterogeneous resource allocations must be a prerequisite to dynamic
allocation discussions (other than just adding and removing whole bits).

Second, this also plays into the multiple-drillbits-per-node discussion.
If static-sized bits are our only approach, then the initial reaction is to
make them smaller so you have some granularity in scaling up and down.
This may actually hurt a cluster. A large query may struggle to fit its
fragments on 3 nodes of, say, 8GB of direct RAM, while the same query would
run fine on bits with 24GB of direct RAM. Drill devs: keep me honest here.
I am going off of lots of participation in the memory/CPU discussions from
when I first started the Drill/Marathon integration, and that is the
feeling I got in talking to folks on and off list about memory management.

This is a hard topic, but one that I am glad you are spearheading, Paul,
because as we see more and more clusters get folded together, having a
citizen that plays nicely with others and provides flexibility with regard
to performance vs. resource tradeoffs will be a huge selling/implementation
point for any analytics tool. If it's hard to implement and test at scale
without dedicated hardware, it won't get a fair shake.

John


On Sun, Mar 27, 2016 at 3:25 PM, Paul Rogers  wrote:

> Hi John,
>
> The other main topic of your discussion is memory management. Here we seem
> to have 6 topics:
>
> 1. Setting the limits for Drill.
> 2. Drill respects the limits.
> 3. Drill lives within its memory “budget.”
> 4. Drill throttles work based on available memory.
> 5. Drill adapts memory usage to available memory.
> 6. Some means to inform Drill of increases (or decreased) in memory
> allocation.
>
> YARN, via container requests, solves the first problem. Someone (the
> network admin) has to decide on the size of each drill-bit container, but
> YARN handles allocating the space, preventing memory oversubscription, and
> enforcing the limit (by killing processes that exceed their allocation.)
>
> As you pointed out, memory management is different than CPU: we can’t just
> expect Linux to silently give us more or less depending on load. Instead,
> Drill itself has to actively request and release memory (and know what to
> do in each case.)
>
> Item 2 says that Drill must limit its memory use. The JVM enforces heap
> size. (As the heap is exhausted, a Java program gets slower due to
> increased garbage collection events until finally it receives an
> out-of-memory error.)
>
> At present I’m still learning the details of how Drill manages memory so,
> by necessity, most of what follows is at the level of “what we could do”
> rather than “how it works today.” Drill devs, please help fill in the gaps.
>
> The docs suggest we have a variety of settings that configure Drill
> memory (heap size, off-heap size, etc.). I need to ask around more to learn
> whether Drill does, in fact, limit its off-heap memory usage. If not, then
> perhaps this is a change we want to make.
>
> Once Drill respects memory limits, we move to item 3: Drill should live
> within the limits. By this I mean that query operations should work with
> constrained memory, perhaps by spilling to disk — it is not sufficient to
> simply fail when memory is exhausted. Again, I don’t yet know where we’re
> at here, but I understand we may still have a bit of work to do to achieve
> this goal.
>
> Item 4 looks at the larger picture. Suppose a Drill-bit has 32GB of memory
> available to it. We do the work needed so that any given query can succeed
> within this limit (perhaps slowly if operations spill to disk.) But, what
> happens when the same Drill-bit now has to process 10 such queries or 100?
> We now have a much harder problem: having the collection of ALL queries

Embedded Hazelcast for a distributed Drill cluster

2016-03-28 Thread Pradeeban Kathiravelu
Hi,
[1] states that Drill uses Hazelcast as an embedded distributed cache to
distribute and store metadata and locality information.

However, when I cloned the git repository and looked into the code, it does
not look like Hazelcast is used, except for some unused variable
definitions and pom definitions.

I also found a resolved bug report on Hazelcast cluster membership [2].

May I know whether Hazelcast is currently used by Drill, and what exactly
Drill achieves by using it? Relevant pointers to existing discussions (if
this was already discussed) or code locations (if this was indeed
implemented in Drill) are also appreciated.

[1]
http://www.slideshare.net/Hadoop_Summit/understanding-the-value-and-architecture-of-apache-drill
[2] https://issues.apache.org/jira/browse/DRILL-489

Thank you.
Regards,
Pradeeban.
-- 
Pradeeban Kathiravelu.
PhD Researcher, Erasmus Mundus Joint Doctorate in Distributed Computing,
INESC-ID Lisboa / Instituto Superior Técnico, Universidade de Lisboa,
Portugal.
Biomedical Informatics Software Engineer, Emory University School of
Medicine.

Blog: [Llovizna] http://kkpradeeban.blogspot.com/
LinkedIn: www.linkedin.com/pub/kathiravelu-pradeeban/12/b6a/b03


Re: Embedded Hazelcast for a distributed Drill cluster

2016-03-28 Thread Neeraja Rentachintala
It's not currently used, as far as I know.
Drill used it at some point, but we removed it due to issues in
multicast/subnet scenarios in Drill's distributed mode.

-Neeraja

On Mon, Mar 28, 2016 at 12:49 PM, Pradeeban Kathiravelu <
kk.pradee...@gmail.com> wrote:



Re: Embedded Hazelcast for a distributed Drill cluster

2016-03-28 Thread Pradeeban Kathiravelu
Thanks Neeraja for your quick response.

May I know what replaced Hazelcast in that case?

I mean, how does Drill's distributed mode currently handle the multicast and
subnet scenarios, or are these scenarios no longer valid?

Regards,
Pradeeban.

On Mon, Mar 28, 2016 at 3:52 PM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:






[GitHub] drill pull request: DRILL-4544: Improve error messages for REFRESH...

2016-03-28 Thread arina-ielchiieva
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/448

DRILL-4544: Improve error messages for REFRESH TABLE METADATA command

1. Added error message when storage plugin or workspace does not exist
2. Updated error message when refresh metadata is not supported
3. Unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-4544

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/448.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #448


commit 1d1edee8920563f7b5fadf760f99d96f6c68d432
Author: Arina Ielchiieva 
Date:   2016-03-28T10:55:56Z

DRILL-4544: Improve error messages for REFRESH TABLE METADATA command
1. Added error message when storage plugin or workspace does not exist
2. Updated error message when refresh metadata is not supported
3. Unit tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Drill Perf Test Framework Repo

2016-03-28 Thread Dechang Gu
Hakim,
Sorry for not responding to you earlier (I just came back from my vacation
in China; unfortunately, Gmail was blocked there).

As for the cluster sizes and node configurations, I would leave them to the
test suite users to decide.

Thanks,
Dechang

On Wed, Mar 16, 2016 at 10:58 PM, Abdel Hakim Deneche  wrote:

> Just noticed this. Thanks DC, this is a great job you've done here.
>
> Do we have any recommendations about the cluster size and node
> configuration that one should use to test Drill's performance ?
>
> On Thu, Mar 10, 2016 at 1:38 AM, Dechang Gu  wrote:
>
> > Hi All,
> > Drill Perf Test Framework repo is now public:
> > https://github.com/mapr/drill-perf-test-framework
> >
> > The repo contains the test framework's source and performance test
> > coverage for Apache Drill.
> >
> > Please check it out and test it, if interested.  And let me know if you
> > have any suggestions/comments.
> >
> > Thanks,
> > Dechang
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>


[jira] [Created] (DRILL-4546) mvn deploy pushes the same zip artifact twice

2016-03-28 Thread Laurent Goujon (JIRA)
Laurent Goujon created DRILL-4546:
-

 Summary: mvn deploy pushes the same zip artifact twice
 Key: DRILL-4546
 URL: https://issues.apache.org/jira/browse/DRILL-4546
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Laurent Goujon


When using the apache-release profile, both the Apache and Drill assembly
descriptors are used. This causes the zip artifact to be generated twice, and
pushed twice to the remote repository. Because some repositories are configured
not to allow the same artifact to be pushed multiple times, this can cause the
build to fail.

Ideally, only one zip file should be built and pushed. Also, the Apache parent
pom provides a descriptor to build both the tar and the zip archives.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4547) Javadoc fails with Java8

2016-03-28 Thread Laurent Goujon (JIRA)
Laurent Goujon created DRILL-4547:
-

 Summary: Javadoc fails with Java8
 Key: DRILL-4547
 URL: https://issues.apache.org/jira/browse/DRILL-4547
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.6.0
Reporter: Laurent Goujon


Javadoc cannot be generated when using Java 8 (likely because the parser is now
stricter).

Here's an example of issues when trying to generate javadocs in module 
{{drill-fmpp-maven-plugin}}

{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:2.9.1:jar (attach-javadocs) on 
project drill-fmpp-maven-plugin: MavenReportException: Error while creating 
archive:
[ERROR] Exit code: 1 - 
/Users/laurent/devel/drill/tools/fmpp/src/main/java/org/apache/drill/fmpp/mojo/FMPPMojo.java:44:
 error: unknown tag: goal
[ERROR] * @goal generate
[ERROR] ^
[ERROR] 
/Users/laurent/devel/drill/tools/fmpp/src/main/java/org/apache/drill/fmpp/mojo/FMPPMojo.java:45:
 error: unknown tag: phase
[ERROR] * @phase generate-sources
[ERROR] ^
[ERROR] 
/Users/laurent/devel/drill/tools/fmpp/target/generated-sources/plugin/org/apache/drill/fmpp/mojo/HelpMojo.java:25:
 error: unknown tag: goal
[ERROR] * @goal help
[ERROR] ^
[ERROR] 
/Users/laurent/devel/drill/tools/fmpp/target/generated-sources/plugin/org/apache/drill/fmpp/mojo/HelpMojo.java:26:
 error: unknown tag: requiresProject
[ERROR] * @requiresProject false
[ERROR] ^
[ERROR] 
/Users/laurent/devel/drill/tools/fmpp/target/generated-sources/plugin/org/apache/drill/fmpp/mojo/HelpMojo.java:27:
 error: unknown tag: threadSafe
[ERROR] * @threadSafe
[ERROR] ^
[ERROR] 
[ERROR] Command line was: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_72.jdk/Contents/Home/bin/javadoc 
@options @packages
[ERROR] 
[ERROR] Refer to the generated Javadoc files in 
'/Users/laurent/devel/drill/tools/fmpp/target/apidocs' dir.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :drill-fmpp-maven-plugin
{noformat}





[GitHub] drill pull request: DRILL-4199: Add Support for HBase 1.X

2016-03-28 Thread adityakishore
Github user adityakishore commented on the pull request:

https://github.com/apache/drill/pull/443#issuecomment-202576211
  
I have verified that, with these changes, Drill queries run fine with HBase 
0.98 cluster too.




[jira] [Created] (DRILL-4548) Drill can not select from multiple Avro files

2016-03-28 Thread JIRA
Stefán Baxter created DRILL-4548:


 Summary: Drill can not select from multiple Avro files
 Key: DRILL-4548
 URL: https://issues.apache.org/jira/browse/DRILL-4548
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Avro
Affects Versions: 1.6.0, 1.7.0
Reporter: Stefán Baxter


Hi,

I have reworked/refactored our Avro-based logging system, trying to make the
whole Drill + Avro->Parquet experience a bit more agreeable.

Long story short, I'm getting this error when selecting from multiple Avro
files even though these files share the EXACT same schema:

Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
changes
Fragment 0:0
[Error Id: 00d49aa2-5564-497e-a330-e852d5889beb on swift:31010] (state=,code=0)

We are using union types, but only to allow for null values, as seems to be
supported by Drill per this comment in the Drill code:
// currently supporting only nullable union (optional fields) like ["null",
"some-type"].

This happens for a very simple group-by + count(*) query that only uses two
fields in Avro; neither of them uses a union construct, and both of them
contain string values in every case.

I now think this has nothing to do with the union types, since the query uses
only simple strings, unless there is a full schema validation done on the
content of the files rather than on the identical Avro schema embedded in both
files.
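For what it's worth, my reading of that code comment is a rule like the
following sketch (my own illustration, not Drill's actual code): a union is
acceptable only when it is exactly a two-branch ["null", "some-type"] pair.

```python
# Illustration of the "nullable union (optional fields)" rule from the
# Drill code comment: only unions of the form ["null", "some-type"] pass.
def is_supported_union(schema):
    """Return True if an Avro schema is not a union, or is a simple
    nullable union like ["null", "string"]."""
    if not isinstance(schema, list):
        return True  # not a union at all; the restriction does not apply
    return len(schema) == 2 and schema[0] == "null"

print(is_supported_union(["null", "string"]))  # True  (optional string)
print(is_supported_union(["int", "string"]))   # False (real two-type union)
```

If that rule holds, the schemas described above (plain strings, unions only
for nullability) should pass it, which supports the suspicion that something
other than the union types is triggering the schema-change error.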





[GitHub] drill pull request: DRILL-4546: Only generate one zip archive when...

2016-03-28 Thread laurentgo
GitHub user laurentgo opened a pull request:

https://github.com/apache/drill/pull/449

DRILL-4546: Only generate one zip archive when using apache-release profile

Drill's root pom doesn't completely override the Apache parent pom configuration
regarding assemblies, which caused a zip archive of the project to be generated
twice, and deployed to a remote server twice too.

The fix updates the Apache parent pom version and uses the plugin properties
to override the configuration. It also removes the Drill source assembly
descriptor, as the Apache parent project provides the same one.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/laurentgo/drill 
laurent/fix-assembly-descriptor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/449.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #449


commit 6353b057361f40f2ec56942e77321cd84147f340
Author: Laurent Goujon 
Date:   2016-03-28T20:31:53Z

DRILL-4546: Only generate one zip archive when using apache-release profile

Drill's root pom doesn't completely override the Apache parent pom configuration
regarding assemblies, which caused a zip archive of the project to be generated
twice, and deployed to a remote server twice too.

The fix updates the Apache parent pom version and uses the plugin properties
to override the configuration. It also removes the Drill source assembly
descriptor, as the Apache parent project provides the same one.






[GitHub] drill pull request: DRILL-4467: Fix ProjectPushInfo#desiredFields ...

2016-03-28 Thread laurentgo
Github user laurentgo commented on the pull request:

https://github.com/apache/drill/pull/404#issuecomment-202577322
  
was fixed by @jacques-n in commit edea8b1cf4e5476d803e8b87c79e08e8c3263e04. 
I'm closing this PR




[GitHub] drill pull request: DRILL-4467: Fix ProjectPushInfo#desiredFields ...

2016-03-28 Thread laurentgo
Github user laurentgo closed the pull request at:

https://github.com/apache/drill/pull/404




Re: [jira] [Created] (DRILL-4548) Drill can not select from multiple Avro files

2016-03-28 Thread Stefán Baxter
Ok, this has nothing to do with multiple files either.

Selecting from a single file produces a schema change. (Who would have
thunk it)

0: jdbc:drill:zk=local> select type, count(*) from
dfs.asa.`/streaming/venuepoint/events/2016-03-28` as s group by type;
Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema
changes
Fragment 0:0
[Error Id: ad4aa637-e0c5-46fe-b074-814005bf8024 on swift:31010]
(state=,code=0)



On Mon, Mar 28, 2016 at 8:51 PM, Stefán Baxter (JIRA) 
wrote:



[jira] [Created] (DRILL-4549) Add support for more units in date_trunc function

2016-03-28 Thread Venki Korukanti (JIRA)
Venki Korukanti created DRILL-4549:
--

 Summary: Add support for more units in date_trunc function
 Key: DRILL-4549
 URL: https://issues.apache.org/jira/browse/DRILL-4549
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.6.0
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: 1.7.0


Currently we support only {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} truncate 
units for types {{TIME, TIMESTAMP and DATE}}. Extend the functions to support 
{{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEK, QUARTER, DECADE, CENTURY, 
MILLENNIUM}} truncate units for types {{TIME, TIMESTAMP, DATE, INTERVAL DAY, 
INTERVAL YEAR}}.

Also get rid of the if-and-else (on truncation unit) implementation. Instead 
resolve to a direct function based on the truncation unit in Calcite -> Drill 
(DrillOptiq) expression conversion.
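Concretely, the proposed extra units would behave like the following Python
sketch; this is my own illustration of the intended semantics, not Drill's
implementation:

```python
from datetime import datetime, timedelta

def date_trunc(unit, ts):
    """Truncate a timestamp down to the start of the given unit (sketch)."""
    if unit == "WEEK":     # ISO week: back up to Monday, midnight
        monday = ts - timedelta(days=ts.weekday())
        return monday.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "QUARTER":  # first day of the quarter's first month
        first_month = 3 * ((ts.month - 1) // 3) + 1
        return ts.replace(month=first_month, day=1, hour=0, minute=0,
                          second=0, microsecond=0)
    if unit == "DECADE":   # e.g. 2016 -> 2010-01-01
        return ts.replace(year=ts.year - ts.year % 10, month=1, day=1,
                          hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")

print(date_trunc("QUARTER", datetime(2016, 3, 28, 18, 9)))  # 2016-01-01 00:00:00
```

The table-driven shape of this sketch also hints at why resolving directly to a
per-unit function during expression conversion is cleaner than one big
if-and-else chain.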





[GitHub] drill pull request: DRILL-4549: Add support for more truncation un...

2016-03-28 Thread vkorukanti
GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/450

DRILL-4549: Add support for more truncation units in date_trunc function



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4549

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/450.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #450


commit 89fc8462497e162772b56d6d45bca0074fbe3e43
Author: vkorukanti 
Date:   2016-03-28T18:09:34Z

DRILL-4549: Add support for more truncation units in date_trunc function






[GitHub] drill pull request: DRILL-4549: Add support for more truncation un...

2016-03-28 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/450#issuecomment-202632855
  
lgtm




[jira] [Created] (DRILL-4550) Add support for more time units in extract function

2016-03-28 Thread Venki Korukanti (JIRA)
Venki Korukanti created DRILL-4550:
--

 Summary: Add support for more time units in extract function
 Key: DRILL-4550
 URL: https://issues.apache.org/jira/browse/DRILL-4550
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Affects Versions: 1.6.0
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: 1.7.0


Currently the {{extract}} function supports the following units: {{YEAR, MONTH,
DAY, HOUR, MINUTE, SECOND}}. Add support for more units: {{CENTURY, DECADE, DOW,
DOY, EPOCH, MILLENNIUM, QUARTER, WEEK}}.

We also need changes in the SQL parser. Currently the parser only allows 
{{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} as units.
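For clarity, here is a rough Python sketch of what a few of the proposed units
would return, following the usual PostgreSQL conventions; this is my own
illustration, not Drill's code:

```python
from datetime import datetime, timezone

def extract(unit, ts):
    """Sketch of some proposed extract() units (PostgreSQL-style values)."""
    if unit == "DOY":      # day of year, 1..366
        return ts.timetuple().tm_yday
    if unit == "DOW":      # day of week, Sunday=0 .. Saturday=6
        return (ts.weekday() + 1) % 7
    if unit == "QUARTER":  # quarter of the year, 1..4
        return (ts.month - 1) // 3 + 1
    if unit == "EPOCH":    # seconds since 1970-01-01 00:00:00 UTC
        return ts.replace(tzinfo=timezone.utc).timestamp()
    raise ValueError(f"unsupported unit: {unit}")

print(extract("QUARTER", datetime(2016, 3, 28)))  # 1
```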





[GitHub] drill pull request: DRILL-4549: Add support for more truncation un...

2016-03-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/450




[jira] [Resolved] (DRILL-4549) Add support for more truncation units in date_trunc function

2016-03-28 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4549.

Resolution: Fixed

> Add support for more truncation units in date_trunc function
> 
>
> Key: DRILL-4549
> URL: https://issues.apache.org/jira/browse/DRILL-4549
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.7.0
>
>
> Currently we support only {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} truncate 
> units for types {{TIME, TIMESTAMP and DATE}}. Extend the functions to support 
> {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEK, QUARTER, DECADE, CENTURY, 
> MILLENNIUM}} truncate units for types {{TIME, TIMESTAMP, DATE, INTERVAL DAY, 
> INTERVAL YEAR}}.
> Also get rid of the if-and-else (on truncation unit) implementation. Instead 
> resolve to a direct function based on the truncation unit in Calcite -> Drill 
> (DrillOptiq) expression conversion.





Re: Embedded Hazelcast for a distributed Drill cluster

2016-03-28 Thread Steven Phillips
We actually removed the concept of a distributed cache from Drill
altogether, so currently nothing is replacing Hazelcast.

The distributed cache was used for storing the initialization-data for
intermediate fragments. Only leaf node fragments were sent via the RPC
layer. But the distributed cache added a lot of complexity, and didn't
provide very much benefit. We decided to get rid of it and simply send all
PlanFragments via the rpc layer.

On Mon, Mar 28, 2016 at 12:56 PM, Pradeeban Kathiravelu <
kk.pradee...@gmail.com> wrote:

> Thanks Neeraja for your quick response.
>
> May I know what replaced Hazelcast in that case?
>
> I mean, how does Drill distributed mode currently offer the multicast and
> subnet scenarios, or are these scenarios not valid anymore?
>
> Regards,
> Pradeeban.
>
> On Mon, Mar 28, 2016 at 3:52 PM, Neeraja Rentachintala <
> nrentachint...@maprtech.com> wrote:
>
> > Its not currently used as far as I know.
> > Drill used this at some point, but we removed it due to issues in
> > multicast/subnet scenarios in Drill distributed mode.
> >
> > -Neeraja
> >
> > On Mon, Mar 28, 2016 at 12:49 PM, Pradeeban Kathiravelu <
> > kk.pradee...@gmail.com> wrote:
> >
> > > Hi,
> > > [1] states that Drill uses Hazelcast as an embedded distributed cache
> to
> > > distribute and store metadata and locality information.
> > >
> > > However, when I cloned the git repository and looked into the code, it
> > does
> > > not look like Hazelcast is used, except for some unused variable
> > > definitions and pom definitions.
> > >
> > > I also found a resolved bug report on Hazelcast cluster membership [2].
> > >
> > > May I know whether Hazelcast is currently used by Drill, and what does
> > > exactly Drill achieve by using it? Relevant pointers to existing
> > > discussions (if this was already discussed) or code location (if this
> was
> > > indeed implemented in Drill) are also appreciated.
> > >
> > > [1]
> > >
> > >
> >
> http://www.slideshare.net/Hadoop_Summit/understanding-the-value-and-architecture-of-apache-drill
> > > [2] https://issues.apache.org/jira/browse/DRILL-489
> > >
> > > Thank you.
> > > Regards,
> > > Pradeeban.
> > > --
> > > Pradeeban Kathiravelu.
> > > PhD Researcher, Erasmus Mundus Joint Doctorate in Distributed
> Computing,
> > > INESC-ID Lisboa / Instituto Superior Técnico, Universidade de Lisboa,
> > > Portugal.
> > > Biomedical Informatics Software Engineer, Emory University School of
> > > Medicine.
> > >
> > > Blog: [Llovizna] http://kkpradeeban.blogspot.com/
> > > LinkedIn: www.linkedin.com/pub/kathiravelu-pradeeban/12/b6a/b03
> > >
> >
>
>
>
>


Re: Failure Behavior

2016-03-28 Thread Steven Phillips
If a fragment has already begun execution and sent some data to downstream
fragments, there is no way to simply restart the failed fragment, because
we would also have to restart any downstream fragments that consumed that
output, and so on up the tree, as well as restart any leaf fragments that
fed into any of those fragments. This is because we don't store
intermediate results to disk.

The one case where I think it would even be possible is if a node died
before sending any data downstream. But I think the only way to be sure of
that would be to poll all of the downstream fragments and verify that no
data from the failed fragment was ever received. I think this would add a
lot of complication and overhead to Drill.
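The cascade described above can be sketched as a toy graph traversal. This is not Drill code; the fragment names and the `downstream`/`upstream` maps are invented for illustration. The point is that, with no intermediate results spilled to disk, restarting one fragment transitively drags in its consumers and every producer feeding those consumers.

```python
# Toy model of an execution DAG: edges point downstream (producer -> consumer).
# Fragment names and the graph shape are invented; this is not Drill's
# actual fragment representation.
downstream = {
    "leaf_a": ["merge_1"],
    "leaf_b": ["merge_1"],
    "merge_1": ["root"],
    "root": [],
}
upstream = {n: [p for p, cs in downstream.items() if n in cs] for n in downstream}

def restart_set(failed):
    """Fragments that must restart if `failed` dies after sending data:
    the failed fragment, everything downstream that consumed its output,
    and every upstream producer that must re-feed a restarted consumer
    (there are no persisted intermediate results to replay)."""
    must = set()
    frontier = [failed]
    while frontier:
        f = frontier.pop()
        if f in must:
            continue
        must.add(f)
        frontier.extend(downstream[f])   # consumers of the lost output
        frontier.extend(upstream[f])     # producers feeding restarted nodes
    return must

# One failed leaf drags in the whole query.
print(sorted(restart_set("leaf_a")))  # -> ['leaf_a', 'leaf_b', 'merge_1', 'root']
```

In this toy graph, any single failure forces a full re-run, which matches the argument that a mid-query restart amounts to restarting the query.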

On Sat, Mar 26, 2016 at 10:03 AM, John Omernik  wrote:

> Thanks for the responses. So even if the drillbit that died wasn't the
> foreman, the query would fail? Interesting... Is there any mechanism for
> reassigning fragments? *Try harder*, so to speak? Does this also apply if
> something on a node causes a fragment to fail: could it be retried
> somewhere else? I am not trying to recreate MapReduce in Drill (although I
> am asking about similar features), but in a distributed environment, what
> is the cost of allowing the foreman to time out a fragment and retry it
> elsewhere? Say a heartbeat were sent back from the bits running a
> fragment, and if the heartbeat stopped and no results arrived for 10
> seconds, the foreman would try again somewhere else (up to X times,
> configured by a setting). I am just curious, for my own knowledge, what
> makes that hard in a system like Drill.
>
> On Sat, Mar 26, 2016 at 10:47 AM, Abdel Hakim Deneche <
> adene...@maprtech.com
> > wrote:
>
> > The only way the query could succeed is if all fragments that were running
> > on the now-dead node had already finished. Otherwise, the query fails.
> >
> > On Sat, Mar 26, 2016 at 4:45 PM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> >
> > > As far as I know, there is no failure handling in Drill. The query dies.
> > >
> > > On Sat, Mar 26, 2016 at 7:52 AM, John Omernik 
> wrote:
> > >
> > > > With distributed Drill, what is the expected/desired bit-failure
> > > > behavior? I.e., if certain fragments end up on a node with a bit in a
> > > > flaky state (or a bit that suddenly dies), what is the desired and
> > > > actual behavior of the query? I am guessing that if the bit was the
> > > > foreman, the query dies; I suppose that's unavoidable. But if it's
> > > > just a worker, does the foreman detect this and reschedule the
> > > > fragment, or does the query die anyway?
> > > >
> > > > John
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
> >
>


[jira] [Resolved] (DRILL-4545) Incorrect query plan for LIMIT 0 query

2016-03-28 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz resolved DRILL-4545.
---
Resolution: Invalid

> Incorrect query plan for LIMIT 0 query
> --
>
> Key: DRILL-4545
> URL: https://issues.apache.org/jira/browse/DRILL-4545
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>  Labels: limit0
>
> The inner query has a LIMIT 1 and the outer query has a LIMIT 0. Looking at
> the query plan, it looks like the outer LIMIT 0 is applied before the LIMIT 1
> is applied to the inner query. This does not seem right.
> Drill 1.6.0 commit ID : fb09973e
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select * from (select * from 
> `employee.json` limit 1) limit 0;
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> | 00-01      Project(*=[$0])
> | 00-02        SelectionVectorRemover
> | 00-03          Limit(fetch=[0])
> | 00-04            Limit(fetch=[1])
> | 00-05              Limit(offset=[0], fetch=[0])
> | 00-06                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/employee.json, numFiles=1, columns=[`*`], files=[maprfs:///tmp/employee.json]]])
> {noformat}
> Here is the data from JSON file
> {noformat}
> [root@centos-01 ~]# cat employee.json
> {
>   "firstName": "John",
>   "lastName": "Smith",
>   "isAlive": true,
>   "age": 45,
>   "height_cm": 177.6,
>   "address": {
> "streetAddress": "29 4th Street",
> "city": "New York",
> "state": "NY",
> "postalCode": "10021-3100"
>   },
>   "phoneNumbers": [
> {
>   "type": "home",
>   "number": "212 555-1234"
> },
> {
>   "type": "office",
>   "number": "646 555-4567"
> }
>   ],
>   "children": [],
>   "hobbies": ["scuba diving","hiking","biking","rock climbing","surfing"]
> }
> {noformat}
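The Invalid resolution is consistent with reading the plan bottom-up: the scan at 00-06 runs first and the Screen at 00-00 last, so the inner `Limit(fetch=[1])` executes before the outer `Limit(fetch=[0])` even though it is printed below it. A toy sketch in plain Python (not Drill code; the `limit` helper and sample row are invented for illustration) of that evaluation order:

```python
# Evaluate the nested-limit query bottom-up, the way the plan executes:
# scan first, then the inner LIMIT 1, then the outer LIMIT 0.
def limit(rows, fetch, offset=0):
    # Simplified row-limit operator: skip `offset` rows, keep at most `fetch`.
    return rows[offset:offset + fetch]

scan = [{"firstName": "John", "lastName": "Smith"}]  # employee.json has one record

inner = limit(scan, fetch=1)   # 00-04  Limit(fetch=[1]) -- inner query's LIMIT 1
outer = limit(inner, fetch=0)  # 00-03  Limit(fetch=[0]) -- outer query's LIMIT 0
print(outer)  # -> []
```

The outer LIMIT 0 always yields zero rows regardless of what the inner limit produced, so the printed operator order (outer above inner) is just downstream-above-upstream, not an execution-order bug.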



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] drill pull request: DRILL-3317: when ProtobufLengthDecoder couldn'...

2016-03-28 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/446#discussion_r57676038
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/ProtobufLengthDecoder.java ---
@@ -82,15 +79,7 @@ protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
  } else {
    // need to make buffer copy, otherwise netty will try to refill this buffer if we move the readerIndex forward...
    // TODO: Can we avoid this copy?
-  ByteBuf outBuf;
-  try {
-    outBuf = allocator.buffer(length);
-  } catch (OutOfMemoryException e) {
-    logger.warn("Failure allocating buffer on incoming stream due to memory limits.  Current Allocation: {}.", allocator.getAllocatedMemory());
-    in.resetReaderIndex();
-    outOfMemoryHandler.handle();
-    return;
-  }
+  ByteBuf outBuf = allocator.buffer(length);
--- End diff --

@jacques-n can you confirm this is indeed the case? Thanks.
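For context, the pattern the diff removes can be sketched in plain Python (not the actual Java/Netty code; `OutOfMemory`, `FakeAllocator`, and `decode` are invented stand-ins): try the allocation, and on failure rewind the input and defer to an out-of-memory handler instead of propagating the error.

```python
class OutOfMemory(Exception):
    """Invented stand-in for an allocator's out-of-memory exception."""

class FakeAllocator:
    """Invented allocator with a hard cap, mimicking a bounded memory pool."""
    def __init__(self, cap):
        self.cap = cap
        self.allocated = 0

    def buffer(self, length):
        if self.allocated + length > self.cap:
            raise OutOfMemory(f"requested {length}, {self.cap - self.allocated} free")
        self.allocated += length
        return bytearray(length)

def decode(allocator, length, reset_reader, on_oom):
    """Guarded-allocation pattern removed by the diff: allocate, and on
    failure rewind the reader and hand off to the OOM handler."""
    try:
        out = allocator.buffer(length)
    except OutOfMemory:
        reset_reader()   # leave the input stream untouched for a retry
        on_oom()         # defer to the out-of-memory handler
        return None
    return out

alloc = FakeAllocator(cap=8)
events = []
# Over-cap request: no buffer, but the reader is reset and the handler runs.
assert decode(alloc, 16, lambda: events.append("reset"), lambda: events.append("oom")) is None
assert events == ["reset", "oom"]
print(decode(alloc, 4, lambda: None, lambda: None) is not None)  # -> True
```

After the change, the exception propagates out of `decode` instead, which is presumably the behavioral difference the reviewer is asking @jacques-n to confirm is safe.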


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---