Re: [DISCUSS] Drop table support

2015-07-31 Thread Ted Dunning
On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:

> Also will there any mechanism to recover once you accidentally drop?
>

Yes. Snapshots.

Seriously, recovering data lost to user error is a platform thing. How can
we recover from turning off the cluster? From removing a disk on an Oracle
node?

I don't think that this is Drill's business.


[jira] [Created] (DRILL-3584) Drill Kerberos HDFS Support + Documentation

2015-07-31 Thread Hari Sekhon (JIRA)
Hari Sekhon created DRILL-3584:
--

 Summary: Drill Kerberos HDFS Support + Documentation
 Key: DRILL-3584
 URL: https://issues.apache.org/jira/browse/DRILL-3584
 Project: Apache Drill
  Issue Type: New Feature
Affects Versions: 1.1.0
Reporter: Hari Sekhon
Priority: Blocker


I'm trying to find Drill docs for Kerberos support for secure HDFS clusters, and 
it does not appear to be well tested / supported / documented yet.

This product is Dead-on-Arrival if it doesn't integrate well with secure Hadoop 
clusters, specifically HDFS + Kerberos.





[jira] [Created] (DRILL-3585) Apache Solr as a storage plugin

2015-07-31 Thread Sudip Mukherjee (JIRA)
Sudip Mukherjee created DRILL-3585:
--

 Summary: Apache Solr as a storage plugin
 Key: DRILL-3585
 URL: https://issues.apache.org/jira/browse/DRILL-3585
 Project: Apache Drill
  Issue Type: New Feature
  Components: Client - HTTP
Reporter: Sudip Mukherjee
Assignee: Jason Altekruse


A new storage plugin supporting the Apache Solr search engine.





[jira] [Created] (DRILL-3586) Sum of values not correctly calculated

2015-07-31 Thread Uwe Geercken (JIRA)
Uwe Geercken created DRILL-3586:
---

 Summary: Sum of values not correctly calculated
 Key: DRILL-3586
 URL: https://issues.apache.org/jira/browse/DRILL-3586
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.1.0
Reporter: Uwe Geercken
Assignee: Hanifi Gunes


I have a column "value" containing numeric values only - but the column was 
defined as String when creating the parquet files using the CTAS statement.

I have 163 records returned from the query. When I run a sum on the values 
like this, cast to double:

select count(1),sum(cast(`value` as double)) as total from 
dfs.datatransfer. where key_account_id='x'

then the result is: 213.420002

When I cast to float instead:

select count(1),sum(cast(`value` as float)) as total from 
dfs.datatransfer. where key_account_id='x'

then the result is: 213.420166893

Summing up the values in the original system (MySQL), the real and correct 
sum is 213.42.

I checked the data manually and the sum from the origin system is correct.
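
For what it's worth, the gap between the two results is consistent with plain 
floating-point accumulation error rather than a data problem. A minimal, 
self-contained sketch (the value and the loop below are invented for 
illustration, not the reporter's data) showing how summing the same parsed 
strings as float and as double diverges:

{code}
// Hypothetical illustration only: summing the same parsed string as float vs. double.
// The value and the record count are made up; they are not the data from this report.
public class FloatVsDoubleSum {
  public static void main(String[] args) {
    String value = "1.31";        // stand-in for the String-typed "value" column
    float floatSum = 0.0f;
    double doubleSum = 0.0d;
    for (int i = 0; i < 163; i++) {
      floatSum += Float.parseFloat(value);     // parse and accumulate in single precision
      doubleSum += Double.parseDouble(value);  // parse and accumulate in double precision
    }
    // float carries only ~7 significant decimal digits, so the running float sum
    // drifts from the exact decimal total much sooner than the double sum does
    System.out.println("float  sum: " + floatSum);
    System.out.println("double sum: " + doubleSum);
  }
}
{code}

Neither binary type can represent most decimal fractions exactly, which is why 
even the double result differs from MySQL's 213.42 in the last digits; an exact 
match would require a decimal type.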







Re: Request - hold off on merging to master for 48 hours

2015-07-31 Thread Jacques Nadeau
That sounds frustrating.

I agree that we need to get this merged.  The old allocator is sloppy about
accounting at best. Let's work together on trying to come up with a
solution. Can you point us at the current branch so other people can
provide some brainstorming?

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Jul 30, 2015 at 4:00 PM, Chris Westin 
wrote:

> Short version: I'll call it quits on the merge moratorium for now. Thank
> you to everyone for participating. Merge away.
>
> In the precommit suite, one query fails with an illegal reference counting
> exception from the external sort, and Steven has found that for me. This is
> the closest I've ever gotten. On future attempts to commit after rebasing,
> I'm going to be counting on other file owners a lot more to get through
> that quickly, rather than trying to find all the newly introduced problems
> myself.
>
> Long version: when I run the performance suite, the results with the
> non-locking version of the allocator are terrible. Worse than the locking
> implementation of the allocator (I still have both on separate branches).
> When we ran this on the locking implementation, there was roughly a 20%
> performance degradation, and consensus was that this was too much to accept
> the change. The locking implementation uses a single lock for all
> allocators. (Yes, I know that sounds heavy-handed, but it wasn't the first
> choice. There was a prior implementation that used a lock per allocator,
> but that one got deadlocks all the time because it couldn't ensure
> consistent lock acquisition orders when allocators went to their parents to
> get more space, combined with allocators locking each other to transfer or
> share buffer ownership.)
>
> I thought I'd solve this with a non-locking implementation. In this
> version, the variables that are used to track the state of an allocator re
> its available space, and how it is used, are kept in a small inner class;
> the allocator has an AtomicReference to that. A space allocation consists
> of getting that reference, making a clone of it, and then making all the
> necessary changes to the clone. To commit the space transaction, I try to
> swap it in with AtomicReference.compareAndSet(). If that fails, the
> transaction is retried. I expected that there would be no failures with
> leaf allocators, because they're only used by the thread doing the
> fragment's work. Only the root should have seen contention. But the
> performance cluster test showed the performance for this implementation to
> be five times worse than the current master (yes 5x, not just 20% worse
> like the locking implementation). I've done some quick sanity checks today,
> but don't see anything obviously silly. I will investigate a little further
> -- I've already come up with a couple of potential issues, but I need to do
> a couple experiments with it over the next few hours (and which wouldn't
> leave enough time to do the merge by the 48 hour deadline).
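
A minimal sketch of the compare-and-swap pattern described above -- the class
and field names are hypothetical and this is not the actual branch code, just
the shape of the retry loop:

import java.util.concurrent.atomic.AtomicReference;

class AllocatorSketch {
  // one immutable snapshot of the allocator's accounting state per version
  private static final class State {
    final long available;
    State(long available) { this.available = available; }
  }

  private final AtomicReference<State> state =
      new AtomicReference<>(new State(1024 * 1024));

  boolean allocate(long size) {
    while (true) {                                  // retry the "space transaction" until it commits
      State current = state.get();
      if (current.available < size) {
        return false;                               // not enough space at this allocator
      }
      State updated = new State(current.available - size);
      if (state.compareAndSet(current, updated)) {
        return true;                                // transaction committed
      }
      // another thread changed the state first; re-read and retry
    }
  }
}

On a leaf allocator touched only by its fragment's thread, the compareAndSet
should essentially never fail, which is what makes the reported 5x slowdown
surprising.
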
>
> If I can't overcome those issues, then I will at least go for obtaining
> the root allocator from a factory, and set things up so that the current
> and new allocator can co-exist, because the new one definitely catches a
> lot more problems -- we should be running tests with it on. Hopefully I can
> overcome the issues shortly, because I think the accounting is much better
> (that's why it catches more problems), and we need that in order to find
> our ongoing slow memory leak.
>
> On Wed, Jul 29, 2015 at 4:00 PM, Jacques Nadeau 
> wrote:
>
> > Makes sense.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Wed, Jul 29, 2015 at 3:32 PM, Chris Westin 
> > wrote:
> >
> > > Ordinarily, I would agree. However, in this particular case, some other
> > > folks wanted me closer to master so they could use my branch to track
> > down
> > > problems in new code. Also, the problems I was seeing were in code I'm
> > not
> > > familiar with, but there had been several recent commits claiming to
> fix
> > > memory issues there. So I wanted to see if the problems I was seeing
> had
> > > been taken care of. Sure enough, my initial testing shows that the
> > problems
> > > I was trying to fix had already been fixed by others -- they went away
> > > after I rebased. In this case, chasing master saved me from having to
> > track
> > > all of those down myself, and duplicating the work. I'm hoping that
> there
> > > weren't any significant new ones introduced. Testing is proceeding.
> > >
> > > On Wed, Jul 29, 2015 at 1:59 PM, Parth Chandra 
> > wrote:
> > >
> > > > I think the idea (some of the idea is mine I'm afraid) is to allow
> > Chris
> > > to
> > > > catch up and rebase, not to have it reviewed and merged in two days.
> > > > At the moment the problem is that every time he rebases, some test
> > breaks
> > > > and while he's chasing that down, the master moves ahead.
> > > > If we get this 2 day break, we can get close enough to master and
> share
> > > t

Re: Request - hold off on merging to master for 48 hours

2015-07-31 Thread Hanifi Gunes
+1 to Jacques, it'd be nice if we had the core changes easily re/viewable.
Also, would it make sense at this point to split the change set into
smaller patches, as there seems to be more work to do now?


H+

On Fri, Jul 31, 2015 at 7:06 PM, Jacques Nadeau  wrote:

> That sounds frustrating.
>
> I agree that we need to get this merged.  The old allocator is sloppy about
> accounting at best.  Lets work together on trying to come up with a
> solution. Can you point us at the current branch so other people can
> provide some brainstorming?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Jul 30, 2015 at 4:00 PM, Chris Westin 
> wrote:
>
> > Short version: I'll call it quits on the merge moratorium for now. Thank
> > you to everyone for participating. Merge away.
> >
> > In the precommit suite, one query fails with an illegal reference
> counting
> > exception from the external sort, and Steven has found that for me. This
> is
> > the closest I've ever gotten. On future attempts to commit after
> rebasing,
> > I'm going to be counting on other file owners a lot more to get through
> > that quickly, rather than trying to find all the newly introduced
> problems
> > myself.
> >
> > Long version: when I run the performance suite, the results with the
> > non-locking version of the allocator are terrible. Worse than the locking
> > implementation of the allocator (I still have both on separate branches).
> > When we ran this on the locking implementation, there was roughly a 20%
> > performance degradation, and consensus was that this was too much to
> accept
> > the change. The locking implementation uses a single lock for all
> > allocators. (Yes, I know that sounds heavy-handed, but it wasn't the
> first
> > choice. There was a prior implementation that used a lock per allocator,
> > but that one got deadlocks all the time because it couldn't ensure
> > consistent lock acquisition orders when allocators went to their parents
> to
> > get more space, combined with allocators locking each other to transfer
> or
> > share buffer ownership.)
> >
> > I thought I'd solve this with a non-locking implementation. In this
> > version, the variables that are used to track the state of an allocator
> re
> > its available space, and how it is used, are kept in a small inner class;
> > the allocator has an AtomicReference to that. A space allocation consists
> > of getting that reference, making a clone of it, and then making all the
> > necessary changes to the clone. To commit the space transaction, I try to
> > swap it in with AtomicReference.compareAndSet(). If that fails, the
> > transaction is retried. I expected that there would be no failures with
> > leaf allocators, because they're only used by the thread doing the
> > fragment's work. Only the root should have seen contention. But the
> > performance cluster test showed the performance for this implementation
> to
> > be five times worse than the current master (yes 5x, not just 20% worse
> > like the locking implementation). I've done some quick sanity checks
> today,
> > but don't see anything obviously silly. I will investigate a little
> further
> > -- I've already come up with a couple of potential issues, but I need to
> do
> > a couple experiments with it over the next few hours (and which wouldn't
> > leave enough time to do the merge by the 48 hour deadline).
> >
> > If I can't over come those issues, then I will at least go for obtaining
> > the root allocator from a factory, and set things up so that the current
> > and new allocator can co-exist, because the new one definitely catches a
> > lot more problems -- we should be running tests with it on. Hopefully I
> can
> > overcome the issues, shortly, because I think the accounting is much
> better
> > (that's why it catches more problems), and we need that in order to find
> > our ongoing slow memory leak.
> >
> > On Wed, Jul 29, 2015 at 4:00 PM, Jacques Nadeau 
> > wrote:
> >
> > > Makes sense.
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Wed, Jul 29, 2015 at 3:32 PM, Chris Westin  >
> > > wrote:
> > >
> > > > Ordinarily, I would agree. However, in this particular case, some
> other
> > > > folks wanted me closer to master so they could use my branch to track
> > > down
> > > > problems in new code. Also, the problems I was seeing were in code
> I'm
> > > not
> > > > familiar with, but there had been several recent commits claiming to
> > fix
> > > > memory issues there. So I wanted to see if the problems I was seeing
> > had
> > > > been taken care of. Sure enough, my initial testing shows that the
> > > problems
> > > > I was trying to fix had already been fixed by others -- they went
> away
> > > > after I rebased. In this case, chasing master saved me from having to
> > > track
> > > > all of those down myself, and duplicating the work. I'm hoping that
> > there
> > > > weren't any significant new ones introduced. Testing is proceeding.
> > > >
> > 

[jira] [Created] (DRILL-3587) Select hive's struct data gives IndexOutOfBoundsException instead of unsupported error

2015-07-31 Thread Krystal (JIRA)
Krystal created DRILL-3587:
--

 Summary: Select hive's struct data gives IndexOutOfBoundsException 
instead of unsupported error
 Key: DRILL-3587
 URL: https://issues.apache.org/jira/browse/DRILL-3587
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Affects Versions: 1.2.0
Reporter: Krystal
Assignee: Venki Korukanti


I have a hive table that has a STRUCT data column.
hive> select c15 from alltypes;
OK
NULL
{"r":null,"s":null}
{"r":1,"s":{"a":2,"b":"x"}}

From Drill:
select c15 from alltypes;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index (1) must be less than 
size (1)

Since Drill currently does not support the Hive STRUCT data type, Drill should 
display a user-friendly error stating that the Hive STRUCT data type is not supported.





[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

2015-07-31 Thread hnfgns
Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/81#discussion_r35994287
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/NullableValueVectors.java ---
@@ -239,37 +183,30 @@ public void allocateNew(int valueCount) {
 accessor.reset();
   }
 
-  /**
-   * {@inheritDoc}
-   */
+  @Override
   public void zeroVector() {
-this.values.zeroVector();
-this.bits.zeroVector();
+bits.zeroVector();
+values.zeroVector();
   }
+  
 
-  @Override
-  public int load(int valueCount, DrillBuf buf){
-clear();
-int loaded = bits.load(valueCount, buf);
-
-// remove bits part of buffer.
-buf = buf.slice(loaded, buf.capacity() - loaded);
-loaded += values.load(valueCount, buf);
-return loaded;
-  }
 
   @Override
   public void load(SerializedField metadata, DrillBuf buffer) {
-assert this.field.matches(metadata);
-int loaded = load(metadata.getValueCount(), buffer);
-assert metadata.getBufferLength() == loaded;
-  }
+clear();
+final SerializedField bitsField = metadata.getChild(0);
--- End diff --

That would be a really nice addition to the VV documentation. Thanks for 
pointing this out to me.




[jira] [Created] (DRILL-3588) Write back to Hive Metastore

2015-07-31 Thread Joseph Barefoot (JIRA)
Joseph Barefoot created DRILL-3588:
--

 Summary: Write back to Hive Metastore
 Key: DRILL-3588
 URL: https://issues.apache.org/jira/browse/DRILL-3588
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Joseph Barefoot
Priority: Critical


This feature is particularly important to us here at AtScale in order to 
leverage Drill as a query engine option for our BI on Hadoop solution. 
Currently you can connect to and query databases/tables from Hive Metastore 
fine. However if you create a table, it will be created in HDFS but no metadata 
is written to the Hive Metastore. That means those tables won't be easily 
visible to any other tool. 

When you read schemas from a Hive datasource via Drill, they are prefixed with 
"hive.". This namespacing makes sense to us considering how Drill works, and 
ideally it would work symmetrically when you create tables with the same 
prefix, i.e. Drill would map the prefix to the target data source, in this case 
Hive, and write the schema information back to the Hive MetaStore. Our specific 
use case is Create Table As Select, however ideally any DDL statements against 
a hive datasource schema/table would write back to the Hive Metastore. 

The reason it's important to have the metadata in Hive Metastore is we have 
found many of our customers use multiple SQL tools to access data tracked in 
the Metastore. For example, even if Impala is their primary SQL on Hadoop 
engine for clients/tools, they may run Spark jobs to manipulate data via RDDs 
that pull data by referencing the Metastore. Organizations using a lot of SQL 
on Hadoop have come to expect this sort of interoperability between Hive, 
Spark, and Impala, and supporting it within Drill will help drive adoption 
within the Hadoop community (besides making it a lot easier for us to use Drill 
effectively from within our BI engine).





[GitHub] drill pull request:

2015-07-31 Thread jaltekruse
Github user jaltekruse commented on the pull request:


https://github.com/apache/drill/commit/4c2b698fe5c059c7987c808e7e90f55659b74ba5#commitcomment-12471637
  
In common/src/test/java/org/apache/drill/test/DrillTest.java:
In common/src/test/java/org/apache/drill/test/DrillTest.java on line 63:
I don't understand why this is here; isn't the default behavior to have no 
expected exception on tests? Can you add a comment about how this changes the 
behavior?
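
For context, the idiom presumably being discussed is JUnit 4's ExpectedException
rule, whose documented default, ExpectedException.none(), expects no exception
at all. A minimal sketch, assuming that is what sits on the line in question:

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;

public class ExpectedExceptionSketch {
  // none() is the documented default: no exception is expected unless a test
  // explicitly calls thrown.expect(...)
  @Rule
  public final ExpectedException thrown = ExpectedException.none();

  @Test
  public void passesOnlyIfNothingIsThrown() {
    // no expect() call here, so any exception would fail this test
  }

  @Test
  public void passesBecauseTheExpectedExceptionIsThrown() {
    thrown.expect(IllegalStateException.class);
    throw new IllegalStateException("expected");
  }
}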




[jira] [Created] (DRILL-3589) JDBC driver maven artifact includes a lot of unnecessary dependencies

2015-07-31 Thread Joseph Barefoot (JIRA)
Joseph Barefoot created DRILL-3589:
--

 Summary: JDBC driver maven artifact includes a lot of unnecessary 
dependencies
 Key: DRILL-3589
 URL: https://issues.apache.org/jira/browse/DRILL-3589
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - JDBC
Reporter: Joseph Barefoot
Assignee: Daniel Barclay (Drill)
Priority: Minor


The Drill JDBC POM file pulls in so many unused transitive dependencies that it 
takes quite a while to exclude all the unnecessary ones when using it from 
within a Java project.  This is similar to DRILL-3581 in that you can work 
around it via exclusions of transitive dependencies, but since it makes 
interoperability with other open-source projects problematic, this will keep 
coming up for anyone using the JDBC driver from within any serious java app.

Considering the pom:
http://repo1.maven.org/maven2/org/apache/drill/exec/drill-jdbc/1.1.0/drill-jdbc-1.1.0.pom

...it seems that most of the unused dependencies are transitive from 
drill-common and perhaps also drill-java-exec.  Here's an example of some 
dependencies that the JDBC driver shouldn't need (and we excluded in our 
project):

parquet-*
jetty-server
javassist
commons-daemon
hibernate-validator
xalan
xercesImpl

For the record we are now able to use the JDBC driver fine from within our 
project, but it did take some dependency tree analysis (and a little 
trial-and-error) to figure out what to exclude.  We would like to save future 
developers that time.





[jira] [Created] (DRILL-3590) storage plugin config API doesn't log exceptions or yield any useful error

2015-07-31 Thread Joseph Barefoot (JIRA)
Joseph Barefoot created DRILL-3590:
--

 Summary: storage plugin config API doesn't log exceptions or yield 
any useful error 
 Key: DRILL-3590
 URL: https://issues.apache.org/jira/browse/DRILL-3590
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
 Environment: Linux
Reporter: Joseph Barefoot
Assignee: Jacques Nadeau


Not sure if I have the component right here.  This is regarding the REST API 
for configuring a storage plugin.  It's not specific to any particular plugin, 
but rather how you configure any of them.  The REST API is critical for 
automating setup of Drill within a cluster; as such, being able to debug it is 
pretty critical as well.

The problem lies with how the endpoints handle exceptions as you can see here:
https://github.com/apache/drill/blob/9e164662f5296f7048c880c40bc551030fb58cca/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StorageResources.java

Every method/endpoint in that class eats the exception; there's really no way 
to figure out what's going wrong without connecting a debugger (besides just 
guessing).  The generic error messages in the REST response aren't so bad, but 
not logging the exception on the server makes life tough for devs like me.  
Ideally all of the endpoints (not just the one for storage config that tripped 
me up) would log the exceptions, and maybe even return them in the error 
response as well.
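
As a rough sketch of the kind of handling this asks for (the class, method, and 
helper below are illustrative, not the actual StorageResources code):

{code}
// Illustrative sketch only: log the failure server-side and surface the cause to the
// caller instead of swallowing it. Names here are hypothetical, not Drill's actual code.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PluginConfigHandlerSketch {
  private static final Logger logger = LoggerFactory.getLogger(PluginConfigHandlerSketch.class);

  public String createOrUpdate(String name, String pluginConfigJson) {
    try {
      applyConfig(name, pluginConfigJson);   // hypothetical helper that may throw
      return "success";
    } catch (Exception e) {
      // keep the full stack trace in the server log for operators and devs ...
      logger.error("Failure while creating/updating storage plugin '{}'", name, e);
      // ... and give the caller more than a generic message
      return "error (unable to create/update storage plugin '" + name + "'): " + e.getMessage();
    }
  }

  private void applyConfig(String name, String json) throws Exception {
    // placeholder for the real work of parsing and registering the plugin config
  }
}
{code}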

In my particular case, I was configuring Hive and simply had the Metastore port 
wrong (typo).  I imagine that would have been pretty obvious from a nested 
exception stack trace, if I had been able to see it in the log.  Instead I 
couldn't tell if there was some internal Drill error or what.  Here's the curl 
call I was using, just for reference:
{code}
curl -X POST -H "Content-Type: application/json" -d '
{"name":"hive",
 "config":
{
  "type": "hive",
  "enabled": true,
  "configProps": {
"hive.metastore.uris": "thrift://192.168.99.9:9083",  
"hive.metastore.sasl.enabled": "false"
  }
}   
}' http://192.168.99.9:8047/storage/hive.json
{code}









[jira] [Created] (DRILL-3591) Partition pruning hit AssertionError when a cast function is used in comparison operand

2015-07-31 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-3591:
-

 Summary: Partition pruning hit AssertionError when a cast function 
is used in comparison operand
 Key: DRILL-3591
 URL: https://issues.apache.org/jira/browse/DRILL-3591
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Jinfeng Ni
Assignee: Aman Sinha


In the Drill unit test TestPartitionFilter.testPartitionFilter6_Parquet(), the 
query is:

{code}
select * from dfs_test.tmp.parquet where (yr=1995 and o_totalprice < 4) or 
yr=1996
{code}

If I slightly modify the filter by adding a cast function:
{code}
select * from dfs_test.`%s/multilevel/parquet` where (dir0=cast(1995 as 
varchar(10)) and o_totalprice < 4) or dir0=1996
{code}

It will hit an AssertionError when PruneScanRule calls the interpreter to 
evaluate the filter condition.

{code}
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError
...
Caused by: java.lang.AssertionError
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.isBitOn(InterpreterEvaluator.java:490)
 ~[classes/:na]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitBooleanAnd(InterpreterEvaluator.java:434)
 ~[classes/:na]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitBooleanOperator(InterpreterEvaluator.java:332)
 ~[classes/:na]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitBooleanOperator(InterpreterEvaluator.java:147)
 ~[classes/:na]
at 
org.apache.drill.common.expression.BooleanOperator.accept(BooleanOperator.java:36)
 ~[classes/:na]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitBooleanOr(InterpreterEvaluator.java:463)
 ~[classes/:na]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitBooleanOperator(InterpreterEvaluator.java:334)
 ~[classes/:na]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitBooleanOperator(InterpreterEvaluator.java:147)
 ~[classes/:na]
at 
org.apache.drill.common.expression.BooleanOperator.accept(BooleanOperator.java:36)
 ~[classes/:na]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator.evaluate(InterpreterEvaluator.java:80)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:420)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:156)
 ~[classes/:na]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
 ~[calcite-core-1.1.0-drill-r15.jar:1.1.0-drill-r15]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:795)
 ~[calcite-core-1.1.0-drill-r15.jar:1.1.0-drill-r15]
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
~[calcite-core-1.1.0-drill-r15.jar:1.1.0-drill-r15]
at 
org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:316) 
~[calcite-core-1.1.0-drill-r15.jar:1.1.0-drill-r15]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:528)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178)
 ~[classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) 
[classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) 
[classes/:na]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_45]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
... 13 more
{code}

I debugged a bit; it seems that in PartitionPruneRule the condition is rewritten 
to the following, which does not seem right, since the CAST function becomes one 
operand of the "AND".

{code}
OR(AND(CAST(1995):VARCHAR(10) CHARACTER SET "ISO-8859-1" COLLATE 
"ISO-8859-1$en_US$primary" NOT NULL, =($1, CAST(1995):VARCHAR(10) CHARACTER SET 
"ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary" NOT NULL)), =($1, 1996))
{code}







Re: Review Request 36630: DRILL-3503: Make PruneScanRule pluggable

2015-07-31 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36630/#review93819
---

Ship it!


I had already reviewed the original changes. Reviewed the incremental ones and 
they look good.

- Aman Sinha


On July 31, 2015, 3:35 a.m., Mehant Baid wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36630/
> ---
> 
> (Updated July 31, 2015, 3:35 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Bugs: DRILL-3503
> https://issues.apache.org/jira/browse/DRILL-3503
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> > Added an interface to abstract the partitioning scheme away from the 
> > partition pruning rule. Removed some of the redundant logic in PruneScanRule.
> 
> 
> Diffs
> -
> 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
>  8307dff 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionLocation.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java
>  9ad14b1 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/ParquetPartitionDescriptor.java
>  127e70a 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/PartitionDescriptor.java
>  35fdae9 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/PartitionLocation.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushPartitionFilterIntoScan.java
>  b83cedd 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
>  daa7276 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/ParquetPruneScanRule.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
>  5b5e4bc 
> 
> Diff: https://reviews.apache.org/r/36630/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>



Re: Suspicious direct memory consumption when running queries concurrently

2015-07-31 Thread Abdel Hakim Deneche
I tried getting a jmap dump multiple times without success; each time it
crashes the JVM with the following exception:

Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
> ...
> Exception in thread "main" java.io.IOException: Premature EOF
> at
> sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
> at
> sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
> at
> sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
> at
> sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
> at sun.tools.jmap.JMap.dump(JMap.java:242)
> at sun.tools.jmap.JMap.main(JMap.java:140)


On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau  wrote:

> An allocate -> release cycle all on the same thread goes into a per-thread
> cache.
>
> A bunch of Netty arena settings are configurable.  The big issue I believe
> is that the limits are soft limits implemented by the allocation-time
> release mechanism.  As such, if you allocate a bunch of memory, then
> release it all, that won't necessarily trigger any actual chunk releases.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche <
> adene...@maprtech.com
> > wrote:
>
> > @Jacques, my understanding is that chunks are not owned by a specific
> > thread, but they are part of a specific memory arena which is in turn only
> > accessed by specific threads. Do you want me to find which threads are
> > associated with the same arena where we have hanging chunks ?
> >
> >
> > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau 
> > wrote:
> >
> > > It sounds like your statement is that we're cacheing too many unused
> > > chunks.  Hanifi and I previously discussed implementing a separate
> > flushing
> > > mechanism to release unallocated chunks that are hanging around.  The
> > main
> > > question is, why are so many chunks hanging around and what threads are
> > > they associated with.  A Jmap dump and analysis should allow you to
> > > determine which thread owns the excess chunks.  My guess would be the
> RPC
> > > pool since those are long lasting (as opposed to the WorkManager pool,
> > > which is contracting).
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <
> > > adene...@maprtech.com>
> > > wrote:
> > >
> > > > When running a set of (mostly window function) queries concurrently on a
> > > > single drillbit with 8GB max direct memory, we are seeing a continuous
> > > > increase in direct memory allocation.
> > > >
> > > > We repeat the following steps multiple times:
> > > > - we launch an "iteration" of tests that will run all queries in a random
> > > > order, 10 queries at a time
> > > > - after the iteration finishes, we wait for a couple of minutes to give
> > > > Drill time to release the memory being held by the finishing fragments
> > > >
> > > > Using Drill's memory logger ("drill.allocator") we were able to get
> > > > snapshots of how memory was internally used by Netty. We only focused on
> > > > the number of allocated chunks; if we take this number and multiply it by
> > > > 16MB (Netty's chunk size) we get approximately the same value reported by
> > > > Drill's direct memory allocation.
> > > > Here is a graph that shows the evolution of the number of allocated chunks
> > > > on a 500-iteration run (I'm working on improving the plots):
> > > >
> > > > http://bit.ly/1JL6Kp3
> > > >
> > > > In this specific case, after the first iteration Drill was allocating ~2GB
> > > > of direct memory; this number kept rising after each iteration to ~6GB. We
> > > > suspect this caused one of our previous runs to crash the JVM.
> > > >
> > > > If we only focus on the log lines between iterations (when Drill's memory
> > > > usage is below 10MB), then all allocated chunks are at most 2% usage. At
> > > > some point we end up with 288 nearly empty chunks, yet the next iteration
> > > > will cause more chunks to be allocated!!!
> > > >
> > > > Is this expected?
> > > >
> > > > PS: I am running more tests and will update this thread with more
> > > > information.
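
As a back-of-the-envelope check on the numbers quoted above -- assuming Netty's
default arena settings of an 8 KiB page size and maxOrder 11, which is where the
16MB chunk size comes from -- the reserved memory for 288 cached chunks works out
as follows:

public class ChunkMath {
  public static void main(String[] args) {
    long pageSize = 8 * 1024;               // Netty default page size (assumed here)
    int maxOrder = 11;                      // Netty default maxOrder (assumed here)
    long chunkSize = pageSize << maxOrder;  // 16 MiB per chunk
    long chunks = 288;                      // the "288 nearly empty chunks" from the report
    System.out.println("chunk size: " + chunkSize / (1024 * 1024) + " MiB");
    System.out.println("288 chunks: " + chunks * chunkSize / (1024 * 1024) + " MiB reserved");
  }
}

That is roughly 4.5GB of direct memory pinned by nearly empty chunks, which
matches the scale of the growth being reported.
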
> > > >
> > > > --
> > > >
> > > > Abdelhakim Deneche
> > > >
> > > > Software Engineer
> > > >
> > > >   
> > > >
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>



-- 

Abdelhakim 

Re: Suspicious direct memory consumption when running queries concurrently

2015-07-31 Thread Jacques Nadeau
Can you give me a single node repro?
On Jul 31, 2015 9:20 PM, "Abdel Hakim Deneche" 
wrote:

> I tried getting a jmap dump multiple times without success, each time it
> crashes the jvm with the following exception:
>
> Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
> > ...
> > Exception in thread "main" java.io.IOException: Premature EOF
> > at
> >
> sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
> > at
> >
> sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
> > at
> >
> sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
> > at
> >
> sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
> > at sun.tools.jmap.JMap.dump(JMap.java:242)
> > at sun.tools.jmap.JMap.main(JMap.java:140)
>
>
> On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau 
> wrote:
>
> > A allocate -> release cycle all on the same thread goes into a per thread
> > cache.
> >
> > A bunch of Netty arena settings are configurable.  The big issue I
> believe
> > is that the limits are soft limits implemented by the allocation-time
> > release mechanism.  As such, if you allocate a bunch of memory, then
> > release it all, that won't necessarily trigger any actual chunk releases.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche <
> > adene...@maprtech.com
> > > wrote:
> >
> > > @Jacques, my understanding is that chunks are not owned by specific a
> > > thread but they are part of a specific memory arena which is in turn
> only
> > > accessed by specific threads. Do you want me to find which threads are
> > > associated with the same arena where we have hanging chunks ?
> > >
> > >
> > > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau 
> > > wrote:
> > >
> > > > It sounds like your statement is that we're cacheing too many unused
> > > > chunks.  Hanifi and I previously discussed implementing a separate
> > > flushing
> > > > mechanism to release unallocated chunks that are hanging around.  The
> > > main
> > > > question is, why are so many chunks hanging around and what threads
> are
> > > > they associated with.  A Jmap dump and analysis should allow you to
> do
> > > > determine which thread owns the excess chunks.  My guess would be the
> > RPC
> > > > pool since those are long lasting (as opposed to the WorkManager
> pool,
> > > > which is contracting).
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <
> > > > adene...@maprtech.com>
> > > > wrote:
> > > >
> > > > > When running a set of, mostly window function, queries concurrently
> > on
> > > a
> > > > > single drillbit with a 8GB max direct memory. We are seeing a
> > > continuous
> > > > > increase of direct memory allocation.
> > > > >
> > > > > We repeat the following steps multiple times:
> > > > > - we launch in "iteration" of tests that will run all queries in a
> > > random
> > > > > order, 10 queries at a time
> > > > > - after the iteration finishes, we wait for a couple of minute to
> > give
> > > > > Drill time to release the memory being held by the finishing
> > fragments
> > > > >
> > > > > Using Drill's memory logger ("drill.allocator") we were able to get
> > > > > snapshots of how memory was internally used by Netty, we only
> focused
> > > on
> > > > > the number of allocated chunks, if we take this number and multiply
> > it
> > > by
> > > > > 16MB (netty's chunk size) we get approximately the same value
> > reported
> > > by
> > > > > Drill's direct memory allocation.
> > > > > Here is a graph that shows the evolution of the number of allocated
> > > > chunks
> > > > > on a 500 iterations run (I'm working on improving the plots) :
> > > > >
> > > > > http://bit.ly/1JL6Kp3
> > > > >
> > > > > In this specific case, after the first iteration Drill was
> allocating
> > > > ~2GB
> > > > > of direct memory, this number kept rising after each iteration to
> > ~6GB.
> > > > We
> > > > > suspect this caused one of our previous runs to crash the JVM.
> > > > >
> > > > > If we only focus on the log lines between iterations (when Drill's
> > > memory
> > > > > usage is below 10MB) then all allocated chunks are at most 2%
> usage.
> > At
> > > > > some point we end up with 288 nearly empty chunks, yet the next
> > > iteration
> > > > > will cause more chunks to be allocated!!!
> > > > >
> > > > > is this expected ?
> > > > >
> > > > > PS: I am running more tests and will update this thread with more
> > > > > informations.
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > > <
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Emai

Re: Suspicious direct memory consumption when running queries concurrently

2015-07-31 Thread Jacques Nadeau
For the memory leak,  not the jmap issue.
On Jul 31, 2015 9:50 PM, "Jacques Nadeau"  wrote:

> Can you give me a single node repro?
> On Jul 31, 2015 9:20 PM, "Abdel Hakim Deneche" 
> wrote:
>
>> I tried getting a jmap dump multiple times without success, each time it
>> crashes the jvm with the following exception:
>>
>> Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
>> > ...
>> > Exception in thread "main" java.io.IOException: Premature EOF
>> > at
>> >
>> sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
>> > at
>> >
>> sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
>> > at
>> >
>> sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
>> > at
>> >
>> sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
>> > at sun.tools.jmap.JMap.dump(JMap.java:242)
>> > at sun.tools.jmap.JMap.main(JMap.java:140)
>>
>>
>> On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau 
>> wrote:
>>
>> > A allocate -> release cycle all on the same thread goes into a per
>> thread
>> > cache.
>> >
>> > A bunch of Netty arena settings are configurable.  The big issue I
>> believe
>> > is that the limits are soft limits implemented by the allocation-time
>> > release mechanism.  As such, if you allocate a bunch of memory, then
>> > release it all, that won't necessarily trigger any actual chunk
>> releases.
>> >
>> > --
>> > Jacques Nadeau
>> > CTO and Co-Founder, Dremio
>> >
>> > On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche <
>> > adene...@maprtech.com
>> > > wrote:
>> >
>> > > @Jacques, my understanding is that chunks are not owned by specific a
>> > > thread but they are part of a specific memory arena which is in turn
>> only
>> > > accessed by specific threads. Do you want me to find which threads are
>> > > associated with the same arena where we have hanging chunks ?
>> > >
>> > >
>> > > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau 
>> > > wrote:
>> > >
>> > > > It sounds like your statement is that we're cacheing too many unused
>> > > > chunks.  Hanifi and I previously discussed implementing a separate
>> > > flushing
>> > > > mechanism to release unallocated chunks that are hanging around.
>> The
>> > > main
>> > > > question is, why are so many chunks hanging around and what threads
>> are
>> > > > they associated with.  A Jmap dump and analysis should allow you to
>> do
>> > > > determine which thread owns the excess chunks.  My guess would be
>> the
>> > RPC
>> > > > pool since those are long lasting (as opposed to the WorkManager
>> pool,
>> > > > which is contracting).
>> > > >
>> > > > --
>> > > > Jacques Nadeau
>> > > > CTO and Co-Founder, Dremio
>> > > >
>> > > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <
>> > > > adene...@maprtech.com>
>> > > > wrote:
>> > > >
>> > > > > When running a set of, mostly window function, queries
>> concurrently
>> > on
>> > > a
>> > > > > single drillbit with a 8GB max direct memory. We are seeing a
>> > > continuous
>> > > > > increase of direct memory allocation.
>> > > > >
>> > > > > We repeat the following steps multiple times:
>> > > > > - we launch in "iteration" of tests that will run all queries in a
>> > > random
>> > > > > order, 10 queries at a time
>> > > > > - after the iteration finishes, we wait for a couple of minute to
>> > give
>> > > > > Drill time to release the memory being held by the finishing
>> > fragments
>> > > > >
>> > > > > Using Drill's memory logger ("drill.allocator") we were able to
>> get
>> > > > > snapshots of how memory was internally used by Netty, we only
>> focused
>> > > on
>> > > > > the number of allocated chunks, if we take this number and
>> multiply
>> > it
>> > > by
>> > > > > 16MB (netty's chunk size) we get approximately the same value
>> > reported
>> > > by
>> > > > > Drill's direct memory allocation.
>> > > > > Here is a graph that shows the evolution of the number of
>> allocated
>> > > > chunks
>> > > > > on a 500 iterations run (I'm working on improving the plots) :
>> > > > >
>> > > > > http://bit.ly/1JL6Kp3
>> > > > >
>> > > > > In this specific case, after the first iteration Drill was
>> allocating
>> > > > ~2GB
>> > > > > of direct memory, this number kept rising after each iteration to
>> > ~6GB.
>> > > > We
>> > > > > suspect this caused one of our previous runs to crash the JVM.
>> > > > >
>> > > > > If we only focus on the log lines between iterations (when Drill's
>> > > memory
>> > > > > usage is below 10MB) then all allocated chunks are at most 2%
>> usage.
>> > At
>> > > > > some point we end up with 288 nearly empty chunks, yet the next
>> > > iteration
>> > > > > will cause more chunks to be allocated!!!
>> > > > >
>> > > > > is this expected ?
>> > > > >
>> > > > > PS: I am running more tests and will update this thread with more
>> > > > > informations.
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Abdelhaki

Re: Suspicious direct memory consumption when running queries concurrently

2015-07-31 Thread yuliya Feldman
How much memory is your JVM taking?
Do you even have enough disk space to dump it?
  From: Abdel Hakim Deneche 
 To: "dev@drill.apache.org"  
 Sent: Friday, July 31, 2015 9:19 PM
 Subject: Re: Suspicious direct memory consumption when running queries 
concurrently
   
I tried getting a jmap dump multiple times without success, each time it
crashes the jvm with the following exception:

Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
> ...
> Exception in thread "main" java.io.IOException: Premature EOF
>        at
> sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
>        at
> sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
>        at
> sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
>        at
> sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
>        at sun.tools.jmap.JMap.dump(JMap.java:242)
>        at sun.tools.jmap.JMap.main(JMap.java:140)


On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau  wrote:

> A allocate -> release cycle all on the same thread goes into a per thread
> cache.
>
> A bunch of Netty arena settings are configurable.  The big issue I believe
> is that the limits are soft limits implemented by the allocation-time
> release mechanism.  As such, if you allocate a bunch of memory, then
> release it all, that won't necessarily trigger any actual chunk releases.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche <
> adene...@maprtech.com
> > wrote:
>
> > @Jacques, my understanding is that chunks are not owned by specific a
> > thread but they are part of a specific memory arena which is in turn only
> > accessed by specific threads. Do you want me to find which threads are
> > associated with the same arena where we have hanging chunks ?
> >
> >
> > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau 
> > wrote:
> >
> > > It sounds like your statement is that we're cacheing too many unused
> > > chunks.  Hanifi and I previously discussed implementing a separate
> > flushing
> > > mechanism to release unallocated chunks that are hanging around.  The
> > main
> > > question is, why are so many chunks hanging around and what threads are
> > > they associated with.  A Jmap dump and analysis should allow you to do
> > > determine which thread owns the excess chunks.  My guess would be the
> RPC
> > > pool since those are long lasting (as opposed to the WorkManager pool,
> > > which is contracting).
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <
> > > adene...@maprtech.com>
> > > wrote:
> > >
> > > > When running a set of, mostly window function, queries concurrently
> on
> > a
> > > > single drillbit with a 8GB max direct memory. We are seeing a
> > continuous
> > > > increase of direct memory allocation.
> > > >
> > > > We repeat the following steps multiple times:
> > > > - we launch in "iteration" of tests that will run all queries in a
> > random
> > > > order, 10 queries at a time
> > > > - after the iteration finishes, we wait for a couple of minute to
> give
> > > > Drill time to release the memory being held by the finishing
> fragments
> > > >
> > > > Using Drill's memory logger ("drill.allocator") we were able to get
> > > > snapshots of how memory was internally used by Netty, we only focused
> > on
> > > > the number of allocated chunks, if we take this number and multiply
> it
> > by
> > > > 16MB (netty's chunk size) we get approximately the same value
> reported
> > by
> > > > Drill's direct memory allocation.
> > > > Here is a graph that shows the evolution of the number of allocated
> > > chunks
> > > > on a 500 iterations run (I'm working on improving the plots) :
> > > >
> > > > http://bit.ly/1JL6Kp3
> > > >
> > > > In this specific case, after the first iteration Drill was allocating
> > > ~2GB
> > > > of direct memory, this number kept rising after each iteration to
> ~6GB.
> > > We
> > > > suspect this caused one of our previous runs to crash the JVM.
> > > >
> > > > If we only focus on the log lines between iterations (when Drill's
> > memory
> > > > usage is below 10MB) then all allocated chunks are at most 2% usage.
> At
> > > > some point we end up with 288 nearly empty chunks, yet the next
> > iteration
> > > > will cause more chunks to be allocated!!!
> > > >
> > > > is this expected ?
> > > >
> > > > PS: I am running more tests and will update this thread with more
> > > > informations.
> > > >
> > > > --
> > > >
> > > > Abdelhakim Deneche
> > > >
> > > > Software Engineer
> > > >
> > > >  
> > > >
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available


> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhaki