[GitHub] drill pull request: Bugs/various work re master 07 09

2015-07-10 Thread dsbos
GitHub user dsbos opened a pull request:

https://github.com/apache/drill/pull/87

Bugs/various work re master 07 09



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dsbos/incubator-drill 
bugs/various_WORK_re_master_07-09

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/87.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #87


commit 92e2cc18c79dbf5269304d1479f1b6ff8bcd477e
Author: dbarclay dbarc...@maprtech.com
Date:   2015-03-23T21:16:48Z

sep.:  --- forkCount 2 -> 1 [pom.xml] ---  => Working,  Review

commit 59608f3664cefd9754b778c45479c3b72025b1b2
Author: dbarclay dbarc...@maprtech.com
Date:   2015-03-24T17:20:27Z

sep.:  =-=-=-=-=-=-=-= forkCount 3 -> 2.  [/pom.xml] =-=-=-=-=-=-=-=  => in 
Review,  awaiting Merge

Conflicts:
pom.xml

commit 3e155dc1345118a997de66b338adead8328517bd
Author: dbarclay dbarc...@maprtech.com
Date:   2015-06-20T02:05:39Z

DRILL-3151:  Fix many ResultSetMetaData method return values.

Added ~unit test for ResultSetMetaData implementation.

Made getObject return classes available to implementation of 
getColumnClassName:
- Added SqlAccessor.getObjectClass() (to put that metadata right next to 
code
  to which it corresponds rather than in far-away parallel code).
- Added similar AvaticaDrillSqlAccessor.getObjectClass().
- Changed DrillAccessorList.accessors from Accessor[] to
  AvaticaDrillSqlAccessor[] for better access to JDBC getObject return 
class.
- Extracted return classes from accessors to pass to updateColumnMetaData.

Reworked some data type mapping and utilities:
- Added Types.getSqlTypeName(...).
- Renamed Types.getJdbcType(...) to getJdbcTypeCode(...).
- Replaced Types.isUnSigned with isJdbcSignedType.
- Fixed various bogus RPC-type XXX -> java.sql.Types.SMALLINT mappings.
- Removed DrillColumnMetaDataList.getJdbcTypeName.
- Moved getAvaticaType up (for bottom-up order).
- Revised DrillColumnMetaDataList.getAvaticaType(...).

MAIN:
- Updated updateColumnMetaData(...) to change many calculations of metadata
  input to ColumnMetaData construction.  [DrillColumnMetaDataList]

Updated other metadata tests per changes.
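
For readers coming to this later, here is a minimal, self-contained sketch of
the idea the commit describes: each column accessor reports the Java class its
getObject() returns, and getColumnClassName() is derived from that instead of
from a far-away parallel table. The interface and class names below are
simplified stand-ins for illustration only, not the actual Drill types.

import java.util.Arrays;
import java.util.List;

public class ColumnClassNameSketch {

  // Analogous in spirit to SqlAccessor.getObjectClass(): the metadata lives
  // right next to the accessor that produces the value.
  interface ValueAccessor {
    Class<?> getObjectClass();
    Object getObject(int rowIndex);
  }

  static final class IntAccessor implements ValueAccessor {
    public Class<?> getObjectClass() { return Integer.class; }
    public Object getObject(int rowIndex) { return rowIndex * 10; }
  }

  static final class VarcharAccessor implements ValueAccessor {
    public Class<?> getObjectClass() { return String.class; }
    public Object getObject(int rowIndex) { return "row-" + rowIndex; }
  }

  // Analogous to extracting the return classes from the accessors and feeding
  // them to updateColumnMetaData(...) so getColumnClassName() has an answer.
  static String getColumnClassName(List<ValueAccessor> accessors, int column) {
    return accessors.get(column).getObjectClass().getName();
  }

  public static void main(String[] args) {
    List<ValueAccessor> accessors =
        Arrays.<ValueAccessor>asList(new IntAccessor(), new VarcharAccessor());
    System.out.println(getColumnClassName(accessors, 0));  // java.lang.Integer
    System.out.println(getColumnClassName(accessors, 1));  // java.lang.String
  }
}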

commit ceae4668f250672391e51c84d9b4e295a4c0f4a5
Author: dbarclay dbarc...@maprtech.com
Date:   2015-07-02T22:04:31Z

DRILL-3483: Clarify CommonConstants' constants.  [CommonConstants, 
DrillConfig, PathScanner]

Renamed constants.
Documented constants.
Removed extraneous public static final (redundant and abnormal since in 
interface).

commit 03dafaca207b45c01519e22ceeb0dd2784db18d5
Author: dbarclay dbarc...@maprtech.com
Date:   2015-04-17T20:09:59Z

DRILL-2696: Test for future DRILL-2696 fix (currently disabled with 
@Ignore).

commit 73183a415ac765863ade2a632799f328fcddad74
Author: dbarclay dbarc...@maprtech.com
Date:   2015-04-17T23:27:46Z

DRILL-2815: Some PathScanner logging, misc. cleanup.

Added some DEBUG-level log calls; augmented and edited log messages.

Misc. code hygiene:
- Added method doc comment.
- Renamed a number of items for clarity.
- Added final.
- Fixed indentation; wrapped some long lines.

commit c68cc505c03b06b2db3d18eef7bce965408efb15
Author: dbarclay dbarc...@maprtech.com
Date:   2015-03-29T21:46:47Z

temp:  logging adjustments.  [exec/java-exec/src/test/resources/logback.xml]

commit e48f5267c6c1b27cd2fa500a8062876a61e0f6d2
Author: dbarclay dbarc...@maprtech.com
Date:   2015-03-29T21:47:11Z

temp:  logging adjustments.  [exec/jdbc/src/test/resources/logback.xml]

commit a0e9cb671e08ded93e571fa096fa5c38f765ed85
Author: dbarclay dbarc...@maprtech.com
Date:   2015-04-06T18:11:13Z

temp:  logging adjustments.  [common/src/test/resources/logback.xml]

commit 321b9d09da0973e8da48f81a94c64b3503602321
Author: dbarclay dbarc...@maprtech.com
Date:   2015-05-10T19:33:21Z

temp:  logging adjustments.  [distribution/src/resources/logback.xml]

commit d7da49862c13144c29d78d49c9d4605952f436cf
Author: dbarclay dbarc...@maprtech.com
Date:   2015-05-14T22:40:06Z

temp:  logging adjustments (probably TEMP).  
[exec/jdbc/src/test/resources/logback.xml]

commit ae797565156fbaefa795ca22d75631763bd4cfe6
Author: dbarclay dbarc...@maprtech.com
Date:   2015-05-15T03:54:51Z

temp:  logging adjustments.  [jdbc/src/test/resources/logback.xml]

commit 1cd642ad80311850c4df1f76099a05c78b8a6a31
Author: dbarclay dbarc...@maprtech.com
Date:   2015-04-17T20:54:22Z

:  Logging:  Added calls.  [DrillConfig]

commit 11a72689afc4e3c86cc1dcf6fbd5213040cf444e
Author: dbarclay dbarc...@maprtech.com
Date:   2015-05-30T18:11:13Z

temp:  logging adjustments.  [jdbc/src/test/resources/logback.xml]

[GitHub] drill pull request: Bugs/various work re master 07 09

2015-07-10 Thread dsbos
Github user dsbos closed the pull request at:

https://github.com/apache/drill/pull/87




Re: Review Request 36278: DRILL-3189: Disable DISALLOW PARTIAL in window function grammar

2015-07-10 Thread Sean Hsuan-Yi Chu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36278/
---

(Updated July 10, 2015, 11:33 p.m.)


Review request for drill, Aman Sinha and Jinfeng Ni.


Changes
---

new patch


Bugs: DRILL-3189
https://issues.apache.org/jira/browse/DRILL-3189


Repository: drill-git


Description
---

Disable disallow partial in Over-Clause


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java
 9bbd537 
  exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 
7071bea 

Diff: https://reviews.apache.org/r/36278/diff/


Testing
---

All requested


Thanks,

Sean Hsuan-Yi Chu



[GitHub] drill pull request: DRILL-2650: Mark query end time when closing t...

2015-07-10 Thread cwestin
Github user cwestin commented on the pull request:

https://github.com/apache/drill/pull/80#issuecomment-120554910
  
Do we want to stick with .info() for these messages instead of .debug()? 
Asking because I'm not sure, but it seems like noise.

Otherwise, non-binding ship it.




[GitHub] drill pull request: DRILL-2650: Mark query end time when closing t...

2015-07-10 Thread cwestin
Github user cwestin commented on a diff in the pull request:

https://github.com/apache/drill/pull/80#discussion_r34406523
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -789,7 +794,7 @@ protected void processEvent(final StateEvent event) {
   final Exception exception = event.exception;
 
   // TODO Auto-generated method stub
-  logger.info("State change requested.  {} --> {}", state, newState,
+  logger.info(queryIdString + ": State change requested {} --> {}", state, newState,
--- End diff --

.info() -> .debug()?




[GitHub] drill pull request: DRILL-2650: Mark query end time when closing t...

2015-07-10 Thread cwestin
Github user cwestin commented on a diff in the pull request:

https://github.com/apache/drill/pull/80#discussion_r34406507
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -695,7 +697,10 @@ public void close() {
   Preconditions.checkState(!isClosed);
   Preconditions.checkState(resultState != null);
 
-  logger.info("foreman cleaning up.");
+  // to track how long the query takes
+  queryManager.markEndTime();
+
+  logger.info(queryIdString + ": cleaning up.");
--- End diff --

.info() seems like it will generate a lot of noise here, should it be 
.debug()?
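
A minimal sketch of the change being suggested here, using plain SLF4J (nothing
Drill-specific): the same message logged at DEBUG, with the query id passed as
a logging parameter instead of being concatenated up front.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CleanupLoggingSketch {
  private static final Logger logger =
      LoggerFactory.getLogger(CleanupLoggingSketch.class);

  void close(String queryIdString) {
    // end-time bookkeeping (queryManager.markEndTime() in the patch) stays as-is
    logger.debug("{}: cleaning up.", queryIdString);
  }
}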




Re: Threads left after Drillbit shutdown (in dev./unit tests)

2015-07-10 Thread Hanifi Gunes
Is there any way to reproduce this at a smaller scale? Have you tried
failing a couple of tests and dumping threads?

-Hanifi
Thanks
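
For what it's worth, the thread-dump experiment can also be done from inside a
test, without attaching jstack: snapshot the live threads in a teardown hook
and print the ones whose names match the suspicious prefixes from the report
below. A rough, Drill-independent sketch (wire it into an @After/@AfterClass
method as needed):

import java.util.Map;

public class ThreadLeakDump {

  // thread-name prefixes taken from the counts reported in this thread
  private static final String[] SUSPECT_PREFIXES = {
      "WorkManager.StatusThread", "UserServer-", "BitServer-", "BitClient-", "Client-"
  };

  public static void dumpSuspectThreads() {
    Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
    System.out.println("live threads: " + all.size());
    for (Map.Entry<Thread, StackTraceElement[]> entry : all.entrySet()) {
      String name = entry.getKey().getName();
      for (String prefix : SUSPECT_PREFIXES) {
        if (name.startsWith(prefix)) {
          System.out.println("suspect thread: " + name
              + " (state " + entry.getKey().getState() + ")");
          for (StackTraceElement frame : entry.getValue()) {
            System.out.println("    at " + frame);
          }
          break;
        }
      }
    }
  }
}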

On Fri, Jul 10, 2015 at 1:10 PM, Daniel Barclay dbarc...@maprtech.com
wrote:

 Is Drill terminating threads correctly?

 In running jstack on a JVM running a dev. test run that ended up hung
 after getting about three test timeout errors, I see that there are
 409 threads.

 Although 138 of those are not-unexpected ShutdownHook threads (since
 many tests are run in one VM), there are:
 - 138 WorkManager.StatusThread threads (hmm 138 again)
 -   7 Client-1 threads
 -   4 UserServer-1 threads
 -  21 BitClient-1 threads
 -   4 BitClient-2 threads
 -   3 BitClient-3 threads
 -   8 BitServer-1 threads
 -   8 BitServer-2 threads
 -   7 BitServer-3 threads
 -   7 BitServer-4 threads
 -   7 BitServer-5 threads
 -   6 BitServer-6 threads
 -   6 BitServer-7 threads
 -   6 BitServer-8 threads
 -   5 BitServer-9 threads
 -   5 BitServer-10 threads
 (Other thread names have only 1 or 2 occurrences.)

 Regarding the 4 for the number of UserServer-1 threads:  Three test
 methods had timeout failures plus one got hung.


 Here's the tail end of the output from the test running, including
 all the timeout errors and including the hang (except for repeated
 query-results data lines).



 dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install

 TRIMMED

 Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
 Running
 org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
 Running
 org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
 Running
 org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
 Running
 org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
 Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec
 - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
 Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
 Running
 org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
 00:44:34.017 [globalEventExecutor-1-523] ERROR
 o.a.z.server.NIOServerCnxnFactory - Thread
 Thread[globalEventExecutor-1-523,5,main] died
 java.lang.AssertionError: null
 at
 io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at
 io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at
 io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at
 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
 at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
 Running
 org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
 1
 Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599
 sec  FAILURE! - in
 org.apache.drill.exec.physical.impl.TestBroadcastExchange
 TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
 Time elapsed: 50.063 sec   ERROR!
 java.lang.Exception: test timed out after 5 milliseconds
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:503)
 at
 io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
 at
 io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
 at
 io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
 at
 org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
 at com.google.common.io.Closeables.close(Closeables.java:77)
 at com.google.common.io
 .Closeables.closeQuietly(Closeables.java:108)
 at
 org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
 at com.google.common.io.Closeables.close(Closeables.java:77)
 at com.google.common.io
 .Closeables.closeQuietly(Closeables.java:108)
 at
 org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
 at com.google.common.io.Closeables.close(Closeables.java:77)
 at com.google.common.io
 .Closeables.closeQuietly(Closeables.java:108)
 at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
 at
 org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)

 

Re: Hash Agg vs Streaming Agg for a smaller data set

2015-07-10 Thread rahul challapalli
That could be the reason: in the first query we are scanning 64000 records
and in the second case just 108 records. Thanks for the replies!

On Fri, Jul 10, 2015 at 4:48 PM, Jinfeng Ni jinfengn...@gmail.com wrote:

 I'm not clear which column is the partitioning column. From what you
 described, row count of aggregator in the first case is larger than that in
 the second case, since the former one requires full table scan. Cost-wise,
 hash-agg would make more sense when the input is larger, since
 streaming-agg requires sort, which could be expensive for large dataset.

 My guess is the difference of rowcounts in the two cases cause the
 difference in the query plan.

 One suggestion. If you want to check query plan, it would make more sense
 to try with reasonably large data.  Drill's costing model is not fully
 calibrated yet;  a small dataset like tpch_0.0.1 might make it hard for the
 cost model to pick the right plan. On the other hand, if the dataset is
 small, two different plans normally would not make a big difference in
 terms of performance. In other words, try to use large dataset if you are
 interested in performance testing / plan verification.





 On Fri, Jul 10, 2015 at 4:27 PM, rahul challapalli 
 challapallira...@gmail.com wrote:

  Hi,
 
  Info about Data : The data is auto partitioned tpch 0.01 data. The second
  filter is a non-partitioned column, so in the first case the 'OR'
 predicate
  results in a full-table scan, while in the second case, partition pruning
  takes effect.
 
  The first case results in a hash agg and the second case in a streaming
  agg. Any idea why?
 
  1. explain plan for select distinct l_modline, l_moddate from
  `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
  '1992-01-01' or l_shipdate=date'1992-01-01';
  +--+--+
  | text | json |
  +--+--+
  | 00-00Screen
  00-01  Project(l_modline=[$0], l_moddate=[$1])
  00-02Project(l_modline=[$0], l_moddate=[$1])
  00-03  HashAgg(group=[{0, 1}])
  00-04Project(l_modline=[$2], l_moddate=[$0])
  00-05  SelectionVectorRemover
  00-06Filter(condition=[OR(=($0, 1992-01-01), =($1,
  1992-01-01))])
  00-07  Project(l_moddate=[$2], l_shipdate=[$1],
  l_modline=[$0])
  00-08Scan..
 
  2. explain plan for select distinct l_modline, l_moddate from
  `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
  '1992-01-01' and l_shipdate=date'1992-01-01';
  +--+--+
  | text | json |
  +--+--+
  | 00-00Screen
  00-01  Project(l_modline=[$0], l_moddate=[$1])
  00-02Project(l_modline=[$0], l_moddate=[$1])
  00-03  StreamAgg(group=[{0, 1}])
  00-04Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
  00-05  Project(l_modline=[$2], l_moddate=[$0])
  00-06SelectionVectorRemover
  00-07  Filter(condition=[AND(=($0, 1992-01-01), =($1,
  1992-01-01))])
  00-08Project(l_moddate=[$2], l_shipdate=[$1],
  l_modline=[$0])
  00-09  Scan.
 
  - Rahul
 



Re: Review Request 36278: DRILL-3189: Disable DISALLOW PARTIAL in window function grammar

2015-07-10 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36278/#review91375
---

Ship it!


Ship It!

- Aman Sinha


On July 10, 2015, 11:33 p.m., Sean Hsuan-Yi Chu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36278/
 ---
 
 (Updated July 10, 2015, 11:33 p.m.)
 
 
 Review request for drill, Aman Sinha and Jinfeng Ni.
 
 
 Bugs: DRILL-3189
 https://issues.apache.org/jira/browse/DRILL-3189
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Disable disallow partial in Over-Clause
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java
  9bbd537 
   exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 
 7071bea 
 
 Diff: https://reviews.apache.org/r/36278/diff/
 
 
 Testing
 ---
 
 All requested
 
 
 Thanks,
 
 Sean Hsuan-Yi Chu
 




Re: Hash Agg vs Streaming Agg for a smaller data set

2015-07-10 Thread Steven Phillips
My guess is that in the second query, the size of the dataset is smaller,
and this causes the cost of sorting to be small enough that it is cheaper
than the HashAgg.

On Fri, Jul 10, 2015 at 4:27 PM, rahul challapalli 
challapallira...@gmail.com wrote:

 Hi,

 Info about Data : The data is auto partitioned tpch 0.01 data. The second
 filter is a non-partitioned column, so in the first case the 'OR' predicate
 results in a full-table scan, while in the second case, partition pruning
 takes effect.

 The first case results in a hash agg and the second case in a streaming
 agg. Any idea why?

 1. explain plan for select distinct l_modline, l_moddate from
 `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
 '1992-01-01' or l_shipdate=date'1992-01-01';
 +--+--+
 | text | json |
 +--+--+
 | 00-00Screen
 00-01  Project(l_modline=[$0], l_moddate=[$1])
 00-02Project(l_modline=[$0], l_moddate=[$1])
 00-03  HashAgg(group=[{0, 1}])
 00-04Project(l_modline=[$2], l_moddate=[$0])
 00-05  SelectionVectorRemover
 00-06Filter(condition=[OR(=($0, 1992-01-01), =($1,
 1992-01-01))])
 00-07  Project(l_moddate=[$2], l_shipdate=[$1],
 l_modline=[$0])
 00-08Scan..

 2. explain plan for select distinct l_modline, l_moddate from
 `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
 '1992-01-01' and l_shipdate=date'1992-01-01';
 +--+--+
 | text | json |
 +--+--+
 | 00-00Screen
 00-01  Project(l_modline=[$0], l_moddate=[$1])
 00-02Project(l_modline=[$0], l_moddate=[$1])
 00-03  StreamAgg(group=[{0, 1}])
 00-04Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-05  Project(l_modline=[$2], l_moddate=[$0])
 00-06SelectionVectorRemover
 00-07  Filter(condition=[AND(=($0, 1992-01-01), =($1,
 1992-01-01))])
 00-08Project(l_moddate=[$2], l_shipdate=[$1],
 l_modline=[$0])
 00-09  Scan.

 - Rahul




-- 
 Steven Phillips
 Software Engineer

 mapr.com


Re: Hash Agg vs Streaming Agg for a smaller data set

2015-07-10 Thread Jinfeng Ni
I'm not clear which column is the partitioning column. From what you
described, row count of aggregator in the first case is larger than that in
the second case, since the former one requires full table scan. Cost-wise,
hash-agg would make more sense when the input is larger, since
streaming-agg requires sort, which could be expensive for large dataset.

My guess is that the difference in rowcounts in the two cases causes the
difference in the query plan.

One suggestion: if you want to check the query plan, it would make more sense
to try with reasonably large data.  Drill's costing model is not fully
calibrated yet; a small dataset like tpch_0.0.1 might make it hard for the
cost model to pick the right plan. On the other hand, if the dataset is
small, two different plans normally would not make a big difference in
terms of performance. In other words, try to use a large dataset if you are
interested in performance testing / plan verification.
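
One way to confirm that the choice between the two aggregators is purely
cost-based is to disable one of them at the session level and compare the
resulting plans. A rough sketch, assuming the planner option name below (it
may differ by Drill version):

alter session set `planner.enable_hashagg` = false;

explain plan for select distinct l_modline, l_moddate from
`tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
'1992-01-01' or l_shipdate=date'1992-01-01';

-- restore the default afterwards
alter session set `planner.enable_hashagg` = true;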





On Fri, Jul 10, 2015 at 4:27 PM, rahul challapalli 
challapallira...@gmail.com wrote:

 Hi,

 Info about Data : The data is auto partitioned tpch 0.01 data. The second
 filter is a non-partitioned column, so in the first case the 'OR' predicate
 results in a full-table scan, while in the second case, partition pruning
 takes effect.

 The first case results in a hash agg and the second case in a streaming
 agg. Any idea why?

 1. explain plan for select distinct l_modline, l_moddate from
 `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
 '1992-01-01' or l_shipdate=date'1992-01-01';
 +--+--+
 | text | json |
 +--+--+
 | 00-00Screen
 00-01  Project(l_modline=[$0], l_moddate=[$1])
 00-02Project(l_modline=[$0], l_moddate=[$1])
 00-03  HashAgg(group=[{0, 1}])
 00-04Project(l_modline=[$2], l_moddate=[$0])
 00-05  SelectionVectorRemover
 00-06Filter(condition=[OR(=($0, 1992-01-01), =($1,
 1992-01-01))])
 00-07  Project(l_moddate=[$2], l_shipdate=[$1],
 l_modline=[$0])
 00-08Scan..

 2. explain plan for select distinct l_modline, l_moddate from
 `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
 '1992-01-01' and l_shipdate=date'1992-01-01';
 +--+--+
 | text | json |
 +--+--+
 | 00-00Screen
 00-01  Project(l_modline=[$0], l_moddate=[$1])
 00-02Project(l_modline=[$0], l_moddate=[$1])
 00-03  StreamAgg(group=[{0, 1}])
 00-04Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-05  Project(l_modline=[$2], l_moddate=[$0])
 00-06SelectionVectorRemover
 00-07  Filter(condition=[AND(=($0, 1992-01-01), =($1,
 1992-01-01))])
 00-08Project(l_moddate=[$2], l_shipdate=[$1],
 l_modline=[$0])
 00-09  Scan.

 - Rahul



Re: Recursive CTE Support in Drill

2015-07-10 Thread Abdel Hakim Deneche
@Ted, the log-synth storage format would be really useful. I'm already
seeing many unit tests that could benefit from this. Do you have a github
repo for your ongoing work?

Thanks!

On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Are you hard set on using common table expressions?

 I have discussed a bit off-list creating a data format that would allow
 tables to be read from a log-synth [1] schema.  That would let you read as
 much data as you might like with an arbitrarily complex (or simple) query.

 Operationally, you would create a file containing a log-synth schema that
 has the extension .synth.  Your data source would have to be configured to
 connect that extension with the log-synth format.  At that point, you could
 select as much or little data as you like from the file and you would see
 generated data rather than the schema.



 [1] https://github.com/tdunning/log-synth

 On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei 
 alexanderz.si...@gmail.com
  wrote:

  Hi All,
 
  I am trying to come up with a query which returns a given number of rows
  without having a real table on Storage.
 
  I am hoping to achieve something like this:
 
 
 
 http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
 
   DECLARE @start INT = 1;
   DECLARE @end INT = 100;
   WITH numbers AS (
   SELECT @start AS number
   UNION ALL
   SELECT number + 1
   FROM  numbers
   WHERE number < @end)
   SELECT * FROM numbers OPTION (MAXRECURSION 0);
 
  I do not actually need to create different values and returning identical
  rows would work too.I just need to bypass the from clause in the query.
 
  Thanks,
  Alex
 




-- 

Abdelhakim Deneche

Software Engineer

  http://www.mapr.com/


Now Available - Free Hadoop On-Demand Training
http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available


Re: Review Request 35808: DRILL-2862: Convert_to/Convert_From throw assertion when an incorrect encoding type is specified

2015-07-10 Thread Parth Chandra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35808/
---

(Updated July 10, 2015, 6:40 p.m.)


Review request for drill, Mehant Baid, Sudheesh Katkam, and Venki Korukanti.


Changes
---

Updated with review comments.


Repository: drill-git


Description
---

DRILL-2862: Convert_to/Convert_From throw assertion when an incorrect encoding 
type is specified or if the encoding type is not a string literal.

Instead of an assertion when user input is wrong, we now throw an exception 
with the appropriate error message. 
For the case where the user types in a type name incorrectly, the error message 
also provides a helpful suggestion. The suggested name is selected from the 
list of available functions.

For example: 

  select convert_from(foo, 'UTF') from dfs.`/table_foo`

will print the following error:

  Error: UNSUPPORTED_OPERATION ERROR: CONVERT_FROM does not support conversion 
from type 'UTF'.
  Did you mean UTF8?
  [Error Id: 87ed2941-f9c2-4c35-8ff2-a3f21eae1104 on localhost:31010] 
(state=,code=0)
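
For reference, the "did you mean" suggestion is essentially nearest-name
matching by edit distance. The sketch below illustrates that general technique;
it is not the actual ApproximateStringMatcher implementation from this patch.

import java.util.Arrays;
import java.util.List;

public class NearestNameSketch {

  // classic Levenshtein edit distance between two strings
  static int editDistance(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) { d[i][0] = i; }
    for (int j = 0; j <= b.length(); j++) { d[0][j] = j; }
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                           d[i - 1][j - 1] + cost);
      }
    }
    return d[a.length()][b.length()];
  }

  // pick the known name closest to what the user typed
  static String suggest(String input, List<String> knownNames) {
    String best = null;
    int bestDistance = Integer.MAX_VALUE;
    for (String candidate : knownNames) {
      int distance = editDistance(input.toUpperCase(), candidate.toUpperCase());
      if (distance < bestDistance) {
        bestDistance = distance;
        best = candidate;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<String> encodings = Arrays.asList("UTF8", "UTF16", "JSON", "BIGINT_BE");
    System.out.println(suggest("UTF", encodings));  // prints UTF8
  }
}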


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/PreProcessLogicalRel.java
 0f8e45a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/util/ApproximateStringMatcher.java
 PRE-CREATION 
  
exec/java-exec/src/test/java/org/apache/drill/exec/util/TestApproximateStringMatcher.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/35808/diff/


Testing
---

All regression tests


Thanks,

Parth Chandra



Re: Recursive CTE Support in Drill

2015-07-10 Thread Abdel Hakim Deneche
Yeah, we still lack documentation on how to write a storage plugin. One
piece of advice I've been seeing a lot is to take a look at the mongo-db plugin;
it was basically added in one single commit:

https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304

I think this will give some general ideas on what to expect when writing a
storage plugin.

On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 Hakim,

 Not yet.  Still very much in the stage of gathering feedback.

 I would think it very simple.  The biggest obstacles are

 1) no documentation on how to write a data format

 2) I need to release a jar for log-synth to Maven Central.




 On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche 
 adene...@maprtech.com
 wrote:

  @Ted, the log-synth storage format would be really useful. I'm already
  seeing many unit tests that could benefit from this. Do you have a github
  repo for your ongoing work ?
 
  Thanks!
 
  On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   Are you hard set on using common table expressions?
  
   I have discussed a bit off-list creating a data format that would allow
   tables to be read from a log-synth [1] schema.  That would let you read
  as
   much data as you might like with an arbitrarily complex (or simple)
  query.
  
   Operationally, you would create a file containing a log-synth schema
 that
   has the extension .synth.  Your data source would have to be configured
  to
   connect that extension with the log-synth format.  At that point, you
  could
   select as much or little data as you like from the file and you would
 see
   generated data rather than the schema.
  
  
  
   [1] https://github.com/tdunning/log-synth
  
   On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei 
   alexanderz.si...@gmail.com
wrote:
  
Hi All,
   
I am trying to come up with a query which returns a given number of
  rows
without having a real table on Storage.
   
I am hoping to achieve something like this:
   
   
   
  
 
 http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
   
DECLARE @start INT = 1;DECLARE @end INT = 100;
WITH numbers AS (
SELECT @start AS number
UNION ALL
SELECT number + 1
FROM  numbers
WHERE number  @end)SELECT *FROM numbersOPTION (MAXRECURSION 0);
   
I do not actually need to create different values and returning
  identical
rows would work too.I just need to bypass the from clause in the
  query.
   
Thanks,
Alex
   
  
 
 
 
  --
 
  Abdelhakim Deneche
 
  Software Engineer
 
http://www.mapr.com/
 
 
  Now Available - Free Hadoop On-Demand Training
  
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available
  
 




-- 

Abdelhakim Deneche

Software Engineer

  http://www.mapr.com/


Now Available - Free Hadoop On-Demand Training
http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available


Re: Recursive CTE Support in Drill

2015-07-10 Thread Ted Dunning
I don't think we need a full on storage plugin.  I think a data format
should be sufficient, basically CSV on steroids.





On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche adene...@maprtech.com
 wrote:

 Yeah, we still lack documentation on how to write a storage plugin. One
 advice I've been seeing a lot is to take a look at the mongo-db plugin, it
 was basically added in one single commit:


 https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304

 I think this will give some general ideas on what to expect when writing a
 storage plugin.

 On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:

  Hakim,
 
  Not yet.  Still very much in the stage of gathering feedback.
 
  I would think it very simple.  The biggest obstacles are
 
  1) no documentation on how to write a data format
 
  2) I need to release a jar for log-synth to Maven Central.
 
 
 
 
  On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche 
  adene...@maprtech.com
  wrote:
 
   @Ted, the log-synth storage format would be really useful. I'm already
   seeing many unit tests that could benefit from this. Do you have a
 github
   repo for your ongoing work ?
  
   Thanks!
  
   On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning ted.dunn...@gmail.com
   wrote:
  
Are you hard set on using common table expressions?
   
I have discussed a bit off-list creating a data format that would
 allow
tables to be read from a log-synth [1] schema.  That would let you
 read
   as
much data as you might like with an arbitrarily complex (or simple)
   query.
   
Operationally, you would create a file containing a log-synth schema
  that
has the extension .synth.  Your data source would have to be
 configured
   to
connect that extension with the log-synth format.  At that point, you
   could
select as much or little data as you like from the file and you would
  see
generated data rather than the schema.
   
   
   
[1] https://github.com/tdunning/log-synth
   
On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei 
alexanderz.si...@gmail.com
 wrote:
   
 Hi All,

 I am trying to come up with a query which returns a given number of
   rows
 without having a real table on Storage.

 I am hoping to achieve something like this:



   
  
 
 http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table

 DECLARE @start INT = 1;DECLARE @end INT = 100;
 WITH numbers AS (
 SELECT @start AS number
 UNION ALL
 SELECT number + 1
 FROM  numbers
 WHERE number  @end)SELECT *FROM numbersOPTION (MAXRECURSION
 0);

 I do not actually need to create different values and returning
   identical
 rows would work too.I just need to bypass the from clause in the
   query.

 Thanks,
 Alex

   
  
  
  
   --
  
   Abdelhakim Deneche
  
   Software Engineer
  
 http://www.mapr.com/
  
  
   Now Available - Free Hadoop On-Demand Training
   
  
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available
   
  
 



 --

 Abdelhakim Deneche

 Software Engineer

   http://www.mapr.com/


 Now Available - Free Hadoop On-Demand Training
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available
 



Re: Recursive CTE Support in Drill

2015-07-10 Thread Jacques Nadeau
Creating an EasyFormatPlugin is pretty simple.  They were designed to get
rid of much of the scaffolding required for a standard FormatPlugin.

JSON
https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json

Text
https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text

AVRO
https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/avro

In all cases, the connection code is pretty light.  A fully schematized
format like log-synth should be even simpler to implement.

On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 I don't think we need a full on storage plugin.  I think a data format
 should be sufficient, basically CSV on steroids.





 On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche 
 adene...@maprtech.com
  wrote:

  Yeah, we still lack documentation on how to write a storage plugin. One
  advice I've been seeing a lot is to take a look at the mongo-db plugin,
 it
  was basically added in one single commit:
 
 
 
 https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304
 
  I think this will give some general ideas on what to expect when writing
 a
  storage plugin.
 
  On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   Hakim,
  
   Not yet.  Still very much in the stage of gathering feedback.
  
   I would think it very simple.  The biggest obstacles are
  
   1) no documentation on how to write a data format
  
   2) I need to release a jar for log-synth to Maven Central.
  
  
  
  
   On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche 
   adene...@maprtech.com
   wrote:
  
@Ted, the log-synth storage format would be really useful. I'm
 already
seeing many unit tests that could benefit from this. Do you have a
  github
repo for your ongoing work ?
   
Thanks!
   
On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
   
 Are you hard set on using common table expressions?

 I have discussed a bit off-list creating a data format that would
  allow
 tables to be read from a log-synth [1] schema.  That would let you
  read
as
 much data as you might like with an arbitrarily complex (or simple)
query.

 Operationally, you would create a file containing a log-synth
 schema
   that
 has the extension .synth.  Your data source would have to be
  configured
to
 connect that extension with the log-synth format.  At that point,
 you
could
 select as much or little data as you like from the file and you
 would
   see
 generated data rather than the schema.



 [1] https://github.com/tdunning/log-synth

 On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei 
 alexanderz.si...@gmail.com
  wrote:

  Hi All,
 
  I am trying to come up with a query which returns a given number
 of
rows
  without having a real table on Storage.
 
  I am hoping to achieve something like this:
 
 
 

   
  
 
 http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
 
  DECLARE @start INT = 1;DECLARE @end INT = 100;
  WITH numbers AS (
  SELECT @start AS number
  UNION ALL
  SELECT number + 1
  FROM  numbers
  WHERE number  @end)SELECT *FROM numbersOPTION (MAXRECURSION
  0);
 
  I do not actually need to create different values and returning
identical
  rows would work too.I just need to bypass the from clause in
 the
query.
 
  Thanks,
  Alex
 

   
   
   
--
   
Abdelhakim Deneche
   
Software Engineer
   
  http://www.mapr.com/
   
   
Now Available - Free Hadoop On-Demand Training

   
  
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available

   
  
 
 
 
  --
 
  Abdelhakim Deneche
 
  Software Engineer
 
http://www.mapr.com/
 
 
  Now Available - Free Hadoop On-Demand Training
  
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available
  
 



Re: Recursive CTE Support in Drill

2015-07-10 Thread Ted Dunning
Hakim,

Not yet.  Still very much in the stage of gathering feedback.

I would think it very simple.  The biggest obstacles are

1) no documentation on how to write a data format

2) I need to release a jar for log-synth to Maven Central.




On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche adene...@maprtech.com
wrote:

 @Ted, the log-synth storage format would be really useful. I'm already
 seeing many unit tests that could benefit from this. Do you have a github
 repo for your ongoing work ?

 Thanks!

 On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:

  Are you hard set on using common table expressions?
 
  I have discussed a bit off-list creating a data format that would allow
  tables to be read from a log-synth [1] schema.  That would let you read
 as
  much data as you might like with an arbitrarily complex (or simple)
 query.
 
  Operationally, you would create a file containing a log-synth schema that
  has the extension .synth.  Your data source would have to be configured
 to
  connect that extension with the log-synth format.  At that point, you
 could
  select as much or little data as you like from the file and you would see
  generated data rather than the schema.
 
 
 
  [1] https://github.com/tdunning/log-synth
 
  On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei 
  alexanderz.si...@gmail.com
   wrote:
 
   Hi All,
  
   I am trying to come up with a query which returns a given number of
 rows
   without having a real table on Storage.
  
   I am hoping to achieve something like this:
  
  
  
 
 http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
  
   DECLARE @start INT = 1;DECLARE @end INT = 100;
   WITH numbers AS (
   SELECT @start AS number
   UNION ALL
   SELECT number + 1
   FROM  numbers
   WHERE number  @end)SELECT *FROM numbersOPTION (MAXRECURSION 0);
  
   I do not actually need to create different values and returning
 identical
   rows would work too.I just need to bypass the from clause in the
 query.
  
   Thanks,
   Alex
  
 



 --

 Abdelhakim Deneche

 Software Engineer

   http://www.mapr.com/


 Now Available - Free Hadoop On-Demand Training
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available
 



Drill calcite tracing issue

2015-07-10 Thread George Spofford
(If there's a better target for an issue request, please let me know!)

While trying to understand the details of Calcite rule execution, I turned on
the Calcite tracing per
https://calcite.incubator.apache.org/docs/howto.html#tracing . At that
point (running a query from the web UI)  I get the error

Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
AssertionError: Internal error: should never get here ...


The query is pretty straightforward:

select Person, sum(Qty1) from mongo.mine.test group by Person



A simple trial with the same settings against the Apache Calcite
example/csv doesn't show the same behavior. (Calcite query: SELECT DEPTNO,
SUM(EMPNO) FROM emps GROUP BY DEPTNO ; )

In Drill, I'm querying against a Mongo db but the code path in the
exception trace doesn't immediately appear to be relevant for that. It
seems to happen the very first time dumpGraph is called.

The innermost cause (in all its glory) is due to the method:

@Override public RelOptCost computeSelfCost(RelOptPlanner planner) {
  // HepRelMetadataProvider is supposed to intercept this
  // and redirect to the real rels.
  throw Util.newInternal("should never get here");
}


and the trace is:

  cause {
exception_class: java.lang.AssertionError
message: Internal error: should never get here
stack_trace {
  class_name: org.apache.calcite.util.Util
  file_name: Util.java
  line_number: 775
  method_name: newInternal
  is_native_method: false
}
stack_trace {
  class_name: org.apache.calcite.plan.hep.HepRelVertex
  file_name: HepRelVertex.java
  line_number: 68
  method_name: computeSelfCost
  is_native_method: false
}
stack_trace {
  class_name:
org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
  file_name: RelMdPercentageOriginalRows.java
  line_number: 165
  method_name: getNonCumulativeCost
  is_native_method: false
}
stack_trace {
  class_name: ...
  line_number: 0
  method_name: ...
  is_native_method: false
}
stack_trace {
  class_name:
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
  file_name: ReflectiveRelMetadataProvider.java
  line_number: 194
  method_name: invoke
  is_native_method: false
}
stack_trace {
  class_name: ...
  line_number: 0
  method_name: ...
  is_native_method: false
}
stack_trace {
  class_name: org.apache.calcite.rel.metadata.RelMetadataQuery
  file_name: RelMetadataQuery.java
  line_number: 115
  method_name: getNonCumulativeCost
  is_native_method: false
}
stack_trace {
  class_name:
org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
  file_name: RelMdPercentageOriginalRows.java
  line_number: 151
  method_name: getCumulativeCost
  is_native_method: false
}
stack_trace {
  class_name: ...
  line_number: 0
  method_name: ...
  is_native_method: false
}
stack_trace {
  class_name:
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
  file_name: ReflectiveRelMetadataProvider.java
  line_number: 194
  method_name: invoke
  is_native_method: false
}
stack_trace {
  class_name: ...
  line_number: 0
  method_name: ...
  is_native_method: false
}
stack_trace {
  class_name: org.apache.calcite.rel.metadata.RelMetadataQuery
  file_name: RelMetadataQuery.java
  line_number: 101
  method_name: getCumulativeCost
  is_native_method: false
}
stack_trace {
  class_name:
org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
  file_name: RelMdPercentageOriginalRows.java
  line_number: 154
  method_name: getCumulativeCost
  is_native_method: false
}
stack_trace {
  class_name: ...
  line_number: 0
  method_name: ...
  is_native_method: false
}
stack_trace {
  class_name:
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
  file_name: ReflectiveRelMetadataProvider.java
  line_number: 194
  method_name: invoke

Re: Review Request 34374: DRILL-3133: MergingRecordBatch can leak memory if query is canceled before batches in rawBatches were loaded

2015-07-10 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34374/
---

(Updated July 10, 2015, 7:33 p.m.)


Review request for drill and Steven Phillips.


Changes
---

addressing review comments.


Bugs: DRILL-3133
https://issues.apache.org/jira/browse/DRILL-3133


Repository: drill-git


Description
---

MergingRecordBatch stores batches in an array list before loading them with 
RecordBatchLoader. If the query is canceled before all received batches are 
loaded, some of the batches won't be cleaned up.

Lines 307 and 339 contain questions to the reviewers; I will update the patch 
accordingly.
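
A simplified sketch of the cleanup pattern the fix is about (hypothetical
types, not the actual MergingRecordBatch internals): anything still queued when
the operator is closed, including batches that arrive after cancellation, has
to be released explicitly or its buffers leak.

import java.util.ArrayList;
import java.util.List;

public class PendingBatchCleanupSketch {

  interface RawBatch {
    void release();    // returns the underlying buffers to the allocator
  }

  private final List<RawBatch> rawBatches = new ArrayList<>();
  private boolean closed;

  void enqueue(RawBatch batch) {
    if (closed) {      // arrived after cancellation: release immediately
      batch.release();
      return;
    }
    rawBatches.add(batch);
  }

  // called on normal completion and on cancellation
  void close() {
    closed = true;
    for (RawBatch batch : rawBatches) {
      batch.release(); // release anything that was never loaded
    }
    rawBatches.clear();
  }
}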


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/mergereceiver/MergingRecordBatch.java
 3ca11f1 

Diff: https://reviews.apache.org/r/34374/diff/


Testing
---

all unit tests are passing along with functional and tpch100


Thanks,

abdelhakim deneche



Re: Recursive CTE Support in Drill

2015-07-10 Thread Ted Dunning
It may be easy, but it is completely opaque about what really needs to
happen.

For instance,

1) how is schema exposed?

2) which classes do I really need to implement?

3) how do I express partitioning of a format?

4) how do I test it?

Just a bit of documentation and comments would go a very, very long way.

Even answers on the mailing list that have more details than "oh, that's
easy".  I would be happy to transcribe answers into the code if I could
just get some.



On Fri, Jul 10, 2015 at 11:04 AM, Jacques Nadeau jacq...@apache.org wrote:

 Creating an EasyFormatPlugin is pretty simple.  They were designed to get
 rid of much of the scaffolding required for a standard FormatPlugin.

 JSON

 https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json

 Text

 https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text

 AVRO

 https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/avro

 In all cases, the connection code is pretty light.  A fully schematized
 format like log-synth should be even simpler to implement.

 On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:

  I don't think we need a full on storage plugin.  I think a data format
  should be sufficient, basically CSV on steroids.
 
 
 
 
 
  On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche 
  adene...@maprtech.com
   wrote:
 
   Yeah, we still lack documentation on how to write a storage plugin. One
   advice I've been seeing a lot is to take a look at the mongo-db plugin,
  it
   was basically added in one single commit:
  
  
  
 
 https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304
  
   I think this will give some general ideas on what to expect when
 writing
  a
   storage plugin.
  
   On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning ted.dunn...@gmail.com
   wrote:
  
Hakim,
   
Not yet.  Still very much in the stage of gathering feedback.
   
I would think it very simple.  The biggest obstacles are
   
1) no documentation on how to write a data format
   
2) I need to release a jar for log-synth to Maven Central.
   
   
   
   
On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche 
adene...@maprtech.com
wrote:
   
 @Ted, the log-synth storage format would be really useful. I'm
  already
 seeing many unit tests that could benefit from this. Do you have a
   github
 repo for your ongoing work ?

 Thanks!

 On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning 
 ted.dunn...@gmail.com
 wrote:

  Are you hard set on using common table expressions?
 
  I have discussed a bit off-list creating a data format that would
   allow
  tables to be read from a log-synth [1] schema.  That would let
 you
   read
 as
  much data as you might like with an arbitrarily complex (or
 simple)
 query.
 
  Operationally, you would create a file containing a log-synth
  schema
that
  has the extension .synth.  Your data source would have to be
   configured
 to
  connect that extension with the log-synth format.  At that point,
  you
 could
  select as much or little data as you like from the file and you
  would
see
  generated data rather than the schema.
 
 
 
  [1] https://github.com/tdunning/log-synth
 
  On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei 
  alexanderz.si...@gmail.com
   wrote:
 
   Hi All,
  
   I am trying to come up with a query which returns a given
 number
  of
 rows
   without having a real table on Storage.
  
   I am hoping to achieve something like this:
  
  
  
 

   
  
 
 http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
  
   DECLARE @start INT = 1;DECLARE @end INT = 100;
   WITH numbers AS (
   SELECT @start AS number
   UNION ALL
   SELECT number + 1
   FROM  numbers
   WHERE number  @end)SELECT *FROM numbersOPTION
 (MAXRECURSION
   0);
  
   I do not actually need to create different values and returning
 identical
   rows would work too.I just need to bypass the from clause in
  the
 query.
  
   Thanks,
   Alex
  
 



 --

 Abdelhakim Deneche

 Software Engineer

   http://www.mapr.com/


 Now Available - Free Hadoop On-Demand Training
 

   
  
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available
 

   
  
  
  
   --
  
   Abdelhakim Deneche
  
   Software Engineer
  
 http://www.mapr.com/
  
  
   Now Available - Free Hadoop On-Demand Training
   
  
 
 http://www.mapr.com/training?utm_source=Emailutm_medium=Signatureutm_campaign=Free%20available
   
  
 



Threads left after Drillbit shutdown (in dev./unit tests)

2015-07-10 Thread Daniel Barclay

Is Drill terminating threads correctly?

In running jstack on a JVM running a dev. test run that ended up hung
after getting about three test timeout errors, I see that there are
409 threads.

Although 138 of those are not-unexpected ShutdownHook threads (since
many tests are run in one VM), there are:
- 138 WorkManager.StatusThread threads (hmm 138 again)
-   7 Client-1 threads
-   4 UserServer-1 threads
-  21 BitClient-1 threads
-   4 BitClient-2 threads
-   3 BitClient-3 threads
-   8 BitServer-1 threads
-   8 BitServer-2 threads
-   7 BitServer-3 threads
-   7 BitServer-4 threads
-   7 BitServer-5 threads
-   6 BitServer-6 threads
-   6 BitServer-7 threads
-   6 BitServer-8 threads
-   5 BitServer-9 threads
-   5 BitServer-10 threads
(Other thread names have only 1 or 2 occurrences.)

Regarding the 4 for the number of UserServer-1 threads:  Three test
methods had timeout failures plus one got hung.


Here's the tail end of the output from the test running, including
all the timeout errors and including the hang (except for repeated
query-results data lines).



dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install

TRIMMED

Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
Running 
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
Running 
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
Running 
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
Running 
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec - in 
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
Running 
org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
00:44:34.017 [globalEventExecutor-1-523] ERROR 
o.a.z.server.NIOServerCnxnFactory - Thread 
Thread[globalEventExecutor-1-523,5,main] died
java.lang.AssertionError: null
at 
io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
Running 
org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
1
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599 sec  
FAILURE! - in org.apache.drill.exec.physical.impl.TestBroadcastExchange
TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
  Time elapsed: 50.063 sec   ERROR!
java.lang.Exception: test timed out after 5 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at 
io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at 
org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at 
org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
at 
org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)

TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
  Time elapsed: 50.014 sec   ERROR!
java.lang.Exception: test timed out after 5 milliseconds
at java.lang.Object.wait(Native Method)
at 

[GitHub] drill pull request: DRILL-3483: Clarify CommonConstants' constants...

2015-07-10 Thread dsbos
GitHub user dsbos opened a pull request:

https://github.com/apache/drill/pull/88

DRILL-3483: Clarify CommonConstants' constants.

Renamed constants.
Documented constants.
Removed extraneous public static final (redundant and abnormal since in 
interface).
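
For context, a tiny Drill-independent illustration of why those modifiers are
redundant: fields declared in an interface are implicitly public, static, and
final, so the two constants below are equivalent.

public interface ConstantsExample {
  public static final String EXPLICIT = "same value";  // legal but redundant
  String IMPLICIT = "same value";                      // implicitly public static final
}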

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dsbos/incubator-drill 
bugs/drill-3483-CommonConstants-clarification

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/88.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #88


commit 6e39eb7ca428d0d0baeeb29f50ed25b1a8872685
Author: dbarclay dbarc...@maprtech.com
Date:   2015-07-02T22:04:31Z

DRILL-3483: Clarify CommonConstants' constants.  [CommonConstants, 
DrillConfig, PathScanner]

Renamed constants.
Documented constants.
Removed extraneous public static final (redundant and abnormal since in 
interface).






Re: Drill calcite tracing issue

2015-07-10 Thread Jinfeng Ni
@Tim, AFAIK, the HepPlanner was introduced for window functions, because
otherwise the VolcanoPlanner would hit a CanNotPlan issue at that time.

@George, it would be hard to modify the code path in a production system,
unless you apply a temporary patch to do that, assuming you do not use
window function in your production system at all. Otherwise, you may
consider waiting for 1.2.0, for which we will try to get a patch to fix
this issue.

In general, the tracing feature is mainly used by developers for debugging issues
related to Calcite.




On Fri, Jul 10, 2015 at 1:31 PM, George Spofford geospoff...@gmail.com
wrote:

 Thanks, will see if I can influence the code path here. Not sure how I can
 skip HepPlanner findBestExp()  in production code path, but I'm in my early
 days here.

 Can I suggest that, for at least a smattering of query forms, that FINER
 and FINEST tracing be included in integration if not also unit tests?

 On Fri, Jul 10, 2015 at 1:22 PM, Jinfeng Ni jinfengn...@gmail.com wrote:

  DRILL-3156 was filed to track the calcite trace issue [1].
 
  Basically, the HepPlanner used for window function planning caused the
  tracing issue. I have a prototype patch to fix this issue. I'll try to
 see
  if I can get it ready for 1.2.0 release.
 
  As a workaround, if you do not use window function, you may consider
  skipping the HepPlanner findBestExp() call in
   DefaultSqlHandler.:convertToRel(SqlNode node)[2].
 
 
  1. https://issues.apache.org/jira/browse/DRILL-3156
  2.
 
 
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java#L451
 
 
 
  On Fri, Jul 10, 2015 at 9:19 AM, George Spofford geospoff...@gmail.com
  wrote:
 
   (If there's a better target for an issue request, please let me know!)
  
   While trying to understand the details Calcite rule execution, I turned
  on
   the Calcite tracing per
   https://calcite.incubator.apache.org/docs/howto.html#tracing . At that
   point (running a query from the web UI)  I get the error
  
   Query Failed: An Error Occurred
   org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
   AssertionError: Internal error: should never get here ...
  
  
   The query is pretty straightforward:
  
   select Person, sum(Qty1) from mongo.mine.test group by Person
  
  
  
   A simple trial with the same settings against the Apache Calcite
   example/csv doesn't show the same behavior. (Calcite query: SELECT
  DEPTNO,
   SUM(EMPNO) FROM emps GROUP BY DEPTNO ; )
  
   In Drill, I'm querying against a Mongo db but the code path in the
   exception trace doesn't immediately appear to be relevant for that. It
   seems to happen the very first time dumpGraph is called.
  
   The innermost cause (in all its glory) is due to the method:
  
   @Override public RelOptCost computeSelfCost(RelOptPlanner planner) {
 // HepRelMetadataProvider is supposed to intercept this
 // and redirect to the real rels.
  throw Util.newInternal("should never get here");
   }
  
  
   and the trace is:
  
 cause {
   exception_class: java.lang.AssertionError
   message: Internal error: should never get here
   stack_trace {
 class_name: org.apache.calcite.util.Util
 file_name: Util.java
 line_number: 775
 method_name: newInternal
 is_native_method: false
   }
   stack_trace {
 class_name: org.apache.calcite.plan.hep.HepRelVertex
 file_name: HepRelVertex.java
 line_number: 68
 method_name: computeSelfCost
 is_native_method: false
   }
   stack_trace {
 class_name:
   org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
 file_name: RelMdPercentageOriginalRows.java
 line_number: 165
 method_name: getNonCumulativeCost
 is_native_method: false
   }
   stack_trace {
 class_name: ...
 line_number: 0
 method_name: ...
 is_native_method: false
   }
   stack_trace {
 class_name:
   org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
 file_name: ReflectiveRelMetadataProvider.java
 line_number: 194
 method_name: invoke
 is_native_method: false
   }
   stack_trace {
 class_name: ...
 line_number: 0
 method_name: ...
 is_native_method: false
   }
   stack_trace {
 class_name:
   org.apache.calcite.rel.metadata.RelMetadataQuery
 file_name: RelMetadataQuery.java
 line_number: 115
 

Re: Drill calcite tracing issue

2015-07-10 Thread Jinfeng Ni
DRILL-3156 was filed to track the calcite trace issue [1].

Basically, the HepPlanner used for window function planning caused the
tracing issue. I have a prototype patch to fix this issue. I'll try to see
if I can get it ready for 1.2.0 release.

As a workaround, if you do not use window functions, you may consider
skipping the HepPlanner findBestExp() call in
DefaultSqlHandler.convertToRel(SqlNode node) [2]; a rough sketch of what
that bypass could look like follows the references below.


1. https://issues.apache.org/jira/browse/DRILL-3156
2.
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java#L451
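
A sketch only: the method name, the skipWindowRewrite flag, and the surrounding structure are assumptions here, not the actual DefaultSqlHandler code.

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.rel.RelNode;

final class WindowRewriteSketch {
  // Returns the input unchanged when the HepPlanner pass is skipped.
  static RelNode maybeRewrite(RelNode rel, HepProgram program, boolean skipWindowRewrite) {
    if (skipWindowRewrite) {
      return rel;                      // bypass the HepPlanner entirely
    }
    final HepPlanner hepPlanner = new HepPlanner(program);
    hepPlanner.setRoot(rel);
    return hepPlanner.findBestExp();   // the call whose tracing path hits the AssertionError
  }
}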



On Fri, Jul 10, 2015 at 9:19 AM, George Spofford geospoff...@gmail.com
wrote:

 (If there's a better target for an issue request, please let me know!)

 While trying to understand the details of Calcite rule execution, I turned on
 the Calcite tracing per
 https://calcite.incubator.apache.org/docs/howto.html#tracing . At that
 point (running a query from the web UI)  I get the error

 Query Failed: An Error Occurred
 org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
 AssertionError: Internal error: should never get here ...


 The query is pretty straightforward:

 select Person, sum(Qty1) from mongo.mine.test group by Person



 A simple trial with the same settings against the Apache Calcite
 example/csv doesn't show the same behavior. (Calcite query: SELECT DEPTNO,
 SUM(EMPNO) FROM emps GROUP BY DEPTNO ; )

 In Drill, I'm querying against a Mongo db but the code path in the
 exception trace doesn't immediately appear to be relevant for that. It
 seems to happen the very first time dumpGraph is called.

 The innermost cause (in all its glory) is due to the method:

 @Override public RelOptCost computeSelfCost(RelOptPlanner planner) {
   // HepRelMetadataProvider is supposed to intercept this
   // and redirect to the real rels.
   throw Util.newInternal("should never get here");
 }


 and the trace is:

   cause {
 exception_class: java.lang.AssertionError
 message: Internal error: should never get here
 stack_trace {
   class_name: org.apache.calcite.util.Util
   file_name: Util.java
   line_number: 775
   method_name: newInternal
   is_native_method: false
 }
 stack_trace {
   class_name: org.apache.calcite.plan.hep.HepRelVertex
   file_name: HepRelVertex.java
   line_number: 68
   method_name: computeSelfCost
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
   file_name: RelMdPercentageOriginalRows.java
   line_number: 165
   method_name: getNonCumulativeCost
   is_native_method: false
 }
 stack_trace {
   class_name: ...
   line_number: 0
   method_name: ...
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
   file_name: ReflectiveRelMetadataProvider.java
   line_number: 194
   method_name: invoke
   is_native_method: false
 }
 stack_trace {
   class_name: ...
   line_number: 0
   method_name: ...
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.RelMetadataQuery
   file_name: RelMetadataQuery.java
   line_number: 115
   method_name: getNonCumulativeCost
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
   file_name: RelMdPercentageOriginalRows.java
   line_number: 151
   method_name: getCumulativeCost
   is_native_method: false
 }
 stack_trace {
   class_name: ...
   line_number: 0
   method_name: ...
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
   file_name: ReflectiveRelMetadataProvider.java
   line_number: 194
   method_name: invoke
   is_native_method: false
 }
 stack_trace {
   class_name: ...
   line_number: 0
   method_name: ...
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.RelMetadataQuery
   file_name: RelMetadataQuery.java
   line_number: 101

Re: Drill calcite tracing issue

2015-07-10 Thread George Spofford
Thanks, will see if I can influence the code path here. Not sure how I can
skip the HepPlanner findBestExp() call in the production code path, but I'm
in my early days here.

Can I suggest that, for at least a smattering of query forms, FINER and
FINEST tracing be included in integration tests, if not also unit tests?

On Fri, Jul 10, 2015 at 1:22 PM, Jinfeng Ni jinfengn...@gmail.com wrote:

 DRILL-3156 was filed to track the calcite trace issue [1].

 Basically, the HepPlanner used for window function planning caused the
 tracing issue. I have a prototype patch to fix this issue. I'll try to see
 if I can get it ready for 1.2.0 release.

 As a workaround, if you do not use window functions, you may consider
 skipping the HepPlanner findBestExp() call in
 DefaultSqlHandler.convertToRel(SqlNode node) [2].


 1. https://issues.apache.org/jira/browse/DRILL-3156
 2.

 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java#L451



 On Fri, Jul 10, 2015 at 9:19 AM, George Spofford geospoff...@gmail.com
 wrote:

  (If there's a better target for an issue request, please let me know!)
 
  While trying to understand the details of Calcite rule execution, I turned
 on
  the Calcite tracing per
  https://calcite.incubator.apache.org/docs/howto.html#tracing . At that
  point (running a query from the web UI)  I get the error
 
  Query Failed: An Error Occurred
  org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
  AssertionError: Internal error: should never get here ...
 
 
  The query is pretty straightforward:
 
  select Person, sum(Qty1) from mongo.mine.test group by Person
 
 
 
  A simple trial with the same settings against the Apache Calcite
  example/csv doesn't show the same behavior. (Calcite query: SELECT
 DEPTNO,
  SUM(EMPNO) FROM emps GROUP BY DEPTNO ; )
 
  In Drill, I'm querying against a Mongo db but the code path in the
  exception trace doesn't immediately appear to be relevant for that. It
  seems to happen the very first time dumpGraph is called.
 
  The innermost cause (in all its glory) is due to the method:
 
  @Override public RelOptCost computeSelfCost(RelOptPlanner planner) {
// HepRelMetadataProvider is supposed to intercept this
// and redirect to the real rels.
throw Util.newInternal("should never get here");
  }
 
 
  and the trace is:
 
cause {
  exception_class: java.lang.AssertionError
  message: Internal error: should never get here
  stack_trace {
class_name: org.apache.calcite.util.Util
file_name: Util.java
line_number: 775
method_name: newInternal
is_native_method: false
  }
  stack_trace {
class_name: org.apache.calcite.plan.hep.HepRelVertex
file_name: HepRelVertex.java
line_number: 68
method_name: computeSelfCost
is_native_method: false
  }
  stack_trace {
class_name:
  org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
file_name: RelMdPercentageOriginalRows.java
line_number: 165
method_name: getNonCumulativeCost
is_native_method: false
  }
  stack_trace {
class_name: ...
line_number: 0
method_name: ...
is_native_method: false
  }
  stack_trace {
class_name:
  org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
file_name: ReflectiveRelMetadataProvider.java
line_number: 194
method_name: invoke
is_native_method: false
  }
  stack_trace {
class_name: ...
line_number: 0
method_name: ...
is_native_method: false
  }
  stack_trace {
class_name:
  org.apache.calcite.rel.metadata.RelMetadataQuery
file_name: RelMetadataQuery.java
line_number: 115
method_name: getNonCumulativeCost
is_native_method: false
  }
  stack_trace {
class_name:
  org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
file_name: RelMdPercentageOriginalRows.java
line_number: 151
method_name: getCumulativeCost
is_native_method: false
  }
  stack_trace {
class_name: ...
line_number: 0
method_name: ...
is_native_method: false
  }
  stack_trace {
class_name:
  org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
file_name: 

Re: Drill calcite tracing issue

2015-07-10 Thread Timothy Chen
I remember introducing the HepPlanner when I worked on window functions;
it's probably a good idea to add a comment on why we actually need it,
since I can't recall the exact reasons now.

Tim

On Fri, Jul 10, 2015 at 1:22 PM, Jinfeng Ni jinfengn...@gmail.com wrote:
 DRILL-3156 was filed to track the calcite trace issue [1].

 Basically, the HepPlanner used for window function planning caused the
 tracing issue. I have a prototype patch to fix this issue. I'll try to see
 if I can get it ready for 1.2.0 release.

 As a workaround, if you do not use window functions, you may consider
 skipping the HepPlanner findBestExp() call in
 DefaultSqlHandler.convertToRel(SqlNode node) [2].


 1. https://issues.apache.org/jira/browse/DRILL-3156
 2.
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java#L451



 On Fri, Jul 10, 2015 at 9:19 AM, George Spofford geospoff...@gmail.com
 wrote:

 (If there's a better target for an issue request, please let me know!)

 While trying to understand the details of Calcite rule execution, I turned on
 the Calcite tracing per
 https://calcite.incubator.apache.org/docs/howto.html#tracing . At that
 point (running a query from the web UI)  I get the error

 Query Failed: An Error Occurred
 org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
 AssertionError: Internal error: should never get here ...


 The query is pretty straightforward:

 select Person, sum(Qty1) from mongo.mine.test group by Person



 A simple trial with the same settings against the Apache Calcite
 example/csv doesn't show the same behavior. (Calcite query: SELECT DEPTNO,
 SUM(EMPNO) FROM emps GROUP BY DEPTNO ; )

 In Drill, I'm querying against a Mongo db but the code path in the
 exception trace doesn't immediately appear to be relevant for that. It
 seems to happen the very first time dumpGraph is called.

 The innermost cause (in all its glory) is due to the method:

 @Override public RelOptCost computeSelfCost(RelOptPlanner planner) {
   // HepRelMetadataProvider is supposed to intercept this
   // and redirect to the real rels.
   throw Util.newInternal("should never get here");
 }


 and the trace is:

   cause {
 exception_class: java.lang.AssertionError
 message: Internal error: should never get here
 stack_trace {
   class_name: org.apache.calcite.util.Util
   file_name: Util.java
   line_number: 775
   method_name: newInternal
   is_native_method: false
 }
 stack_trace {
   class_name: org.apache.calcite.plan.hep.HepRelVertex
   file_name: HepRelVertex.java
   line_number: 68
   method_name: computeSelfCost
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
   file_name: RelMdPercentageOriginalRows.java
   line_number: 165
   method_name: getNonCumulativeCost
   is_native_method: false
 }
 stack_trace {
   class_name: ...
   line_number: 0
   method_name: ...
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
   file_name: ReflectiveRelMetadataProvider.java
   line_number: 194
   method_name: invoke
   is_native_method: false
 }
 stack_trace {
   class_name: ...
   line_number: 0
   method_name: ...
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.RelMetadataQuery
   file_name: RelMetadataQuery.java
   line_number: 115
   method_name: getNonCumulativeCost
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows
   file_name: RelMdPercentageOriginalRows.java
   line_number: 151
   method_name: getCumulativeCost
   is_native_method: false
 }
 stack_trace {
   class_name: ...
   line_number: 0
   method_name: ...
   is_native_method: false
 }
 stack_trace {
   class_name:
 org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1
   file_name: ReflectiveRelMetadataProvider.java
   line_number: 194
   method_name: invoke
   is_native_method: false
 }
 stack_trace {
   class_name: ...
   line_number: 0

[GitHub] drill pull request: DRILL-3483: Clarify CommonConstants' constants...

2015-07-10 Thread jaltekruse
Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/88#issuecomment-120540926
  
+1




Hash Agg vs Streaming Agg for a smaller data set

2015-07-10 Thread rahul challapalli
Hi,

Info about the data: this is auto-partitioned TPC-H 0.01 data. The second
filter is on a non-partitioned column, so in the first case the 'OR'
predicate results in a full-table scan, while in the second case partition
pruning takes effect.

The first case results in a hash agg and the second case in a streaming
agg. Any idea why?

1. explain plan for select distinct l_modline, l_moddate from
`tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
'1992-01-01' or l_shipdate=date'1992-01-01';
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(l_modline=[$0], l_moddate=[$1])
00-02        Project(l_modline=[$0], l_moddate=[$1])
00-03          HashAgg(group=[{0, 1}])
00-04            Project(l_modline=[$2], l_moddate=[$0])
00-05              SelectionVectorRemover
00-06                Filter(condition=[OR(=($0, 1992-01-01), =($1, 1992-01-01))])
00-07                  Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0])
00-08                    Scan..

2. explain plan for select distinct l_modline, l_moddate from
`tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
'1992-01-01' and l_shipdate=date'1992-01-01';
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(l_modline=[$0], l_moddate=[$1])
00-02        Project(l_modline=[$0], l_moddate=[$1])
00-03          StreamAgg(group=[{0, 1}])
00-04            Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-05              Project(l_modline=[$2], l_moddate=[$0])
00-06                SelectionVectorRemover
00-07                  Filter(condition=[AND(=($0, 1992-01-01), =($1, 1992-01-01))])
00-08                    Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0])
00-09                      Scan.
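
One way to probe whether this is purely a cost-based planner choice is to disable hash aggregation for the session and re-run the explain. A minimal JDBC sketch follows; the connection URL assumes a local embedded Drillbit, and planner.enable_hashagg is a standard Drill session option, so treat this as an experiment rather than an answer to the "why".

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AggPlanCheck {
  public static void main(String[] args) throws Exception {
    // URL is an assumption (embedded, local Drillbit); adjust as needed.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement()) {
      // Turn off hash aggregation so the planner has to pick the streaming path.
      stmt.execute("ALTER SESSION SET `planner.enable_hashagg` = false");
      try (ResultSet rs = stmt.executeQuery(
          "EXPLAIN PLAN FOR select distinct l_modline, l_moddate from "
              + "`tpch_multiple_partitions/lineitem_twopart` "
              + "where l_moddate = date '1992-01-01' or l_shipdate = date '1992-01-01'")) {
        while (rs.next()) {
          // The explain result exposes 'text' and 'json' columns, as in the plans above.
          System.out.println(rs.getString("text"));  // expect StreamAgg + Sort for case 1 as well
        }
      }
    }
  }
}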

- Rahul