[GitHub] drill pull request: Bugs/various work re master 07 09
GitHub user dsbos opened a pull request: https://github.com/apache/drill/pull/87 Bugs/various work re master 07 09 You can merge this pull request into a Git repository by running: $ git pull https://github.com/dsbos/incubator-drill bugs/various_WORK_re_master_07-09 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/87.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #87 commit 92e2cc18c79dbf5269304d1479f1b6ff8bcd477e Author: dbarclay dbarc...@maprtech.com Date: 2015-03-23T21:16:48Z sep.: --- forkCount 2 -> 1 [pom.xml] --- = Working, Review commit 59608f3664cefd9754b778c45479c3b72025b1b2 Author: dbarclay dbarc...@maprtech.com Date: 2015-03-24T17:20:27Z sep.: =-=-=-=-=-=-=-= forkCount 3 -> 2. [/pom.xml] =-=-=-=-=-=-=-= = in Review, awaiting Merge Conflicts: pom.xml commit 3e155dc1345118a997de66b338adead8328517bd Author: dbarclay dbarc...@maprtech.com Date: 2015-06-20T02:05:39Z DRILL-3151: Fix many ResultSetMetaData method return values. Added ~unit test for ResultSetMetaData implementation. Made getObject return classes available to implementation of getColumnClassName: - Added SqlAccessor.getObjectClass() (to put that metadata right next to the code to which it corresponds rather than in far-away parallel code). - Added similar AvaticaDrillSqlAccessor.getObjectClass(). - Changed DrillAccessorList.accessors from Accessor[] to AvaticaDrillSqlAccessor[] for better access to JDBC getObject return class. - Extracted return classes from accessors to pass to updateColumnMetaData. Reworked some data type mapping and utilities: - Added Types.getSqlTypeName(...). - Renamed Types.getJdbcType(...) to getJdbcTypeCode(...). - Replaced Types.isUnSigned with isJdbcSignedType. - Fixed various bogus RPC-type XXX -> java.sql.Types.SMALLINT mappings. - Removed DrillColumnMetaDataList.getJdbcTypeName. - Moved getAvaticaType up (for bottom-up order). - Revised DrillColumnMetaDataList.getAvaticaType(...). MAIN: - Updated updateColumnMetaData(...) to change many calculations of metadata input to ColumnMetaData construction. [DrillColumnMetaDataList] Updated other metadata tests per changes. commit ceae4668f250672391e51c84d9b4e295a4c0f4a5 Author: dbarclay dbarc...@maprtech.com Date: 2015-07-02T22:04:31Z DRILL-3483: Clarify CommonConstants' constants. [CommonConstants, DrillConfig, PathScanner] Renamed constants. Documented constants. Removed extraneous public static final (redundant and abnormal since in an interface). commit 03dafaca207b45c01519e22ceeb0dd2784db18d5 Author: dbarclay dbarc...@maprtech.com Date: 2015-04-17T20:09:59Z DRILL-2696: Test for future DRILL-2696 fix (currently disabled with @Ignore). commit 73183a415ac765863ade2a632799f328fcddad74 Author: dbarclay dbarc...@maprtech.com Date: 2015-04-17T23:27:46Z DRILL-2815: Some PathScanner logging, misc. cleanup. Added some DEBUG-level log calls; augmented and edited log messages. Misc. code hygiene: - Added method doc comment. - Renamed to clarify a number of names. - Added final. - Fixed indentation; wrapped some long lines. commit c68cc505c03b06b2db3d18eef7bce965408efb15 Author: dbarclay dbarc...@maprtech.com Date: 2015-03-29T21:46:47Z temp: logging adjustments. [exec/java-exec/src/test/resources/logback.xml] commit e48f5267c6c1b27cd2fa500a8062876a61e0f6d2 Author: dbarclay dbarc...@maprtech.com Date: 2015-03-29T21:47:11Z temp: logging adjustments.
[exec/jdbc/src/test/resources/logback.xml] commit a0e9cb671e08ded93e571fa096fa5c38f765ed85 Author: dbarclay dbarc...@maprtech.com Date: 2015-04-06T18:11:13Z temp: logging adjustments. [common/src/test/resources/logback.xml] commit 321b9d09da0973e8da48f81a94c64b3503602321 Author: dbarclay dbarc...@maprtech.com Date: 2015-05-10T19:33:21Z temp: logging adjustments. [distribution/src/resources/logback.xml] commit d7da49862c13144c29d78d49c9d4605952f436cf Author: dbarclay dbarc...@maprtech.com Date: 2015-05-14T22:40:06Z temp: logging adjustments (probably TEMP). [exec/jdbc/src/test/resources/logback.xml] commit ae797565156fbaefa795ca22d75631763bd4cfe6 Author: dbarclay dbarc...@maprtech.com Date: 2015-05-15T03:54:51Z temp: logging adjustments. [jdbc/src/test/resources/logback.xml] commit 1cd642ad80311850c4df1f76099a05c78b8a6a31 Author: dbarclay dbarc...@maprtech.com Date: 2015-04-17T20:54:22Z : Logging: Added calls. [DrillConfig] commit 11a72689afc4e3c86cc1dcf6fbd5213040cf444e Author: dbarclay dbarc...@maprtech.com Date: 2015-05-30T18:11:13Z temp: logging adjustments. [jdbc/src/test/resources/logback.xml]
[GitHub] drill pull request: Bugs/various work re master 07 09
Github user dsbos closed the pull request at: https://github.com/apache/drill/pull/87
Re: Review Request 36278: DRILL-3189: Disable DISALLOW PARTIAL in window function grammar
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36278/ --- (Updated July 10, 2015, 11:33 p.m.) Review request for drill, Aman Sinha and Jinfeng Ni. Changes --- new patch Bugs: DRILL-3189 https://issues.apache.org/jira/browse/DRILL-3189 Repository: drill-git Description --- Disable disallow partial in Over-Clause Diffs (updated) - exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java 9bbd537 exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 7071bea Diff: https://reviews.apache.org/r/36278/diff/ Testing --- All requested Thanks, Sean Hsuan-Yi Chu
[GitHub] drill pull request: DRILL-2650: Mark query end time when closing t...
Github user cwestin commented on the pull request: https://github.com/apache/drill/pull/80#issuecomment-120554910 Do we want to stick with .info() for these messages instead of .debug()? I'm asking because I'm not sure, but it seems like noise. Otherwise, non-binding ship it.
[GitHub] drill pull request: DRILL-2650: Mark query end time when closing t...
Github user cwestin commented on a diff in the pull request: https://github.com/apache/drill/pull/80#discussion_r34406523 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java --- @@ -789,7 +794,7 @@ protected void processEvent(final StateEvent event) { final Exception exception = event.exception; // TODO Auto-generated method stub - logger.info("State change requested. {} --> {}", state, newState, + logger.info(queryIdString + ": State change requested {} --> {}", state, newState, --- End diff -- .info() -> .debug()?
[GitHub] drill pull request: DRILL-2650: Mark query end time when closing t...
Github user cwestin commented on a diff in the pull request: https://github.com/apache/drill/pull/80#discussion_r34406507 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java --- @@ -695,7 +697,10 @@ public void close() { Preconditions.checkState(!isClosed); Preconditions.checkState(resultState != null); - logger.info("foreman cleaning up."); + // to track how long the query takes + queryManager.markEndTime(); + + logger.info(queryIdString + ": cleaning up."); --- End diff -- .info() seems like it will generate a lot of noise here, should it be .debug()?
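For context on the .info()-vs-.debug() question: Drill logs through SLF4J (visible in the {} placeholders above), so demoting per-query lifecycle messages is a one-word change, and the parameterized form keeps the demoted call cheap. A minimal sketch; the class and method names here are hypothetical, not Foreman's actual shape:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueryLifecycleLogging {
  private static final Logger logger = LoggerFactory.getLogger(QueryLifecycleLogging.class);

  void onStateChange(String queryIdString, Object state, Object newState) {
    // DEBUG keeps per-query chatter out of production logs; the {} placeholders
    // mean no string formatting happens unless DEBUG is actually enabled.
    logger.debug("{}: State change requested {} --> {}", queryIdString, state, newState);
  }
}
```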
Re: Threads left after Drillbit shutdown (in dev./unit tests)
Is there any way to reproduce this at a smaller scale? Have you tried failing a couple of tests and dumping threads? -Hanifi On Fri, Jul 10, 2015 at 1:10 PM, Daniel Barclay dbarc...@maprtech.com wrote: Is Drill terminating threads correctly? [rest of the quoted original trimmed; see the full "Threads left after Drillbit shutdown (in dev./unit tests)" post below]
Re: Hash Agg vs Streaming Agg for a smaller data set
That could be the reason: in the first query we are scanning 64000 records, and in the second case just 108 records. Thanks for the replies! On Fri, Jul 10, 2015 at 4:48 PM, Jinfeng Ni jinfengn...@gmail.com wrote: [quoted reply trimmed; see Jinfeng Ni's message below]
Re: Review Request 36278: DRILL-3189: Disable DISALLOW PARTIAL in window function grammar
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36278/#review91375 --- Ship it! Ship It! - Aman Sinha On July 10, 2015, 11:33 p.m., Sean Hsuan-Yi Chu wrote: [quoted review request trimmed; see the updated review request above]
Re: Hash Agg vs Streaming Agg for a smaller data set
My guess is that in the second query, the size of the dataset is smaller, and this causes the cost of sorting to be small enough that it is cheaper than the HashAgg. On Fri, Jul 10, 2015 at 4:27 PM, rahul challapalli challapallira...@gmail.com wrote: [original question trimmed; quoted in full in Jinfeng Ni's reply below] -- Steven Phillips Software Engineer mapr.com
Re: Hash Agg vs Streaming Agg for a smaller data set
I'm not clear which column is the partitioning column. From what you described, the row count into the aggregator in the first case is larger than in the second case, since the former requires a full table scan. Cost-wise, hash-agg makes more sense when the input is larger, since streaming-agg requires a sort, which can be expensive for a large dataset. My guess is that the difference in row counts between the two cases causes the difference in the query plan. One suggestion: if you want to check query plans, it makes more sense to try with reasonably large data. Drill's costing model is not fully calibrated yet; a small dataset like tpch_0.0.1 might make it hard for the cost model to pick the right plan. On the other hand, if the dataset is small, two different plans normally would not make a big difference in terms of performance. In other words, use a large dataset if you are interested in performance testing / plan verification. On Fri, Jul 10, 2015 at 4:27 PM, rahul challapalli challapallira...@gmail.com wrote: Hi, Info about Data : The data is auto partitioned tpch 0.01 data. The second filter is a non-partitioned column, so in the first case the 'OR' predicate results in a full-table scan, while in the second case, partition pruning takes effect. The first case results in a hash agg and the second case in a streaming agg. Any idea why? 1. explain plan for select distinct l_modline, l_moddate from `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date '1992-01-01' or l_shipdate=date '1992-01-01'; +--+--+ | text | json | +--+--+ | 00-00 Screen 00-01 Project(l_modline=[$0], l_moddate=[$1]) 00-02 Project(l_modline=[$0], l_moddate=[$1]) 00-03 HashAgg(group=[{0, 1}]) 00-04 Project(l_modline=[$2], l_moddate=[$0]) 00-05 SelectionVectorRemover 00-06 Filter(condition=[OR(=($0, 1992-01-01), =($1, 1992-01-01))]) 00-07 Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0]) 00-08 Scan.. 2. explain plan for select distinct l_modline, l_moddate from `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date '1992-01-01' and l_shipdate=date '1992-01-01'; +--+--+ | text | json | +--+--+ | 00-00 Screen 00-01 Project(l_modline=[$0], l_moddate=[$1]) 00-02 Project(l_modline=[$0], l_moddate=[$1]) 00-03 StreamAgg(group=[{0, 1}]) 00-04 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-05 Project(l_modline=[$2], l_moddate=[$0]) 00-06 SelectionVectorRemover 00-07 Filter(condition=[AND(=($0, 1992-01-01), =($1, 1992-01-01))]) 00-08 Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0]) 00-09 Scan. - Rahul
Re: Recursive CTE Support in Drill
@Ted, the log-synth storage format would be really useful. I'm already seeing many unit tests that could benefit from this. Do you have a github repo for your ongoing work? Thanks! On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning ted.dunn...@gmail.com wrote: Are you hard set on using common table expressions? I have discussed a bit off-list creating a data format that would allow tables to be read from a log-synth [1] schema. That would let you read as much data as you might like with an arbitrarily complex (or simple) query. Operationally, you would create a file containing a log-synth schema that has the extension .synth. Your data source would have to be configured to connect that extension with the log-synth format. At that point, you could select as much or as little data as you like from the file and you would see generated data rather than the schema. [1] https://github.com/tdunning/log-synth On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei alexanderz.si...@gmail.com wrote: Hi All, I am trying to come up with a query which returns a given number of rows without having a real table on Storage. I am hoping to achieve something like this: http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table DECLARE @start INT = 1; DECLARE @end INT = 100; WITH numbers AS ( SELECT @start AS number UNION ALL SELECT number + 1 FROM numbers WHERE number < @end ) SELECT * FROM numbers OPTION (MAXRECURSION 0); I do not actually need to create different values and returning identical rows would work too. I just need to bypass the FROM clause in the query. Thanks, Alex -- Abdelhakim Deneche Software Engineer http://www.mapr.com/
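To make Ted's "operationally" paragraph concrete: a log-synth schema is a JSON list of field samplers, so a hypothetical test.synth file might look like the sketch below. The file name is made up, and the sampler names are from memory of the log-synth README; treat the specifics as assumptions.

```json
[
  {"name": "id",  "class": "id"},
  {"name": "qty", "class": "int", "min": 1, "max": 100}
]
```

A format plugin wired to the .synth extension would then return generated rows for a query like `select id, qty from dfs.tmp.`test.synth``, rather than the schema text itself.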
Re: Review Request 35808: DRILL-2862: Convert_to/Convert_From throw assertion when an incorrect encoding type is specified
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35808/ --- (Updated July 10, 2015, 6:40 p.m.) Review request for drill, Mehant Baid, Sudheesh Katkam, and Venki Korukanti. Changes --- Updated with review comments. Repository: drill-git Description --- DRILL-2862: Convert_to/Convert_From throw assertion when an incorrect encoding type is specified or if the encoding type is not a string literal. Instead of an assertion when user input is wrong, we now throw an exception with the appropriate error message. For the case where the user types in a type name incorrectly, the error message also provides a helpful suggestion. The suggested name is selected from the list of available functions. For example: select convert_from(foo, 'UTF') from dfs.`/table_foo` will print the following error: Error: UNSUPPORTED_OPERATION ERROR: CONVERT_FROM does not support conversion from type 'UTF'. Did you mean UTF8? [Error Id: 87ed2941-f9c2-4c35-8ff2-a3f21eae1104 on localhost:31010] (state=,code=0) Diffs (updated) - exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/PreProcessLogicalRel.java 0f8e45a exec/java-exec/src/main/java/org/apache/drill/exec/util/ApproximateStringMatcher.java PRE-CREATION exec/java-exec/src/test/java/org/apache/drill/exec/util/TestApproximateStringMatcher.java PRE-CREATION Diff: https://reviews.apache.org/r/35808/diff/ Testing --- All regression tests Thanks, Parth Chandra
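The review above doesn't show how the "Did you mean UTF8?" suggestion is picked; a minimal sketch of the nearest-name idea under the obvious assumption (smallest edit distance over the registered names wins). The class and method names here are hypothetical, not the patch's ApproximateStringMatcher API:

```java
import java.util.List;

public class ApproximateNameMatcher {
  // Classic Levenshtein edit distance via dynamic programming.
  static int editDistance(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                           d[i - 1][j - 1] + cost);
      }
    }
    return d[a.length()][b.length()];
  }

  // Returns the candidate closest to the unrecognized input,
  // e.g. suggestName("UTF", ["UTF8", "UTF16", "JSON"]) -> "UTF8".
  static String suggestName(String input, List<String> candidates) {
    String best = null;
    int bestDist = Integer.MAX_VALUE;
    for (String c : candidates) {
      int dist = editDistance(input.toUpperCase(), c.toUpperCase());
      if (dist < bestDist) { bestDist = dist; best = c; }
    }
    return best;
  }
}
```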
Re: Recursive CTE Support in Drill
Yeah, we still lack documentation on how to write a storage plugin. One piece of advice I've been seeing a lot is to take a look at the mongo-db plugin; it was basically added in one single commit: https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304 I think this will give some general ideas on what to expect when writing a storage plugin. On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning ted.dunn...@gmail.com wrote: [earlier messages in the thread trimmed; quoted in full above] -- Abdelhakim Deneche Software Engineer http://www.mapr.com/
Re: Recursive CTE Support in Drill
I don't think we need a full-on storage plugin. I think a data format should be sufficient, basically CSV on steroids. On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: [earlier messages in the thread trimmed; quoted in full above]
Re: Recursive CTE Support in Drill
Creating an EasyFormatPlugin is pretty simple. They were designed to get rid of much of the scaffolding required for a standard FormatPlugin. JSON https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json Text https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text AVRO https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/avro In all cases, the connection code is pretty light. A fully schematized format like log-synth should be even simpler to implement. On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning ted.dunn...@gmail.com wrote: [earlier messages in the thread trimmed; quoted in full above]
Re: Recursive CTE Support in Drill
Hakim, Not yet. Still very much in the stage of gathering feedback. I would think it very simple. The biggest obstacles are 1) no documentation on how to write a data format 2) I need to release a jar for log-synth to Maven Central. On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: [earlier messages in the thread trimmed; quoted in full above]
Drill calcite tracing issue
(If there's a better target for an issue request, please let me know!) While trying to understand the details of Calcite rule execution, I turned on Calcite tracing per https://calcite.incubator.apache.org/docs/howto.html#tracing . At that point (running a query from the web UI) I get the error Query Failed: An Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: AssertionError: Internal error: should never get here ... The query is pretty straightforward: select Person, sum(Qty1) from mongo.mine.test group by Person A simple trial with the same settings against the Apache Calcite example/csv doesn't show the same behavior. (Calcite query: SELECT DEPTNO, SUM(EMPNO) FROM emps GROUP BY DEPTNO ; ) In Drill, I'm querying against a Mongo db, but the code path in the exception trace doesn't immediately appear to be relevant for that. It seems to happen the very first time dumpGraph is called. The innermost cause (in all its glory) is due to the method: @Override public RelOptCost computeSelfCost(RelOptPlanner planner) { // HepRelMetadataProvider is supposed to intercept this // and redirect to the real rels. throw Util.newInternal("should never get here"); } and the trace is: cause { exception_class: java.lang.AssertionError message: Internal error: should never get here stack_trace { class_name: org.apache.calcite.util.Util file_name: Util.java line_number: 775 method_name: newInternal is_native_method: false } stack_trace { class_name: org.apache.calcite.plan.hep.HepRelVertex file_name: HepRelVertex.java line_number: 68 method_name: computeSelfCost is_native_method: false } stack_trace { class_name: org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows file_name: RelMdPercentageOriginalRows.java line_number: 165 method_name: getNonCumulativeCost is_native_method: false } stack_trace { class_name: ... line_number: 0 method_name: ... is_native_method: false } stack_trace { class_name: org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1 file_name: ReflectiveRelMetadataProvider.java line_number: 194 method_name: invoke is_native_method: false } stack_trace { class_name: ... line_number: 0 method_name: ... is_native_method: false } stack_trace { class_name: org.apache.calcite.rel.metadata.RelMetadataQuery file_name: RelMetadataQuery.java line_number: 115 method_name: getNonCumulativeCost is_native_method: false } stack_trace { class_name: org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows file_name: RelMdPercentageOriginalRows.java line_number: 151 method_name: getCumulativeCost is_native_method: false } stack_trace { class_name: ... line_number: 0 method_name: ... is_native_method: false } stack_trace { class_name: org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1 file_name: ReflectiveRelMetadataProvider.java line_number: 194 method_name: invoke is_native_method: false } stack_trace { class_name: ... line_number: 0 method_name: ... is_native_method: false } stack_trace { class_name: org.apache.calcite.rel.metadata.RelMetadataQuery file_name: RelMetadataQuery.java line_number: 101 method_name: getCumulativeCost is_native_method: false } stack_trace { class_name: org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows file_name: RelMdPercentageOriginalRows.java line_number: 154 method_name: getCumulativeCost is_native_method: false } stack_trace { class_name: ... line_number: 0 method_name: ...
is_native_method: false } stack_trace { class_name: org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1 file_name: ReflectiveRelMetadataProvider.java line_number: 194 method_name: invoke
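For anyone reproducing this: per the Calcite howto linked above, planner tracing hangs off java.util.logging, so a scratch helper like the one below (a hypothetical class, equivalent to the properties-file approach the howto describes) is enough to trigger the dumpGraph path. If your build routes Calcite through SLF4J instead, set the same logger name to DEBUG/TRACE in logback.xml.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CalciteTraceSetup {
  public static void enablePlannerTrace() {
    // The logger name comes from the Calcite tracing howto; FINER/FINEST
    // is the level at which the planner starts dumping rule-firing graphs.
    Logger planner = Logger.getLogger("org.apache.calcite.plan.RelOptPlanner");
    planner.setLevel(Level.FINER);
    ConsoleHandler handler = new ConsoleHandler();
    handler.setLevel(Level.FINER);
    planner.addHandler(handler);
  }
}
```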
Re: Review Request 34374: DRILL-3133: MergingRecordBatch can leak memory if query is canceled before batches in rawBatches were loaded
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34374/ --- (Updated July 10, 2015, 7:33 p.m.) Review request for drill and Steven Phillips. Changes --- addressing review comments. Bugs: DRILL-3133 https://issues.apache.org/jira/browse/DRILL-3133 Repository: drill-git Description --- MergingRecordBatch stores batches in an array list before loading them with RecordBatchLoader. If the query is canceled before all received batches are loaded, some of the batches won't be cleaned up. lines 307 and 339 contain questions to the reviewers. I will update the patch accordingly Diffs (updated) - exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/mergereceiver/MergingRecordBatch.java 3ca11f1 Diff: https://reviews.apache.org/r/34374/diff/ Testing --- all unit tests are passing along with functional and tpch100 Thanks, abdelhakim deneche
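The description doesn't include the fix itself; below is a sketch of the cleanup pattern it implies — on an early close, release every received-but-not-yet-loaded batch. RawFragmentBatch and release() follow Drill's actual names, but the rawBatches field and this close() shape are assumptions, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.drill.exec.record.RawFragmentBatch;

class MergingCleanupSketch {
  // Batches received from senders but not yet loaded by RecordBatchLoader.
  private final List<RawFragmentBatch> rawBatches = new ArrayList<>();

  // Called on cancel/close: without this, the direct-memory buffers backing
  // any still-queued batches would never be returned to the allocator.
  public void close() {
    for (RawFragmentBatch batch : rawBatches) {
      if (batch != null) {
        batch.release();
      }
    }
    rawBatches.clear();
  }
}
```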
Re: Recursive CTE Support in Drill
It may be easy, but it is completely opaque about what really needs to happen. For instance, 1) how is schema exposed? 2) which classes do I really need to implement? 3) how do I express partitioning of a format? 4) how do I test it? Just a bit of documentation and comments would go a very, very long way. Even answers on the mailing list that have more details than "oh, that's easy" would help. I would be happy to transcribe answers into the code if I could just get some. On Fri, Jul 10, 2015 at 11:04 AM, Jacques Nadeau jacq...@apache.org wrote: [earlier messages in the thread trimmed; quoted in full above]
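On question 2, the plugins Jacques linked suggest the entry point is a config class plus an EasyFormatPlugin subclass. Here is a minimal sketch of just the config half, with hypothetical names (SynthFormatConfig, the "synth" type name) and only the pieces I'm confident exist in Drill's API (the FormatPluginConfig marker interface and Jackson's @JsonTypeName):

```java
import java.util.List;
import java.util.Objects;

import com.fasterxml.jackson.annotation.JsonTypeName;
import org.apache.drill.common.logical.FormatPluginConfig;

// Hypothetical config for a ".synth" format; the real contract is whatever
// EasyFormatPlugin's constructor expects (see the JSON/Text plugins above).
@JsonTypeName("synth")
public class SynthFormatConfig implements FormatPluginConfig {
  public List<String> extensions;  // e.g. ["synth"], mapping files to this format

  @Override
  public boolean equals(Object o) {
    return o instanceof SynthFormatConfig
        && Objects.equals(extensions, ((SynthFormatConfig) o).extensions);
  }

  @Override
  public int hashCode() {
    return Objects.hash(extensions);
  }
}
```

Registration would then presumably map the .synth extension to this type name in the storage plugin's format configuration, per Ted's description above.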
Threads left after Drillbit shutdown (in dev./unit tests)
Is Drill terminating threads correctly? In running jstack on a JVM running a dev. test run that ended up hung after getting about three test timeout errors, I see that there are 409 threads. Although 138 of those are not-unexpected ShutdownHook threads (since many tests are run in one VM), there are: - 138 WorkManager.StatusThread threads (hmm 138 again) - 7 Client-1 threads - 4 UserServer-1 threads - 21 BitClient-1 threads - 4 BitClient-2 threads - 3 BitClient-3 threads - 8 BitServer-1 threads - 8 BitServer-2 threads - 7 BitServer-3 threads - 7 BitServer-4 threads - 7 BitServer-5 threads - 6 BitServer-6 threads - 6 BitServer-7 threads - 6 BitServer-8 threads - 5 BitServer-9 threads - 5 BitServer-10 threads (Other thread names have only 1 or 2 occurrences.) Regarding the 4 for the number of UserServer-1 threads: Three test methods had timeout failures plus one got hung. Here's the tail end of the output from the test running, including all the timeout errors and including the hang (except for repeated query-results data lines). dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install TRIMMED Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun Running org.apache.drill.exec.physical.impl.TestBroadcastExchange Running org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans 00:44:34.017 [globalEventExecutor-1-523] ERROR o.a.z.server.NIOServerCnxnFactory - Thread Thread[globalEventExecutor-1-523,5,main] died java.lang.AssertionError: null at io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83) ~[netty-common-4.0.27.Final.jar:4.0.27.Final] at io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110) ~[netty-common-4.0.27.Final.jar:4.0.27.Final] at io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95) ~[netty-common-4.0.27.Final.jar:4.0.27.Final] at io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226) ~[netty-common-4.0.27.Final.jar:4.0.27.Final] at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-common-4.0.27.Final.jar:4.0.27.Final] at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72] Running org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange 1 Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599 sec FAILURE! - in org.apache.drill.exec.physical.impl.TestBroadcastExchange TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange) Time elapsed: 50.063 sec ERROR! 
java.lang.Exception: test timed out after 5 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:503) at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254) at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32) at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31) at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218) at com.google.common.io.Closeables.close(Closeables.java:77) at com.google.common.io.Closeables.closeQuietly(Closeables.java:108) at org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70) at com.google.common.io.Closeables.close(Closeables.java:77) at com.google.common.io.Closeables.closeQuietly(Closeables.java:108) at org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88) at com.google.common.io.Closeables.close(Closeables.java:77) at com.google.common.io.Closeables.closeQuietly(Closeables.java:108) at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288) at org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62) TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange) Time elapsed: 50.014 sec ERROR! java.lang.Exception: test timed out after 5 milliseconds at java.lang.Object.wait(Native Method) at
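A quick way to get per-pool counts like the ones above from inside a hung JVM (or a test teardown hook) without jstack, using only plain JDK calls and grouping thread names by their non-numeric prefix:

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {
  public static void main(String[] args) {
    // Count live threads by name prefix: "BitServer-3" and "BitServer-10"
    // both land under "BitServer-", matching the tallies in the post above.
    Map<String, Integer> counts = new TreeMap<String, Integer>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      String prefix = t.getName().replaceAll("\\d+$", "");
      Integer n = counts.get(prefix);
      counts.put(prefix, n == null ? 1 : n + 1);
    }
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.println(e.getValue() + "\t" + e.getKey());
    }
  }
}
```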
[GitHub] drill pull request: DRILL-3483: Clarify CommonConstants' constants...
GitHub user dsbos opened a pull request: https://github.com/apache/drill/pull/88 DRILL-3483: Clarify CommonConstants' constants. Renamed constants. Documented constants. Removed extraneous public static final (redundant and abnormal since in an interface). You can merge this pull request into a Git repository by running: $ git pull https://github.com/dsbos/incubator-drill bugs/drill-3483-CommonConstants-clarification Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/88.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #88 commit 6e39eb7ca428d0d0baeeb29f50ed25b1a8872685 Author: dbarclay dbarc...@maprtech.com Date: 2015-07-02T22:04:31Z DRILL-3483: Clarify CommonConstants' constants. [CommonConstants, DrillConfig, PathScanner] Renamed constants. Documented constants. Removed extraneous public static final (redundant and abnormal since in an interface).
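Why those modifiers were redundant: in a Java interface, every field is implicitly public static final, so the two declarations below are identical. This is a toy interface; the constant names and values are made up, not CommonConstants' actual ones:

```java
public interface ToyConstants {
  // Implicitly public static final -- the compiler adds the modifiers.
  String MARKER_FILE_NAME = "drill-module.conf";

  // Same meaning, just noisier; this is the style the commit removes.
  public static final String MARKER_FILE_NAME_VERBOSE = "drill-module.conf";
}
```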
Re: Drill calcite tracing issue
@Tim, AFAIK, the HepPlanner was introduced for window functions, because otherwise the VolcanoPlanner would hit CanNotPlan issues at that time. @George, it would be hard to modify the code path in a production system, unless you apply a temporary patch to do that, assuming you do not use window functions in your production system at all. Otherwise, you may consider waiting for 1.2.0, for which we will try to get a patch to fix this issue. In general, the tracing feature is mainly used by developers for debugging issues related to Calcite. On Fri, Jul 10, 2015 at 1:31 PM, George Spofford geospoff...@gmail.com wrote: [quoted thread trimmed; see the messages below]
Re: Drill calcite tracing issue
DRILL-3156 was filed to track the Calcite trace issue [1]. Basically, the HepPlanner used for window-function planning causes the tracing issue. I have a prototype patch to fix it, and I'll try to get it ready for the 1.2.0 release. As a workaround, if you do not use window functions, you may consider skipping the HepPlanner findBestExp() call in DefaultSqlHandler.convertToRel(SqlNode node) [2].

1. https://issues.apache.org/jira/browse/DRILL-3156
2. https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java#L451

On Fri, Jul 10, 2015 at 9:19 AM, George Spofford geospoff...@gmail.com wrote:

(If there's a better target for an issue request, please let me know!)

While trying to understand the details of Calcite rule execution, I turned on Calcite tracing per https://calcite.incubator.apache.org/docs/howto.html#tracing . At that point (running a query from the web UI) I get the error:

  Query Failed: An Error Occurred
  org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
  AssertionError: Internal error: should never get here ...

The query is pretty straightforward:

  select Person, sum(Qty1) from mongo.mine.test group by Person

A simple trial with the same settings against the Apache Calcite example/csv doesn't show the same behavior. (Calcite query: SELECT DEPTNO, SUM(EMPNO) FROM emps GROUP BY DEPTNO ;) In Drill I'm querying against a Mongo DB, but the code path in the exception trace doesn't immediately appear to be relevant to that. It seems to happen the very first time dumpGraph is called.

The innermost cause (in all its glory) is due to the method:

  @Override
  public RelOptCost computeSelfCost(RelOptPlanner planner) {
    // HepRelMetadataProvider is supposed to intercept this
    // and redirect to the real rels.
    throw Util.newInternal("should never get here");
  }

and the trace is:

  java.lang.AssertionError: Internal error: should never get here
      at org.apache.calcite.util.Util.newInternal(Util.java:775)
      at org.apache.calcite.plan.hep.HepRelVertex.computeSelfCost(HepRelVertex.java:68)
      at org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:165)
      ...
      at org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1.invoke(ReflectiveRelMetadataProvider.java:194)
      ...
      at org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:115)
      at org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getCumulativeCost(RelMdPercentageOriginalRows.java:151)
      ...
      at org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1.invoke(ReflectiveRelMetadataProvider.java:194)
      ...
      at org.apache.calcite.rel.metadata.RelMetadataQuery (RelMetadataQuery.java:101) ...
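(For anyone trying the workaround: below is a minimal sketch of what guarding that call might look like. This is not the actual Drill code; the disableHepPlanner flag, the exact shape of convertToRel, and the choice of ProjectToWindowRule.PROJECT as the window rewrite wired into the Hep program are all assumptions for illustration.)

  import org.apache.calcite.plan.hep.HepPlanner;
  import org.apache.calcite.plan.hep.HepProgramBuilder;
  import org.apache.calcite.rel.RelNode;
  import org.apache.calcite.rel.rules.ProjectToWindowRule;
  import org.apache.calcite.sql.SqlNode;
  import org.apache.calcite.tools.RelConversionException;

  protected RelNode convertToRel(SqlNode node) throws RelConversionException {
    final RelNode convertedNode = planner.convert(node);
    // Hypothetical escape hatch: skip the Hep pass entirely. Per the
    // DRILL-3156 discussion this is only safe when the query contains
    // no window functions.
    if (disableHepPlanner) {
      return convertedNode;
    }
    // The Hep pass that rewrites projects over windowed aggregates;
    // findBestExp() is the call the workaround bypasses.
    final HepProgramBuilder builder = new HepProgramBuilder();
    builder.addRuleInstance(ProjectToWindowRule.PROJECT);
    final HepPlanner hepPlanner = new HepPlanner(builder.build());
    hepPlanner.setRoot(convertedNode);
    return hepPlanner.findBestExp();
  }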
Re: Drill calcite tracing issue
Thanks, I will see if I can influence the code path here. I'm not sure how I can skip the HepPlanner findBestExp() call in the production code path, but I'm in my early days here. Can I suggest that, for at least a smattering of query forms, FINER and FINEST tracing be included in integration tests, if not also unit tests?

On Fri, Jul 10, 2015 at 1:22 PM, Jinfeng Ni jinfengn...@gmail.com wrote:
> [quoted text elided; duplicated in full in the message above]
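(A side note on exercising such tracing in test runs: Calcite's tracing rides on ordinary loggers, so in a logback-based build like Drill's the FINER/FINEST equivalent is roughly the snippet below. This is a sketch: it assumes a Calcite build that routes logging through slf4j; older Calcite versions log via java.util.logging directly, where the analogous setting lives in a logging.properties file. The logger name is the one the Calcite tracing howto describes.)

  <configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
      <encoder>
        <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
      </encoder>
    </appender>
    <!-- Planner/rule tracing; TRACE corresponds roughly to j.u.l. FINEST. -->
    <logger name="org.apache.calcite.plan.RelOptPlanner" level="TRACE"/>
    <root level="INFO">
      <appender-ref ref="STDOUT"/>
    </root>
  </configuration>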
Re: Drill calcite tracing issue
I remember introducing the HepPlanner when I worked on window functions; it's probably a good idea to comment on why we actually need it, since I can't recall the exact reasons now.

Tim

On Fri, Jul 10, 2015 at 1:22 PM, Jinfeng Ni jinfengn...@gmail.com wrote:
> [quoted text elided; duplicated in full in the messages above]
[GitHub] drill pull request: DRILL-3483: Clarify CommonConstants' constants...
Github user jaltekruse commented on the pull request: https://github.com/apache/drill/pull/88#issuecomment-120540926

+1
Hash Agg vs Streaming Agg for a smaller data set
Hi,

Info about the data: it is auto-partitioned TPC-H scale 0.01 data. The second filter is on a non-partitioned column, so in the first case the OR predicate results in a full-table scan, while in the second case partition pruning takes effect. The first case results in a hash agg and the second in a streaming agg. Any idea why?

1. explain plan for select distinct l_modline, l_moddate from `tpch_multiple_partitions/lineitem_twopart` where l_moddate = date '1992-01-01' or l_shipdate = date '1992-01-01';

  00-00  Screen
  00-01    Project(l_modline=[$0], l_moddate=[$1])
  00-02      Project(l_modline=[$0], l_moddate=[$1])
  00-03        HashAgg(group=[{0, 1}])
  00-04          Project(l_modline=[$2], l_moddate=[$0])
  00-05            SelectionVectorRemover
  00-06              Filter(condition=[OR(=($0, 1992-01-01), =($1, 1992-01-01))])
  00-07                Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0])
  00-08                  Scan..

2. explain plan for select distinct l_modline, l_moddate from `tpch_multiple_partitions/lineitem_twopart` where l_moddate = date '1992-01-01' and l_shipdate = date '1992-01-01';

  00-00  Screen
  00-01    Project(l_modline=[$0], l_moddate=[$1])
  00-02      Project(l_modline=[$0], l_moddate=[$1])
  00-03        StreamAgg(group=[{0, 1}])
  00-04          Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
  00-05            Project(l_modline=[$2], l_moddate=[$0])
  00-06              SelectionVectorRemover
  00-07                Filter(condition=[AND(=($0, 1992-01-01), =($1, 1992-01-01))])
  00-08                  Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0])
  00-09                    Scan.

- Rahul
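(One way to probe a plan difference like this, sketched below using Drill's standard planner session options, is to toggle the aggregation strategy and re-run the explain; `planner.enable_hashagg` and `planner.enable_streamagg` are the relevant options, though the plan you get back still depends on the data and on whether pruning applies.)

  -- Force the sort + streaming-aggregate path by disabling hash agg:
  alter session set `planner.enable_hashagg` = false;

  explain plan for
  select distinct l_modline, l_moddate
  from `tpch_multiple_partitions/lineitem_twopart`
  where l_moddate = date '1992-01-01' or l_shipdate = date '1992-01-01';

  -- Restore the default before further experiments:
  alter session set `planner.enable_hashagg` = true;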