Re: [DISCUSSION] How can we improve the performance of Window Functions
Can you give us some data on what the current performance looks like, vs. what you would expect? Are we spending most of the time in the sort, or in the Window function operator?
On Thu, Jun 11, 2015 at 10:55 PM, Ted Dunning ted.dunn...@gmail.com wrote:
Speed in many such loops depends a lot on how the loops are ordered so that cache and registers can be re-used. I have no idea what will make your windowing functions fast, but I can say some things about what makes matrix math fast. The key with matrix multiplication is that there are n^3/2 operations to do on n^2 elements. The minimum number of memory operations is n^2, which sounds good because modern CPUs can often do hundreds of operations per main memory access. This also means that if we code a naive implementation, we will generally be memory bound because that will increase the number of memory operations to k n^3. To avoid that, the loops involved can be restructured so that larger and larger blocks of data are used. At the lowest levels, small blocks of 2 x 4 values or so are used to code the multiplication since all of these values can be kept in registers. At one step up, the computation is structured to only operate on elements that fit in the fastest level of cache, which is typically tens of kB in size.
Your loop looks like this: for (start = 0 ... end-n) { initialize() for (offset = 0 ... n-1) { aggregate(start + offset) } finalize() }
This arrangement is pretty cache friendly if n is small enough, but it seems that it could be even more friendly if you kept all of the aggregators at the ready and handed each sample to all of the aggregators before moving to the next position.
On Thu, Jun 11, 2015 at 3:55 PM, Abdel Hakim Deneche adene...@maprtech.com wrote:
Hi all, The purpose of this email is to describe how window functions are computed and to try to come up with better ways to do it.
DRILL-3200 https://issues.apache.org/jira/browse/DRILL-3200 added support for RANK, ROW_NUMBER, DENSE_RANK, PERCENT_RANK and CUME_DIST, but also made some significant improvements to the way Drill computes window functions. The general idea was to update the code to only support the default frame, which makes it run faster and use less memory.
WindowFrameRecordBatch works similarly to StreamingAggregate: it requires the data to be sorted on the partition and order by columns and only computes one frame at a time. With the default frame we need to aggregate every row only once. Memory consumption depends on the data, but in general each record batch is kept in memory until we are ready to process all its rows (which is possible when we find the last peer row of the batch's last row). Drill's external sort can spill to disk if the data is too big, and we only need to keep at most one partition's worth of data in memory for the window functions to be computed (when the over clause doesn't contain an order by).
Each time a batch is ready to be processed we do the following:
1- we start with its first row (current row)
2- we compute the length of the current row's frame (in this case we find the number of peer rows for the current row)
3- we aggregate (this includes computing the window function values) all rows of the current frame
4- we write the aggregated value in each row of the current frame
5- we then move to the 1st non peer row, which becomes the current row
6- if we didn't reach the end of the current batch, go back to 2
With all this in mind, how can we improve the performance of window functions?
Thanks!
-- Abdelhakim Deneche Software Engineer http://www.mapr.com/
-- Steven Phillips Software Engineer mapr.com
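The six steps described in this thread can be sketched roughly as follows. This is only an illustration, not Drill's WindowFrameRecordBatch code: the class name, the array-based row layout and the choice of SUM as the window function are all assumptions made for the example, and it assumes the running total carries over between peer groups of the same partition (the usual behaviour of the default frame when an order by is present). It processes a single in-memory batch that is already sorted on the partition and order by columns.

```java
import java.util.Arrays;

// Hypothetical sketch: one sorted, in-memory batch, SUM(value) OVER(PARTITION BY p ORDER BY o).
class DefaultFrameSketch {

    static long[] runningSum(int[] partition, int[] orderKey, long[] value) {
        int n = value.length;
        long[] out = new long[n];
        long sum = 0;
        int current = 0;                                  // step 1: start with the first row
        while (current < n) {                             // step 6: loop until the batch is done
            if (current == 0 || partition[current] != partition[current - 1]) {
                sum = 0;                                  // new partition: reset the aggregator
            }
            // step 2: frame length = number of peer rows (same partition and same order key)
            int end = current + 1;
            while (end < n
                    && partition[end] == partition[current]
                    && orderKey[end] == orderKey[current]) {
                end++;
            }
            // step 3: aggregate every row of the frame exactly once
            for (int i = current; i < end; i++) {
                sum += value[i];
            }
            // step 4: write the aggregated value in each row of the frame
            for (int i = current; i < end; i++) {
                out[i] = sum;
            }
            current = end;                                // step 5: first non-peer row
        }
        return out;
    }

    public static void main(String[] args) {
        int[] partition = {1, 1, 1, 2, 2};
        int[] orderKey  = {10, 10, 20, 5, 6};
        long[] value    = {1, 2, 3, 4, 5};
        // prints [3, 3, 6, 4, 9]
        System.out.println(Arrays.toString(runningSum(partition, orderKey, value)));
    }
}
```

Each input row is read once in step 3 and written once in step 4, so the work is linear in the batch size; the peer scan in step 2 is where a batch has to wait for later data when the last peer row has not arrived yet.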
third-part storage-plugin development
Hi, I want to develop a third-party storage plugin to implement a specific function for our research, but the Drill source code does not document its classes and functions very clearly, so I would like some advice on how to develop the storage plugin quickly, such as how a storage plugin links to drill-common and drill-exec. I will be grateful for your advice.
Re: Review Request 35235: DRILL-3262: Rename DrillDatabaseMetaData -> DrillDatabaseMetaDataImpl.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35235/ --- (Updated June 12, 2015, 10:50 p.m.) Review request for drill, Mehant Baid and Parth Chandra. Changes --- Fixed bad patch uploading. Bugs: DRILL-3262 https://issues.apache.org/jira/browse/DRILL-3262 Repository: drill-git Description --- Renamed org.apache.drill.jdbc.impl.DrillDatabaseMetaData to org.apache.drill.jdbc.impl.DrillDatabaseMetaDataImpl (re creating org.apache.drill.jdbc.DrillDatabaseMetaDataImpl for DRILL-3198). Diffs (updated) - exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaData.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillJdbc41Factory.java PRE-CREATION Diff: https://reviews.apache.org/r/35235/diff/ Testing --- Ran tests; no new failures. Thanks, Daniel Barclay
[jira] [Created] (DRILL-3289) Drill fails to find the xpath function in hive
Rahul Challapalli created DRILL-3289: Summary: Drill fails to find the xpath function in hive Key: DRILL-3289 URL: https://issues.apache.org/jira/browse/DRILL-3289 Project: Apache Drill Issue Type: Bug Components: Functions - Hive Reporter: Rahul Challapalli Assignee: Mehant Baid
git.commit.id.abbrev=48449fe
The below query, which works in hive, fails in drill
{code}
select xpath ('<a><b id="1"><c/></b><b id="2"><c/></b></a>','/descendant::c/ancestor::b/@id') from hive.hive_storage.dummy limit 1;
Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [xpath(VARCHAR-REQUIRED, VARCHAR-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: ec9b68fd-6720-41ca-9adc-c8b2d560a51c on qa-node190.qa.lab:31010] (state=,code=0)
{code}
Stack Trace :
{code}
org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [xpath(VARCHAR-REQUIRED, VARCHAR-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--..
at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:387) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:92) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:79) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:260) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at
[jira] [Created] (DRILL-3285) Split DrillCursor.next(), clean up DrillCursor for clarity
Daniel Barclay (Drill) created DRILL-3285: - Summary: Split DrillCursor.next(), clean up DrillCursor for clarity Key: DRILL-3285 URL: https://issues.apache.org/jira/browse/DRILL-3285 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3286) IN clause with null in it results in AssertionError: Error while applying rule DrillValuesRule
Khurram Faraaz created DRILL-3286: Summary: IN clause with null in it results in AssertionError: Error while applying rule DrillValuesRule Key: DRILL-3286 URL: https://issues.apache.org/jira/browse/DRILL-3286 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.0.0 Environment: 3bccec9110c7ff86fa3cf04baa81a1747e1f5b9e Reporter: Khurram Faraaz Assignee: Jinfeng Ni
When a query uses an IN clause and a null is specified as one of the values inside the IN clause, we see an UnsupportedOperationException and AssertionError: Internal error: Error while applying rule DrillValuesRule. Test was executed on a 4 node cluster on CentOS.
{code}
0: jdbc:drill:schema=dfs.tmp> select * from tblWnulls where c2 in ('a','b','c',null);
Error: SYSTEM ERROR: java.lang.UnsupportedOperationException: Unable to convert the value of null and type ANY to a Drill constant expression. [Error Id: ecd34f5c-ca9e-46a1-87bb-f7257b155de4 on centos-01.qa.lab:31010] (state=,code=0)
{code}
Data in the table (it is coming from a Parquet file)
{code}
0: jdbc:drill:schema=dfs.tmp> select c1, c2 from tblWnulls;
+-------------+-------+
| c1          | c2    |
+-------------+-------+
| 1           | a     |
| 2           | b     |
| 13          | c     |
| 4           | c     |
| 5           | a     |
| 6           | c     |
| null        | d     |
| 17          | b     |
| 8           | c     |
| 9           | b     |
| 10          | d     |
| 2147483647  | d     |
| 10          | a     |
| 11          | a     |
| null        | c     |
| 11          | d     |
| 12          | c     |
| 19          | null  |
| 13          | b     |
| 14          | a     |
| 13          | c     |
| 15          | e     |
| -1          | e     |
| 0           | a     |
| 2147483647  | d     |
| null        | d     |
| 65536       | null  |
| 100         | null  |
| null        | null  |
| 1           | a     |
+-------------+-------+
30 rows selected (0.169 seconds)
{code}
Stack trace from drillbit.log
{code}
[Error Id: ecd34f5c-ca9e-46a1-87bb-f7257b155de4 on centos-01.qa.lab:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: java.lang.UnsupportedOperationException: Unable to convert the value of null and type ANY to a Drill constant expression.
[Error Id: ecd34f5c-ca9e-46a1-87bb-f7257b155de4 on centos-01.qa.lab:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522) ~[drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:738) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:840) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:782) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) [drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:784) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:893) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:253) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillValuesRule, args [rel#5149:LogicalValues.NONE.ANY([]).[[0]](type=RecordType(ANY ROW_VALUE),tuples=[{ 'a' }, { 'b' }, { 'c' }, { null }])] ... 
4 common frames omitted Caused by: java.lang.AssertionError: Internal error: Error while applying rule DrillValuesRule, args [rel#5149:LogicalValues.NONE.ANY([]).[[0]](type=RecordType(ANY ROW_VALUE),tuples=[{ 'a' }, { 'b' }, { 'c' }, { null }])] at org.apache.calcite.util.Util.newInternal(Util.java:790) ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7] at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7] at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:795) ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
Re: Review Request 35329: DRILL-2997: Remove references to groupCount from SerializedField
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35329/#review87653 --- Ship it! LGTM once the protobuf id is fixed. protocol/src/main/protobuf/UserBitShared.proto https://reviews.apache.org/r/35329/#comment140061 To maintain backward compatibility, the numeric id of an existing field should not be changed. The id of the buffer_length field should remain 7. (See https://developers.google.com/protocol-buffers/docs/proto#updating) - Parth Chandra
On June 11, 2015, 6:55 p.m., Hanifi Gunes wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35329/ --- (Updated June 11, 2015, 6:55 p.m.) Review request for drill, Mehant Baid and Parth Chandra. Repository: drill-git Description --- DRILL-2997: Remove references to groupCount from SerializedField - Remove references to group count where applicable and adapt vectors to work with the changes. - Fix misc test cases RepeatedValueVectors - get rid of multiple #load methods and rely on load(metadata, buffer) regardless of the vector type.
BaseValueVector - all vector must have a field VariableLengthVectors - all vector must have a field so does offsets vector MapVector - refactor #load for better code readability and consistency RepeatedFixedWidthVectorLike RepeatedVariableWidthVectorLike - #load is un-needed now Diffs - exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 12dce2596a7d590d2d33c85fca5c47acb2495a25 exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java bd41e10d3f69e13d0f8c426460af5e9a09d93fd9 exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java de6f66598927bee5fe3afb1d1046cc20c136b424 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseValueVector.java ec409a3fc59616708226aa500ccab1680cd261f6 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java af364bd5487eb510b2ffc8219a37611b828136e5 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java f292e4c22f07a65036dbdae2cbe809e00758faf6 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java d0f38c2a397aac7eaad247c39b4b856c89c970a0 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java fb7ed2a975e095b1f49d3a24cd735b8c7551c7f1 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java f6d3d88ca32a6a3320ec317dbe99c0031f0d54c6 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java 4617ede4aaaebe36dbe9f50dccbff107351dc40f exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java c57143e24e2c60ba400c4f3fcfb9c012e5e25a89 exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/TestEmptyPopulation.java 06a73e22b981d5b8ae7289c171afa703eab91788 exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/fn/TestJsonReaderWithSparseFiles.java 544b962142e24f6c185ad91384d6cb270776acb3 
protocol/src/main/java/org/apache/drill/exec/proto/SchemaUserBitShared.java bee2a3dfac6325460e9902d901b72a6c72cb7d81 protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java 92afa4f4b6fc223fb9179887d808c3f2925f303b protocol/src/main/java/org/apache/drill/exec/proto/beans/SerializedField.java 699097a0ab45468b90ec8f5141108c844dbfadf1 protocol/src/main/protobuf/UserBitShared.proto 68c8612dadfc5cbe4a8157ec516ddf7246f5b956 Diff: https://reviews.apache.org/r/35329/diff/ Testing --- unit + regression Thanks, Hanifi Gunes
[jira] [Created] (DRILL-3287) Changing session level parameter back to the default value does not change it
Victoria Markman created DRILL-3287: Summary: Changing session level parameter back to the default value does not change it Key: DRILL-3287 URL: https://issues.apache.org/jira/browse/DRILL-3287 Project: Apache Drill Issue Type: Bug Reporter: Victoria Markman
Initial state:
{code}
0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
| name                             | kind    | type    | status  | num_val | string_val | bool_val | float_val |
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
| planner.enable_decimal_data_type | BOOLEAN | SYSTEM  | CHANGED | null    | null       | true     | null      |
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
1 row selected (0.247 seconds)
{code}
I changed a session parameter:
{code}
0: jdbc:drill:schema=dfs> alter session set `planner.enable_hashjoin` = false;
+-------+----------------------------------+
| ok    | summary                          |
+-------+----------------------------------+
| true  | planner.enable_hashjoin updated. |
+-------+----------------------------------+
1 row selected (0.1 seconds)
{code}
So far, so good: it appears on the changed options list:
{code}
0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
| name                             | kind    | type    | status  | num_val | string_val | bool_val | float_val |
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
| planner.enable_decimal_data_type | BOOLEAN | SYSTEM  | CHANGED | null    | null       | true     | null      |
| planner.enable_hashjoin          | BOOLEAN | SESSION | CHANGED | null    | null       | false    | null      |
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
2 rows selected (0.133 seconds)
{code}
I changed the session parameter back to its default value:
{code}
0: jdbc:drill:schema=dfs> alter session set `planner.enable_hashjoin` = true;
+-------+----------------------------------+
| ok    | summary                          |
+-------+----------------------------------+
| true  | planner.enable_hashjoin updated. |
+-------+----------------------------------+
1 row selected (0.096 seconds)
{code}
{color:red} It still appears on the changed list, even though it has its default value:{color}
{code}
0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
| name                             | kind    | type    | status  | num_val | string_val | bool_val | float_val |
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
| planner.enable_decimal_data_type | BOOLEAN | SYSTEM  | CHANGED | null    | null       | true     | null      |
| planner.enable_hashjoin          | BOOLEAN | SESSION | CHANGED | null    | null       | true     | null      |
+----------------------------------+---------+---------+---------+---------+------------+----------+-----------+
2 rows selected (0.124 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3288) False Hash aggregate does not support schema changes error message in a query containing both window and regular aggregate function
Victoria Markman created DRILL-3288: --- Summary: False Hash aggregate does not support schema changes error message in a query containing both window and regular aggregate function Key: DRILL-3288 URL: https://issues.apache.org/jira/browse/DRILL-3288 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Chris Westin This error seems to be happening only when you have both window and regular aggregate function in a query. You will need to disable hash join to reproduce this error: alter session set `planner.enable_hashjoin` = false Columns in table j6 are all of 'optional' type, columns in j7 are all required type. (attached sample for each) Here are two queries that are failing for me: Query 1 (aggregate function in the having clause): {code} 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . j6.c_integer, . . . . . . . . . . . . sum(j6.c_integer) over(partition by j6.c_date order by j6.c_time) . . . . . . . . . . . . from . . . . . . . . . . . . j6, j7 . . . . . . . . . . . . where j6.c_integer = j7.c_integer . . . . . . . . . . . . group by . . . . . . . . . . . . j6.c_date, j6.c_time, j6.c_integer . . . . . . . . . . . . having . . . . . . . . . . . . 
avg(j7.c_integer) 0; java.lang.RuntimeException: java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes Fragment 0:0 [Error Id: ed0140d4-244c-4895-bf65-6ea1d085382e on atsqa4-133.qa.lab:31010] at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85) at sqlline.TableOutputFormat.print(TableOutputFormat.java:116) at sqlline.SqlLine.print(SqlLine.java:1583) at sqlline.Commands.execute(Commands.java:852) at sqlline.Commands.sql(Commands.java:751) at sqlline.SqlLine.dispatch(SqlLine.java:738) at sqlline.SqlLine.begin(SqlLine.java:612) at sqlline.SqlLine.start(SqlLine.java:366) at sqlline.SqlLine.main(SqlLine.java:259) {code} Query 2: (window function and aggregate function in projection list): {code} 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . j6.c_integer, . . . . . . . . . . . . avg(j7.c_integer), . . . . . . . . . . . . sum(j6.c_integer) over(partition by j6.c_date order by j6.c_time) . . . . . . . . . . . . from . . . . . . . . . . . . j6, j7 . . . . . . . . . . . . where j6.c_integer = j7.c_integer . . . . . . . . . . . . group by . . . . . . . . . . . . 
j6.c_date, j6.c_time, j6.c_integer; java.lang.RuntimeException: java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes Fragment 0:0 [Error Id: 370188bd-012d-4fc2-a365-fe9e482aaa0f on atsqa4-133.qa.lab:31010] at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85) at sqlline.TableOutputFormat.print(TableOutputFormat.java:116) at sqlline.SqlLine.print(SqlLine.java:1583) at sqlline.Commands.execute(Commands.java:852) at sqlline.Commands.sql(Commands.java:751) at sqlline.SqlLine.dispatch(SqlLine.java:738) at sqlline.SqlLine.begin(SqlLine.java:612) at sqlline.SqlLine.start(SqlLine.java:366) at sqlline.SqlLine.main(SqlLine.java:259) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 35415: DRILL-3285: Part 3--Invert beforeFirstBatch -> ! afterFirstBatch.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35415/ --- Review request for drill, Mehant Baid and Parth Chandra. Bugs: DRILL-3285 https://issues.apache.org/jira/browse/DRILL-3285 Repository: drill-git Description --- Changed old beforeFirstBatch flag to afterFirstBatch (with opposite sense). Diffs - exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java PRE-CREATION Diff: https://reviews.apache.org/r/35415/diff/ Testing --- Ran tests (all parts together); no new errors. Thanks, Daniel Barclay
Review Request 35416: DRILL-3285: Part 4--Reorder fields, updateColumns.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35416/ --- Review request for drill, Mehant Baid and Parth Chandra. Bugs: DRILL-3285 https://issues.apache.org/jira/browse/DRILL-3285 Repository: drill-git Description --- Put declarations in more-logical order. Diffs - exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java PRE-CREATION Diff: https://reviews.apache.org/r/35416/diff/ Testing --- Ran tests (all parts together); no new errors. Thanks, Daniel Barclay
Review Request 35414: DRILL-3285: Part 2--Renaming.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35414/ --- Review request for drill, Mehant Baid and Parth Chandra. Bugs: DRILL-3285 https://issues.apache.org/jira/browse/DRILL-3285 Repository: drill-git Description ---
Renamed state/control-flow members:
- started -> initialSchemaLoaded
- first -> beforeFirstBatch
- redoFirstNext -> returnTrueForNextCallToNext
- finished -> afterLastRow
Renamed other items:
- changed -> schemaChanged
- currentBatch -> currentBatchHolder
- DrillResultSet's currentBatch -> batchLoader
Diffs - exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java cb6bd1d Diff: https://reviews.apache.org/r/35414/diff/ Testing --- Ran tests (all parts together); no new errors. Thanks, Daniel Barclay
Review Request 35417: DRILL-3285: Part 5--Split old hacky next() into separate methods.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35417/ --- Review request for drill, Mehant Baid and Parth Chandra. Bugs: DRILL-3285 https://issues.apache.org/jira/browse/DRILL-3285 Repository: drill-git Description --- Split the original public next() method (which was hacked to handle an extra, initial call to read the schema batch) into: - new loadInitialSchema() (for handling the call for the schema) - modified next() (for handling normal calls from ResultSet.next()) - new private nextRowInternally() (for common code) Pulled invariant afterFirstBatch up out of bogus-batch loop. Diffs - exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java cb6bd1d Diff: https://reviews.apache.org/r/35417/diff/ Testing --- Ran tests (all parts together); no new errors. Thanks, Daniel Barclay
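To make the shape of this split easier to picture, here is a rough sketch of a cursor with the three methods named in the description. It is not the DrillCursor code from the patch: the class name, the use of an Iterator of string lists as a stand-in for incoming record batches, and the currentRow() helper are all hypothetical, and the afterFirstBatch flag mentioned above is not modelled. Only the method names and the flags initialSchemaLoaded, returnTrueForNextCallToNext and afterLastRow come from the review requests.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: each "batch" is just a list of row values.
class CursorSketch {
    private final Iterator<List<String>> batches;
    private List<String> currentBatch = Collections.emptyList();
    private int rowIndex = -1;
    private boolean initialSchemaLoaded;
    private boolean returnTrueForNextCallToNext;
    private boolean afterLastRow;

    CursorSketch(Iterator<List<String>> batches) {
        this.batches = batches;
    }

    // Extra, initial call: positions on the first row so the schema is known.
    void loadInitialSchema() {
        if (initialSchemaLoaded) {
            throw new IllegalStateException("loadInitialSchema() called twice");
        }
        initialSchemaLoaded = true;
        // Remember whether a row was found, so the first next() does not advance again.
        returnTrueForNextCallToNext = nextRowInternally();
    }

    // Normal call, as it would come from ResultSet.next().
    boolean next() {
        if (!initialSchemaLoaded) {
            throw new IllegalStateException("loadInitialSchema() must be called first");
        }
        if (afterLastRow) {
            return false;
        }
        if (returnTrueForNextCallToNext) {
            returnTrueForNextCallToNext = false;
            return true;
        }
        return nextRowInternally();
    }

    String currentRow() {
        return currentBatch.get(rowIndex);
    }

    // Common code: advance within the current batch, or load the next non-empty one.
    private boolean nextRowInternally() {
        if (rowIndex + 1 < currentBatch.size()) {
            rowIndex++;
            return true;
        }
        while (batches.hasNext()) {
            currentBatch = batches.next();
            if (!currentBatch.isEmpty()) {
                rowIndex = 0;
                return true;
            }
            // Empty batch: skip it and keep looking.
        }
        afterLastRow = true;
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
                Arrays.asList("a", "b"),
                Collections.<String>emptyList(),
                Arrays.asList("c"));
        CursorSketch cursor = new CursorSketch(data.iterator());
        cursor.loadInitialSchema();
        while (cursor.next()) {
            System.out.println(cursor.currentRow());   // prints a, b, c on separate lines
        }
    }
}
```

With this arrangement the public next() no longer needs to guess whether it is being called for the schema or for a row; the one-time case lives in loadInitialSchema() and both paths share nextRowInternally().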
Review Request 35413: DRILL-3285: Part 1--Prep., Hygiene: Mainly, adding comments.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35413/ --- Review request for drill, Mehant Baid and Parth Chandra. Bugs: DRILL-3285 https://issues.apache.org/jira/browse/DRILL-3285 Repository: drill-git Description --- Added/edited comments: - field doc. comments - method doc. comments - branch/block comments Removed unused recordBatchCount and getRecordBatchCount(). Added logger call for spurious batch. Various cleanup: - Cleaned up logger. - Added final on updateColumns(). - Wrapped some lines - Misc. comment whitespace. Diffs - exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java cb6bd1d Diff: https://reviews.apache.org/r/35413/diff/ Testing --- Ran tests (all parts together); no new errors. Thanks, Daniel Barclay
Re: Review Request 35235: DRILL-3262: Rename DrillDatabaseMetaData -> DrillDatabaseMetaDataImpl.
On June 11, 2015, 11:38 p.m., Parth Chandra wrote: There appears to be an empty file in this patch. Can you fix that? Fixed (re-built and re-uploaded patch files). - Daniel --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35235/#review87650 --- On June 12, 2015, 10:50 p.m., Daniel Barclay wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35235/ --- (Updated June 12, 2015, 10:50 p.m.) Review request for drill, Mehant Baid and Parth Chandra. Bugs: DRILL-3262 https://issues.apache.org/jira/browse/DRILL-3262 Repository: drill-git Description --- Renamed org.apache.drill.jdbc.impl.DrillDatabaseMetaData to org.apache.drill.jdbc.impl.DrillDatabaseMetaDataImpl (re creating org.apache.drill.jdbc.DrillDatabaseMetaDataImpl for DRILL-3198). Diffs - exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaData.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillJdbc41Factory.java PRE-CREATION Diff: https://reviews.apache.org/r/35235/diff/ Testing --- Ran tests; no new failures. Thanks, Daniel Barclay
Re: Review Request 35235: DRILL-3262: Rename DrillDatabaseMetaData -> DrillDatabaseMetaDataImpl.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35235/ --- (Updated June 12, 2015, 11:17 p.m.) Review request for drill, Mehant Baid and Parth Chandra. Bugs: DRILL-3262 https://issues.apache.org/jira/browse/DRILL-3262 Repository: drill-git Description --- Renamed org.apache.drill.jdbc.impl.DrillDatabaseMetaData to org.apache.drill.jdbc.impl.DrillDatabaseMetaDataImpl (re creating org.apache.drill.jdbc.DrillDatabaseMetaDataImpl for DRILL-3198). Diffs (updated) - exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaData.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillJdbc41Factory.java PRE-CREATION Diff: https://reviews.apache.org/r/35235/diff/ Testing --- Ran tests; no new failures. Thanks, Daniel Barclay
[jira] [Resolved] (DRILL-2023) Hive function
[ https://issues.apache.org/jira/browse/DRILL-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2023. Resolution: Fixed {{getCumulativeCost}} is implemented as part of DRILL-2269. Hive function -- Key: DRILL-2023 URL: https://issues.apache.org/jira/browse/DRILL-2023 Project: Apache Drill Issue Type: Bug Components: Functions - Hive Reporter: Jacques Nadeau Assignee: Venki Korukanti Fix For: 1.1.0 If you try to do a query that uses regexp_extract with Drill expressions inside of it, Drill doesn't handle it correctly. The exception is: The type of org.apache.drill.exec.expr.HiveFuncHolderExpr doesn't currently support LogicalExpression.getCumulativeCost() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3290) Hive Storage : Add support for Hive complex types
Rahul Challapalli created DRILL-3290: Summary: Hive Storage : Add support for Hive complex types Key: DRILL-3290 URL: https://issues.apache.org/jira/browse/DRILL-3290 Project: Apache Drill Issue Type: Improvement Components: Functions - Hive, Storage - Hive Reporter: Rahul Challapalli Assignee: Mehant Baid
Improve the hive storage plugin to add support for complex types in hive. Below are the complex types hive supports
{code}
ARRAY<data_type>
MAP<primitive_type, data_type>
STRUCT<col_name : data_type [COMMENT col_comment], ...>
UNIONTYPE<data_type, data_type, ...>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-1570) Support Hive Data Types
[ https://issues.apache.org/jira/browse/DRILL-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli resolved DRILL-1570.
Resolution: Fixed

Support Hive Data Types
---
Key: DRILL-1570
URL: https://issues.apache.org/jira/browse/DRILL-1570
Project: Apache Drill
Issue Type: Task
Components: Execution - Data Types
Affects Versions: 0.6.0
Reporter: Na Yang
Fix For: Future

Currently, the following Hive data types are supported in Drill for querying: BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, BINARY, DECIMAL, STRING, and VARCHAR. This is only a subset of the Hive data types. To support all Hive data types, Drill needs to add support for the following: TINYINT, SMALLINT, BIGINT, CHAR, STRUCT, MAP, ARRAY, UNION

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3283) Extend TestWindowFrame to check the results when PARTITION BY and/or ORDER BY are missing from the OVER clause
Deneche A. Hakim created DRILL-3283:

Summary: Extend TestWindowFrame to check the results when PARTITION BY and/or ORDER BY are missing from the OVER clause
Key: DRILL-3283
URL: https://issues.apache.org/jira/browse/DRILL-3283
Project: Apache Drill
Issue Type: Bug
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim

Current unit tests for window functions only check the following cases:
OVER(PARTITION BY X ORDER BY Y)
OVER(PARTITION BY X)

We need to extend those tests to also check the following:
OVER(ORDER BY Y)
OVER()

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
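For reference, the four OVER variants named in this issue differ only in how rows are split into partitions and peer groups. The sketch below is not Drill code and not part of TestWindowFrame; it assumes the standard SQL default frame (with ORDER BY, the frame runs from the start of the partition through the current row's peers; without ORDER BY, the frame is the whole partition). The function name window_sum and the columns x, y, v are hypothetical.

```python
from itertools import groupby


def window_sum(rows, value, partition_by=None, order_by=None):
    """SUM(value) OVER (...) under the assumed default frame.

    rows is a list of dicts; partition_by and order_by are column names
    or None. Rows come back sorted by (partition, order) with a "sum" key.
    """
    def part_key(row):
        # A missing PARTITION BY puts every row in one partition.
        return (row[partition_by],) if partition_by else ()

    def order_key(row):
        # A missing ORDER BY makes every row of a partition a peer.
        return (row[order_by],) if order_by else ()

    ordered = sorted(rows, key=lambda r: (part_key(r), order_key(r)))
    out = []
    for _, partition in groupby(ordered, key=part_key):
        running = 0
        for _, peers in groupby(partition, key=order_key):
            peers = list(peers)
            # Aggregate the whole peer group once, then write the same
            # value into each of its rows.
            running += sum(r[value] for r in peers)
            out.extend({**r, "sum": running} for r in peers)
    return out


rows = [
    {"x": "a", "y": 1, "v": 10},
    {"x": "a", "y": 1, "v": 20},
    {"x": "a", "y": 2, "v": 5},
    {"x": "b", "y": 1, "v": 7},
]

both = [r["sum"] for r in window_sum(rows, "v", "x", "y")]        # OVER(PARTITION BY X ORDER BY Y)
part_only = [r["sum"] for r in window_sum(rows, "v", "x")]        # OVER(PARTITION BY X)
order_only = [r["sum"] for r in window_sum(rows, "v", None, "y")] # OVER(ORDER BY Y)
neither = [r["sum"] for r in window_sum(rows, "v")]               # OVER()
print(both, part_only, order_only, neither)
```

The two cases the issue asks to add, OVER(ORDER BY Y) and OVER(), are the ones where the whole input forms a single partition, which is why they exercise a different path than the partitioned cases.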
Review Request 35393: DRILL-3220: IOB Exception when using constants in window functions
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35393/
---

Review request for drill and Aman Sinha.

Bugs: DRILL-3220
https://issues.apache.org/jira/browse/DRILL-3220

Repository: drill-git

Description
---
Made changes to DrillWindowRule and WindowPrel to properly handle constants in OVER clause

Diffs
---
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWindowRule.java 76939d9
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrel.java 4f6551a
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java 4295002
exec/java-exec/src/test/resources/window/q1.sql PRE-CREATION
exec/java-exec/src/test/resources/window/q2.sql PRE-CREATION
exec/java-exec/src/test/resources/window/q3218.sql PRE-CREATION
exec/java-exec/src/test/resources/window/q3220.sql PRE-CREATION

Diff: https://reviews.apache.org/r/35393/diff/

Testing
---

Thanks,
abdelhakim deneche
Re: Unable to run drill...
+ dev@drill.apache.org

Dev team, can anybody help?

Mehul Trivedi trivedimeh...@gmail.com +91.999.875.1555

On Fri, Jun 12, 2015 at 8:20 PM, Mehul Trivedi trivedimeh...@gmail.com wrote:

Hello Jacques,

Sorry to bring you into this, but I could not find any help online for the issue I am facing. I am working on a research project where we want a schema-free engine to talk to cloud big data behind some application logic interfaces. Drill was my first choice when I heard it had been released. I downloaded it and tried to run it on Windows, but it throws an exception at startup. Can you point me to someone I can email, call, or chat with to get this sorted out?

Thanks
Mehul Trivedi trivedimeh...@gmail.com +91.999.875.1555
Re: Unable to run drill...
What error are you getting?

On Friday, June 12, 2015, Mehul Trivedi trivedimeh...@gmail.com wrote:

+ dev@drill.apache.org

Dev team, can anybody help?

Mehul Trivedi trivedimeh...@gmail.com +91.999.875.1555

On Fri, Jun 12, 2015 at 8:20 PM, Mehul Trivedi trivedimeh...@gmail.com wrote:

Hello Jacques,

Sorry to bring you into this, but I could not find any help online for the issue I am facing. I am working on a research project where we want a schema-free engine to talk to cloud big data behind some application logic interfaces. Drill was my first choice when I heard it had been released. I downloaded it and tried to run it on Windows, but it throws an exception at startup. Can you point me to someone I can email, call, or chat with to get this sorted out?

Thanks
Mehul Trivedi trivedimeh...@gmail.com +91.999.875.1555
Re: Unable to run drill...
The best thing to do is post the specific error on the user list, and I or someone else will try to troubleshoot.

On Jun 12, 2015 7:52 AM, Mehul Trivedi trivedimeh...@gmail.com wrote:

+ dev@drill.apache.org

Dev team, can anybody help?

Mehul Trivedi trivedimeh...@gmail.com +91.999.875.1555

On Fri, Jun 12, 2015 at 8:20 PM, Mehul Trivedi trivedimeh...@gmail.com wrote:

Hello Jacques,

Sorry to bring you into this, but I could not find any help online for the issue I am facing. I am working on a research project where we want a schema-free engine to talk to cloud big data behind some application logic interfaces. Drill was my first choice when I heard it had been released. I downloaded it and tried to run it on Windows, but it throws an exception at startup. Can you point me to someone I can email, call, or chat with to get this sorted out?

Thanks
Mehul Trivedi trivedimeh...@gmail.com +91.999.875.1555
Re: Review Request 35393: DRILL-3220: IOB Exception when using constants in window functions
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35393/
---

(Updated June 12, 2015, 5:40 p.m.)

Review request for drill and Aman Sinha.

Bugs: DRILL-3220
https://issues.apache.org/jira/browse/DRILL-3220

Repository: drill-git

Description
---
Made changes to DrillWindowRule and WindowPrel to properly handle constants in OVER clause

Diffs
---
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWindowRule.java 76939d9
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrel.java 4f6551a
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java 4295002
exec/java-exec/src/test/resources/window/q1.sql PRE-CREATION
exec/java-exec/src/test/resources/window/q2.sql PRE-CREATION
exec/java-exec/src/test/resources/window/q3218.sql PRE-CREATION
exec/java-exec/src/test/resources/window/q3220.sql PRE-CREATION

Diff: https://reviews.apache.org/r/35393/diff/

Testing (updated)
---
all unit tests are passing along with functional and tpch100

Thanks,
abdelhakim deneche
[jira] [Created] (DRILL-3284) Document incompatibility between drill's to_date and hive's unix_timestamp
Rahul Challapalli created DRILL-3284:

Summary: Document incompatibility between drill's to_date and hive's unix_timestamp
Key: DRILL-3284
URL: https://issues.apache.org/jira/browse/DRILL-3284
Project: Apache Drill
Issue Type: Bug
Components: Documentation, Functions - Hive
Reporter: Rahul Challapalli
Assignee: Bridget Bevens

The query below produces wrong results in Drill because unix_timestamp (a Hive function) returns its value in seconds, while to_date interprets its input as milliseconds.
{code}
select to_date(unix_timestamp('1998-05-06', 'yyyy-MM-dd')) from dummy limit 1;
+-------------+
|   EXPR$0    |
+-------------+
| 1970-01-11  |
+-------------+
{code}
To make this work, we should use the query below:
{code}
select to_date(unix_timestamp('1998-05-06', 'yyyy-MM-dd')*1000) from dummy limit 1;
+-------------+
|   EXPR$0    |
+-------------+
| 1998-05-06  |
+-------------+
{code}
If this is not a bug on Drill's side, we should at least document this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
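The arithmetic behind the two results in this issue can be checked outside Drill and Hive. The sketch below is plain Python, not either engine's implementation; it assumes midnight UTC for the input date, and the helper names epoch_seconds and date_from_millis are hypothetical. It shows that a seconds count read as milliseconds lands about ten days after the epoch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds(date_text):
    """Seconds since the Unix epoch for a yyyy-MM-dd date (midnight UTC assumed)."""
    parsed = datetime.strptime(date_text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int((parsed - EPOCH).total_seconds())


def date_from_millis(millis):
    """UTC calendar date for a count of milliseconds since the epoch."""
    return (EPOCH + timedelta(milliseconds=millis)).date().isoformat()


seconds = epoch_seconds("1998-05-06")
wrong = date_from_millis(seconds)         # seconds misread as milliseconds
right = date_from_millis(seconds * 1000)  # scaled to milliseconds first
print(seconds, wrong, right)
```

Under these assumptions the seconds value is 894412800, which as milliseconds is roughly 10.35 days, consistent with the 1970-01-11 result reported above; multiplying by 1000 recovers 1998-05-06.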