[GitHub] drill pull request #850: DRILL-5541: C++ Client Crashes During Simple "Man i...
GitHub user superbstreak opened a pull request: https://github.com/apache/drill/pull/850 DRILL-5541: C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV You can merge this pull request into a Git repository by running: $ git pull https://github.com/superbstreak/drill DRILL-5541 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/850.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #850 commit 716db51df61d0ee47804217a6a133d1d1152b64a Author: Rob Wu Date: 2017-06-05T21:06:33Z DRILL-5541: C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (DRILL-5567) Review changes for DRILL 5514
[ https://issues.apache.org/jira/browse/DRILL-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthikeyan Manivannan resolved DRILL-5567. --- Resolution: Done > Review changes for DRILL 5514 > - > > Key: DRILL-5567 > URL: https://issues.apache.org/jira/browse/DRILL-5567 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Karthikeyan Manivannan >Assignee: Karthikeyan Manivannan > Fix For: 1.11.0 > > Original Estimate: 2h > Remaining Estimate: 2h > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] drill pull request #849: DRILL-5568: Include hadoop-common jars inside drill...
GitHub user sohami opened a pull request: https://github.com/apache/drill/pull/849 DRILL-5568: Include hadoop-common jars inside drill-jdbc-all.jar More details on this PR are in [JIRA](https://issues.apache.org/jira/browse/DRILL-5568) You can merge this pull request into a Git repository by running: $ git pull https://github.com/sohami/drill DRILL-5568 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/849.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #849 commit e84ce5bb6317e7a8caa50c7ffc85dfc416616596 Author: Sorabh Hamirwasia Date: 2017-06-05T20:45:27Z DRILL-5568: Include hadoop-common jars inside drill-jdbc-all.jar
[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/837#discussion_r120198724 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java --- @@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) { return true; } + /** + * Merge two schema to produce a new, merged schema. The caller is responsible + * for ensuring that column names are unique. The order of the fields in the + * new schema is the same as that of this schema, with the other schema's fields + * appended in the order defined in the other schema. The resulting selection + * vector mode is the same as this schema. (That is, this schema is assumed to + * be the main part of the batch, possibly with a selection vector, with the + * other schema representing additional, new columns.) + * @param otherSchema the schema to merge with this one + * @return the new, merged, schema + */ + + public BatchSchema merge(BatchSchema otherSchema) { +if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE && +selectionVectorMode != otherSchema.selectionVectorMode) { + throw new IllegalArgumentException("Left schema must carry the selection vector mode"); +} +List mergedFields = new ArrayList<>(); --- End diff -- List mergedFields = new ArrayList(this.fields.size() + otherSchema.fields.size()) would avoid having to potentially grow the ArrayList twice.
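The reviewer's presizing suggestion can be sketched in isolation. This is an illustrative standalone class, not Drill code: `BatchSchema`'s field lists are simplified to plain generic lists here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PresizedMerge {
    // Sketch of the review comment: sizing the ArrayList up front
    // (left size + right size) allocates the backing array once, so the
    // two addAll calls never trigger the internal grow-and-copy that a
    // default-capacity ArrayList might perform (potentially twice).
    static <T> List<T> merge(List<T> left, List<T> right) {
        List<T> merged = new ArrayList<>(left.size() + right.size());
        merged.addAll(left);
        merged.addAll(right);
        return merged;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(merge(Arrays.asList("a", "b"), Arrays.asList("c")));
    }
}
```

The cost saved is small for typical schema widths, but the presized form is the idiomatic way to append two collections of known size.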
[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/837#discussion_r118797793 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java --- @@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) { return true; } + /** + * Merge two schema to produce a new, merged schema. The caller is responsible + * for ensuring that column names are unique. The order of the fields in the + * new schema is the same as that of this schema, with the other schema's fields + * appended in the order defined in the other schema. The resulting selection + * vector mode is the same as this schema. (That is, this schema is assumed to + * be the main part of the batch, possibly with a selection vector, with the + * other schema representing additional, new columns.) + * @param otherSchema the schema to merge with this one + * @return the new, merged, schema + */ + + public BatchSchema merge(BatchSchema otherSchema) { +if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE && +selectionVectorMode != otherSchema.selectionVectorMode) { + throw new IllegalArgumentException("Left schema must carry the selection vector mode"); --- End diff -- "Left schema must carry the same selection vector mode" + "as the right schema"?
Re: Thinking about Drill 2.0
Adding to my list of things to consider for Drill 2.0, I would think that getting Drill off our forks of Calcite and Parquet should also be a goal, though a tactical one. On Mon, Jun 5, 2017 at 1:51 PM, Parth Chandra wrote: > Nice suggestion Paul, to start a discussion on 2.0 (it's about time). I > would like to make this a broader discussion than just APIs, though APIs > are a good place to start. In particular. we usually get the opportunity to > break backward compatibility only for a major release and that is the time > we have to finalize the APIs. > > In the broader discussion I feel we also need to consider some other > aspects - > 1) Formalize Drill's support for schema free operations. > 2) Drill's execution engine architecture and it's 'optimistic' use of > resources. > > Re the APIs: > One more public API is the UDFs. This and the storage plugin APIs > together are tied at the hip with vectors and memory management. I'm not > sure if we can cleanly separate the underlying representation of vectors > from the interfaces to these APIs, but I agree we need to clarify this > part. For instance, some of the performance benefits in the Parquet scan > come from vectorizing writes to the vector especially for null or repeated > values. We could provide interfaces to provide the same without which the > scans would have to be vector-internals aware. The same goes for UDFs. > Assuming that a 2.0 goal would be to provide vectorized interfaces for > users to write table (or aggregate) UDFs, one now needs a standardized data > set representation. If you choose this data set representation to be > columnar (for better vectorization), will you end up with ValueVector/Arrow > based RecordBatches? I included Arrow in this since the project is > formalizing exactly this requirement. 
> > For the client APIs, I believe that ODBC and JDBC drivers initially were > written using record based APIs provided by vendors, but to get better > performance started to move to working with raw streams coming over the > wire (eg TDS with Sybase/MS-SQLServer [1] ). So what Drill does is in fact > similar to that approach. The client APIs are really thin layers on top of > the vector data stream and provide row based, read only access to the > vector. > > Lest I begin to sound too contrary, thank you for starting this > discussion. It is really needed! > > Parth > > > > > > > > On Mon, Jun 5, 2017 at 11:59 AM, Paul Rogers wrote: > >> Hi All, >> >> A while back there was a discussion about the scope of Drill 2.0. Got me >> thinking about possible topics. My two cents: >> >> Drill 2.0 should focus on making Drill’s external APIs production ready. >> This means five things: >> >> * Clearly identify and define each API. >> * (Re)design each API to ensure it fully isolates the client from Drill >> internals. >> * Ensure the API allows full version compatibility: Allow mixing of >> old/new clients and servers with some limits. >> * Fully test each API. >> * Fully document each API. >> >> Once client code is isolated from Drill internals, we are free to evolve >> the internals in either Drill 2.0 or a later release. >> >> In my mind, the top APIs to revisit are: >> >> * The drill client API. >> * The storage plugin API. >> >> (Explanation below.) >> >> What other APIs should we consider? Here are some examples, please >> suggest items you know about: >> >> * Command line scripts and arguments >> * REST API >> * Names and contents of system tables >> * Structure of the storage plugin configuration JSON >> * Structure of the query profile >> * Structure of the EXPLAIN PLAN output. >> * Semantics of Drill functions, such as the date functions recently >> partially fixed by adding “ANSI” alternatives. >> * Naming of config and system/session options. 
>> * (Your suggestions here…) >> >> I’ve taken the liberty of moving some API-breaking tickets in the Apache >> Drill JIRA to 2.0. Perhaps we can add others so that we have a good >> inventory of 2.0 candidates. >> >> Here are the reasons for my two suggestions. >> >> Today, we expose Drill value vectors to the client. This means if we want >> to enhance anything about Drill’s internal memory format (i.e. value >> vectors, such as a possible move to Arrow), we break compatibility with old >> clients. Using value vectors also means we need a very large percentage of >> Drill’s internal code on the client in Java or C++. We are learning that >> doing so is a challenge. >> >> A new client API should follow established SQL database tradition: a >> synchronous, row-based API designed for versioning, for forward and >> backward compatibility, and to support ODBC and JDBC users. >> >> We can certainly maintain the existing full, async, heavy-weight client >> for our tests and for applications that would benefit from it. >> >> Once we define a new API, we are free to alter Drill’s value vectors to, >> say, add the needed null states to fully support JSON, to change offset >> vectors to not
Re: Thinking about Drill 2.0
Nice suggestion Paul, to start a discussion on 2.0 (it's about time). I would like to make this a broader discussion than just APIs, though APIs are a good place to start. In particular, we usually get the opportunity to break backward compatibility only for a major release and that is the time we have to finalize the APIs. In the broader discussion I feel we also need to consider some other aspects - 1) Formalize Drill's support for schema free operations. 2) Drill's execution engine architecture and its 'optimistic' use of resources. Re the APIs: One more public API is the UDFs. This and the storage plugin APIs together are tied at the hip with vectors and memory management. I'm not sure if we can cleanly separate the underlying representation of vectors from the interfaces to these APIs, but I agree we need to clarify this part. For instance, some of the performance benefits in the Parquet scan come from vectorizing writes to the vector, especially for null or repeated values. We could provide interfaces that offer the same, without which the scans would have to be vector-internals aware. The same goes for UDFs. Assuming that a 2.0 goal would be to provide vectorized interfaces for users to write table (or aggregate) UDFs, one now needs a standardized data set representation. If you choose this data set representation to be columnar (for better vectorization), will you end up with ValueVector/Arrow based RecordBatches? I included Arrow in this since the project is formalizing exactly this requirement. For the client APIs, I believe that ODBC and JDBC drivers initially were written using record based APIs provided by vendors, but to get better performance started to move to working with raw streams coming over the wire (eg TDS with Sybase/MS-SQLServer [1] ). So what Drill does is in fact similar to that approach. The client APIs are really thin layers on top of the vector data stream and provide row based, read only access to the vector. 
Lest I begin to sound too contrary, thank you for starting this discussion. It is really needed! Parth On Mon, Jun 5, 2017 at 11:59 AM, Paul Rogers wrote: > Hi All, > > A while back there was a discussion about the scope of Drill 2.0. Got me > thinking about possible topics. My two cents: > > Drill 2.0 should focus on making Drill’s external APIs production ready. > This means five things: > > * Clearly identify and define each API. > * (Re)design each API to ensure it fully isolates the client from Drill > internals. > * Ensure the API allows full version compatibility: Allow mixing of > old/new clients and servers with some limits. > * Fully test each API. > * Fully document each API. > > Once client code is isolated from Drill internals, we are free to evolve > the internals in either Drill 2.0 or a later release. > > In my mind, the top APIs to revisit are: > > * The drill client API. > * The storage plugin API. > > (Explanation below.) > > What other APIs should we consider? Here are some examples, please suggest > items you know about: > > * Command line scripts and arguments > * REST API > * Names and contents of system tables > * Structure of the storage plugin configuration JSON > * Structure of the query profile > * Structure of the EXPLAIN PLAN output. > * Semantics of Drill functions, such as the date functions recently > partially fixed by adding “ANSI” alternatives. > * Naming of config and system/session options. > * (Your suggestions here…) > > I’ve taken the liberty of moving some API-breaking tickets in the Apache > Drill JIRA to 2.0. Perhaps we can add others so that we have a good > inventory of 2.0 candidates. > > Here are the reasons for my two suggestions. > > Today, we expose Drill value vectors to the client. This means if we want > to enhance anything about Drill’s internal memory format (i.e. value > vectors, such as a possible move to Arrow), we break compatibility with old > clients. 
Using value vectors also means we need a very large percentage of > Drill’s internal code on the client in Java or C++. We are learning that > doing so is a challenge. > > A new client API should follow established SQL database tradition: a > synchronous, row-based API designed for versioning, for forward and > backward compatibility, and to support ODBC and JDBC users. > > We can certainly maintain the existing full, async, heavy-weight client > for our tests and for applications that would benefit from it. > > Once we define a new API, we are free to alter Drill’s value vectors to, > say, add the needed null states to fully support JSON, to change offset > vectors to not need n+1 values (which doubles vector size in 64K batches), > and so on. Since vectors become private to Drill (or Arrow) after the new > client API, we are free to innovate to improve them. > > Similarly, the storage plugin API exposes details of Calcite (which seems > to evolve with each new version), exposes value vector implementations, and > so on. A cleaner, simpler, m
[jira] [Created] (DRILL-5568) Include Hadoop dependency jars inside drill-jdbc-all.jar
Sorabh Hamirwasia created DRILL-5568: Summary: Include Hadoop dependency jars inside drill-jdbc-all.jar Key: DRILL-5568 URL: https://issues.apache.org/jira/browse/DRILL-5568 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Sorabh Hamirwasia Assignee: Sorabh Hamirwasia With Sasl support in 1.10, username/password authentication was moved to the Plain mechanism of the Sasl framework. There are a couple of Hadoop classes, like Configuration.java and UserGroupInformation.java, defined in the hadoop-common package which are used in DrillClient for security mechanisms like the Plain/Kerberos mechanisms. Because of this we need to add the hadoop-common dependency inside _drill-jdbc-all.jar_. Without it, an application using this driver will fail to connect to Drill when authentication is enabled. Today this jar (which is the JDBC driver for Drill) already has lots of other dependencies which DrillClient relies on, like Netty, etc. But we add these dependencies under the *oadd* namespace so that an application using this driver won't end up in conflict with its own version of the same dependencies. This JIRA will include the hadoop-common dependencies under the same namespace. This will allow an application to connect to Drill using this driver with security enabled.
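The *oadd* relocation described above is the kind of rewrite the maven-shade-plugin performs at package time. A minimal sketch of what such a relocation rule could look like (illustrative only; the actual configuration in Drill's jdbc-all pom may differ):

```xml
<!-- Hypothetical maven-shade-plugin fragment: rewrite hadoop-common's
     package names under the oadd namespace inside the shaded jar, so
     they cannot clash with an application's own Hadoop dependency. -->
<relocation>
  <pattern>org.apache.hadoop</pattern>
  <shadedPattern>oadd.org.apache.hadoop</shadedPattern>
</relocation>
```

With such a rule, the driver's internal references resolve to classes like `oadd.org.apache.hadoop.conf.Configuration`, leaving the unprefixed names free for the host application.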
[jira] [Created] (DRILL-5567) Review changes for DRILL 5514
Karthikeyan Manivannan created DRILL-5567: - Summary: Review changes for DRILL 5514 Key: DRILL-5567 URL: https://issues.apache.org/jira/browse/DRILL-5567 Project: Apache Drill Issue Type: Sub-task Reporter: Karthikeyan Manivannan Assignee: Karthikeyan Manivannan
protobuf version
Hi List, I see that Apache Drill is limited to 2.x series for protobuf. I cannot find any reference as to why this is. Could someone explain the dependency restriction? Did something major change in the 3.x release series for protobuf? The only reason I ask is that protobuf 3.3 builds much cleaner in VS 2015 and they have proper CMAKE support. Cheers, Ralph
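For context on the question above (a hedged summary, not a statement of why Drill pinned 2.x): proto3 did change the language in incompatible ways. The `required` and `optional` field labels were removed, default-value and field-presence semantics changed, and generated APIs differ, so `.proto` files and code written against protobuf 2.x generally need migration. A minimal illustration of the syntax difference (the message name is made up for the example; the two snippets would live in separate files):

```protobuf
// proto2 style, as used by a 2.x-pinned project:
//   syntax = "proto2";
//   message ExampleBatch {
//     required int32 record_count = 1;
//   }

// proto3 style: explicit syntax marker, no required/optional labels.
syntax = "proto3";

message ExampleBatch {
  int32 record_count = 1;
}
```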
[jira] [Created] (DRILL-5566) AssertionError: Internal error: invariant violated: call to wrong operator
Khurram Faraaz created DRILL-5566: - Summary: AssertionError: Internal error: invariant violated: call to wrong operator Key: DRILL-5566 URL: https://issues.apache.org/jira/browse/DRILL-5566 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.11.0 Reporter: Khurram Faraaz CHARACTER_LENGTH is a non-reserved keyword as per the SQL specification. It is a monadic function that accepts exactly one operand or parameter. {noformat}
<numeric value function> ::= <position expression> | <extract expression> | <length expression> | ...
...
<length expression> ::= <char length expression> | <octet length expression>
<char length expression> ::= { CHAR_LENGTH | CHARACTER_LENGTH } <left paren> <character value expression> [ USING <char length units> ] <right paren>
...
<char length units> ::= CHARACTERS | OCTETS
{noformat} Drill reports an assertion error in drillbit.log when the character_length function is used in a SQL query. {noformat} 0: jdbc:drill:schema=dfs.tmp> select character_length(cast('hello' as varchar(10))) col1 from (values(1)); Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: call to wrong operator [Error Id: 49198839-5a1b-4786-9257-59739b27d2a8 on centos-01.qa.lab:31010] (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Internal error: invariant violated: call to wrong operator org.apache.drill.exec.work.foreman.Foreman.run():297 java.util.concurrent.ThreadPoolExecutor.runWorker():1145 java.util.concurrent.ThreadPoolExecutor$Worker.run():615 java.lang.Thread.run():745 Caused By (java.lang.AssertionError) Internal error: invariant violated: call to wrong operator org.apache.calcite.util.Util.newInternal():777 org.apache.calcite.util.Util.permAssert():885 org.apache.calcite.sql2rel.ReflectiveConvertletTable$3.convertCall():219 org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall():59 org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():4148 org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():3581 org.apache.calcite.sql.SqlCall.accept():130 org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression():4040 
org.apache.calcite.sql2rel.StandardConvertletTable$8.convertCall():185 org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall():59 org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():4148 org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():3581 org.apache.calcite.sql.SqlCall.accept():130 org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression():4040 org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList():3411 org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl():612 org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect():568 org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive():2773 org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery():522 org.apache.drill.exec.planner.sql.SqlConverter.toRel():269 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel():623 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():195 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():164 org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():131 org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():79 org.apache.drill.exec.work.foreman.Foreman.runSQL():1050 org.apache.drill.exec.work.foreman.Foreman.run():280 java.util.concurrent.ThreadPoolExecutor.runWorker():1145 java.util.concurrent.ThreadPoolExecutor$Worker.run():615 java.lang.Thread.run():745 (state=,code=0) {noformat} Calcite supports character_length function {noformat} [root@centos-0170 csv]# ./sqlline sqlline version 1.1.9 sqlline> !connect jdbc:calcite:model=target/test-classes/model.json admin admin SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 
0: jdbc:calcite:model=target/test-classes/mod> select character_length(cast('hello' as varchar(10))) col1 from (values(1));
+-------+
| COL1  |
+-------+
| 5     |
+-------+
1 row selected (1.379 seconds) {noformat} Postgres 9.3 also supports the character_length function {noformat} postgres=# select character_length(cast('hello' as varchar(10))) col1 from (values(1)) foo;
 col1
------
    5
(1 row)
{noformat}
[jira] [Created] (DRILL-5565) Directory Query fails with Permission denied: access=EXECUTE if dirN name is 'year=2017' or 'month=201704'
ehur created DRILL-5565: --- Summary: Directory Query fails with Permission denied: access=EXECUTE if dirN name is 'year=2017' or 'month=201704' Key: DRILL-5565 URL: https://issues.apache.org/jira/browse/DRILL-5565 Project: Apache Drill Issue Type: Bug Components: Functions - Drill, SQL Parser Affects Versions: 1.6.0 Reporter: ehur running a query like this works fine, when the name dir0 contains numerics only: select * from all.my.records where dir0 >= '20170322' limit 10; if the dirN is named according to this convention: year=2017 we get one of the following problems: 1. Either "system error permission denied" in: select * from all.my.records where dir0 >= 'year=2017' limit 10; SYSTEM ERROR: RemoteException: Permission denied: user=myuser, access=EXECUTE, inode: /user/myuser/all/my/records/year=2017/month=201701/day=20170101/application_1485464650247_1917/part-r-0.gz.parquet":myuser:supergroup:-rw-r--r-- at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6609) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4223) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:894) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:526) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:822) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) 2. OR, if the where clause only specifies numerics in the dirname, it does not blow up, but neither does it return the relevant data, since that where clause is not the correct path to our data: select * from all.my.records where dir0 >= '2017' limit 10;
Thinking about Drill 2.0
Hi All, A while back there was a discussion about the scope of Drill 2.0. Got me thinking about possible topics. My two cents: Drill 2.0 should focus on making Drill’s external APIs production ready. This means five things: * Clearly identify and define each API. * (Re)design each API to ensure it fully isolates the client from Drill internals. * Ensure the API allows full version compatibility: Allow mixing of old/new clients and servers with some limits. * Fully test each API. * Fully document each API. Once client code is isolated from Drill internals, we are free to evolve the internals in either Drill 2.0 or a later release. In my mind, the top APIs to revisit are: * The drill client API. * The storage plugin API. (Explanation below.) What other APIs should we consider? Here are some examples, please suggest items you know about: * Command line scripts and arguments * REST API * Names and contents of system tables * Structure of the storage plugin configuration JSON * Structure of the query profile * Structure of the EXPLAIN PLAN output. * Semantics of Drill functions, such as the date functions recently partially fixed by adding “ANSI” alternatives. * Naming of config and system/session options. * (Your suggestions here…) I’ve taken the liberty of moving some API-breaking tickets in the Apache Drill JIRA to 2.0. Perhaps we can add others so that we have a good inventory of 2.0 candidates. Here are the reasons for my two suggestions. Today, we expose Drill value vectors to the client. This means if we want to enhance anything about Drill’s internal memory format (i.e. value vectors, such as a possible move to Arrow), we break compatibility with old clients. Using value vectors also means we need a very large percentage of Drill’s internal code on the client in Java or C++. We are learning that doing so is a challenge. 
A new client API should follow established SQL database tradition: a synchronous, row-based API designed for versioning, for forward and backward compatibility, and to support ODBC and JDBC users. We can certainly maintain the existing full, async, heavy-weight client for our tests and for applications that would benefit from it. Once we define a new API, we are free to alter Drill’s value vectors to, say, add the needed null states to fully support JSON, to change offset vectors to not need n+1 values (which doubles vector size in 64K batches), and so on. Since vectors become private to Drill (or Arrow) after the new client API, we are free to innovate to improve them. Similarly, the storage plugin API exposes details of Calcite (which seems to evolve with each new version), exposes value vector implementations, and so on. A cleaner, simpler, more isolated API will allow storage plugins to be built faster, but will also isolate them from Drill internals changes. Without isolation, each change to Drill internals would require plugin authors to update their plugin before Drill can be released. Thoughts? Suggestions? Thanks, - Paul
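Paul's parenthetical about n+1 offset entries doubling vector size at 64K rows can be illustrated with a little arithmetic. This sketch assumes the allocator rounds buffer requests up to the next power of two, which is the usual slab-allocator behavior being alluded to; it is not a quote of Drill's allocator code.

```java
public class OffsetVectorMath {
    // Round a request up to the next power of two (assumed allocator
    // behavior for this illustration).
    static int roundUpPow2(int n) {
        int p = Integer.highestOneBit(n);
        return (p == n) ? n : p << 1;
    }

    public static void main(String[] args) {
        int rows = 65536;                 // a full 64K-row batch
        int exact = rows * 4;             // 262144 bytes: exactly 256 KiB
        int plusOne = (rows + 1) * 4;     // 262148 bytes: one 4-byte entry over

        // n offsets fit in a 256 KiB buffer; n+1 offsets push the request
        // just past the power-of-two boundary, so the allocation doubles.
        System.out.println(roundUpPow2(exact));    // 262144 (256 KiB)
        System.out.println(roundUpPow2(plusOne));  // 524288 (512 KiB)
    }
}
```

Dropping the extra entry (for example by storing only end offsets and treating the start implicitly) keeps the buffer at the smaller power of two.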
[jira] [Created] (DRILL-5564) IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: buffer space (16674816) + prealloc space (0) + child space (0) != allocated (16740352)
Khurram Faraaz created DRILL-5564: - Summary: IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: buffer space (16674816) + prealloc space (0) + child space (0) != allocated (16740352) Key: DRILL-5564 URL: https://issues.apache.org/jira/browse/DRILL-5564 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.11.0 Environment: 3 node CentOS cluster Reporter: Khurram Faraaz Run a concurrent Java program that executes TPCDS query11. While that program is executing, stop the foreman Drillbit from another shell, using the command below: ./bin/drillbit.sh stop You will then see the IllegalStateException: allocator[op:21:1:5:HashJoinPOP] and another assertion error in the drillbit.log: AssertionError: Failure while stopping processing for operator id 10. Currently have states of processing:false, setup:false, waiting:true. Drill 1.11.0 git commit ID: d11aba2 (with assertions enabled) Details from drillbit.log on the foreman Drillbit node: {noformat} 2017-06-05 18:38:33,838 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] INFO o.a.d.e.w.fragment.FragmentExecutor - 26ca5afa-7f6d-991b-1fdf-6196faddc229:23:1: State change requested RUNNING --> FAILED 2017-06-05 18:38:33,849 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] INFO o.a.d.e.w.fragment.FragmentExecutor - 26ca5afa-7f6d-991b-1fdf-6196faddc229:23:1: State change requested FAILED --> FINISHED 2017-06-05 18:38:33,852 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: AssertionError: Failure while stopping processing for operator id 10. Currently have states of processing:false, setup:false, waiting:true. 
Fragment 23:1

[Error Id: a116b326-43ed-4569-a20e-a10ba03d215e on centos-01.qa.lab:31010]
	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.lang.RuntimeException: java.lang.AssertionError: Failure while stopping processing for operator id 10. Currently have states of processing:false, setup:false, waiting:true.
	at org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:101) ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:409) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	... 4 common frames omitted
Caused by: java.lang.AssertionError: Failure while stopping processing for operator id 10. Currently have states of processing:false, setup:false, waiting:true.
	at org.apache.drill.exec.ops.OperatorStats.stopProcessing(OperatorStats.java:167) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:255) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleR
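The exception message in DRILL-5564 reports a bookkeeping invariant: on close, the bytes held in buffers, preallocated, and delegated to children must sum to the allocator's running total (here they differ by 16740352 - 16674816 = 65536 bytes). The following is a minimal sketch of such a check; the class, field names, and `verifyOnClose` method are illustrative assumptions, not Drill's actual BaseAllocator code.

```java
// Hypothetical sketch of the accounting invariant behind the
// "buffer space + prealloc space + child space != allocated" message.
// Field names are assumptions for illustration, not Drill internals.
public class AllocatorSketch {
    long bufferSpace;   // bytes held by buffers owned by this allocator
    long preallocSpace; // bytes reserved but not yet handed out
    long childSpace;    // bytes delegated to child allocators
    long allocated;     // running total updated on allocate/release

    void verifyOnClose(String name) {
        long accounted = bufferSpace + preallocSpace + childSpace;
        // A leaked or double-counted buffer makes the sum diverge from
        // the running total, which surfaces as an IllegalStateException.
        if (accounted != allocated) {
            throw new IllegalStateException(String.format(
                "allocator[%s]: buffer space (%d) + prealloc space (%d)"
                + " + child space (%d) != allocated (%d)",
                name, bufferSpace, preallocSpace, childSpace, allocated));
        }
    }

    public static void main(String[] args) {
        AllocatorSketch a = new AllocatorSketch();
        a.bufferSpace = 16674816;
        a.allocated = 16740352; // 65536 bytes unaccounted, as in the report
        try {
            a.verifyOnClose("op:21:1:5:HashJoinPOP");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the numbers from the report, `main` reproduces the shape of the logged message; when the sum matches `allocated`, `verifyOnClose` returns silently.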
[jira] [Created] (DRILL-5563) Stop non foreman Drillbit results in IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.
Khurram Faraaz created DRILL-5563:
-------------------------------------

             Summary: Stop non foreman Drillbit results in IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.
                 Key: DRILL-5563
                 URL: https://issues.apache.org/jira/browse/DRILL-5563
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.11.0
         Environment: 3 node CentOS cluster
            Reporter: Khurram Faraaz

Stopping the non-foreman Drillbit normally (as shown below) results in IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.

/opt/mapr/drill/drill-1.11.0/bin/drillbit.sh stop

Drill 1.11.0, commit ID: d11aba2

Details from drillbit.log:

{noformat}
Mon Jun 5 09:29:09 UTC 2017 Terminating drillbit pid 28182
2017-06-05 09:29:09,651 [Drillbit-ShutdownHook#0] INFO  o.apache.drill.exec.server.Drillbit - Received shutdown request.
2017-06-05 09:29:11,691 [pool-6-thread-1] INFO  o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup io.netty.channel.nio.NioEventLoopGroup@55511dc2 in 1004 ms
2017-06-05 09:29:11,691 [pool-6-thread-2] INFO  o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup io.netty.channel.nio.NioEventLoopGroup@4078d750 in 1004 ms
2017-06-05 09:29:11,692 [pool-6-thread-1] INFO  o.a.drill.exec.service.ServiceEngine - closed userServer in 1005 ms
2017-06-05 09:29:11,692 [pool-6-thread-2] INFO  o.a.drill.exec.service.ServiceEngine - closed dataPool in 1005 ms
2017-06-05 09:29:11,701 [Drillbit-ShutdownHook#0] INFO  o.a.drill.exec.compile.CodeCompiler - Stats: code gen count: 21, cache miss count: 7, hit rate: 67%
2017-06-05 09:29:11,709 [Drillbit-ShutdownHook#0] ERROR o.a.d.exec.server.BootStrapContext - Error while closing
java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.
Allocator(ROOT) 0/800/201359872/17179869184 (res/actual/peak/limit)
  child allocators: 4
    Allocator(frag:3:2) 200/0/0/200 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 0
      reservations: 0
    Allocator(frag:4:2) 200/0/0/200 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 0
      reservations: 0
    Allocator(frag:1:2) 200/0/0/200 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 0
      reservations: 0
    Allocator(frag:2:2) 200/0/0/200 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 0
      reservations: 0
  ledgers: 0
  reservations: 0
	at org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:492) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:247) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:159) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
	at org.apache.drill.exec.server.Drillbit$ShutdownThread.run(Drillbit.java:253) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
2017-06-05 09:29:11,709 [Drillbit-ShutdownHook#0] INFO  o.apache.drill.exec.server.Drillbit - Shutdown completed (2057 ms).
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
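The allocator dump in DRILL-5563 shows the ROOT allocator being closed while four fragment allocators (frag:1:2 through frag:4:2) are still registered as children, which is exactly what the close check rejects. The following is a minimal, self-contained sketch of such a parent/child close guard; the `Alloc` class and its structure are illustrative assumptions, not Drill's BaseAllocator implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a parent/child allocator close check that
// mirrors the "closed with outstanding child allocators" failure.
// This is NOT Drill's BaseAllocator; names are for illustration only.
public class AllocatorTree {
    static class Alloc implements AutoCloseable {
        final String name;
        final Alloc parent;
        final Set<Alloc> children = new HashSet<>();

        Alloc(String name, Alloc parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this); // register with parent on creation
            }
        }

        @Override
        public void close() {
            // Refuse to close while child allocators are still open:
            // this is the invariant the shutdown hook violates when it
            // closes ROOT before the fragment allocators are released.
            if (!children.isEmpty()) {
                throw new IllegalStateException(
                    "Allocator[" + name + "] closed with outstanding child allocators.");
            }
            if (parent != null) {
                parent.children.remove(this); // deregister from parent
            }
        }
    }

    public static void main(String[] args) {
        Alloc root = new Alloc("ROOT", null);
        // Fragment allocators still open at shutdown, as in the log dump.
        new Alloc("frag:1:2", root);
        new Alloc("frag:2:2", root);
        new Alloc("frag:3:2", root);
        new Alloc("frag:4:2", root);
        try {
            root.close();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model the fix direction is an ordering one: close (and deregister) every fragment allocator before the shutdown hook closes ROOT, after which `root.close()` succeeds without throwing.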