[jira] [Resolved] (DRILL-4930) Metadata results are not sorted
[ https://issues.apache.org/jira/browse/DRILL-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4930.
    Resolution: Fixed
 Fix Version/s: 1.9.0

> Metadata results are not sorted
> -------------------------------
>
>          Key: DRILL-4930
>          URL: https://issues.apache.org/jira/browse/DRILL-4930
>      Project: Apache Drill
>   Issue Type: Bug
>   Components: Metadata
>     Reporter: Laurent Goujon
>     Assignee: Laurent Goujon
>     Priority: Minor
>      Fix For: 1.9.0
>
> According to JDBC and ODBC specs, metadata results should be ordered.
> Currently, results are unordered.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
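As an illustration of the ordering the issue refers to: the JDBC javadoc for `DatabaseMetaData.getTables` requires rows ordered by TABLE_TYPE, TABLE_CAT, TABLE_SCHEM and TABLE_NAME. A minimal sketch (not Drill's actual fix) of such a composite ordering over metadata rows, modeled here as plain String arrays:

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch: sort getTables-style rows in the order the JDBC spec mandates.
// Column indices follow the getTables result layout; null catalog/schema
// values are legal in JDBC metadata, hence nullsFirst.
public class TableMetadataOrder {
  static final int TABLE_CAT = 0, TABLE_SCHEM = 1, TABLE_NAME = 2, TABLE_TYPE = 3;

  static final Comparator<String[]> JDBC_TABLE_ORDER =
      Comparator.comparing((String[] r) -> r[TABLE_TYPE],
              Comparator.nullsFirst(Comparator.<String>naturalOrder()))
          .thenComparing((String[] r) -> r[TABLE_CAT],
              Comparator.nullsFirst(Comparator.<String>naturalOrder()))
          .thenComparing((String[] r) -> r[TABLE_SCHEM],
              Comparator.nullsFirst(Comparator.<String>naturalOrder()))
          .thenComparing((String[] r) -> r[TABLE_NAME],
              Comparator.nullsFirst(Comparator.<String>naturalOrder()));

  public static void main(String[] args) {
    String[][] rows = {
        {"DRILL", "dfs.tmp", "t2", "TABLE"},
        {"DRILL", "dfs.tmp", "t1", "TABLE"},
        {"DRILL", "sys", "version", "SYSTEM TABLE"},
    };
    Arrays.sort(rows, JDBC_TABLE_ORDER);
    for (String[] r : rows) {
      System.out.println(r[TABLE_TYPE] + " " + r[TABLE_SCHEM] + "." + r[TABLE_NAME]);
    }
  }
}
```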
[jira] [Resolved] (DRILL-4880) Support JDBC driver registration using ServiceLoader
[ https://issues.apache.org/jira/browse/DRILL-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4880.
    Resolution: Fixed

Fixed in [09abcc3|https://github.com/apache/drill/commit/09abcc32cc9d6e3de23d3daf633d34fb6183d0f3]

> Support JDBC driver registration using ServiceLoader
> ----------------------------------------------------
>
>               Key: DRILL-4880
>               URL: https://issues.apache.org/jira/browse/DRILL-4880
>           Project: Apache Drill
>        Issue Type: Bug
>        Components: Client - JDBC
>  Affects Versions: 1.8.0
>       Environment: Windows Server 2012
>          Reporter: Sudip Mukherjee
>          Assignee: Laurent Goujon
>           Fix For: 1.9.0
>
> Currently drill-jdbc-all*.jar doesn't contain a META-INF/services/java.sql.Driver
> file, which is apparently used by the Java ServiceLoader API to discover a service.
> Can the Drill JDBC driver have this file like all the other JDBC drivers, so that
> the driver can be loaded using ServiceLoader instead of a direct Class.forName?
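A quick sketch of what the fix enables: once the jar ships a `META-INF/services/java.sql.Driver` entry naming the driver class (`org.apache.drill.jdbc.Driver`), `DriverManager` discovers it via `ServiceLoader` and no explicit `Class.forName` call is needed. The snippet below only demonstrates the generic discovery mechanism; which drivers it actually finds depends on the classpath.

```java
import java.sql.Driver;
import java.util.ServiceLoader;

// Enumerate every JDBC driver registered through the ServiceLoader
// provider-configuration mechanism on the current classpath.
public class DriverDiscovery {
  public static void main(String[] args) {
    ServiceLoader<Driver> loader = ServiceLoader.load(Driver.class);
    for (Driver d : loader) {
      System.out.println("discovered: " + d.getClass().getName());
    }
    // With the Drill JDBC jar on the classpath, a plain
    // DriverManager.getConnection("jdbc:drill:zk=local") then works
    // without registering the driver manually.
  }
}
```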
[jira] [Resolved] (DRILL-4452) Update avatica version for Drill jdbc
[ https://issues.apache.org/jira/browse/DRILL-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4452.
    Resolution: Fixed
 Fix Version/s: 1.9.0

Fixed in [a888ce6|https://github.com/apache/drill/commit/a888ce6ec289a5ecfe056d4db5da417dd4cc95f5]

> Update avatica version for Drill jdbc
> -------------------------------------
>
>               Key: DRILL-4452
>               URL: https://issues.apache.org/jira/browse/DRILL-4452
>           Project: Apache Drill
>        Issue Type: Sub-task
>        Components: Client - JDBC
>  Affects Versions: 1.5.0
>          Reporter: Laurent Goujon
>          Assignee: Laurent Goujon
>          Priority: Minor
>           Fix For: 1.9.0
>
> Drill depends on a very old version of Avatica (0.9.0/pre-calcite), which
> makes integrating changes harder and harder.
> Although Avatica has evolved to support a custom protocol, with a server
> stub, I believe it is still possible for Drill to use the client part as the
> JDBC facade, with small adjustments.
[jira] [Resolved] (DRILL-4732) Update JDBC driver to use the new prepared statement APIs on DrillClient
[ https://issues.apache.org/jira/browse/DRILL-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4732.
    Resolution: Fixed

Fixed in [e02aa59|https://github.com/apache/drill/commit/e02aa596fa64f38bce773fd108f15032f8086601]

> Update JDBC driver to use the new prepared statement APIs on DrillClient
> ------------------------------------------------------------------------
>
>          Key: DRILL-4732
>          URL: https://issues.apache.org/jira/browse/DRILL-4732
>      Project: Apache Drill
>   Issue Type: Sub-task
>   Components: Metadata
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: 1.8.0
>
> DRILL-4729 is adding a new prepared statement implementation on the server
> side, and it provides APIs on DrillClient to create a new prepared statement
> (which returns metadata along with an opaque handle) and to submit the
> prepared statement for execution.
[jira] [Resolved] (DRILL-4729) Add support for prepared statement implementation on server side
[ https://issues.apache.org/jira/browse/DRILL-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4729.
    Resolution: Fixed

Fixed in [14f6ec7|https://github.com/apache/drill/commit/14f6ec7dd9b010de6c884431e443eb788ce54339].

> Add support for prepared statement implementation on server side
> ----------------------------------------------------------------
>
>          Key: DRILL-4729
>          URL: https://issues.apache.org/jira/browse/DRILL-4729
>      Project: Apache Drill
>   Issue Type: Sub-task
>   Components: Metadata
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: 1.8.0
>
> Currently the Drill JDBC/ODBC driver implements its own prepared statement
> support, which basically issues a LIMIT 0 query to get the metadata and then
> executes the actual query. So the query is planned twice (for the metadata
> fetch and for the actual execution). The proposal is to move that logic to
> the server, where we can make optimizations without disrupting/updating the
> JDBC/ODBC drivers.
> * {{PreparedStatement createPreparedStatement(String query)}}. The
> {{PreparedStatement}} object contains the following:
> ** {{ResultSetMetadata getResultSetMetadata()}}
> *** {{ResultSetMetadata}} contains methods to fetch info about the output
> columns of the query. What info these methods provide is given in this
> [spreadsheet|https://docs.google.com/spreadsheets/d/1A6nqUQo5xJaZDQlDTittpVrK7t4Kylycs3P32Yn_O5k/edit?usp=sharing].
> It lists the ODBC/JDBC requirements and what Drill will provide through the
> {{ResultSetMetadata}} object.
> *** The server can put more info here, which is opaque to the client, and use
> it when the client sends an execute-prepared-statement request.
> Overload the current submit-query API to take the {{PreparedStatement}}
> returned above.
> In the initial implementation, the server-side {{createPreparedStatement}}
> API works as follows:
> * Run the query with {{LIMIT 0}} and get the schema.
> * Convert the query into a binary blob and set it as an opaque object in the
> {{PreparedStatement}}.
> When the {{PreparedStatement}} is submitted for execution, the server
> reconstructs the query from the binary blob in the opaque component of the
> {{PreparedStatement}} and executes it from scratch.
> The opaque component of the {{PreparedStatement}} is where we can save more
> information which we can use for optimizations/speedups.
> NOTE: We are not going to worry about parameters in the prepared query in the
> initial implementation. We can provide that functionality later if there is
> sufficient demand from the Drill community.
> Changes in this patch are going to include protobuf messages, server-side
> handling code and Java client APIs. Native client changes are going to be
> tracked in a separate JIRA.
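The initial scheme described above (query text carried as an opaque binary blob, reconstructed on execute) can be sketched in a few lines. All names here are hypothetical; this is not Drill's actual class.

```java
import java.nio.charset.StandardCharsets;

// Minimal model of the initial server-side prepared statement: the handle
// carries the original SQL as an opaque byte blob that clients never
// interpret; on execute, the server decodes it and plans from scratch.
public class OpaquePreparedStatement {
  final byte[] opaqueHandle;  // server-only payload, opaque to clients

  OpaquePreparedStatement(String query) {
    this.opaqueHandle = query.getBytes(StandardCharsets.UTF_8);
  }

  // What the server does when the client submits the handle for execution.
  String reconstructQuery() {
    return new String(opaqueHandle, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    OpaquePreparedStatement ps =
        new OpaquePreparedStatement("SELECT * FROM sys.version");
    System.out.println(ps.reconstructQuery());
  }
}
```

Because the blob is opaque, the server is free to later embed richer state (for example, a cached plan) without any client-side changes.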
[jira] [Resolved] (DRILL-4603) Refactor FileSystem plugin code to allow customizations
[ https://issues.apache.org/jira/browse/DRILL-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4603.
    Resolution: Won't Fix

> Refactor FileSystem plugin code to allow customizations
> -------------------------------------------------------
>
>          Key: DRILL-4603
>          URL: https://issues.apache.org/jira/browse/DRILL-4603
>      Project: Apache Drill
>   Issue Type: Improvement
>   Components: Storage - Other
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: Future
>
> Currently FileSystemPlugin is hard to extend: a lot of the logic for creating
> component implementations ({{WorkspaceSchemaFactory}}s, {{FormatCreator}}),
> defining default workspaces and configuration (implicit to the FileSystem
> implementation) is hard-coded in the constructor.
> This JIRA is to track:
> * refactoring the FileSystemPlugin to allow custom component implementations
> (Configuration, WorkSpaceSchemaFactory, FileSystemSchemaFactory or
> FormatCreator).
> * sharing a single Hadoop {{Configuration}} object to create new
> {{Configuration}} objects. Creating a new {{Configuration}} without an
> existing copy is not efficient, because it involves scanning the classpath
> for *-site files.
[jira] [Resolved] (DRILL-4728) Add support for new metadata fetch APIs
[ https://issues.apache.org/jira/browse/DRILL-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4728.
    Resolution: Fixed

Resolved in [ef6e522|https://github.com/apache/drill/commit/ef6e522c9cba816110aa43ff6bccedf29a901236]

> Add support for new metadata fetch APIs
> ---------------------------------------
>
>          Key: DRILL-4728
>          URL: https://issues.apache.org/jira/browse/DRILL-4728
>      Project: Apache Drill
>   Issue Type: Sub-task
>   Components: Metadata
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: 1.8.0
>
> Please see the doc attached to the parent JIRA DRILL-4714 for details on APIs.
> Add support for the following APIs (including {{protobuf}} messages, server
> handling code and Java client APIs):
> {code}
> List getCatalogs(Filter catalogNameFilter)
> List getSchemas(
>     Filter catalogNameFilter,
>     Filter schemaNameFilter
> )
> List getTables(
>     Filter catalogNameFilter,
>     Filter schemaNameFilter,
>     Filter tableNameFilter
> )
> List getColumns(
>     Filter catalogNameFilter,
>     Filter schemaNameFilter,
>     Filter tableNameFilter,
>     Filter columnNameFilter
> )
> {code}
> Note: native client changes are not going to be included in this patch. Will
> file a separate JIRA.
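The filter-per-level shape of these APIs can be modeled compactly. This is a hypothetical sketch only: Filter is stood in for by a plain `Predicate<String>`, and the "server" is an in-memory table list rather than Drill's schema tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model of getTables(catalogFilter, schemaFilter, tableFilter): each
// name filter narrows one level of the catalog/schema/table hierarchy.
public class MetadataApiSketch {
  static class Table {
    final String catalog, schema, name;
    Table(String catalog, String schema, String name) {
      this.catalog = catalog; this.schema = schema; this.name = name;
    }
  }

  static List<Table> getTables(List<Table> all,
                               Predicate<String> catalogNameFilter,
                               Predicate<String> schemaNameFilter,
                               Predicate<String> tableNameFilter) {
    List<Table> out = new ArrayList<>();
    for (Table t : all) {
      if (catalogNameFilter.test(t.catalog)
          && schemaNameFilter.test(t.schema)
          && tableNameFilter.test(t.name)) {
        out.add(t);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Table> all = Arrays.asList(
        new Table("DRILL", "dfs.tmp", "orders"),
        new Table("DRILL", "sys", "version"));
    // Only the schema filter is restrictive here; catalog and table pass all.
    List<Table> sys = getTables(all, c -> true, s -> s.equals("sys"), n -> true);
    System.out.println(sys.size());
  }
}
```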
[jira] [Resolved] (DRILL-4785) Limit 0 queries regressed in Drill 1.7.0
[ https://issues.apache.org/jira/browse/DRILL-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4785.
    Resolution: Fixed

> Limit 0 queries regressed in Drill 1.7.0
> -----------------------------------------
>
>               Key: DRILL-4785
>               URL: https://issues.apache.org/jira/browse/DRILL-4785
>           Project: Apache Drill
>        Issue Type: Bug
>        Components: Functions - Drill
>  Affects Versions: 1.7.0
>       Environment: Redhat EL6
>          Reporter: Dechang Gu
>          Assignee: Venki Korukanti
>           Fix For: 1.8.0
>
> We noticed a bunch of limit 0 queries regressed quite a bit: +2500ms, while
> the same queries took ~400ms in Apache Drill 1.6.0, a 5-6X regression.
> Further investigation indicates that the most likely root cause of the
> regression is this commit:
> vkorukanti committed DRILL-4446: Support mandatory work assignment to endpoint requirement…
> commit id: 10afc708600ea9f4cb0e7c2cd981b5b1001fea0d
> With a Drill build on this commit, the query takes 3095ms, and drillbit.log shows:
> 2016-07-15 17:27:55,048 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 28768074-4ed6-a70a-2e6a-add3201ab801:
> SELECT * FROM (SELECT
>   CAST(EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) AS `mn_business_date_ok`,
>   AVG((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS `avg_Calculation_CIDBACJBCCCBHDGB_ok`,
>   SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS `sum_Calculation_CIDBACJBCCCBHDGB_ok`,
>   SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN 1 ELSE NULL END)) AS `sum_Calculation_CJEBBAEBBFADBDFJ_ok`,
>   SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN (`rfm_sales`.`pos_comps` + `rfm_sales`.`pos_promos`) ELSE NULL END)) AS `sum_Net_Sales__YTD___copy__ok`
> FROM `dfs.xxx`.`views/rfm_sales` `rfm_sales`
> GROUP BY CAST(EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER)) T LIMIT 0
> 2016-07-15 17:27:55,664 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 208 ms to read metadata from cache file
> 2016-07-15 17:27:56,783 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 129 ms to read metadata from cache file
> 2016-07-15 17:27:57,960 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
> 2016-07-15 17:27:57,961 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: State to report: RUNNING
> 2016-07-15 17:27:57,989 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: State change requested RUNNING --> FINISHED
> 2016-07-15 17:27:57,989 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: State to report: FINISHED
> While running the same query on the parent commit (commit id 9f4fff800d128878094ae70b454201f79976135d), it only takes 492ms, and drillbit.log shows:
> 2016-07-15 17:19:27,309 ucs-node7.perf.lab [2876826f-ee19-9466-0c0c-869f47c409f8:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 2876826f-ee19-9466-0c0c-869f47c409f8: SELECT * FROM (SELECT CAST(EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) AS `mn_business_date_ok`,AVG((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS
[jira] [Created] (DRILL-4732) Update JDBC driver to use the new prepared statement APIs on DrillClient
Venki Korukanti created DRILL-4732:
-----------------------------------

     Summary: Update JDBC driver to use the new prepared statement APIs on DrillClient
         Key: DRILL-4732
         URL: https://issues.apache.org/jira/browse/DRILL-4732
     Project: Apache Drill
  Issue Type: Sub-task
    Reporter: Venki Korukanti

DRILL-4729 is adding a new prepared statement implementation on the server side, and it provides APIs on DrillClient to create a new prepared statement (which returns metadata along with an opaque handle) and to submit the prepared statement for execution.
[jira] [Created] (DRILL-4730) Update JDBC DatabaseMetaData implementation to use new Metadata APIs
Venki Korukanti created DRILL-4730:
-----------------------------------

     Summary: Update JDBC DatabaseMetaData implementation to use new Metadata APIs
         Key: DRILL-4730
         URL: https://issues.apache.org/jira/browse/DRILL-4730
     Project: Apache Drill
  Issue Type: Sub-task
  Components: Client - JDBC
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.8.0

DRILL-4728 is going to add support for new metadata APIs. Replace the INFORMATION_SCHEMA queries used to get the metadata with the new APIs provided in the Java client.
[jira] [Created] (DRILL-4729) Add support for prepared statement implementation on server side
Venki Korukanti created DRILL-4729:
-----------------------------------

     Summary: Add support for prepared statement implementation on server side
         Key: DRILL-4729
         URL: https://issues.apache.org/jira/browse/DRILL-4729
     Project: Apache Drill
  Issue Type: Sub-task
  Components: Metadata
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.8.0

Currently the Drill JDBC/ODBC driver implements its own prepared statement support, which basically issues a LIMIT 0 query to get the metadata and then executes the actual query. So the query is planned twice (for the metadata fetch and for the actual execution). The proposal is to move that logic to the server, where we can make optimizations without disrupting/updating the JDBC/ODBC drivers.
* {{PreparedStatement createPreparedStatement(String query)}}. The {{PreparedStatement}} object contains the following:
** {{ResultSetMetadata getResultSetMetadata()}}
*** {{ResultSetMetadata}} contains methods to fetch info about the output columns of the query. What info these methods provide is given in this [spreadsheet|https://docs.google.com/spreadsheets/d/1A6nqUQo5xJaZDQlDTittpVrK7t4Kylycs3P32Yn_O5k/edit?usp=sharing]. It lists the ODBC/JDBC requirements and what Drill will provide through the {{ResultSetMetadata}} object.
*** The server can put more info here, which is opaque to the client, and use it when the client sends an execute-prepared-statement request.
Overload the current submit-query API to take the {{PreparedStatement}} returned above.
In the initial implementation, the server-side {{createPreparedStatement}} API works as follows:
* Run the query with {{LIMIT 0}} and get the schema.
* Convert the query into a binary blob and set it as an opaque object in the {{PreparedStatement}}.
When the {{PreparedStatement}} is submitted for execution, the server reconstructs the query from the binary blob in the opaque component of the {{PreparedStatement}} and executes it from scratch.
The opaque component of the {{PreparedStatement}} is where we can save more information which we can use for optimizations/speedups.
NOTE: We are not going to worry about parameters in the prepared query in the initial implementation. We can provide that functionality later if there is sufficient demand from the Drill community.
[jira] [Created] (DRILL-4728) Add support for new metadata fetch APIs
Venki Korukanti created DRILL-4728:
-----------------------------------

     Summary: Add support for new metadata fetch APIs
         Key: DRILL-4728
         URL: https://issues.apache.org/jira/browse/DRILL-4728
     Project: Apache Drill
  Issue Type: Sub-task
  Components: Metadata
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.8.0

Please see the doc attached to the parent JIRA DRILL-4714 for details on APIs. Add support for the following APIs (including {{protobuf}} messages, server handling code and Java client APIs):
{code}
List getCatalogs(Filter catalogNameFilter)
List getSchemas(
    Filter catalogNameFilter,
    Filter schemaNameFilter
)
List getTables(
    Filter catalogNameFilter,
    Filter schemaNameFilter,
    Filter tableNameFilter
)
List getColumns(
    Filter catalogNameFilter,
    Filter schemaNameFilter,
    Filter tableNameFilter,
    Filter columnNameFilter
)
{code}
Note: native client changes are not going to be included in this patch. Will file a separate JIRA.
[jira] [Resolved] (DRILL-4725) Improvements to InfoSchema RecordGenerator needed for DRILL-4714
[ https://issues.apache.org/jira/browse/DRILL-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4725.
    Resolution: Fixed

Fixed in [f70df990|https://git1-us-west.apache.org/repos/asf?p=drill.git;a=commit;h=f70df990].

> Improvements to InfoSchema RecordGenerator needed for DRILL-4714
> ----------------------------------------------------------------
>
>          Key: DRILL-4725
>          URL: https://issues.apache.org/jira/browse/DRILL-4725
>      Project: Apache Drill
>   Issue Type: Sub-task
>   Components: Metadata
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: 1.7.0
>
> 1. Add support for pushing the filter on the following fields into
> InfoSchemaRecordGenerator:
>    - CATALOG_NAME
>    - COLUMN_NAME
> 2. Push down LIKE with ESCAPE. Add a test.
> 3. Add a method visitCatalog() to InfoSchemaRecordGenerator to decide whether
> to explore the catalog or not.
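For reference, the LIKE-with-ESCAPE semantics the pushdown has to honor: `%` matches any run of characters, `_` exactly one, and a character preceded by the escape character is matched literally. A hypothetical helper (not Drill's code) compiling such a pattern to a `java.util.regex.Pattern`:

```java
import java.util.regex.Pattern;

// Compile a SQL LIKE pattern (with an explicit ESCAPE character) to a regex.
public class LikeToRegex {
  static Pattern compile(String like, char escape) {
    StringBuilder re = new StringBuilder();
    for (int i = 0; i < like.length(); i++) {
      char c = like.charAt(i);
      if (c == escape && i + 1 < like.length()) {
        // Escaped character: match it literally, even if it is % or _.
        re.append(Pattern.quote(String.valueOf(like.charAt(++i))));
      } else if (c == '%') {
        re.append(".*");
      } else if (c == '_') {
        re.append('.');
      } else {
        re.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(re.toString());
  }

  public static void main(String[] args) {
    // "100\%" with escape '\' matches only the literal string "100%".
    System.out.println(compile("100\\%", '\\').matcher("100%").matches());
  }
}
```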
[jira] [Created] (DRILL-4725) Improvements to InfoSchema RecordGenerator needed for DRILL-4714
Venki Korukanti created DRILL-4725:
-----------------------------------

     Summary: Improvements to InfoSchema RecordGenerator needed for DRILL-4714
         Key: DRILL-4725
         URL: https://issues.apache.org/jira/browse/DRILL-4725
     Project: Apache Drill
  Issue Type: Sub-task
  Components: Metadata
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti

1. Add support for pushing the filter on the following fields into InfoSchemaRecordGenerator:
   - CATALOG_NAME
   - COLUMN_NAME
2. Push down LIKE with ESCAPE. Add a test.
3. Add a method visitCatalog() to InfoSchemaRecordGenerator to decide whether to explore the catalog or not.
[jira] [Created] (DRILL-4714) Add metadata and prepared statement APIs to DrillClient<->Drillbit interface
Venki Korukanti created DRILL-4714:
-----------------------------------

     Summary: Add metadata and prepared statement APIs to DrillClient<->Drillbit interface
         Key: DRILL-4714
         URL: https://issues.apache.org/jira/browse/DRILL-4714
     Project: Apache Drill
  Issue Type: New Feature
    Reporter: Venki Korukanti

Currently the ODBC/JDBC drivers spawn a set of queries on INFORMATION_SCHEMA for metadata. The client has to deal with submitting a query, reading query results and constructing the required objects. Sometimes the same work is done twice (planning work, in the case of prepared statements) to get the metadata and execute the query. Instead we could simplify the client by providing APIs on the client interface and letting the server construct the required objects and send them to the client directly. These APIs provide common info that can be consumed by the JDBC/ODBC drivers. Will attach a doc explaining the new APIs.
[jira] [Created] (DRILL-4613) Skip the plugin if it throws errors when registering schemas
Venki Korukanti created DRILL-4613:
-----------------------------------

     Summary: Skip the plugin if it throws errors when registering schemas
         Key: DRILL-4613
         URL: https://issues.apache.org/jira/browse/DRILL-4613
     Project: Apache Drill
  Issue Type: Improvement
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti

Currently, when registering schemas in the root schema, if a plugin throws an exception we fail the query. This causes every query to fail, as every query needs a complete schema tree. Plugins could throw exceptions due to transient errors (e.g. a storage server that is temporarily unreachable). If a plugin throws an exception during schema registration, log an error, skip the plugin and continue registering schemas from the rest of the plugins. If the user is querying tables from other plugins, the query should succeed. If the user is querying tables in the skipped plugin, a table-not-found exception is thrown.
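The proposed behavior can be sketched as a loop that registers each plugin's schemas and, on failure, logs and moves on instead of failing the whole schema tree. All names here are illustrative, not Drill's actual classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Build the root schema from all plugins, skipping any plugin whose
// registration throws (e.g. a transiently unreachable storage server).
public class SkipFailingPlugins {
  interface Plugin {
    void registerSchemas(List<String> rootSchema) throws Exception;
  }

  static List<String> buildSchemaTree(Map<String, Plugin> plugins) {
    List<String> rootSchema = new ArrayList<>();
    for (Map.Entry<String, Plugin> e : plugins.entrySet()) {
      try {
        e.getValue().registerSchemas(rootSchema);
      } catch (Exception ex) {
        // Log and continue; tables in this plugin will surface as "not found".
        System.err.println("Failed to register schemas for plugin '"
            + e.getKey() + "', skipping: " + ex.getMessage());
      }
    }
    return rootSchema;
  }

  public static void main(String[] args) {
    Map<String, Plugin> plugins = new LinkedHashMap<>();
    plugins.put("dfs", root -> root.add("dfs.tmp"));
    plugins.put("flaky", root -> { throw new Exception("storage unreachable"); });
    plugins.put("sys", root -> root.add("sys"));
    System.out.println(buildSchemaTree(plugins));
  }
}
```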
[jira] [Resolved] (DRILL-4446) Improve current fragment parallelization module
[ https://issues.apache.org/jira/browse/DRILL-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4446.
    Resolution: Fixed

> Improve current fragment parallelization module
> -----------------------------------------------
>
>               Key: DRILL-4446
>               URL: https://issues.apache.org/jira/browse/DRILL-4446
>           Project: Apache Drill
>        Issue Type: New Feature
>  Affects Versions: 1.5.0
>          Reporter: Venki Korukanti
>          Assignee: Venki Korukanti
>           Fix For: 1.7.0
>
> The current fragment parallelizer, {{SimpleParallelizer.java}}, can't
> correctly handle the case where an operator has a mandatory scheduling
> requirement for a set of DrillbitEndpoints together with an affinity for each
> DrillbitEndpoint (i.e. what portion of the total tasks to schedule on each
> DrillbitEndpoint). It assumes that scheduling requirements are soft (except
> in the Mux and DeMux case, which has a mandatory parallelization requirement
> of 1 unit).
> An example: a cluster has 3 nodes, each running a Drillbit and a storage
> service. Data for a table is present only at the storage services on two
> nodes, so a GroupScan needs to be scheduled on those two nodes in order to
> read the data. The storage service doesn't support (or makes costly) reading
> data from a remote node.
> Inserting the mandatory scheduling requirements into the existing
> SimpleParallelizer is not sufficient, as you may end up with a plan that has
> a fragment with two GroupScans, each having its own hard parallelization
> requirements.
> The proposal is: add a property to each operator which tells what
> parallelization implementation to use. Most operators (such as Project or
> Filter) don't have any particular strategy; they depend on the incoming
> operator. Existing operators which have requirements (all existing
> GroupScans) default to the current parallelizer, {{SimpleParallelizer}}.
> {{Screen}} defaults to the new mandatory-assignment parallelizer.
> It is possible that the generated PhysicalPlan has a fragment with operators
> having different parallelization strategies. In that case an exchange is
> inserted between the operators where a change in parallelization strategy is
> required.
> Will send a detailed design doc.
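A toy model of "mandatory assignment with affinity" (not Drill's SimpleParallelizer): `width` minor fragments must land only on the listed endpoints, split roughly in proportion to each endpoint's affinity (assumed here to sum to 1.0), with the rounding remainder handed out one fragment at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Distribute `width` fragments across required endpoints by affinity.
public class MandatoryAssignment {
  static Map<String, Integer> assign(int width, Map<String, Double> affinity) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    int assigned = 0;
    for (Map.Entry<String, Double> e : affinity.entrySet()) {
      int n = (int) Math.floor(width * e.getValue());
      counts.put(e.getKey(), n);
      assigned += n;
    }
    // Hand out the rounding remainder one fragment at a time.
    for (String node : counts.keySet()) {
      if (assigned >= width) break;
      counts.merge(node, 1, Integer::sum);
      assigned++;
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Double> affinity = new LinkedHashMap<>();
    affinity.put("node1", 0.5);
    affinity.put("node2", 0.5);  // a third node with no data gets nothing
    System.out.println(assign(5, affinity));
  }
}
```

The point of the example: an endpoint absent from the affinity map receives zero fragments, which is exactly the hard requirement the existing soft-assignment parallelizer cannot express.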
[jira] [Created] (DRILL-4603) Refactor FileSystem plugin code to allow customizations
Venki Korukanti created DRILL-4603:
-----------------------------------

     Summary: Refactor FileSystem plugin code to allow customizations
         Key: DRILL-4603
         URL: https://issues.apache.org/jira/browse/DRILL-4603
     Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.7.0

Currently FileSystemPlugin is hard to extend: a lot of the logic for creating component implementations ({{WorkspaceSchemaFactory}}s, {{FormatCreator}}), defining default workspaces and configuration (implicit to the FileSystem implementation) is hard-coded in the constructor.
This JIRA is to track:
* refactoring the FileSystemPlugin to allow custom component implementations (Configuration, WorkSpaceSchemaFactory, FileSystemSchemaFactory or FormatCreator).
* sharing a single Hadoop {{Configuration}} object to create new {{Configuration}} objects. Creating a new {{Configuration}} without an existing copy is not efficient, because it involves scanning the classpath for *-site files.
[jira] [Created] (DRILL-4593) Remove OldAssignmentCreator in FileSystemPlugin
Venki Korukanti created DRILL-4593:
-----------------------------------

     Summary: Remove OldAssignmentCreator in FileSystemPlugin
         Key: DRILL-4593
         URL: https://issues.apache.org/jira/browse/DRILL-4593
     Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti

The AssignmentCreator was changed in DRILL-2725. The old assignment creator was kept as a fallback option in case of any failures in the new one. The new AssignmentCreator was added a year ago and is the default; no problems have been reported. I think it is safe to get rid of the old AssignmentCreator.
[jira] [Resolved] (DRILL-4549) Add support for more truncation units in date_trunc function
[ https://issues.apache.org/jira/browse/DRILL-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4549.
    Resolution: Fixed

> Add support for more truncation units in date_trunc function
> ------------------------------------------------------------
>
>               Key: DRILL-4549
>               URL: https://issues.apache.org/jira/browse/DRILL-4549
>           Project: Apache Drill
>        Issue Type: Improvement
>  Affects Versions: 1.6.0
>          Reporter: Venki Korukanti
>          Assignee: Venki Korukanti
>           Fix For: 1.7.0
>
> Currently we support only {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} truncate
> units for types {{TIME, TIMESTAMP and DATE}}. Extend the functions to support
> {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEK, QUARTER, DECADE, CENTURY,
> MILLENNIUM}} truncate units for types {{TIME, TIMESTAMP, DATE, INTERVAL DAY,
> INTERVAL YEAR}}.
> Also get rid of the if-and-else (on truncation unit) implementation. Instead
> resolve to a direct function based on the truncation unit in the Calcite ->
> Drill (DrillOptiq) expression conversion.
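For illustration, the semantics of two of the new truncation units on DATE values, computed with `java.time` rather than Drill's implementation: QUARTER truncates to the first day of the quarter, DECADE to the first day of the first year of the decade.

```java
import java.time.LocalDate;

// Reference semantics for two of the proposed date_trunc units.
public class DateTruncUnits {
  // date_trunc('QUARTER', d): first day of d's quarter.
  static LocalDate truncQuarter(LocalDate d) {
    int firstMonthOfQuarter = ((d.getMonthValue() - 1) / 3) * 3 + 1;
    return LocalDate.of(d.getYear(), firstMonthOfQuarter, 1);
  }

  // date_trunc('DECADE', d): January 1 of the decade's first year.
  static LocalDate truncDecade(LocalDate d) {
    return LocalDate.of((d.getYear() / 10) * 10, 1, 1);
  }

  public static void main(String[] args) {
    System.out.println(truncQuarter(LocalDate.of(2016, 5, 17))); // 2016-04-01
    System.out.println(truncDecade(LocalDate.of(2016, 5, 17)));  // 2010-01-01
  }
}
```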
[jira] [Created] (DRILL-4550) Add support for more time units in extract function
Venki Korukanti created DRILL-4550:
-----------------------------------

     Summary: Add support for more time units in extract function
         Key: DRILL-4550
         URL: https://issues.apache.org/jira/browse/DRILL-4550
     Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Affects Versions: 1.6.0
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.7.0

Currently the {{extract}} function supports the following units: {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}}. Add support for more units: {{CENTURY, DECADE, DOW, DOY, EPOCH, MILLENNIUM, QUARTER, WEEK}}. We also need changes in the SQL parser; currently the parser only allows {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} as units.
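Rough reference semantics for a few of the proposed units, computed with `java.time` rather than Drill's own functions: QUARTER is 1-4, DOY the 1-based day of year, EPOCH the seconds since 1970-01-01 (taking midnight UTC for a plain date).

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// What extract(QUARTER|DOY|EPOCH FROM date) would return for a DATE value.
public class ExtractUnits {
  static long extractQuarter(LocalDate d) {
    return (d.getMonthValue() - 1) / 3 + 1;
  }

  static long extractDoy(LocalDate d) {
    return d.getDayOfYear();
  }

  static long extractEpoch(LocalDate d) {
    return d.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
  }

  public static void main(String[] args) {
    LocalDate d = LocalDate.of(2016, 3, 1);
    System.out.println(extractQuarter(d)); // 1
    System.out.println(extractDoy(d));     // 61 (2016 is a leap year)
  }
}
```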
[jira] [Created] (DRILL-4549) Add support for more units in date_trunc function
Venki Korukanti created DRILL-4549:
-----------------------------------

     Summary: Add support for more units in date_trunc function
         Key: DRILL-4549
         URL: https://issues.apache.org/jira/browse/DRILL-4549
     Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.6.0
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.7.0

Currently we support only {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} truncate units for types {{TIME, TIMESTAMP and DATE}}. Extend the functions to support {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEK, QUARTER, DECADE, CENTURY, MILLENNIUM}} truncate units for types {{TIME, TIMESTAMP, DATE, INTERVAL DAY, INTERVAL YEAR}}.
Also get rid of the if-and-else (on truncation unit) implementation. Instead resolve to a direct function based on the truncation unit in the Calcite -> Drill (DrillOptiq) expression conversion.
[jira] [Created] (DRILL-4509) Ignore unknown storage plugin configs while starting Drillbit
Venki Korukanti created DRILL-4509:
-----------------------------------

     Summary: Ignore unknown storage plugin configs while starting Drillbit
         Key: DRILL-4509
         URL: https://issues.apache.org/jira/browse/DRILL-4509
     Project: Apache Drill
  Issue Type: Bug
  Components: Server
Affects Versions: 1.5.0
    Reporter: Venki Korukanti
    Priority: Minor
     Fix For: 1.7.0

If zookeeper contains a storage plugin configuration whose implementation is not found while starting the Drillbit, the Drillbit throws an error and fails to restart:
{code}
Could not resolve type id 'newPlugin' into a subtype of [simple type, class org.apache.drill.common.logical.StoragePluginConfig]: known type ids = [InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file, hbase, hive, jdbc, kudu, mock, mongo, named]
{code}
Should we ignore such plugins with a warning in the logs and continue starting the Drillbit?
[jira] [Created] (DRILL-4508) Null proof all AutoCloseable.close() methods
Venki Korukanti created DRILL-4508:
-----------------------------------

     Summary: Null proof all AutoCloseable.close() methods
         Key: DRILL-4508
         URL: https://issues.apache.org/jira/browse/DRILL-4508
     Project: Apache Drill
  Issue Type: Bug
  Components: Server
Affects Versions: 1.5.0
    Reporter: Venki Korukanti
    Priority: Minor
     Fix For: 1.7.0

If the Drillbit fails to start (due to incorrect configuration, storage plugin information not found, etc.), we end up calling close on various components such as WebServer, Drillbit, etc. Some of these components may not have been initialized and may have null values. The close() methods do not check for null values before reading them. One example:
{code}
java.lang.NullPointerException: null
    at org.apache.drill.exec.server.options.SystemOptionManager.close(SystemOptionManager.java:280) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.DrillbitContext.close(DrillbitContext.java:185) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:157) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) ~[drill-common-1.6.0.jar:1.6.0]
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) ~[drill-common-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:149) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:283) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257) [drill-java-exec-1.6.0.jar:1.6.0]
{code}
This masks the actual error (incorrect configuration) and makes it hard to know what went wrong.
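A sketch of the defensive pattern the issue asks for: cleanup must tolerate components that were never initialized, so the original startup failure is not masked by a NullPointerException during close. This is an illustrative helper, not Drill's AutoCloseables class.

```java
// Close each component if (and only if) it was actually initialized,
// logging close failures instead of letting them mask the root cause.
public class SafeClose {
  static void closeQuietly(AutoCloseable... closeables) {
    for (AutoCloseable c : closeables) {
      if (c == null) continue;  // component never got initialized: skip it
      try {
        c.close();
      } catch (Exception e) {
        System.err.println("Failure while closing: " + e);
      }
    }
  }

  public static void main(String[] args) {
    AutoCloseable ok = () -> System.out.println("closed");
    // No NPE even though two of the "components" were never constructed.
    closeQuietly(ok, null, null);
  }
}
```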
[jira] [Resolved] (DRILL-4483) Fix text plan issue in query profiles
[ https://issues.apache.org/jira/browse/DRILL-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4483. Resolution: Fixed Fix Version/s: 1.6.0 > Fix text plan issue in query profiles > - > > Key: DRILL-4483 > URL: https://issues.apache.org/jira/browse/DRILL-4483 > Project: Apache Drill > Issue Type: Bug >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.6.0 > > > Text plan (and visualized plan) in query profiles is empty. As of 1.5.0, we > display text plan (and visualized plan) for SQL queries and CTAS (not for > DirectPlans (alter session, show tables etc.) or explain queries). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4483) Fix text plan issue in query profiles
Venki Korukanti created DRILL-4483: -- Summary: Fix text plan issue in query profiles Key: DRILL-4483 URL: https://issues.apache.org/jira/browse/DRILL-4483 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Text plan (and visualized plan) in query profiles is empty. As of 1.5.0, we display text plan (and visualized plan) for SQL queries and CTAS (not for DirectPlans (alter session, show tables etc.) or explain queries). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4327) Fix rawtypes warning emitted by compiler
[ https://issues.apache.org/jira/browse/DRILL-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4327. Resolution: Fixed Fix Version/s: 1.6.0 > Fix rawtypes warning emitted by compiler > > > Key: DRILL-4327 > URL: https://issues.apache.org/jira/browse/DRILL-4327 > Project: Apache Drill > Issue Type: Improvement >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Minor > Fix For: 1.6.0 > > > The Drill codebase references lots of rawtypes, which generates lots of > warnings from the compiler. > Since Drill is now compiled with Java 1.7, it should use generic types as > much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4354) Remove sessions in anonymous (user auth disabled) mode in WebUI server
[ https://issues.apache.org/jira/browse/DRILL-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4354. Resolution: Fixed > Remove sessions in anonymous (user auth disabled) mode in WebUI server > -- > > Key: DRILL-4354 > URL: https://issues.apache.org/jira/browse/DRILL-4354 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.5.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.6.0 > > > Currently we open anonymous sessions when user auth is disabled. These sessions > are cleaned up when they expire (controlled by boot config > {{drill.exec.http.session_max_idle_secs}}). This may lead to unnecessary > resource accumulation. This JIRA is to remove anonymous sessions and only > have sessions when user authentication is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4410. Resolution: Fixed Fix Version/s: 1.6.0 > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > Fix For: 1.6.0 > > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. > org.apache.drill.exec.vector.UInt1Vector.reAlloc():214 > org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406 > org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298 > org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307 > org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100 > org.apache.drill.exec.vector.complex.ListVector.copyFrom():97 > org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > 
org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1657 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():251 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4383) Allow passing custom configuration options to a file system through the storage plugin config
[ https://issues.apache.org/jira/browse/DRILL-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4383. Resolution: Fixed > Allow passing custom configuration options to a file system through the > storage plugin config > - > > Key: DRILL-4383 > URL: https://issues.apache.org/jira/browse/DRILL-4383 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Other >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.6.0 > > > A similar feature already exists in the Hive and HBase plugins; it simply > provides a key/value map for passing custom configuration options to the > underlying storage system. > This would be useful for the filesystem plugin to configure S3 without > needing to create a core-site.xml file or restart Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4446) Improve current fragment parallelization module
Venki Korukanti created DRILL-4446: -- Summary: Improve current fragment parallelization module Key: DRILL-4446 URL: https://issues.apache.org/jira/browse/DRILL-4446 Project: Apache Drill Issue Type: New Feature Affects Versions: 1.5.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.6.0 The current fragment parallelizer {{SimpleParallelizer.java}} can't correctly handle the case where an operator has a mandatory scheduling requirement for a set of DrillbitEndpoints plus an affinity for each DrillbitEndpoint (i.e., what portion of the total tasks should be scheduled on each DrillbitEndpoint). It assumes that scheduling requirements are soft (except for the Mux and DeMux case, which has a mandatory parallelization requirement of one unit). An example: a cluster has 3 nodes, each running a Drillbit and a storage service. Data for a table is present only at the storage services on two of the nodes, so a GroupScan must be scheduled on those two nodes in order to read the data; the storage service doesn't support (or makes costly) reading data from a remote node. Inserting the mandatory scheduling requirements within the existing SimpleParallelizer is not sufficient, as you may end up with a plan that has a fragment with two GroupScans, each having its own hard parallelization requirements. The proposal: add a property to each operator which tells what parallelization implementation to use. Most operators (such as Project or Filter) don't have any particular strategy; they depend on the incoming operator. Current operators which have requirements (all existing GroupScans) default to the current parallelizer {{SimpleParallelizer}}. {{Screen}} defaults to the new mandatory-assignment parallelizer. The generated PhysicalPlan can have a fragment with operators having different parallelization strategies; in that case an exchange is inserted between operators where a change in parallelization strategy is required. Will send a detailed design doc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
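A minimal sketch of what hard affinity-based assignment could look like. Everything here is hypothetical (class, method, and endpoint names); a real parallelizer would also handle max width, cost, and mixed strategies within one plan. The sketch only shows the core idea: every mandatory endpoint receives at least one fragment, and the split of the total width follows the affinities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of mandatory, affinity-proportional fragment assignment.
public class AffinityAssigner {
    /** Splits `width` fragments across endpoints proportionally to affinity; each endpoint gets at least one. */
    public static Map<String, Integer> assign(Map<String, Double> affinity, int width) {
        if (width < affinity.size()) {
            throw new IllegalArgumentException("width too small for mandatory endpoints");
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Double> e : affinity.entrySet()) {
            int n = Math.max(1, (int) Math.floor(e.getValue() * width)); // hard: at least 1
            out.put(e.getKey(), n);
            assigned += n;
        }
        // Hand out (or claw back) the rounding remainder deterministically.
        for (String key : out.keySet()) {
            if (assigned == width) break;
            if (assigned < width) { out.merge(key, 1, Integer::sum); assigned++; }
            else if (out.get(key) > 1) { out.merge(key, -1, Integer::sum); assigned--; }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> affinity = new LinkedHashMap<>();
        affinity.put("node1", 0.5);
        affinity.put("node2", 0.5);
        System.out.println(assign(affinity, 6)); // {node1=3, node2=3}
    }
}
```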
[jira] [Created] (DRILL-4434) Remove (or deprecate) GroupScan.enforceWidth and use GroupScan.getMinParallelization
Venki Korukanti created DRILL-4434: -- Summary: Remove (or deprecate) GroupScan.enforceWidth and use GroupScan.getMinParallelization Key: DRILL-4434 URL: https://issues.apache.org/jira/browse/DRILL-4434 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti It seems enforceWidth, which is used only in ExcessiveExchangeIdentifier, is not necessary. Instead we should rely on GroupScan.getMinParallelization(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak
Venki Korukanti created DRILL-4353: -- Summary: Expired sessions in web server are not cleaning up resources, leading to resource leak Key: DRILL-4353 URL: https://issues.apache.org/jira/browse/DRILL-4353 Project: Apache Drill Issue Type: Bug Components: Web Server, Client - HTTP Affects Versions: 1.5.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Blocker Fix For: 1.5.0 Currently we store the session resources (including the DrillClient) in an attribute {{SessionAuthentication}} object which implements {{HttpSessionBindingListener}}. Whenever a session is invalidated, all attributes are removed, and if an attribute's class implements {{HttpSessionBindingListener}}, the listener is informed. {{SessionAuthentication}}'s implementation of {{HttpSessionBindingListener}} logs out the user, which includes cleaning up the resources, but {{SessionAuthentication}} relies on the ServletContext stored in a thread-local variable (see [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]). For the thread that cleans up expired sessions, there is no {{ServletContext}} in the thread-local variable, so the user is not logged out properly, leading to a resource leak. Fix: Add an {{HttpSessionEventListener}} to clean up the {{SessionAuthentication}} and resources every time an HttpSession is expired or invalidated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4354) Remove sessions in anonymous (user auth disabled) mode in WebUI server
Venki Korukanti created DRILL-4354: -- Summary: Remove sessions in anonymous (user auth disabled) mode in WebUI server Key: DRILL-4354 URL: https://issues.apache.org/jira/browse/DRILL-4354 Project: Apache Drill Issue Type: Bug Components: Server Affects Versions: 1.5.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.5.0 Currently we open anonymous sessions when user auth is disabled. These sessions are cleaned up when they expire (controlled by boot config {{drill.exec.http.session_max_idle_secs}}). This may lead to unnecessary resource accumulation. This JIRA is to remove anonymous sessions and only have sessions when user authentication is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4328) Fix for backward compatibility regression caused by DRILL-4198
Venki Korukanti created DRILL-4328: -- Summary: Fix for backward compatibility regression caused by DRILL-4198 Key: DRILL-4328 URL: https://issues.apache.org/jira/browse/DRILL-4328 Project: Apache Drill Issue Type: Bug Components: Storage - Other Reporter: Venki Korukanti Assignee: Venki Korukanti Revert updates made to StoragePlugin interface in DRILL-4198. Instead add the new methods to AbstractStoragePlugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3624) Enhance Web UI to be able to select schema ("use")
[ https://issues.apache.org/jira/browse/DRILL-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3624. Resolution: Fixed Fix Version/s: (was: Future) 1.5.0 > Enhance Web UI to be able to select schema ("use") > -- > > Key: DRILL-3624 > URL: https://issues.apache.org/jira/browse/DRILL-3624 > Project: Apache Drill > Issue Type: Wish > Components: Client - HTTP >Affects Versions: 1.1.0 >Reporter: Uwe Geercken >Priority: Minor > Fix For: 1.5.0 > > > It would be advantageous to be able to select a schema ("use") in the Web UI, > so that the information does not always have to be specified in each query. > This could be realized, e.g., through a drop-down where the user selects the > schema from the list of available schemas. The UI should store this > information until a different schema is selected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4169) Upgrade Hive Storage Plugin to work with latest stable Hive (v1.2.1)
[ https://issues.apache.org/jira/browse/DRILL-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4169. Resolution: Fixed > Upgrade Hive Storage Plugin to work with latest stable Hive (v1.2.1) > > > Key: DRILL-4169 > URL: https://issues.apache.org/jira/browse/DRILL-4169 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Affects Versions: 1.4.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.5.0 > > > There have been a few bug fixes in Hive SerDes since Hive 1.0.0. It's good to > update the Hive storage plugin to work with the latest stable Hive version > (1.2.1), so that HiveRecordReader can use the latest SerDes. > Compatibility when working with lower versions (v1.0.0 - currently supported > version) of Hive servers: There are no metastore API changes between Hive > 1.0.0 and Hive 1.2.1 that affect how Drill's Hive storage plugin > interacts with the Hive metastore. Tested to make sure it works fine. So users > can use Drill to query Hive 1.0.0 (currently supported) and Hive 1.2.1 (new > addition in this JIRA). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4198) Enhance StoragePlugin interface to expose logical space rules for planning purpose
[ https://issues.apache.org/jira/browse/DRILL-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4198. Resolution: Fixed > Enhance StoragePlugin interface to expose logical space rules for planning > purpose > -- > > Key: DRILL-4198 > URL: https://issues.apache.org/jira/browse/DRILL-4198 > Project: Apache Drill > Issue Type: Improvement >Reporter: Venki Korukanti >Assignee: Venki Korukanti > > Currently StoragePlugins can only expose rules that are executed in physical > space. Add an interface method to StoragePlugin to expose logical space rules > to planner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4194) Improve the performance of metadata fetch operation in HiveScan
[ https://issues.apache.org/jira/browse/DRILL-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4194. Resolution: Fixed > Improve the performance of metadata fetch operation in HiveScan > --- > > Key: DRILL-4194 > URL: https://issues.apache.org/jira/browse/DRILL-4194 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.4.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.5.0 > > > Currently HiveScan fetches the InputSplits for all partitions when {{HiveScan}} > is created. This causes long delays when the table contains a large number of > partitions. If we end up pruning a majority of partitions, this delay is > unnecessary. > We need this InputSplits info from the beginning of planning because > * it is used in calculating the cost of the {{HiveScan}}. Currently, when > calculating the cost, we first look at the rowCount (from the Hive MetaStore); if > it is available we use it in the cost calculation. Otherwise we estimate the > rowCount from InputSplits. > * We also need the InputSplits for determining whether {{HiveScan}} is a > singleton or distributed for adding appropriate traits in {{ScanPrule}} > The fix is to delay the loading of the InputSplits until we need them. There are two > cases where we need it. If we end up fetching the InputSplits, store them > until the query completes. > * If the stats are not available, then we need InputSplits > * If the partition is not pruned we need it for parallelization purposes. > Regarding getting the parallelization info in {{ScanPrule}}: Had a discussion > with [~amansinha100]. All we need at this point is whether the data is > distributed or singleton. Added a method {{isSingleton()}} to > GroupScan. Returning {{false}} seems to work fine for HiveScan, but I am not > sure of the implications here. 
We also have {{ExcessiveExchangeIdentifier}} > which removes unnecessary exchanges by looking at the parallelization info. I > think it is ok to return the parallelization info here as the pruning must > have already completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4198) Enhance StoragePlugin interface to expose logical space rules for planning purpose
Venki Korukanti created DRILL-4198: -- Summary: Enhance StoragePlugin interface to expose logical space rules for planning purpose Key: DRILL-4198 URL: https://issues.apache.org/jira/browse/DRILL-4198 Project: Apache Drill Issue Type: Improvement Reporter: Venki Korukanti Assignee: Venki Korukanti Currently StoragePlugins can only expose rules that are executed in physical space. Add an interface method to StoragePlugin to expose logical space rules to planner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4194) Improve the performance of metadata fetch operation in HiveScan
Venki Korukanti created DRILL-4194: -- Summary: Improve the performance of metadata fetch operation in HiveScan Key: DRILL-4194 URL: https://issues.apache.org/jira/browse/DRILL-4194 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 1.4.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.5.0 Currently HiveScan fetches the InputSplits for all partitions when {{HiveScan}} is created. This causes long delays when the table contains a large number of partitions. If we end up pruning a majority of partitions, this delay is unnecessary. We need this InputSplits info from the beginning of planning because * it is used in calculating the cost of the {{HiveScan}}. Currently, when calculating the cost, we first look at the rowCount (from the Hive MetaStore); if it is available we use it in the cost calculation. Otherwise we estimate the rowCount from InputSplits. * We also need the InputSplits for determining whether {{HiveScan}} is a singleton or distributed for adding appropriate traits in {{ScanPrule}} The fix is to delay the loading of the InputSplits until we need them. There are two cases where we need it. If we end up fetching the InputSplits, store them until the query completes. * If the stats are not available, then we need InputSplits * If the partition is not pruned we need it for parallelization purposes. Regarding getting the parallelization info in {{ScanPrule}}: Had a discussion with [~amansinha100]. All we need at this point is whether the data is distributed or singleton. Added a method {{isSingleton()}} to GroupScan. Returning {{false}} seems to work fine for HiveScan, but I am not sure of the implications here. We also have {{ExcessiveExchangeIdentifier}} which removes unnecessary exchanges by looking at the parallelization info. I think it is ok to return the parallelization info here as the pruning must have already completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
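The deferred-fetch-and-cache part of the fix described above is essentially memoization: run the expensive InputSplit fetch at most once, on first use, and keep the result for the life of the query. A generic sketch, not HiveScan's actual code (`memoize` is a hypothetical helper):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch: wrap an expensive loader so it runs lazily and at most once.
public class LazySplits {
    public static <T> Supplier<T> memoize(Supplier<T> loader) {
        return new Supplier<T>() {
            private T value;
            private boolean loaded;
            @Override public synchronized T get() {
                if (!loaded) { value = loader.get(); loaded = true; }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        Supplier<List<String>> splits = memoize(() -> {
            fetches.incrementAndGet(); // stands in for the metastore round-trips
            return List.of("split-0", "split-1");
        });
        // Planning may consult the splits several times; the fetch happens once.
        splits.get();
        splits.get();
        System.out.println(fetches.get()); // 1
    }
}
```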
[jira] [Resolved] (DRILL-4165) IllegalStateException in MergeJoin for a query against TPC-DS data
[ https://issues.apache.org/jira/browse/DRILL-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4165. Resolution: Fixed Fix Version/s: 1.4.0 > IllegalStateException in MergeJoin for a query against TPC-DS data > -- > > Key: DRILL-4165 > URL: https://issues.apache.org/jira/browse/DRILL-4165 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.4.0 >Reporter: Aman Sinha >Assignee: amit hadke > Fix For: 1.4.0 > > > I am seeing the following on the 1.4.0 branch. > {noformat} > 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false; > .. > 0: jdbc:drill:zk=local> select count(*) from dfs.`tpcds/store_sales` ss1, > dfs.`tpcds/store_sales` ss2 where ss1.ss_customer_sk = ss2.ss_customer_sk and > ss1.ss_store_sk = 1 and ss2.ss_store_sk = 2; > Error: SYSTEM ERROR: IllegalStateException: Incoming batch [#55, > MergeJoinBatch] has size 1984616, which is beyond the limit of 65536 > Fragment 0:0 > [Error Id: 18bf00fe-52d7-4d84-97ec-b04a035afb4e on 192.168.1.103:31010] > (java.lang.IllegalStateException) Incoming batch [#55, MergeJoinBatch] has > size 1984616, which is beyond the limit of 65536 > > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():305 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4159) TestCsvHeader sometimes fails due to ordering issue
[ https://issues.apache.org/jira/browse/DRILL-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4159. Resolution: Fixed Fix Version/s: 1.4.0 > TestCsvHeader sometimes fails due to ordering issue > --- > > Key: DRILL-4159 > URL: https://issues.apache.org/jira/browse/DRILL-4159 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 1.4.0 > > > This test should be rewritten to use the query test framework, rather than > doing a string comparison of the entire result set. And it should be > specified as unordered, so that results aren't affected by the random order > in which files are read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4047) Select with options
[ https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4047. Resolution: Fixed Fix Version/s: 1.4.0 > Select with options > --- > > Key: DRILL-4047 > URL: https://issues.apache.org/jira/browse/DRILL-4047 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Relational Operators >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Fix For: 1.4.0 > > > Add a mechanism to pass parameters down to the StoragePlugin when writing a > Select statement. > Some discussion here: > http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E > http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4053) Reduce metadata cache file size
[ https://issues.apache.org/jira/browse/DRILL-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4053. Resolution: Fixed > Reduce metadata cache file size > --- > > Key: DRILL-4053 > URL: https://issues.apache.org/jira/browse/DRILL-4053 > Project: Apache Drill > Issue Type: Improvement > Components: Metadata >Affects Versions: 1.3.0 >Reporter: Parth Chandra >Assignee: Parth Chandra > Fix For: 1.4.0 > > > The parquet metadata cache file has a fair amount of redundant metadata that > causes the size of the cache file to bloat. Two things that we can reduce are: > 1) Schema is repeated for every row group. We can keep a merged schema > (similar to what was discussed for the insert-into functionality) 2) The max and > min values in the stats are used for partition pruning when the values are the > same. We can keep only the maxValue, and only when it is the same as > the minValue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
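The second reduction in DRILL-4053 can be sketched as a tiny serialization rule. This is an illustration of the idea only; the field name `val` and the `compact` method are hypothetical, not the cache file's actual schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: since min/max stats only matter for pruning when they are equal,
// store a single value in that case and drop the pair otherwise.
public class ColumnStatsCompaction {
    /** Returns a compact stats map: {"val": v} when min == max, empty otherwise. */
    public static Map<String, Object> compact(Object min, Object max) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (min != null && min.equals(max)) {
            out.put("val", max); // a single value is enough for pruning
        }
        return out; // unequal min/max carry no pruning value here, so omit both
    }

    public static void main(String[] args) {
        System.out.println(compact(7, 7)); // {val=7}
        System.out.println(compact(1, 9)); // {}
    }
}
```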
[jira] [Resolved] (DRILL-4108) Query on csv file w/ header fails with an exception when non-existing column is requested
[ https://issues.apache.org/jira/browse/DRILL-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4108. Resolution: Fixed Assignee: Abhijit Pol > Query on csv file w/ header fails with an exception when non-existing column > is requested > - > > Key: DRILL-4108 > URL: https://issues.apache.org/jira/browse/DRILL-4108 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.3.0 >Reporter: Abhi Pol >Assignee: Abhijit Pol > Fix For: 1.4.0 > > > A Drill query on a csv file with a header, requesting column(s) that do not exist > in the header, fails with an exception. > *Current behavior:* once extractHeader is enabled, query columns must be > columns from the header > *Expected behavior:* non-existing columns should appear with 'null' values, > matching default Drill behavior > {noformat} > 0: jdbc:drill:zk=local> select Category from dfs.`/tmp/cars.csvh` limit 10; > java.lang.ArrayIndexOutOfBoundsException: -1 > at > org.apache.drill.exec.store.easy.text.compliant.FieldVarCharOutput.<init>(FieldVarCharOutput.java:104) > at > org.apache.drill.exec.store.easy.text.compliant.CompliantTextRecordReader.setup(CompliantTextRecordReader.java:118) > at > org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:108) > at > org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:198) > at > org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:35) > at > org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:151) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174) > at > 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:105) > at > org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230) > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: -1 > Fragment 0:0 > [Error Id: f272960e-fa2f-408e-918c-722190398cd3 on blackhole:31010] > (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
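The expected behavior in DRILL-4108 (a requested column missing from the CSV header yields nulls rather than an `ArrayIndexOutOfBoundsException: -1`) can be modeled with the projection step alone. This sketch is not the `CompliantTextRecordReader` implementation; the `project` method is a hypothetical stand-in for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: project one data row onto the requested columns, null-filling misses.
public class CsvProjection {
    public static List<String> project(List<String> header, List<String> requested, List<String> row) {
        List<String> out = new ArrayList<>();
        for (String col : requested) {
            int idx = header.indexOf(col);
            // indexOf returns -1 for an unknown column; emit null instead of indexing with it.
            out.add(idx >= 0 ? row.get(idx) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("make", "model");
        List<String> row = Arrays.asList("Tesla", "S");
        System.out.println(project(header, Arrays.asList("make", "Category"), row)); // [Tesla, null]
    }
}
```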
[jira] [Resolved] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4081. Resolution: Fixed Fix Version/s: 1.4.0 > Handle schema changes in ExternalSort > - > > Key: DRILL-4081 > URL: https://issues.apache.org/jira/browse/DRILL-4081 > Project: Apache Drill > Issue Type: Improvement >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 1.4.0 > > > This improvement will make use of the Union vector to handle schema changes. > When a new schema appears, the schema will be "merged" with the previous > schema. The result will be a new schema that uses Union type to store the > columns where there is a type conflict. All of the batches (including the > batches that have already arrived) will be coerced into this new schema. > A new comparison function will be included to handle the comparison of Union > type. Comparison of union type will work as follows: > 1. All numeric types can be mutually compared, and will be compared using > Drill implicit cast rules. > 2. All other types will not be compared against other types, but only among > values of the same type. > 3. There will be an overall precedence of types with regard to ordering. > This precedence is not yet defined, but will be as part of the work on this > issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
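The three comparison rules above can be sketched as a single comparator. Note the issue explicitly leaves the type precedence undefined, so the "numbers before strings" ranking below is an assumption for illustration, and the class is hypothetical, not Drill's generated comparison code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of the union-type comparison rules: numerics compare mutually by
// value, other types compare only within themselves, cross-type order comes
// from an (assumed) overall precedence.
public class UnionComparator implements Comparator<Object> {
    private static int rank(Object v) {
        return (v instanceof Number) ? 0 : 1; // assumed precedence: numeric < varchar
    }

    @Override public int compare(Object a, Object b) {
        int byRank = Integer.compare(rank(a), rank(b));
        if (byRank != 0) return byRank;                 // rule 3: cross-type order by precedence
        if (a instanceof Number) {                      // rule 1: numerics compare mutually
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        return a.toString().compareTo(b.toString());    // rule 2: same-type comparison only
    }

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>(Arrays.<Object>asList("b", 2L, "a", 1.5));
        values.sort(new UnionComparator());
        System.out.println(values); // [1.5, 2, a, b]
    }
}
```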
[jira] [Resolved] (DRILL-4094) Respect -DskipTests=true for JDBC plugin tests
[ https://issues.apache.org/jira/browse/DRILL-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4094. Resolution: Fixed Fix Version/s: 1.4.0 > Respect -DskipTests=true for JDBC plugin tests > -- > > Key: DRILL-4094 > URL: https://issues.apache.org/jira/browse/DRILL-4094 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Reporter: Andrew >Assignee: Andrew >Priority: Trivial > Fix For: 1.4.0 > > > The maven config for the JDBC storage plugin does not respect the -DskipTests > option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs
[ https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4124. Resolution: Fixed Fix Version/s: 1.4.0 > Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise > in logs > --- > > Key: DRILL-4124 > URL: https://issues.apache.org/jira/browse/DRILL-4124 > Project: Apache Drill > Issue Type: Improvement >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3938) Hive: Failure reading from a partition when a new column is added to the table after the partition creation
[ https://issues.apache.org/jira/browse/DRILL-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3938. Resolution: Fixed > Hive: Failure reading from a partition when a new column is added to the > table after the partition creation > --- > > Key: DRILL-3938 > URL: https://issues.apache.org/jira/browse/DRILL-3938 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 0.4.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.4.0 > > > Repro: > From Hive: > {code} > CREATE TABLE kv(key INT, value STRING); > LOAD DATA LOCAL INPATH > '/Users/hadoop/apache-repos/hive-install/apache-hive-1.0.0-bin/examples/files/kv1.txt' > INTO TABLE kv; > CREATE TABLE kv_p(key INT, value STRING, part1 STRING); > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.exec.max.dynamic.partitions=1; > set hive.exec.max.dynamic.partitions.pernode=1; > INSERT INTO TABLE kv_p PARTITION (part1) SELECT key, value, value as s FROM > kv; > ALTER TABLE kv_p ADD COLUMNS (newcol STRING); > {code} > From Drill: > {code} > USE hive; > DESCRIBE kv_p; > SELECT newcol FROM kv_p; > throws column 'newcol' not found error in HiveRecordReader while selecting > only the projected columns. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3893) Issue with Drill after Hive Alters the Table
[ https://issues.apache.org/jira/browse/DRILL-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3893. Resolution: Fixed Assignee: Venki Korukanti Fix Version/s: 1.4.0 > Issue with Drill after Hive Alters the Table > - > > Key: DRILL-3893 > URL: https://issues.apache.org/jira/browse/DRILL-3893 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Hive, Storage - Hive >Affects Versions: 1.0.0, 1.1.0 > Environment: DEV >Reporter: arnab chatterjee >Assignee: Venki Korukanti > Fix For: 1.4.0 > > > I reproduced this again on another partitioned table with existing data. > Providing some more details. I have enabled the verbose mode for errors. > Drill is unable to fetch the new column name that was introduced. This most > likely means that it is still picking up stale Hive metadata. > if (!tableColumns.contains(columnName)) { > if (partitionNames.contains(columnName)) { > selectedPartitionNames.add(columnName); > } else { > throw new ExecutionSetupException(String.format("Column %s does > not exist", columnName)); > } > } > select testdata from testtable; > Error: SYSTEM ERROR: ExecutionSetupException: Column testdata does not exist > Fragment 0:0 > [Error Id: be5cccba-97f6-4cc4-94e8-c11a4c53c8f4 on x.x.com:] > (org.apache.drill.common.exceptions.ExecutionSetupException) Failure while > initializing HiveRecordReader: Column testdata does not exist > org.apache.drill.exec.store.hive.HiveRecordReader.init():241 > org.apache.drill.exec.store.hive.HiveRecordReader.():138 > org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58 > org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34 > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():150 > org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 > org.apache.drill.exec.physical.impl.ImplCreator.getExec():81 > 
org.apache.drill.exec.work.fragment.FragmentExecutor.run():235 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 > Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) > Column testdata does not exist > org.apache.drill.exec.store.hive.HiveRecordReader.init():206 > org.apache.drill.exec.store.hive.HiveRecordReader.():138 > org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58 > org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34 > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():150 > org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 > org.apache.drill.exec.physical.impl.ImplCreator.getExec():81 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():235 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) > # > Please note that this is a partitioned table with existing data. > Does Drill Cache the Meta somewhere and hence it’s not getting reflected > immediately ? > DRILL CLI > > select x from xx; > Error: SYSTEM ERROR: ExecutionSetupException: Column x does not exist > Fragment 0:0 > [Error Id: 62086e22-1341-459e-87ce-430a24cc5119 on x.x.com:999] > (state=,code=0) > HIVE CLI > hive> describe formatted x; > OK > # col_name data_type comment > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3739) NPE on select from Hive for HBase table
[ https://issues.apache.org/jira/browse/DRILL-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3739. Resolution: Fixed Assignee: Venki Korukanti Fix Version/s: 1.4.0 > NPE on select from Hive for HBase table > --- > > Key: DRILL-3739 > URL: https://issues.apache.org/jira/browse/DRILL-3739 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: ckran >Assignee: Venki Korukanti >Priority: Critical > Fix For: 1.4.0 > > > For a table in HBase or MapR-DB with metadata created in Hive so that it can > be accessed through beeline or Hue, queries from Drill fail with > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > NullPointerException [Error Id: 1cfd2a36-bc73-4a36-83ee-ac317b8e6cdb] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4111) turn tests off in travis as they don't work there
[ https://issues.apache.org/jira/browse/DRILL-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4111. Resolution: Fixed Fix Version/s: 1.4.0 > turn tests off in travis as they don't work there > - > > Key: DRILL-4111 > URL: https://issues.apache.org/jira/browse/DRILL-4111 > Project: Apache Drill > Issue Type: Task >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Fix For: 1.4.0 > > > Since the travis build always fails, we should just turn it off for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3980) Build failure in -Pmapr profile (due to DRILL-3749)
Venki Korukanti created DRILL-3980: -- Summary: Build failure in -Pmapr profile (due to DRILL-3749) Key: DRILL-3980 URL: https://issues.apache.org/jira/browse/DRILL-3980 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Currently the mapr profile relies on an older version (< 2.7.1) of Hadoop, causing compile failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3938) Hive: Failure reading from a partition when a new column is added to the table after the partition creation
Venki Korukanti created DRILL-3938: -- Summary: Hive: Failure reading from a partition when a new column is added to the table after the partition creation Key: DRILL-3938 URL: https://issues.apache.org/jira/browse/DRILL-3938 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 0.4.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.3.0 Repro: From Hive: {code} CREATE TABLE kv(key INT, value STRING); LOAD DATA LOCAL INPATH '/Users/hadoop/apache-repos/hive-install/apache-hive-1.0.0-bin/examples/files/kv1.txt' INTO TABLE kv; CREATE TABLE kv_p(key INT, value STRING, part1 STRING); set hive.exec.dynamic.partition.mode=nonstrict; set hive.exec.max.dynamic.partitions=1; set hive.exec.max.dynamic.partitions.pernode=1; INSERT INTO TABLE kv_p PARTITION (part1) SELECT key, value, value as s FROM kv; ALTER TABLE kv_p ADD COLUMNS (newcol STRING); {code} From Drill: {code} USE hive; DESCRIBE kv_p; SELECT newcol FROM kv_p; throws column 'newcol' not found error in HiveRecordReader while selecting only the projected columns. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3931) Upgrade fileclient dependency in mapr profile
Venki Korukanti created DRILL-3931: -- Summary: Upgrade fileclient dependency in mapr profile Key: DRILL-3931 URL: https://issues.apache.org/jira/browse/DRILL-3931 Project: Apache Drill Issue Type: Improvement Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.3.0 Current dependency version is 4.1.0-mapr. There is a critical fix that went into 4.1.0.34989-mapr. Upgrade the dependency version to 4.1.0.34989-mapr. Only pom file changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3911) Upgrade Hadoop from 2.4.1 to latest stable
[ https://issues.apache.org/jira/browse/DRILL-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3911. Resolution: Duplicate > Upgrade Hadoop from 2.4.1 to latest stable > -- > > Key: DRILL-3911 > URL: https://issues.apache.org/jira/browse/DRILL-3911 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Other >Reporter: Andrew >Assignee: Andrew > Fix For: 1.3.0 > > > Later versions of Hadoop have improved S3 compatibility > (https://issues.apache.org/jira/browse/HADOOP-10400). > Since users are increasingly using Drill with S3, we should upgrade our > Hadoop dependency so we can get the best integration possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3884) Hive native scan has lower parallelization leading to performance degradation
[ https://issues.apache.org/jira/browse/DRILL-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3884. Resolution: Fixed > Hive native scan has lower parallelization leading to performance degradation > - > > Key: DRILL-3884 > URL: https://issues.apache.org/jira/browse/DRILL-3884 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Hive >Affects Versions: 1.2.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti >Priority: Critical > Fix For: 1.2.0 > > > Currently {{HiveDrillNativeParquetScan.getScanStats()}} divides the rowCount > obtained from {{HiveScan}} by a factor and returns that as cost. The problem is that all > cost calculations and parallelization depend on the rowCount. Value > {{cpuCost}} is not taken into consideration in current cost calculations in > {{ScanPrel}}. In order for the planner to choose > {{HiveDrillNativeParquetScan}} over {{HiveScan}}, rowCount has to be lowered > for the former, but this leads to lower parallelization and performance > degradation. > Temporary fix for Drill 1.2 before DRILL-3856 fully resolves considering the CPU > cost in the cost model: > 1. Change ScanPrel to consider the CPU cost in given Stats from GroupScan > 2. Have higher CPU cost for {{HiveScan}} (SerDe route) > 3. Lower CPU cost for {{HiveDrillNativeParquetScan}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3884) Hive native scan has lower parallelization leading to performance degradation
Venki Korukanti created DRILL-3884: -- Summary: Hive native scan has lower parallelization leading to performance degradation Key: DRILL-3884 URL: https://issues.apache.org/jira/browse/DRILL-3884 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization, Storage - Hive Affects Versions: 1.2.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Critical Fix For: 1.2.0 Currently {{HiveDrillNativeParquetScan.getScanStats()}} divides the rowCount obtained from {{HiveScan}} by a factor and returns that as cost. The problem is that all cost calculations and parallelization depend on the rowCount. Value {{cpuCost}} is not taken into consideration in current cost calculations in {{ScanPrel}}. In order for the planner to choose {{HiveDrillNativeParquetScan}} over {{HiveScan}}, rowCount has to be lowered for the former, but this leads to lower parallelization and performance degradation. Temporary fix for Drill 1.2 before DRILL-3856 fully resolves considering the CPU cost in the cost model: 1. Change ScanPrel to consider the CPU cost in given Stats from GroupScan 2. Have higher CPU cost for {{HiveScan}} (SerDe route) 3. Lower CPU cost for {{HiveDrillNativeParquetScan}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3852) HiveScan is not expanding '*'
Venki Korukanti created DRILL-3852: -- Summary: HiveScan is not expanding '*' Key: DRILL-3852 URL: https://issues.apache.org/jira/browse/DRILL-3852 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization, Storage - Hive Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.3.0 Any {{SELECT \*}} query on a Hive table does not expand the '\*' into columns in HiveScan. This forces the execution code to handle the '\*' separately. Since the schema is known for Hive tables, we should expand the '\*' during planning to avoid that complexity in the execution code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3857) Enhance HiveDrillNativeParquetScan and related classes to support multiple formats.
Venki Korukanti created DRILL-3857: -- Summary: Enhance HiveDrillNativeParquetScan and related classes to support multiple formats. Key: DRILL-3857 URL: https://issues.apache.org/jira/browse/DRILL-3857 Project: Apache Drill Issue Type: Sub-task Components: Query Planning & Optimization, Storage - Hive Affects Versions: 1.2.0 Reporter: Venki Korukanti Currently DRILL-3209 only adds support for reading Hive parquet tables using Drill's native parquet reader. It would be better to abstract out or define clear interfaces so that other formats such as text or Avro can be extended to use Drill's native readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3746) Hive query fails if the table contains external partitions
[ https://issues.apache.org/jira/browse/DRILL-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3746. Resolution: Fixed > Hive query fails if the table contains external partitions > -- > > Key: DRILL-3746 > URL: https://issues.apache.org/jira/browse/DRILL-3746 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.2.0 > > > If Hive contains a table which has external partitions, Drill fails in > partition pruning code, which causes the query to fail. > {code} > CREATE TABLE external_partition_test (boolean_field BOOLEAN) PARTITIONED BY > (boolean_part BOOLEAN); > ALTER TABLE external_partition_test ADD PARTITION (boolean_part='true') > LOCATION '/some/path'; > ALTER TABLE external_partition_test ADD PARTITION (boolean_part='false') > LOCATION '/some/path'; > {code} > Query: > {code} > SELECT * FROM hive.`default`.external_partition_test where boolean_part = > false > {code} > Exception: > {code} > java.lang.StringIndexOutOfBoundsException > String index out of range: -14 > at java.lang.String.substring(String.java:1875) ~[na:1.7.0_45] > at > org.apache.drill.exec.planner.sql.HivePartitionLocation.(HivePartitionLocation.java:31) > ~[classes/:na] > at > org.apache.drill.exec.planner.sql.HivePartitionDescriptor.getPartitions(HivePartitionDescriptor.java:117) > ~[classes/:na] > at > org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:185) > ~[classes/:na] > at > org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2.onMatch(HivePushPartitionFilterIntoScan.java:92) > ~[classes/:na] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > ~[calcite-core-1.4.0-drill-r0.jar:1.4.0-drill-r0] > {code} > Looking at {{HivePartitionLocation}}, it looks like we are depending on the > organization of files on FileSystem to get the 
partition values. We should > get the partition values from MetaStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3746) Hive partition pruning fails if the table contains external partitions
Venki Korukanti created DRILL-3746: -- Summary: Hive partition pruning fails if the table contains external partitions Key: DRILL-3746 URL: https://issues.apache.org/jira/browse/DRILL-3746 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti If Hive contains a table which has external partitions, Drill fails in partition pruning code, which causes the query to fail. {code} CREATE TABLE external_partition_test (boolean_field BOOLEAN) PARTITIONED BY (boolean_part BOOLEAN); ALTER TABLE external_partition_test ADD PARTITION (boolean_part='true') LOCATION '/some/path'; ALTER TABLE external_partition_test ADD PARTITION (boolean_part='false') LOCATION '/some/path'; {code} Exception: {code} java.lang.StringIndexOutOfBoundsException String index out of range: -14 at java.lang.String.substring(String.java:1875) ~[na:1.7.0_45] at org.apache.drill.exec.planner.sql.HivePartitionLocation.(HivePartitionLocation.java:31) ~[classes/:na] at org.apache.drill.exec.planner.sql.HivePartitionDescriptor.getPartitions(HivePartitionDescriptor.java:117) ~[classes/:na] at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:185) ~[classes/:na] at org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2.onMatch(HivePushPartitionFilterIntoScan.java:92) ~[classes/:na] at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) ~[calcite-core-1.4.0-drill-r0.jar:1.4.0-drill-r0] {code} Looking at {{HivePartitionLocation}}, it looks like we are depending on the structure of the files on FileSystem to get the partition values. We should get these partition values from the MetaStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3749) Upgrade Hadoop dependency to latest version (2.7.1)
Venki Korukanti created DRILL-3749: -- Summary: Upgrade Hadoop dependency to latest version (2.7.1) Key: DRILL-3749 URL: https://issues.apache.org/jira/browse/DRILL-3749 Project: Apache Drill Issue Type: New Feature Components: Tools, Build & Test Affects Versions: 1.1.0 Reporter: Venki Korukanti Assignee: Steven Phillips Fix For: 1.3.0 Logging a JIRA to track and discuss upgrading Drill's Hadoop dependency version. Currently Drill depends on Hadoop 2.5.0. The newer Hadoop version (2.7.1) has the following features: 1) Better S3 support 2) Ability to check whether a user has certain permissions on a file/directory without performing operations on the file/dir. Useful for cases like DRILL-3467. As Drill is going to use a higher version of the Hadoop fileclient, there could be potential issues when interacting with Hadoop services (such as HDFS) of a lower version than the fileclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3725) Add HTTPS support for Drill web interface
Venki Korukanti created DRILL-3725: -- Summary: Add HTTPS support for Drill web interface Key: DRILL-3725 URL: https://issues.apache.org/jira/browse/DRILL-3725 Project: Apache Drill Issue Type: New Feature Components: Client - HTTP Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.2.0 Currently the web UI and REST API calls don't support transport layer security (TLS). This JIRA is to add support for TLS. We need this feature before adding user authentication to Drill's web interface. The proposal is: * Always default to HTTPS * The cluster admin can set the following SSL configuration to specify their own keystore and/or truststore: ** javax.net.ssl.keyStore ** javax.net.ssl.keyStorePassword ** javax.net.ssl.trustStore ** javax.net.ssl.trustStorePassword * If the cluster admin didn't specify the above SSL config, generate a self-signed certificate programmatically using libraries such as [Bouncy Castle|http://www.bouncycastle.org/]. * Make use of the Jetty APIs to add an HTTPS connector. An example is [here|http://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/tree/examples/embedded/src/main/java/org/eclipse/jetty/embedded/LikeJettyXml.java]. Let me know if you have any comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
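A minimal sketch of the proposed fallback logic, assuming the standard JSSE javax.net.ssl.* system-property names; the Jetty connector and Bouncy Castle certificate-generation wiring are omitted:

```java
/**
 * Decide between an admin-provided keystore and a generated self-signed
 * certificate, as proposed in DRILL-3725. Illustrative sketch only.
 */
public class SslConfigCheck {

    /** True when the admin supplied an explicit keystore via system properties. */
    public static boolean hasAdminKeystore() {
        String keyStore = System.getProperty("javax.net.ssl.keyStore");
        return keyStore != null && !keyStore.isEmpty();
    }

    public static void main(String[] args) {
        if (hasAdminKeystore()) {
            System.out.println("using admin-provided keystore");
        } else {
            System.out.println("no keystore configured: would generate a self-signed certificate");
        }
    }
}
```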
[jira] [Resolved] (DRILL-3193) TestDrillbitResilience#interruptingWhileFragmentIsBlockedInAcquiringSendingTicket hangs and fails
[ https://issues.apache.org/jira/browse/DRILL-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3193. Resolution: Cannot Reproduce Tried running the test repeatedly 1000 times. No hangs reproduced. TestDrillbitResilience#interruptingWhileFragmentIsBlockedInAcquiringSendingTicket hangs and fails - Key: DRILL-3193 URL: https://issues.apache.org/jira/browse/DRILL-3193 Project: Apache Drill Issue Type: Bug Reporter: Sudheesh Katkam Assignee: Venki Korukanti Fix For: 1.2.0 TestDrillbitResilience#interruptingWhileFragmentIsBlockedInAcquiringSendingTicket hangs when it is run multiple times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3271) Hive : Tpch 01.q fails with a verification issue for SF100 dataset
[ https://issues.apache.org/jira/browse/DRILL-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3271. Resolution: Invalid Just had a discussion with [~adeneche]. Floating point differences between runs are due to truncation in arithmetic operations and the order of data received at the aggregator. The differences here still seem to be in an acceptable range. We need to update the margin-of-error constant in the test framework. Hive : Tpch 01.q fails with a verification issue for SF100 dataset -- Key: DRILL-3271 URL: https://issues.apache.org/jira/browse/DRILL-3271 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Fix For: 1.2.0 Attachments: tpch100_hive.ddl git.commit.id.abbrev=5f26b8b Query : {code} select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '120' day (3) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus; {code} The 4th column appears to have some differences. 
Not sure if it is within the acceptable range. Expected : {code} A F 3.775127758E9 5.660776097194428E12 5.377736398183942E12 5.592847429515948E12 25.499370423275426 38236.11698430475 0.05000224353079674 148047881 N O 7.269911583E9 1.0901214476134316E13 1.0356163586785008E13 1.077041889123738E13 25.499873337396807 38236.997134222445 0.04999763132401859 285095988 R F 3.77572497E9 5.661603032745362E12 5.378513563915394E12 5.593662252666902E12 25.50006628406532 38236.69725845312 0.05000130433952159 148067261 N F 9.8553062E7 1.4777109838597995E11 1.403849659650348E11 1.459997930327757E11 25.501556956882876 38237.19938880449 0.04998528433803118 3864590 {code} Actual : {code} A F 3.775127758E9 5.660776097194352E12 5.37773639818398E12 5.592847429515874E12 25.499370423275426 38236.11698430423 0.0500022435305286 148047881 N O 7.269911583E9 1.0901214476134352E13 1.0356163586784926E13 1.0770418891237576E13 25.499873337396807 38236.99713422257 0.04999763132535226 285095988 R F 3.77572497E9 5.661603032745394E12 5.378513563915313E12 5.593662252666848E12 25.50006628406532 38236.69725845333 0.05000130433925318 148067261 N F 9.8553062E7 1.4777109838598022E11 1.4038496596503506E11 1.45999793032776E11 25.501556956882876 38237.19938880456 0.049985284338093884 3864590 {code} The data is 100 GB, so I couldn't attach it here. I attached the hive ddl. Let me know if you need anything else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
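The explanation given in the resolution (truncation in arithmetic plus the order in which rows reach the aggregator) comes down to floating-point addition not being associative, so the same SUM computed over differently ordered inputs can differ in the last digits. A minimal demonstration:

```java
/**
 * Double addition is not associative: grouping the same operands
 * differently yields different results, which is why aggregate values
 * can vary slightly between runs that deliver rows in a different order.
 */
public class FloatOrder {
    public static void main(String[] args) {
        double big = 1e16, small = 1.0;
        double leftToRight = (big + small) + small; // each small increment is rounded away
        double smallFirst  = big + (small + small); // the combined increment survives
        System.out.println(leftToRight == smallFirst); // prints false
    }
}
```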
[jira] [Resolved] (DRILL-3239) Join between empty hive tables throws an IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3239. Resolution: Cannot Reproduce Can't reproduce on latest master 48d8a59. Please reopen if it reproduces. Join between empty hive tables throws an IllegalStateException -- Key: DRILL-3239 URL: https://issues.apache.org/jira/browse/DRILL-3239 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Fix For: 1.2.0 Attachments: error.log git.commit.id.abbrev=6f54223 Created 2 Hive tables on top of tpch data in ORC format. The tables are empty. The below query returns 0 rows from Hive; however, it fails with an IllegalStateException from Drill. {code} select * from customer c, orders o where c.c_custkey = o.o_custkey; Error: SYSTEM ERROR: java.lang.IllegalStateException: You tried to do a batch data read operation when you were in a state of NONE. You can only do this type of operation when you are in a state of OK or OK_NEW_SCHEMA. Fragment 0:0 [Error Id: 8483cab2-d771-4337-ae65-1db41eb5720d on qa-node191.qa.lab:31010] (state=,code=0) {code} Below is the Hive DDL I used: {code} create table if not exists tpch01_orc.customer ( c_custkey int, c_name string, c_address string, c_nationkey int, c_phone string, c_acctbal double, c_mktsegment string, c_comment string ) STORED AS orc LOCATION '/drill/testdata/Tpch0.01/orc/customer'; create table if not exists tpch01_orc.orders ( o_orderkey int, o_custkey int, o_orderstatus string, o_totalprice double, o_orderdate date, o_orderpriority string, o_clerk string, o_shippriority int, o_comment string ) STORED AS orc LOCATION '/drill/testdata/Tpch0.01/orc/orders'; {code} I attached the log files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2643) HashAggBatch/HashAggTemplate call incoming.cleanup() twice resulting in warnings
[ https://issues.apache.org/jira/browse/DRILL-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2643. Resolution: Fixed This issue is resolved through DRILL-2826, which made a change to close/cleanup an operator exactly once. HashAggBatch/HashAggTemplate call incoming.cleanup() twice resulting in warnings Key: DRILL-2643 URL: https://issues.apache.org/jira/browse/DRILL-2643 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Venki Korukanti Fix For: 1.2.0 Attachments: DRILL-2643.patch, t1.parquet, t2.parquet In this case j1 and j2 are views created on top of parquet files; BOTH views have order by on multiple columns in different order with nulls first/last. Also, the table in view j1 consists of 99 parquet files. See the attached views.txt file for how to create the views (make sure to create the views in a different workspace; the views have the same names as the tables) {code} select DISTINCT COALESCE(j1.c_varchar || j2.c_varchar || 'EMPTY') as concatentated_string from j1 INNER JOIN j2 ON (j1.d18 = j2.d18) ; {code} The same can be reproduced with parquet files and subqueries (note that the parquet files are named the same as the views: j1, j2) {code} select DISTINCT COALESCE(sq1.c_varchar || sq2.c_varchar || 'EMPTY') as concatentated_string from (select c_varchar, c_integer from j1 order by j1.c_varchar desc nulls first ) as sq1(c_varchar, c_integer) INNER JOIN (select c_varchar, c_integer from j2 order by j2.c_varchar nulls last) as sq2(c_varchar, c_integer) ON (sq1.c_integer = sq2.c_integer) {code} You do need to have a sort in order to reproduce the problem. 
This query works: {code} select DISTINCT COALESCE(j1.c_varchar || j2.c_varchar || 'EMPTY') as concatentated_string from j1,j2 where j1.c_integer = j2.c_integer; {code} {code} 2015-04-01 00:43:42,455 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:foreman] INFO o.a.d.e.s.parquet.ParquetGroupScan - Load Parquet RowGroup block maps: Executed 99 out of 99 using 16 threads. Time: 20ms total, 2.877318ms avg, 3ms max. 2015-04-01 00:43:42,458 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:foreman] INFO o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host atsqa4-136.qa.lab. Skipping affinity to that host. 2015-04-01 00:43:42,458 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:foreman] INFO o.a.d.e.s.parquet.ParquetGroupScan - Load Parquet RowGroup block maps: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.562620ms avg, 1ms max. 2015-04-01 00:43:42,485 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:foreman] INFO o.a.drill.exec.work.foreman.Foreman - State change requested. PENDING -- RUNNING 2015-04-01 00:43:45,613 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] WARN o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 32 batch groups. Current allocated memory: 16642330 2015-04-01 00:43:45,676 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] INFO o.a.d.exec.vector.BaseValueVector - Realloc vector null. [16384] - [32768] 2015-04-01 00:43:45,676 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] INFO o.a.d.exec.vector.BaseValueVector - Realloc vector ``c_varchar`(VARCHAR:OPTIONAL)_bits`(UINT1:REQUIRED). [4096] - [8192] 2015-04-01 00:43:45,679 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] INFO o.a.d.exec.vector.BaseValueVector - Realloc vector null. [32768] - [65536] 2015-04-01 00:43:45,680 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] INFO o.a.d.exec.vector.BaseValueVector - Realloc vector ``c_varchar`(VARCHAR:OPTIONAL)_bits`(UINT1:REQUIRED). 
[8192] - [16384] 2015-04-01 00:43:45,709 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] WARN o.a.d.exec.memory.AtomicRemainder - Tried to close remainder, but it has already been closed java.lang.Exception: null at org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:196) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.memory.Accountor.close(Accountor.java:386) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close(TopLevelAllocator.java:298) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.cleanup(ExternalSortBatch.java:162) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at
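The "tried to close remainder, but it has already been closed" warning above comes from cleanup() being invoked twice. The fix referenced in the resolution (close an operator exactly once) can be sketched with a simple idempotence guard; this is a hypothetical class, not Drill's actual code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Cleanup that is safe to call more than once: only the first call releases. */
public class OnceCloseable implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private int closeCount = 0;

    @Override
    public void close() {
        if (!closed.compareAndSet(false, true)) {
            return;              // already closed: the duplicate call is a no-op
        }
        closeCount++;            // real resource release would happen here
    }

    public int timesActuallyClosed() { return closeCount; }

    public static void main(String[] args) {
        OnceCloseable c = new OnceCloseable();
        c.close();
        c.close();               // duplicate cleanup, as HashAggBatch did
        System.out.println(c.timesActuallyClosed()); // prints 1
    }
}
```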
[jira] [Resolved] (DRILL-3424) Hive Views are not accessible through Drill Query
[ https://issues.apache.org/jira/browse/DRILL-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3424. Resolution: Duplicate Hive Views are not accessible through Drill Query - Key: DRILL-3424 URL: https://issues.apache.org/jira/browse/DRILL-3424 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 1.0.0 Environment: CentOS 6.5, MapR, Drill 1.0 Reporter: Soumendra Kumar Mishra Assignee: Venki Korukanti Hive Views are not accessible through Drill Query. Error Message: Hive Views are Not Supported in Current Version -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3413) Use DIGEST mechanism in creating Hive MetaStoreClient for proxy users when SASL authentication is enabled
Venki Korukanti created DRILL-3413: -- Summary: Use DIGEST mechanism in creating Hive MetaStoreClient for proxy users when SASL authentication is enabled Key: DRILL-3413 URL: https://issues.apache.org/jira/browse/DRILL-3413 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 1.1.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 Currently we fail to create a HiveMetaStoreClient for proxy users when SASL authentication is enabled between the HiveMetaStore server and clients. We fail to create the client because when SASL (kerberos or vendor-specific custom SASL implementations) is enabled, some vendor-specific versions of Hive only accept DIGEST as the authentication mechanism for proxy clients. To fix this issue: 1. The Drillbit needs to create a HiveMetaStoreClient with its own credentials (direct credentials, not proxied) 2. Whenever the Drillbit needs to create a HiveMetaStoreClient for a proxy user (the user being impersonated), get the delegation token for the proxy user from the MetaStore server using the Drillbit process user's HiveMetaStoreClient. Set this delegation token in a new HiveConf object and pass it to HiveMetaStoreClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3398) WebServer is leaking memory for queries submitted through REST API or WebUI
Venki Korukanti created DRILL-3398: -- Summary: WebServer is leaking memory for queries submitted through REST API or WebUI Key: DRILL-3398 URL: https://issues.apache.org/jira/browse/DRILL-3398 Project: Apache Drill Issue Type: Bug Affects Versions: 1.1.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 1. Start an embedded drillbit. 2. Submit queries through the WebUI or REST APIs. 3. Shut down the drillbit. At shutdown, TopLevelAllocator's close prints out the leaked pools. [~sudheeshkatkam] and I looked into the issue; it turns out we don't release the RecordBatchLoader in QueryWrapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3398) WebServer is leaking memory for queries submitted through REST API or WebUI
[ https://issues.apache.org/jira/browse/DRILL-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3398. Resolution: Fixed Fixed in [60bc945|https://github.com/apache/drill/commit/60bc9459bd8ef29e9d90ffe885771090ab658a40]. WebServer is leaking memory for queries submitted through REST API or WebUI --- Key: DRILL-3398 URL: https://issues.apache.org/jira/browse/DRILL-3398 Project: Apache Drill Issue Type: Bug Affects Versions: 1.1.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 Attachments: DRILL-3398-1.patch 1. Start an embedded drillbit. 2. Submit queries through the WebUI or REST APIs. 3. Shut down the drillbit. At shutdown, TopLevelAllocator's close prints out the leaked pools. [~sudheeshkatkam] and I looked into the issue; it turns out we don't release the RecordBatchLoader in QueryWrapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
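The fix boils down to releasing the RecordBatchLoader once the REST response has been built. A minimal sketch of the pattern — the QueryWrapper and RecordBatchLoader names come from the report, but the `Loader` class below is a simplified stand-in for illustration, not Drill's actual class:

```java
public class QueryWrapperSketch {
    // Simplified stand-in for org.apache.drill.exec.record.RecordBatchLoader,
    // which owns direct-memory buffers that must be returned to the allocator.
    static final class Loader implements AutoCloseable {
        static int liveBuffers = 0;
        Loader() { liveBuffers++; }                        // acquire a buffer
        String load() { return "rows"; }                   // pretend to decode a batch
        @Override public void close() { liveBuffers--; }   // release the buffer
    }

    // Before the fix: the loader was never released, leaking one buffer per batch.
    static String runQueryLeaky() {
        Loader loader = new Loader();
        return loader.load();              // leak: close() never called
    }

    // After the fix: release the loader even if decoding throws.
    static String runQueryFixed() {
        try (Loader loader = new Loader()) {
            return loader.load();
        }
    }
}
```

The try-with-resources form is what makes shutdown-time leak reports like the one above go quiet: every query releases its loader on both the success and error paths.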
[jira] [Resolved] (DRILL-669) Information Schema should be schema sensitive
[ https://issues.apache.org/jira/browse/DRILL-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-669. --- Resolution: Invalid Fix Version/s: (was: 1.1.0) Information Schema should be schema sensitive - Key: DRILL-669 URL: https://issues.apache.org/jira/browse/DRILL-669 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Priority: Minor If I am currently using the 'hive' schema/database, then information schema should only contain information relevant to 'hive' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2023) Hive function
[ https://issues.apache.org/jira/browse/DRILL-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2023. Resolution: Fixed {{getCumulativeCost}} is implemented as part of DRILL-2269. Hive function -- Key: DRILL-2023 URL: https://issues.apache.org/jira/browse/DRILL-2023 Project: Apache Drill Issue Type: Bug Components: Functions - Hive Reporter: Jacques Nadeau Assignee: Venki Korukanti Fix For: 1.1.0 If you try to do a query that uses regexp_extract with Drill expressions inside of it, Drill doesn't handle it correctly. The exception is: The type of org.apache.drill.exec.expr.HiveFuncHolderExpr doesn't currently support LogicalExpression.getCumulativeCost() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3260) Conflicting servlet-api jar causing web UI to be slow
[ https://issues.apache.org/jira/browse/DRILL-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3260. Resolution: Fixed Fix Version/s: 1.1.0 Fixed in [6796006|https://github.com/apache/drill/commit/6796006f2df5aa598f3715be9de2a724b5c338e3]. Conflicting servlet-api jar causing web UI to be slow - Key: DRILL-3260 URL: https://issues.apache.org/jira/browse/DRILL-3260 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 Attachments: DRILL-3260-1.patch This is the same issue we had some time back. Recent changes to the [pom|https://github.com/apache/drill/commit/1de6aed93efce8a524964371d96673b8ef192d89] files are pulling in a problematic jar, {{servlet-api-2.5.jar}}:
{code}
+- com.mapr.fs:mapr-hbase:jar:4.1.0-mapr:compile
.
[INFO] |  +- org.apache.hbase:hbase-server:jar:0.98.9-mapr-1503:compile
[INFO] |  |  +- (org.apache.hbase:hbase-common:jar:0.98.9-mapr-1503:compile - omitted for conflict with 0.98.9-mapr-1503-m7-4.1.0)
.
[INFO] |  |  +- org.mortbay.jetty:servlet-api-2.5:jar:6.1.14:compile
{code}
We already have a maven enforcer rule to detect servlet-api jars, but this jar has a slightly different artifact id, {{servlet-api-2.5}}, which the enforcer does not detect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2952) Hive 1.0 plugin for Drill
[ https://issues.apache.org/jira/browse/DRILL-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2952. Resolution: Fixed +1. Fixed in [9353383|https://github.com/apache/drill/commit/93533835bdcaff018a6b6ee6ea5999f3c5659d70]. Hive 1.0 plugin for Drill - Key: DRILL-2952 URL: https://issues.apache.org/jira/browse/DRILL-2952 Project: Apache Drill Issue Type: Task Components: Functions - Hive, Storage - Hive Affects Versions: 1.0.0 Reporter: Na Yang Assignee: Na Yang Fix For: 1.1.0 Attachments: DRILL-2952.2.patch, DRILL-2952.patch Currently Drill works with Hive 0.13 only. It needs a newer version of Hive plugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3260) Conflicting servlet-api jar causing web UI to be slow
Venki Korukanti created DRILL-3260: -- Summary: Conflicting servlet-api jar causing web UI to be slow Key: DRILL-3260 URL: https://issues.apache.org/jira/browse/DRILL-3260 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti This is the same issue we had some time back. Recent changes to the [pom|https://github.com/apache/drill/commit/1de6aed93efce8a524964371d96673b8ef192d89] files are pulling in a problematic jar, {{servlet-api-2.5.jar}}:
{code}
+- com.mapr.fs:mapr-hbase:jar:4.1.0-mapr:compile
.
[INFO] |  +- org.apache.hbase:hbase-server:jar:0.98.9-mapr-1503:compile
[INFO] |  |  +- (org.apache.hbase:hbase-common:jar:0.98.9-mapr-1503:compile - omitted for conflict with 0.98.9-mapr-1503-m7-4.1.0)
.
[INFO] |  |  +- org.mortbay.jetty:servlet-api-2.5:jar:6.1.14:compile
{code}
We already have a maven enforcer rule to detect servlet-api jars, but this jar has a slightly different artifact id, {{servlet-api-2.5}}, which the enforcer does not detect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3240) Fetch hadoop maven profile specific Hive version in Hive storage plugin
Venki Korukanti created DRILL-3240: -- Summary: Fetch hadoop maven profile specific Hive version in Hive storage plugin Key: DRILL-3240 URL: https://issues.apache.org/jira/browse/DRILL-3240 Project: Apache Drill Issue Type: Improvement Components: Storage - Hive, Tools, Build & Test Affects Versions: 0.4.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 1.1.0 Currently we always fetch the Apache Hive libs irrespective of the Hadoop vendor profile used in {{mvn clean install}}. This jira is to allow specifying a custom version of Hive in the hadoop vendor profile. Note: the Hive storage plugin assumes there are no major differences in Hive APIs between different vendor-specific custom Hive builds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3208) Hive : Tpch (SF 0.01) query 10 fails with a system error when the data is backed by hive tables
[ https://issues.apache.org/jira/browse/DRILL-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3208. Resolution: Invalid Hive : Tpch (SF 0.01) query 10 fails with a system error when the data is backed by hive tables --- Key: DRILL-3208 URL: https://issues.apache.org/jira/browse/DRILL-3208 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Attachments: customer.parquet, error.log, lineitem_nodate.parquet, nation.parquet, orders_nodate.parquet, tpch.ddl git.commit.id.abbrev=6f54223 I created hive tables on top of tpch parquet data (the hive DDL script is attached). Since hive does not support date in the parquet serde, I regenerated the parquet files for orders and lineitem to use string for the date fields; the remaining files do not have a date column. When I executed query 10 of the tpch suite, it failed with a system error.
{code}
0: jdbc:drill:schema=dfs_eea> use hive.tpch01_parquet_nodate;
+-------+---------------------------------------------------------+
|  ok   |                         summary                         |
+-------+---------------------------------------------------------+
| true  | Default schema changed to [hive.tpch01_parquet_nodate]  |
+-------+---------------------------------------------------------+
1 row selected (0.091 seconds)
0: jdbc:drill:schema=dfs_eea> select c.c_custkey, c.c_name, sum(l.l_extendedprice * (1 - l.l_discount)) as revenue, c.c_acctbal, n.n_name, c.c_address, c.c_phone, c.c_comment from customer c, orders o, lineitem l, nation n where c.c_custkey = o.o_custkey and l.l_orderkey = o.o_orderkey and cast(o.o_orderdate as date) >= date '1994-03-01' and cast(o.o_orderdate as date) < date '1994-03-01' + interval '3' month and l.l_returnflag = 'R' and c.c_nationkey = n.n_nationkey group by c.c_custkey, c.c_name, c.c_acctbal, c.c_phone, n.n_name, c.c_address, c.c_comment order by revenue desc limit 20;
Error: SYSTEM ERROR: Fragment 0:0 [Error Id: 1d327ae0-1cf2-4776-acd3-8eef6cca4b6a on qa-node191.qa.lab:31010] (state=,code=0)
{code}
I tried running the above query using dfs instead of hive and it worked as expected.
I attached the newly generated parquet files and the hive ddl for creating the hive tables. Let me know if you need anything else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3203) Add support for impersonation in Hive storage plugin
Venki Korukanti created DRILL-3203: -- Summary: Add support for impersonation in Hive storage plugin Key: DRILL-3203 URL: https://issues.apache.org/jira/browse/DRILL-3203 Project: Apache Drill Issue Type: Sub-task Components: Storage - Hive Affects Versions: 0.9.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 Subtask under DRILL-2363 to add support for impersonation for Hive storage plugin. When impersonation is enabled, Drill currently impersonates as process user (user who started the Drillbits) when accessing table metadata/data in Hive. This task is to add support for impersonating the user who issued the query when accessing Hive metadata/data. Detailed design doc is coming soon... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3074) ReconnectingClient.waitAndRun can get stuck in an infinite loop if it fails to establish the connection
Venki Korukanti created DRILL-3074: -- Summary: ReconnectingClient.waitAndRun can get stuck in an infinite loop if it fails to establish the connection Key: DRILL-3074 URL: https://issues.apache.org/jira/browse/DRILL-3074 Project: Apache Drill Issue Type: Bug Affects Versions: 1.0.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Critical Fix For: 1.0.0 Currently we enter an infinite loop if a connection exception occurs or the connection attempt times out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3017) NPE when cleaning up some RecordReader implementations
Venki Korukanti created DRILL-3017: -- Summary: NPE when cleaning up some RecordReader implementations Key: DRILL-3017 URL: https://issues.apache.org/jira/browse/DRILL-3017 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.9.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Run the following unit test:
{code}
@Test
public void testParquetReaderCleanupNPE() throws Exception {
  test("SELECT * FROM cp.`parquet2/decimal28_38.parquet`");
}
{code}
Following is the output:
{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: Fragment 0:0
[Error Id: 49db6650-8f62-4c5c-b9dc-3f5d6a4413a0 on localhost:31010]. Returned in 407ms.
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: Fragment 0:0
{code}
Ideally in this case we should get the following query error:
{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: UNSUPPORTED_OPERATION ERROR: Decimal data type is disabled.
As of this release decimal data type is a beta level feature and should not be used in production
Use option 'planner.enable_decimal_data_type' to enable decimal data type
Fragment 0:0
[Error Id: d91a70ac-93c9-4be4-a542-4f3c7615b677 on localhost:31010]. Returned in 392ms.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
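A common cause of this class of failure is a cleanup path that dereferences fields which were never initialized because setup itself failed; the NPE then masks the real error (here, the decimal-disabled message). A minimal null-safe cleanup sketch — the `Resource` type below is a hypothetical stand-in for whatever a RecordReader holds, not Drill's actual interface:

```java
import java.util.List;

public class CleanupSketch {
    static int closed = 0;

    // Hypothetical stand-in for a resource a RecordReader holds (pages,
    // column readers, buffers); setup may fail before some are allocated.
    interface Resource { void close(); }

    // Null-safe cleanup: skip slots that were never allocated, and keep
    // closing the remaining resources even if one close() throws.
    static void cleanup(List<Resource> resources) {
        RuntimeException first = null;
        for (Resource r : resources) {
            if (r == null) continue;           // setup never got this far
            try {
                r.close();
            } catch (RuntimeException e) {
                if (first == null) first = e;  // remember, but keep closing
            }
        }
        if (first != null) throw first;
    }
}
```

With cleanup written this way, the original UNSUPPORTED_OPERATION error can propagate instead of being replaced by an NPE from the teardown path.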
[jira] [Created] (DRILL-3010) Convert bad command error messages into UserExceptions in SqlHandlers
Venki Korukanti created DRILL-3010: -- Summary: Convert bad command error messages into UserExceptions in SqlHandlers Key: DRILL-3010 URL: https://issues.apache.org/jira/browse/DRILL-3010 Project: Apache Drill Issue Type: Bug Components: Metadata Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.0.0 Currently SqlHandlers such as the CreateTable or View handlers return bad-command error messages as result records. Instead we should throw a UserException.
{code}
0: jdbc:drill:zk=local> create table t1 as select * from cp.`region.json`;
+--------+-------------------------------------------------------------+
|   ok   |                           summary                           |
+--------+-------------------------------------------------------------+
| false  | Unable to create table. Schema [dfs.default] is immutable.  |
+--------+-------------------------------------------------------------+
1 row selected (0.103 seconds)
{code}
Instead it should be like:
{code}
0: jdbc:drill:zk=10.10.30.143:5181> create table t1 as select * from cp.`region.json`;
Error: PARSE ERROR: Unable to create or drop tables/views. Schema [dfs.default] is immutable.
[Error Id: 3a92d026-3df7-4e8b-8988-2300463fa00b on centos64-30146.qa.lab:31010] (state=,code=0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
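The gist of the change: instead of materializing the failure as a `(false, message)` result row, the handler should raise an exception that the RPC layer turns into a typed error for the client. A self-contained sketch of the pattern — Drill's real class is `UserException` with a builder API; `UserError` below is a simplified stand-in, and `createTable`/`tryCreate` are hypothetical names for illustration:

```java
public class HandlerSketch {
    // Simplified stand-in for Drill's UserException; the real one is built
    // via a fluent builder and carries an error type and error id.
    static final class UserError extends RuntimeException {
        UserError(String type, String message) {
            super(type + " ERROR: " + message);
        }
    }

    // Before: handlers returned a (false, message) result row on bad commands.
    // After: throw, so clients see a typed error instead of a fake result set.
    static void createTable(String schema, boolean mutable) {
        if (!mutable) {
            throw new UserError("PARSE",
                "Unable to create or drop tables/views. Schema [" + schema + "] is immutable.");
        }
        // ... actual CTAS handling would go here ...
    }

    // Small helper so the behavior is easy to observe.
    static String tryCreate(String schema, boolean mutable) {
        try {
            createTable(schema, mutable);
            return "ok";
        } catch (UserError e) {
            return e.getMessage();
        }
    }
}
```

Throwing also means every client (JDBC, ODBC, REST) gets the same error surface, rather than each one having to sniff a `(false, …)` row.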
[jira] [Created] (DRILL-2902) Add support for context UDFs: user (and its synonyms session_user, system_user) and current_schema
Venki Korukanti created DRILL-2902: -- Summary: Add support for context UDFs: user (and its synonyms session_user, system_user) and current_schema Key: DRILL-2902 URL: https://issues.apache.org/jira/browse/DRILL-2902 Project: Apache Drill Issue Type: Sub-task Components: Functions - Drill, Metadata Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.0.0 Add support for the following UDFs: - user (and its synonyms session_user, system_user): returns the query user name - current_schema: returns the default schema in the current user session -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2083) order by on large dataset returns wrong results
[ https://issues.apache.org/jira/browse/DRILL-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2083. Resolution: Fixed Fixed in [57a96d2|https://github.com/apache/drill/commit/57a96d200e12c0efcad3f3ca9d935c42647234b1]. order by on large dataset returns wrong results --- Key: DRILL-2083 URL: https://issues.apache.org/jira/browse/DRILL-2083 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types, Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Chun Chang Assignee: Steven Phillips Priority: Critical Fix For: 0.9.0 Attachments: DRILL-2083.patch #Mon Jan 26 14:10:51 PST 2015 git.commit.id.abbrev=3c6d0ef Test data has 1 million rows and can be accessed at http://apache-drill.s3.amazonaws.com/files/complex.json.gz
{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count(t.id) from `complex.json` t;
+----------+
|  EXPR$0  |
+----------+
| 1000000  |
+----------+
{code}
But order by returned 30 more rows.
{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.id from `complex.json` t order by t.id;
| 97 |
| 98 |
| 99 |
| 100 |
+------+
1,000,030 rows selected (19.449 seconds)
{code}
physical plan:
{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select t.id from `complex.json` t order by t.id;
+------+------+
| text | json |
+------+------+
| 00-00  Screen
  00-01    SingleMergeExchange(sort0=[0 ASC])
  01-01      SelectionVectorRemover
  01-02        Sort(sort0=[$0], dir0=[ASC])
  01-03          HashToRandomExchange(dist0=[[$0]])
  02-01            Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2858) Refactor hash expression construction in InsertLocalExchangeVisitor and PrelUtil into one place
Venki Korukanti created DRILL-2858: -- Summary: Refactor hash expression construction in InsertLocalExchangeVisitor and PrelUtil into one place Key: DRILL-2858 URL: https://issues.apache.org/jira/browse/DRILL-2858 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Currently there are two places where we construct the hash expression based on the partition fields: 1. InsertLocalExchangeVisitor (generates RexExpr type) 2. PrelUtil.getHashExpression (generates LogicalExpression type) Having this logic in two places makes it error-prone: the two copies can easily go out of sync, causing hard-to-debug verification failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2856) StreamingAggBatch goes into infinite loop due to state management issues
Venki Korukanti created DRILL-2856: -- Summary: StreamingAggBatch goes into infinite loop due to state management issues Key: DRILL-2856 URL: https://issues.apache.org/jira/browse/DRILL-2856 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Steven Phillips -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2857) Update the StreamingAggBatch current workspace record counter variable type to long from current type int
Venki Korukanti created DRILL-2857: -- Summary: Update the StreamingAggBatch current workspace record counter variable type to long from current type int Key: DRILL-2857 URL: https://issues.apache.org/jira/browse/DRILL-2857 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 This causes invalid results when the incoming data has more than (2^31) - 1 records, due to overflow. Example query (make sure the nested query returns more than (2^31) - 1 records):
{code}
SELECT count(*) FROM (SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, count(*), count(l_quantity) FROM dfs.`lineitem` GROUP BY L_ORDERKEY, L_PARTKEY, L_SUPPKEY );
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
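The failure mode is plain integer overflow: once the running count passes 2^31 - 1 it wraps negative, and any result built on it is garbage. A minimal demonstration of why the counter must be widened to long — the method names below are illustrative, not Drill's:

```java
public class CounterOverflow {
    // Simulate a per-workspace record counter advancing past Integer.MAX_VALUE.
    // Incrementing one-by-one 2^31 times is too slow to demo, and the wrap is
    // equivalent to truncating the mathematical sum to 32 bits.
    static int intCounterAfter(long records) {
        int count = 0;
        count += (int) records;   // int arithmetic silently wraps
        return count;
    }

    static long longCounterAfter(long records) {
        long count = 0;
        count += records;         // long holds counts up to 2^63 - 1
        return count;
    }
}
```

The same widening concern applies to anything derived from the counter (offsets, averages), which is why the fix changes the variable's type rather than clamping the value.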
[jira] [Created] (DRILL-2855) Fix invalid result issues with StreamingAggBatch
Venki Korukanti created DRILL-2855: -- Summary: Fix invalid result issues with StreamingAggBatch Key: DRILL-2855 URL: https://issues.apache.org/jira/browse/DRILL-2855 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 There are two issues causing invalid results: 1. Under some conditions we fail to add a record to the current aggregation workspace around batch boundaries or when the output batch is full. 2. We incorrectly clean up the previous batch. Currently we keep a reference to the current batch in the previous-batch slot and fetch incoming batches until one has more than zero records or no batches remain. If the next incoming batch has zero records, we end up cleaning up the previous batch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2729) Hive partition columns of decimal type are deserialized incorrectly
Venki Korukanti created DRILL-2729: -- Summary: Hive partition columns of decimal type are deserialized incorrectly Key: DRILL-2729 URL: https://issues.apache.org/jira/browse/DRILL-2729 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 0.6.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 Repro steps:
{code}
CREATE TABLE IF NOT EXISTS readtest2 (
  a BOOLEAN
) PARTITIONED BY (
  decimal0_part DECIMAL,
  decimal9_part DECIMAL(6, 2),
  decimal18_part DECIMAL(15, 5),
  decimal28_part DECIMAL(23, 1),
  decimal38_part DECIMAL(30, 3)
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

ALTER TABLE readtest2 ADD IF NOT EXISTS PARTITION (
  decimal0_part='36.9', decimal9_part='36.9', decimal18_part='3289379872.945645',
  decimal28_part='39579334534534.35345', decimal38_part='363945093845093890.9');

LOAD DATA LOCAL INPATH '/tmp/data.txt' OVERWRITE INTO TABLE default.readtest2 PARTITION (
  decimal0_part='36.9', decimal9_part='36.9', decimal18_part='3289379872.945645',
  decimal28_part='39579334534534.35345', decimal38_part='363945093845093890.9');
{code}
Contents of /tmp/data.txt:
{code}
false
true
{code}
Drill output:
{code}
0: jdbc:drill:zk=10.10.30.143:5181> select * from readtest2;
| a     | decimal0_part | decimal9_part | decimal18_part        | decimal28_part   | decimal38_part         |
| false | 3700          | 369000.00     | -66367900898250.61888 | 39579334534534.4 | 363945093845093890.900 |
| true  | 3700          | 369000.00     | -66367900898250.61888 | 39579334534534.4 | 363945093845093890.900 |
{code}
Hive output:
{code}
hive> select * from readtest2;
OK
false  37  36.9  3289379872.94565  39579334534534.4  363945093845093890.9
true   37  36.9  3289379872.94565  39579334534534.4  363945093845093890.9
Time taken: 0.053 seconds, Fetched: 2 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2673) Update UserServer <=> UserClient RPC to handle handshake response better
Venki Korukanti created DRILL-2673: -- Summary: Update UserServer <=> UserClient RPC to handle handshake response better Key: DRILL-2673 URL: https://issues.apache.org/jira/browse/DRILL-2673 Project: Apache Drill Issue Type: Improvement Reporter: Venki Korukanti Assignee: Venki Korukanti Currently, if an exception occurs while UserServer is handling a handshake message from UserClient, the server terminates the connection, which leaves the client unable to handle the handshake response properly. This JIRA is to modify the behavior of UserServer when an exception occurs or the contents (e.g. user/password credentials) of the handshake request are invalid to: -- first send a handshake response message with the error details -- then terminate the connection. Since the client then always receives a handshake response, it can report any error in the response to the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
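The proposed ordering can be sketched as a tiny state machine: validate, reply (with error details on failure), and only then drop the connection. A self-contained sketch under assumed names — the real Drill types are protobuf handshake messages exchanged by UserServer/UserClient; `Response` and `Connection` below are hypothetical stand-ins:

```java
public class HandshakeSketch {
    static final class Response {
        final boolean ok;
        final String error;   // null on success
        Response(boolean ok, String error) { this.ok = ok; this.error = error; }
    }

    static final class Connection {
        boolean open = true;
        Response lastSent;
        void send(Response r) { lastSent = r; }
        void close() { open = false; }
    }

    // Server-side handling: always answer the handshake before closing, so
    // the client can surface the error instead of seeing a dead socket.
    static Response handleHandshake(Connection conn, boolean credentialsValid) {
        if (credentialsValid) {
            Response r = new Response(true, null);
            conn.send(r);
            return r;
        }
        Response r = new Response(false, "invalid user/password credentials");
        conn.send(r);   // 1. respond with the error details
        conn.close();   // 2. then terminate the connection
        return r;
    }
}
```

The key design point is that the client's state machine only ever has to handle "response received", never "connection vanished mid-handshake".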
[jira] [Resolved] (DRILL-1833) Views cannot be registered into the INFORMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data
[ https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-1833. Resolution: Fixed Fixed in [0b6cddf|https://github.com/apache/drill/commit/0b6cddfa4d8f9558f6386e7340429df4e8ec5f88]. Addressed the remaining review comment about changing the log message to be specific. Views cannot be registered into the INFORMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data --- Key: DRILL-1833 URL: https://issues.apache.org/jira/browse/DRILL-1833 Project: Apache Drill Issue Type: Bug Components: Storage - Information Schema Environment: git.commit.id.abbrev=2396670 Reporter: Xiao Meng Assignee: Venki Korukanti Fix For: 0.9.0 Attachments: DRILL-1833-1.patch After wiping out the ZooKeeper data, the drillbit cannot automatically register the view into INFORMATION_SCHEMA.`TABLES` even after we query the view. For example, for a workspace dfs.tmp, there is a view file `varchar_view.view.drill` under the corresponding directory '/tmp'. We can query: {code} select * from dfs.test.`varchar_view` {code} But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. After I recreate the view based on the contents of `varchar_view.view.drill`, the view shows in the INFORMATION_SCHEMA.`TABLES`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2641) Move unrelated tests in exec/jdbc module into appropriate modules
Venki Korukanti created DRILL-2641: -- Summary: Move unrelated tests in exec/jdbc module into appropriate modules Key: DRILL-2641 URL: https://issues.apache.org/jira/browse/DRILL-2641 Project: Apache Drill Issue Type: Test Components: Tools, Build & Test Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.9.0 Move the following unrelated tests out of exec/jdbc into appropriate modules: {{jdbc.TestHiveStorage.java}} into contrib/storage-hive/core. Split {{jdbc.TestMetadataDDL.java}} into the exec/java-exec and contrib/storage-hive/core modules. Remove the redundant tests in {{TestHiveScalarUDFs.java}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2640) Move view related tests out of 'exec/jdbc' module into appropriate modules
Venki Korukanti created DRILL-2640: -- Summary: Move view related tests out of 'exec/jdbc' module into appropriate modules Key: DRILL-2640 URL: https://issues.apache.org/jira/browse/DRILL-2640 Project: Apache Drill Issue Type: Test Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.9.0 Currently view-related tests live in the exec/jdbc module, which is not the right place for them. They should be in exec/java-exec, and tests of views on hive tables should be in the contrib/storage-hive/core module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2402) Current method of combining hash values can produce skew
[ https://issues.apache.org/jira/browse/DRILL-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2402. Resolution: Fixed Fix Version/s: (was: 0.9.0) 0.8.0 Target Version/s: 0.8.0 Fixed in [bb1d761|https://github.com/apache/drill/commit/bb1d7615e7eb6c0c17c0c8a1cde0ca070393e257]. Current method of combining hash values can produce skew Key: DRILL-2402 URL: https://issues.apache.org/jira/browse/DRILL-2402 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 0.8.0 Reporter: Aman Sinha Assignee: Jacques Nadeau Fix For: 0.8.0 Attachments: DRILL-2402-1.patch The current method of combining the hash values of multiple columns can produce skew in some cases, even though each individual hash function does not produce skew. The combining function is XOR: {code} hash(a, b) = XOR(hash(a), hash(b)) {code} The result is 0 for all rows where a = b, since then hash(a) = hash(b). This creates severe skew and hurts the performance of queries that do a HashAggregate-based group-by on {a, b} or a HashJoin on both columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
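The skew is easy to reproduce: XOR maps every row with equal column values to bucket 0. A small demonstration, alongside one common alternative combiner (multiply-and-add, the scheme `java.util.Objects.hash` uses); note the actual patch may use a different mixing function:

```java
public class HashCombineSketch {
    // XOR combiner from the report: collapses to 0 whenever hash(a) == hash(b),
    // so all rows with a = b land in the same partition/bucket.
    static int xorCombine(int ha, int hb) { return ha ^ hb; }

    // Multiply-and-add combiner: order-sensitive, so equal inputs no longer
    // cancel out. Same idea as java.util.Objects.hash; Drill's fix may differ.
    static int mixCombine(int ha, int hb) { return 31 * ha + hb; }
}
```

With the XOR combiner, a HashToRandomExchange keyed on {a, b} sends every a = b row to one minor fragment; any combiner that breaks the symmetry between the two operands avoids that degenerate case.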
[jira] [Resolved] (DRILL-2342) Nullability property of the view created from parquet file is not correct
[ https://issues.apache.org/jira/browse/DRILL-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2342. Resolution: Fixed Fix Version/s: (was: 0.9.0) 0.8.0 Target Version/s: 0.8.0 Fixed in [d7dc0b9|https://github.com/apache/drill/commit/d7dc0b95b8086b63523ec2e6a1cc9236d1a5bc44]. Nullability property of the view created from parquet file is not correct - Key: DRILL-2342 URL: https://issues.apache.org/jira/browse/DRILL-2342 Project: Apache Drill Issue Type: Bug Components: Metadata Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Venki Korukanti Priority: Critical Fix For: 0.8.0 Attachments: DRILL-2342-1.patch, DRILL-2342-3.patch, DRILL-2342-4.patch, DRILL-2343-2.patch, t1.parquet Here is my t1 table definition:
{code}
message root {
  optional int32 a1;
  optional binary b1 (UTF8);
  optional int32 c1 (DATE);
}
{code}
I created a view on top of it:
{code}
0: jdbc:drill:schema=dfs> create view v1 as select cast(a1 as int), cast(b1 as varchar(10)), cast(c1 as date) from t1;
+-------+-------------------------------------------------------------+
|  ok   |                           summary                           |
+-------+-------------------------------------------------------------+
| true  | View 'v1' created successfully in 'dfs.aggregation' schema  |
+-------+-------------------------------------------------------------+
1 row selected (0.096 seconds)
{code}
IS_NULLABLE says 'NO', which is incorrect.
{code}
0: jdbc:drill:schema=dfs> describe v1;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | INTEGER    | NO           |
| EXPR$1       | VARCHAR    | NO           |
| EXPR$2       | DATE       | NO           |
+--------------+------------+--------------+
3 rows selected (0.067 seconds)
{code}
This is potentially dangerous: if Calcite decided tomorrow to take advantage of this property and added an optimization that drops an IS NULL predicate on a non-nullable column, the query select * from v1 where x is null would return incorrect results.
{code}
0: jdbc:drill:schema=dfs> explain plan for select * from v1 where z is null;
+------+------+
| text | json |
+------+------+
| 00-00  Screen
  00-01    Project(x=[$0], y=[$1], z=[$2])
  00-02      SelectionVectorRemover
  00-03        Filter(condition=[IS NULL($2)])
  00-04          Project(x=[CAST($2):ANY NOT NULL], y=[CAST($1):ANY NOT NULL], z=[CAST($0):ANY NOT NULL])
  00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/t1]], selectionRoot=/aggregation/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
{code}
It seems to me that in views, column properties should always be nullable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2514) Add support for impersonation in FileSystem storage plugin
Venki Korukanti created DRILL-2514: -- Summary: Add support for impersonation in FileSystem storage plugin Key: DRILL-2514 URL: https://issues.apache.org/jira/browse/DRILL-2514 Project: Apache Drill Issue Type: Sub-task Components: Execution - Flow Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 Subtask under DRILL-2363 to add support for impersonation (including chained impersonation) for FileSystem storage plugin. Please see the design document mentioned in umbrella JIRA DRILL-2363 for design details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2210) Allow multithreaded copy and/or flush in PartitionSender
[ https://issues.apache.org/jira/browse/DRILL-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2210. Resolution: Fixed Fixed in [49d316a|https://github.com/apache/drill/commit/49d316a1cb22f79061e246b5e197547dac730232]. Allow multithreaded copy and/or flush in PartitionSender Key: DRILL-2210 URL: https://issues.apache.org/jira/browse/DRILL-2210 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Reporter: Yuliya Feldman Assignee: Yuliya Feldman Fix For: 0.8.0 Related to DRILL-133. Since LocalExchange merges data from multiple receivers in order to fan it out later to multiple senders, the amount of data that needs to be sent out increases. Add the ability to copy/flush data in multiple threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2475) Handle IterOutcome.NONE correctly in operators
Venki Korukanti created DRILL-2475: -- Summary: Handle IterOutcome.NONE correctly in operators Key: DRILL-2475 URL: https://issues.apache.org/jira/browse/DRILL-2475 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Chris Westin Fix For: 0.9.0 Currently not all operators handle NONE (with no preceding OK_NEW_SCHEMA) correctly. This JIRA is to go through the operators, check whether each handles NONE correctly, and modify them accordingly. (from DRILL-2453) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
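The contract in question: a downstream operator must treat NONE as a terminal outcome even when it arrives before any OK_NEW_SCHEMA, i.e. when the upstream produced no schema and no data at all. A self-contained sketch of the defensive pattern — the IterOutcome values mirror Drill's RecordBatch interface, but the `drain` consumer below is a hypothetical stand-in, not an actual operator:

```java
import java.util.Iterator;

public class IterOutcomeSketch {
    enum IterOutcome { OK_NEW_SCHEMA, OK, NONE, STOP }

    // Drains an upstream, tolerating NONE-before-schema instead of assuming
    // at least one OK_NEW_SCHEMA was seen. Returns the number of OK batches.
    static int drain(Iterator<IterOutcome> upstream) {
        boolean sawSchema = false;
        int batches = 0;
        while (upstream.hasNext()) {
            switch (upstream.next()) {
                case OK_NEW_SCHEMA:
                    sawSchema = true;       // (re)build value vectors here
                    break;
                case OK:
                    if (!sawSchema) throw new IllegalStateException("OK before schema");
                    batches++;              // process a data batch
                    break;
                case NONE:
                    return batches;         // terminal, valid even if !sawSchema
                case STOP:
                    return batches;         // upstream failed; stop consuming
            }
        }
        return batches;
    }
}
```

An operator that instead assumes "schema first, then data, then NONE" dereferences never-built vectors on an empty upstream, which is exactly the class of bug this JIRA audits for.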