[jira] [Resolved] (DRILL-4930) Metadata results are not sorted
[ https://issues.apache.org/jira/browse/DRILL-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4930.
    Resolution: Fixed
 Fix Version/s: 1.9.0

> Metadata results are not sorted
> -------------------------------
>
>          Key: DRILL-4930
>          URL: https://issues.apache.org/jira/browse/DRILL-4930
>      Project: Apache Drill
>   Issue Type: Bug
>   Components: Metadata
>     Reporter: Laurent Goujon
>     Assignee: Laurent Goujon
>     Priority: Minor
>      Fix For: 1.9.0
>
> According to JDBC and ODBC specs, metadata results should be ordered.
> Currently, results are unordered.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
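As an illustration of the ordering the issue refers to: the JDBC javadoc for `DatabaseMetaData.getTables` requires rows ordered by TABLE_TYPE, TABLE_CAT, TABLE_SCHEM and TABLE_NAME. A minimal sketch (not Drill's actual fix) of such a composite ordering over metadata rows, modeled here as plain String arrays:

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch: sort getTables-style rows in the order the JDBC spec mandates.
// Column indices follow the getTables result layout; null catalog/schema
// values are legal in JDBC metadata, hence nullsFirst.
public class TableMetadataOrder {
  static final int TABLE_CAT = 0, TABLE_SCHEM = 1, TABLE_NAME = 2, TABLE_TYPE = 3;

  static final Comparator<String[]> JDBC_TABLE_ORDER =
      Comparator.comparing((String[] r) -> r[TABLE_TYPE],
              Comparator.nullsFirst(Comparator.<String>naturalOrder()))
          .thenComparing((String[] r) -> r[TABLE_CAT],
              Comparator.nullsFirst(Comparator.<String>naturalOrder()))
          .thenComparing((String[] r) -> r[TABLE_SCHEM],
              Comparator.nullsFirst(Comparator.<String>naturalOrder()))
          .thenComparing((String[] r) -> r[TABLE_NAME],
              Comparator.nullsFirst(Comparator.<String>naturalOrder()));

  public static void main(String[] args) {
    String[][] rows = {
        {"DRILL", "dfs.tmp", "t2", "TABLE"},
        {"DRILL", "dfs.tmp", "t1", "TABLE"},
        {"DRILL", "sys", "version", "SYSTEM TABLE"},
    };
    Arrays.sort(rows, JDBC_TABLE_ORDER);
    for (String[] r : rows) {
      System.out.println(r[TABLE_TYPE] + " " + r[TABLE_SCHEM] + "." + r[TABLE_NAME]);
    }
  }
}
```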
[jira] [Resolved] (DRILL-4880) Support JDBC driver registration using ServiceLoader
[ https://issues.apache.org/jira/browse/DRILL-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4880.
    Resolution: Fixed

Fixed in [09abcc3|https://github.com/apache/drill/commit/09abcc32cc9d6e3de23d3daf633d34fb6183d0f3]

> Support JDBC driver registration using ServiceLoader
> ----------------------------------------------------
>
>               Key: DRILL-4880
>               URL: https://issues.apache.org/jira/browse/DRILL-4880
>           Project: Apache Drill
>        Issue Type: Bug
>        Components: Client - JDBC
>  Affects Versions: 1.8.0
>       Environment: Windows Server 2012
>          Reporter: Sudip Mukherjee
>          Assignee: Laurent Goujon
>           Fix For: 1.9.0
>
> Currently drill-jdbc-all*.jar doesn't contain a META-INF/services/java.sql.Driver
> file, which is apparently used by the Java ServiceLoader API to discover a service.
> Can the Drill JDBC driver have this file like all the other JDBC drivers, so that
> the driver can be loaded using ServiceLoader instead of a direct Class.forName?
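A quick sketch of what the fix enables: once the jar ships a `META-INF/services/java.sql.Driver` entry naming the driver class (`org.apache.drill.jdbc.Driver`), `DriverManager` discovers it via `ServiceLoader` and no explicit `Class.forName` call is needed. The snippet below only demonstrates the generic discovery mechanism; which drivers it actually finds depends on the classpath.

```java
import java.sql.Driver;
import java.util.ServiceLoader;

// Enumerate every JDBC driver registered through the ServiceLoader
// provider-configuration mechanism on the current classpath.
public class DriverDiscovery {
  public static void main(String[] args) {
    ServiceLoader<Driver> loader = ServiceLoader.load(Driver.class);
    for (Driver d : loader) {
      System.out.println("discovered: " + d.getClass().getName());
    }
    // With the Drill JDBC jar on the classpath, a plain
    // DriverManager.getConnection("jdbc:drill:zk=local") then works
    // without registering the driver manually.
  }
}
```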
[jira] [Resolved] (DRILL-4452) Update avatica version for Drill jdbc
[ https://issues.apache.org/jira/browse/DRILL-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4452.
    Resolution: Fixed
 Fix Version/s: 1.9.0

Fixed in [a888ce6|https://github.com/apache/drill/commit/a888ce6ec289a5ecfe056d4db5da417dd4cc95f5]

> Update avatica version for Drill jdbc
> -------------------------------------
>
>               Key: DRILL-4452
>               URL: https://issues.apache.org/jira/browse/DRILL-4452
>           Project: Apache Drill
>        Issue Type: Sub-task
>        Components: Client - JDBC
>  Affects Versions: 1.5.0
>          Reporter: Laurent Goujon
>          Assignee: Laurent Goujon
>          Priority: Minor
>           Fix For: 1.9.0
>
> Drill depends on a very old version of Avatica (0.9.0/pre-calcite), which
> makes integrating changes harder and harder.
> Although Avatica has evolved to support a custom protocol, with a server
> stub, I believe it is still possible for Drill to use the client part as the
> JDBC facade, with small adjustments.
[jira] [Resolved] (DRILL-4732) Update JDBC driver to use the new prepared statement APIs on DrillClient
[ https://issues.apache.org/jira/browse/DRILL-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4732.
    Resolution: Fixed

Fixed in [e02aa59|https://github.com/apache/drill/commit/e02aa596fa64f38bce773fd108f15032f8086601]

> Update JDBC driver to use the new prepared statement APIs on DrillClient
> ------------------------------------------------------------------------
>
>          Key: DRILL-4732
>          URL: https://issues.apache.org/jira/browse/DRILL-4732
>      Project: Apache Drill
>   Issue Type: Sub-task
>   Components: Metadata
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: 1.8.0
>
> DRILL-4729 is adding a new prepared statement implementation on the server
> side, and it provides APIs on DrillClient to create a new prepared statement
> (which returns metadata along with an opaque handle) and to submit the
> prepared statement for execution.
[jira] [Resolved] (DRILL-4729) Add support for prepared statement implementation on server side
[ https://issues.apache.org/jira/browse/DRILL-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4729.
    Resolution: Fixed

Fixed in [14f6ec7|https://github.com/apache/drill/commit/14f6ec7dd9b010de6c884431e443eb788ce54339].

> Add support for prepared statement implementation on server side
> ----------------------------------------------------------------
>
>          Key: DRILL-4729
>          URL: https://issues.apache.org/jira/browse/DRILL-4729
>      Project: Apache Drill
>   Issue Type: Sub-task
>   Components: Metadata
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: 1.8.0
>
> Currently the Drill JDBC/ODBC driver implements its own prepared statement
> support, which basically issues a LIMIT 0 query to get the metadata and then
> executes the actual query. So the query is planned twice (for the metadata
> fetch and for the actual execution). The proposal is to move that logic to
> the server, where we can make optimizations without disrupting/updating the
> JDBC/ODBC drivers.
> * {{PreparedStatement createPreparedStatement(String query)}}. The
> {{PreparedStatement}} object contains the following:
> ** {{ResultSetMetadata getResultSetMetadata()}}
> *** {{ResultSetMetadata}} contains methods to fetch info about the output
> columns of the query. What info these methods provide is given in this
> [spreadsheet|https://docs.google.com/spreadsheets/d/1A6nqUQo5xJaZDQlDTittpVrK7t4Kylycs3P32Yn_O5k/edit?usp=sharing].
> It lists the ODBC/JDBC requirements and what Drill will provide through the
> {{ResultSetMetadata}} object.
> *** The server can put more info here, which is opaque to the client, and use
> it when the client sends an execute-prepared-statement request.
> Overload the current submit-query API to take the {{PreparedStatement}}
> returned above.
> In the initial implementation, the server-side {{createPreparedStatement}}
> API works as follows:
> * Run the query with {{LIMIT 0}} and get the schema.
> * Convert the query into a binary blob and set it as an opaque object in the
> {{PreparedStatement}}.
> When the {{PreparedStatement}} is submitted for execution, the server
> reconstructs the query from the binary blob in the opaque component of the
> {{PreparedStatement}} and executes it from scratch.
> The opaque component of the {{PreparedStatement}} is where we can save more
> information which we can use for optimizations/speedups.
> NOTE: We are not going to worry about parameters in the prepared query in the
> initial implementation. We can provide that functionality later if there is
> sufficient demand from the Drill community.
> Changes in this patch are going to include protobuf messages, server-side
> handling code and Java client APIs. Native client changes are going to be
> tracked in a separate JIRA.
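The initial scheme described above (query text carried as an opaque binary blob, reconstructed on execute) can be sketched in a few lines. All names here are hypothetical; this is not Drill's actual class.

```java
import java.nio.charset.StandardCharsets;

// Minimal model of the initial server-side prepared statement: the handle
// carries the original SQL as an opaque byte blob that clients never
// interpret; on execute, the server decodes it and plans from scratch.
public class OpaquePreparedStatement {
  final byte[] opaqueHandle;  // server-only payload, opaque to clients

  OpaquePreparedStatement(String query) {
    this.opaqueHandle = query.getBytes(StandardCharsets.UTF_8);
  }

  // What the server does when the client submits the handle for execution.
  String reconstructQuery() {
    return new String(opaqueHandle, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    OpaquePreparedStatement ps =
        new OpaquePreparedStatement("SELECT * FROM sys.version");
    System.out.println(ps.reconstructQuery());
  }
}
```

Because the blob is opaque, the server is free to later embed richer state (for example, a cached plan) without any client-side changes.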
[jira] [Resolved] (DRILL-4603) Refactor FileSystem plugin code to allow customizations
[ https://issues.apache.org/jira/browse/DRILL-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4603.
    Resolution: Won't Fix

> Refactor FileSystem plugin code to allow customizations
> -------------------------------------------------------
>
>          Key: DRILL-4603
>          URL: https://issues.apache.org/jira/browse/DRILL-4603
>      Project: Apache Drill
>   Issue Type: Improvement
>   Components: Storage - Other
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: Future
>
> Currently FileSystemPlugin is hard to extend: a lot of the logic for creating
> component implementations ({{WorkspaceSchemaFactory}}s, {{FormatCreator}}),
> defining default workspaces and configuration (implicit to the FileSystem
> implementation) is hard-coded in the constructor.
> This JIRA is to track:
> * refactoring the FileSystemPlugin to allow custom component implementations
> (Configuration, WorkSpaceSchemaFactory, FileSystemSchemaFactory or
> FormatCreator).
> * sharing a single Hadoop {{Configuration}} object to create new
> {{Configuration}} objects. Creating a new {{Configuration}} without an
> existing copy is not efficient, because it involves scanning the classpath
> for *-site files.
[jira] [Resolved] (DRILL-4728) Add support for new metadata fetch APIs
[ https://issues.apache.org/jira/browse/DRILL-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4728.
    Resolution: Fixed

Resolved in [ef6e522|https://github.com/apache/drill/commit/ef6e522c9cba816110aa43ff6bccedf29a901236]

> Add support for new metadata fetch APIs
> ---------------------------------------
>
>          Key: DRILL-4728
>          URL: https://issues.apache.org/jira/browse/DRILL-4728
>      Project: Apache Drill
>   Issue Type: Sub-task
>   Components: Metadata
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: 1.8.0
>
> Please see the doc attached to the parent JIRA DRILL-4714 for details on APIs.
> Add support for the following APIs (including {{protobuf}} messages, server
> handling code and Java client APIs):
> {code}
> List getCatalogs(Filter catalogNameFilter)
> List getSchemas(
>     Filter catalogNameFilter,
>     Filter schemaNameFilter
> )
> List getTables(
>     Filter catalogNameFilter,
>     Filter schemaNameFilter,
>     Filter tableNameFilter
> )
> List getColumns(
>     Filter catalogNameFilter,
>     Filter schemaNameFilter,
>     Filter tableNameFilter,
>     Filter columnNameFilter
> )
> {code}
> Note: native client changes are not going to be included in this patch. Will
> file a separate JIRA.
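The filter-per-level shape of these APIs can be modeled compactly. This is a hypothetical sketch only: Filter is stood in for by a plain `Predicate<String>`, and the "server" is an in-memory table list rather than Drill's schema tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model of getTables(catalogFilter, schemaFilter, tableFilter): each
// name filter narrows one level of the catalog/schema/table hierarchy.
public class MetadataApiSketch {
  static class Table {
    final String catalog, schema, name;
    Table(String catalog, String schema, String name) {
      this.catalog = catalog; this.schema = schema; this.name = name;
    }
  }

  static List<Table> getTables(List<Table> all,
                               Predicate<String> catalogNameFilter,
                               Predicate<String> schemaNameFilter,
                               Predicate<String> tableNameFilter) {
    List<Table> out = new ArrayList<>();
    for (Table t : all) {
      if (catalogNameFilter.test(t.catalog)
          && schemaNameFilter.test(t.schema)
          && tableNameFilter.test(t.name)) {
        out.add(t);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Table> all = Arrays.asList(
        new Table("DRILL", "dfs.tmp", "orders"),
        new Table("DRILL", "sys", "version"));
    // Only the schema filter is restrictive here; catalog and table pass all.
    List<Table> sys = getTables(all, c -> true, s -> s.equals("sys"), n -> true);
    System.out.println(sys.size());
  }
}
```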
[jira] [Resolved] (DRILL-4785) Limit 0 queries regressed in Drill 1.7.0
[ https://issues.apache.org/jira/browse/DRILL-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4785.
    Resolution: Fixed

> Limit 0 queries regressed in Drill 1.7.0
> -----------------------------------------
>
>               Key: DRILL-4785
>               URL: https://issues.apache.org/jira/browse/DRILL-4785
>           Project: Apache Drill
>        Issue Type: Bug
>        Components: Functions - Drill
>  Affects Versions: 1.7.0
>       Environment: Redhat EL6
>          Reporter: Dechang Gu
>          Assignee: Venki Korukanti
>           Fix For: 1.8.0
>
> We noticed a bunch of limit 0 queries regressed quite a bit: +2500ms, while
> the same queries took ~400ms in Apache Drill 1.6.0, a 5-6X regression.
> Further investigation indicates that the most likely root cause of the
> regression is this commit:
> vkorukanti committed DRILL-4446: Support mandatory work assignment to endpoint requirement…
> commit id: 10afc708600ea9f4cb0e7c2cd981b5b1001fea0d
> With a Drill build on this commit, the query takes 3095ms, and drillbit.log shows:
> 2016-07-15 17:27:55,048 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 28768074-4ed6-a70a-2e6a-add3201ab801:
> SELECT * FROM (SELECT
>   CAST(EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) AS `mn_business_date_ok`,
>   AVG((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS `avg_Calculation_CIDBACJBCCCBHDGB_ok`,
>   SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS `sum_Calculation_CIDBACJBCCCBHDGB_ok`,
>   SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN 1 ELSE NULL END)) AS `sum_Calculation_CJEBBAEBBFADBDFJ_ok`,
>   SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN (`rfm_sales`.`pos_comps` + `rfm_sales`.`pos_promos`) ELSE NULL END)) AS `sum_Net_Sales__YTD___copy__ok`
> FROM `dfs.xxx`.`views/rfm_sales` `rfm_sales`
> GROUP BY CAST(EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER)) T LIMIT 0
> 2016-07-15 17:27:55,664 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 208 ms to read metadata from cache file
> 2016-07-15 17:27:56,783 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 129 ms to read metadata from cache file
> 2016-07-15 17:27:57,960 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
> 2016-07-15 17:27:57,961 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: State to report: RUNNING
> 2016-07-15 17:27:57,989 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: State change requested RUNNING --> FINISHED
> 2016-07-15 17:27:57,989 ucs-node2.perf.lab [28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: State to report: FINISHED
> While running the same query on the parent commit (commit id 9f4fff800d128878094ae70b454201f79976135d), it only takes 492ms, and drillbit.log shows:
> 2016-07-15 17:19:27,309 ucs-node7.perf.lab [2876826f-ee19-9466-0c0c-869f47c409f8:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 2876826f-ee19-9466-0c0c-869f47c409f8: SELECT * FROM (SELECT CAST(EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) AS `mn_business_date_ok`,AVG((CASE WHEN ((CAST(EXTRACT(YEAR FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND (CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS
[jira] [Created] (DRILL-4732) Update JDBC driver to use the new prepared statement APIs on DrillClient
Venki Korukanti created DRILL-4732:
-----------------------------------

     Summary: Update JDBC driver to use the new prepared statement APIs on DrillClient
         Key: DRILL-4732
         URL: https://issues.apache.org/jira/browse/DRILL-4732
     Project: Apache Drill
  Issue Type: Sub-task
    Reporter: Venki Korukanti

DRILL-4729 is adding a new prepared statement implementation on the server side, and it provides APIs on DrillClient to create a new prepared statement (which returns metadata along with an opaque handle) and to submit the prepared statement for execution.
[jira] [Created] (DRILL-4730) Update JDBC DatabaseMetaData implementation to use new Metadata APIs
Venki Korukanti created DRILL-4730:
-----------------------------------

     Summary: Update JDBC DatabaseMetaData implementation to use new Metadata APIs
         Key: DRILL-4730
         URL: https://issues.apache.org/jira/browse/DRILL-4730
     Project: Apache Drill
  Issue Type: Sub-task
  Components: Client - JDBC
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.8.0

DRILL-4728 is going to add support for new metadata APIs. Replace the INFORMATION_SCHEMA queries used to get the metadata with the new APIs provided in the Java client.
[jira] [Created] (DRILL-4729) Add support for prepared statement implementation on server side
Venki Korukanti created DRILL-4729:
-----------------------------------

     Summary: Add support for prepared statement implementation on server side
         Key: DRILL-4729
         URL: https://issues.apache.org/jira/browse/DRILL-4729
     Project: Apache Drill
  Issue Type: Sub-task
  Components: Metadata
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.8.0

Currently the Drill JDBC/ODBC driver implements its own prepared statement support, which basically issues a LIMIT 0 query to get the metadata and then executes the actual query. So the query is planned twice (for the metadata fetch and for the actual execution). The proposal is to move that logic to the server, where we can make optimizations without disrupting/updating the JDBC/ODBC drivers.
* {{PreparedStatement createPreparedStatement(String query)}}. The {{PreparedStatement}} object contains the following:
** {{ResultSetMetadata getResultSetMetadata()}}
*** {{ResultSetMetadata}} contains methods to fetch info about the output columns of the query. What info these methods provide is given in this [spreadsheet|https://docs.google.com/spreadsheets/d/1A6nqUQo5xJaZDQlDTittpVrK7t4Kylycs3P32Yn_O5k/edit?usp=sharing]. It lists the ODBC/JDBC requirements and what Drill will provide through the {{ResultSetMetadata}} object.
*** The server can put more info here, which is opaque to the client, and use it when the client sends an execute-prepared-statement request.
Overload the current submit-query API to take the {{PreparedStatement}} returned above.
In the initial implementation, the server-side {{createPreparedStatement}} API works as follows:
* Run the query with {{LIMIT 0}} and get the schema.
* Convert the query into a binary blob and set it as an opaque object in the {{PreparedStatement}}.
When the {{PreparedStatement}} is submitted for execution, the server reconstructs the query from the binary blob in the opaque component of the {{PreparedStatement}} and executes it from scratch.
The opaque component of the {{PreparedStatement}} is where we can save more information which we can use for optimizations/speedups.
NOTE: We are not going to worry about parameters in the prepared query in the initial implementation. We can provide that functionality later if there is sufficient demand from the Drill community.
[jira] [Created] (DRILL-4728) Add support for new metadata fetch APIs
Venki Korukanti created DRILL-4728:
-----------------------------------

     Summary: Add support for new metadata fetch APIs
         Key: DRILL-4728
         URL: https://issues.apache.org/jira/browse/DRILL-4728
     Project: Apache Drill
  Issue Type: Sub-task
  Components: Metadata
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.8.0

Please see the doc attached to the parent JIRA DRILL-4714 for details on APIs. Add support for the following APIs (including {{protobuf}} messages, server handling code and Java client APIs):
{code}
List getCatalogs(Filter catalogNameFilter)
List getSchemas(
    Filter catalogNameFilter,
    Filter schemaNameFilter
)
List getTables(
    Filter catalogNameFilter,
    Filter schemaNameFilter,
    Filter tableNameFilter
)
List getColumns(
    Filter catalogNameFilter,
    Filter schemaNameFilter,
    Filter tableNameFilter,
    Filter columnNameFilter
)
{code}
Note: native client changes are not going to be included in this patch. Will file a separate JIRA.
[jira] [Resolved] (DRILL-4725) Improvements to InfoSchema RecordGenerator needed for DRILL-4714
[ https://issues.apache.org/jira/browse/DRILL-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4725.
    Resolution: Fixed

Fixed in [f70df990|https://git1-us-west.apache.org/repos/asf?p=drill.git;a=commit;h=f70df990].

> Improvements to InfoSchema RecordGenerator needed for DRILL-4714
> ----------------------------------------------------------------
>
>          Key: DRILL-4725
>          URL: https://issues.apache.org/jira/browse/DRILL-4725
>      Project: Apache Drill
>   Issue Type: Sub-task
>   Components: Metadata
>     Reporter: Venki Korukanti
>     Assignee: Venki Korukanti
>      Fix For: 1.7.0
>
> 1. Add support for pushing the filter on the following fields into
> InfoSchemaRecordGenerator:
>    - CATALOG_NAME
>    - COLUMN_NAME
> 2. Push down LIKE with ESCAPE. Add a test.
> 3. Add a method visitCatalog() to InfoSchemaRecordGenerator to decide whether
> to explore the catalog or not.
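For reference, the LIKE-with-ESCAPE semantics the pushdown has to honor: `%` matches any run of characters, `_` exactly one, and a character preceded by the escape character is matched literally. A hypothetical helper (not Drill's code) compiling such a pattern to a `java.util.regex.Pattern`:

```java
import java.util.regex.Pattern;

// Compile a SQL LIKE pattern (with an explicit ESCAPE character) to a regex.
public class LikeToRegex {
  static Pattern compile(String like, char escape) {
    StringBuilder re = new StringBuilder();
    for (int i = 0; i < like.length(); i++) {
      char c = like.charAt(i);
      if (c == escape && i + 1 < like.length()) {
        // Escaped character: match it literally, even if it is % or _.
        re.append(Pattern.quote(String.valueOf(like.charAt(++i))));
      } else if (c == '%') {
        re.append(".*");
      } else if (c == '_') {
        re.append('.');
      } else {
        re.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(re.toString());
  }

  public static void main(String[] args) {
    // "100\%" with escape '\' matches only the literal string "100%".
    System.out.println(compile("100\\%", '\\').matcher("100%").matches());
  }
}
```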
[jira] [Created] (DRILL-4725) Improvements to InfoSchema RecordGenerator needed for DRILL-4714
Venki Korukanti created DRILL-4725:
-----------------------------------

     Summary: Improvements to InfoSchema RecordGenerator needed for DRILL-4714
         Key: DRILL-4725
         URL: https://issues.apache.org/jira/browse/DRILL-4725
     Project: Apache Drill
  Issue Type: Sub-task
  Components: Metadata
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti

1. Add support for pushing the filter on the following fields into InfoSchemaRecordGenerator:
   - CATALOG_NAME
   - COLUMN_NAME
2. Push down LIKE with ESCAPE. Add a test.
3. Add a method visitCatalog() to InfoSchemaRecordGenerator to decide whether to explore the catalog or not.
[jira] [Created] (DRILL-4714) Add metadata and prepared statement APIs to DrillClient<->Drillbit interface
Venki Korukanti created DRILL-4714:
-----------------------------------

     Summary: Add metadata and prepared statement APIs to DrillClient<->Drillbit interface
         Key: DRILL-4714
         URL: https://issues.apache.org/jira/browse/DRILL-4714
     Project: Apache Drill
  Issue Type: New Feature
    Reporter: Venki Korukanti

Currently the ODBC/JDBC drivers spawn a set of queries on INFORMATION_SCHEMA for metadata. The client has to deal with submitting a query, reading query results and constructing the required objects. Sometimes the same work is done twice (planning work, in the case of prepared statements) to get the metadata and execute the query. Instead we could simplify the client by providing APIs on the client interface and letting the server construct the required objects and send them to the client directly. These APIs provide common info that can be consumed by the JDBC/ODBC drivers. Will attach a doc explaining the new APIs.
[jira] [Created] (DRILL-4613) Skip the plugin if it throws errors when registering schemas
Venki Korukanti created DRILL-4613:
-----------------------------------

     Summary: Skip the plugin if it throws errors when registering schemas
         Key: DRILL-4613
         URL: https://issues.apache.org/jira/browse/DRILL-4613
     Project: Apache Drill
  Issue Type: Improvement
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti

Currently, when registering schemas in the root schema, if a plugin throws an exception we fail the query. This causes every query to fail, as every query needs a complete schema tree. Plugins could throw exceptions due to transient errors (e.g. a storage server that is temporarily unreachable). If a plugin throws an exception during schema registration, log an error, skip the plugin and continue registering schemas from the rest of the plugins. If the user is querying tables from other plugins, the query should succeed. If the user is querying tables in the skipped plugin, a table-not-found exception is thrown.
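The proposed behavior can be sketched as a loop that registers each plugin's schemas and, on failure, logs and moves on instead of failing the whole schema tree. All names here are illustrative, not Drill's actual classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Build the root schema from all plugins, skipping any plugin whose
// registration throws (e.g. a transiently unreachable storage server).
public class SkipFailingPlugins {
  interface Plugin {
    void registerSchemas(List<String> rootSchema) throws Exception;
  }

  static List<String> buildSchemaTree(Map<String, Plugin> plugins) {
    List<String> rootSchema = new ArrayList<>();
    for (Map.Entry<String, Plugin> e : plugins.entrySet()) {
      try {
        e.getValue().registerSchemas(rootSchema);
      } catch (Exception ex) {
        // Log and continue; tables in this plugin will surface as "not found".
        System.err.println("Failed to register schemas for plugin '"
            + e.getKey() + "', skipping: " + ex.getMessage());
      }
    }
    return rootSchema;
  }

  public static void main(String[] args) {
    Map<String, Plugin> plugins = new LinkedHashMap<>();
    plugins.put("dfs", root -> root.add("dfs.tmp"));
    plugins.put("flaky", root -> { throw new Exception("storage unreachable"); });
    plugins.put("sys", root -> root.add("sys"));
    System.out.println(buildSchemaTree(plugins));
  }
}
```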
[jira] [Resolved] (DRILL-4446) Improve current fragment parallelization module
[ https://issues.apache.org/jira/browse/DRILL-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4446.
    Resolution: Fixed

> Improve current fragment parallelization module
> -----------------------------------------------
>
>               Key: DRILL-4446
>               URL: https://issues.apache.org/jira/browse/DRILL-4446
>           Project: Apache Drill
>        Issue Type: New Feature
>  Affects Versions: 1.5.0
>          Reporter: Venki Korukanti
>          Assignee: Venki Korukanti
>           Fix For: 1.7.0
>
> The current fragment parallelizer, {{SimpleParallelizer.java}}, can't
> correctly handle the case where an operator has a mandatory scheduling
> requirement for a set of DrillbitEndpoints together with an affinity for each
> DrillbitEndpoint (i.e. what portion of the total tasks to schedule on each
> DrillbitEndpoint). It assumes that scheduling requirements are soft (except
> in the Mux and DeMux case, which has a mandatory parallelization requirement
> of 1 unit).
> An example: a cluster has 3 nodes, each running a Drillbit and a storage
> service. Data for a table is present only at the storage services on two
> nodes, so a GroupScan needs to be scheduled on those two nodes in order to
> read the data. The storage service doesn't support (or makes costly) reading
> data from a remote node.
> Inserting the mandatory scheduling requirements into the existing
> SimpleParallelizer is not sufficient, as you may end up with a plan that has
> a fragment with two GroupScans, each having its own hard parallelization
> requirements.
> The proposal is: add a property to each operator which tells what
> parallelization implementation to use. Most operators (such as Project or
> Filter) don't have any particular strategy; they depend on the incoming
> operator. Existing operators which have requirements (all existing
> GroupScans) default to the current parallelizer, {{SimpleParallelizer}}.
> {{Screen}} defaults to the new mandatory-assignment parallelizer.
> It is possible that the generated PhysicalPlan has a fragment with operators
> having different parallelization strategies. In that case an exchange is
> inserted between the operators where a change in parallelization strategy is
> required.
> Will send a detailed design doc.
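A toy model of "mandatory assignment with affinity" (not Drill's SimpleParallelizer): `width` minor fragments must land only on the listed endpoints, split roughly in proportion to each endpoint's affinity (assumed here to sum to 1.0), with the rounding remainder handed out one fragment at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Distribute `width` fragments across required endpoints by affinity.
public class MandatoryAssignment {
  static Map<String, Integer> assign(int width, Map<String, Double> affinity) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    int assigned = 0;
    for (Map.Entry<String, Double> e : affinity.entrySet()) {
      int n = (int) Math.floor(width * e.getValue());
      counts.put(e.getKey(), n);
      assigned += n;
    }
    // Hand out the rounding remainder one fragment at a time.
    for (String node : counts.keySet()) {
      if (assigned >= width) break;
      counts.merge(node, 1, Integer::sum);
      assigned++;
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Double> affinity = new LinkedHashMap<>();
    affinity.put("node1", 0.5);
    affinity.put("node2", 0.5);  // a third node with no data gets nothing
    System.out.println(assign(5, affinity));
  }
}
```

The point of the example: an endpoint absent from the affinity map receives zero fragments, which is exactly the hard requirement the existing soft-assignment parallelizer cannot express.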
[jira] [Created] (DRILL-4603) Refactor FileSystem plugin code to allow customizations
Venki Korukanti created DRILL-4603:
-----------------------------------

     Summary: Refactor FileSystem plugin code to allow customizations
         Key: DRILL-4603
         URL: https://issues.apache.org/jira/browse/DRILL-4603
     Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.7.0

Currently FileSystemPlugin is hard to extend: a lot of the logic for creating component implementations ({{WorkspaceSchemaFactory}}s, {{FormatCreator}}), defining default workspaces and configuration (implicit to the FileSystem implementation) is hard-coded in the constructor.
This JIRA is to track:
* refactoring the FileSystemPlugin to allow custom component implementations (Configuration, WorkSpaceSchemaFactory, FileSystemSchemaFactory or FormatCreator).
* sharing a single Hadoop {{Configuration}} object to create new {{Configuration}} objects. Creating a new {{Configuration}} without an existing copy is not efficient, because it involves scanning the classpath for *-site files.
[jira] [Created] (DRILL-4593) Remove OldAssignmentCreator in FileSystemPlugin
Venki Korukanti created DRILL-4593:
-----------------------------------

     Summary: Remove OldAssignmentCreator in FileSystemPlugin
         Key: DRILL-4593
         URL: https://issues.apache.org/jira/browse/DRILL-4593
     Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti

The AssignmentCreator was changed in DRILL-2725. The old assignment creator was kept as a fallback option in case of any failures in the new one. The new AssignmentCreator was added a year ago and is the default; no problems have been reported. I think it is safe to get rid of the old AssignmentCreator.
[jira] [Resolved] (DRILL-4549) Add support for more truncation units in date_trunc function
[ https://issues.apache.org/jira/browse/DRILL-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4549.
    Resolution: Fixed

> Add support for more truncation units in date_trunc function
> ------------------------------------------------------------
>
>               Key: DRILL-4549
>               URL: https://issues.apache.org/jira/browse/DRILL-4549
>           Project: Apache Drill
>        Issue Type: Improvement
>  Affects Versions: 1.6.0
>          Reporter: Venki Korukanti
>          Assignee: Venki Korukanti
>           Fix For: 1.7.0
>
> Currently we support only {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} truncate
> units for types {{TIME, TIMESTAMP and DATE}}. Extend the functions to support
> {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEK, QUARTER, DECADE, CENTURY,
> MILLENNIUM}} truncate units for types {{TIME, TIMESTAMP, DATE, INTERVAL DAY,
> INTERVAL YEAR}}.
> Also get rid of the if-and-else (on truncation unit) implementation. Instead
> resolve to a direct function based on the truncation unit in the Calcite ->
> Drill (DrillOptiq) expression conversion.
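For illustration, the semantics of two of the new truncation units on DATE values, computed with `java.time` rather than Drill's implementation: QUARTER truncates to the first day of the quarter, DECADE to the first day of the first year of the decade.

```java
import java.time.LocalDate;

// Reference semantics for two of the proposed date_trunc units.
public class DateTruncUnits {
  // date_trunc('QUARTER', d): first day of d's quarter.
  static LocalDate truncQuarter(LocalDate d) {
    int firstMonthOfQuarter = ((d.getMonthValue() - 1) / 3) * 3 + 1;
    return LocalDate.of(d.getYear(), firstMonthOfQuarter, 1);
  }

  // date_trunc('DECADE', d): January 1 of the decade's first year.
  static LocalDate truncDecade(LocalDate d) {
    return LocalDate.of((d.getYear() / 10) * 10, 1, 1);
  }

  public static void main(String[] args) {
    System.out.println(truncQuarter(LocalDate.of(2016, 5, 17))); // 2016-04-01
    System.out.println(truncDecade(LocalDate.of(2016, 5, 17)));  // 2010-01-01
  }
}
```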
[jira] [Created] (DRILL-4550) Add support for more time units in extract function
Venki Korukanti created DRILL-4550:
-----------------------------------

     Summary: Add support for more time units in extract function
         Key: DRILL-4550
         URL: https://issues.apache.org/jira/browse/DRILL-4550
     Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Affects Versions: 1.6.0
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.7.0

Currently the {{extract}} function supports the following units: {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}}. Add support for more units: {{CENTURY, DECADE, DOW, DOY, EPOCH, MILLENNIUM, QUARTER, WEEK}}. We also need changes in the SQL parser; currently the parser only allows {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} as units.
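Rough reference semantics for a few of the proposed units, computed with `java.time` rather than Drill's own functions: QUARTER is 1-4, DOY the 1-based day of year, EPOCH the seconds since 1970-01-01 (taking midnight UTC for a plain date).

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// What extract(QUARTER|DOY|EPOCH FROM date) would return for a DATE value.
public class ExtractUnits {
  static long extractQuarter(LocalDate d) {
    return (d.getMonthValue() - 1) / 3 + 1;
  }

  static long extractDoy(LocalDate d) {
    return d.getDayOfYear();
  }

  static long extractEpoch(LocalDate d) {
    return d.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
  }

  public static void main(String[] args) {
    LocalDate d = LocalDate.of(2016, 3, 1);
    System.out.println(extractQuarter(d)); // 1
    System.out.println(extractDoy(d));     // 61 (2016 is a leap year)
  }
}
```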
[jira] [Created] (DRILL-4549) Add support for more units in date_trunc function
Venki Korukanti created DRILL-4549:
-----------------------------------

     Summary: Add support for more units in date_trunc function
         Key: DRILL-4549
         URL: https://issues.apache.org/jira/browse/DRILL-4549
     Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.6.0
    Reporter: Venki Korukanti
    Assignee: Venki Korukanti
     Fix For: 1.7.0

Currently we support only {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} truncate units for types {{TIME, TIMESTAMP and DATE}}. Extend the functions to support {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEK, QUARTER, DECADE, CENTURY, MILLENNIUM}} truncate units for types {{TIME, TIMESTAMP, DATE, INTERVAL DAY, INTERVAL YEAR}}.
Also get rid of the if-and-else (on truncation unit) implementation. Instead resolve to a direct function based on the truncation unit in the Calcite -> Drill (DrillOptiq) expression conversion.
[jira] [Created] (DRILL-4509) Ignore unknown storage plugin configs while starting Drillbit
Venki Korukanti created DRILL-4509:
-----------------------------------

     Summary: Ignore unknown storage plugin configs while starting Drillbit
         Key: DRILL-4509
         URL: https://issues.apache.org/jira/browse/DRILL-4509
     Project: Apache Drill
  Issue Type: Bug
  Components: Server
Affects Versions: 1.5.0
    Reporter: Venki Korukanti
    Priority: Minor
     Fix For: 1.7.0

If zookeeper contains a storage plugin configuration whose implementation is not found while starting the Drillbit, the Drillbit throws an error and fails to restart:
{code}
Could not resolve type id 'newPlugin' into a subtype of [simple type, class org.apache.drill.common.logical.StoragePluginConfig]: known type ids = [InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file, hbase, hive, jdbc, kudu, mock, mongo, named]
{code}
Should we ignore such plugins with a warning in the logs and continue starting the Drillbit?
[jira] [Created] (DRILL-4508) Null proof all AutoCloseable.close() methods
Venki Korukanti created DRILL-4508:
-----------------------------------

     Summary: Null proof all AutoCloseable.close() methods
         Key: DRILL-4508
         URL: https://issues.apache.org/jira/browse/DRILL-4508
     Project: Apache Drill
  Issue Type: Bug
  Components: Server
Affects Versions: 1.5.0
    Reporter: Venki Korukanti
    Priority: Minor
     Fix For: 1.7.0

If the Drillbit fails to start (due to incorrect configuration, storage plugin information not found, etc.), we end up calling close on various components such as WebServer, Drillbit, etc. Some of these components may not have been initialized and may have null values. The close() methods do not check for null values before reading them. One example:
{code}
java.lang.NullPointerException: null
    at org.apache.drill.exec.server.options.SystemOptionManager.close(SystemOptionManager.java:280) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.DrillbitContext.close(DrillbitContext.java:185) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:157) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) ~[drill-common-1.6.0.jar:1.6.0]
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) ~[drill-common-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:149) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:283) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257) [drill-java-exec-1.6.0.jar:1.6.0]
{code}
This masks the actual error (incorrect configuration) and makes it hard to know what went wrong.
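A sketch of the defensive pattern the issue asks for: cleanup must tolerate components that were never initialized, so the original startup failure is not masked by a NullPointerException during close. This is an illustrative helper, not Drill's AutoCloseables class.

```java
// Close each component if (and only if) it was actually initialized,
// logging close failures instead of letting them mask the root cause.
public class SafeClose {
  static void closeQuietly(AutoCloseable... closeables) {
    for (AutoCloseable c : closeables) {
      if (c == null) continue;  // component never got initialized: skip it
      try {
        c.close();
      } catch (Exception e) {
        System.err.println("Failure while closing: " + e);
      }
    }
  }

  public static void main(String[] args) {
    AutoCloseable ok = () -> System.out.println("closed");
    // No NPE even though two of the "components" were never constructed.
    closeQuietly(ok, null, null);
  }
}
```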
[jira] [Resolved] (DRILL-4483) Fix text plan issue in query profiles
[ https://issues.apache.org/jira/browse/DRILL-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4483. Resolution: Fixed Fix Version/s: 1.6.0 > Fix text plan issue in query profiles > - > > Key: DRILL-4483 > URL: https://issues.apache.org/jira/browse/DRILL-4483 > Project: Apache Drill > Issue Type: Bug >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.6.0 > > > Text plan (and visualized plan) in query profiles is empty. As of 1.5.0, we > display text plan (and visualized plan) for SQL queries and CTAS (not for > DirectPlans (alter session, show tables etc.) or explain queries). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4483) Fix text plan issue in query profiles
Venki Korukanti created DRILL-4483: -- Summary: Fix text plan issue in query profiles Key: DRILL-4483 URL: https://issues.apache.org/jira/browse/DRILL-4483 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Text plan (and visualized plan) in query profiles is empty. As of 1.5.0, we display text plan (and visualized plan) for SQL queries and CTAS (not for DirectPlans (alter session, show tables etc.) or explain queries). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4327) Fix rawtypes warning emitted by compiler
[ https://issues.apache.org/jira/browse/DRILL-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4327. Resolution: Fixed Fix Version/s: 1.6.0 > Fix rawtypes warning emitted by compiler > > > Key: DRILL-4327 > URL: https://issues.apache.org/jira/browse/DRILL-4327 > Project: Apache Drill > Issue Type: Improvement >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Minor > Fix For: 1.6.0 > > > The Drill codebase references lots of rawtypes, which generates lots of > warnings from the compiler. > Since Drill is now compiled with Java 1.7, it should use generic types as > much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4354) Remove sessions in anonymous (user auth disabled) mode in WebUI server
[ https://issues.apache.org/jira/browse/DRILL-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4354. Resolution: Fixed > Remove sessions in anonymous (user auth disabled) mode in WebUI server > -- > > Key: DRILL-4354 > URL: https://issues.apache.org/jira/browse/DRILL-4354 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.5.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.6.0 > > > Currently we open anonymous sessions when user auth is disabled. These sessions > are cleaned up when they expire (controlled by boot config > {{drill.exec.http.session_max_idle_secs}}). This may lead to unnecessary > resource accumulation. This JIRA is to remove anonymous sessions and only > have sessions when user authentication is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4410. Resolution: Fixed Fix Version/s: 1.6.0 > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > Fix For: 1.6.0 > > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. > org.apache.drill.exec.vector.UInt1Vector.reAlloc():214 > org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406 > org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298 > org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307 > org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100 > org.apache.drill.exec.vector.complex.ListVector.copyFrom():97 > org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > 
org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1657 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():251 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4383) Allow passing custom configuration options to a file system through the storage plugin config
[ https://issues.apache.org/jira/browse/DRILL-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4383. Resolution: Fixed > Allow passing custom configuration options to a file system through the > storage plugin config > - > > Key: DRILL-4383 > URL: https://issues.apache.org/jira/browse/DRILL-4383 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Other >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.6.0 > > > A similar feature already exists in the Hive and HBase plugins; it simply > provides a key/value map for passing custom configuration options to the > underlying storage system. > This would be useful for the filesystem plugin to configure S3 without > needing to create a core-site.xml file or restart Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4446) Improve current fragment parallelization module
Venki Korukanti created DRILL-4446: -- Summary: Improve current fragment parallelization module Key: DRILL-4446 URL: https://issues.apache.org/jira/browse/DRILL-4446 Project: Apache Drill Issue Type: New Feature Affects Versions: 1.5.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.6.0 The current fragment parallelizer {{SimpleParallelizer.java}} can't correctly handle the case where an operator has a mandatory scheduling requirement for a set of DrillbitEndpoints plus an affinity for each DrillbitEndpoint (i.e., what portion of the total tasks should be scheduled on each DrillbitEndpoint). It assumes that scheduling requirements are soft (except for the Mux and DeMux case, which has a mandatory parallelization requirement of one unit). An example: a cluster has 3 nodes, each running a Drillbit and a storage service. Data for a table is present only at the storage services on two of the nodes, so a GroupScan must be scheduled on those two nodes in order to read the data; the storage service doesn't support (or makes costly) reading data from a remote node. Inserting the mandatory scheduling requirements within the existing SimpleParallelizer is not sufficient, as you may end up with a plan that has a fragment with two GroupScans, each having its own hard parallelization requirements. The proposal: add a property to each operator which tells what parallelization implementation to use. Most operators (such as Project or Filter) don't have any particular strategy; they depend on the incoming operator. Current operators which have requirements (all existing GroupScans) default to the current parallelizer {{SimpleParallelizer}}. {{Screen}} defaults to the new mandatory-assignment parallelizer. The generated PhysicalPlan can have a fragment with operators having different parallelization strategies; in that case an exchange is inserted between operators where a change in parallelization strategy is required. Will send a detailed design doc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
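A minimal sketch of what hard affinity-based assignment could look like. Everything here is hypothetical (class, method, and endpoint names); a real parallelizer would also handle max width, cost, and mixed strategies within one plan. The sketch only shows the core idea: every mandatory endpoint receives at least one fragment, and the split of the total width follows the affinities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of mandatory, affinity-proportional fragment assignment.
public class AffinityAssigner {
    /** Splits `width` fragments across endpoints proportionally to affinity; each endpoint gets at least one. */
    public static Map<String, Integer> assign(Map<String, Double> affinity, int width) {
        if (width < affinity.size()) {
            throw new IllegalArgumentException("width too small for mandatory endpoints");
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Double> e : affinity.entrySet()) {
            int n = Math.max(1, (int) Math.floor(e.getValue() * width)); // hard: at least 1
            out.put(e.getKey(), n);
            assigned += n;
        }
        // Hand out (or claw back) the rounding remainder deterministically.
        for (String key : out.keySet()) {
            if (assigned == width) break;
            if (assigned < width) { out.merge(key, 1, Integer::sum); assigned++; }
            else if (out.get(key) > 1) { out.merge(key, -1, Integer::sum); assigned--; }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> affinity = new LinkedHashMap<>();
        affinity.put("node1", 0.5);
        affinity.put("node2", 0.5);
        System.out.println(assign(affinity, 6)); // {node1=3, node2=3}
    }
}
```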
[jira] [Created] (DRILL-4434) Remove (or deprecate) GroupScan.enforceWidth and use GroupScan.getMinParallelization
Venki Korukanti created DRILL-4434: -- Summary: Remove (or deprecate) GroupScan.enforceWidth and use GroupScan.getMinParallelization Key: DRILL-4434 URL: https://issues.apache.org/jira/browse/DRILL-4434 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti It seems enforceWidth, which is used only in ExcessiveExchangeIdentifier, is not necessary. Instead we should rely on GroupScan.getMinParallelization(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak
Venki Korukanti created DRILL-4353: -- Summary: Expired sessions in web server are not cleaning up resources, leading to resource leak Key: DRILL-4353 URL: https://issues.apache.org/jira/browse/DRILL-4353 Project: Apache Drill Issue Type: Bug Components: Web Server, Client - HTTP Affects Versions: 1.5.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Blocker Fix For: 1.5.0 Currently we store the session resources (including the DrillClient) in an attribute {{SessionAuthentication}} object which implements {{HttpSessionBindingListener}}. Whenever a session is invalidated, all attributes are removed, and if an attribute's class implements {{HttpSessionBindingListener}}, the listener is informed. {{SessionAuthentication}}'s implementation of {{HttpSessionBindingListener}} logs out the user, which includes cleaning up the resources, but {{SessionAuthentication}} relies on the ServletContext stored in a thread-local variable (see [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]). For the thread that cleans up expired sessions, there is no {{ServletContext}} in the thread-local variable, so the user is not logged out properly, leading to a resource leak. Fix: Add an {{HttpSessionEventListener}} to clean up the {{SessionAuthentication}} and resources every time an HttpSession is expired or invalidated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4354) Remove sessions in anonymous (user auth disabled) mode in WebUI server
Venki Korukanti created DRILL-4354: -- Summary: Remove sessions in anonymous (user auth disabled) mode in WebUI server Key: DRILL-4354 URL: https://issues.apache.org/jira/browse/DRILL-4354 Project: Apache Drill Issue Type: Bug Components: Server Affects Versions: 1.5.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.5.0 Currently we open anonymous sessions when user auth is disabled. These sessions are cleaned up when they expire (controlled by boot config {{drill.exec.http.session_max_idle_secs}}). This may lead to unnecessary resource accumulation. This JIRA is to remove anonymous sessions and only have sessions when user authentication is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4328) Fix for backward compatibility regression caused by DRILL-4198
Venki Korukanti created DRILL-4328: -- Summary: Fix for backward compatibility regression caused by DRILL-4198 Key: DRILL-4328 URL: https://issues.apache.org/jira/browse/DRILL-4328 Project: Apache Drill Issue Type: Bug Components: Storage - Other Reporter: Venki Korukanti Assignee: Venki Korukanti Revert updates made to StoragePlugin interface in DRILL-4198. Instead add the new methods to AbstractStoragePlugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3624) Enhance Web UI to be able to select schema ("use")
[ https://issues.apache.org/jira/browse/DRILL-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3624. Resolution: Fixed Fix Version/s: (was: Future) 1.5.0 > Enhance Web UI to be able to select schema ("use") > -- > > Key: DRILL-3624 > URL: https://issues.apache.org/jira/browse/DRILL-3624 > Project: Apache Drill > Issue Type: Wish > Components: Client - HTTP >Affects Versions: 1.1.0 >Reporter: Uwe Geercken >Priority: Minor > Fix For: 1.5.0 > > > It would be advantageous to be able to select a schema ("use") in the Web UI, > so that the information does not always have to be specified in each query. > This could be realized, e.g., through a drop-down where the user selects the > schema from the list of available schemas. The UI should store this > information until a different schema is selected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4169) Upgrade Hive Storage Plugin to work with latest stable Hive (v1.2.1)
[ https://issues.apache.org/jira/browse/DRILL-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4169. Resolution: Fixed > Upgrade Hive Storage Plugin to work with latest stable Hive (v1.2.1) > > > Key: DRILL-4169 > URL: https://issues.apache.org/jira/browse/DRILL-4169 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Affects Versions: 1.4.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.5.0 > > > There have been a few bug fixes in Hive SerDes since Hive 1.0.0. It's good to > update the Hive storage plugin to work with the latest stable Hive version > (1.2.1), so that HiveRecordReader can use the latest SerDes. > Compatibility when working with lower versions (v1.0.0 - currently supported > version) of Hive servers: There are no metastore API changes between Hive > 1.0.0 and Hive 1.2.1 that affect how Drill's Hive storage plugin > interacts with the Hive metastore. Tested to make sure it works fine. So users > can use Drill to query Hive 1.0.0 (currently supported) and Hive 1.2.1 (new > addition in this JIRA). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4198) Enhance StoragePlugin interface to expose logical space rules for planning purpose
[ https://issues.apache.org/jira/browse/DRILL-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4198. Resolution: Fixed > Enhance StoragePlugin interface to expose logical space rules for planning > purpose > -- > > Key: DRILL-4198 > URL: https://issues.apache.org/jira/browse/DRILL-4198 > Project: Apache Drill > Issue Type: Improvement >Reporter: Venki Korukanti >Assignee: Venki Korukanti > > Currently StoragePlugins can only expose rules that are executed in physical > space. Add an interface method to StoragePlugin to expose logical space rules > to planner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4194) Improve the performance of metadata fetch operation in HiveScan
[ https://issues.apache.org/jira/browse/DRILL-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4194. Resolution: Fixed > Improve the performance of metadata fetch operation in HiveScan > --- > > Key: DRILL-4194 > URL: https://issues.apache.org/jira/browse/DRILL-4194 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.4.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.5.0 > > > Currently HiveScan fetches the InputSplits for all partitions when {{HiveScan}} > is created. This causes long delays when the table contains a large number of > partitions. If we end up pruning a majority of partitions, this delay is > unnecessary. > We need this InputSplits info from the beginning of planning because > * it is used in calculating the cost of the {{HiveScan}}. Currently, when > calculating the cost, we first look at the rowCount (from the Hive MetaStore); if > it is available we use it in the cost calculation. Otherwise we estimate the > rowCount from InputSplits. > * We also need the InputSplits for determining whether {{HiveScan}} is a > singleton or distributed for adding appropriate traits in {{ScanPrule}} > The fix is to delay the loading of the InputSplits until we need them. There are two > cases where we need it. If we end up fetching the InputSplits, store them > until the query completes. > * If the stats are not available, then we need InputSplits > * If the partition is not pruned we need it for parallelization purposes. > Regarding getting the parallelization info in {{ScanPrule}}: Had a discussion > with [~amansinha100]. All we need at this point is whether the data is > distributed or singleton. Added a method {{isSingleton()}} to > GroupScan. Returning {{false}} seems to work fine for HiveScan, but I am not > sure of the implications here. 
We also have {{ExcessiveExchangeIdentifier}} > which removes unnecessary exchanges by looking at the parallelization info. I > think it is ok to return the parallelization info here as the pruning must > have already completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4198) Enhance StoragePlugin interface to expose logical space rules for planning purpose
Venki Korukanti created DRILL-4198: -- Summary: Enhance StoragePlugin interface to expose logical space rules for planning purpose Key: DRILL-4198 URL: https://issues.apache.org/jira/browse/DRILL-4198 Project: Apache Drill Issue Type: Improvement Reporter: Venki Korukanti Assignee: Venki Korukanti Currently StoragePlugins can only expose rules that are executed in physical space. Add an interface method to StoragePlugin to expose logical space rules to planner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4194) Improve the performance of metadata fetch operation in HiveScan
Venki Korukanti created DRILL-4194: -- Summary: Improve the performance of metadata fetch operation in HiveScan Key: DRILL-4194 URL: https://issues.apache.org/jira/browse/DRILL-4194 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 1.4.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.5.0 Currently HiveScan fetches the InputSplits for all partitions when {{HiveScan}} is created. This causes long delays when the table contains a large number of partitions. If we end up pruning a majority of partitions, this delay is unnecessary. We need this InputSplits info from the beginning of planning because * it is used in calculating the cost of the {{HiveScan}}. Currently, when calculating the cost, we first look at the rowCount (from the Hive MetaStore); if it is available we use it in the cost calculation. Otherwise we estimate the rowCount from InputSplits. * We also need the InputSplits for determining whether {{HiveScan}} is a singleton or distributed for adding appropriate traits in {{ScanPrule}} The fix is to delay the loading of the InputSplits until we need them. There are two cases where we need it. If we end up fetching the InputSplits, store them until the query completes. * If the stats are not available, then we need InputSplits * If the partition is not pruned we need it for parallelization purposes. Regarding getting the parallelization info in {{ScanPrule}}: Had a discussion with [~amansinha100]. All we need at this point is whether the data is distributed or singleton. Added a method {{isSingleton()}} to GroupScan. Returning {{false}} seems to work fine for HiveScan, but I am not sure of the implications here. We also have {{ExcessiveExchangeIdentifier}} which removes unnecessary exchanges by looking at the parallelization info. I think it is ok to return the parallelization info here as the pruning must have already completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
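The deferred-fetch-and-cache part of the fix described above is essentially memoization: run the expensive InputSplit fetch at most once, on first use, and keep the result for the life of the query. A generic sketch, not HiveScan's actual code (`memoize` is a hypothetical helper):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch: wrap an expensive loader so it runs lazily and at most once.
public class LazySplits {
    public static <T> Supplier<T> memoize(Supplier<T> loader) {
        return new Supplier<T>() {
            private T value;
            private boolean loaded;
            @Override public synchronized T get() {
                if (!loaded) { value = loader.get(); loaded = true; }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        Supplier<List<String>> splits = memoize(() -> {
            fetches.incrementAndGet(); // stands in for the metastore round-trips
            return List.of("split-0", "split-1");
        });
        // Planning may consult the splits several times; the fetch happens once.
        splits.get();
        splits.get();
        System.out.println(fetches.get()); // 1
    }
}
```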
[jira] [Resolved] (DRILL-4165) IllegalStateException in MergeJoin for a query against TPC-DS data
[ https://issues.apache.org/jira/browse/DRILL-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4165. Resolution: Fixed Fix Version/s: 1.4.0 > IllegalStateException in MergeJoin for a query against TPC-DS data > -- > > Key: DRILL-4165 > URL: https://issues.apache.org/jira/browse/DRILL-4165 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.4.0 >Reporter: Aman Sinha >Assignee: amit hadke > Fix For: 1.4.0 > > > I am seeing the following on the 1.4.0 branch. > {noformat} > 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false; > .. > 0: jdbc:drill:zk=local> select count(*) from dfs.`tpcds/store_sales` ss1, > dfs.`tpcds/store_sales` ss2 where ss1.ss_customer_sk = ss2.ss_customer_sk and > ss1.ss_store_sk = 1 and ss2.ss_store_sk = 2; > Error: SYSTEM ERROR: IllegalStateException: Incoming batch [#55, > MergeJoinBatch] has size 1984616, which is beyond the limit of 65536 > Fragment 0:0 > [Error Id: 18bf00fe-52d7-4d84-97ec-b04a035afb4e on 192.168.1.103:31010] > (java.lang.IllegalStateException) Incoming batch [#55, MergeJoinBatch] has > size 1984616, which is beyond the limit of 65536 > > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():305 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4159) TestCsvHeader sometimes fails due to ordering issue
[ https://issues.apache.org/jira/browse/DRILL-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4159. Resolution: Fixed Fix Version/s: 1.4.0 > TestCsvHeader sometimes fails due to ordering issue > --- > > Key: DRILL-4159 > URL: https://issues.apache.org/jira/browse/DRILL-4159 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 1.4.0 > > > This test should be rewritten to use the query test framework, rather than > doing a string comparison of the entire result set. And it should be > specified as unordered, so that results aren't affected by the random order > in which files are read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4047) Select with options
[ https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4047. Resolution: Fixed Fix Version/s: 1.4.0 > Select with options > --- > > Key: DRILL-4047 > URL: https://issues.apache.org/jira/browse/DRILL-4047 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Relational Operators >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Fix For: 1.4.0 > > > Add a mechanism to pass parameters down to the StoragePlugin when writing a > Select statement. > Some discussion here: > http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E > http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4053) Reduce metadata cache file size
[ https://issues.apache.org/jira/browse/DRILL-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4053. Resolution: Fixed > Reduce metadata cache file size > --- > > Key: DRILL-4053 > URL: https://issues.apache.org/jira/browse/DRILL-4053 > Project: Apache Drill > Issue Type: Improvement > Components: Metadata >Affects Versions: 1.3.0 >Reporter: Parth Chandra >Assignee: Parth Chandra > Fix For: 1.4.0 > > > The parquet metadata cache file has a fair amount of redundant metadata that > causes the size of the cache file to bloat. Two things that we can reduce are: > 1) Schema is repeated for every row group. We can keep a merged schema > (similar to what was discussed for the insert-into functionality) 2) The max and > min values in the stats are used for partition pruning when the values are the > same. We can keep only the maxValue, and only when it is the same as > the minValue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
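The second reduction in DRILL-4053 can be sketched as a tiny serialization rule. This is an illustration of the idea only; the field name `val` and the `compact` method are hypothetical, not the cache file's actual schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: since min/max stats only matter for pruning when they are equal,
// store a single value in that case and drop the pair otherwise.
public class ColumnStatsCompaction {
    /** Returns a compact stats map: {"val": v} when min == max, empty otherwise. */
    public static Map<String, Object> compact(Object min, Object max) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (min != null && min.equals(max)) {
            out.put("val", max); // a single value is enough for pruning
        }
        return out; // unequal min/max carry no pruning value here, so omit both
    }

    public static void main(String[] args) {
        System.out.println(compact(7, 7)); // {val=7}
        System.out.println(compact(1, 9)); // {}
    }
}
```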
[jira] [Resolved] (DRILL-4108) Query on csv file w/ header fails with an exception when non-existing column is requested
[ https://issues.apache.org/jira/browse/DRILL-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4108. Resolution: Fixed Assignee: Abhijit Pol > Query on csv file w/ header fails with an exception when non-existing column > is requested > - > > Key: DRILL-4108 > URL: https://issues.apache.org/jira/browse/DRILL-4108 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.3.0 >Reporter: Abhi Pol >Assignee: Abhijit Pol > Fix For: 1.4.0 > > > A Drill query on a csv file with a header, requesting column(s) that do not exist > in the header, fails with an exception. > *Current behavior:* once extractHeader is enabled, query columns must be > columns from the header > *Expected behavior:* non-existing columns should appear with 'null' values, > matching default Drill behavior > {noformat} > 0: jdbc:drill:zk=local> select Category from dfs.`/tmp/cars.csvh` limit 10; > java.lang.ArrayIndexOutOfBoundsException: -1 > at > org.apache.drill.exec.store.easy.text.compliant.FieldVarCharOutput.<init>(FieldVarCharOutput.java:104) > at > org.apache.drill.exec.store.easy.text.compliant.CompliantTextRecordReader.setup(CompliantTextRecordReader.java:118) > at > org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:108) > at > org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:198) > at > org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:35) > at > org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:151) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174) > at > 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:105) > at > org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230) > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: -1 > Fragment 0:0 > [Error Id: f272960e-fa2f-408e-918c-722190398cd3 on blackhole:31010] > (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
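The expected behavior in DRILL-4108 (a requested column missing from the CSV header yields nulls rather than an `ArrayIndexOutOfBoundsException: -1`) can be modeled with the projection step alone. This sketch is not the `CompliantTextRecordReader` implementation; the `project` method is a hypothetical stand-in for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: project one data row onto the requested columns, null-filling misses.
public class CsvProjection {
    public static List<String> project(List<String> header, List<String> requested, List<String> row) {
        List<String> out = new ArrayList<>();
        for (String col : requested) {
            int idx = header.indexOf(col);
            // indexOf returns -1 for an unknown column; emit null instead of indexing with it.
            out.add(idx >= 0 ? row.get(idx) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("make", "model");
        List<String> row = Arrays.asList("Tesla", "S");
        System.out.println(project(header, Arrays.asList("make", "Category"), row)); // [Tesla, null]
    }
}
```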
[jira] [Resolved] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4081. Resolution: Fixed Fix Version/s: 1.4.0 > Handle schema changes in ExternalSort > - > > Key: DRILL-4081 > URL: https://issues.apache.org/jira/browse/DRILL-4081 > Project: Apache Drill > Issue Type: Improvement >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 1.4.0 > > > This improvement will make use of the Union vector to handle schema changes. > When a new schema appears, the schema will be "merged" with the previous > schema. The result will be a new schema that uses Union type to store the > columns where there is a type conflict. All of the batches (including the > batches that have already arrived) will be coerced into this new schema. > A new comparison function will be included to handle the comparison of Union > type. Comparison of union type will work as follows: > 1. All numeric types can be mutually compared, and will be compared using > Drill implicit cast rules. > 2. All other types will not be compared against other types, but only among > values of the same type. > 3. There will be an overall precedence of types with regard to ordering. > This precedence is not yet defined, but will be as part of the work on this > issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
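The three comparison rules above can be sketched as a single comparator. Note the issue explicitly leaves the type precedence undefined, so the "numbers before strings" ranking below is an assumption for illustration, and the class is hypothetical, not Drill's generated comparison code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of the union-type comparison rules: numerics compare mutually by
// value, other types compare only within themselves, cross-type order comes
// from an (assumed) overall precedence.
public class UnionComparator implements Comparator<Object> {
    private static int rank(Object v) {
        return (v instanceof Number) ? 0 : 1; // assumed precedence: numeric < varchar
    }

    @Override public int compare(Object a, Object b) {
        int byRank = Integer.compare(rank(a), rank(b));
        if (byRank != 0) return byRank;                 // rule 3: cross-type order by precedence
        if (a instanceof Number) {                      // rule 1: numerics compare mutually
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        return a.toString().compareTo(b.toString());    // rule 2: same-type comparison only
    }

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>(Arrays.<Object>asList("b", 2L, "a", 1.5));
        values.sort(new UnionComparator());
        System.out.println(values); // [1.5, 2, a, b]
    }
}
```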
[jira] [Resolved] (DRILL-4094) Respect -DskipTests=true for JDBC plugin tests
[ https://issues.apache.org/jira/browse/DRILL-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4094. Resolution: Fixed Fix Version/s: 1.4.0 > Respect -DskipTests=true for JDBC plugin tests > -- > > Key: DRILL-4094 > URL: https://issues.apache.org/jira/browse/DRILL-4094 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Reporter: Andrew >Assignee: Andrew >Priority: Trivial > Fix For: 1.4.0 > > > The maven config for the JDBC storage plugin does not respect the -DskipTests > option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs
[ https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4124. Resolution: Fixed Fix Version/s: 1.4.0 > Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise > in logs > --- > > Key: DRILL-4124 > URL: https://issues.apache.org/jira/browse/DRILL-4124 > Project: Apache Drill > Issue Type: Improvement >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3938) Hive: Failure reading from a partition when a new column is added to the table after the partition creation
[ https://issues.apache.org/jira/browse/DRILL-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3938. Resolution: Fixed > Hive: Failure reading from a partition when a new column is added to the > table after the partition creation > --- > > Key: DRILL-3938 > URL: https://issues.apache.org/jira/browse/DRILL-3938 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 0.4.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.4.0 > > > Repro: > From Hive: > {code} > CREATE TABLE kv(key INT, value STRING); > LOAD DATA LOCAL INPATH > '/Users/hadoop/apache-repos/hive-install/apache-hive-1.0.0-bin/examples/files/kv1.txt' > INTO TABLE kv; > CREATE TABLE kv_p(key INT, value STRING, part1 STRING); > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.exec.max.dynamic.partitions=1; > set hive.exec.max.dynamic.partitions.pernode=1; > INSERT INTO TABLE kv_p PARTITION (part1) SELECT key, value, value as s FROM > kv; > ALTER TABLE kv_p ADD COLUMNS (newcol STRING); > {code} > From Drill: > {code} > USE hive; > DESCRIBE kv_p; > SELECT newcol FROM kv_p; > throws column 'newcol' not found error in HiveRecordReader while selecting > only the projected columns. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3893) Issue with Drill after Hive Alters the Table
[ https://issues.apache.org/jira/browse/DRILL-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3893. Resolution: Fixed Assignee: Venki Korukanti Fix Version/s: 1.4.0 > Issue with Drill after Hive Alters the Table > - > > Key: DRILL-3893 > URL: https://issues.apache.org/jira/browse/DRILL-3893 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Hive, Storage - Hive >Affects Versions: 1.0.0, 1.1.0 > Environment: DEV >Reporter: arnab chatterjee >Assignee: Venki Korukanti > Fix For: 1.4.0 > > > I reproduced this again on another partitioned table with existing data. > Providing some more details. I have enabled the verbose mode for errors. > Drill is unable to fetch the new column name that was introduced. This most > likely means that it is still picking up stale Hive metadata. > if (!tableColumns.contains(columnName)) { > if (partitionNames.contains(columnName)) { > selectedPartitionNames.add(columnName); > } else { > throw new ExecutionSetupException(String.format("Column %s does > not exist", columnName)); > } > } > select testdata from testtable; > Error: SYSTEM ERROR: ExecutionSetupException: Column testdata does not exist > Fragment 0:0 > [Error Id: be5cccba-97f6-4cc4-94e8-c11a4c53c8f4 on x.x.com:] > (org.apache.drill.common.exceptions.ExecutionSetupException) Failure while > initializing HiveRecordReader: Column testdata does not exist > org.apache.drill.exec.store.hive.HiveRecordReader.init():241 > org.apache.drill.exec.store.hive.HiveRecordReader.():138 > org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58 > org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34 > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():150 > org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 > org.apache.drill.exec.physical.impl.ImplCreator.getExec():81 > 
org.apache.drill.exec.work.fragment.FragmentExecutor.run():235 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 > Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) > Column testdata does not exist > org.apache.drill.exec.store.hive.HiveRecordReader.init():206 > org.apache.drill.exec.store.hive.HiveRecordReader.():138 > org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58 > org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34 > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():150 > org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 > org.apache.drill.exec.physical.impl.ImplCreator.getExec():81 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():235 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) > # > Please note that this is a partitioned table with existing data. > Does Drill Cache the Meta somewhere and hence it’s not getting reflected > immediately ? > DRILL CLI > > select x from xx; > Error: SYSTEM ERROR: ExecutionSetupException: Column x does not exist > Fragment 0:0 > [Error Id: 62086e22-1341-459e-87ce-430a24cc5119 on x.x.com:999] > (state=,code=0) > HIVE CLI > hive> describe formatted x; > OK > # col_name data_type comment > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3739) NPE on select from Hive for HBase table
[ https://issues.apache.org/jira/browse/DRILL-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3739. Resolution: Fixed Assignee: Venki Korukanti Fix Version/s: 1.4.0 > NPE on select from Hive for HBase table > --- > > Key: DRILL-3739 > URL: https://issues.apache.org/jira/browse/DRILL-3739 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: ckran >Assignee: Venki Korukanti >Priority: Critical > Fix For: 1.4.0 > > > For a table in HBase or MapR-DB with metadata created in Hive so that it can > be accessed through beeline or Hue, queries from Drill fail with > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > NullPointerException [Error Id: 1cfd2a36-bc73-4a36-83ee-ac317b8e6cdb] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4111) turn tests off in travis as they don't work there
[ https://issues.apache.org/jira/browse/DRILL-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-4111. Resolution: Fixed Fix Version/s: 1.4.0 > turn tests off in travis as they don't work there > - > > Key: DRILL-4111 > URL: https://issues.apache.org/jira/browse/DRILL-4111 > Project: Apache Drill > Issue Type: Task >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Fix For: 1.4.0 > > > Since the travis build always fails, we should just turn it off for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3980) Build failure in -Pmapr profile (due to DRILL-3749)
Venki Korukanti created DRILL-3980: -- Summary: Build failure in -Pmapr profile (due to DRILL-3749) Key: DRILL-3980 URL: https://issues.apache.org/jira/browse/DRILL-3980 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Currently the mapr profile relies on an older version (< 2.7.1) of Hadoop, causing compile failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3938) Hive: Failure reading from a partition when a new column is added to the table after the partition creation
Venki Korukanti created DRILL-3938: -- Summary: Hive: Failure reading from a partition when a new column is added to the table after the partition creation Key: DRILL-3938 URL: https://issues.apache.org/jira/browse/DRILL-3938 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 0.4.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.3.0 Repro: From Hive: {code} CREATE TABLE kv(key INT, value STRING); LOAD DATA LOCAL INPATH '/Users/hadoop/apache-repos/hive-install/apache-hive-1.0.0-bin/examples/files/kv1.txt' INTO TABLE kv; CREATE TABLE kv_p(key INT, value STRING, part1 STRING); set hive.exec.dynamic.partition.mode=nonstrict; set hive.exec.max.dynamic.partitions=1; set hive.exec.max.dynamic.partitions.pernode=1; INSERT INTO TABLE kv_p PARTITION (part1) SELECT key, value, value as s FROM kv; ALTER TABLE kv_p ADD COLUMNS (newcol STRING); {code} From Drill: {code} USE hive; DESCRIBE kv_p; SELECT newcol FROM kv_p; throws column 'newcol' not found error in HiveRecordReader while selecting only the projected columns. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3931) Upgrade fileclient dependency in mapr profile
Venki Korukanti created DRILL-3931: -- Summary: Upgrade fileclient dependency in mapr profile Key: DRILL-3931 URL: https://issues.apache.org/jira/browse/DRILL-3931 Project: Apache Drill Issue Type: Improvement Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.3.0 Current dependency version is 4.1.0-mapr. There is a critical fix that went into 4.1.0.34989-mapr. Upgrade the dependency version to 4.1.0.34989-mapr. Only pom file changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3911) Upgrade Hadoop from 2.4.1 to latest stable
[ https://issues.apache.org/jira/browse/DRILL-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3911. Resolution: Duplicate > Upgrade Hadoop from 2.4.1 to latest stable > -- > > Key: DRILL-3911 > URL: https://issues.apache.org/jira/browse/DRILL-3911 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Other >Reporter: Andrew >Assignee: Andrew > Fix For: 1.3.0 > > > Later versions of Hadoop have improved S3 compatibility > (https://issues.apache.org/jira/browse/HADOOP-10400). > Since users are increasingly using Drill with S3, we should upgrade our > Hadoop dependency so we can get the best integration possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3884) Hive native scan has lower parallelization leading to performance degradation
[ https://issues.apache.org/jira/browse/DRILL-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3884. Resolution: Fixed > Hive native scan has lower parallelization leading to performance degradation > - > > Key: DRILL-3884 > URL: https://issues.apache.org/jira/browse/DRILL-3884 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Hive >Affects Versions: 1.2.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti >Priority: Critical > Fix For: 1.2.0 > > > Currently {{HiveDrillNativeParquetScan.getScanStats()}} divides the rowCount > obtained from {{HiveScan}} by a factor and returns that as cost. The problem is that all > cost calculations and parallelization depend on the rowCount. Value > {{cpuCost}} is not taken into consideration in current cost calculations in > {{ScanPrel}}. In order for the planner to choose > {{HiveDrillNativeParquetScan}} over {{HiveScan}}, rowCount has to be lowered > for the former, but this leads to lower parallelization and performance > degradation. > Temporary fix for Drill 1.2 before DRILL-3856 fully resolves considering the CPU > cost in the cost model: > 1. Change ScanPrel to consider the CPU cost in given Stats from GroupScan > 2. Have higher CPU cost for {{HiveScan}} (SerDe route) > 3. Lower CPU cost for {{HiveDrillNativeParquetScan}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3884) Hive native scan has lower parallelization leading to performance degradation
Venki Korukanti created DRILL-3884: -- Summary: Hive native scan has lower parallelization leading to performance degradation Key: DRILL-3884 URL: https://issues.apache.org/jira/browse/DRILL-3884 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization, Storage - Hive Affects Versions: 1.2.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Critical Fix For: 1.2.0 Currently {{HiveDrillNativeParquetScan.getScanStats()}} divides the rowCount obtained from {{HiveScan}} by a factor and returns that as cost. The problem is that all cost calculations and parallelization depend on the rowCount. Value {{cpuCost}} is not taken into consideration in current cost calculations in {{ScanPrel}}. In order for the planner to choose {{HiveDrillNativeParquetScan}} over {{HiveScan}}, rowCount has to be lowered for the former, but this leads to lower parallelization and performance degradation. Temporary fix for Drill 1.2 before DRILL-3856 fully resolves considering the CPU cost in the cost model: 1. Change ScanPrel to consider the CPU cost in given Stats from GroupScan 2. Have higher CPU cost for {{HiveScan}} (SerDe route) 3. Lower CPU cost for {{HiveDrillNativeParquetScan}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3852) HiveScan is not expanding '*'
Venki Korukanti created DRILL-3852: -- Summary: HiveScan is not expanding '*' Key: DRILL-3852 URL: https://issues.apache.org/jira/browse/DRILL-3852 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization, Storage - Hive Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.3.0 Any {{SELECT \*}} query on a Hive table does not expand the '\*' into columns in HiveScan. This forces the execution code to handle the '\*' separately. Since the schema is known for Hive tables, we should expand the '\*' during planning to avoid that complexity in the execution code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3857) Enhance HiveDrillNativeParquetScan and related classes to support multiple formats.
Venki Korukanti created DRILL-3857: -- Summary: Enhance HiveDrillNativeParquetScan and related classes to support multiple formats. Key: DRILL-3857 URL: https://issues.apache.org/jira/browse/DRILL-3857 Project: Apache Drill Issue Type: Sub-task Components: Query Planning & Optimization, Storage - Hive Affects Versions: 1.2.0 Reporter: Venki Korukanti Currently DRILL-3209 only adds support for reading Hive parquet tables using Drill's native parquet reader. It would be better to abstract out or define clear interfaces so that other formats such as text or Avro can be extended to use Drill's native readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3746) Hive query fails if the table contains external partitions
[ https://issues.apache.org/jira/browse/DRILL-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3746. Resolution: Fixed > Hive query fails if the table contains external partitions > -- > > Key: DRILL-3746 > URL: https://issues.apache.org/jira/browse/DRILL-3746 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.2.0 > > > If Hive contains a table which has external partitions, Drill fails in > partition pruning code, which causes the query to fail. > {code} > CREATE TABLE external_partition_test (boolean_field BOOLEAN) PARTITIONED BY > (boolean_part BOOLEAN); > ALTER TABLE external_partition_test ADD PARTITION (boolean_part='true') > LOCATION '/some/path'; > ALTER TABLE external_partition_test ADD PARTITION (boolean_part='false') > LOCATION '/some/path'; > {code} > Query: > {code} > SELECT * FROM hive.`default`.external_partition_test where boolean_part = > false > {code} > Exception: > {code} > java.lang.StringIndexOutOfBoundsException > String index out of range: -14 > at java.lang.String.substring(String.java:1875) ~[na:1.7.0_45] > at > org.apache.drill.exec.planner.sql.HivePartitionLocation.(HivePartitionLocation.java:31) > ~[classes/:na] > at > org.apache.drill.exec.planner.sql.HivePartitionDescriptor.getPartitions(HivePartitionDescriptor.java:117) > ~[classes/:na] > at > org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:185) > ~[classes/:na] > at > org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2.onMatch(HivePushPartitionFilterIntoScan.java:92) > ~[classes/:na] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > ~[calcite-core-1.4.0-drill-r0.jar:1.4.0-drill-r0] > {code} > Looking at {{HivePartitionLocation}}, it looks like we are depending on the > organization of files on FileSystem to get the 
partition values. We should > get the partition values from MetaStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3746) Hive partition pruning fails if the table contains external partitions
Venki Korukanti created DRILL-3746: -- Summary: Hive partition pruning fails if the table contains external partitions Key: DRILL-3746 URL: https://issues.apache.org/jira/browse/DRILL-3746 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti If Hive contains a table which has external partitions, Drill fails in partition pruning code, which causes the query to fail. {code} CREATE TABLE external_partition_test (boolean_field BOOLEAN) PARTITIONED BY (boolean_part BOOLEAN); ALTER TABLE external_partition_test ADD PARTITION (boolean_part='true') LOCATION '/some/path'; ALTER TABLE external_partition_test ADD PARTITION (boolean_part='false') LOCATION '/some/path'; {code} Exception: {code} java.lang.StringIndexOutOfBoundsException String index out of range: -14 at java.lang.String.substring(String.java:1875) ~[na:1.7.0_45] at org.apache.drill.exec.planner.sql.HivePartitionLocation.(HivePartitionLocation.java:31) ~[classes/:na] at org.apache.drill.exec.planner.sql.HivePartitionDescriptor.getPartitions(HivePartitionDescriptor.java:117) ~[classes/:na] at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:185) ~[classes/:na] at org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2.onMatch(HivePushPartitionFilterIntoScan.java:92) ~[classes/:na] at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) ~[calcite-core-1.4.0-drill-r0.jar:1.4.0-drill-r0] {code} Looking at {{HivePartitionLocation}}, it looks like we are depending on the structure of the files on FileSystem to get the partition values. We should get these partition values from the MetaStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3749) Upgrade Hadoop dependency to latest version (2.7.1)
Venki Korukanti created DRILL-3749: -- Summary: Upgrade Hadoop dependency to latest version (2.7.1) Key: DRILL-3749 URL: https://issues.apache.org/jira/browse/DRILL-3749 Project: Apache Drill Issue Type: New Feature Components: Tools, Build & Test Affects Versions: 1.1.0 Reporter: Venki Korukanti Assignee: Steven Phillips Fix For: 1.3.0 Logging a JIRA to track and discuss upgrading Drill's Hadoop dependency version. Currently Drill depends on Hadoop 2.5.0. The newer Hadoop version (2.7.1) has the following features: 1) Better S3 support 2) Ability to check whether a user has certain permissions on a file/directory without performing operations on the file/dir. Useful for cases like DRILL-3467. As Drill is going to use a higher version of the Hadoop fileclient, there could be potential issues when interacting with Hadoop services (such as HDFS) of a lower version than the fileclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3725) Add HTTPS support for Drill web interface
Venki Korukanti created DRILL-3725: -- Summary: Add HTTPS support for Drill web interface Key: DRILL-3725 URL: https://issues.apache.org/jira/browse/DRILL-3725 Project: Apache Drill Issue Type: New Feature Components: Client - HTTP Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.2.0 Currently the web UI and REST API calls don't support transport layer security (TLS). This JIRA is to add support for TLS. We need this feature before adding user authentication to Drill's web interface. The proposal is: * Always default to HTTPS * The cluster admin can set the following SSL configuration to specify their own keystore and/or truststore: ** javax.net.ssl.keyStore ** javax.net.ssl.keyStorePassword ** javax.net.ssl.trustStore ** javax.net.ssl.trustStorePassword * If the cluster admin didn't specify the above SSL config, generate a self-signed certificate programmatically using libraries such as [Bouncy Castle|http://www.bouncycastle.org/]. * Make use of the Jetty APIs to add an HTTPS connector. An example is [here|http://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/tree/examples/embedded/src/main/java/org/eclipse/jetty/embedded/LikeJettyXml.java]. Let me know if you have any comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
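A minimal sketch of the proposed fallback logic, assuming the standard JSSE javax.net.ssl.* system-property names; the Jetty connector and Bouncy Castle certificate-generation wiring are omitted:

```java
/**
 * Decide between an admin-provided keystore and a generated self-signed
 * certificate, as proposed in DRILL-3725. Illustrative sketch only.
 */
public class SslConfigCheck {

    /** True when the admin supplied an explicit keystore via system properties. */
    public static boolean hasAdminKeystore() {
        String keyStore = System.getProperty("javax.net.ssl.keyStore");
        return keyStore != null && !keyStore.isEmpty();
    }

    public static void main(String[] args) {
        if (hasAdminKeystore()) {
            System.out.println("using admin-provided keystore");
        } else {
            System.out.println("no keystore configured: would generate a self-signed certificate");
        }
    }
}
```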
[jira] [Resolved] (DRILL-3193) TestDrillbitResilience#interruptingWhileFragmentIsBlockedInAcquiringSendingTicket hangs and fails
[ https://issues.apache.org/jira/browse/DRILL-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3193. Resolution: Cannot Reproduce Tried running the test repeatedly 1000 times. No hangs reproduced. TestDrillbitResilience#interruptingWhileFragmentIsBlockedInAcquiringSendingTicket hangs and fails - Key: DRILL-3193 URL: https://issues.apache.org/jira/browse/DRILL-3193 Project: Apache Drill Issue Type: Bug Reporter: Sudheesh Katkam Assignee: Venki Korukanti Fix For: 1.2.0 TestDrillbitResilience#interruptingWhileFragmentIsBlockedInAcquiringSendingTicket hangs when it is run multiple times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3271) Hive : Tpch 01.q fails with a verification issue for SF100 dataset
[ https://issues.apache.org/jira/browse/DRILL-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3271. Resolution: Invalid Just had a discussion with [~adeneche]. Floating point differences between runs are due to truncation in arithmetic operations and the order of data received at the aggregator. The differences here still seem to be in an acceptable range. We need to update the margin-of-error constant in the test framework. Hive : Tpch 01.q fails with a verification issue for SF100 dataset -- Key: DRILL-3271 URL: https://issues.apache.org/jira/browse/DRILL-3271 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Fix For: 1.2.0 Attachments: tpch100_hive.ddl git.commit.id.abbrev=5f26b8b Query : {code} select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '120' day (3) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus; {code} The 4th column appears to have some differences. 
Not sure if it is within the acceptable range. Expected : {code} A F 3.775127758E9 5.660776097194428E12 5.377736398183942E12 5.592847429515948E12 25.499370423275426 38236.11698430475 0.05000224353079674 148047881 N O 7.269911583E9 1.0901214476134316E13 1.0356163586785008E13 1.077041889123738E13 25.499873337396807 38236.997134222445 0.04999763132401859 285095988 R F 3.77572497E9 5.661603032745362E12 5.378513563915394E12 5.593662252666902E12 25.50006628406532 38236.69725845312 0.05000130433952159 148067261 N F 9.8553062E7 1.4777109838597995E11 1.403849659650348E11 1.459997930327757E11 25.501556956882876 38237.19938880449 0.04998528433803118 3864590 {code} Actual : {code} A F 3.775127758E9 5.660776097194352E12 5.37773639818398E12 5.592847429515874E12 25.499370423275426 38236.11698430423 0.0500022435305286 148047881 N O 7.269911583E9 1.0901214476134352E13 1.0356163586784926E13 1.0770418891237576E13 25.499873337396807 38236.99713422257 0.04999763132535226 285095988 R F 3.77572497E9 5.661603032745394E12 5.378513563915313E12 5.593662252666848E12 25.50006628406532 38236.69725845333 0.05000130433925318 148067261 N F 9.8553062E7 1.4777109838598022E11 1.4038496596503506E11 1.45999793032776E11 25.501556956882876 38237.19938880456 0.049985284338093884 3864590 {code} The data is 100 GB, so I couldn't attach it here. I attached the hive ddl. Let me know if you need anything else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
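The explanation given in the resolution (truncation in arithmetic plus the order in which rows reach the aggregator) comes down to floating-point addition not being associative, so the same SUM computed over differently ordered inputs can differ in the last digits. A minimal demonstration:

```java
/**
 * Double addition is not associative: grouping the same operands
 * differently yields different results, which is why aggregate values
 * can vary slightly between runs that deliver rows in a different order.
 */
public class FloatOrder {
    public static void main(String[] args) {
        double big = 1e16, small = 1.0;
        double leftToRight = (big + small) + small; // each small increment is rounded away
        double smallFirst  = big + (small + small); // the combined increment survives
        System.out.println(leftToRight == smallFirst); // prints false
    }
}
```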
[jira] [Resolved] (DRILL-3239) Join between empty hive tables throws an IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3239. Resolution: Cannot Reproduce Can't reproduce on latest master 48d8a59. Please reopen if it reproduces. Join between empty hive tables throws an IllegalStateException -- Key: DRILL-3239 URL: https://issues.apache.org/jira/browse/DRILL-3239 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Fix For: 1.2.0 Attachments: error.log git.commit.id.abbrev=6f54223 Created 2 Hive tables on top of tpch data in ORC format. The tables are empty. The below query returns 0 rows from Hive; however, it fails with an IllegalStateException from Drill. {code} select * from customer c, orders o where c.c_custkey = o.o_custkey; Error: SYSTEM ERROR: java.lang.IllegalStateException: You tried to do a batch data read operation when you were in a state of NONE. You can only do this type of operation when you are in a state of OK or OK_NEW_SCHEMA. Fragment 0:0 [Error Id: 8483cab2-d771-4337-ae65-1db41eb5720d on qa-node191.qa.lab:31010] (state=,code=0) {code} Below is the Hive DDL I used: {code} create table if not exists tpch01_orc.customer ( c_custkey int, c_name string, c_address string, c_nationkey int, c_phone string, c_acctbal double, c_mktsegment string, c_comment string ) STORED AS orc LOCATION '/drill/testdata/Tpch0.01/orc/customer'; create table if not exists tpch01_orc.orders ( o_orderkey int, o_custkey int, o_orderstatus string, o_totalprice double, o_orderdate date, o_orderpriority string, o_clerk string, o_shippriority int, o_comment string ) STORED AS orc LOCATION '/drill/testdata/Tpch0.01/orc/orders'; {code} I attached the log files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2643) HashAggBatch/HashAggTemplate call incoming.cleanup() twice resulting in warnings
[ https://issues.apache.org/jira/browse/DRILL-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2643. Resolution: Fixed This issue is resolved through DRILL-2826, which made a change to close/cleanup an operator exactly once. HashAggBatch/HashAggTemplate call incoming.cleanup() twice resulting in warnings Key: DRILL-2643 URL: https://issues.apache.org/jira/browse/DRILL-2643 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Venki Korukanti Fix For: 1.2.0 Attachments: DRILL-2643.patch, t1.parquet, t2.parquet In this case j1 and j2 are views created on top of parquet files; BOTH views have order by on multiple columns in different order with nulls first/last. Also, the table in view j1 consists of 99 parquet files. See the attached views.txt file for how to create the views (make sure to create the views in a different workspace; the views have the same names as the tables) {code} select DISTINCT COALESCE(j1.c_varchar || j2.c_varchar || 'EMPTY') as concatentated_string from j1 INNER JOIN j2 ON (j1.d18 = j2.d18) ; {code} The same can be reproduced with parquet files and subqueries (note that the parquet files are named the same as the views: j1, j2) {code} select DISTINCT COALESCE(sq1.c_varchar || sq2.c_varchar || 'EMPTY') as concatentated_string from (select c_varchar, c_integer from j1 order by j1.c_varchar desc nulls first ) as sq1(c_varchar, c_integer) INNER JOIN (select c_varchar, c_integer from j2 order by j2.c_varchar nulls last) as sq2(c_varchar, c_integer) ON (sq1.c_integer = sq2.c_integer) {code} You do need to have a sort in order to reproduce the problem. 
This query works: {code} select DISTINCT COALESCE(j1.c_varchar || j2.c_varchar || 'EMPTY') as concatentated_string from j1,j2 where j1.c_integer = j2.c_integer; {code} {code} 2015-04-01 00:43:42,455 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:foreman] INFO o.a.d.e.s.parquet.ParquetGroupScan - Load Parquet RowGroup block maps: Executed 99 out of 99 using 16 threads. Time: 20ms total, 2.877318ms avg, 3ms max. 2015-04-01 00:43:42,458 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:foreman] INFO o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host atsqa4-136.qa.lab. Skipping affinity to that host. 2015-04-01 00:43:42,458 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:foreman] INFO o.a.d.e.s.parquet.ParquetGroupScan - Load Parquet RowGroup block maps: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.562620ms avg, 1ms max. 2015-04-01 00:43:42,485 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:foreman] INFO o.a.drill.exec.work.foreman.Foreman - State change requested. PENDING -- RUNNING 2015-04-01 00:43:45,613 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] WARN o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 32 batch groups. Current allocated memory: 16642330 2015-04-01 00:43:45,676 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] INFO o.a.d.exec.vector.BaseValueVector - Realloc vector null. [16384] - [32768] 2015-04-01 00:43:45,676 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] INFO o.a.d.exec.vector.BaseValueVector - Realloc vector ``c_varchar`(VARCHAR:OPTIONAL)_bits`(UINT1:REQUIRED). [4096] - [8192] 2015-04-01 00:43:45,679 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] INFO o.a.d.exec.vector.BaseValueVector - Realloc vector null. [32768] - [65536] 2015-04-01 00:43:45,680 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] INFO o.a.d.exec.vector.BaseValueVector - Realloc vector ``c_varchar`(VARCHAR:OPTIONAL)_bits`(UINT1:REQUIRED). 
[8192] - [16384] 2015-04-01 00:43:45,709 [2ae4c0c0-c408-3e66-4fb3-e7bf80a42bad:frag:0:0] WARN o.a.d.exec.memory.AtomicRemainder - Tried to close remainder, but it has already been closed java.lang.Exception: null at org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:196) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.memory.Accountor.close(Accountor.java:386) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close(TopLevelAllocator.java:298) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.cleanup(ExternalSortBatch.java:162) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at
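The "tried to close remainder, but it has already been closed" warning above comes from cleanup() being invoked twice. The fix referenced in the resolution (close an operator exactly once) can be sketched with a simple idempotence guard; this is a hypothetical class, not Drill's actual code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Cleanup that is safe to call more than once: only the first call releases. */
public class OnceCloseable implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private int closeCount = 0;

    @Override
    public void close() {
        if (!closed.compareAndSet(false, true)) {
            return;              // already closed: the duplicate call is a no-op
        }
        closeCount++;            // real resource release would happen here
    }

    public int timesActuallyClosed() { return closeCount; }

    public static void main(String[] args) {
        OnceCloseable c = new OnceCloseable();
        c.close();
        c.close();               // duplicate cleanup, as HashAggBatch did
        System.out.println(c.timesActuallyClosed()); // prints 1
    }
}
```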
[jira] [Resolved] (DRILL-3424) Hive Views are not accessible through Drill Query
[ https://issues.apache.org/jira/browse/DRILL-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3424. Resolution: Duplicate Hive Views are not accessible through Drill Query - Key: DRILL-3424 URL: https://issues.apache.org/jira/browse/DRILL-3424 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 1.0.0 Environment: CentOS 6.5, MapR, Drill 1.0 Reporter: Soumendra Kumar Mishra Assignee: Venki Korukanti Hive Views are not accessible through Drill Query. Error Message: Hive Views are Not Supported in Current Version -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3413) Use DIGEST mechanism in creating Hive MetaStoreClient for proxy users when SASL authentication is enabled
Venki Korukanti created DRILL-3413: -- Summary: Use DIGEST mechanism in creating Hive MetaStoreClient for proxy users when SASL authentication is enabled Key: DRILL-3413 URL: https://issues.apache.org/jira/browse/DRILL-3413 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 1.1.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 Currently we fail to create a HiveMetaStoreClient for proxy users when SASL authentication is enabled between the HiveMetaStore server and clients. We fail to create the client because when SASL (kerberos or vendor-specific custom SASL implementations) is enabled, some vendor-specific versions of Hive only accept DIGEST as the authentication mechanism for proxy clients. To fix this issue: 1. The Drillbit needs to create a HiveMetaStoreClient with its own credentials (direct credentials, not proxied) 2. Whenever the Drillbit needs to create a HiveMetaStoreClient for a proxy user (the user being impersonated), get the delegation token for the proxy user from the MetaStore server using the Drillbit process user's HiveMetaStoreClient. Set this delegation token in a new HiveConf object and pass it to HiveMetaStoreClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3398) WebServer is leaking memory for queries submitted through REST API or WebUI
Venki Korukanti created DRILL-3398: -- Summary: WebServer is leaking memory for queries submitted through REST API or WebUI Key: DRILL-3398 URL: https://issues.apache.org/jira/browse/DRILL-3398 Project: Apache Drill Issue Type: Bug Affects Versions: 1.1.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 1. Start an embedded drillbit. 2. Submit queries through the WebUI or REST APIs. 3. Shut down the drillbit. At shutdown, TopLevelAllocator's close prints out the leaked pools. [~sudheeshkatkam] and I looked into the issue; it turns out we don't release the RecordBatchLoader in QueryWrapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3398) WebServer is leaking memory for queries submitted through REST API or WebUI
[ https://issues.apache.org/jira/browse/DRILL-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3398. Resolution: Fixed Fixed in [60bc945|https://github.com/apache/drill/commit/60bc9459bd8ef29e9d90ffe885771090ab658a40]. WebServer is leaking memory for queries submitted through REST API or WebUI --- Key: DRILL-3398 URL: https://issues.apache.org/jira/browse/DRILL-3398 Project: Apache Drill Issue Type: Bug Affects Versions: 1.1.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 Attachments: DRILL-3398-1.patch 1. Start an embedded drillbit. 2. Submit queries through the WebUI or REST APIs. 3. Shut down the drillbit. At shutdown, TopLevelAllocator's close prints out the leaked pools. [~sudheeshkatkam] and I looked into the issue; it turns out we don't release the RecordBatchLoader in QueryWrapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
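The fix boils down to releasing the RecordBatchLoader once the REST response has been built. A minimal sketch of the pattern — the QueryWrapper and RecordBatchLoader names come from the report, but the `Loader` class below is a simplified stand-in for illustration, not Drill's actual class:

```java
public class QueryWrapperSketch {
    // Simplified stand-in for org.apache.drill.exec.record.RecordBatchLoader,
    // which owns direct-memory buffers that must be returned to the allocator.
    static final class Loader implements AutoCloseable {
        static int liveBuffers = 0;
        Loader() { liveBuffers++; }                        // acquire a buffer
        String load() { return "rows"; }                   // pretend to decode a batch
        @Override public void close() { liveBuffers--; }   // release the buffer
    }

    // Before the fix: the loader was never released, leaking one buffer per batch.
    static String runQueryLeaky() {
        Loader loader = new Loader();
        return loader.load();              // leak: close() never called
    }

    // After the fix: release the loader even if decoding throws.
    static String runQueryFixed() {
        try (Loader loader = new Loader()) {
            return loader.load();
        }
    }
}
```

The try-with-resources form is what makes shutdown-time leak reports like the one above go quiet: every query releases its loader on both the success and error paths.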
[jira] [Resolved] (DRILL-669) Information Schema should be schema sensitive
[ https://issues.apache.org/jira/browse/DRILL-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-669. --- Resolution: Invalid Fix Version/s: (was: 1.1.0) Information Schema should be schema sensitive - Key: DRILL-669 URL: https://issues.apache.org/jira/browse/DRILL-669 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Priority: Minor If I am currently using the 'hive' schema/database, then information schema should only contain information relevant to 'hive' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2023) Hive function
[ https://issues.apache.org/jira/browse/DRILL-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2023. Resolution: Fixed {{getCumulativeCost}} is implemented as part of DRILL-2269. Hive function -- Key: DRILL-2023 URL: https://issues.apache.org/jira/browse/DRILL-2023 Project: Apache Drill Issue Type: Bug Components: Functions - Hive Reporter: Jacques Nadeau Assignee: Venki Korukanti Fix For: 1.1.0 If you try to do a query that uses regexp_extract with Drill expressions inside of it, Drill doesn't handle it correctly. The exception is: The type of org.apache.drill.exec.expr.HiveFuncHolderExpr doesn't currently support LogicalExpression.getCumulativeCost() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3260) Conflicting servlet-api jar causing web UI to be slow
[ https://issues.apache.org/jira/browse/DRILL-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3260. Resolution: Fixed Fix Version/s: 1.1.0 Fixed in [6796006|https://github.com/apache/drill/commit/6796006f2df5aa598f3715be9de2a724b5c338e3]. Conflicting servlet-api jar causing web UI to be slow - Key: DRILL-3260 URL: https://issues.apache.org/jira/browse/DRILL-3260 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 Attachments: DRILL-3260-1.patch This is the same issue we had some time back. Recent changes to the [pom|https://github.com/apache/drill/commit/1de6aed93efce8a524964371d96673b8ef192d89] files are pulling in a problematic jar, {{servlet-api-2.5.jar}}:
{code}
+- com.mapr.fs:mapr-hbase:jar:4.1.0-mapr:compile
.
[INFO] |  +- org.apache.hbase:hbase-server:jar:0.98.9-mapr-1503:compile
[INFO] |  |  +- (org.apache.hbase:hbase-common:jar:0.98.9-mapr-1503:compile - omitted for conflict with 0.98.9-mapr-1503-m7-4.1.0)
.
[INFO] |  |  +- org.mortbay.jetty:servlet-api-2.5:jar:6.1.14:compile
{code}
We already have a maven enforcer rule to detect servlet-api jars, but this jar has a slightly different artifact id, {{servlet-api-2.5}}, which the enforcer does not detect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2952) Hive 1.0 plugin for Drill
[ https://issues.apache.org/jira/browse/DRILL-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2952. Resolution: Fixed +1. Fixed in [9353383|https://github.com/apache/drill/commit/93533835bdcaff018a6b6ee6ea5999f3c5659d70]. Hive 1.0 plugin for Drill - Key: DRILL-2952 URL: https://issues.apache.org/jira/browse/DRILL-2952 Project: Apache Drill Issue Type: Task Components: Functions - Hive, Storage - Hive Affects Versions: 1.0.0 Reporter: Na Yang Assignee: Na Yang Fix For: 1.1.0 Attachments: DRILL-2952.2.patch, DRILL-2952.patch Currently Drill works with Hive 0.13 only. It needs a newer version of Hive plugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3260) Conflicting servlet-api jar causing web UI to be slow
Venki Korukanti created DRILL-3260: -- Summary: Conflicting servlet-api jar causing web UI to be slow Key: DRILL-3260 URL: https://issues.apache.org/jira/browse/DRILL-3260 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti This is the same issue we had some time back. Recent changes to the [pom|https://github.com/apache/drill/commit/1de6aed93efce8a524964371d96673b8ef192d89] files are pulling in a problematic jar, {{servlet-api-2.5.jar}}:
{code}
+- com.mapr.fs:mapr-hbase:jar:4.1.0-mapr:compile
.
[INFO] |  +- org.apache.hbase:hbase-server:jar:0.98.9-mapr-1503:compile
[INFO] |  |  +- (org.apache.hbase:hbase-common:jar:0.98.9-mapr-1503:compile - omitted for conflict with 0.98.9-mapr-1503-m7-4.1.0)
.
[INFO] |  |  +- org.mortbay.jetty:servlet-api-2.5:jar:6.1.14:compile
{code}
We already have a maven enforcer rule to detect servlet-api jars, but this jar has a slightly different artifact id, {{servlet-api-2.5}}, which the enforcer does not detect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3240) Fetch hadoop maven profile specific Hive version in Hive storage plugin
Venki Korukanti created DRILL-3240: -- Summary: Fetch hadoop maven profile specific Hive version in Hive storage plugin Key: DRILL-3240 URL: https://issues.apache.org/jira/browse/DRILL-3240 Project: Apache Drill Issue Type: Improvement Components: Storage - Hive, Tools, Build & Test Affects Versions: 0.4.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 1.1.0 Currently we always fetch the Apache Hive libs irrespective of the Hadoop vendor profile used in {{mvn clean install}}. This jira is to allow specifying a custom version of Hive in the hadoop vendor profile. Note: the Hive storage plugin assumes there are no major differences in Hive APIs between different vendor-specific custom Hive builds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3208) Hive : Tpch (SF 0.01) query 10 fails with a system error when the data is backed by hive tables
[ https://issues.apache.org/jira/browse/DRILL-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-3208. Resolution: Invalid Hive : Tpch (SF 0.01) query 10 fails with a system error when the data is backed by hive tables --- Key: DRILL-3208 URL: https://issues.apache.org/jira/browse/DRILL-3208 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Attachments: customer.parquet, error.log, lineitem_nodate.parquet, nation.parquet, orders_nodate.parquet, tpch.ddl git.commit.id.abbrev=6f54223 I created hive tables on top of tpch parquet data (the hive DDL script is attached). Since hive does not support date in the parquet serde, I regenerated the parquet files for orders and lineitem to use string for the date fields; the remaining files do not have a date column. When I executed query 10 of the tpch suite, it failed with a system error.
{code}
0: jdbc:drill:schema=dfs_eea> use hive.tpch01_parquet_nodate;
+-------+---------------------------------------------------------+
|  ok   |                         summary                         |
+-------+---------------------------------------------------------+
| true  | Default schema changed to [hive.tpch01_parquet_nodate]  |
+-------+---------------------------------------------------------+
1 row selected (0.091 seconds)
0: jdbc:drill:schema=dfs_eea> select c.c_custkey, c.c_name, sum(l.l_extendedprice * (1 - l.l_discount)) as revenue, c.c_acctbal, n.n_name, c.c_address, c.c_phone, c.c_comment from customer c, orders o, lineitem l, nation n where c.c_custkey = o.o_custkey and l.l_orderkey = o.o_orderkey and cast(o.o_orderdate as date) >= date '1994-03-01' and cast(o.o_orderdate as date) < date '1994-03-01' + interval '3' month and l.l_returnflag = 'R' and c.c_nationkey = n.n_nationkey group by c.c_custkey, c.c_name, c.c_acctbal, c.c_phone, n.n_name, c.c_address, c.c_comment order by revenue desc limit 20;
Error: SYSTEM ERROR: Fragment 0:0 [Error Id: 1d327ae0-1cf2-4776-acd3-8eef6cca4b6a on qa-node191.qa.lab:31010] (state=,code=0)
{code}
I tried running the above query using dfs instead of hive and it worked as expected.
I attached the newly generated parquet files and the hive ddl for creating the hive tables. Let me know if you need anything else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3203) Add support for impersonation in Hive storage plugin
Venki Korukanti created DRILL-3203: -- Summary: Add support for impersonation in Hive storage plugin Key: DRILL-3203 URL: https://issues.apache.org/jira/browse/DRILL-3203 Project: Apache Drill Issue Type: Sub-task Components: Storage - Hive Affects Versions: 0.9.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.1.0 Subtask under DRILL-2363 to add support for impersonation for Hive storage plugin. When impersonation is enabled, Drill currently impersonates as process user (user who started the Drillbits) when accessing table metadata/data in Hive. This task is to add support for impersonating the user who issued the query when accessing Hive metadata/data. Detailed design doc is coming soon... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3074) ReconnectingClient.waitAndRun can get stuck in an infinite loop if it fails to establish the connection
Venki Korukanti created DRILL-3074: -- Summary: ReconnectingClient.waitAndRun can get stuck in an infinite loop if it fails to establish the connection Key: DRILL-3074 URL: https://issues.apache.org/jira/browse/DRILL-3074 Project: Apache Drill Issue Type: Bug Affects Versions: 1.0.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Critical Fix For: 1.0.0 Currently we enter an infinite loop if a connection exception occurs or the connection attempt times out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3017) NPE when cleaning up some RecordReader implementations
Venki Korukanti created DRILL-3017: -- Summary: NPE when cleaning up some RecordReader implementations Key: DRILL-3017 URL: https://issues.apache.org/jira/browse/DRILL-3017 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.9.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Run the following unit test:
{code}
@Test
public void testParquetReaderCleanupNPE() throws Exception {
  test("SELECT * FROM cp.`parquet2/decimal28_38.parquet`");
}
{code}
Following is the output:
{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: Fragment 0:0
[Error Id: 49db6650-8f62-4c5c-b9dc-3f5d6a4413a0 on localhost:31010]. Returned in 407ms.
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: Fragment 0:0
{code}
Ideally in this case we should get the following query error:
{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: UNSUPPORTED_OPERATION ERROR: Decimal data type is disabled.
As of this release decimal data type is a beta level feature and should not be used in production
Use option 'planner.enable_decimal_data_type' to enable decimal data type
Fragment 0:0
[Error Id: d91a70ac-93c9-4be4-a542-4f3c7615b677 on localhost:31010]. Returned in 392ms.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
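A common cause of this class of failure is a cleanup path that dereferences fields which were never initialized because setup itself failed; the NPE then masks the real error (here, the decimal-disabled message). A minimal null-safe cleanup sketch — the `Resource` type below is a hypothetical stand-in for whatever a RecordReader holds, not Drill's actual interface:

```java
import java.util.List;

public class CleanupSketch {
    static int closed = 0;

    // Hypothetical stand-in for a resource a RecordReader holds (pages,
    // column readers, buffers); setup may fail before some are allocated.
    interface Resource { void close(); }

    // Null-safe cleanup: skip slots that were never allocated, and keep
    // closing the remaining resources even if one close() throws.
    static void cleanup(List<Resource> resources) {
        RuntimeException first = null;
        for (Resource r : resources) {
            if (r == null) continue;           // setup never got this far
            try {
                r.close();
            } catch (RuntimeException e) {
                if (first == null) first = e;  // remember, but keep closing
            }
        }
        if (first != null) throw first;
    }
}
```

With cleanup written this way, the original UNSUPPORTED_OPERATION error can propagate instead of being replaced by an NPE from the teardown path.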
[jira] [Created] (DRILL-3010) Convert bad command error messages into UserExceptions in SqlHandlers
Venki Korukanti created DRILL-3010: -- Summary: Convert bad command error messages into UserExceptions in SqlHandlers Key: DRILL-3010 URL: https://issues.apache.org/jira/browse/DRILL-3010 Project: Apache Drill Issue Type: Bug Components: Metadata Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.0.0 Currently SqlHandlers such as the CreateTable or View handlers return bad-command error messages as result records. Instead we should throw a UserException.
{code}
0: jdbc:drill:zk=local> create table t1 as select * from cp.`region.json`;
+--------+-------------------------------------------------------------+
|   ok   |                           summary                           |
+--------+-------------------------------------------------------------+
| false  | Unable to create table. Schema [dfs.default] is immutable.  |
+--------+-------------------------------------------------------------+
1 row selected (0.103 seconds)
{code}
Instead it should be like:
{code}
0: jdbc:drill:zk=10.10.30.143:5181> create table t1 as select * from cp.`region.json`;
Error: PARSE ERROR: Unable to create or drop tables/views. Schema [dfs.default] is immutable.
[Error Id: 3a92d026-3df7-4e8b-8988-2300463fa00b on centos64-30146.qa.lab:31010] (state=,code=0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
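The gist of the change: instead of materializing the failure as a `(false, message)` result row, the handler should raise an exception that the RPC layer turns into a typed error for the client. A self-contained sketch of the pattern — Drill's real class is `UserException` with a builder API; `UserError` below is a simplified stand-in, and `createTable`/`tryCreate` are hypothetical names for illustration:

```java
public class HandlerSketch {
    // Simplified stand-in for Drill's UserException; the real one is built
    // via a fluent builder and carries an error type and error id.
    static final class UserError extends RuntimeException {
        UserError(String type, String message) {
            super(type + " ERROR: " + message);
        }
    }

    // Before: handlers returned a (false, message) result row on bad commands.
    // After: throw, so clients see a typed error instead of a fake result set.
    static void createTable(String schema, boolean mutable) {
        if (!mutable) {
            throw new UserError("PARSE",
                "Unable to create or drop tables/views. Schema [" + schema + "] is immutable.");
        }
        // ... actual CTAS handling would go here ...
    }

    // Small helper so the behavior is easy to observe.
    static String tryCreate(String schema, boolean mutable) {
        try {
            createTable(schema, mutable);
            return "ok";
        } catch (UserError e) {
            return e.getMessage();
        }
    }
}
```

Throwing also means every client (JDBC, ODBC, REST) gets the same error surface, rather than each one having to sniff a `(false, …)` row.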
[jira] [Created] (DRILL-2902) Add support for context UDFs: user (and its synonyms session_user, system_user) and current_schema
Venki Korukanti created DRILL-2902: -- Summary: Add support for context UDFs: user (and its synonyms session_user, system_user) and current_schema Key: DRILL-2902 URL: https://issues.apache.org/jira/browse/DRILL-2902 Project: Apache Drill Issue Type: Sub-task Components: Functions - Drill, Metadata Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 1.0.0 Add support for the following UDFs: - user (and its synonyms session_user, system_user): returns the query user name - current_schema: returns the default schema in the current user session -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2083) order by on large dataset returns wrong results
[ https://issues.apache.org/jira/browse/DRILL-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2083. Resolution: Fixed Fixed in [57a96d2|https://github.com/apache/drill/commit/57a96d200e12c0efcad3f3ca9d935c42647234b1]. order by on large dataset returns wrong results --- Key: DRILL-2083 URL: https://issues.apache.org/jira/browse/DRILL-2083 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types, Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Chun Chang Assignee: Steven Phillips Priority: Critical Fix For: 0.9.0 Attachments: DRILL-2083.patch #Mon Jan 26 14:10:51 PST 2015 git.commit.id.abbrev=3c6d0ef Test data has 1 million rows and can be accessed at http://apache-drill.s3.amazonaws.com/files/complex.json.gz
{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count(t.id) from `complex.json` t;
+----------+
|  EXPR$0  |
+----------+
| 1000000  |
+----------+
{code}
But order by returned 30 more rows.
{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.id from `complex.json` t order by t.id;
| 97 |
| 98 |
| 99 |
| 100 |
+------+
1,000,030 rows selected (19.449 seconds)
{code}
physical plan:
{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select t.id from `complex.json` t order by t.id;
+------+------+
| text | json |
+------+------+
| 00-00  Screen
  00-01    SingleMergeExchange(sort0=[0 ASC])
  01-01      SelectionVectorRemover
  01-02        Sort(sort0=[$0], dir0=[ASC])
  01-03          HashToRandomExchange(dist0=[[$0]])
  02-01            Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2858) Refactor hash expression construction in InsertLocalExchangeVisitor and PrelUtil into one place
Venki Korukanti created DRILL-2858: -- Summary: Refactor hash expression construction in InsertLocalExchangeVisitor and PrelUtil into one place Key: DRILL-2858 URL: https://issues.apache.org/jira/browse/DRILL-2858 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Currently there are two places where we construct the hash expression based on the partition fields: 1. InsertLocalExchangeVisitor (generates RexExpr type) 2. PrelUtil.getHashExpression (generates LogicalExpression type) Having this logic in two places makes it error-prone: the two copies can easily go out of sync, causing hard-to-debug verification failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2856) StreamingAggBatch goes into infinite loop due to state management issues
Venki Korukanti created DRILL-2856: -- Summary: StreamingAggBatch goes into infinite loop due to state management issues Key: DRILL-2856 URL: https://issues.apache.org/jira/browse/DRILL-2856 Project: Apache Drill Issue Type: Bug Reporter: Venki Korukanti Assignee: Steven Phillips -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2857) Update the StreamingAggBatch current workspace record counter variable type to long from current type int
Venki Korukanti created DRILL-2857: -- Summary: Update the StreamingAggBatch current workspace record counter variable type to long from current type int Key: DRILL-2857 URL: https://issues.apache.org/jira/browse/DRILL-2857 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 This causes invalid results when the incoming data has more than (2^31) - 1 records, due to overflow. Example query (make sure the nested query returns more than (2^31) - 1 records):
{code}
SELECT count(*) FROM (SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, count(*), count(l_quantity) FROM dfs.`lineitem` GROUP BY L_ORDERKEY, L_PARTKEY, L_SUPPKEY );
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
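The failure mode is plain integer overflow: once the running count passes 2^31 - 1 it wraps negative, and any result built on it is garbage. A minimal demonstration of why the counter must be widened to long — the method names below are illustrative, not Drill's:

```java
public class CounterOverflow {
    // Simulate a per-workspace record counter advancing past Integer.MAX_VALUE.
    // Incrementing one-by-one 2^31 times is too slow to demo, and the wrap is
    // equivalent to truncating the mathematical sum to 32 bits.
    static int intCounterAfter(long records) {
        int count = 0;
        count += (int) records;   // int arithmetic silently wraps
        return count;
    }

    static long longCounterAfter(long records) {
        long count = 0;
        count += records;         // long holds counts up to 2^63 - 1
        return count;
    }
}
```

The same widening concern applies to anything derived from the counter (offsets, averages), which is why the fix changes the variable's type rather than clamping the value.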
[jira] [Created] (DRILL-2855) Fix invalid result issues with StreamingAggBatch
Venki Korukanti created DRILL-2855: -- Summary: Fix invalid result issues with StreamingAggBatch Key: DRILL-2855 URL: https://issues.apache.org/jira/browse/DRILL-2855 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 There are two issues causing invalid results: 1. Under some conditions we fail to add a record to the current aggregation workspace around batch boundaries or when the output batch is full. 2. We incorrectly clean up the previous batch. Currently we keep a reference to the current batch in the previous-batch slot and fetch incoming batches until one has more than zero records or no batches remain. If the next incoming batch has zero records, we end up cleaning up the previous batch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2729) Hive partition columns of decimal type are deserialized incorrectly
Venki Korukanti created DRILL-2729: -- Summary: Hive partition columns of decimal type are deserialized incorrectly Key: DRILL-2729 URL: https://issues.apache.org/jira/browse/DRILL-2729 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 0.6.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 Repro steps:
{code}
CREATE TABLE IF NOT EXISTS readtest2 (
  a BOOLEAN
) PARTITIONED BY (
  decimal0_part DECIMAL,
  decimal9_part DECIMAL(6, 2),
  decimal18_part DECIMAL(15, 5),
  decimal28_part DECIMAL(23, 1),
  decimal38_part DECIMAL(30, 3)
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

ALTER TABLE readtest2 ADD IF NOT EXISTS PARTITION (
  decimal0_part='36.9', decimal9_part='36.9', decimal18_part='3289379872.945645',
  decimal28_part='39579334534534.35345', decimal38_part='363945093845093890.9');

LOAD DATA LOCAL INPATH '/tmp/data.txt' OVERWRITE INTO TABLE default.readtest2 PARTITION (
  decimal0_part='36.9', decimal9_part='36.9', decimal18_part='3289379872.945645',
  decimal28_part='39579334534534.35345', decimal38_part='363945093845093890.9');
{code}
Contents of /tmp/data.txt:
{code}
false
true
{code}
Drill output:
{code}
0: jdbc:drill:zk=10.10.30.143:5181> select * from readtest2;
| a     | decimal0_part | decimal9_part | decimal18_part        | decimal28_part   | decimal38_part         |
| false | 3700          | 369000.00     | -66367900898250.61888 | 39579334534534.4 | 363945093845093890.900 |
| true  | 3700          | 369000.00     | -66367900898250.61888 | 39579334534534.4 | 363945093845093890.900 |
{code}
Hive output:
{code}
hive> select * from readtest2;
OK
false  37  36.9  3289379872.94565  39579334534534.4  363945093845093890.9
true   37  36.9  3289379872.94565  39579334534534.4  363945093845093890.9
Time taken: 0.053 seconds, Fetched: 2 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2673) Update UserServer <=> UserClient RPC to handle handshake response better
Venki Korukanti created DRILL-2673: -- Summary: Update UserServer <=> UserClient RPC to handle handshake response better Key: DRILL-2673 URL: https://issues.apache.org/jira/browse/DRILL-2673 Project: Apache Drill Issue Type: Improvement Reporter: Venki Korukanti Assignee: Venki Korukanti Currently, if an exception occurs while UserServer is handling a handshake message from UserClient, the server terminates the connection, which leaves the client unable to handle the handshake response properly. This JIRA is to modify the behavior of UserServer when an exception occurs or the contents (e.g. user/password credentials) of the handshake request are invalid to: -- first send a handshake response message with the error details -- then terminate the connection. Since the client then always receives a handshake response, it can report any error in the response to the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
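The proposed ordering can be sketched as a tiny state machine: validate, reply (with error details on failure), and only then drop the connection. A self-contained sketch under assumed names — the real Drill types are protobuf handshake messages exchanged by UserServer/UserClient; `Response` and `Connection` below are hypothetical stand-ins:

```java
public class HandshakeSketch {
    static final class Response {
        final boolean ok;
        final String error;   // null on success
        Response(boolean ok, String error) { this.ok = ok; this.error = error; }
    }

    static final class Connection {
        boolean open = true;
        Response lastSent;
        void send(Response r) { lastSent = r; }
        void close() { open = false; }
    }

    // Server-side handling: always answer the handshake before closing, so
    // the client can surface the error instead of seeing a dead socket.
    static Response handleHandshake(Connection conn, boolean credentialsValid) {
        if (credentialsValid) {
            Response r = new Response(true, null);
            conn.send(r);
            return r;
        }
        Response r = new Response(false, "invalid user/password credentials");
        conn.send(r);   // 1. respond with the error details
        conn.close();   // 2. then terminate the connection
        return r;
    }
}
```

The key design point is that the client's state machine only ever has to handle "response received", never "connection vanished mid-handshake".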
[jira] [Resolved] (DRILL-1833) Views cannot be registered into the INFORMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data
[ https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-1833. Resolution: Fixed Fixed in [0b6cddf|https://github.com/apache/drill/commit/0b6cddfa4d8f9558f6386e7340429df4e8ec5f88]. Addressed the remaining review comment about changing the log message to be specific. Views cannot be registered into the INFORMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data --- Key: DRILL-1833 URL: https://issues.apache.org/jira/browse/DRILL-1833 Project: Apache Drill Issue Type: Bug Components: Storage - Information Schema Environment: git.commit.id.abbrev=2396670 Reporter: Xiao Meng Assignee: Venki Korukanti Fix For: 0.9.0 Attachments: DRILL-1833-1.patch After wiping out the ZooKeeper data, the drillbit cannot automatically register the view into INFORMATION_SCHEMA.`TABLES` even after we query the view. For example, for a workspace dfs.tmp, there is a view file `varchar_view.view.drill` under the corresponding directory '/tmp'. We can query: {code} select * from dfs.test.`varchar_view` {code} But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. After I recreate the view based on the contents of `varchar_view.view.drill`, the view shows in the INFORMATION_SCHEMA.`TABLES`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2641) Move unrelated tests in exec/jdbc module into appropriate modules
Venki Korukanti created DRILL-2641: -- Summary: Move unrelated tests in exec/jdbc module into appropriate modules Key: DRILL-2641 URL: https://issues.apache.org/jira/browse/DRILL-2641 Project: Apache Drill Issue Type: Test Components: Tools, Build & Test Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.9.0 Move the following unrelated tests out of exec/jdbc into appropriate modules: {{jdbc.TestHiveStorage.java}} into contrib/storage-hive/core. Split {{jdbc.TestMetadataDDL.java}} into the exec/java-exec and contrib/storage-hive/core modules. Remove the redundant tests in {{TestHiveScalarUDFs.java}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2640) Move view related tests out of 'exec/jdbc' module into appropriate modules
Venki Korukanti created DRILL-2640: -- Summary: Move view related tests out of 'exec/jdbc' module into appropriate modules Key: DRILL-2640 URL: https://issues.apache.org/jira/browse/DRILL-2640 Project: Apache Drill Issue Type: Test Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.9.0 Currently view-related tests live in the exec/jdbc module, which is not the right place for them. They should be in exec/java-exec, and tests of views on hive tables should be in the contrib/storage-hive/core module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2402) Current method of combining hash values can produce skew
[ https://issues.apache.org/jira/browse/DRILL-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2402. Resolution: Fixed Fix Version/s: (was: 0.9.0) 0.8.0 Target Version/s: 0.8.0 Fixed in [bb1d761|https://github.com/apache/drill/commit/bb1d7615e7eb6c0c17c0c8a1cde0ca070393e257]. Current method of combining hash values can produce skew Key: DRILL-2402 URL: https://issues.apache.org/jira/browse/DRILL-2402 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 0.8.0 Reporter: Aman Sinha Assignee: Jacques Nadeau Fix For: 0.8.0 Attachments: DRILL-2402-1.patch The current method of combining the hash values of multiple columns can produce skew in some cases, even though each individual hash function does not produce skew. The combining function is XOR: {code} hash(a, b) = XOR(hash(a), hash(b)) {code} The result is 0 for all rows where a = b, since then hash(a) = hash(b). This creates severe skew and hurts the performance of queries that do a HashAggregate-based group-by on {a, b} or a HashJoin on both columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
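The skew is easy to reproduce: XOR maps every row with equal column values to bucket 0. A small demonstration, alongside one common alternative combiner (multiply-and-add, the scheme `java.util.Objects.hash` uses); note the actual patch may use a different mixing function:

```java
public class HashCombineSketch {
    // XOR combiner from the report: collapses to 0 whenever hash(a) == hash(b),
    // so all rows with a = b land in the same partition/bucket.
    static int xorCombine(int ha, int hb) { return ha ^ hb; }

    // Multiply-and-add combiner: order-sensitive, so equal inputs no longer
    // cancel out. Same idea as java.util.Objects.hash; Drill's fix may differ.
    static int mixCombine(int ha, int hb) { return 31 * ha + hb; }
}
```

With the XOR combiner, a HashToRandomExchange keyed on {a, b} sends every a = b row to one minor fragment; any combiner that breaks the symmetry between the two operands avoids that degenerate case.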
[jira] [Resolved] (DRILL-2342) Nullability property of the view created from parquet file is not correct
[ https://issues.apache.org/jira/browse/DRILL-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2342. Resolution: Fixed Fix Version/s: (was: 0.9.0) 0.8.0 Target Version/s: 0.8.0 Fixed in [d7dc0b9|https://github.com/apache/drill/commit/d7dc0b95b8086b63523ec2e6a1cc9236d1a5bc44]. Nullability property of the view created from parquet file is not correct - Key: DRILL-2342 URL: https://issues.apache.org/jira/browse/DRILL-2342 Project: Apache Drill Issue Type: Bug Components: Metadata Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Venki Korukanti Priority: Critical Fix For: 0.8.0 Attachments: DRILL-2342-1.patch, DRILL-2342-3.patch, DRILL-2342-4.patch, DRILL-2343-2.patch, t1.parquet Here is my t1 table definition:
{code}
message root {
  optional int32 a1;
  optional binary b1 (UTF8);
  optional int32 c1 (DATE);
}
{code}
I created a view on top of it:
{code}
0: jdbc:drill:schema=dfs> create view v1 as select cast(a1 as int), cast(b1 as varchar(10)), cast(c1 as date) from t1;
+-------+-------------------------------------------------------------+
|  ok   |                           summary                           |
+-------+-------------------------------------------------------------+
| true  | View 'v1' created successfully in 'dfs.aggregation' schema  |
+-------+-------------------------------------------------------------+
1 row selected (0.096 seconds)
{code}
IS_NULLABLE says 'NO', which is incorrect.
{code}
0: jdbc:drill:schema=dfs> describe v1;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| EXPR$0       | INTEGER    | NO           |
| EXPR$1       | VARCHAR    | NO           |
| EXPR$2       | DATE       | NO           |
+--------------+------------+--------------+
3 rows selected (0.067 seconds)
{code}
This is potentially dangerous: if Calcite decided tomorrow to take advantage of this property and added an optimization that drops an IS NULL predicate on a non-nullable column, the query select * from v1 where x is null would return incorrect results.
{code}
0: jdbc:drill:schema=dfs> explain plan for select * from v1 where z is null;
+------+------+
| text | json |
+------+------+
| 00-00  Screen
  00-01    Project(x=[$0], y=[$1], z=[$2])
  00-02      SelectionVectorRemover
  00-03        Filter(condition=[IS NULL($2)])
  00-04          Project(x=[CAST($2):ANY NOT NULL], y=[CAST($1):ANY NOT NULL], z=[CAST($0):ANY NOT NULL])
  00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/t1]], selectionRoot=/aggregation/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
{code}
It seems to me that in views, column properties should always be nullable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2514) Add support for impersonation in FileSystem storage plugin
Venki Korukanti created DRILL-2514: -- Summary: Add support for impersonation in FileSystem storage plugin Key: DRILL-2514 URL: https://issues.apache.org/jira/browse/DRILL-2514 Project: Apache Drill Issue Type: Sub-task Components: Execution - Flow Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 Subtask under DRILL-2363 to add support for impersonation (including chained impersonation) for FileSystem storage plugin. Please see the design document mentioned in umbrella JIRA DRILL-2363 for design details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2210) Allow multithreaded copy and/or flush in PartitionSender
[ https://issues.apache.org/jira/browse/DRILL-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti resolved DRILL-2210. Resolution: Fixed Fixed in [49d316a|https://github.com/apache/drill/commit/49d316a1cb22f79061e246b5e197547dac730232]. Allow multithreaded copy and/or flush in PartitionSender Key: DRILL-2210 URL: https://issues.apache.org/jira/browse/DRILL-2210 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Reporter: Yuliya Feldman Assignee: Yuliya Feldman Fix For: 0.8.0 Related to DRILL-133. Since LocalExchange merges data from multiple receivers in order to fan it out later to multiple senders, the amount of data that needs to be sent out increases. Add the ability to copy/flush data in multiple threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2475) Handle IterOutcome.NONE correctly in operators
Venki Korukanti created DRILL-2475: -- Summary: Handle IterOutcome.NONE correctly in operators Key: DRILL-2475 URL: https://issues.apache.org/jira/browse/DRILL-2475 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 0.8.0 Reporter: Venki Korukanti Assignee: Chris Westin Fix For: 0.9.0 Currently not all operators handle NONE (with no preceding OK_NEW_SCHEMA) correctly. This JIRA is to go through the operators, check whether each handles NONE correctly, and modify them accordingly. (from DRILL-2453) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
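The contract in question: a downstream operator must treat NONE as a terminal outcome even when it arrives before any OK_NEW_SCHEMA, i.e. when the upstream produced no schema and no data at all. A self-contained sketch of the defensive pattern — the IterOutcome values mirror Drill's RecordBatch interface, but the `drain` consumer below is a hypothetical stand-in, not an actual operator:

```java
import java.util.Iterator;

public class IterOutcomeSketch {
    enum IterOutcome { OK_NEW_SCHEMA, OK, NONE, STOP }

    // Drains an upstream, tolerating NONE-before-schema instead of assuming
    // at least one OK_NEW_SCHEMA was seen. Returns the number of OK batches.
    static int drain(Iterator<IterOutcome> upstream) {
        boolean sawSchema = false;
        int batches = 0;
        while (upstream.hasNext()) {
            switch (upstream.next()) {
                case OK_NEW_SCHEMA:
                    sawSchema = true;       // (re)build value vectors here
                    break;
                case OK:
                    if (!sawSchema) throw new IllegalStateException("OK before schema");
                    batches++;              // process a data batch
                    break;
                case NONE:
                    return batches;         // terminal, valid even if !sawSchema
                case STOP:
                    return batches;         // upstream failed; stop consuming
            }
        }
        return batches;
    }
}
```

An operator that instead assumes "schema first, then data, then NONE" dereferences never-built vectors on an empty upstream, which is exactly the class of bug this JIRA audits for.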