[jira] [Updated] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch

2018-11-14 Thread Nitin Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nitin Sharma updated DRILL-6853:

Affects Version/s: 1.14.0

> Parquet Complex Reader for nested schema should have configurable memory or 
> max records to fetch
> 
>
> Key: DRILL-6853
> URL: https://issues.apache.org/jira/browse/DRILL-6853
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Nitin Sharma
>Priority: Major
>
> While fetching nested schemas, the Parquet complex reader should have a 
> configurable memory limit or maximum number of records to fetch, rather than 
> defaulting to 4000 records.
> While scanning terabytes of data with wide columns, the default can easily 
> cause OOM issues. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6854) Profiles page should provide more insights on parquet statistics for complex reader

2018-11-14 Thread Nitin Sharma (JIRA)
Nitin Sharma created DRILL-6854:
---

 Summary: Profiles page should provide more insights on parquet 
statistics for complex reader
 Key: DRILL-6854
 URL: https://issues.apache.org/jira/browse/DRILL-6854
 Project: Apache Drill
  Issue Type: Bug
Reporter: Nitin Sharma


The Profiles page should provide more insight into Parquet statistics for the 
complex reader.

For example, for the plain reader the operator metrics are good, but for the 
complex reader the operator metrics are always empty. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch

2018-11-14 Thread Nitin Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687560#comment-16687560
 ] 

Nitin Sharma commented on DRILL-6853:
-

[~vitalii] [~sachouche] Filing this as per our discussion earlier today.

> Parquet Complex Reader for nested schema should have configurable memory or 
> max records to fetch
> 
>
> Key: DRILL-6853
> URL: https://issues.apache.org/jira/browse/DRILL-6853
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Nitin Sharma
>Priority: Major
>
> While fetching nested schemas, the Parquet complex reader should have a 
> configurable memory limit or maximum number of records to fetch, rather than 
> defaulting to 4000 records.
> While scanning terabytes of data with wide columns, the default can easily 
> cause OOM issues. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch

2018-11-14 Thread Nitin Sharma (JIRA)
Nitin Sharma created DRILL-6853:
---

 Summary: Parquet Complex Reader for nested schema should have 
configurable memory or max records to fetch
 Key: DRILL-6853
 URL: https://issues.apache.org/jira/browse/DRILL-6853
 Project: Apache Drill
  Issue Type: Bug
Reporter: Nitin Sharma


While fetching nested schemas, the Parquet complex reader should have a 
configurable memory limit or maximum number of records to fetch, rather than 
defaulting to 4000 records.

While scanning terabytes of data with wide columns, the default can easily 
cause OOM issues. 
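
As a sketch of what such a knob could look like, consider the limiter below. 
This is hypothetical: the class name, the way the limits would be wired to 
options, and the byte budget are illustrative, not Drill's actual reader code.

{code:java}
// Hypothetical sketch: cap a batch by record count or by a memory budget
// instead of the hard-coded 4000-record default.
public class ComplexBatchLimiter {
  static final int DEFAULT_MAX_RECORDS = 4000; // the current hard-coded default

  private final int maxRecords;     // e.g. taken from a system/session option
  private final long maxBatchBytes; // e.g. a memory budget option; 0 = disabled

  public ComplexBatchLimiter(int maxRecords, long maxBatchBytes) {
    this.maxRecords = maxRecords > 0 ? maxRecords : DEFAULT_MAX_RECORDS;
    this.maxBatchBytes = maxBatchBytes;
  }

  /** The reader would stop filling the current batch when either limit is hit. */
  public boolean batchFull(int recordsSoFar, long bytesSoFar) {
    return recordsSoFar >= maxRecords
        || (maxBatchBytes > 0 && bytesSoFar >= maxBatchBytes);
  }
}
{code}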



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6668) In Web Console, highlight options that are different from default values

2018-11-14 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687412#comment-16687412
 ] 

Kunal Khatua commented on DRILL-6668:
-

[~paul-rogers]
Prototype: mousing over the {{Default}} button prompts with the message
 !screenshot-1.png! 

If the value is already the default, the {{Default}} button is disabled.

Let me know if this is sufficient and I'll open a PR.

> In Web Console, highlight options that are different from default values
> 
>
> Key: DRILL-6668
> URL: https://issues.apache.org/jira/browse/DRILL-6668
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.16.0
>
> Attachments: screenshot-1.png
>
>
> Suppose you inherit a Drill setup created by someone else (or by you, some 
> time in the past). Or, suppose you are a support person. You want to know 
> which Drill options have been changed from the defaults.
> The Web UI conveniently displays all options. But, there is no indication of 
> which might have non-default values.
> After the improvements of the last year, the information needed to detect 
> non-default values is now available. Would be great to mark these values. 
> Perhaps using colors, perhaps with words.
> For example:
> *planner.width.max_per_node*  200 \[Update]
> Or
> planner.width.max_per_node (system) 200 \[Update]
> (The Web UI does not, I believe, show session settings, since the Web UI has 
> no sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. 
> Otherwise, we could also have a "(session)" suffix above.)
> Then, in addition to the {{[Update]}} button, for non default values, also 
> provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.
> planner.width.max_per_node (session) 200 \[Update] \[Reset]
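
A self-contained sketch of the highlighting rule discussed above; the 
{{Option}} record and the rendering below are illustrative stand-ins, not 
Drill's actual option classes.

{code:java}
import java.util.List;

// Show [Update] for every option and add [Reset] (the equivalent of
// ALTER ... RESET) only when the current value differs from the default.
public class OptionRow {
  record Option(String name, String value, String defaultValue, String scope) {}

  static String render(Option o) {
    boolean changed = !o.value().equals(o.defaultValue());
    StringBuilder row = new StringBuilder();
    row.append(changed ? "*" + o.name() + "*" : o.name()); // bold if non-default
    row.append(" (").append(o.scope()).append(") ").append(o.value());
    row.append(" [Update]");
    if (changed) {
      row.append(" [Reset]");
    }
    return row.toString();
  }

  public static void main(String[] args) {
    List<Option> options = List.of(
        new Option("planner.width.max_per_node", "200", "0", "system"),
        new Option("planner.slice_target", "100000", "100000", "system"));
    options.forEach(o -> System.out.println(render(o)));
  }
}
{code}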



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6668) In Web Console, highlight options that are different from default values

2018-11-14 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6668:

Attachment: screenshot-1.png

> In Web Console, highlight options that are different from default values
> 
>
> Key: DRILL-6668
> URL: https://issues.apache.org/jira/browse/DRILL-6668
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.16.0
>
> Attachments: screenshot-1.png
>
>
> Suppose you inherit a Drill setup created by someone else (or by you, some 
> time in the past). Or, suppose you are a support person. You want to know 
> which Drill options have been changed from the defaults.
> The Web UI conveniently displays all options. But, there is no indication of 
> which might have non-default values.
> After the improvements of the last year, the information needed to detect 
> non-default values is now available. Would be great to mark these values. 
> Perhaps using colors, perhaps with words.
> For example:
> *planner.width.max_per_node*  200 \[Update]
> Or
> planner.width.max_per_node (system) 200 \[Update]
> (The Web UI does not, I believe, show session settings, since the Web UI has 
> no sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. 
> Otherwise, we could also have a "(session)" suffix above.)
> Then, in addition to the {{[Update]}} button, for non default values, also 
> provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.
> planner.width.max_per_node (session) 200 \[Update] \[Reset]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6668) In Web Console, highlight options that are different from default values

2018-11-14 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687290#comment-16687290
 ] 

Kunal Khatua commented on DRILL-6668:
-

I think a *DEFAULT* button with the value itself displayed (instead of the 
button label {{[DEFAULT]}} or {{[RESET]}}) might be an option to avoid the 
confusion.
[~paul-rogers], which do you prefer?

> In Web Console, highlight options that are different from default values
> 
>
> Key: DRILL-6668
> URL: https://issues.apache.org/jira/browse/DRILL-6668
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.16.0
>
>
> Suppose you inherit a Drill setup created by someone else (or by you, some 
> time in the past). Or, suppose you are a support person. You want to know 
> which Drill options have been changed from the defaults.
> The Web UI conveniently displays all options. But, there is no indication of 
> which might have non-default values.
> After the improvements of the last year, the information needed to detect 
> non-default values is now available. Would be great to mark these values. 
> Perhaps using colors, perhaps with words.
> For example:
> *planner.width.max_per_node*  200 \[Update]
> Or
> planner.width.max_per_node (system) 200 \[Update]
> (The Web UI does not, I believe, show session settings, since the Web UI has 
> no sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. 
> Otherwise, we could also have a "(session)" suffix above.)
> Then, in addition to the {{[Update]}} button, for non default values, also 
> provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.
> planner.width.max_per_node (session) 200 \[Update] \[Reset]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6847) Add Query Metadata to RESTful Interface

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687287#comment-16687287
 ] 

ASF GitHub Bot commented on DRILL-6847:
---

kkhatua commented on issue #1539: DRILL-6847: Add Query Metadata to RESTful 
Interface
URL: https://github.com/apache/drill/pull/1539#issuecomment-438860831
 
 
   @cgivre, I didn't build and try this out, but I'm curious how we handle 
`select * from` queries, especially with schema changes, like fields being 
added or dropped between rows.
   Also, I agree with @arina-ielchiieva's point about not repeating the field 
names (unless a schema change requires it). But I'm not sure how badly we break 
backward compatibility (perhaps carry a `version` in the REST response?). 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Query Metadata to RESTful Interface
> ---
>
> Key: DRILL-6847
> URL: https://issues.apache.org/jira/browse/DRILL-6847
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
>
> The Drill RESTful interface does not return the structure of the query 
> results. This makes integrating Drill with other BI tools difficult because 
> they do not know what kind of data to expect.
> This PR adds a new section to the results, called Metadata, which contains a 
> list of the minor types of all the columns returned.
> The query below will now return the following in the RESTful interface:
> {code:sql}
> SELECT CAST( employee_id AS INT) AS employee_id,
> full_name,
> first_name, 
> last_name, 
> CAST( position_id AS BIGINT) AS position_id, 
> position_title 
> FROM cp.`employee.json` LIMIT 2
> {code}
> {code}
> {
>   "queryId": "2414bf3f-b4f4-d4df-825f-73dfb3a56681",
>   "columns": [
> "employee_id",
> "full_name",
> "first_name",
> "last_name",
> "position_id",
> "position_title"
>   ],
>   "metadata": [
> "INT",
> "VARCHAR",
> "VARCHAR",
> "VARCHAR",
> "BIGINT",
> "VARCHAR"
>   ],
>   "rows": [
> {
>   "full_name": "Sheri Nowmer",
>   "employee_id": "1",
>   "last_name": "Nowmer",
>   "position_title": "President",
>   "first_name": "Sheri",
>   "position_id": "1"
> },
> {
>   "full_name": "Derrick Whelply",
>   "employee_id": "2",
>   "last_name": "Whelply",
>   "position_title": "VP Country Manager",
>   "first_name": "Derrick",
>   "position_id": "2"
> }
>   ]
> }
> {code}
>  
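
A sketch of how a client could consume the proposed response shape, pairing 
each entry of {{columns}} with the minor type at the same index of 
{{metadata}}. Only the JSON layout comes from the example above; the client 
class itself is illustrative.

{code:java}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

public class RestMetadataClient {
  /** Returns column name -> minor type, e.g. {employee_id=INT, full_name=VARCHAR, ...}. */
  public static Map<String, String> columnTypes(String json) throws Exception {
    JsonNode root = new ObjectMapper().readTree(json);
    JsonNode columns = root.get("columns");
    JsonNode metadata = root.get("metadata");
    Map<String, String> types = new LinkedHashMap<>();
    for (int i = 0; i < columns.size(); i++) {
      // "columns" and "metadata" are parallel arrays in the proposed response
      types.put(columns.get(i).asText(), metadata.get(i).asText());
    }
    return types;
  }
}
{code}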



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6668) In Web Console, highlight options that are different from default values

2018-11-14 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6668:

Fix Version/s: 1.16.0

> In Web Console, highlight options that are different from default values
> 
>
> Key: DRILL-6668
> URL: https://issues.apache.org/jira/browse/DRILL-6668
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.16.0
>
>
> Suppose you inherit a Drill setup created by someone else (or by you, some 
> time in the past). Or, suppose you are a support person. You want to know 
> which Drill options have been changed from the defaults.
> The Web UI conveniently displays all options. But, there is no indication of 
> which might have non-default values.
> After the improvements of the last year, the information needed to detect 
> non-default values is now available. Would be great to mark these values. 
> Perhaps using colors, perhaps with words.
> For example:
> *planner.width.max_per_node*  200 \[Update]
> Or
> planner.width.max_per_node (system) 200 \[Update]
> (The Web UI does not, I believe, show session settings, since the Web UI has 
> no sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. 
> Otherwise, we could also have a "(session)" suffix above.)
> Then, in addition to the {{[Update]}} button, for non default values, also 
> provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.
> planner.width.max_per_node (session) 200 \[Update] \[Reset]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6668) In Web Console, highlight options that are different from default values

2018-11-14 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-6668:
---

Assignee: Kunal Khatua

> In Web Console, highlight options that are different from default values
> 
>
> Key: DRILL-6668
> URL: https://issues.apache.org/jira/browse/DRILL-6668
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.16.0
>
>
> Suppose you inherit a Drill setup created by someone else (or by you, some 
> time in the past). Or, suppose you are a support person. You want to know 
> which Drill options have been changed from the defaults.
> The Web UI conveniently displays all options. But, there is no indication of 
> which might have non-default values.
> After the improvements of the last year, the information needed to detect 
> non-default values is now available. Would be great to mark these values. 
> Perhaps using colors, perhaps with words.
> For example:
> *planner.width.max_per_node*  200 \[Update]
> Or
> planner.width.max_per_node (system) 200 \[Update]
> (The Web UI does not, I believe, show session settings, since the Web UI has 
> no sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. 
> Otherwise, we could also have a "(session)" suffix above.)
> Then, in addition to the {{[Update]}} button, for non default values, also 
> provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.
> planner.width.max_per_node (session) 200 \[Update] \[Reset]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6851) Create Drill Operator for Kubernetes

2018-11-14 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687269#comment-16687269
 ] 

Paul Rogers commented on DRILL-6851:


[~agirish], my comment refers to using K8s to run a Drill cluster. (The work 
you've released thus far is for an embedded Drill.)

For a Drill cluster, the operator should:

* Provide an API to add/remove Drillbits.
* Provide an API to monitor the cluster.
* Monitor Drillbits, restarting any Drillbit that fails.

K8s provides stateful sets for simple clusters, but once you create an 
operator, you need to manage things like pod count, pod restart, etc.

The point here is simply that DoY already uses ZK to monitor Drill cluster 
health, and it already provides an API and UI for controlling the cluster.

Of course, it does so in a YARN-like manner; some revision would be needed to 
make it K8s-like.
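
A toy sketch of the reconcile loop implied by the responsibilities above; 
every interface here is hypothetical, not actual operator or DoY code.

{code:java}
public class DrillbitReconciler {
  interface Cluster {
    int desiredDrillbits();    // set through the add/remove API
    int runningDrillbits();    // observed from pods (or from ZK, as DoY does)
    void startDrillbit();
    void stopDrillbit();
  }

  /** One reconcile pass: converge running Drillbits toward the desired count. */
  static void reconcile(Cluster cluster) {
    int diff = cluster.desiredDrillbits() - cluster.runningDrillbits();
    for (int i = 0; i < diff; i++) {
      cluster.startDrillbit();  // also covers restarting a failed Drillbit
    }
    for (int i = 0; i < -diff; i++) {
      cluster.stopDrillbit();
    }
  }
}
{code}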

> Create Drill Operator for Kubernetes
> 
>
> Key: DRILL-6851
> URL: https://issues.apache.org/jira/browse/DRILL-6851
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Abhishek Girish
>Assignee: Abhishek Girish
>Priority: Major
>
> This task is to track creating an initial version of the Drill Operator for 
> Kubernetes. I'll shortly update the JIRA on background, details on Operator, 
> and what's planned for the first version. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6851) Create Drill Operator for Kubernetes

2018-11-14 Thread Abhishek Girish (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687253#comment-16687253
 ] 

Abhishek Girish commented on DRILL-6851:


[~Paul.Rogers], I've already begun work in Go, so it's a challenge to borrow 
stuff from DoY. You can take a look at the early implementation here: 
https://github.com/Agirish/drill-operator

I already have Drill running in distributed mode on K8s using YAML definitions 
and also with Helm; that work supports Drill in both embedded and distributed 
modes, and can be accessed here: https://github.com/Agirish/drill-containers
I'm working on translating this into Go for the Drill Operator, the reasoning 
being that an operator provides much more flexibility and Drill-specific 
customization. 

Also, the current plan is to use mostly K8s for cluster management. I'm using 
the Operator SDK, which is pretty good.

> Create Drill Operator for Kubernetes
> 
>
> Key: DRILL-6851
> URL: https://issues.apache.org/jira/browse/DRILL-6851
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Abhishek Girish
>Assignee: Abhishek Girish
>Priority: Major
>
> This task is to track creating an initial version of the Drill Operator for 
> Kubernetes. I'll shortly update the JIRA on background, details on Operator, 
> and what's planned for the first version. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6851) Create Drill Operator for Kubernetes

2018-11-14 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687098#comment-16687098
 ] 

Paul Rogers commented on DRILL-6851:


Needless to say, we should borrow liberally from Drill-on-YARN. This code 
includes YARN integration (to be replaced by K8s integration here). DoY also 
includes a complete Drill cluster management state machine and UI; features 
that we'd want to preserve in Drill-on-K8s (DoK).

The challenge is that K8s operators are often implemented in Go. It might make 
sense to do DoK in Java so that we can leverage our existing code.

> Create Drill Operator for Kubernetes
> 
>
> Key: DRILL-6851
> URL: https://issues.apache.org/jira/browse/DRILL-6851
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Abhishek Girish
>Assignee: Abhishek Girish
>Priority: Major
>
> This task is to track creating an initial version of the Drill Operator for 
> Kubernetes. I'll shortly update the JIRA on background, details on Operator, 
> and what's planned for the first version. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6833) MapRDB queries with Run Time Filters with row_key/Secondary Index Should Support Pushdown

2018-11-14 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6833:

Reviewer: Aman Sinha

> MapRDB queries with Run Time Filters with row_key/Secondary Index Should 
> Support Pushdown
> -
>
> Key: DRILL-6833
> URL: https://issues.apache.org/jira/browse/DRILL-6833
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.15.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Drill should push down all row key filters to MapRDB for queries that only 
> have WHERE conditions on row_keys. In the following example, the query only 
> has a WHERE clause on row_keys:
> select t.mscIdentities from dfs.root.`/user/mapr/MixTable` t where t.row_key=
> (select max(convert_fromutf8(i.KeyA.ENTRY_KEY)) from 
> dfs.root.`/user/mapr/TableIMSI` i where i.row_key='460021050005636')
> A row_key lookup can return at most one row, so the physical plan must 
> leverage MapRDB row_key pushdown to execute the subquery and then use its 
> result to execute the outer query. Currently only the inner query is pushed 
> down; the outer query requires a table scan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6852) Adapt current Parquet Metadata cache implementation to use Drill Metastore API

2018-11-14 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-6852:
--

 Summary: Adapt current Parquet Metadata cache implementation to 
use Drill Metastore API
 Key: DRILL-6852
 URL: https://issues.apache.org/jira/browse/DRILL-6852
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi


According to the design document for DRILL-6552, the existing metadata cache 
API should be adapted to use the generalized metastore API, and the Parquet 
metadata cache will be presented as an implementation of the metastore API.

The aim of this Jira is to refactor the Parquet Metadata cache implementation 
and adapt it to use the Drill Metastore API.
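
A hypothetical sketch of the shape this adaptation could take; the interface 
below is illustrative, and the real API is the one designed in DRILL-6552.

{code:java}
// A generalized metastore interface, with the existing Parquet metadata
// cache as one backing implementation.
public interface Metastore {
  TableMetadata tableMetadata(String schema, String table);

  class TableMetadata {
    public final long rowCount;
    public TableMetadata(long rowCount) { this.rowCount = rowCount; }
  }
}

class ParquetCacheMetastore implements Metastore {
  @Override
  public TableMetadata tableMetadata(String schema, String table) {
    // Would read the table's Parquet metadata cache file and translate it
    // into the generalized TableMetadata form.
    return new TableMetadata(0L); // placeholder
  }
}
{code}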



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6852) Adapt current Parquet Metadata cache implementation to use Drill Metastore API

2018-11-14 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6852:

Fix Version/s: 1.16.0

> Adapt current Parquet Metadata cache implementation to use Drill Metastore API
> --
>
> Key: DRILL-6852
> URL: https://issues.apache.org/jira/browse/DRILL-6852
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> According to the design document for DRILL-6552, the existing metadata cache 
> API should be adapted to use the generalized metastore API, and the Parquet 
> metadata cache will be presented as an implementation of the metastore API.
> The aim of this Jira is to refactor the Parquet Metadata cache implementation 
> and adapt it to use the Drill Metastore API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6851) Create Drill Operator for Kubernetes

2018-11-14 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-6851:
--

 Summary: Create Drill Operator for Kubernetes
 Key: DRILL-6851
 URL: https://issues.apache.org/jira/browse/DRILL-6851
 Project: Apache Drill
  Issue Type: Task
Reporter: Abhishek Girish
Assignee: Abhishek Girish


This task is to track creating an initial version of the Drill Operator for 
Kubernetes. I'll shortly update the JIRA on background, details on Operator, 
and what's planned for the first version. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6830) Hook.REL_BUILDER_SIMPLIFY handler not removed, causing performance degradation

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686818#comment-16686818
 ] 

ASF GitHub Bot commented on DRILL-6830:
---

vvysotskyi commented on issue #1524: DRILL-6830: Remove 
Hook.REL_BUILDER_SIMPLIFY handler after use
URL: https://github.com/apache/drill/pull/1524#issuecomment-438724649
 
 
   @lushuifeng, you are right that the problem connected with 
`TestCaseNullableTypes#testCaseNullableTypesVarchar` is in Calcite. The initial 
goal of adding `Hook.REL_BUILDER_SIMPLIFY.add(Hook.propertyJ(false));` was to 
avoid failures connected with the treat-empty-strings-as-null feature, but it 
looks like that was fixed in another place.
   
   The interesting thing is that the problem behind the 
`TestCaseNullableTypes#testCaseNullableTypesVarchar` failure was fixed after 
the Calcite 1.17 release, so when Drill is rebased onto Calcite 1.18, 
`Hook.REL_BUILDER_SIMPLIFY.add(Hook.propertyJ(false));` may be removed.
   
   @ihuzenko, since you are working on the Calcite rebase, could you please 
also take care of it?
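
For reference, a sketch of the removal pattern named in the PR title, assuming 
Calcite's {{Hook.Closeable}} contract (registering a handler returns a 
closeable that unregisters it):

{code:java}
import org.apache.calcite.runtime.Hook;

public class ScopedSimplifyOff {
  // Keep the Closeable returned by Hook.add(...) and close it after use,
  // so handlers do not accumulate across queries (see the 1,715,004
  // Hook$4 instances in the histogram quoted below).
  static void planWithSimplifyDisabled(Runnable planning) {
    try (Hook.Closeable ignored =
             Hook.REL_BUILDER_SIMPLIFY.add(Hook.propertyJ(false))) {
      planning.run(); // RelBuilder simplification is disabled only in this scope
    }
  }
}
{code}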


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Hook.REL_BUILDER_SIMPLIFY handler not removed, causing performance degradation
> -
>
> Key: DRILL-6830
> URL: https://issues.apache.org/jira/browse/DRILL-6830
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: shuifeng lu
>Assignee: shuifeng lu
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: Screen Shot 2018-11-06 at 16.14.16.png
>
>
> Planning performance degradation has been observed: the duration of planning 
> increased from 30 ms to 160 ms after running Drill for a long period of time 
> (say, a month).
> RelBuilder.simplify never becomes true if Hook.REL_BUILDER_SIMPLIFY handlers 
> are not removed.
> Here is a clue (after running for 40 days): Hook.get takes 8 ms per 
> invocation, and it may be called several times per query.
>  ---[8.816063ms] org.apache.calcite.tools.RelBuilder:<init>()
>    +---[0.020218ms] java.util.ArrayDeque:<init>()
>    +---[0.018493ms] java.lang.Boolean:valueOf()
>    +---[8.341566ms] org.apache.calcite.runtime.Hook:get()
>    +---[0.008489ms] java.lang.Boolean:booleanValue()
>    +---[min=5.21E-4ms,max=0.015832ms,total=0.025233ms,count=12] 
> org.apache.calcite.plan.Context:unwrap()
>    +---[min=3.83E-4ms,max=0.009494ms,total=0.014516ms,count=13] 
> org.apache.calcite.util.Util:first()
>    +---[0.006892ms] org.apache.calcite.plan.RelOptCluster:getPlanner()
>    +---[0.009104ms] org.apache.calcite.plan.RelOptPlanner:getExecutor()
>    +---[min=4.8E-4ms,max=0.002277ms,total=0.002757ms,count=2] 
> org.apache.calcite.plan.RelOptCluster:getRexBuilder()
>    ---[min=4.91E-4ms,max=0.004586ms,total=0.005077ms,count=2] 
> org.apache.calcite.rex.RexSimplify:<init>()
> The top instances in JVM
> num   #instances     #bytes           class name
> --
>  1:       116333      116250440     [B
>  2:       890126      105084536    [C
>  3:       338062        37415944    [Ljava.lang.Object;
>  4:     1715004        27440064    org.apache.calcite.runtime.Hook$4
>  5:      803909         19293816    java.lang.String
> !Screen Shot 2018-11-06 at 16.14.16.png!  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6850) JDBC integration tests failures

2018-11-14 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6850:
---
Description: 
The following command runs the Drill integration tests for RDBMS (Derby and 
MySQL):
_mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc_

Currently some of the drill/exec/store/jdbc TestJdbcPluginWithDerbyIT and 
TestJdbcPluginWithMySQLIT tests fail:
{code}
Results :

Failed tests: 
  TestJdbcPluginWithDerbyIT.showTablesDefaultSchema:117 expected:<1> but was:<0>
Tests in error: 
  TestJdbcPluginWithDerbyIT.describe » UserRemote VALIDATION ERROR: Unknown 
tabl...
  
TestJdbcPluginWithDerbyIT.pushdownDoubleJoinAndFilter:111->PlanTestBase.testPlanMatchingPatterns:84->PlanTestBase.testPlanMatchingPatterns:89->PlanTestBase.getPlanInString:369->BaseTestQuery.testSqlWithResults:322->BaseTestQuery.testRunAndReturn:341
 » Rpc
  TestJdbcPluginWithDerbyIT.testCrossSourceMultiFragmentJoin » UserRemote 
VALIDA...
  TestJdbcPluginWithDerbyIT.validateResult:71 »  at position 0 column 
'`NUMERIC_...
  TestJdbcPluginWithMySQLIT.validateResult:108 »  at position 0 column 
'`numeric...

Tests run: 14, Failures: 1, Errors: 5, Skipped: 0
{code} 

Most likely these are old regressions.

Additionally, an NPE for an empty result is resolved:
http://drill.apache.org/blog/2018/08/05/drill-1.14-released/#comment-4082559169

  was:
The following command will run Drill integratiom tests for RDBMS (Derby and 
MySQL):
_mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc_

Currently some drill/exec/store/jdbc TestJdbcPluginWithDerbyIT and 
TestJdbcPluginWithMySQLIT tests fail:
{code}
Results :

Failed tests: 
  TestJdbcPluginWithDerbyIT.showTablesDefaultSchema:117 expected:<1> but was:<0>
Tests in error: 
  TestJdbcPluginWithDerbyIT.describe » UserRemote VALIDATION ERROR: Unknown 
tabl...
  
TestJdbcPluginWithDerbyIT.pushdownDoubleJoinAndFilter:111->PlanTestBase.testPlanMatchingPatterns:84->PlanTestBase.testPlanMatchingPatterns:89->PlanTestBase.getPlanInString:369->BaseTestQuery.testSqlWithResults:322->BaseTestQuery.testRunAndReturn:341
 » Rpc
  TestJdbcPluginWithDerbyIT.testCrossSourceMultiFragmentJoin » UserRemote 
VALIDA...
  TestJdbcPluginWithDerbyIT.validateResult:71 »  at position 0 column 
'`NUMERIC_...
  TestJdbcPluginWithMySQLIT.validateResult:108 »  at position 0 column 
'`numeric...

Tests run: 14, Failures: 1, Errors: 5, Skipped: 0
{code} 

Most likely these are old regressions.


> JDBC integration tests failures
> ---
>
> Key: DRILL-6850
> URL: https://issues.apache.org/jira/browse/DRILL-6850
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Priority: Major
> Fix For: 1.15.0
>
>
> The following command runs the Drill integration tests for RDBMS (Derby and 
> MySQL):
> _mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc_
> Currently some of the drill/exec/store/jdbc TestJdbcPluginWithDerbyIT and 
> TestJdbcPluginWithMySQLIT tests fail:
> {code}
> Results :
> Failed tests: 
>   TestJdbcPluginWithDerbyIT.showTablesDefaultSchema:117 expected:<1> but 
> was:<0>
> Tests in error: 
>   TestJdbcPluginWithDerbyIT.describe » UserRemote VALIDATION ERROR: Unknown 
> tabl...
>   
> TestJdbcPluginWithDerbyIT.pushdownDoubleJoinAndFilter:111->PlanTestBase.testPlanMatchingPatterns:84->PlanTestBase.testPlanMatchingPatterns:89->PlanTestBase.getPlanInString:369->BaseTestQuery.testSqlWithResults:322->BaseTestQuery.testRunAndReturn:341
>  » Rpc
>   TestJdbcPluginWithDerbyIT.testCrossSourceMultiFragmentJoin » UserRemote 
> VALIDA...
>   TestJdbcPluginWithDerbyIT.validateResult:71 »  at position 0 column 
> '`NUMERIC_...
>   TestJdbcPluginWithMySQLIT.validateResult:108 »  at position 0 column 
> '`numeric...
> Tests run: 14, Failures: 1, Errors: 5, Skipped: 0
> {code} 
> Most likely these are old regressions.
> Additionally, an NPE for an empty result is resolved:
> http://drill.apache.org/blog/2018/08/05/drill-1.14-released/#comment-4082559169



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-540) Allow querying hive views in drill

2018-11-14 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-540:
---
Description: 
Currently Hive views cannot be queried from Drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists its view metadata in files with the suffix .view.drill, 
using JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later Drill parses the metadata and uses it to treat view names in SQL as a 
subquery.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS:

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                          |
-------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers`  |

{noformat}
      3. So in the Hive metastore, views are treated as tables of a special 
type. The main benefit is that we also have the expanded SQL definition of each 
view (just like in .view.drill files). Reading this metadata is already 
implemented in Drill with the help of the Thrift Metastore API.

      4. To enable querying of Hive views we'll reuse the existing code for 
Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* 
for _*HiveReadEntry*_, we'll convert the metadata to an instance of _*View*_ 
(_which is actually the model for data persisted in .view.drill files_) and 
then, based on this instance, return a new _*DrillViewTable*_. Using this 
approach, Drill will handle Hive views the same way as if they were initially 
defined in Drill and persisted in a .view.drill file. 

     5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_, 
we'll reuse existing code from _*DrillHiveTable*_; the conversion functionality 
will be extracted and used for both table and view field type conversions. 
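
A toy, self-contained sketch of step 4; all types below are simplified 
stand-ins for the Drill classes named above, with invented fields and 
signatures purely for illustration.

{code:java}
public class HiveViewSketch {
  record HiveReadEntry(String tableType, String name, String expandedSql) {
    boolean isView() { return "VIRTUAL_VIEW".equals(tableType); }
  }
  record View(String name, String sql) {}     // stand-in for Drill's View model
  record DrillViewTable(View view) {}         // stand-in for DrillViewTable

  static Object getDrillTable(HiveReadEntry entry) {
    if (entry.isView()) {
      // Reuse the Drill-view code path: the Hive view behaves as if it had
      // been persisted in a .view.drill file.
      return new DrillViewTable(new View(entry.name(), entry.expandedSql()));
    }
    return entry; // existing table handling, unchanged
  }

  public static void main(String[] args) {
    HiveReadEntry cview = new HiveReadEntry("VIRTUAL_VIEW", "cview",
        "SELECT COUNT(*) FROM `default`.`customers`");
    System.out.println(getDrillTable(cview));
  }
}
{code}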

 

  was:
Currently hive views cannot be queried from drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later Drill parses the metadata and uses it to treat view names in SQL as a 
subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views I'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

 


> Allow querying hive views in drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: 

[jira] [Updated] (DRILL-540) Allow querying hive views in Drill

2018-11-14 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-540:
---
Summary: Allow querying hive views in Drill  (was: Allow querying hive 
views in drill)

> Allow querying hive views in Drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently Hive views cannot be queried from Drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
>  # Drill persists its view metadata in files with the suffix .view.drill, 
> using JSON format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later Drill parses the metadata and uses it to treat view names in SQL as a 
> subquery.
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS:
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                          |
> -------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
> 2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers`  |
> {noformat}
>       3. So in the Hive metastore, views are treated as tables of a special 
> type. The main benefit is that we also have the expanded SQL definition of 
> each view (just like in .view.drill files). Reading this metadata is already 
> implemented in Drill with the help of the Thrift Metastore API.
>       4. To enable querying of Hive views we'll reuse the existing code for 
> Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* 
> for _*HiveReadEntry*_, we'll convert the metadata to an instance of _*View*_ 
> (_which is actually the model for data persisted in .view.drill files_) and 
> then, based on this instance, return a new _*DrillViewTable*_. Using this 
> approach, Drill will handle Hive views the same way as if they were initially 
> defined in Drill and persisted in a .view.drill file. 
>      5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_, 
> we'll reuse existing code from _*DrillHiveTable*_; the conversion 
> functionality will be extracted and used for both table and view field type 
> conversions. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-540) Allow querying hive views in drill

2018-11-14 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-540:
---
Description: 
Currently hive views cannot be queried from drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later Drill parses the metadata and uses it to treat view names in SQL as a 
subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views I'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

 

  was:
Currently hive views cannot be queried from drill.

*Suggested approach*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later Drill parses the metadata and uses it to treat view names in SQL as a 
subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}

TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views I'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

 


> Allow querying hive views in drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently hive 

[jira] [Updated] (DRILL-540) Allow querying hive views in drill

2018-11-14 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-540:
---
Description: 
Currently hive views cannot be queried from drill.

*Suggested approach*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later Drill parses the metadata and uses it to treat view names in SQL as a 
subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}

TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views I'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

 

  was:
Currently hive views cannot be queried from drill.

*Suggested approach*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}{noformat}
        Later drill parses the metadata and uses it to treat view names in SQL 
as a subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views I'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

 


> Allow querying hive views in drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> *Suggested 

[jira] [Updated] (DRILL-540) Allow querying hive views in drill

2018-11-14 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-540:
---
Description: 
Currently hive views cannot be queried from drill.

*Suggested approach*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}{noformat}
        Later drill parses the metadata and uses it to treat view names in SQL 
as a subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views I'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

 

  was:
Currently hive views cannot be queried from drill.

*Suggested approach*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}{noformat}
        Later drill parses the metadata and uses it to treat view names in SQL 
as a subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |
{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views I'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ I'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
I'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

 


> Allow querying hive views in drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> *Suggested 

[jira] [Updated] (DRILL-6850) JDBC integration tests failures

2018-11-14 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6850:
---
Summary: JDBC integration tests failures  (was: JDBC integration tests)

> JDBC integration tests failures
> ---
>
> Key: DRILL-6850
> URL: https://issues.apache.org/jira/browse/DRILL-6850
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Priority: Major
> Fix For: 1.15.0
>
>
> The following command runs the Drill integration tests for RDBMS (Derby and 
> MySQL):
> _mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc_
> Currently some of the drill/exec/store/jdbc TestJdbcPluginWithDerbyIT and 
> TestJdbcPluginWithMySQLIT tests fail:
> {code}
> Results :
> Failed tests: 
>   TestJdbcPluginWithDerbyIT.showTablesDefaultSchema:117 expected:<1> but 
> was:<0>
> Tests in error: 
>   TestJdbcPluginWithDerbyIT.describe » UserRemote VALIDATION ERROR: Unknown 
> tabl...
>   
> TestJdbcPluginWithDerbyIT.pushdownDoubleJoinAndFilter:111->PlanTestBase.testPlanMatchingPatterns:84->PlanTestBase.testPlanMatchingPatterns:89->PlanTestBase.getPlanInString:369->BaseTestQuery.testSqlWithResults:322->BaseTestQuery.testRunAndReturn:341
>  » Rpc
>   TestJdbcPluginWithDerbyIT.testCrossSourceMultiFragmentJoin » UserRemote 
> VALIDA...
>   TestJdbcPluginWithDerbyIT.validateResult:71 »  at position 0 column 
> '`NUMERIC_...
>   TestJdbcPluginWithMySQLIT.validateResult:108 »  at position 0 column 
> '`numeric...
> Tests run: 14, Failures: 1, Errors: 5, Skipped: 0
> {code} 
> Most likely these are old regressions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-540) Allow querying hive views in drill

2018-11-14 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-540:
---
Description: 
Currently Hive views cannot be queried from Drill.

*Suggested approach*
 # Drill persists its view metadata in a file with the suffix .view.drill, using 
JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}{noformat}
        Later Drill parses this metadata and uses it to treat view names in SQL 
as a subquery.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS:

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                          |
-------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers`  |
{noformat}
      3. So in the Hive metastore, views are treated as tables of a special type. 
The main benefit is that we also get the expanded SQL definition of each view 
(just like in .view.drill files). Reading this metadata is already implemented 
in Drill with the help of the thrift Metastore API.

      4. To enable querying of Hive views I'll reuse the existing code for Drill 
views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, I'll 
convert the metadata of a _*HiveReadEntry*_ to an instance of _*View*_ (_which is 
actually the model for data persisted in .view.drill files_) and then, based on 
this instance, return a new _*DrillViewTable*_; a sketch follows this list. Using 
this approach Drill will handle Hive views the same way as if they had initially 
been defined in Drill and persisted in a .view.drill file. 

     5. For the conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
I'll reuse existing code from _*DrillHiveTable*_; the conversion functionality 
will be extracted and used for both table and view field type conversions. 
 

  was:
Currently Hive views cannot be queried from Drill.

*Suggested approach*
 # Drill persists its view metadata in a file with the suffix .view.drill, using 
JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}{noformat}
        Later Drill parses this metadata and uses it to treat view names in SQL 
as a subquery.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS:

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                          |
-------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers`  |
{noformat}
      3. So in the Hive metastore, views are treated as tables of a special type. 
The main benefit is that we also get the expanded SQL definition of each view 
(just like in .view.drill files). Reading this metadata is already implemented 
in Drill with the help of the thrift Metastore API.

      4. To enable querying of Hive views I'll reuse the existing code for Drill 
views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, I'll 
convert the metadata of a _*HiveReadEntry*_ to an instance of _*View*_ (_which is 
actually the model for data persisted in .view.drill files_) and then, based on 
this instance, return a new _*DrillViewTable*_. Using this approach Drill will 
handle Hive views the same way as if they had initially been defined in Drill and 
persisted in a .view.drill file. 

     5. For the conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
I'll reuse existing code from _*DrillHiveTable*_; the conversion functionality 
will be extracted and used for both table and view field type conversions. 

 


> Allow querying hive views in drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> *Suggested 

[jira] [Created] (DRILL-6850) JDBC integration tests

2018-11-14 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6850:
--

 Summary: JDBC integration tests
 Key: DRILL-6850
 URL: https://issues.apache.org/jira/browse/DRILL-6850
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JDBC
Affects Versions: 1.14.0
Reporter: Vitalii Diravka
 Fix For: 1.15.0


The following command will run Drill integration tests for RDBMS (Derby and 
MySQL):
_mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc_

Currently some drill/exec/store/jdbc TestJdbcPluginWithDerbyIT and 
TestJdbcPluginWithMySQLIT tests fail:
{code}
Results :

Failed tests: 
  TestJdbcPluginWithDerbyIT.showTablesDefaultSchema:117 expected:<1> but was:<0>
Tests in error: 
  TestJdbcPluginWithDerbyIT.describe » UserRemote VALIDATION ERROR: Unknown 
tabl...
  
TestJdbcPluginWithDerbyIT.pushdownDoubleJoinAndFilter:111->PlanTestBase.testPlanMatchingPatterns:84->PlanTestBase.testPlanMatchingPatterns:89->PlanTestBase.getPlanInString:369->BaseTestQuery.testSqlWithResults:322->BaseTestQuery.testRunAndReturn:341
 » Rpc
  TestJdbcPluginWithDerbyIT.testCrossSourceMultiFragmentJoin » UserRemote 
VALIDA...
  TestJdbcPluginWithDerbyIT.validateResult:71 »  at position 0 column 
'`NUMERIC_...
  TestJdbcPluginWithMySQLIT.validateResult:108 »  at position 0 column 
'`numeric...

Tests run: 14, Failures: 1, Errors: 5, Skipped: 0
{code} 

Most likely these are old regressions.
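
As a convenience when chasing an individual failure, failsafe can also narrow the 
run to a single test class (a sketch assuming the standard maven-failsafe-plugin 
{{it.test}} parameter; not verified against Drill's pom):
{noformat}
mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc -Dit.test=TestJdbcPluginWithDerbyIT
{noformat}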



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-540) Allow querying hive views in drill

2018-11-14 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-540:
---
Description: 
Currently Hive views cannot be queried from Drill.

*Suggested approach*
 # Drill persists its view metadata in a file with the suffix .view.drill, using 
JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}{noformat}
        Later Drill parses this metadata and uses it to treat view names in SQL 
as a subquery.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS:

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                          |
-------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers`  |
{noformat}
      3. So in the Hive metastore, views are treated as tables of a special type. 
The main benefit is that we also get the expanded SQL definition of each view 
(just like in .view.drill files). Reading this metadata is already implemented 
in Drill with the help of the thrift Metastore API.

      4. To enable querying of Hive views I'll reuse the existing code for Drill 
views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, I'll 
convert the metadata of a _*HiveReadEntry*_ to an instance of _*View*_ (_which is 
actually the model for data persisted in .view.drill files_) and then, based on 
this instance, return a new _*DrillViewTable*_. Using this approach Drill will 
handle Hive views the same way as if they had initially been defined in Drill and 
persisted in a .view.drill file. 

     5. For the conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
I'll reuse existing code from _*DrillHiveTable*_; the conversion functionality 
will be extracted and used for both table and view field type conversions. 

 

  was:
Currently Hive views cannot be queried from Drill.



> Allow querying hive views in drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently Hive views cannot be queried from Drill.
> *Suggested approach*
>  # Drill persists its view metadata in a file with the suffix .view.drill, 
> using JSON format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }{noformat}
>         Later Drill parses this metadata and uses it to treat view names in 
> SQL as a subquery.
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS:
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                          |
> -------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
> 2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers`  |
> {noformat}
>       3. So in the Hive metastore, views are treated as tables of a special 
> type. The main benefit is that we also get the expanded SQL definition of each 
> view (just like in .view.drill files). Reading this metadata is already 
> implemented in Drill with the help of the thrift Metastore API.
>       4. To enable querying of Hive views I'll reuse the existing code for 
> Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, 
> I'll convert the metadata of a _*HiveReadEntry*_ to an instance of _*View*_ 
> (_which is actually the model for data persisted in .view.drill files_) and 
> then, based on this instance, return a new _*DrillViewTable*_. Using this 
> approach Drill will handle Hive views the same way as if they had initially 
> been defined in Drill and persisted in a .view.drill file. 
>      5. For the conversion of Hive types from _*FieldSchema*_ to 
> _*RelDataType*_ I'll reuse existing code from _*DrillHiveTable*_; the conversion 
> functionality will be extracted and used for both 

[jira] [Updated] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6744:
---
Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Support filter push down for varchar / decimal data types
> -
>
> Key: DRILL-6744
> URL: https://issues.apache.org/jira/browse/DRILL-6744
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.15.0
>
>
> Since Drill now uses Apache Parquet 1.10.0, where the issue with incorrectly 
> stored varchar / decimal min / max statistics is resolved, we should add 
> support for varchar / decimal filter push down. Only files created with 
> parquet lib 1.9.1 (1.10.0) and later will be subject to push down. In cases 
> where the user knows that previously created files have correct min / max 
> statistics (i.e. the user knows for certain that the data in binary columns is 
> ASCII, not UTF-8), parquet.strings.signed-min-max.enabled can be set to true 
> to enable filter push down.
> *Description*
> _Note: Drill has been using the Parquet 1.10.0 library since version 1.13.0._
> *Varchar Partition Pruning*
> Varchar pruning will work for files generated both prior to and after Parquet 
> 1.10.0, since to enable partition pruning both min and max values must be the 
> same, and there are no issues with incorrectly stored statistics for binary 
> data when the min and max values are equal. Partition pruning using Drill 
> metadata files will also work, no matter when the metadata file was created 
> (prior to or after Drill 1.15.0).
> Partition pruning won't work for files where the partition is null due to 
> PARQUET-1341; the issue will be fixed in Parquet 1.11.0.
> *Varchar Filter Push Down*
> Varchar filter push down will work for parquet files created with Parquet 
> 1.10.0 and later.
> There are two options for enabling push down for files generated with earlier 
> Parquet versions, when the user knows for certain that the binary data is 
> ASCII (not UTF-8):
> 1. Set configuration {{enableStringsSignedMinMax}} to true (false by default) 
> for the parquet format plugin: 
> {noformat}
> "parquet" : {
>   type: "parquet",
>   enableStringsSignedMinMax: true 
> }
> {noformat}
> This applies to all parquet files of a given file plugin, including all 
> workspaces.
> 2. If the user wants to enable / disable reading binary statistics for old 
> parquet files per session, the session option 
> {{store.parquet.reader.strings_signed_min_max}} can be used. By default it has 
> an empty string value. Setting this option takes priority over the config in 
> the parquet format plugin. The option allows three values: 'true', 'false', '' 
> (empty string).
> _Note: store.parquet.reader.strings_signed_min_max can also be set at the 
> system level, in which case it applies to all parquet files in the system._
> The same config / session option applies to reading binary statistics from 
> Drill metadata files generated prior to Drill 1.15.0. If a Drill metadata file 
> was created prior to Drill 1.15.0, but for parquet files created with Parquet 
> library 1.10.0 or later, the user has to enable the config / session option or 
> regenerate the Drill metadata file with Drill 1.15.0 or later, because from 
> the metadata file we don't know whether the statistics are stored correctly 
> (prior versions of Drill were reading and writing binary statistics by 
> default, though they did not use them).
> When creating a Drill metadata file with Drill 1.15.0 or later for old parquet 
> files, the user should mind the config / session option. If 
> strings_signed_min_max is enabled, Drill will store binary statistics in the 
> Drill metadata file, and since the metadata file was created with Drill 1.15.0 
> or later, Drill will read it back disregarding the option (assuming that if 
> statistics are present in the Drill metadata file, they are correct). If the 
> user mistakenly enabled strings_signed_min_max, they need to disable it and 
> regenerate the Drill metadata file. The same applies in the opposite 
> direction: if the user created the metadata file while strings_signed_min_max 
> was disabled, no min / max values for binary statistics will be written, and 
> thus none will be read back, even if strings_signed_min_max is enabled when 
> reading the metadata.
> *Decimal Partition Pruning*
> Decimal values can be represented by four logical types: int_32, int_64, 
> fixed_len_byte_array and binary.
> Partition pruning will work for all logical types for old and new decimal 
> files, i.e. those created with Parquet 1.10.0, prior and after. Partition 
> pruning won't work for files with a null partition due to PARQUET-1341, which 
> will be fixed in Parquet 1.11.0.
> 
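
For illustration, option 2 in SQL (a minimal sketch assuming Drill's standard 
ALTER SESSION syntax; the option name and its three allowed values are taken 
from the description above):
{noformat}
-- prefer binary min / max statistics from old parquet files for this session
ALTER SESSION SET `store.parquet.reader.strings_signed_min_max` = 'true';
-- back to the default: empty string defers to the format plugin config
ALTER SESSION SET `store.parquet.reader.strings_signed_min_max` = '';
{noformat}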

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686540#comment-16686540
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233458212
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestSchemaSmoothing.java
 ##
 @@ -0,0 +1,681 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.impl.protocol.SchemaTracker;
+import org.apache.drill.exec.physical.impl.scan.ScanTestUtils;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import 
org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple.ResolvedRow;
+import org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection;
+import org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator.ReaderSchemaOrchestrator;
+import org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.physical.impl.scan.project.SmoothingProjection;
+import 
org.apache.drill.exec.physical.impl.scan.project.WildcardSchemaProjection;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.impl.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+/**
+ * Tests schema smoothing at the schema projection level.
+ * This level handles reusing prior types when filling null
+ * values. But, because no actual vectors are involved, it
+ * does not handle the schema chosen for a table ahead of
+ * time, only the schema as it is merged with prior schema to
+ * detect missing columns.
+ * 
+ * Focuses on the SmoothingProjection class itself.
+ * 
+ * Note that, at present, schema smoothing does not work for entire
+ * maps. That is, if file 1 has, say {a: {b: 10, c: "foo"}}
+ * and file 2 has, say, {a: null}, then schema smoothing does
+ * not currently know how to recreate the map. The same is true of
+ * lists and unions. Handling such cases is complex and is probably
+ * better handled via a system that allows the user to specify their
+ * intent by providing a schema to apply to the two files.
+ */
+
+public class TestSchemaSmoothing extends SubOperatorTest {
+
+  /**
+   * Low-level test of the smoothing projection, including the exceptions
+   * it throws when things are not going its way.
+   */
+
+  @Test
+  public void testSmoothingProjection() {
+final ScanLevelProjection scanProj = new ScanLevelProjection(
+RowSetTestUtils.projectAll(),
+ScanTestUtils.parsers());
+
+// Table 1: (a: nullable bigint, b)
+
+final TupleMetadata schema1 = new SchemaBuilder()
+.addNullable("a", MinorType.BIGINT)
+.addNullable("b", MinorType.VARCHAR)
+.add("c", MinorType.FLOAT8)
+.buildSchema();
+ResolvedRow priorSchema;
+{
+  final NullColumnBuilder builder = new NullColumnBuilder(null, false);
+  final ResolvedRow rootTuple = new ResolvedRow(builder);
+  new WildcardSchemaProjection(
+  scanProj, schema1, rootTuple,
+  ScanTestUtils.resolvers());
+  priorSchema = rootTuple;
+}
+
+// Table 2: (a: 

[jira] [Commented] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686559#comment-16686559
 ] 

ASF GitHub Bot commented on DRILL-6744:
---

arina-ielchiieva commented on issue #1537: DRILL-6744: Support varchar and 
decimal push down
URL: https://github.com/apache/drill/pull/1537#issuecomment-438675122
 
 
   @vvysotskyi addressed code review comments:
   1. Used VersionUtil from Hadoop lib.
   2. Made ParquetReaderConfig immutable.
   3. Added trace logging instead of removing the default in the switch.
   4. Made other changes as requested.
   
   Thanks for the code review.
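
For context on point 1, a minimal sketch of the Hadoop helper (assuming the 
standard {{org.apache.hadoop.util.VersionUtil}} API; {{createdByVersion}} is a 
hypothetical string extracted from the parquet footer):

{code}
import org.apache.hadoop.util.VersionUtil;

// true when the file was written by parquet lib 1.10.0 or later,
// i.e. its binary min / max statistics can be trusted
boolean statsTrustworthy =
    VersionUtil.compareVersions(createdByVersion, "1.10.0") >= 0;
{code}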


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support filter push down for varchar / decimal data types
> -
>
> Key: DRILL-6744
> URL: https://issues.apache.org/jira/browse/DRILL-6744
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> Since Drill now uses Apache Parquet 1.10.0, where the issue with incorrectly 
> stored varchar / decimal min / max statistics is resolved, we should add 
> support for varchar / decimal filter push down. Only files created with 
> parquet lib 1.9.1 (1.10.0) and later will be subject to push down. In cases 
> where the user knows that previously created files have correct min / max 
> statistics (i.e. the user knows for certain that the data in binary columns is 
> ASCII, not UTF-8), parquet.strings.signed-min-max.enabled can be set to true 
> to enable filter push down.
> *Description*
> _Note: Drill has been using the Parquet 1.10.0 library since version 1.13.0._
> *Varchar Partition Pruning*
> Varchar pruning will work for files generated both prior to and after Parquet 
> 1.10.0, since to enable partition pruning both min and max values must be the 
> same, and there are no issues with incorrectly stored statistics for binary 
> data when the min and max values are equal. Partition pruning using Drill 
> metadata files will also work, no matter when the metadata file was created 
> (prior to or after Drill 1.15.0).
> Partition pruning won't work for files where the partition is null due to 
> PARQUET-1341; the issue will be fixed in Parquet 1.11.0.
> *Varchar Filter Push Down*
> Varchar filter push down will work for parquet files created with Parquet 
> 1.10.0 and later.
> There are two options for enabling push down for files generated with earlier 
> Parquet versions, when the user knows for certain that the binary data is 
> ASCII (not UTF-8):
> 1. Set configuration {{enableStringsSignedMinMax}} to true (false by default) 
> for the parquet format plugin: 
> {noformat}
> "parquet" : {
>   type: "parquet",
>   enableStringsSignedMinMax: true 
> }
> {noformat}
> This applies to all parquet files of a given file plugin, including all 
> workspaces.
> 2. If the user wants to enable / disable reading binary statistics for old 
> parquet files per session, the session option 
> {{store.parquet.reader.strings_signed_min_max}} can be used. By default it has 
> an empty string value. Setting this option takes priority over the config in 
> the parquet format plugin. The option allows three values: 'true', 'false', '' 
> (empty string).
> _Note: store.parquet.reader.strings_signed_min_max can also be set at the 
> system level, in which case it applies to all parquet files in the system._
> The same config / session option applies to reading binary statistics from 
> Drill metadata files generated prior to Drill 1.15.0. If a Drill metadata file 
> was created prior to Drill 1.15.0, but for parquet files created with Parquet 
> library 1.10.0 or later, the user has to enable the config / session option or 
> regenerate the Drill metadata file with Drill 1.15.0 or later, because from 
> the metadata file we don't know whether the statistics are stored correctly 
> (prior versions of Drill were reading and writing binary statistics by 
> default, though they did not use them).
> When creating a Drill metadata file with Drill 1.15.0 or later for old parquet 
> files, the user should mind the config / session option. If 
> strings_signed_min_max is enabled, Drill will store binary statistics in the 
> Drill metadata file, and since the metadata file was created with Drill 1.15.0 
> or later, Drill will read it back disregarding the option (assuming that if 
> statistics are present in the Drill metadata file, they are correct). If the 
> user mistakenly enabled strings_signed_min_max, they need to disable it and 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686537#comment-16686537
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454765
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+public class ScanTestUtils {
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
+   */
+
+  public static List<ScanProjectionParser> parsers(ScanProjectionParser... parsers) {
+    return ImmutableList.copyOf(parsers);
 
 Review comment:
   Consider using java built-in utils rather than guava: here and below.
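
For reference, a minimal sketch of what the built-in alternative could look like 
(assuming the module still targets Java 8, where {{List.of(...)}} is unavailable):

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public static List<ScanProjectionParser> parsers(ScanProjectionParser... parsers) {
  // Arrays.asList returns a fixed-size view; the unmodifiable wrapper keeps
  // callers from mutating it, matching the intent of ImmutableList.copyOf.
  return Collections.unmodifiableList(Arrays.asList(parsers));
}
{code}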


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projecting a column "x" that does not exist in 
> the base table).
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handles schema persistence: the need to reuse the same vectors across 
> different scanners.
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.
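
To make the projection flavors concrete, a minimal sketch built from the test 
code quoted elsewhere in this digest ({{RowSetTestUtils.projectAll()}} and 
{{ScanTestUtils.parsers()}} appear there; {{projectList(...)}} is assumed by 
analogy and may not be the real helper name):

{code}
// SELECT * : wildcard projection
ScanLevelProjection wildcardProj = new ScanLevelProjection(
    RowSetTestUtils.projectAll(),
    ScanTestUtils.parsers());

// SELECT a, b, x : explicit projection; "x" may resolve to a null column
ScanLevelProjection explicitProj = new ScanLevelProjection(
    RowSetTestUtils.projectList("a", "b", "x"),   // assumed helper
    ScanTestUtils.parsers());
{code}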



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686536#comment-16686536
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233453077
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedTuple.java
 ##
 @@ -0,0 +1,427 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.AbstractMapVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+
+import 
org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+
+/**
+ * Drill rows are made up of a tree of tuples, with the row being the root
+ * tuple. Each tuple contains columns, some of which may be maps. This
+ * class represents each row or map in the output projection.
+ * 
+ * Output columns within the tuple can be projected from the data source,
+ * might be null (requested columns that don't match a data source column)
+ * or might be a constant (such as an implicit column.) This class
+ * orchestrates assembling an output tuple from a collection of these
+ * three column types. (Though implicit columns appear only in the root
+ * tuple.)
+ *
+ * Null Handling
+ *
+ * The project list might reference a "missing" map if the project list
+ * includes, say, SELECT a.b.c but `a` does not exist
+ * in the data source. In this case, the column a is implied to be a map,
+ * so the projection mechanism will create a null map for `a`
+ * and `b`, and will create a null column for `c`.
+ * 
+ * To accomplish this recursive null processing, each tuple is associated
+ * with a null builder. (The null builder can be null if projection is
+ * implicit with a wildcard; in such a case no null columns can occur.
+ * But, even here, with schema persistence, a SELECT * query
+ * may need null columns if a second file does not contain a column
+ * that appeared in a first file.)
+ * 
+ * The null builder is bound to each tuple to allow vector persistence
+ * via the result vector cache. If we must create a null column
+ * `x` in two different readers, then the rules of Drill
+ * require that the same vector be used for both (or else a schema
+ * change is signaled.) The vector cache works by name (and type).
+ * Since maps may contain columns with the same names as other maps,
+ * the vector cache must be associated with each tuple. And, by extension,
+ * the null builder must also be associated with each tuple.
+ *
+ * Lifecycle
+ *
+ * The lifecycle of a resolved tuple is:
+ * 
+ * The projection mechanism creates the output tuple, and its columns,
+ * by comparing the project list against the table schema. The result is
+ * a set of table, null, or constant columns.
+ * Once per schema change, the resolved tuple creates the output
+ * tuple by linking to vectors in their original locations. As it turns out,
+ * we can simply share the vectors; we don't need to transfer the buffers.
+ * To prepare for the transfer, the tuple asks the null column builder
+ * (if present) to build the required null columns.
+ * Once the output tuple is built, it can be used for any number of
+ * batches without further work. (The same vectors appear in the various inputs
+ * and the output, eliminating the need for any transfers.)
+ * Once per batch, the client must set the row count. This is needed for 
the
+ * output container, and for any "null" maps 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686539#comment-16686539
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233455585
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestNullColumnLoader.java
 ##
 @@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertSame;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnLoader;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedNullColumn;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.physical.rowSet.impl.NullResultVectorCacheImpl;
+import org.apache.drill.exec.physical.rowSet.impl.ResultVectorCacheImpl;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.Test;
+
+/**
+ * Test the mechanism that handles all-null columns during projection.
+ * An all-null column is one projected in the query, but which does
+ * not actually exist in the underlying data source (or input
+ * operator.)
+ * 
+ * In anticipation of having type information, this mechanism
+ * can create the classic nullable Int null column, or one of
+ * any other type and mode.
+ */
+
+public class TestNullColumnLoader extends SubOperatorTest {
+
+  private ResolvedNullColumn makeNullCol(String name, MajorType nullType) {
+
+// For this test, we don't need the projection, so just
+// set it to null.
+
+return new ResolvedNullColumn(name, nullType, null, 0);
+  }
+
+  private ResolvedNullColumn makeNullCol(String name) {
+return makeNullCol(name, null);
+  }
+
+  /**
+   * Test the simplest case: default null type, nothing in the vector
+   * cache. Specify no column type, the special NULL type, or a
+   * predefined type. Output types should be set accordingly.
+   */
+
+  @Test
+  public void testBasics() {
+
+final List<ResolvedNullColumn> defns = new ArrayList<>();
+defns.add(makeNullCol("unspecified", null));
+defns.add(makeNullCol("nullType", Types.optional(MinorType.NULL)));
+defns.add(makeNullCol("specifiedOpt", Types.optional(MinorType.VARCHAR)));
+defns.add(makeNullCol("specifiedReq", Types.required(MinorType.VARCHAR)));
+defns.add(makeNullCol("specifiedArray", 
Types.repeated(MinorType.VARCHAR)));
+
+final ResultVectorCache cache = new 
NullResultVectorCacheImpl(fixture.allocator());
+final NullColumnLoader staticLoader = new NullColumnLoader(cache, defns, 
null, false);
+
+// Create a batch
+
+final VectorContainer output = staticLoader.load(2);
+
+// Verify values and types
+
+final BatchSchema expectedSchema = new SchemaBuilder()
+.add("unspecified", NullColumnLoader.DEFAULT_NULL_TYPE)
+.add("nullType", NullColumnLoader.DEFAULT_NULL_TYPE)
+.addNullable("specifiedOpt", MinorType.VARCHAR)
+.addNullable("specifiedReq", MinorType.VARCHAR)
+.addArray("specifiedArray", MinorType.VARCHAR)
+.build();
+final SingleRowSet expected = fixture.rowSetBuilder(expectedSchema)
+.addRow(null, null, null, null, new String[] {})
+.addRow(null, null, null, null, new String[] {})
+.build();
+
+new 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686543#comment-16686543
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233452637
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedTuple.java
 ##
 @@ -0,0 +1,427 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.AbstractMapVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+
+import 
org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+
+/**
+ * Drill rows are made up of a tree of tuples, with the row being the root
+ * tuple. Each tuple contains columns, some of which may be maps. This
+ * class represents each row or map in the output projection.
+ * 
+ * Output columns within the tuple can be projected from the data source,
+ * might be null (requested columns that don't match a data source column)
+ * or might be a constant (such as an implicit column.) This class
+ * orchestrates assembling an output tuple from a collection of these
+ * three column types. (Though implicit columns appear only in the root
+ * tuple.)
+ *
+ * Null Handling
+ *
+ * The project list might reference a "missing" map if the project list
+ * includes, say, SELECT a.b.c but `a` does not exist
+ * in the data source. In this case, the column a is implied to be a map,
+ * so the projection mechanism will create a null map for `a`
+ * and `b`, and will create a null column for `c`.
+ * 
+ * To accomplish this recursive null processing, each tuple is associated
+ * with a null builder. (The null builder can be null if projection is
+ * implicit with a wildcard; in such a case no null columns can occur.
+ * But, even here, with schema persistence, a SELECT * query
+ * may need null columns if a second file does not contain a column
+ * that appeared in a first file.)
+ * 
+ * The null builder is bound to each tuple to allow vector persistence
+ * via the result vector cache. If we must create a null column
+ * `x` in two different readers, then the rules of Drill
+ * require that the same vector be used for both (or else a schema
+ * change is signaled.) The vector cache works by name (and type).
+ * Since maps may contain columns with the same names as other maps,
+ * the vector cache must be associated with each tuple. And, by extension,
+ * the null builder must also be associated with each tuple.
+ *
+ * Lifecycle
+ *
+ * The lifecycle of a resolved tuple is:
+ * 
+ * The projection mechanism creates the output tuple, and its columns,
+ * by comparing the project list against the table schema. The result is
+ * a set of table, null, or constant columns.
+ * Once per schema change, the resolved tuple creates the output
+ * tuple by linking to vectors in their original locations. As it turns out,
+ * we can simply share the vectors; we don't need to transfer the buffers.
+ * To prepare for the transfer, the tuple asks the null column builder
+ * (if present) to build the required null columns.
+ * Once the output tuple is built, it can be used for any number of
+ * batches without further work. (The same vectors appear in the various inputs
+ * and the output, eliminating the need for any transfers.)
+ * Once per batch, the client must set the row count. This is needed for 
the
+ * output container, and for any "null" maps 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686542#comment-16686542
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233451408
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedColumn.java
 ##
 @@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * A resolved column has a name, and a specification for how to project
+ * data from a source vector to a vector in the final output container.
+ * Describes the projection of a single column from
+ * an input to an output batch.
+ * 
+ * Although the table schema mechanism uses the newer "metadata"
+ * mechanism, resolved columns revert back to the original
+ * {@link MajorType} and {@link MaterializedField} mechanism used
+ * by the rest of Drill. Doing so loses a bit of additional
+ * information, but at present there is no way to export that information
+ * along with a serialized record batch; each operator must rediscover
+ * it after deserialization.
+ */
+
+public abstract class ResolvedColumn implements ColumnProjection {
+
+  public final VectorSource source;
+  public final int sourceIndex;
+
+  public ResolvedColumn(VectorSource source, int sourceIndex) {
+this.source = source;
+this.sourceIndex = sourceIndex;
+  }
+
+  public VectorSource source() { return source; }
+
+  public int sourceIndex() { return sourceIndex; }
+
+  /**
+   * Return the type of this column. Used primarily by the schema smoothing
+   * mechanism.
+   *
+   * @return
 
 Review comment:
   Move description here to avoid warning.
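
A sketch of the requested fix (assumption: moving the sentence onto the 
{{@return}} tag is what silences the javadoc warning):

{code}
  /**
   * Return the type of this column. Used primarily by the schema smoothing
   * mechanism.
   *
   * @return the type of this column, used primarily by the schema
   *         smoothing mechanism
   */
{code}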


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projecting a column "x" that does not exist in 
> the base table).
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handles schema persistence: the need to reuse the same vectors across 
> different scanners.
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686535#comment-16686535
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454585
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+public class ScanTestUtils {
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
 
 Review comment:
   add description


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projecting a column "x" that does not exist in 
> the base table).
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handles schema persistence: the need to reuse the same vectors across 
> different scanners.
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686544#comment-16686544
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454002
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/SmoothingProjection.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Resolve a table schema against the prior schema. This works only if the
+ * types match and if all columns in the table schema already appear in the
+ * prior schema.
+ * 
+ * Consider this an experimental mechanism. The hope was that, with clever
+ * techniques, we could "smooth over" some of the issues that cause schema
+ * change events in Drill. As it turned out, however, creating this mechanism
+ * revealed that it is not possible, even in theory, to handle most schema
+ * changes because of the time dimension:
+ * 
+ * An event in a later batch may provide information that would have
+ * caused us to make a different decision in an earlier batch. For example,
+ * we are asked for column `foo`, did not see such a column in the first
+ * batch, block or file, guessed some type, and later saw that the column
+ * was of a different type. We can't "time travel" to tell our earlier
+ * selves, nor, when we make the initial type decision, can we jump to
+ * the future to see what type we'll discover.
+ * Readers in this fragment may see column `foo` but readers in
+ * another fragment read files/blocks that don't have that column. The
+ * two readers cannot communicate to agree on a type.
+ * 
+ * 
+ * What this mechanism can do is make decisions based on history: when a
+ * column appears, we can adjust its type a bit to try to avoid an
+ * unnecessary change. For example, if a prior file in this scan saw
+ * `foo` as nullable Varchar, but the present file has the column as
+ * required Varchar, we can use the more general nullable form. But,
+ * again, the "can't predict the future" bites us: we can handle a
+ * nullable-to-required column change, but not vice versa.
+ * 
+ * What this mechanism will tell the careful reader is that the only
+ * general solution to the schema-change problem is to know the full
+ * schema up front: for the planner to be told the schema and to
+ * communicate that schema to all readers so that all readers agree
+ * on the final schema.
+ * 
+ * When that is done, the techniques shown here can be used to adjust
+ * any per-file variation of schema to match the up-front schema.
+ */
+
+public class SmoothingProjection extends SchemaLevelProjection {
+
+  protected final List rewrittenFields = new ArrayList<>();
+
+  public SmoothingProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple priorSchema,
+  ResolvedTuple outputTuple,
+  List resolvers) throws 
IncompatibleSchemaException {
+
+super(resolvers);
+
+for (ResolvedColumn priorCol : priorSchema.columns()) {
+  switch (priorCol.nodeType()) {
+  case ResolvedTableColumn.ID:
 
 Review comment:
   indent


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686541#comment-16686541
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233458346
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestSchemaSmoothing.java
 ##
 @@ -0,0 +1,681 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.impl.protocol.SchemaTracker;
+import org.apache.drill.exec.physical.impl.scan.ScanTestUtils;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import 
org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple.ResolvedRow;
+import org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection;
+import org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator.ReaderSchemaOrchestrator;
+import org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.physical.impl.scan.project.SmoothingProjection;
+import 
org.apache.drill.exec.physical.impl.scan.project.WildcardSchemaProjection;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.impl.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+/**
+ * Tests schema smoothing at the schema projection level.
+ * This level handles reusing prior types when filling null
+ * values. But, because no actual vectors are involved, it
+ * does not handle the schema chosen for a table ahead of
+ * time, only the schema as it is merged with prior schema to
+ * detect missing columns.
+ * 
+ * Focuses on the SmoothingProjection class itself.
+ * 
+ * Note that, at present, schema smoothing does not work for entire
+ * maps. That is, if file 1 has, say {a: {b: 10, c: "foo"}}
+ * and file 2 has, say, {a: null}, then schema smoothing does
+ * not currently know how to recreate the map. The same is true of
+ * lists and unions. Handling such cases is complex and is probably
+ * better handled via a system that allows the user to specify their
+ * intent by providing a schema to apply to the two files.
+ */
+
+public class TestSchemaSmoothing extends SubOperatorTest {
+
+  /**
+   * Low-level test of the smoothing projection, including the exceptions
+   * it throws when things are not going its way.
+   */
+
+  @Test
+  public void testSmoothingProjection() {
+final ScanLevelProjection scanProj = new ScanLevelProjection(
+RowSetTestUtils.projectAll(),
+ScanTestUtils.parsers());
+
+// Table 1: (a: nullable bigint, b)
+
+final TupleMetadata schema1 = new SchemaBuilder()
+.addNullable("a", MinorType.BIGINT)
+.addNullable("b", MinorType.VARCHAR)
+.add("c", MinorType.FLOAT8)
+.buildSchema();
+ResolvedRow priorSchema;
+{
+  final NullColumnBuilder builder = new NullColumnBuilder(null, false);
+  final ResolvedRow rootTuple = new ResolvedRow(builder);
+  new WildcardSchemaProjection(
+  scanProj, schema1, rootTuple,
+  ScanTestUtils.resolvers());
+  priorSchema = rootTuple;
+}
+
+// Table 2: (a: 
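
As a hedged sketch only (not the original test body), the smoothing step for the
second table might continue along these lines, reusing the SmoothingProjection
constructor quoted later in this thread; the required-mode variant of column `a`
is an assumption:

```java
// Hypothetical continuation: table 2 narrows `a` from nullable to required.
final TupleMetadata schema2 = new SchemaBuilder()
    .add("a", MinorType.BIGINT)          // required here, nullable in table 1
    .addNullable("b", MinorType.VARCHAR)
    .add("c", MinorType.FLOAT8)
    .buildSchema();
final NullColumnBuilder builder = new NullColumnBuilder(null, false);
final ResolvedRow rootTuple = new ResolvedRow(builder);
try {
  // Required-to-nullable widening is allowed, so this should not throw.
  new SmoothingProjection(scanProj, schema2, priorSchema, rootTuple,
      ScanTestUtils.resolvers());
} catch (final IncompatibleSchemaException e) {
  fail();
}
```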

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686534#comment-16686534
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233450250
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/MetadataManager.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+
+/**
+ * Queries can contain a wildcard (*), table columns, or special
+ * system-defined columns (the file metadata columns AKA implicit
+ * columns, the `columns` column of CSV, etc.).
+ * 
+ * This class provides a generalized way of handling such extended
+ * columns. That is, this handles metadata for columns defined by
+ * the scan or file; columns defined by the table (the actual
+ * data metadata) are handled elsewhere.
+ * 
+ * Objects of this interface are driven by the projection processing
+ * framework which provides a vector cache from which to obtain
+ * materialized columns. The implementation must provide a projection
+ * parser to pick out the columns which this object handles.
+ * 
+ * A better name might be ImplicitMetadataManager to signify that
 
 Review comment:
  Agreed, let's rename to avoid confusion in the future.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.
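
A minimal sketch of the projection classification the description above implies;
the class, enum, and method names here are illustrative only, not Drill's API:

```java
import java.util.List;

class ProjectionKind {
  enum Type { NONE, EXPLICIT, WILDCARD }

  static Type classify(List<String> projectList) {
    if (projectList.isEmpty()) {
      return Type.NONE;       // SELECT COUNT(*): no columns materialized
    }
    if (projectList.contains("*")) {
      return Type.WILDCARD;   // SELECT *: emit the table schema as-is
    }
    return Type.EXPLICIT;     // SELECT a, b, x: match by name, null-fill misses
  }
}
```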



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686538#comment-16686538
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454141
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/SmoothingProjection.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Resolve a table schema against the prior schema. This works only if the
+ * types match and if all columns in the table schema already appear in the
+ * prior schema.
+ * 
+ * Consider this an experimental mechanism. The hope was that, with clever
+ * techniques, we could "smooth over" some of the issues that cause schema
+ * change events in Drill. As it turned out, however, creating this mechanism
+ * revealed that it is not possible, even in theory, to handle most schema
+ * changes because of the time dimension:
+ * 
+ * An event in a later batch may provide information that would have
+ * caused us to make a different decision in an earlier batch. For example,
+ * we are asked for column `foo`, did not see such a column in the first
+ * batch, block or file, guessed some type, and later saw that the column
+ * was of a different type. We can't "time travel" to tell our earlier
+ * selves, nor, when we make the initial type decision, can we jump to
+ * the future to see what type we'll discover.
+ * Readers in this fragment may see column `foo` but readers in
+ * another fragment read files/blocks that don't have that column. The
+ * two readers cannot communicate to agree on a type.
+ * 
+ * 
+ * What this mechanism can do is make decisions based on history: when a
+ * column appears, we can adjust its type a bit to try to avoid an
+ * unnecessary change. For example, if a prior file in this scan saw
+ * `foo` as nullable Varchar, but the present file has the column as
+ * required Varchar, we can use the more general nullable form. But,
+ * again, the "can't predict the future" bites us: we can handle a
+ * nullable-to-required column change, but not vice versa.
+ * 
+ * What this mechanism will tell the careful reader is that the only
+ * general solution to the schema-change problem is to know the full
+ * schema up front: for the planner to be told the schema and to
+ * communicate that schema to all readers so that all readers agree
+ * on the final schema.
+ * 
+ * When that is done, the techniques shown here can be used to adjust
+ * any per-file variation of schema to match the up-front schema.
+ */
+
+public class SmoothingProjection extends SchemaLevelProjection {
+
+  protected final List<MaterializedField> rewrittenFields = new ArrayList<>();
+
+  public SmoothingProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple priorSchema,
+  ResolvedTuple outputTuple,
+  List<SchemaProjectionResolver> resolvers) throws 
IncompatibleSchemaException {
+
+super(resolvers);
+
+for (ResolvedColumn priorCol : priorSchema.columns()) {
+  switch (priorCol.nodeType()) {
+  case ResolvedTableColumn.ID:
+  case ResolvedNullColumn.ID:
+// TODO: To fix this, the null column loader must declare
 
 Review comment:
   Please explain this todo


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> 
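
A compact sketch of the "widen, never narrow" rule the Javadoc above describes,
assuming the MaterializedField and DataMode types the class already imports;
this is illustrative, not the actual SmoothingProjection logic:

```java
// Smoothing succeeds only when minor types match and any mode change widens
// a required incoming column into a prior nullable column, never the reverse.
static boolean canSmooth(MaterializedField prior, MaterializedField revised) {
  if (prior.getType().getMinorType() != revised.getType().getMinorType()) {
    return false;                      // minor type changed: cannot smooth
  }
  final DataMode priorMode = prior.getDataMode();
  final DataMode revisedMode = revised.getDataMode();
  return priorMode == revisedMode
      || (priorMode == DataMode.OPTIONAL && revisedMode == DataMode.REQUIRED);
}
```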

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686533#comment-16686533
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233449683
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ExplicitSchemaProjection.java
 ##
 @@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import 
org.apache.drill.exec.physical.rowSet.project.RequestedTuple.RequestedColumn;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Perform a schema projection for the case of an explicit list of
+ * projected columns. Example: SELECT a, b, c.
+ * 
+ * An explicit projection starts with the requested set of columns,
+ * then looks in the table schema to find matches. That is, it is
+ * driven by the query itself.
+ * 
+ * An explicit projection may include columns that do not exist in
+ * the source schema. In this case, we fill in null columns for
+ * unmatched projections.
+ */
+
+public class ExplicitSchemaProjection extends SchemaLevelProjection {
+
+  public ExplicitSchemaProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple rootTuple,
+  List<SchemaProjectionResolver> resolvers) {
+super(resolvers);
+resolveRootTuple(scanProj, rootTuple, tableSchema);
+  }
+
+  private void resolveRootTuple(ScanLevelProjection scanProj,
+  ResolvedTuple rootTuple,
+  TupleMetadata tableSchema) {
+for (ColumnProjection col : scanProj.columns()) {
+  if (col.nodeType() == UnresolvedColumn.UNRESOLVED) {
+resolveColumn(rootTuple, ((UnresolvedColumn) col).element(), 
tableSchema);
+  } else {
+resolveSpecial(rootTuple, col, tableSchema);
+  }
+}
+  }
+
+  private void resolveColumn(ResolvedTuple outputTuple,
+  RequestedColumn inputCol, TupleMetadata tableSchema) {
+int tableColIndex = tableSchema.index(inputCol.name());
+if (tableColIndex == -1) {
+  resolveNullColumn(outputTuple, inputCol);
+} else {
+  resolveTableColumn(outputTuple, inputCol,
+  tableSchema.metadata(tableColIndex),
+  tableColIndex);
+}
+  }
+
+  private void resolveTableColumn(ResolvedTuple outputTuple,
+  RequestedColumn requestedCol,
+  ColumnMetadata column, int sourceIndex) {
+
+// Is the requested column implied to be a map?
+// A requested column is a map if the user requests x.y and we
+// are resolving column x. The presence of y as a member implies
+// that x is a map.
+
+if (requestedCol.isTuple()) {
+  resolveMap(outputTuple, requestedCol, column, sourceIndex);
+}
+
+// Is the requested column implied to be an array?
+// This occurs when the projection list contains at least one
+// array index reference such as x[10].
+
+else if (requestedCol.isArray()) {
+  resolveArray(outputTuple, requestedCol, column, sourceIndex);
+}
+
+// A plain old column. Might be an array or a map, but if
+// so, the request list just mentions it by name without implying
+// the column type. That is, the project list just contains x
+// by itself.
+
+else {
+  projectTableColumn(outputTuple, requestedCol, column, sourceIndex);
+}
+  }
+
+  private void resolveMap(ResolvedTuple outputTuple,
+  RequestedColumn requestedCol, ColumnMetadata column,
+  int sourceIndex) {
+
+// If the actual column isn't a map, then the request is invalid.
+
+if (! column.isMap()) {
+  throw UserException
+.validationError()
+.message("Project list implies a map 
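
The excerpt truncates above. For the unmatched-projection path the class Javadoc
describes, a hedged sketch of the null-fill follows; `addNullColumn` is a
hypothetical helper standing in for whatever the real resolver does:

```java
// Illustrative only: a projected name absent from the table schema becomes
// a null column of a placeholder type instead of failing the query.
private void resolveNullColumn(ResolvedTuple outputTuple, RequestedColumn inputCol) {
  // Hypothetical helper; the actual code resolves through the null-column
  // builder machinery shown elsewhere in this PR.
  outputTuple.addNullColumn(inputCol.name(), MinorType.INT);
}
```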

[jira] [Commented] (DRILL-6847) Add Query Metadata to RESTful Interface

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686531#comment-16686531
 ] 

ASF GitHub Bot commented on DRILL-6847:
---

arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add 
Query Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233457684
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener<Ack> listener, QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   1. Duplicating column name does not make sense.
  2. You may not output precision and scale if they are absent, depending on which object you plan to deserialize this information into.
  3. Look at the major type, for example; a sketch follows below.
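
A hedged sketch of what shipping the full major type might look like, assuming
the protobuf-generated accessors on Drill's MajorType (hasPrecision(),
getPrecision(), getScale()) and a `metadata` list re-typed to hold one map per
column; names beyond the diff are illustrative:

```java
// Requires java.util.Map / java.util.LinkedHashMap.
MaterializedField col = loader.getSchema().getColumn(i);
TypeProtos.MajorType type = col.getType();
Map<String, Object> colMeta = new LinkedHashMap<>();
colMeta.put("name", col.getName());
colMeta.put("type", type.getMinorType().name());
if (type.hasPrecision()) {             // omit precision / scale when absent
  colMeta.put("precision", type.getPrecision());
  colMeta.put("scale", type.getScale());
}
metadata.add(colMeta);                 // metadata: List<Map<String, Object>>
```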


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Query Metadata to RESTful Interface
> ---
>
> Key: DRILL-6847
> URL: https://issues.apache.org/jira/browse/DRILL-6847
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
>
> The Drill RESTful interface does not return the structure of the query 
> results.   This makes integrating Drill with other BI tools difficult because 
> they do not know what kind of data to expect.  
> This PR adds a new section to the results called Metadata which contains a 
> list of the minor types of all the columns returned.
> The query below will now return the following in the RESTful interface:
> {code:sql}
> SELECT CAST( employee_id AS INT) AS employee_id,
> full_name,
> first_name, 
> last_name, 
> CAST( position_id AS BIGINT) AS position_id, 
> position_title 
> FROM cp.`employee.json` LIMIT 2
> {code}
> {code}
> {
>   "queryId": "2414bf3f-b4f4-d4df-825f-73dfb3a56681",
>   "columns": [
> "employee_id",
> "full_name",
> "first_name",
> "last_name",
> "position_id",
> "position_title"
>   ],
>   "metadata": [
> "INT",
> "VARCHAR",
> "VARCHAR",
> "VARCHAR",
> "BIGINT",
> "VARCHAR"
>   ],
>   "rows": [
> {
>   "full_name": "Sheri Nowmer",
>   "employee_id": "1",
>   "last_name": "Nowmer",
>   "position_title": "President",
>   "first_name": "Sheri",
>   "position_id": "1"
> },
> {
>   "full_name": "Derrick Whelply",
>   "employee_id": "2",
>   "last_name": "Whelply",
>   "position_title": "VP Country Manager",
>   "first_name": "Derrick",
>   "position_id": "2"
> }
>   ]
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6847) Add Query Metadata to RESTful Interface

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686523#comment-16686523
 ] 

ASF GitHub Bot commented on DRILL-6847:
---

cgivre commented on a change in pull request #1539: DRILL-6847: Add Query 
Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233456389
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener<Ack> listener, QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   How would you recommend designing that?  I was trying to keep this PR 
relatively simple, and backwards compatible, but one option might be to make 
the metadata a little duplicative, so something like:
   
   ```
   "metadata": [{
       "name": "price",
       "type": "FLOAT4",
       "precision": ...,
       "scale": ...
     }, {
       "name": "customer",
       "type": "VARCHAR",
       ...
   ]
   ```
   Do you know off hand where the precision/scale or any other attributes of 
the columns can be accessed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Query Metadata to RESTful Interface
> ---
>
> Key: DRILL-6847
> URL: https://issues.apache.org/jira/browse/DRILL-6847
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
>
> The Drill RESTful interface does not return the structure of the query 
> results.   This makes integrating Drill with other BI tools difficult because 
> they do not know what kind of data to expect.  
> This PR adds a new section to the results called Metadata which contains a 
> list of the minor types of all the columns returned.
> The query below will now return the following in the RESTful interface:
> {code:sql}
> SELECT CAST( employee_id AS INT) AS employee_id,
> full_name,
> first_name, 
> last_name, 
> CAST( position_id AS BIGINT) AS position_id, 
> position_title 
> FROM cp.`employee.json` LIMIT 2
> {code}
> {code}
> {
>   "queryId": "2414bf3f-b4f4-d4df-825f-73dfb3a56681",
>   "columns": [
> "employee_id",
> "full_name",
> "first_name",
> "last_name",
> "position_id",
> "position_title"
>   ],
>   "metadata": [
> "INT",
> "VARCHAR",
> "VARCHAR",
> "VARCHAR",
> "BIGINT",
> "VARCHAR"
>   ],
>   "rows": [
> {
>   "full_name": "Sheri Nowmer",
>   "employee_id": "1",
>   "last_name": "Nowmer",
>   "position_title": "President",
>   "first_name": "Sheri",
>   "position_id": "1"
> },
> {
>   "full_name": "Derrick Whelply",
>   "employee_id": "2",
>   "last_name": "Whelply",
>   "position_title": "VP Country Manager",
>   "first_name": "Derrick",
>   "position_id": "2"
> }
>   ]
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6847) Add Query Metadata to RESTful Interface

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686436#comment-16686436
 ] 

ASF GitHub Bot commented on DRILL-6847:
---

arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add 
Query Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233423505
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener<Ack> listener, QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   I see, but even though it is not your use case, I think we should consider shipping not only the String type but also information about precision and scale for those who might need it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Query Metadata to RESTful Interface
> ---
>
> Key: DRILL-6847
> URL: https://issues.apache.org/jira/browse/DRILL-6847
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
>
> The Drill RESTful interface does not return the structure of the query 
> results.   This makes integrating Drill with other BI tools difficult because 
> they do not know what kind of data to expect.  
> This PR adds a new section to the results called Metadata which contains a 
> list of the minor types of all the columns returned.
> The query below will now return the following in the RESTful interface:
> {code:sql}
> SELECT CAST( employee_id AS INT) AS employee_id,
> full_name,
> first_name, 
> last_name, 
> CAST( position_id AS BIGINT) AS position_id, 
> position_title 
> FROM cp.`employee.json` LIMIT 2
> {code}
> {code}
> {
>   "queryId": "2414bf3f-b4f4-d4df-825f-73dfb3a56681",
>   "columns": [
> "employee_id",
> "full_name",
> "first_name",
> "last_name",
> "position_id",
> "position_title"
>   ],
>   "metadata": [
> "INT",
> "VARCHAR",
> "VARCHAR",
> "VARCHAR",
> "BIGINT",
> "VARCHAR"
>   ],
>   "rows": [
> {
>   "full_name": "Sheri Nowmer",
>   "employee_id": "1",
>   "last_name": "Nowmer",
>   "position_title": "President",
>   "first_name": "Sheri",
>   "position_id": "1"
> },
> {
>   "full_name": "Derrick Whelply",
>   "employee_id": "2",
>   "last_name": "Whelply",
>   "position_title": "VP Country Manager",
>   "first_name": "Derrick",
>   "position_id": "2"
> }
>   ]
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6847) Add Query Metadata to RESTful Interface

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686379#comment-16686379
 ] 

ASF GitHub Bot commented on DRILL-6847:
---

cgivre commented on a change in pull request #1539: DRILL-6847: Add Query 
Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233405972
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener<Ack> listener, QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   Hi @arina-ielchiieva ,
   The use case I had in mind was integrating Drill with SQLPad and Apache 
Superset. In these instances, the UI basically needed to know whether a field was 
numeric, temporal of any sort, or text so that it could render visualizations 
properly.
   
   I'm sure there are other use cases out there, but I know that for me at 
least, this was a major blocker in getting Drill to work with the various BI 
tools.  The JDBC interface provided this information, but the RESTful interface 
did not, so I had to resort to hackery. 
   
   So to answer your question, it might be useful for other use cases to 
provide precision and scale, but for the one I had in mind, that would not be 
helpful. 
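
A consumer-side sketch of the bucketing described above, keyed on standard
Drill minor type names; the categories themselves are illustrative:

```java
// Map a Drill minor type name to the render category a BI front end needs.
static String renderCategory(String minorType) {
  switch (minorType) {
    case "INT": case "BIGINT": case "FLOAT4": case "FLOAT8": case "VARDECIMAL":
      return "numeric";
    case "DATE": case "TIME": case "TIMESTAMP":
      return "temporal";
    default:
      return "text";                   // VARCHAR and anything unrecognized
  }
}
```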


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Query Metadata to RESTful Interface
> ---
>
> Key: DRILL-6847
> URL: https://issues.apache.org/jira/browse/DRILL-6847
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
>
> The Drill RESTful interface does not return the structure of the query 
> results.   This makes integrating Drill with other BI tools difficult because 
> they do not know what kind of data to expect.  
> This PR adds a new section to the results called Metadata which contains a 
> list of the minor types of all the columns returned.
> The query below will now return the following in the RESTful interface:
> {code:sql}
> SELECT CAST( employee_id AS INT) AS employee_id,
> full_name,
> first_name, 
> last_name, 
> CAST( position_id AS BIGINT) AS position_id, 
> position_title 
> FROM cp.`employee.json` LIMIT 2
> {code}
> {code}
> {
>   "queryId": "2414bf3f-b4f4-d4df-825f-73dfb3a56681",
>   "columns": [
> "employee_id",
> "full_name",
> "first_name",
> "last_name",
> "position_id",
> "position_title"
>   ],
>   "metadata": [
> "INT",
> "VARCHAR",
> "VARCHAR",
> "VARCHAR",
> "BIGINT",
> "VARCHAR"
>   ],
>   "rows": [
> {
>   "full_name": "Sheri Nowmer",
>   "employee_id": "1",
>   "last_name": "Nowmer",
>   "position_title": "President",
>   "first_name": "Sheri",
>   "position_id": "1"
> },
> {
>   "full_name": "Derrick Whelply",
>   "employee_id": "2",
>   "last_name": "Whelply",
>   "position_title": "VP Country Manager",
>   "first_name": "Derrick",
>   "position_id": "2"
> }
>   ]
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686353#comment-16686353
 ] 

ASF GitHub Bot commented on DRILL-6744:
---

vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233135431
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java
 ##
 @@ -132,62 +129,163 @@ public 
ParquetMetaStatCollector(ParquetTableMetadataBase parquetTableMetadata,
   }
 
   /**
-   * Builds column statistics using given primitiveType, originalType, scale,
-   * precision, numNull, min and max values.
+   * Helper class that creates parquet {@link ColumnStatistics} based on given
+   * min and max values, type, number of nulls, precision and scale.
*
-   * @param min min value for statistics
-   * @param max max value for statistics
-   * @param numNullsnum_nulls for statistics
-   * @param primitiveType   type that determines statistics class
-   * @param originalTypetype that determines statistics class
-   * @param scale   scale value (used for DECIMAL type)
-   * @param precision   precision value (used for DECIMAL type)
-   * @return column statistics
*/
-  private ColumnStatistics getStat(Object min, Object max, long numNulls,
-   PrimitiveType.PrimitiveTypeName 
primitiveType, OriginalType originalType,
-   int scale, int precision) {
-Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
-Statistics convertedStat = stat;
-
-TypeProtos.MajorType type = ParquetReaderUtility.getType(primitiveType, 
originalType, scale, precision);
-stat.setNumNulls(numNulls);
-
-if (min != null && max != null ) {
-  switch (type.getMinorType()) {
-  case INT :
-  case TIME:
-((IntStatistics) stat).setMinMax(Integer.parseInt(min.toString()), 
Integer.parseInt(max.toString()));
-break;
-  case BIGINT:
-  case TIMESTAMP:
-((LongStatistics) stat).setMinMax(Long.parseLong(min.toString()), 
Long.parseLong(max.toString()));
-break;
-  case FLOAT4:
-((FloatStatistics) stat).setMinMax(Float.parseFloat(min.toString()), 
Float.parseFloat(max.toString()));
-break;
-  case FLOAT8:
-((DoubleStatistics) 
stat).setMinMax(Double.parseDouble(min.toString()), 
Double.parseDouble(max.toString()));
-break;
-  case DATE:
-convertedStat = new LongStatistics();
-convertedStat.setNumNulls(stat.getNumNulls());
-final long minMS = 
convertToDrillDateValue(Integer.parseInt(min.toString()));
-final long maxMS = 
convertToDrillDateValue(Integer.parseInt(max.toString()));
-((LongStatistics) convertedStat ).setMinMax(minMS, maxMS);
-break;
-  case BIT:
-((BooleanStatistics) 
stat).setMinMax(Boolean.parseBoolean(min.toString()), 
Boolean.parseBoolean(max.toString()));
-break;
-  default:
-  }
+  private static class ColumnStatisticsBuilder {
+
+private Object min;
+private Object max;
+private long numNulls;
+private PrimitiveType.PrimitiveTypeName primitiveType;
+private OriginalType originalType;
+private int scale;
+private int precision;
+
+static ColumnStatisticsBuilder builder() {
+  return new ColumnStatisticsBuilder();
 }
 
-return new ColumnStatistics(convertedStat, type);
-  }
+ColumnStatisticsBuilder setMin(Object min) {
+  this.min = min;
+  return this;
+}
+
+ColumnStatisticsBuilder setMax(Object max) {
+  this.max = max;
+  return this;
+}
+
+ColumnStatisticsBuilder setNumNulls(long numNulls) {
+  this.numNulls = numNulls;
+  return this;
+}
+
+ColumnStatisticsBuilder setPrimitiveType(PrimitiveType.PrimitiveTypeName 
primitiveType) {
+  this.primitiveType = primitiveType;
+  return this;
+}
+
+ColumnStatisticsBuilder setOriginalType(OriginalType originalType) {
+  this.originalType = originalType;
+  return this;
+}
 
-  private static long convertToDrillDateValue(int dateValue) {
+ColumnStatisticsBuilder setScale(int scale) {
+  this.scale = scale;
+  return this;
+}
+
+ColumnStatisticsBuilder setPrecision(int precision) {
+  this.precision = precision;
+  return this;
+}
+
+
+/**
+ * Builds column statistics using given primitive and original types,
+ * scale, precision, number of nulls, min and max values.
+ * Min and max values for binary statistics are set only if allowed.
+ *
+ * @return column statistics
+ */
+ColumnStatistics build() {
+  Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
+  Statistics convertedStat 
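
A hedged usage sketch of the builder shown above, with illustrative values;
only the setters visible in the diff are assumed:

```java
ColumnStatistics stat = ColumnStatisticsBuilder.builder()
    .setMin(10)
    .setMax(100)
    .setNumNulls(0)
    .setPrimitiveType(PrimitiveType.PrimitiveTypeName.INT32)
    .setOriginalType(null)             // no logical type annotation
    .setScale(0)
    .setPrecision(0)
    .build();
```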

[jira] [Commented] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686349#comment-16686349
 ] 

ASF GitHub Bot commented on DRILL-6744:
---

vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233097319
 
 

 ##
 File path: common/src/main/java/org/apache/drill/common/VersionUtil.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.common;
+
+import org.apache.maven.artifact.versioning.DefaultArtifactVersion;
+
+/**
+ * Utility class for project version.
+ */
+public class VersionUtil {
 
 Review comment:
   Looks like `hadoop-common` has a class which does similar things: 
`org.apache.hadoop.util.VersionUtil`. Can we replace this class with the hadoop one?
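
For reference, a hedged sketch of the suggested replacement; hadoop-common's
VersionUtil exposes a static compareVersions(), and `writerVersion` here is an
illustrative variable:

```java
import org.apache.hadoop.util.VersionUtil;

// Non-positive when 1.10.0 <= writerVersion, i.e. the file is new enough.
boolean atLeastParquet110 = VersionUtil.compareVersions("1.10.0", writerVersion) <= 0;
```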


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support filter push down for varchar / decimal data types
> -
>
> Key: DRILL-6744
> URL: https://issues.apache.org/jira/browse/DRILL-6744
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> Since Drill is now using Apache Parquet 1.10.0, where the issue with incorrectly 
> stored varchar / decimal min / max statistics is resolved, we should add 
> support for varchar / decimal filter push down. Only files created with 
> parquet lib 1.9.1 (1.10.0) and later will be subjected to push down. In 
> cases where the user knows that previously created files have correct min / max 
> statistics (i.e. knows for certain that the data in binary columns is in ASCII, not 
> UTF-8), parquet.strings.signed-min-max.enabled can be set to true to 
> enable filter push down.
> *Description*
> _Note: Drill is using Parquet 1.10.0 library since 1.13.0 version._
> *Varchar Partition Pruning*
> Varchar Pruning will work for files generated prior to and after the Parquet 1.10.0 
> version, since to enable partition pruning both min and max values should be 
> the same and there are no issues with incorrectly stored statistics for 
> binary data for the same min and max values. Partition pruning using Drill 
> metadata files will also work, no matter when metadata file was created 
> (prior or after Drill 1.15.0).
> Partition pruning won't work for files where partition is null due to 
> PARQUET-1341, issue will be fixed in Parquet 1.11.0.
> *Varchar Filter Push Down*
> Varchar filter push down will work for parquet files created with Parquet 
> 1.10.0 and later.
> There are two options for enabling push down for files generated with prior 
> Parquet versions, when the user knows for certain that the binary data is in ASCII (not 
> UTF-8):
> 1. set configuration {{enableStringsSignedMinMax}} to true (false by default) 
> for parquet format plugin: 
> {noformat}
> "parquet" : {
>   type: "parquet",
>   enableStringsSignedMinMax: true 
> }
> {noformat}
> This would apply to all parquet files of a given file plugin, including all 
> workspaces.
> 2. If user wants to enable / disable allowing reading binary statistics for 
> old parquet files per session, session option 
> {{store.parquet.reader.strings_signed_min_max}} can be used. By default, it 
> has empty string value. Setting such option will take priority over config in 
> parquet format plugin. Option allows three values: 'true', 'false', '' (empty 
> string).
> _Note: store.parquet.reader.strings_signed_min_max also can be set at system 
> level, thus it will apply to all parquet files in the system._
> The same config / session option will apply to allow reading binary 
> 

[jira] [Commented] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686351#comment-16686351
 ] 

ASF GitHub Bot commented on DRILL-6744:
---

vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233127110
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderConfig.java
 ##
 @@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.ParquetReadOptions;
+
+import java.util.Objects;
+
+import static 
org.apache.parquet.format.converter.ParquetMetadataConverter.NO_FILTER;
+
+/**
+ * Stores consolidated parquet reading configuration. Can obtain config values 
from various sources:
+ * Assignment priority of configuration values is the following:
+ * parquet format config
+ * Hadoop configuration
+ * session options
+ *
+ * During serialization it does not write out the default values, to keep the
+ * serialized object smaller.
+ * Should be initialized using {@link Builder}; the constructor is made public
+ * only for ser / de purposes.
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ParquetReaderConfig {
+
+  public static final String ENABLE_BYTES_READ_COUNTER = 
"parquet.benchmark.bytes.read";
+  public static final String ENABLE_BYTES_TOTAL_COUNTER = 
"parquet.benchmark.bytes.total";
+  public static final String ENABLE_TIME_READ_COUNTER = 
"parquet.benchmark.time.read";
+
+  // keep variables public for ser / de to avoid creating getters and 
constructor with params for all variables
+  // add defaults to keep deserialized object smaller
+  public boolean enableBytesReadCounter = false;
+  public boolean enableBytesTotalCounter = false;
+  public boolean enableTimeReadCounter = false;
+  public boolean autoCorrectCorruptedDates = true;
+  public boolean enableStringsSignedMinMax = false;
+
+  public static ParquetReaderConfig.Builder builder() {
+return new ParquetReaderConfig.Builder();
+  }
+
+  public static ParquetReaderConfig getDefaultInstance() {
+return new ParquetReaderConfig();
+  }
+
+  // default constructor should be used only for ser / de and testing
+  public ParquetReaderConfig() { }
+
+  public boolean autoCorrectCorruptedDates() {
+return autoCorrectCorruptedDates;
+  }
+
+  public boolean enableStringsSignedMinMax() {
+return enableStringsSignedMinMax;
+  }
+
+  public ParquetReadOptions toReadOptions() {
+return ParquetReadOptions.builder()
+  .withMetadataFilter(NO_FILTER)
 
 Review comment:
   `NO_FILTER` is set in `ParquetReadOptions.Builder` by default, so it may be 
removed.
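
A hedged end-to-end sketch of the config flow discussed here: build the
consolidated config from a Hadoop Configuration (withConf() appears in the
tests later in this thread) and let toReadOptions() supply defaults such as
NO_FILTER:

```java
Configuration conf = new Configuration();
conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, true);

ParquetReaderConfig readerConfig = ParquetReaderConfig.builder()
    .withConf(conf)
    .build();
ParquetReadOptions options = readerConfig.toReadOptions();
```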


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support filter push down for varchar / decimal data types
> -
>
> Key: DRILL-6744
> URL: https://issues.apache.org/jira/browse/DRILL-6744
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> Since Drill is now using Apache Parquet 1.10.0, where the issue with incorrectly 
> stored varchar / decimal min / max statistics is resolved, we should add 
> support for varchar / decimal filter push down. Only files created with 
> parquet lib 1.9.1 (1.10.0) and later will be subjected to push down. In 
> cases where the user knows that previously created files have correct min 

[jira] [Commented] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686354#comment-16686354
 ] 

ASF GitHub Bot commented on DRILL-6744:
---

vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233394516
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java
 ##
 @@ -132,62 +129,163 @@ public 
ParquetMetaStatCollector(ParquetTableMetadataBase parquetTableMetadata,
   }
 
   /**
-   * Builds column statistics using given primitiveType, originalType, scale,
-   * precision, numNull, min and max values.
+   * Helper class that creates parquet {@link ColumnStatistics} based on given
+   * min and max values, type, number of nulls, precision and scale.
*
-   * @param min min value for statistics
-   * @param max max value for statistics
-   * @param numNullsnum_nulls for statistics
-   * @param primitiveType   type that determines statistics class
-   * @param originalTypetype that determines statistics class
-   * @param scale   scale value (used for DECIMAL type)
-   * @param precision   precision value (used for DECIMAL type)
-   * @return column statistics
*/
-  private ColumnStatistics getStat(Object min, Object max, long numNulls,
-   PrimitiveType.PrimitiveTypeName 
primitiveType, OriginalType originalType,
-   int scale, int precision) {
-Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
-Statistics convertedStat = stat;
-
-TypeProtos.MajorType type = ParquetReaderUtility.getType(primitiveType, 
originalType, scale, precision);
-stat.setNumNulls(numNulls);
-
-if (min != null && max != null ) {
-  switch (type.getMinorType()) {
-  case INT :
-  case TIME:
-((IntStatistics) stat).setMinMax(Integer.parseInt(min.toString()), 
Integer.parseInt(max.toString()));
-break;
-  case BIGINT:
-  case TIMESTAMP:
-((LongStatistics) stat).setMinMax(Long.parseLong(min.toString()), 
Long.parseLong(max.toString()));
-break;
-  case FLOAT4:
-((FloatStatistics) stat).setMinMax(Float.parseFloat(min.toString()), 
Float.parseFloat(max.toString()));
-break;
-  case FLOAT8:
-((DoubleStatistics) 
stat).setMinMax(Double.parseDouble(min.toString()), 
Double.parseDouble(max.toString()));
-break;
-  case DATE:
-convertedStat = new LongStatistics();
-convertedStat.setNumNulls(stat.getNumNulls());
-final long minMS = 
convertToDrillDateValue(Integer.parseInt(min.toString()));
-final long maxMS = 
convertToDrillDateValue(Integer.parseInt(max.toString()));
-((LongStatistics) convertedStat ).setMinMax(minMS, maxMS);
-break;
-  case BIT:
-((BooleanStatistics) 
stat).setMinMax(Boolean.parseBoolean(min.toString()), 
Boolean.parseBoolean(max.toString()));
-break;
-  default:
-  }
+  private static class ColumnStatisticsBuilder {
+
+private Object min;
+private Object max;
+private long numNulls;
+private PrimitiveType.PrimitiveTypeName primitiveType;
+private OriginalType originalType;
+private int scale;
+private int precision;
+
+static ColumnStatisticsBuilder builder() {
+  return new ColumnStatisticsBuilder();
 }
 
-return new ColumnStatistics(convertedStat, type);
-  }
+ColumnStatisticsBuilder setMin(Object min) {
+  this.min = min;
+  return this;
+}
+
+ColumnStatisticsBuilder setMax(Object max) {
+  this.max = max;
+  return this;
+}
+
+ColumnStatisticsBuilder setNumNulls(long numNulls) {
+  this.numNulls = numNulls;
+  return this;
+}
+
+ColumnStatisticsBuilder setPrimitiveType(PrimitiveType.PrimitiveTypeName 
primitiveType) {
+  this.primitiveType = primitiveType;
+  return this;
+}
+
+ColumnStatisticsBuilder setOriginalType(OriginalType originalType) {
+  this.originalType = originalType;
+  return this;
+}
 
-  private static long convertToDrillDateValue(int dateValue) {
+ColumnStatisticsBuilder setScale(int scale) {
+  this.scale = scale;
+  return this;
+}
+
+ColumnStatisticsBuilder setPrecision(int precision) {
+  this.precision = precision;
+  return this;
+}
+
+
+/**
+ * Builds column statistics using given primitive and original types,
+ * scale, precision, number of nulls, min and max values.
+ * Min and max values for binary statistics are set only if allowed.
+ *
+ * @return column statistics
+ */
+ColumnStatistics build() {
+  Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
+  Statistics convertedStat 

[jira] [Commented] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686355#comment-16686355
 ] 

ASF GitHub Bot commented on DRILL-6744:
---

vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233397050
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetReaderConfig.java
 ##
 @@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.options.SystemOptionManager;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.ParquetReadOptions;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertTrue;
+
+public class TestParquetReaderConfig {
+
+  @Test
+  public void testDefaultsDeserialization() throws Exception {
+ObjectMapper mapper = new ObjectMapper();
+ParquetReaderConfig readerConfig = ParquetReaderConfig.builder().build(); 
// all defaults
+String value = mapper.writeValueAsString(readerConfig);
+assertEquals("{}", value);
+
+readerConfig = mapper.readValue(value, ParquetReaderConfig.class);
+assertTrue(readerConfig.autoCorrectCorruptedDates); // check that default 
value is restored
+
+readerConfig.autoCorrectCorruptedDates = false; // change the default
+readerConfig.enableStringsSignedMinMax = false; // update to the default
+
+value = mapper.writeValueAsString(readerConfig);
+assertEquals("{\"autoCorrectCorruptedDates\":false}", value);
+  }
+
+  @Test
+  public void testAddConfigToConf() {
+Configuration conf = new Configuration();
+conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, true);
+conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_TOTAL_COUNTER, true);
+conf.setBoolean(ParquetReaderConfig.ENABLE_TIME_READ_COUNTER, true);
+
+ParquetReaderConfig readerConfig = 
ParquetReaderConfig.builder().withConf(conf).build();
+Configuration newConf = readerConfig.addCountersToConf(new 
Configuration());
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, 
"true");
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_BYTES_TOTAL_COUNTER, 
"true");
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_TIME_READ_COUNTER, 
"true");
+
+conf = new Configuration();
+conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, false);
+conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_TOTAL_COUNTER, false);
+conf.setBoolean(ParquetReaderConfig.ENABLE_TIME_READ_COUNTER, false);
+
+readerConfig = ParquetReaderConfig.builder().withConf(conf).build();
+newConf = readerConfig.addCountersToConf(new Configuration());
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, 
"false");
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_BYTES_TOTAL_COUNTER, 
"false");
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_TIME_READ_COUNTER, 
"false");
+  }
+
+  @Test
+  public void testReadOptions() {
+ParquetReaderConfig readerConfig = new ParquetReaderConfig();
 
 Review comment:
   Looks like we contravene the recommendation from its Javadoc
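
A one-line sketch of the Javadoc-compliant alternative, using the builder
already exercised earlier in this test class:

```java
// Obtain the instance through the builder instead of the ser / de-only constructor.
ParquetReaderConfig readerConfig = ParquetReaderConfig.builder().build();
```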


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support filter push down for varchar / decimal data types
> -
>
> Key: DRILL-6744
> URL: https://issues.apache.org/jira/browse/DRILL-6744
> Project: Apache Drill
>  

[jira] [Commented] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686352#comment-16686352
 ] 

ASF GitHub Bot commented on DRILL-6744:
---

vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233390608
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderConfig.java
 ##
 @@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.ParquetReadOptions;
+
+import java.util.Objects;
+
+import static 
org.apache.parquet.format.converter.ParquetMetadataConverter.NO_FILTER;
+
+/**
+ * Stores consolidated parquet reading configuration. Can obtain config values 
from various sources:
+ * Assignment priority of configuration values is the following:
+ * parquet format config
+ * Hadoop configuration
+ * session options
+ *
+ * During serialization it does not write out the default values, to keep the
+ * serialized object smaller.
+ * Should be initialized using {@link Builder}; the constructor is made public
+ * only for ser / de purposes.
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ParquetReaderConfig {
 
 Review comment:
   Is it possible to make this class immutable, so that in the `getDefaultInstance()` method we would be able to reuse the same object instead of instantiating a new one?
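
   For illustration, a minimal sketch of the reviewer's suggestion, assuming all fields can be made final; the field names follow the config constants quoted in the test above, and ser / de plus Builder wiring are omitted:

{code:java}
// Sketch only: an immutable config whose default instance can be shared,
// so getDefaultInstance() never has to allocate.
public class ParquetReaderConfig {
  private static final ParquetReaderConfig DEFAULT = new ParquetReaderConfig(false, false, false);

  private final boolean enableBytesReadCounter;
  private final boolean enableBytesTotalCounter;
  private final boolean enableTimeReadCounter;

  private ParquetReaderConfig(boolean enableBytesReadCounter,
                              boolean enableBytesTotalCounter,
                              boolean enableTimeReadCounter) {
    this.enableBytesReadCounter = enableBytesReadCounter;
    this.enableBytesTotalCounter = enableBytesTotalCounter;
    this.enableTimeReadCounter = enableTimeReadCounter;
  }

  // Immutability makes the shared instance safe to hand out to all callers.
  public static ParquetReaderConfig getDefaultInstance() {
    return DEFAULT;
  }
}
{code}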


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support filter push down for varchar / decimal data types
> -
>
> Key: DRILL-6744
> URL: https://issues.apache.org/jira/browse/DRILL-6744
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> Since Drill now uses Apache Parquet 1.10.0, where the issue with incorrectly 
> stored varchar / decimal min / max statistics is resolved, we should add 
> support for varchar / decimal filter push down. Only files created with 
> parquet lib 1.9.1 (1.10.0) and later will be subject to push down. In 
> cases where the user knows that previously created files have correct min / max 
> statistics (i.e. the user knows that data in binary columns is ASCII, not 
> UTF-8), parquet.strings.signed-min-max.enabled can be set to true to 
> enable filter push down.
> *Description*
> _Note: Drill has used the Parquet 1.10.0 library since version 1.13.0._
> *Varchar Partition Pruning*
> Varchar pruning will work for files generated both prior to and after Parquet 
> 1.10.0, since partition pruning requires min and max values to be the same, 
> and there are no issues with incorrectly stored binary statistics when min 
> and max are equal. Partition pruning using Drill metadata files will also 
> work, no matter when the metadata file was created (prior to or after Drill 
> 1.15.0).
> Partition pruning won't work for files where the partition is null due to 
> PARQUET-1341; the issue will be fixed in Parquet 1.11.0.
> *Varchar Filter Push Down*
> Varchar filter push down will work for parquet files created with Parquet 
> 1.10.0 and later.
> There are two options to enable push down for files generated with prior 
> Parquet versions, when the user knows that the binary data is ASCII (not 
> UTF-8):
> 1. set configuration {{enableStringsSignedMinMax}} to true (false by default) 
> for the parquet format plugin: 
> 

[jira] [Commented] (DRILL-6744) Support filter push down for varchar / decimal data types

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686350#comment-16686350
 ] 

ASF GitHub Bot commented on DRILL-6744:
---

vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233132202
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java
 ##
 @@ -86,30 +87,26 @@ public ParquetMetaStatCollector(ParquetTableMetadataBase parquetTableMetadata,
   columnMetadataMap.put(schemaPath, columnMetadata);
 }
 
-for (final SchemaPath field : fields) {
-  final PrimitiveType.PrimitiveTypeName primitiveType;
-  final OriginalType originalType;
-
-  final ColumnMetadata columnMetadata = columnMetadataMap.get(field.getUnIndexed());
-
+for (SchemaPath field : fields) {
+  ColumnMetadata columnMetadata = columnMetadataMap.get(field.getUnIndexed());
   if (columnMetadata != null) {
-final Object min = columnMetadata.getMinValue();
-final Object max = columnMetadata.getMaxValue();
-final long numNulls = columnMetadata.getNulls() == null ? -1 : columnMetadata.getNulls();
-
-primitiveType = this.parquetTableMetadata.getPrimitiveType(columnMetadata.getName());
-originalType = this.parquetTableMetadata.getOriginalType(columnMetadata.getName());
-int precision = 0;
-int scale = 0;
+ColumnStatisticsBuilder statisticsBuilder = ColumnStatisticsBuilder.builder()
+  .setMin(columnMetadata.getMinValue())
+  .setMax(columnMetadata.getMaxValue())
+  .setNumNulls(columnMetadata.getNulls() == null ? -1 : columnMetadata.getNulls())
 
 Review comment:
   Please replace -1 with `GroupScan.NO_COLUMN_STATS`.
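
   For illustration, the suggested change would look roughly like this; the surrounding builder call is taken from the quoted diff, and {{GroupScan.NO_COLUMN_STATS}} is the constant the reviewer names:

{code:java}
// Magic number -1 replaced with the named "statistics unavailable" constant.
.setNumNulls(columnMetadata.getNulls() == null ? GroupScan.NO_COLUMN_STATS : columnMetadata.getNulls())
{code}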


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support filter push down for varchar / decimal data types
> -
>
> Key: DRILL-6744
> URL: https://issues.apache.org/jira/browse/DRILL-6744
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> Since Drill now uses Apache Parquet 1.10.0, where the issue with incorrectly 
> stored varchar / decimal min / max statistics is resolved, we should add 
> support for varchar / decimal filter push down. Only files created with 
> parquet lib 1.9.1 (1.10.0) and later will be subject to push down. In 
> cases where the user knows that previously created files have correct min / max 
> statistics (i.e. the user knows that data in binary columns is ASCII, not 
> UTF-8), parquet.strings.signed-min-max.enabled can be set to true to 
> enable filter push down.
> *Description*
> _Note: Drill has used the Parquet 1.10.0 library since version 1.13.0._
> *Varchar Partition Pruning*
> Varchar pruning will work for files generated both prior to and after Parquet 
> 1.10.0, since partition pruning requires min and max values to be the same, 
> and there are no issues with incorrectly stored binary statistics when min 
> and max are equal. Partition pruning using Drill metadata files will also 
> work, no matter when the metadata file was created (prior to or after Drill 
> 1.15.0).
> Partition pruning won't work for files where the partition is null due to 
> PARQUET-1341; the issue will be fixed in Parquet 1.11.0.
> *Varchar Filter Push Down*
> Varchar filter push down will work for parquet files created with Parquet 
> 1.10.0 and later.
> There are two options to enable push down for files generated with prior 
> Parquet versions, when the user knows that the binary data is ASCII (not 
> UTF-8):
> 1. set configuration {{enableStringsSignedMinMax}} to true (false by default) 
> for the parquet format plugin: 
> {noformat}
> "parquet" : {
>   type: "parquet",
>   enableStringsSignedMinMax: true 
> }
> {noformat}
> This would apply to all parquet files of a given file plugin, including all 
> workspaces.
> 2. If the user wants to enable / disable reading binary statistics for 
> old parquet files per session, the session option 
> {{store.parquet.reader.strings_signed_min_max}} can be used. By default, it 
> has an empty string value. Setting this option takes priority over the config in 
> the parquet format plugin. The option allows three values: 'true', 'false', '' (empty 
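
For illustration, a usage sketch of the session option described above, assuming the standard Drill ALTER SESSION syntax (the option name is taken from the quoted description):

{code:sql}
ALTER SESSION SET `store.parquet.reader.strings_signed_min_max` = 'true';
{code}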

[jira] [Commented] (DRILL-6847) Add Query Metadata to RESTful Interface

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686189#comment-16686189
 ] 

ASF GitHub Bot commented on DRILL-6847:
---

arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add 
Query Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233348011
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener listener, QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   Some types can have precision and scale; is this information needed?
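
   For illustration, one possible way to carry that information, appending precision and scale when the type defines them; the {{hasPrecision()}} / {{hasScale()}} accessors on the protobuf-generated {{MajorType}} are assumptions, and the surrounding loop body is taken from the quoted diff:

{code:java}
// Sketch: emit e.g. "VARDECIMAL(38, 2)" instead of a bare minor type name.
MaterializedField col = loader.getSchema().getColumn(i);
columns.add(col.getName());
TypeProtos.MajorType type = col.getType();
StringBuilder dataType = new StringBuilder(type.getMinorType().name());
if (type.hasPrecision()) {
  dataType.append('(').append(type.getPrecision());
  if (type.hasScale()) {
    dataType.append(", ").append(type.getScale());
  }
  dataType.append(')');
}
metadata.add(dataType.toString());
{code}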


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Query Metadata to RESTful Interface
> ---
>
> Key: DRILL-6847
> URL: https://issues.apache.org/jira/browse/DRILL-6847
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
>
> The Drill RESTful interface does not return the structure of the query 
> results.   This makes integrating Drill with other BI tools difficult because 
> they do not know what kind of data to expect.  
> This PR adds a new section to the results called Metadata which contains a 
> list of the minor types of all the columns returned.
> The query below will now return the following in the RESTful interface:
> {code:sql}
> SELECT CAST( employee_id AS INT) AS employee_id,
> full_name,
> first_name, 
> last_name, 
> CAST( position_id AS BIGINT) AS position_id, 
> position_title 
> FROM cp.`employee.json` LIMIT 2
> {code}
> {code}
> {
>   "queryId": "2414bf3f-b4f4-d4df-825f-73dfb3a56681",
>   "columns": [
> "employee_id",
> "full_name",
> "first_name",
> "last_name",
> "position_id",
> "position_title"
>   ],
>   "metadata": [
> "INT",
> "VARCHAR",
> "VARCHAR",
> "VARCHAR",
> "BIGINT",
> "VARCHAR"
>   ],
>   "rows": [
> {
>   "full_name": "Sheri Nowmer",
>   "employee_id": "1",
>   "last_name": "Nowmer",
>   "position_title": "President",
>   "first_name": "Sheri",
>   "position_id": "1"
> },
> {
>   "full_name": "Derrick Whelply",
>   "employee_id": "2",
>   "last_name": "Whelply",
>   "position_title": "VP Country Manager",
>   "first_name": "Derrick",
>   "position_id": "2"
> }
>   ]
> }
> {code}
>  
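
For illustration, one way to exercise the endpoint and inspect the new metadata array, assuming Drill's standard REST query endpoint on the default web port 8047:

{noformat}
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"queryType": "SQL", "query": "SELECT * FROM cp.`employee.json` LIMIT 2"}' \
  http://localhost:8047/query.json
{noformat}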



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

