[jira] [Updated] (DRILL-7535) Convert Ltsv to EVF

2020-01-26 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7535:

Summary: Convert Ltsv to EVF  (was: Convert ltsv to EVF)

> Convert Ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7554) Convert LTSV Format Plugin to EVF

2020-01-26 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024134#comment-17024134
 ] 

Arina Ielchiieva commented on DRILL-7554:
-

[~cgivre] I have created a master Jira to track EVF format conversions: 
https://issues.apache.org/jira/browse/DRILL-7531. Please do not create 
duplicates; use the existing sub-tasks instead. Thanks.

> Convert LTSV Format Plugin to EVF
> -
>
> Key: DRILL-7554
> URL: https://issues.apache.org/jira/browse/DRILL-7554
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>






[jira] [Updated] (DRILL-7535) Convert ltsv to EVF

2020-01-26 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7535:

Fix Version/s: 1.18.0

> Convert ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>






[jira] [Assigned] (DRILL-7535) Convert ltsv to EVF

2020-01-26 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7535:
---

Assignee: Arina Ielchiieva

> Convert ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>






[jira] [Assigned] (DRILL-7535) Convert ltsv to EVF

2020-01-26 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7535:
---

Assignee: Charles Givre  (was: Arina Ielchiieva)

> Convert ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>






[jira] [Updated] (DRILL-7535) Convert ltsv to EVF

2020-01-26 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7535:

Summary: Convert ltsv to EVF  (was: Convert Lstv to EVF)

> Convert ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Priority: Major
>






[jira] [Commented] (DRILL-7554) Convert LTSV Format Plugin to EVF

2020-01-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024060#comment-17024060
 ] 

ASF GitHub Bot commented on DRILL-7554:
---

cgivre commented on pull request #1962: DRILL-7554: Convert LTSV Format Plugin 
to EVF
URL: https://github.com/apache/drill/pull/1962
 
 
   # [DRILL-7554](https://issues.apache.org/jira/browse/DRILL-7554): Convert 
LTSV Format Plugin to EVF
   
   ## Description
   This PR converts the existing LTSV Format Plugin to EVF. It also changes 
the traditional structure of format plugins. Format plugins normally consist of 
at least three files (`XXXFormatPlugin`, `XXXFormatPluginConfig`, and 
`XXXFormatBatchReader`); this plugin introduces a new abstract class, 
`EasyEVFBatchReader`, which the `XXXBatchReader` extends.
   
   Under the proposed pattern, new format plugins no longer implement a 
BatchReader from scratch. Because `EasyEVFBatchReader` contains most of the code 
that is currently duplicated in every format plugin, a new format plugin can be 
created simply by extending the `EasyEVFBatchReader` class and implementing a 
regular iterator that reads through the data and performs the column mappings.
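   The following is a minimal, dependency-free Java sketch of the split. `SketchBatchReader` and `LtsvSketchReader` are hypothetical names, not the `EasyEVFBatchReader` API in this PR, and plain maps stand in for EVF's row-set writers. The base class owns the batching loop that every plugin would otherwise duplicate. The subclass only implements an `Iterator` that turns each LTSV line (tab-separated `label:value` fields) into a column mapping.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class: owns the batching loop shared by every format.
abstract class SketchBatchReader implements Iterator<Map<String, String>> {

  // Pull up to maxRows rows from the subclass iterator into one batch.
  public final List<Map<String, String>> nextBatch(int maxRows) {
    List<Map<String, String>> batch = new ArrayList<>();
    while (batch.size() < maxRows && hasNext()) {
      batch.add(next());
    }
    return batch;
  }
}

// Format-specific part: iterate over LTSV lines and map labels to columns.
class LtsvSketchReader extends SketchBatchReader {
  private final Iterator<String> lines;

  LtsvSketchReader(List<String> input) {
    this.lines = input.iterator();
  }

  @Override
  public boolean hasNext() {
    return lines.hasNext();
  }

  @Override
  public Map<String, String> next() {
    // lines.next() throws NoSuchElementException when exhausted, as Iterator requires.
    Map<String, String> row = new LinkedHashMap<>();
    for (String field : lines.next().split("\t")) {
      int colon = field.indexOf(':');
      if (colon <= 0) {
        continue; // this sketch silently skips fields without a label
      }
      // Split on the first colon only: LTSV values may themselves contain colons.
      row.put(field.substring(0, colon), field.substring(colon + 1));
    }
    return row;
  }
}
```

   Calling `nextBatch(2)` on a three-line input returns two rows, then one row, then an empty list. The plugin author never writes that loop.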
   
   This PR is the first in a series of format plugin conversions to EVF, so 
`EasyEVFBatchReader` should not be considered final, but rather the basis for a 
cleaner API for format plugins. I still need to add schema definition methods, 
but will do so with format plugins that have known schemata.
   
   ## Documentation
   No user-visible changes. LTSV is already documented both in `README.md` and 
on the Drill web site.
   
   ## Testing
   As a part of this PR, I updated all unit tests to use the RowSet framework.  
I also added unit tests for:
   
   - Serialization/Deserialization
   - Compressed Files
 



> Convert LTSV Format Plugin to EVF
> -
>
> Key: DRILL-7554
> URL: https://issues.apache.org/jira/browse/DRILL-7554
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>






[jira] [Created] (DRILL-7554) Convert LTSV Format Plugin to EVF

2020-01-26 Thread Charles Givre (Jira)
Charles Givre created DRILL-7554:


 Summary: Convert LTSV Format Plugin to EVF
 Key: DRILL-7554
 URL: https://issues.apache.org/jira/browse/DRILL-7554
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Text & CSV
Affects Versions: 1.17.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.18.0








[jira] [Created] (DRILL-7553) Modernize type management

2020-01-26 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7553:
--

 Summary: Modernize type management
 Key: DRILL-7553
 URL: https://issues.apache.org/jira/browse/DRILL-7553
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Paul Rogers


This is a roll-up issue for our ongoing discussion around improving and 
modernizing Drill's runtime type system. At present, Drill approaches types 
very differently from most other databases and query tools:

 * Drill does little (or no) plan-time type checking and propagation. Instead, 
all type management is done at execution time, in each reader, in each 
operator, and ultimately in the client.
 * Drill allows structured types (Map, Dict, Arrays), but does not have the 
extended SQL statements to fully utilize these types.
 * Drill supports varying types: two readers can both read column {{c}}, but can 
do so with different types. We've always hoped to discover some way to 
reconcile the types. But, at present, the functionality is buggy and 
incomplete. It is not clear that a viable solution exists. Drill also provides 
"formal" varying types: Union and List. These types are also not fully 
supported.

These three topics are closely related. "Schema-free" means we must infer types 
at read time, and so Drill cannot do plan-time type analysis of the kind done in 
other engines. Because of schema-on-read (which is what "schema-free" really 
means), two readers can read different types for the same field, and so we end 
up with varying or inconsistent types and are forced to find some way to 
manage the conflicts.
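The following is a minimal sketch of the conflict just described. It assumes each reader reports the schema it discovered as a simple column-name-to-type map. The `ColType` enum and the `findConflicts` helper are hypothetical illustrations, not Drill classes. The point is only that two files can legitimately disagree about column `c`, and something has to decide what happens next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SchemaConflicts {
  // Hypothetical, deliberately tiny type set for illustration.
  enum ColType { BIGINT, FLOAT8, VARCHAR }

  // Return one message per column that both readers saw with different types.
  static List<String> findConflicts(Map<String, ColType> readerA, Map<String, ColType> readerB) {
    List<String> conflicts = new ArrayList<>();
    for (Map.Entry<String, ColType> entry : readerA.entrySet()) {
      ColType other = readerB.get(entry.getKey());
      if (other != null && other != entry.getValue()) {
        conflicts.add(entry.getKey() + ": " + entry.getValue() + " vs " + other);
      }
    }
    return conflicts;
  }
}
```

Under the stricter rules proposed below, a non-empty result would be a fatal query error unless the user has registered a resolution rule for that column.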

The gist of the proposal explored in this ticket is to exploit the learning 
from other engines: to embrace types when available, and to impose tractable 
rules when types are discovered at run time.

h4. Proposal Summary

This is very much a discussion draft. Here are some suggestions to get started.

# Set as our goal to manage types at plan time. Runtime type discovery becomes 
a (limited) special case.
# Pull type resolution, propagation and checking into the planner where it can 
be done once per query. Move it out of execution where it must be done multiple 
times: once per operator per minor fragment. Implement the standard DB type 
checking and propagation rules. (These rules are currently implicitly 
implemented deep in the code gen code.)
# Generate operator code in the planner; send it to workers as part of the 
physical plan (to avoid the need to generate the code on each worker.)
# Provide schema-aware extensions for storage and format plugins so that they 
can advertise a schema when known. (Examples: Hive sources get schemas from 
HMS, JDBC sources get schemas from the underlying database, and Avro, Parquet and 
others obtain schemas from the target files.) This mechanism works with, 
but is in addition to, the Drill metastore. 
# Separate the concepts of "schema-free" (no plan-time schema) from 
"schema-on-read" (schema is known in the planner, and data is read into that 
schema by readers; e.g. the Hive model.) Drill remains schema-on-read (for 
sources that need it), but does not attempt the impossible with schema-free 
(that is, we no longer read inconsistent data into a relational model and hope 
we can make it work.)
# For convenience, allow "schema-free" (no plan-time schema). The restriction 
is that all readers *must* produce the same schema. It is a fatal (to the query) 
error for an operator to receive batches with different schemas. (The reasons 
can be discussed separately.)
# Preserve the Map, Dict and Array types, but with tighter semantics: all 
elements must be of the same type.
# Replace the Union and List types with a new type: Java objects. Java objects 
can be anything and can vary from row to row. Java types are processed using 
UDFs (or Drill functions).
# All "extended" types (complex: Map, Dict and Array, or Java objects) must be 
reduced to primitive types in a top-level tuple if the client is ODBC (which 
cannot handle non-relational types.) The same is true if the destination is a 
simple sink such as CSV or JDBC.
# Provide a light-weight way to resolve schema ambiguities that are identified 
by the new, stricter type rules. The light-weight solution is either a file or 
some kind of simple Drill-managed registry akin to the plugin registry. Users 
can run a query, see if there are conflicting types, and, if so, add a 
resolution rule to the registry. The user then reruns the query with a clean 
result.
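The following sketch illustrates the "resolve once in the planner" idea. It computes the result type of `a + b` from operand types known at plan time, and it rejects an invalid combination before any fragment runs. The three-type numeric lattice and the `PlanTimeTypes` class are simplifying assumptions for this example, not Drill's actual type rules.

```java
class PlanTimeTypes {
  // Hypothetical type set; declaration order encodes numeric width (INT < BIGINT < FLOAT8).
  enum SqlType { INT, BIGINT, FLOAT8, VARCHAR }

  // Result type of `left + right`, resolved once during planning.
  static SqlType plusResultType(SqlType left, SqlType right) {
    if (left == SqlType.VARCHAR || right == SqlType.VARCHAR) {
      // Reject at plan time instead of discovering the problem in each minor fragment.
      throw new IllegalArgumentException("Cannot apply + to " + left + " and " + right);
    }
    // Numeric widening: the wider of the two operands wins.
    return left.ordinal() >= right.ordinal() ? left : right;
  }
}
```

Because the result type is fixed in the plan, every operator and minor fragment can simply trust it instead of re-deriving it from each incoming batch.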

In the past couple of years we have made progress in some of these areas. This 
ticket suggests we bring those threads together in a coherent strategy.

h4. Arrow/Java/Fixed Block/Something Else Storage

The ideas here are independent of choices we might make for our internal data 
representation format. The above design works equally well with either Drill or 
Arrow vectors, or with something else entirely

[jira] [Comment Edited] (DRILL-7551) Improve Error Reporting

2020-01-26 Thread Paul Rogers (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023322#comment-17023322
 ] 

Paul Rogers edited comment on DRILL-7551 at 1/27/20 1:05 AM:
-

Fixing errors has a number of dimensions:
 # Inconsistent use of exceptions at runtime. We have {{UserException}}, which 
provides some structure, but we also throw random other unchecked exceptions. 
{{UserException}}s do not, however, provide a mapping into SQL errors of the 
type understood by xDBC drivers.
 # Inconsistent error context. A low-level bit of code (a file open call, say) 
only knows that it failed, and that is what it tends to report ("IO Error 10"). 
At the next level up, the surrounding code might know a bit more ("Error 
reading HDFS:/foo/bar1234.parquet"). What we need is a bit of synthesis to say 
something like "Too many network timeouts reading block 17 from the 
bar1234.parquet file of the `foo` table stored in the HDFS system `sales`."
 # Errors are exceptions, and we are overly generous in showing every last bit 
of stack trace on the client, the server and so on. Even those of us who live 
in the code find that the few lines we care about (an NPE in such-and-such call 
stack) are lost in hundreds of lines that, frankly, I've never personally looked 
at.
 # The client API is a bit of a mess in error reporting: returning unchecked 
{{UserException}}s rather than a well-structured {{DrillException}} (say) 
designed for client use. (This is probably because the Drill client was a quick 
short-term solution based on Drill's internal Drillbit-to-Drillbit RPC.)
 # Catch errors as early as possible. Examples: plan-time type checking 
(eventually) and storage plugin validation in the UI (see comment below).
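For item 2, one possible approach is an exception that each layer enriches on the way up. The message shown to the user then combines what every level knew. This is a minimal sketch. `ContextualError`, `addContext` and `userMessage` are hypothetical names chosen for the example, not the existing {{UserException}} API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception that accumulates context as it propagates upward.
class ContextualError extends RuntimeException {
  private final List<String> context = new ArrayList<>();

  ContextualError(String message, Throwable cause) {
    super(message, cause);
  }

  // Each layer appends what it knows (file, table, storage system) and rethrows.
  ContextualError addContext(String name, String value) {
    context.add(name + ": " + value);
    return this;
  }

  // A single readable line for the user, with no stack trace attached.
  String userMessage() {
    if (context.isEmpty()) {
      return getMessage();
    }
    return getMessage() + " (" + String.join(", ", context) + ")";
  }
}
```

Developers can still reach the low-level cause (for example an `IOException`) through `getCause()`. The user sees only the synthesized line.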

In addition to the above execution-focused items, it would be good to look at 
the SQL parser/planner errors as well. I'm not sure that returning 20-30 lines of 
possible tokens is super-helpful when I make a SQL typo. It is probably fine to 
say "Didn't understand the SQL at line 10, position 3."

To clean up our error act, we must move forward on each of these fronts.
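For the parser/planner errors mentioned above, most of the work is turning an error offset into a position a person can find. The following is a minimal sketch. It assumes the parser reports a zero-based character offset into the query text. The `ParseErrorMessage` class is hypothetical.

```java
class ParseErrorMessage {
  // Convert a zero-based character offset into 1-based line and column numbers,
  // then build a one-line message instead of listing every expected token.
  static String describe(String sql, int offset) {
    int line = 1;
    int column = 1;
    for (int i = 0; i < offset && i < sql.length(); i++) {
      if (sql.charAt(i) == '\n') {
        line++;
        column = 1;
      } else {
        column++;
      }
    }
    return "Didn't understand the SQL at line " + line + ", position " + column + ".";
  }
}
```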

For my part, I've been chipping away at item 1: trying to convert all code to 
throw {{UserException}}. EVF provides an "error context" that helps (but does 
not solve) item 2. I've also made a pass on items 3 & 4, but have been hesitant 
to make any changes to the client API for fear of breaking the two JDBC drivers 
and our (currently unstaffed) C++ client.

It would be great to get some help. For example, how can we provide 
user-meaningful context in our errors (item 2)? How can we map errors into 
standard SQL error and warning codes (part of item 1)? Maybe someone can help 
us figure out how to achieve item 4 with minimal client impact. And, of course, 
once we set the pattern we want to use, everyone can help by improving each of 
the many places where we raise exceptions.
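On mapping errors to standard SQL codes, one option is a single table from internal error categories to SQLSTATE values, applied where the client-facing exception is built. The categories below are hypothetical. The codes are standard SQLSTATE classes, and `java.sql.SQLException(String reason, String SQLState)` is the stock JDBC constructor that carries them.

```java
import java.sql.SQLException;

class SqlStateMapping {
  // Hypothetical error categories; Drill's real error types differ.
  enum ErrorKind { PARSE, VALIDATION, DATA_CONVERSION, CONNECTION, UNSUPPORTED, INTERNAL }

  // Map each category to a standard SQLSTATE that xDBC clients understand.
  static String sqlState(ErrorKind kind) {
    switch (kind) {
      case PARSE:
      case VALIDATION:
        return "42000"; // syntax error or access rule violation
      case DATA_CONVERSION:
        return "22000"; // data exception (e.g. an invalid cast)
      case CONNECTION:
        return "08000"; // connection exception
      case UNSUPPORTED:
        return "0A000"; // feature not supported
      default:
        return "HY000"; // general error (ODBC / SQL CLI)
    }
  }

  // Build the client-facing exception with the mapped code attached.
  static SQLException toSqlException(ErrorKind kind, String message) {
    return new SQLException(message, sqlState(kind));
  }
}
```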

Item 5 can be done independently of other tasks.



[jira] [Commented] (DRILL-7551) Improve Error Reporting

2020-01-26 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023943#comment-17023943
 ] 

Charles Givre commented on DRILL-7551:
--

[~Paul.Rogers]

One thing that might be worth doing is putting a syntax checker in the UI and 
disabling the 'submit' button if it encounters an error. 

> Improve Error Reporting
> ---
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> This Jira is to serve as a master Jira issue to improve the usability of 
> error messages. Instead of dumping stack traces, the overall goal is to give 
> the user something that can actually explain:
>  # What went wrong
>  # How to fix it
> Work that relates to this should be created as subtasks.





[jira] [Commented] (DRILL-7551) Improve Error Reporting

2020-01-26 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023940#comment-17023940
 ] 

Charles Givre commented on DRILL-7551:
--

[~arina] I created DRILL-7552: Add Helpful Error Message on Storage Plugin 
Creation/Update and linked it as a sub-task.

> Improve Error Reporting
> ---
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> This Jira is to serve as a master Jira issue to improve the usability of 
> error messages. Instead of dumping stack traces, the overall goal is to give 
> the user something that can actually explain:
>  # What went wrong
>  # How to fix it
> Work that relates to this should be created as subtasks.





[jira] [Updated] (DRILL-7552) Add Helpful Error Message on Storage Plugin Creation/Update

2020-01-26 Thread Charles Givre (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-7552:
-
Parent: DRILL-7551
Issue Type: Sub-task  (was: Bug)

> Add Helpful Error Message on Storage Plugin Creation/Update
> ---
>
> Key: DRILL-7552
> URL: https://issues.apache.org/jira/browse/DRILL-7552
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
>  Labels: error_message_improvement
> Attachments: image-2020-01-26-16-47-46-398.png
>
>
> If you are attempting to create or update a storage plugin and for whatever 
> reason an error occurs, the only error message that is displayed in the GUI 
> is 
> {code:java}
> Please retry: Error (unable to parse JSON)
> {code}
> This is unhelpful because the user may have entered valid JSON but 
> specified an invalid option. The error gives no indication of what 
> actually went wrong or how to fix it.
> See example below:
> !image-2020-01-26-16-47-46-398.png!
> In this example, the cause of the error is the final option isMysql: false, 
> which does not exist as a configuration option for the JDBC plugin.   





[jira] [Created] (DRILL-7552) Add Helpful Error Message on Storage Plugin Creation/Update

2020-01-26 Thread Charles Givre (Jira)
Charles Givre created DRILL-7552:


 Summary: Add Helpful Error Message on Storage Plugin 
Creation/Update
 Key: DRILL-7552
 URL: https://issues.apache.org/jira/browse/DRILL-7552
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.17.0
Reporter: Charles Givre
 Attachments: image-2020-01-26-16-47-46-398.png

If you are attempting to create or update a storage plugin and for whatever 
reason an error occurs, the only error message that is displayed in the GUI is 
{code:java}
Please retry: Error (unable to parse JSON)
{code}
This is unhelpful because the user may have entered valid JSON but 
specified an invalid option. The error gives no indication of what actually 
went wrong or how to fix it.

See example below:

!image-2020-01-26-16-47-46-398.png!

In this example, the cause of the error is the final option isMysql: false, 
which does not exist as a configuration option for the JDBC plugin.   
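A check along these lines could let the UI name the offending key instead of reporting a JSON parse failure. This is a minimal sketch. It assumes the submitted JSON has already been parsed into a map and that the plugin can list its valid option names. The `PluginConfigCheck` class and the option names in the example are hypothetical, not the JDBC plugin's real configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PluginConfigCheck {
  // Return one user-facing message per key the plugin does not recognize.
  static List<String> unknownOptions(Map<String, Object> submitted, Set<String> knownOptions) {
    List<String> errors = new ArrayList<>();
    for (String key : submitted.keySet()) {
      if (!knownOptions.contains(key)) {
        // Sort the valid names so the message is stable and easy to scan.
        errors.add("Unknown option '" + key + "'. Valid options are: " + new TreeSet<>(knownOptions));
      }
    }
    return errors;
  }
}
```

With a config like the one in the screenshot, the message would point directly at `isMysql` rather than at the JSON as a whole.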





[jira] [Commented] (DRILL-7551) Improve Error Reporting

2020-01-26 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023857#comment-17023857
 ] 

Arina Ielchiieva commented on DRILL-7551:
-

[~cgivre] could you please create a sub-task and provide steps to reproduce or 
screenshots showing what the problems are? This would definitely help developers 
see exactly what needs to be done.

> Improve Error Reporting
> ---
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> This Jira is to serve as a master Jira issue to improve the usability of 
> error messages. Instead of dumping stack traces, the overall goal is to give 
> the user something that can actually explain:
>  # What went wrong
>  # How to fix it
> Work that relates to this should be created as subtasks.


