[jira] [Updated] (DRILL-7535) Convert Ltsv to EVF
[ https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7535:
    Summary: Convert Ltsv to EVF (was: Convert ltsv to EVF)

> Convert Ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.18.0

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7554) Convert LTSV Format Plugin to EVF
[ https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024134#comment-17024134 ]

Arina Ielchiieva commented on DRILL-7554:
-

[~cgivre] I have created a master Jira to track EVF format conversions: https://issues.apache.org/jira/browse/DRILL-7531. Please do not create duplicates; use the existing sub-tasks instead. Thanks.

> Convert LTSV Format Plugin to EVF
> -
>
> Key: DRILL-7554
> URL: https://issues.apache.org/jira/browse/DRILL-7554
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.18.0

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7535) Convert ltsv to EVF
[ https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7535:
    Fix Version/s: 1.18.0

> Convert ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Priority: Major
> Fix For: 1.18.0

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (DRILL-7535) Convert ltsv to EVF
[ https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva reassigned DRILL-7535:
---
    Assignee: Arina Ielchiieva

> Convert ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Fix For: 1.18.0

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (DRILL-7535) Convert ltsv to EVF
[ https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva reassigned DRILL-7535:
---
    Assignee: Charles Givre (was: Arina Ielchiieva)

> Convert ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.18.0

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7535) Convert ltsv to EVF
[ https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7535:
    Summary: Convert ltsv to EVF (was: Convert Lstv to EVF)

> Convert ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7554) Convert LTSV Format Plugin to EVF
[ https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024060#comment-17024060 ]

ASF GitHub Bot commented on DRILL-7554:
---

cgivre commented on pull request #1962: DRILL-7554: Convert LTSV Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1962

# [DRILL-7554](https://issues.apache.org/jira/browse/DRILL-7554): Convert LTSV Format Plugin to EVF

## Description

This PR converts the existing LTSV Format Plugin to EVF. It also changes the traditional structure of format plugins. Instead of having a minimum of three files (the `XXXFormatPlugin`, `XXXFormatPluginConfig`, and `XXXFormatBatchReader`), this plugin introduces a new abstract class, `EasyEVFBatchReader`, which the `XXXBatchReader` extends. The proposed pattern moves most of the code that is frequently duplicated in every format plugin into this class, so that instead of implementing a BatchReader, new format plugins can be created simply by extending the `EasyEVFBatchReader` class and implementing a regular iterator to read through the data and perform the column mappings.

This PR is the first in a series of format plugin conversions to EVF, so the `EasyEVFBatchReader` should not be considered a final work, but the basis for a cleaner API for format plugins. I still need to add schema definition methods, but will do so with format plugins with known schemata.

## Documentation

No user-visible changes. LTSV is already documented in both `README.md` and the Drill web site.

## Testing

As a part of this PR, I updated all unit tests to use the RowSet framework. I also added unit tests for:
- Serialization/Deserialization
- Compressed Files

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Convert LTSV Format Plugin to EVF
> -
>
> Key: DRILL-7554
> URL: https://issues.apache.org/jira/browse/DRILL-7554
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.18.0

--
This message was sent by Atlassian Jira (v8.3.4#803005)
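
The "implement a regular iterator" pattern described in the PR might look roughly like the sketch below. It is only an illustration: `LtsvSketch`, `parseLine` and `LtsvRowIterator` are hypothetical names, none of this is the actual `EasyEVFBatchReader` API (which is not shown in this thread), and the only fact it relies on is the LTSV layout itself, where each line holds tab-separated `label:value` fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: none of these classes are part of Drill.
class LtsvSketch {

  /** Splits one LTSV line ("label:value" fields separated by tabs) into an ordered map. */
  static Map<String, String> parseLine(String line) {
    Map<String, String> row = new LinkedHashMap<>();
    for (String field : line.split("\t")) {
      int colon = field.indexOf(':'); // only the first colon separates label from value
      if (colon <= 0) {
        throw new IllegalArgumentException("Invalid LTSV field: '" + field + "'");
      }
      row.put(field.substring(0, colon), field.substring(colon + 1));
    }
    return row;
  }

  /** Yields one parsed row per non-blank line of the underlying reader. */
  static final class LtsvRowIterator implements Iterator<Map<String, String>> {
    private final BufferedReader reader;
    private String nextLine;

    LtsvRowIterator(BufferedReader reader) {
      this.reader = reader;
      advance();
    }

    /** Reads ahead to the next non-blank line, or sets nextLine to null at end of input. */
    private void advance() {
      try {
        do {
          nextLine = reader.readLine();
        } while (nextLine != null && nextLine.trim().isEmpty());
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    @Override
    public boolean hasNext() {
      return nextLine != null;
    }

    @Override
    public Map<String, String> next() {
      if (nextLine == null) {
        throw new NoSuchElementException();
      }
      Map<String, String> row = parseLine(nextLine);
      advance();
      return row;
    }
  }

  public static void main(String[] args) {
    String data = "host:127.0.0.1\tstatus:200\n\nhost:10.0.0.1\tstatus:404\n";
    Iterator<Map<String, String>> rows =
        new LtsvRowIterator(new BufferedReader(new StringReader(data)));
    while (rows.hasNext()) {
      System.out.println(rows.next());
    }
  }
}
```

In a real plugin the iterator would presumably write each field into a column writer rather than a map; the map is used here only to keep the sketch self-contained.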
[jira] [Created] (DRILL-7554) Convert LTSV Format Plugin to EVF
Charles Givre created DRILL-7554:

Summary: Convert LTSV Format Plugin to EVF
Key: DRILL-7554
URL: https://issues.apache.org/jira/browse/DRILL-7554
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Text & CSV
Affects Versions: 1.17.0
Reporter: Charles Givre
Assignee: Charles Givre
Fix For: 1.18.0

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (DRILL-7553) Modernize type management
Paul Rogers created DRILL-7553:
--

Summary: Modernize type management
Key: DRILL-7553
URL: https://issues.apache.org/jira/browse/DRILL-7553
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Paul Rogers

This is a roll-up issue for our ongoing discussion around improving and modernizing Drill's runtime type system. At present, Drill approaches types vastly differently than most other DB and query tools:

* Drill does little (or no) plan-time type checking and propagation. Instead, all type management is done at execution time, in each reader, in each operator, and ultimately in the client.
* Drill allows structured types (Map, Dict, Arrays), but does not have the extended SQL statements to fully utilize these types.
* Drill supports varying types: two readers can both read column {{c}}, but can do so with different types. We've always hoped to discover some way to reconcile the types. But, at present, the functionality is buggy and incomplete. It is not clear that a viable solution exists. Drill also provides "formal" varying types: Union and List. These types are also not fully supported.

These three topics are closely related. "Schema-free" means we must infer types at read time and so Drill cannot do plan-time type analysis of the kind done in other engines. Because of schema-on-read (which is what "schema-free" really means), two readers can read different types for the same fields, and so we end up with varying or inconsistent types, and are forced to figure out some way to manage the conflicts.

The gist of the proposal explored in this ticket is to exploit the learning from other engines: to embrace types when available, and to impose tractable rules when types are discovered at run time.

h4. Proposal Summary

This is very much a discussion draft. Here are some suggestions to get started.

# Set as our goal to manage types at plan time. Runtime type discovery becomes a (limited) special case.
# Pull type resolution, propagation and checking into the planner where it can be done once per query. Move it out of execution where it must be done multiple times: once per operator per minor fragment. Implement the standard DB type checking and propagation rules. (These rules are currently implicitly implemented deep in the code gen code.)
# Generate operator code in the planner; send it to workers as part of the physical plan (to avoid the need to generate the code on each worker.)
# Provide schema-aware extensions for storage and format plugins so that they can advertise a schema when known. (Examples: Hive sources get schemas from HMS, JDBC sources get schema from the underlying database, Avro, Parquet and others obtain schema from the target files, etc.) This mechanism works with, but is in addition to, the Drill metastore.
# Separate the concepts of "schema-free" (no plan-time schema) from "schema-on-read" (schema is known in the planner, and data is read into that schema by readers; e.g. the Hive model.) Drill remains schema-on-read (for sources that need it), but does not attempt the impossible with schema-free (that is, we no longer read inconsistent data into a relational model and hope we can make it work.)
# For convenience, allow "schema-free" (no plan-time schema). The restriction is that all readers *must* produce the same schema. It is a fatal (to the query) error for an operator to receive batches with different schemas. (The reasons can be discussed separately.)
# Preserve the Map, Dict and Array types, but with tighter semantics: all elements must be of the same type.
# Replace the Union and List types with a new type: Java objects. Java objects can be anything and can vary from row-to-row. Java types are processed using UDFs (or Drill functions.)
# All "extended" types (complex: Map, Dict and Array, or Java objects) must be reduced to primitive types in a top-level tuple if the client is ODBC (which cannot handle non-relational types.) The same is true if the destination is a simple sink such as CSV or JDBC.
# Provide a light-weight way to resolve schema ambiguities that are identified by the new, stricter type rules. The light-weight solution is either a file or some kind of simple Drill-managed registry akin to the plugin registry. Users can run a query, see if there are conflicting types, and, if so, add a resolution rule to the registry. The user then reruns the query with a clean result.

In the past couple of years we have made progress in some of these areas. This ticket suggests we bring those threads together in a coherent strategy.

h4. Arrow/Java/Fixed Block/Something Else Storage

The ideas here are independent of choices we might make for our internal data representation format. The above design works equally well with either Drill or Arrow vectors, or with something else entir
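
The proposed restriction that all readers must produce the same schema, with a mismatch being fatal to the query, could be sketched as below. This is a minimal illustration under stated assumptions: `SchemaGuard` and `SchemaMismatchException` are hypothetical names, a schema is modelled as a plain map from column name to type name, and nothing here reflects how Drill's operators actually represent or compare schemas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; these classes are illustrative and are not Drill APIs.
class SchemaCheckSketch {

  /** Signals that a batch arrived with a schema different from the first one seen. */
  static final class SchemaMismatchException extends RuntimeException {
    SchemaMismatchException(String message) {
      super(message);
    }
  }

  /** Remembers the first schema (column name -> type name) and rejects any different one. */
  static final class SchemaGuard {
    private Map<String, String> expected;

    void accept(Map<String, String> batchSchema) {
      if (expected == null) {
        expected = new LinkedHashMap<>(batchSchema); // the first batch defines the schema
        return;
      }
      // Map equality compares column names and types, ignoring column order.
      if (!expected.equals(batchSchema)) {
        throw new SchemaMismatchException(
            "Schema changed: expected " + expected + " but received " + batchSchema);
      }
    }
  }

  public static void main(String[] args) {
    SchemaGuard guard = new SchemaGuard();

    Map<String, String> first = new LinkedHashMap<>();
    first.put("c", "INT");
    guard.accept(first);

    // A second reader produces column c with a different type.
    Map<String, String> second = new LinkedHashMap<>();
    second.put("c", "VARCHAR");
    try {
      guard.accept(second);
    } catch (SchemaMismatchException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The message names both schemas so that a user could, in the spirit of the last suggestion in the list, add a resolution rule for the conflicting column and rerun the query.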
[jira] [Comment Edited] (DRILL-7551) Improve Error Reporting
[ https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023322#comment-17023322 ]

Paul Rogers edited comment on DRILL-7551 at 1/27/20 1:05 AM:
-

Fixing errors has a number of dimensions:

# Inconsistent use of exceptions at runtime. We have {{UserException}} which creates some structure, but we also throw random other unchecked exceptions. {{UserException}}s do not, however, provide a mapping into SQL errors of the type understood by xDBC drivers.
# Inconsistent error context. A low level bit of code (a file open call, say) only knows that it failed and that is what it tends to report: ("IO Error 10".) At the next level up, the surrounding code might know a bit more. ("Error reading HDFS:/foo/bar1234.parquet".) What we need is a bit of synthesis to say, ("Too many network timeouts reading block 17 from the bar1234.parquet of the `foo` table stored in the HDFS system `sales`".)
# Errors are exceptions and we are overly generous in showing every last bit of stack trace on the client, the server and so on. Even those of us who live in the code find that the few lines we care about (NPE in such-and-such call stack) are lost in hundreds of lines that, frankly, I've never personally looked at.
# The client API is a bit of a mess in error reporting: returning unchecked {{UserException}}s rather than a well-structured {{DrillException}} (say) designed for client use. (This is probably because the Drill client was a quick short-term solution based on Drill's internal Drillbit-to-Drillbit RPC.)
# Catch errors as early as possible. Example: plan-time type checking (eventually), storage plugin validation in the UI (see comment below.)

In addition to the above execution-focused items, it would be good to look at the SQL parser/planner errors as well. Not sure that returning 20-30 lines of possible tokens is super-helpful when I make a SQL typo. Probably fine to say, "Didn't understand the SQL at line 10, position 3."

To clean up our error act, we must move forward on each of these fronts. For my part, I've been chipping away at item 1: trying to convert all code to throw {{UserException}}. EVF provides an "error context" that helps (but does not solve) item 2. I've also made a pass on items 3 & 4, but have been hesitant to make any changes to the client API for fear of breaking the two JDBC drivers and our (currently unstaffed) C++ client.

Would be great to get some help. For example, how can we provide user-meaningful context in our errors (item 2)? How can we map errors into standard SQL error and warning codes (part of item 1)? Maybe someone can help us figure out how to achieve item 4 with minimal client impact. And, of course, once we set the pattern we want to use, everyone can help by improving each of the many places where we raise exceptions.

Item 5 can be done independently of other tasks.
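
The "synthesis" asked for in item 2 could be approached by letting each layer add what it knows to the same exception as it propagates upward. The sketch below is one way to picture that, not how Drill does it: `ContextualException`, `addContext` and `userMessage` are hypothetical names chosen for illustration, and the file, table and storage values are simply the ones used in the comment above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; ContextualException is not a Drill class.
class ErrorContextSketch {

  /** An exception that collects key/value context from every layer it passes through. */
  static final class ContextualException extends RuntimeException {
    private final List<String> context = new ArrayList<>();

    ContextualException(String message, Throwable cause) {
      super(message, cause);
    }

    ContextualException addContext(String key, Object value) {
      context.add(key + ": " + value);
      return this;
    }

    /** Builds the user-facing text: the message plus one indented line per context entry. */
    String userMessage() {
      StringBuilder sb = new StringBuilder(getMessage());
      for (String entry : context) {
        sb.append("\n  ").append(entry);
      }
      return sb.toString();
    }
  }

  // Lowest level: only knows that a block read failed.
  static void readBlock(int block) {
    throw new ContextualException("Too many network timeouts", new IOException("IO Error 10"))
        .addContext("block", block);
  }

  // Next level up: knows which file was being read.
  static void readFile(String file) {
    try {
      readBlock(17);
    } catch (ContextualException e) {
      throw e.addContext("file", file);
    }
  }

  // Top level: knows the table and the storage system.
  static void scanTable(String table, String storage) {
    try {
      readFile("bar1234.parquet");
    } catch (ContextualException e) {
      throw e.addContext("table", table).addContext("storage", storage);
    }
  }

  public static void main(String[] args) {
    try {
      scanTable("foo", "sales");
    } catch (ContextualException e) {
      System.out.println(e.userMessage()); // no stack trace, only message and context
    }
  }
}
```

Printing only `userMessage()` to the client while logging the full stack trace on the server would also address the stack-trace noise described in item 3.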
[jira] [Commented] (DRILL-7551) Improve Error Reporting
[ https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023943#comment-17023943 ]

Charles Givre commented on DRILL-7551:
--

[~Paul.Rogers] One thing that might be worth doing is putting a syntax checker in the UI and disabling the 'submit' button if it encounters an error.

> Improve Error Reporting
> ---
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.18.0
>
> This Jira is to serve as a master Jira issue to improve the usability of error messages. Instead of dumping stack traces, the overall goal is to give the user something that can actually explain:
> # What went wrong
> # How to fix
> Work that relates to this, should be created as subtasks.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7551) Improve Error Reporting
[ https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023940#comment-17023940 ]

Charles Givre commented on DRILL-7551:
--

[~arina] I created DRILL-7552: Add Helpful Error Message on Storage Plugin Creation/Update and linked it as a sub task.

> Improve Error Reporting
> ---
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.18.0
>
> This Jira is to serve as a master Jira issue to improve the usability of error messages. Instead of dumping stack traces, the overall goal is to give the user something that can actually explain:
> # What went wrong
> # How to fix
> Work that relates to this, should be created as subtasks.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7552) Add Helpful Error Message on Storage Plugin Creation/Update
[ https://issues.apache.org/jira/browse/DRILL-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Givre updated DRILL-7552:
-
    Parent: DRILL-7551
    Issue Type: Sub-task (was: Bug)

> Add Helpful Error Message on Storage Plugin Creation/Update
> ---
>
> Key: DRILL-7552
> URL: https://issues.apache.org/jira/browse/DRILL-7552
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Storage - Other
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Labels: error_message_improvement
> Attachments: image-2020-01-26-16-47-46-398.png
>
> If you are attempting to create or update a storage plugin and an error occurs for whatever reason, the only error message displayed in the GUI is:
> {code:java}
> Please retry: Error (unable to parse JSON)
> {code}
> This is unhelpful because the user may have entered valid JSON but specified an invalid option. The error gives no indication of what actually went wrong or how to fix it.
> See example below:
> !image-2020-01-26-16-47-46-398.png!
> In this example, the cause of the error is the final option, isMysql: false, which does not exist as a configuration option for the JDBC plugin.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
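
A more helpful message for the case described in this issue could name the offending option and list the ones that are accepted. The sketch below shows the idea only: `findUnknownOptions` is a hypothetical helper, the configuration is assumed to be already parsed into a map, and the set of known option names is an illustrative assumption rather than the JDBC plugin's actual option list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the validation code used by Drill's web UI.
class PluginConfigCheckSketch {

  /** Returns one readable error per supplied option that is not in the known set. */
  static List<String> findUnknownOptions(Set<String> knownOptions, Map<String, ?> supplied) {
    List<String> errors = new ArrayList<>();
    for (String name : supplied.keySet()) {
      if (!knownOptions.contains(name)) {
        // TreeSet gives a stable, alphabetical listing of the accepted options.
        errors.add("Unknown option '" + name + "'. Supported options: "
            + new TreeSet<>(knownOptions));
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    // Illustrative option names only; assumed for the example.
    Set<String> known = Set.of("type", "driver", "url", "enabled");

    Map<String, Object> config = new LinkedHashMap<>();
    config.put("type", "jdbc");
    config.put("driver", "com.mysql.jdbc.Driver");
    config.put("url", "jdbc:mysql://localhost:3306");
    config.put("enabled", true);
    config.put("isMysql", false); // the invalid option from the issue description

    for (String error : findUnknownOptions(known, config)) {
      System.out.println(error);
    }
  }
}
```

The JSON is valid in this scenario, so a check of this kind would replace "unable to parse JSON" with a message that points at `isMysql` directly.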
[jira] [Created] (DRILL-7552) Add Helpful Error Message on Storage Plugin Creation/Update
Charles Givre created DRILL-7552:

Summary: Add Helpful Error Message on Storage Plugin Creation/Update
Key: DRILL-7552
URL: https://issues.apache.org/jira/browse/DRILL-7552
Project: Apache Drill
Issue Type: Bug
Components: Storage - Other
Affects Versions: 1.17.0
Reporter: Charles Givre
Attachments: image-2020-01-26-16-47-46-398.png

If you are attempting to create or update a storage plugin and an error occurs for whatever reason, the only error message displayed in the GUI is:
{code:java}
Please retry: Error (unable to parse JSON)
{code}
This is unhelpful because the user may have entered valid JSON but specified an invalid option. The error gives no indication of what actually went wrong or how to fix it.

See example below:

!image-2020-01-26-16-47-46-398.png!

In this example, the cause of the error is the final option, isMysql: false, which does not exist as a configuration option for the JDBC plugin.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7551) Improve Error Reporting
[ https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023857#comment-17023857 ]

Arina Ielchiieva commented on DRILL-7551:
-

[~cgivre] could you please create a sub-task and provide reproduction steps and screenshots showing what the problems are? This would definitely help developers see exactly what needs to be done.

> Improve Error Reporting
> ---
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.18.0
>
> This Jira is to serve as a master Jira issue to improve the usability of error messages. Instead of dumping stack traces, the overall goal is to give the user something that can actually explain:
> # What went wrong
> # How to fix
> Work that relates to this, should be created as subtasks.

--
This message was sent by Atlassian Jira (v8.3.4#803005)