[jira] [Updated] (DRILL-7751) Add Storage Plugin for Splunk

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7751:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Add Storage Plugin for Splunk
> -
>
> Key: DRILL-7751
> URL: https://issues.apache.org/jira/browse/DRILL-7751
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.19.0
>
>
> # Drill Connector for Splunk
> This plugin enables Drill to query Splunk. 
> ## Configuration
> To connect Drill to Splunk, create a new storage plugin with the following 
> configuration. Splunk uses port `8089` for its management interface, so this 
> port must be open for Drill to query Splunk.
> ```json
> {
>"type":"splunk",
>"username": "admin",
>"password": "changeme",
>"hostname": "localhost",
>"port": 8089,
>"earliestTime": "-14d",
>"latestTime": "now",
>"enabled": false
> }
> ```
> ## Understanding Splunk's Data Model
> Splunk's primary use case is analyzing event logs with a timestamp. As such, 
> data is indexed by the timestamp, with the most recent data being indexed 
> first. By default, Splunk will sort the data in reverse chronological order. 
> Large Splunk installations will put older data into buckets of hot, warm and 
> cold storage, with the "cold" storage on the slowest and cheapest disks.
>   
> With this understood, it is **very** important to put time boundaries on your 
> Splunk queries. The Drill plugin allows you to set default boundaries in the 
> configuration so that every query you run is bounded by them. Alternatively, 
> you can set the time boundaries at query time. In either case, you will 
> achieve the best performance when you ask Splunk for the smallest amount of 
> data possible.
>   
> ## Understanding Drill's Data Model with Splunk
> Drill treats Splunk indexes as tables. Splunk's access model does not 
> restrict access to the catalog, but does restrict access to the actual data. 
> It is therefore possible to see the names of indexes to which you do not have 
> access. You can view the list of available indexes with a `SHOW TABLES IN 
> splunk` query.
>   
> ```
> apache drill> SHOW TABLES IN splunk;
> +--------------+----------------+
> | TABLE_SCHEMA |   TABLE_NAME   |
> +--------------+----------------+
> | splunk       | summary        |
> | splunk       | splunklogger   |
> | splunk       | _thefishbucket |
> | splunk       | _audit         |
> | splunk       | _internal      |
> | splunk       | _introspection |
> | splunk       | main           |
> | splunk       | history        |
> | splunk       | _telemetry     |
> +--------------+----------------+
> 9 rows selected (0.304 seconds)
> ```
> To query Splunk from Drill, use the following format: 
> ```sql
> SELECT <fields>
> FROM splunk.<index>
> ```
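> For example, a minimal query against the `_audit` index from the listing 
> above (the index choice is illustrative):
> ```sql
> SELECT *
> FROM splunk._audit
> ```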
>   
> ## Bounding Your Queries
> When you learn to query Splunk via its interface, the first thing you learn 
> is to bound your queries so that they look at the shortest time span 
> possible. When using Drill to query Splunk, it is advisable to do the same 
> thing, and Drill offers two ways to accomplish this: via the configuration 
> and at query time.
>
> ### Bounding Your Queries at Query Time
> The easiest way to bound your query is to do so at query time via special 
> filters in the `WHERE` clause. There are two special fields, `earliestTime` 
> and `latestTime`, which can be set to bound the query. If they are not set, 
> the query will be bounded by the defaults set in the configuration.
>
> You can use any of the time formats specified in the Splunk documentation 
> here: 
> https://docs.splunk.com/Documentation/Splunk/8.0.3/SearchReference/SearchTimeModifiers
>   
> So if you wanted to see your data for the last 15 minutes, you could execute 
> the following query:
> ```sql
> SELECT <fields>
> FROM splunk.<index>
> WHERE earliestTime='-15m' AND latestTime='now'
> ```
> The variables set in a query override the defaults from the configuration. 
>   
> ## Data Types
> Splunk does not have sophisticated data types and unfortunately does not 
> provide metadata with its query results. With the exception of the fields 
> below, Drill will interpret all fields as `VARCHAR`, so you will have to 
> convert them to the appropriate data type at query time.
>   
> ### Timestamp Fields
>   * `_indextime`
>   * `_time` 
>   
> ### Numeric Fields
>   * `date_hour` 
>   * `date_mday`
>   * `date_minute`
>   * `date_second` 
>   * `date_year`
>   * `linecount`
>   
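> Since everything else arrives as `VARCHAR`, a cast at query time looks like 
> this (the `status` field is hypothetical):
> ```sql
> SELECT CAST(status AS INT) AS status_code
> FROM splunk.main
> WHERE earliestTime='-1h' AND latestTime='now'
> ```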
> ### Nested Data
>  Splunk has two different types of nested data which roughl

[jira] [Updated] (DRILL-7763) Add Limit Pushdown to File Based Storage Plugins

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7763:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Add Limit Pushdown to File Based Storage Plugins
> 
>
> Key: DRILL-7763
> URL: https://issues.apache.org/jira/browse/DRILL-7763
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.19.0
>
>
> As currently implemented, when querying a file, Drill will read the entire 
> file even if a limit is specified in the query.  This PR does a few things:
>  # Refactors the EasyGroupScan, EasySubScan, and EasyFormatConfig to allow 
> the option of pushing down limits.
>  # Applies this to all the EVF-based format plugins: LogRegex, PCAP, SPSS, 
> Esri, Excel, and Text (CSV). 
> Due to JSON's fluid schema, it would be unwise to adopt the limit pushdown as 
> it could result in very inconsistent schemata.
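> As an illustration of the intended behavior (the file path is hypothetical), 
> a query such as the following should now stop reading once 10 rows have been 
> produced:
> {code:sql}
> SELECT * FROM dfs.`/data/logs.csv` LIMIT 10
> {code}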



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7223) Make the timeout in TimedCallable a configurable boot time parameter

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7223:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Make the timeout in TimedCallable a configurable boot time parameter
> 
>
> Key: DRILL-7223
> URL: https://issues.apache.org/jira/browse/DRILL-7223
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Aman Sinha
>Assignee: Boaz Ben-Zvi
>Priority: Minor
> Fix For: 1.19.0
>
>
> The 
> [TimedCallable.TIMEOUT_PER_RUNNABLE_IN_MSECS|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java#L52]
>  is currently an internal Drill constant defined as 15 secs. It has been 
> there since day 1. Drill's TimedCallable implements the Java concurrency 
> Callable interface to create timed threads. It is used by the REFRESH 
> METADATA command, which creates multiple threads on the Foreman node to 
> gather Parquet metadata to build the metadata cache.
> Depending on the load on the system, or for a very large number of Parquet 
> files (millions), it is possible to exceed this timeout. While the exact root 
> cause of exceeding the timeout is being investigated, it makes sense to make 
> this timeout a configurable parameter to aid with large-scale testing. This 
> JIRA is to make this a configurable bootstrapping option in 
> drill-override.conf.
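> A sketch of what the boot option might look like in drill-override.conf; the 
> option name here is hypothetical, as it is not yet defined:
> {code}
> drill.exec: {
>   # Per-runnable timeout for TimedCallable, in milliseconds (currently a
>   # hard-coded 15 sec constant)
>   store.timed_callable.timeout_ms: 30000
> }
> {code}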



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6953) Merge row set-based JSON reader

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-6953:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Merge row set-based JSON reader
> ---
>
> Key: DRILL-6953
> URL: https://issues.apache.org/jira/browse/DRILL-6953
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.19.0
>
>
> The final step in the ongoing "result set loader" saga is to merge the 
> revised JSON reader into master. This reader does three key things:
> * Demonstrates the prototypical "late schema" style of data reading (discover 
> schema while reading).
> * Implements many tricks and hacks to handle schema changes while loading.
> * Shows that, even with all these tricks, the only true solution is to 
> actually have a schema.
> The new JSON reader:
> * Uses an expanded state machine when parsing rather than the complex set of 
> if-statements in the current version.
> * Handles reading a run of nulls before seeing the first data value (as long 
> as the data value shows up in the first record batch).
> * Uses the result-set loader to generate fixed-size batches regardless of the 
> complexity, depth of structure, or width of variable-length fields.
> While the JSON reader itself is helpful, the key contribution is that it 
> shows how to use the entire kit of parts: result set loader, projection 
> framework, and so on. Since the projection framework can handle an external 
> schema, it is also a handy foundation for the ongoing schema project.
> Key work to complete after this merger will be to reconcile actual data with 
> the external schema. For example, if we know a column is supposed to be a 
> VarChar, then read the column as a VarChar regardless of the type JSON itself 
> picks. Or, if a column is supposed to be a Double, then convert Int and 
> String JSON values into Doubles.
> The Row Set framework was designed to allow inserting custom column writers. 
> This would be a great opportunity to do the work needed to create them. Then, 
> use the new JSON framework to allow parsing a JSON field as a specified Drill 
> type.
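> For illustration, a JSON fragment with a run of nulls before the first data 
> value for {{score}}; the new reader handles this as long as the first value 
> appears within the first record batch:
> {code}
> {"id": 1, "score": null}
> {"id": 2, "score": null}
> {"id": 3, "score": 4.5}
> {code}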



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7728) Drill SPI framework

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7728:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Drill SPI framework
> ---
>
> Key: DRILL-7728
> URL: https://issues.apache.org/jira/browse/DRILL-7728
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.19.0
>
>
> Provide the basic framework to load an extension in Drill, modelled after the 
> Java Service Provider concept. Excludes full class loader isolation for now.
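> For reference, a minimal sketch of the Java Service Provider pattern the 
> framework is modelled after; the interface and class names are illustrative, 
> not Drill's actual SPI:
> {code:java}
> import java.util.ServiceLoader;
> 
> // Hypothetical extension point.
> interface DrillExtension {
>   String name();
> }
> 
> public class ExtensionLoader {
>   public static void main(String[] args) {
>     // Implementations are discovered via entries in
>     // META-INF/services/DrillExtension on the classpath.
>     ServiceLoader<DrillExtension> extensions = ServiceLoader.load(DrillExtension.class);
>     for (DrillExtension ext : extensions) {
>       System.out.println("Loaded extension: " + ext.name());
>     }
>   }
> }
> {code}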



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7554) Convert LTSV Format Plugin to EVF

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7554:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Convert LTSV Format Plugin to EVF
> -
>
> Key: DRILL-7554
> URL: https://issues.apache.org/jira/browse/DRILL-7554
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7535) Convert Ltsv to EVF

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7535:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Convert Ltsv to EVF
> ---
>
> Key: DRILL-7535
> URL: https://issues.apache.org/jira/browse/DRILL-7535
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7729) Use java.time in column accessors

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7729:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Use java.time in column accessors
> -
>
> Key: DRILL-7729
> URL: https://issues.apache.org/jira/browse/DRILL-7729
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.19.0
>
>
> Use {{java.time}} classes in the column accessors, except for {{Interval}}, 
> which has no {{java.time}} equivalent. Doing so allows us to create a row-set 
> version of Drill's JSON writer.
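> For illustration, plausible java.time mappings for Drill's date/time types 
> (assumed here for the sketch; the ticket does not spell out the exact 
> classes):
> {code:java}
> import java.time.Instant;
> import java.time.LocalDate;
> import java.time.LocalTime;
> 
> public class TimeMapping {
>   public static void main(String[] args) {
>     LocalDate date = LocalDate.ofEpochDay(18_500);            // DATE: days since epoch
>     LocalTime time = LocalTime.ofNanoOfDay(45_000_000_000L);  // TIME: offset within a day
>     Instant ts = Instant.ofEpochMilli(1_600_000_000_000L);    // TIMESTAMP: epoch millis
>     System.out.println(date + " " + time + " " + ts);
>   }
> }
> {code}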



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-4232) Support for EXCEPT set operator

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-4232:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Support for EXCEPT set operator
> ---
>
> Key: DRILL-4232
> URL: https://issues.apache.org/jira/browse/DRILL-4232
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Victoria Markman
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.19.0
>
>
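> The ticket has no description; for reference, EXCEPT returns the rows of the 
> first query that do not appear in the second (table names illustrative):
> {code:sql}
> SELECT id FROM dfs.`t1.json`
> EXCEPT
> SELECT id FROM dfs.`t2.json`
> {code}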




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7112) Code Cleanup for HTTPD Format Plugin

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7112:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Code Cleanup for HTTPD Format Plugin
> 
>
> Key: DRILL-7112
> URL: https://issues.apache.org/jira/browse/DRILL-7112
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.15.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
> Fix For: 1.19.0
>
>
> Address code clean up issues cited in 
> https://github.com/apache/drill/pull/1635.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7733) Use streaming for REST JSON queries

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7733:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Use streaming for REST JSON queries
> ---
>
> Key: DRILL-7733
> URL: https://issues.apache.org/jira/browse/DRILL-7733
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.19.0
>
>
> Several users on the user and dev mail lists have complained about the memory 
> overhead when running a REST JSON query: {{http://node:8047/query.json}}. 
> The current implementation buffers the entire result set in memory, then lets 
> Jersey/Jetty convert the results to JSON. The result is very heavy heap use 
> for larger query result sets.
> This ticket requests a change to use streaming. As each batch arrives at the 
> Screen operator, convert that batch to JSON and directly stream the results 
> to the client network connection, much as is done for the native client 
> connection.
> For backward compatibility, the form of the JSON must be the same as the 
> current API.
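> A minimal sketch of the streaming idea using Jackson's streaming API; the 
> class and method names are illustrative, not Drill's actual REST code:
> {code:java}
> import com.fasterxml.jackson.core.JsonGenerator;
> import java.io.IOException;
> 
> public class BatchStreamer {
>   // Write one batch of rows straight to the response stream instead of
>   // buffering the whole result set on the heap.
>   static void writeBatch(JsonGenerator gen, String[] cols, String[][] rows)
>       throws IOException {
>     for (String[] row : rows) {
>       gen.writeStartObject();
>       for (int i = 0; i < cols.length; i++) {
>         gen.writeStringField(cols[i], row[i]);
>       }
>       gen.writeEndObject();
>     }
>     gen.flush(); // release this batch to the network before the next arrives
>   }
> }
> {code}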



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7458) Base storage plugin framework

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7458:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Base storage plugin framework
> -
>
> Key: DRILL-7458
> URL: https://issues.apache.org/jira/browse/DRILL-7458
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.19.0
>
>
> The "Easy" framework allows third-parties to add format plugins to Drill with 
> moderate effort. (The process could be easier, but "Easy" makes it as simple 
> as possible given the current structure.)
> At present, no such "starter" framework exists for storage plugins. Further, 
> multiple storage plugins have implemented filter push down, seemingly by 
> copying large blocks of code.
> This ticket offers a "base" framework for storage plugins and for filter 
> push-downs. The framework builds on the EVF, allowing plugins to also support 
> project push down.
> The framework has a "test mule" storage plugin to verify functionality, and 
> was used as the basis of a REST-like plugin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7558:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Generalize filter push-down planner phase
> -
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.19.0
>
>
> DRILL-7458 provides a base framework for storage plugins, including a 
> simplified filter push-down mechanism. [~volodymyr] notes that it may be 
> *too* simple:
> {quote}
> What about the case when this rule was applied for one filter, but planner at 
> some point pushed another filter above the scan, for example, if we have such 
> case:
> {code}
> Filter(a=2)
>   Join(t1.b=t2.b, type=inner)
>     Filter(b=3)
>       Scan(t1)
>     Scan(t2)
> {code}
> Filter b=3 will be pushed into the scan, and the planner will then push the 
> remaining filter below the join, above the scan:
> {code}
> Join(t1.b=t2.b, type=inner)
>   Filter(a=2)
>     Scan(t1, b=3)
>   Scan(t2)
> {code}
> In this case, checking whether the filter was pushed is not enough.
> {quote}
> Drill divides planning into a number of *phases*, each defined by a set of 
> *rules*. Most storage plugins perform filter push-down during the physical 
> planning stage. However, by this point, Drill has already decided on the 
> degree of parallelism: it is too late to use filter push-down to set the 
> degree of parallelism. Yet, if using something like a REST API, we want to 
> use filters to help us shard the query (that is, to set the degree of 
> parallelism.)
>  
> DRILL-7458 performs filter push-down at *logical* planning time to work 
> around the above limitation. (In Drill, there are three different phases that 
> could be considered the logical phase, depending on which planning options 
> are set to control Calcite.)
> [~volodymyr] points out that the logical plan phase may be wrong because it 
> will perform rewrites of the type he cited.
> Thus, we need to research where to insert filter push down. It must come:
> * After rewrites of the kind described above.
> * After join equivalence computations. (See DRILL-7556.)
> * Before the decision is made about the number of minor fragments.
> The goal of this ticket is to either:
> * Research to identify an existing phase which satisfies these requirements, 
> or
> * Create a new phase.
> Due to the way Calcite works, it is not a good idea to have a single phase 
> handle two tasks that depend on one another. That is, we cannot combine 
> filter push-down with the phase that defines the filters, nor can we add 
> filter push-down to the phase that chooses parallelism.
> Background: Calcite is a rule-based query planner inspired by 
> [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf].
> The above issue is a flaw with rule-based planners and was identified as 
> early as the [Cascades query framework 
> paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf]
>  which was the follow-up to Volcano.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7550) Add Storage Plugin for Cassandra

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7550:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Add Storage Plugin for Cassandra
> 
>
> Key: DRILL-7550
> URL: https://issues.apache.org/jira/browse/DRILL-7550
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.18.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.19.0
>
>
> Apache Cassandra is a free and open-source, distributed, wide column store, 
> NoSQL database management system designed to handle large amounts of data 
> across many commodity servers, providing high availability with no single 
> point of failure. [1]
> This PR would enable Drill to query Cassandra data stores.
>  
> [1]: https://en.wikipedia.org/wiki/Apache_Cassandra



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7551) Improve Error Reporting

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7551:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Improve Error Reporting
> ---
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.19.0
>
>
> This Jira is to serve as a master Jira issue to improve the usability of 
> error messages. Instead of dumping stack traces, the overall goal is to give 
> the user something that can actually explain:
>  # What went wrong
>  # How to fix it
> Work that relates to this should be created as subtasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7712) Fix issues after ZK upgrade

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7712:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Fix issues after ZK upgrade
> ---
>
> Key: DRILL-7712
> URL: https://issues.apache.org/jira/browse/DRILL-7712
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.18.0
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.19.0
>
>
> Warnings during jdbc-all build (absent when building with Mapr profile):
> {noformat}
> netty-transport-native-epoll-4.1.45.Final.jar, 
> netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 46 
> overlapping classes: 
>   - io.netty.channel.epoll.AbstractEpollStreamChannel$2
>   - io.netty.channel.epoll.AbstractEpollServerChannel$EpollServerSocketUnsafe
>   - io.netty.channel.epoll.EpollDatagramChannel
>   - io.netty.channel.epoll.AbstractEpollStreamChannel$SpliceInChannelTask
>   - io.netty.channel.epoll.NativeDatagramPacketArray
>   - io.netty.channel.epoll.EpollSocketChannelConfig
>   - io.netty.channel.epoll.EpollTcpInfo
>   - io.netty.channel.epoll.EpollEventArray
>   - io.netty.channel.epoll.EpollEventLoop
>   - io.netty.channel.epoll.EpollSocketChannel
>   - 36 more...
> netty-transport-native-unix-common-4.1.45.Final.jar, 
> netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 15 
> overlapping classes: 
>   - io.netty.channel.unix.Errors$NativeConnectException
>   - io.netty.channel.unix.ServerDomainSocketChannel
>   - io.netty.channel.unix.DomainSocketAddress
>   - io.netty.channel.unix.Socket
>   - io.netty.channel.unix.NativeInetAddress
>   - io.netty.channel.unix.DomainSocketChannelConfig
>   - io.netty.channel.unix.Errors$NativeIoException
>   - io.netty.channel.unix.DomainSocketReadMode
>   - io.netty.channel.unix.ErrorsStaticallyReferencedJniMethods
>   - io.netty.channel.unix.UnixChannel
>   - 5 more...
> maven-shade-plugin has detected that some class files are
> present in two or more JARs. When this happens, only one
> single version of the class is copied to the uber jar.
> Usually this is not harmful and you can skip these warnings,
> otherwise try to manually exclude artifacts based on
> mvn dependency:tree -Ddetail=true and the above output.
> See http://maven.apache.org/plugins/maven-shade-plugin/
> {noformat}
> Additional warning when building with the Mapr profile:
> {noformat}
> The following patterns were never triggered in this artifact inclusion filter:
> o  'org.apache.zookeeper:zookeeper-jute'
> {noformat}
> NPEs in tests (though tests do not fail):
> {noformat}
> [INFO] Running org.apache.drill.exec.coord.zk.TestZookeeperClient
> 4880
> java.lang.NullPointerException
> 4881
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
> 4882
>   at 
> org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
> 4883
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
> 4884
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
> 4885
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:
> {noformat}
> {noformat}
> [INFO] Running org.apache.drill.exec.coord.zk.TestEphemeralStore
> 5278
> java.lang.NullPointerException
> 5279
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
> 5280
>   at org.apache.zookeepe
> {noformat}
> {noformat}
> [INFO] Running org.apache.drill.yarn.zk.TestAmRegistration
> 6767
> java.lang.NullPointerException
> 6768
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
> 6769
>   at 
> org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
> 6770
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
> 6771
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
> 6772
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:929)
> 6773
>   at org.apache.curator.t
> {noformat}
> {noformat}
> org.apache.drill.yarn.client.TestCommandLineOptions
> 6823
> java.lang.NullPointerException
> 6824
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
> 6825
>   at 
> org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
> 6826
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
> 6827
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
> 6828
>   

[jira] [Updated] (DRILL-7366) Improve Null Handling for UDFs with Complex Output

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7366:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Improve Null Handling for UDFs with Complex Output
> --
>
> Key: DRILL-7366
> URL: https://issues.apache.org/jira/browse/DRILL-7366
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.19.0
>
>
> If there is a UDF which has a complex field (Map or List) as output, Drill 
> does not allow the UDF to have nullable input, and this creates additional 
> complexity when writing these kinds of UDFs. 
> I therefore would like to propose that two options be added to the 
> FunctionTemplate for null handling: {{EMPTY_LIST_IF_NULL}} and 
> {{EMPTY_MAP_IF_NULL}}, which would simplify UDF creation. I'm envisioning 
> that if either of these options were selected and the UDF receives any null 
> value as input, the UDF will return either an empty map or list. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7597) Read selected JSON columns as JSON text

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7597:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Read selected JSON columns as JSON text
> --
>
> Key: DRILL-7597
> URL: https://issues.apache.org/jira/browse/DRILL-7597
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.19.0
>
>
> See DRILL-7598. The use case wishes to read selected JSON columns as JSON 
> text rather than parsing the JSON into a relational structure as is done 
> today in the JSON reader.
> The JSON reader supports "all text mode", but, despite the name, this mode 
> only works for scalars (primitives) such as numbers. It does not work for 
> structured types such as objects or arrays: such types are always parsed into 
> Drill structures (which causes the conflict described in DRILL-7598).
> Instead, we need a feature to read an entire JSON value, including structure, 
> as a JSON string.
> This feature would work best when the user can parse some parts of a JSON 
> input file into a relational structure and read others as JSON. (This is the 
> use case the user on the mailing list faced.) So, we need a way to do that.
> Drill has a "provided schema" feature, which, at present, is used only for 
> text files (and recently with limited support in Avro.) We are working on a 
> project to add such support for JSON.
> Perhaps we can leverage this feature to allow the JSON reader to read chunks 
> of JSON as text which can be manipulated by those future JSON functions. In 
> the example, column "c" would be read as JSON text; Drill would not attempt 
> to parse it into a relational structure.
> As it turns out, the "new" JSON reader we're working on originally had a 
> feature to do just that, but we took it out because we were not sure it was 
> needed. Sounds like we should restore it as part of our "provided schema" 
> support. It could work this way: if you CREATE SCHEMA with column "c" as 
> VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the 
> entire nested structure as JSON without trying to parse it into a relational 
> structure.
> This ticket asks to build the concept:
>  * Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field 
> to be read as JSON.
>  * Implement the "read column as JSON" feature in the new EVF-based JSON 
> reader.
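> A sketch of what this might look like; the exact syntax is explicitly "to be 
> designed", and the column property name here is hypothetical:
> {code:sql}
> CREATE OR REPLACE SCHEMA (
>   a INT,
>   b VARCHAR,
>   c VARCHAR properties {'drill.json-mode' = 'json'}
> ) FOR TABLE dfs.`/data/example.json`
> {code}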



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7557) Revise "Base" storage plugin filter push-down listener with a builder

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7557:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Revise "Base" storage plugin filter-push down listerner with a builder
> --
>
> Key: DRILL-7557
> URL: https://issues.apache.org/jira/browse/DRILL-7557
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.19.0
>
>
> DRILL-7458 introduces a base framework for storage plugins and includes a 
> simplified mechanism for filter push down. Part of that mechanism includes a 
> "listener", with the bulk of the work done in a single method:
> {code:java}
> Pair<GroupScan, List<RexNode>> transform(GroupScan groupScan,
>     List<Pair<RexNode, RelOp>> andTerms,
>     Pair<RexNode, DisjunctionFilterSpec> orTerm);
> {code}
> Reviewers correctly pointed out that this method might be a bit too complex.
> The listener pattern pretty much forced the present design. To improve it, 
> we'd want to use a different design; maybe some kind of builder which might:
> * Accept the CNF and DNF terms via dedicated methods.
> * Perform a processing step.
> * Provide a number of methods to communicate the results, such as 1) whether 
> a new group scan is needed, 2) any CNF terms to retain, and 3) any DNF terms 
> to retain.
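> A rough sketch of the proposed builder shape; all names are illustrative:
> {code:java}
> import java.util.List;
> import org.apache.calcite.rex.RexNode;
> 
> public interface FilterPushDownBuilder {
>   FilterPushDownBuilder addCnfTerms(List<RexNode> terms); // dedicated CNF input
>   FilterPushDownBuilder addDnfTerm(RexNode term);         // dedicated DNF input
>   void build();                                           // processing step
>   boolean needsNewGroupScan();      // 1) whether a new group scan is needed
>   List<RexNode> retainedCnfTerms(); // 2) CNF terms to retain
>   List<RexNode> retainedDnfTerms(); // 3) DNF terms to retain
> }
> {code}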



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7531) Convert format plugins to EVF

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7531:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Convert format plugins to EVF
> -
>
> Key: DRILL-7531
> URL: https://issues.apache.org/jira/browse/DRILL-7531
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: 1.19.0
>
>
> This is umbrella Jira to track down process of converting format plugins to 
> EVF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7787) Apache drill failed to start

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7787:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Apache drill failed to start
> 
>
> Key: DRILL-7787
> URL: https://issues.apache.org/jira/browse/DRILL-7787
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Om Prasad Surapu
>Priority: Major
> Fix For: 1.19.0
>
>
> Hi Team,
> I have an Apache Drill cluster set up with apache-drill-1.17.0, started in 
> distributed mode (with ZooKeeper). Drill started and no issues were reported.
>  
> I installed apache-drill-1.18.0 to fix DRILL-7786, but Drill failed to start 
> with the exception below (I have tried ZooKeeper versions 3.5.8 and 3.4.11). 
> Could you help me fix this issue?
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillbitStartupException: Failure during 
> initial startup of Drillbit.
>  at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:588)
>  at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:554)
>  at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:550)
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: unable 
> to put 
>  at 
> org.apache.drill.exec.coord.zk.ZookeeperClient.putIfAbsent(ZookeeperClient.java:326)
>  at 
> org.apache.drill.exec.store.sys.store.ZookeeperPersistentStore.putIfAbsent(ZookeeperPersistentStore.java:119)
>  at 
> org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.prepareStores(RemoteFunctionRegistry.java:201)
>  at 
> org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.init(RemoteFunctionRegistry.java:108)
>  at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:233)
>  at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:584)
>  ... 2 more
> Caused by: org.apache.zookeeper.KeeperException$UnimplementedException: 
> KeeperErrorCode = Unimplemented for /drill/udf/registry
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1637)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1180)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
>  at 
> org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67)
>  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
>  at 
> org.apache.drill.exec.coord.zk.ZookeeperClient.putIfAbsent(ZookeeperClient.java:318)
>  ... 7 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7526) Assertion Error when only type is used with schema in table function

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7526:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Assertion Error when only type is used with schema in table function
> 
>
> Key: DRILL-7526
> URL: https://issues.apache.org/jira/browse/DRILL-7526
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.19.0
>
>
> {{org.apache.drill.TestSchemaWithTableFunction}}
> {noformat}
>   @Test
>   public void testWithTypeAndSchema() {
> String query = "select Year from 
> table(dfs.`store/text/data/cars.csvh`(type=> 'text', " +
>   "schema=>'inline=(`Year` int)')) where Make = 'Ford'";
> queryBuilder().sql(query).print();
>   }
> {noformat}
> {noformat}
> Caused by: java.lang.AssertionError: BOOLEAN
>   at 
> org.apache.calcite.sql.type.SqlTypeExplicitPrecedenceList.compareTypePrecedence(SqlTypeExplicitPrecedenceList.java:140)
>   at org.apache.calcite.sql.SqlUtil.bestMatch(SqlUtil.java:687)
>   at 
> org.apache.calcite.sql.SqlUtil.filterRoutinesByTypePrecedence(SqlUtil.java:656)
>   at 
> org.apache.calcite.sql.SqlUtil.lookupSubjectRoutines(SqlUtil.java:515)
>   at org.apache.calcite.sql.SqlUtil.lookupRoutine(SqlUtil.java:435)
>   at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:240)
>   at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:218)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5640)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5627)
>   at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1692)
>   at 
> org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl(ProcedureNamespace.java:53)
>   at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3129)
>   at 
> org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:63)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3111)
>   at 
> org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:63)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3383)
>   at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
>   at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969)
>   at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:944)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:651)
>   at 
> org.apache.drill.exec.planner.sql.conversion.SqlConverter.validate(SqlConverter.java:189)
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:648)
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:196)
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:170)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:590)
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:275)
>   ... 1 more
> {noformat}
> Note: when other format options are used or schema is used alone, everything 
> works fine.
> See test examples: 
> org.apache.drill.Test

[jira] [Updated] (DRILL-7671) Fix builds for cdh and hdp profiles

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7671:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Fix builds for cdh and hdp profiles
> ---
>
> Key: DRILL-7671
> URL: https://issues.apache.org/jira/browse/DRILL-7671
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.19.0
>
>
> The cdh and hdp profiles use obsolete versions of Hadoop and other 
> libraries, so attempting to build the project with these profiles fails with 
> compilation errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7325) Many operators do not set container record count

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7325:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Many operators do not set container record count
> 
>
> Key: DRILL-7325
> URL: https://issues.apache.org/jira/browse/DRILL-7325
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.19.0
>
>
> See DRILL-7324. The following are problems found because some operators fail 
> to set the record count for their containers.
> h4. Scan
> TestComplexTypeReader, on cluster setup, using the PojoRecordReader:
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from ScanBatch
> ScanBatch: Container record count not set
> Reason: ScanBatch never sets the record count of its container (this is a 
> generic issue, not specific to the PojoRecordReader).
> h4. Filter
> {{TestComplexTypeReader.testNonExistentFieldConverting()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from FilterRecordBatch
> FilterRecordBatch: Container record count not set
> {noformat}
> h4. Hash Join
> {{TestComplexTypeReader.test_array()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from HashJoinBatch
> HashJoinBatch: Container record count not set
> {noformat}
> Occurs on the first batch in which the hash join returns {{OK_NEW_SCHEMA}} 
> with no records.
> h4. Project
> {{TestCsvWithHeaders.testEmptyFile()}} (when the text reader returned empty, 
> schema-only batches):
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from ProjectRecordBatch
> ProjectRecordBatch: Container record count not set
> {noformat}
> Occurs in {{ProjectRecordBatch.handleNullInput()}}: it sets up the schema but 
> does not set the value count to 0.
> h4. Unordered Receiver
> {{TestCsvWithSchema.testMultiFileSchema()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from UnorderedReceiverBatch
> UnorderedReceiverBatch: Container record count not set
> {noformat}
> The problem is that {{RecordBatchLoader.load()}} does not set the container 
> record count.
> h4. Streaming Aggregate
> {{TestJsonReader.testSumWithTypeCase()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from StreamingAggBatch
> StreamingAggBatch: Container record count not set
> {noformat}
> The problem is that {{StreamingAggBatch.buildSchema()}} does not set the 
> container record count to 0.
> h4. Limit
> {{TestJsonReader.testDrill_1419()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from LimitRecordBatch
> LimitRecordBatch: Container record count not set
> {noformat}
> None of the paths in {{LimitRecordBatch.innerNext()}} set the container 
> record count.
> h4. Union All
> {{TestJsonReader.testKvgenWithUnionAll()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from UnionAllRecordBatch
> UnionAllRecordBatch: Container record count not set
> {noformat}
> When {{UnionAllRecordBatch}} calls 
> {{VectorAccessibleUtilities.setValueCount()}}, it did not also set the 
> container count.
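> A sketch of the recurring fix pattern, using the Drill helper mentioned 
> above (usage is illustrative):
> {code:java}
> import org.apache.drill.exec.record.VectorAccessibleUtilities;
> import org.apache.drill.exec.record.VectorContainer;
> 
> class BatchCounts {
>   // Set both the per-vector value counts and the container record count
>   // whenever a batch is emitted, including empty OK_NEW_SCHEMA batches.
>   static void finalizeBatch(VectorContainer container, int rowCount) {
>     VectorAccessibleUtilities.setValueCount(container, rowCount);
>     container.setRecordCount(rowCount);
>   }
> }
> {code}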
> h4. Hash Aggregate
> {{TestJsonReader.drill_4479()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from HashAggBatch
> HashAggBatch: Container record count not set
> {noformat}
> Problem is that {{HashAggBatch.buildSchema()}} does not set the container 
> record count to 0 for the first, empty, batch sent for {{OK_NEW_SCHEMA.}}
> h4. And Many More
> It turns out that most operators fail to set one of the many row count 
> variables somewhere in their code path: maybe in the schema setup path, maybe 
> when building a batch along one of the many paths that operators follow. 
> Further, we have multiple row counts that must be set:
> * Values in each vector ({{setValueCount()}}).
> * Row count in the container ({{setRecordCount()}}), which must be the same 
> as the vector value count.
> * Row count in the operator (batch), which is the (possibly filtered) count 
> of records presented to downstream operators. It must be less than or equal 
> to the container row count (except for an SV4.)
> * The SV2 record count, which is the number of entries in the SV2 and must be 
> the same as the batch row count (and less or equal to the container row 
> count.)
> * The SV2 actual batch record count, which must be the same as the container 
> row count.
> * The SV4 record count, which must be the same as the batch record count. 
> W

[jira] [Updated] (DRILL-7556) Generalize the "Base" storage plugin filter push down mechanism

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7556:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Generalize the "Base" storage plugin filter push down mechanism
> ---
>
> Key: DRILL-7556
> URL: https://issues.apache.org/jira/browse/DRILL-7556
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.19.0
>
>
> DRILL-7458 adds a Base framework for storage plugins which includes a 
> simplified representation of filters that can be pushed down into Drill. It 
> makes the assumption that plugins can generally only handle filters of the 
> form:
> {code}
> column relop constant
> {code}
> For example, {{`foo` < 10}} or {{`bar` = "Fred"}}. (The code "flips" 
> expressions of the form {{constant relop column}}.)
> [~volodymyr] suggests this is too narrow and suggests two additional cases:
> {code}
> column-expr relop constant
> fn(column) = constant
> {code}
> Examples:
> {code:sql}
> foo + 10 = 20
> substr(bar, 2, 6) = 'Fred'
> {code}
> The first case should be handled by a general expression rewriter: simplify 
> constant expressions:
> {code:sql}
> foo + 10 = 20 --> foo = 10
> {code}
> Then, filter push-down need only handle the simplified expression rather than 
> every push-down mechanism needing to do the simplification.
> For this ticket, we wish to handle the second case: any expression that 
> contains a single column associated with the target table. Provide a new 
> push-down node to handle the non-relop case so that simple plugins can simply 
> ignore such expressions, but more complex plugins (such as Parquet) can 
> optionally handle them.
> A second improvement is to handle the more complex case: two or more columns, 
> all of which come from the same target table. For example:
> {code:sql}
> foo + bar = 20
> {code}
> Where both {{foo}} and {{bar}} are from the same table. It would be a very 
> sophisticated plugin indeed (maybe the JDBC storage plugin) which can handle 
> this case, but it should be available.
> As part of this work, we must handle join-equivalent columns:
> {code:sql}
> SELECT ... FROM t1, t2
>   WHERE t1.a = t2.b
>   AND t1.a = 20
> {code}
> If the plugin for table {{t2}} can handle filter push-down, then the 
> expression {{t1.a = 20}} is join-equivalent to {{t2.b = 20}}.
> It is not clear if the Drill logical plan already handles join equivalence. 
> If not, it should be added. If so, the filter push-down mechanism should add 
> documentation that describes how the mechanism works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7270:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Fix non-https dependency urls and add checksum checks
> -
>
> Key: DRILL-7270
> URL: https://issues.apache.org/jira/browse/DRILL-7270
> Project: Apache Drill
>  Issue Type: Task
>  Components: Security
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.19.0
>
>
> Review any build scripts and configurations for insecure URLs and make 
> appropriate fixes to use secure URLs.
> Projects like Lucene maintain checksum whitelists of all their build 
> dependencies; you may wish to consider that as protection against threats 
> beyond just MITM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7284) reusing the hashCodes computed at exchange nodes

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7284:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> reusing the hashCodes computed at exchange nodes
> 
>
> Key: DRILL-7284
> URL: https://issues.apache.org/jira/browse/DRILL-7284
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Weijie Tong
>Assignee: Weijie Tong
>Priority: Major
> Fix For: 1.19.0
>
>
> For HashJoin or HashAggregate, we shuffle the input data according to hash 
> codes of the join conditions or group-by keys at the exchange nodes. This 
> hash-code computation is then redone at the HashJoin or HashAggregate nodes. 
> We could instead send the hash codes computed at the exchange nodes to the 
> upper nodes, so the HashJoin or HashAggregate nodes would not need to do the 
> hash computation again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7192) Drill limits rows when autoLimit is disabled

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7192:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Drill limits rows when autoLimit is disabled
> 
>
> Key: DRILL-7192
> URL: https://issues.apache.org/jira/browse/DRILL-7192
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.19.0
>
>
> DRILL-7048 implemented autoLimit for JDBC and REST clients.
> *Steps to reproduce the issue:*
>  1. Check that autoLimit is disabled; if not, disable it and restart Drill.
>  2. Submit any query and verify that the row count is correct; for example,
> {code:sql}
> SELECT * FROM cp.`employee.json`;
> {code}
> returns 1,155 rows.
>  3. Enable autoLimit for the SqlLine client:
> {code:sql}
> !set rowLimit 10
> {code}
> 4. Submit the same query and verify that the result has 10 rows.
>  5. Disable autoLimit:
> {code:sql}
> !set rowLimit 0
> {code}
> 6. Submit the same query; this time *it returns 10 rows instead of 
> 1,155*.
> The correct row count is returned only after creating a new connection.
> The same issue is also observed for the SQuirreL SQL client; Postgres, for 
> example, works correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7525) Convert SequenceFiles to EVF

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7525:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Convert SequenceFiles to EVF
> 
>
> Key: DRILL-7525
> URL: https://issues.apache.org/jira/browse/DRILL-7525
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.19.0
>
>
> Convert SequenceFiles to EVF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7133) Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7133:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin
> --
>
> Key: DRILL-7133
> URL: https://issues.apache.org/jira/browse/DRILL-7133
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.19.0
>
>
> There was a JIRA (https://issues.apache.org/jira/browse/DRILL-7032) which 
> resulted in some improvements to the PCAP format plugin: it converted the 
> TCP flags to boolean format and also added an {{is_corrupt}} boolean field 
> that allows users to look for corrupt packets. 
> Unfortunately, this functionality was not duplicated in the PCAP-NG format 
> plugin, so this JIRA proposes to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7621) Refactor ExecConstants and PlannerSettings constant classes

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7621:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Refactor ExecConstants and PlannerSettings constant classes
> ---
>
> Key: DRILL-7621
> URL: https://issues.apache.org/jira/browse/DRILL-7621
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.19.0
>
>
> According to [the 
> discussion|http://mail-archives.apache.org/mod_mbox/drill-dev/202003.mbox/%3CBCB4CFC2-8BC5-43C6-8BD4-956F66F6D0D3%40gmail.com%3E],
>  it makes sense to split the classes into multiple constant interfaces and 
> get rid of the validator constants. The validator instances would then no 
> longer be used for getting option values; the general approach would be to 
> get a type-specific option value by string key from the config instance. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7192) Drill limits rows when autoLimit is disabled

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7192:
---
Target Version/s: 1.19.0

> Drill limits rows when autoLimit is disabled
> 
>
> Key: DRILL-7192
> URL: https://issues.apache.org/jira/browse/DRILL-7192
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> DRILL-7048 implemented autoLimit for JDBC and REST clients.
> *Steps to reproduce the issue:*
>  1. Check that autoLimit is disabled; if not, disable it and restart Drill.
>  2. Submit any query and verify that the row count is correct; for example,
> {code:sql}
> SELECT * FROM cp.`employee.json`;
> {code}
> returns 1,155 rows
>  3. Enable autoLimit for the sqlLine client:
> {code:sql}
> !set rowLimit 10
> {code}
> 4. Submit the same query and verify that the result has 10 rows.
>  5. Disable autoLimit:
> {code:sql}
> !set rowLimit 0
> {code}
> 6. Submit the same query; this time, *it returns 10 rows instead of 
> 1,155*.
> The correct row count is returned only after creating a new connection.
> The same issue is also observed with the SQuirreL SQL client; Postgres, for 
> example, works correctly.
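> For reference, a minimal standalone reproduction through plain JDBC, on the 
> assumption that sqlLine's {{rowLimit}} maps to {{Statement.setMaxRows()}}; 
> the connection URL is a placeholder.
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> 
> public class AutoLimitRepro {
>   public static void main(String[] args) throws SQLException {
>     try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=localhost:2181");
>          Statement stmt = conn.createStatement()) {
>       stmt.setMaxRows(10);             // enable autoLimit
>       System.out.println(count(stmt)); // 10, as expected
>       stmt.setMaxRows(0);              // disable autoLimit
>       System.out.println(count(stmt)); // expected 1,155; the bug yields 10
>     }
>   }
> 
>   private static int count(Statement stmt) throws SQLException {
>     int n = 0;
>     try (ResultSet rs = stmt.executeQuery("SELECT * FROM cp.`employee.json`")) {
>       while (rs.next()) {
>         n++;
>       }
>     }
>     return n;
>   }
> }
> {code}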



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7551) Improve Error Reporting

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7551:
---
Target Version/s: 1.19.0

> Improve Error Reporting
> ---
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> This Jira serves as a master issue for improving the usability of error 
> messages. Instead of dumping stack traces, the overall goal is to give the 
> user something that can actually explain:
>  # What went wrong
>  # How to fix it
> Related work should be created as subtasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7550) Add Storage Plugin for Cassandra

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7550:
---
Target Version/s: 1.19.0

> Add Storage Plugin for Cassandra
> 
>
> Key: DRILL-7550
> URL: https://issues.apache.org/jira/browse/DRILL-7550
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.18.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> Apache Cassandra is a free and open-source, distributed, wide column store, 
> NoSQL database management system designed to handle large amounts of data 
> across many commodity servers, providing high availability with no single 
> point of failure. [1]
> This PR would enable Drill to query Cassandra data stores.
>  
> [1]: https://en.wikipedia.org/wiki/Apache_Cassandra



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7284) reusing the hashCodes computed at exchange nodes

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7284:
---
Target Version/s: 1.19.0

> reusing the hashCodes computed at exchange nodes
> 
>
> Key: DRILL-7284
> URL: https://issues.apache.org/jira/browse/DRILL-7284
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Weijie Tong
>Assignee: Weijie Tong
>Priority: Major
> Fix For: 1.18.0
>
>
> For HashJoin or HashAggregate, we shuffle the input data at the exchange 
> nodes according to the hash codes of the join conditions or group-by keys. 
> The same hash codes are then computed again at the HashJoin or HashAggregate 
> nodes. We could instead send the hash codes computed at the exchange nodes up 
> to those operators, so the HashJoin or HashAggregate nodes would not need to 
> do the hash computation again.
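> A self-contained sketch of the idea, with plain Java stand-ins rather than 
> Drill's actual operator classes: the exchange hashes each row once to pick a 
> receiver, and the shipped hash is reused downstream to pick a bucket.
> {code:java}
> public class HashReuseSketch {
>   static final class Row {
>     final int joinKey;
>     int cachedHash;   // would travel with the batch past the exchange
>     Row(int joinKey) { this.joinKey = joinKey; }
>   }
> 
>   // Exchange side: compute the hash once and route the row to a receiver.
>   static int route(Row row, int numReceivers) {
>     row.cachedHash = Integer.hashCode(row.joinKey);
>     return Math.floorMod(row.cachedHash, numReceivers);
>   }
> 
>   // HashJoin/HashAggregate side: reuse the shipped hash instead of rehashing.
>   static int bucket(Row row, int numBuckets) {
>     return Math.floorMod(row.cachedHash, numBuckets);
>   }
> 
>   public static void main(String[] args) {
>     Row row = new Row(42);
>     System.out.println("receiver " + route(row, 4) + ", bucket " + bucket(row, 64));
>   }
> }
> {code}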



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7597) Read selected JSON columns as JSON text

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7597:
---
Target Version/s: 1.19.0

> Read selected JSON columns as JSON text
> --
>
> Key: DRILL-7597
> URL: https://issues.apache.org/jira/browse/DRILL-7597
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> See DRILL-7598. The use case wishes to read selected JSON columns as JSON 
> text rather than parsing the JSON into a relational structure as is done 
> today in the JSON reader.
> The JSON reader supports "all text mode", but, despite the name, this mode 
> only works for scalars (primitives) such as numbers. It does not work for 
> structured types such as objects or arrays: such types are always parsed into 
> Drill structures (which causes the conflict described in DRILL-7598.)
> Instead, we need a feature to read an entire JSON value, including structure, 
> as a JSON string.
> This feature would work best when the user can parse some parts of a JSON 
> input file into a relational structure and read others as JSON. (This is the 
> use case the user on the user list faced.) So, we need a way to do that.
> Drill has a "provided schema" feature, which, at present, is used only for 
> text files (and recently with limited support in Avro.) We are working on a 
> project to add such support for JSON.
> Perhaps we can leverage this feature to allow the JSON reader to read chunks 
> of JSON as text which can be manipulated by those future JSON functions. In 
> the example, column "c" would be read as JSON text; Drill would not attempt 
> to parse it into a relational structure.
> As it turns out, the "new" JSON reader we're working on originally had a 
> feature to do just that, but we took it out because we were not sure it was 
> needed. Sounds like we should restore it as part of our "provided schema" 
> support. It could work this way: if you CREATE SCHEMA with column "c" as 
> VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the 
> entire nested structure as JSON without trying to parse it into a relational 
> structure.
> This ticket asks to build the concept:
>  * Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field 
> to be read as JSON.
>  * Implement the "read column as JSON" feature in the new EVF-based JSON 
> reader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7526) Assertion Error when only type is used with schema in table function

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7526:
---
Target Version/s: 1.19.0

> Assertion Error when only type is used with schema in table function
> 
>
> Key: DRILL-7526
> URL: https://issues.apache.org/jira/browse/DRILL-7526
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> {{org.apache.drill.TestSchemaWithTableFunction}}
> {noformat}
>   @Test
>   public void testWithTypeAndSchema() {
> String query = "select Year from 
> table(dfs.`store/text/data/cars.csvh`(type=> 'text', " +
>   "schema=>'inline=(`Year` int)')) where Make = 'Ford'";
> queryBuilder().sql(query).print();
>   }
> {noformat}
> {noformat}
> Caused by: java.lang.AssertionError: BOOLEAN
>   at 
> org.apache.calcite.sql.type.SqlTypeExplicitPrecedenceList.compareTypePrecedence(SqlTypeExplicitPrecedenceList.java:140)
>   at org.apache.calcite.sql.SqlUtil.bestMatch(SqlUtil.java:687)
>   at 
> org.apache.calcite.sql.SqlUtil.filterRoutinesByTypePrecedence(SqlUtil.java:656)
>   at 
> org.apache.calcite.sql.SqlUtil.lookupSubjectRoutines(SqlUtil.java:515)
>   at org.apache.calcite.sql.SqlUtil.lookupRoutine(SqlUtil.java:435)
>   at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:240)
>   at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:218)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5640)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5627)
>   at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1692)
>   at 
> org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl(ProcedureNamespace.java:53)
>   at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3129)
>   at 
> org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:63)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3111)
>   at 
> org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:63)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3383)
>   at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
>   at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969)
>   at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:944)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:651)
>   at 
> org.apache.drill.exec.planner.sql.conversion.SqlConverter.validate(SqlConverter.java:189)
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:648)
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:196)
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:170)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128)
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:590)
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:275)
>   ... 1 more
> {noformat}
> Note: when other format options are used or schema is used alone, everything 
> works fine.
> See test examples: 
> org.apache.drill.TestSchemaWithTableFunction#testSchema

[jira] [Updated] (DRILL-7787) Apache Drill failed to start

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7787:
---
Target Version/s: 1.19.0

> Apache Drill failed to start
> 
>
> Key: DRILL-7787
> URL: https://issues.apache.org/jira/browse/DRILL-7787
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Om Prasad Surapu
>Priority: Major
> Fix For: 1.18.0
>
>
> Hi Team,
> I have an Apache Drill cluster set up with apache-drill-1.17.0, started in 
> distributed mode (with ZooKeeper). Drill started and no issues were reported.
>  
> I installed apache-drill-1.18.0 to fix DRILL-7786, but Drill failed to start 
> with the exception below (I have tried ZooKeeper versions 3.5.8 and 3.4.11). 
> Could you help me fix this issue?
> {noformat}
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillbitStartupException: Failure during 
> initial startup of Drillbit.
>  at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:588)
>  at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:554)
>  at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:550)
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: unable 
> to put 
>  at 
> org.apache.drill.exec.coord.zk.ZookeeperClient.putIfAbsent(ZookeeperClient.java:326)
>  at 
> org.apache.drill.exec.store.sys.store.ZookeeperPersistentStore.putIfAbsent(ZookeeperPersistentStore.java:119)
>  at 
> org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.prepareStores(RemoteFunctionRegistry.java:201)
>  at 
> org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.init(RemoteFunctionRegistry.java:108)
>  at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:233)
>  at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:584)
>  ... 2 more
> Caused by: org.apache.zookeeper.KeeperException$UnimplementedException: 
> KeeperErrorCode = Unimplemented for /drill/udf/registry
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1637)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1180)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
>  at 
> org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67)
>  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
>  at 
> org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
>  at 
> org.apache.drill.exec.coord.zk.ZookeeperClient.putIfAbsent(ZookeeperClient.java:318)
>  ... 7 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7558:
---
Target Version/s: 1.19.0

> Generalize filter push-down planner phase
> -
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> DRILL-7458 provides a base framework for storage plugins, including a 
> simplified filter push-down mechanism. [~volodymyr] notes that it may be 
> *too* simple:
> {quote}
> What about the case when this rule was applied for one filter, but planner at 
> some point pushed another filter above the scan, for example, if we have such 
> case:
> {code}
> Filter(a=2)
>   Join(t1.b=t2.b, type=inner)
> Filter(b=3)
> Scan(t1)
> Scan(t2)
> {code}
> Filter b=3 will be pushed into scan, planner will push filter above join:
> {code}
> Join(t1.b=t2.b, type=inner)
> Filter(a=2)
> Scan(t1, b=3)
> Scan(t2)
> {code}
> In this case, check whether filter was pushed is not enough.
> {quote}
> Drill divides planning into a number of *phases*, each defined by a set of 
> *rules*. Most storage plugins perform filter push-down during the physical 
> planning stage. However, by this point, Drill has already decided on the 
> degree of parallelism: it is too late to use filter push-down to set the 
> degree of parallelism. Yet, if using something like a REST API, we want to 
> use filters to help us shard the query (that is, to set the degree of 
> parallelism.)
>  
> DRILL-7458 performs filter push-down at *logical* planning time to work 
> around the above limitation. (In Drill, there are three different phases that 
> could be considered the logical phase, depending on which planning options 
> are set to control Calcite.)
> [~volodymyr] points out that the logical plan phase may be the wrong choice 
> because the planner will perform rewrites of the type he cited.
> Thus, we need to research where to insert filter push down. It must come:
> * After rewrites of the kind described above.
> * After join equivalence computations. (See DRILL-7556.)
> * Before the decision is made about the number of minor fragments.
> The goal of this ticket is to either:
> * Research to identify an existing phase which satisfies these requirements, 
> or
> * Create a new phase.
> Due to the way Calcite works, it is not a good idea to have a single phase 
> handle two tasks that depend on one another. That is, we cannot perform 
> filter push-down in the same phase that defines the filters, nor can we add 
> filter push-down to the phase that chooses parallelism.
> Background: Calcite is a rule-based query planner inspired by 
> [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf].
> The above issue is a flaw with rule-based planners and was identified as 
> early as the [Cascades query framework 
> paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf]
>  which was the follow-up to Volcano.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7556) Generalize the "Base" storage plugin filter push down mechanism

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7556:
---
Target Version/s: 1.19.0

> Generalize the "Base" storage plugin filter push down mechanism
> ---
>
> Key: DRILL-7556
> URL: https://issues.apache.org/jira/browse/DRILL-7556
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> DRILL-7458 adds a Base framework for storage plugins which includes a 
> simplified representation of filters that can be pushed down into Drill. It 
> makes the assumption that plugins can generally only handle filters of the 
> form:
> {code}
> column relop constant
> {code}
> For example, {{`foo` < 10}} or {{`bar` = "Fred"}}. (The code "flips" 
> expressions of the form {{constant relop column}}.)
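> For illustration, the flip just mirrors the comparison operator so that 
> push-down only ever sees column-on-the-left comparisons. A self-contained 
> sketch with simplified stand-in types, not Drill's actual expression classes:
> {code:java}
> public class RelOpFlip {
>   enum RelOp { EQ, NE, LT, LE, GT, GE }
> 
>   // 10 > foo becomes foo < 10: swap the operands, mirror the operator.
>   static RelOp flip(RelOp op) {
>     switch (op) {
>       case LT: return RelOp.GT;
>       case LE: return RelOp.GE;
>       case GT: return RelOp.LT;
>       case GE: return RelOp.LE;
>       default: return op;   // EQ and NE are symmetric
>     }
>   }
> 
>   public static void main(String[] args) {
>     System.out.println(flip(RelOp.GT));  // prints LT
>   }
> }
> {code}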
> [~volodymyr] suggests this is too narrow and suggests two additional cases:
> {code}
> column-expr relop constant
> fn(column) = constant
> {code}
> Examples:
> {code:sql}
> foo + 10 = 20
> substr(bar, 2, 6) = 'Fred'
> {code}
> The first case should be handled by a general expression rewriter: simplify 
> constant expressions:
> {code:sql}
> foo + 10 = 20 --> foo = 10
> {code}
> Then filter push-down need only handle the simplified expression, rather than 
> every push-down mechanism needing to do the simplification.
> For this ticket, we wish to handle the second case: any expression that 
> contains a single column associated with the target table. Provide a new 
> push-down node to handle the non-relop case so that simple plugins can simply 
> ignore such expressions, but more complex plugins (such as Parquet) can 
> optionally handle them.
> A second improvement is to handle the more complex case: two or more columns, 
> all of which come from the same target table. For example:
> {code:sql}
> foo + bar = 20
> {code}
> Where both {{foo}} and {{bar}} are from the same table. It would be a very 
> sophisticated plugin indeed (maybe the JDBC storage plugin) that could handle 
> this case, but the option should be available.
> As part of this work, we must handle join-equivalent columns:
> {code:sql}
> SELECT ... FROM t1, t2
>   WHERE t1.a = t2.b
>   AND t1.a = 20
> {code}
> If the plugin for table {{t2}} can handle filter push-down, then the 
> expression {{t1.a = 20}} is join-equivalent to {{t2.b = 20}}.
> It is not clear if the Drill logical plan already handles join equivalence. 
> If not, it should be added. If so, the filter push-down mechanism should add 
> documentation that describes how the mechanism works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7671) Fix builds for cdh and hdp profiles

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7671:
---
Target Version/s: 1.19.0

> Fix builds for cdh and hdp profiles
> ---
>
> Key: DRILL-7671
> URL: https://issues.apache.org/jira/browse/DRILL-7671
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> The cdh and hdp profiles use obsolete versions of Hadoop and other libraries, 
> so attempting to build the project with these profiles fails with compilation 
> errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7712) Fix issues after ZK upgrade

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7712:
---
Target Version/s: 1.19.0

> Fix issues after ZK upgrade
> ---
>
> Key: DRILL-7712
> URL: https://issues.apache.org/jira/browse/DRILL-7712
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.18.0
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> Warnings during the jdbc-all build (absent when building with the Mapr profile):
> {noformat}
> netty-transport-native-epoll-4.1.45.Final.jar, 
> netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 46 
> overlapping classes: 
>   - io.netty.channel.epoll.AbstractEpollStreamChannel$2
>   - io.netty.channel.epoll.AbstractEpollServerChannel$EpollServerSocketUnsafe
>   - io.netty.channel.epoll.EpollDatagramChannel
>   - io.netty.channel.epoll.AbstractEpollStreamChannel$SpliceInChannelTask
>   - io.netty.channel.epoll.NativeDatagramPacketArray
>   - io.netty.channel.epoll.EpollSocketChannelConfig
>   - io.netty.channel.epoll.EpollTcpInfo
>   - io.netty.channel.epoll.EpollEventArray
>   - io.netty.channel.epoll.EpollEventLoop
>   - io.netty.channel.epoll.EpollSocketChannel
>   - 36 more...
> netty-transport-native-unix-common-4.1.45.Final.jar, 
> netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 15 
> overlapping classes: 
>   - io.netty.channel.unix.Errors$NativeConnectException
>   - io.netty.channel.unix.ServerDomainSocketChannel
>   - io.netty.channel.unix.DomainSocketAddress
>   - io.netty.channel.unix.Socket
>   - io.netty.channel.unix.NativeInetAddress
>   - io.netty.channel.unix.DomainSocketChannelConfig
>   - io.netty.channel.unix.Errors$NativeIoException
>   - io.netty.channel.unix.DomainSocketReadMode
>   - io.netty.channel.unix.ErrorsStaticallyReferencedJniMethods
>   - io.netty.channel.unix.UnixChannel
>   - 5 more...
> maven-shade-plugin has detected that some class files are
> present in two or more JARs. When this happens, only one
> single version of the class is copied to the uber jar.
> Usually this is not harmful and you can skip these warnings,
> otherwise try to manually exclude artifacts based on
> mvn dependency:tree -Ddetail=true and the above output.
> See http://maven.apache.org/plugins/maven-shade-plugin/
> {noformat}
> Additional warning when building with the Mapr profile:
> {noformat}
> The following patterns were never triggered in this artifact inclusion filter:
> o  'org.apache.zookeeper:zookeeper-jute'
> {noformat}
> NPEs in tests (though tests do not fail):
> {noformat}
> [INFO] Running org.apache.drill.exec.coord.zk.TestZookeeperClient
> java.lang.NullPointerException
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
>   at org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
>   at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
>   at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
>   at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:
> {noformat}
> {noformat}
> [INFO] Running org.apache.drill.exec.coord.zk.TestEphemeralStore
> java.lang.NullPointerException
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
>   at org.apache.zookeepe
> {noformat}
> {noformat}
> [INFO] Running org.apache.drill.yarn.zk.TestAmRegistration
> java.lang.NullPointerException
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
>   at org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
>   at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
>   at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
>   at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:929)
>   at org.apache.curator.t
> {noformat}
> {noformat}
> org.apache.drill.yarn.client.TestCommandLineOptions
> java.lang.NullPointerException
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269)
>   at org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251)
>   at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583)
>   at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546)
>   at org.apac
> {noformat}



--

[jira] [Updated] (DRILL-7531) Convert format plugins to EVF

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7531:
---
Target Version/s: 1.19.0

> Convert format plugins to EVF
> -
>
> Key: DRILL-7531
> URL: https://issues.apache.org/jira/browse/DRILL-7531
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> This is an umbrella Jira to track the process of converting format plugins to 
> EVF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7621) Refactor ExecConstants and PlannerSettings constant classes

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7621:
---
Target Version/s: 1.19.0

> Refactor ExecConstants and PlannerSettings constant classes
> ---
>
> Key: DRILL-7621
> URL: https://issues.apache.org/jira/browse/DRILL-7621
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.18.0
>
>
> According to [the 
> discussion|http://mail-archives.apache.org/mod_mbox/drill-dev/202003.mbox/%3CBCB4CFC2-8BC5-43C6-8BD4-956F66F6D0D3%40gmail.com%3E],
>  it makes sense to split these classes into multiple constant interfaces and 
> get rid of the validator constants. Validator instances would then no longer 
> be used for reading option values; the general approach would be to get a 
> type-specific option value by string key from the config instance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7270:
---
Target Version/s: 1.19.0

> Fix non-https dependency urls and add checksum checks
> -
>
> Key: DRILL-7270
> URL: https://issues.apache.org/jira/browse/DRILL-7270
> Project: Apache Drill
>  Issue Type: Task
>  Components: Security
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Review any build scripts and configurations for insecure URLs and make 
> appropriate fixes to use secure URLs.
> Projects like Lucene keep checksum whitelists of all their build 
> dependencies; you may wish to consider that as a protection against threats 
> beyond just MITM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7133) Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7133:
---
Target Version/s: 1.19.0

> Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin
> --
>
> Key: DRILL-7133
> URL: https://issues.apache.org/jira/browse/DRILL-7133
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> A previous JIRA (https://issues.apache.org/jira/browse/DRILL-7032) resulted 
> in some improvements to the PCAP format plugin: it converted the TCP flags to 
> boolean format and added an {{is_corrupt}} boolean field, which allows users 
> to look for corrupt packets.
> Unfortunately, this functionality is not duplicated in the PCAP-NG format 
> plugin, so this JIRA proposes to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7366) Improve Null Handling for UDFs with Complex Output

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7366:
---
Target Version/s: 1.19.0

> Improve Null Handling for UDFs with Complex Output
> --
>
> Key: DRILL-7366
> URL: https://issues.apache.org/jira/browse/DRILL-7366
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> If a UDF has a complex field (Map or List) as output, Drill does not allow 
> the UDF to have nullable input, which creates additional complexity when 
> writing these kinds of UDFs.
> I therefore would like to propose that two null-handling options be added to 
> the FunctionTemplate: {{EMPTY_LIST_IF_NULL}} and {{EMPTY_MAP_IF_NULL}}, which 
> would simplify UDF creation. I'm envisioning that if either of these options 
> were selected and the UDF received any null value as input, the UDF would 
> return an empty list or map, respectively.
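> A sketch of how the proposed option might read in a UDF declaration. 
> {{EMPTY_MAP_IF_NULL}} is the enum value this ticket proposes and does not 
> exist today, and {{parse_thing}} is a made-up function; the other annotations 
> follow Drill's existing UDF API.
> {code:java}
> import org.apache.drill.exec.expr.DrillSimpleFunc;
> import org.apache.drill.exec.expr.annotations.FunctionTemplate;
> import org.apache.drill.exec.expr.annotations.Output;
> import org.apache.drill.exec.expr.annotations.Param;
> import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
> import org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter;
> 
> @FunctionTemplate(
>     name = "parse_thing",  // hypothetical function
>     scope = FunctionTemplate.FunctionScope.SIMPLE,
>     nulls = FunctionTemplate.NullHandling.EMPTY_MAP_IF_NULL)  // proposed value
> public class ParseThingFunction implements DrillSimpleFunc {
>   @Param NullableVarCharHolder input;
>   @Output ComplexWriter out;
> 
>   public void setup() { }
> 
>   public void eval() {
>     // With the proposed handling, a null input would short-circuit to an
>     // empty map before eval() runs, so this body only sees real values.
>   }
> }
> {code}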



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7325) Many operators do not set container record count

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7325:
---
Target Version/s: 1.19.0

> Many operators do not set container record count
> 
>
> Key: DRILL-7325
> URL: https://issues.apache.org/jira/browse/DRILL-7325
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> See DRILL-7324. The following are problems found because some operators fail 
> to set the record count for their containers.
> h4. Scan
> TestComplexTypeReader, on cluster setup, using the PojoRecordReader:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from ScanBatch
> ScanBatch: Container record count not set
> {noformat}
> Reason: ScanBatch never sets the record count of its container (this is a 
> generic issue, not specific to the PojoRecordReader).
> h4. Filter
> {{TestComplexTypeReader.testNonExistentFieldConverting()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from FilterRecordBatch
> FilterRecordBatch: Container record count not set
> {noformat}
> h4. Hash Join
> {{TestComplexTypeReader.test_array()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from HashJoinBatch
> HashJoinBatch: Container record count not set
> {noformat}
> Occurs on the first batch in which the hash join returns {{OK_NEW_SCHEMA}} 
> with no records.
> h4. Project
> {{TestCsvWithHeaders.testEmptyFile()}} (when the text reader returned empty, 
> schema-only batches):
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from ProjectRecordBatch
> ProjectRecordBatch: Container record count not set
> {noformat}
> Occurs in {{ProjectRecordBatch.handleNullInput()}}: it sets up the schema but 
> does not set the value count to 0.
> h4. Unordered Receiver
> {{TestCsvWithSchema.testMultiFileSchema()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from UnorderedReceiverBatch
> UnorderedReceiverBatch: Container record count not set
> {noformat}
> The problem is that {{RecordBatchLoader.load()}} does not set the container 
> record count.
> h4. Streaming Aggregate
> {{TestJsonReader.testSumWithTypeCase()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from StreamingAggBatch
> StreamingAggBatch: Container record count not set
> {noformat}
> The problem is that {{StreamingAggBatch.buildSchema()}} does not set the 
> container record count to 0.
> h4. Limit
> {{TestJsonReader.testDrill_1419()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from LimitRecordBatch
> LimitRecordBatch: Container record count not set
> {noformat}
> None of the paths in {{LimitRecordBatch.innerNext()}} set the container 
> record count.
> h4. Union All
> {{TestJsonReader.testKvgenWithUnionAll()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from UnionAllRecordBatch
> UnionAllRecordBatch: Container record count not set
> {noformat}
> When {{UnionAllRecordBatch}} calls 
> {{VectorAccessibleUtilities.setValueCount()}}, it did not also set the 
> container count.
> h4. Hash Aggregate
> {{TestJsonReader.drill_4479()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from HashAggBatch
> HashAggBatch: Container record count not set
> {noformat}
> Problem is that {{HashAggBatch.buildSchema()}} does not set the container 
> record count to 0 for the first, empty, batch sent for {{OK_NEW_SCHEMA.}}
> h4. And Many More
> It turns out that most operators fail to set one of the many row count 
> variables somewhere in their code path: maybe in the schema setup path, maybe 
> when building a batch along one of the many paths that operators follow. 
> Further, we have multiple row counts that must be set (see the sketch after 
> this list):
> * Values in each vector ({{setValueCount()}}),
> * Row count in the container ({{setRecordCount()}}), which must be the same 
> as the vector value count.
> * Row count in the operator (batch), which is the (possibly filtered) count 
> of records presented to downstream operators. It must be less than or equal 
> to the container row count (except for an SV4.)
> * The SV2 record count, which is the number of entries in the SV2 and must be 
> the same as the batch row count (and less or equal to the container row 
> count.)
> * The SV2 actual batch record count, which must be the same as the container 
> row count.
> * The SV4 record count, which must be the same as the batch record count. 
> With an SV4, the batch consists of 
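> A minimal sketch of the pattern the fixes need. 
> {{VectorAccessibleUtilities.setValueCount()}} and 
> {{VectorContainer.setRecordCount()}} are the existing APIs named above; the 
> wrapping method is illustrative only.
> {code:java}
> // Every path that emits a batch, including the empty batch sent with
> // OK_NEW_SCHEMA, must set the vector value counts and the container
> // record count together.
> private void finalizeBatch(VectorContainer container, int rowCount) {
>   VectorAccessibleUtilities.setValueCount(container, rowCount);
>   container.setRecordCount(rowCount);  // the count the validator flags as unset
> }
> {code}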

[jira] [Updated] (DRILL-7525) Convert SequenceFiles to EVF

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7525:
---
Target Version/s: 1.19.0

> Convert SequenceFiles to EVF
> 
>
> Key: DRILL-7525
> URL: https://issues.apache.org/jira/browse/DRILL-7525
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> Convert SequenceFiles to EVF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7557) Revise "Base" storage plugin filter push-down listener with a builder

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7557:
---
Target Version/s: 1.19.0

> Revise "Base" storage plugin filter-push down listerner with a builder
> --
>
> Key: DRILL-7557
> URL: https://issues.apache.org/jira/browse/DRILL-7557
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> DRILL-7458 introduces a base framework for storage plugins and includes a 
> simplified mechanism for filter push down. Part of that mechanism includes a 
> "listener", with the bulk of the work done in a single method:
> {code:java}
> Pair> transform(GroupScan groupScan,
>   List> andTerms, Pair DisjunctionFilterSpec> orTerm);
> {code}
> Reviewers correctly pointed out that this method might be a bit too complex.
> The listener pattern pretty much forced the present design. To improve it, 
> we'd want to use a different design, maybe some kind of builder (sketched 
> after this list) which might:
> * Accept the CNF and DNF terms via dedicated methods.
> * Perform a processing step.
> * Provide a number of methods to communicate the results, such as 1) whether 
> a new group scan is needed, 2) any CNF terms to retain, and 3) any DNF terms 
> to retain.
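> For illustration only, one possible shape for such a builder. None of these 
> names exist in Drill today, and the types are simplified placeholders:
> {code:java}
> import java.util.List;
> 
> interface FilterPushDownBuilder {
>   // Accept the CNF and DNF terms via dedicated methods.
>   void addAndTerm(Object conjunct);
>   void setOrTerm(Object disjunction);
> 
>   // Perform the single processing step.
>   void analyze();
> 
>   // Communicate the results.
>   boolean needsNewGroupScan();
>   Object newGroupScan();            // valid only when needsNewGroupScan()
>   List<Object> retainedAndTerms();  // CNF terms Drill should still apply
>   List<Object> retainedOrTerms();   // DNF terms Drill should still apply
> }
> {code}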



--
This message was sent by Atlassian Jira
(v8.3.4#803005)