[jira] [Updated] (DRILL-7751) Add Storage Plugin for Splunk
[ https://issues.apache.org/jira/browse/DRILL-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7751: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Add Storage Plugin for Splunk
> -----------------------------
>
> Key: DRILL-7751
> URL: https://issues.apache.org/jira/browse/DRILL-7751
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.19.0
>
>
> # Drill Connector for Splunk
> This plugin enables Drill to query Splunk.
>
> ## Configuration
> To connect Drill to Splunk, create a new storage plugin with the configuration below. Note that Splunk uses port `8089` for its management interface; this port must be open for Drill to query Splunk.
> ```json
> {
>   "type": "splunk",
>   "username": "admin",
>   "password": "changeme",
>   "hostname": "localhost",
>   "port": 8089,
>   "earliestTime": "-14d",
>   "latestTime": "now",
>   "enabled": false
> }
> ```
>
> ## Understanding Splunk's Data Model
> Splunk's primary use case is analyzing event logs with a timestamp. As such, data is indexed by the timestamp, with the most recent data being indexed first. By default, Splunk will sort the data in reverse chronological order. Large Splunk installations will put older data into buckets of hot, warm and cold storage, with the "cold" storage on the slowest and cheapest disks.
>
> With this understood, it is **very** important to put time boundaries on your Splunk queries. The Drill plugin allows you to set default time boundaries in the configuration so that every query you run is bounded by them. Alternatively, you can set the time boundaries at query time. In either case, you will achieve the best performance when you ask Splunk for the smallest amount of data possible.
>
> ## Understanding Drill's Data Model with Splunk
> Drill treats Splunk indexes as tables.
Splunk's access model does not restrict access to the catalog, but does restrict access to the actual data. It is therefore possible to see the names of indexes to which you do not have access. You can view the list of available indexes with a `SHOW TABLES IN splunk` query.
>
> ```
> apache drill> SHOW TABLES IN splunk;
> +--------------+----------------+
> | TABLE_SCHEMA | TABLE_NAME     |
> +--------------+----------------+
> | splunk       | summary        |
> | splunk       | splunklogger   |
> | splunk       | _thefishbucket |
> | splunk       | _audit         |
> | splunk       | _internal      |
> | splunk       | _introspection |
> | splunk       | main           |
> | splunk       | history        |
> | splunk       | _telemetry     |
> +--------------+----------------+
> 9 rows selected (0.304 seconds)
> ```
> To query Splunk from Drill, use the following format:
> ```sql
> SELECT <fields>
> FROM splunk.<index>
> ```
>
> ## Bounding Your Queries
> When you learn to query Splunk via its interface, the first thing you learn is to bound your queries so that they look at the shortest time span possible. When using Drill to query Splunk, it is advisable to do the same thing, and Drill offers two ways to accomplish this: via the configuration and at query time.
>
> ### Bounding Your Queries at Query Time
> The easiest way to bound your query is to do so at query time via special filters in the `WHERE` clause. There are two special fields, `earliestTime` and `latestTime`, which can be set to bound the query. If they are not set, the query will be bounded by the defaults set in the configuration.
>
> You can use any of the time formats specified in the Splunk documentation here:
> https://docs.splunk.com/Documentation/Splunk/8.0.3/SearchReference/SearchTimeModifiers
>
> So if you wanted to see your data for the last 15 minutes, you could execute the following query:
> ```sql
> SELECT <fields>
> FROM splunk.<index>
> WHERE earliestTime='-15m' AND latestTime='now'
> ```
> The variables set in a query override the defaults from the configuration.
>
> ## Data Types
> Splunk does not have sophisticated data types and unfortunately does not provide metadata from its query results. With the exception of the fields below, Drill will interpret all fields as `VARCHAR` and hence you will have to convert them to the appropriate data type at query time.
>
> Timestamp Fields
> * `_indextime`
> * `_time`
>
> Numeric Fields
> * `date_hour`
> * `date_mday`
> * `date_minute`
> * `date_second`
> * `date_year`
> * `linecount`
>
> ### Nested Data
> Splunk has two different types of nested data which roughl
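The `VARCHAR` conversion described under Data Types can be sketched as follows. This is an illustrative query, not from the plugin docs: `status` and `bytes` are hypothetical extracted fields, and `main` is Splunk's default index.

```sql
-- Fields other than the timestamp and numeric fields listed above arrive
-- as VARCHAR, so cast them explicitly at query time.
-- `status` and `bytes` are hypothetical extracted fields.
SELECT CAST(`status` AS INT)    AS status,
       CAST(`bytes` AS BIGINT)  AS bytes
FROM splunk.main
WHERE earliestTime='-15m' AND latestTime='now'
```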
[jira] [Updated] (DRILL-7763) Add Limit Pushdown to File Based Storage Plugins
[ https://issues.apache.org/jira/browse/DRILL-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7763: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Add Limit Pushdown to File Based Storage Plugins > > > Key: DRILL-7763 > URL: https://issues.apache.org/jira/browse/DRILL-7763 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.19.0 > > > As currently implemented, when querying a file, Drill will read the entire > file even if a limit is specified in the query. This PR does a few things: > # Refactors the EasyGroupScan, EasySubScan, and EasyFormatConfig to allow > the option of pushing down limits. > # Applies this to all the EVF based format plugins which are: LogRegex, > PCAP, SPSS, Esri, Excel and Text (CSV). > Due to JSON's fluid schema, it would be unwise to adopt the limit pushdown as > it could result in very inconsistent schemata. -- This message was sent by Atlassian Jira (v8.3.4#803005)
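To illustrate the behavior DRILL-7763 changes (the file path here is hypothetical): before this change, a query like the one below scanned the entire file even though only ten rows were needed; with limit pushdown, the EVF-based reader can stop early.

```sql
-- With limit pushdown, the CSV reader stops once 10 rows are produced
-- instead of scanning the whole file. The path is illustrative.
SELECT *
FROM dfs.`/data/logs/web.csvh`
LIMIT 10
```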
[jira] [Updated] (DRILL-7223) Make the timeout in TimedCallable a configurable boot time parameter
[ https://issues.apache.org/jira/browse/DRILL-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7223: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Make the timeout in TimedCallable a configurable boot time parameter
> --------------------------------------------------------------------
>
> Key: DRILL-7223
> URL: https://issues.apache.org/jira/browse/DRILL-7223
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Aman Sinha
> Assignee: Boaz Ben-Zvi
> Priority: Minor
> Fix For: 1.19.0
>
> The [TimedCallable.TIMEOUT_PER_RUNNABLE_IN_MSECS|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java#L52] is currently an internal Drill constant defined as 15 seconds, and it has been there since day one. Drill's TimedCallable implements the Callable interface from Java's concurrency package to create timed threads. It is used by the REFRESH METADATA command, which creates multiple threads on the Foreman node to gather Parquet metadata to build the metadata cache.
> Depending on the load on the system, or for very large numbers of Parquet files (millions), it is possible to exceed this timeout. While the exact root cause of exceeding the timeout is being investigated, it makes sense to make this timeout a configurable parameter to aid with large-scale testing. This JIRA is to make this a configurable bootstrapping option in drill-override.conf.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-6953) Merge row set-based JSON reader
[ https://issues.apache.org/jira/browse/DRILL-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-6953: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Merge row set-based JSON reader
> -------------------------------
>
> Key: DRILL-6953
> URL: https://issues.apache.org/jira/browse/DRILL-6953
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.15.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.19.0
>
> The final step in the ongoing "result set loader" saga is to merge the revised JSON reader into master. This reader does three key things:
> * Demonstrates the prototypical "late schema" style of data reading (discover the schema while reading).
> * Implements many tricks and hacks to handle schema changes while loading.
> * Shows that, even with all these tricks, the only true solution is to actually have a schema.
> The new JSON reader:
> * Uses an expanded state machine when parsing rather than the complex set of if-statements in the current version.
> * Handles reading a run of nulls before seeing the first data value (as long as the data value shows up in the first record batch).
> * Uses the result-set loader to generate fixed-size batches regardless of the complexity, depth of structure, or width of variable-length fields.
> While the JSON reader itself is helpful, the key contribution is that it shows how to use the entire kit of parts: result set loader, projection framework, and so on. Since the projection framework can handle an external schema, it is also a handy foundation for the ongoing schema project.
> Key work to complete after this merge will be to reconcile actual data with the external schema. For example, if we know a column is supposed to be a VarChar, then read the column as a VarChar regardless of the type JSON itself picks. Or, if a column is supposed to be a Double, then convert Int and String JSON values into Doubles.
> The Row Set framework was designed to allow inserting custom column writers. > This would be a great opportunity to do the work needed to create them. Then, > use the new JSON framework to allow parsing a JSON field as a specified Drill > type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7728) Drill SPI framework
[ https://issues.apache.org/jira/browse/DRILL-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7728: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Drill SPI framework > --- > > Key: DRILL-7728 > URL: https://issues.apache.org/jira/browse/DRILL-7728 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.18.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.19.0 > > > Provide the basic framework to load an extension in Drill, modelled after the > Java Service Provider concept. Excludes full class loader isolation for now. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7554) Convert LTSV Format Plugin to EVF
[ https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7554: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Convert LTSV Format Plugin to EVF > - > > Key: DRILL-7554 > URL: https://issues.apache.org/jira/browse/DRILL-7554 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text & CSV >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7535) Convert Ltsv to EVF
[ https://issues.apache.org/jira/browse/DRILL-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7535: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Convert Ltsv to EVF > --- > > Key: DRILL-7535 > URL: https://issues.apache.org/jira/browse/DRILL-7535 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Charles Givre >Priority: Major > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7729) Use java.time in column accessors
[ https://issues.apache.org/jira/browse/DRILL-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7729: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Use java.time in column accessors > - > > Key: DRILL-7729 > URL: https://issues.apache.org/jira/browse/DRILL-7729 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.17.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.19.0 > > > Use {{java.time}} classes in the column accessors, except for {{Interval}}, > which has no {{java.time}} equivalent. Doing so allows us to create a row-set > version of Drill's JSON writer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-4232) Support for EXCEPT set operator
[ https://issues.apache.org/jira/browse/DRILL-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-4232: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Support for EXCEPT set operator > --- > > Key: DRILL-4232 > URL: https://issues.apache.org/jira/browse/DRILL-4232 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning & Optimization >Reporter: Victoria Markman >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7112) Code Cleanup for HTTPD Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7112: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Code Cleanup for HTTPD Format Plugin > > > Key: DRILL-7112 > URL: https://issues.apache.org/jira/browse/DRILL-7112 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.15.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Minor > Fix For: 1.19.0 > > > Address code clean up issues cited in > https://github.com/apache/drill/pull/1635. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7733) Use streaming for REST JSON queries
[ https://issues.apache.org/jira/browse/DRILL-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7733: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Use streaming for REST JSON queries
> -----------------------------------
>
> Key: DRILL-7733
> URL: https://issues.apache.org/jira/browse/DRILL-7733
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.19.0
>
> Several users on the user and dev mail lists have complained about the memory overhead when running a REST JSON query: {{http://node:8047/query.json}}. The current implementation buffers the entire result set in memory, then lets Jersey/Jetty convert the results to JSON. The result is very heavy heap use for larger query result sets.
> This ticket requests a change to use streaming. As each batch arrives at the Screen operator, convert that batch to JSON and directly stream the results to the client network connection, much as is done for the native client connection.
> For backward compatibility, the form of the JSON must be the same as the current API.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7458) Base storage plugin framework
[ https://issues.apache.org/jira/browse/DRILL-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7458: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Base storage plugin framework
> -----------------------------
>
> Key: DRILL-7458
> URL: https://issues.apache.org/jira/browse/DRILL-7458
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.19.0
>
> The "Easy" framework allows third parties to add format plugins to Drill with moderate effort. (The process could be easier, but "Easy" makes it as simple as possible given the current structure.)
> At present, no such "starter" framework exists for storage plugins. Further, multiple storage plugins have implemented filter push-down, seemingly by copying large blocks of code.
> This ticket offers a "base" framework for storage plugins and for filter push-downs. The framework builds on the EVF, allowing plugins to also support project push-down.
> The framework has a "test mule" storage plugin to verify functionality, and was used as the basis of a REST-like plugin.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase
[ https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7558: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Generalize filter push-down planner phase > - > > Key: DRILL-7558 > URL: https://issues.apache.org/jira/browse/DRILL-7558 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.18.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.19.0 > > > DRILL-7458 provides a base framework for storage plugins, including a > simplified filter push-down mechanism. [~volodymyr] notes that it may be > *too* simple: > {quote} > What about the case when this rule was applied for one filter, but planner at > some point pushed another filter above the scan, for example, if we have such > case: > {code} > Filter(a=2) > Join(t1.b=t2.b, type=inner) > Filter(b=3) > Scan(t1) > Scan(t2) > {code} > Filter b=3 will be pushed into scan, planner will push filter above join: > {code} > Join(t1.b=t2.b, type=inner) > Filter(a=2) > Scan(t1, b=3) > Scan(t2) > {code} > In this case, check whether filter was pushed is not enough. > {quote} > Drill divides planning into a number of *phases*, each defined by a set of > *rules*. Most storage plugins perform filter push-down during the physical > planning stage. However, by this point, Drill has already decided on the > degree of parallelism: it is too late to use filter push-down to set the > degree of parallelism. Yet, if using something like a REST API, we want to > use filters to help us shard the query (that is, to set the degree of > parallelism.) > > DRILL-7458 performs filter push-down at *logical* planning time to work > around the above limitation. (In Drill, there are three different phases that > could be considered the logical phase, depending on which planning options > are set to control Calcite.) 
> [~volodymyr] points out that the logical plan phase may be wrong because it will perform rewrites of the type he cited.
> Thus, we need to research where to insert filter push-down. It must come:
> * After rewrites of the kind described above.
> * After join equivalence computations. (See DRILL-7556.)
> * Before the decision is made about the number of minor fragments.
> The goal of this ticket is to either:
> * Research to identify an existing phase which satisfies these requirements, or
> * Create a new phase.
> Due to the way Calcite works, it is not a good idea to have a single phase handle two tasks that depend on one another. That is, we cannot combine filter push-down in a phase which defines the filters, nor can we add filter push-down in a phase that chooses parallelism.
> Background: Calcite is a rule-based query planner inspired by [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf]. The above issue is a flaw with rule-based planners and was identified as early as the [Cascades query framework paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf], which was the follow-up to Volcano.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7550) Add Storage Plugin for Cassandra
[ https://issues.apache.org/jira/browse/DRILL-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7550: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Add Storage Plugin for Cassandra > > > Key: DRILL-7550 > URL: https://issues.apache.org/jira/browse/DRILL-7550 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.18.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.19.0 > > > Apache Cassandra is a free and open-source, distributed, wide column store, > NoSQL database management system designed to handle large amounts of data > across many commodity servers, providing high availability with no single > point of failure. [1] > This PR would enable Drill to query Cassandra data stores. > > [1]: https://en.wikipedia.org/wiki/Apache_Cassandra -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7551) Improve Error Reporting
[ https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7551: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Improve Error Reporting
> -----------------------
>
> Key: DRILL-7551
> URL: https://issues.apache.org/jira/browse/DRILL-7551
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.19.0
>
> This Jira is to serve as a master issue to improve the usability of error messages. Instead of dumping stack traces, the overall goal is to give the user something that can actually explain:
> # What went wrong
> # How to fix it
> Work that relates to this should be created as subtasks.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7712) Fix issues after ZK upgrade
[ https://issues.apache.org/jira/browse/DRILL-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7712: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Fix issues after ZK upgrade > --- > > Key: DRILL-7712 > URL: https://issues.apache.org/jira/browse/DRILL-7712 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.18.0 >Reporter: Arina Ielchiieva >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.19.0 > > > Warnings during jdbc-all build (absent when building with Mapr profile): > {noformat} > netty-transport-native-epoll-4.1.45.Final.jar, > netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 46 > overlapping classes: > - io.netty.channel.epoll.AbstractEpollStreamChannel$2 > - io.netty.channel.epoll.AbstractEpollServerChannel$EpollServerSocketUnsafe > - io.netty.channel.epoll.EpollDatagramChannel > - io.netty.channel.epoll.AbstractEpollStreamChannel$SpliceInChannelTask > - io.netty.channel.epoll.NativeDatagramPacketArray > - io.netty.channel.epoll.EpollSocketChannelConfig > - io.netty.channel.epoll.EpollTcpInfo > - io.netty.channel.epoll.EpollEventArray > - io.netty.channel.epoll.EpollEventLoop > - io.netty.channel.epoll.EpollSocketChannel > - 36 more... > netty-transport-native-unix-common-4.1.45.Final.jar, > netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 15 > overlapping classes: > - io.netty.channel.unix.Errors$NativeConnectException > - io.netty.channel.unix.ServerDomainSocketChannel > - io.netty.channel.unix.DomainSocketAddress > - io.netty.channel.unix.Socket > - io.netty.channel.unix.NativeInetAddress > - io.netty.channel.unix.DomainSocketChannelConfig > - io.netty.channel.unix.Errors$NativeIoException > - io.netty.channel.unix.DomainSocketReadMode > - io.netty.channel.unix.ErrorsStaticallyReferencedJniMethods > - io.netty.channel.unix.UnixChannel > - 5 more... > maven-shade-plugin has detected that some class files are > present in two or more JARs. 
When this happens, only one > single version of the class is copied to the uber jar. > Usually this is not harmful and you can skip these warnings, > otherwise try to manually exclude artifacts based on > mvn dependency:tree -Ddetail=true and the above output. > See http://maven.apache.org/plugins/maven-shade-plugin/ > {noformat} > Additional warning build with Mapr profile: > {noformat} > The following patterns were never triggered in this artifact inclusion filter: > o 'org.apache.zookeeper:zookeeper-jute' > {noformat} > NPEs in tests (though tests do not fail): > {noformat} > [INFO] Running org.apache.drill.exec.coord.zk.TestZookeeperClient > 4880 > java.lang.NullPointerException > 4881 > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269) > 4882 > at > org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251) > 4883 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583) > 4884 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546) > 4885 > at > org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java: > {noformat} > {noformat} > [INFO] Running org.apache.drill.exec.coord.zk.TestEphemeralStore > 5278 > java.lang.NullPointerException > 5279 > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269) > 5280 > at org.apache.zookeepe > {noformat} > {noformat} > [INFO] Running org.apache.drill.yarn.zk.TestAmRegistration > 6767 > java.lang.NullPointerException > 6768 > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269) > 6769 > at > org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251) > 6770 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583) > 6771 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546) > 6772 > at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:929) > 6773 > at org.apache.curator.t > {noformat} > {noformat} > org.apache.drill.yarn.client.TestCommandLineOptions > 6823 > java.lang.NullPointerException > 6824 > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269) > 6825 > at > org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251) > 6826 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583) > 6827 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546) > 6828 >
[jira] [Updated] (DRILL-7366) Improve Null Handling for UDFs with Complex Output
[ https://issues.apache.org/jira/browse/DRILL-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7366: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Improve Null Handling for UDFs with Complex Output
> --------------------------------------------------
>
> Key: DRILL-7366
> URL: https://issues.apache.org/jira/browse/DRILL-7366
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.19.0
>
> If a UDF has a complex field (Map or List) as output, Drill does not allow the UDF to have nullable input, which creates additional complexity when writing these kinds of UDFs.
> I therefore would like to propose that two options be added to the FunctionTemplate for null handling: {{EMPTY_LIST_IF_NULL}} and {{EMPTY_MAP_IF_NULL}}, which would simplify UDF creation. I'm envisioning that if either of these options were selected and the UDF receives any null value as input, the UDF will return either an empty map or list.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7597) Read selected JSON columns as JSON text
[ https://issues.apache.org/jira/browse/DRILL-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7597: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Read selected JSON columns as JSON text
> ---------------------------------------
>
> Key: DRILL-7597
> URL: https://issues.apache.org/jira/browse/DRILL-7597
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.19.0
>
> See DRILL-7598. The use case wishes to read selected JSON columns as JSON text rather than parsing the JSON into a relational structure as is done today in the JSON reader.
> The JSON reader supports "all text mode", but, despite the name, this mode only works for scalars (primitives) such as numbers. It does not work for structured types such as objects or arrays: such types are always parsed into Drill structures (which causes the conflict described in DRILL-7598).
> Instead, we need a feature to read an entire JSON value, including structure, as a JSON string.
> This feature would work best when the user can parse some parts of a JSON input file into a relational structure and read others as JSON. (This is the use case which the mailing-list user faced.) So, we need a way to do that.
> Drill has a "provided schema" feature which, at present, is used only for text files (and recently, with limited support, in Avro). We are working on a project to add such support for JSON.
> Perhaps we can leverage this feature to allow the JSON reader to read chunks of JSON as text which can be manipulated by those future JSON functions. In the example, column "c" would be read as JSON text; Drill would not attempt to parse it into a relational structure.
> As it turns out, the "new" JSON reader we're working on originally had a feature to do just that, but we took it out because we were not sure it was needed. Sounds like we should restore it as part of our "provided schema" support.
It could work this way: if you CREATE SCHEMA with column "c" as > VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the > entire nested structure as JSON without trying to parse it into a relational > structure. > This ticket asks to build the concept: > * Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field > to be read as JSON. > * Implement the "read column as JSON" feature in the new EVF-based JSON > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
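The `CREATE SCHEMA` option requested in DRILL-7597 is explicitly still to be designed; one possible sketch, using Drill's existing schema-provisioning syntax with a hypothetical column property name, might look like:

```sql
-- Hypothetical sketch: mark column `c` to be read as raw JSON text.
-- The property name 'drill.json-mode' is illustrative, not final syntax,
-- and the table name is an example.
CREATE OR REPLACE SCHEMA (
  `a` INT,
  `b` VARCHAR,
  `c` VARCHAR PROPERTIES { 'drill.json-mode' = 'json' }
) FOR TABLE dfs.tmp.`example.json`
```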
[jira] [Updated] (DRILL-7557) Revise "Base" storage plugin filter push-down listener with a builder
[ https://issues.apache.org/jira/browse/DRILL-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7557: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Revise "Base" storage plugin filter push-down listener with a builder
> ---------------------------------------------------------------------
>
> Key: DRILL-7557
> URL: https://issues.apache.org/jira/browse/DRILL-7557
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.19.0
>
> DRILL-7458 introduces a base framework for storage plugins and includes a simplified mechanism for filter push-down. Part of that mechanism includes a "listener", with the bulk of the work done in a single method:
> {code:java}
> Pair> transform(GroupScan groupScan, List> andTerms, Pair DisjunctionFilterSpec> orTerm);
> {code}
> Reviewers correctly pointed out that this method might be a bit too complex. The listener pattern pretty much forced the present design. To improve it, we'd want to use a different design; maybe some kind of builder which might:
> * Accept the CNF and DNF terms via dedicated methods.
> * Perform a processing step.
> * Provide a number of methods to communicate the results, such as 1) whether a new group scan is needed, 2) any CNF terms to retain, and 3) any DNF terms to retain.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7531) Convert format plugins to EVF
[ https://issues.apache.org/jira/browse/DRILL-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7531: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Convert format plugins to EVF > - > > Key: DRILL-7531 > URL: https://issues.apache.org/jira/browse/DRILL-7531 > Project: Apache Drill > Issue Type: Improvement >Reporter: Arina Ielchiieva >Priority: Major > Fix For: 1.19.0 > > > This is umbrella Jira to track down process of converting format plugins to > EVF. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7787) Apache Drill failed to start
[ https://issues.apache.org/jira/browse/DRILL-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7787: --- Fix Version/s: (was: 1.18.0) 1.19.0

> Apache Drill failed to start
> ----------------------------
>
> Key: DRILL-7787
> URL: https://issues.apache.org/jira/browse/DRILL-7787
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Om Prasad Surapu
> Priority: Major
> Fix For: 1.19.0
>
> Hi Team,
> I have an Apache Drill cluster set up with apache-drill-1.17.0 and started in distributed mode (with ZooKeeper). Drill started and no issues were reported.
>
> I installed apache-drill-1.18.0 to fix DRILL-7786, but Drill failed to start with the exception below (I have tried ZooKeeper versions 3.5.8 and 3.4.11). Could you help me fix this issue?
> Exception in thread "main" org.apache.drill.exec.exception.DrillbitStartupException: Failure during initial startup of Drillbit.
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:588)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:554)
> at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:550)
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: unable to put
> at org.apache.drill.exec.coord.zk.ZookeeperClient.putIfAbsent(ZookeeperClient.java:326)
> at org.apache.drill.exec.store.sys.store.ZookeeperPersistentStore.putIfAbsent(ZookeeperPersistentStore.java:119)
> at org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.prepareStores(RemoteFunctionRegistry.java:201)
> at org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.init(RemoteFunctionRegistry.java:108)
> at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:233)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:584)
> ... 
2 more > Caused by: org.apache.zookeeper.KeeperException$UnimplementedException: > KeeperErrorCode = Unimplemented for /drill/udf/registry > at org.apache.zookeeper.KeeperException.create(KeeperException.java:106) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1637) > at > org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1180) > at > org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156) > at > org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81) > at > org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153) > at > org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607) > at > org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597) > at > org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51) > at > org.apache.drill.exec.coord.zk.ZookeeperClient.putIfAbsent(ZookeeperClient.java:318) > ... 7 more -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7526) Assertion Error when only type is used with schema in table function
[ https://issues.apache.org/jira/browse/DRILL-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7526: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Assertion Error when only type is used with schema in table function > > > Key: DRILL-7526 > URL: https://issues.apache.org/jira/browse/DRILL-7526 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Arina Ielchiieva >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.19.0 > > > {{org.apache.drill.TestSchemaWithTableFunction}} > {noformat} > @Test > public void testWithTypeAndSchema() { > String query = "select Year from > table(dfs.`store/text/data/cars.csvh`(type=> 'text', " + > "schema=>'inline=(`Year` int)')) where Make = 'Ford'"; > queryBuilder().sql(query).print(); > } > {noformat} > {noformat} > Caused by: java.lang.AssertionError: BOOLEAN > at > org.apache.calcite.sql.type.SqlTypeExplicitPrecedenceList.compareTypePrecedence(SqlTypeExplicitPrecedenceList.java:140) > at org.apache.calcite.sql.SqlUtil.bestMatch(SqlUtil.java:687) > at > org.apache.calcite.sql.SqlUtil.filterRoutinesByTypePrecedence(SqlUtil.java:656) > at > org.apache.calcite.sql.SqlUtil.lookupSubjectRoutines(SqlUtil.java:515) > at org.apache.calcite.sql.SqlUtil.lookupRoutine(SqlUtil.java:435) > at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:240) > at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:218) > at > org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5640) > at > org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5627) > at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1692) > at > org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl(ProcedureNamespace.java:53) > at > 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3129) > at > org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:63) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3111) > at > org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:63) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3383) > at > org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969) > at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:944) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:651) > at > org.apache.drill.exec.planner.sql.conversion.SqlConverter.validate(SqlConverter.java:189) > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:648) > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:196) > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:170) > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283) > at > 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163) > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128) > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93) > at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:590) > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:275) > ... 1 more > {noformat} > Note: when other format options are used or schema is used alone, everything > works fine. > See test examples: > org.apache.drill.Test
[jira] [Updated] (DRILL-7671) Fix builds for cdh and hdp profiles
[ https://issues.apache.org/jira/browse/DRILL-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7671: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Fix builds for cdh and hdp profiles > --- > > Key: DRILL-7671 > URL: https://issues.apache.org/jira/browse/DRILL-7671 > Project: Apache Drill > Issue Type: Task >Reporter: Vova Vysotskyi >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.19.0 > > > The cdh and hdp profiles use obsolete versions of Hadoop and other libraries, > so attempting to build the project with these profiles fails > with compilation errors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7325) Many operators do not set container record count
[ https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7325: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Many operators do not set container record count > > > Key: DRILL-7325 > URL: https://issues.apache.org/jira/browse/DRILL-7325 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.19.0 > > > See DRILL-7324. The following are problems found because some operators fail > to set the record count for their containers. > h4. Scan > TestComplexTypeReader, on cluster setup, using the PojoRecordReader: > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from ScanBatch > ScanBatch: Container record count not set > Reason: ScanBatch never sets the record count of its container (this is a > generic issue, not specific to the PojoRecordReader). > h4. Filter > {{TestComplexTypeReader.testNonExistentFieldConverting()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from FilterRecordBatch > FilterRecordBatch: Container record count not set > {noformat} > h4. Hash Join > {{TestComplexTypeReader.test_array()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from HashJoinBatch > HashJoinBatch: Container record count not set > {noformat} > Occurs on the first batch in which the hash join returns {{OK_NEW_SCHEMA}} > with no records. > h4. Project > {{TestCsvWithHeaders.testEmptyFile()}} (when the text reader returned empty, > schema-only batches): > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from ProjectRecordBatch > ProjectRecordBatch: Container record count not set > {noformat} > Occurs in {{ProjectRecordBatch.handleNullInput()}}: it sets up the schema but > does not set the value count to 0. > h4. 
Unordered Receiver > {{TestCsvWithSchema.testMultiFileSchema()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from UnorderedReceiverBatch > UnorderedReceiverBatch: Container record count not set > {noformat} > The problem is that {{RecordBatchLoader.load()}} does not set the container > record count. > h4. Streaming Aggregate > {{TestJsonReader.testSumWithTypeCase()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from StreamingAggBatch > StreamingAggBatch: Container record count not set > {noformat} > The problem is that {{StreamingAggBatch.buildSchema()}} does not set the > container record count to 0. > h4. Limit > {{TestJsonReader.testDrill_1419()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from LimitRecordBatch > LimitRecordBatch: Container record count not set > {noformat} > None of the paths in {{LimitRecordBatch.innerNext()}} set the container > record count. > h4. Union All > {{TestJsonReader.testKvgenWithUnionAll()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from UnionAllRecordBatch > UnionAllRecordBatch: Container record count not set > {noformat} > When {{UnionAllRecordBatch}} calls > {{VectorAccessibleUtilities.setValueCount()}}, it did not also set the > container count. > h4. Hash Aggregate > {{TestJsonReader.drill_4479()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from HashAggBatch > HashAggBatch: Container record count not set > {noformat} > Problem is that {{HashAggBatch.buildSchema()}} does not set the container > record count to 0 for the first, empty, batch sent for {{OK_NEW_SCHEMA.}} > h4. 
And Many More > It turns out that most operators fail to set one of the many row count > variables somewhere in their code path: maybe in the schema setup path, maybe > when building a batch along one of the many paths that operators follow. > Further, we have multiple row counts that must be set: > * Values in each vector ({{setValueCount()}}), > * Row count in the container ({{setRecordCount()}}), which must be the same > as the vector value count. > * Row count in the operator (batch), which is the (possibly filtered) count > of records presented to downstream operators. It must be less than or equal > to the container row count (except for an SV4.) > * The SV2 record count, which is the number of entries in the SV2 and must be > the same as the batch row count (and less than or equal to the container row > count.) > * The SV2 actual batch record count, which must be the same as the container > row count. > * The SV4 record count, which must be the same as the batch record count. > W
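The invariants listed in DRILL-7325 above can be captured in a minimal, self-contained sketch. The class and method names below are illustrative stand-ins, not Drill's actual VectorContainer/RecordBatch API; the point is only to show how the counts must relate.

```java
// Hypothetical model of the row-count invariants from DRILL-7325.
// Not Drill code: Drill's vectors, containers and batches carry
// equivalent counters, but these names are illustrative only.
public class BatchCountCheck {

    /** Checks the relationships between the various row counts. */
    static boolean isConsistent(int vectorValueCount,
                                int containerRecordCount,
                                int batchRecordCount,
                                int sv2Count) {
        // Each vector's value count must match the container record count.
        if (vectorValueCount != containerRecordCount) return false;
        // The batch (downstream-visible) count may be filtered down,
        // so it must not exceed the container count.
        if (batchRecordCount > containerRecordCount) return false;
        // An SV2's entry count must equal the batch record count.
        return sv2Count == batchRecordCount;
    }

    public static void main(String[] args) {
        // A filter kept 3 of 5 rows: vectors and container hold 5,
        // the batch and SV2 expose 3 rows downstream.
        System.out.println(isConsistent(5, 5, 3, 3));
        // The bug class in this ticket: container count never set (left 0).
        System.out.println(isConsistent(5, 0, 3, 3));
    }
}
```

The "container record count not set" errors in the ticket correspond to the second case: the vectors hold data but the container counter was never assigned.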
[jira] [Updated] (DRILL-7556) Generalize the "Base" storage plugin filter push down mechanism
[ https://issues.apache.org/jira/browse/DRILL-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7556: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Generalize the "Base" storage plugin filter push down mechanism > --- > > Key: DRILL-7556 > URL: https://issues.apache.org/jira/browse/DRILL-7556 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.18.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.19.0 > > > DRILL-7458 adds a Base framework for storage plugins which includes a > simplified representation of filters that can be pushed down into Drill. It > makes the assumption that plugins can generally only handle filters of the > form: > {code} > column relop constant > {code} > For example, {{`foo` < 10}} or {{`bar` = "Fred"}}. (The code "flips" > expressions of the form {{constant relop column}}.) > [~volodymyr] argues this is too narrow and suggests two additional cases: > {code} > column-expr relop constant > fn(column) = constant > {code} > Examples: > {code:sql} > foo + 10 = 20 > substr(bar, 2, 6) = 'Fred' > {code} > The first case should be handled by a general expression rewriter that simplifies > constant expressions: > {code:sql} > foo + 10 = 20 --> foo = 10 > {code} > Then filter push-down need only handle the simplified expression rather than > every push-down mechanism needing to do the simplification. > For this ticket, we wish to handle the second case: any expression that > contains a single column associated with the target table. Provide a new > push-down node to handle the non-relop case so that simple plugins can simply > ignore such expressions, while more complex plugins (such as Parquet) can > optionally handle them. > A second improvement is to handle the more complex case: two or more columns, > all of which come from the same target table. For example: > {code:sql} > foo + bar = 20 > {code} > Where both {{foo}} and {{bar}} are from the same table. 
It would be a very > sophisticated plugin indeed (maybe the JDBC storage plugin) that could handle > this case, but the capability should be available. > As part of this work, we must handle join-equivalent columns: > {code:sql} > SELECT ... FROM t1, t2 > WHERE t1.a = t2.b > AND t1.a = 20 > {code} > If the plugin for table {{t2}} can handle filter push-down, then the > expression {{t1.a = 20}} is join-equivalent to {{t2.b = 20}}. > It is not clear whether the Drill logical plan already handles join equivalence. > If not, it should be added. If so, the filter push-down mechanism should add > documentation that describes how the mechanism works. -- This message was sent by Atlassian Jira (v8.3.4#803005)
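The constant-simplification step proposed in DRILL-7556 above (rewriting {{foo + 10 = 20}} into {{foo = 10}} before push-down ever sees it) can be sketched as follows. The record types and method here are illustrative stand-ins, not Drill's planner API.

```java
// Hypothetical sketch of the "simplify constant expressions" rewrite:
// normalize "col + addend = target" into the "column relop constant"
// form that the Base push-down framework already understands.
// These are NOT Drill planner classes; names are illustrative.
public class FilterNormalizer {

    /** A trivial "column + addend = target" predicate, e.g. foo + 10 = 20. */
    record AddEquals(String column, long addend, long target) {}

    /** The normalized "column = constant" form plugins can push down. */
    record ColumnEquals(String column, long constant) {}

    static ColumnEquals simplify(AddEquals expr) {
        // foo + 10 = 20  -->  foo = 20 - 10  -->  foo = 10
        return new ColumnEquals(expr.column(), expr.target() - expr.addend());
    }

    public static void main(String[] args) {
        ColumnEquals simplified = simplify(new AddEquals("foo", 10, 20));
        System.out.println(simplified.column() + " = " + simplified.constant());
    }
}
```

With this normalization done once in a shared rewriter, each storage plugin's push-down logic only ever sees the simplified relop form, as the ticket suggests.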
[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks
[ https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7270: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Fix non-https dependency urls and add checksum checks > - > > Key: DRILL-7270 > URL: https://issues.apache.org/jira/browse/DRILL-7270 > Project: Apache Drill > Issue Type: Task > Components: Security >Affects Versions: 1.16.0 >Reporter: Arina Ielchiieva >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.19.0 > > > Review any build scripts and configurations for insecure URLs and make > appropriate fixes to use secure URLs. > Projects like Lucene maintain checksum whitelists of all their build dependencies, > and you may wish to consider that as > protection against threats beyond just MITM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7284) reusing the hashCodes computed at exchange nodes
[ https://issues.apache.org/jira/browse/DRILL-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7284: --- Fix Version/s: (was: 1.18.0) 1.19.0 > reusing the hashCodes computed at exchange nodes > > > Key: DRILL-7284 > URL: https://issues.apache.org/jira/browse/DRILL-7284 > Project: Apache Drill > Issue Type: New Feature >Reporter: Weijie Tong >Assignee: Weijie Tong >Priority: Major > Fix For: 1.19.0 > > > For HashJoin or HashAggregate, the input data is shuffled according to > hash codes of the join conditions or group-by keys at the exchange nodes. This > hash computation is then redone at the HashJoin or HashAggregate > nodes. We could send the hash codes computed at the exchange nodes to the upper > nodes so that the HashJoin or HashAggregate nodes would not need to do the hash > computation again. -- This message was sent by Atlassian Jira (v8.3.4#803005)
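The idea in DRILL-7284 above (compute a key's hash once where rows are shuffled, then carry it along so the join or aggregation does not rehash the same key) can be illustrated with a small sketch. This is not Drill code; the types and names are hypothetical.

```java
// Illustrative sketch (not Drill code) of reusing a hash computed at the
// exchange: the sender hashes the key once, the carried hash is then used
// both for partition routing and for the HashJoin/HashAggregate bucket,
// instead of recomputing key.hashCode() at each operator.
public class HashReuse {

    /** A row carrying its key together with the hash computed upstream. */
    record RowWithHash(String key, int hash) {}

    /** Hash computed once, at the exchange (sender) side. */
    static RowWithHash atExchange(String key) {
        return new RowWithHash(key, key.hashCode());
    }

    /** The receiver routes the row using the carried hash. */
    static int partition(RowWithHash row, int numPartitions) {
        return Math.floorMod(row.hash(), numPartitions);
    }

    /** HashJoin/HashAggregate picks a table bucket from the same hash. */
    static int bucket(RowWithHash row, int numBuckets) {
        // No second call to key.hashCode() here: the hash is reused.
        return Math.floorMod(row.hash(), numBuckets);
    }

    public static void main(String[] args) {
        RowWithHash row = atExchange("order_123");
        System.out.println("partition=" + partition(row, 8)
                + " bucket=" + bucket(row, 64));
    }
}
```

In a real engine the carried hash would travel in the exchange's wire format; the sketch only shows that one hash computation can serve both operators.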
[jira] [Updated] (DRILL-7192) Drill limits rows when autoLimit is disabled
[ https://issues.apache.org/jira/browse/DRILL-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7192: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Drill limits rows when autoLimit is disabled > > > Key: DRILL-7192 > URL: https://issues.apache.org/jira/browse/DRILL-7192 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Vova Vysotskyi >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.19.0 > > > autoLimit for JDBC and REST clients was implemented in DRILL-7048. > *Steps to reproduce the issue:* > 1. Check that autoLimit is disabled; if not, disable it and restart Drill. > 2. Submit any query and verify that the row count is correct; for example, > {code:sql} > SELECT * FROM cp.`employee.json`; > {code} > returns 1,155 rows > 3. Enable autoLimit for the sqlLine client: > {code:sql} > !set rowLimit 10 > {code} > 4. Submit the same query and verify that the result has 10 rows. > 5. Disable autoLimit: > {code:sql} > !set rowLimit 0 > {code} > 6. Submit the same query; this time *it returns 10 rows instead of > 1,155*. > The correct row count is returned only after creating a new connection. > The same issue is also observed with the SQuirreL SQL client, but Postgres, for > example, works correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7525) Convert SequenceFiles to EVF
[ https://issues.apache.org/jira/browse/DRILL-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7525: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Convert SequenceFiles to EVF > > > Key: DRILL-7525 > URL: https://issues.apache.org/jira/browse/DRILL-7525 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.17.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.19.0 > > > Convert SequenceFiles to EVF -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7133) Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin
[ https://issues.apache.org/jira/browse/DRILL-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7133: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin > -- > > Key: DRILL-7133 > URL: https://issues.apache.org/jira/browse/DRILL-7133 > Project: Apache Drill > Issue Type: Improvement >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.19.0 > > > A previous JIRA (https://issues.apache.org/jira/browse/DRILL-7032) > resulted in some improvements to the PCAP format plugin: it converted the > TCP flags to boolean format and added an {{is_corrupt}} boolean field. > This field allows users to look for packets that are corrupt. > Unfortunately, this functionality is not duplicated in the PCAP-NG format > plugin, so this JIRA proposes to add it there. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7621) Refactor ExecConstants and PlannerSettings constant classes
[ https://issues.apache.org/jira/browse/DRILL-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7621: --- Fix Version/s: (was: 1.18.0) 1.19.0 > Refactor ExecConstants and PlannerSettings constant classes > --- > > Key: DRILL-7621 > URL: https://issues.apache.org/jira/browse/DRILL-7621 > Project: Apache Drill > Issue Type: Task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.19.0 > > > According to [the > discussion|http://mail-archives.apache.org/mod_mbox/drill-dev/202003.mbox/%3CBCB4CFC2-8BC5-43C6-8BD4-956F66F6D0D3%40gmail.com%3E], > it makes sense to split the classes into multiple constant interfaces and > get rid of validator constants. Then the validator instances won't be used > for getting option values and the general approach will be getting type > specific option value by string key from config instance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7192) Drill limits rows when autoLimit is disabled
[ https://issues.apache.org/jira/browse/DRILL-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7192: --- Target Version/s: 1.19.0 > Drill limits rows when autoLimit is disabled > > > Key: DRILL-7192 > URL: https://issues.apache.org/jira/browse/DRILL-7192 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Vova Vysotskyi >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.18.0 > > > autoLimit for JDBC and REST clients was implemented in DRILL-7048. > *Steps to reproduce the issue:* > 1. Check that autoLimit is disabled; if not, disable it and restart Drill. > 2. Submit any query and verify that the row count is correct; for example, > {code:sql} > SELECT * FROM cp.`employee.json`; > {code} > returns 1,155 rows > 3. Enable autoLimit for the sqlLine client: > {code:sql} > !set rowLimit 10 > {code} > 4. Submit the same query and verify that the result has 10 rows. > 5. Disable autoLimit: > {code:sql} > !set rowLimit 0 > {code} > 6. Submit the same query; this time *it returns 10 rows instead of > 1,155*. > The correct row count is returned only after creating a new connection. > The same issue is also observed with the SQuirreL SQL client, but Postgres, for > example, works correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7551) Improve Error Reporting
[ https://issues.apache.org/jira/browse/DRILL-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7551: --- Target Version/s: 1.19.0 > Improve Error Reporting > --- > > Key: DRILL-7551 > URL: https://issues.apache.org/jira/browse/DRILL-7551 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.17.0 >Reporter: Charles Givre >Priority: Major > Fix For: 1.18.0 > > > This Jira serves as a master issue to improve the usability of > error messages. Instead of dumping stack traces, the overall goal is to give > the user something that can actually explain: > # What went wrong > # How to fix it > Related work should be created as subtasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7550) Add Storage Plugin for Cassandra
[ https://issues.apache.org/jira/browse/DRILL-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7550: --- Target Version/s: 1.19.0 > Add Storage Plugin for Cassandra > > > Key: DRILL-7550 > URL: https://issues.apache.org/jira/browse/DRILL-7550 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.18.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.18.0 > > > Apache Cassandra is a free and open-source, distributed, wide column store, > NoSQL database management system designed to handle large amounts of data > across many commodity servers, providing high availability with no single > point of failure. [1] > This PR would enable Drill to query Cassandra data stores. > > [1]: https://en.wikipedia.org/wiki/Apache_Cassandra -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7284) reusing the hashCodes computed at exchange nodes
[ https://issues.apache.org/jira/browse/DRILL-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7284: --- Target Version/s: 1.19.0 > reusing the hashCodes computed at exchange nodes > > > Key: DRILL-7284 > URL: https://issues.apache.org/jira/browse/DRILL-7284 > Project: Apache Drill > Issue Type: New Feature >Reporter: Weijie Tong >Assignee: Weijie Tong >Priority: Major > Fix For: 1.18.0 > > > For HashJoin or HashAggregate, the input data is shuffled according to > hash codes of the join conditions or group-by keys at the exchange nodes. This > hash computation is then redone at the HashJoin or HashAggregate > nodes. We could send the hash codes computed at the exchange nodes to the upper > nodes so that the HashJoin or HashAggregate nodes would not need to do the hash > computation again. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7597) Read selected JSON columns as JSON text
[ https://issues.apache.org/jira/browse/DRILL-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7597: --- Target Version/s: 1.19.0 > Read selected JSON columns as JSON text > -- > > Key: DRILL-7597 > URL: https://issues.apache.org/jira/browse/DRILL-7597 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.17.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.18.0 > > > See DRILL-7598. The use case wishes to read selected JSON columns as JSON > text rather than parsing the JSON into a relational structure as is done > today in the JSON reader. > The JSON reader supports "all text mode", but, despite the name, this mode > only works for scalars (primitives) such as numbers. It does not work for > structured types such as objects or arrays: such types are always parsed into > Drill structures (which causes the conflict described in DRILL-7598.) > Instead, we need a feature to read an entire JSON value, including structure, > as a JSON string. > This feature would work best when the user can parse some parts of a JSON > input file into a relational structure and read others as JSON. (This is the use case > that the user-list user faced.) So we need a way to do that. > Drill has a "provided schema" feature which, at present, is used only for > text files (and recently with limited support in Avro.) We are working on a > project to add such support for JSON. > Perhaps we can leverage this feature to allow the JSON reader to read chunks > of JSON as text which can be manipulated by future JSON functions. In > the example, column "c" would be read as JSON text; Drill would not attempt > to parse it into a relational structure. > As it turns out, the "new" JSON reader we're working on originally had a > feature to do just that, but we took it out because we were not sure it was > needed. Sounds like we should restore it as part of our "provided schema" > support. 
It could work this way: if you CREATE SCHEMA with column "c" as > VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the > entire nested structure as JSON without trying to parse it into a relational > structure. > This ticket asks us to build out this concept: > * Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field > to be read as JSON. > * Implement the "read column as JSON" feature in the new EVF-based JSON > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
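The "read column as JSON text" behavior described in DRILL-7597 above amounts to capturing a field's raw JSON substring instead of parsing it into vectors. The following is a toy, stdlib-only sketch of that idea, not Drill's JSON reader; it matches brackets naively and would break on string values containing braces.

```java
// Toy illustration (not Drill's JSON reader) of reading a field as raw
// JSON text: instead of parsing field "c" into a structured value, hand
// its raw substring through as text, the way a VARCHAR-typed column in a
// provided schema might. Naive: ignores braces inside string literals.
public class RawJsonField {

    /** Returns the raw JSON text of a top-level field, or null if absent. */
    static String rawField(String json, String field) {
        int i = json.indexOf("\"" + field + "\":");
        if (i < 0) return null;
        int start = i + field.length() + 3;          // skip past "field":
        char open = json.charAt(start);
        if (open != '{' && open != '[') {
            // Scalar value: read until the next comma or closing brace.
            int end = start;
            while (",}".indexOf(json.charAt(end)) < 0) end++;
            return json.substring(start, end);
        }
        // Object or array: capture the bracket-balanced substring whole.
        char close = open == '{' ? '}' : ']';
        int depth = 0, end = start;
        do {
            char ch = json.charAt(end++);
            if (ch == open) depth++;
            else if (ch == close) depth--;
        } while (depth > 0);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        String row = "{\"a\":1,\"c\":{\"x\":[1,2],\"y\":\"z\"}}";
        // Column "c" comes back as JSON text, not a parsed structure.
        System.out.println(rawField(row, "c"));
    }
}
```

A real implementation would do this inside the streaming JSON tokenizer rather than on a string, but the contract is the same: the nested value passes through untouched as text.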
[jira] [Updated] (DRILL-7526) Assertion Error when only type is used with schema in table function
[ https://issues.apache.org/jira/browse/DRILL-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7526: --- Target Version/s: 1.19.0 > Assertion Error when only type is used with schema in table function > > > Key: DRILL-7526 > URL: https://issues.apache.org/jira/browse/DRILL-7526 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Arina Ielchiieva >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.18.0 > > > {{org.apache.drill.TestSchemaWithTableFunction}} > {noformat} > @Test > public void testWithTypeAndSchema() { > String query = "select Year from > table(dfs.`store/text/data/cars.csvh`(type=> 'text', " + > "schema=>'inline=(`Year` int)')) where Make = 'Ford'"; > queryBuilder().sql(query).print(); > } > {noformat} > {noformat} > Caused by: java.lang.AssertionError: BOOLEAN > at > org.apache.calcite.sql.type.SqlTypeExplicitPrecedenceList.compareTypePrecedence(SqlTypeExplicitPrecedenceList.java:140) > at org.apache.calcite.sql.SqlUtil.bestMatch(SqlUtil.java:687) > at > org.apache.calcite.sql.SqlUtil.filterRoutinesByTypePrecedence(SqlUtil.java:656) > at > org.apache.calcite.sql.SqlUtil.lookupSubjectRoutines(SqlUtil.java:515) > at org.apache.calcite.sql.SqlUtil.lookupRoutine(SqlUtil.java:435) > at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:240) > at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:218) > at > org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5640) > at > org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5627) > at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1692) > at > org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl(ProcedureNamespace.java:53) > at > 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3129) > at > org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:63) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3111) > at > org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:63) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3383) > at > org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969) > at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:944) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:651) > at > org.apache.drill.exec.planner.sql.conversion.SqlConverter.validate(SqlConverter.java:189) > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:648) > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:196) > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:170) > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283) > at > 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163) > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128) > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93) > at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:590) > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:275) > ... 1 more > {noformat} > Note: when other format options are used or schema is used alone, everything > works fine. > See test examples: > org.apache.drill.TestSchemaWithTableFunction#testSchema
[jira] [Updated] (DRILL-7787) Apache drill failed to start
[ https://issues.apache.org/jira/browse/DRILL-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7787: --- Target Version/s: 1.19.0 > Apache drill failed to start > > > Key: DRILL-7787 > URL: https://issues.apache.org/jira/browse/DRILL-7787 > Project: Apache Drill > Issue Type: Bug >Reporter: Om Prasad Surapu >Priority: Major > Fix For: 1.18.0 > > > Hi Team, > I have an Apache Drill cluster set up with apache-drill-1.17.0, started in > distributed mode (with ZooKeeper). Drill started with no issues reported. > > I installed apache-drill-1.18.0 to pick up the fix for DRILL-7786, but Drill failed to > start with the exception below. I have tried ZooKeeper versions 3.5.8 and 3.4.11. > Could you help me fix this issue? > Exception in thread "main" > org.apache.drill.exec.exception.DrillbitStartupException: Failure during > initial startup of Drillbit. > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:588) > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:554) > at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:550) > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: unable > to put > at > org.apache.drill.exec.coord.zk.ZookeeperClient.putIfAbsent(ZookeeperClient.java:326) > at > org.apache.drill.exec.store.sys.store.ZookeeperPersistentStore.putIfAbsent(ZookeeperPersistentStore.java:119) > at > org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.prepareStores(RemoteFunctionRegistry.java:201) > at > org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.init(RemoteFunctionRegistry.java:108) > at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:233) > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:584) > ... 
2 more > Caused by: org.apache.zookeeper.KeeperException$UnimplementedException: > KeeperErrorCode = Unimplemented for /drill/udf/registry > at org.apache.zookeeper.KeeperException.create(KeeperException.java:106) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1637) > at > org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1180) > at > org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156) > at > org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81) > at > org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153) > at > org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607) > at > org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597) > at > org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51) > at > org.apache.drill.exec.coord.zk.ZookeeperClient.putIfAbsent(ZookeeperClient.java:318) > ... 7 more -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase
[ https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7558: --- Target Version/s: 1.19.0 > Generalize filter push-down planner phase > - > > Key: DRILL-7558 > URL: https://issues.apache.org/jira/browse/DRILL-7558 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.18.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.18.0 > > > DRILL-7458 provides a base framework for storage plugins, including a > simplified filter push-down mechanism. [~volodymyr] notes that it may be > *too* simple: > {quote} > What about the case when this rule was applied for one filter, but planner at > some point pushed another filter above the scan, for example, if we have such > case: > {code} > Filter(a=2) > Join(t1.b=t2.b, type=inner) > Filter(b=3) > Scan(t1) > Scan(t2) > {code} > Filter b=3 will be pushed into scan, planner will push filter above join: > {code} > Join(t1.b=t2.b, type=inner) > Filter(a=2) > Scan(t1, b=3) > Scan(t2) > {code} > In this case, check whether filter was pushed is not enough. > {quote} > Drill divides planning into a number of *phases*, each defined by a set of > *rules*. Most storage plugins perform filter push-down during the physical > planning stage. However, by this point, Drill has already decided on the > degree of parallelism: it is too late to use filter push-down to set the > degree of parallelism. Yet, if using something like a REST API, we want to > use filters to help us shard the query (that is, to set the degree of > parallelism.) > > DRILL-7458 performs filter push-down at *logical* planning time to work > around the above limitation. (In Drill, there are three different phases that > could be considered the logical phase, depending on which planning options > are set to control Calcite.) 
> [~volodymyr] points out that the logical plan phase may be wrong because > it will perform rewrites of the type he cited. > Thus, we need to research where to insert filter push-down. It must come: > * After rewrites of the kind described above. > * After join equivalence computations. (See DRILL-7556.) > * Before the decision is made about the number of minor fragments. > The goal of this ticket is to either: > * Research to identify an existing phase which satisfies these requirements, > or > * Create a new phase. > Due to the way Calcite works, it is not a good idea to have a single phase > handle two tasks that depend on one another. That is, we cannot combine > filter push-down with the phase that defines the filters, nor can we add filter > push-down to the phase that chooses parallelism. > Background: Calcite is a rule-based query planner inspired by > [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf]. > The above issue is a flaw with rule-based planners and was identified as > early as the [Cascades query framework > paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf], > which was the follow-up to Volcano.
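The three ordering constraints listed in this ticket can be sketched as a simple check over a candidate phase sequence. This is an illustration only: the phase names below are hypothetical stand-ins, not Drill's actual Calcite phase list.

```java
import java.util.List;

// Illustrative sketch: models the ticket's constraints on where a
// filter push-down phase may sit, using made-up phase names.
public class PhaseOrdering {
  static final List<String> PHASES = List.of(
      "LOGICAL_REWRITE",   // rewrites of the kind cited above
      "JOIN_EQUIVALENCE",  // see DRILL-7556
      "FILTER_PUSH_DOWN",  // candidate new phase
      "PARALLELIZATION");  // decides the number of minor fragments

  // True if 'before' occurs earlier than 'after' in the sequence.
  static boolean ordered(List<String> phases, String before, String after) {
    return phases.indexOf(before) < phases.indexOf(after);
  }

  // The three requirements from the ticket, checked together.
  static boolean validPlacement(List<String> phases) {
    return ordered(phases, "LOGICAL_REWRITE", "FILTER_PUSH_DOWN")
        && ordered(phases, "JOIN_EQUIVALENCE", "FILTER_PUSH_DOWN")
        && ordered(phases, "FILTER_PUSH_DOWN", "PARALLELIZATION");
  }

  public static void main(String[] args) {
    System.out.println(validPlacement(PHASES)); // true for the order above
  }
}
```

Whether the phase sequence above matches any existing Calcite phase in Drill is exactly the research question the ticket poses.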
[jira] [Updated] (DRILL-7556) Generalize the "Base" storage plugin filter push down mechanism
[ https://issues.apache.org/jira/browse/DRILL-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7556: --- Target Version/s: 1.19.0 > Generalize the "Base" storage plugin filter push down mechanism > --- > > Key: DRILL-7556 > URL: https://issues.apache.org/jira/browse/DRILL-7556 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.18.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.18.0 > > > DRILL-7458 adds a Base framework for storage plugins which includes a > simplified representation of filters that can be pushed down into Drill. It > makes the assumption that plugins can generally only handle filters of the > form: > {code} > column relop constant > {code} > For example, {{`foo` < 10}} or {{`bar` = "Fred"}}. (The code "flips" > expressions of the form {{constant relop column}}.) > [~volodymyr] suggests this is too narrow and proposes two additional cases: > {code} > column-expr relop constant > fn(column) = constant > {code} > Examples: > {code:sql} > foo + 10 = 20 > substr(bar, 2, 6) = 'Fred' > {code} > The first case should be handled by a general expression rewriter: simplify > constant expressions: > {code:sql} > foo + 10 = 20 --> foo = 10 > {code} > Then, filter push-down need only handle the simplified expression rather than > every push-down mechanism needing to do the simplification. > For this ticket, we wish to handle the second case: any expression that > contains a single column associated with the target table. Provide a new > push-down node to handle the non-relop case so that simple plugins can simply > ignore such expressions, but more complex plugins (such as Parquet) can > optionally handle them. > A second improvement is to handle the more complex case: two or more columns, > all of which come from the same target table. For example: > {code:sql} > foo + bar = 20 > {code} > Where both {{foo}} and {{bar}} are from the same table. 
It would be a very > sophisticated plugin indeed (maybe the JDBC storage plugin) which can handle > this case, but it should be available. > As part of this work, we must handle join-equivalent columns: > {code:sql} > SELECT ... FROM t1, t2 > WHERE t1.a = t2.b > AND t1.a = 20 > {code} > If the plugin for table {{t2}} can handle filter push-down, then the > expression {{t1.a = 20}} is join-equivalent to {{t2.b = 20}}. > It is not clear if the Drill logical plan already handles join equivalence. > If not, it should be added. If so, the filter push-down mechanism should add > documentation that describes how the mechanism works.
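The "flip" and constant-simplification steps discussed in this ticket can be sketched as follows. This is a toy model under assumed names (`Cond`, `flip`, `foldPlus`); it is not Drill's actual filter push-down code.

```java
// Illustrative sketch: the two normalizations discussed in DRILL-7556,
// modeled over a tiny condition record rather than Drill's expression tree.
public class FilterNormalizer {
  // A normalized condition of the form: column relop constant.
  record Cond(String column, String relop, long constant) {}

  // Reverse a relational operator when operands are swapped.
  static String flipOp(String op) {
    switch (op) {
      case "<":  return ">";
      case ">":  return "<";
      case "<=": return ">=";
      case ">=": return "<=";
      default:   return op; // = and <> are symmetric
    }
  }

  // "constant relop column" (e.g. 10 < foo) --> "column relop constant" (foo > 10)
  static Cond flip(long constant, String op, String column) {
    return new Cond(column, flipOp(op), constant);
  }

  // "column + addend relop rhs" (e.g. foo + 10 = 20) --> "column relop rhs - addend" (foo = 10)
  static Cond foldPlus(String column, long addend, String op, long rhs) {
    return new Cond(column, op, rhs - addend);
  }

  public static void main(String[] args) {
    System.out.println(flip(10, "<", "foo"));         // Cond[column=foo, relop=>, constant=10]
    System.out.println(foldPlus("foo", 10, "=", 20)); // Cond[column=foo, relop==, constant=10]
  }
}
```

A real rewriter would of course work on expression trees and handle more operators; the point here is only that, after such normalization, each push-down mechanism sees only the `column relop constant` form.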
[jira] [Updated] (DRILL-7671) Fix builds for cdh and hdp profiles
[ https://issues.apache.org/jira/browse/DRILL-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7671: --- Target Version/s: 1.19.0 > Fix builds for cdh and hdp profiles > --- > > Key: DRILL-7671 > URL: https://issues.apache.org/jira/browse/DRILL-7671 > Project: Apache Drill > Issue Type: Task >Reporter: Vova Vysotskyi >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.18.0 > > > The cdh and hdp profiles use obsolete versions of Hadoop and other libraries, > so attempting to build the project with these profiles fails > with compilation errors.
[jira] [Updated] (DRILL-7712) Fix issues after ZK upgrade
[ https://issues.apache.org/jira/browse/DRILL-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7712: --- Target Version/s: 1.19.0 > Fix issues after ZK upgrade > --- > > Key: DRILL-7712 > URL: https://issues.apache.org/jira/browse/DRILL-7712 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.18.0 >Reporter: Arina Ielchiieva >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.18.0 > > > Warnings during jdbc-all build (absent when building with Mapr profile): > {noformat} > netty-transport-native-epoll-4.1.45.Final.jar, > netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 46 > overlapping classes: > - io.netty.channel.epoll.AbstractEpollStreamChannel$2 > - io.netty.channel.epoll.AbstractEpollServerChannel$EpollServerSocketUnsafe > - io.netty.channel.epoll.EpollDatagramChannel > - io.netty.channel.epoll.AbstractEpollStreamChannel$SpliceInChannelTask > - io.netty.channel.epoll.NativeDatagramPacketArray > - io.netty.channel.epoll.EpollSocketChannelConfig > - io.netty.channel.epoll.EpollTcpInfo > - io.netty.channel.epoll.EpollEventArray > - io.netty.channel.epoll.EpollEventLoop > - io.netty.channel.epoll.EpollSocketChannel > - 36 more... > netty-transport-native-unix-common-4.1.45.Final.jar, > netty-transport-native-epoll-4.0.48.Final-linux-x86_64.jar define 15 > overlapping classes: > - io.netty.channel.unix.Errors$NativeConnectException > - io.netty.channel.unix.ServerDomainSocketChannel > - io.netty.channel.unix.DomainSocketAddress > - io.netty.channel.unix.Socket > - io.netty.channel.unix.NativeInetAddress > - io.netty.channel.unix.DomainSocketChannelConfig > - io.netty.channel.unix.Errors$NativeIoException > - io.netty.channel.unix.DomainSocketReadMode > - io.netty.channel.unix.ErrorsStaticallyReferencedJniMethods > - io.netty.channel.unix.UnixChannel > - 5 more... > maven-shade-plugin has detected that some class files are > present in two or more JARs. 
When this happens, only one > single version of the class is copied to the uber jar. > Usually this is not harmful and you can skip these warnings, > otherwise try to manually exclude artifacts based on > mvn dependency:tree -Ddetail=true and the above output. > See http://maven.apache.org/plugins/maven-shade-plugin/ > {noformat} > An additional warning when building with the Mapr profile: > {noformat} > The following patterns were never triggered in this artifact inclusion filter: > o 'org.apache.zookeeper:zookeeper-jute' > {noformat} > NPEs in tests (though tests do not fail): > {noformat} > [INFO] Running org.apache.drill.exec.coord.zk.TestZookeeperClient > 4880 > java.lang.NullPointerException > 4881 > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269) > 4882 > at > org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251) > 4883 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583) > 4884 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546) > 4885 > at > org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java: > {noformat} > {noformat} > [INFO] Running org.apache.drill.exec.coord.zk.TestEphemeralStore > 5278 > java.lang.NullPointerException > 5279 > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269) > 5280 > at org.apache.zookeepe > {noformat} > {noformat} > [INFO] Running org.apache.drill.yarn.zk.TestAmRegistration > 6767 > java.lang.NullPointerException > 6768 > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269) > 6769 > at > org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251) > 6770 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583) > 6771 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546) > 6772 > at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:929) > 6773 > at org.apache.curator.t > {noformat} > {noformat} > org.apache.drill.yarn.client.TestCommandLineOptions > 6823 > java.lang.NullPointerException > 6824 > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:269) > 6825 > at > org.apache.zookeeper.server.ZKDatabase.fastForwardDataBase(ZKDatabase.java:251) > 6826 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:583) > 6827 > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:546) > 6828 > at org.apac > {noformat}
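One way to act on the shade-plugin advice quoted above is an artifactSet exclusion. This is a hypothetical fragment: the coordinates and classifier are inferred from the warning text and would need to be verified against the actual output of mvn dependency:tree for jdbc-all.

```xml
<!-- Hypothetical sketch: exclude the older, overlapping epoll artifact
     (netty-transport-native-epoll 4.0.48, classifier linux-x86_64) from the
     shaded jar, leaving the 4.1.45 classes in place. Verify the coordinates
     with: mvn dependency:tree -Ddetail=true -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <excludes>
        <exclude>io.netty:netty-transport-native-epoll:*:linux-x86_64</exclude>
      </excludes>
    </artifactSet>
  </configuration>
</plugin>
```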
[jira] [Updated] (DRILL-7531) Convert format plugins to EVF
[ https://issues.apache.org/jira/browse/DRILL-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7531: --- Target Version/s: 1.19.0 > Convert format plugins to EVF > - > > Key: DRILL-7531 > URL: https://issues.apache.org/jira/browse/DRILL-7531 > Project: Apache Drill > Issue Type: Improvement >Reporter: Arina Ielchiieva >Priority: Major > Fix For: 1.18.0 > > > This is an umbrella Jira to track the process of converting format plugins to > EVF.
[jira] [Updated] (DRILL-7621) Refactor ExecConstants and PlannerSettings constant classes
[ https://issues.apache.org/jira/browse/DRILL-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7621: --- Target Version/s: 1.19.0 > Refactor ExecConstants and PlannerSettings constant classes > --- > > Key: DRILL-7621 > URL: https://issues.apache.org/jira/browse/DRILL-7621 > Project: Apache Drill > Issue Type: Task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.18.0 > > > According to [the > discussion|http://mail-archives.apache.org/mod_mbox/drill-dev/202003.mbox/%3CBCB4CFC2-8BC5-43C6-8BD4-956F66F6D0D3%40gmail.com%3E], > it makes sense to split the classes into multiple constant interfaces and > get rid of validator constants. Then the validator instances won't be used > for getting option values, and the general approach will be to get a > type-specific option value by its string key from the config instance.
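The proposed approach, typed option lookup by string key rather than through validator constants, can be sketched as follows. `OptionReader` is an illustrative stand-in, not Drill's actual option/config classes; the option names in `main` are examples only.

```java
import java.util.Map;

// Illustrative sketch: type-specific option access keyed by string,
// modeled over a plain Map instead of Drill's option machinery.
public class OptionReader {
  private final Map<String, Object> options;

  public OptionReader(Map<String, Object> options) {
    this.options = options;
  }

  // One accessor per type, keyed by the option's string name.
  public int getInt(String key)         { return (Integer) options.get(key); }
  public boolean getBoolean(String key) { return (Boolean) options.get(key); }
  public String getString(String key)   { return (String) options.get(key); }

  public static void main(String[] args) {
    OptionReader reader = new OptionReader(Map.of(
        "planner.slice_target", 100000,
        "exec.enable_union_type", false));
    System.out.println(reader.getInt("planner.slice_target"));       // 100000
    System.out.println(reader.getBoolean("exec.enable_union_type")); // false
  }
}
```

Under this style, callers never touch a validator instance; the validators remain only for validating values when options are set.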
[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks
[ https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7270: --- Target Version/s: 1.19.0 > Fix non-https dependency urls and add checksum checks > - > > Key: DRILL-7270 > URL: https://issues.apache.org/jira/browse/DRILL-7270 > Project: Apache Drill > Issue Type: Task > Components: Security >Affects Versions: 1.16.0 >Reporter: Arina Ielchiieva >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Review any build scripts and configurations for insecure urls and make > appropriate fixes to use secure urls. > Projects like Lucene maintain checksum whitelists of all their build > dependencies, and you may wish to consider that as a > protection against threats beyond just MITM.
[jira] [Updated] (DRILL-7133) Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin
[ https://issues.apache.org/jira/browse/DRILL-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7133: --- Target Version/s: 1.19.0 > Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin > -- > > Key: DRILL-7133 > URL: https://issues.apache.org/jira/browse/DRILL-7133 > Project: Apache Drill > Issue Type: Improvement >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.18.0 > > > There was a JIRA (https://issues.apache.org/jira/browse/DRILL-7032) which > resulted in some improvements to the PCAP format plugin: it converted the > TCP flags to boolean format and added an {{is_corrupt}} boolean field. > This field allows users to look for corrupt packets. > Unfortunately, this functionality was not duplicated in the PCAP-NG format > plugin, so this JIRA proposes to add it there.
[jira] [Updated] (DRILL-7366) Improve Null Handling for UDFs with Complex Output
[ https://issues.apache.org/jira/browse/DRILL-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7366: --- Target Version/s: 1.19.0 > Improve Null Handling for UDFs with Complex Output > -- > > Key: DRILL-7366 > URL: https://issues.apache.org/jira/browse/DRILL-7366 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Charles Givre >Priority: Major > Fix For: 1.18.0 > > > If there is a UDF which has a complex field (Map or List) as output, Drill > does not allow the UDF to have nullable input, which creates additional > complexity when writing these kinds of UDFs. > I therefore would like to propose that two options be added to the > FunctionTemplate for null handling: {{EMPTY_LIST_IF_NULL}} and > {{EMPTY_MAP_IF_NULL}}, which would simplify UDF creation. I'm envisioning > that if either of these options were selected and the UDF receives any null > value as input, the UDF would return an empty list or map.
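A minimal sketch of the semantics the proposed options would give: a null input yields an empty list or map instead of null. The helper names below are hypothetical; the real change would live in Drill's FunctionTemplate null-handling machinery, not in standalone helpers.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative sketch: null-to-empty coercion matching the semantics of
// the proposed EMPTY_LIST_IF_NULL / EMPTY_MAP_IF_NULL options.
public class NullHandling {
  // EMPTY_LIST_IF_NULL: a null list input becomes an empty list.
  static <T> List<T> emptyListIfNull(List<T> in) {
    return in == null ? Collections.emptyList() : in;
  }

  // EMPTY_MAP_IF_NULL: a null map input becomes an empty map.
  static <K, V> Map<K, V> emptyMapIfNull(Map<K, V> in) {
    return in == null ? Collections.emptyMap() : in;
  }

  public static void main(String[] args) {
    System.out.println(emptyListIfNull(null));          // []
    System.out.println(emptyMapIfNull(Map.of("a", 1))); // {a=1}
  }
}
```

With such options, a UDF body never needs its own null checks on complex inputs: the framework would apply the coercion before (or instead of) invoking the function.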
[jira] [Updated] (DRILL-7325) Many operators do not set container record count
[ https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7325: --- Target Version/s: 1.19.0 > Many operators do not set container record count > > > Key: DRILL-7325 > URL: https://issues.apache.org/jira/browse/DRILL-7325 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.18.0 > > > See DRILL-7324. The following are problems found because some operators fail > to set the record count for their containers. > h4. Scan > TestComplexTypeReader, on cluster setup, using the PojoRecordReader: > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from ScanBatch > ScanBatch: Container record count not set > Reason: ScanBatch never sets the record count of its container (this is a > generic issue, not specific to the PojoRecordReader). > h4. Filter > {{TestComplexTypeReader.testNonExistentFieldConverting()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from FilterRecordBatch > FilterRecordBatch: Container record count not set > {noformat} > h4. Hash Join > {{TestComplexTypeReader.test_array()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from HashJoinBatch > HashJoinBatch: Container record count not set > {noformat} > Occurs on the first batch in which the hash join returns {{OK_NEW_SCHEMA}} > with no records. > h4. Project > {{TestCsvWithHeaders.testEmptyFile()}} (when the text reader returned empty, > schema-only batches): > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from ProjectRecordBatch > ProjectRecordBatch: Container record count not set > {noformat} > Occurs in {{ProjectRecordBatch.handleNullInput()}}: it sets up the schema but > does not set the value count to 0. > h4. 
Unordered Receiver > {{TestCsvWithSchema.testMultiFileSchema()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from UnorderedReceiverBatch > UnorderedReceiverBatch: Container record count not set > {noformat} > The problem is that {{RecordBatchLoader.load()}} does not set the container > record count. > h4. Streaming Aggregate > {{TestJsonReader.testSumWithTypeCase()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from StreamingAggBatch > StreamingAggBatch: Container record count not set > {noformat} > The problem is that {{StreamingAggBatch.buildSchema()}} does not set the > container record count to 0. > h4. Limit > {{TestJsonReader.testDrill_1419()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from LimitRecordBatch > LimitRecordBatch: Container record count not set > {noformat} > None of the paths in {{LimitRecordBatch.innerNext()}} set the container > record count. > h4. Union All > {{TestJsonReader.testKvgenWithUnionAll()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from UnionAllRecordBatch > UnionAllRecordBatch: Container record count not set > {noformat} > When {{UnionAllRecordBatch}} calls > {{VectorAccessibleUtilities.setValueCount()}}, it did not also set the > container count. > h4. Hash Aggregate > {{TestJsonReader.drill_4479()}}: > {noformat} > ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors > from HashAggBatch > HashAggBatch: Container record count not set > {noformat} > Problem is that {{HashAggBatch.buildSchema()}} does not set the container > record count to 0 for the first, empty, batch sent for {{OK_NEW_SCHEMA.}} > h4. 
And Many More > It turns out that most operators fail to set one of the many row count > variables somewhere in their code path: maybe in the schema setup path, maybe > when building a batch along one of the many paths that operators follow. > Further, we have multiple row counts that must be set: > * Values in each vector ({{setValueCount()}}), > * Row count in the container ({{setRecordCount()}}), which must be the same > as the vector value count. > * Row count in the operator (batch), which is the (possibly filtered) count > of records presented to downstream operators. It must be less than or equal > to the container row count (except for an SV4.) > * The SV2 record count, which is the number of entries in the SV2 and must be > the same as the batch row count (and less than or equal to the container row > count.) > * The SV2 actual batch record count, which must be the same as the container > row count. > * The SV4 record count, which must be the same as the batch record count. > With an SV4, the batch consists of
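The count invariants listed in this ticket can be modeled as a small value class with a consistency check, in the spirit of what BatchValidator verifies. The class below is an illustration only, not Drill's validator, and it ignores the SV4 case for brevity.

```java
// Illustrative sketch: the row-count invariants from DRILL-7325 as a
// plain value class with a check method.
public class BatchCounts {
  final int vectorValueCount;     // setValueCount() on each vector
  final int containerRecordCount; // setRecordCount() on the container
  final int batchRowCount;        // rows presented to downstream operators
  final int sv2Count;             // entries in the SV2, or -1 if no SV2

  BatchCounts(int vector, int container, int batch, int sv2) {
    vectorValueCount = vector;
    containerRecordCount = container;
    batchRowCount = batch;
    sv2Count = sv2;
  }

  // True when the counts are mutually consistent per the ticket's rules.
  boolean valid() {
    // Vector value counts must equal the container record count.
    if (vectorValueCount != containerRecordCount) return false;
    // The (possibly filtered) batch row count cannot exceed the container's.
    if (batchRowCount > containerRecordCount) return false;
    // If an SV2 is present, its count must equal the batch row count.
    if (sv2Count >= 0 && sv2Count != batchRowCount) return false;
    return true;
  }

  public static void main(String[] args) {
    // A filter kept 40 of 100 rows via an SV2: consistent.
    System.out.println(new BatchCounts(100, 100, 40, 40).valid()); // true
    // Container record count never set: the bug this ticket describes.
    System.out.println(new BatchCounts(100, 0, 100, -1).valid());  // false
  }
}
```

The second example is exactly the "Container record count not set" state that BatchValidator reports for the operators listed above.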
[jira] [Updated] (DRILL-7525) Convert SequenceFiles to EVF
[ https://issues.apache.org/jira/browse/DRILL-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7525: --- Target Version/s: 1.19.0 > Convert SequenceFiles to EVF > > > Key: DRILL-7525 > URL: https://issues.apache.org/jira/browse/DRILL-7525 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.17.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.18.0 > > > Convert SequenceFiles to EVF
[jira] [Updated] (DRILL-7557) Revise "Base" storage plugin filter-push down listener with a builder
[ https://issues.apache.org/jira/browse/DRILL-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-7557: --- Target Version/s: 1.19.0 > Revise "Base" storage plugin filter-push down listener with a builder > -- > > Key: DRILL-7557 > URL: https://issues.apache.org/jira/browse/DRILL-7557 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.18.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.18.0 > > > DRILL-7458 introduces a base framework for storage plugins and includes a > simplified mechanism for filter push down. Part of that mechanism includes a > "listener", with the bulk of the work done in a single method: > {code:java} > Pair> transform(GroupScan groupScan, > List> andTerms, Pair DisjunctionFilterSpec> orTerm); > {code} > Reviewers correctly pointed out that this method might be a bit too complex. > The listener pattern pretty much forced the present design. To improve it, > we'd want to use a different design; maybe some kind of builder which might: > * Accept the CNF and DNF terms via dedicated methods. > * Perform a processing step. > * Provide a number of methods to communicate the results, such as 1) whether > a new group scan is needed, 2) any CNF terms to retain, and 3) any DNF terms > to retain.
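The bullets in this ticket suggest a builder along the following lines. This is a hypothetical sketch with placeholder String terms and a stand-in push-down rule; the real design would use the DRILL-7458 filter types (GroupScan, DisjunctionFilterSpec, etc.), not strings.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the builder shape proposed in DRILL-7557, replacing
// the single listener transform() method with dedicated accept / process /
// result-query steps.
public class FilterPushDownBuilder {
  private final List<String> cnfTerms = new ArrayList<>();
  private final List<String> dnfTerms = new ArrayList<>();
  private final List<String> pushed = new ArrayList<>();
  private final List<String> retained = new ArrayList<>();
  private boolean processed;

  // 1) Accept the CNF and DNF terms via dedicated methods.
  public FilterPushDownBuilder addCnfTerm(String term) { cnfTerms.add(term); return this; }
  public FilterPushDownBuilder addDnfTerm(String term) { dnfTerms.add(term); return this; }

  // 2) Perform a processing step. The rule here is a stand-in: "push" any
  //    term of the form column=constant, retain everything else.
  public FilterPushDownBuilder process() {
    for (String term : cnfTerms) {
      (term.matches("\\w+=\\w+") ? pushed : retained).add(term);
    }
    retained.addAll(dnfTerms); // this stand-in plugin ignores DNF terms
    processed = true;
    return this;
  }

  // 3) Communicate the results through separate methods.
  public boolean needsNewGroupScan() { return processed && !pushed.isEmpty(); }
  public List<String> retainedTerms() { return retained; }

  public static void main(String[] args) {
    FilterPushDownBuilder b = new FilterPushDownBuilder()
        .addCnfTerm("a=2").addCnfTerm("b>3").process();
    System.out.println(b.needsNewGroupScan()); // true
    System.out.println(b.retainedTerms());     // [b>3]
  }
}
```

Splitting the work this way lets each plugin override only the processing step, while the framework owns term collection and result plumbing, which is the complexity reviewers objected to in the single transform() method.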