[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247062#comment-16247062 ] ASF GitHub Bot commented on DRILL-3640: --- Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1024#discussion_r150160362 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java --- @@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException, throw new AlreadyClosedSqlException( "ResultSet is already closed." ); } } + +//Implicit check for whether timeout is set +if (elapsedTimer != null) { --- End diff -- Ok, so I think I see how you've been trying to help me test the server-side timeout. You are hoping to have a unit test force `awaitFirstMessage()` to throw the exception by preventing the server from sending back any batch of data, since the sample test data doesn't allow for any query to run sufficiently long. All the current tests I've added essentially have already delivered the data from the 'Drill Server' to the 'DrillClient', but the application downstream has not consumed it. Your suggestion of putting a `pause` before the `execute()` call got me thinking that the timer had already begun after Statement initialization. My understanding now is that you're simply asking to block any SCREEN operator from sending back any batches. So, the DrillCursor should time out waiting for the first batch. In fact, I'm thinking that I don't even need a pause. The DrillCursor waits all along for something from the SCREEN operator that never comes and eventually times out. However, since the control injection essentially applies to the Connection (`alter session ...`), any other unit tests executing in parallel on the same connection would be affected by this. So, I would also need to undo this at the end of the test if the connection is reused, or fork off a connection exclusively for this. 
Was that what you've been suggesting all along? > Drill JDBC driver support Statement.setQueryTimeout(int) > > > Key: DRILL-3640 > URL: https://issues.apache.org/jira/browse/DRILL-3640 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC >Affects Versions: 1.2.0 >Reporter: Chun Chang >Assignee: Kunal Khatua > Fix For: 1.12.0 > > > It would be nice if we have this implemented. Run away queries can be > automatically canceled by setting the timeout. > java.sql.SQLFeatureNotSupportedException: Setting network timeout is not > supported. > at > org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247043#comment-16247043 ] ASF GitHub Bot commented on DRILL-3640: --- Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1024#discussion_r150157923 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java --- @@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException, throw new AlreadyClosedSqlException( "ResultSet is already closed." ); } } + +//Implicit check for whether timeout is set +if (elapsedTimer != null) { --- End diff -- I don't think you are wrong, but I think the interpretation of the timeout is ambiguous. My understanding, based on what drivers like Oracle do, is to start the timeout only when the execute call is made. So, for a regular Statement object, just initialization (or even setting the timeout) should not be the basis of starting the timer. With regards to whether we are testing for the time when only the DrillCursor is in operation, we'd need a query that runs long enough to time out before the server can send back anything for the very first time. The `awaitFirstMessage()` already has the timeout applied there, and it worked in some of my longer-running sample queries. If you're hinting towards this, then yes.. it certainly doesn't hurt to have the test, although the timeout already guarantees exactly that. I'm not familiar with the Drillbit Injection feature, so let me tinker a bit to confirm it before I update the PR. > Drill JDBC driver support Statement.setQueryTimeout(int) > > > Key: DRILL-3640 > URL: https://issues.apache.org/jira/browse/DRILL-3640 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC >Affects Versions: 1.2.0 >Reporter: Chun Chang >Assignee: Kunal Khatua > Fix For: 1.12.0 > > > It would be nice if we have this implemented. 
Run away queries can be > automatically canceled by setting the timeout. > java.sql.SQLFeatureNotSupportedException: Setting network timeout is not > supported. > at > org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
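The timeout semantics discussed in this thread — arm the timer only when `execute()` is called, then have each fetch check it, with an un-armed timer meaning no timeout was set (mirroring the `elapsedTimer != null` check in the diff) — can be sketched as follows. The class and method names here are illustrative, not Drill's actual JDBC internals:

```java
import java.sql.SQLTimeoutException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the timeout bookkeeping under discussion.
class QueryTimeoutTracker {
    private long deadlineNanos = -1;  // -1 means no timeout configured

    /** Called from execute(): arm the timer only if a timeout was set. */
    void start(int timeoutSeconds) {
        if (timeoutSeconds > 0) {
            deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        }
    }

    /** Called before each fetch, analogous to the null check on elapsedTimer. */
    void throwIfTimedOut() throws SQLTimeoutException {
        if (deadlineNanos >= 0 && System.nanoTime() > deadlineNanos) {
            throw new SQLTimeoutException("Query timed out");
        }
    }

    boolean isArmed() {
        return deadlineNanos >= 0;
    }
}
```

Note that, per the Oracle-style interpretation above, constructing the tracker (or even configuring the timeout on the Statement) does not start the clock; only `start()` at execute time does.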
[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase
[ https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246040#comment-16246040 ] ASF GitHub Bot commented on DRILL-5691: --- Github user weijietong commented on the issue: https://github.com/apache/drill/pull/889 @amansinha100 thanks for sharing the information. Got your point. I think your proposal on [CALCITE-1048](https://issues.apache.org/jira/browse/CALCITE-1048) is possible. Since [CALCITE-794](https://issues.apache.org/jira/browse/CALCITE-794) was completed in version 1.6, it seems there's a better solution (to get the least max row count of all the rels of the RelSubset). But since Drill's Calcite version is still based on 1.4, I support your current temporary solution. I only wonder whether the explicitly searched RelNode's (such as DrillAggregateRel) maxRowCount can represent the best RelNode's maxRowCount? > multiple count distinct query planning error at physical phase > --- > > Key: DRILL-5691 > URL: https://issues.apache.org/jira/browse/DRILL-5691 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.9.0, 1.10.0 >Reporter: weijie.tong > > I materialized the count distinct query result in a cache, added a plugin > rule to translate the (Aggregate、Aggregate、Project、Scan) or > (Aggregate、Aggregate、Scan) to (Project、Scan) at the PARTITION_PRUNING phase. > Then, once users issue count distinct queries, they will be translated to query > the cache to get the result. > eg1: " select count(*),sum(a) ,count(distinct b) from t where dt=xx " > eg2:"select count(*),sum(a) ,count(distinct b) ,count(distinct c) from t > where dt=xxx " > eg3:"select count(distinct b), count(distinct c) from t where dt=xxx" > eg1 will be right and have a query result as I expected, but eg2 will be > wrong at the physical phase. The error info is here: > https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. > eg3 will also get a similar error. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"
[ https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246967#comment-16246967 ] ASF GitHub Bot commented on DRILL-5923: --- Github user prasadns14 commented on the issue: https://github.com/apache/drill/pull/1021 @arina-ielchiieva, @paul-rogers Reverted to the array approach, also added documentation. > State of a successfully completed query shown as "COMPLETED" > > > Key: DRILL-5923 > URL: https://issues.apache.org/jira/browse/DRILL-5923 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: 1.12.0 > > > Drill UI currently lists a successfully completed query as "COMPLETED". > Successfully completed, failed and canceled queries are all grouped as > Completed queries. > It would be better to list the state of a successfully completed query as > "Succeeded" to avoid confusion. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5917) Ban org.json:json library in Drill
[ https://issues.apache.org/jira/browse/DRILL-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vlad Rozov updated DRILL-5917: -- Reviewer: Arina Ielchiieva > Ban org.json:json library in Drill > -- > > Key: DRILL-5917 > URL: https://issues.apache.org/jira/browse/DRILL-5917 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Vlad Rozov > Fix For: 1.12.0 > > > Apache Drill has dependencies on json.org lib indirectly from two libraries: > com.mapr.hadoop:maprfs:jar:5.2.1-mapr > com.mapr.fs:mapr-hbase:jar:5.2.1-mapr > {noformat} > [INFO] org.apache.drill.contrib:drill-format-mapr:jar:1.12.0-SNAPSHOT > [INFO] +- com.mapr.hadoop:maprfs:jar:5.2.1-mapr:compile > [INFO] | \- org.json:json:jar:20080701:compile > [INFO] \- com.mapr.fs:mapr-hbase:jar:5.2.1-mapr:compile > [INFO]\- (org.json:json:jar:20080701:compile - omitted for duplicate) > {noformat} > Need to make sure we won't have any dependencies from these libs to json.org > lib and ban this lib in main pom.xml file. > Issue is critical since Apache release won't happen until we make sure > json.org lib is not used (https://www.apache.org/legal/resolved.html). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
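One common way to implement such a ban (a sketch of the general technique, not necessarily the exact change made for this JIRA) is the `bannedDependencies` rule of the maven-enforcer-plugin in the main pom.xml, which fails the build if `org.json:json` appears anywhere in the resolved dependency tree:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>ban-json-org</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <bannedDependencies>
            <excludes>
              <!-- Category X licensed; see https://www.apache.org/legal/resolved.html -->
              <exclude>org.json:json</exclude>
            </excludes>
          </bannedDependencies>
        </rules>
        <fail>true</fail>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The transitive `org.json:json` pulled in by the two MapR artifacts would then need matching `<exclusions>` entries on those dependencies for the build to pass.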
[jira] [Commented] (DRILL-5926) TestValueVector tests fail sporadically
[ https://issues.apache.org/jira/browse/DRILL-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246936#comment-16246936 ] Vlad Rozov commented on DRILL-5926: --- It may be OK to increase MaxDirectMemorySize from 3GB to 4GB as a short-term workaround to avoid unit test failures for unrelated PRs. In the long term, it is necessary to investigate whether memory can be reclaimed from the Pooled Allocator and whether the tests indeed require more than 3 GB of memory. [~timothyfarkas] Can you create a separate PR for the workaround? > TestValueVector tests fail sporadically > --- > > Key: DRILL-5926 > URL: https://issues.apache.org/jira/browse/DRILL-5926 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Trivial > > As reported by [~Paul.Rogers]. The following tests fail sporadically with out > of memory exception: > * TestValueVector.testFixedVectorReallocation > * TestValueVector.testVariableVectorReallocation -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5943) Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism
[ https://issues.apache.org/jira/browse/DRILL-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-5943: - Labels: ready-to-commit (was: ) Reviewer: Laurent Goujon (was: Parth Chandra) > Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism > --- > > Key: DRILL-5943 > URL: https://issues.apache.org/jira/browse/DRILL-5943 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Sorabh Hamirwasia >Assignee: Sorabh Hamirwasia > Labels: ready-to-commit > Fix For: 1.12.0 > > > For the PLAIN mechanism we will weaken the strong check introduced with > DRILL-5582 to keep forward compatibility between a Drill 1.12 client and a > Drill 1.9 server. This is fine since, with or without this strong check, the PLAIN > mechanism is still vulnerable to MITM during the handshake itself, unlike mutual > authentication protocols like Kerberos. > Also, for keeping forward compatibility with respect to SASL, we will treat > UNKNOWN_SASL_SUPPORT as a valid value. For a handshake message received from a > client running a later version (say 1.13) than the Drillbit (1.12) and > carrying a new value for the SaslSupport field which is unknown to the server, this > field will be decoded as UNKNOWN_SASL_SUPPORT. In this scenario the client will > be treated as one aware of the SASL protocol, but the server doesn't know the exact > capabilities of the client. Hence the SASL handshake will still be required from the > server side. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
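The forward-compatibility rule described above can be sketched as follows. This is a hedged illustration, not Drill's actual handshake code, and the wire-value mapping here is assumed for the example; the point is that an unrecognized value folds into UNKNOWN_SASL_SUPPORT and is still treated as a SASL-aware client:

```java
// Illustrative enum mirroring the values named in the description.
enum SaslSupport { UNKNOWN_SASL_SUPPORT, SASL_AUTH, SASL_PRIVACY }

class HandshakeDecoder {
    /** Fold any unrecognized wire value into UNKNOWN_SASL_SUPPORT
        (wire values 1 and 2 are assumed here for illustration). */
    static SaslSupport decode(int wireValue) {
        switch (wireValue) {
            case 1:  return SaslSupport.SASL_AUTH;
            case 2:  return SaslSupport.SASL_PRIVACY;
            default: return SaslSupport.UNKNOWN_SASL_SUPPORT; // newer client, unknown capability
        }
    }

    /** Per the description, UNKNOWN is valid: the server-driven SASL
        handshake still runs because the client is considered SASL-aware. */
    static boolean requiresSaslHandshake(SaslSupport support) {
        return support == SaslSupport.UNKNOWN_SASL_SUPPORT
            || support == SaslSupport.SASL_AUTH
            || support == SaslSupport.SASL_PRIVACY;
    }
}
```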
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246879#comment-16246879 ] ASF GitHub Bot commented on DRILL-5771: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1014 Not sure the description here is entirely correct. Let's separate two concepts: the plugin (code) and the plugin definition (the stuff in JSON.) Plugin definitions are stored in ZK and retrieved by the Foreman. There may be some form of race condition in the Foreman, but that's not my focus here. The plugin *definition* is read by the Foreman and serialized into the physical plan. Each worker reads the definition from the physical plan. For this reason, the worker's definition can never be out of date: it is the definition used when serializing the plan. Further, Drill allows table functions which provide query-time name/value pair settings for format plugin properties. The only way these can work is to be serialized along with the query. So, the actual serialized format plugin definition, included with the query, includes both the ZK information and the table function information. > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when a query has several major fragments. > One way to check serde is to generate a physical plan (generated as json) and > then submit it back to Drill. > One example of the found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. 
Could not obtain format plugin during deserialization > A format plugin is created based on a format plugin configuration or its name. > On Drill start-up we load information about available plugins (it's reloaded > each time a storage plugin is updated, which can be done only by an admin). > When a query is parsed, we try to get the plugin from the available ones; if we > cannot find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but at other query execution stages we always assume that the [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create a format plugin on one node > based on a format configuration. > Then we sent a major fragment to a different node where, using this > format configuration, we could not get the format plugin based on it and > deserialization failed. > To fix this problem we need to create the format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in a hash map where the key is the format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find the format plugin based on its configuration. > 3. Named format plugin usage > Named format plugin configs allow getting a format plugin by its name for > configuration shared among all drillbits. > They are used as aliases for pre-configured format plugins. Users with admin > privileges can modify them at runtime. > Named format plugin configs are used instead of sending all non-default > parameters of the format plugin config; in this case only the name is sent. > Their usage in a distributed system may cause race conditions. > For example, > 1. Query is submitted. 
> 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Serialized named format plugin config with name parquet. > 4. Major fragment is sent to a different node. > 5. Admin has changed the parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into a parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5936) Refactor MergingRecordBatch based on code review
[ https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246876#comment-16246876 ] Vlad Rozov commented on DRILL-5936: --- [~amansinha100] It is based on my code walkthrough completed as part of exchange operator analysis. The PR is mostly self-explanatory and the goal is to address two deficiencies mentioned in the JIRA description. > Refactor MergingRecordBatch based on code review > > > Key: DRILL-5936 > URL: https://issues.apache.org/jira/browse/DRILL-5936 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > > * Reorganize code to remove unnecessary {{pqueue.peek()}} > * Reuse Node -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5936) Refactor MergingRecordBatch based on code review
[ https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246872#comment-16246872 ] Aman Sinha commented on DRILL-5936: --- [~vrozov] the title says 'based on code review'... Which code review are you referring to? Can you point me to the other JIRA or PR? Thanks. > Refactor MergingRecordBatch based on code review > > > Key: DRILL-5936 > URL: https://issues.apache.org/jira/browse/DRILL-5936 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > > * Reorganize code to remove unnecessary {{pqueue.peek()}} > * Reuse Node -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DRILL-5950) Allow JSON files to be splittable - for sequence of objects format
Paul Rogers created DRILL-5950: -- Summary: Allow JSON files to be splittable - for sequence of objects format Key: DRILL-5950 URL: https://issues.apache.org/jira/browse/DRILL-5950 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.12.0 Reporter: Paul Rogers The JSON format plugin is not currently splittable. This means that every JSON file must be read by only a single thread. By contrast, text files are splittable. The key barrier to allowing JSON files to be splittable is the lack of a good mechanism to find the start of a record at some arbitrary point in the file. Text readers handle this by scanning forward looking for (say) the newline that separates records. (Though this process can be thrown off if a newline appears in a quoted value, and the start quote appears before the split point.) However, as was discovered in a previous JSON fix, Drill's form of JSON does provide the tools. In standard JSON, a list of records must be structured as an array: {code} [ { text: "first record"}, { text: "second record"}, ... { text: "final record" } ] {code} In this form, it is impossible to find the start of a record without parsing from the first character onwards. But, Drill uses a common, but non-standard, JSON structure that dispenses with the array and the commas between records: {code} { text: "first record" } { text: "second record" } ... { text: "last record" } {code} This form does unambiguously allow finding the start of the record. Simply scan until we find the tokens "}" "{", possibly separated by whitespace. That sequence is not valid JSON and only occurs between records in the sequence-of-records format. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
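The boundary scan described above can be sketched as follows. This is an illustrative toy, not Drill's reader, and it deliberately ignores the corner case of a literal "} {" inside a quoted string value, which a real implementation would have to handle (the same caveat the text notes for newlines in quoted text-file values):

```java
// Toy scanner: from an arbitrary split offset, find the next '}' followed
// (across optional whitespace) by '{', and return the offset of that '{'
// as the start of the next record in the sequence-of-objects format.
class JsonSplitScanner {
    /** Returns the index of the first record start at or after 'from', or -1. */
    static int findRecordStart(String buf, int from) {
        for (int i = from; i < buf.length(); i++) {
            if (buf.charAt(i) == '}') {
                int j = i + 1;
                while (j < buf.length() && Character.isWhitespace(buf.charAt(j))) {
                    j++;  // skip whitespace between records
                }
                if (j < buf.length() && buf.charAt(j) == '{') {
                    return j;  // '{' opens the next record
                }
            }
        }
        return -1;  // no further record boundary in this buffer
    }
}
```

A reader assigned the split starting at some byte offset would call this once to find its first full record, then parse records normally from there.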
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246871#comment-16246871 ] ASF GitHub Bot commented on DRILL-5771: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1014 @arina-ielchiieva - The parts addressing DRILL-4640 and DRILL-5166 LGTM - I think the fix for DRILL-5771 LGTM but I would like to write down what I think is happening and confirm with you that my understanding is correct. This is mostly just a learning exercise for me since I am not very familiar with this part of the code :). In DRILL-5771 there were two issues. ## Race Conditions With Format Plugins ### Issue The following used to happen before the fix: 1. When using an existing format plugin, the **FormatPlugin** would create a **DrillTable** with a **NamedFormatPluginConfig** which only contains the name of the format plugin to use. 1. The **ScanOperator** created for a **DrillTable** will contain the **NamedFormatPluginConfig** 1. When the **ScanOperators** are serialized into the physical plan the serialized **ScanOperator** will only contain the name of the format plugin to use. 1. When a worker deserializes the physical plan to do a scan, he gets the name of the **FormatPluginConfig** to use. 1. The worker then looks up the correct **FormatPlugin** in the **FormatCreator** using the name he has. 1. The worker can get into trouble if the **FormatPlugins** he has cached in his **FormatCreator** are out of sync with the rest of the cluster. ### Fix Race conditions are eliminated because the **DrillTables** returned by the **FormatPlugins** no longer contain a **NamedFormatPluginConfig**; they contain the full **FormatPluginConfig**, not just a name alias. So when a query is executed: 1. The ScanOperator contains the complete **FormatPluginConfig** 1. When the physical plan is serialized it contains the complete **FormatPluginConfig** for each scan operator. 1. 
When a worker node deserializes the ScanOperator it also has the complete **FormatPluginConfig** so it can reconstruct the **FormatPlugin** correctly, whereas previously the worker would have to do a lookup using the **FormatPlugin** name in the **FormatCreator**, whose cache might be out of sync with the rest of the cluster. ## FormatPluginConfig Equals and HashCode ### Issue The **FileSystemPlugin** looks up the **FormatPlugins** corresponding to a **FormatPluginConfig** in formatPluginsByConfig. However, the **FormatPluginConfig** implementations didn't override equals and hashCode. > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when a query has several major fragments. > One way to check serde is to generate a physical plan (generated as json) and > then submit it back to Drill. > One example of the found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > A format plugin is created based on a format plugin configuration or its name. > On Drill start-up we load information about available plugins (it's reloaded > each time a storage plugin is updated, which can be done only by an admin). 
> When a query is parsed, we try to get the plugin from the available ones; if we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but at other query execution stages we always assume that the [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create a format plugin on one node > based on a format configuration. > Then we sent a major fragment to a different node where, using this > format configuration, we could not get the format plugin based on it and > deserialization failed. > To fix this problem we need to create the format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in a hash map where the key
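The equals/hashCode issue called out above is easy to see with a toy config class (the class and its fields are invented for illustration). Without both overrides, two configs with identical settings are distinct hash-map keys and the plugin lookup fails:

```java
import java.util.Objects;

// Toy stand-in for a format plugin config used as a hash-map key.
class ToyFormatConfig {
    final String extension;
    final boolean skipInvalidRecords;

    ToyFormatConfig(String extension, boolean skipInvalidRecords) {
        this.extension = extension;
        this.skipInvalidRecords = skipInvalidRecords;
    }

    // Value-based equality: two configs with the same settings are the same key.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ToyFormatConfig)) return false;
        ToyFormatConfig that = (ToyFormatConfig) o;
        return skipInvalidRecords == that.skipInvalidRecords
            && Objects.equals(extension, that.extension);
    }

    // Must be consistent with equals() for HashMap lookups to succeed.
    @Override
    public int hashCode() {
        return Objects.hash(extension, skipInvalidRecords);
    }
}
```

With the default identity-based `equals`/`hashCode` inherited from `Object`, a freshly deserialized config would never match the cached key, which is exactly the lookup failure described in the issue.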
[jira] [Updated] (DRILL-5936) Refactor MergingRecordBatch based on code review
[ https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vlad Rozov updated DRILL-5936: -- Description: * Reorganize code to remove unnecessary {{pqueue.peek()}} * Reuse Node was:* > Refactor MergingRecordBatch based on code review > > > Key: DRILL-5936 > URL: https://issues.apache.org/jira/browse/DRILL-5936 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > > * Reorganize code to remove unnecessary {{pqueue.peek()}} > * Reuse Node -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5936) Refactor MergingRecordBatch based on code review
[ https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vlad Rozov updated DRILL-5936: -- Description: * > Refactor MergingRecordBatch based on code review > > > Key: DRILL-5936 > URL: https://issues.apache.org/jira/browse/DRILL-5936 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > > * -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5949) JSON format options should be part of plugin config; not session options
[ https://issues.apache.org/jira/browse/DRILL-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246851#comment-16246851 ] Paul Rogers commented on DRILL-5949: Although technically it is quite easy to make this change, backward compatibility is a challenge because Drill has no mechanism to assist. Here are two possibilities. First, assign a priority to the settings as follows: * Table function options (highest priority) * Session options * Plugin options * System options (lowest priority) That is, if session options are set, use them. Else, use the plugin options, if set. Else use the system options (and the system option defaults). This is possible because options now identify the scope in which they are set, so we can differentiate session from system options. The problem here is that the reader can't actually tell if a setting comes from a table function or from the plugin definition, so some work may be required to support this pattern. Second, modify the system/session options to have three values: {{true}}/{{false}}/{{unset}}. If the value is set to {{unset}}, use the plugin options. The default option value becomes {{unset}}. If the user changes the session (or system) option, this is used. So, if a user has changed the system option, and stored the value in ZK, then that setting will be {{true}} or {{false}} and will take precedence over the plugin options. > JSON format options should be part of plugin config; not session options > > > Key: DRILL-5949 > URL: https://issues.apache.org/jira/browse/DRILL-5949 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Paul Rogers > > Drill provides a JSON record reader. Drill provides two ways to configure > this reader: > * Using the JSON plugin configuration. > * Using a set of session options. > The plugin configuration defines the file suffix associated with JSON files. 
> The session options are: > * {{store.json.all_text_mode}} > * {{store.json.read_numbers_as_double}} > * {{store.json.reader.skip_invalid_records}} > * {{store.json.reader.print_skipped_invalid_record_number}} > Suppose I have two JSON files from different sources (and keep them in > distinct directories.) For the one, I want {{all_text_mode}} off as > the data is nicely formatted. Also, my numbers are fine, so I want > {{read_numbers_as_double}} off. > But, the other file is a mess and uses a rather ad-hoc format. So, I want > these two options turned on. > As it turns out, I often query both files. Today, I must set the session > options one way to query my "clean" file, then reverse them to query the > "dirty" file. > Next, I want to join the two files. How do I set the options one way for the > "clean" file, and the other way for the "dirty" file within the *same query*? > Can't. > Now, consider the text format plugin that can read CSV, TSV, PSV and so on. > It has a variety of options. But they are *not* session options; they are > instead options in the plugin definition. This allows me to, say, have a > plugin config for the CSV-with-headers files that I get from source A, and a > different plugin config for my CSV-without-headers files from source B. > Suppose we applied the text reader technique to the JSON reader. We'd move > the session options listed above into the JSON format plugin. Then, I can > define one plugin config for my "clean" files, and a different plugin config for my > "dirty" files. > What's more, I can then use table functions to adjust the format for each > file as needed within a single query. Since table functions are part of a > query, I can add them to a view that I define for the various JSON files. > The result is a far simpler user experience than the tedium of resetting > session options for every query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
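The first proposal above — scope priority from table function down to system — could be sketched as a simple resolver. The class and scope names here follow the comment but are hypothetical, not Drill's option machinery:

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch: resolve a boolean reader option by checking scopes in priority order.
class OptionResolver {
    // Declaration order is priority order: highest first.
    enum Scope { TABLE_FUNCTION, SESSION, PLUGIN, SYSTEM }

    private final Map<Scope, Boolean> values = new EnumMap<>(Scope.class);

    void set(Scope scope, boolean value) {
        values.put(scope, value);
    }

    /** First scope with an explicit value wins; the system default is the fallback. */
    boolean resolve(boolean systemDefault) {
        for (Scope s : Scope.values()) {   // iterates highest to lowest priority
            Boolean v = values.get(s);
            if (v != null) {
                return v;
            }
        }
        return systemDefault;
    }
}
```

The second proposal (a {{true}}/{{false}}/{{unset}} tri-state) is the same idea expressed per-option: {{unset}} simply means "no explicit value at this scope", i.e. the `null` entry in the map above.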
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246849#comment-16246849 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/1024#discussion_r150131105 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java --- @@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException, throw new AlreadyClosedSqlException( "ResultSet is already closed." ); } } + +//Implicit check for whether timeout is set +if (elapsedTimer != null) { --- End diff -- Yes, I'm wrong? (asking because the rest of the sentence suggests I was right in my interpretation of the test). Maybe we can/should test both? I would have liked to test for the first batch, but it's not possible to access the query id until `statement.execute()`, and I'd need it to unpause the request. > Drill JDBC driver support Statement.setQueryTimeout(int) > > > Key: DRILL-3640 > URL: https://issues.apache.org/jira/browse/DRILL-3640 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC >Affects Versions: 1.2.0 >Reporter: Chun Chang >Assignee: Kunal Khatua > Fix For: 1.12.0 > > > It would be nice if we have this implemented. Run away queries can be > automatically canceled by setting the timeout. > java.sql.SQLFeatureNotSupportedException: Setting network timeout is not > supported. > at > org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5936) Refactor MergingRecordBatch based on code review
[ https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246828#comment-16246828 ] ASF GitHub Bot commented on DRILL-5936: --- Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1025 @amansinha100 can you review this change? > Refactor MergingRecordBatch based on code review > > > Key: DRILL-5936 > URL: https://issues.apache.org/jira/browse/DRILL-5936 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246827#comment-16246827 ] ASF GitHub Bot commented on DRILL-3640: --- Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1024#discussion_r150127658 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java --- @@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException, throw new AlreadyClosedSqlException( "ResultSet is already closed." ); } } + +//Implicit check for whether timeout is set +if (elapsedTimer != null) { --- End diff -- Yes. So I'm testing for the part when the batch has been fetched by the DrillCursor but not consumed via the DrillResultSetImpl. That's why I found the need for pausing the Screen operator odd and, hence, the question. > Drill JDBC driver support Statement.setQueryTimeout(int) > > > Key: DRILL-3640 > URL: https://issues.apache.org/jira/browse/DRILL-3640 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC >Affects Versions: 1.2.0 >Reporter: Chun Chang >Assignee: Kunal Khatua > Fix For: 1.12.0 > > > It would be nice if we have this implemented. Run away queries can be > automatically canceled by setting the timeout. > java.sql.SQLFeatureNotSupportedException: Setting network timeout is not > supported. > at > org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
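The "implicit check" pattern under discussion (enforce the timeout only when a timer exists) can be sketched standalone. This is an illustrative simplification, not the actual DrillResultSetImpl code; the class and method names are hypothetical.

```java
import java.sql.SQLTimeoutException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the implicit-timeout check: the timer is only
// started when a timeout is set, so a null timer means no enforcement.
public class TimeoutSketch {
  private Long startNanos;      // null => no timeout configured
  private long timeoutMillis;

  public void setQueryTimeout(int seconds) {
    this.timeoutMillis = TimeUnit.SECONDS.toMillis(seconds);
    this.startNanos = System.nanoTime();  // timer begins when timeout is set
  }

  // Mirrors the throwIfClosed()-style guard: enforced only if a timer exists.
  public void throwIfTimedOut() throws SQLTimeoutException {
    if (startNanos != null) {
      long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
      if (elapsed > timeoutMillis) {
        throw new SQLTimeoutException("Query timed out after " + timeoutMillis + " ms");
      }
    }
  }

  public static void main(String[] args) throws Exception {
    TimeoutSketch s = new TimeoutSketch();
    s.throwIfTimedOut();          // no timeout set: never throws
    s.setQueryTimeout(1);
    Thread.sleep(1100);           // exceed the 1-second timeout
    boolean threw = false;
    try {
      s.throwIfTimedOut();
    } catch (SQLTimeoutException e) {
      threw = true;
    }
    System.out.println(threw);    // prints "true"
  }
}
```

Note the design point debated above: because the timer starts at `setQueryTimeout()` rather than at `execute()`, elapsed time before query submission also counts against the timeout.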
[jira] [Comment Edited] (DRILL-5926) TestValueVector tests fail sporadically
[ https://issues.apache.org/jira/browse/DRILL-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238479#comment-16238479 ] Pritesh Maker edited comment on DRILL-5926 at 11/10/17 12:33 AM: - Is this the PR? https://github.com/apache/drill/pull/1023 Should we create a separate PR for this issue? cc [~vrozov] was (Author: priteshm): Is this the PR? https://github.com/apache/drill/pull/1023 Should we create a separate PR for this issue? > TestValueVector tests fail sporadically > --- > > Key: DRILL-5926 > URL: https://issues.apache.org/jira/browse/DRILL-5926 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Trivial > > As reported by [~Paul.Rogers]. The following tests fail sporadically with out > of memory exception: > * TestValueVector.testFixedVectorReallocation > * TestValueVector.testVariableVectorReallocation -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246775#comment-16246775 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/1024#discussion_r150119338 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java --- @@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException, throw new AlreadyClosedSqlException( "ResultSet is already closed." ); } } + +//Implicit check for whether timeout is set +if (elapsedTimer != null) { --- End diff -- I wonder if we actually test timeout during DrillCursor operations. It seems your test relies on the user being slow to read data from the result set although the data has already been fetched by the client. Am I wrong? > Drill JDBC driver support Statement.setQueryTimeout(int) > > > Key: DRILL-3640 > URL: https://issues.apache.org/jira/browse/DRILL-3640 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC >Affects Versions: 1.2.0 >Reporter: Chun Chang >Assignee: Kunal Khatua > Fix For: 1.12.0 > > > It would be nice if we have this implemented. Run away queries can be > automatically canceled by setting the timeout. > java.sql.SQLFeatureNotSupportedException: Setting network timeout is not > supported. > at > org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5943) Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism
[ https://issues.apache.org/jira/browse/DRILL-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246769#comment-16246769 ] ASF GitHub Bot commented on DRILL-5943: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/1028#discussion_r150118518 --- Diff: contrib/native/client/src/clientlib/saslAuthenticatorImpl.hpp --- @@ -59,6 +59,12 @@ class SaslAuthenticatorImpl { const char *getErrorMessage(int errorCode); +static const std::string KERBEROS_SIMPLE_NAME; + +static const std::string KERBEROS_SASL_NAME; --- End diff -- do we need to expose it? (it looks like we only look for the keys) > Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism > --- > > Key: DRILL-5943 > URL: https://issues.apache.org/jira/browse/DRILL-5943 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Sorabh Hamirwasia >Assignee: Sorabh Hamirwasia > Fix For: 1.12.0 > > > For PLAIN mechanism we will weaken the strong check introduced with > DRILL-5582 to keep the forward compatibility between Drill 1.12 client and > Drill 1.9 server. This is fine since with and without this strong check PLAIN > mechanism is still vulnerable to MITM during handshake itself unlike mutual > authentication protocols like Kerberos. > Also for keeping forward compatibility with respect to SASL we will treat > UNKNOWN_SASL_SUPPORT as valid value. For handshake message received from a > client which is running on later version (let say 1.13) then Drillbit (1.12) > and having a new value for SaslSupport field which is unknown to server, this > field will be decoded as UNKNOWN_SASL_SUPPORT. In this scenario client will > be treated as one aware about SASL protocol but server doesn't know exact > capabilities of client. Hence the SASL handshake will still be required from > server side. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5863) Sortable table incorrectly sorts minor fragments and time elements lexically instead of sorting by implicit value
[ https://issues.apache.org/jira/browse/DRILL-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-5863: - Reviewer: Paul Rogers (was: Paul Rogers) > Sortable table incorrectly sorts minor fragments and time elements lexically > instead of sorting by implicit value > - > > Key: DRILL-5863 > URL: https://issues.apache.org/jira/browse/DRILL-5863 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Minor > Labels: ready-to-commit > Fix For: 1.12.0 > > > The fix for this is to use dataTable library's {{data-order}} attribute for > the data elements that need to sort by an implicit value. > ||Old order of Minor Fragment||New order of Minor Fragment|| > |...|...| > |01-09-01 | 01-09-01| > |01-10-01 | 01-10-01| > |01-100-01 | 01-11-01| > |01-101-01 | 01-12-01| > |... | ... | > ||Old order of Duration||New order of Duration||| > |...|...| > |1m15s | 55.03s| > |55s | 1m15s| > |...|...| -- This message was sent by Atlassian JIRA (v6.4.14#64029)
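The actual fix uses the dataTables {{data-order}} attribute, but the underlying principle (sort by an implicit numeric key rather than the display string) can be sketched in plain Java. The duration parsing below is a simplified illustration, not the Web UI's actual code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sort duration strings like "1m15s" by total seconds instead of lexically.
public class DurationSortDemo {
  // Parse strings like "1m15s" or "55.03s" into seconds (illustrative only).
  static double toSeconds(String d) {
    double seconds = 0;
    int mIdx = d.indexOf('m');
    if (mIdx >= 0) {
      seconds += 60 * Integer.parseInt(d.substring(0, mIdx));
      d = d.substring(mIdx + 1);
    }
    if (d.endsWith("s")) {
      seconds += Double.parseDouble(d.substring(0, d.length() - 1));
    }
    return seconds;
  }

  public static void main(String[] args) {
    String[] durations = {"1m15s", "55.03s"};
    Arrays.sort(durations);  // lexical: '1' < '5', so "1m15s" sorts first (wrong)
    System.out.println(Arrays.toString(durations));
    // Implicit-value sort: 75s after 55.03s, matching the corrected table order.
    Arrays.sort(durations, Comparator.comparingDouble(DurationSortDemo::toSeconds));
    System.out.println(Arrays.toString(durations));
  }
}
```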
[jira] [Assigned] (DRILL-5717) change some date time unit cases with specific timezone or Local
[ https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker reassigned DRILL-5717: Assignee: weijie.tong > change some date time unit cases with specific timezone or Local > > > Key: DRILL-5717 > URL: https://issues.apache.org/jira/browse/DRILL-5717 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.9.0, 1.11.0 >Reporter: weijie.tong >Assignee: weijie.tong > Labels: ready-to-commit > > Some date time test cases, like JodaDateValidatorTest, are not Locale > independent. This causes the test phase to fail for users in other Locales. We > should make these test cases Locale independent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it
[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246617#comment-16246617 ] ASF GitHub Bot commented on DRILL-5783: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150097140 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java --- @@ -85,8 +85,7 @@ * new row set with the updated columns, then merge the new * and old row sets to create a new immutable row set. */ - - public interface RowSetWriter extends TupleWriter { + interface RowSetWriter extends TupleWriter { --- End diff -- Ah, forgot that the file defines an interface, not a class. (The situation I described was an interface nested inside a class.) So, you're good. > Make code generation in the TopN operator more modular and test it > -- > > Key: DRILL-5783 > URL: https://issues.apache.org/jira/browse/DRILL-5783 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > The work for this PR has had several other PRs batched together with it. The > full description of work is the following: > DRILL-5783 > * A unit test is created for the priority queue in the TopN operator > * The code generation classes passed around a completely unused function > registry reference in some places so I removed it. > * The priority queue had unused parameters for some of its methods so I > removed them. > DRILL-5841 > * There were many many ways in which temporary folders were created in unit > tests. I have unified the way these folders are created with the > DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests > have been updated to use these. The test watchers create temp directories in > ./target//. So all the files generated and used in the context of a test can > easily be found in the same consistent location. 
> * This change should fix the sporadic hashagg test failures, as well as > failures caused by stray files in /tmp > DRILL-5894 > * dfs_test is used as a storage plugin throughout the unit tests. This is > highly confusing and we can just use dfs instead. > *Misc* > * General code cleanup. > * There are many places where String.format is used unnecessarily. The test > builder methods already use String.format for you when you pass them args. I > cleaned some of these up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DRILL-5949) JSON format options should be part of plugin config; not session options
Paul Rogers created DRILL-5949: -- Summary: JSON format options should be part of plugin config; not session options Key: DRILL-5949 URL: https://issues.apache.org/jira/browse/DRILL-5949 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.12.0 Reporter: Paul Rogers Drill provides a JSON record reader. Drill provides two ways to configure this reader: * Using the JSON plugin configuration. * Using a set of session options. The plugin configuration defines the file suffix associated with JSON files. The session options are: * {{store.json.all_text_mode}} * {{store.json.read_numbers_as_double}} * {{store.json.reader.skip_invalid_records}} * {{store.json.reader.print_skipped_invalid_record_number}} Suppose I have two JSON files from different sources (and keep them in distinct directories.) For the one, I want to use {{all_text_mode}} off as the data is nicely formatted. Also, my numbers are fine, so I want {{read_numbers_as_double}} off. But, the other file is a mess and uses a rather ad-hoc format. So, I want these two options turned on. As it turns out I often query both files. Today, I must set the session options one way to query my "clean" file, then reverse them to query the "dirty" file. Next, I want to join the two files. How do I set the options one way for the "clean" file, and the other for the "dirty" file within the *same query*? Can't. Now, consider the text format plugin that can read CSV, TSV, PSV and so on. It has a variety of options. But, they are *not* session options; they are instead options in the plugin definition. This allows me to, say, have a plugin config for CSV-with-headers files that I get from source A, and a different plugin config for my CSV-without-headers files from source B. Suppose we applied the text reader technique to the JSON reader. We'd move the session options listed above into the JSON format plugin. Then, I can define one plugin for my "clean" files, and a different plugin config for my "dirty" files. 
What's more, I can then use table functions to adjust the format for each file as needed within a single query. Since table functions are part of a query, I can add them to a view that I define for the various JSON files. The result is a far simpler user experience than the tedium of resetting session options for every query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it
[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246608#comment-16246608 ] ASF GitHub Bot commented on DRILL-5783: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150096261 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java --- @@ -85,8 +85,7 @@ * new row set with the updated columns, then merge the new * and old row sets to create a new immutable row set. */ - - public interface RowSetWriter extends TupleWriter { + interface RowSetWriter extends TupleWriter { --- End diff -- IntelliJ gave a warning that the modifier is redundant. Also an interface nested inside another interface is public by default. https://beginnersbook.com/2016/03/nested-or-inner-interfaces-in-java/ > Make code generation in the TopN operator more modular and test it > -- > > Key: DRILL-5783 > URL: https://issues.apache.org/jira/browse/DRILL-5783 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > The work for this PR has had several other PRs batched together with it. The > full description of work is the following: > DRILL-5783 > * A unit test is created for the priority queue in the TopN operator > * The code generation classes passed around a completely unused function > registry reference in some places so I removed it. > * The priority queue had unused parameters for some of its methods so I > removed them. > DRILL-5841 > * There were many many ways in which temporary folders were created in unit > tests. I have unified the way these folders are created with the > DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests > have been updated to use these. The test watchers create temp directories in > ./target//. So all the files generated and used in the context of a test can > easily be found in the same consistent location. 
> * This change should fix the sporadic hashagg test failures, as well as > failures caused by stray files in /tmp > DRILL-5894 > * dfs_test is used as a storage plugin throughout the unit tests. This is > highly confusing and we can just use dfs instead. > *Misc* > * General code cleanup. > * There are many places where String.format is used unnecessarily. The test > builder methods already use String.format for you when you pass them args. I > cleaned some of these up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
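The point about the redundant modifier can be verified with a few lines of standalone Java: member types declared in an interface are implicitly public (and static), whether or not the modifier is written. The names below are illustrative, not Drill's actual RowSet types.

```java
import java.lang.reflect.Modifier;

// Demonstrates that an interface nested inside an interface is
// implicitly public, so writing "public" on it is redundant.
public class NestedInterfaceDemo {
  interface Outer {
    interface Inner { }   // no modifier written, yet public
  }

  public static void main(String[] args) {
    int mods = Outer.Inner.class.getModifiers();
    System.out.println(Modifier.isPublic(mods));  // prints "true"
  }
}
```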
[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it
[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246639#comment-16246639 ] ASF GitHub Bot commented on DRILL-5783: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r15009 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java --- @@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader ea, } } } + + // TODO make a native RowSetComparison comparator + public static class ObjectComparator implements Comparator { --- End diff -- This is used in the DrillTestWrapper to verify the ordering of results. I agree this is not suitable for equality tests, but it's intended to be used only for ordering tests. I didn't add support for all the supported RowSet types because we would first have to move DrillTestWrapper to use RowSets (currently it uses Maps and Lists to represent data). Currently it is not used by RowSets, but the intention is to move DrillTestWrapper to use RowSets and then make this comparator operate on RowSets, but that will be an incremental process. > Make code generation in the TopN operator more modular and test it > -- > > Key: DRILL-5783 > URL: https://issues.apache.org/jira/browse/DRILL-5783 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > The work for this PR has had several other PRs batched together with it. The > full description of work is the following: > DRILL-5783 > * A unit test is created for the priority queue in the TopN operator > * The code generation classes passed around a completely unused function > registry reference in some places so I removed it. > * The priority queue had unused parameters for some of its methods so I > removed them. > DRILL-5841 > * There were many many ways in which temporary folders were created in unit > tests. 
I have unified the way these folders are created with the > DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests > have been updated to use these. The test watchers create temp directories in > ./target//. So all the files generated and used in the context of a test can > easily be found in the same consistent location. > * This change should fix the sporadic hashagg test failures, as well as > failures caused by stray files in /tmp > DRILL-5894 > * dfs_test is used as a storage plugin throughout the unit tests. This is > highly confusing and we can just use dfs instead. > *Misc* > * General code cleanup. > * There are many places where String.format is used unnecessarily. The test > builder methods already use String.format for you when you pass them args. I > cleaned some of these up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
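An ordering-only comparator of the kind described (suitable for verifying sort order, not equality) can be sketched as below. The supported types and names are illustrative, not the actual RowSetComparison.ObjectComparator.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of an ordering-only comparator over boxed values: it dispatches
// on the runtime type and throws for unsupported combinations, which is
// acceptable for ordering tests but not for general equality checks.
public class OrderingComparatorDemo {
  static final Comparator<Object> ORDERING = (a, b) -> {
    if (a instanceof Integer && b instanceof Integer) {
      return Integer.compare((Integer) a, (Integer) b);
    } else if (a instanceof Long && b instanceof Long) {
      return Long.compare((Long) a, (Long) b);
    } else if (a instanceof Double && b instanceof Double) {
      return Double.compare((Double) a, (Double) b);
    } else if (a instanceof String && b instanceof String) {
      return ((String) a).compareTo((String) b);
    }
    throw new UnsupportedOperationException(
        "Unsupported comparison: " + a.getClass() + " vs " + b.getClass());
  };

  public static void main(String[] args) {
    Object[] values = {3, 1, 2};
    Arrays.sort(values, ORDERING);
    System.out.println(Arrays.toString(values));  // prints "[1, 2, 3]"
  }
}
```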
[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it
[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246609#comment-16246609 ] ASF GitHub Bot commented on DRILL-5783: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150096444 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.test.rowSet.file; + +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableMap; +import com.google.common.collect.Lists; +import com.google.common.collect.Maps; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.accessor.ColumnAccessor; +import org.apache.drill.exec.vector.accessor.ColumnReader; +import org.apache.drill.test.rowSet.RowSet; + +import java.io.BufferedOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +public class JsonFileBuilder +{ + public static final String DEFAULT_DOUBLE_FORMATTER = "%f"; + public static final String DEFAULT_INTEGER_FORMATTER = "%d"; + public static final String DEFAULT_LONG_FORMATTER = "%d"; + public static final String DEFAULT_STRING_FORMATTER = "\"%s\""; + public static final String DEFAULT_DECIMAL_FORMATTER = "%s"; + public static final String DEFAULT_PERIOD_FORMATTER = "%s"; + + public static final Map<ColumnAccessor.ValueType, String> DEFAULT_FORMATTERS = new ImmutableMap.Builder<ColumnAccessor.ValueType, String>() +.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER) +.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER) +.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER) +.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER) +.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER) +.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER) +.build(); + + private final RowSet rowSet; + private final Map<String, String> customFormatters = Maps.newHashMap(); + + public JsonFileBuilder(RowSet rowSet) { +this.rowSet = Preconditions.checkNotNull(rowSet); +Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset is empty."); + } + + public JsonFileBuilder setCustomFormatter(final String columnName, final String columnFormatter) { +Preconditions.checkNotNull(columnName); 
+Preconditions.checkNotNull(columnFormatter); + +Iterator<MaterializedField> fields = rowSet + .schema() + .batch() + .iterator(); + +boolean hasColumn = false; + +while (!hasColumn && fields.hasNext()) { + hasColumn = fields.next() +.getName() +.equals(columnName); +} + +final String message = String.format("(%s) is not a valid column", columnName); +Preconditions.checkArgument(hasColumn, message); + +customFormatters.put(columnName, columnFormatter); + +return this; + } + + public void build(File tableFile) throws IOException { --- End diff -- Sounds Good > Make code generation in the TopN operator more modular and test it > -- > > Key: DRILL-5783 > URL: https://issues.apache.org/jira/browse/DRILL-5783 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > The work for this PR has had several other PRs batched together with it. The > full description of work is the following: > DRILL-5783 > * A unit test is created for the priority queue in the TopN operator > * The code generation classes passed around a
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246547#comment-16246547 ] ASF GitHub Bot commented on DRILL-4779: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150087815 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java --- @@ -0,0 +1,61 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.kafka; + +import java.util.List; + +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.GroupScan; +import org.apache.drill.exec.physical.impl.BatchCreator; +import org.apache.drill.exec.physical.impl.ScanBatch; +import org.apache.drill.exec.record.CloseableRecordBatch; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.store.RecordReader; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Preconditions; +import com.google.common.collect.Lists; + +public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> { + static final Logger logger = LoggerFactory.getLogger(KafkaScanBatchCreator.class); + + @Override + public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan, List<RecordBatch> children) + throws ExecutionSetupException { +Preconditions.checkArgument(children.isEmpty()); +List<RecordReader> readers = Lists.newArrayList(); +List<SchemaPath> columns = null; +for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) { + try { +if ((columns = subScan.getCoulmns()) == null) { + columns = GroupScan.ALL_COLUMNS; +} --- End diff -- When will the columns be null? Not sure this is a valid state. However, as noted above, an empty list is a valid state (used for `COUNT(*)` queries.) > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. 
> Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
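The projection-list convention raised in the review (an empty list is a valid state used for `COUNT(*)`-style queries, while only null should be normalized to "project everything") can be illustrated standalone. The names below are hypothetical stand-ins, not the actual Drill GroupScan API.

```java
import java.util.Collections;
import java.util.List;

// Demonstrates the null-vs-empty distinction for projection lists:
// null means "unspecified, project all"; an empty list means
// "project no columns" and must pass through untouched.
public class ProjectionDemo {
  static final List<String> ALL_COLUMNS = Collections.singletonList("*");

  static List<String> normalize(List<String> columns) {
    // Guard only against null; an empty list is a valid state.
    return (columns == null) ? ALL_COLUMNS : columns;
  }

  public static void main(String[] args) {
    System.out.println(normalize(null));                    // prints "[*]"
    System.out.println(normalize(Collections.emptyList())); // prints "[]"
  }
}
```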
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246550#comment-16246550 ] ASF GitHub Bot commented on DRILL-4779: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150086335 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java --- @@ -0,0 +1,178 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.kafka; + +import static org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT; + +import java.util.Collection; +import java.util.Iterator; +import java.util.List; +import java.util.Set; +import java.util.concurrent.TimeUnit; + +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.ops.OperatorContext; +import org.apache.drill.exec.physical.impl.OutputMutator; +import org.apache.drill.exec.store.AbstractRecordReader; +import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec; +import org.apache.drill.exec.store.kafka.decoders.MessageReader; +import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory; +import org.apache.drill.exec.util.Utilities; +import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter; +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.apache.kafka.clients.consumer.ConsumerRecords; +import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.common.TopicPartition; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Stopwatch; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +public class KafkaRecordReader extends AbstractRecordReader { + private static final Logger logger = LoggerFactory.getLogger(KafkaRecordReader.class); + public static final long DEFAULT_MESSAGES_PER_BATCH = 4000; + + private VectorContainerWriter writer; + private MessageReader messageReader; + + private boolean unionEnabled; + private KafkaConsumer<byte[], byte[]> kafkaConsumer; + private KafkaStoragePlugin plugin; + private KafkaSubScanSpec subScanSpec; + private long kafkaPollTimeOut; + private long endOffset; + + private long currentOffset; + private long totalFetchTime = 0; + + private List<TopicPartition> partitions; + private final boolean enableAllTextMode; + private final boolean readNumbersAsDouble; + + private Iterator<ConsumerRecord<byte[], byte[]>> messageIter; + + public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns, + FragmentContext context, KafkaStoragePlugin plugin) { +setColumns(projectedColumns); +this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val; +this.readNumbersAsDouble = context.getOptions() + .getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val; +this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE); +this.plugin = plugin; +this.subScanSpec = subScanSpec; +this.endOffset = subScanSpec.getEndOffset(); +this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT)); + } + + @Override + protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) { +Set<SchemaPath> transformed = Sets.newLinkedHashSet(); +if (!isStarQuery()) { + for (SchemaPath column : projectedColumns) { +transformed.add(column); + } +} else { + transformed.add(Utilities.STAR_COLUMN); +
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246553#comment-16246553 ]
ASF GitHub Bot commented on DRILL-4779:
---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1027#discussion_r150086292
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions()
+        .getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
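The `transformColumns()` logic in the diff follows a common Drill reader pattern: pass explicit projections through in order, and collapse a star query to the single star column. A minimal, self-contained sketch of that pattern (plain `String`s stand in for Drill's `SchemaPath`, and a `contains("*")` check stands in for `isStarQuery()` — both are simplifications, not Drill's real API):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

public class ProjectionSketch {
    static final String STAR_COLUMN = "*";

    // Mirrors transformColumns(): explicit projections pass through in
    // insertion order; a star query collapses to the single star column.
    static Set<String> transformColumns(Collection<String> projected) {
        Set<String> transformed = new LinkedHashSet<>();
        if (!projected.contains(STAR_COLUMN)) {
            transformed.addAll(projected);
        } else {
            transformed.add(STAR_COLUMN);
        }
        return transformed;
    }

    public static void main(String[] args) {
        System.out.println(transformColumns(Arrays.asList("a", "b")));  // [a, b]
        System.out.println(transformColumns(Arrays.asList("a", "*")));  // [*]
    }
}
```

The `LinkedHashSet` matters: it deduplicates repeated columns while preserving the order the user projected them in.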
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246551#comment-16246551 ]
ASF GitHub Bot commented on DRILL-4779:
---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1027#discussion_r150087650
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java ---
@@ -0,0 +1,61 @@
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan, List<RecordBatch> children)
+      throws ExecutionSetupException {
+    Preconditions.checkArgument(children.isEmpty());
+    List<RecordReader> readers = Lists.newArrayList();
+    List<SchemaPath> columns = null;
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
--- End diff --
The column list can be shared by all readers, so it can be created once, outside the loop over scan specs.
> Kafka storage plugin support
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Affects Versions: 1.11.0
> Reporter: B Anil Kumar
> Assignee: B Anil Kumar
> Labels: doc-impacting
> Fix For: 1.12.0
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target JSON and Avro message types.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
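The refactoring suggested here — resolve the column list once and share it across all readers — can be sketched with simplified stand-in types (the plain `String` scan specs, `ALL_COLUMNS` constant, and reader-as-list representation below are illustrative, not Drill's real API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ColumnHoistSketch {
    static final List<String> ALL_COLUMNS = Collections.singletonList("*");

    // The projection is the same for every scan spec, so resolve it once,
    // before the loop; each "reader" then shares the same resolved list.
    static List<List<String>> createReaders(List<String> scanSpecs, List<String> projected) {
        List<String> columns = (projected == null) ? ALL_COLUMNS : projected;  // hoisted out of the loop
        List<List<String>> readers = new ArrayList<>();
        for (String spec : scanSpecs) {
            readers.add(columns);  // stand-in for: new KafkaRecordReader(spec, columns, ...)
        }
        return readers;
    }

    public static void main(String[] args) {
        List<List<String>> readers = createReaders(Arrays.asList("p0", "p1"), null);
        System.out.println(readers.size());                    // 2
        System.out.println(readers.get(0) == readers.get(1));  // true: one shared list
    }
}
```

Besides being tidier, hoisting avoids re-evaluating `subScan.getCoulmns()` on every iteration and makes it obvious that all readers see an identical projection.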
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246548#comment-16246548 ]
ASF GitHub Bot commented on DRILL-4779:
---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1027#discussion_r150084784
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246552#comment-16246552 ]
ASF GitHub Bot commented on DRILL-4779:
---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1027#discussion_r150087367
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246544#comment-16246544 ]
ASF GitHub Bot commented on DRILL-4779:
---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1027#discussion_r150088237
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java ---
@@ -0,0 +1,61 @@
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
+        readers.add(new KafkaRecordReader(scanSpec, columns, context, subScan.getKafkaStoragePlugin()));
+      } catch (Exception e) {
+        logger.error("KafkaRecordReader creation failed for subScan: " + subScan + ".", e);
--- End diff --
Here we catch all errors, log a generic message, and send a generic exception up the stack. Better is to throw a `UserException` at the actual point of failure, so we can tell the user exactly what went wrong. Then, here, add a `catch` block for `UserException` that simply rethrows, while handling all other exceptions as is done now.
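The error-handling pattern being requested — raise a descriptive exception at the point of failure, rethrow it untouched here, and wrap only genuinely unexpected failures — looks roughly like this (the `UserException` and `ExecutionSetupException` classes below are simplified stand-ins for Drill's real exception types, and `createReader` is a hypothetical helper):

```java
public class RethrowSketch {
    // Simplified stand-ins for Drill's exception types.
    static class UserException extends RuntimeException {
        UserException(String msg) { super(msg); }
    }

    static class ExecutionSetupException extends Exception {
        ExecutionSetupException(String msg, Throwable cause) { super(msg, cause); }
    }

    static String createReader(String spec) throws ExecutionSetupException {
        try {
            if (spec.isEmpty()) {
                // Raised at the actual point of failure, with a user-facing message.
                throw new UserException("scan spec must not be empty");
            }
            return "reader[" + spec + "]";
        } catch (UserException e) {
            throw e;  // already descriptive: rethrow as-is
        } catch (Exception e) {
            // Only truly unexpected failures get the generic wrapper.
            throw new ExecutionSetupException("Reader creation failed for subScan: " + spec, e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createReader("topic-0:p0"));  // reader[topic-0:p0]
        try {
            createReader("");
        } catch (UserException e) {
            System.out.println(e.getMessage());  // scan spec must not be empty
        }
    }
}
```

The catch-order matters: `UserException` must be caught before the broad `Exception` handler, or the descriptive message would be swallowed by the generic wrapper.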
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246549#comment-16246549 ]
ASF GitHub Bot commented on DRILL-4779:
---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1027#discussion_r150084039
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246543#comment-16246543 ]
ASF GitHub Bot commented on DRILL-4779:
---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1027#discussion_r150084335
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246545#comment-16246545 ]
ASF GitHub Bot commented on DRILL-4779:
---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1027#discussion_r150083104
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246542#comment-16246542 ] ASF GitHub Bot commented on DRILL-4779: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150083767
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246541#comment-16246541 ] ASF GitHub Bot commented on DRILL-4779: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150082981
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246540#comment-16246540 ] ASF GitHub Bot commented on DRILL-4779: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150081972
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
--- End diff --
Preparing columns is really quite difficult with many cases to handle. Drill appears to allow projection of the form `a.b`, `a.c` which means that `a` is a map and we wish to project just `b` and `c` from `a`. As it turns
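The `a.b` / `a.c` case the reviewer describes can be sketched independently of Drill: group each dotted projection path under its root segment, so the reader knows `a` is a map and that only `b` and `c` are wanted from it. This is an illustrative sketch with hypothetical names on plain strings, not Drill's actual `SchemaPath` machinery:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: collapse dotted projection paths by root segment,
// so "a.b" and "a.c" become one map column "a" with requested members {b, c}.
public class ProjectionGrouper {
  public static Map<String, Set<String>> group(List<String> paths) {
    Map<String, Set<String>> byRoot = new LinkedHashMap<>();
    for (String path : paths) {
      int dot = path.indexOf('.');
      String root = dot < 0 ? path : path.substring(0, dot);
      Set<String> members = byRoot.computeIfAbsent(root, k -> new LinkedHashSet<>());
      if (dot >= 0) {
        members.add(path.substring(dot + 1)); // member path within the map
      }
    }
    return byRoot;
  }

  public static void main(String[] args) {
    // "a" is a map with members b and c; "d" is a plain column (empty member set).
    System.out.println(group(Arrays.asList("a.b", "a.c", "d"))); // {a=[b, c], d=[]}
  }
}
```

A real implementation would also have to decide what a map column means for the reader (project the whole map and prune later, or push member pruning into decoding), which is part of why the reviewer calls this case difficult.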
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246546#comment-16246546 ] ASF GitHub Bot commented on DRILL-4779: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150087581
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan, List<RecordBatch> children)
+      throws ExecutionSetupException {
+    Preconditions.checkArgument(children.isEmpty());
+    List<RecordReader> readers = Lists.newArrayList();
+    List<SchemaPath> columns = null;
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
--- End diff --
`getCoulmns()` --> `getColumns()`

> Kafka storage plugin support
> ----------------------------
>
>                 Key: DRILL-4779
>                 URL: https://issues.apache.org/jira/browse/DRILL-4779
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 1.11.0
>            Reporter: B Anil Kumar
>            Assignee: B Anil Kumar
>              Labels: doc-impacting
>             Fix For: 1.12.0
>
> Implement Kafka storage plugin will enable the strong SQL support for Kafka.
> Initially implementation can target for supporting json and avro message types

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246539#comment-16246539 ] ASF GitHub Bot commented on DRILL-4779: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150082711
--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing
[ https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246535#comment-16246535 ] ASF GitHub Bot commented on DRILL-5867: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1029#discussion_r150086018
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long
       this.time = new Date(startTime);
       this.foreman = foreman;
       this.link = generateLink(drillConfig, foreman, queryId);
-      this.query = query.substring(0, Math.min(query.length(), 150));
+      this.query = extractQuerySnippet(query);
       this.state = state;
       this.user = user;
       this.totalCost = totalCost;
       this.queueName = queueName;
     }
+
+    private String extractQuerySnippet(String queryText) {
+      //Extract upto max char limit as snippet
+      String sizeCappedQuerySnippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR));
+      //Trimming down based on line-count
+      if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {
--- End diff --
1. We can create a variable for `sizeCappedQuerySnippet.split(System.lineSeparator())` so we do the split only once.
2. Please remove the spaces in the `if` clause: `if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {` -> `if (QUERY_SNIPPET_MAX_LINES < splittedQuery.length) {`, and in `if ( ++linesConstructed < QUERY_SNIPPET_MAX_LINES ) {` in the code below.
> List profiles in pages rather than a long verbose listing
> ---------------------------------------------------------
>
>                 Key: DRILL-5867
>                 URL: https://issues.apache.org/jira/browse/DRILL-5867
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Web Server
>    Affects Versions: 1.11.0
>            Reporter: Kunal Khatua
>            Assignee: Kunal Khatua
>            Priority: Minor
>             Fix For: 1.12.0
>
>         Attachments: DefaultRendering.png, FilteringFailed.png
>

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
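The reviewer's first point — do the line split exactly once — can be sketched as a standalone version of the snippet logic. The class name and constant values below are hypothetical stand-ins, not the code under review:

```java
// Hypothetical sketch of the snippet logic being reviewed: cap the query
// text by characters first, then by line count, splitting only once.
public class QuerySnippet {
  static final int MAX_CHAR = 150;  // illustrative limits, not Drill's actual
  static final int MAX_LINES = 8;   // QUERY_SNIPPET_MAX_* constants

  public static String extract(String queryText) {
    String capped = queryText.substring(0, Math.min(queryText.length(), MAX_CHAR));
    String[] lines = capped.split(System.lineSeparator()); // split exactly once, reuse below
    if (lines.length <= MAX_LINES) {
      return capped;
    }
    StringBuilder snippet = new StringBuilder();
    for (int i = 0; i < MAX_LINES; i++) {
      snippet.append(lines[i]);
      if (i < MAX_LINES - 1) {
        snippet.append(System.lineSeparator());
      }
    }
    return snippet.toString();
  }
}
```

Holding the split result in a local also makes the `if` condition read naturally as a comparison against `lines.length`, which addresses the reviewer's second point at the same time.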
[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing
[ https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246537#comment-16246537 ] ASF GitHub Bot commented on DRILL-5867: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1029#discussion_r150086785
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long
       this.time = new Date(startTime);
       this.foreman = foreman;
       this.link = generateLink(drillConfig, foreman, queryId);
-      this.query = query.substring(0, Math.min(query.length(), 150));
+      this.query = extractQuerySnippet(query);
       this.state = state;
       this.user = user;
       this.totalCost = totalCost;
       this.queueName = queueName;
     }
+
+    private String extractQuerySnippet(String queryText) {
+      //Extract upto max char limit as snippet
+      String sizeCappedQuerySnippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR));
+      //Trimming down based on line-count
+      if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {
+        int linesConstructed = 0;
+        StringBuilder lineCappedQuerySnippet = new StringBuilder();
+        String[] queryParts = sizeCappedQuerySnippet.split(System.lineSeparator());
+        for (String qPart : queryParts) {
+          lineCappedQuerySnippet.append(qPart);
+          if ( ++linesConstructed < QUERY_SNIPPET_MAX_LINES ) {
+            lineCappedQuerySnippet.append(System.lineSeparator());
--- End diff --
Do we want to append with a new line, or maybe a space for better readability?
[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing
[ https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246536#comment-16246536 ] ASF GitHub Bot commented on DRILL-5867: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1029#discussion_r150085841
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long
       this.time = new Date(startTime);
       this.foreman = foreman;
       this.link = generateLink(drillConfig, foreman, queryId);
-      this.query = query.substring(0, Math.min(query.length(), 150));
+      this.query = extractQuerySnippet(query);
       this.state = state;
       this.user = user;
       this.totalCost = totalCost;
       this.queueName = queueName;
     }
+
+    private String extractQuerySnippet(String queryText) {
--- End diff --
1. I usually place private methods at the end of the class.
2. We can add javadoc here explaining that first we limit the original query size and, if the size fits but the query has too many lines, we limit those as well for better readability in the Web UI.
[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"
[ https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246515#comment-16246515 ] ASF GitHub Bot commented on DRILL-5923: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1021
Well, I don't have a strong preference here; we can use an array, as long as Prasad makes it nicely documented as in your example rather than in one line.
```
String displayNames[] = {
    "First Value",   // FIRST_VALUE = 0
    "Second Value",  // SECOND_VALUE = 1
    ...
};
```

> State of a successfully completed query shown as "COMPLETED"
> ------------------------------------------------------------
>
>                 Key: DRILL-5923
>                 URL: https://issues.apache.org/jira/browse/DRILL-5923
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - HTTP
>    Affects Versions: 1.11.0
>            Reporter: Prasad Nagaraj Subramanya
>            Assignee: Prasad Nagaraj Subramanya
>             Fix For: 1.12.0
>
> Drill UI currently lists a successfully completed query as "COMPLETED".
> Successfully completed, failed and canceled queries are all grouped as Completed queries.
> It would be better to list the state of a successfully completed query as "Succeeded" to avoid confusion.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
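The documented-array pattern being requested can be sketched as follows. The state names and ordinals here are illustrative placeholders, not Drill's actual `QueryState` enum values:

```java
// Hypothetical sketch of the pattern under discussion: one display name per
// state ordinal, documented entry by entry rather than packed on one line.
public class QueryStateNames {
  static final String[] DISPLAY_NAMES = {
    "Starting",   // STARTING = 0 (illustrative ordinals)
    "Running",    // RUNNING = 1
    "Succeeded",  // COMPLETED = 2, surfaced as "Succeeded" per DRILL-5923
    "Canceled",   // CANCELED = 3
    "Failed",     // FAILED = 4
  };

  public static String displayName(int stateOrdinal) {
    // Guard against ordinals added to the enum but not yet to the array.
    if (stateOrdinal < 0 || stateOrdinal >= DISPLAY_NAMES.length) {
      return "Unknown (" + stateOrdinal + ")";
    }
    return DISPLAY_NAMES[stateOrdinal];
  }
}
```

The per-entry comments keep the array readable, and the bounds check covers the usual weakness of ordinal-indexed arrays: an enum value added without a matching entry.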
[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it
[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246442#comment-16246442 ] ASF GitHub Bot commented on DRILL-5783: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150072992 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.test.rowSet.file; + +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableMap; +import com.google.common.collect.Lists; +import com.google.common.collect.Maps; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.accessor.ColumnAccessor; +import org.apache.drill.exec.vector.accessor.ColumnReader; +import org.apache.drill.test.rowSet.RowSet; + +import java.io.BufferedOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +public class JsonFileBuilder +{ + public static final String DEFAULT_DOUBLE_FORMATTER = "%f"; + public static final String DEFAULT_INTEGER_FORMATTER = "%d"; + public static final String DEFAULT_LONG_FORMATTER = "%d"; + public static final String DEFAULT_STRING_FORMATTER = "\"%s\""; + public static final String DEFAULT_DECIMAL_FORMATTER = "%s"; + public static final String DEFAULT_PERIOD_FORMATTER = "%s"; + + public static final MapDEFAULT_FORMATTERS = new ImmutableMap.Builder() +.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER) +.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER) +.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER) +.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER) +.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER) +.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER) +.build(); + + private final RowSet rowSet; + private final Map customFormatters = Maps.newHashMap(); + + public JsonFileBuilder(RowSet rowSet) { +this.rowSet = Preconditions.checkNotNull(rowSet); +Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset is empty."); + } + + public JsonFileBuilder setCustomFormatter(final String columnName, final String columnFormatter) { +Preconditions.checkNotNull(columnName); 
+Preconditions.checkNotNull(columnFormatter); + +Iterator<MaterializedField> fields = rowSet + .schema() + .batch() + .iterator(); + +boolean hasColumn = false; + +while (!hasColumn && fields.hasNext()) { + hasColumn = fields.next() +.getName() +.equals(columnName); +} + +final String message = String.format("(%s) is not a valid column", columnName); +Preconditions.checkArgument(hasColumn, message); + +customFormatters.put(columnName, columnFormatter); + +return this; + } + + public void build(File tableFile) throws IOException { --- End diff -- Great! This does not yet handle nested tuples or arrays; in part because the row set work for that is still sitting in PR #914. You can update this to be aware of maps and map arrays once that PR is committed. > Make code generation in the TopN operator more modular and test it > -- > > Key: DRILL-5783 > URL: https://issues.apache.org/jira/browse/DRILL-5783 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > The work for this PR has had several other PRs
[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it
[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246441#comment-16246441 ] ASF GitHub Bot commented on DRILL-5783: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150073673 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java --- @@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader ea, } } } + + // TODO make a native RowSetComparison comparator + public static class ObjectComparator implements Comparator<Object> { --- End diff -- Defined here, but not used in this file. Does not include all types that Drill supports (via the RowSet): Date, byte arrays, BigDecimal, etc. Does not allow for ranges for floats & doubles as does JUnit. (Two floats are seldom exactly equal.) > Make code generation in the TopN operator more modular and test it > -- > > Key: DRILL-5783 > URL: https://issues.apache.org/jira/browse/DRILL-5783 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > The work for this PR has had several other PRs batched together with it. The > full description of work is the following: > DRILL-5783 > * A unit test is created for the priority queue in the TopN operator > * The code generation classes passed around a completely unused function > registry reference in some places so I removed it. > * The priority queue had unused parameters for some of its methods so I > removed them. > DRILL-5841 > * There were many many ways in which temporary folders were created in unit > tests. I have unified the way these folders are created with the > DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests > have been updated to use these. The test watchers create temp directories in > ./target//.
So all the files generated and used in the context of a test can > easily be found in the same consistent location. > * This change should fix the sporadic hashagg test failures, as well as > failures caused by stray files in /tmp > DRILL-5894 > * dfs_test is used as a storage plugin throughout the unit tests. This is > highly confusing and we can just use dfs instead. > *Misc* > * General code cleanup. > * There are many places where String.format is used unnecessarily. The test > builder methods already use String.format for you when you pass them args. I > cleaned some of these up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
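The float/double point in the review above could look like the following stand-alone sketch, in the spirit of JUnit's delta-based `assertEquals`. The class name, method name, and fixed `DELTA` are all illustrative assumptions, not Drill code:

```java
// Illustrative sketch: tolerance-aware comparison for floats/doubles,
// falling back to exact equals() for other types. Not Drill code.
public class ApproxEquality {
  // Tolerance chosen arbitrarily for the example.
  static final double DELTA = 1e-6;

  static boolean approxEquals(Object expected, Object actual) {
    if (expected instanceof Double && actual instanceof Double) {
      return Math.abs((Double) expected - (Double) actual) <= DELTA;
    }
    if (expected instanceof Float && actual instanceof Float) {
      return Math.abs((Float) expected - (Float) actual) <= DELTA;
    }
    return expected.equals(actual); // ints, longs, Strings, BigDecimal, ...
  }
}
```

A fuller version would likely take the delta as a parameter per column, the way JUnit takes it per assertion.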
[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it
[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246443#comment-16246443 ] ASF GitHub Bot commented on DRILL-5783: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150073945 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java --- @@ -85,8 +85,7 @@ * new row set with the updated columns, then merge the new * and old row sets to create a new immutable row set. */ - - public interface RowSetWriter extends TupleWriter { + interface RowSetWriter extends TupleWriter { --- End diff -- Aren't nested interfaces `protected` by default? Just had to change one from default to `public` so I could use it in another package... > Make code generation in the TopN operator more modular and test it > -- > > Key: DRILL-5783 > URL: https://issues.apache.org/jira/browse/DRILL-5783 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > The work for this PR has had several other PRs batched together with it. The > full description of work is the following: > DRILL-5783 > * A unit test is created for the priority queue in the TopN operator > * The code generation classes passed around a completely unused function > registry reference in some places so I removed it. > * The priority queue had unused parameters for some of its methods so I > removed them. > DRILL-5841 > * There were many many ways in which temporary folders were created in unit > tests. I have unified the way these folders are created with the > DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests > have been updated to use these. The test watchers create temp directories in > ./target//. So all the files generated and used in the context of a test can > easily be found in the same consistent location. 
> * This change should fix the sporadic hashagg test failures, as well as > failures caused by stray files in /tmp > DRILL-5894 > * dfs_test is used as a storage plugin throughout the unit tests. This is > highly confusing and we can just use dfs instead. > *Misc* > * General code cleanup. > * There are many places where String.format is used unnecessarily. The test > builder methods already use String.format for you when you pass them args. I > cleaned some of these up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
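For the visibility question above: per the Java Language Specification, members of an interface are implicitly public, while a nested interface declared in a class with no modifier is package-private (not protected). A minimal illustration, with invented names:

```java
// Member-type visibility rules, illustrated with invented names.
public interface Outer {
  // Inside an interface, members are implicitly public and static,
  // so removing an explicit `public` here changes nothing.
  interface RowSetWriterLike { }
}

// Inside a class, a nested interface with no modifier is package-private
// (not protected); cross-package use requires an explicit `public`.
class OuterClass {
  interface PackagePrivate { }
}
```

So dropping `public` from `RowSetWriter` inside the `RowSet` interface is purely cosmetic, whereas the member that had to be widened for cross-package use was presumably nested in a class.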
[jira] [Commented] (DRILL-5948) The wrong number of batches is displayed
[ https://issues.apache.org/jira/browse/DRILL-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246292#comment-16246292 ] Paul Rogers commented on DRILL-5948: As it turns out, Drill will generate two batches even for a single record: * The first batch is empty and carries just the schema. (Used for JDBC/ODBC to report schema up front.) * The second batch carries the first (or only) set of records. From a metric perspective, we could change Drill to not count the initial, empty batch. > The wrong number of batches is displayed > > > Key: DRILL-5948 > URL: https://issues.apache.org/jira/browse/DRILL-5948 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Vlad > Attachments: json_profile.json > > > I suppose, when you execute a query with a small amount of data drill must > create 1 batch, but here you can see that drill created 2 batches. I think > it's a wrong behaviour for the drill. Full JSON file will be in the > attachment. > {code:html} > "fragmentProfile": [ > { > "majorFragmentId": 0, > "minorFragmentProfile": [ > { > "state": 3, > "minorFragmentId": 0, > "operatorProfile": [ > { > "inputProfile": [ > { > "records": 1, > "batches": 2, > "schemas": 1 > } > ], > "operatorId": 2, > "operatorType": 29, > "setupNanos": 0, > "processNanos": 1767363740, > "peakLocalMemoryAllocated": 639120, > "waitNanos": 25787 > }, > {code} > Step to reproduce: > # Create JSON file with 1 row > # Execute star query with this file, for example > {code:sql} > select * from dfs.`/path/to/your/file/example.json` > {code} > # Go to the Profile page on the UI, and open info about your query > # Open JSON profile
[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"
[ https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246283#comment-16246283 ] ASF GitHub Bot commented on DRILL-5923: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1021 @arina-ielchiieva, it helps to think about the source of the enum. This is a Protobuf enum. The ordinal values cannot change; they are a contract between sender and receiver. We can add new ones, or retire old ones, but otherwise the values are frozen in time. The array approach captures this reality. We could document the array better: ``` String displayNames[] = { "First Value", // FIRST_VALUE = 0 "Second Value", // SECOND_VALUE = 1 ... }; ``` We can also do a bounds check: ``` if (enumValue.ordinal() >= displayNames.length) { return enumValue.toString(); } else { return displayNames[enumValue.ordinal()]; } ``` But, IMHO a map seems overkill for such a simple task. Yes, it works, but is unnecessary. As they say, "make it as simple as possible (but no simpler)." > State of a successfully completed query shown as "COMPLETED" > > > Key: DRILL-5923 > URL: https://issues.apache.org/jira/browse/DRILL-5923 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: 1.12.0 > > > Drill UI currently lists a successfully completed query as "COMPLETED". > Successfully completed, failed and canceled queries are all grouped as > Completed queries. > It would be better to list the state of a successfully completed query as > "Succeeded" to avoid confusion.
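Assembled into a runnable sketch, the array-plus-bounds-check idea above might look like this. The `QueryState` enum here is a simplified stand-in for Drill's protobuf enum, and the class and method names are illustrative:

```java
public class DisplayNames {
  // Simplified stand-in for the protobuf enum; real ordinals are frozen
  // by the wire contract between sender and receiver.
  enum QueryState { STARTING, RUNNING, COMPLETED, CANCELED, FAILED }

  // Each entry documents the ordinal it maps to, as suggested above.
  private static final String[] DISPLAY_NAMES = {
    "Starting",  // STARTING = 0
    "Running",   // RUNNING = 1
    "Succeeded", // COMPLETED = 2
    "Canceled",  // CANCELED = 3
    "Failed",    // FAILED = 4
  };

  // Bounds check: an enum value added after this table was written
  // falls back to its own name instead of throwing.
  static String displayNameFor(QueryState state) {
    if (state.ordinal() >= DISPLAY_NAMES.length) {
      return state.toString();
    }
    return DISPLAY_NAMES[state.ordinal()];
  }
}
```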
[jira] [Closed] (DRILL-5259) Allow listing a user-defined number of profiles
[ https://issues.apache.org/jira/browse/DRILL-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua closed DRILL-5259. --- Verified and committed to Apache master > Allow listing a user-defined number of profiles > > > Key: DRILL-5259 > URL: https://issues.apache.org/jira/browse/DRILL-5259 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.9.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Trivial > Fix For: 1.10.0, 1.12.0 > > > Currently, the web UI only lists the last 100 profiles. > This count is currently hard coded. The proposed change would be to create an > option in drill-override.conf to provide a flexible default value, and also > an option within the UI (via optional parameter in the path).
[jira] [Closed] (DRILL-5802) Provide a sortable table for tables within a query profile
[ https://issues.apache.org/jira/browse/DRILL-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua closed DRILL-5802. --- Verified and Committed into Master on 2 Oct, 2017. > Provide a sortable table for tables within a query profile > -- > > Key: DRILL-5802 > URL: https://issues.apache.org/jira/browse/DRILL-5802 > Project: Apache Drill > Issue Type: Sub-task > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Minor > Fix For: 1.12.0 > >
[jira] [Commented] (DRILL-4286) Have an ability to put server in quiescent mode of operation
[ https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246219#comment-16246219 ] ASF GitHub Bot commented on DRILL-4286: --- Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/921#discussion_r150047464 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java --- @@ -348,6 +354,21 @@ public void run() { */ } + /* +Check if the foreman is ONLINE. If not dont accept any new queries. + */ + public void checkForemanState() throws ForemanException{ +DrillbitEndpoint foreman = drillbitContext.getEndpoint(); +Collection dbs = drillbitContext.getAvailableBits(); --- End diff -- I was thinking of encapsulating code from lines 360 to 367 into a boolean isOnline(), since all the values in that code are derived from the current DrillbitContext. Then your code would be simplified to ` public void checkForemanState() throws ForemanException{ if (!drillbitContext.isOnline()) { throw new ForemanException("Query submission failed since Foreman is shutting down."); } } ` > Have an ability to put server in quiescent mode of operation > > > Key: DRILL-4286 > URL: https://issues.apache.org/jira/browse/DRILL-4286 > Project: Apache Drill > Issue Type: New Feature > Components: Execution - Flow >Reporter: Victoria Markman >Assignee: Venkata Jyothsna Donapati > > I think drill will benefit from mode of operation that is called "quiescent" > in some databases. > From IBM Informix server documentation: > {code} > Change gracefully from online to quiescent mode > Take the database server gracefully from online mode to quiescent mode to > restrict access to the database server without interrupting current > processing. After you perform this task, the database server sets a flag that > prevents new sessions from gaining access to the database server. The current > sessions are allowed to finish processing. 
After you initiate the mode > change, it cannot be canceled. During the mode change from online to > quiescent, the database server is considered to be in Shutdown mode. > {code} > This is different from shutdown, when processes are terminated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
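The refactoring suggested in the review above could take roughly this shape. `DrillbitContext` and `ForemanException` here are minimal stand-ins for the real Drill classes, so this is a sketch of the structure, not the actual implementation:

```java
public class ForemanSketch {
  // Minimal stand-in for Drill's ForemanException.
  static class ForemanException extends Exception {
    ForemanException(String message) { super(message); }
  }

  // Stand-in for DrillbitContext: isOnline() would encapsulate the
  // endpoint/available-bits comparison internally.
  interface DrillbitContext {
    boolean isOnline();
  }

  private final DrillbitContext drillbitContext;

  ForemanSketch(DrillbitContext drillbitContext) {
    this.drillbitContext = drillbitContext;
  }

  // The simplified check, as proposed in the review comment.
  public void checkForemanState() throws ForemanException {
    if (!drillbitContext.isOnline()) {
      throw new ForemanException("Query submission failed since Foreman is shutting down.");
    }
  }
}
```

Moving the comparison behind `isOnline()` keeps the Foreman ignorant of how "online" is computed, which is the point of the suggestion.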
[jira] [Reopened] (DRILL-5867) List profiles in pages rather than a long verbose listing
[ https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reopened DRILL-5867: - > List profiles in pages rather than a long verbose listing > - > > Key: DRILL-5867 > URL: https://issues.apache.org/jira/browse/DRILL-5867 > Project: Apache Drill > Issue Type: Sub-task > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Minor > Fix For: 1.12.0 > > Attachments: DefaultRendering.png, FilteringFailed.png > >
[jira] [Closed] (DRILL-5803) Show the hostname for each minor fragment in operator table
[ https://issues.apache.org/jira/browse/DRILL-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua closed DRILL-5803. --- Verified and committed into Apache master on 26 September, 2017 > Show the hostname for each minor fragment in operator table > --- > > Key: DRILL-5803 > URL: https://issues.apache.org/jira/browse/DRILL-5803 > Project: Apache Drill > Issue Type: Sub-task > Components: Web Server >Reporter: Kunal Khatua >Assignee: Kunal Khatua > Fix For: 1.12.0 > >
[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing
[ https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246196#comment-16246196 ] ASF GitHub Bot commented on DRILL-5867: --- Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1029 Snapshot when testing with search filter for FAILED queries and navigating to page 2 of that list. Information about the number of filtered items, etc is also provided. ![image](https://user-images.githubusercontent.com/4335237/32622085-a8826c56-c536-11e7-9a18-7a09142b250e.png) > List profiles in pages rather than a long verbose listing > - > > Key: DRILL-5867 > URL: https://issues.apache.org/jira/browse/DRILL-5867 > Project: Apache Drill > Issue Type: Sub-task > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Minor > Fix For: 1.12.0 > > Attachments: DefaultRendering.png, FilteringFailed.png > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing
[ https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246193#comment-16246193 ] ASF GitHub Bot commented on DRILL-5867: --- Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1029 Snapshot when rendering the defaults (10 per page) from a pre-loaded set of the latest 123 profiles ![image](https://user-images.githubusercontent.com/4335237/32621917-412a90ba-c536-11e7-9d51-83220ce072d3.png) The query snippet is restricted to 8 lines at most and indicates if there is more to the query text with a trailing set of `...` > List profiles in pages rather than a long verbose listing > - > > Key: DRILL-5867 > URL: https://issues.apache.org/jira/browse/DRILL-5867 > Project: Apache Drill > Issue Type: Sub-task > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Minor > Fix For: 1.12.0 > > Attachments: DefaultRendering.png, FilteringFailed.png > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5867) List profiles in pages rather than a long verbose listing
[ https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua updated DRILL-5867: Attachment: FilteringFailed.png > List profiles in pages rather than a long verbose listing > - > > Key: DRILL-5867 > URL: https://issues.apache.org/jira/browse/DRILL-5867 > Project: Apache Drill > Issue Type: Sub-task > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Minor > Fix For: 1.12.0 > > Attachments: DefaultRendering.png, FilteringFailed.png > >
[jira] [Updated] (DRILL-5867) List profiles in pages rather than a long verbose listing
[ https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua updated DRILL-5867: Attachment: DefaultRendering.png > List profiles in pages rather than a long verbose listing > - > > Key: DRILL-5867 > URL: https://issues.apache.org/jira/browse/DRILL-5867 > Project: Apache Drill > Issue Type: Sub-task > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Minor > Fix For: 1.12.0 > > Attachments: DefaultRendering.png > >
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246101#comment-16246101 ] ASF GitHub Bot commented on DRILL-4779: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150030044 --- Diff: contrib/storage-kafka/pom.xml --- @@ -0,0 +1,130 @@ + + +http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd; + xmlns="http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;> + 4.0.0 + + +drill-contrib-parent +org.apache.drill.contrib +1.12.0-SNAPSHOT + + + drill-storage-kafka + contrib/kafka-storage-plugin + + +UTF-8 +0.11.0.1 +**/KafkaTestSuit.class + + + + + +org.apache.maven.plugins +maven-surefire-plugin + + +${kafka.TestSuite} + + +**/TestKafkaQueries.java + + + + logback.log.dir + ${project.build.directory}/surefire-reports + + + + + + + + + + org.apache.drill.exec + drill-java-exec + ${project.version} + + --- End diff -- Why is it necessary to exclude zookeeper? If a specific version of zookeeper is required, will it be better to explicitly add zookeeper to the dependency management? > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246099#comment-16246099 ] ASF GitHub Bot commented on DRILL-4779: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150028303 --- Diff: contrib/storage-kafka/pom.xml --- @@ -0,0 +1,130 @@ + + +http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd; + xmlns="http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;> + 4.0.0 + + +drill-contrib-parent +org.apache.drill.contrib +1.12.0-SNAPSHOT + + + drill-storage-kafka + contrib/kafka-storage-plugin + + +UTF-8 +0.11.0.1 +**/KafkaTestSuit.class --- End diff -- What is the reason to define `kafka.TestSuite` property? > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246100#comment-16246100 ] ASF GitHub Bot commented on DRILL-4779: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150029170 --- Diff: contrib/storage-kafka/pom.xml --- @@ -0,0 +1,130 @@ + + +http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd; + xmlns="http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;> + 4.0.0 + + +drill-contrib-parent +org.apache.drill.contrib +1.12.0-SNAPSHOT + + + drill-storage-kafka + contrib/kafka-storage-plugin + + +UTF-8 +0.11.0.1 +**/KafkaTestSuit.class + + + + + +org.apache.maven.plugins +maven-surefire-plugin + --- End diff -- It will be better to go with the default `maven-surefire-plugin` configuration unless there is a good justification to use custom config. Most of the time this can be achieved by using default test name convention. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5948) The wrong number of batches is displayed
[ https://issues.apache.org/jira/browse/DRILL-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vlad updated DRILL-5948: Attachment: json_profile.json JSON profile of the query. > The wrong number of batches is displayed > > > Key: DRILL-5948 > URL: https://issues.apache.org/jira/browse/DRILL-5948 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Vlad > Attachments: json_profile.json > > > I suppose, when you execute a query with a small amount of data drill must > create 1 batch, but here you can see that drill created 2 batches. I think > it's a wrong behaviour for the drill. Full JSON file will be in the > attachment. > {code:html} > "fragmentProfile": [ > { > "majorFragmentId": 0, > "minorFragmentProfile": [ > { > "state": 3, > "minorFragmentId": 0, > "operatorProfile": [ > { > "inputProfile": [ > { > "records": 1, > "batches": 2, > "schemas": 1 > } > ], > "operatorId": 2, > "operatorType": 29, > "setupNanos": 0, > "processNanos": 1767363740, > "peakLocalMemoryAllocated": 639120, > "waitNanos": 25787 > }, > {code} > Step to reproduce: > # Create JSON file with 1 row > # Execute star query whith this file, for example > {code:sql} > select * from dfs.`/path/to/your/file/example.json` > {code} > # Go to the Profile page on the UI, and open info about your query > # Open JSON profile -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DRILL-5948) The wrong number of batches is displayed
Vlad created DRILL-5948: --- Summary: The wrong number of batches is displayed Key: DRILL-5948 URL: https://issues.apache.org/jira/browse/DRILL-5948 Project: Apache Drill Issue Type: Bug Affects Versions: 1.11.0 Reporter: Vlad I suppose, when you execute a query with a small amount of data drill must create 1 batch, but here you can see that drill created 2 batches. I think it's a wrong behaviour for the drill. Full JSON file will be in the attachment. {code:html} "fragmentProfile": [ { "majorFragmentId": 0, "minorFragmentProfile": [ { "state": 3, "minorFragmentId": 0, "operatorProfile": [ { "inputProfile": [ { "records": 1, "batches": 2, "schemas": 1 } ], "operatorId": 2, "operatorType": 29, "setupNanos": 0, "processNanos": 1767363740, "peakLocalMemoryAllocated": 639120, "waitNanos": 25787 }, {code} Step to reproduce: # Create JSON file with 1 row # Execute star query whith this file, for example {code:sql} select * from dfs.`/path/to/your/file/example.json` {code} # Go to the Profile page on the UI, and open info about your query # Open JSON profile -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246003#comment-16246003 ] ASF GitHub Bot commented on DRILL-4779: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150019576 --- Diff: contrib/storage-kafka/pom.xml --- @@ -0,0 +1,130 @@ + + +http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd; + xmlns="http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;> + 4.0.0 + + +drill-contrib-parent +org.apache.drill.contrib +1.12.0-SNAPSHOT + + + drill-storage-kafka + contrib/kafka-storage-plugin + + +UTF-8 --- End diff -- If the setting is necessary, it will be better to set it at the root pom. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"
[ https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245865#comment-16245865 ] ASF GitHub Bot commented on DRILL-5923: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1021#discussion_r149997750 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileUtil.java --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.server.rest.profile; + +import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState; + +import java.util.Collections; +import java.util.Map; + +import com.google.common.collect.Maps; + +public class ProfileUtil { + // Mapping query state names to display names + private static final Map<String, String> queryStateDisplayName; + + static { +Map<String, String> displayNames = Maps.newHashMap(); --- End diff -- 1. Please use `Map<QueryState, String>` since you're already receiving `QueryState` as an input parameter in the method. Besides, it would guarantee you did not make a mistake writing query state enum names. 2.
`queryStateDisplayName` -> `queryStateDisplayNames` > State of a successfully completed query shown as "COMPLETED" > > > Key: DRILL-5923 > URL: https://issues.apache.org/jira/browse/DRILL-5923 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: 1.12.0 > > > Drill UI currently lists a successfully completed query as "COMPLETED". > Successfully completed, failed and canceled queries are all grouped as > Completed queries. > It would be better to list the state of a successfully completed query as > "Succeeded" to avoid confusion. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
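The map-based alternative suggested in the review above, keyed directly by the enum, could be sketched like this. As before, the `QueryState` enum is a simplified stand-in for Drill's protobuf enum, and the names are illustrative:

```java
import java.util.EnumMap;
import java.util.Map;

public class ProfileUtilSketch {
  // Simplified stand-in for Drill's protobuf QueryState enum.
  enum QueryState { STARTING, RUNNING, COMPLETED, CANCELED, FAILED }

  // Keying by the enum (not its name) turns typos into compile errors.
  private static final Map<QueryState, String> DISPLAY_NAMES =
      new EnumMap<>(QueryState.class);
  static {
    DISPLAY_NAMES.put(QueryState.STARTING, "Starting");
    DISPLAY_NAMES.put(QueryState.RUNNING, "Running");
    DISPLAY_NAMES.put(QueryState.COMPLETED, "Succeeded");
    DISPLAY_NAMES.put(QueryState.CANCELED, "Canceled");
    DISPLAY_NAMES.put(QueryState.FAILED, "Failed");
  }

  // Single lookup plus null check; EnumMap is backed by an array
  // indexed by ordinal, so the lookup is cheap.
  static String displayNameFor(QueryState state) {
    String name = DISPLAY_NAMES.get(state);
    return name != null ? name : "Unknown State";
  }
}
```

Since `EnumMap` is array-backed, this keeps roughly the lookup cost of the array approach while gaining the compile-time key checking argued for here.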
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245883#comment-16245883 ] ASF GitHub Bot commented on DRILL-4779: --- Github user akumarb2010 commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150002516 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/DrillKafkaConfig.java --- @@ -0,0 +1,31 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.kafka; + +public class DrillKafkaConfig { + + /** + * Timeout for fetching messages from Kafka --- End diff -- Thanks Paul, this is a very good point and it perfectly makes sense to add them as Drill session options instead of Drill config properties. We are working on these changes. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implementing a Kafka storage plugin will enable strong SQL support for Kafka. 
> Initially the implementation can target support for JSON and Avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
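The reviewer's point above — preferring per-session options over compile-time config constants — can be sketched as follows. This is a minimal, self-contained illustration: the option name and the options map are hypothetical, not Drill's actual `OptionManager` API or the real Kafka plugin option names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve a Kafka fetch timeout from session options,
// falling back to a built-in default, instead of a hard-coded constant in
// DrillKafkaConfig. A user could then tune it per session (conceptually via
// something like `ALTER SESSION SET ...`) without redeploying configuration.
class KafkaOptionsSketch {
    // Stand-in for a per-session option store (illustrative only).
    static final Map<String, Long> SESSION_OPTIONS = new HashMap<>();
    static final long DEFAULT_POLL_TIMEOUT_MS = 200L;

    // The session-level override, when present, wins over the default.
    static long pollTimeoutMs() {
        return SESSION_OPTIONS.getOrDefault("store.kafka.poll.timeout",
                DEFAULT_POLL_TIMEOUT_MS);
    }
}
```

The design gain over config properties is that the value can differ per user session and be changed at runtime, which is exactly what the review comment asks for.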
[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"
[ https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245866#comment-16245866 ] ASF GitHub Bot commented on DRILL-5923: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1021#discussion_r149998367 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileUtil.java --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.server.rest.profile; + +import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState; + +import java.util.Collections; +import java.util.Map; + +import com.google.common.collect.Maps; + +public class ProfileUtil { + // Mapping query state names to display names + private static final Map<String, String> queryStateDisplayName; + + static { +Map<String, String> displayNames = Maps.newHashMap(); +displayNames.put("STARTING", "Starting"); +displayNames.put("RUNNING", "Running"); +displayNames.put("COMPLETED", "Succeeded"); +displayNames.put("CANCELED", "Canceled"); +displayNames.put("FAILED", "Failed"); +displayNames.put("CANCELLATION_REQUESTED", "Cancellation Requested"); +displayNames.put("ENQUEUED", "Enqueued"); +queryStateDisplayName = Collections.unmodifiableMap(displayNames); + } + + + /** + * Utility to return display name for query state + * @param queryState + * @return display string for query state + */ + public final static String getQueryStateDisplayName(QueryState queryState) { +String state = queryState.name(); +if (queryStateDisplayName.containsKey(state)) { --- End diff -- This would be more optimal: ``` String state = queryStateDisplayName.get(queryState.name()); if (state == null) { state = "Unknown State"; } return state; ``` > State of a successfully completed query shown as "COMPLETED" > > > Key: DRILL-5923 > URL: https://issues.apache.org/jira/browse/DRILL-5923 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: 1.12.0 > > > Drill UI currently lists a successfully completed query as "COMPLETED". > Successfully completed, failed and canceled queries are all grouped as > Completed queries. > It would be better to list the state of a successfully completed query as > "Succeeded" to avoid confusion. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
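The single-lookup pattern the reviewer suggests (one `get()` plus a null check, instead of `containsKey()` followed by `get()`) can be sketched as a self-contained snippet. Class and method names here are illustrative, keyed by plain strings rather than the real `QueryState` enum:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch of the reviewer's suggestion: a single map lookup with a null
// fallback avoids hashing the key twice, unlike containsKey() + get().
class QueryStateNames {
    private static final Map<String, String> DISPLAY_NAMES;

    static {
        Map<String, String> names = new HashMap<>();
        names.put("STARTING", "Starting");
        names.put("RUNNING", "Running");
        names.put("COMPLETED", "Succeeded");
        names.put("CANCELED", "Canceled");
        names.put("FAILED", "Failed");
        names.put("CANCELLATION_REQUESTED", "Cancellation Requested");
        names.put("ENQUEUED", "Enqueued");
        DISPLAY_NAMES = Collections.unmodifiableMap(names);
    }

    static String displayName(String state) {
        String display = DISPLAY_NAMES.get(state);   // single lookup
        return display != null ? display : "Unknown State";
    }
}
```

Note that the values in such a map cannot be null, which is what makes the null check a safe substitute for `containsKey()`.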
[jira] [Assigned] (DRILL-5717) change some date time unit cases with specific timezone or Local
[ https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-5717: --- Assignee: Arina Ielchiieva > change some date time unit cases with specific timezone or Local > > > Key: DRILL-5717 > URL: https://issues.apache.org/jira/browse/DRILL-5717 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.9.0, 1.11.0 >Reporter: weijie.tong >Assignee: Arina Ielchiieva > Labels: ready-to-commit > > Some date time test cases like JodaDateValidatorTest are not Locale > independent. This will cause the test phase to fail for users in other > Locales. We should make these test cases Locale independent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5921) Counters metrics should be listed in table
[ https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5921: Labels: ready-to-commit (was: ) > Counters metrics should be listed in table > -- > > Key: DRILL-5921 > URL: https://issues.apache.org/jira/browse/DRILL-5921 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya >Priority: Minor > Labels: ready-to-commit > Fix For: 1.12.0 > > > Counter metrics are currently displayed as a JSON string in the Drill UI. They > should be listed in a table similar to other metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (DRILL-5717) change some date time unit cases with specific timezone or Local
[ https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-5717: --- Assignee: (was: Arina Ielchiieva) > change some date time unit cases with specific timezone or Local > > > Key: DRILL-5717 > URL: https://issues.apache.org/jira/browse/DRILL-5717 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.9.0, 1.11.0 >Reporter: weijie.tong > Labels: ready-to-commit > > Some date time test cases like JodaDateValidatorTest are not Locale > independent. This will cause the test phase to fail for users in other > Locales. We should make these test cases Locale independent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5717) change some date time unit cases with specific timezone or Local
[ https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5717: Labels: ready-to-commit (was: ) > change some date time unit cases with specific timezone or Local > > > Key: DRILL-5717 > URL: https://issues.apache.org/jira/browse/DRILL-5717 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.9.0, 1.11.0 >Reporter: weijie.tong >Assignee: Arina Ielchiieva > Labels: ready-to-commit > > Some date time test cases like JodaDateValidatorTest are not Locale > independent. This will cause the test phase to fail for users in other > Locales. We should make these test cases Locale independent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5921) Counters metrics should be listed in table
[ https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245835#comment-16245835 ] ASF GitHub Bot commented on DRILL-5921: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1020 +1, LGTM. > Counters metrics should be listed in table > -- > > Key: DRILL-5921 > URL: https://issues.apache.org/jira/browse/DRILL-5921 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya >Priority: Minor > Labels: ready-to-commit > Fix For: 1.12.0 > > > Counter metrics are currently displayed as json string in the Drill UI. They > should be listed in a table similar to other metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5921) Counters metrics should be listed in table
[ https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245834#comment-16245834 ] ASF GitHub Bot commented on DRILL-5921: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1020#discussion_r149994673 --- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl --- @@ -138,21 +154,14 @@ }); }; -function updateOthers(metrics) { - $.each(["counters", "meters"], function(i, key) { -if(! $.isEmptyObject(metrics[key])) { - $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2)); -} - }); -}; - var update = function() { $.get("/status/metrics", function(metrics) { updateGauges(metrics.gauges); updateBars(metrics.gauges); if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, "timers"); if(! $.isEmptyObject(metrics.histograms)) createTable(metrics.histograms, "histograms"); -updateOthers(metrics); +if(! $.isEmptyObject(metrics.counters)) createCountersTable(metrics.counters); +if(! $.isEmptyObject(metrics.meters)) $("#metersVal").html(JSON.stringify(metrics.meters, null, 2)); --- End diff -- Well, sounds good then, thanks for making the changes. > Counters metrics should be listed in table > -- > > Key: DRILL-5921 > URL: https://issues.apache.org/jira/browse/DRILL-5921 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya >Priority: Minor > Labels: ready-to-commit > Fix For: 1.12.0 > > > Counter metrics are currently displayed as json string in the Drill UI. They > should be listed in a table similar to other metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5717) change some date time unit cases with specific timezone or Local
[ https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245836#comment-16245836 ] ASF GitHub Bot commented on DRILL-5717: --- Github user vvysotskyi commented on the issue: https://github.com/apache/drill/pull/904 @weijietong, thanks for the pull request, +1 > change some date time unit cases with specific timezone or Local > > > Key: DRILL-5717 > URL: https://issues.apache.org/jira/browse/DRILL-5717 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.9.0, 1.11.0 >Reporter: weijie.tong > > Some date time test cases like JodaDateValidatorTest are not Locale > independent. This will cause the test phase to fail for users in other > Locales. We should make these test cases Locale independent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
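The Locale-dependence problem DRILL-5717 describes can be illustrated with a self-contained sketch: formatting dates with the JVM's default Locale produces different output on, say, a Russian-locale machine, while pinning an explicit Locale keeps tests environment-independent. The class below is illustrative, not the actual JodaDateValidatorTest code (which uses Joda-Time rather than `java.time`).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Sketch of the fix direction: pin an explicit Locale instead of relying on
// the JVM default, so month abbreviations render the same on every machine.
class LocaleSafeFormatting {
    // Locale-dependent: "MMM" yields "нояб." on a Russian-locale JVM.
    static String fragile(LocalDate date) {
        return date.format(DateTimeFormatter.ofPattern("dd MMM yyyy"));
    }

    // Locale-independent: always English month abbreviations.
    static String pinned(LocalDate date) {
        return date.format(DateTimeFormatter.ofPattern("dd MMM yyyy", Locale.ENGLISH));
    }
}
```

A test asserting on `pinned(...)` passes regardless of the developer's system Locale; one asserting on `fragile(...)` only passes on English-locale machines, which is exactly the failure mode the issue reports.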
[jira] [Updated] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-5771: - Reviewer: Timothy Farkas > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when a query has several major fragments. > One way to check serde is to generate the physical plan (as JSON) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > A format plugin is created based on the format plugin configuration or its name. > On Drill start up we load information about available plugins (it's reloaded > each time a storage plugin is updated, which can be done only by an admin). > When a query is parsed, we try to get the plugin from the available ones; if we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but at other query execution stages we always assume that the [plugin exists > based on the > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create a format plugin on one node > based on the format configuration. 
> Then, when we sent a major fragment to a different node where this > format configuration was used, we could not get the format plugin based on it and > deserialization failed. > To fix this problem we need to create the format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in a hash map where the key is the format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find the format plugin based on its configuration. > 3. Named format plugin usage > Named format plugin configs allow getting a format plugin by its name for > configuration shared among all drillbits. > They are used as aliases for pre-configured format plugins. A user with admin > privileges can modify them at runtime. > Named format plugin configs are used instead of sending all non-default > parameters of the format plugin config; in this case only the name is sent. > Their usage in a distributed system may cause race conditions. > For example, > 1. A query is submitted. > 2. A Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Serialized named format plugin config with name as parquet. > 4. A major fragment is sent to a different node. > 5. An admin has changed the parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. The named format is deserialized on the different node into a parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
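Serde issue 2 above — configs used as hash-map keys without overridden `equals`/`hashCode` — can be sketched in isolation. The class name and its fields below are illustrative, not Drill's actual format plugin config classes:

```java
import java.util.Objects;

// Sketch of serde issue 2: a config object used as a HashMap key must
// override equals() and hashCode(). Without them, a deserialized config that
// is logically identical to the registered one hashes differently (identity
// semantics from Object), so the plugin lookup by configuration fails.
class TextFormatConfigSketch {
    final String extension;
    final char delimiter;

    TextFormatConfigSketch(String extension, char delimiter) {
        this.extension = extension;
        this.delimiter = delimiter;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TextFormatConfigSketch)) return false;
        TextFormatConfigSketch that = (TextFormatConfigSketch) o;
        return delimiter == that.delimiter
                && Objects.equals(extension, that.extension);
    }

    @Override
    public int hashCode() {
        return Objects.hash(extension, delimiter);
    }
}
```

With these overrides, a config reconstructed during deserialization on another node compares equal to the one the plugin registry was keyed with, so the map lookup succeeds.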
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245783#comment-16245783 ] ASF GitHub Bot commented on DRILL-5771: --- Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1014 @ilooner can you please review this? > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when a query has several major fragments. > One way to check serde is to generate the physical plan (as JSON) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > A format plugin is created based on the format plugin configuration or its name. > On Drill start up we load information about available plugins (it's reloaded > each time a storage plugin is updated, which can be done only by an admin). > When a query is parsed, we try to get the plugin from the available ones; if we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but at other query execution stages we always assume that the [plugin exists > based on the > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create a format plugin on one node > based on the format configuration. 
> Then, when we sent a major fragment to a different node where this > format configuration was used, we could not get the format plugin based on it and > deserialization failed. > To fix this problem we need to create the format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in a hash map where the key is the format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find the format plugin based on its configuration. > 3. Named format plugin usage > Named format plugin configs allow getting a format plugin by its name for > configuration shared among all drillbits. > They are used as aliases for pre-configured format plugins. A user with admin > privileges can modify them at runtime. > Named format plugin configs are used instead of sending all non-default > parameters of the format plugin config; in this case only the name is sent. > Their usage in a distributed system may cause race conditions. > For example, > 1. A query is submitted. > 2. A Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Serialized named format plugin config with name as parquet. > 4. A major fragment is sent to a different node. > 5. An admin has changed the parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. The named format is deserialized on the different node into a parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits
[ https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-5941: - Reviewer: Padma Penumarthy > Skip header / footer logic works incorrectly for Hive tables when file has > several input splits > --- > > Key: DRILL-5941 > URL: https://issues.apache.org/jira/browse/DRILL-5941 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: Future > > > *To reproduce* > 1. Create a csv file with two columns (key, value) for 329 rows, where > the first row is a header. > The data file size should be greater than the chunk size of 256 MB. Copy the > file to the distributed file system. > 2. Create table in Hive: > {noformat} > CREATE EXTERNAL TABLE `h_table`( > `key` bigint, > `value` string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'maprfs:/tmp/h_table' > TBLPROPERTIES ( > 'skip.header.line.count'='1'); > {noformat} > 3. Execute query {{select * from hive.h_table}} in Drill (query data using > the Hive plugin). The result will return fewer rows than expected. The expected result > is 328 (total count minus one row as header). > *The root cause* > Since the file is greater than the default chunk size, it's split into several > fragments, known as input splits. For example: > {noformat} > maprfs:/tmp/h_table/h_table.csv:0+268435456 > maprfs:/tmp/h_table/h_table.csv:268435457+492782112 > {noformat} > TextHiveReader is responsible for handling skip header and / or footer logic. 
> Currently Drill creates a reader [for each input > split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84] > and the skip header and / or footer logic is applied for each input split, > though ideally the above mentioned input splits should be read by one > reader, so that the skip header / footer logic is applied correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits
[ https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245778#comment-16245778 ] ASF GitHub Bot commented on DRILL-5941: --- Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1030 @ppadma can you review this? > Skip header / footer logic works incorrectly for Hive tables when file has > several input splits > --- > > Key: DRILL-5941 > URL: https://issues.apache.org/jira/browse/DRILL-5941 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: Future > > > *To reproduce* > 1. Create a csv file with two columns (key, value) for 329 rows, where > the first row is a header. > The data file size should be greater than the chunk size of 256 MB. Copy the > file to the distributed file system. > 2. Create table in Hive: > {noformat} > CREATE EXTERNAL TABLE `h_table`( > `key` bigint, > `value` string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'maprfs:/tmp/h_table' > TBLPROPERTIES ( > 'skip.header.line.count'='1'); > {noformat} > 3. Execute query {{select * from hive.h_table}} in Drill (query data using > the Hive plugin). The result will return fewer rows than expected. The expected result > is 328 (total count minus one row as header). > *The root cause* > Since the file is greater than the default chunk size, it's split into several > fragments, known as input splits. For example: > {noformat} > maprfs:/tmp/h_table/h_table.csv:0+268435456 > maprfs:/tmp/h_table/h_table.csv:268435457+492782112 > {noformat} > TextHiveReader is responsible for handling skip header and / or footer logic. 
> Currently Drill creates a reader [for each input > split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84] > and the skip header and / or footer logic is applied for each input split, > though ideally the above mentioned input splits should be read by one > reader, so that the skip header / footer logic is applied correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"
[ https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245731#comment-16245731 ] ASF GitHub Bot commented on DRILL-5923: --- Github user prasadns14 commented on a diff in the pull request: https://github.com/apache/drill/pull/1021#discussion_r149974787 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/QueryStateDisplayName.java --- @@ -0,0 +1,35 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.server.rest.profile; + +import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState; + +public class QueryStateDisplayName { + // Values should correspond to the QueryState enum in UserBitShared.proto --- End diff -- @arina-ielchiieva yes, a map will definitely make it easier to visualize the mapping. 
Made the changes > State of a successfully completed query shown as "COMPLETED" > > > Key: DRILL-5923 > URL: https://issues.apache.org/jira/browse/DRILL-5923 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: 1.12.0 > > > Drill UI currently lists a successfully completed query as "COMPLETED". > Successfully completed, failed and canceled queries are all grouped as > Completed queries. > It would be better to list the state of a successfully completed query as > "Succeeded" to avoid confusion. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5921) Counters metrics should be listed in table
[ https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245717#comment-16245717 ] ASF GitHub Bot commented on DRILL-5921: --- Github user prasadns14 commented on a diff in the pull request: https://github.com/apache/drill/pull/1020#discussion_r149972346 --- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl --- @@ -138,21 +154,14 @@ }); }; -function updateOthers(metrics) { - $.each(["counters", "meters"], function(i, key) { -if(! $.isEmptyObject(metrics[key])) { - $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2)); -} - }); -}; - var update = function() { $.get("/status/metrics", function(metrics) { updateGauges(metrics.gauges); updateBars(metrics.gauges); if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, "timers"); if(! $.isEmptyObject(metrics.histograms)) createTable(metrics.histograms, "histograms"); -updateOthers(metrics); +if(! $.isEmptyObject(metrics.counters)) createCountersTable(metrics.counters); +if(! $.isEmptyObject(metrics.meters)) $("#metersVal").html(JSON.stringify(metrics.meters, null, 2)); --- End diff -- @arina-ielchiieva, I have considered reusing existing methods before deciding to have a separate method. With the above suggestion, the table will now look as below- drill.connections.rpc.control.encrypted| {count: 0} '|' here is column delimiter. Do we want to display only the number in the second column or a key/value pair? I just wanted it to be consistent with the other metrics tables. (so I print value.count) Removed meters section. > Counters metrics should be listed in table > -- > > Key: DRILL-5921 > URL: https://issues.apache.org/jira/browse/DRILL-5921 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya >Priority: Minor > Fix For: 1.12.0 > > > Counter metrics are currently displayed as json string in the Drill UI. 
They > should be listed in a table similar to other metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits
[ https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5941: Fix Version/s: (was: 1.12.0) Future > Skip header / footer logic works incorrectly for Hive tables when file has > several input splits > --- > > Key: DRILL-5941 > URL: https://issues.apache.org/jira/browse/DRILL-5941 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: Future > > > *To reproduce* > 1. Create a csv file with two columns (key, value) for 329 rows, where > the first row is a header. > The data file size should be greater than the chunk size of 256 MB. Copy the > file to the distributed file system. > 2. Create table in Hive: > {noformat} > CREATE EXTERNAL TABLE `h_table`( > `key` bigint, > `value` string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'maprfs:/tmp/h_table' > TBLPROPERTIES ( > 'skip.header.line.count'='1'); > {noformat} > 3. Execute query {{select * from hive.h_table}} in Drill (query data using > the Hive plugin). The result will return fewer rows than expected. The expected result > is 328 (total count minus one row as header). > *The root cause* > Since the file is greater than the default chunk size, it's split into several > fragments, known as input splits. For example: > {noformat} > maprfs:/tmp/h_table/h_table.csv:0+268435456 > maprfs:/tmp/h_table/h_table.csv:268435457+492782112 > {noformat} > TextHiveReader is responsible for handling skip header and / or footer logic. 
> Currently Drill creates a reader [for each input > split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84] > and the skip header and / or footer logic is applied for each input split, > though ideally the above mentioned input splits should be read by one > reader, so that the skip header / footer logic is applied correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits
[ https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245686#comment-16245686 ] ASF GitHub Bot commented on DRILL-5941: --- GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/1030 DRILL-5941: Skip header / footer improvements for Hive storage plugin Overview: 1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941). 2. Apply skip header logic during reader initialization only once to avoid checks while reading the data (DRILL-5106). 3. Apply skip footer logic only when the footer count is more than 0; otherwise default processing is done without buffering data in a queue (DRILL-5106). Code changes: 1. AbstractReadersInitializer was introduced to factor out common logic during reader initialization. It will have three implementations: a. Default (each input split gets its own reader); b. Empty (for empty tables); c. InputSplitGroups (applied when a table has a header / footer and input splits of the same file should be processed together). 2. AbstractRecordsInspector was introduced to improve performance when the table footer count is less than or equal to 0. It will have two implementations: a. Default (records will be processed one by one without buffering); b. SkipFooter (a queue will be used to buffer the N records that should be skipped at the end of file processing). 3. Allow HiveAbstractReader to have multiple input splits. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/arina-ielchiieva/drill DRILL-5941 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1030.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1030 > Skip header / footer logic works incorrectly for Hive tables when file has > several input splits > --- > > Key: DRILL-5941 > URL: https://issues.apache.org/jira/browse/DRILL-5941 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: 1.12.0 > > > *To reproduce* > 1. Create csv file with two columns (key, value) for 329 rows, where > first row is a header. > The data file has size of should be greater than chunk size of 256 MB. Copy > file to the distributed file system. > 2. Create table in Hive: > {noformat} > CREATE EXTERNAL TABLE `h_table`( > `key` bigint, > `value` string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'maprfs:/tmp/h_table' > TBLPROPERTIES ( > 'skip.header.line.count'='1'); > {noformat} > 3. Execute query {{select * from hive.h_table}} in Drill (query data using > Hive plugin). The result will return less rows then expected. Expected result > is 328 (total count minus one row as header). > *The root cause* > Since file is greater than default chunk size, it's split into several > fragments, known as input splits. For example: > {noformat} > maprfs:/tmp/h_table/h_table.csv:0+268435456 > maprfs:/tmp/h_table/h_table.csv:268435457+492782112 > {noformat} > TextHiveReader is responsible for handling skip header and / or footer logic. 
> Currently Drill creates a reader [for each input > split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84] > and skip header and/or footer logic is applied to each input split, > though ideally the above-mentioned input splits should be read by one > reader, so that skip header / footer logic is applied correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
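The core of the fix — all input splits of one file must be handled by a single reader — amounts to grouping splits by file path before readers are created. A minimal sketch under that assumption (`SplitGrouper` is a hypothetical name, not the actual InputSplitGroups implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group input-split identifiers by file path so one
 *  reader can process all splits of a file in order. */
class SplitGrouper {
    // A split is encoded as "path:start+length", as in the JIRA example.
    static Map<String, List<String>> groupByFile(List<String> splits) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String split : splits) {
            // The last ':' separates the file path from the byte range.
            String path = split.substring(0, split.lastIndexOf(':'));
            groups.computeIfAbsent(path, k -> new ArrayList<>()).add(split);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = groupByFile(Arrays.asList(
            "maprfs:/tmp/h_table/h_table.csv:0+268435456",
            "maprfs:/tmp/h_table/h_table.csv:268435457+492782112"));
        // Both splits map to the same file, so one reader would handle them
        // back to back and skip exactly one header row for the whole file.
        System.out.println(groups.size()); // 1
    }
}
```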
[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing
[ https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5919: Description: Add session options to allow Drill to work with non-standard JSON number literals such as: NaN, Infinity, -Infinity. By default these options will be switched off; the user will be able to toggle them during the working session. *For documentation* 1. Added two session options {{store.json.reader.non_numeric_numbers}} and {{store.json.writer.non_numeric_numbers}} that allow reading/writing NaN and Infinity as numbers. By default these options are set to false. 2. Extended the signature of the {{convert_toJSON}} and {{convert_fromJSON}} functions by adding a second optional parameter that enables reading/writing NaN and Infinity. For example: {noformat} select convert_fromJSON('{"key": NaN}') from (values(1)); will result in a JsonParseException, but select convert_fromJSON('{"key": NaN}', true) from (values(1)); will parse NaN as a number. {noformat} was:Add session options to allow drill working with non standard json strings number literals like: NaN, Infinity, -Infinity. By default these options will be switched off, the user will be able to toggle them during working session. > Add non-numeric support for JSON processing > --- > > Key: DRILL-5919 > URL: https://issues.apache.org/jira/browse/DRILL-5919 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.11.0 >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach > Labels: doc-impacting, ready-to-commit > Fix For: Future > > > Add session options to allow Drill to work with non-standard JSON number > literals such as: NaN, Infinity, -Infinity. By default these options will > be switched off; the user will be able to toggle them during the working session. > *For documentation* > 1. 
Added two session options {{store.json.reader.non_numeric_numbers}} and > {{store.json.writer.non_numeric_numbers}} that allow reading/writing NaN and > Infinity as numbers. By default these options are set to false. > 2. Extended the signature of the {{convert_toJSON}} and {{convert_fromJSON}} > functions by adding a second optional parameter that enables reading/writing NaN and > Infinity. > For example: > {noformat} > select convert_fromJSON('{"key": NaN}') from (values(1)); will result in a > JsonParseException, but > select convert_fromJSON('{"key": NaN}', true) from (values(1)); will parse > NaN as a number. > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
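As a rough illustration of what the reader option gates (the real Drill reader is Jackson-based; `NonNumericNumbers` below is a hypothetical stand-in): Java's `Double.parseDouble` already accepts the three special literals, so the option only decides whether they are tolerated or rejected.

```java
/** Minimal sketch of the option described above. The class name and the
 *  exception message are assumptions; the option name mirrors the issue's
 *  store.json.reader.non_numeric_numbers. */
class NonNumericNumbers {
    static double parseNumber(String token, boolean allowNonNumeric) {
        boolean special = token.equals("NaN")
            || token.equals("Infinity")
            || token.equals("-Infinity");
        if (special && !allowNonNumeric) {
            // Standard JSON forbids these literals, so reject by default.
            throw new IllegalArgumentException(
                "Error parsing JSON: non-numeric literal " + token);
        }
        // Double.parseDouble understands NaN, Infinity and -Infinity.
        return Double.parseDouble(token);
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("NaN", true));      // NaN
        System.out.println(parseNumber("Infinity", true)); // Infinity
        System.out.println(parseNumber("1.5", false));     // 1.5
    }
}
```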
[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing
[ https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5919: Labels: doc-impacting ready-to-commit (was: doc-impacting) > Add non-numeric support for JSON processing > --- > > Key: DRILL-5919 > URL: https://issues.apache.org/jira/browse/DRILL-5919 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.11.0 >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach > Labels: doc-impacting, ready-to-commit > Fix For: Future > > > Add session options to allow drill working with non standard json strings > number literals like: NaN, Infinity, -Infinity. By default these options will > be switched off, the user will be able to toggle them during working session. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5919) Add non-numeric support for JSON processing
[ https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245650#comment-16245650 ] ASF GitHub Bot commented on DRILL-5919: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1026 Thanks, +1, LGTM. > Add non-numeric support for JSON processing > --- > > Key: DRILL-5919 > URL: https://issues.apache.org/jira/browse/DRILL-5919 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.11.0 >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach > Labels: doc-impacting > Fix For: Future > > > Add session options to allow drill working with non standard json strings > number literals like: NaN, Infinity, -Infinity. By default these options will > be switched off, the user will be able to toggle them during working session. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5717) change some date time unit cases with specific timezone or Local
[ https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245620#comment-16245620 ] ASF GitHub Bot commented on DRILL-5717: --- Github user weijietong commented on the issue: https://github.com/apache/drill/pull/904 done > change some date time unit cases with specific timezone or Local > > > Key: DRILL-5717 > URL: https://issues.apache.org/jira/browse/DRILL-5717 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.9.0, 1.11.0 >Reporter: weijie.tong > > Some date time test cases, like JodaDateValidatorTest, are not > locale-independent. This causes the test phase to fail for users in other > locales. We should make these test cases independent of the local environment. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
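A small illustration of the underlying problem (a hypothetical example, not the JodaDateValidatorTest code itself): formatting the same instant under different locales yields different strings, so a test that asserts on default-locale output passes in one environment and fails in another.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;

/** Shows why date-formatting tests must pin an explicit Locale. */
class LocalePitfall {
    static String monthAbbrev(Locale locale) {
        Calendar cal = Calendar.getInstance();
        cal.set(2017, Calendar.JANUARY, 15);
        // "MMM" expands to the locale's month abbreviation.
        return new SimpleDateFormat("MMM", locale).format(cal.getTime());
    }

    public static void main(String[] args) {
        // Same instant, different output per locale; a test comparing against
        // a hard-coded English string breaks on a French-locale machine.
        System.out.println(monthAbbrev(Locale.ENGLISH)); // Jan
        System.out.println(monthAbbrev(Locale.FRENCH));
    }
}
```

Pinning `Locale.ENGLISH` (or the test's expected locale) in the formatter is the usual fix.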
[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite master branch
[ https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245614#comment-16245614 ] Roman Kulyk commented on DRILL-3993: There are 9 errors left in the java-exec test suite: {noformat} TestUnionDistinct.testDiffDataTypesAndModes:288->BaseTestQuery.testRunAndReturn:360 » Rpc TestUnionAll.testDiffDataTypesAndModes:272->BaseTestQuery.testRunAndReturn:360 » Rpc TestFunctionsWithTypeExpoQueries.testEqualBetweenIntervalAndTimestampDiff:403->BaseTestQuery.testRunAndReturn:360 » TestExampleQueries.testDRILL_3004:1036->BaseTestQuery.testRunAndReturn:360 » Rpc TestExampleQueries.testFilterInSubqueryAndOutside » UserRemote DATA_READ ERROR... TestNestedLoopJoin.testNLJWithEmptyBatch:229->BaseTestQuery.testRunAndReturn:360 » Rpc TestSqlBracketlessSyntax.checkComplexExpressionParsing:54 » NoClassDefFound co... TestDateTruncFunctions.dateTruncOnIntervalDay:301->BaseTestQuery.testRunAndReturn:360 » Rpc TestUtf8SupportInQueryString.testDisableUtf8SupportInQueryString » Unexpected... {noformat} > Rebase Drill on Calcite master branch > - > > Key: DRILL-3993 > URL: https://issues.apache.org/jira/browse/DRILL-3993 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 >Reporter: Sudheesh Katkam >Assignee: Roman Kulyk > > Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure > there are no regressions. > Also, how do we resolve this 'catching up' issue in the long term? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5919) Add non-numeric support for JSON processing
[ https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245416#comment-16245416 ] ASF GitHub Bot commented on DRILL-5919: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1026#discussion_r149903182
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestJsonNonNumerics.java ---
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.vector.complex.writer;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.BaseTestQuery;
+import org.apache.drill.common.exceptions.UserRemoteException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.RecordBatchLoader;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.rpc.user.QueryDataBatch;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.List;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.*;
+
+public class TestJsonNonNumerics extends BaseTestQuery {
+
+  @Test
+  public void testNonNumericSelect() throws Exception {
+    File file = new File(getTempDir(""), "nan_test.json");
+    String json = "{\"nan\":NaN, \"inf\":Infinity}";
+    String query = String.format("select * from dfs.`%s`", file.getAbsolutePath());
+    try {
+      FileUtils.writeStringToFile(file, json);
+      test("alter session set `store.json.reader.non_numeric_numbers` = true");
+      testBuilder()
+        .sqlQuery(query)
+        .unOrdered()
+        .baselineColumns("nan", "inf")
+        .baselineValues(Double.NaN, Double.POSITIVE_INFINITY)
+        .build()
+        .run();
+    } finally {
+      test("alter session reset `store.json.reader.non_numeric_numbers`");
+      FileUtils.deleteQuietly(file);
+    }
+  }
+
+  @Test(expected = UserRemoteException.class)
+  public void testNonNumericFailure() throws Exception {
+    File file = new File(getTempDir(""), "nan_test.json");
+    test("alter session set `store.json.reader.non_numeric_numbers` = false");
+    String json = "{\"nan\":NaN, \"inf\":Infinity}";
+    try {
+      FileUtils.writeStringToFile(file, json);
+      test("select * from dfs.`%s`;", file.getAbsolutePath());
+    } catch (UserRemoteException e) {
+      assertThat(e.getMessage(), containsString("Error parsing JSON"));
+      throw e;
+    } finally {
+      test("alter session reset `store.json.reader.non_numeric_numbers`");
+      FileUtils.deleteQuietly(file);
+    }
+  }
+
+  @Test
+  public void testCreateTableNonNumerics() throws Exception {
+    File file = new File(getTempDir(""), "nan_test.json");
+    String json = "{\"nan\":NaN, \"inf\":Infinity}";
+    String tableName = "ctas_test";
+    try {
+      FileUtils.writeStringToFile(file, json);
+      test("alter session set `store.json.reader.non_numeric_numbers` = true");
+      test("alter session set `store.json.writer.non_numeric_numbers` = true");
+      test("alter session set `store.format`='json'");
+      test("create table dfs_test.tmp.`%s` as select * from dfs.`%s`;", tableName, file.getAbsolutePath());
+
+      // ensuring that `NaN` and `Infinity` tokens ARE NOT enclosed with double quotes
+      File resultFile = new File(new File(getDfsTestTmpSchemaLocation(), tableName), "0_0_0.json");
+      String resultJson = FileUtils.readFileToString(resultFile);
+      int nanIndex = resultJson.indexOf("NaN");
+      assertFalse("`NaN` must not be enclosed with \"\" ", resultJson.charAt(nanIndex - 1) == '"');
+      assertFalse("`NaN` must not be enclosed with \"\" ",
[jira] [Commented] (DRILL-5919) Add non-numeric support for JSON processing
[ https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245417#comment-16245417 ] ASF GitHub Bot commented on DRILL-5919: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1026#discussion_r149903705
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestJsonNonNumerics.java ---
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.vector.complex.writer;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.BaseTestQuery;
+import org.apache.drill.common.exceptions.UserRemoteException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.RecordBatchLoader;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.rpc.user.QueryDataBatch;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.List;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.*;
+
+public class TestJsonNonNumerics extends BaseTestQuery {
+
+  @Test
+  public void testNonNumericSelect() throws Exception {
+    File file = new File(getTempDir(""), "nan_test.json");
--- End diff --
It's better to pass a directory name as well, rather than an empty string. Ex: `getTempDir("test_nan")` > Add non-numeric support for JSON processing > --- > > Key: DRILL-5919 > URL: https://issues.apache.org/jira/browse/DRILL-5919 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.11.0 >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach > Labels: doc-impacting > Fix For: Future > > > Add session options to allow drill working with non standard json strings > number literals like: NaN, Infinity, -Infinity. By default these options will > be switched off, the user will be able to toggle them during working session. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing
[ https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5919: Labels: doc-impacting (was: ) > Add non-numeric support for JSON processing > --- > > Key: DRILL-5919 > URL: https://issues.apache.org/jira/browse/DRILL-5919 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.11.0 >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach > Labels: doc-impacting > Fix For: Future > > > Add session options to allow drill working with non standard json strings > number literals like: NaN, Infinity, -Infinity. By default these options will > be switched off, the user will be able to toggle them during working session. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing
[ https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5919: Reviewer: Arina Ielchiieva > Add non-numeric support for JSON processing > --- > > Key: DRILL-5919 > URL: https://issues.apache.org/jira/browse/DRILL-5919 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.11.0 >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach > Labels: doc-impacting > Fix For: Future > > > Add session options to allow drill working with non standard json strings > number literals like: NaN, Infinity, -Infinity. By default these options will > be switched off, the user will be able to toggle them during working session. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (DRILL-5863) Sortable table incorrectly sorts minor fragments and time elements lexically instead of sorting by implicit value
[ https://issues.apache.org/jira/browse/DRILL-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-5863: --- Assignee: Kunal Khatua (was: Arina Ielchiieva) > Sortable table incorrectly sorts minor fragments and time elements lexically > instead of sorting by implicit value > - > > Key: DRILL-5863 > URL: https://issues.apache.org/jira/browse/DRILL-5863 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Minor > Labels: ready-to-commit > Fix For: 1.12.0 > > > The fix for this is to use dataTable library's {{data-order}} attribute for > the data elements that need to sort by an implicit value. > ||Old order of Minor Fragment||New order of Minor Fragment|| > |...|...| > |01-09-01 | 01-09-01| > |01-10-01 | 01-10-01| > |01-100-01 | 01-11-01| > |01-101-01 | 01-12-01| > |... | ... | > ||Old order of Duration||New order of Duration||| > |...|...| > |1m15s | 55.03s| > |55s | 1m15s| > |...|...| -- This message was sent by Atlassian JIRA (v6.4.14#64029)
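The misordering above happens because the cells are compared as strings; the `data-order` fix attaches a numeric value to each cell instead. A sketch of what such an implicit value could look like for durations (an illustrative parser, not the actual Web UI code):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Shows why lexical sorting misorders durations like "1m15s" vs "55.03s",
 *  and the numeric value a data-order attribute would carry instead. */
class DurationSort {
    private static final Pattern TOKEN = Pattern.compile("([0-9.]+)([hms])");

    /** Convert "1m15s"-style durations to seconds, e.g. 1m15s -> 75.0. */
    static double seconds(String duration) {
        double total = 0;
        Matcher m = TOKEN.matcher(duration);
        while (m.find()) {
            double v = Double.parseDouble(m.group(1));
            switch (m.group(2)) {
                case "h": total += v * 3600; break;
                case "m": total += v * 60;   break;
                default:  total += v;        // seconds
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] lexical = {"1m15s", "55.03s"};
        Arrays.sort(lexical); // '1' < '5', so "1m15s" sorts first: wrong
        System.out.println(Arrays.toString(lexical)); // [1m15s, 55.03s]

        String[] byValue = {"1m15s", "55.03s"};
        Arrays.sort(byValue, Comparator.comparingDouble(DurationSort::seconds));
        // Matches the corrected table order: 55.03s before 1m15s.
        System.out.println(Arrays.toString(byValue)); // [55.03s, 1m15s]
    }
}
```

The same idea applies to minor fragment IDs like "01-100-01": sort by the parsed numeric components, not the string.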
[jira] [Reopened] (DRILL-5863) Sortable table incorrectly sorts minor fragments and time elements lexically instead of sorting by implicit value
[ https://issues.apache.org/jira/browse/DRILL-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reopened DRILL-5863: - Assignee: Arina Ielchiieva (was: Kunal Khatua) > Sortable table incorrectly sorts minor fragments and time elements lexically > instead of sorting by implicit value > - > > Key: DRILL-5863 > URL: https://issues.apache.org/jira/browse/DRILL-5863 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Kunal Khatua >Assignee: Arina Ielchiieva >Priority: Minor > Labels: ready-to-commit > Fix For: 1.12.0 > > > The fix for this is to use dataTable library's {{data-order}} attribute for > the data elements that need to sort by an implicit value. > ||Old order of Minor Fragment||New order of Minor Fragment|| > |...|...| > |01-09-01 | 01-09-01| > |01-10-01 | 01-10-01| > |01-100-01 | 01-11-01| > |01-101-01 | 01-12-01| > |... | ... | > ||Old order of Duration||New order of Duration||| > |...|...| > |1m15s | 55.03s| > |55s | 1m15s| > |...|...| -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245378#comment-16245378 ] ASF GitHub Bot commented on DRILL-4779: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r149893459 --- Diff: contrib/storage-kafka/src/test/java/org/apache/drill/exec/store/kafka/cluster/EmbeddedZKQuorum.java --- @@ -0,0 +1,83 @@ +/** --- End diff -- The Apache header should be in the form of a comment, not Javadoc. Please update it here and in the other newly added files. Hopefully somebody will add this to checkstyle so we won't have to point it out every time. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implementing a Kafka storage plugin will enable strong SQL support for Kafka. > The initial implementation can target support for json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245379#comment-16245379 ] ASF GitHub Bot commented on DRILL-4779: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r149893582 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java --- @@ -343,4 +343,4 @@ public void close() { } } } -} +} --- End diff -- Please revert changes in this file. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245377#comment-16245377 ] ASF GitHub Bot commented on DRILL-4779: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r149893057 --- Diff: contrib/storage-kafka/src/test/resources/logback-test.xml --- @@ -0,0 +1,51 @@ + --- End diff -- Please remove. Now we have common logging configuration for all in drill-common module. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5921) Counters metrics should be listed in table
[ https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245358#comment-16245358 ] ASF GitHub Bot commented on DRILL-5921: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1020#discussion_r149891566
--- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl ---
@@ -138,21 +154,14 @@
   });
 };

-function updateOthers(metrics) {
-  $.each(["counters", "meters"], function(i, key) {
-    if(! $.isEmptyObject(metrics[key])) {
-      $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2));
-    }
-  });
-};
-
 var update = function() {
   $.get("/status/metrics", function(metrics) {
     updateGauges(metrics.gauges);
     updateBars(metrics.gauges);
     if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, "timers");
     if(! $.isEmptyObject(metrics.histograms)) createTable(metrics.histograms, "histograms");
-    updateOthers(metrics);
+    if(! $.isEmptyObject(metrics.counters)) createCountersTable(metrics.counters);
+    if(! $.isEmptyObject(metrics.meters)) $("#metersVal").html(JSON.stringify(metrics.meters, null, 2));
--- End diff --
@prasadns14 1. Thanks for adding the screenshots. 2. Most of the code in `createTable` and `createCountersTable` coincides, so I suggest making one function, for example with three parameters: `createTable(metric, name, addReportingClass)`. When you don't need to add the reporting class, call the method with false. Our goal here is to generify the existing method rather than add a new one with almost the same content. 3. If we don't have any meters, let's remove them. 
> Counters metrics should be listed in table > -- > > Key: DRILL-5921 > URL: https://issues.apache.org/jira/browse/DRILL-5921 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya >Priority: Minor > Fix For: 1.12.0 > > > Counter metrics are currently displayed as a JSON string in the Drill UI. They > should be listed in a table, similar to the other metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)