[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247062#comment-16247062
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150160362
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

Ok, so I think I see how you've been trying to help me test the server-side 
timeout.

You are hoping to have a unit test force `awaitFirstMessage()` to throw the 
exception by preventing the server from sending back any batch of data, since 
the sample test data doesn't allow any query to run sufficiently long. In all 
the current tests I've added, the data has essentially already been delivered 
from the 'Drill Server' to the 'DrillClient', but the downstream application 
has not consumed it.

Your suggestion of putting a `pause` before the `execute()` call got me 
thinking that the timer had already begun after Statement initialization. My 
understanding now is that you're simply asking to block the SCREEN operator 
from sending back any batches, so that the DrillCursor times out waiting for 
the first batch. In fact, I'm thinking I don't even need a pause: the 
DrillCursor keeps waiting for something from the SCREEN operator that never 
comes and eventually times out.

However, since the controls injection essentially applies to the Connection 
(via `alter session ...`), any other unit tests executing in parallel on the 
same connection would be affected by it. So I would also need to undo the 
injection at the end of the test if the connection is reused, or fork off a 
connection exclusively for this test.

Was that what you've been suggesting all along?
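For reference, a minimal sketch of the flow I have in mind (the controls 
option name and the injection JSON are placeholders/assumptions rather than 
verified API, and the query and timeout values are arbitrary):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLTimeoutException;
import java.sql.Statement;

public class ScreenPauseTimeoutSketch {
  public static void main(String[] args) throws Exception {
    // fork a dedicated connection so the injection cannot leak into other tests
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement session = conn.createStatement()) {
      // hypothetical injection JSON that pauses the Screen operator
      session.execute(
          "ALTER SESSION SET `drill.exec.testing.controls` = '<pause-screen-json>'");
      try (Statement stmt = conn.createStatement()) {
        stmt.setQueryTimeout(2);
        // no first batch ever arrives, so the DrillCursor should time out
        stmt.executeQuery("SELECT * FROM cp.`employee.json`");
        throw new AssertionError("expected a timeout");
      } catch (SQLTimeoutException expected) {
        // DrillCursor timed out awaiting the first message, as intended
      } finally {
        // undo the injection in case the connection is reused
        session.execute("ALTER SESSION RESET `drill.exec.testing.controls`");
      }
    }
  }
}
```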


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.12.0
>
>
> It would be nice if we have this implemented. Run away queries can be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247043#comment-16247043
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150157923
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

I don't think you are wrong, but I think the interpretation of the timeout 
is ambiguous. My understanding, based on what drivers like Oracle's do, is to 
start the timeout only when the execute call is made. So, for a regular 
Statement object, just initialization (or even setting the timeout) should not 
be the basis for starting the timer. 
With regard to whether we are testing for the time when only the 
DrillCursor is in operation, we'd need a query that runs sufficiently 
long to time out before the server can send back anything for the very first 
time. `awaitFirstMessage()` already has the timeout applied there, and it 
worked in some of my longer-running sample queries. If you're hinting towards 
this, then yes.. it certainly doesn't hurt to have the test, although the 
timeout already guarantees exactly that.
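
To make the intended contract concrete, a small sketch of those semantics 
(standard JDBC; the connection URL and query are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TimeoutStartsAtExecute {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement()) {
      stmt.setQueryTimeout(5);  // no timer is running yet
      // the timer starts here, at execute(), matching Oracle-style semantics
      try (ResultSet rs = stmt.executeQuery("SELECT * FROM cp.`employee.json`")) {
        while (rs.next()) {
          // a fetch pushing elapsed time past 5 s raises SQLTimeoutException
        }
      }
    }
  }
}
```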

I'm not familiar with the Drillbit Injection feature, so let me tinker a 
bit to confirm it before I update the PR.


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.12.0
>
>
> It would be nice if we have this implemented. Run away queries can be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247040#comment-16247040
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/889
  
@amansinha100 thanks for sharing the information. Got your point. I think 
your proposal on 
[CALCITE-1048](https://issues.apache.org/jira/browse/CALCITE-1048) is feasible. 
Since [CALCITE-794](https://issues.apache.org/jira/browse/CALCITE-794) was 
completed in version 1.6, there seems to be a cleaner solution (take the 
least max row count over all the rels of the RelSubset). But since Drill's 
Calcite version is still based on 1.4, I support your current temporary 
solution. I only wonder whether the explicitly searched RelNode's (such as 
DrillAggregateRel) maxRowCount can represent the best RelNode's maxRowCount?
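
For context, a sketch of the Calcite 1.6+ metadata call in question (the 
exact accessor shape varies across Calcite versions, so treat this as an 
assumption; it is not available on Drill's Calcite-1.4 fork):

```java
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

class MaxRowCountProbe {
  // Calcite 1.6+ exposes max row count metadata (RelMdMaxRowCount); null
  // means unknown/unbounded, while <= 1 means the rel can produce at most
  // one row, e.g. a global aggregate.
  static boolean producesAtMostOneRow(RelMetadataQuery mq, RelNode rel) {
    Double maxRowCount = mq.getMaxRowCount(rel);
    return maxRowCount != null && maxRowCount <= 1.0d;
  }
}
```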


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once users issue count distinct queries, they will be translated to query 
> the cache to get the result.
> eg1: " select count(*),sum(a) ,count(distinct b)  from t where dt=xx " 
> eg2:"select count(*),sum(a) ,count(distinct b) ,count(distinct c) from t 
> where dt=xxx "
> eg3:"select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 will be right and have a query result as I expected, but eg2 will be 
> wrong at the physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246967#comment-16246967
 ] 

ASF GitHub Bot commented on DRILL-5923:
---

Github user prasadns14 commented on the issue:

https://github.com/apache/drill/pull/1021
  
@arina-ielchiieva, @paul-rogers 
Reverted to the array approach, also added documentation.


> State of a successfully completed query shown as "COMPLETED"
> 
>
> Key: DRILL-5923
> URL: https://issues.apache.org/jira/browse/DRILL-5923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
> Fix For: 1.12.0
>
>
> Drill UI currently lists a successfully completed query as "COMPLETED". 
> Successfully completed, failed and canceled queries are all grouped as 
> Completed queries. 
> It would be better to list the state of a successfully completed query as 
> "Succeeded" to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5917) Ban org.json:json library in Drill

2017-11-09 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-5917:
--
Reviewer: Arina Ielchiieva

> Ban org.json:json library in Drill
> --
>
> Key: DRILL-5917
> URL: https://issues.apache.org/jira/browse/DRILL-5917
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Vlad Rozov
> Fix For: 1.12.0
>
>
> Apache Drill has dependencies on json.org lib indirectly from two libraries:
> com.mapr.hadoop:maprfs:jar:5.2.1-mapr
> com.mapr.fs:mapr-hbase:jar:5.2.1-mapr
> {noformat}
> [INFO] org.apache.drill.contrib:drill-format-mapr:jar:1.12.0-SNAPSHOT
> [INFO] +- com.mapr.hadoop:maprfs:jar:5.2.1-mapr:compile
> [INFO] |  \- org.json:json:jar:20080701:compile
> [INFO] \- com.mapr.fs:mapr-hbase:jar:5.2.1-mapr:compile
> [INFO]\- (org.json:json:jar:20080701:compile - omitted for duplicate)
> {noformat}
> We need to make sure we don't have any dependencies from these libs on the 
> json.org lib, and ban this lib in the main pom.xml file.
> The issue is critical since an Apache release won't happen until we make sure 
> the json.org lib is not used (https://www.apache.org/legal/resolved.html).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5926) TestValueVector tests fail sporadically

2017-11-09 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246936#comment-16246936
 ] 

Vlad Rozov commented on DRILL-5926:
---

It may be OK to increase MaxDirectMemorySize from 3GB to 4GB as a short-term 
workaround to avoid unit test failures for unrelated PRs. In the long term, it 
is necessary to investigate whether memory can be reclaimed from the Pooled 
Allocator and whether the tests indeed require more than 3 GB of memory.

[~timothyfarkas] Can you create a separate PR for the workaround?

> TestValueVector tests fail sporadically
> ---
>
> Key: DRILL-5926
> URL: https://issues.apache.org/jira/browse/DRILL-5926
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Trivial
>
> As reported by [~Paul.Rogers]. The following tests fail sporadically with out 
> of memory exception:
> * TestValueVector.testFixedVectorReallocation
> * TestValueVector.testVariableVectorReallocation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5943) Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism

2017-11-09 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-5943:
-
  Labels: ready-to-commit  (was: )
Reviewer: Laurent Goujon  (was: Parth Chandra)

> Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism
> ---
>
> Key: DRILL-5943
> URL: https://issues.apache.org/jira/browse/DRILL-5943
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> For the PLAIN mechanism we will weaken the strong check introduced with 
> DRILL-5582 to keep forward compatibility between a Drill 1.12 client and a 
> Drill 1.9 server. This is fine since, with or without this strong check, the 
> PLAIN mechanism is still vulnerable to MITM during the handshake itself, 
> unlike mutual authentication protocols such as Kerberos.
> Also, to keep forward compatibility with respect to SASL, we will treat 
> UNKNOWN_SASL_SUPPORT as a valid value. For a handshake message received from a 
> client running on a later version (say 1.13) than the Drillbit (1.12) and 
> carrying a new value for the SaslSupport field which is unknown to the server, 
> this field will be decoded as UNKNOWN_SASL_SUPPORT. In this scenario the 
> client will be treated as one aware of the SASL protocol, but the server 
> doesn't know the exact capabilities of the client. Hence the SASL handshake 
> will still be required from the server side.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246879#comment-16246879
 ] 

ASF GitHub Bot commented on DRILL-5771:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1014
  
Not sure the description here is entirely correct. Let's separate two 
concepts: the plugin (code) and the plugin definition (the stuff in JSON.)

Plugin definitions are stored in ZK and retrieved by the Foreman. There may 
be some form of race condition in the Foreman, but that's not my focus here.

The plugin *definition* is read by the Foreman and serialized into the 
physical plan. Each worker reads the definition from the physical plan. For 
this reason, the worker's definition can never be out of date: it is the 
definition that was used when serializing the plan.

Further, Drill allows table functions which provide query-time name/value 
pair settings for format plugin properties. The only way these can work is to 
be serialized along with the query. So, the actual serialized format plugin 
definition, included with the query, includes both the ZK information and the 
table function information.
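
As a concrete illustration of such a table function (a sketch: the path and 
option values are invented, but the `table(...)` name/value syntax is the text 
plugin's existing query-time override mechanism):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TableFunctionExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement()) {
      // query-time name/value pairs for the format plugin; they travel with
      // the serialized plan alongside the ZK plugin definition
      String sql = "SELECT * FROM table(dfs.`/tmp/example.csv`("
          + "type => 'text', fieldDelimiter => ',', extractHeader => true))";
      try (ResultSet rs = stmt.executeQuery(sql)) {
        while (rs.next()) {
          System.out.println(rs.getObject(1));
        }
      }
    }
  }
}
```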


> Fix serDe errors for format plugins
> ---
>
> Key: DRILL-5771
> URL: https://issues.apache.org/jira/browse/DRILL-5771
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.12.0
>
>
> Create unit tests to check that all storage format plugins can be 
> successfully serialized  / deserialized.
> Usually this happens when query has several major fragments. 
> One way to check serde is to generate physical plan (generated as json) and 
> then submit it back to Drill.
> One example of found errors is described in the first comment. Another 
> example is described in DRILL-5166.
> *Serde issues:*
> 1. Could not obtain format plugin during deserialization
> Format plugin is created based on the format plugin configuration or its name. 
> On Drill start-up we load information about available plugins (it's reloaded 
> each time a storage plugin is updated, which can be done only by an admin).
> When a query is parsed, we try to get the plugin from the available ones; if 
> we cannot find one, we try to [create 
> one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144]
> but at other query execution stages we always assume that the [plugin exists 
> based on its 
> configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].
> For example, during query parsing we had to create a format plugin on one node 
> based on the format configuration.
> Then, when a major fragment was sent to a different node that used this 
> format configuration, we could not get the format plugin based on it, and 
> deserialization failed.
> To fix this problem we need to create the format plugin during query 
> deserialization if it's absent.
>   
> 2. Absent hash code and equals.
> Format plugins are stored in a hash map where the key is the format plugin 
> config. Since some format plugin configs did not override hashCode and 
> equals, we could not find the format plugin based on its configuration.
> 3. Named format plugin usage
> Named format plugin configs allow getting a format plugin by its name, for 
> configuration shared among all drillbits.
> They are used as aliases for pre-configured format plugins. A user with admin 
> privileges can modify them at runtime.
> Named format plugin configs are used instead of sending all non-default 
> parameters of a format plugin config; in this case only the name is sent.
> Their usage in a distributed system may cause race conditions.
> For example, 
> 1. Query is submitted. 
> 2. Parquet format plugin is created with the following configuration 
> (autoCorrectCorruptDates=>true).
> 3. Serialized named format plugin config with name as parquet.
> 4. Major fragment is sent to a different node.
> 5. Admin has changed the parquet configuration for the alias 'parquet' on all 
> nodes to autoCorrectCorruptDates=>false.
> 6. Named format is deserialized on the different node into a parquet format 
> plugin with configuration (autoCorrectCorruptDates=>false).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5936) Refactor MergingRecordBatch based on code review

2017-11-09 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246876#comment-16246876
 ] 

Vlad Rozov commented on DRILL-5936:
---

[~amansinha100] It is based on my code walkthrough completed as part of 
exchange operator analysis. The PR is mostly self-explanatory and the goal is 
to address two deficiencies mentioned in the JIRA description. 

> Refactor MergingRecordBatch based on code review
> 
>
> Key: DRILL-5936
> URL: https://issues.apache.org/jira/browse/DRILL-5936
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> * Reorganize code to remove unnecessary {{pqueue.peek()}}
> * Reuse Node



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5936) Refactor MergingRecordBatch based on code review

2017-11-09 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246872#comment-16246872
 ] 

Aman Sinha commented on DRILL-5936:
---

[~vrozov] the title says 'based on code review'..which code review are you 
referring to ?  can you point me to the other JIRA or PR ?  thanks. 

> Refactor MergingRecordBatch based on code review
> 
>
> Key: DRILL-5936
> URL: https://issues.apache.org/jira/browse/DRILL-5936
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> * Reorganize code to remove unnecessary {{pqueue.peek()}}
> * Reuse Node



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5950) Allow JSON files to be splittable - for sequence of objects format

2017-11-09 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5950:
--

 Summary: Allow JSON files to be splittable - for sequence of 
objects format
 Key: DRILL-5950
 URL: https://issues.apache.org/jira/browse/DRILL-5950
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Paul Rogers


The JSON format plugin is not currently splittable. This means that every JSON 
file must be read by a single thread. By contrast, text files are 
splittable.

The key barrier to allowing JSON files to be splittable is the lack of a good 
mechanism to find the start of a record at some arbitrary point in the file. 
Text readers handle this by scanning forward looking for (say) the newline that 
separates records. (Though this process can be thrown off if a newline appears 
in a quoted value, and the start quote appears before the split point.)

However, as was discovered in a previous JSON fix, Drill's form of JSON does 
provide the tools. In standard JSON, a list of records must be structured as a 
list:

{code}
[ { text: "first record"},
  { text: "second record"},
  ...
  { text: "final record" }
]
{code}

In this form, it is impossible to find the start of a record without parsing 
from the first character onwards.

But, Drill uses a common, but non-standard, JSON structure that dispenses with 
the array and the commas between records:

{code}
{ text: "first record" }
{ text: "second record" }
...
{ text: "last record" }
{code}

This form does unambiguously allow finding the start of a record. Simply scan 
until we find a "}" token followed by a "{" token, possibly separated by white 
space. That sequence is not valid JSON and only occurs between records in the 
sequence-of-records format.
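
A minimal sketch of that scan (an illustration under stated assumptions: it 
reads single bytes, and it ignores the possibility of "} {" appearing inside a 
quoted string value, which a real split reader would need to handle, much like 
the quoted-newline caveat for text files above):

{code}
import java.io.IOException;
import java.io.RandomAccessFile;

public class JsonSplitScanner {
  // Returns the offset of the first record start at or after splitOffset,
  // or -1 if no boundary is found before EOF.
  public static long findRecordStart(RandomAccessFile file, long splitOffset)
      throws IOException {
    file.seek(splitOffset);
    boolean sawClose = false;
    long pos = splitOffset;
    int b;
    while ((b = file.read()) != -1) {
      pos++;
      if (b == '}') {
        sawClose = true;               // candidate end of the previous record
      } else if (sawClose && b == '{') {
        return pos - 1;                // this '{' begins the next record
      } else if (!Character.isWhitespace(b)) {
        sawClose = false;              // any other non-white-space resets the scan
      }
    }
    return -1;
  }
}
{code}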




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246871#comment-16246871
 ] 

ASF GitHub Bot commented on DRILL-5771:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1014
  
@arina-ielchiieva 

- The parts addressing DRILL-4640 and DRILL-5166 LGTM
- I think the fix for DRILL-5771 LGTM but I would like write down what I 
think is happening and confirm with you that my understanding is correct. This 
is mostly just a learning exercise for me since I am not very familiar with 
this part of the code :).

In DRILL-5771 there were two issues.

## Race Conditions With Format Plugins

### Issue

The following used to happen before the fix:

  1. When using an existing format plugin, the **FormatPlugin** would 
create a **DrillTable** with a **NamedFormatPluginConfig** which only contains 
the name of the format plugin to use.
  1. The **ScanOperator** created for a **DrillTable** will contain the 
**NamedFormatPluginConfig**
  1. When the **ScanOperators** are serialized in to the physical plan the 
serialized **ScanOperator** will only contain the name of the format plugin to 
use.
  1. When a worker deserializes the physical plan to do a scan, he gets the 
name of the **FormatPluginConfig** to use.
  1. The worker then looks up the correct **FormatPlugin** in the 
**FormatCreator** using the name he has.
  1. The worker can get into trouble if the **FormatPlugins** he has cached 
in his **FormatCreator** are out of sync with the rest of the cluster.

### Fix

Race conditions are eliminated because the **DrillTables** returned by the 
**FormatPlugins** no longer contain a **NamedFormatPluginConfig**; they contain 
the full **FormatPluginConfig**, not just a name alias. So when a query is 
executed:
  1. The ScanOperator contains the complete **FormatPluginConfig**
  1. When the physical plan is serialized it contains the complete 
**FormatPluginConfig** for each scan operator.
  1. When a worker node deserializes the ScanOperator it also has the 
complete **FormatPluginConfig** so it can reconstruct the **FormatPlugin** 
correctly, whereas previously the worker would have to do a lookup using the 
**FormatPlugin** name in the **FormatCreator** when the cache in the 
**FormatCreator** may be out of sync with the rest of the cluster. 

## FormatPluginConfig Equals and HashCode

### Issue

The **FileSystemPlugin** looks up **FormatPlugins** corresponding to a 
**FormatPluginConfig** in formatPluginsByConfig. However, the 
**FormatPluginConfig** implementations didn't override equals and hashCode.
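
For illustration, a sketch of the kind of override that fixes the lookup (the 
config class and its field are hypothetical, not one of Drill's actual plugin 
configs):

```java
import java.util.List;
import java.util.Objects;
import org.apache.drill.common.logical.FormatPluginConfig;

// hypothetical config: without equals/hashCode, two deserialized instances
// with identical settings would miss each other in formatPluginsByConfig
public class ExampleFormatConfig implements FormatPluginConfig {
  public List<String> extensions;

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    return Objects.equals(extensions, ((ExampleFormatConfig) o).extensions);
  }

  @Override
  public int hashCode() {
    return Objects.hash(extensions);
  }
}
```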




> Fix serDe errors for format plugins
> ---
>
> Key: DRILL-5771
> URL: https://issues.apache.org/jira/browse/DRILL-5771
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.12.0
>
>
> Create unit tests to check that all storage format plugins can be 
> successfully serialized  / deserialized.
> Usually this happens when query has several major fragments. 
> One way to check serde is to generate physical plan (generated as json) and 
> then submit it back to Drill.
> One example of found errors is described in the first comment. Another 
> example is described in DRILL-5166.
> *Serde issues:*
> 1. Could not obtain format plugin during deserialization
> Format plugin is created based on the format plugin configuration or its name. 
> On Drill start-up we load information about available plugins (it's reloaded 
> each time a storage plugin is updated, which can be done only by an admin).
> When a query is parsed, we try to get the plugin from the available ones; if 
> we cannot find one, we try to [create 
> one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144]
> but at other query execution stages we always assume that the [plugin exists 
> based on its 
> configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].
> For example, during query parsing we had to create a format plugin on one node 
> based on the format configuration.
> Then, when a major fragment was sent to a different node that used this 
> format configuration, we could not get the format plugin based on it, and 
> deserialization failed.
> To fix this problem we need to create the format plugin during query 
> deserialization if it's absent.
>   
> 2.  Absent hash code and equals.
> Format plugins are stored in hash map where key 

[jira] [Updated] (DRILL-5936) Refactor MergingRecordBatch based on code review

2017-11-09 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-5936:
--
Description: 
* Reorganize code to remove unnecessary {{pqueue.peek()}}
* Reuse Node

  was:* 


> Refactor MergingRecordBatch based on code review
> 
>
> Key: DRILL-5936
> URL: https://issues.apache.org/jira/browse/DRILL-5936
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> * Reorganize code to remove unnecessary {{pqueue.peek()}}
> * Reuse Node



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5936) Refactor MergingRecordBatch based on code review

2017-11-09 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-5936:
--
Description: * 

> Refactor MergingRecordBatch based on code review
> 
>
> Key: DRILL-5936
> URL: https://issues.apache.org/jira/browse/DRILL-5936
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> * 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5949) JSON format options should be part of plugin config; not session options

2017-11-09 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246851#comment-16246851
 ] 

Paul Rogers commented on DRILL-5949:


Although, technically, it is quite easy to make this change, backward 
compatibility is a challenge because Drill has no mechanism to assist. Here are 
two possibilities.

First, assign a priority to the settings as follows:

* Table function options (highest priority)
* Session options
* Plugin options
* System options (lowest priority)

That is, if session options are set, use them. Else, use the plugin options, if 
set. Else use the system options (and the system option defaults.) This is 
possible because options now identify the scope in which they are set, so we 
can differentiate session from system options. The problem here is that the 
reader can't actually tell if a setting comes from a table function or from the 
plugin definition, so some work may be required to support this pattern.

Second, modify the system/session options to have three values: 
{{true}}/{{false}}/{{unset}}. If the value is set to {{unset}}, use the plugin 
options. The default option value becomes {{unset}}. If the user changes the 
session (or system) option, this is used. So, if a user has changed the system 
option, and stored the value in ZK, then that setting will be {{true}} or 
{{false}} and will take precedence over the plugin options.
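
A sketch of the first scheme's resolution logic (all names are hypothetical, 
not Drill's actual OptionManager API; a null value models "not set in that 
scope", which is also how the {{unset}} state of the second scheme would 
behave):

{code}
public class OptionResolution {
  // Resolve one boolean reader option by scope priority; each scope passes
  // null when it has no explicit setting, so control falls through.
  static boolean resolve(Boolean tableFnValue, Boolean sessionValue,
                         Boolean pluginValue, boolean systemValue) {
    if (tableFnValue != null) { return tableFnValue; }  // highest priority
    if (sessionValue != null) { return sessionValue; }
    if (pluginValue != null) { return pluginValue; }
    return systemValue;                                 // lowest priority
  }
}
{code}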

> JSON format options should be part of plugin config; not session options
> 
>
> Key: DRILL-5949
> URL: https://issues.apache.org/jira/browse/DRILL-5949
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Paul Rogers
>
> Drill provides a JSON record reader. Drill provides two ways to configure 
> this reader:
> * Using the JSON plugin configuration.
> * Using a set of session options.
> The plugin configuration defines the file suffix associated with JSON files. 
> The session options are:
> * {{store.json.all_text_mode}}
> * {{store.json.read_numbers_as_double}}
> * {{store.json.reader.skip_invalid_records}}
> * {{store.json.reader.print_skipped_invalid_record_number}}
> Suppose I have two JSON files from different sources (and keep them in 
> distinct directories.) For the one, I want {{all_text_mode}} off, as 
> the data is nicely formatted. Also, my numbers are fine, so I want 
> {{read_numbers_as_double}} off.
> But, the other file is a mess and uses a rather ad-hoc format. So, I want 
> these two options turned on.
> As it turns out I often query both files. Today, I must set the session 
> options one way to query my "clean" file, then reverse them to query the 
> "dirty" file.
> Next, I want to join the two files. How do I set the options one way for the 
> "clean" file, and the other for the "dirty" file within the *same query*? 
> Can't.
> Now, consider the text format plugin that can read CSV, TSV, PSV and so on. 
> It has a variety of options. But, they are *not* session options; they are 
> instead options in the plugin definition. This allows me to, say, have a 
> plugin config for CSV-with-headers files that I get from source A, and a 
> different plugin config for my CSV-without-headers files from source B.
> Suppose we applied the text reader technique to the JSON reader. We'd move 
> the session options listed above into the JSON format plugin. Then, I can 
> define one plugin for my "clean" files, and a different plugin config for my 
> "dirty" files.
> What's more, I can then use table functions to adjust the format for each 
> file as needed within a single query. Since table functions are part of a 
> query, I can add them to a view that I define for the various JSON files.
> The result is a far simpler user experience than the tedium of resetting 
> session options for every query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246849#comment-16246849
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150131105
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

Yes, I'm wrong? (Asking because the rest of the sentence suggests I was 
right in my interpretation of the test.) Maybe we can/should test both? I would 
have liked to test for the first batch, but it's not possible to access the 
query id until `statement.execute()`, and I'd need it to unpause the request.


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.12.0
>
>
> It would be nice if we have this implemented. Run away queries can be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5936) Refactor MergingRecordBatch based on code review

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246828#comment-16246828
 ] 

ASF GitHub Bot commented on DRILL-5936:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1025
  
@amansinha100 can you review this change?


> Refactor MergingRecordBatch based on code review
> 
>
> Key: DRILL-5936
> URL: https://issues.apache.org/jira/browse/DRILL-5936
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246827#comment-16246827
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150127658
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

Yes. So I'm testing for the part where the batch has been fetched by the 
DrillCursor but not consumed via the DrillResultSetImpl. That's why I found the 
need for pausing the Screen operator odd and, hence, the question.


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.12.0
>
>
> It would be nice if we have this implemented. Run away queries can be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DRILL-5926) TestValueVector tests fail sporadically

2017-11-09 Thread Pritesh Maker (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238479#comment-16238479
 ] 

Pritesh Maker edited comment on DRILL-5926 at 11/10/17 12:33 AM:
-

Is this the PR? https://github.com/apache/drill/pull/1023 

Should we create a separate PR for this issue?

cc [~vrozov]


was (Author: priteshm):
Is this the PR? https://github.com/apache/drill/pull/1023 

Should we create a separate PR for this issue?

> TestValueVector tests fail sporadically
> ---
>
> Key: DRILL-5926
> URL: https://issues.apache.org/jira/browse/DRILL-5926
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Trivial
>
> As reported by [~Paul.Rogers]. The following tests fail sporadically with out 
> of memory exception:
> * TestValueVector.testFixedVectorReallocation
> * TestValueVector.testVariableVectorReallocation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246775#comment-16246775
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150119338
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

I wonder if we actually test the timeout during DrillCursor operations. It 
seems your test relies on the user being slow to read data from the result set, 
although the data has already been fetched by the client. Am I wrong?


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.12.0
>
>
> It would be nice if we have this implemented. Run away queries can be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5943) Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246769#comment-16246769
 ] 

ASF GitHub Bot commented on DRILL-5943:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1028#discussion_r150118518
  
--- Diff: contrib/native/client/src/clientlib/saslAuthenticatorImpl.hpp ---
@@ -59,6 +59,12 @@ class SaslAuthenticatorImpl {
 
 const char *getErrorMessage(int errorCode);
 
+static const std::string KERBEROS_SIMPLE_NAME;
+
+static const std::string KERBEROS_SASL_NAME;
--- End diff --

do we need to expose it? (it looks like we only look for the keys)


> Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism
> ---
>
> Key: DRILL-5943
> URL: https://issues.apache.org/jira/browse/DRILL-5943
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
> Fix For: 1.12.0
>
>
> For the PLAIN mechanism we will weaken the strong check introduced with 
> DRILL-5582 to keep forward compatibility between a Drill 1.12 client and a 
> Drill 1.9 server. This is fine since, with or without this strong check, the 
> PLAIN mechanism is still vulnerable to MITM during the handshake itself, 
> unlike mutual authentication protocols such as Kerberos.
> Also, to keep forward compatibility with respect to SASL, we will treat 
> UNKNOWN_SASL_SUPPORT as a valid value. For a handshake message received from a 
> client running on a later version (say 1.13) than the Drillbit (1.12) and 
> carrying a new value for the SaslSupport field which is unknown to the server, 
> this field will be decoded as UNKNOWN_SASL_SUPPORT. In this scenario the 
> client will be treated as one aware of the SASL protocol, but the server 
> doesn't know the exact capabilities of the client. Hence the SASL handshake 
> will still be required from the server side.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5863) Sortable table incorrectly sorts minor fragments and time elements lexically instead of sorting by implicit value

2017-11-09 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5863:
-
Reviewer: Paul Rogers  (was: Paul Rogers)

> Sortable table incorrectly sorts minor fragments and time elements lexically 
> instead of sorting by implicit value
> -
>
> Key: DRILL-5863
> URL: https://issues.apache.org/jira/browse/DRILL-5863
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> The fix for this is to use dataTable library's {{data-order}} attribute for 
> the data elements that need to sort by an implicit value.
> ||Old order of Minor Fragment||New order of Minor Fragment||
> |...|...|
> |01-09-01  | 01-09-01|
> |01-10-01  | 01-10-01|
> |01-100-01 | 01-11-01|
> |01-101-01 | 01-12-01|
> |... | ... |
> ||Old order of Duration||New order of Duration||
> |...|...|
> |1m15s  | 55.03s|
> |55s  | 1m15s|
> |...|...|



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5717) change some date time unit cases with specific timezone or Local

2017-11-09 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-5717:


Assignee: weijie.tong

> change some date time unit cases with specific timezone or Local
> 
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>  Labels: ready-to-commit
>
> Some date-time test cases, like JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. 
> We should make these test cases independent of the local environment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246617#comment-16246617
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150097140
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

Ah, forgot that the file defines an interface, not a class. (The situation 
I described was an interface nested inside a class.) So, you're good.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5949) JSON format options should be part of plugin config; not session options

2017-11-09 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5949:
--

 Summary: JSON format options should be part of plugin config; not 
session options
 Key: DRILL-5949
 URL: https://issues.apache.org/jira/browse/DRILL-5949
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Paul Rogers


Drill provides a JSON record reader. Drill provides two ways to configure this 
reader:

* Using the JSON plugin configuration.
* Using a set of session options.

The plugin configuration defines the file suffix associated with JSON files. 
The session options are:

* {{store.json.all_text_mode}}
* {{store.json.read_numbers_as_double}}
* {{store.json.reader.skip_invalid_records}}
* {{store.json.reader.print_skipped_invalid_record_number}}

Suppose I have two JSON files from different sources (and keep them in distinct 
directories.) For the one, I want {{all_text_mode}} off, as the data is 
nicely formatted. Also, my numbers are fine, so I want 
{{read_numbers_as_double}} off.

But, the other file is a mess and uses a rather ad-hoc format. So, I want these 
two options turned on.

As it turns out I often query both files. Today, I must set the session options 
one way to query my "clean" file, then reverse them to query the "dirty" file.

Next, I want to join the two files. How do I set the options one way for the 
"clean" file, and the other for the "dirty" file within the *same query*? Can't.

Now, consider the text format plugin that can read CSV, TSV, PSV and so on. It 
has a variety of options. But, they are *not* session options; they are instead 
options in the plugin definition. This allows me to, say, have a plugin config 
for CSV-with-headers files that I get from source A, and a different plugin 
config for my CSV-without-headers files from source B.

Suppose we applied the text reader technique to the JSON reader. We'd move the 
session options listed above into the JSON format plugin. Then, I can define 
one plugin for my "clean" files, and a different plugin config for my "dirty" 
files.

What's more, I can then use table functions to adjust the format for each file 
as needed within a single query. Since table functions are part of a query, I 
can add them to a view that I define for the various JSON files.
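
For illustration, a sketch of such a query (the JSON table-function parameter 
names below are this proposal's assumptions, not an API that exists today, and 
the paths are invented):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PerFileJsonOptions {
  public static void main(String[] args) throws Exception {
    // Hypothetical: JSON options supplied per table, mirroring the text plugin.
    String sql = "SELECT c.id, d.note "
        + "FROM dfs.`/data/clean/records.json` c "
        + "JOIN table(dfs.`/data/dirty/records.json`("
        + "type => 'json', allTextMode => true, readNumbersAsDouble => true)) d "
        + "ON c.id = d.id";
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(sql)) {
      while (rs.next()) {
        System.out.println(rs.getString("id") + ": " + rs.getString("note"));
      }
    }
  }
}
{code}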

The result is a far simpler user experience than the tedium of resetting 
session options for every query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246608#comment-16246608
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150096261
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

IntelliJ gave a warning that the modifier is redundant. Also an interface 
nested inside another interface is public by default.

https://beginnersbook.com/2016/03/nested-or-inner-interfaces-in-java/


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246639#comment-16246639
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r15009
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java 
---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader ea,
   }
 }
   }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator<Object> {
--- End diff --

This is used in the DrillTestWrapper to verify the ordering of results. I 
agree this is not suitable for equality tests, but it's intended to be used 
only for ordering tests. I didn't add support for all the supported RowSet 
types because we would first have to move DrillTestWrapper to use RowSets 
(currently it uses Maps and Lists to represent data). Currently it is not used 
by RowSets, but the intention is to move DrillTestWrapper to use RowSets and 
then make this comparator operate on RowSets, but that will be an incremental 
process.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246609#comment-16246609
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150096444
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java
 ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map<ColumnAccessor.ValueType, String> DEFAULT_FORMATTERS = new ImmutableMap.Builder<ColumnAccessor.ValueType, String>()
+.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+.build();
+
+  private final RowSet rowSet;
+  private final Map customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+this.rowSet = Preconditions.checkNotNull(rowSet);
+Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset 
is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final 
String columnFormatter) {
+Preconditions.checkNotNull(columnName);
+Preconditions.checkNotNull(columnFormatter);
+
+Iterator fields = rowSet
+  .schema()
+  .batch()
+  .iterator();
+
+boolean hasColumn = false;
+
+while (!hasColumn && fields.hasNext()) {
+  hasColumn = fields.next()
+.getName()
+.equals(columnName);
+}
+
+final String message = String.format("(%s) is not a valid column", 
columnName);
+Preconditions.checkArgument(hasColumn, message);
+
+customFormatters.put(columnName, columnFormatter);
+
+return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Sounds Good


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> Several other PRs have been batched together with this one. The full 
> description of the work is as follows:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a 

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246547#comment-16246547
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150087815
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java
 ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan, List<RecordBatch> children)
+      throws ExecutionSetupException {
+    Preconditions.checkArgument(children.isEmpty());
+    List<RecordReader> readers = Lists.newArrayList();
+    List<SchemaPath> columns = null;
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
--- End diff --

When will the columns be null? I'm not sure that is a valid state. However, as 
noted above, an empty list is a valid state (used for `COUNT(*)` queries).
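
For illustration, a hedged fragment of that distinction in the context of the 
quoted `getBatch()`, reusing names from the diff (including its `getCoulmns()` 
spelling; this is a sketch, not the actual patch):

```java
// null means the projection was not specified at all; only that case
// should widen to the wildcard.
List<SchemaPath> columns = subScan.getCoulmns();
if (columns == null) {
  columns = GroupScan.ALL_COLUMNS;
}
// columns.isEmpty() can still be true here: a valid projection meaning
// "no columns needed", as produced for SELECT COUNT(*) queries.
```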


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target support for JSON and Avro message types



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246550#comment-16246550
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150086335
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246553#comment-16246553
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150086292
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246551#comment-16246551
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150087650
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java
 ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan, List<RecordBatch> children)
+      throws ExecutionSetupException {
+    Preconditions.checkArgument(children.isEmpty());
+    List<RecordReader> readers = Lists.newArrayList();
+    List<SchemaPath> columns = null;
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
--- End diff --

The column list can be shared by all readers, and so can be created outside 
of the loop over scan specs.
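
To make that concrete, a sketch of the reordered `getBatch()` using only names 
from the quoted diff (keeping its `getCoulmns()` spelling, which another review 
comment flags for renaming; the `ScanBatch` construction is assumed to match 
the rest of the quoted file):

```java
@Override
public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan, List<RecordBatch> children)
    throws ExecutionSetupException {
  Preconditions.checkArgument(children.isEmpty());

  // The projection does not depend on the scan spec, so resolve it once.
  List<SchemaPath> columns = subScan.getCoulmns() != null ? subScan.getCoulmns() : GroupScan.ALL_COLUMNS;

  List<RecordReader> readers = Lists.newArrayList();
  for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
    readers.add(new KafkaRecordReader(scanSpec, columns, context, subScan.getKafkaStoragePlugin()));
  }
  return new ScanBatch(subScan, context, readers.iterator());
}
```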


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target support for JSON and Avro message types



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246548#comment-16246548
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150084784
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246552#comment-16246552
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150087367
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246544#comment-16246544
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150088237
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java
 ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan, List<RecordBatch> children)
+      throws ExecutionSetupException {
+    Preconditions.checkArgument(children.isEmpty());
+    List<RecordReader> readers = Lists.newArrayList();
+    List<SchemaPath> columns = null;
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
+        readers.add(new KafkaRecordReader(scanSpec, columns, context, subScan.getKafkaStoragePlugin()));
+      } catch (Exception e) {
+        logger.error("KafkaRecordReader creation failed for subScan:  " + subScan + ".", e);
--- End diff --

Here we are catching all errors, putting a generic message into the 
log, and sending a generic exception up the stack. It is better to throw a 
`UserException` at the actual point of failure so we can tell the user exactly 
what is wrong. Then, here, have a `catch` block for `UserException` that simply 
rethrows, while handling all other exceptions as is done now.
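
A minimal sketch of that pattern, assuming Drill's `UserException` builders 
(`dataReadError`, `systemError`); the messages, context key, and stand-in 
failure are made up:

```java
import org.apache.drill.common.exceptions.UserException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class UserExceptionSketch {
  private static final Logger logger = LoggerFactory.getLogger(UserExceptionSketch.class);

  // At the actual point of failure: attach a user-facing message and context.
  static void openReader(String topic) {
    try {
      throw new IllegalStateException("cannot reach broker");  // stand-in failure
    } catch (Exception e) {
      throw UserException.dataReadError(e)
          .message("Failed to set up Kafka reader for topic %s", topic)
          .addContext("topic", topic)
          .build(logger);
    }
  }

  // At the batch-creator level: rethrow UserException as-is so the detailed
  // message reaches the user; wrap everything else generically.
  static void getBatch() {
    try {
      openReader("events");
    } catch (UserException e) {
      throw e;
    } catch (Exception e) {
      throw UserException.systemError(e).build(logger);
    }
  }
}
```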


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target support for JSON and Avro message types



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246549#comment-16246549
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150084039
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246543#comment-16246543
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150084335
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246545#comment-16246545
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150083104
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246542#comment-16246542
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150083767
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246541#comment-16246541
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150082981
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246540#comment-16246540
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150081972
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions()
+        .getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
--- End diff --

Preparing columns is really quite difficult, with many cases to handle. 
Drill appears to allow projection of the form `a.b`, `a.c`, which means that `a` 
is a map and we wish to project just `b` and `c` from `a`.
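
To make the case concrete, here is a minimal, self-contained sketch of grouping such dotted projection paths by their root segment. It uses plain strings rather than Drill's `SchemaPath` API, and every class and method name in it is illustrative:
```
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectionGrouper {
  // Group projected column paths by their root segment, so that `a.b` and
  // `a.c` are recognized as two members of the map column `a`.
  public static Map<String, List<String>> groupByRoot(List<String> paths) {
    Map<String, List<String>> byRoot = new LinkedHashMap<>();
    for (String path : paths) {
      int dot = path.indexOf('.');
      String root = dot < 0 ? path : path.substring(0, dot);
      String child = dot < 0 ? "*" : path.substring(dot + 1);
      byRoot.computeIfAbsent(root, k -> new ArrayList<>()).add(child);
    }
    return byRoot;
  }

  public static void main(String[] args) {
    // Prints {a=[b, c], d=[*]}: `a` is a map from which we project b and c.
    System.out.println(groupByRoot(Arrays.asList("a.b", "a.c", "d")));
  }
}
```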

As it turns 

[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246546#comment-16246546
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150087581
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java
 ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = 
LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan,
+      List<RecordBatch> children) throws ExecutionSetupException {
+    Preconditions.checkArgument(children.isEmpty());
+    List<RecordReader> readers = Lists.newArrayList();
+    List<SchemaPath> columns = null;
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
--- End diff --

`getCoulmns()` --> `getColumns()`


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target supporting JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246539#comment-16246539
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150082711
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions()
+        .getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+  

[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246535#comment-16246535
 ] 

ASF GitHub Bot commented on DRILL-5867:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1029#discussion_r150086018
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java
 ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String 
queryId, long startTime, long
   this.time = new Date(startTime);
   this.foreman = foreman;
   this.link = generateLink(drillConfig, foreman, queryId);
-  this.query = query.substring(0,  Math.min(query.length(), 150));
+  this.query = extractQuerySnippet(query);
   this.state = state;
   this.user = user;
   this.totalCost = totalCost;
   this.queueName = queueName;
 }
 
+    private String extractQuerySnippet(String queryText) {
+      // Extract up to the max char limit as the snippet
+      String sizeCappedQuerySnippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR));
+      // Trimming down based on line count
+      if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {
--- End diff --

1. We can create a variable for 
`sizeCappedQuerySnippet.split(System.lineSeparator())` so we do the split only 
once (see the sketch below).
2. Please remove the spaces in the `if` clause: `if ( QUERY_SNIPPET_MAX_LINES < 
sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {` -> `if 
(QUERY_SNIPPET_MAX_LINES < splittedQuery.length) {`, and likewise in `if ( 
++linesConstructed < QUERY_SNIPPET_MAX_LINES ) {` in the code below.
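
A minimal, self-contained sketch of the split-once refactor; the class name, the constant value, and the helper name are illustrative, not the actual `ProfileResources` code:
```
public class SnippetTrimmer {
  private static final int QUERY_SNIPPET_MAX_LINES = 8; // illustrative value
  private static final String SEP = System.lineSeparator();

  // Split once, then keep at most QUERY_SNIPPET_MAX_LINES lines.
  static String capLines(String sizeCappedQuerySnippet) {
    String[] splittedQuery = sizeCappedQuerySnippet.split(SEP);
    if (splittedQuery.length <= QUERY_SNIPPET_MAX_LINES) {
      return sizeCappedQuerySnippet;
    }
    StringBuilder capped = new StringBuilder(splittedQuery[0]);
    for (int i = 1; i < QUERY_SNIPPET_MAX_LINES; i++) {
      capped.append(SEP).append(splittedQuery[i]);
    }
    return capped.toString();
  }

  public static void main(String[] args) {
    System.out.println(capLines("SELECT *" + SEP + "FROM t")); // short query: unchanged
  }
}
```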


> List profiles in pages rather than a long verbose listing
> -
>
> Key: DRILL-5867
> URL: https://issues.apache.org/jira/browse/DRILL-5867
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
> Attachments: DefaultRendering.png, FilteringFailed.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246537#comment-16246537
 ] 

ASF GitHub Bot commented on DRILL-5867:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1029#discussion_r150086785
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java
 ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String 
queryId, long startTime, long
   this.time = new Date(startTime);
   this.foreman = foreman;
   this.link = generateLink(drillConfig, foreman, queryId);
-  this.query = query.substring(0,  Math.min(query.length(), 150));
+  this.query = extractQuerySnippet(query);
   this.state = state;
   this.user = user;
   this.totalCost = totalCost;
   this.queueName = queueName;
 }
 
+    private String extractQuerySnippet(String queryText) {
+      // Extract up to the max char limit as the snippet
+      String sizeCappedQuerySnippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR));
+      // Trimming down based on line count
+      if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {
+        int linesConstructed = 0;
+        StringBuilder lineCappedQuerySnippet = new StringBuilder();
+        String[] queryParts = sizeCappedQuerySnippet.split(System.lineSeparator());
+        for (String qPart : queryParts) {
+          lineCappedQuerySnippet.append(qPart);
+          if ( ++linesConstructed < QUERY_SNIPPET_MAX_LINES ) {
+            lineCappedQuerySnippet.append(System.lineSeparator());
--- End diff --

Do we want to append with a new line, or maybe a space, for better readability?


> List profiles in pages rather than a long verbose listing
> -
>
> Key: DRILL-5867
> URL: https://issues.apache.org/jira/browse/DRILL-5867
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
> Attachments: DefaultRendering.png, FilteringFailed.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246536#comment-16246536
 ] 

ASF GitHub Bot commented on DRILL-5867:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1029#discussion_r150085841
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java
 ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String 
queryId, long startTime, long
   this.time = new Date(startTime);
   this.foreman = foreman;
   this.link = generateLink(drillConfig, foreman, queryId);
-  this.query = query.substring(0,  Math.min(query.length(), 150));
+  this.query = extractQuerySnippet(query);
   this.state = state;
   this.user = user;
   this.totalCost = totalCost;
   this.queueName = queueName;
 }
 
+    private String extractQuerySnippet(String queryText) {
--- End diff --

1. I usually place private methods at the end of the class.
2. We can add javadoc here explaining that we first limit the original query 
size and, if the size fits but the query has too many lines, we limit the line 
count as well for better readability in the Web UI.


> List profiles in pages rather than a long verbose listing
> -
>
> Key: DRILL-5867
> URL: https://issues.apache.org/jira/browse/DRILL-5867
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
> Attachments: DefaultRendering.png, FilteringFailed.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246515#comment-16246515
 ] 

ASF GitHub Bot commented on DRILL-5923:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1021
  
Well, I don't have a strong preference here; we can use an array, as long as 
Prasad documents it nicely as in your example rather than in one line.
```
String displayNames[] = {
  "First Value", // FIRST_VALUE = 0
  "Second Value", // SECOND_VALUE = 1
  ...
};
```


> State of a successfully completed query shown as "COMPLETED"
> 
>
> Key: DRILL-5923
> URL: https://issues.apache.org/jira/browse/DRILL-5923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
> Fix For: 1.12.0
>
>
> Drill UI currently lists a successfully completed query as "COMPLETED". 
> Successfully completed, failed and canceled queries are all grouped as 
> Completed queries. 
> It would be better to list the state of a successfully completed query as 
> "Succeeded" to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246442#comment-16246442
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150072992
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java
 ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map<ColumnAccessor.ValueType, String> DEFAULT_FORMATTERS =
+      new ImmutableMap.Builder<ColumnAccessor.ValueType, String>()
+.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+.build();
+
+  private final RowSet rowSet;
+  private final Map<String, String> customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+    this.rowSet = Preconditions.checkNotNull(rowSet);
+    Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final String columnFormatter) {
+    Preconditions.checkNotNull(columnName);
+    Preconditions.checkNotNull(columnFormatter);
+
+    Iterator<MaterializedField> fields = rowSet
+      .schema()
+      .batch()
+      .iterator();
+
+    boolean hasColumn = false;
+
+    while (!hasColumn && fields.hasNext()) {
+      hasColumn = fields.next()
+        .getName()
+        .equals(columnName);
+    }
+
+    final String message = String.format("(%s) is not a valid column", columnName);
+    Preconditions.checkArgument(hasColumn, message);
+
+    customFormatters.put(columnName, columnFormatter);
+
+    return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Great! This does not yet handle nested tuples or arrays; in part because 
the row set work for that is still sitting in PR #914. You can update this to 
be aware of maps and map arrays once that PR is committed.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246441#comment-16246441
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150073673
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java 
---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader 
ea,
   }
 }
   }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator<Object> {
--- End diff --

Defined here, but not used in this file. It does not include all the types that 
Drill supports (via the RowSet): Date, byte arrays, BigDecimal, etc. It also 
does not allow tolerance ranges for floats & doubles as JUnit does. (Two floats 
are seldom exactly equal.)
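
For reference, a small sketch of tolerance-based comparison in the spirit of JUnit's `assertEquals(expected, actual, delta)`; the class name and the epsilon value are illustrative:
```
public class ApproxCompare {
  static final double EPSILON = 1e-9; // illustrative tolerance

  // Tolerance-based equality, in the spirit of JUnit's delta argument.
  static boolean approxEquals(double a, double b) {
    return Math.abs(a - b) <= EPSILON;
  }

  public static void main(String[] args) {
    // 0.1 + 0.2 != 0.3 exactly in binary floating point, but is within EPSILON.
    System.out.println((0.1 + 0.2) == 0.3);           // false
    System.out.println(approxEquals(0.1 + 0.2, 0.3)); // true
  }
}
```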


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246443#comment-16246443
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150073945
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

Aren't nested interfaces `protected` by default? Just had to change one 
from default to `public` so I could use it in another package...


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5948) The wrong number of batches is displayed

2017-11-09 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246292#comment-16246292
 ] 

Paul Rogers commented on DRILL-5948:


As it turns out, Drill will generate two batches even for a single record:

* The first batch is empty and carries just the schema. (Used for JDBC/ODBC to 
report schema up front.)
* The second batch carries the first (or only) set of records.

From a metric perspective, we could change Drill to not count the initial, 
empty batch.

> The wrong number of batches is displayed
> 
>
> Key: DRILL-5948
> URL: https://issues.apache.org/jira/browse/DRILL-5948
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Vlad
> Attachments: json_profile.json
>
>
> I suppose that when you execute a query with a small amount of data, Drill 
> must create 1 batch, but here you can see that Drill created 2 batches. I 
> think it's wrong behaviour for Drill. The full JSON file is in the 
> attachment.
> {code:html}
> "fragmentProfile": [
> {
> "majorFragmentId": 0,
> "minorFragmentProfile": [
> {
> "state": 3,
> "minorFragmentId": 0,
> "operatorProfile": [
> {
> "inputProfile": [
> {
> "records": 1,
> "batches": 2,
> "schemas": 1
> }
> ],
> "operatorId": 2,
> "operatorType": 29,
> "setupNanos": 0,
> "processNanos": 1767363740,
> "peakLocalMemoryAllocated": 639120,
> "waitNanos": 25787
> },
> {code}
> Step to reproduce:
> # Create JSON file with 1 row
> # Execute a star query with this file, for example 
> {code:sql}
> select * from dfs.`/path/to/your/file/example.json`
> {code}
> # Go to the Profile page on the UI, and open info about your query
> # Open JSON profile



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246283#comment-16246283
 ] 

ASF GitHub Bot commented on DRILL-5923:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1021
  
@arina-ielchiieva, it helps to think about the source of the enum. This is 
a Protobuf enum. The ordinal values cannot change; they are a contract between 
sender and receiver. We can add new ones, or retire old ones, but otherwise the 
values are frozen in time.

The array approach captures this reality. We could document the array 
better:
```
String displayNames[] = {
  "First Value", // FIRST_VALUE = 0
  "Second Value", // SECOND_VALUE = 1
  ...
};
```
We can also do a bounds check:
```
if (enumValue.ordinal() >= displayNames.length) {
  return enumValue.toString();
} else {
  return displayNames[enumValue.ordinal()];
}
```
But, IMHO a map seems overkill for such a simple task. Yes, it works, but 
is unnecessary. As they say, "make it as simple as possible (but no simpler)."
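
Putting the two pieces together, a self-contained sketch of the array approach with the bounds check; the enum values and display strings here are placeholders, not Drill's actual Protobuf `QueryState`:
```
public class DisplayNameDemo {
  // Placeholder enum; the real one is a frozen Protobuf contract.
  enum QueryState { FIRST_VALUE, SECOND_VALUE, NEWLY_ADDED }

  static final String[] DISPLAY_NAMES = {
    "First Value",  // FIRST_VALUE = 0
    "Second Value", // SECOND_VALUE = 1
  };

  static String displayName(QueryState state) {
    // Bounds check: fall back for enum values added after this array was written.
    if (state.ordinal() >= DISPLAY_NAMES.length) {
      return state.toString();
    }
    return DISPLAY_NAMES[state.ordinal()];
  }

  public static void main(String[] args) {
    System.out.println(displayName(QueryState.FIRST_VALUE)); // First Value
    System.out.println(displayName(QueryState.NEWLY_ADDED)); // NEWLY_ADDED
  }
}
```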


> State of a successfully completed query shown as "COMPLETED"
> 
>
> Key: DRILL-5923
> URL: https://issues.apache.org/jira/browse/DRILL-5923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
> Fix For: 1.12.0
>
>
> Drill UI currently lists a successfully completed query as "COMPLETED". 
> Successfully completed, failed and canceled queries are all grouped as 
> Completed queries. 
> It would be better to list the state of a successfully completed query as 
> "Succeeded" to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (DRILL-5259) Allow listing a user-defined number of profiles

2017-11-09 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-5259.
---

Verified and committed to Apache master

> Allow listing a user-defined number of profiles 
> 
>
> Key: DRILL-5259
> URL: https://issues.apache.org/jira/browse/DRILL-5259
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Trivial
> Fix For: 1.10.0, 1.12.0
>
>
> Currently, the web UI only lists the last 100 profiles. 
> This count is currently hard coded. The proposed change would be to create an 
> option in drill-override.conf to provide a flexible default value, and also 
> an option within the UI (via optional parameter in the path). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (DRILL-5802) Provide a sortable table for tables within a query profile

2017-11-09 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-5802.
---

Verified and Committed into Master on 2 Oct, 2017.

> Provide a sortable table for tables within a query profile
> --
>
> Key: DRILL-5802
> URL: https://issues.apache.org/jira/browse/DRILL-5802
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4286) Have an ability to put server in quiescent mode of operation

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246219#comment-16246219
 ] 

ASF GitHub Bot commented on DRILL-4286:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150047464
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -348,6 +354,21 @@ public void run() {
  */
   }
 
+  /*
+   * Check if the foreman is ONLINE. If not, don't accept any new queries.
+   */
+  public void checkForemanState() throws ForemanException {
+    DrillbitEndpoint foreman = drillbitContext.getEndpoint();
+    Collection<DrillbitEndpoint> dbs = drillbitContext.getAvailableBits();
--- End diff --

I was thinking of encapsulating the code from lines 360 to 367 into a boolean 
isOnline(), since all the values in that code are derived from the current 
DrillbitContext. Then your code would be simplified to:
```
public void checkForemanState() throws ForemanException {
  if (!drillbitContext.isOnline()) {
    throw new ForemanException("Query submission failed since Foreman is shutting down.");
  }
}
```
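
A runnable sketch of that encapsulation, with plain strings standing in for `DrillbitEndpoint` and `DrillbitContext`; the membership check is illustrative, and the real implementation would reuse the logic from lines 360 to 367:
```
import java.util.Collection;
import java.util.Collections;

public class OnlineCheckSketch {
  private final String selfEndpoint;              // stand-in for DrillbitEndpoint
  private final Collection<String> availableBits; // stand-in for getAvailableBits()

  OnlineCheckSketch(String selfEndpoint, Collection<String> availableBits) {
    this.selfEndpoint = selfEndpoint;
    this.availableBits = availableBits;
  }

  // The encapsulation suggested above: the "online" decision is derived
  // entirely from state the context already holds.
  boolean isOnline() {
    return availableBits.contains(selfEndpoint);
  }

  public static void main(String[] args) {
    OnlineCheckSketch ctx = new OnlineCheckSketch(
        "drillbit-1:31010", Collections.singleton("drillbit-1:31010"));
    System.out.println(ctx.isOnline()); // true
  }
}
```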


> Have an ability to put server in quiescent mode of operation
> 
>
> Key: DRILL-4286
> URL: https://issues.apache.org/jira/browse/DRILL-4286
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Flow
>Reporter: Victoria Markman
>Assignee: Venkata Jyothsna Donapati
>
> I think drill will benefit from mode of operation that is called "quiescent" 
> in some databases. 
> From IBM Informix server documentation:
> {code}
> Change gracefully from online to quiescent mode
> Take the database server gracefully from online mode to quiescent mode to 
> restrict access to the database server without interrupting current 
> processing. After you perform this task, the database server sets a flag that 
> prevents new sessions from gaining access to the database server. The current 
> sessions are allowed to finish processing. After you initiate the mode 
> change, it cannot be canceled. During the mode change from online to 
> quiescent, the database server is considered to be in Shutdown mode.
> {code}
> This is different from shutdown, when processes are terminated. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (DRILL-5867) List profiles in pages rather than a long verbose listing

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reopened DRILL-5867:
-

> List profiles in pages rather than a long verbose listing
> -
>
> Key: DRILL-5867
> URL: https://issues.apache.org/jira/browse/DRILL-5867
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
> Attachments: DefaultRendering.png, FilteringFailed.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (DRILL-5803) Show the hostname for each minor fragment in operator table

2017-11-09 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-5803.
---

Verified and committed into Apache master on 26 September, 2017

> Show the hostname for each minor fragment in operator table
> ---
>
> Key: DRILL-5803
> URL: https://issues.apache.org/jira/browse/DRILL-5803
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Fix For: 1.12.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246196#comment-16246196
 ] 

ASF GitHub Bot commented on DRILL-5867:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1029
  
Snapshot when testing with the search filter for FAILED queries and navigating 
to page 2 of that list. Information about the number of filtered items, etc., 
is also provided.

![image](https://user-images.githubusercontent.com/4335237/32622085-a8826c56-c536-11e7-9a18-7a09142b250e.png)



> List profiles in pages rather than a long verbose listing
> -
>
> Key: DRILL-5867
> URL: https://issues.apache.org/jira/browse/DRILL-5867
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
> Attachments: DefaultRendering.png, FilteringFailed.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5867) List profiles in pages rather than a long verbose listing

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246193#comment-16246193
 ] 

ASF GitHub Bot commented on DRILL-5867:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1029
  
Snapshot when rendering the defaults (10 per page) from a pre-loaded set of 
the latest 123 profiles.

![image](https://user-images.githubusercontent.com/4335237/32621917-412a90ba-c536-11e7-9d51-83220ce072d3.png)
The query snippet is restricted to at most 8 lines, and a trailing `...` 
indicates that there is more to the query text.


> List profiles in pages rather than a long verbose listing
> -
>
> Key: DRILL-5867
> URL: https://issues.apache.org/jira/browse/DRILL-5867
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
> Attachments: DefaultRendering.png, FilteringFailed.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5867) List profiles in pages rather than a long verbose listing

2017-11-09 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5867:

Attachment: FilteringFailed.png

> List profiles in pages rather than a long verbose listing
> -
>
> Key: DRILL-5867
> URL: https://issues.apache.org/jira/browse/DRILL-5867
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
> Attachments: DefaultRendering.png, FilteringFailed.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5867) List profiles in pages rather than a long verbose listing

2017-11-09 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5867:

Attachment: DefaultRendering.png

> List profiles in pages rather than a long verbose listing
> -
>
> Key: DRILL-5867
> URL: https://issues.apache.org/jira/browse/DRILL-5867
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.12.0
>
> Attachments: DefaultRendering.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246101#comment-16246101
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150030044
  
--- Diff: contrib/storage-kafka/pom.xml ---
@@ -0,0 +1,130 @@
+<?xml version="1.0"?>
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <kafka.version>0.11.0.1</kafka.version>
+    <kafka.TestSuite>**/KafkaTestSuit.class</kafka.TestSuite>
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-surefire-plugin</artifactId>
+        <configuration>
+          <includes>
+            <include>${kafka.TestSuite}</include>
+          </includes>
+          <excludes>
+            <exclude>**/TestKafkaQueries.java</exclude>
+          </excludes>
+          <systemProperties>
+            <property>
+              <name>logback.log.dir</name>
+              <value>${project.build.directory}/surefire-reports</value>
+            </property>
+          </systemProperties>
+        </configuration>
+      </plugin>
+    </plugins>
+  </build>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
--- End diff --

Why is it necessary to exclude zookeeper? If a specific version of 
zookeeper is required, would it be better to explicitly add zookeeper to the 
dependency management?


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target supporting JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246099#comment-16246099
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150028303
  
--- Diff: contrib/storage-kafka/pom.xml ---
@@ -0,0 +1,130 @@
+<?xml version="1.0"?>
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <kafka.version>0.11.0.1</kafka.version>
+    <kafka.TestSuite>**/KafkaTestSuit.class</kafka.TestSuite>
--- End diff --

What is the reason to define the `kafka.TestSuite` property?


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target supporting JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246100#comment-16246100
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150029170
  
--- Diff: contrib/storage-kafka/pom.xml ---
@@ -0,0 +1,130 @@
+<?xml version="1.0"?>
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <kafka.version>0.11.0.1</kafka.version>
+    <kafka.TestSuite>**/KafkaTestSuit.class</kafka.TestSuite>
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-surefire-plugin</artifactId>
+        <configuration>
--- End diff --

It would be better to go with the default `maven-surefire-plugin` 
configuration unless there is a good justification for a custom config. Most 
of the time this can be achieved by using the default test name convention.
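
For illustration, a test class following Surefire's default naming convention (`Test*.java` / `*Test.java` / `*TestCase.java`), which would be picked up without any custom `includes`/`excludes`; the class and test here are hypothetical:
```
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Matches Surefire's default include pattern *Test.java, so no custom
// <includes>/<excludes> block is needed in the pom.
public class KafkaQueriesTest {
  @Test
  public void starQueryReturnsRows() {
    assertEquals(1, 1); // placeholder assertion for the sketch
  }
}
```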


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target supporting JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5948) The wrong number of batches is displayed

2017-11-09 Thread Vlad (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad updated DRILL-5948:

Attachment: json_profile.json

JSON profile of the query.

> The wrong number of batches is displayed
> 
>
> Key: DRILL-5948
> URL: https://issues.apache.org/jira/browse/DRILL-5948
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Vlad
> Attachments: json_profile.json
>
>
> I suppose that when you execute a query with a small amount of data, Drill 
> must create 1 batch, but here you can see that Drill created 2 batches. I 
> think it's wrong behaviour for Drill. The full JSON file is in the 
> attachment.
> {code:html}
> "fragmentProfile": [
> {
> "majorFragmentId": 0,
> "minorFragmentProfile": [
> {
> "state": 3,
> "minorFragmentId": 0,
> "operatorProfile": [
> {
> "inputProfile": [
> {
> "records": 1,
> "batches": 2,
> "schemas": 1
> }
> ],
> "operatorId": 2,
> "operatorType": 29,
> "setupNanos": 0,
> "processNanos": 1767363740,
> "peakLocalMemoryAllocated": 639120,
> "waitNanos": 25787
> },
> {code}
> Step to reproduce:
> # Create JSON file with 1 row
> # Execute a star query with this file, for example 
> {code:sql}
> select * from dfs.`/path/to/your/file/example.json`
> {code}
> # Go to the Profile page on the UI, and open info about your query
> # Open JSON profile



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5948) The wrong number of batches is displayed

2017-11-09 Thread Vlad (JIRA)
Vlad created DRILL-5948:
---

 Summary: The wrong number of batches is displayed
 Key: DRILL-5948
 URL: https://issues.apache.org/jira/browse/DRILL-5948
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Vlad


I suppose that when you execute a query with a small amount of data, Drill must 
create 1 batch, but here you can see that Drill created 2 batches. I think it's 
wrong behaviour for Drill. The full JSON file is in the attachment.
{code:html}
"fragmentProfile": [
{
"majorFragmentId": 0,
"minorFragmentProfile": [
{
"state": 3,
"minorFragmentId": 0,
"operatorProfile": [
{
"inputProfile": [
{
"records": 1,
"batches": 2,
"schemas": 1
}
],
"operatorId": 2,
"operatorType": 29,
"setupNanos": 0,
"processNanos": 1767363740,
"peakLocalMemoryAllocated": 639120,
"waitNanos": 25787
},
{code}

Step to reproduce:
# Create JSON file with 1 row
# Execute a star query with this file, for example 
{code:sql}
select * from dfs.`/path/to/your/file/example.json`
{code}
# Go to the Profile page on the UI, and open info about your query
# Open JSON profile



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246003#comment-16246003
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150019576
  
--- Diff: contrib/storage-kafka/pom.xml ---
@@ -0,0 +1,130 @@
+<?xml version="1.0"?>
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
--- End diff --

If the setting is necessary, it will be better to set it at the root pom.


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target supporting JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245865#comment-16245865
 ] 

ASF GitHub Bot commented on DRILL-5923:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149997750
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileUtil.java
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+import java.util.Collections;
+import java.util.Map;
+
+import com.google.common.collect.Maps;
+
+public class ProfileUtil {
+  // Mapping query state names to display names
+  private static final Map<String, String> queryStateDisplayName;
+
+  static {
+    Map<String, String> displayNames = Maps.newHashMap();
--- End diff --

1. Please use `Map<QueryState, String>` since you're already receiving 
`QueryState` as an input parameter in the method. 
Besides, it would guarantee you did not make a mistake writing the query state 
enum names.
2. `queryStateDisplayName` -> `queryStateDisplayNames`


> State of a successfully completed query shown as "COMPLETED"
> 
>
> Key: DRILL-5923
> URL: https://issues.apache.org/jira/browse/DRILL-5923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
> Fix For: 1.12.0
>
>
> Drill UI currently lists a successfully completed query as "COMPLETED". 
> Successfully completed, failed and canceled queries are all grouped as 
> Completed queries. 
> It would be better to list the state of a successfully completed query as 
> "Succeeded" to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245883#comment-16245883
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user akumarb2010 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150002516
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/DrillKafkaConfig.java
 ---
@@ -0,0 +1,31 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+public class DrillKafkaConfig {
+
+  /**
+   * Timeout for fetching messages from Kafka
--- End diff --

Thanks Paul, this is a very good point and it makes perfect sense to add 
them as Drill session options instead of Drill config properties. We are 
working on these changes.


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target supporting JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245866#comment-16245866
 ] 

ASF GitHub Bot commented on DRILL-5923:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149998367
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileUtil.java
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+import java.util.Collections;
+import java.util.Map;
+
+import com.google.common.collect.Maps;
+
+public class ProfileUtil {
+  // Mapping query state names to display names
+  private static final Map<String, String> queryStateDisplayName;
+
+  static {
+    Map<String, String> displayNames = Maps.newHashMap();
+    displayNames.put("STARTING", "Starting");
+    displayNames.put("RUNNING", "Running");
+    displayNames.put("COMPLETED", "Succeeded");
+    displayNames.put("CANCELED", "Canceled");
+    displayNames.put("FAILED", "Failed");
+    displayNames.put("CANCELLATION_REQUESTED", "Cancellation Requested");
+    displayNames.put("ENQUEUED", "Enqueued");
+    queryStateDisplayName = Collections.unmodifiableMap(displayNames);
+  }
+
+
+  /**
+   * Utility to return display name for query state
+   * @param queryState
+   * @return display string for query state
+   */
+  public static final String getQueryStateDisplayName(QueryState queryState) {
+    String state = queryState.name();
+    if (queryStateDisplayName.containsKey(state)) {
--- End diff --

This would be more optimal:
```
String state = queryStateDisplayNames.get(queryState);
if (state == null) {
  state = "Unknown State";
}
return state;
```
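
A runnable sketch of that lookup; the enum and the display names below are trimmed-down placeholders for `UserBitShared.QueryResult.QueryState` and the real map:
```
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class QueryStateDisplay {
  // Trimmed-down stand-in for UserBitShared.QueryResult.QueryState.
  enum QueryState { COMPLETED, FAILED }

  private static final Map<QueryState, String> QUERY_STATE_DISPLAY_NAMES;
  static {
    Map<QueryState, String> names = new HashMap<>();
    names.put(QueryState.COMPLETED, "Succeeded");
    names.put(QueryState.FAILED, "Failed");
    QUERY_STATE_DISPLAY_NAMES = Collections.unmodifiableMap(names);
  }

  // Single map lookup with a null check, as suggested above.
  static String getQueryStateDisplayName(QueryState queryState) {
    String state = QUERY_STATE_DISPLAY_NAMES.get(queryState);
    return state != null ? state : "Unknown State";
  }

  public static void main(String[] args) {
    System.out.println(getQueryStateDisplayName(QueryState.COMPLETED)); // Succeeded
  }
}
```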


> State of a successfully completed query shown as "COMPLETED"
> 
>
> Key: DRILL-5923
> URL: https://issues.apache.org/jira/browse/DRILL-5923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
> Fix For: 1.12.0
>
>
> Drill UI currently lists a successfully completed query as "COMPLETED". 
> Successfully completed, failed and canceled queries are all grouped as 
> Completed queries. 
> It would be better to list the state of a successfully completed query as 
> "Succeeded" to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5717) change some date time unit cases with specific timezone or Local

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5717:
---

Assignee: Arina Ielchiieva

> change some date time unit cases with specific timezone or Local
> 
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
>
> Some date time test cases, like JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. We 
> should make these test cases locale independent.
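
As an editorial illustration, one common pattern for making such tests locale independent is to pin a fixed default locale around each test and restore it afterwards (a minimal JUnit 4 sketch; the class name is hypothetical, not Drill code):

```java
import java.util.Locale;
import org.junit.After;
import org.junit.Before;

public class LocaleIndependentTestBase {
  private Locale savedLocale;

  @Before
  public void pinLocale() {
    // Remember the environment's locale, then force a fixed one for the test.
    savedLocale = Locale.getDefault();
    Locale.setDefault(Locale.US);
  }

  @After
  public void restoreLocale() {
    // Restore so other tests in the same JVM see the original locale.
    Locale.setDefault(savedLocale);
  }
}
```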



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5921) Counters metrics should be listed in table

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5921:

Labels: ready-to-commit  (was: )

> Counters metrics should be listed in table
> --
>
> Key: DRILL-5921
> URL: https://issues.apache.org/jira/browse/DRILL-5921
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Counter metrics are currently displayed as a json string in the Drill UI. They 
> should be listed in a table similar to the other metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5717) change some date time unit cases with specific timezone or Local

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5717:
---

Assignee: (was: Arina Ielchiieva)

> change some date time unit cases with specific timezone or Local
> 
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>  Labels: ready-to-commit
>
> Some date time test cases, like JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. We 
> should make these test cases locale independent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5717) change some date time unit cases with specific timezone or Local

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5717:

Labels: ready-to-commit  (was: )

> change some date time unit cases with specific timezone or Local
> 
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
>
> Some date time test cases, like JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. We 
> should make these test cases locale independent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5921) Counters metrics should be listed in table

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245835#comment-16245835
 ] 

ASF GitHub Bot commented on DRILL-5921:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1020
  
+1, LGTM.


> Counters metrics should be listed in table
> --
>
> Key: DRILL-5921
> URL: https://issues.apache.org/jira/browse/DRILL-5921
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Counter metrics are currently displayed as a json string in the Drill UI. They 
> should be listed in a table similar to the other metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5921) Counters metrics should be listed in table

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245834#comment-16245834
 ] 

ASF GitHub Bot commented on DRILL-5921:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1020#discussion_r149994673
  
--- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl ---
@@ -138,21 +154,14 @@
   });
 };
 
-function updateOthers(metrics) {
-  $.each(["counters", "meters"], function(i, key) {
-if(! $.isEmptyObject(metrics[key])) {
-  $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2));
-}
-  });
-};
-
 var update = function() {
   $.get("/status/metrics", function(metrics) {
 updateGauges(metrics.gauges);
 updateBars(metrics.gauges);
 if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, 
"timers");
 if(! $.isEmptyObject(metrics.histograms)) 
createTable(metrics.histograms, "histograms");
-updateOthers(metrics);
+if(! $.isEmptyObject(metrics.counters)) 
createCountersTable(metrics.counters);
+if(! $.isEmptyObject(metrics.meters)) 
$("#metersVal").html(JSON.stringify(metrics.meters, null, 2));
--- End diff --

Well, sounds good then, thanks for making the changes.


> Counters metrics should be listed in table
> --
>
> Key: DRILL-5921
> URL: https://issues.apache.org/jira/browse/DRILL-5921
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Counter metrics are currently displayed as a json string in the Drill UI. They 
> should be listed in a table similar to the other metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5717) change some date time unit cases with specific timezone or Local

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245836#comment-16245836
 ] 

ASF GitHub Bot commented on DRILL-5717:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/904
  
@weijietong, thanks for the pull request, +1


> change some date time unit cases with specific timezone or Local
> 
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>
> Some date time test cases, like JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. We 
> should make these test cases locale independent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5771) Fix serDe errors for format plugins

2017-11-09 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5771:
-
Reviewer: Timothy Farkas

> Fix serDe errors for format plugins
> ---
>
> Key: DRILL-5771
> URL: https://issues.apache.org/jira/browse/DRILL-5771
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.12.0
>
>
> Create unit tests to check that all storage format plugins can be 
> successfully serialized / deserialized.
> Usually this happens when a query has several major fragments. 
> One way to check serde is to generate the physical plan (generated as json) and 
> then submit it back to Drill.
> One example of found errors is described in the first comment. Another 
> example is described in DRILL-5166.
> *Serde issues:*
> 1. Could not obtain format plugin during deserialization
> A format plugin is created based on the format plugin configuration or its name. 
> On Drill start-up we load information about the available plugins (this is 
> reloaded each time a storage plugin is updated, which can be done only by an 
> admin).
> When a query is parsed, we try to get the plugin from the available ones; if we 
> cannot find one we try to [create 
> one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144]
> but in other query execution stages we always assume that the [plugin exists 
> based on the 
> configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].
> For example, during query parsing we had to create a format plugin on one node 
> based on the format configuration.
> Then we sent a major fragment to a different node where, using this format 
> configuration, we could not get the format plugin, and deserialization failed.
> To fix this problem we need to create the format plugin during query 
> deserialization if it is absent.
>   
> 2. Absent hashCode and equals.
> Format plugins are stored in a hash map where the key is the format plugin 
> config.
> Since some format plugin configs did not have overridden hashCode and 
> equals, we could not find the format plugin based on its configuration.
> 3. Named format plugin usage
> Named format plugin configs allow getting a format plugin by its name, for 
> configuration shared among all drillbits.
> They are used as aliases for pre-configured format plugins. Users with admin 
> privileges can modify them at runtime.
> Named format plugin configs are used instead of sending all non-default 
> parameters of the format plugin config; in this case only the name is sent.
> Their usage in a distributed system may cause race conditions.
> For example, 
> 1. Query is submitted. 
> 2. Parquet format plugin is created with the following configuration 
> (autoCorrectCorruptDates=>true).
> 3. Named format plugin config is serialized with the name 'parquet'.
> 4. Major fragment is sent to a different node.
> 5. Admin has changed the parquet configuration for the alias 'parquet' on all 
> nodes to autoCorrectCorruptDates=>false.
> 6. The named format is deserialized on the different node into a parquet format 
> plugin with configuration (autoCorrectCorruptDates=>false).
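
As an editorial sketch of issue 2, a format plugin config that can safely serve as a hash map key overrides equals and hashCode over all of its fields (the class and fields here are hypothetical, not an actual Drill config):

```java
import java.util.Objects;

// Hypothetical format plugin config used as a hash map key.
public class ExampleFormatConfig {
  private final String extension;
  private final boolean skipHeader;

  public ExampleFormatConfig(String extension, boolean skipHeader) {
    this.extension = extension;
    this.skipHeader = skipHeader;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    ExampleFormatConfig that = (ExampleFormatConfig) o;
    // Two configs are equal iff all configuration fields match.
    return skipHeader == that.skipHeader && Objects.equals(extension, that.extension);
  }

  @Override
  public int hashCode() {
    // Must be consistent with equals so hash map lookups succeed.
    return Objects.hash(extension, skipHeader);
  }
}
```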



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245783#comment-16245783
 ] 

ASF GitHub Bot commented on DRILL-5771:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1014
  
@ilooner can you please review this?


> Fix serDe errors for format plugins
> ---
>
> Key: DRILL-5771
> URL: https://issues.apache.org/jira/browse/DRILL-5771
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.12.0
>
>
> Create unit tests to check that all storage format plugins can be 
> successfully serialized / deserialized.
> Usually this happens when a query has several major fragments. 
> One way to check serde is to generate the physical plan (generated as json) and 
> then submit it back to Drill.
> One example of found errors is described in the first comment. Another 
> example is described in DRILL-5166.
> *Serde issues:*
> 1. Could not obtain format plugin during deserialization
> A format plugin is created based on the format plugin configuration or its name. 
> On Drill start-up we load information about the available plugins (this is 
> reloaded each time a storage plugin is updated, which can be done only by an 
> admin).
> When a query is parsed, we try to get the plugin from the available ones; if we 
> cannot find one we try to [create 
> one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144]
> but in other query execution stages we always assume that the [plugin exists 
> based on the 
> configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].
> For example, during query parsing we had to create a format plugin on one node 
> based on the format configuration.
> Then we sent a major fragment to a different node where, using this format 
> configuration, we could not get the format plugin, and deserialization failed.
> To fix this problem we need to create the format plugin during query 
> deserialization if it is absent.
>   
> 2. Absent hashCode and equals.
> Format plugins are stored in a hash map where the key is the format plugin 
> config.
> Since some format plugin configs did not have overridden hashCode and 
> equals, we could not find the format plugin based on its configuration.
> 3. Named format plugin usage
> Named format plugin configs allow getting a format plugin by its name, for 
> configuration shared among all drillbits.
> They are used as aliases for pre-configured format plugins. Users with admin 
> privileges can modify them at runtime.
> Named format plugin configs are used instead of sending all non-default 
> parameters of the format plugin config; in this case only the name is sent.
> Their usage in a distributed system may cause race conditions.
> For example, 
> 1. Query is submitted. 
> 2. Parquet format plugin is created with the following configuration 
> (autoCorrectCorruptDates=>true).
> 3. Named format plugin config is serialized with the name 'parquet'.
> 4. Major fragment is sent to a different node.
> 5. Admin has changed the parquet configuration for the alias 'parquet' on all 
> nodes to autoCorrectCorruptDates=>false.
> 6. The named format is deserialized on the different node into a parquet format 
> plugin with configuration (autoCorrectCorruptDates=>false).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits

2017-11-09 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5941:
-
Reviewer: Padma Penumarthy

> Skip header / footer logic works incorrectly for Hive tables when file has 
> several input splits
> ---
>
> Key: DRILL-5941
> URL: https://issues.apache.org/jira/browse/DRILL-5941
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> *To reproduce*
> 1. Create a csv file with two columns (key, value) and 329 rows, where the 
> first row is a header.
> The data file size should be greater than the chunk size of 256 MB. Copy the 
> file to the distributed file system.
> 2. Create table in Hive:
> {noformat}
> CREATE EXTERNAL TABLE `h_table`(
>   `key` bigint,
>   `value` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY ','
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'maprfs:/tmp/h_table'
> TBLPROPERTIES (
>  'skip.header.line.count'='1');
> {noformat}
> 3. Execute query {{select * from hive.h_table}} in Drill (query data using 
> Hive plugin). The result will return fewer rows than expected. The expected 
> result is 328 (total count minus one row for the header).
> *The root cause*
> Since the file is greater than the default chunk size, it's split into several 
> fragments, known as input splits. For example:
> {noformat}
> maprfs:/tmp/h_table/h_table.csv:0+268435456
> maprfs:/tmp/h_table/h_table.csv:268435457+492782112
> {noformat}
> TextHiveReader is responsible for handling skip header and / or footer logic.
> Currently Drill creates a reader [for each input 
> split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84]
>  and skip header and / or footer logic is applied for each input split, 
> though ideally the above-mentioned input splits should be read by one 
> reader so that skip header / footer logic is applied correctly.
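
As a rough editorial sketch of the intended fix (hypothetical types, not the actual Drill classes), input splits can be grouped by file path so that a single reader consumes all splits of one file:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitGrouper {
  // A split is identified as "path:start+length", e.g.
  // "maprfs:/tmp/h_table/h_table.csv:0+268435456"; the path part is the key.
  public static Map<String, List<String>> groupSplitsByFile(List<String> splits) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String split : splits) {
      String path = split.substring(0, split.lastIndexOf(':'));
      // All splits of the same file end up in one group, so one reader can
      // apply skip header / footer logic exactly once per file.
      groups.computeIfAbsent(path, k -> new ArrayList<>()).add(split);
    }
    return groups;
  }
}
```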



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245778#comment-16245778
 ] 

ASF GitHub Bot commented on DRILL-5941:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1030
  
@ppadma can you review this?


> Skip header / footer logic works incorrectly for Hive tables when file has 
> several input splits
> ---
>
> Key: DRILL-5941
> URL: https://issues.apache.org/jira/browse/DRILL-5941
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> *To reproduce*
> 1. Create a csv file with two columns (key, value) and 329 rows, where the 
> first row is a header.
> The data file size should be greater than the chunk size of 256 MB. Copy the 
> file to the distributed file system.
> 2. Create table in Hive:
> {noformat}
> CREATE EXTERNAL TABLE `h_table`(
>   `key` bigint,
>   `value` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY ','
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'maprfs:/tmp/h_table'
> TBLPROPERTIES (
>  'skip.header.line.count'='1');
> {noformat}
> 3. Execute query {{select * from hive.h_table}} in Drill (query data using 
> Hive plugin). The result will return fewer rows than expected. The expected 
> result is 328 (total count minus one row for the header).
> *The root cause*
> Since the file is greater than the default chunk size, it's split into several 
> fragments, known as input splits. For example:
> {noformat}
> maprfs:/tmp/h_table/h_table.csv:0+268435456
> maprfs:/tmp/h_table/h_table.csv:268435457+492782112
> {noformat}
> TextHiveReader is responsible for handling skip header and / or footer logic.
> Currently Drill creates a reader [for each input 
> split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84]
>  and skip header and / or footer logic is applied for each input split, 
> though ideally the above-mentioned input splits should be read by one 
> reader so that skip header / footer logic is applied correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5923) State of a successfully completed query shown as "COMPLETED"

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245731#comment-16245731
 ] 

ASF GitHub Bot commented on DRILL-5923:
---

Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149974787
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/QueryStateDisplayName.java
 ---
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+public class QueryStateDisplayName {
+  // Values should correspond to the QueryState enum in UserBitShared.proto
--- End diff --

@arina-ielchiieva 
yes, a map will definitely make it easier to visualize the mapping. 
Made the changes.


> State of a successfully completed query shown as "COMPLETED"
> 
>
> Key: DRILL-5923
> URL: https://issues.apache.org/jira/browse/DRILL-5923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
> Fix For: 1.12.0
>
>
> Drill UI currently lists a successfully completed query as "COMPLETED". 
> Successfully completed, failed and canceled queries are all grouped as 
> Completed queries. 
> It would be better to list the state of a successfully completed query as 
> "Succeeded" to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5921) Counters metrics should be listed in table

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245717#comment-16245717
 ] 

ASF GitHub Bot commented on DRILL-5921:
---

Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1020#discussion_r149972346
  
--- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl ---
@@ -138,21 +154,14 @@
   });
 };
 
-function updateOthers(metrics) {
-  $.each(["counters", "meters"], function(i, key) {
-if(! $.isEmptyObject(metrics[key])) {
-  $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2));
-}
-  });
-};
-
 var update = function() {
   $.get("/status/metrics", function(metrics) {
 updateGauges(metrics.gauges);
 updateBars(metrics.gauges);
 if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, 
"timers");
 if(! $.isEmptyObject(metrics.histograms)) 
createTable(metrics.histograms, "histograms");
-updateOthers(metrics);
+if(! $.isEmptyObject(metrics.counters)) 
createCountersTable(metrics.counters);
+if(! $.isEmptyObject(metrics.meters)) 
$("#metersVal").html(JSON.stringify(metrics.meters, null, 2));
--- End diff --

@arina-ielchiieva,
I have considered reusing existing methods before deciding to have a 
separate method.
With the above suggestion, the table will now look as below:

drill.connections.rpc.control.encrypted|  {count: 0}

'|' here is the column delimiter. Do we want to display only the number in the 
second column or a key/value pair? 
I just wanted it to be consistent with the other metrics tables. (so I 
print value.count) 

Removed meters section. 


> Counters metrics should be listed in table
> --
>
> Key: DRILL-5921
> URL: https://issues.apache.org/jira/browse/DRILL-5921
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
>Priority: Minor
> Fix For: 1.12.0
>
>
> Counter metrics are currently displayed as a json string in the Drill UI. They 
> should be listed in a table similar to the other metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5941:

Fix Version/s: (was: 1.12.0)
   Future

> Skip header / footer logic works incorrectly for Hive tables when file has 
> several input splits
> ---
>
> Key: DRILL-5941
> URL: https://issues.apache.org/jira/browse/DRILL-5941
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> *To reproduce*
> 1. Create a csv file with two columns (key, value) and 329 rows, where the 
> first row is a header.
> The data file size should be greater than the chunk size of 256 MB. Copy the 
> file to the distributed file system.
> 2. Create table in Hive:
> {noformat}
> CREATE EXTERNAL TABLE `h_table`(
>   `key` bigint,
>   `value` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY ','
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'maprfs:/tmp/h_table'
> TBLPROPERTIES (
>  'skip.header.line.count'='1');
> {noformat}
> 3. Execute query {{select * from hive.h_table}} in Drill (query data using 
> Hive plugin). The result will return fewer rows than expected. The expected 
> result is 328 (total count minus one row for the header).
> *The root cause*
> Since the file is greater than the default chunk size, it's split into several 
> fragments, known as input splits. For example:
> {noformat}
> maprfs:/tmp/h_table/h_table.csv:0+268435456
> maprfs:/tmp/h_table/h_table.csv:268435457+492782112
> {noformat}
> TextHiveReader is responsible for handling skip header and / or footer logic.
> Currently Drill creates a reader [for each input 
> split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84]
>  and skip header and / or footer logic is applied for each input split, 
> though ideally the above-mentioned input splits should be read by one 
> reader so that skip header / footer logic is applied correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245686#comment-16245686
 ] 

ASF GitHub Bot commented on DRILL-5941:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1030

DRILL-5941: Skip header / footer improvements for Hive storage plugin

Overview:
1. When a table has a header / footer, process input splits of the same file in 
one reader (bug fix for DRILL-5941).
2. Apply skip header logic only once, during reader initialization, to avoid 
checks while reading the data (DRILL-5106).
3. Apply skip footer logic only when the footer count is more than 0; otherwise 
default processing is done without buffering data in a queue (DRILL-5106).

Code changes:
1. AbstractReadersInitializer was introduced to factor out common logic 
during reader initialization.
It will have three implementations:
a. Default (each input split gets its own reader);
b. Empty (for empty tables);
c. InputSplitGroups (applied when the table has a header / footer and input 
splits of the same file should be processed together).

2. AbstractRecordsInspector was introduced to improve performance when 
the table footer count is less than or equal to 0.
It will have two implementations:
a. Default (records are processed one by one without buffering);
b. SkipFooter (a queue is used to buffer the N records that should be 
skipped at the end of file processing).

3. Allow HiveAbstractReader to have multiple input splits.
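
A minimal editorial sketch of the SkipFooter buffering idea from point 2b (the class shape is hypothetical, not the actual Drill code): the last N records are held back in a bounded queue, so the footer rows are never emitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SkipFooterBuffer<T> {
  private final int footerLineCount;
  private final Deque<T> buffer = new ArrayDeque<>();

  public SkipFooterBuffer(int footerLineCount) {
    this.footerLineCount = footerLineCount;
  }

  // Buffers the incoming record; returns a record that is safe to emit,
  // or null while the buffer is still filling up. Any record returned is
  // guaranteed not to be among the final footerLineCount records of the file.
  public T offer(T record) {
    buffer.addLast(record);
    return buffer.size() > footerLineCount ? buffer.removeFirst() : null;
  }
}
```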

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5941

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1030






> Skip header / footer logic works incorrectly for Hive tables when file has 
> several input splits
> ---
>
> Key: DRILL-5941
> URL: https://issues.apache.org/jira/browse/DRILL-5941
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.12.0
>
>
> *To reproduce*
> 1. Create a csv file with two columns (key, value) and 329 rows, where the 
> first row is a header.
> The data file size should be greater than the chunk size of 256 MB. Copy the 
> file to the distributed file system.
> 2. Create table in Hive:
> {noformat}
> CREATE EXTERNAL TABLE `h_table`(
>   `key` bigint,
>   `value` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY ','
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'maprfs:/tmp/h_table'
> TBLPROPERTIES (
>  'skip.header.line.count'='1');
> {noformat}
> 3. Execute query {{select * from hive.h_table}} in Drill (query data using 
> Hive plugin). The result will return fewer rows than expected. The expected 
> result is 328 (total count minus one row for the header).
> *The root cause*
> Since the file is greater than the default chunk size, it's split into several 
> fragments, known as input splits. For example:
> {noformat}
> maprfs:/tmp/h_table/h_table.csv:0+268435456
> maprfs:/tmp/h_table/h_table.csv:268435457+492782112
> {noformat}
> TextHiveReader is responsible for handling skip header and / or footer logic.
> Currently Drill creates a reader [for each input 
> split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84]
>  and skip header and / or footer logic is applied for each input split, 
> though ideally the above-mentioned input splits should be read by one 
> reader so that skip header / footer logic is applied correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5919:

Description: 
Add session options to allow Drill to work with non-standard JSON number 
literals like NaN, Infinity, and -Infinity. By default these options will 
be switched off; the user will be able to toggle them during a working session.

*For documentation*
1. Added two session options {{store.json.reader.non_numeric_numbers}} and 
{{store.json.writer.non_numeric_numbers}} that allow reading/writing NaN and 
Infinity as numbers. By default these options are set to false.

2. Extended signature of {{convert_toJSON}} and {{convert_fromJSON}} functions 
by adding second optional parameter that enables read/write NaN and Infinity.
For example:
{noformat}
select convert_fromJSON('{"key": NaN}') from (values(1)); will result in a 
JsonParseException, but
select convert_fromJSON('{"key": NaN}', true) from (values(1)); will parse NaN 
as a number.
{noformat}

  was:Add session options to allow Drill to work with non-standard JSON number 
literals like NaN, Infinity, and -Infinity. By default these options will be 
switched off; the user will be able to toggle them during a working session.


> Add non-numeric support for JSON processing
> ---
>
> Key: DRILL-5919
> URL: https://issues.apache.org/jira/browse/DRILL-5919
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>  Labels: doc-impacting, ready-to-commit
> Fix For: Future
>
>
> Add session options to allow Drill to work with non-standard JSON number 
> literals like NaN, Infinity, and -Infinity. By default these options will 
> be switched off; the user will be able to toggle them during a working session.
> *For documentation*
> 1. Added two session options {{store.json.reader.non_numeric_numbers}} and 
> {{store.json.writer.non_numeric_numbers}} that allow reading/writing NaN and 
> Infinity as numbers. By default these options are set to false.
> 2. Extended signature of {{convert_toJSON}} and {{convert_fromJSON}} 
> functions by adding second optional parameter that enables read/write NaN and 
> Infinity.
> For example:
> {noformat}
> select convert_fromJSON('{"key": NaN}') from (values(1)); will result in a 
> JsonParseException, but
> select convert_fromJSON('{"key": NaN}', true) from (values(1)); will parse 
> NaN as a number.
> {noformat}
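
For illustration, the reader side of this behavior maps onto Jackson's non-numeric-numbers switch (a minimal standalone sketch, assuming a Jackson-based JSON reader; this is not Drill code):

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NonNumericDemo {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // Without this feature, the NaN / Infinity tokens below fail to parse.
    mapper.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, true);
    JsonNode node = mapper.readTree("{\"nan\": NaN, \"inf\": Infinity}");
    System.out.println(node.get("nan").doubleValue()); // NaN
    System.out.println(node.get("inf").doubleValue()); // Infinity
  }
}
```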



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5919:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Add non-numeric support for JSON processing
> ---
>
> Key: DRILL-5919
> URL: https://issues.apache.org/jira/browse/DRILL-5919
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>  Labels: doc-impacting, ready-to-commit
> Fix For: Future
>
>
> Add session options to allow Drill to work with non-standard JSON number 
> literals like NaN, Infinity, and -Infinity. By default these options will 
> be switched off; the user will be able to toggle them during a working session.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5919) Add non-numeric support for JSON processing

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245650#comment-16245650
 ] 

ASF GitHub Bot commented on DRILL-5919:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1026
  
Thanks, +1, LGTM.


> Add non-numeric support for JSON processing
> ---
>
> Key: DRILL-5919
> URL: https://issues.apache.org/jira/browse/DRILL-5919
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>  Labels: doc-impacting
> Fix For: Future
>
>
> Add session options to allow Drill to work with non-standard JSON number 
> literals like NaN, Infinity, and -Infinity. By default these options will 
> be switched off; the user will be able to toggle them during a working session.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5717) change some date time unit cases with specific timezone or Local

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245620#comment-16245620
 ] 

ASF GitHub Bot commented on DRILL-5717:
---

Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/904
  
done


> change some date time unit cases with specific timezone or Local
> 
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>
> Some date time test cases, like JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. We 
> should make these test cases locale independent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite master branch

2017-11-09 Thread Roman Kulyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245614#comment-16245614
 ] 

Roman Kulyk commented on DRILL-3993:


There are 9 errors left in the java-exec test suite:
{noformat}
TestUnionDistinct.testDiffDataTypesAndModes:288->BaseTestQuery.testRunAndReturn:360
 » Rpc
TestUnionAll.testDiffDataTypesAndModes:272->BaseTestQuery.testRunAndReturn:360 
» Rpc
TestFunctionsWithTypeExpoQueries.testEqualBetweenIntervalAndTimestampDiff:403->BaseTestQuery.testRunAndReturn:360
 » 
TestExampleQueries.testDRILL_3004:1036->BaseTestQuery.testRunAndReturn:360 » Rpc
TestExampleQueries.testFilterInSubqueryAndOutside » UserRemote DATA_READ 
ERROR...
TestNestedLoopJoin.testNLJWithEmptyBatch:229->BaseTestQuery.testRunAndReturn:360
 » Rpc
TestSqlBracketlessSyntax.checkComplexExpressionParsing:54 » NoClassDefFound 
co...
TestDateTruncFunctions.dateTruncOnIntervalDay:301->BaseTestQuery.testRunAndReturn:360
 » Rpc
TestUtf8SupportInQueryString.testDisableUtf8SupportInQueryString »  
Unexpected...
{noformat}

> Rebase Drill on Calcite master branch
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>Assignee: Roman Kulyk
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5919) Add non-numeric support for JSON processing

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245416#comment-16245416
 ] 

ASF GitHub Bot commented on DRILL-5919:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1026#discussion_r149903182
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestJsonNonNumerics.java
 ---
@@ -0,0 +1,167 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.drill.exec.vector.complex.writer;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.BaseTestQuery;
+import org.apache.drill.common.exceptions.UserRemoteException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.RecordBatchLoader;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.rpc.user.QueryDataBatch;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.List;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.*;
+
+public class TestJsonNonNumerics extends BaseTestQuery {
+
+  @Test
+  public void testNonNumericSelect() throws Exception {
+File file = new File(getTempDir(""), "nan_test.json");
+String json = "{\"nan\":NaN, \"inf\":Infinity}";
+String query = String.format("select * from 
dfs.`%s`",file.getAbsolutePath());
+try {
+  FileUtils.writeStringToFile(file, json);
+  test("alter session set `store.json.reader.non_numeric_numbers` = 
true");
+  testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns("nan", "inf")
+.baselineValues(Double.NaN, Double.POSITIVE_INFINITY)
+.build()
+.run();
+} finally {
+  test("alter session reset `store.json.reader.non_numeric_numbers`");
+  FileUtils.deleteQuietly(file);
+}
+  }
+
+  @Test(expected = UserRemoteException.class)
+  public void testNonNumericFailure() throws Exception {
+File file = new File(getTempDir(""), "nan_test.json");
+test("alter session set `store.json.reader.non_numeric_numbers` = 
false");
+String json = "{\"nan\":NaN, \"inf\":Infinity}";
+try {
+  FileUtils.writeStringToFile(file, json);
+  test("select * from dfs.`%s`;", file.getAbsolutePath());
+} catch (UserRemoteException e) {
+  assertThat(e.getMessage(), containsString("Error parsing JSON"));
+  throw e;
+} finally {
+  test("alter session reset `store.json.reader.non_numeric_numbers`");
+  FileUtils.deleteQuietly(file);
+}
+  }
+
+  @Test
+  public void testCreateTableNonNumerics() throws Exception {
+File file = new File(getTempDir(""), "nan_test.json");
+String json = "{\"nan\":NaN, \"inf\":Infinity}";
+String tableName = "ctas_test";
+try {
+  FileUtils.writeStringToFile(file, json);
+  test("alter session set `store.json.reader.non_numeric_numbers` = 
true");
+  test("alter session set `store.json.writer.non_numeric_numbers` = 
true");
+  test("alter session set `store.format`='json'");
+  test("create table dfs_test.tmp.`%s` as select * from dfs.`%s`;", 
tableName, file.getAbsolutePath());
+
+  // ensuring that `NaN` and `Infinity` tokens ARE NOT enclosed with 
double quotes
+  File resultFile = new File(new 
File(getDfsTestTmpSchemaLocation(),tableName),"0_0_0.json");
+  String resultJson = FileUtils.readFileToString(resultFile);
+  int nanIndex = resultJson.indexOf("NaN");
+  assertFalse("`NaN` must not be enclosed with \"\" ", 
resultJson.charAt(nanIndex - 1) == '"');
+  assertFalse("`NaN` must not be enclosed with \"\" ", 

[jira] [Commented] (DRILL-5919) Add non-numeric support for JSON processing

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245417#comment-16245417
 ] 

ASF GitHub Bot commented on DRILL-5919:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1026#discussion_r149903705
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestJsonNonNumerics.java
 ---
@@ -0,0 +1,167 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.drill.exec.vector.complex.writer;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.BaseTestQuery;
+import org.apache.drill.common.exceptions.UserRemoteException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.RecordBatchLoader;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.rpc.user.QueryDataBatch;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.List;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.*;
+
+public class TestJsonNonNumerics extends BaseTestQuery {
+
+  @Test
+  public void testNonNumericSelect() throws Exception {
+File file = new File(getTempDir(""), "nan_test.json");
--- End diff --

It's better to pass a dir name as well, rather than an empty string. Ex: 
`getTempDir("test_nan")`


> Add non-numeric support for JSON processing
> ---
>
> Key: DRILL-5919
> URL: https://issues.apache.org/jira/browse/DRILL-5919
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>  Labels: doc-impacting
> Fix For: Future
>
>
> Add session options to allow Drill to work with non-standard JSON number 
> literals like NaN, Infinity, and -Infinity. By default these options will 
> be switched off; the user will be able to toggle them during a working session.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5919:

Labels: doc-impacting  (was: )

> Add non-numeric support for JSON processing
> ---
>
> Key: DRILL-5919
> URL: https://issues.apache.org/jira/browse/DRILL-5919
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>  Labels: doc-impacting
> Fix For: Future
>
>
> Add session options to allow Drill to work with non-standard JSON number 
> literals like NaN, Infinity, and -Infinity. By default these options will 
> be switched off; the user will be able to toggle them during a working session.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5919:

Reviewer: Arina Ielchiieva

> Add non-numeric support for JSON processing
> ---
>
> Key: DRILL-5919
> URL: https://issues.apache.org/jira/browse/DRILL-5919
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>  Labels: doc-impacting
> Fix For: Future
>
>
> Add session options to allow Drill to work with non-standard JSON number 
> literals like NaN, Infinity, and -Infinity. By default these options will 
> be switched off; the user will be able to toggle them during a working session.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5863) Sortable table incorrectly sorts minor fragments and time elements lexically instead of sorting by implicit value

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5863:
---

Assignee: Kunal Khatua  (was: Arina Ielchiieva)

> Sortable table incorrectly sorts minor fragments and time elements lexically 
> instead of sorting by implicit value
> -
>
> Key: DRILL-5863
> URL: https://issues.apache.org/jira/browse/DRILL-5863
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> The fix for this is to use the dataTable library's {{data-order}} attribute for 
> the data elements that need to sort by an implicit value.
> ||Old order of Minor Fragment||New order of Minor Fragment||
> |...|...|
> |01-09-01  | 01-09-01|
> |01-10-01  | 01-10-01|
> |01-100-01 | 01-11-01|
> |01-101-01 | 01-12-01|
> |... | ... |
> ||Old order of Duration||New order of Duration||
> |...|...|
> |1m15s  | 55.03s|
> |55s  | 1m15s|
> |...|...|
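
For illustration, a minimal sketch of a hypothetical helper (not the actual patch) that emits the implicit sort value through the {{data-order}} attribute, so DataTables sorts by milliseconds rather than by the display text:

```java
public class DurationCell {
  // Render a table cell whose sort key is the numeric duration in millis,
  // while the visible content stays human readable.
  public static String render(long millis, String displayText) {
    return String.format("<td data-order=\"%d\">%s</td>", millis, displayText);
  }

  public static void main(String[] args) {
    System.out.println(render(55030, "55.03s")); // <td data-order="55030">55.03s</td>
    System.out.println(render(75000, "1m15s"));  // <td data-order="75000">1m15s</td>
  }
}
```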



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (DRILL-5863) Sortable table incorrectly sorts minor fragments and time elements lexically instead of sorting by implicit value

2017-11-09 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reopened DRILL-5863:
-
  Assignee: Arina Ielchiieva  (was: Kunal Khatua)

> Sortable table incorrectly sorts minor fragments and time elements lexically 
> instead of sorting by implicit value
> -
>
> Key: DRILL-5863
> URL: https://issues.apache.org/jira/browse/DRILL-5863
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> The fix for this is to use the dataTable library's {{data-order}} attribute for 
> the data elements that need to sort by an implicit value.
> ||Old order of Minor Fragment||New order of Minor Fragment||
> |...|...|
> |01-09-01  | 01-09-01|
> |01-10-01  | 01-10-01|
> |01-100-01 | 01-11-01|
> |01-101-01 | 01-12-01|
> |... | ... |
> ||Old order of Duration||New order of Duration||
> |...|...|
> |1m15s  | 55.03s|
> |55s  | 1m15s|
> |...|...|



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245378#comment-16245378
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r149893459
  
--- Diff: 
contrib/storage-kafka/src/test/java/org/apache/drill/exec/store/kafka/cluster/EmbeddedZKQuorum.java
 ---
@@ -0,0 +1,83 @@
+/**
--- End diff --

The Apache header should be in the form of a comment, not Javadoc. Please update 
it here and in the other newly added files.
Hopefully somebody will add this to checkstyle so we won't have to remind about 
it all the time.


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> The initial implementation can target support for JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245379#comment-16245379
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r149893582
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java
 ---
@@ -343,4 +343,4 @@ public void close() {
   }
 }
   }
-}
+}
--- End diff --

Please revert changes in this file.


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> The initial implementation can target support for JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4779) Kafka storage plugin support

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245377#comment-16245377
 ] 

ASF GitHub Bot commented on DRILL-4779:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r149893057
  
--- Diff: contrib/storage-kafka/src/test/resources/logback-test.xml ---
@@ -0,0 +1,51 @@
+
--- End diff --

Please remove. We now have a common logging configuration for all modules in 
the drill-common module.


> Kafka storage plugin support
> 
>
> Key: DRILL-4779
> URL: https://issues.apache.org/jira/browse/DRILL-4779
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.11.0
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> The initial implementation can target support for JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5921) Counters metrics should be listed in table

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245358#comment-16245358
 ] 

ASF GitHub Bot commented on DRILL-5921:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1020#discussion_r149891566
  
--- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl ---
@@ -138,21 +154,14 @@
   });
 };
 
-function updateOthers(metrics) {
-  $.each(["counters", "meters"], function(i, key) {
-if(! $.isEmptyObject(metrics[key])) {
-  $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2));
-}
-  });
-};
-
 var update = function() {
   $.get("/status/metrics", function(metrics) {
 updateGauges(metrics.gauges);
 updateBars(metrics.gauges);
 if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, 
"timers");
 if(! $.isEmptyObject(metrics.histograms)) 
createTable(metrics.histograms, "histograms");
-updateOthers(metrics);
+if(! $.isEmptyObject(metrics.counters)) 
createCountersTable(metrics.counters);
+if(! $.isEmptyObject(metrics.meters)) 
$("#metersVal").html(JSON.stringify(metrics.meters, null, 2));
--- End diff --

@prasadns14
1. Thanks for adding the screenshots.
2. Most of the code in `createTable` and `createCountersTable` coincides. I 
suggest you make one function, for example with three parameters: 
`createTable(metric, name, addReportingClass)`. When you don't need to add the 
reporting class, you call this method with false. Our goal here is to generify 
existing methods rather than adding new specific ones with almost the same content.
3. If we don't have any meters, let's remove them.



> Counters metrics should be listed in table
> --
>
> Key: DRILL-5921
> URL: https://issues.apache.org/jira/browse/DRILL-5921
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
>Priority: Minor
> Fix For: 1.12.0
>
>
> Counter metrics are currently displayed as a json string in the Drill UI. They 
> should be listed in a table similar to the other metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

