[jira] [Commented] (DRILL-7016) Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped

2019-01-29 Thread Sorabh Hamirwasia (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755771#comment-16755771
 ] 

Sorabh Hamirwasia commented on DRILL-7016:
--

I looked into the issue further, and it is *not* caused by the filter on top of 
the RuntimeFilter operator. The bug is in how RuntimeFilter generation decides 
which left- and right-side fields of the join condition to use in the 
BloomFilter. It does not use the ordinals of the right keys; instead, for each 
left key it takes the right-side field name by position, starting at index 0. 
When the order of the filter and join conditions changes, the ordinals of the 
right-side fields change, and the bloom filter is generated for the wrong 
right-side field. For example, in case 1 below the bloom filter is generated on 
the right-side column c_mktsegment instead of c_custkey, whereas in case 2 it 
is generated on the right-side column c_custkey.

 
*Case 1:*
{code:java}
01-04 HashJoin(condition=[=($2, $0)], joinType=[inner], semi-join: =[false]) : 
rowType = RecordType(ANY o_custkey, ANY c_mktsegment, ANY c_custkey): rowcount 
= 1.5E7, cumulative cost = {6.3675E7 rows, 2.38725E8 cpu, 1.8E7 io, 3.87072E9 
network, 396.05 memory}, id = 65202{code}
1 row selected (3.654 seconds)
 
{code:java}
0: jdbc:drill:drillbits=10.10.100.188> select count(*) 
. . . . . . . . . . . . . . semicolon> from 
. . . . . . . . . . . . . . semicolon> customer c, 
. . . . . . . . . . . . . . semicolon> orders o 
. . . . . . . . . . . . . . semicolon> where c.c_mktsegment = 'HOUSEHOLD'
. . . . . . . . . . . . . . semicolon> and c.c_custkey = o.o_custkey;{code}
 
+---------+
| EXPR$0  |
+---------+
| 19826   |
+---------+
 
*Case 2:*
{code:java}
01-04 HashJoin(condition=[=($1, $0)], joinType=[inner], semi-join: =[false]) : 
rowType = RecordType(ANY o_custkey, ANY c_custkey, ANY c_mktsegment): rowcount 
= 1.5E7, cumulative cost = {6.3675E7 rows, 2.38725E8 cpu, 1.8E7 io, 3.87072E9 
network, 396.05 memory}, id = 66134{code}
1 row selected (1.328 seconds)
 
{code:java}
0: jdbc:drill:drillbits=10.10.100.188> select count(*)
. . . . . . . . . . . . . . semicolon> from 
. . . . . . . . . . . . . . semicolon> customer c,
. . . . . . . . . . . . . . semicolon> orders o
. . . . . . . . . . . . . . semicolon> where c.c_custkey = o.o_custkey and 
. . . . . . . . . . . . . . semicolon> c.c_mktsegment = 'HOUSEHOLD' 
. . . . . . . . . . . . . . semicolon> ;{code}
 
+----------+
|  EXPR$0  |
+----------+
| 2990828  |
+----------+
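
To make the two cases concrete, here is a minimal Java sketch. It is not Drill's RuntimeFilter code: the class and method names are hypothetical, and the only inputs taken from the plans above are the row types and the join conditions. It assumes the left input contributes a single field (o_custkey), so a right key's ordinal within the right input is its index in the join condition minus 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Drill.
class RightKeySketch {

    // Behaviour described as buggy: for the i-th join key, take the i-th
    // right-side field by position, ignoring the key's ordinal.
    static List<String> byPosition(List<String> rightFields, int keyCount) {
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < keyCount; i++) {
            picked.add(rightFields.get(i));
        }
        return picked;
    }

    // Ordinal-based lookup: translate each right key index from the join
    // condition into an offset within the right input.
    static List<String> byOrdinal(List<String> rightFields, int leftFieldCount,
                                  int[] rightKeyIndexes) {
        List<String> picked = new ArrayList<>();
        for (int index : rightKeyIndexes) {
            picked.add(rightFields.get(index - leftFieldCount));
        }
        return picked;
    }

    public static void main(String[] args) {
        // Case 1: condition =($2, $0), row type (o_custkey, c_mktsegment, c_custkey)
        List<String> rightCase1 = Arrays.asList("c_mktsegment", "c_custkey");
        System.out.println(byPosition(rightCase1, 1));
        System.out.println(byOrdinal(rightCase1, 1, new int[] {2}));

        // Case 2: condition =($1, $0), row type (o_custkey, c_custkey, c_mktsegment)
        List<String> rightCase2 = Arrays.asList("c_custkey", "c_mktsegment");
        System.out.println(byPosition(rightCase2, 1));
        System.out.println(byOrdinal(rightCase2, 1, new int[] {1}));
    }
}
```

In case 1 the positional lookup picks c_mktsegment while the ordinal lookup picks c_custkey; in case 2 both pick c_custkey, which matches the observation that only the first query order returns the wrong count.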

> Wrong query result with RuntimeFilter enabled when order of join and filter 
> condition is swapped
> 
>
> Key: DRILL-7016
> URL: https://issues.apache.org/jira/browse/DRILL-7016
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.16.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.16.0
>
>
> Below 2 queries generate different results:
> *Query1: Result: 19826* 
> {code:java}
> select count(*)
> from 
>   customer c, 
>   orders o
> where 
>   c.c_mktsegment = 'HOUSEHOLD' 
>   and c.c_custkey = o.o_custkey 
> {code}
> *Query2: Result: 2990828* 
> {code:java}
> select count(*)
> from 
>   customer c, 
>   orders o
> where 
>   c.c_custkey = o.o_custkey and
>   c.c_mktsegment = 'HOUSEHOLD' 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7016) Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped

2019-01-29 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7016:
-
Reviewer: weijie.tong

> Wrong query result with RuntimeFilter enabled when order of join and filter 
> condition is swapped
> 
>
> Key: DRILL-7016
> URL: https://issues.apache.org/jira/browse/DRILL-7016
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.16.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.16.0
>
>





[jira] [Created] (DRILL-7016) Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped

2019-01-29 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-7016:


 Summary: Wrong query result with RuntimeFilter enabled when order 
of join and filter condition is swapped
 Key: DRILL-7016
 URL: https://issues.apache.org/jira/browse/DRILL-7016
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.16.0
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia
 Fix For: 1.16.0


Below 2 queries generate different results:

*Query1: Result: 19826* 
{code:java}
select count(*)
from 
  customer c, 
  orders o
where 
  c.c_mktsegment = 'HOUSEHOLD' 
  and c.c_custkey = o.o_custkey 
{code}


*Query2: Result: 2990828* 
{code:java}
select count(*)
from 
  customer c, 
  orders o
where 
  c.c_custkey = o.o_custkey and
  c.c_mktsegment = 'HOUSEHOLD' 
{code}






[jira] [Commented] (DRILL-7014) Format plugin for LTSV files

2019-01-29 Thread Charles Givre (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755547#comment-16755547
 ] 

Charles Givre commented on DRILL-7014:
--

[~priteshm] I can review this.  The initial PR didn't pass Travis CI, but I'll 
post comments anyway in the next day or so. 

> Format plugin for LTSV files
> 
>
> Key: DRILL-7014
> URL: https://issues.apache.org/jira/browse/DRILL-7014
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.15.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
> Fix For: 1.16.0
>
>
> I would like to contribute [this 
> plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill.
> h4. Abstract
> storage-plugins-override.conf
> {code:json}
> "storage":{
>   dfs: {
>     type: "file",
>     connection: "file:///",
>     formats: {
>       "ltsv": {
>         "type": "ltsv",
>         "extensions": [
>           "ltsv"
>         ]
>       }
>     },
>     enabled: true
>   }
> }
> {code}
> sample.ltsv
> {code}
> time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx  forwardedfor:-  req:GET 
> /v1/xxx HTTP/1.1  status:200  size:4968 referer:- ua:Java/1.8.0_131 
> reqtime:2.532 apptime:2.532 vhost:api.example.com
> time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx  forwardedfor:-  req:GET 
> /v1/yyy HTTP/1.1  status:200  size:412  referer:- ua:Java/1.8.0_201 
> reqtime:3.580 apptime:3.580 vhost:api.example.com
> {code}
> Run query
> {code:sh}
> root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded 
> Apache Drill 1.15.0
> "Drill must go on."
> 0: jdbc:drill:zk=local> SELECT * FROM 
> dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0;
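> +-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+
> |            time             |       host       | forwardedfor  |          req          | status  | size  | referer  |       ua        | reqtime  | apptime  |      vhost       |
> +-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+
> | 30/Nov/2016:00:56:37 +0900  | xxx.xxx.xxx.xxx  | -             | GET /v1/yyy HTTP/1.1  | 200     | 412   | -        | Java/1.8.0_201  | 3.580    | 3.580    | api.example.com  |
> +-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+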
> 1 row selected (6.074 seconds)
> 0: jdbc:drill:zk=local> 
> {code}
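
For readers unfamiliar with the format, the sketch below shows how one LTSV record can be split into label/value pairs. It is not the plugin's reader: the class name LtsvLineSketch and its parse method are hypothetical. It assumes the usual LTSV convention that fields are separated by tabs (the tabs in the sample above show up as spaces in this mail) and that a label ends at the first colon of its field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the drill-ltsv-plugin implementation.
class LtsvLineSketch {

    // Splits one record on tabs, then splits each field at its first colon.
    // Fields without a label (no colon, or a leading colon) are skipped.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String field : line.split("\t")) {
            int colon = field.indexOf(':');
            if (colon <= 0) {
                continue;
            }
            fields.put(field.substring(0, colon), field.substring(colon + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "time:30/Nov/2016:00:56:37 +0900\thost:xxx.xxx.xxx.xxx"
                + "\tstatus:200\treqtime:3.580";
        System.out.println(parse(record));
    }
}
```

Splitting at the first colon only is what keeps a value such as the timestamp, which itself contains colons, in one piece.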





[jira] [Updated] (DRILL-7014) Format plugin for LTSV files

2019-01-29 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-7014:
-
Fix Version/s: 1.16.0

> Format plugin for LTSV files
> 
>
> Key: DRILL-7014
> URL: https://issues.apache.org/jira/browse/DRILL-7014
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.15.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
> Fix For: 1.16.0
>
>





[jira] [Updated] (DRILL-7014) Format plugin for LTSV files

2019-01-29 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-7014:
-
Reviewer: Charles Givre

> Format plugin for LTSV files
> 
>
> Key: DRILL-7014
> URL: https://issues.apache.org/jira/browse/DRILL-7014
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.15.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
> Fix For: 1.16.0
>
>





[jira] [Assigned] (DRILL-7014) Format plugin for LTSV files

2019-01-29 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-7014:


Assignee: Takako Shimamoto

> Format plugin for LTSV files
> 
>
> Key: DRILL-7014
> URL: https://issues.apache.org/jira/browse/DRILL-7014
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.15.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
>





[jira] [Commented] (DRILL-7014) Format plugin for LTSV files

2019-01-29 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755545#comment-16755545
 ] 

Pritesh Maker commented on DRILL-7014:
--

[~cgivre] would you be able to review this contribution?

> Format plugin for LTSV files
> 
>
> Key: DRILL-7014
> URL: https://issues.apache.org/jira/browse/DRILL-7014
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.15.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
>





[jira] [Commented] (DRILL-6991) Kerberos ticket is being dumped in the log if log level is "debug" for stdout

2019-01-29 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755543#comment-16755543
 ] 

Pritesh Maker commented on DRILL-6991:
--

[~shamirwasia] do you recommend we close this issue?

> Kerberos ticket is being dumped in the log if log level is "debug" for stdout 
> --
>
> Key: DRILL-6991
> URL: https://issues.apache.org/jira/browse/DRILL-6991
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Priority: Major
>
> *Prerequisites:*
>  # Drill is installed on cluster with Kerberos security
>  # Into conf/logback.xml, set the following log level:
> {code:xml}
>   
> 
> 
>   
> {code}
> *Steps:*
> # Start Drill
> # Connect using sqlline using the following string:
> {noformat}
> bin/sqlline -u "jdbc:drill:zk=;principal="
> {noformat}
> *Expected result:*
> No sensitive information should be displayed
> *Actual result:*
> Kerberos  ticket and session key are being dumped into console output:
> {noformat}
> 14:35:38.806 [TGT Renewer for mapr/node1.cluster.com@NODE1] DEBUG 
> o.a.h.security.UserGroupInformation - Found tgt Ticket (hex) = 
> 0000: 61 82 01 3D 30 82 01 39   A0 03 02 01 05 A1 07 1B  a..=0..9
> 0010: 05 4E 4F 44 45 31 A2 1A   30 18 A0 03 02 01 02 A1  .NODE1..0...
> 0020: 11 30 0F 1B 06 6B 72 62   74 67 74 1B 05 4E 4F 44  .0...krbtgt..NOD
> 0030: 45 31 A3 82 01 0B 30 82   01 07 A0 03 02 01 12 A1  E10.
> 0040: 03 02 01 01 A2 81 FA 04   81 F7 03 8D A9 FA 7D 89  
> 0050: 1B DF 37 B7 4D E6 6C 99   3E 8F FA 48 D9 9A 79 F3  ..7.M.l.>..H..y.
> 0060: 92 34 7F BF 67 1E 77 4A   2F C9 AF 82 93 4E 46 1D  .4..g.wJ/NF.
> 0070: 41 74 B0 AF 41 A8 8B 02   71 83 CC 14 51 72 60 EE  At..A...q...Qr`.
> 0080: 29 67 14 F0 A6 33 63 07   41 AA 8D DC 7B 5B 41 F3  )g...3c.A[A.
> 0090: 83 48 8B 2A 0B 4D 6D 57   9A 6E CF 6B DC 0B C0 D1  .H.*.MmW.n.k
> 00A0: 83 BB 27 40 88 7E 9F 2B   D1 FD A8 6A E1 BF F6 CC  ..'@...+...j
> 00B0: 0E 0C FB 93 5D 69 9A 8B   11 88 0C F2 7C E1 FD 04  ]i..
> 00C0: F5 AB 66 0C A4 A4 7B 30   D1 7F F1 2D D6 A1 52 D1  ..f0...-..R.
> 00D0: 79 59 F2 06 CB 65 FB 73   63 1D 5B E9 4F 28 73 EB  yY...e.sc.[.O(s.
> 00E0: 72 7F 04 46 34 56 F4 40   6C C0 2C 39 C0 5B C6 25  r..F4V.@l.,9.[.%
> 00F0: ED EF 64 07 CE ED 35 9D   D7 91 6C 8F C9 CE 16 F5  ..d...5...l.
> 0100: CA 5E 6F DE 08 D2 68 30   C7 03 97 E7 C0 FF D9 52  .^o...h0...R
> 0110: F8 1D 2F DB 63 6D 12 4A   CD 60 AD D0 BA FA 4B CF  ../.cm.J.`K.
> 0120: 2C B9 8C CA 5A E6 EC 10   5A 0A 1F 84 B0 80 BD 39  ,...Z...Z..9
> 0130: 42 2C 33 EB C0 AA 0D 44   F0 F4 E9 87 24 43 BB 9A  B,3D$C..
> 0140: 52 R
> Client Principal = mapr/node1.cluster.com@NODE1
> Server Principal = krbtgt/NODE1@NODE1
> Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)=
> 0000: 50 DA D1 D7 91 D3 64 BE   45 7B D8 02 25 81 18 25  P.d.E...%..%
> 0010: DA 59 4F BA 76 67 BB 39   9C F7 17 46 A7 C5 00 E2  .YO.vg.9...F
> {noformat}





[jira] [Updated] (DRILL-6855) Query from non-existent proxy user fails with "No default schema selected" when impersonation is enabled

2019-01-29 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6855:
-
Fix Version/s: 1.16.0

> Query from non-existent proxy user fails with "No default schema selected" 
> when impersonation is enabled
> 
>
> Key: DRILL-6855
> URL: https://issues.apache.org/jira/browse/DRILL-6855
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Abhishek Ravi
>Assignee: Abhishek Ravi
>Priority: Major
> Fix For: 1.16.0
>
>
> A query from a *proxy user* fails with the following error when 
> *impersonation* is *enabled* but the user does not exist. This behaviour was 
> discovered when running Drill on MapR.
> {noformat}
> Error: VALIDATION ERROR: Schema [[dfs]] is not valid with respect to either 
> root schema or current default schema.
> Current default schema: No default schema selected
> {noformat}
> The above error is very confusing and makes it very hard to trace the failure 
> back to its cause: the proxy user does not exist while impersonation is 
> enabled. 
> The {{fs.access(wsPath, FsAction.READ)}} call in 
> {{WorkspaceSchemaFactory.accessible}} fails with {{IOException}}, which is not 
> handled in {{accessible}} but in {{DynamicRootSchema.loadSchemaFactory}}. At 
> this point none of the schemas are registered, and hence the root schema is 
> registered as the default schema. 
> The query execution continues and fails much later, at 
> {{DrillSqlWorker.getQueryPlan}}, where {{SqlConverter.validate}} 
> eventually calls {{SchemaUtilites.throwSchemaNotFoundException}}.
> One possible fix could be to handle {{IOException}} in the same way as 
> {{FileNotFoundException}} in {{WorkspaceSchemaFactory.accessible}}.
>  
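
The suggested fix can be sketched in plain Java as follows. This is not Drill's code: AccessibleSketch and AccessCheck are hypothetical stand-ins for WorkspaceSchemaFactory.accessible and the fs.access(wsPath, FsAction.READ) call, and the sketch assumes that the existing FileNotFoundException handling simply reports the workspace as not accessible.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical stand-in; this is not Drill's WorkspaceSchemaFactory.
class AccessibleSketch {

    // Stand-in for the fs.access(wsPath, FsAction.READ) call.
    interface AccessCheck {
        void run() throws IOException;
    }

    // Reports "not accessible" instead of propagating when the check fails.
    static boolean accessible(AccessCheck check) {
        try {
            check.run();
            return true;
        } catch (FileNotFoundException e) {
            // Assumed existing behaviour: a missing path is not accessible.
            return false;
        } catch (IOException e) {
            // Suggested addition: treat other I/O failures the same way.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accessible(() -> { }));
        System.out.println(accessible(() -> {
            throw new FileNotFoundException("no such path");
        }));
        System.out.println(accessible(() -> {
            throw new IOException("user lookup failed");
        }));
    }
}
```

With the extra catch, the failure is absorbed at the point where the workspace is checked rather than surfacing later in the caller; whether that is the right place to swallow it in Drill itself is a design question for the actual patch.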





[jira] [Created] (DRILL-7015) Improve documentation for PARTITION BY

2019-01-29 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7015:
---

 Summary: Improve documentation for PARTITION BY
 Key: DRILL-7015
 URL: https://issues.apache.org/jira/browse/DRILL-7015
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi
Assignee: Bridget Bevens
 Fix For: 1.16.0


The documentation for CREATE TABLE AS (CTAS) shows the syntax of the command 
without the optional PARTITION BY clause. That option is only mentioned later, 
under the usage notes.

*+_Suggestion_+*: Add this optional clause to the syntax (as is done for CREATE 
TEMPORARY TABLE (CTTAS)), and mention that the option is only applicable when 
storing in Parquet. 

In the documentation for CREATE TEMPORARY TABLE (CTTAS), the comment says:
{panel}
An optional parameter that can *only* be used to create temporary tables with 
the Parquet data format. 
{panel}
This can mistakenly be understood as "only for temporary tables". 
*_+Suggestion+_*: remove the "to create temporary tables" part (it is not 
needed, as it is implied by the context of that page).

*_+Last suggestion+_*: In the documentation for the PARTITION BY clause, add an 
example that uses the implicit column "filename" to demonstrate how the 
partitioning column puts each distinct value into a separate file. For example, 
add to the "Other Examples" section:
{noformat}
0: jdbc:drill:zk=local> select distinct r_regionkey, filename from mytable1;
+--------------+----------------+
| r_regionkey  |    filename    |
+--------------+----------------+
| 2            | 0_0_3.parquet  |
| 1            | 0_0_2.parquet  |
| 0            | 0_0_1.parquet  |
| 3            | 0_0_4.parquet  |
| 4            | 0_0_5.parquet  |
+--------------+----------------+
{noformat}





[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-29 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels: ready-to-commit  (was: )

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.





[jira] [Updated] (DRILL-7011) Allow hybrid model in the Row set-based scan framework

2019-01-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7011:

Description: 
As part of schema provisioning project we want to allow hybrid model for Row 
set-based scan framework, namely to allow to pass custom schema metadata which 
can be partial.

Currently schema provisioning has SchemaContainer class that contains the 
following information (can be obtained from metastore, schema file, table 
function):
1. schema represented by org.apache.drill.exec.record.metadata.TupleMetadata
2. properties represented by Map, can contain information if 
schema is strict or partial (default is partial) etc.

  was:
As part of schema provisioning project we want to allow hybrid model for Row 
set-based scan framework, namely to allow to pass custom schema metadata which 
can be partial.

Currently schema provisioning has SchemaContainer class that contains the 
following information (can be obtained from metastore, schema file, table 
function):
1. table schema represented by 
org.apache.drill.exec.record.metadata.TupleMetadata
2. table properties represented by Map, can contain information 
if schema is strict or partial (default is partial) etc.


> Allow hybrid model in the Row set-based scan framework
> --
>
> Key: DRILL-7011
> URL: https://issues.apache.org/jira/browse/DRILL-7011
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of schema provisioning project we want to allow hybrid model for Row 
> set-based scan framework, namely to allow to pass custom schema metadata 
> which can be partial.
> Currently schema provisioning has SchemaContainer class that contains the 
> following information (can be obtained from metastore, schema file, table 
> function):
> 1. schema represented by org.apache.drill.exec.record.metadata.TupleMetadata
> 2. properties represented by Map, can contain information if 
> schema is strict or partial (default is partial) etc.





[jira] [Updated] (DRILL-7011) Allow hybrid model in the Row set-based scan framework

2019-01-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7011:

Description: 
As part of schema provisioning project we want to allow hybrid model for Row 
set-based scan framework, namely to allow to pass custom schema metadata which 
can be partial.

Currently schema provisioning has SchemaContainer class that contains the 
following information (can be obtained from metastore, schema file, table 
function):
1. table schema represented by 
org.apache.drill.exec.record.metadata.TupleMetadata
2. table properties represented by Map, can contain information 
if schema is strict or partial (default is partial) etc.

  was:
As part of schema provisioning project we want to allow hybrid model for Row 
set-based scan framework, namely to allow to pass custom schema metadata which 
can be partial.

Currently schema provisioning has TableSchema class that contains the following 
information (can be obtained from metastore, schema file, table function):
1. table schema represented by 
org.apache.drill.exec.record.metadata.TupleMetadata
2. table properties represented by Map, can contain information 
if schema is strict or partial (default is partial) etc.


> Allow hybrid model in the Row set-based scan framework
> --
>
> Key: DRILL-7011
> URL: https://issues.apache.org/jira/browse/DRILL-7011
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of schema provisioning project we want to allow hybrid model for Row 
> set-based scan framework, namely to allow to pass custom schema metadata 
> which can be partial.
> Currently schema provisioning has SchemaContainer class that contains the 
> following information (can be obtained from metastore, schema file, table 
> function):
> 1. table schema represented by 
> org.apache.drill.exec.record.metadata.TupleMetadata
> 2. table properties represented by Map, can contain 
> information if schema is strict or partial (default is partial) etc.





[jira] [Reopened] (DRILL-7002) RuntimeFilter produce wrong results while setting exec.hashjoin.num_partitions=1

2019-01-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reopened DRILL-7002:
-

> RuntimeFilter produce wrong results while setting 
> exec.hashjoin.num_partitions=1
> 
>
> Key: DRILL-7002
> URL: https://issues.apache.org/jira/browse/DRILL-7002
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.16.0
>Reporter: weijie.tong
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>






[jira] [Assigned] (DRILL-7002) RuntimeFilter produce wrong results while setting exec.hashjoin.num_partitions=1

2019-01-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7002:
---

Assignee: weijie.tong  (was: Arina Ielchiieva)

> RuntimeFilter produce wrong results while setting 
> exec.hashjoin.num_partitions=1
> 
>
> Key: DRILL-7002
> URL: https://issues.apache.org/jira/browse/DRILL-7002
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.16.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>






[jira] [Assigned] (DRILL-7002) RuntimeFilter produce wrong results while setting exec.hashjoin.num_partitions=1

2019-01-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7002:
---

Assignee: Arina Ielchiieva  (was: weijie.tong)

> RuntimeFilter produce wrong results while setting 
> exec.hashjoin.num_partitions=1
> 
>
> Key: DRILL-7002
> URL: https://issues.apache.org/jira/browse/DRILL-7002
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.16.0
>Reporter: weijie.tong
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>






[jira] [Created] (DRILL-7014) Format plugin for LTSV files

2019-01-29 Thread Takako Shimamoto (JIRA)
Takako Shimamoto created DRILL-7014:
---

 Summary: Format plugin for LTSV files
 Key: DRILL-7014
 URL: https://issues.apache.org/jira/browse/DRILL-7014
 Project: Apache Drill
  Issue Type: New Feature
  Components: Storage - Other
Affects Versions: 1.15.0
Reporter: Takako Shimamoto


I would like to contribute [this 
plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill.

h4. Abstract
storage-plugins-override.conf
{code:json}
"storage":{
  dfs: {
    type: "file",
    connection: "file:///",
    formats: {
      "ltsv": {
        "type": "ltsv",
        "extensions": [
          "ltsv"
        ]
      }
    },
    enabled: true
  }
}
{code}
sample.ltsv
{code}
time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx  forwardedfor:-  req:GET /v1/xxx HTTP/1.1  status:200  size:4968 referer:- ua:Java/1.8.0_131 reqtime:2.532 apptime:2.532 vhost:api.example.com
time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx  forwardedfor:-  req:GET /v1/yyy HTTP/1.1  status:200  size:412  referer:- ua:Java/1.8.0_201 reqtime:3.580 apptime:3.580 vhost:api.example.com
{code}
Run query
{code:sh}
root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded 
Apache Drill 1.15.0
"Drill must go on."
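0: jdbc:drill:zk=local> SELECT * FROM dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0;
+-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+
|            time             |       host       | forwardedfor  |          req          | status  | size  | referer  |       ua        | reqtime  | apptime  |      vhost       |
+-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+
| 30/Nov/2016:00:56:37 +0900  | xxx.xxx.xxx.xxx  | -             | GET /v1/yyy HTTP/1.1  | 200     | 412   | -        | Java/1.8.0_201  | 3.580    | 3.580    | api.example.com  |
+-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+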
1 row selected (6.074 seconds)
0: jdbc:drill:zk=local> 
{code}
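
For readers unfamiliar with the format, here is a rough sketch of how one LTSV record like those in sample.ltsv breaks down: fields are separated by tabs, and each field is a label and a value separated by the first colon. This is not the plugin's implementation; the class name LtsvLineSketch and the choice to skip fields without a label are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LTSV line parser; not the code from the proposed plugin.
final class LtsvLineSketch {

  // Splits one record on tabs, then splits each field on its FIRST colon,
  // so values that contain colons (such as timestamps) stay intact.
  static Map<String, String> parse(String line) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String field : line.split("\t")) {
      int colon = field.indexOf(':');
      if (colon <= 0) {
        continue; // assumption: silently skip empty or unlabeled fields
      }
      fields.put(field.substring(0, colon), field.substring(colon + 1));
    }
    return fields;
  }
}
```

Splitting on the first colon only is what lets a value such as 30/Nov/2016:00:56:37 +0900 survive as a single time column in the query output above.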





[jira] [Updated] (DRILL-7006) Support type conversion shims in RowSetWriter

2019-01-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7006:

Labels: ready-to-commit  (was: )

> Support type conversion shims in RowSetWriter
> -
>
> Key: DRILL-7006
> URL: https://issues.apache.org/jira/browse/DRILL-7006
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> The {{RowSet}} tools include a {{RowSetWriter}} for populating a batch of 
> vectors. A set of "column writers" exists: one for each kind of vector. These 
> classes provide methods to write a value into a vector. For example, the 
> {{VarcharColumnWriter}} provides a {{setString()}} method to set the value.
> The current writers provide only "natural" conversions: from Java String to 
> Varchar, from Java Double to FLOAT8 and so on. That is, the methods 
> implemented for each type are those that provide a single, unambiguous 
> conversion.
> This ticket asks to add a translation layer: to allow, say, writing an Int 
> column using a String (parsed according to some rules), or converting from a 
> string to a Date using some format.
> The goal is not to provide the type conversions themselves; rather, it is to 
> provide a way to insert the type conversion "shim" on top of the "native" 
> column writer in a way that is transparent to code using the row set writer.
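
The shim idea can be sketched as a decorator over a column writer. The interface and class names below (ColumnWriterSketch, IntColumnWriterSketch, StringToIntShim) are hypothetical and much smaller than Drill's real column writer API; the parsing rule (trim, then Integer.parseInt) is likewise only an assumed example of "some rules".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified writer interface; not Drill's actual API.
interface ColumnWriterSketch {
  void setInt(int value);
  void setString(String value);
}

// Stand-in for a "native" INT column writer: only the natural conversion works.
final class IntColumnWriterSketch implements ColumnWriterSketch {
  final List<Integer> values = new ArrayList<>();

  @Override
  public void setInt(int value) {
    values.add(value);
  }

  @Override
  public void setString(String value) {
    throw new UnsupportedOperationException("INT column has no natural String conversion");
  }
}

// The shim: implements the same interface, delegates to the native writer,
// and adds the String -> INT conversion on top of it.
final class StringToIntShim implements ColumnWriterSketch {
  private final ColumnWriterSketch base;

  StringToIntShim(ColumnWriterSketch base) {
    this.base = base;
  }

  @Override
  public void setInt(int value) {
    base.setInt(value);
  }

  @Override
  public void setString(String value) {
    // Assumed conversion rule, for illustration only.
    base.setInt(Integer.parseInt(value.trim()));
  }
}
```

Because the shim exposes the same interface as the writer it wraps, calling code keeps using setString() or setInt() without knowing whether a conversion layer is present, which is the transparency the ticket describes.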


