[jira] [Updated] (DRILL-7113) Issue with filtering null values from MapRDB-JSON

2019-03-19 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-7113:
--
Labels: ready-to-commit  (was: )

> Issue with filtering null values from MapRDB-JSON
> -
>
> Key: DRILL-7113
> URL: https://issues.apache.org/jira/browse/DRILL-7113
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> When Drill queries documents from a MapR-DB JSON table that contain fields 
> with null values, it returns wrong results.
>  The issue was reproduced locally. Repro steps:
>  [1] Create a MapR-DB JSON table, say '/tmp/dmdb2/'.
> [2] Insert the following sample records into the table:
> {code:java}
> insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person", 
> "confidence": 0.24}'
> insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}'
> insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3", 
> "confidence": 0.54}'
> insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4", 
> "confidence": null}'
> {code}
> For the field 'confidence': document 1 has value 0.24, document 3 has value 
> 0.54, document 2 omits the field entirely, and document 4 has the field with 
> an explicit null value.
> [3] Query the table from DRILL.
>  *Query 1:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person   | 0.24|
> | person2  | null|
> | person3  | 0.54|
> | person4  | null|
> +--+-+
> 4 rows selected (0.2 seconds)
> {code}
> *Query 2:*
> {code:java}
> 0: jdbc:drill:> select * from dfs.tmp.dmdb2;
> +--+-+--+
> | _id  | confidence  |  label   |
> +--+-+--+
> | 1| 0.24| person   |
> | 2| null| person2  |
> | 3| 0.54| person3  |
> | 4| null| person4  |
> +--+-+--+
> 4 rows selected (0.174 seconds)
> {code}
> *Query 3:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence 
> is not null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person   | 0.24|
> | person3  | 0.54|
> | person4  | null|
> +--+-+
> 3 rows selected (0.192 seconds)
> {code}
> *Query 4:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence 
> is  null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person2  | null|
> +--+-+
> 1 row selected (0.262 seconds)
> {code}
> As you can see, Query 3, which asks for all documents whose confidence value 
> 'is not null', incorrectly returns a document with a null value (person4); 
> Query 4 correspondingly misses that document.
> *Other observation:*
>  Querying the same data using Drill without MapR-DB returns the correct 
> result.
>  For example, create 4 different JSON files with the following data:
> {"label": "person", "confidence": 0.24}
> {"label": "person2"}
> {"label": "person3", "confidence": 0.54}
> {"label": "person4", "confidence": null}
> Query it directly using DRILL:
> *Query 5:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person4  | null|
> | person3  | 0.54|
> | person2  | null|
> | person   | 0.24|
> +--+-+
> 4 rows selected (0.203 seconds)
> {code}
> *Query 6:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is 
> null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person4  | null|
> | person2  | null|
> +--+-+
> 2 rows selected (0.352 seconds)
> {code}
> *Query 7:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is 
> not null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person3  | 0.54|
> | person   | 0.24|
> +--+-+
> 2 rows selected (0.265 seconds)
> {code}
> As seen in Queries 6 and 7, Drill returns the correct results.
> I believe the issue is in the MapR-DB layer where it fetches the results.
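> One way to check how each reader surfaces the two kinds of null (a 
> diagnostic sketch, assuming Drill's built-in typeof() function; the output 
> is not verified here):
> {code:java}
> -- Compare the reported type for the missing field (doc 2) vs. the
> -- explicit JSON null (doc 4); a difference at the scan level would
> -- explain why the IS NOT NULL filter is evaluated incorrectly.
> select label, confidence, typeof(confidence) as conf_type
> from dfs.tmp.dmdb2;
> {code}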



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7121) TPCH 4 takes longer

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7121:
-

 Summary: TPCH 4 takes longer
 Key: DRILL-7121
 URL: https://issues.apache.org/jira/browse/DRILL-7121
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


Here is TPCH 4 with sf 100:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o

where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and 
  exists (
select
  *
from
  lineitem l
where
  l.l_orderkey = o.o_orderkey
  and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}

The plan changes when Statistics is disabled: a Hash Agg and a Broadcast 
Exchange are added. These two operators expand the number of rows coming from 
the lineitem table from 137M to 9B, which forces the hash join to use 6 GB of 
memory instead of 30 MB.
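A way to compare the two plans side by side (a sketch; assuming the 
Statistics feature is toggled by the `planner.statistics.use` session option, 
whose name is an assumption here):
{noformat}
-- plan with statistics disabled (option name assumed)
ALTER SESSION SET `planner.statistics.use` = false;
EXPLAIN PLAN FOR <TPCH 4 query above>;

-- plan with statistics enabled
ALTER SESSION SET `planner.statistics.use` = true;
EXPLAIN PLAN FOR <TPCH 4 query above>;
{noformat}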



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7120:
--
Summary: Query fails with ChannelClosedException when Statistics is 
disabled.  (was: Query fails with ChannelClosedException)

> Query fails with ChannelClosedException when Statistics is disabled.
> 
>
> Key: DRILL-7120
> URL: https://issues.apache.org/jira/browse/DRILL-7120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> TPCH query 5 fails at sf100 when Statistics is disabled.  Here is the query:
> {noformat}
> select
>   n.n_name,
>   sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
> from
>   customer c,
>   orders o,
>   lineitem l,
>   supplier s,
>   nation n,
>   region r
> where
>   c.c_custkey = o.o_custkey
>   and l.l_orderkey = o.o_orderkey
>   and l.l_suppkey = s.s_suppkey
>   and c.c_nationkey = s.s_nationkey
>   and s.s_nationkey = n.n_nationkey
>   and n.n_regionkey = r.r_regionkey
>   and r.r_name = 'EUROPE'
>   and o.o_orderdate >= date '1997-01-01'
>   and o.o_orderdate < date '1997-01-01' + interval '1' year
> group by
>   n.n_name
> order by
>   revenue desc;
> {noformat}
> This is the error from drillbit.log:
> {noformat}
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
> FINISHED
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED
> 2019-03-04 18:17:51,454 [BitServer-13] WARN  
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits.  Current Allocation: 262144.
> 2019-03-04 18:17:51,454 [BitServer-13] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2019-03-04 18:17:51,463 [BitServer-13] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: 
> org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating 
> buffer.
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
>  ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.ch

[jira] [Created] (DRILL-7123) TPCDS query 83 runs slower when Statistics is disabled

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7123:
-

 Summary: TPCDS query 83 runs slower when Statistics is disabled
 Key: DRILL-7123
 URL: https://issues.apache.org/jira/browse/DRILL-7123
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


Query is TPCDS 83 with sf 100:
{noformat}
WITH sr_items 
 AS (SELECT i_item_id   item_id, 
Sum(sr_return_quantity) sr_item_qty 
 FROM   store_returns, 
item, 
date_dim 
 WHERE  sr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND sr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id), 
 cr_items 
 AS (SELECT i_item_id   item_id, 
Sum(cr_return_quantity) cr_item_qty 
 FROM   catalog_returns, 
item, 
date_dim 
 WHERE  cr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND cr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id), 
 wr_items 
 AS (SELECT i_item_id   item_id, 
Sum(wr_return_quantity) wr_item_qty 
 FROM   web_returns, 
item, 
date_dim 
 WHERE  wr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND wr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id) 
SELECT sr_items.item_id, 
   sr_item_qty, 
   sr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 sr_dev, 
   cr_item_qty, 
   cr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 cr_dev, 
   wr_item_qty, 
   wr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 wr_dev, 
   ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
   average 
FROM   sr_items, 
   cr_items, 
   wr_items 
WHERE  sr_items.item_id = cr_items.item_id 
   AND sr_items.item_id = wr_items.item_id 
ORDER  BY sr_items.item_id, 
  sr_item_qty
LIMIT 100; 
{noformat}

The parallelism of major fragments 1 and 2 changes when Statistics is 
disabled: the number of minor fragments drops from 10 and 15, respectively, 
down to 3. The estimated rowcount for major fragment 2 changes from 1439754.0 
down to 287950.8.
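The planned rowcounts and parallelism can be inspected without running the 
query; a sketch using standard Drill EXPLAIN syntax:
{noformat}
EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR
<TPCDS 83 query above>;
{noformat}
This prints per-operator rowcount and cost estimates, which should expose the 
1439754.0 vs. 287950.8 difference for major fragment 2.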



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7121:
--
Summary: TPCH 4 takes longer when Statistics is disabled.  (was: TPCH 4 
takes longer)

> TPCH 4 takes longer when Statistics is disabled.
> 
>
> Key: DRILL-7121
> URL: https://issues.apache.org/jira/browse/DRILL-7121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Here is TPCH 4 with sf 100:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> The plan changes when Statistics is disabled: a Hash Agg and a Broadcast 
> Exchange are added. These two operators expand the number of rows coming 
> from the lineitem table from 137M to 9B, which forces the hash join to use 
> 6 GB of memory instead of 30 MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException when Statistics is disabled

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7120:
--
Summary: Query fails with ChannelClosedException when Statistics is 
disabled  (was: Query fails with ChannelClosedException when Statistics is 
disabled.)

> Query fails with ChannelClosedException when Statistics is disabled
> ---
>
> Key: DRILL-7120
> URL: https://issues.apache.org/jira/browse/DRILL-7120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> TPCH query 5 fails at sf100 when Statistics is disabled.  Here is the query:
> {noformat}
> select
>   n.n_name,
>   sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
> from
>   customer c,
>   orders o,
>   lineitem l,
>   supplier s,
>   nation n,
>   region r
> where
>   c.c_custkey = o.o_custkey
>   and l.l_orderkey = o.o_orderkey
>   and l.l_suppkey = s.s_suppkey
>   and c.c_nationkey = s.s_nationkey
>   and s.s_nationkey = n.n_nationkey
>   and n.n_regionkey = r.r_regionkey
>   and r.r_name = 'EUROPE'
>   and o.o_orderdate >= date '1997-01-01'
>   and o.o_orderdate < date '1997-01-01' + interval '1' year
> group by
>   n.n_name
> order by
>   revenue desc;
> {noformat}
> This is the error from drillbit.log:
> {noformat}
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
> FINISHED
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED
> 2019-03-04 18:17:51,454 [BitServer-13] WARN  
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits.  Current Allocation: 262144.
> 2019-03-04 18:17:51,454 [BitServer-13] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2019-03-04 18:17:51,463 [BitServer-13] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: 
> org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating 
> buffer.
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
>  ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]

[jira] [Assigned] (DRILL-7122) TPCDS queries 29 25 17 are slower when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou reassigned DRILL-7122:
-

 Assignee: Gautam Parai
Affects Version/s: 1.16.0
 Priority: Blocker  (was: Major)
Fix Version/s: 1.16.0
  Description: 
Here is query 29 with sf 100:
{noformat}
SELECT i_item_id, 
   i_item_desc, 
   s_store_id, 
   s_store_name, 
   Avg(ss_quantity)AS store_sales_quantity, 
   Avg(sr_return_quantity) AS store_returns_quantity, 
   Avg(cs_quantity)AS catalog_sales_quantity 
FROM   store_sales, 
   store_returns, 
   catalog_sales, 
   date_dim d1, 
   date_dim d2, 
   date_dim d3, 
   store, 
   item 
WHERE  d1.d_moy = 4 
   AND d1.d_year = 1998 
   AND d1.d_date_sk = ss_sold_date_sk 
   AND i_item_sk = ss_item_sk 
   AND s_store_sk = ss_store_sk 
   AND ss_customer_sk = sr_customer_sk 
   AND ss_item_sk = sr_item_sk 
   AND ss_ticket_number = sr_ticket_number 
   AND sr_returned_date_sk = d2.d_date_sk 
   AND d2.d_moy BETWEEN 4 AND 4 + 3 
   AND d2.d_year = 1998 
   AND sr_customer_sk = cs_bill_customer_sk 
   AND sr_item_sk = cs_item_sk 
   AND cs_sold_date_sk = d3.d_date_sk 
   AND d3.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) 
GROUP  BY i_item_id, 
  i_item_desc, 
  s_store_id, 
  s_store_name 
ORDER  BY i_item_id, 
  i_item_desc, 
  s_store_id, 
  s_store_name
LIMIT 100; 
{noformat}

The hash join order has changed.  As a result, one of the hash joins does not 
seem to reduce the number of rows significantly.
  Component/s: Query Planning & Optimization

> TPCDS queries 29 25 17 are slower when Statistics is disabled.
> --
>
> Key: DRILL-7122
> URL: https://issues.apache.org/jira/browse/DRILL-7122
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Here is query 29 with sf 100:
> {noformat}
> SELECT i_item_id, 
>i_item_desc, 
>s_store_id, 
>s_store_name, 
>Avg(ss_quantity)AS store_sales_quantity, 
>Avg(sr_return_quantity) AS store_returns_quantity, 
>Avg(cs_quantity)AS catalog_sales_quantity 
> FROM   store_sales, 
>store_returns, 
>catalog_sales, 
>date_dim d1, 
>date_dim d2, 
>date_dim d3, 
>store, 
>item 
> WHERE  d1.d_moy = 4 
>AND d1.d_year = 1998 
>AND d1.d_date_sk = ss_sold_date_sk 
>AND i_item_sk = ss_item_sk 
>AND s_store_sk = ss_store_sk 
>AND ss_customer_sk = sr_customer_sk 
>AND ss_item_sk = sr_item_sk 
>AND ss_ticket_number = sr_ticket_number 
>AND sr_returned_date_sk = d2.d_date_sk 
>AND d2.d_moy BETWEEN 4 AND 4 + 3 
>AND d2.d_year = 1998 
>AND sr_customer_sk = cs_bill_customer_sk 
>AND sr_item_sk = cs_item_sk 
>AND cs_sold_date_sk = d3.d_date_sk 
>AND d3.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) 
> GROUP  BY i_item_id, 
>   i_item_desc, 
>   s_store_id, 
>   s_store_name 
> ORDER  BY i_item_id, 
>   i_item_desc, 
>   s_store_id, 
>   s_store_name
> LIMIT 100; 
> {noformat}
> The hash join order has changed.  As a result, one of the hash joins does not 
> seem to reduce the number of rows significantly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7122) TPCDS queries 29 25 17 are slower when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7122:
-

 Summary: TPCDS queries 29 25 17 are slower when Statistics is 
disabled.
 Key: DRILL-7122
 URL: https://issues.apache.org/jira/browse/DRILL-7122
 Project: Apache Drill
  Issue Type: Bug
Reporter: Robert Hou






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7120:
--
Description: 
TPCH query 5 fails at sf100 when Statistics is disabled.  Here is the query:
{noformat}
select
  n.n_name,
  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
from
  customer c,
  orders o,
  lineitem l,
  supplier s,
  nation n,
  region r
where
  c.c_custkey = o.o_custkey
  and l.l_orderkey = o.o_orderkey
  and l.l_suppkey = s.s_suppkey
  and c.c_nationkey = s.s_nationkey
  and s.s_nationkey = n.n_nationkey
  and n.n_regionkey = r.r_regionkey
  and r.r_name = 'EUROPE'
  and o.o_orderdate >= date '1997-01-01'
  and o.o_orderdate < date '1997-01-01' + interval '1' year
group by
  n.n_name
order by
  revenue desc;
{noformat}

This is the error from drillbit.log:
{noformat}
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
FINISHED
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: 
State to report: FINISHED
2019-03-04 18:17:51,454 [BitServer-13] WARN  
o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
stream due to memory limits.  Current Allocation: 262144.
2019-03-04 18:17:51,454 [BitServer-13] ERROR o.a.drill.exec.rpc.data.DataServer 
- Out of memory in RPC layer.
2019-03-04 18:17:51,463 [BitServer-13] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /10.10.120.104:31012 <--> 
/10.10.120.106:53048 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: 
org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer.
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
 ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
 [netty-common-4.0.48.Final.jar:4.0.48.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
allocating buffer.
at 
io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:67)
 ~[drill-memory-base-1.16.0-SNAPSHOT.jar:4.0.

[jira] [Created] (DRILL-7120) Query fails with ChannelClosedException

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7120:
-

 Summary: Query fails with ChannelClosedException
 Key: DRILL-7120
 URL: https://issues.apache.org/jira/browse/DRILL-7120
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


TPCH query 5 fails at sf100.  Here is the query:
{noformat}
select
  n.n_name,
  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
from
  customer c,
  orders o,
  lineitem l,
  supplier s,
  nation n,
  region r
where
  c.c_custkey = o.o_custkey
  and l.l_orderkey = o.o_orderkey
  and l.l_suppkey = s.s_suppkey
  and c.c_nationkey = s.s_nationkey
  and s.s_nationkey = n.n_nationkey
  and n.n_regionkey = r.r_regionkey
  and r.r_name = 'EUROPE'
  and o.o_orderdate >= date '1997-01-01'
  and o.o_orderdate < date '1997-01-01' + interval '1' year
group by
  n.n_name
order by
  revenue desc;
{noformat}

This is the error from drillbit.log:
{noformat}
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
FINISHED
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: 
State to report: FINISHED
2019-03-04 18:17:51,454 [BitServer-13] WARN  
o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
stream due to memory limits.  Current Allocation: 262144.
2019-03-04 18:17:51,454 [BitServer-13] ERROR o.a.drill.exec.rpc.data.DataServer 
- Out of memory in RPC layer.
2019-03-04 18:17:51,463 [BitServer-13] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /10.10.120.104:31012 <--> 
/10.10.120.106:53048 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: 
org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer.
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
 ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
 [netty-common-4.0.48.Final.jar:4.0.48.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_1

[jira] [Updated] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-7048:

Labels: doc-impacting  (was: )

> Implement JDBC Statement.setMaxRows() with System Option
> 
>
> Key: DRILL-7048
> URL: https://issues.apache.org/jira/browse/DRILL-7048
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC, Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> With DRILL-6960, the web UI will get an auto-limit on the number of results 
> fetched.
> Since much of the plumbing is already there, it makes sense to provide the 
> same for the JDBC client.
> In addition, it would be nice if the server could have a pre-defined value 
> as well (default 0, i.e. no limit) so that an _admin_ could enforce a 
> maximum limit on the result-set size.
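> A minimal sketch of the intended client-side usage (standard JDBC API; the 
> connection URL and table are illustrative):
> {code:java}
> import java.sql.*;
>
> public class MaxRowsExample {
>   public static void main(String[] args) throws SQLException {
>     try (Connection conn = DriverManager.getConnection(
>              "jdbc:drill:zk=localhost:2181");
>          Statement stmt = conn.createStatement()) {
>       stmt.setMaxRows(100);  // cap the result set at 100 rows
>       try (ResultSet rs =
>                stmt.executeQuery("select * from cp.`employee.json`")) {
>         while (rs.next()) {
>           System.out.println(rs.getString(1));
>         }
>       }
>     }
>   }
> }
> {code}
> The proposed system option would then act as a server-side upper bound when 
> the client does not set one.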



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-7048:

Fix Version/s: (was: 1.16.0)
   1.17.0

> Implement JDBC Statement.setMaxRows() with System Option
> 
>
> Key: DRILL-7048
> URL: https://issues.apache.org/jira/browse/DRILL-7048
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC, Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0
>
>
> With DRILL-6960, the web UI will get an auto-limit on the number of results 
> fetched.
> Since much of the plumbing is already there, it makes sense to provide the 
> same for the JDBC client.
> In addition, it would be nice if the server could have a pre-defined value 
> as well (default 0, i.e. no limit) so that an _admin_ could enforce a 
> maximum limit on the result-set size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6050) Provide a limit to number of rows fetched for a query in UI

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6050:

Fix Version/s: 1.17.0

> Provide a limit to number of rows fetched for a query in UI
> ---
>
> Key: DRILL-6050
> URL: https://issues.apache.org/jira/browse/DRILL-6050
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: ready-to-commit, user-experience
> Fix For: 1.16.0, 1.17.0
>
>
> Currently, the WebServer side needs to process the entire set of results and 
> stream it back to the WebClient.
> Since the WebUI does paginate results, we can load a larger set for 
> pagination on the browser client and relieve the WebServer of the pressure 
> of hosting all the data.
> E.g. fetching all rows from a 1-billion-record table is impractical and can 
> be capped at, say, 10K. Currently, the user has to explicitly specify LIMIT 
> in the submitted query.
> An option can be provided in the query field to allow for this entry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6050) Provide a limit to number of rows fetched for a query in UI

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6050:

Labels: ready-to-commit user-experience  (was: doc-impacting 
ready-to-commit user-experience)

> Provide a limit to number of rows fetched for a query in UI
> ---
>
> Key: DRILL-6050
> URL: https://issues.apache.org/jira/browse/DRILL-6050
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: ready-to-commit, user-experience
> Fix For: 1.16.0
>
>
> Currently, the WebServer side needs to process the entire set of results and 
> stream it back to the WebClient.
> Since the WebUI does paginate results, we can load a larger set for 
> pagination on the browser client and relieve the WebServer of the pressure 
> of hosting all the data.
> E.g. fetching all rows from a 1-billion-record table is impractical and can 
> be capped at, say, 10K. Currently, the user has to explicitly specify LIMIT 
> in the submitted query.
> An option can be provided in the query field to allow for this entry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6960) Auto Limit Wrapping should not apply to non-select query

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6960:

Labels: doc-impacting user-experience  (was: user-experience)

> Auto Limit Wrapping should not apply to non-select query
> 
>
> Key: DRILL-6960
> URL: https://issues.apache.org/jira/browse/DRILL-6960
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Blocker
>  Labels: doc-impacting, user-experience
> Fix For: 1.17.0
>
>
> [~IhorHuzenko] pointed out that DRILL-6050 can cause submission of queries 
> with incorrect syntax.
> For example, when a user enters {{SHOW DATABASES}}, after limit wrapping 
> {{SELECT * FROM (SHOW DATABASES) LIMIT 10}} is posted.
> This results in parsing errors like:
> {{Query Failed: An Error Occurred 
> org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
> Encountered "( show" at line 2, column 15. Was expecting one of:  
> ... }}.
> The fix should involve a JavaScript check for non-SELECT queries and skip 
> the LIMIT wrapping for them, as sketched below.
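> A rough sketch of such a check (hypothetical helper, shown in Java for 
> illustration; the actual fix would live in the web UI's JavaScript):
> {code:java}
> // Wrap only plain SELECT statements with the auto LIMIT.
> static boolean isSelectQuery(String query) {
>   return query.trim().toLowerCase(java.util.Locale.ROOT).startsWith("select");
> }
> {code}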



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7110) Skip writing profile when an ALTER SESSION is executed

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7110:

Labels: doc-impacting  (was: )

> Skip writing profile when an ALTER SESSION is executed
> --
>
> Key: DRILL-7110
> URL: https://issues.apache.org/jira/browse/DRILL-7110
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently, any {{ALTER }} query will be logged. While this is useful, it can 
> add up to a lot of profiles being written unnecessarily, since those changes 
> are also reflected in the queries that follow.
> This JIRA proposes an option to skip writing such profiles to the profile 
> store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7110) Skip writing profile when an ALTER SESSION is executed

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-7110:

Fix Version/s: (was: 1.17.0)
   1.16.0

> Skip writing profile when an ALTER SESSION is executed
> --
>
> Key: DRILL-7110
> URL: https://issues.apache.org/jira/browse/DRILL-7110
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.16.0
>
>
> Currently, any {{ALTER }} query will be logged. While this is useful, it can 
> add up to a lot of profiles being written unnecessarily, since those changes 
> are also reflected in the queries that follow.
> This JIRA proposes an option to skip writing such profiles to the profile 
> store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7110) Skip writing profile when an ALTER SESSION is executed

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-7110:

Reviewer: Arina Ielchiieva

> Skip writing profile when an ALTER SESSION is executed
> --
>
> Key: DRILL-7110
> URL: https://issues.apache.org/jira/browse/DRILL-7110
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.16.0
>
>
> Currently, any {{ALTER }} query will be logged. While this is useful, it can 
> add up to a lot of profiles being written unnecessarily, since those changes 
> are also reflected in the queries that follow.
> This JIRA proposes an option to skip writing such profiles to the profile 
> store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7032) Ignore corrupt rows in a PCAP file

2019-03-19 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796473#comment-16796473
 ] 

Arina Ielchiieva commented on DRILL-7032:
-

[~cgivre] did you open a Jira for the PCAP-NG parser? Can you please link it?

> Ignore corrupt rows in a PCAP file
> --
>
> Key: DRILL-7032
> URL: https://issues.apache.org/jira/browse/DRILL-7032
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.15.0
> Environment: OS: Ubuntu 18.4
> Drill version: 1.15.0
> Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
>Reporter: Giovanni Conte
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.16.0
>
>
> It would be useful for Drill to have the ability to ignore corrupt rows in 
> a PCAP file instead of throwing a Java exception.
> Many pcap files contain corrupted records, and this functionality would 
> avoid having to pre-fix the packet captures (example file attached).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7119) Modify selectivity calculations to use histograms

2019-03-19 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-7119:
-

 Summary: Modify selectivity calculations to use histograms
 Key: DRILL-7119
 URL: https://issues.apache.org/jira/browse/DRILL-7119
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Query Planning & Optimization
Reporter: Aman Sinha
Assignee: Aman Sinha
 Fix For: 1.16.0


(Please see parent JIRA for the design document)
Once the t-digest based histogram is created, we need to read it back and 
modify the selectivity calculations such that they use the histogram buckets 
for range conditions.
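As a worked example of the standard equi-depth estimate (a sketch; not 
necessarily Drill's exact formula): with 4 equi-depth buckets over 1,000 rows 
(250 rows each) and bucket boundaries [0, 10, 25, 60, 100], the selectivity 
of col < 30 is the two full buckets below 30 plus the covered fraction of the 
third bucket:
{noformat}
sel(col < 30) ~= (250 + 250 + 250 * (30 - 25) / (60 - 25)) / 1000
              ~= (500 + 35.7) / 1000
              ~= 0.536
{noformat}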



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7118) Filter not getting pushed down on MapR-DB tables.

2019-03-19 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7118:
--

 Summary: Filter not getting pushed down on MapR-DB tables.
 Key: DRILL-7118
 URL: https://issues.apache.org/jira/browse/DRILL-7118
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.16.0


A simple IS NULL filter is not being pushed down to MapR-DB tables. Here is 
the repro:
{code:java}
0: jdbc:drill:zk=local> explain plan for select * from dfs.`/tmp/js` where b is 
null;
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.1
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.1
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.1
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.1
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(**=[$0])
00-02 Project(T0¦¦**=[$0])
00-03 SelectionVectorRemover
00-04 Filter(condition=[IS NULL($1)])
00-05 Project(T0¦¦**=[$0], b=[$1])
00-06 Scan(table=[[dfs, /tmp/js]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=/tmp/js, condition=null], columns=[`**`, 
`b`], maxwidth=1]])
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal)

2019-03-19 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-7117:
-

 Summary: Support creation of histograms for numeric data types 
(except Decimal)
 Key: DRILL-7117
 URL: https://issues.apache.org/jira/browse/DRILL-7117
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Query Planning & Optimization
Reporter: Aman Sinha
Assignee: Aman Sinha
 Fix For: 1.16.0


This JIRA is specific to creating histograms for numeric data types: INT, 
BIGINT, FLOAT4, FLOAT8  and their corresponding nullable/non-nullable versions. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6501) Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6501:
-
Fix Version/s: (was: 1.16.0)
   1.17.0

> Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed
> 
>
> Key: DRILL-6501
> URL: https://issues.apache.org/jira/browse/DRILL-6501
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
> Fix For: 1.17.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DRILL-6212 is a temporary fix to alleviate issues due to CALCITE-2223. Once 
> CALCITE-2223 is fixed, this change needs to be reverted, which would require 
> DrillProjectMergeRule to go back to extending ProjectMergeRule.
> Please take a look at how CALCITE-2223 is eventually fixed (as of now it is 
> still not clear which fix is the way to go). Depending on the fix, we may 
> need additional work to integrate these changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7032) Ignore corrupt rows in a PCAP file

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7032:
-
Reviewer: Arina Ielchiieva

> Ignore corrupt rows in a PCAP file
> --
>
> Key: DRILL-7032
> URL: https://issues.apache.org/jira/browse/DRILL-7032
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.15.0
> Environment: OS: Ubuntu 18.4
> Drill version: 1.15.0
> Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
>Reporter: Giovanni Conte
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.16.0
>
>
> It would be useful for Drill to have the ability to ignore corrupt rows in 
> a PCAP file instead of throwing a Java exception.
> Many pcap files contain corrupted records, and this functionality would 
> avoid having to pre-fix the packet captures (example file attached).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6992) Support column histogram statistics

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6992:
-
Fix Version/s: 1.16.0

> Support column histogram statistics
> ---
>
> Key: DRILL-6992
> URL: https://issues.apache.org/jira/browse/DRILL-6992
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> As a follow-up to 
> [DRILL-1328|https://issues.apache.org/jira/browse/DRILL-1328], which adds 
> NDV (number of distinct values) support and creates the framework for 
> statistics, we also need histograms. These are needed for selectivity 
> estimation of range predicates, as well as of equality predicates when the 
> data distribution is non-uniform.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6956) Maintain a single entry for Drill Version in the pom file

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6956:
-
Fix Version/s: (was: 1.16.0)
   1.17.0

> Maintain a single entry for Drill Version in the pom file
> -
>
> Key: DRILL-6956
> URL: https://issues.apache.org/jira/browse/DRILL-6956
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently, updating the version information for a Drill release involves 
> updating 30+ pom files.
> The right way would be to use the Multi Module Setup for Maven CI.
> https://maven.apache.org/maven-ci-friendly.html#Multi_Module_Setup



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6899) Fix timestamp issues in unit tests ignored with DRILL-6833

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6899:
-
Fix Version/s: (was: 1.16.0)
   1.17.0

> Fix timestamp issues in unit tests ignored with DRILL-6833
> --
>
> Key: DRILL-6899
> URL: https://issues.apache.org/jira/browse/DRILL-6899
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
> Fix For: 1.17.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> {{The following tests were disabled in the PR for DRILL-6833}}
> {{IndexPlanTest.testCastTimestampPlan() - Re-enable after the MapRDB format 
> plugin issue is fixed.}}
> {{IndexPlanTest.testRowkeyJoinPushdown_13() - Re-enable the testcase after 
> fixing the execution issue with HashJoin used as Rowkeyjoin.}}
> {{IndexPlanTest.testRowkeyJoinPushdown_12() - Remove the testcase since the 
> SemiJoin transformation makes the rowkeyjoinpushdown transformation invalid.}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-03-19 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-7063:
--
Reviewer: Aman Sinha

> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7114) ANALYZE command generates warnings for stats file and materialization

2019-03-19 Thread Gautam Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796306#comment-16796306
 ] 

Gautam Parai commented on DRILL-7114:
-

[~vitalii] yes you are right - this happens for all queries. I plan to address 
both issues as part of this JIRA.

> ANALYZE command generates warnings for stats file and materialization
> -
>
> Key: DRILL-7114
> URL: https://issues.apache.org/jira/browse/DRILL-7114
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Gautam Parai
>Priority: Minor
> Fix For: 1.16.0
>
>
> When I run ANALYZE, I see warnings in the log file as shown below. The 
> ANALYZE command should not try to read the stats file or materialize the 
> stats.  
> {noformat}
> 12:04:32.939 [2370143e-c419-f33c-d879-84989712bc85:foreman] WARN  
> o.a.d.e.p.common.DrillStatsTable - Failed to read the stats file.
> java.io.FileNotFoundException: File /tmp/orders3/.stats.drill/0_0.json does 
> not exist
> 12:04:32.939 [2370143e-c419-f33c-d879-84989712bc85:foreman] WARN  
> o.a.d.e.p.common.DrillStatsTable - Failed to materialize the stats. 
> Continuing without stats.
> java.io.FileNotFoundException: File /tmp/orders3/.stats.drill/0_0.json does 
> not exist
> {noformat}
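> For reference, the command that triggers these warnings (assuming the 
> ANALYZE syntax introduced with the statistics feature):
> {noformat}
> ANALYZE TABLE dfs.tmp.`orders3` COMPUTE STATISTICS;
> {noformat}
> On the first run no .stats.drill directory exists yet, so the read attempt 
> is expected to fail and should be skipped silently rather than logged as a 
> warning.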



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7089) Implement caching of BaseMetadata classes

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7089:
-
Reviewer: Aman Sinha

> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes were introduced for metadata usage. 
> These classes may be reused across GroupScan instances to reduce heap usage 
> when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so that within the scope of a single query they 
> can be reused.
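> A minimal sketch of the idea (field and method names are hypothetical; 
> {{DrillTable}}, {{GroupScan}} and the {{BaseMetadata}} inheritors are the 
> real classes mentioned above):
> {code:java}
> // Inside DrillTable: load the metadata once and hand the same instance
> // to every GroupScan created within the query.
> private volatile BaseMetadata cachedMetadata;
>
> BaseMetadata getOrLoadMetadata() {
>   if (cachedMetadata == null) {
>     synchronized (this) {
>       if (cachedMetadata == null) {
>         cachedMetadata = loadMetadata();  // expensive read, done once
>       }
>     }
>   }
>   return cachedMetadata;
> }
> {code}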



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5028) Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS.

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5028:

Fix Version/s: (was: 1.16.0)
   1.17.0

> Opening profiles page from web ui gets very slow when a lot of history files 
> have been stored in HDFS or Local FS.
> --
>
> Key: DRILL-5028
> URL: https://issues.apache.org/jira/browse/DRILL-5028
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Account Not Used
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.17.0
>
>
> We have a Drill cluster with 20+ nodes and store all history profiles in 
> HDFS. Without periodic cleanup of HDFS, the profiles page gets slower as 
> more queries are served.
> Code in LocalPersistentStore.java uses fs.list(false, basePath) to fetch the 
> latest 100 history profiles by default. I suspect this operation blocks the 
> page loading (millions of small files can accumulate in the basePath); maybe 
> we can find other ways to reach the same goal.
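> One possible direction (a sketch using Hadoop's 
> FileSystem#listStatusIterator, with fs and basePath as in the existing code; 
> not a drop-in patch):
> {code:java}
> // Stop after the first 100 entries instead of listing the whole directory.
> RemoteIterator<FileStatus> it = fs.listStatusIterator(basePath);
> List<FileStatus> profiles = new ArrayList<>();
> while (it.hasNext() && profiles.size() < 100) {
>   profiles.add(it.next());
> }
> {code}
> The iterator returns entries in no particular order, so the "latest 100" 
> semantics would still need a naming convention or secondary index, but this 
> avoids materializing millions of entries at once.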



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6032:
-
Fix Version/s: (was: 1.16.0)
   1.17.0

> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.17.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the 
> partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7107) Unable to connect to Drill 1.15 through ZK

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7107:
-
Reviewer: Sorabh Hamirwasia

> Unable to connect to Drill 1.15 through ZK
> --
>
> Key: DRILL-7107
> URL: https://issues.apache.org/jira/browse/DRILL-7107
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.16.0
>
>
> After upgrading to Drill 1.15, users are no longer able to connect to Drill 
> using the ZK quorum. They get the following "Unable to setup ZK for client" 
> error.
> [~]$ sqlline -u "jdbc:drill:zk=172.16.2.165:5181;auth=maprsasl"
> Error: Failure in connecting to Drill: 
> org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client. 
> (state=,code=0)
> java.sql.SQLNonTransientConnectionException: Failure in connecting to Drill: 
> org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client.
>  at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnectionImpl.java:174)
>  at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:67)
>  at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:67)
>  at 
> org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
>  at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
>  at sqlline.DatabaseConnection.connect(DatabaseConnection.java:130)
>  at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:179)
>  at sqlline.Commands.connect(Commands.java:1247)
>  at sqlline.Commands.connect(Commands.java:1139)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
>  at sqlline.SqlLine.dispatch(SqlLine.java:722)
>  at sqlline.SqlLine.initArgs(SqlLine.java:416)
>  at sqlline.SqlLine.begin(SqlLine.java:514)
>  at sqlline.SqlLine.start(SqlLine.java:264)
>  at sqlline.SqlLine.main(SqlLine.java:195)
> Caused by: org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for 
> client.
>  at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:340)
>  at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnectionImpl.java:165)
>  ... 18 more
> Caused by: java.lang.NullPointerException
>  at 
> org.apache.drill.exec.coord.zk.ZKACLProviderFactory.findACLProvider(ZKACLProviderFactory.java:68)
>  at 
> org.apache.drill.exec.coord.zk.ZKACLProviderFactory.getACLProvider(ZKACLProviderFactory.java:47)
>  at 
> org.apache.drill.exec.coord.zk.ZKClusterCoordinator.(ZKClusterCoordinator.java:114)
>  at 
> org.apache.drill.exec.coord.zk.ZKClusterCoordinator.(ZKClusterCoordinator.java:86)
>  at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:337)
>  ... 19 more
> Apache Drill 1.15.0.0
> "This isn't your grandfather's SQL."
> sqlline>
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6540) Upgrade to HADOOP-3.1 libraries

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6540:
-
Fix Version/s: (was: 1.16.0)
   1.17.0

> Upgrade to HADOOP-3.1 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently Drill uses version 2.7.4 of the hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
>  Half a year ago, [Hadoop 
> 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released, and it 
> was recently updated in [Hadoop 
> 3.2.0|https://hadoop.apache.org/docs/r3.2.0/].
> To use Drill under a Hadoop 3.0 distribution we need this upgrade. The newer 
> version also includes new features which can be useful for Drill.
>  This upgrade is also needed to leverage the newest versions of the 
> Zookeeper libraries and Hive 3.1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7098) File Metadata Metastore Plugin

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7098:
-
Fix Version/s: (was: 1.16.0)
   2.0.0

> File Metadata Metastore Plugin
> --
>
> Key: DRILL-7098
> URL: https://issues.apache.org/jira/browse/DRILL-7098
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Metadata
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: Metastore
> Fix For: 2.0.0
>
>
> DRILL-6852 introduces the Drill Metastore API. 
> The second step is to create an internal Drill Metastore mechanism (and a 
> File Metastore Plugin) that will use the Metastore API and can be extended 
> for use by other Storage Plugins.
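> A rough sketch of that layering (hypothetical interfaces, not the final 
> Metastore API):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> 
> // Sketch: a generic Metastore API that a file-backed implementation (and,
> // later, other implementations) can satisfy.
> interface MetastoreSketch {
>   Map<String, String> read(String tableName);
>   void write(String tableName, Map<String, String> metadata);
> }
> 
> // "File Metadata Metastore Plugin": kept in memory here for brevity; the
> // real plugin would persist entries to files under a configured root.
> class FileMetastoreSketch implements MetastoreSketch {
>   private final Map<String, Map<String, String>> store = new HashMap<>();
> 
>   @Override
>   public Map<String, String> read(String tableName) {
>     return store.get(tableName);
>   }
> 
>   @Override
>   public void write(String tableName, Map<String, String> metadata) {
>     store.put(tableName, metadata);
>   }
> }
> {code}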



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6960) Auto Limit Wrapping should not apply to non-select query

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6960:

Fix Version/s: (was: 1.16.0)
   1.17.0

> Auto Limit Wrapping should not apply to non-select query
> 
>
> Key: DRILL-6960
> URL: https://issues.apache.org/jira/browse/DRILL-6960
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Blocker
>  Labels: user-experience
> Fix For: 1.17.0
>
>
> [~IhorHuzenko] pointed out that DRILL-6050 can cause submission of queries 
> with incorrect syntax. 
> For example, when a user enters {{SHOW DATABASES}}, after limit wrapping 
> {{SELECT * FROM (SHOW DATABASES) LIMIT 10}} will be posted. 
> This results in parsing errors like:
> {{Query Failed: An Error Occurred 
> org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
> Encountered "( show" at line 2, column 15. Was expecting one of:  
> ... }}.
> The fix should involve a JavaScript check for all non-select queries that 
> skips the LIMIT wrap for those queries.
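> The gist of the check, sketched in Java for illustration (the actual fix 
> would live in the Web UI's JavaScript):
> {code:java}
> // Sketch: wrap with LIMIT only when the statement is a plain SELECT (or a
> // WITH query); SHOW, USE, CREATE, etc. are posted unchanged.
> class AutoLimitSketch {
>   static String maybeWrap(String query, int limit) {
>     String head = query.trim().toLowerCase();
>     boolean isSelect = head.startsWith("select") || head.startsWith("with");
>     return isSelect
>         ? "SELECT * FROM (" + query + ") LIMIT " + limit
>         : query;
>   }
> }
> {code}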



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5270) Improve loading of profiles listing in the WebUI

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5270:

Fix Version/s: (was: 1.16.0)
   1.17.0

> Improve loading of profiles listing in the WebUI
> 
>
> Key: DRILL-5270
> URL: https://issues.apache.org/jira/browse/DRILL-5270
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently, as the number of profiles increases, we reload the same list of 
> profiles from the FS.
> An ideal improvement would be to detect whether there are any new profiles 
> and only reload from disk then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a 
> 32-core server. With caching, we can get it down to a few milliseconds.
> To invalidate the cache, we inspect the last modified time of the directory 
> to confirm whether a reload is needed. 
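> A minimal sketch of the caching idea, assuming a Hadoop FileSystem handle 
> (a hypothetical class, not the actual LocalPersistentStore code):
> {code:java}
> import java.io.IOException;
> import java.util.Arrays;
> import java.util.Collections;
> import java.util.List;
> import java.util.stream.Collectors;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> // Sketch: reload the profile listing only when the directory's modification
> // time changes; otherwise serve the cached list.
> class ProfileListingCache {
>   private final FileSystem fs;
>   private final Path basePath;
>   private long lastModified = -1;
>   private List<String> cached = Collections.emptyList();
> 
>   ProfileListingCache(FileSystem fs, Path basePath) {
>     this.fs = fs;
>     this.basePath = basePath;
>   }
> 
>   synchronized List<String> profiles() throws IOException {
>     long dirMtime = fs.getFileStatus(basePath).getModificationTime();
>     if (dirMtime != lastModified) {          // stale cache: reload once
>       cached = Arrays.stream(fs.listStatus(basePath))
>           .map(FileStatus::getPath)
>           .map(Path::getName)
>           .collect(Collectors.toList());
>       lastModified = dirMtime;
>     }
>     return cached;
>   }
> }
> {code}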



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-2362) Drill should manage Query Profiling archiving

2019-03-19 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-2362:

Fix Version/s: (was: 1.16.0)
   1.17.0

> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0
>
>
> We collect query profile information for analysis purposes, but we keep it 
> forever. At this time, with only a few queries, it isn't a problem. But as 
> users start putting Drill into production, automated use via other 
> applications will make this grow quickly. We need to come up with a 
> retention policy mechanism, with suitable settings administrators can use, 
> and implement it so that this data can be cleaned up.
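> One possible shape for such a policy (a sketch; the retention setting and 
> the profile file layout are assumptions):
> {code:java}
> import java.io.IOException;
> import java.nio.file.DirectoryStream;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.time.Duration;
> import java.time.Instant;
> 
> // Sketch: delete profile files older than a configured retention period.
> class ProfileRetentionSketch {
>   static void enforce(Path profileDir, Duration retention) throws IOException {
>     Instant cutoff = Instant.now().minus(retention);
>     try (DirectoryStream<Path> stream =
>         Files.newDirectoryStream(profileDir, "*.sys.drill")) {
>       for (Path profile : stream) {
>         if (Files.getLastModifiedTime(profile).toInstant().isBefore(cutoff)) {
>           Files.delete(profile);             // past retention: clean it up
>         }
>       }
>     }
>   }
> }
> {code}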



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7113) Issue with filtering null values from MapRDB-JSON

2019-03-19 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7113:
-
Reviewer: Hanumath Rao Maduri

> Issue with filtering null values from MapRDB-JSON
> -
>
> Key: DRILL-7113
> URL: https://issues.apache.org/jira/browse/DRILL-7113
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> When Drill queries documents from a MapRDB-JSON table that contain fields 
> with a null value, it returns the wrong result.
>  The issue was reproduced locally.
> Please find the repro steps:
>  [1] Create a MapRDB-JSON table, say '/tmp/dmdb2/'.
> [2] Insert the following sample records into the table:
> {code:java}
> insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person", 
> "confidence": 0.24}'
> insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}'
> insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3", 
> "confidence": 0.54}'
> insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4", 
> "confidence": null}'
> {code}
> We can see that for the field 'confidence', document 1 has value 0.24, 
> document 3 has value 0.54, document 2 does not have the field, and document 
> 4 has the field with value null.
> [3] Query the table from DRILL.
>  *Query 1:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person   | 0.24|
> | person2  | null|
> | person3  | 0.54|
> | person4  | null|
> +--+-+
> 4 rows selected (0.2 seconds)
> {code}
> *Query 2:*
> {code:java}
> 0: jdbc:drill:> select * from dfs.tmp.dmdb2;
> +--+-+--+
> | _id  | confidence  |  label   |
> +--+-+--+
> | 1| 0.24| person   |
> | 2| null| person2  |
> | 3| 0.54| person3  |
> | 4| null| person4  |
> +--+-+--+
> 4 rows selected (0.174 seconds)
> {code}
> *Query 3:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence 
> is not null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person   | 0.24|
> | person3  | 0.54|
> | person4  | null|
> +--+-+
> 3 rows selected (0.192 seconds)
> {code}
> *Query 4:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence 
> is  null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person2  | null|
> +--+-+
> 1 row selected (0.262 seconds)
> {code}
> As you can see, Query 3, which asks for all documents where confidence 'is 
> not null', returns a document with a null value.
> *Other observation:*
>  Querying the same data using DRILL without MapRDB provides the correct 
> result.
>  For example, create 4 different JSON files with the following data:
> {"label": "person", "confidence": 0.24}
> {"label": "person2"}
> {"label": "person3", "confidence": 0.54}
> {"label": "person4", "confidence": null}
> Query it directly using DRILL:
> *Query 5:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person4  | null|
> | person3  | 0.54|
> | person2  | null|
> | person   | 0.24|
> +--+-+
> 4 rows selected (0.203 seconds)
> {code}
> *Query 6:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is 
> null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person4  | null|
> | person2  | null|
> +--+-+
> 2 rows selected (0.352 seconds)
> {code}
> *Query 7:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is 
> not null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person3  | 0.54|
> | person   | 0.24|
> +--+-+
> 2 rows selected (0.265 seconds)
> {code}
> As seen in Queries 6 and 7, Drill returns the correct result.
> I believe the issue is in the MapRDB layer where it fetches the results.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7113) Issue with filtering null values from MapRDB-JSON

2019-03-19 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-7113:
--
Fix Version/s: (was: 1.17.0)

> Issue with filtering null values from MapRDB-JSON
> -
>
> Key: DRILL-7113
> URL: https://issues.apache.org/jira/browse/DRILL-7113
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> When Drill queries documents from a MapRDB-JSON table that contain fields 
> with a null value, it returns the wrong result.
>  The issue was reproduced locally.
> Please find the repro steps:
>  [1] Create a MapRDB-JSON table, say '/tmp/dmdb2/'.
> [2] Insert the following sample records into the table:
> {code:java}
> insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person", 
> "confidence": 0.24}'
> insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}'
> insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3", 
> "confidence": 0.54}'
> insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4", 
> "confidence": null}'
> {code}
> We can see that for the field 'confidence', document 1 has value 0.24, 
> document 3 has value 0.54, document 2 does not have the field, and document 
> 4 has the field with value null.
> [3] Query the table from DRILL.
>  *Query 1:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person   | 0.24|
> | person2  | null|
> | person3  | 0.54|
> | person4  | null|
> +--+-+
> 4 rows selected (0.2 seconds)
> {code}
> *Query 2:*
> {code:java}
> 0: jdbc:drill:> select * from dfs.tmp.dmdb2;
> +--+-+--+
> | _id  | confidence  |  label   |
> +--+-+--+
> | 1| 0.24| person   |
> | 2| null| person2  |
> | 3| 0.54| person3  |
> | 4| null| person4  |
> +--+-+--+
> 4 rows selected (0.174 seconds)
> {code}
> *Query 3:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence 
> is not null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person   | 0.24|
> | person3  | 0.54|
> | person4  | null|
> +--+-+
> 3 rows selected (0.192 seconds)
> {code}
> *Query 4:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence 
> is  null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person2  | null|
> +--+-+
> 1 row selected (0.262 seconds)
> {code}
> As you can see, Query 3, which asks for all documents where confidence 'is 
> not null', returns a document with a null value.
> *Other observation:*
>  Querying the same data using DRILL without MapRDB provides the correct 
> result.
>  For example, create 4 different JSON files with the following data:
> {"label": "person", "confidence": 0.24}
> {"label": "person2"}
> {"label": "person3", "confidence": 0.54}
> {"label": "person4", "confidence": null}
> Query it directly using DRILL:
> *Query 5:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person4  | null|
> | person3  | 0.54|
> | person2  | null|
> | person   | 0.24|
> +--+-+
> 4 rows selected (0.203 seconds)
> {code}
> *Query 6:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is 
> null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person4  | null|
> | person2  | null|
> +--+-+
> 2 rows selected (0.352 seconds)
> {code}
> *Query 7:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is 
> not null;
> +--+-+
> |  label   | confidence  |
> +--+-+
> | person3  | 0.54|
> | person   | 0.24|
> +--+-+
> 2 rows selected (0.265 seconds)
> {code}
> As seen in Queries 6 and 7, Drill returns the correct result.
> I believe the issue is in the MapRDB layer where it fetches the results.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6970) Issue with LogRegex format plugin where drillbuf was overflowing

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6970:

Labels:   (was: ready-to-commit)

> Issue with LogRegex format plugin where drillbuf was overflowing 
> -
>
> Key: DRILL-6970
> URL: https://issues.apache.org/jira/browse/DRILL-6970
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Major
> Fix For: 1.16.0
>
>
> The log format plugin does not re-allocate the drillbuf when it fills up. 
> You can query small log files, but larger ones will fail with this error:
> 0: jdbc:drill:zk=local> select * from dfs.root.`/prog/test.log`;
> Error: INTERNAL_ERROR ERROR: index: 32724, length: 108 (expected: range(0, 
> 32768))
> Fragment 0:0
> Please, refer to logs for more information.
>  
> I'm running drill-embedded. The log storage plugin is configured like so:
> {code:java}
> "log": {
> "type": "logRegex",
> "regex": "(.+)",
> "extension": "log",
> "maxErrors": 10,
> "schema": [
> {
> "fieldName": "line"
> }
> ]
> },
> {code}
> The log file is very simple:
> {code:java}
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> ...{code}
>  
>  
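> The shape of the missing capacity check, as a sketch (a plain byte array 
> here; the real reader writes into a DrillBuf):
> {code:java}
> // Sketch: grow the buffer before a write would run past its capacity,
> // instead of failing like "index: 32724, length: 108 (expected: range(0,
> // 32768))".
> class GrowableBufferSketch {
>   private byte[] buf = new byte[32768];
>   private int writeIndex = 0;
> 
>   void write(byte[] bytes) {
>     while (writeIndex + bytes.length > buf.length) {  // would overflow: grow
>       buf = java.util.Arrays.copyOf(buf, buf.length * 2);
>     }
>     System.arraycopy(bytes, 0, buf, writeIndex, bytes.length);
>     writeIndex += bytes.length;
>   }
> }
> {code}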



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7116) Adapt statistics to use Drill Metastore API

2019-03-19 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-7116:
--

 Summary: Adapt statistics to use Drill Metastore API
 Key: DRILL-7116
 URL: https://issues.apache.org/jira/browse/DRILL-7116
 Project: Apache Drill
  Issue Type: Sub-task
Affects Versions: 1.16.0
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi
 Fix For: 1.17.0


The current implementation of statistics assumes the use of files for storing 
and reading statistics.
 The aim of this Jira is to adapt statistics to use the Drill Metastore API so 
that in the future they may be stored in other metastore implementations.

Implementation details:
 - Move statistics info into {{TableMetadata}}
 - Provide a way for obtaining {{TableMetadata}} in the places where statistics 
may be used (partially implemented in the scope of DRILL-7089)
 - Investigate and implement (if possible) lazy materialization of 
{{DrillStatsTable}}; a sketch of the idea follows the list.
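A minimal sketch of the lazy-materialization idea (a hypothetical wrapper, not 
the actual {{DrillStatsTable}} API):
{code:java}
import java.util.function.Supplier;

// Sketch: defer the metastore read until statistics are actually requested,
// and perform it at most once.
class LazyStats<T> {
  private final Supplier<T> loader;
  private volatile T value;

  LazyStats(Supplier<T> loader) { this.loader = loader; }

  T get() {
    T local = value;
    if (local == null) {
      synchronized (this) {
        if (value == null) {
          value = loader.get();   // metastore read happens here, once
        }
        local = value;
      }
    }
    return local;
  }
}
{code}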



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema

2019-03-19 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6923:

Fix Version/s: (was: 1.16.0)
   1.17.0

> Show schemas uses default(user defined) schema first for resolving table from 
> information_schema
> 
>
> Key: DRILL-6923
> URL: https://issues.apache.org/jira/browse/DRILL-6923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.14.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Minor
> Fix For: 1.17.0
>
>
> Show tables tries to find the table `information_schema`.`schemata` in the 
> default (user-defined) schema, and after a failed attempt it resolves the 
> table successfully against the root schema. Please check the description 
> below for details, explained using an example with the Hive plugin. 
> *Abstract* 
> When Drill is used with Hive SQL Standard authorization enabled, execution 
> of queries like
> {code:sql}
> USE hive.db_general;
> SHOW SCHEMAS LIKE 'hive.%'; {code}
> results in the error DrillRuntimeException: Failed to use the Hive 
> authorization components: Error getting object from metastore for Object 
> [type=TABLE_OR_VIEW, name=db_general.information_schema]. 
> *Details* 
> Consider a showSchemas() test similar to the one defined in 
> TestSqlStdBasedAuthorization: 
> {code:java}
> @Test
> public void showSchemas() throws Exception {
>   test("USE " + hivePluginName + "." + db_general);
>   testBuilder()
>   .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'")
>   .unOrdered()
>   .baselineColumns("SCHEMA_NAME")
>   .baselineValues("hive.db_general")
>   .baselineValues("hive.default")
>   .go();
> }
> {code}
> Currently, execution of such a test produces the following stacktrace: 
> {code:none}
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed 
> to use the Hive authorization components: Error getting object from metastore 
> for Object [type=TABLE_OR_VIEW, name=db_general.information_schema]
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149)
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134)
> at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214)
> at 
> org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63)
> at 
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
> at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288)
> at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
> at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
> at 
> org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284)
> at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918)
> 

[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema

2019-03-19 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6923:

Priority: Minor  (was: Major)

> Show schemas uses default(user defined) schema first for resolving table from 
> information_schema
> 
>
> Key: DRILL-6923
> URL: https://issues.apache.org/jira/browse/DRILL-6923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.14.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Minor
> Fix For: 1.16.0
>
>
> Show tables tries to find the table `information_schema`.`schemata` in the 
> default (user-defined) schema, and after a failed attempt it resolves the 
> table successfully against the root schema. Please check the description 
> below for details, explained using an example with the Hive plugin. 
> *Abstract* 
> When Drill is used with Hive SQL Standard authorization enabled, execution 
> of queries like
> {code:sql}
> USE hive.db_general;
> SHOW SCHEMAS LIKE 'hive.%'; {code}
> results in the error DrillRuntimeException: Failed to use the Hive 
> authorization components: Error getting object from metastore for Object 
> [type=TABLE_OR_VIEW, name=db_general.information_schema]. 
> *Details* 
> Consider a showSchemas() test similar to the one defined in 
> TestSqlStdBasedAuthorization: 
> {code:java}
> @Test
> public void showSchemas() throws Exception {
>   test("USE " + hivePluginName + "." + db_general);
>   testBuilder()
>   .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'")
>   .unOrdered()
>   .baselineColumns("SCHEMA_NAME")
>   .baselineValues("hive.db_general")
>   .baselineValues("hive.default")
>   .go();
> }
> {code}
> Currently, execution of such a test produces the following stacktrace: 
> {code:none}
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed 
> to use the Hive authorization components: Error getting object from metastore 
> for Object [type=TABLE_OR_VIEW, name=db_general.information_schema]
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149)
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134)
> at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214)
> at 
> org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63)
> at 
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
> at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288)
> at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
> at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
> at 
> org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284)
> at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918)
> at 
> org.apache.calcite.sql.

[jira] [Updated] (DRILL-7076) NPE is logged when querying postgres tables

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7076:

Priority: Blocker  (was: Minor)

> NPE is logged when querying postgres tables
> ---
>
> Key: DRILL-7076
> URL: https://issues.apache.org/jira/browse/DRILL-7076
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> An NPE is seen in the logs when querying a Postgres table:
> {code:sql}
> select 1 from postgres.public.tdt
> {code}
> Stack trace from {{sqlline.log}}:
> {noformat}
> 2019-03-05 13:49:19,395 [23819dc0-abf8-24f3-ea81-6ced1b6e11af:foreman] WARN  
> o.a.d.e.p.common.DrillStatsTable - Failed to materialize the stats. 
> Continuing without stats.
> java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.visit(DrillStatsTable.java:189)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.calcite.rel.SingleRel.childrenAccept(SingleRel.java:72) 
> [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at org.apache.calcite.rel.RelVisitor.visit(RelVisitor.java:44) 
> [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.visit(DrillStatsTable.java:202)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61) 
> [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.materialize(DrillStatsTable.java:177)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:235)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:331)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:178)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:204)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:114)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:80)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:272) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_191]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_191]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
> {noformat}
> But the query runs and returns the correct result.
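> A plausible guard for the stats visitor (a sketch with hypothetical names; 
> the actual fix may differ):
> {code:java}
> // Sketch: skip stats materialization when a scanned table (e.g. a Postgres
> // table) exposes no stats, instead of dereferencing null and logging an NPE.
> class StatsVisitorSketch {
>   interface StatsTable { void materialize(); }
> 
>   void visitScan(StatsTable stats) {
>     if (stats == null) {
>       return;                    // no stats for this table: continue without
>     }
>     stats.materialize();
>   }
> }
> {code}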



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6540) Upgrade to HADOOP-3.1 libraries

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6540:

Reviewer: Sorabh Hamirwasia

> Upgrade to HADOOP-3.1 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.16.0
>
>
> Currently Drill uses version 2.7.4 of the hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
>  Half a year ago, [Hadoop 
> 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released, and it 
> was recently updated in [Hadoop 
> 3.2.0|https://hadoop.apache.org/docs/r3.2.0/].
> To use Drill under a Hadoop 3.0 distribution we need this upgrade. The newer 
> version also includes new features which can be useful for Drill.
>  This upgrade is also needed to leverage the newest versions of the 
> Zookeeper libraries and Hive 3.1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7079) Drill can't query views from the S3 storage when plain authentication is enabled

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7079:
---

Assignee: Bohdan Kazydub  (was: Arina Ielchiieva)

> Drill can't query views from the S3 storage when plain authentication is 
> enabled
> 
>
> Key: DRILL-7079
> URL: https://issues.apache.org/jira/browse/DRILL-7079
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Denys Ordynskiy
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.16.0
>
>
> Enable plain authentication in Drill.
> Create the view on the S3 storage:
> create view s3.tmp.`testview` as select * from cp.`employee.json` limit 20;
> Try to select data from the created view:
> select * from s3.tmp.`testview`;
> *Actual result*:
> {noformat}
> 2019-02-27 17:01:09,202 [Client-1] INFO  
> o.a.d.j.i.DrillCursor$ResultsListener - [#4] Query failed: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalArgumentException: A valid userName is expected
> Please, refer to logs for more information.
> [Error Id: 2271c3aa-6d09-4b51-a585-0e0e954b46eb on maprhost:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273) 
> [drill-rpc-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243) 
> [drill-rpc-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  [netty-handler-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86

[jira] [Assigned] (DRILL-7079) Drill can't query views from the S3 storage when plain authentication is enabled

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7079:
---

Assignee: Arina Ielchiieva  (was: Bohdan Kazydub)

> Drill can't query views from the S3 storage when plain authentication is 
> enabled
> 
>
> Key: DRILL-7079
> URL: https://issues.apache.org/jira/browse/DRILL-7079
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Denys Ordynskiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> Enable plain authentication in Drill.
> Create the view on the S3 storage:
> create view s3.tmp.`testview` as select * from cp.`employee.json` limit 20;
> Try to select data from the created view:
> select * from s3.tmp.`testview`;
> *Actual result*:
> {noformat}
> 2019-02-27 17:01:09,202 [Client-1] INFO  
> o.a.d.j.i.DrillCursor$ResultsListener - [#4] Query failed: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalArgumentException: A valid userName is expected
> Please, refer to logs for more information.
> [Error Id: 2271c3aa-6d09-4b51-a585-0e0e954b46eb on maprhost:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273) 
> [drill-rpc-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243) 
> [drill-rpc-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  [netty-handler-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:

[jira] [Updated] (DRILL-7079) Drill can't query views from the S3 storage when plain authentication is enabled

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7079:

Reviewer: Volodymyr Vysotskyi

> Drill can't query views from the S3 storage when plain authentication is 
> enabled
> 
>
> Key: DRILL-7079
> URL: https://issues.apache.org/jira/browse/DRILL-7079
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Denys Ordynskiy
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.16.0
>
>
> Enable plain authentication in Drill.
> Create the view on the S3 storage:
> create view s3.tmp.`testview` as select * from cp.`employee.json` limit 20;
> Try to select data from the created view:
> select * from s3.tmp.`testview`;
> *Actual result*:
> {noformat}
> 2019-02-27 17:01:09,202 [Client-1] INFO  
> o.a.d.j.i.DrillCursor$ResultsListener - [#4] Query failed: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalArgumentException: A valid userName is expected
> Please, refer to logs for more information.
> [Error Id: 2271c3aa-6d09-4b51-a585-0e0e954b46eb on maprhost:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273) 
> [drill-rpc-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243) 
> [drill-rpc-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  [netty-handler-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>  [netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.4

[jira] [Updated] (DRILL-7115) Improve Hive schema show tables performance

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7115:

Reviewer: Vitalii Diravka

> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has roughly 8000 tables.
> The same command in Beeline (Hive) returns the result in a split second 
> (~0.2 secs).
> I tested this in my test cluster by creating 6000 (empty!) tables in Hive 
> and then running "show tables" in Drill. It took more than 2 minutes (~140 
> secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7115) Improve Hive schema show tables performance

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7115:

Fix Version/s: 1.16.0

> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has roughly 8000 tables.
> The same command in Beeline (Hive) returns the result in a split second 
> (~0.2 secs).
> I tested this in my test cluster by creating 6000 (empty!) tables in Hive 
> and then running "show tables" in Drill. It took more than 2 minutes (~140 
> secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7115) Improve Hive schema show tables performance

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7115:

Priority: Major  (was: Minor)

> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has roughly 8000 tables.
> The same command in Beeline (Hive) returns the result in a split second 
> (~0.2 secs).
> I tested this in my test cluster by creating 6000 (empty!) tables in Hive 
> and then running "show tables" in Drill. It took more than 2 minutes (~140 
> secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7115) Improve Hive schema show tables performance

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7115:

Affects Version/s: 1.15.0

> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has roughly 8000 tables.
> The same command in Beeline (Hive) returns the result in a split second 
> (~0.2 secs).
> I tested this in my test cluster by creating 6000 (empty!) tables in Hive 
> and then running "show tables" in Drill. It took more than 2 minutes (~140 
> secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7115) Improve Hive schema show tables performance

2019-03-19 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7115:
---

 Summary: Improve Hive schema show tables performance
 Key: DRILL-7115
 URL: https://issues.apache.org/jira/browse/DRILL-7115
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Hive, Storage - Information Schema
Reporter: Igor Guzenko
Assignee: Igor Guzenko


In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
minutes. The schema has roughly 8000 tables.
The same command in Beeline (Hive) returns the result in a split second (~0.2 
secs).

I tested this in my test cluster by creating 6000 (empty!) tables in Hive and 
then running "show tables" in Drill. It took more than 2 minutes (~140 secs).
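One likely direction is to fetch table names from the Hive metastore in bulk 
rather than resolving each table individually; a sketch, assuming a 
HiveMetaStoreClient handle:
{code:java}
import java.util.List;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

// Sketch: a single metastore call returns all table names for the schema,
// avoiding one round trip per table.
class ShowTablesSketch {
  static List<String> tableNames(HiveMetaStoreClient client, String dbName)
      throws Exception {
    return client.getAllTables(dbName);
  }
}
{code}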



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7095) Expose Tuple Metadata to the physical operator

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7095:

Labels: ready-to-commit  (was: )

> Expose Tuple Metadata to the physical operator
> --
>
> Key: DRILL-7095
> URL: https://issues.apache.org/jira/browse/DRILL-7095
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Provide a mechanism to expose Tuple Metadata to the physical operator 
> (sub-scan).
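> A minimal sketch of the shape this could take (placeholder names, not 
> Drill's actual classes):
> {code:java}
> // Sketch: a sub-scan carries the schema resolved at plan time so the scan
> // operator can honor it at execution time.
> interface TupleSchema {}                 // stands in for Tuple Metadata
> 
> class SubScanSketch {
>   private final TupleSchema schema;      // may be null: no external schema
> 
>   SubScanSketch(TupleSchema schema) { this.schema = schema; }
> 
>   TupleSchema schema() { return schema; }
> }
> {code}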



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6970) Issue with LogRegex format plugin where drillbuf was overflowing

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6970:

Labels: ready-to-commit  (was: )

> Issue with LogRegex format plugin where drillbuf was overflowing 
> -
>
> Key: DRILL-6970
> URL: https://issues.apache.org/jira/browse/DRILL-6970
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> The log format plugin does not re-allocate the drillbuf when it fills up. 
> You can query small log files, but larger ones will fail with this error:
> 0: jdbc:drill:zk=local> select * from dfs.root.`/prog/test.log`;
> Error: INTERNAL_ERROR ERROR: index: 32724, length: 108 (expected: range(0, 
> 32768))
> Fragment 0:0
> Please, refer to logs for more information.
>  
> I'm running drill-embedded. The log storage plugin is configured like so:
> {code:java}
> "log": {
> "type": "logRegex",
> "regex": "(.+)",
> "extension": "log",
> "maxErrors": 10,
> "schema": [
> {
> "fieldName": "line"
> }
> ]
> },
> {code}
> The log file is very simple:
> {code:java}
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk
> ...{code}
>  
>  
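> A minimal sketch of the usual fix for this kind of overflow, assuming the 
> reader copies byte ranges into a DrillBuf (reallocIfNeeded and setBytes are 
> existing DrillBuf/ByteBuf methods; the wrapper method is illustrative, not 
> the actual patch):
> {code:java}
> import io.netty.buffer.DrillBuf;
>
> // Illustrative guard: grow the buffer before a write that would overrun
> // its current capacity, instead of overflowing it.
> DrillBuf writeBytes(DrillBuf buf, int offset, byte[] bytes, int length) {
>   if (offset + length > buf.capacity()) {
>     buf = buf.reallocIfNeeded(offset + length);  // returns an enlarged buffer
>   }
>   buf.setBytes(offset, bytes, 0, length);        // now within capacity
>   return buf;
> }
> {code}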



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7086) Enhance row-set scan framework to use external schema

2019-03-19 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7086:

Labels: ready-to-commit  (was: )

> Enhance row-set scan framework to use external schema
> -
>
> Key: DRILL-7086
> URL: https://issues.apache.org/jira/browse/DRILL-7086
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Modify the row-set scan framework to work with an external (partial) schema, 
> inserting "type conversion shims" to convert data as needed. The reader 
> provides an "input schema": the data types the reader is prepared to handle. 
> An optional "output schema" describes the types of the value vectors to 
> create. The type conversion "shims" give the reader the "setFoo" method it 
> wants to use, while converting the data to the type needed for the vector. 
> For example, the CSV reader might read only text fields, while the shim 
> converts a column to an INT (see the sketch below).
> This is just the framework layer; DRILL-7011 will combine this mechanism with 
> the plan-side features to enable use of the feature in the new row-set based 
> CSV reader.
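> A toy sketch of such a shim, assuming it wraps the column's int writer (all 
> names below are illustrative, not the framework's actual API):
> {code:java}
> import java.util.function.IntConsumer;
>
> // Illustrative shim: the reader keeps calling the setString() it wants,
> // while the shim parses the text and writes an INT to the column writer.
> public class StringToIntShim {
>   private final IntConsumer intWriter;  // stands in for the vector's writer
>
>   public StringToIntShim(IntConsumer intWriter) {
>     this.intWriter = intWriter;
>   }
>
>   public void setString(String value) {
>     intWriter.accept(Integer.parseInt(value));  // text -> INT conversion
>   }
> }
> {code}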



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6430) Drill Should Not Fail If It Sees Deprecated Options Stored In Zookeeper Or Locally

2019-03-19 Thread Bohdan Kazydub (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795938#comment-16795938
 ] 

Bohdan Kazydub commented on DRILL-6430:
---

The functionality seems to be implemented in DRILL-2304.

> Drill Should Not Fail If It Sees Deprecated Options Stored In Zookeeper Or 
> Locally
> --
>
> Key: DRILL-6430
> URL: https://issues.apache.org/jira/browse/DRILL-6430
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> This is required for resource management since we will likely remove many 
> options.
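> The desired behavior can be sketched as tolerant loading of persisted 
> options (the method and names below are illustrative, not Drill's actual 
> option-manager API):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Set;
>
> // Illustrative only: when reading options persisted in ZooKeeper or on
> // disk, silently drop names that no longer exist instead of failing.
> Map<String, Object> loadPersistedOptions(Map<String, Object> persisted,
>                                          Set<String> knownOptionNames) {
>   Map<String, Object> loaded = new HashMap<>();
>   for (Map.Entry<String, Object> e : persisted.entrySet()) {
>     if (knownOptionNames.contains(e.getKey())) {
>       loaded.put(e.getKey(), e.getValue());
>     }  // else: deprecated/removed option -- ignore rather than fail
>   }
>   return loaded;
> }
> {code}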



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6430) Drill Should Not Fail If It Sees Deprecated Options Stored In Zookeeper Or Locally

2019-03-19 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub resolved DRILL-6430.
---
   Resolution: Done
Fix Version/s: 1.16.0  (was: 1.17.0)

> Drill Should Not Fail If It Sees Deprecated Options Stored In Zookeeper Or 
> Locally
> --
>
> Key: DRILL-6430
> URL: https://issues.apache.org/jira/browse/DRILL-6430
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.16.0
>
>
> This is required for resource management since we will likely remove many 
> options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7114) ANALYZE command generates warnings for stats file and materialization

2019-03-19 Thread Vitalii Diravka (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795881#comment-16795881
 ] 

Vitalii Diravka commented on DRILL-7114:


It can be reproduced not only with the ANALYZE TABLE command but also with 
plain SELECT queries (drill-embedded mode, Drill master version):
{code}
0: jdbc:drill:zk=local> use dfs.tmp;
+---+--+
|  ok   |   summary|
+---+--+
| true  | Default schema changed to [dfs.tmp]  |
+---+--+
1 row selected (0.135 seconds)
0: jdbc:drill:zk=local> create table temp_t as select * from (VALUES(1));
+---++
| Fragment  | Number of records written  |
+---++
| 0_0   | 1  |
+---++
1 row selected (0.65 seconds)
0: jdbc:drill:zk=local> select * from temp_t;
+-+
| EXPR$0  |
+-+
| 1   |
+-+
1 row selected (0.198 seconds)
{code}

> ANALYZE command generates warnings for stats file and materialization
> -
>
> Key: DRILL-7114
> URL: https://issues.apache.org/jira/browse/DRILL-7114
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Gautam Parai
>Priority: Minor
> Fix For: 1.16.0
>
>
> When I run ANALYZE, I see warnings in the log file as shown below. The 
> ANALYZE command should not try to read the stats file or materialize the 
> stats.  
> {noformat}
> 12:04:32.939 [2370143e-c419-f33c-d879-84989712bc85:foreman] WARN  
> o.a.d.e.p.common.DrillStatsTable - Failed to read the stats file.
> java.io.FileNotFoundException: File /tmp/orders3/.stats.drill/0_0.json does 
> not exist
> 12:04:32.939 [2370143e-c419-f33c-d879-84989712bc85:foreman] WARN  
> o.a.d.e.p.common.DrillStatsTable - Failed to materialize the stats. 
> Continuing without stats.
> java.io.FileNotFoundException: File /tmp/orders3/.stats.drill/0_0.json does 
> not exist
> {noformat}
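> A minimal sketch of the expected guard, assuming the code holds a Hadoop 
> FileSystem handle (FileSystem.exists is Hadoop's real API; readStatsFile is 
> a hypothetical helper):
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Illustrative only: probe for the stats file first, so that a missing file
> // means "no stats yet" instead of a logged FileNotFoundException.
> void materializeStats(FileSystem fs, Path statsFile) throws IOException {
>   if (fs.exists(statsFile)) {
>     readStatsFile(statsFile);  // hypothetical helper that parses the stats
>   }
>   // else: continue without stats; no warning needed during ANALYZE
> }
>
> void readStatsFile(Path statsFile) { /* hypothetical stats parser */ }
> {code}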



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)