[jira] [Created] (DRILL-7193) Integration changes of the Distributed RM queue configuration with Simple Parallelizer.

2019-04-22 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7193:
--

 Summary: Integration changes of the Distributed RM queue 
configuration with Simple Parallelizer.
 Key: DRILL-7193
 URL: https://issues.apache.org/jira/browse/DRILL-7193
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Query Planning & Optimization
Affects Versions: 1.17.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.17.0


- Refactor the fragment generation code for RM to accommodate non-RM, ZK-based 
  queue RM, and Distributed RM.
- Call the Distributed RM for queue selection based on memory requirements.
- Adjust the operator memory based on the memory limits of the selected queue.
- Set the optimal memory allocation per operator in each minor fragment; this 
  shows up in the query profile.
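
A rough sketch of the per-operator memory adjustment step (the class and 
method names below are illustrative only, not Drill's actual API):
{code:java}
import java.util.List;

public class QueueMemoryAdjuster {

  static final class OpMemory {
    long maxAllocation;                  // planner-assigned bytes
    OpMemory(long max) { this.maxAllocation = max; }
  }

  /** Scale buffered-operator allocations down so the plan fits the queue limit. */
  static void adjustToQueue(List<OpMemory> bufferedOps, long queueLimitBytes) {
    long planned = bufferedOps.stream().mapToLong(op -> op.maxAllocation).sum();
    if (planned <= queueLimitBytes) {
      return;                            // the plan already fits the selected queue
    }
    double scale = (double) queueLimitBytes / planned;
    for (OpMemory op : bufferedOps) {
      // Scale proportionally; a real implementation would also enforce a
      // per-operator minimum so each operator can still make progress.
      op.maxAllocation = (long) (op.maxAllocation * scale);
    }
  }
}
{code}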



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7191) Distributed state persistence and Integration of Distributed queue configuration with Planner

2019-04-21 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7191:
--

 Summary: Distributed state persistence and Integration of 
Distributed queue configuration with Planner
 Key: DRILL-7191
 URL: https://issues.apache.org/jira/browse/DRILL-7191
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Server, Query Planning & Optimization
Affects Versions: 1.17.0
Reporter: Hanumath Rao Maduri
 Fix For: 1.17.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7164:
--

 Summary: KafkaFilterPushdownTest is sometimes failing to pattern 
match correctly.
 Key: DRILL-7164
 URL: https://issues.apache.org/jira/browse/DRILL-7164
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Kafka
Affects Versions: 1.16.0
Reporter: Hanumath Rao Maduri
Assignee: Abhishek Ravi
 Fix For: 1.17.0


On my private build I am hitting a Kafka storage test failure intermittently. 
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

An earlier check-in changed the way cost is represented in the plan, and the 
test was updated in a way I think is fragile. The pattern used to compare 
against the plan should be made smarter so that this issue is fixed 
generically.
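
An illustrative sketch of such a matcher (not the actual test helper): quote 
the expected snippet literally, but let an explicit placeholder match the 
volatile parts such as the cost attributes.
{code:java}
import java.util.regex.Pattern;

public class PlanMatcher {

  /** Match an expected snippet inside the plan, with "<ANY>" as a wildcard. */
  static boolean planContains(String plan, String expectedSnippet) {
    String regex = Pattern.quote(expectedSnippet)
        .replace("<ANY>", "\\E.*?\\Q");  // re-open literal quoting around the wildcard
    return Pattern.compile(regex, Pattern.DOTALL).matcher(plan).find();
  }

  public static void main(String[] args) {
    String plan = "\"kafkaScanSpec\" : {\n"
        + "  \"topicName\" : \"drill-pushdown-topic\"\n"
        + "},\n\"cost\" : { \"memoryCost\" : 1.6777216E7 }";
    String expected = "\"kafkaScanSpec\" : {\n"
        + "  \"topicName\" : \"drill-pushdown-topic\"\n"
        + "},\n\"cost\"<ANY>";
    System.out.println(planContains(plan, expected));   // prints: true
  }
}
{code}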



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7118) Filter not getting pushed down on MapR-DB tables.

2019-03-19 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7118:
--

 Summary: Filter not getting pushed down on MapR-DB tables.
 Key: DRILL-7118
 URL: https://issues.apache.org/jira/browse/DRILL-7118
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.16.0


A simple IS NULL filter is not being pushed down for MapR-DB tables. Here is 
a repro.
{code:java}
0: jdbc:drill:zk=local> explain plan for select * from dfs.`/tmp/js` where b is 
null;
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(**=[$0])
00-02 Project(T0¦¦**=[$0])
00-03 SelectionVectorRemover
00-04 Filter(condition=[IS NULL($1)])
00-05 Project(T0¦¦**=[$0], b=[$1])
00-06 Scan(table=[[dfs, /tmp/js]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=/tmp/js, condition=null], columns=[`**`, 
`b`], maxwidth=1]])
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7113) Issue with filtering null values from MapRDB-JSON

2019-03-18 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7113:
--

 Summary: Issue with filtering null values from MapRDB-JSON
 Key: DRILL-7113
 URL: https://issues.apache.org/jira/browse/DRILL-7113
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Aman Sinha
 Fix For: 1.16.0, 1.17.0


When Drill queries documents from a MapR-DB JSON table that contain fields 
with a null value, it returns wrong results.
 The issue is reproduced locally.

Please find the repro steps:
 [1] Create a MapR-DB JSON table, say '/tmp/dmdb2/'.

[2] Insert the following sample records to table:
{code:java}
insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person", 
"confidence": 0.24}'
insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}'
insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3", 
"confidence": 0.54}'
insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4", 
"confidence": null}'
{code}
We can see that for the field 'confidence': document 1 has value 0.24, 
document 3 has value 0.54, document 2 does not have the field, and document 4 
has the field with value null.

[3] Query the table from DRILL.
 *Query 1:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2;
+--+-+
|  label   | confidence  |
+--+-+
| person   | 0.24|
| person2  | null|
| person3  | 0.54|
| person4  | null|
+--+-+
4 rows selected (0.2 seconds)

{code}
*Query 2:*
{code:java}
0: jdbc:drill:> select * from dfs.tmp.dmdb2;
+--+-+--+
| _id  | confidence  |  label   |
+--+-+--+
| 1| 0.24| person   |
| 2| null| person2  |
| 3| 0.54| person3  |
| 4| null| person4  |
+--+-+--+
4 rows selected (0.174 seconds)

{code}
*Query 3:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is 
not null;
+--+-+
|  label   | confidence  |
+--+-+
| person   | 0.24|
| person3  | 0.54|
| person4  | null|
+--+-+
3 rows selected (0.192 seconds)

{code}
*Query 4:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is  
null;
+--+-+
|  label   | confidence  |
+--+-+
| person2  | null|
+--+-+
1 row selected (0.262 seconds)

{code}
As you can see, Query 3, which asks for all documents where confidence IS NOT 
NULL, returns a document ('person4') with a null value.

*Other observation:*
 Querying the same data using Drill without MapR-DB provides the correct 
result.
 For example, create 4 different JSON files with the following data:

{code:java}
{"label": "person", "confidence": 0.24}
{"label": "person2"}
{"label": "person3", "confidence": 0.54}
{"label": "person4", "confidence": null}
{code}

Query it directly using DRILL:

*Query 5:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2;
+--+-+
|  label   | confidence  |
+--+-+
| person4  | null|
| person3  | 0.54|
| person2  | null|
| person   | 0.24|
+--+-+
4 rows selected (0.203 seconds)

{code}
*Query 6:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is 
null;
+--+-+
|  label   | confidence  |
+--+-+
| person4  | null|
| person2  | null|
+--+-+
2 rows selected (0.352 seconds)

{code}
*Query 7:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is not 
null;
+--+-+
|  label   | confidence  |
+--+-+
| person3  | 0.54|
| person   | 0.24|
+--+-+
2 rows selected (0.265 seconds)

{code}
As seen in Queries 6 and 7, Drill returns the correct results.

I believe the issue is in the MapR-DB layer where the results are fetched.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7068) Support of memory adjustment framework for resource management with Queues

2019-02-28 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7068:
--

 Summary: Support of memory adjustment framework for resource 
management with Queues
 Key: DRILL-7068
 URL: https://issues.apache.org/jira/browse/DRILL-7068
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


Add support for a memory adjustment framework based on the queue 
configuration for a query.
This also covers refactoring the existing queue-based resource management in 
Drill.
For more details on the design, please refer to the parent JIRA.
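
For reference, the existing ZK-based queueing is configured through system 
options along these lines (a sketch only; the option names below come from 
the current queue-based memory assignment feature and may change with this 
work):
{code:java}
ALTER SYSTEM SET `exec.queue.enable` = true;
ALTER SYSTEM SET `exec.queue.memory_ratio` = 10.0;         -- large:small queue memory split
ALTER SYSTEM SET `exec.queue.memory_reserve_ratio` = 0.2;  -- per-node memory headroom
{code}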



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-23 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6997:
--

 Summary: Semijoin is changing the join ordering for some tpcds 
queries.
 Key: DRILL-6997
 URL: https://issues.apache.org/jira/browse/DRILL-6997
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.16.0


TPCDS query 95 runs 50% slower with semi-join enabled than with semi-join 
disabled at scale factor 100, and 100% slower at scale factor 1000. This 
issue was introduced by commit 71809ca6216d95540b2a41ce1ab2ebb742888671 
(DRILL-6798: Planner changes to support semi-join).
{code:java}
with ws_wh as
 (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
 from web_sales ws1,web_sales ws2
 where ws1.ws_order_number = ws2.ws_order_number
 and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
 [_LIMITA] select [_LIMITB]
 count(distinct ws_order_number) as "order count"
 ,sum(ws_ext_ship_cost) as "total shipping cost"
 ,sum(ws_net_profit) as "total net profit"
 from
 web_sales ws1
 ,date_dim
 ,customer_address
 ,web_site
 where
 d_date between '[YEAR]-[MONTH]-01' and
 (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
 and ws1.ws_ship_date_sk = d_date_sk
 and ws1.ws_ship_addr_sk = ca_address_sk
 and ca_state = '[STATE]'
 and ws1.ws_web_site_sk = web_site_sk
 and web_company_name = 'pri'
 and ws1.ws_order_number in (select ws_order_number
 from ws_wh)
 and ws1.ws_order_number in (select wr_order_number
 from web_returns,ws_wh
 where wr_order_number = ws_wh.ws_order_number)
 order by count(distinct ws_order_number)
 [_LIMITC];
{code}
 I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join disabled. 
Both are executed with commit id 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and 
scale factor 100.
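
For reference, the comparison can be reproduced by toggling semi-join 
planning per session; this assumes the session option introduced by 
DRILL-6798 is named as below:
{code:java}
ALTER SESSION SET `planner.enable_semijoin` = true;   -- plan with semi-join
ALTER SESSION SET `planner.enable_semijoin` = false;  -- plan without semi-join
{code}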

The plan with semi-join enabled has moved the first hash join for this 
predicate:

and ws1.ws_order_number in (select ws_order_number
 from ws_wh)

It used to be on the build side of the first hash join on the left-hand side 
(04-05). It is now on the build side of the fourth hash join on the left-hand 
side (01-13).

The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
that takes 10 seconds to execute, although all the fragments take about the 
same amount of time.

The plan with semi-join enabled has two hash joins that process 1B rows, 
while the plan with semi-join disabled has only one hash join that processes 
1B rows.

The plan with semi-join enabled has several senders and receivers that wait 
more than 10 seconds (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
17-00). With semi-join disabled, no operator waits more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Apache Drill release 1.15.0 - RC2

2018-12-27 Thread Hanumath Rao Maduri
- Downloaded tarball and also built from source from [3]
- Tried on my Mac
- Ran unit tests.

LGTM (+1)


On Thu, Dec 27, 2018 at 4:45 PM Khurram Faraaz  wrote:

> Downloaded binaries and deployed on a 4 node CentOS 7.5 cluster.
> Executed basic SQL queries
> - from sqlline
> - from web UI
> - and from POSTMAN
>
> Verified Web UI, performed sanity tests.
>
> Looks good.
> Here is one question related to querying the new sys.functions system
> table.
> In some cases, the function names in the name column of the sys.functions
> table are operators. Is this expected behavior, or should that column
> contain actual function names rather than the operators?
>
> 0: jdbc:drill:schema=dfs.tmp> select distinct name from sys.functions limit
> 12;
> ++
> |  name  |
> ++
> | != |
> | $sum0  |
> | && |
> | -  |
> | /int   |
> | <  |
> | <= |
> | <> |
> | =  |
> | == |
> | >  |
> | >= |
> ++
> 12 rows selected (0.175 seconds)
>
> On Thu, Dec 27, 2018 at 3:02 PM Kunal Khatua  wrote:
>
> > - Downloaded tarball and also built from source
> > - Tried on CentOS 7.5 against MapR profile
> > - Ran a couple of queries consisting of TPCH dataset in Parquet format
> > - WebUX interactions seem clean and without any apparent issue.
> >
> > +1 (binding)
> >
> > Thanks
> > Kunal
> > On 12/27/2018 2:37:05 PM, Boaz Ben-Zvi  wrote:
> > -- Verified gpg signature on source and binaries.
> >
> > -- Checked the checksum sha512 - matched.
> >
> > -- Downloaded source to Linux VM - full build and unit tests passed.
> >
> > -- On the Mac - Build and unit tests passed, except the
> > `drill_derby_test` in the `contrib/storage-jdbc` which also fails for
> > 1.14.0 on my Mac (so it is a local environment issue).
> >
> > -- Manually ran on both Mac and Linux, and checked the Web-UI: All my
> > `semijoin` tests, and memory spilling tests for hash-join and hash-aggr.
> > And a select number of large queries. All passed OK.
> >
> > ==> +1 (binding)
> >
> > Thanks,
> >
> > Boaz
> >
> > On 12/27/18 12:54 PM, Abhishek Girish wrote:
> > > +1
> > >
> > > - Brought up Drill in distributed mode on a 4 node cluster with MapR
> > > platform - looks good!
> > > - Ran regression tests from [6] - looks good!
> > > - Ran unit tests with default & mapr profile - looks good!
> > > - Basic sanity tests on Sqlline, Web UI - looks good!
> > >
> > > [6]
> >
> https://github.com/mapr/drill-test-framework
> > >
> > > On Thu, Dec 27, 2018 at 11:12 AM Aman Sinha wrote:
> > >
> > >> - Downloaded source from [3] onto my Linux VM, built and ran unit
> > tests. I
> > >> had to run some test suites individually but got a clean run.
> > >> - Verified extraneous directory issue (DRILL-6916) is resolved
> > >> - Built the source using MapR profile and ran the secondary indexing
> > tests
> > >> within mapr format plugin
> > >> - Downloaded binary tar ball from [3] on my Mac. Verified checksum of
> > the
> > >> file using shasum -a 512 *file *and comparing with the one on [3]
> > >> - Verified Vitalii's signature through the following command: gpg
> > --verify
> > >> Downloads/apache-drill-1.15.0.tar.gz.asc apache-drill-1.15.0.tar.gz
> > >> - Ran Drill in embedded mode and ran a few TPC-H queries. Checked
> query
> > >> profiles through Web UI
> > >>
> > >> LGTM. +1
> > >>
> > >> Aman
> > >>
> > >> On Thu, Dec 27, 2018 at 6:17 AM Denys Ordynskiy
> > >> wrote:
> > >>
> > >>> - downloaded source code, successfully built Drill with mapr profile;
> > >>> - run Drill in distributed mode on Ubuntu on JDK8;
> > >>> - connected from Drill Explorer, explored data on S3 and MapRFS
> > storage;
> > >>> - submitted some tests for Drill Web UI and Drill Rest API.
> > >>>
> > >>> +1
> > >>>
> > >>> On Wed, Dec 26, 2018 at 8:40 PM Arina Ielchiieva
> > >> wrote:
> >  Build from source on Linux, started in embedded mode, ran random
> > >>> queries.
> >  Downloaded tarball on Windows, started Drill in embedded mode, run
> > >> random
> >  queries.
> >  Check Web UI: Profiles, Options, Plugins sections.
> > 
> >  Additionally checked:
> >  - information_schema files table;
> >  - new SqlLine version;
> >  - JDBC using Squirrel;
> >  - ODBC using Drill Explorer;
> >  - return result set option.
> > 
> >  +1 (binding)
> > 
> >  Kind regards,
> >  Arina
> > 
> >  On Wed, Dec 26, 2018 at 8:32 PM Volodymyr Vysotskyi
> > >>> volody...@apache.org>
> >  wrote:
> > 
> > > - Downloaded built tar, checked signatures and hashes for built and
> >  source
> > > tars
> > > and for jars;
> > > - run Drill in embedded mode on both Ubuntu and Windows on JDK8 and
> >  JDK11;
> > > - created views, submitted random TPCH queries from UI and 

[jira] [Created] (DRILL-6844) Query with ORDER BY DESC on indexed column does not pick secondary index

2018-11-10 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6844:
--

 Summary: Query with ORDER BY DESC on indexed column does not pick 
secondary index
 Key: DRILL-6844
 URL: https://issues.apache.org/jira/browse/DRILL-6844
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


Query with ORDER BY DESC on indexed column does not pick secondary index

{noformat}

// Query that uses the secondary index defined on ts.

0: jdbc:drill:schema=dfs.tmp> explain plan for 
. . . . . . . . . . . . . . > select ts from dfs.`/c8/test3` order by ts limit 
1;
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(ts=[$0])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[1])
00-04 Scan(table=[[dfs, /c8/test3]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=maprfs:///c8/test3, condition=null, 
indexName=ts], columns=[`ts`], limit=1, maxwidth=125]])
{noformat}

// Same query with ORDER BY ts DESC does not use the secondary index defined 
// on ts.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for 
. . . . . . . . . . . . . . > select ts from dfs.`/c8/test3` order by ts desc 
limit 1;
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(ts=[$0])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[1])
00-04 SingleMergeExchange(sort0=[0 DESC])
01-01 OrderedMuxExchange(sort0=[0 DESC])
02-01 SelectionVectorRemover
02-02 Limit(fetch=[1])
02-03 SelectionVectorRemover
02-04 TopN(limit=[1])
02-05 HashToRandomExchange(dist0=[[$0]])
03-01 Scan(table=[[dfs, /c8/test3]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=maprfs:///c8/test3, condition=null], 
columns=[`ts`], maxwidth=8554]])
{noformat}

The index definition is:

{noformat}
maprcli table index list -path /c8/test3 -json

{
 "timestamp":1538066303932,
 "timeofday":"2018-09-27 04:38:23.932 GMT+ PM",
 "status":"OK",
 "total":2,
 "data":[
 {
 "cluster":"c8",
 "type":"maprdb.si",
 "indexFid":"2176.68.131294",
 "indexName":"ts",
 "hashed":false,
 "indexState":"REPLICA_STATE_REPLICATING",
 "idx":1,
 "indexedFields":"ts:ASC",
 "isUptodate":false,
 "minPendingTS":1538066077,
 "maxPendingTS":1538066077,
 "bytesPending":0,
 "putsPending":0,
 "bucketsPending":1,
 "copyTableCompletionPercentage":100,
 "numTablets":32,
 "numRows":80574368,
 "totalSize":4854052160
 },
 {
 "cluster":"c8",
 "type":"maprdb.si",
 "indexFid":"2176.72.131302",
 "indexName":"ts_desc",
 "hashed":false,
 "indexState":"REPLICA_STATE_REPLICATING",
 "idx":2,
 "indexedFields":"ts:DESC",
 "isUptodate":false,
 "minPendingTS":1538066077,
 "maxPendingTS":1538066077,
 "bytesPending":0,
 "putsPending":0,
 "bucketsPending":1,
 "copyTableCompletionPercentage":100,
 "numTablets":32,
 "numRows":80081344,
 "totalSize":4937154560
 }
 ]
}
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri

2018-11-01 Thread Hanumath Rao Maduri
Thank you all for the wishes!

Thanks,
-Hanu

On Thu, Nov 1, 2018 at 1:28 PM Chunhui Shi 
wrote:

> Congratulations Hanu!
> --
> From:Arina Ielchiieva 
> Send Time:2018 Nov 1 (Thu) 06:05
> To:dev ; user 
> Subject:[ANNOUNCE] New Committer: Hanumath Rao Maduri
>
> The Project Management Committee (PMC) for Apache Drill has invited
> Hanumath
> Rao Maduri to become a committer, and we are pleased to announce that he
> has accepted.
>
> Hanumath became a contributor in 2017, making changes mostly in the Drill
> planning side, including lateral / unnest support. He is also one of the
> contributors of index based planning and execution support.
>
> Welcome Hanumath, and thank you for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


[Agenda] Drill developer meetup 2019

2018-10-31 Thread Hanumath Rao Maduri
Drill Developers,


I am quite excited to announce the details of the Drill developers day
2019. I have consolidated the topics from our earlier discussions and
prioritized them according to the votes.


MapR has offered to host it on Nov 14th in Training room downstairs.


Here is the exact location


Training Room at

4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054.


Please find the agenda for the meetup.



*Lunch starts at 12:00PM.*


*[12:25 - 12:40] Welcome *

   - Recap on last year's activities
   - Preview of this year's focus

*[12:40 - 1:00] Storage plugins*



   - Adding new storage plugins for the following:
  - Netflix Iceberg, Kudu(some code already exists), Cassandra,
  Elasticsearch, Carbondata, ORC/XML file formats, Spark
  RDD/DataFrames/Datasets, Graph databases & more
   - Improving documentation related to Storage plugins


*[1:00 - 1:45] Schema discovery & Evolution*



   - Creation, management of schema
   - Handling schema changes in certain common cases
   - Handling NULL values elegantly
   - Schema learning (similar to MSGpack plugin)
   - Query hints

*[1:45 - 2:30] Metadata Management*



   - Defining an abstraction layer for various types of metadata: views,
   schema, statistics, security
   - Underlying storage for metadata: what are the options and their
   trade-offs?
   - Hive metastore
   - Parquet metadata cache (parquet specific for row group metadata)
   - Ease of using the parquet files generated by other engines (like spark)


*[2:30 - 2:45] Break*


*[2:45 - 4:00] Resource management*



   - Resource limits per query
   - Optimal memory assignment for blocking operators based on stats
   - Enhancing the blocking and exchange operators to live within memory
   limits
   - Aligning with admission control/queueing (YARN concepts)
   - Query scheduling based on queues using tagging and costing
   - Drill on kubernetes


*[4:00 - 4:20] Apache Arrow*

   - Benefits of integrating Apache Drill with Apache Arrow
   - Possible trade-offs & implementation hurdles

*[4:20 - 4:40] **Performance Improvements*

   - Efficient handling of Broadcast/Semi/Anti Semi join
   - Drill Statistics handling
   - Optimizing complex Parquet reader

Thanks,
-Hanu


Re: [ANNOUNCE] New Committer: Gautam Parai

2018-10-22 Thread Hanumath Rao Maduri
Congratulations Gautam!

On Mon, Oct 22, 2018 at 8:46 AM salim achouche  wrote:

> Congrats Gautam!
>
> On Mon, Oct 22, 2018 at 7:25 AM Arina Ielchiieva  wrote:
>
> > The Project Management Committee (PMC) for Apache Drill has invited
> Gautam
> > Parai to become a committer, and we are pleased to announce that he has
> > accepted.
> >
> > Gautam has become a contributor since 2016, making changes in various
> Drill
> > areas including planning side. He is also one of the contributors of the
> > upcoming feature to support index based planning and execution.
> >
> > Welcome Gautam, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>
>
> --
> Regards,
> Salim
>


Re: Topics for Drill Hackathon/Drill Developers Day - 2018!

2018-10-17 Thread Hanumath Rao Maduri
Hello All,

Please vote for the list of the topics which you would be interested in.
This will be very helpful to prioritize the topics on Developer Day.
https://docs.google.com/forms/d/1C8nNIznllct_zY68R-XZkWtb3VHWBMFSkMnDs42isLs/edit



On Wed, Oct 17, 2018 at 9:45 PM Hanumath Rao Maduri 
wrote:

> Hello Charles,
>
> Thank you for your interest to volunteer. We are planning to host a remote
> session as well.
> I have added your name as a volunteer to Storage plugins, REST APIs
> related enhancements discussion.
>
>
> On Tue, Oct 16, 2018 at 3:53 PM Charles Givre  wrote:
>
>> @All,
>> I don’t know if remote folks can host a session, but if so, I’d volunteer.
>> — C
>>
>> > On Oct 16, 2018, at 17:13, Vitalii Diravka  wrote:
>> >
>> > Yes, I can edit and post suggestions in the document.
>> > Thank you!
>> >
>> > On Tue, Oct 16, 2018 at 11:50 PM Hanumath Rao Maduri <
>> hanu@gmail.com>
>> > wrote:
>> >
>> >> Hello Vitalli,
>> >>
>> >> I have given permissions to edit the document. Please let me know if
>> it is
>> >> fine.
>> >>
>> >> Regards,
>> >> -Hanu
>> >>
>> >> On Tue, Oct 16, 2018 at 11:10 AM Vitalii Diravka 
>> >> wrote:
>> >>
>> >>> Could you provide the possibility of commenting for the document?
>> >>> It will allow to make suggestions for the topics.
>> >>>
>> >>> On Tue, Oct 16, 2018 at 6:22 AM Hanumath Rao Maduri <
>> hanu@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Hello Drill Development Team,
>> >>>>
>> >>>> Thank you all for the interest in attending the Drill Developers Day.
>> >>>> I have curated a list of topics that can be discussed at the
>> up-coming
>> >>>> Drill Developers Day. Please feel free to suggest any other topics
>> >> which
>> >>>> you are interested in. Here is the link for the topics.
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> https://docs.google.com/document/d/1x9v_3UdENotONSuLm93hQJ-pDu1GS5tAhbXOaJrelsw/edit?usp=sharing
>> >>>>
>> >>>> Volunteers to lead the discussions are welcome. Please pick any topic
>> >> of
>> >>>> your interest to volunteer the discussion.
>> >>>>
>> >>>> Agenda and format for the discussions will be shared as we get closer
>> >> to
>> >>>> the event.
>> >>>>
>> >>>> We all are quite excited to meet you at the event.
>> >>>>
>> >>>> Thanks,
>> >>>> -Hanu
>> >>>>
>> >>>
>> >>
>>
>>


Re: Topics for Drill Hackathon/Drill Developers Day - 2018!

2018-10-17 Thread Hanumath Rao Maduri
Hello Charles,

Thank you for your interest to volunteer. We are planning to host a remote
session as well.
I have added your name as a volunteer to Storage plugins, REST APIs related
enhancements discussion.


On Tue, Oct 16, 2018 at 3:53 PM Charles Givre  wrote:

> @All,
> I don’t know if remote folks can host a session, but if so, I’d volunteer.
> — C
>
> > On Oct 16, 2018, at 17:13, Vitalii Diravka  wrote:
> >
> > Yes, I can edit and post suggestions in the document.
> > Thank you!
> >
> > On Tue, Oct 16, 2018 at 11:50 PM Hanumath Rao Maduri  >
> > wrote:
> >
> >> Hello Vitalli,
> >>
> >> I have given permissions to edit the document. Please let me know if it
> is
> >> fine.
> >>
> >> Regards,
> >> -Hanu
> >>
> >> On Tue, Oct 16, 2018 at 11:10 AM Vitalii Diravka 
> >> wrote:
> >>
> >>> Could you provide the possibility of commenting for the document?
> >>> It will allow to make suggestions for the topics.
> >>>
> >>> On Tue, Oct 16, 2018 at 6:22 AM Hanumath Rao Maduri <
> hanu@gmail.com>
> >>> wrote:
> >>>
> >>>> Hello Drill Development Team,
> >>>>
> >>>> Thank you all for the interest in attending the Drill Developers Day.
> >>>> I have curated a list of topics that can be discussed at the up-coming
> >>>> Drill Developers Day. Please feel free to suggest any other topics
> >> which
> >>>> you are interested in. Here is the link for the topics.
> >>>>
> >>>>
> >>>>
> >>>
> >>
> https://docs.google.com/document/d/1x9v_3UdENotONSuLm93hQJ-pDu1GS5tAhbXOaJrelsw/edit?usp=sharing
> >>>>
> >>>> Volunteers to lead the discussions are welcome. Please pick any topic
> >> of
> >>>> your interest to volunteer the discussion.
> >>>>
> >>>> Agenda and format for the discussions will be shared as we get closer
> >> to
> >>>> the event.
> >>>>
> >>>> We all are quite excited to meet you at the event.
> >>>>
> >>>> Thanks,
> >>>> -Hanu
> >>>>
> >>>
> >>
>
>


Re: Topics for Drill Hackathon/Drill Developers Day - 2018!

2018-10-16 Thread Hanumath Rao Maduri
Hello Vitalli,

I have given permissions to edit the document. Please let me know if it is
fine.

Regards,
-Hanu

On Tue, Oct 16, 2018 at 11:10 AM Vitalii Diravka  wrote:

> Could you provide the possibility of commenting for the document?
> It will allow to make suggestions for the topics.
>
> On Tue, Oct 16, 2018 at 6:22 AM Hanumath Rao Maduri 
> wrote:
>
> > Hello Drill Development Team,
> >
> > Thank you all for the interest in attending the Drill Developers Day.
> > I have curated a list of topics that can be discussed at the up-coming
> > Drill Developers Day. Please feel free to suggest any other topics which
> > you are interested in. Here is the link for the topics.
> >
> >
> >
> https://docs.google.com/document/d/1x9v_3UdENotONSuLm93hQJ-pDu1GS5tAhbXOaJrelsw/edit?usp=sharing
> >
> > Volunteers to lead the discussions are welcome. Please pick any topic of
> > your interest to volunteer the discussion.
> >
> > Agenda and format for the discussions will be shared as we get closer to
> > the event.
> >
> > We all are quite excited to meet you at the event.
> >
> > Thanks,
> > -Hanu
> >
>


Topics for Drill Hackathon/Drill Developers Day - 2018!

2018-10-15 Thread Hanumath Rao Maduri
Hello Drill Development Team,

Thank you all for the interest in attending the Drill Developers Day.
I have curated a list of topics that can be discussed at the up-coming
Drill Developers Day. Please feel free to suggest any other topics which
you are interested in. Here is the link for the topics.

https://docs.google.com/document/d/1x9v_3UdENotONSuLm93hQJ-pDu1GS5tAhbXOaJrelsw/edit?usp=sharing

Volunteers to lead the discussions are welcome. Please pick any topic of
your interest to volunteer the discussion.

Agenda and format for the discussions will be shared as we get closer to
the event.

We all are quite excited to meet you at the event.

Thanks,
-Hanu


Re: [ANNOUNCE] New Committer: Chunhui Shi

2018-09-28 Thread Hanumath Rao Maduri
Congratulations Chunhui.

On Fri, Sep 28, 2018 at 9:26 AM Padma Penumarthy 
wrote:

> Congratulations Chunhui.
>
> Thanks
> Padma
>
>
> On Fri, Sep 28, 2018 at 2:17 AM Arina Ielchiieva  wrote:
>
> > The Project Management Committee (PMC) for Apache Drill has invited
> Chunhui
> > Shi to become a committer, and we are pleased to announce that he has
> > accepted.
> >
> > Chunhui Shi has become a contributor since 2016, making changes in
> various
> > Drill areas. He has shown profound knowledge in Drill planning side
> during
> > his work to support lateral join. He is also one of the contributors of
> the
> > upcoming feature to support index based planning and execution.
> >
> > Welcome Chunhui, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>


Re: Problem of adding support for CROSS JOIN syntax

2018-09-27 Thread Hanumath Rao Maduri
Hello Ihor,

I am not clear on the stated goal of this JIRA. Can you please clarify it
with some examples?
"But main goal of this task is to allow explicit cross joins in queries
when option is enabled and at the same time disallow other ways to execute
cross joins (for example, list tables via comma in FROM section of query
without condition) while option is enabled.  "
At least as I understand it, supporting cross joins means enabling CROSS JOIN
wherever the comma-separated syntax without join conditions is also allowed.
Currently CROSS JOIN is not supported at all.

Its support should be similar to that of the comma-separated query syntax
without join conditions.
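
For example (illustrative table names), both of the following should be
governed by the same option:

select * from t1 cross join t2;   -- explicit syntax, currently not supported
select * from t1, t2;             -- comma-separated form without a join condition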

Thanks,
-Hanu



On Thu, Sep 27, 2018 at 10:14 AM Ihor Huzenko 
wrote:

> Dear Drillers,
>
> I'm trying to implement support for CROSS JOIN syntax in Apache Drill.
> But after long investigation I finally run out of ideas and don't see
> proper way
> how this could be implemented without changes to Calcite. I'm new to Drill
> and
> Calcite and I would appreciate any help. Please, take a look at my comment
> under the issue https://issues.apache.org/jira/browse/DRILL-786.
>
> Thank you in advance, Igor Guzenko
>


Re: Some questions about sorting pushdown to custom plugins

2018-08-28 Thread Hanumath Rao Maduri
If I understand correctly, you may need to write new storage plugin rules
(similar to those of other storage plugins) to support projection pushdown
and limit pushdown for your custom storage plugin. This should help in
reading only the required fields from the storage.
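
A minimal sketch of such a rule, following Drill's StoragePluginOptimizerRule
pattern (imports omitted; MyGroupScan and the rewrite details are
illustrative placeholders, not a real API):

public class MyProjectPushDownRule extends StoragePluginOptimizerRule {
  public MyProjectPushDownRule() {
    // Match a Project sitting directly on top of a Scan.
    super(RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)),
        "MyProjectPushDownRule");
  }

  @Override
  public boolean matches(RelOptRuleCall call) {
    final ScanPrel scan = call.rel(1);
    return scan.getGroupScan() instanceof MyGroupScan;  // fire only for this plugin
  }

  @Override
  public void onMatch(RelOptRuleCall call) {
    final ProjectPrel project = call.rel(0);
    final ScanPrel scan = call.rel(1);
    // Derive the columns referenced by project.getProjects(), build a
    // narrower MyGroupScan with only those columns, and hand the planner
    // the rewritten scan:
    // call.transformTo(newScanWithColumns);
  }
}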

Thanks,


On Tue, Aug 28, 2018 at 10:19 AM yang zhang 
wrote:

> Hi:
>   I have two questions to ask.
>
>   There is already a custom plugin. Query sql:
>   Select * from indexr.face_image_mess where device_id =
> '3E04846B-6B69-1A4D-0569-ED0813853348' and org_id in (11,12) order by
> short_time desc limit 4 offset 9;
> plan:
> {
>   "head" : {
> "version" : 1,
> "generator" : {
>   "type" : "DefaultSqlHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ {
>   "kind" : "LONG",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.width.max_per_node",
>   "num_val" : 1,
>   "scope" : "SESSION"
> } ],
> "queue" : 0,
> "hasResourcePlan" : false,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "indexr-scan",
> "@id" : 7,
> "userName" : "conn",
> "indexrScanSpec" : {
>   "tableName" : "face_image_mess",
>   "rsFilter" : {
> "type" : "and",
> "children" : [ {
>   "type" : "equal",
>   "attr" : {
> "name" : "device_id",
> "type" : "VARCHAR"
>   },
>   "numValue" : 0,
>   "strValue" : "3E04846B-6B69-1A4D-0569-ED0813853348",
>   "type" : "equal"
> }, {
>   "type" : "or",
>   "children" : [ {
> "type" : "equal",
> "attr" : {
>   "name" : "org_id",
>   "type" : "VARCHAR"
> },
> "numValue" : 0,
> "strValue" : "11",
> "type" : "equal"
>   }, {
> "type" : "equal",
> "attr" : {
>   "name" : "org_id",
>   "type" : "VARCHAR"
> },
> "numValue" : 0,
> "strValue" : "12",
> "type" : "equal"
>   } ],
>   "type" : "or"
> } ],
> "type" : "and"
>   }
> },
> "storage" : {
>   "type" : "indexr",
>   "enabled" : true
> },
> "columns" : [ "`**`" ],
> "limitScanRows" : 9223372036854775807,
> "scanId" : "dc889ee0-6675-42bb-b7a5-2bc413d6372d",
> "cost" : 0.0
>   }, {
> "pop" : "top-n",
> "@id" : 6,
> "child" : 7,
> "orderings" : [ {
>   "order" : "DESC",
>   "expr" : "`short_time`",
>   "nullDirection" : "FIRST"
> } ],
> "reverse" : false,
> "limit" : 13,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 1.0
>   }, {
> "pop" : "selection-vector-remover",
> "@id" : 5,
> "child" : 6,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 1.0
>   }, {
> "pop" : "limit",
> "@id" : 4,
> "child" : 5,
> "first" : 9,
> "last" : 13,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 13.0
>   }, {
> "pop" : "limit",
> "@id" : 3,
> "child" : 4,
> "first" : 9,
> "last" : 13,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 13.0
>   }, {
> "pop" : "selection-vector-remover",
> "@id" : 2,
> "child" : 3,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 13.0
>   }, {
> "pop" : "project",
> "@id" : 1,
> "exprs" : [ {
>   "ref" : "`id`",
>   "expr" : "`id`"
> }
> .
> .
> .
>   ],
> "child" : 2,
> "outputProj" : true,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 13.0
>   }, {
> "pop" : "screen",
> "@id" : 0,
> "child" : 1,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 13.0
>   } ]
> }
>
>
> The version of drill-1.13.0 is used.
> 1. Is order by pushdown supported?
>   Does the Drill execution plan push the ORDER BY field 'short_time' and
> the SELECT fields (all fields) down to the plugin?
>  This table (face_image_mess) has 70 fields, and each row is about 1 KB.
> This query hits about 2 * 10^8 rows, so scanning all field values of the
> hits means the Drill engine processes about 2 * 10^8 * 1 KB = 190.7 GB.
> Currently the ORDER BY field and the SELECT fields are not pushed down to
> the plugin; it can only scan all the field values of the hits.
>    In my custom plugin I designed a Long-typed rowid, which points to the
> physical address.
>    If the plugin could receive the pushed-down ORDER BY field and the
> SELECT fields, it could filter out the fields that do not need to
> participate in the calculation; only four fields actually participate
> ('rowid', 'short_time', 'device_id', 'org_id'), which would minimize the
> work done by the Drill engine. The actual 

Re: Apache drill High availability

2018-08-28 Thread Hanumath Rao Maduri
Unlike other databases or data engines, Drill doesn't store data in its own
storage engine. So high availability of the data when using Drill means that
the storage engine needs to support high availability.

High availability can also mean that a query should succeed even though some
of its fragments/nodes fail while it is running. In this scenario Drill
reports an error to the user and expects the user to re-run the query. On
re-running, the query might succeed (depending, of course, on the load on
the system).

If a node has crashed or is not responding, ZooKeeper will exclude it during
the planning stage itself. Hence, no fragments are executed on that node

Thanks,


On Tue, Aug 28, 2018 at 9:34 AM salim achouche  wrote:

> You need to clarify your definition of HA as there can be multiple faults
> at play:
> - A Drill cluster can handle nodes going down (and new ones joining the
> cluster)
> - Though, running queries (which are executed in a distributed manner)
> might fail if they had minor-fragments running on a faulty node
> - Similarly, Drill has some built-in resilience to network disconnects
> albeit it is not always transparent (I believe, queries might fail if
> network disconnect happened during a connection exchange)
>
> Regards,
>
> On Mon, Aug 27, 2018 at 10:48 PM pujari Satish 
> wrote:
>
> > Hi Team,
> >
> > Good morning. I am trying to set up Drill high availability using the
> > HAProxy load balancer.
> > Does Drill support high availability?
> >
> >
> > Please let me know.
> >
> >
> > -Thanks,
> > Satish
> >
>


Re: [ANNOUNCE] New PMC member: Volodymyr Vysotskyi

2018-08-24 Thread Hanumath Rao Maduri
Congratulations Volodymyr!

Thanks,
-Hanu

On Fri, Aug 24, 2018 at 10:22 AM Paul Rogers 
wrote:

> Congratulations Volodymyr!
> Thanks,
> - Paul
>
>
>
> On Friday, August 24, 2018, 5:53:25 AM PDT, Arina Ielchiieva <
> ar...@apache.org> wrote:
>
>  I am pleased to announce that Drill PMC invited Volodymyr Vysotskyi to the
> PMC and he has accepted the invitation.
>
> Congratulations Vova and thanks for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


Drill Hangout tomorrow 08/21

2018-08-20 Thread Hanumath Rao Maduri
The Apache Drill Hangout will be held tomorrow at 10:00am PST; please let
us know should you have a topic for tomorrow's hangout. We will also ask
for topics at the beginning of the hangout.

Hangout Link -
https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Regards,
Hanu


Re: [ANNOUNCE] New PMC member: Boaz Ben-Zvi

2018-08-17 Thread Hanumath Rao Maduri
Congratulations, Boaz!

On Fri, Aug 17, 2018 at 10:22 AM Kunal Khatua  wrote:

> Congratulations, Boaz!!
> On 8/17/2018 10:11:32 AM, Paul Rogers  wrote:
> Congratulations Boaz!
> - Paul
>
>
>
> On Friday, August 17, 2018, 2:56:27 AM PDT, Vitalii Diravka wrote:
>
> Congrats Boaz!
>
> Kind regards
> Vitalii
>
>
> On Fri, Aug 17, 2018 at 12:51 PM Arina Ielchiieva wrote:
>
> > I am pleased to announce that Drill PMC invited Boaz Ben-Zvi to the PMC
> and
> > he has accepted the invitation.
> >
> > Congratulations Boaz and thanks for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>


[jira] [Created] (DRILL-6671) Multi level lateral unnest join is throwing an exception during materializing the plan.

2018-08-07 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6671:
--

 Summary: Multi level lateral unnest join is throwing an exception 
during materializing the plan.
 Key: DRILL-6671
 URL: https://issues.apache.org/jira/browse/DRILL-6671
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


testMultiUnnestAtSameLevel in TestE2EUnnestAndLateral is throwing an 
exception in Materializer.java. This is due to incorrect matching of the 
Unnest and Lateral join operators.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.

2018-07-27 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6645:
--

 Summary: Transform TopN in Lateral Unnest pipeline to Sort and 
Limit.
 Key: DRILL-6645
 URL: https://issues.apache.org/jira/browse/DRILL-6645
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.15.0


The TopN operator is not supported in the Lateral/Unnest pipeline. Hence, 
transform the TopN into a Sort plus a Limit.
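
Illustratively (not actual plan output), the rewrite is:
{noformat}
Before: ... -> TopN(limit=[1]) -> ...
After:  ... -> Sort(sort0=[...]) -> Limit(fetch=[1]) -> ...
{noformat}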



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6545) Projection Push down into Lateral Join operator.

2018-06-27 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6545:
--

 Summary: Projection Push down into Lateral Join operator.
 Key: DRILL-6545
 URL: https://issues.apache.org/jira/browse/DRILL-6545
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.13.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


For Lateral's logical and physical plan nodes, we would need to add an 
output RowType so that a Projection can be pushed down into the Lateral. 
Currently, Lateral produces all columns from the left and the right and 
depends on a subsequent Project to eliminate unneeded columns. However, this 
blows up the memory use of Lateral, since each column from the left is 
replicated N times based on the N rows coming from UNNEST. We can have a 
ProjectLateralPushdownRule that pushes only the plain columns into the 
LATERAL but keeps the expression evaluations in the Project above the 
Lateral.
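
For illustration (hypothetical schema), in a query such as the one below the 
rule would push the plain columns (c_name and the unnested o_status) into the 
Lateral's output row type while keeping the UPPER(...) evaluation in the 
Project above the Lateral:
{code:java}
select c.c_name, upper(o.o_status)
from customers c,
     lateral (select t.ord.o_status as o_status
              from unnest(c.c_orders) t(ord)) o;
{code}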



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC member: Vitalii Diravka

2018-06-26 Thread Hanumath Rao Maduri
Congratulations Vitalii!

On Tue, Jun 26, 2018 at 12:27 PM, Gautam Parai  wrote:

> Congratulations Vitalii!
>
> Gautam
>
> On Tue, Jun 26, 2018 at 11:48 AM, Volodymyr Vysotskyi <
> volody...@apache.org>
> wrote:
>
> >  Congratulations, Vitalii!
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > > On Tue, Jun 26, 2018 at 21:38, Robert Wu wrote:
> >
> > > Congratulations, Vitalii!
> > >
> > > Best regards,
> > >
> > > Rob
> > >
> > > -Original Message-
> > > From: Sorabh Hamirwasia 
> > > Sent: Tuesday, June 26, 2018 11:30 AM
> > > To: dev@drill.apache.org
> > > Subject: Re: [ANNOUNCE] New PMC member: Vitalii Diravka
> > >
> > > Congratulations Vitalii!
> > >
> > > Thanks,
> > > Sorabh
> > >
> > > On Tue, Jun 26, 2018 at 11:18 AM, Arina Yelchiyeva <
> > > arina.yelchiy...@gmail.com> wrote:
> > >
> > > > Congratulations, Vitalii! Well deserved!
> > > >
> > > > Kind regards,
> > > > Arina
> > > >
> > > > On Tue, Jun 26, 2018 at 9:16 PM Bridget Bevens 
> > wrote:
> > > >
> > > > > Congratulations, Vitalii!
> > > > >
> > > > > On Tue, Jun 26, 2018 at 11:14 AM, Abhishek Girish
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > Congratulations, Vitalii!
> > > > > >
> > > > > > On Tue, Jun 26, 2018 at 11:12 AM Aman Sinha <
> amansi...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > I am pleased to announce that Drill PMC invited Vitalii Diravka
> > > > > > > to
> > > > the
> > > > > > PMC
> > > > > > > and he has accepted the invitation.
> > > > > > >
> > > > > > > Congratulations Vitalii and thanks for your contributions !
> > > > > > >
> > > > > > > -Aman
> > > > > > > (on behalf of Drill PMC)
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (DRILL-6502) Rename CorrelatePrel to LateralJoinPrel as currently correlatePrel is physical relation for LateralJoin

2018-06-15 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6502:
--

 Summary: Rename CorrelatePrel to LateralJoinPrel as currently 
correlatePrel is physical relation for LateralJoin
 Key: DRILL-6502
 URL: https://issues.apache.org/jira/browse/DRILL-6502
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Currently in Drill, CorrelatePrel is the physical relational operator for 
the LateralJoin implementation. The explain plan shows CorrelatePrel, which 
can be confusing. Hence it is better to rename this operator to 
LateralJoinPrel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Padma Penumarthy

2018-06-15 Thread Hanumath Rao Maduri
Congratulations Padma!

On Fri, Jun 15, 2018 at 12:04 PM, Gautam Parai  wrote:

> Congratulations Padma!!
>
>
> Gautam
>
> 
> From: Vlad Rozov 
> Sent: Friday, June 15, 2018 11:56:37 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
>
> Congrats Padma!
>
> Thank you,
>
> Vlad
>
> On 6/15/18 11:38, Charles Givre wrote:
> > Congrats Padma!!
> >
> >> On Jun 15, 2018, at 13:57, Bridget Bevens  wrote:
> >>
> >> Congratulations, Padma!!! 
> >>
> >> 
> >> From: Prasad Nagaraj Subramanya 
> >> Sent: Friday, June 15, 2018 10:32:04 AM
> >> To: dev@drill.apache.org
> >> Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
> >>
> >> Congratulations Padma!
> >>
> >> Thanks,
> >> Prasad
> >>
> >> On Fri, Jun 15, 2018 at 9:59 AM Vitalii Diravka <
> vitalii.dira...@gmail.com>
> >> wrote:
> >>
> >>> Congrats Padma!
> >>>
> >>> Kind regards
> >>> Vitalii
> >>>
> >>>
> >>> On Fri, Jun 15, 2018 at 7:40 PM Arina Ielchiieva 
> wrote:
> >>>
>  Padma, congratulations and welcome!
> 
>  Kind regards,
>  Arina
> 
>  On Fri, Jun 15, 2018 at 7:36 PM Aman Sinha 
> wrote:
> 
> > The Project Management Committee (PMC) for Apache Drill has invited
> >>> Padma
> > Penumarthy to become a committer, and we are pleased to announce that
> >>> she
> > has
> > accepted.
> >
> > Padma has been contributing to Drill for about 1 1/2 years.  She has
> >>> made
> > improvements for work-unit assignment in the parallelizer,
> performance
> >>> of
> > filter operator for pattern matching and (more recently) on the batch
> > sizing for several operators: Flatten, MergeJoin, HashJoin, UnionAll.
> >
> > Welcome Padma, and thank you for your contributions.  Keep up the
> good
>  work
> > !
> >
> > -Aman
> > (on behalf of Drill PMC)
> >
>
>


[jira] [Created] (DRILL-6476) Generate explain plan which shows relation between Lateral and the corresponding Unnest.

2018-06-07 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6476:
--

 Summary: Generate explain plan which shows relation between 
Lateral and the corresponding Unnest.
 Key: DRILL-6476
 URL: https://issues.apache.org/jira/browse/DRILL-6476
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


Currently, the explain plan doesn't show which Lateral and Unnest nodes are 
related. This information is good to have so that the visualized plan can 
show the relationship between them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6456) Planner shouldn't create any exchanges on the right side of Lateral Join.

2018-05-30 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6456:
--

 Summary: Planner shouldn't create any exchanges on the right side 
of Lateral Join.
 Key: DRILL-6456
 URL: https://issues.apache.org/jira/browse/DRILL-6456
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Currently, there is no restriction placed on the right side of the 
LateralJoin. This causes the planner to generate an Exchange when there are 
operators like Agg, Limit, Sort, etc.

Because of this, the Unnest operator cannot retrieve the row from the 
Lateral's left side to process the pipeline further. Enhance the planner so 
it does not generate exchanges on the right side of the LateralJoin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Timothy Farkas

2018-05-25 Thread Hanumath Rao Maduri
Congratulations Tim!

Thanks,
-Hanu

On Fri, May 25, 2018 at 1:16 PM, Gautam Parai  wrote:

> Congratulations Tim!
>
>
> Gautam
>
> 
> From: Sorabh Hamirwasia 
> Sent: Friday, May 25, 2018 12:44:47 PM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Timothy Farkas
>
> Congratulations Tim!
>
>
> Thanks,
> Sorabh
>
> 
> From: Vova Vysotskyi 
> Sent: Friday, May 25, 2018 12:43:04 PM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Timothy Farkas
>
> Congratulations, Tim!
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
> On Fri, May 25, 2018 at 22:17, Padma Penumarthy wrote:
>
> > Congrats Tim.
> >
> > Thanks
> > Padma
> >
> >
> > > On May 25, 2018, at 12:15 PM, Vitalii Diravka <
> vitalii.dira...@gmail.com>
> > wrote:
> > >
> > > Good news! Congratulations, Timothy!
> > >
> > > Kind regards
> > > Vitalii
> > >
> > >
> > > On Fri, May 25, 2018 at 10:04 PM Arina Yelchiyeva <
> > > arina.yelchiy...@gmail.com> wrote:
> > >
> > >> Congrats, Tim!
> > >>
> > >> Kind regards,
> > >> Arina
> > >>
> > >>> On May 25, 2018, at 9:59 PM, Kunal Khatua  wrote:
> > >>>
> > >>> Congratulations, Timothy !
> > >>>
> > >>> On 5/25/2018 11:58:31 AM, Aman Sinha  wrote:
> > >>> The Project Management Committee (PMC) for Apache Drill has invited
> > >> Timothy
> > >>> Farkas to become a committer, and we are pleased to announce that he
> > >>> has accepted.
> > >>>
> > >>> Tim has become an active contributor to Drill in less than a year.
> > During
> > >>> this time he has contributed to addressing flaky unit tests, fixing
> > >> memory
> > >>> leaks in certain operators, enhancing the system options framework to
> > be
> > >>> more extensible and setting up the Travis CI tests. More recently, he
> > >>> worked on the memory sizing calculations for hash join.
> > >>>
> > >>> Welcome Tim, and thank you for your contributions. Keep up the good
> > work
> > >> !
> > >>>
> > >>> -Aman
> > >>> (on behalf of Drill PMC)
> > >>
> >
> >
>


[jira] [Created] (DRILL-6431) Unnest operator requires table and a single column alias to be specified.

2018-05-19 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6431:
--

 Summary: Unnest operator requires table and a single column alias 
to be specified.
 Key: DRILL-6431
 URL: https://issues.apache.org/jira/browse/DRILL-6431
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization, SQL Parser
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Currently, the unnest operator is not required to specify an alias for 
either the table name or the column name. This has implications for what 
name the unnest operator's output column should use. One could use a common 
name like "unnest" as the output name, but then customers need to be 
educated on what to expect from the unnest operator, which might confuse 
some of them and is prone to introducing errors in the query.

The design decision for Drill is that unnest always produces either a scalar 
column or a map (depending on its input schema), but it is always a single 
column.

Given this, it is better to enforce the requirement that the unnest operator 
carries a table alias and a (single) column alias. This helps disambiguate 
the column, which can then easily be referenced in the query.
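
For illustration, with the proposed requirement a query would look like this 
(hypothetical schema; t is the table alias and ord the single column alias):
{code:java}
select c.c_name, o.ord
from customers c,
     lateral (select t.ord from unnest(c.c_orders) t(ord)) o;
{code}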

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Sorabh Hamirwasia

2018-04-30 Thread Hanumath Rao Maduri
Congrats Sorabh!

Thanks,
-Hanu

On Mon, Apr 30, 2018 at 12:13 PM, salim achouche 
wrote:

> Congrats Sorabh! well deserved.
>
> Regards,
> Salim
>
> On Mon, Apr 30, 2018 at 3:35 PM, Aman Sinha  wrote:
>
> > The Project Management Committee (PMC) for Apache Drill has invited
> Sorabh
> > Hamirwasia  to become a committer, and we are pleased to announce that he
> > has accepted.
> >
> > Over the last 1 1/2 years Sorabh's contributions have been in a few
> > different areas. He took
> > the lead in designing and implementing network encryption support for
> > Drill. He has contributed
> > to the web server and UI side.  More recently, he is involved in design
> and
> > implementation of the lateral join operator.
> >
> > Welcome Sorabh, and thank you for your contributions.  Keep up the good
> > work !
> >
> > -Aman
> > (on behalf of Drill PMC)
> >
>


Re: Non-column filters in Drill

2018-04-07 Thread Hanumath Rao Maduri
Hello Ryan,

Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be
supplied by the underlying scan.
However, I expect that this column will be present in the scan but not past
the Filter (notColumn = 'value') in the plan.
In that case you may need to push the filter down into the group scan and
then remove the column projections from your custom GroupScan.

It would be easier for us to guess what the issue is if you can post the
logical and physical query plans for this query.
Hope this helps. Please do let us know if you have any further issues.

Thanks,


On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks 
wrote:

> Hi Drill Dev Team!
>
> I am writing a custom storage plugin and I am curious if it is possible in
> Drill to pass a filter value, in the form of a where clause, that is not
> related to a column. What I would like to accomplish is something like:
>
> select * from myTable where notColumn = 'value';
>
> In the example, notColumn is not a column in myTable, or any other table,
> it is just a specific parameter that the storage plugin will use in the
> filtering process. Additionally, notColumn would not be returned as a
> column so Drill needs to not expect it as a part of the 'select *'. I
> created a rule that will push down and remove these non-column filter
> calls, but I need to somehow tell drill/calcite that the filter name is
> valid, without actually registering it as a column. The following error
> occurs prior to submitting any rules:
>
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR:
> From line 1, column 35 to line 1, column 39: Column 'notColumn' not found
> in any table
>
>
> Alternatively, can I manipulate star queries to only return a subset of
> all the columns for a table?
>
> Any insight would be greatly appreciated!
>
> Thanks,
> Ryan
>


Re: "Death of Schema-on-Read"

2018-04-07 Thread Hanumath Rao Maduri
Hello All,

I have created a JIRA to track this approach.
https://issues.apache.org/jira/browse/DRILL-6312

Thanks,
-Hanu

On Fri, Apr 6, 2018 at 7:38 PM, Paul Rogers 
wrote:

> Hi Aman,
>
> As we get into details, I suggested to Hanu that we move the discussion
> into a JIRA ticket.
>
>  >On the subject of CAST pushdown to Scans, there are potential drawbacks
>
>  >  - In general, the planner will see a Scan-Project where the Project
> has  CAST functions.  But the Project can have arbitrary expressions,  e.g
> CAST(a as INT) * 5
>
> Suggestion: push the CAST(a AS INT) down to the scan, do the a * 5 in the
> Project operator.
>
> >  or a combination of 2 CAST functions
>
> If the user does a two-stage cast, CAST(CAST(a AS INT) AS BIGINT), then
> one simple rule is to push only the innermost cast downwards.
>
> > or non-CAST functions etc.
>
> Just keep it in Project.
>
>  >It would be quite expensive to examine each expression (there could
> be hundreds) to determine whether it is eligible to be pushed to the Scan.
>
> Just push CAST(<column> AS <type>). Even that would be a huge win.
> Note, for CSV, it might have to be CAST(columns[2] AS INT), since "columns"
> is special for CSV.
>
> >   - Expressing Nullability is not possible with CAST.  If a column
> should be tagged as  (not)nullable, CAST syntax does not allow that.
>
> Can we just add keywords: CAST(a AS INT NULL), CAST(b AS VARCHAR NOT NULL)
> ?
>
>  >  - Drill currently supports CASTing to a SQL data type, but not to
> the complex types such as arrays and maps.  We would have to add support
> for that from a language perspective as well as the run-time.  This would
> be non-trivial effort.
>
> The term "complex type" is always confusing. Consider a map. The rules
> would apply recursively to the members of the map. (Problem: today, if I
> reference a map member, Drill pulls it to the top level: SELECT m.a creates
> a new top-level field, it does not select "a" within "m". We need to fix
> that anyway.) So, CAST(m.a AS INT) should imply the type of column "a"
> within map "m".
>
> For arrays, the problem is more complex. Perhaps more syntax: CAST(a[] AS
> INT) to force array elements to INT. Maybe use CAST(a[][] AS INT) for a
> repeated list (2D array).
>
> Unions don't need a solution as they are their own solution (they can hold
> multiple types.) Same for (non-repeated) lists.
>
> To resolve runs of nulls, maybe allow CAST(m AS MAP). Or we can imply that
> "m" is a Map from the expression CAST(m.a AS INT). For arrays, the
> previously suggested CAST(a[] AS INT). If columns "a" or "m" turn out to be
> a non-null scalar, then we have no good answer.
>
> CAST cannot solve the nasty cases of JSON in which some fields are
> complex, some scalar. E.g. {a: 10} {a: [20]} or {m: "foo"} {m: {value:
> "foo"}}. I suppose no solution is perfect...
>
> I'm sure that, if someone gets a chance to design this feature, they'll
> find lots more issues. Maybe cast push-down is only a partial solution.
> But, it seems to solve so many of the JSON and CSV cases that I've seen
> that it seems too good to pass up.
>
> Thanks,
>
>
> - Paul


[jira] [Created] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.

2018-04-07 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6312:
--

 Summary: Enable pushing of cast expressions to the scanner for 
better schema discovery.
 Key: DRILL-6312
 URL: https://issues.apache.org/jira/browse/DRILL-6312
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators, Query Planning & Optimization
Affects Versions: 1.13.0
Reporter: Hanumath Rao Maduri


Drill is a schemaless engine that tries to infer the schema of disparate 
sources at read time. Currently the scanners infer the schema for each batch 
from the data seen for each column in that batch. This covers many use cases 
but can error out when the data differs too much between batches, e.g. int 
versus array[int] (one example among several).

There is also a mechanism to create a view that casts the columns to 
appropriate types. This solves the problem in some cases but fails in many 
others, because the cast expression is not pushed down to the scanner; it 
stays in the project, filter, or other operators higher up the query plan.

This JIRA is to fix this by propagating the type information embedded in the 
cast function to the scanners, so that the scanners can cast the incoming 
data appropriately.
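
For illustration, a view of this kind (the file path and column names are 
hypothetical) today leaves the casts in a Project above the scan; with this 
fix, the scanner itself would receive the target types:

{code}
CREATE VIEW dfs.tmp.typed_logs AS
SELECT CAST(user_id AS INT) AS user_id,
       CAST(amount AS DOUBLE) AS amount
FROM dfs.`/data/logs.json`;

-- With cast push-down, the scanner reads user_id as INT and amount as DOUBLE
-- instead of inferring types batch by batch.
SELECT * FROM dfs.tmp.typed_logs;
{code}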





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: "Death of Schema-on-Read"

2018-04-06 Thread Hanumath Rao Maduri
Hello,

Thanks to Ted & Paul for clarifying my questions.
Sorry for not being clear in my previous post. When I said "create view" I
had in mind simple views where we currently use cast expressions to cast
columns to types. In this case the planner can use that information to force
the scans to use it as the schema.

If the query fails, it then fails at the scan rather than after the scanner
has inferred a schema.

I know that views can get complicated with joins and expressions. For schema
hinting through views, I assume they would be created on single tables with
the columns one wants to project from the table.


Regarding the same question, today we had a discussion with Aman. Here a view
can be considered as a "view" of the table with a schema in place.

We could adapt the syntax to allow specifying a schema, something like this:

create schema[optional] view (/virtual table) v1 as (a: int, b: int)
select a, b from t1, with some additional rules for conversion of scalar to
complex types.

Queries over this view (below) would then let the scanner use this type
information to convert the data into the appropriate types.
select * from v1

For the case where the schema is not known to the user, maybe use something
like this:

create schema[optional] view (/virtual table) v1 as select a, b from t1
infer schema.

Querying the table through this view would trigger the logic of inferring
and consolidating the schema and attaching the inferred schema to the view.
When we use the same view in the future, we would use the inferred schema.
The view can be either local to the session or global, so that queries in
other sessions can use it as well.


By default we can apply certain rules, such as converting simple scalar
values to other scalar values (like int to double). But we should also give
the customer the option to enable rules such as scalar int to array[int]
when creating the view itself. (A consolidated sketch follows below.)
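
Putting the two proposals together as a sketch (this syntax is illustrative, 
not implemented):

{code}
-- Explicit schema attached to the view.
create view v1 as (a: int, b: int) select a, b from t1;

-- Schema inferred once at creation time and attached to the view.
create view v1 as select a, b from t1 infer schema;

-- Later queries apply the attached schema at the scan.
select * from v1;
{code}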


Thanks,
-Hanu


On Fri, Apr 6, 2018 at 3:10 PM, Paul Rogers 
wrote:

> Ted, this is why your participation in Drill is such a gift: cast
> push-down is an elegant, simple solution that even works in views.
> Beautiful.
>
> Thanks,
> - Paul
>
>
>
> On Friday, April 6, 2018, 11:35:37 AM PDT, Ted Dunning <
> ted.dunn...@gmail.com> wrote:
>
>  On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogers 
> wrote:
>
> > Great discussion. Really appreciate the insight from the Drill users!
> >
> > To Ted's points: the simplest possible solution is to allow a table
> > function to express types. Just making stuff up:
> >
> > SELECT a FROM schema(myTable, (a: INT))
> >
>
> Why not just allow cast to be pushed down to the reader?
>
> Why invent new language features?
>
> Or, really ugly, a session option:
> >
> > ALTER SESSION SET schema.myTable="a: INT"
> >
>
> These are a big problem.
>
>


Re: "Death of Schema-on-Read"

2018-04-05 Thread Hanumath Rao Maduri
Hello,

Thank you Paul for starting this discussion.
However, I was not clear on the latest point as to how providing hints and
creating a view (a mechanism which already exists in Drill) differ.
I do think that creating a view can be cumbersome (in terms of syntax).
Hints are ephemeral, and hence can be used for quick validation of the
schema for a query execution. But if the user knows the schema with
certainty, then I think creating a view and using it might be the better
option. Can you please share your thoughts on this?

Thank you Ted for your valuable suggestions. Regarding your comment that a
"metastore is good but centralized is bad", can you please share your view
on what design issues it can cause? I know that it can be a bottleneck, but
I would like to know about the other issues.
Put another way: if a centralized metastore were engineered well enough to
avoid most of the bottleneck, do you think it would be good to use for
metadata?

Thanks,
-Hanu

On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogers 
wrote:

> Great discussion. Really appreciate the insight from the Drill users!
>
> To Ted's points: the simplest possible solution is to allow a table
> function to express types. Just making stuff up:
>
> SELECT a FROM schema(myTable, (a: INT))
>
> Or, a SQL extension:
>
> SELECT a FROM myTable(a: INT)
>
> Or, really ugly, a session option:
>
> ALTER SESSION SET schema.myTable="a: INT"
>
> All these are ephemeral and not compatible with, say, Tableau.
>
> Building on Ted's suggestion of using the (distributed) file system we can
> toss out a few half-baked ideas. Maybe use a directory to represent a name
> space, with files representing tables. If I have "weblogs" as my directory,
> I might have a file called "jsonlog" to describe the (messy) format of my
> JSON-formatted log files. And "csvlog" to describe my CSV-format logs.
> Different directories represent different SQL databases (schemas),
> different files represent tables within the schema.
>
>
> The table files can store column hints. But, it could do more. Maybe
> define the partitioning scheme (by year, month, day, say) so that can be
> mapped to a column. Wouldn't it be great if Drill could figure out the
> partitioning itself if we gave a date range?
>
> The file could also define the format plugin to use, and its options, to
> avoid the need to define this format separate from the data, and to reduce
> the need for table functions.
>
> Today, Drill matches files to format plugins using only extensions. The
> table file could provide a regex for those old-style files (such as real
> web logs) that don't use suffixes. Or, to differentiate between "sales.csv"
> and "returns.csv" in the same data directory.
>
>
> While we're at it, the file might as well contain a standard view to apply
> to the table to define computed columns, do data conversions and so on.
>
> If Drill does automatic scans (to detect schema, to gather stats), maybe
> store that alongside the table file: "csvlogs.drill" for the
> Drill-generated info.
>
>
> Voila! A nice schema definition with no formal metastore. Because the info
> is in files, it easy to version using git, etc. (especially if the
> directory can be mounted using NFS as a normal directory.) Atomic updates
> can be done via the rename trick (which, sadly, does not work on S3...)
>
>
> Or, maybe store all information in ZK in JSON as we do for plugin
> configurations. (Hard to version and modify though...)
>
>
> Lots of ways to skin this cat once we agree that hints are, in fact,
> useful additions to Drill's automatic schema detection.
>
>
> Thanks,
> - Paul
>
>
>
> On Thursday, April 5, 2018, 3:22:07 PM PDT, Ted Dunning <
> ted.dunn...@gmail.com> wrote:
>
>  On Thu, Apr 5, 2018 at 7:24 AM, Joel Pfaff  wrote:
>
> > Hello,
> >
> > A lot of versioning problems arise when trying to share data through
> kafka
> > between multiple applications with different lifecycles and maintainers,
> > since by default, a single message in Kafka is just a blob.
> > One way to solve that is to agree on a single serialization format,
> > friendly with a record per record storage (like avro) and in order to not
> > have to serialize the schema in use for every message, just reference an
> > entry in the Avro Schema Registry (this flow is described here:
> > https://medium.com/@stephane.maarek/introduction-to-
> > schemas-in-apache-kafka-with-the-confluent-schema-registry-3bf55e401321
> > ).
> > On top of the schema registry, specific client libs allow to validate the
> > message structure prior to the injection in kafka.
> > So while comcast mentions the usage of an Avro Schema to describe its
> > feeds, it does not mention directly the usage of avro files (to describe
> > the schema).
> >
>
> This is all good except for the assumption of a single schema for all time.
> You can mutate schemas in Avro (or JSON) in a future-proof manner, but it
> is important to 

Re: [DISCUSS] 1.13.0 release

2018-03-07 Thread Hanumath Rao Maduri
On my machine I couldn't repro the issue related to TestDrillbitResilience.
cancelAfterAllResultsProduced.
I used Vladimir's branch (i.e. DRILL-1491) and ran the test with the Maven
test command.

Output of the test run:
... 4 common frames omitted
Tests run: 20, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 124.187
sec - in org.apache.drill.exec.server.TestDrillbitResilience




On Wed, Mar 7, 2018 at 11:00 AM, Parth Chandra  wrote:

> Yes I agree. JDBC would be a new feature that we can defer to 1.14.0.
> I'm hoping we can resolve the other three in the next few days. Target date
> for starting release process - Friday Mar 9th
>
> Once these are resolved, I will create a branch for the release so that
> Apache master remains open for commits. If any issues are found in the
> release branch, we will fix them in master and I will cherry-pick them into
> the release branch. Once the release is finalized I will add a release tag
> and  remove the branch.
>
> Also note if QA folks want to get started on testing the release, the
> current head of Apache master is close to final. Javadoc generation is only
>  a release build issue, and the other issues are localized to specific
> cases.
>
> Note: to reproduce the javadoc issues:
># set JAVA_HOME to JDK 8
>mvn javadoc:javadoc -Papache-release
>
>
>
> On Wed, Mar 7, 2018 at 11:23 PM, Aman Sinha  wrote:
>
> > It seems to me the main blockers are:
> >
> > 1. DRILL-4547Javadoc fails with Java8   <-- Can we split up the work
> > among few people to resolve these ?
> > 2. DRILL-6216Metadata mismatch.. <-- Agreement was to revert
> > one small piece of code and it appears Sorabh is looking into it
> > 3. TestDrillbitResilience.cancelAfterAllResultsProduced  <-- need
> someone
> > to look into this
> >
> > Regarding the JDBC issues that Parth mentioned, looking at the JIRAs, it
> > seems they are not showstoppers...Parth do you agree ?
> >
> > Since we are close to the finish line for JDK 8, IMO we should try and
> see
> > if in another day or two we can get over these hurdles.
> >
> > -Aman
> >
> >
> >
> > On Wed, Mar 7, 2018 at 7:17 AM, Pritesh Maker  wrote:
> >
> > > The JDK 8 issues will likely require more time to harden for it to be
> > > included in the 1.13 release. My recommendation would be to move ahead
> > with
> > > the 1.13 release now and address these issues right.
> > >
> > > Pritesh
> > >
> > > -Original Message-
> > > From: Parth Chandra 
> > > Sent: March 7, 2018 3:34 AM
> > > To: dev 
> > > Subject: Re: [DISCUSS] 1.13.0 release
> > >
> > > My mistake Volodymyr.
> > >
> > > Found some other JDK 8 issues in JIRA not tracked in DRILL-1491
> > >
> > >   DRILL-4547Javadoc fails with Java8
> > >   DRILL-6163Switch Travis To Java 8
> > >
> > > The following are tracked in DRILL-1491, but it doesn't look like we're
> > > addressing these. Are we?
> > >
> > >   DRILL-4329 13 Unit tests are failing with JDK 8
> > >   DRILL-4333DRILL-4329 tests in
> > > Drill2489CallsAfterCloseThrowExceptionsTest fail in Java 8
> > >   DRILL-5120Upgrade JDBC Driver for new Java 8 methods
> > >   DRILL-5680BasicPhysicalOpUnitTest can't run in Eclipse with Java
> 8
> > >
> > >
> > > *DRILL-4547 is a showstopper*. The release build (-Papache-release)
> fails
> > > with far too many Javadoc errors even with doc lint turned off.
> > >
> > > DRILL-4333, DRILL-4329, DRILL-5120 are JDBC related which is a project
> by
> > > itself.
> > >
> > > Note that fixing JDBC related issues and adding the command line option
> > to
> > > turn doc lint off will likely break Java 7 builds.
> > >
> > >
> > > Folks who voted to get JDK 8 into this release, what is the consensus
> on
> > > JDBC/Java8 ?
> > > Also, any volunteers on helping debug
> > > TestDrillbitResilience.cancelAfterAllResultsProduced
> > > ?
> > >
> > >
> > >
> > > On Wed, Mar 7, 2018 at 3:20 PM, Volodymyr Tkach  >
> > > wrote:
> > >
> > > > Addition to my last message:
> > > > The link to the PR for DRILL-1491 is
> > > > https://github.com/apache/drill/pull/1143, on which we can see the
> > > > TestDrillbitResilience.cancelAfterAllResultsProduced failure.
> > > >
> > > > 2018-03-07 11:45 GMT+02:00 Volodymyr Tkach :
> > > >
> > > > > *To Parth:*
> > > > > The failure can only be seen if run on DRILL-1491 branch, because
> it
> > > uses
> > > > > jdk 1.8 in pom.xml
> > > > >
> > > > > 1.8
> > > > > 1.8
> > > > >
> > > > > 2018-03-07 6:03 GMT+02:00 Sorabh Hamirwasia  >:
> > > > >
> > > > >> Just sent an email on RCA of DRILL-6216 to discuss next steps.
> > > > >>
> > > > >>
> > > > >> Thanks,
> > 

[jira] [Created] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-03-05 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6212:
--

 Summary: A simple join is recursing too deep in planning and 
eventually throwing stack overflow.
 Key: DRILL-6212
 URL: https://issues.apache.org/jira/browse/DRILL-6212
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Create two views using the following statements.

{code}
create view v1 as select cast(greeting as int) f from 
dfs.`/home/mapr/data/json/temp.json`;
create view v2 as select cast(greeting as int) f from 
dfs.`/home/mapr/data/json/temp.json`;
{code}

Executing the following join query produces a stack overflow during the 
planning phase.
{code}
select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v1 as t1 on cast(t.f as 
int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6159) No need to offset rows if order by is not specified in the query.

2018-02-14 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6159:
--

 Summary: No need to offset rows if order by is not specified in 
the query.
 Key: DRILL-6159
 URL: https://issues.apache.org/jira/browse/DRILL-6159
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: Future


For queries that have OFFSET and LIMIT but no ORDER BY, there is no need to 
add the offset to the limit when pushing the limit down.

SQL does not guarantee output order when no ORDER BY is specified. It is 
observed that for queries with OFFSET and LIMIT and no ORDER BY, the current 
optimizer adds the offset to the limit and fetches that many rows per 
fragment. Since any rows form a valid result in the absence of an ORDER BY, 
this prevents the query from exiting early.

Here is an example query and its plan:

{code}
select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000


00-00Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 
1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 
1.56569100288E11 network, 4.64926176E7 memory}, id = 787
00-01  Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY 
zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 
rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, 
id = 786
00-02SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, 
ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 5.53005404E8 
cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 785
00-03  Limit(offset=[1000], fetch=[10]) : rowType = 
RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = 
{9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 
4.64926176E7 memory}, id = 784
00-04UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY 
a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 cpu, 
0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
01-01  SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY 
zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 
4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
782
01-02Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, 
ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 
4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
781
01-03  Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = 
RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative cost 
= {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 
4.64926176E7 memory}, id = 780
01-04HashJoin(condition=[=($0, $2)], joinType=[left]) : 
rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, 
cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 
network, 4.64926176E7 memory}, id = 779
01-06  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], 
files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, 
maprfs:/tmp/csvd1/Daamulti11random21.csv, 
maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY 
ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 
4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
01-05  BroadcastExchange : rowType = RecordType(ANY ZZ2): 
rowcount = 2641626.0, cumulative cost = {5283252.0 rows, 2.3774634E7 cpu, 0.0 
io, 3.2460300288E10 network, 0.0 memory}, id = 778
02-01Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/csvd2, numFiles=1, columns=[`ZZ2`], 
files=[maprfs:/tmp/csvd2/D222random2.csv]]]) : rowType = RecordType(ANY ZZ2): 
rowcount = 2641626.0, cumulative cost = {2641626.0 rows, 2641626.0 cpu, 0.0 io, 
0.0 network, 0.0 memory}, id = 777
{code}

The limit pushed down is Limit(fetch=[1010]); instead, it should be 
Limit(fetch=[10]).
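
The expected per-fragment limit in the plan would then be (a sketch, per the 
claim above):

{code}
01-02  Limit(fetch=[10])   -- per-fragment limit; no offset added since there is no ORDER BY
{code}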
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6158) Create a mux operator for union exchange to enable two phase merging instead of foreman merging all the batches.

2018-02-14 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6158:
--

 Summary: Create a mux operator for union exchange to enable two 
phase merging instead of foreman merging all the batches.
 Key: DRILL-6158
 URL: https://issues.apache.org/jira/browse/DRILL-6158
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: Future


Consider the following simple query

{code}
select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000
{code}

The following plan is generated for this query
{code}
00-00Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 
1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 
1.56569100288E11 network, 4.64926176E7 memory}, id = 787
00-01  Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY 
zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 
rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, 
id = 786
00-02SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, 
ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 5.53005404E8 
cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 785
00-03  Limit(offset=[1000], fetch=[10]) : rowType = 
RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = 
{9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 
4.64926176E7 memory}, id = 784
00-04UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY 
a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 cpu, 
0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
01-01  SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY 
zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 
4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
782
01-02Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, 
ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 
4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
781
01-03  Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = 
RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative cost 
= {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 
4.64926176E7 memory}, id = 780
01-04HashJoin(condition=[=($0, $2)], joinType=[left]) : 
rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, 
cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 
network, 4.64926176E7 memory}, id = 779
01-06  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], 
files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, 
maprfs:/tmp/csvd1/Daamulti11random21.csv, 
maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY 
ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 
4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
01-05  BroadcastExchange : rowType = RecordType(ANY ZZ2): 
rowcount = 2641626.0, cumulative cost = {5283252.0 rows, 2.3774634E7 cpu, 0.0 
io, 3.2460300288E10 network, 0.0 memory}, id = 778
02-01Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/csvd2, numFiles=1, columns=[`ZZ2`], 
files=[maprfs:/tmp/csvd2/D222random2.csv]]]) : rowType = RecordType(ANY ZZ2): 
rowcount = 2641626.0, cumulative cost = {2641626.0 rows, 2641626.0 cpu, 0.0 io, 
0.0 network, 0.0 memory}, id = 777
{code}

With many minor fragments on a large cluster, all the minor fragments 
feeding into the UnionExchange are merged only at the foreman. Even though 
the UnionExchange is not a CPU bottleneck, it creates huge memory pressure 
at the foreman.

It is observed that, mostly on large clusters with many minor fragments, 
queries run out of memory because of this.

In this scenario it is better to locally merge the minor fragments belonging 
to each Drillbit and send a single stream to the foreman. This spreads the 
memory consumption across all the Drillbits and reduces the memory pressure 
at the foreman.
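
A sketch of the intended two-phase plan shape (the mux operator shown is 
illustrative; Drill's existing multiplexing-exchange infrastructure would 
supply the actual operator):

{code}
00-04  UnionExchange              -- foreman now merges one stream per Drillbit
01-01    UnorderedMuxExchange     -- local merge of the Drillbit's minor fragments
02-01      Limit(fetch=[1010])
02-02        ...                  -- rest of the per-fragment plan as before
{code}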

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6148) TestSortSpillWithException is sometimes failing.

2018-02-12 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri resolved DRILL-6148.

Resolution: Fixed

> TestSortSpillWithException is sometimes failing.
> 
>
> Key: DRILL-6148
> URL: https://issues.apache.org/jira/browse/DRILL-6148
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.12.0
>        Reporter: Hanumath Rao Maduri
>    Assignee: Hanumath Rao Maduri
>Priority: Minor
> Fix For: 1.13.0
>
>
> TestSortSpillWithException#testSpillLeakManaged is sometimes failing. However, 
> for some reason this is being observed only in one of my branches. 
> TestSpillLeakManaged tests for a leak when an exception is thrown while 
> ExternalSort spills rows. In the failing case, ExternalSort is able to sort 
> the data within the given memory and never spills. Hence the injected 
> interruption path is not hit and no exception is thrown.
> The test case should use drill.exec.sort.external.mem_limit to force 
> ExternalSort to use as little memory as possible so that the spill path is 
> exercised.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6148) TestSortSpillWithException is sometimes failing.

2018-02-09 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6148:
--

 Summary: TestSortSpillWithException is sometimes failing.
 Key: DRILL-6148
 URL: https://issues.apache.org/jira/browse/DRILL-6148
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.12.0


TestSortSpillWithException#testSpillLeakManaged is sometimes failing. However, 
for some reason this is being observed only in one of my branches.

TestSpillLeakManaged tests for a leak when an exception is thrown while 
ExternalSort spills rows. In the failing case, ExternalSort is able to sort 
the data within the given memory and never spills. Hence the injected 
interruption path is not hit and no exception is thrown.

The test case should use drill.exec.sort.external.mem_limit to force 
ExternalSort to use as little memory as possible so that the spill path is 
exercised.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-01-29 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6115:
--

 Summary: SingleMergeExchange is not scaling up when many minor 
fragments are allocated for a query.
 Key: DRILL-6115
 URL: https://issues.apache.org/jira/browse/DRILL-6115
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Attachments: Enhancing Drill to multiplex ordered merge exchanges.docx

SingleMergeExchange is created when a global order is required in the output. 
The following query produces the SingleMergeExchange.
{code:java}
0: jdbc:drill:zk=local> explain plan for select L_LINENUMBER from 
dfs.`/drill/tables/lineitem` order by L_LINENUMBER;
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(L_LINENUMBER=[$0])
00-02 SingleMergeExchange(sort0=[0])
01-01 SelectionVectorRemover
01-02 Sort(sort0=[$0], dir0=[ASC])
01-03 HashToRandomExchange(dist0=[[$0]])
02-01 Scan(table=[[dfs, /drill/tables/lineitem]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=maprfs:///drill/tables/lineitem, 
condition=null], columns=[`L_LINENUMBER`], maxwidth=15]])
{code}

On a 10-node cluster, if the table is huge, Drill can spawn many minor 
fragments, which are all merged on a single node by one merge receiver. This 
creates a lot of memory pressure on the receiver node as well as an execution 
bottleneck. To address this issue, the merge receiver should be a multi-phase 
merge receiver.

Ideally, for a large cluster, one could introduce tree merges so that merging 
can be done in parallel. But as a first step I think it is better to use the 
existing infrastructure for multiplexing operators to generate an OrderedMux, 
so that all the minor fragments pertaining to one Drillbit are merged locally 
and the merged data is sent on to the receiver operator.

For example, on a 10-node cluster where each node processes 14 minor 
fragments, the current code merges 140 minor fragments at the receiver. The 
proposed version merges in two levels: a 14-way merge in each Drillbit (done 
in parallel), followed by a 10-way merge of the per-Drillbit streams at the 
receiver node.
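
A sketch of the resulting plan shape (OrderedMuxExchange is the proposed 
operator; the rest mirrors the plan above):

{code}
00-02  SingleMergeExchange(sort0=[0])    -- merges one sorted stream per Drillbit
01-01    OrderedMuxExchange(sort0=[0])   -- per-Drillbit merge of its minor fragments
02-01      SelectionVectorRemover
02-02        Sort(sort0=[$0], dir0=[ASC])
02-03          HashToRandomExchange(dist0=[[$0]])
{code}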





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-5878) TableNotFound exception is being reported for a wrong storage plugin.

2017-10-16 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-5878:
--

 Summary: TableNotFound exception is being reported for a wrong 
storage plugin.
 Key: DRILL-5878
 URL: https://issues.apache.org/jira/browse/DRILL-5878
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 1.11.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
Priority: Minor
 Fix For: 1.12.0


Drill reports a TableNotFound exception when the storage plugin name itself 
is wrong. Consider the following query, where employee.json is queried using 
the cp plugin.
{code}
0: jdbc:drill:zk=local> select * from cp.`employee.json` limit 10;
+--++-++--+-+---++-++--++---+-+-++
| employee_id  | full_name  | first_name  | last_name  | position_id  | 
position_title  | store_id  | department_id  | birth_date  |   
hire_date|  salary  | supervisor_id  |  education_level  | 
marital_status  | gender  |  management_role   |
+--++-++--+-+---++-++--++---+-+-++
| 1| Sheri Nowmer   | Sheri   | Nowmer | 1| 
President   | 0 | 1  | 1961-08-26  | 1994-12-01 
00:00:00.0  | 8.0  | 0  | Graduate Degree   | S   | 
F   | Senior Management  |
| 2| Derrick Whelply| Derrick | Whelply| 2| 
VP Country Manager  | 0 | 1  | 1915-07-03  | 1994-12-01 
00:00:00.0  | 4.0  | 1  | Graduate Degree   | M   | 
M   | Senior Management  |
| 4| Michael Spence | Michael | Spence | 2| 
VP Country Manager  | 0 | 1  | 1969-06-20  | 1998-01-01 
00:00:00.0  | 4.0  | 1  | Graduate Degree   | S   | 
M   | Senior Management  |
| 5| Maya Gutierrez | Maya| Gutierrez  | 2| 
VP Country Manager  | 0 | 1  | 1951-05-10  | 1998-01-01 
00:00:00.0  | 35000.0  | 1  | Bachelors Degree  | M   | 
F   | Senior Management  |
| 6| Roberta Damstra| Roberta | Damstra| 3| 
VP Information Systems  | 0 | 2  | 1942-10-08  | 1994-12-01 
00:00:00.0  | 25000.0  | 1  | Bachelors Degree  | M   | 
F   | Senior Management  |
| 7| Rebecca Kanagaki   | Rebecca | Kanagaki   | 4| 
VP Human Resources  | 0 | 3  | 1949-03-27  | 1994-12-01 
00:00:00.0  | 15000.0  | 1  | Bachelors Degree  | M   | 
F   | Senior Management  |
| 8| Kim Brunner| Kim | Brunner| 11   | 
Store Manager   | 9 | 11 | 1922-08-10  | 1998-01-01 
00:00:00.0  | 1.0  | 5  | Bachelors Degree  | S   | 
F   | Store Management   |
| 9| Brenda Blumberg| Brenda  | Blumberg   | 11   | 
Store Manager   | 21| 11 | 1979-06-23  | 1998-01-01 
00:00:00.0  | 17000.0  | 5  | Graduate Degree   | M   | 
F   | Store Management   |
| 10   | Darren Stanz   | Darren  | Stanz  | 5| 
VP Finance  | 0 | 5  | 1949-08-26  | 1994-12-01 
00:00:00.0  | 5.0  | 1  | Partial College   | M   | 
M   | Senior Management  |
| 11   | Jonathan Murraiin  | Jonathan| Murraiin   | 11   | 
Store Manager   | 1 | 11 | 1967-06-20  | 1998-01-01 
00:00:00.0  | 15000.0  | 5  | Graduate Degree   | S   | 
M   | Store Management   |
+--++-++--+-+---++-++--++---+-+-++
{code}

However, if cp1 (a plugin that does not exist) is used instead of cp, Drill 
reports a TableNotFound exception instead of indicating that the storage 
plugin itself was not found.
{code}
0: jdbc:drill:zk=local> select * from cp1.`employee.json` limit 10;
Oct 16, 2017 1:40:02 PM org.apache.calcite.sql.validate.SqlValidatorException 

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 
'cp1.employee.json' not found
Oct 16, 2017 1:40

[jira] [Created] (DRILL-5851) Empty table during a join operation with a non empty table produces cast exception

2017-10-06 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-5851:
--

 Summary: Empty table during a join operation with a non empty 
table produces cast exception 
 Key: DRILL-5851
 URL: https://issues.apache.org/jira/browse/DRILL-5851
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


A hash join between an empty table and a non-empty table throws an 
exception:
{code} 
Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
between 1. Numeric data
 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right 
type: INT. Add explicit casts to avoid this error
{code}

Here is an example query with which it is reproducible.

{code}
select * from cp.`sample-data/nation.parquet` nation left outer join 
dfs.tmp.`2.csv` as two on two.a = nation.`N_COMMENT`;
{code}

The contents of 2.csv are empty (i.e., not even header information).





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Assign a JIRA

2017-09-26 Thread Hanumath Rao Maduri
Hello All,

I would like to work on DRILL-5773. Can you please assign this JIRA to me?

my user-name : hanu.ncr

Thanks,
-Hanu