[jira] [Updated] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-04-22 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7191:
---
Description: 
Selection of the queue based on the acl/tags
Non-leader queue configurations
All required blobs for the queues in Zookeeper.
Concept of waiting queues and running queues on Foreman
Handling state transition of queryRM
Changes to support storing a UUID for each Drillbit Service Instance locally, to 
be used by the planner and execution layer. This UUID is used to uniquely 
identify a Drillbit and to register Drillbit information in the RM StateBlobs. 
Introduced a PersistentStore named ZookeeperTransactionalPersistenceStore with 
transactional capabilities using Zookeeper's Transactional APIs. This is used 
for updating the RM state blobs, as all the updates need to happen in a 
transactional manner. Added the RMStateBlobs definition and serde support for 
Zookeeper. Implemented DistributedRM and its corresponding QueryRM APIs.
Updated the query state management in Foreman so that the same Foreman object 
can be submitted multiple times. Also introduced two maps that keep track of 
waiting and running queries. These changes were made to support the async admit 
protocol that Distributed RM will need.
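
To make the transactional requirement concrete, here is a minimal sketch using 
Apache Curator's transaction API (Drill uses Curator for Zookeeper access). The 
znode paths and blob payloads are made up for illustration; the actual 
ZookeeperTransactionalPersistenceStore may be structured differently.
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

public class RmBlobTransactionSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new RetryNTimes(3, 1000));
    client.start();

    // Hypothetical RM state blobs; the real blob layout lives in the RM code.
    byte[] queueBlob = "{\"queue\":\"small\",\"leasedMemory\":1024}".getBytes();
    byte[] usageBlob = "{\"drillbit\":\"uuid-1\",\"running\":2}".getBytes();

    // Both znodes (assumed to already exist) are updated in one Zookeeper
    // multi-op: either every update applies or none does, which is the
    // property the RM state blobs need.
    client.inTransaction()
        .setData().forPath("/drill/rm/queue_blob", queueBlob)
        .and()
        .setData().forPath("/drill/rm/usage_blob", usageBlob)
        .and()
        .commit();

    client.close();
  }
}
{code}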

  was:
Selection of the queue based on the acl/tags
Non-leader queue configurations
All required blobs for the queues in Zookeeper.
Concept of waiting queues and running queues on Foreman
Handling state transition of queryRM



> RM blobs persistence in Zookeeper for Distributed RM
> 
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning & Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.17.0
>
>
> Selection of the queue based on the acl/tags
> Non-leader queue configurations
> All required blobs for the queues in Zookeeper.
> Concept of waiting queues and running queues on Foreman
> Handling state transition of queryRM
> Changes to support storing a UUID for each Drillbit Service Instance locally, 
> to be used by the planner and execution layer. This UUID is used to uniquely 
> identify a Drillbit and to register Drillbit information in the RM StateBlobs. 
> Introduced a PersistentStore named ZookeeperTransactionalPersistenceStore with 
> transactional capabilities using Zookeeper's Transactional APIs. This is used 
> for updating the RM state blobs, as all the updates need to happen in a 
> transactional manner. Added the RMStateBlobs definition and serde support for 
> Zookeeper. Implemented DistributedRM and its corresponding QueryRM APIs.
> Updated the query state management in Foreman so that the same Foreman object 
> can be submitted multiple times. Also introduced two maps that keep track of 
> waiting and running queries. These changes were made to support the async 
> admit protocol that Distributed RM will need.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-04-22 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7191:
---
Summary: RM blobs persistence in Zookeeper for Distributed RM  (was: 
Distributed state persistence in Zookeeper for Distributed RM)

> RM blobs persistence in Zookeeper for Distributed RM
> 
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning & Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.17.0
>
>
> Selection of the queue based on the acl/tags
> Non-leader queue configurations
> All required blobs for the queues in Zookeeper.
> Concept of waiting queues and running queues on Foreman
> Handling state transition of queryRM



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7191) Distributed state persistence in Zookeeper for Distributed RM

2019-04-22 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7191:
---
Description: 
Selection of the queue based on the acl/tags
Non-leader queue configurations
All required blobs for the queues in Zookeeper.
Concept of waiting queues and running queues on Foreman
Handling state transition of queryRM


> Distributed state persistence in Zookeeper for Distributed RM
> -
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning & Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.17.0
>
>
> Selection of the queue based on the acl/tags
> Non-leader queue configurations
> All required blobs for the queues in Zookeeper.
> Concept of waiting queues and running queues on Foreman
> Handling state transition of queryRM



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7191) Distributed state persistence in Zookeeper for Distributed RM

2019-04-22 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7191:
---
Summary: Distributed state persistence in Zookeeper for Distributed RM  
(was: Distributed state persistence and Integration of Distributed queue 
configuration with Planner)

> Distributed state persistence in Zookeeper for Distributed RM
> -
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning & Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7193) Integration changes of the Distributed RM queue configuration with Simple Parallelizer.

2019-04-22 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7193:
--

 Summary: Integration changes of the Distributed RM queue 
configuration with Simple Parallelizer.
 Key: DRILL-7193
 URL: https://issues.apache.org/jira/browse/DRILL-7193
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Query Planning & Optimization
Affects Versions: 1.17.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.17.0


Refactoring the fragment generation code for RM to accommodate the non-RM, 
ZK-queue-based RM, and Distributed RM modes.
Calling the Distributed RM for queue selection based on memory requirements 
(see the sketch below).
Adjusting the operator memory based on the memory limits of the selected queue.
Setting the optimal memory allocation per operator in each minor fragment. 
This shows up in the query profile.
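
For illustration only, a minimal sketch of the queue-selection step, assuming 
queues carry a per-node memory limit and the smallest queue that covers the 
query's per-node estimate should win. The class and field names are 
hypothetical, not the actual Drill RM types.
{code:java}
public class QueueSelectionSketch {

  // Hypothetical stand-in for a configured RM queue.
  static final class QueueConfig {
    final String name;
    final long maxQueryMemoryPerNodeBytes;

    QueueConfig(String name, long maxQueryMemoryPerNodeBytes) {
      this.name = name;
      this.maxQueryMemoryPerNodeBytes = maxQueryMemoryPerNodeBytes;
    }
  }

  /**
   * Picks the smallest queue whose per-node limit covers the query's per-node
   * memory estimate, so small queries do not occupy large-queue capacity.
   * Assumes the queues are sorted by ascending memory limit.
   */
  static QueueConfig selectQueue(QueueConfig[] queuesBySize, long estimateBytes) {
    for (QueueConfig q : queuesBySize) {
      if (q.maxQueryMemoryPerNodeBytes >= estimateBytes) {
        return q;
      }
    }
    // Fall back to the largest queue; a real RM might instead reject or wait.
    return queuesBySize[queuesBySize.length - 1];
  }

  public static void main(String[] args) {
    QueueConfig[] queues = {
        new QueueConfig("small", 2L << 30),   // 2 GB per node
        new QueueConfig("large", 16L << 30)   // 16 GB per node
    };
    System.out.println(selectQueue(queues, 3L << 30).name); // prints "large"
  }
}
{code}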



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7191) Distributed state persistence and Integration of Distributed queue configuration with Planner

2019-04-21 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7191:
--

 Summary: Distributed state persistence and Integration of 
Distributed queue configuration with Planner
 Key: DRILL-7191
 URL: https://issues.apache.org/jira/browse/DRILL-7191
 Project: Apache Drill
  Issue Type: Sub-task
  Components:  Server, Query Planning & Optimization
Affects Versions: 1.17.0
Reporter: Hanumath Rao Maduri
 Fix For: 1.17.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7164:
---
Description: 
On my private build I am hitting a Kafka storage test failure intermittently. 
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

An earlier check-in (d22e68b83d1d0cc0539d79ae0cb3aa70ae3242ad) changed the way 
cost is represented. It also changed the test, which I think is not right. The 
pattern compared against the plan should be made smarter so that this issue is 
fixed generically; a sketch of one approach follows.
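
As an illustration of what a smarter comparison could look like, here is a 
sketch that matches the expected attributes as an ordered sequence of literal 
fragments and tolerates whatever appears between them (such as a changed cost 
representation). This is one possible approach, not the actual test helper in 
Drill.
{code:java}
import java.util.regex.Pattern;

public class PlanMatcherSketch {

  /**
   * Builds a DOTALL regex that requires the literal fragments to appear in
   * order, allowing any text (e.g. new cost attributes) between them.
   */
  static Pattern inOrder(String... fragments) {
    StringBuilder sb = new StringBuilder("(?s).*");
    for (String fragment : fragments) {
      sb.append(Pattern.quote(fragment)).append(".*");
    }
    return Pattern.compile(sb.toString());
  }

  public static void main(String[] args) {
    Pattern expected = inOrder(
        "\"kafkaScanSpec\"",
        "\"topicName\" : \"drill-pushdown-topic\"",
        "\"cost\"");
    String plan = "{ \"kafkaScanSpec\" : { \"topicName\" : "
        + "\"drill-pushdown-topic\" }, \"cost\" : { \"memoryCost\" : 1.0 } }";
    System.out.println(expected.matcher(plan).matches()); // prints true
  }
}
{code}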

  was:
On my private build I am hitting a Kafka storage test failure intermittently. 
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" 

[jira] [Updated] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7164:
---
Description: 
On my private build I am hitting a Kafka storage test failure intermittently. 
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

An earlier check-in (d22e68b83d1d0cc0539d79ae0cb3aa70ae3242ad) changed the way 
cost is represented. It also changed the test, which I think is not right. The 
pattern compared against the plan should be made smarter so that this issue is 
fixed generically.

  was:
On my private build I am hitting a Kafka storage test failure intermittently. 
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in 

[jira] [Updated] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7164:
---
Description: 
On my private build I am hitting a Kafka storage test failure intermittently. 
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

An earlier check-in (d22e68b83d1d0cc0539d79ae0cb3aa70ae3242ad) changed the way 
cost is represented. It also changed the test, which I think is not right. The 
pattern compared against the plan should be made smarter so that this issue is 
fixed generically.

  was:
On my private build I am hitting a Kafka storage test failure intermittently. 
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in 

[jira] [Created] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7164:
--

 Summary: KafkaFilterPushdownTest is sometimes failing to pattern 
match correctly.
 Key: DRILL-7164
 URL: https://issues.apache.org/jira/browse/DRILL-7164
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Kafka
Affects Versions: 1.16.0
Reporter: Hanumath Rao Maduri
Assignee: Abhishek Ravi
 Fix For: 1.17.0


On my private build I am hitting a Kafka storage test failure intermittently. 
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

An earlier check-in changed the way cost is represented. It also changed the 
test, which I think is not right. The pattern compared against the plan should 
be made smarter so that this issue is fixed generically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7118) Filter not getting pushed down on MapR-DB tables.

2019-03-19 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7118:
--

 Summary: Filter not getting pushed down on MapR-DB tables.
 Key: DRILL-7118
 URL: https://issues.apache.org/jira/browse/DRILL-7118
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.16.0


A simple IS NULL filter is not being pushed down for MapR-DB tables. Here is a 
repro:
{code:java}
0: jdbc:drill:zk=local> explain plan for select * from dfs.`/tmp/js` where b is 
null;
ANTLR Tool version 4.5 used for code generation does not match the current 
runtime version 4.7.1ANTLR Runtime version 4.5 used for parser compilation does 
not match the current runtime version 4.7.1ANTLR Tool version 4.5 used for code 
generation does not match the current runtime version 4.7.1ANTLR Runtime 
version 4.5 used for parser compilation does not match the current runtime 
version 4.7.1
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(**=[$0])
00-02 Project(T0¦¦**=[$0])
00-03 SelectionVectorRemover
00-04 Filter(condition=[IS NULL($1)])
00-05 Project(T0¦¦**=[$0], b=[$1])
00-06 Scan(table=[[dfs, /tmp/js]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=/tmp/js, condition=null], columns=[`**`, 
`b`], maxwidth=1]])
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7113) Issue with filtering null values from MapRDB-JSON

2019-03-18 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7113:
--

 Summary: Issue with filtering null values from MapRDB-JSON
 Key: DRILL-7113
 URL: https://issues.apache.org/jira/browse/DRILL-7113
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Aman Sinha
 Fix For: 1.16.0, 1.17.0


When Drill queries documents from a MapRDB-JSON table that contain fields with 
a null value, it returns wrong results.
 The issue is reproducible locally.

Please find the repro steps:
 [1] Create a MapRDB-JSON table, say '/tmp/dmdb2/'.

[2] Insert the following sample records to table:
{code:java}
insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person", 
"confidence": 0.24}'
insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}'
insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3", 
"confidence": 0.54}'
insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4", 
"confidence": null}'
{code}
We can see that for the field 'confidence', document 1 has value 0.24, document 
3 has value 0.54, document 2 does not have the field, and document 4 has the 
field with value null.

[3] Query the table from DRILL.
 *Query 1:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2;
+--+-+
|  label   | confidence  |
+--+-+
| person   | 0.24|
| person2  | null|
| person3  | 0.54|
| person4  | null|
+--+-+
4 rows selected (0.2 seconds)

{code}
*Query 2:*
{code:java}
0: jdbc:drill:> select * from dfs.tmp.dmdb2;
+--+-+--+
| _id  | confidence  |  label   |
+--+-+--+
| 1| 0.24| person   |
| 2| null| person2  |
| 3| 0.54| person3  |
| 4| null| person4  |
+--+-+--+
4 rows selected (0.174 seconds)

{code}
*Query 3:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is 
not null;
+--+-+
|  label   | confidence  |
+--+-+
| person   | 0.24|
| person3  | 0.54|
| person4  | null|
+--+-+
3 rows selected (0.192 seconds)

{code}
*Query 4:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is  
null;
+--+-+
|  label   | confidence  |
+--+-+
| person2  | null|
+--+-+
1 row selected (0.262 seconds)

{code}
As you can see, Query 3, which selects all documents whose confidence value is 
not null, returns a document with a null value.

*Other observation:*
 Querying the same data using DRILL without MapRDB provides the correct result.
 For example, create 4 different JSON files with the following data:

{"label": "person", "confidence": 0.24}
{"label": "person2"}
{"label": "person3", "confidence": 0.54}
{"label": "person4", "confidence": null}

Query it directly using DRILL:

*Query 5:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2;
+--+-+
|  label   | confidence  |
+--+-+
| person4  | null|
| person3  | 0.54|
| person2  | null|
| person   | 0.24|
+--+-+
4 rows selected (0.203 seconds)

{code}
*Query 6:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is 
null;
+--+-+
|  label   | confidence  |
+--+-+
| person4  | null|
| person2  | null|
+--+-+
2 rows selected (0.352 seconds)

{code}
*Query 7:*
{code:java}
0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is not 
null;
+--+-+
|  label   | confidence  |
+--+-+
| person3  | 0.54|
| person   | 0.24|
+--+-+
2 rows selected (0.265 seconds)

{code}
As seen in Queries 6 and 7, the correct results are returned.

I believe the issue is in the MapRDB layer, where the results are fetched.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6642) Update protocol-buffers version

2019-03-04 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6642:
---
Labels:   (was: ready-to-commit)

> Update protocol-buffers version
> ---
>
> Key: DRILL-6642
> URL: https://issues.apache.org/jira/browse/DRILL-6642
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> Currently Drill uses {{protocol-buffers}} version 2.5.0.
>  The latest version in the Maven repo is 3.6.0: 
> [https://mvnrepository.com/artifact/com.google.protobuf/protobuf-java]
> The new version has a lot of useful enhancements which can be used in Drill.
>  One of them is the {{UNRECOGNIZED}} enum value and {{NullValue}}, which can 
> help handle unknown values in place of nulls for {{ProtocolMessageEnum}} - 
> DRILL-6639. 
>  It looks like {{NullValue}} can be used instead of the null returned from 
> {{valueOf()}} (_or {{forNumber()}}, since {{valueOf()}} is deprecated in the 
> newer protobuf version_):
>  
> [https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/NullValue]
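
For illustration, a self-contained sketch of the lookup pattern the newer 
protobuf generates. RpcMode below is a hand-written stand-in mimicking the 
shape of a generated proto3 enum; it is not an actual Drill or protobuf class.
{code:java}
public class ProtoEnumSketch {

  // Stand-in mimicking what protobuf-java 3.x generates for a proto3 enum:
  // a forNumber() lookup plus an UNRECOGNIZED sentinel constant.
  enum RpcMode {
    HANDSHAKE(0), REQUEST(1), UNRECOGNIZED(-1);

    private final int number;

    RpcMode(int number) { this.number = number; }

    static RpcMode forNumber(int value) {
      for (RpcMode m : values()) {
        if (m != UNRECOGNIZED && m.number == value) {
          return m;
        }
      }
      return null; // generated code also returns null for unknown numbers
    }
  }

  public static void main(String[] args) {
    RpcMode mode = RpcMode.forNumber(42);
    if (mode == null) {
      mode = RpcMode.UNRECOGNIZED; // explicit sentinel instead of a raw null
    }
    System.out.println(mode); // prints UNRECOGNIZED
  }
}
{code}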



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6927) Query fails when hive table with timestamp data is queried with enabled int96_as_timestamp and optimize_scan_with_native_reader options

2019-03-01 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6927:
---
Labels:   (was: ready-to-commit)

> Query fails when hive table with timestamp data is queried with enabled 
> int96_as_timestamp and optimize_scan_with_native_reader options
> ---
>
> Key: DRILL-6927
> URL: https://issues.apache.org/jira/browse/DRILL-6927
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Minor
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 1. Create hive table with timestamp column:
> {code:sql}
> create table test_timestamp stored as PARQUET as select timestamp '2018-01-01 
> 12:12:12.123' as c1;
> {code}
> 2. Enable {{store.parquet.reader.int96_as_timestamp}} and 
> {{store.hive.parquet.optimize_scan_with_native_reader}}:
> {code:sql}
> set `store.parquet.reader.int96_as_timestamp`=true;
> set `store.hive.parquet.optimize_scan_with_native_reader`=true;
> {code}
> 3. Query hive table using Drill:
> {code:sql}
> select * from hive.test_timestamp;
> {code}
> Query fails with error:
> {noformat}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [convert_fromtimestamp_impala(TIMESTAMP-OPTIONAL)].  Full expression: 
> --UNKNOWN EXPRESSION--..
> Fragment 0:0
> {noformat}
> Stack trace:
> {noformat}
> Error in expression at index -1.  Error: Missing function implementation: 
> [convert_fromtimestamp_impala(TIMESTAMP-OPTIONAL)].  Full expression: 
> --UNKNOWN EXPRESSION--..
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchemaFromInput(ProjectRecordBatch.java:498)
>  ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:583)
>  ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:101)
>  ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143)
>  ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83)
>  ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297)
>  ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284)
>  ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[na:1.8.0_141]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_141]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>  ~[hadoop-common-2.7.0-mapr-1808.jar:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284)
>  [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
>   ... 4 common frames omitted
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4858) REPEATED_COUNT on an array of maps and an array of arrays is not implemented

2019-03-01 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-4858:
---
Labels:   (was: ready-to-commit)

> REPEATED_COUNT on an array of maps and an array of arrays is not implemented
> 
>
> Key: DRILL-4858
> URL: https://issues.apache.org/jira/browse/DRILL-4858
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: jean-claude
>Assignee: Bohdan Kazydub
>Priority: Minor
> Fix For: 1.16.0
>
>
> REPEATED_COUNT of JSON containing an array of maps does not work.
> JSON file
> {code}
> drill$ cat /Users/jccote/repeated_count.json 
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {code}
> select
> {code}
> 0: jdbc:drill:zk=local> select repeated_count(mapArray) from 
> dfs.`/Users/jccote/repeated_count.json`;
> {code}
> error
> {code}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [repeated_count(MAP-REPEATED)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 1057bb8e-1cc4-4a9a-a748-3a6a14092858 on 192.168.1.3:31010] 
> (state=,code=0)
> {code}
> The same issue is present for an array of arrays
> for JSON file
> {code}
> {"id": 1, "array": [[1, 2], [1, 3], [2, 3]]}
> {"id": 2, "array": []}
> {"id": 3, "array": [[2, 3], [1, 3, 4]]}
> {"id": 4, "array": [[1], [2], [3, 4], [5], [6]]}
> {"id": 5, "array": [[1, 2, 3], [4, 5], [6], [7], [8, 9], [2, 3], [2, 3], [2, 
> 3], [2]]}
> {"id": 6, "array": [[1, 2], [3], [4], [5]]}
> {"id": 7, "array": []}
> {"id": 8, "array": [[1], [2], [3]]}
> {code}
> the following error is shown
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select REPEATED_COUNT(array) from 
> `arrayOfArrays.json`;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [repeated_count(LIST-REPEATED)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 12b81b85-c84b-4773-8427-48b80098cafe on qa102-45.qa.lab:31010] 
> (state=,code=0)
> {code}
> Looking at org.apache.drill.exec.expr.fn.impl.SimpleRepeatedFunctions, it 
> looks like it's not enabled yet:
> {code}
>   // TODO - need to confirm that these work   SMP: They do not
>   @FunctionTemplate(name = "repeated_count", scope = 
> FunctionTemplate.FunctionScope.SIMPLE)
>   public static class RepeatedLengthMap implements DrillSimpleFunc {
> ...
>   // TODO - need to confirm that these work   SMP: They do not
>   @FunctionTemplate(name = "repeated_count", scope = 
> FunctionTemplate.FunctionScope.SIMPLE)
>   public static class RepeatedLengthList implements DrillSimpleFunc {
> {code}
> Also make the {{REPEATED_COUNT}} function support the other REPEATED types, 
> so that Drill's {{REPEATED_COUNT}} function supports the following REPEATED 
> types: 
> RepeatedBit, RepeatedInt, RepeatedBigInt, RepeatedFloat4, RepeatedFloat8, 
> RepeatedDate, RepeatedTimeStamp, RepeatedTime, RepeatedIntervalDay, 
> RepeatedIntervalYear, RepeatedInterval, RepeatedVarChar, RepeatedVarBinary, 
> RepeatedVarDecimal, RepeatedDecimal9, RepeatedDecimal18, 
> RepeatedDecimal28Sparse, RepeatedDecimal38Sparse, RepeatedList, RepeatedMap
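
A minimal sketch of what the map variant could look like once enabled, 
following the DrillSimpleFunc pattern quoted above. The use of the holder's 
start/end fields is an assumption for illustration; the real implementation in 
SimpleRepeatedFunctions may differ.
{code:java}
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.IntHolder;
import org.apache.drill.exec.expr.holders.RepeatedMapHolder;

@FunctionTemplate(name = "repeated_count",
    scope = FunctionTemplate.FunctionScope.SIMPLE)
public class RepeatedCountMapSketch implements DrillSimpleFunc {

  @Param  RepeatedMapHolder input; // start/end delimit this row's array
  @Output IntHolder out;

  public void setup() { }

  public void eval() {
    // Element count of the repeated map for the current row.
    out.value = input.end - input.start;
  }
}
{code}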



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7047) Drill C++ Client crash due to Dangling stack ptr to sasl_callback_t

2019-03-01 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7047:
---
Labels:   (was: ready-to-commit)

> Drill C++ Client crash due to Dangling stack ptr to sasl_callback_t 
> 
>
> Key: DRILL-7047
> URL: https://issues.apache.org/jira/browse/DRILL-7047
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - C++
>Affects Versions: 1.14.0
>Reporter: Rob Wu
>Assignee: Debraj Ray
>Priority: Major
> Fix For: 1.16.0
>
>
> The sasl_client_new call does not copy its callback argument array, resulting 
> in a pointer to transient stack memory. 
>  
> [~debraj92] will be supplying a patch to resolve this issue. The patch moves 
> the callbacks array into the member variable m_callbacks, which has the same 
> lifetime as the SASL implementation instance and thus remains valid for the 
> instance's whole life.
>  
> Trace:
> {code:java}
> #0 0x0080 in ?? ()
> #1 0xb38c04bc in _sasl_canon_user ()
> from libdrillClient.so
> #2 0xb38c0611 in _sasl_canon_user_lookup ()
> from libdrillClient.so
> #3 0xb2c0824e in gssapi_client_mech_step () from /usr/lib/sasl2/libgssapiv2.so
> #4 0xb38ad244 in sasl_client_step ()
> from libdrillClient.so
> #5 0xb37fddde in Drill::SaslAuthenticatorImpl::step(exec::shared::SaslMessage 
> const&, exec::shared::SaslMessage&) const ()
> from libdrillClient.so
> #6 0xb37bdf16 in 
> Drill::DrillClientImpl::processSaslChallenge(Drill::AllocatedBuffer*, 
> Drill::rpc::InBoundRpcMessage const&) ()
> from libdrillClient.so
> #7 0xb37bfa17 in Drill::DrillClientImpl::handleRead(unsigned char*, 
> boost_sb::system::error_code const&, unsigned int) ()
> from libdrillClient.so
> #8 0xb37c0955 in 
> boost_sb::detail::function::void_function_obj_invoker2  boost_sb::_mfi::mf3 boost_sb::system::error_code const&, unsigned int>, 
> boost_sb::_bi::list4, 
> boost_sb::_bi::value, boost_sb::arg<1> (*)(), 
> boost_sb::arg<2> (*)()> >, void, boost_sb::system::error_code const&, 
> unsigned int>::invoke(boost_sb::detail::function::function_buffer&, 
> boost_sb::system::error_code const&, unsigned int) ()
> from libdrillClient.so
> #9 0xb378f17d in boost_sb::function2 const&, unsigned int>::operator()(boost_sb::system::error_code const&, 
> unsigned int) const
> () from libdrillClient.so
> #10 0xb3799bc8 in boost_sb::asio::detail::read_op boost_sb::asio::mutable_buffers_1, boost_sb::asio::mutable_buffer const*, 
> boost_sb::asio::detail::transfer_all_t, boost_sb::function (boost_sb::system::error_code const&, unsigned int)> 
> >::operator()(boost_sb::system::error_code const&, unsigned int, int) ()
> from libdrillClient.so
> #11 0xb379a1c3 in 
> boost_sb::asio::detail::reactive_socket_recv_op  boost_sb::asio::detail::read_op boost_sb::asio::mutable_buffers_1, boost_sb::asio::mutable_buffer const*, 
> boost_sb::asio::detail::transfer_all_t, boost_sb::function (boost_sb::system::error_code const&, unsigned int)> > >::do_complete(void*, 
> boost_sb::asio::detail::scheduler_operation*, boost_sb::system::error_code 
> const&, unsigned int) ()
> from libdrillClient.so
> #12 0xb3788fb8 in 
> boost_sb::asio::detail::epoll_reactor::descriptor_state::do_complete(void*, 
> boost_sb::asio::detail::scheduler_operation*, boost_sb::system::error_code 
> const&, unsigned int) ()
> from libdrillClient.so
> #13 0xb3791948 in boost_sb::asio::io_context::run() ()
> from libdrillClient.so
> #14 0xb37c0e67 in 
> boost_sb::detail::thread_data boost_sb::_mfi::mf0, 
> boost_sb::_bi::list1 > > 
> >::run() ()
> from libdrillClient.so
> #15 0xb3825f5a in thread_proxy ()
> from libdrillClient.so
> #16 0xb6730b3c in start_thread () from /lib/libpthread.so.0
> #17 0xb64db44e in clone () from /lib/libc.so.6
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7041) CompileException happens if a nested coalesce function returns null

2019-03-01 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7041:
---
Labels:   (was: ready-to-commit)

> CompileException happens if a nested coalesce function returns null
> ---
>
> Key: DRILL-7041
> URL: https://issues.apache.org/jira/browse/DRILL-7041
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Anton Gozhiy
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.16.0
>
>
> *Query:*
> {code:sql}
> select coalesce(coalesce(n_name1, n_name2), n_name) from 
> cp.`tpch/nation.parquet`
> {code}
> *Expected result:*
> Values from "n_name" column should be returned
> *Actual result:*
> An exception happens:
> {code}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> CompileException: Line 57, Column 27: Assignment conversion not possible from 
> type "org.apache.drill.exec.expr.holders.NullableVarCharHolder" to type 
> "org.apache.drill.exec.vector.UntypedNullHolder" Fragment 0:0 Please, refer 
> to logs for more information. [Error Id: e54d5bfd-604d-4a39-b62f-33bb964e5286 
> on userf87d-pc:31010] (org.apache.drill.exec.exception.SchemaChangeException) 
> Failure while attempting to load generated class 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchemaFromInput():573
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():583
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():101 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():143
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186 
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():297 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():284 
> java.security.AccessController.doPrivileged():-2 
> javax.security.auth.Subject.doAs():422 
> org.apache.hadoop.security.UserGroupInformation.doAs():1746 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():284 
> org.apache.drill.common.SelfCleaningRunnable.run():38 
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149 
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624 
> java.lang.Thread.run():748 Caused By 
> (org.apache.drill.exec.exception.ClassTransformationException) 
> java.util.concurrent.ExecutionException: 
> org.apache.drill.exec.exception.ClassTransformationException: Failure 
> generating transformation classes for value: package 
> org.apache.drill.exec.test.generated; import 
> org.apache.drill.exec.exception.SchemaChangeException; import 
> org.apache.drill.exec.expr.holders.BigIntHolder; import 
> org.apache.drill.exec.expr.holders.BitHolder; import 
> org.apache.drill.exec.expr.holders.NullableVarBinaryHolder; import 
> org.apache.drill.exec.expr.holders.NullableVarCharHolder; import 
> org.apache.drill.exec.expr.holders.VarCharHolder; import 
> org.apache.drill.exec.ops.FragmentContext; import 
> org.apache.drill.exec.record.RecordBatch; import 
> org.apache.drill.exec.vector.UntypedNullHolder; import 
> org.apache.drill.exec.vector.UntypedNullVector; import 
> org.apache.drill.exec.vector.VarCharVector; public class ProjectorGen35 { 
> BigIntHolder const6; BitHolder constant9; UntypedNullHolder constant13; 
> VarCharVector vv14; UntypedNullVector vv19; public void doEval(int inIndex, 
> int outIndex) throws SchemaChangeException { { UntypedNullHolder out0 = new 
> UntypedNullHolder(); if (constant9 .value == 1) { if (constant13 .isSet!= 0) 
> { out0 = constant13; } } else { VarCharHolder out17 = new VarCharHolder(); { 
> out17 .buffer = vv14 .getBuffer(); long startEnd = vv14 
> .getAccessor().getStartEnd((inIndex)); out17 .start = ((int) startEnd); out17 
> .end = ((int)(startEnd >> 32)); } // start of eval portion of 
> convertToNullableVARCHAR function. // NullableVarCharHolder out18 = new 
> NullableVarCharHolder(); { final NullableVarCharHolder output = new 
> NullableVarCharHolder(); VarCharHolder input = out17; 
> GConvertToNullableVarCharHolder_eval: { output.isSet = 1; output.start = 
> input.start; output.end = input.end; output.buffer = input.buffer; } out18 = 
> output; } // end of eval portion of convertToNullableVARCHAR function. 
> // if (out18 .isSet!= 0) { out0 = out18; } } if (!(out0 .isSet == 0)) { 
> vv19 .getMutator().set((outIndex), out0 .isSet, out0); } } } public void 
> doSetup(FragmentContext context, RecordBatch incoming, RecordBatch outgoing) 
> throws SchemaChangeException { { UntypedNullHolder out1 = new 
> 

[jira] [Created] (DRILL-7068) Support of memory adjustment framework for resource management with Queues

2019-02-28 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7068:
--

 Summary: Support of memory adjustment framework for resource 
management with Queues
 Key: DRILL-7068
 URL: https://issues.apache.org/jira/browse/DRILL-7068
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


Add support for a memory adjustment framework based on the queue configuration 
for a query. 
It also addresses refactoring of the existing queue-based resource management 
in Drill.
For more details on the design, please refer to the parent JIRA. A rough sketch 
of the scaling idea follows.
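
A minimal sketch of the proportional-scaling idea, assuming operator estimates 
are simply scaled down to fit the queue's per-node limit. The names here are 
hypothetical, not the actual framework API.
{code:java}
public class MemoryAdjustmentSketch {

  /**
   * If the plan's total per-node memory estimate exceeds the queue's per-node
   * limit, scale every buffered operator's allocation down proportionally so
   * the plan fits the queue it was admitted to.
   */
  static long[] adjustToQueueLimit(long[] estimatesBytes, long queueLimitBytes) {
    long total = 0;
    for (long estimate : estimatesBytes) {
      total += estimate;
    }
    if (total <= queueLimitBytes) {
      return estimatesBytes.clone();  // already fits; nothing to adjust
    }
    double factor = (double) queueLimitBytes / total;
    long[] adjusted = new long[estimatesBytes.length];
    for (int i = 0; i < estimatesBytes.length; i++) {
      adjusted[i] = (long) (estimatesBytes[i] * factor);
    }
    return adjusted;
  }

  public static void main(String[] args) {
    // Two operators wanting 6 GB and 2 GB against a 4 GB queue limit.
    long[] adjusted = adjustToQueueLimit(
        new long[] { 6L << 30, 2L << 30 }, 4L << 30);
    for (long allocation : adjusted) {
      System.out.println(allocation); // prints 3 GB then 1 GB (in bytes)
    }
  }
}
{code}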



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7068) Support memory adjustment framework for resource management with Queues

2019-02-28 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7068:
---
Summary: Support memory adjustment framework for resource management with 
Queues  (was: Support of memory adjustment framework for resource management 
with Queues)

> Support memory adjustment framework for resource management with Queues
> ---
>
> Key: DRILL-7068
> URL: https://issues.apache.org/jira/browse/DRILL-7068
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>
> Add support for a memory adjustment framework based on the queue 
> configuration for a query. 
> It also addresses refactoring of the existing queue-based resource 
> management in Drill.
> For more details on the design, please refer to the parent JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-30 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels: ready-to-commit  (was: )

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-30 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels:   (was: ready-to-commit)

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-29 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels: ready-to-commit  (was: )

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-23 Thread Hanumath Rao Maduri (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750380#comment-16750380
 ] 

Hanumath Rao Maduri commented on DRILL-6997:


The following query shows how the plan differs when `planner.enable_semijoin` is 
true versus false.
{code:java}

0: jdbc:drill:zk=local> alter session set `planner.enable_semijoin` = true;
+-------+-----------------------------------+
|  ok   |              summary              |
+-------+-----------------------------------+
| true  | planner.enable_semijoin updated.  |
+-------+-----------------------------------+
1 row selected (0.073 seconds)
0: jdbc:drill:zk=local> select * from 
dfs.`/home/mapr/data/sf1/parquet/web_sales` ws1, 
dfs.`/home/mapr/data/sf1/parquet/web_sales` ws2  where ws1.ws_ship_date_sk in 
(select dd.d_date_sk from dfs.`/home/mapr/data/sf1/parquet/date_dim` dd) and 
ws1.ws_order_number = ws2.ws_order_number;
+------+------+
| text | json |
+------+------+
| DrillScreenRel
  DrillProjectRel(**=[$0], **0=[$3])
DrillSemiJoinRel(condition=[=($1, $6)], joinType=[inner])
  DrillJoinRel(condition=[=($2, $5)], joinType=[inner])
DrillScanRel(table=[[dfs, /home/mapr/data/sf1/parquet/web_sales]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/home/mapr/data/sf1/parquet/web_sales]], 
selectionRoot=file:/home/mapr/data/sf1/parquet/web_sales, numFiles=1, 
numRowGroups=1, usedMetadataFile=false, columns=[`**`, `ws_ship_date_sk`, 
`ws_order_number`]]])
DrillScanRel(table=[[dfs, /home/mapr/data/sf1/parquet/web_sales]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/home/mapr/data/sf1/parquet/web_sales]], 
selectionRoot=file:/home/mapr/data/sf1/parquet/web_sales, numFiles=1, 
numRowGroups=1, usedMetadataFile=false, columns=[`**`, `ws_ship_date_sk`, 
`ws_order_number`]]])
  DrillScanRel(table=[[dfs, /home/mapr/data/sf1/parquet/date_dim]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/home/mapr/data/sf1/parquet/date_dim]], 
selectionRoot=file:/home/mapr/data/sf1/parquet/date_dim, numFiles=1, 
numRowGroups=1, usedMetadataFile=false, columns=[`d_date_sk`]]])

0: jdbc:drill:zk=local> alter session set `planner.enable_semijoin` = false;
+-------+-----------------------------------+
|  ok   |              summary              |
+-------+-----------------------------------+
| true  | planner.enable_semijoin updated.  |
+-------+-----------------------------------+
1 row selected (0.077 seconds)
0: jdbc:drill:zk=local> select * from 
dfs.`/home/mapr/data/sf1/parquet/web_sales` ws1, 
dfs.`/home/mapr/data/sf1/parquet/web_sales` ws2  where ws1.ws_ship_date_sk in 
(select dd.d_date_sk from dfs.`/home/mapr/data/sf1/parquet/date_dim` dd) and 
ws1.ws_order_number = ws2.ws_order_number;
+------+------+
| text | json |
+------+------+
| DrillScreenRel
  DrillProjectRel(**=[$0], **0=[$3])
DrillProjectRel(**=[$0], ws_ship_date_sk=[$1], ws_order_number=[$2], 
**0=[$4], ws_ship_date_sk0=[$5], ws_order_number0=[$6], d_date_sk=[$3])
  DrillJoinRel(condition=[=($2, $6)], joinType=[inner])
DrillJoinRel(condition=[=($1, $3)], joinType=[inner])
  DrillScanRel(table=[[dfs, /home/mapr/data/sf1/parquet/web_sales]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/home/mapr/data/sf1/parquet/web_sales]], 
selectionRoot=file:/home/mapr/data/sf1/parquet/web_sales, numFiles=1, 
numRowGroups=1, usedMetadataFile=false, columns=[`**`, `ws_ship_date_sk`, 
`ws_order_number`]]])
  DrillAggregateRel(group=[{0}])
DrillScanRel(table=[[dfs, /home/mapr/data/sf1/parquet/date_dim]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/home/mapr/data/sf1/parquet/date_dim]], 
selectionRoot=file:/home/mapr/data/sf1/parquet/date_dim, numFiles=1, 
numRowGroups=1, usedMetadataFile=false, columns=[`d_date_sk`]]])
DrillScanRel(table=[[dfs, /home/mapr/data/sf1/parquet/web_sales]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 

[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-23 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Attachment: 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-23 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Attachment: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-23 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6997:
--

 Summary: Semijoin is changing the join ordering for some tpcds 
queries.
 Key: DRILL-6997
 URL: https://issues.apache.org/jira/browse/DRILL-6997
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.16.0


TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
DRILL-6798: Planner changes to support semi-join.
{code:java}
with ws_wh as
 (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
 from web_sales ws1,web_sales ws2
 where ws1.ws_order_number = ws2.ws_order_number
 and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
 [_LIMITA] select [_LIMITB]
 count(distinct ws_order_number) as "order count"
 ,sum(ws_ext_ship_cost) as "total shipping cost"
 ,sum(ws_net_profit) as "total net profit"
 from
 web_sales ws1
 ,date_dim
 ,customer_address
 ,web_site
 where
 d_date between '[YEAR]-[MONTH]-01' and
 (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
 and ws1.ws_ship_date_sk = d_date_sk
 and ws1.ws_ship_addr_sk = ca_address_sk
 and ca_state = '[STATE]'
 and ws1.ws_web_site_sk = web_site_sk
 and web_company_name = 'pri'
 and ws1.ws_order_number in (select ws_order_number
 from ws_wh)
 and ws1.ws_order_number in (select wr_order_number
 from web_returns,ws_wh
 where wr_order_number = ws_wh.ws_order_number)
 order by count(distinct ws_order_number)
 [_LIMITC];
{code}
 I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join disabled. 
Both are executed with commit id 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and 
scale factor 100.

The plan with semi-join enabled has moved the first hash join:

and ws1.ws_order_number in (select ws_order_number
 from ws_wh)
 It used to be on the build side of the first HJ on the left hand side (04-05). 
It is now on the build side of the fourth HJ on the left hand side (01-13).

The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
that takes 10 seconds to execute. But all the fragments take about the same 
amount of time.

The plan with semi-join enabled has two HJ that process 1B rows while the plan 
with semi-joins disabled has one HJ that processes 1B rows.

The plan with semi-join enabled has several senders and receivers that wait 
more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 17-00). 
When disabled, there is no operator waiting more than 10 seconds.
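For anyone reproducing the comparison, the planner option shown in the comment 
above can be toggled per session; a minimal sketch:

{code:sql}
-- Run query 95 once with each setting and compare the resulting plans
-- and profiles; the option name is taken from this thread.
alter session set `planner.enable_semijoin` = true;   -- semi-join enabled run
alter session set `planner.enable_semijoin` = false;  -- semi-join disabled run
{code}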



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6925) Unable to generate Protobuf

2019-01-23 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6925:
---
Labels:   (was: ready-to-commit)

> Unable to generate Protobuf
> ---
>
> Key: DRILL-6925
> URL: https://issues.apache.org/jira/browse/DRILL-6925
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.15.0
>
>
> When generating protocol buffers from protocol dir using {{mvn clean 
> process-sources -P proto-compile}} the following error occurs:
> {noformat}
> [ERROR] Failed to execute goal com.mycila:license-maven-plugin:3.0:format 
> (proto-format) on project drill-protocol: Execution proto-format of goal 
> com.mycila:license-maven-plugin:3.0:format failed: Cannot read header 
> document header. Cause: Resource header not found in file system, classpath 
> or URL: no protocol: header -> [Help 1]
> [ERROR] 
> {noformat}
> Regression after DRILL-6895.
> Also need to check C++ protobuf generation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6924) Validation error - Column ambiguous in join queries

2019-01-07 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reassigned DRILL-6924:
--

Assignee: Hanumath Rao Maduri

> Validation error - Column ambiguous in join queries
> ---
>
> Key: DRILL-6924
> URL: https://issues.apache.org/jira/browse/DRILL-6924
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Kedar Sankar Behera
>Assignee: Hanumath Rao Maduri
>Priority: Major
>
> This happens not only with cross join but with other join types as well; 
> the examples are given below.
> q1 -
> {code}
> select * from customer c cross join (select * from orders, lineitem where 
> orders.o_custkey = lineitem.l_partkey ) as o1 order by o1.o_orderkey limit 2;
> {code}
> Result - 
> {code}
> Error: VALIDATION ERROR: From line 1, column 170 to line 1, column 171: 
> Column 'o_orderkey' is ambiguous
> [Error Id: eff986cd-50e1-47e6-a848-b0976bf44bff on drill182:31010] 
> (state=,code=0)
> java.sql.SQLException: VALIDATION ERROR: From line 1, column 170 to line 1, 
> column 171: Column 'o_orderkey' is ambiguous
> [Error Id: eff986cd-50e1-47e6-a848-b0976bf44bff on drill182:31010]
>  at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:536)
>  at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:608)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1288)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
>  at 
> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667)
>  at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109)
>  at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1120)
>  at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>  at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:196)
>  at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>  at 
> org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>  at sqlline.Commands.execute(Commands.java:814)
>  at sqlline.Commands.sql(Commands.java:754)
>  at sqlline.SqlLine.dispatch(SqlLine.java:646)
>  at sqlline.SqlLine.begin(SqlLine.java:510)
>  at sqlline.SqlLine.start(SqlLine.java:233)
>  at sqlline.SqlLine.main(SqlLine.java:175)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: VALIDATION 
> ERROR: From line 1, column 170 to line 1, column 171: Column 'o_orderkey' is 
> ambiguous
> [Error Id: eff986cd-50e1-47e6-a848-b0976bf44bff on drill182:31010]
>  at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
>  at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
>  at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
>  at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
>  at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>  at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> 
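The stack trace above is truncated in the archive. Until the validation issue 
is resolved, one possible workaround (an untested sketch, not verified against 
this report) is to project an explicitly aliased column in the subquery so the 
outer ORDER BY resolves unambiguously:

{code:sql}
-- Untested sketch: expose o_orderkey under a unique alias instead of
-- relying on the star expansion of the subquery.
select *
from customer c
cross join (select orders.o_orderkey as ord_key, lineitem.l_partkey
            from orders, lineitem
            where orders.o_custkey = lineitem.l_partkey) as o1
order by o1.ord_key
limit 2;
{code}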

[jira] [Updated] (DRILL-6844) Query with ORDER BY DESC on indexed column does not pick secondary index

2018-11-15 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6844:
---
Labels: ready-to-commit  (was: )

> Query with ORDER BY DESC on indexed column does not pick secondary index
> 
>
> Key: DRILL-6844
> URL: https://issues.apache.org/jira/browse/DRILL-6844
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Query with ORDER BY DESC on indexed column does not pick secondary index
> {noformat}
> // Query that uses the secondary index defined on ts.
> 0: jdbc:drill:schema=dfs.tmp> explain plan for 
> . . . . . . . . . . . . . . > select ts from dfs.`/c8/test3` order by ts 
> limit 1;
> +--+--+
> | text | json |
> +--+--+
> | 00-00 Screen
> 00-01 Project(ts=[$0])
> 00-02 SelectionVectorRemover
> 00-03 Limit(fetch=[1])
> 00-04 Scan(table=[[dfs, /c8/test3]], groupscan=[JsonTableGroupScan 
> [ScanSpec=JsonScanSpec [tableName=maprfs:///c8/test3, condition=null, 
> indexName=ts], columns=[`ts`], limit=1, maxwidth=125]])
> {noformat}
> {noformat}
> // Same query with ORDER BY ts DESC does not use the secondary index defined 
> on ts.
> 0: jdbc:drill:schema=dfs.tmp> explain plan for 
> . . . . . . . . . . . . . . > select ts from dfs.`/c8/test3` order by ts desc 
> limit 1;
> +--+--+
> | text | json |
> +--+--+
> | 00-00 Screen
> 00-01 Project(ts=[$0])
> 00-02 SelectionVectorRemover
> 00-03 Limit(fetch=[1])
> 00-04 SingleMergeExchange(sort0=[0 DESC])
> 01-01 OrderedMuxExchange(sort0=[0 DESC])
> 02-01 SelectionVectorRemover
> 02-02 Limit(fetch=[1])
> 02-03 SelectionVectorRemover
> 02-04 TopN(limit=[1])
> 02-05 HashToRandomExchange(dist0=[[$0]])
> 03-01 Scan(table=[[dfs, /c8/test3]], groupscan=[JsonTableGroupScan 
> [ScanSpec=JsonScanSpec [tableName=maprfs:///c8/test3, condition=null], 
> columns=[`ts`], maxwidth=8554]])
> {noformat}
> {noformat}
> Index definition is,
> maprcli table index list -path /c8/test3 -json
> {
>  "timestamp":1538066303932,
>  "timeofday":"2018-09-27 04:38:23.932 GMT+ PM",
>  "status":"OK",
>  "total":2,
>  "data":[
>  {
>  "cluster":"c8",
>  "type":"maprdb.si",
>  "indexFid":"2176.68.131294",
>  "indexName":"ts",
>  "hashed":false,
>  "indexState":"REPLICA_STATE_REPLICATING",
>  "idx":1,
>  "indexedFields":"ts:ASC",
>  "isUptodate":false,
>  "minPendingTS":1538066077,
>  "maxPendingTS":1538066077,
>  "bytesPending":0,
>  "putsPending":0,
>  "bucketsPending":1,
>  "copyTableCompletionPercentage":100,
>  "numTablets":32,
>  "numRows":80574368,
>  "totalSize":4854052160
>  },
>  {
>  "cluster":"c8",
>  "type":"maprdb.si",
>  "indexFid":"2176.72.131302",
>  "indexName":"ts_desc",
>  "hashed":false,
>  "indexState":"REPLICA_STATE_REPLICATING",
>  "idx":2,
>  "indexedFields":"ts:DESC",
>  "isUptodate":false,
>  "minPendingTS":1538066077,
>  "maxPendingTS":1538066077,
>  "bytesPending":0,
>  "putsPending":0,
>  "bucketsPending":1,
>  "copyTableCompletionPercentage":100,
>  "numTablets":32,
>  "numRows":80081344,
>  "totalSize":4937154560
>  }
>  ]
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6844) Query with ORDER BY DESC on indexed column does not pick secondary index

2018-11-10 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6844:
--

 Summary: Query with ORDER BY DESC on indexed column does not pick 
secondary index
 Key: DRILL-6844
 URL: https://issues.apache.org/jira/browse/DRILL-6844
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


Query with ORDER BY DESC on indexed column does not pick secondary index

{noformat}

// Query that uses the secondary index defined on ts.

0: jdbc:drill:schema=dfs.tmp> explain plan for 
. . . . . . . . . . . . . . > select ts from dfs.`/c8/test3` order by ts limit 
1;
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(ts=[$0])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[1])
00-04 Scan(table=[[dfs, /c8/test3]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=maprfs:///c8/test3, condition=null, 
indexName=ts], columns=[`ts`], limit=1, maxwidth=125]])
{noformat}

{noformat}
// Same query with ORDER BY ts DESC does not use the secondary index defined on 
ts.

0: jdbc:drill:schema=dfs.tmp> explain plan for 
. . . . . . . . . . . . . . > select ts from dfs.`/c8/test3` order by ts desc 
limit 1;
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(ts=[$0])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[1])
00-04 SingleMergeExchange(sort0=[0 DESC])
01-01 OrderedMuxExchange(sort0=[0 DESC])
02-01 SelectionVectorRemover
02-02 Limit(fetch=[1])
02-03 SelectionVectorRemover
02-04 TopN(limit=[1])
02-05 HashToRandomExchange(dist0=[[$0]])
03-01 Scan(table=[[dfs, /c8/test3]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=maprfs:///c8/test3, condition=null], 
columns=[`ts`], maxwidth=8554]])
{noformat}

{noformat}

Index definition is,
maprcli table index list -path /c8/test3 -json

{
 "timestamp":1538066303932,
 "timeofday":"2018-09-27 04:38:23.932 GMT+ PM",
 "status":"OK",
 "total":2,
 "data":[
 {
 "cluster":"c8",
 "type":"maprdb.si",
 "indexFid":"2176.68.131294",
 "indexName":"ts",
 "hashed":false,
 "indexState":"REPLICA_STATE_REPLICATING",
 "idx":1,
 "indexedFields":"ts:ASC",
 "isUptodate":false,
 "minPendingTS":1538066077,
 "maxPendingTS":1538066077,
 "bytesPending":0,
 "putsPending":0,
 "bucketsPending":1,
 "copyTableCompletionPercentage":100,
 "numTablets":32,
 "numRows":80574368,
 "totalSize":4854052160
 },
 {
 "cluster":"c8",
 "type":"maprdb.si",
 "indexFid":"2176.72.131302",
 "indexName":"ts_desc",
 "hashed":false,
 "indexState":"REPLICA_STATE_REPLICATING",
 "idx":2,
 "indexedFields":"ts:DESC",
 "isUptodate":false,
 "minPendingTS":1538066077,
 "maxPendingTS":1538066077,
 "bytesPending":0,
 "putsPending":0,
 "bucketsPending":1,
 "copyTableCompletionPercentage":100,
 "numTablets":32,
 "numRows":80081344,
 "totalSize":4937154560
 }
 ]
}
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6758) Hash Join should not return the join columns when they are not needed downstream

2018-10-30 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6758:
---
Fix Version/s: (was: 1.15.0)
   1.16.0

> Hash Join should not return the join columns when they are not needed 
> downstream
> 
>
> Key: DRILL-6758
> URL: https://issues.apache.org/jira/browse/DRILL-6758
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Minor
> Fix For: 1.16.0
>
>
> Currently the Hash-Join operator returns all of its incoming columns (from 
> both sides). In cases where the join columns are not used further downstream, 
> this is a waste (allocating vectors, copying each value, etc.).
>   Suggestion: Have the planner pass this information to the Hash-Join 
> operator, to enable skipping the return of these columns.
>  
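To make the case concrete, a hypothetical query (TPC-H-style table names, which 
are not from this issue) where the join columns are needed only to perform the 
join itself:

{code:sql}
-- o_orderkey and l_orderkey appear only in the join condition, not in
-- the select list, so the Hash-Join need not return them downstream.
select o.o_totalprice, l.l_quantity
from orders o
join lineitem l on o.o_orderkey = l.l_orderkey;
{code}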



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6798) Planner changes to support semi-join

2018-10-22 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6798:
---
Priority: Major  (was: Minor)

> Planner changes to support semi-join
> 
>
> Key: DRILL-6798
> URL: https://issues.apache.org/jira/browse/DRILL-6798
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6798) Planner changes to support semi-join

2018-10-22 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6798:
---
Component/s: Query Planning & Optimization

> Planner changes to support semi-join
> 
>
> Key: DRILL-6798
> URL: https://issues.apache.org/jira/browse/DRILL-6798
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Hanumath Rao Maduri (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636073#comment-16636073
 ] 

Hanumath Rao Maduri commented on DRILL-786:
---

IMO, option 3 is the short-term solution for this problem, i.e. treat the 
explicit CROSS JOIN and the implicit cross join the same. The planner should 
generate the plan when the flag is enabled (which is true by default) for 
scalar query cases; otherwise it should throw an error.

I am fine with option 3, but I am not sure whether changing the default value 
is needed.
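For concreteness, the two query shapes that option 3 would treat identically 
(a sketch reusing the student and voter tables from the report below):

{code:sql}
-- implicit (comma-syntax) cross join, as in the report
select student.name from student, voter
where student.age = 20 and voter.age = 20;

-- explicit CROSS JOIN; under option 3 both forms would plan the same way
select student.name from student cross join voter
where student.age = 20 and voter.age = 20;
{code}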

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> 

[jira] [Commented] (DRILL-6734) Unable to find value vector of path `EXPR$0`, returning null instance.

2018-09-28 Thread Hanumath Rao Maduri (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632277#comment-16632277
 ] 

Hanumath Rao Maduri commented on DRILL-6734:


[~appler] Can you please post the plan for this query?

Please use the following queries to get the plan.
{code:java}
explain plan including all attributes for select count(*) from 
ngmysql.information_schema.`PROCESSLIST`{code}
 and also with the alias name:
{code:java}
explain plan including all attributes for select count(*) as num from 
ngmysql.information_schema.`PROCESSLIST`{code}

> Unable to find value vector of path `EXPR$0`, returning null instance.
> --
>
> Key: DRILL-6734
> URL: https://issues.apache.org/jira/browse/DRILL-6734
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.14.0
> Environment: Apache Drill version 1.14.0 running on CentOS 7.0.1406.
> MySQL version 5.5.43 running on CentOS 6.4.
> MySQL connector/j version 5.1.44.
>Reporter: Cheolgoo Kang
>Priority: Major
>
> Expressions without an alias in a query against a JDBC source return null as 
> their value.
> I was trying to run this sample query to retrieve count of a MySQL table 
> connected to Drill through the RDBMS storage plugin:
> {code}
> select count(*) from ngmysql.information_schema.`PROCESSLIST`
> {code}
> which returns
> {code}
> EXPR$0   
> -
>
> {code}
> and you could find this warning from the log:
> {quote}
> Unable to find value vector of path `EXPR$0`, returning null instance.
> {quote}
> But it works fine if you give an alias to the expression like this:
> {code}
> select count(*) as num from ngmysql.information_schema.`PROCESSLIST`;
> {code}
> which would end up giving this:
> {code}
> num   
> --
> 16
> {code}
> Here's the portion of logs regarding the sample query:
> {code}
> 2018-09-07 21:44:52,709 [246d0eaa-f8e6-9536-af0c-1df3932cce9f:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 246d0eaa-f8e6-9536-af0c-1df3932cce9f: select count(*) from 
> ngmysql.information_schema.`PROCESSLIST`
> 2018-09-07 21:44:52,752 [246d0eaa-f8e6-9536-af0c-1df3932cce9f:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 246d0eaa-f8e6-9536-af0c-1df3932cce9f:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2018-09-07 21:44:52,753 [246d0eaa-f8e6-9536-af0c-1df3932cce9f:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 246d0eaa-f8e6-9536-af0c-1df3932cce9f:0:0: State to report: RUNNING
> 2018-09-07 21:44:52,756 [246d0eaa-f8e6-9536-af0c-1df3932cce9f:frag:0:0] WARN  
> o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path 
> `EXPR$0`, returning null instance.
> 2018-09-07 21:44:52,759 [246d0eaa-f8e6-9536-af0c-1df3932cce9f:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 246d0eaa-f8e6-9536-af0c-1df3932cce9f:0:0: State change requested RUNNING --> 
> FINISHED
> 2018-09-07 21:44:52,760 [246d0eaa-f8e6-9536-af0c-1df3932cce9f:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 246d0eaa-f8e6-9536-af0c-1df3932cce9f:0:0: State to report: FINISHED
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6716) NullPointerException for select query with alias on * symbol

2018-08-28 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reassigned DRILL-6716:
--

Assignee: Hanumath Rao Maduri

> NullPointerException for select query with alias on * symbol
> 
>
> Key: DRILL-6716
> URL: https://issues.apache.org/jira/browse/DRILL-6716
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Denys Ordynskiy
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Attachments: drillbit.log
>
>
> When I added an alias for * in a SELECT query in the Drill WebUI and in 
> sqlline:
> SELECT * as testAlias FROM cp.`employee.json` LIMIT 2;
> *Actual result:*
>  {color:#ff}org.apache.drill.common.exceptions.UserRemoteException: 
> SYSTEM ERROR: *NullPointerException* [Error Id: 
> 58d090f2-cf9a-437a-977d-8a64f50d8e4b on maprhost:31010]{color}
> *Expected result:*
> {color:#33}Bad query error information "PARSE ERROR: ..."{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6671) Multi level lateral unnest join is throwing an exception during materializing the plan.

2018-08-07 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6671:
--

 Summary: Multi level lateral unnest join is throwing an exception 
during materializing the plan.
 Key: DRILL-6671
 URL: https://issues.apache.org/jira/browse/DRILL-6671
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


testMultiUnnestAtSameLevel in TestE2EUnnestAndLateral is throwing an exception 
in Materializer.java. This is due to incorrect matching of the Unnest and 
Lateral join operators.
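For reference, a query of the multi-level lateral/unnest shape this test 
exercises (the same query appears in the DRILL-6475 report later in this 
archive):

{code:sql}
SELECT t5.l_quantity FROM dfs.`lateraljoin/multipleFiles/` t,
LATERAL (SELECT t2.ordrs FROM UNNEST(t.c_orders) t2(ordrs)) t3(ordrs),
LATERAL (SELECT t4.l_quantity FROM UNNEST(t3.ordrs) t4(l_quantity)) t5;
{code}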



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.

2018-08-01 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6645:
---
Issue Type: Improvement  (was: Bug)

> Transform TopN in Lateral Unnest pipeline to Sort and Limit.
> 
>
> Key: DRILL-6645
> URL: https://issues.apache.org/jira/browse/DRILL-6645
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.15.0
>
>
> The TopN operator is not supported in the Lateral/Unnest pipeline. Hence, 
> transform the TopN to use Sort and Limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.

2018-07-27 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6645:
--

 Summary: Transform TopN in Lateral Unnest pipeline to Sort and 
Limit.
 Key: DRILL-6645
 URL: https://issues.apache.org/jira/browse/DRILL-6645
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.15.0


The TopN operator is not supported in the Lateral/Unnest pipeline. Hence, 
transform the TopN to use Sort and Limit.
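To illustrate the transform, a hypothetical lateral query whose subquery would 
otherwise plan a TopN (the file path and column names are illustrative, not 
from this issue):

{code:sql}
-- The ORDER BY ... LIMIT inside the lateral subquery would normally be
-- planned as a single TopN; the transform plans it as Sort followed by
-- Limit instead.
select t2.ord
from dfs.`/data/customers` c,
     lateral (select t.ord
              from unnest(c.c_orders) t(ord)
              order by t.ord
              limit 5) t2;
{code}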



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6636) Planner side changes to use PartitionLimitBatch in place of LimitBatch

2018-07-27 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6636:
---
Summary: Planner side changes to use PartitionLimitBatch in place of 
LimitBatch  (was: Planner side changes to use PartitionLimitBatch and not have 
TopN in Lateral/Unnest subquery)

> Planner side changes to use PartitionLimitBatch in place of LimitBatch
> --
>
> Key: DRILL-6636
> URL: https://issues.apache.org/jira/browse/DRILL-6636
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Sorabh Hamirwasia
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6475) Unnest: Null fieldId Pointer

2018-07-10 Thread Hanumath Rao Maduri (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539120#comment-16539120
 ] 

Hanumath Rao Maduri commented on DRILL-6475:


[~priteshm] I think I should be able to open a PR by Thu for this JIRA.

> Unnest: Null fieldId Pointer 
> -
>
> Key: DRILL-6475
> URL: https://issues.apache.org/jira/browse/DRILL-6475
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.14.0
>
>
>  Executing the following (in TestE2EUnnestAndLateral.java) causes an NPE as 
> `fieldId` is null in `schemaChanged()`: 
> {code}
> @Test
> public void testMultipleBatchesLateral_twoUnnests() throws Exception {
>  String sql = "SELECT t5.l_quantity FROM dfs.`lateraljoin/multipleFiles/` t, 
> LATERAL " +
>  "(SELECT t2.ordrs FROM UNNEST(t.c_orders) t2(ordrs)) t3(ordrs), LATERAL " +
>  "(SELECT t4.l_quantity FROM UNNEST(t3.ordrs) t4(l_quantity)) t5";
>  test(sql);
> }
> {code}
>  
> And the error is:
> {code}
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 25f42765-8f68-418e-840a-ffe65788e1e2 on 10.254.130.25:31020]
> (java.lang.NullPointerException) null
>  
> org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged():381
>  org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext():199
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  
> org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides():241
>  org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema():264
>  org.apache.drill.exec.record.AbstractRecordBatch.next():152
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  org.apache.drill.exec.record.AbstractRecordBatch.next():109
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  org.apache.drill.exec.record.AbstractRecordBatch.next():109
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1657
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>  java.lang.Thread.run():745 (state=,code=0)
> {code} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6545) Projection Push down into Lateral Join operator.

2018-06-27 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6545:
---
Affects Version/s: (was: 1.13.0)

> Projection Push down into Lateral Join operator.
> 
>
> Key: DRILL-6545
> URL: https://issues.apache.org/jira/browse/DRILL-6545
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.14.0
>
>
> For the Lateral’s logical and physical plan node, we would need to add an 
> output RowType such that a Projection can be pushed down to Lateral. 
> Currently, Lateral will produce all columns from left and right and it 
> depends on a subsequent Project to eliminate unneeded columns. However, this 
> will blow up the memory use of Lateral since each column from the left will 
> be replicated N times based on N rows coming from UNNEST. We can have a 
> ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL 
> but keeps the expression evaluations as part of the Project above the 
> Lateral.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6545) Projection Push down into Lateral Join operator.

2018-06-27 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6545:
---
Issue Type: Improvement  (was: Bug)

> Projection Push down into Lateral Join operator.
> 
>
> Key: DRILL-6545
> URL: https://issues.apache.org/jira/browse/DRILL-6545
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.14.0
>
>
> For the Lateral’s logical and physical plan node, we would need to add an 
> output RowType such that a Projection can be pushed down to Lateral. 
> Currently, Lateral will produce all columns from left and right and it 
> depends on a subsequent Project to eliminate unneeded columns. However, this 
> will blow up the memory use of Lateral since each column from the left will 
> be replicated N times based on N rows coming from UNNEST. We can have a 
> ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL 
> but keeps the expression evaluations as part of the Project above the 
> Lateral.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6545) Projection Push down into Lateral Join operator.

2018-06-27 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6545:
--

 Summary: Projection Push down into Lateral Join operator.
 Key: DRILL-6545
 URL: https://issues.apache.org/jira/browse/DRILL-6545
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.13.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


For the Lateral’s logical and physical plan node, we would need to add an 
output RowType such that a Projection can be pushed down to Lateral. Currently, 
Lateral will produce all columns from left and right and it depends on a 
subsequent Project to eliminate unneeded columns. However, this will blow up 
the memory use of Lateral since each column from the left will be replicated N 
times based on N rows coming from UNNEST. We can have a 
ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL but 
keeps the expression evaluations as part of the Project above the Lateral.
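A sketch of the pattern (the path is shortened for readability; the column 
names are borrowed from the DRILL-6476 example later in this archive):

{code:sql}
-- Only `items` is needed downstream; without projection push-down,
-- Lateral also carries every customer column, replicating each one
-- once per unnested order.
select t2.items
from dfs.`/data/nested-customer.parquet` customer,
     lateral (select t.ord.o_lineitems as items
              from unnest(customer.orders) t(ord)) t2;
{code}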



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6502) Rename CorrelatePrel to LateralJoinPrel

2018-06-15 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6502:
---
Summary: Rename CorrelatePrel to LateralJoinPrel  (was: Rename 
CorrelatePrel to LateralJoinPrel as currently correlatePrel is a physical 
operator for LateralJoin)

> Rename CorrelatePrel to LateralJoinPrel
> ---
>
> Key: DRILL-6502
> URL: https://issues.apache.org/jira/browse/DRILL-6502
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Minor
> Fix For: 1.14.0
>
>
> Currently in Drill, CorrelatePrel is the physical relational operator for the 
> LateralJoin implementation. The explain plan shows CorrelatePrel, which can be 
> confusing. Hence it is good to rename this operator to LateralJoinPrel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6502) Rename CorrelatePrel to LateralJoinPrel as currently correlatePrel is physical relation for LateralJoin

2018-06-15 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6502:
--

 Summary: Rename CorrelatePrel to LateralJoinPrel as currently 
correlatePrel is physical relation for LateralJoin
 Key: DRILL-6502
 URL: https://issues.apache.org/jira/browse/DRILL-6502
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Currently in Drill, CorrelatePrel is the physical relational operator for the 
LateralJoin implementation. The explain plan shows CorrelatePrel, which can be 
confusing. Hence it is good to rename this operator to LateralJoinPrel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6502) Rename CorrelatePrel to LateralJoinPrel as currently correlatePrel is a physical operator for LateralJoin

2018-06-15 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6502:
---
Summary: Rename CorrelatePrel to LateralJoinPrel as currently correlatePrel 
is a physical operator for LateralJoin  (was: Rename CorrelatePrel to 
LateralJoinPrel as currently correlatePrel is physical relation for LateralJoin)

> Rename CorrelatePrel to LateralJoinPrel as currently correlatePrel is a 
> physical operator for LateralJoin
> -
>
> Key: DRILL-6502
> URL: https://issues.apache.org/jira/browse/DRILL-6502
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Minor
> Fix For: 1.14.0
>
>
> Currently in Drill, CorrelatePrel is the physical relational operator for the 
> LateralJoin implementation. The explain plan shows CorrelatePrel, which can be 
> confusing. Hence it is good to rename this operator to LateralJoinPrel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6476) Generate explain plan which shows relation between Lateral and the corresponding Unnest.

2018-06-07 Thread Hanumath Rao Maduri (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504964#comment-16504964
 ] 

Hanumath Rao Maduri commented on DRILL-6476:


Explain plan for the following lateral unnest query.

{code}

explain plan for select * from (select customer.orders as c_orders from  
dfs.`/home/mapr/LATERAL/drill/exec/java-exec/src/test/resources/lateraljoin/nested-customer.parquet`
 customer ) t1,  lateral ( select t.ord.o_lineitems as items from 
unnest(t1.c_orders) t(ord) ) t2, lateral (select count(*)  from 
unnest(t2.items) t3(item)) d1;

 

| 00-00 Screen : rowType = RecordType(ANY c_orders, ANY items, BIGINT EXPR$0): 
rowcount = 1.0, cumulative cost = \{15.1 rows, 74.1 cpu, 0.0 io, 0.0 network, 
0.0 memory}, id = 6223
00-01 Project(c_orders=[$0], items=[$1], EXPR$0=[$2]) : rowType = 
RecordType(ANY c_orders, ANY items, BIGINT EXPR$0): rowcount = 1.0, cumulative 
cost = \{15.0 rows, 74.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6222
00-02 Correlate(correlation=[$cor1], joinType=[inner], requiredColumns=[\{1}]) 
: rowType = RecordType(ANY orders, ANY items, BIGINT EXPR$0): rowcount = 1.0, 
cumulative cost = \{14.0 rows, 71.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
6221
00-04 Correlate(correlation=[$cor0], joinType=[inner], requiredColumns=[\{0}]) 
: rowType = RecordType(ANY orders, ANY items): rowcount = 1.0, cumulative cost 
= \{10.0 rows, 38.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6218
00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/home/mapr/LATERAL/drill/exec/java-exec/src/test/resources/lateraljoin/nested-customer.parquet]],
 
selectionRoot=file:/home/mapr/LATERAL/drill/exec/java-exec/src/test/resources/lateraljoin/nested-customer.parquet,
 numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`orders`]]]) : 
rowType = RecordType(ANY orders): rowcount = 4.0, cumulative cost = \{4.0 rows, 
4.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6216
00-06 Project(items=[ITEM($0, 'o_lineitems')]) : rowType = RecordType(ANY 
items): rowcount = 1.0, cumulative cost = \{2.0 rows, 2.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 6217
00-09 Unnest [SrcOp: (00-04)] : rowType = RecordType(ANY c_orders): rowcount = 
1.0, cumulative cost = \{1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
id = 6055
00-03 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT 
EXPR$0): rowcount = 1.0, cumulative cost = \{3.0 rows, 17.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 6220
00-05 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 1.0, 
cumulative cost = \{2.0 rows, 5.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
6219
00-08 Unnest [SrcOp: (00-02)]: rowType = RecordType(ANY items): rowcount = 1.0, 
cumulative cost = \{1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
6058
 {code}

> Generate explain plan which shows relation between Lateral and the 
> corresponding Unnest.
> 
>
> Key: DRILL-6476
> URL: https://issues.apache.org/jira/browse/DRILL-6476
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>
> Currently, the explain plan doesn't show which Lateral and Unnest nodes 
> are related. This information is good to have so that the visual plan can use 
> it to show the relation visually.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6476) Generate explain plan which shows relation between Lateral and the corresponding Unnest.

2018-06-07 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6476:
--

 Summary: Generate explain plan which shows relation between 
Lateral and the corresponding Unnest.
 Key: DRILL-6476
 URL: https://issues.apache.org/jira/browse/DRILL-6476
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


Currently, the explain plan doesn't show which Lateral and Unnest nodes are 
related. This information is good to have so that the visual plan can use it 
to show the relation visually.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6475) Unnest: Null fieldId Pointer

2018-06-06 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reassigned DRILL-6475:
--

Assignee: Hanumath Rao Maduri  (was: Parth Chandra)

> Unnest: Null fieldId Pointer 
> -
>
> Key: DRILL-6475
> URL: https://issues.apache.org/jira/browse/DRILL-6475
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.14.0
>
>
>  Executing the following (in TestE2EUnnestAndLateral.java) causes an NPE as 
> `fieldId` is null in `schemaChanged()`: 
> {code}
> @Test
> public void testMultipleBatchesLateral_twoUnnests() throws Exception {
>  String sql = "SELECT t5.l_quantity FROM dfs.`lateraljoin/multipleFiles/` t, 
> LATERAL " +
>  "(SELECT t2.ordrs FROM UNNEST(t.c_orders) t2(ordrs)) t3(ordrs), LATERAL " +
>  "(SELECT t4.l_quantity FROM UNNEST(t3.ordrs) t4(l_quantity)) t5";
>  test(sql);
> }
> {code}
>  
> And the error is:
> {code}
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 25f42765-8f68-418e-840a-ffe65788e1e2 on 10.254.130.25:31020]
> (java.lang.NullPointerException) null
>  
> org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged():381
>  org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext():199
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  
> org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides():241
>  org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema():264
>  org.apache.drill.exec.record.AbstractRecordBatch.next():152
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  org.apache.drill.exec.record.AbstractRecordBatch.next():109
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  org.apache.drill.exec.record.AbstractRecordBatch.next():109
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1657
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>  java.lang.Thread.run():745 (state=,code=0)
> {code} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6456) Planner shouldn't create any exchanges on the right side of Lateral Join.

2018-05-30 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6456:
--

 Summary: Planner shouldn't create any exchanges on the right side 
of Lateral Join.
 Key: DRILL-6456
 URL: https://issues.apache.org/jira/browse/DRILL-6456
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Currently, there is no restriction placed on the right side of the LateralJoin. 
This causes the planner to generate an Exchange when the right side contains 
operators like Agg, Limit, Sort, etc.

Because of this, the Unnest operator cannot retrieve rows from Lateral's left 
side to process the pipeline further. Enhance the planner so that it does not 
generate exchanges on the right side of the LateralJoin.
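For illustration, a hedged sketch of a query of the affected shape (table, 
column, and alias names are illustrative only, loosely following the 
lateral/unnest examples elsewhere in this thread); the aggregate on the right 
side of the lateral join is what currently provokes an exchange:

{code}
-- illustrative names: c_name and c_orders are assumed columns
select t1.c_name, d1.cnt
from dfs.`/data/customer.parquet` t1,
     lateral (select count(*) as cnt from unnest(t1.c_orders) t2(ord)) d1;
{code}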



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6431) Unnest operator requires table and a single column alias to be specified.

2018-05-19 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6431:
--

 Summary: Unnest operator requires table and a single column alias 
to be specified.
 Key: DRILL-6431
 URL: https://issues.apache.org/jira/browse/DRILL-6431
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization, SQL Parser
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Currently, the unnest operator does not require an alias for either the table 
name or the column name. This has implications for what name the unnest 
operator's output column should use. One could use a common name like "unnest" 
as the output name, but it would mean customers need to be educated on what to 
expect from the unnest operator. This might confuse some customers and is also 
prone to introducing errors in the query.

The design decision for DRILL is that unnest always produces either a scalar 
column or a map (depending upon its input schema), but it is always a single 
column.

Given this, it is better to enforce the requirement that the unnest operator 
takes a table alias and a column alias (a single column). This disambiguates 
the column, which can then easily be referenced elsewhere in the query.
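For illustration, a hedged sketch of the proposed required form (path and 
names are illustrative); the table alias t2 and the single column alias ord 
disambiguate the unnest output, which is then referenced through the outer 
alias t3:

{code}
-- illustrative names: the required aliases are t2 (table) and ord (column)
select t3.ord
from dfs.`/data/customer.parquet` t,
     lateral (select t2.ord from unnest(t.c_orders) t2(ord)) t3(ord);
{code}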

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.

2018-04-07 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429403#comment-16429403
 ] 

Hanumath Rao Maduri commented on DRILL-6312:


Please find below the mail thread, which discusses various issues and 
approaches to schema discovery.

{noformat}
Hi Hanu,

The problem with views as-is, even with casts, is that the casting comes too 
late to resolve the issues I highlighted in earlier messages. Ted's cast 
push-down idea causes the conversion to happen at read time so that we can, 
say, cast a string to an int, or cast a null to the proper type.

Today, if we use a cast, such as SELECT cast(a AS INT) FROM myTable, then we 
get a DAG that has three parts (to keep things simple):

* Scan the data, using types inferred from the data itself
* In a Filter operator, convert the type of data to INT
* In Screen, return the result to the user

If the type is ambiguous in the file, then the first step above fails; data 
never gets far enough for the Filter to kick in and apply the cast. Also, if a 
file contains a run of nulls, the scanner will choose Nullable Int, then fail 
when it finds, say, a string.

The key point is that the cast push-down means that the query will not fail due 
to dicey files: the cast resolves the ambiguity. If we push the cast down, then 
it is the SCAN operator that resolves the conflict and does the cast; avoiding 
the failures we've been discussing.

I like the idea you seem to be proposing: cascading views. Have a table view 
that cleans up each table. Then, these can be combined in higher-order views 
for specialized purposes.

The beauty of the cast push-down idea is that no metadata is needed other than 
the query. If the user wants metadata, they use existing views (that contain 
the casts and cause the cast push-down.)

This seems like such a simple, elegant solution that we could try it out 
quickly (if we get past the planner issues Aman mentioned.) In fact, the new 
scan operator code (done as part of the batch sizing work) already has a 
prototype mechanism for type hints. If the type hint is provided to the 
scanner, it uses them, otherwise it infers the type. We'd just hook up the cast 
push down data to that prototype and we could try out the result quickly. (The 
new scan operator is still in my private branch, in case anyone goes looking 
for it...)

Some of your discussion talks about automatically inferring the schema. I 
really don't think we need to do that. The hint (cast push-down) is sufficient 
to resolve ambiguities in the existing scan-time schema inference.

The syntax trick would be to find a way to provide hints just for those columns 
that are issues. If I have a table with columns a, b, ... z, but only b is a 
problem, I don't want to have to do:

SELECT a, CAST(b AS INT), c, ... z FROM myTable

Would be great if we could just do:

SELECT *, CAST(b AS INT) FROM myTable

I realize the above has issues; the key idea is: provide casts only for the 
problem fields without spelling out all fields.

If we really want to get fancy, we can do UDF push down for the complex cases 
you mentioned. Maybe:

SELECT *, CAST(b AS INT), parseCode(c) ...

We are diving into design here; maybe you can file a JIRA and we can shift 
detailed design discussion to that JIRA. Salim already has one related to 
schema change errors, which was why the "Death" article caught my eye.

Thanks,
- Paul





On Friday, April 6, 2018, 4:59:40 PM PDT, Hanumath Rao Maduri 
 wrote:

 Hello,

Thanks to Ted & Paul for clarifying my questions.
Sorry for not being clear in my previous post. When I said "create view" I
meant simple views where we currently use cast expressions to cast columns to
types. In this case the planner can use this information to force the scans
to use it as the schema.

If the query fails, then it fails at the scan and not after the scanner has
inferred the schema.

I know that views can get complicated with joins and expressions. For schema
hinting through views, I assume they should be created on single tables, with
the corresponding columns one wants to project from the table.


Regarding the same question, today we had a discussion with Aman. Here a view
can be considered as a "view" of the table with the schema in place.

We can change some syntax to suit it to specifying a schema, something like
this:

create schema[optional] view(/virtual table ) v1 as (a: int, b : int)
select a, b from t1 with some other rules as to conversion of scalar to
complex types.

Then queries on this view (below) should enable the scanner to use this type
information to convert the data into the appropriate types:
select * from v1

For the case where the schema information is not known by the user, maybe
use something like this:

create schema[optional] view(/virtual table) v1 as select a, b from t1
infer 

[jira] [Created] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.

2018-04-07 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6312:
--

 Summary: Enable pushing of cast expressions to the scanner for 
better schema discovery.
 Key: DRILL-6312
 URL: https://issues.apache.org/jira/browse/DRILL-6312
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators, Query Planning  
Optimization
Affects Versions: 1.13.0
Reporter: Hanumath Rao Maduri


Drill is a schema-less engine which tries to infer the schema from disparate 
sources at read time. Currently the scanners infer the schema for each batch 
depending upon the data for that column in the corresponding batch. This covers 
many use cases but can error out when the data differs too much between 
batches, e.g. int versus array[int] (there are other cases as well; this is 
just one example).

There is also a mechanism to create a view that casts the columns to the 
appropriate types. This solves the issue in some cases but fails in many 
others, because the cast expression is not pushed down to the scanner but stays 
at the Project or Filter operators higher up in the query plan.

This JIRA is to fix this by propagating the type information embedded in the 
cast function to the scanners, so that the scanners can cast the incoming data 
appropriately.
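For illustration, a hedged sketch of the intended behavior (the view name is 
illustrative; the path is borrowed from the DRILL-6212 examples in this 
thread): the cast lives in a view, and this JIRA proposes pushing its type 
information down to the scanner instead of applying it in a Project above the 
scan:

{code}
-- the view name typed_v is illustrative; the path is from the DRILL-6212
-- examples in this thread
create view dfs.tmp.typed_v as
  select cast(greeting as int) f
  from dfs.`/home/mapr/data/json/temp.json`;

select f from dfs.tmp.typed_v where f = 10;
{code}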





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6291) Wrong result for COUNT function and empty file (schemaless table)

2018-03-24 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reassigned DRILL-6291:
--

Assignee: Hanumath Rao Maduri

> Wrong result for COUNT function and empty file (schemaless table)
> -
>
> Key: DRILL-6291
> URL: https://issues.apache.org/jira/browse/DRILL-6291
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: Future
>
>
> The count function shouldn't return a null result. For the 0-rows case it 
> should always return 0.
> {code}
> 0: jdbc:drill:zk=local> select count(`last_name`) from 
> dfs.`/tmp/empty_file.json`;
> +-+
> | EXPR$0  |
> +-+
> +-+
> No rows selected (0.3 seconds)
> {code}
> The result should be similar to:
> {code}
> 0: jdbc:drill:zk=local> select count(`non_existent_column`) from 
> cp.`employee.json`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> 1 row selected (0.274 seconds)
> {code}
> Note: empty_file.json is an empty file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-03-23 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411429#comment-16411429
 ] 

Hanumath Rao Maduri commented on DRILL-6212:


[~vvysotskyi] This case should be handled by Chunhui's fix to the similar 
JIRA.

> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using the following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reassigned DRILL-6260:
--

Assignee: Hanumath Rao Maduri

> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> --
>
> Key: DRILL-6260
> URL: https://issues.apache.org/jira/browse/DRILL-6260
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.13.0, 1.14.0
> Environment: git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
> git Commit Message: Update version to 1.14.0-SNAPSHOT
>Reporter: Abhishek Girish
>Assignee: Hanumath Rao Maduri
>Priority: Major
>
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > cast(max(T2.a) as varchar) FROM `t2.json` T2);
> Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
> See Apache Drill JIRA: DRILL-1937
> {code}
> Slightly different variants of the query work fine. 
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(cast(T2.a as varchar)) FROM `t2.json` T2);
> 00-00    Screen
> 00-01      Project(b=[$0])
> 00-02        Project(b=[$1])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, $2)])
> 00-05              NestedLoopJoin(condition=[true], joinType=[left])
> 00-07                Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
> "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
> 00-09                    Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(T2.a) FROM `t2.json` T2);
> 00-00Screen
> 00-01  Project(b=[$0])
> 00-02Project(b=[$1])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($0, $2)])
> 00-05  NestedLoopJoin(condition=[true], joinType=[left])
> 00-07Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08  Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]])
> {code}
> File contents:
> {code}
> # cat t1.json 
> {"a":1, "b":"V"}
> {"a":2, "b":"W"}
> {"a":3, "b":"X"}
> {"a":4, "b":"Y"}
> {"a":5, "b":"Z"}
> # cat t2.json 
> {"a":1, "b":"A"}
> {"a":2, "b":"B"}
> {"a":3, "b":"C"}
> {"a":4, "b":"D"}
> {"a":5, "b":"E"}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-03-05 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6212:
---
Description: 
Create two views using the following statements.

{code}
create view v1 as select cast(greeting as int) f from 
dfs.`/home/mapr/data/json/temp.json`;
create view v2 as select cast(greeting as int) f from 
dfs.`/home/mapr/data/json/temp.json`;
{code}

Executing the following join query produces a stack overflow during the 
planning phase.
{code}
select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
{code}


  was:
Create two views using the following statements.

{code}
create view v1 as select cast(greeting as int) f from 
dfs.`/home/mapr/data/json/temp.json`;
create view v2 as select cast(greeting as int) f from 
dfs.`/home/mapr/data/json/temp.json`;
{code}

Executing the following join query produces a stack overflow during the 
planning phase.
{code}
select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v1 as t1 on cast(t.f as 
int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
{code}



> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using the following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-03-05 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6212:
--

 Summary: A simple join is recursing too deep in planning and 
eventually throwing stack overflow.
 Key: DRILL-6212
 URL: https://issues.apache.org/jira/browse/DRILL-6212
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Create two views using the following statements.

{code}
create view v1 as select cast(greeting as int) f from 
dfs.`/home/mapr/data/json/temp.json`;
create view v2 as select cast(greeting as int) f from 
dfs.`/home/mapr/data/json/temp.json`;
{code}

Executing the following join query produces a stack overflow during the 
planning phase.
{code}
select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v1 as t1 on cast(t.f as 
int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380938#comment-16380938
 ] 

Hanumath Rao Maduri edited comment on DRILL-6193 at 2/28/18 9:18 PM:
-

[~vvysotskyi] Thank you for your input. I have looked at that code. It looks 
like we may require more changes than just overloading the filter method in 
DrillRelBuilder, because the filter method returns the RelBuilder and not just 
the simplified predicates. Currently the client calls build() to construct a 
FilterNode. Unless we overload build() and use the original predicates to add 
back the removed join predicates, I think we cannot fix it. Please do let me 
know if I am missing anything here.


was (Author: hanu.ncr):
[~vvysotskyi] Thank you for your input. I have looked at that code. It looks 
like we may require more changes than just overloading filter method in 
DrillRelBuilder, because the filter method returns the RelBuilder and the 
simplified predicates. Currently the client will call the build to build a 
FilterNode. Unless we overload the build and use the original predicates to add 
the removed join predicates, I think we cannot fix it. Please do let me know if 
I am missing anything here.

> Latest Calcite optimized out join condition and cause "This query cannot be 
> planned possibly due to either a cartesian join or an inequality join"
> --
>
> Key: DRILL-6193
> URL: https://issues.apache.org/jira/browse/DRILL-6193
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.13.0
>Reporter: Chunhui Shi
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
> Fix For: 1.13.0
>
>
> I got the same error on Apache master's MapR profile at the tip (before the 
> Hive upgrade) and on changeset 9e944c97ee6f6c0d1705f09d531af35deed2e310, the 
> last commit of the Calcite upgrade, with the failed query reported in the 
> functional tests, but now on a parquet file:
>  
> {quote}SELECT L.L_QUANTITY, L.L_DISCOUNT, L.L_EXTENDEDPRICE, L.L_TAX
>  
> FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O
> WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) AND 
> cast(L.L_LINENUMBER as int) = 7 AND cast(L.L_ORDERKEY as int) = 10208 AND 
> cast(O.O_ORDERKEY as int) = 10208;
>  {quote}
> However, with Drill built on commit ef0fafea214e866556fa39c902685d48a56001e1, 
> the commit right before the Calcite upgrade commits, the same query worked.
> This was caused by the latest Calcite simplifying the predicates: during this 
> process, "cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int)" was 
> considered redundant and was removed, so the logical plan of this query gets 
> an always-true condition for the Join:
> {quote}DrillJoinRel(condition=[true], joinType=[inner])
> {quote}
> while the previous version had
> {quote}DrillJoinRel(condition=[=($5, $0)], joinType=[inner])
> {quote}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380938#comment-16380938
 ] 

Hanumath Rao Maduri commented on DRILL-6193:


[~vvysotskyi] Thank you for your input. I have looked at that code. It looks 
like we may require more changes than just overloading filter method in 
DrillRelBuilder, because the filter method returns the RelBuilder and the 
simplified predicates. Currently the client will call the build to build a 
FilterNode. Unless we overload the build and use the original predicates to add 
the removed join predicates, I think we cannot fix it. Please do let me know if 
I am missing anything here.

> Latest Calcite optimized out join condition and cause "This query cannot be 
> planned possibly due to either a cartesian join or an inequality join"
> --
>
> Key: DRILL-6193
> URL: https://issues.apache.org/jira/browse/DRILL-6193
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.13.0
>Reporter: Chunhui Shi
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
> Fix For: 1.13.0
>
>
> I got the same error on Apache master's MapR profile at the tip (before the 
> Hive upgrade) and on changeset 9e944c97ee6f6c0d1705f09d531af35deed2e310, the 
> last commit of the Calcite upgrade, with the failed query reported in the 
> functional tests, but now on a parquet file:
>  
> {quote}SELECT L.L_QUANTITY, L.L_DISCOUNT, L.L_EXTENDEDPRICE, L.L_TAX
>  
> FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O
> WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) AND 
> cast(L.L_LINENUMBER as int) = 7 AND cast(L.L_ORDERKEY as int) = 10208 AND 
> cast(O.O_ORDERKEY as int) = 10208;
>  {quote}
> However, with Drill built on commit ef0fafea214e866556fa39c902685d48a56001e1, 
> the commit right before the Calcite upgrade commits, the same query worked.
> This was caused by the latest Calcite simplifying the predicates: during this 
> process, "cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int)" was 
> considered redundant and was removed, so the logical plan of this query gets 
> an always-true condition for the Join:
> {quote}DrillJoinRel(condition=[true], joinType=[inner])
> {quote}
> while the previous version had
> {quote}DrillJoinRel(condition=[=($5, $0)], joinType=[inner])
> {quote}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-02-20 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6115:
---
Labels: ready-to-commit  (was: )

> SingleMergeExchange is not scaling up when many minor fragments are allocated 
> for a query.
> --
>
> Key: DRILL-6115
> URL: https://issues.apache.org/jira/browse/DRILL-6115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
> Attachments: Enhancing Drill to multiplex ordered merge exchanges.docx
>
>
> SingleMergeExchange is created when a global order is required in the output. 
> The following query produces the SingleMergeExchange.
> {code:java}
> 0: jdbc:drill:zk=local> explain plan for select L_LINENUMBER from 
> dfs.`/drill/tables/lineitem` order by L_LINENUMBER;
> +--+--+
> | text | json |
> +--+--+
> | 00-00 Screen
> 00-01 Project(L_LINENUMBER=[$0])
> 00-02 SingleMergeExchange(sort0=[0])
> 01-01 SelectionVectorRemover
> 01-02 Sort(sort0=[$0], dir0=[ASC])
> 01-03 HashToRandomExchange(dist0=[[$0]])
> 02-01 Scan(table=[[dfs, /drill/tables/lineitem]], 
> groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec 
> [tableName=maprfs:///drill/tables/lineitem, condition=null], 
> columns=[`L_LINENUMBER`], maxwidth=15]])
> {code}
> On a 10-node cluster, if the table is huge then DRILL can spawn many minor 
> fragments, which are all merged on a single node with one merge receiver. 
> Doing so creates a lot of memory pressure on the receiver node and also an 
> execution bottleneck. To address this issue, the merge receiver should be a 
> multiphase merge receiver.
> Ideally, for a large cluster, one can introduce tree merges so that merging 
> can be done in parallel. But as a first step I think it is better to use the 
> existing infrastructure for multiplexing operators to generate an OrderedMux, 
> so that all the minor fragments pertaining to one DRILLBIT are merged locally 
> and the merged data is sent across to the receiver operator.
> For example, on a 10-node cluster where each node processes 14 minor 
> fragments, the current version of the code merges all 140 minor fragments at 
> the receiver; the proposed version has two levels of merges: a 14-way merge 
> in each Drillbit, done in parallel, followed by a 10-way merge of the 
> per-Drillbit streams at the receiver node.
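For illustration, a hedged sketch (not actual Drill output; the 
OrderedMuxExchange operator name and the fragment numbering are assumptions) 
of the proposed two-phase merge for the plan above:

{code}
-- sketch only: numbering and operator names are assumptions
00-02 SingleMergeExchange(sort0=[0])    -- merges one stream per Drillbit (10-way)
01-01   OrderedMuxExchange(sort0=[0])   -- merges the local minor fragments (14-way)
02-01     SelectionVectorRemover
02-02       Sort(sort0=[$0], dir0=[ASC])
02-03         HashToRandomExchange(dist0=[[$0]])
{code}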



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (DRILL-6148) TestSortSpillWithException is sometimes failing.

2018-02-15 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reopened DRILL-6148:


> TestSortSpillWithException is sometimes failing.
> 
>
> Key: DRILL-6148
> URL: https://issues.apache.org/jira/browse/DRILL-6148
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build  Test
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> TestSortSpillWithException#testSpillLeakManaged is sometimes failing; however, 
> for some reason this is being observed in only one of my branches. 
> TestSpillLeakManaged tests for a leak when an exception is thrown while 
> ExternalSort is spilling rows. In the failing case, ExternalSort is able to 
> sort the data within the given memory and does not spill at all, so the 
> injected interruption path is never hit and no exception is thrown.
> The test case should use drill.exec.sort.external.mem_limit to force 
> ExternalSort to use as little memory as possible so that spilling is 
> exercised.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6159) No need to offset rows if order by is not specified in the query.

2018-02-15 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366221#comment-16366221
 ] 

Hanumath Rao Maduri commented on DRILL-6159:


Thank you for the input. I agree that in some databases (mostly 
single-instance databases), when this type of query is issued, all the user 
cares about is some natural order stored in the database, for pagination. 
However, I think that in the context of a distributed database like DRILL 
there is no natural order. In my opinion, even after doing all the processing 
we might still produce duplicate results across query runs. Given this, I was 
just wondering whether processing the offset is useful at all.

I also think this optimization could discourage users from issuing these 
kinds of queries, since they get the same result no matter what offset is 
provided.

However, I opened this JIRA to reach a consensus on this issue. I am also 
fine if the consensus among Drillers is not to fix it.

> No need to offset rows if order by is not specified in the query.
> -
>
> Key: DRILL-6159
> URL: https://issues.apache.org/jira/browse/DRILL-6159
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: Future
>
>
> For queries which have an offset and a limit but no order by, there is no 
> need to add the offset to the limit during pushdown of the limit.
> SQL doesn't guarantee output order if no order by is specified in the query. 
> It is observed that for queries with offset and limit and no order by, the 
> current optimizer adds the offset to the limit and limits that many rows. 
> Doing so prevents the query from exiting early.
> Here is an example query:
> {code}
> select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000
> 00-00Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 
> 1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 
> 1.56569100288E11 network, 4.64926176E7 memory}, id = 787
> 00-01  Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY 
> zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 
> rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 
> memory}, id = 786
> 00-02SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, 
> ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 
> 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id 
> = 785
> 00-03  Limit(offset=[1000], fetch=[10]) : rowType = 
> RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = 
> {9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 
> 4.64926176E7 memory}, id = 784
> 00-04UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY 
> a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 
> cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
> 01-01  SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY 
> zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 
> 4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
> 782
> 01-02Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, 
> ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 
> 4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
> 781
> 01-03  Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = 
> RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative 
> cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 
> 4.64926176E7 memory}, id = 780
> 01-04HashJoin(condition=[=($0, $2)], joinType=[left]) : 
> rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, 
> cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 
> 3.2460300288E10 network, 4.64926176E7 memory}, id = 779
> 01-06  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], 
> files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, 
> maprfs:/tmp/csvd1/Daamulti11random21.csv, 
> maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY 
> ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 
> 4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
> 01-05  BroadcastExchange : rowType = RecordType(ANY ZZ2): 
> rowcount = 2641626.0, cumulative cost = {5283252.0 rows, 2.3774634E7 cpu, 0.0 
> io, 3.2460300288E10 network, 0.0 

[jira] [Assigned] (DRILL-3162) Add support for Round Robin exchange and enable new plans

2018-02-14 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reassigned DRILL-3162:
--

Assignee: Hanumath Rao Maduri

> Add support for Round Robin exchange and enable new plans
> -
>
> Key: DRILL-3162
> URL: https://issues.apache.org/jira/browse/DRILL-3162
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning  Optimization
>Affects Versions: 1.0.0
>Reporter: Aman Sinha
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: Future
>
>
> It would be quite useful to add support for a Round Robin exchange and enable 
> new plans, such as: 
>  - Join plans where the left side is round-robin distributed and the right 
> side is broadcast
>  - Sort plans where the input of the sort is round-robin distributed instead 
> of hash distributed



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6159) No need to offset rows if order by is not specified in the query.

2018-02-14 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6159:
--

 Summary: No need to offset rows if order by is not specified in 
the query.
 Key: DRILL-6159
 URL: https://issues.apache.org/jira/browse/DRILL-6159
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: Future


For queries which have an offset and a limit but no order by, there is no need 
to add the offset to the limit during pushdown of the limit.

SQL doesn't guarantee output order if no order by is specified in the query. 
It is observed that for queries with offset and limit and no order by, the 
current optimizer adds the offset to the limit and limits that many rows. 
Doing so prevents the query from exiting early.

Here is an example query:

{code}
select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000


00-00Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 
1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 
1.56569100288E11 network, 4.64926176E7 memory}, id = 787
00-01  Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY 
zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 
rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, 
id = 786
00-02SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, 
ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 5.53005404E8 
cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 785
00-03  Limit(offset=[1000], fetch=[10]) : rowType = 
RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = 
{9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 
4.64926176E7 memory}, id = 784
00-04UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY 
a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 cpu, 
0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
01-01  SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY 
zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 
4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
782
01-02Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, 
ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 
4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
781
01-03  Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = 
RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative cost 
= {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 
4.64926176E7 memory}, id = 780
01-04HashJoin(condition=[=($0, $2)], joinType=[left]) : 
rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, 
cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 
network, 4.64926176E7 memory}, id = 779
01-06  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], 
files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, 
maprfs:/tmp/csvd1/Daamulti11random21.csv, 
maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY 
ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 
4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
01-05  BroadcastExchange : rowType = RecordType(ANY ZZ2): 
rowcount = 2641626.0, cumulative cost = {5283252.0 rows, 2.3774634E7 cpu, 0.0 
io, 3.2460300288E10 network, 0.0 memory}, id = 778
02-01Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/csvd2, numFiles=1, columns=[`ZZ2`], 
files=[maprfs:/tmp/csvd2/D222random2.csv]]]) : rowType = RecordType(ANY ZZ2): 
rowcount = 2641626.0, cumulative cost = {2641626.0 rows, 2641626.0 cpu, 0.0 io, 
0.0 network, 0.0 memory}, id = 777
{code}

The limit pushed down is Limit(fetch=[1010]); instead it should be 
Limit(fetch=[10]).
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6158) Create a mux operator for union exchange to enable two phase merging instead of foreman merging all the batches.

2018-02-14 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6158:
--

 Summary: Create a mux operator for union exchange to enable two 
phase merging instead of foreman merging all the batches.
 Key: DRILL-6158
 URL: https://issues.apache.org/jira/browse/DRILL-6158
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: Future


Consider the following simple query

{code}
select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000
{code}

The following plan is generated for this query
{code}
00-00Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 
1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 
1.56569100288E11 network, 4.64926176E7 memory}, id = 787
00-01  Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY 
zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 
rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, 
id = 786
00-02SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, 
ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 5.53005404E8 
cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 785
00-03  Limit(offset=[1000], fetch=[10]) : rowType = 
RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = 
{9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 
4.64926176E7 memory}, id = 784
00-04UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY 
a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 cpu, 
0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
01-01  SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY 
zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 
4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
782
01-02Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, 
ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 
4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
781
01-03  Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = 
RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative cost 
= {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 
4.64926176E7 memory}, id = 780
01-04HashJoin(condition=[=($0, $2)], joinType=[left]) : 
rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, 
cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 
network, 4.64926176E7 memory}, id = 779
01-06  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], 
files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, 
maprfs:/tmp/csvd1/Daamulti11random21.csv, 
maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY 
ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 
4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
01-05  BroadcastExchange : rowType = RecordType(ANY ZZ2): 
rowcount = 2641626.0, cumulative cost = {5283252.0 rows, 2.3774634E7 cpu, 0.0 
io, 3.2460300288E10 network, 0.0 memory}, id = 778
02-01Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/csvd2, numFiles=1, columns=[`ZZ2`], 
files=[maprfs:/tmp/csvd2/D222random2.csv]]]) : rowType = RecordType(ANY ZZ2): 
rowcount = 2641626.0, cumulative cost = {2641626.0 rows, 2641626.0 cpu, 0.0 io, 
0.0 network, 0.0 memory}, id = 777
{code}

With many minor fragments on a huge cluster, all the minor fragments feeding 
into the UnionExchange are merged only at the Foreman. Even though the 
UnionExchange is not a bottleneck in terms of CPU, it creates huge memory 
pressure at the Foreman.

It is observed that, mostly on large clusters with many minor fragments, this 
causes the query to run out of memory.

In this scenario it is always better to locally merge the minor fragments 
pertaining to a Drillbit and send a single stream to the Foreman. This divides 
the memory consumption across all the Drillbits and thus reduces the memory 
pressure at the Foreman.
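For illustration, a hedged sketch (not actual Drill output; reusing the 
existing UnorderedMuxExchange operator here is an assumption) of the intended 
two-phase shape:

{code}
-- sketch only: operator placement and numbering are assumptions
00-04 UnionExchange              -- the Foreman now merges one stream per Drillbit
01-01   UnorderedMuxExchange     -- local merge of all minor fragments on a Drillbit
02-01     ... per-minor-fragment operators, as in the plan above ...
{code}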

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6148) TestSortSpillWithException is sometimes failing.

2018-02-12 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361524#comment-16361524
 ] 

Hanumath Rao Maduri commented on DRILL-6148:


The PR for this JIRA is at this link:
https://github.com/apache/drill/pull/1120

> TestSortSpillWithException is sometimes failing.
> 
>
> Key: DRILL-6148
> URL: https://issues.apache.org/jira/browse/DRILL-6148
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build  Test
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> TestSortSpillWithException#testSpillLeakManaged is sometimes failing; however, 
> for some reason this is being observed in only one of my branches. 
> TestSpillLeakManaged tests for a leak when an exception is thrown while 
> ExternalSort is spilling rows. In the failing case, ExternalSort is able to 
> sort the data within the given memory and does not spill at all, so the 
> injected interruption path is never hit and no exception is thrown.
> The test case should use drill.exec.sort.external.mem_limit to force 
> ExternalSort to use as little memory as possible so that spilling is 
> exercised.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6148) TestSortSpillWithException is sometimes failing.

2018-02-12 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6148:
---
  Labels: ready-to-commit  (was: )
Reviewer: Boaz Ben-Zvi

> TestSortSpillWithException is sometimes failing.
> 
>
> Key: DRILL-6148
> URL: https://issues.apache.org/jira/browse/DRILL-6148
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build  Test
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> TestSortSpillWithException#testSpillLeakManaged is sometimes failing; however, 
> for some reason this is being observed in only one of my branches. 
> TestSpillLeakManaged tests for a leak when an exception is thrown while 
> ExternalSort is spilling rows. In the failing case, ExternalSort is able to 
> sort the data within the given memory and does not spill at all, so the 
> injected interruption path is never hit and no exception is thrown.
> The test case should use drill.exec.sort.external.mem_limit to force 
> ExternalSort to use as little memory as possible so that spilling is 
> exercised.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6148) TestSortSpillWithException is sometimes failing.

2018-02-12 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri resolved DRILL-6148.

Resolution: Fixed

> TestSortSpillWithException is sometimes failing.
> 
>
> Key: DRILL-6148
> URL: https://issues.apache.org/jira/browse/DRILL-6148
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build  Test
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Minor
> Fix For: 1.13.0
>
>
> TestSortSpillWithException#testSpillLeakManaged is sometimes failing; however, 
> for some reason this is being observed in only one of my branches. 
> TestSpillLeakManaged tests for a leak when an exception is thrown while 
> ExternalSort is spilling rows. In the failing case, ExternalSort is able to 
> sort the data within the given memory and does not spill at all, so the 
> injected interruption path is never hit and no exception is thrown.
> The test case should use drill.exec.sort.external.mem_limit to force 
> ExternalSort to use as little memory as possible so that spilling is 
> exercised.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6148) TestSortSpillWithException is sometimes failing.

2018-02-09 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6148:
---
Fix Version/s: (was: 1.12.0)
   1.13.0

> TestSortSpillWithException is sometimes failing.
> 
>
> Key: DRILL-6148
> URL: https://issues.apache.org/jira/browse/DRILL-6148
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build  Test
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Minor
> Fix For: 1.13.0
>
>
> TestSortSpillWithException#testSpillLeakManaged is sometimes failing; however, 
> for some reason this is being observed in only one of my branches. 
> TestSpillLeakManaged tests for a leak when an exception is thrown while 
> ExternalSort is spilling rows. In the failing case, ExternalSort is able to 
> sort the data within the given memory and does not spill at all, so the 
> injected interruption path is never hit and no exception is thrown.
> The test case should use drill.exec.sort.external.mem_limit to force 
> ExternalSort to use as little memory as possible so that spilling is 
> exercised.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6148) TestSortSpillWithException is sometimes failing.

2018-02-09 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6148:
--

 Summary: TestSortSpillWithException is sometimes failing.
 Key: DRILL-6148
 URL: https://issues.apache.org/jira/browse/DRILL-6148
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build  Test
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.12.0


TestSortSpillWithException#testSpillLeakManaged is sometimes failing; however, 
for some reason this is being observed in only one of my branches.

TestSpillLeakManaged tests for a leak when an exception is thrown while 
ExternalSort is spilling rows. In the failing case, ExternalSort is able to 
sort the data within the given memory and does not spill at all, so the 
injected interruption path is never hit and no exception is thrown.

The test case should use drill.exec.sort.external.mem_limit to force 
ExternalSort to use as little memory as possible so that spilling is 
exercised.
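For illustration, a hedged sketch of forcing the low-memory path (this assumes 
drill.exec.sort.external.mem_limit is a boot-time option read from 
drill-override.conf and that the value is in bytes):

{code}
# drill-override.conf (sketch): cap the external sort's memory so that even a
# small test input is forced to spill, exercising the injected exception path
drill.exec.sort.external: {
  mem_limit: 1048576   # assumed unit: bytes (1 MiB)
}
{code}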

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-01-29 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6115:
---
Attachment: Enhancing Drill to multiplex ordered merge exchanges.docx

> SingleMergeExchange is not scaling up when many minor fragments are allocated 
> for a query.
> --
>
> Key: DRILL-6115
> URL: https://issues.apache.org/jira/browse/DRILL-6115
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Attachments: Enhancing Drill to multiplex ordered merge exchanges.docx
>
>
> SingleMergeExchange is created when a global order is required in the output. 
> The following query produces the SingleMergeExchange.
> {code:java}
> 0: jdbc:drill:zk=local> explain plan for select L_LINENUMBER from 
> dfs.`/drill/tables/lineitem` order by L_LINENUMBER;
> +--+--+
> | text | json |
> +--+--+
> | 00-00 Screen
> 00-01 Project(L_LINENUMBER=[$0])
> 00-02 SingleMergeExchange(sort0=[0])
> 01-01 SelectionVectorRemover
> 01-02 Sort(sort0=[$0], dir0=[ASC])
> 01-03 HashToRandomExchange(dist0=[[$0]])
> 02-01 Scan(table=[[dfs, /drill/tables/lineitem]], 
> groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec 
> [tableName=maprfs:///drill/tables/lineitem, condition=null], 
> columns=[`L_LINENUMBER`], maxwidth=15]])
> {code}
> On a 10-node cluster, if the table is huge, Drill can spawn many minor 
> fragments which are all merged on a single node with one merge receiver. 
> This creates a lot of memory pressure on the receiver node and also an 
> execution bottleneck. To address this issue, the merge receiver should be a 
> multiphase merge receiver.
> Ideally, for a large cluster, one could introduce tree merges so that 
> merging is done in parallel. But as a first step I think it is better to 
> use the existing infrastructure for multiplexing operators to generate an 
> OrderedMux, so that all the minor fragments pertaining to one Drillbit are 
> merged locally and the merged data is sent across to the receiver operator.
> For example, on a 10-node cluster where each node processes 14 minor 
> fragments, the current code merges all 140 minor fragments at the receiver, 
> while the proposed version merges in two levels: a parallel 14-way merge in 
> each Drillbit, followed by a 10-way merge of the node streams at the 
> receiver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-01-29 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6115:
--

 Summary: SingleMergeExchange is not scaling up when many minor 
fragments are allocated for a query.
 Key: DRILL-6115
 URL: https://issues.apache.org/jira/browse/DRILL-6115
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.12.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Attachments: Enhancing Drill to multiplex ordered merge exchanges.docx

SingleMergeExchange is created when a global order is required in the output. 
The following query produces the SingleMergeExchange.
{code:java}
0: jdbc:drill:zk=local> explain plan for select L_LINENUMBER from 
dfs.`/drill/tables/lineitem` order by L_LINENUMBER;
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(L_LINENUMBER=[$0])
00-02 SingleMergeExchange(sort0=[0])
01-01 SelectionVectorRemover
01-02 Sort(sort0=[$0], dir0=[ASC])
01-03 HashToRandomExchange(dist0=[[$0]])
02-01 Scan(table=[[dfs, /drill/tables/lineitem]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=maprfs:///drill/tables/lineitem, 
condition=null], columns=[`L_LINENUMBER`], maxwidth=15]])
{code}

On a 10-node cluster, if the table is huge, Drill can spawn many minor 
fragments which are all merged on a single node with one merge receiver. This 
creates a lot of memory pressure on the receiver node and also an execution 
bottleneck. To address this issue, the merge receiver should be a multiphase 
merge receiver.

Ideally, for a large cluster, one could introduce tree merges so that merging 
is done in parallel. But as a first step I think it is better to use the 
existing infrastructure for multiplexing operators to generate an OrderedMux, 
so that all the minor fragments pertaining to one Drillbit are merged locally 
and the merged data is sent across to the receiver operator.

For example, on a 10-node cluster where each node processes 14 minor 
fragments, the current code merges all 140 minor fragments at the receiver, 
while the proposed version merges in two levels: a parallel 14-way merge in 
each Drillbit, followed by a 10-way merge of the node streams at the 
receiver, as sketched below.
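
To make the reduced fan-in concrete, here is a minimal sketch of a two-level 
merge in plain Java collections. It illustrates only the idea, not Drill's 
actual OrderedMux operator; the class and method names are invented for the 
example.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class TwoLevelMergeSketch {

  /** Classic k-way merge of sorted inputs using a min-heap of (value, input). */
  static List<Integer> kWayMerge(List<Iterator<Integer>> inputs) {
    PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int i = 0; i < inputs.size(); i++) {
      if (inputs.get(i).hasNext()) {
        heap.add(new int[] {inputs.get(i).next(), i});
      }
    }
    List<Integer> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();           // smallest value across all inputs
      merged.add(top[0]);
      if (inputs.get(top[1]).hasNext()) {
        heap.add(new int[] {inputs.get(top[1]).next(), top[1]});
      }
    }
    return merged;
  }

  /**
   * Two-level merge: each "Drillbit" first merges its own sorted minor
   * fragments (in the real system this level runs in parallel across nodes),
   * then the receiver merges one pre-merged stream per node. With 10 nodes
   * of 14 fragments each, the receiver's fan-in drops from 140 to 10.
   */
  static List<Integer> mergeCluster(List<List<List<Integer>>> fragmentsPerNode) {
    List<Iterator<Integer>> nodeStreams = new ArrayList<>();
    for (List<List<Integer>> nodeFragments : fragmentsPerNode) {
      List<Iterator<Integer>> locals = new ArrayList<>();
      nodeFragments.forEach(f -> locals.add(f.iterator()));
      nodeStreams.add(kWayMerge(locals).iterator());   // level 1: local merge
    }
    return kWayMerge(nodeStreams);                     // level 2: at receiver
  }
}
{code}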





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5851) Empty table during a join operation with a non empty table produces cast exception

2018-01-22 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-5851:
---
Labels: ready-to-commit  (was: )

> Empty table during a join operation with a non empty table produces cast 
> exception 
> ---
>
> Key: DRILL-5851
> URL: https://issues.apache.org/jira/browse/DRILL-5851
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> A Hash Join between an empty table and a non-empty table throws an 
> exception: 
> {code} 
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
> between 1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right 
> type: INT. Add explicit casts to avoid this error
> {code}
> Here is an example query with which it is reproducible.
> {code}
> select * from cp.`sample-data/nation.parquet` nation left outer join 
> dfs.tmp.`2.csv` as two on two.a = nation.`N_COMMENT`;
> {code}
> The file 2.csv is completely empty (i.e., it contains not even header info).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5926) TestValueVector tests fail sporadically

2017-11-03 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-5926:
---
Description: 
As reported by [~Paul.Rogers], the following tests fail sporadically with an 
out-of-memory exception:

* TestValueVector.testFixedVectorReallocation
* TestValueVector.testVariableVectorReallocation



  was:
Ass reported by [~Paul.Rogers]. The following tests fail sporadically with out 
of memory exception:

* TestValueVector.testFixedVectorReallocation
* TestValueVector.testVariableVectorReallocation




> TestValueVector tests fail sporadically
> ---
>
> Key: DRILL-5926
> URL: https://issues.apache.org/jira/browse/DRILL-5926
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> As reported by [~Paul.Rogers], the following tests fail sporadically with an 
> out-of-memory exception:
> * TestValueVector.testFixedVectorReallocation
> * TestValueVector.testVariableVectorReallocation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5864) Selecting a non-existing field from a MapR-DB JSON table fails with NPE

2017-11-03 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238136#comment-16238136
 ] 

Hanumath Rao Maduri commented on DRILL-5864:


Yes, this is the pull request that needs to be committed.

> Selecting a non-existing field from a MapR-DB JSON table fails with NPE
> ---
>
> Key: DRILL-5864
> URL: https://issues.apache.org/jira/browse/DRILL-5864
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Abhishek Girish
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Attachments: OrderByNPE.log, OrderByNPE2.log
>
>
> Query 1
> {code}
> > select C_FIRST_NAME,C_BIRTH_COUNTRY,C_BIRTH_YEAR,C_BIRTH_MONTH,C_BIRTH_DAY 
> > from customer ORDER BY C_BIRTH_COUNTRY ASC, C_FIRST_NAME ASC LIMIT 10;
> Error: SYSTEM ERROR: NullPointerException
>   (java.lang.NullPointerException) null
> org.apache.drill.exec.record.SchemaUtil.coerceContainer():176
> 
> org.apache.drill.exec.physical.impl.xsort.managed.BufferedBatches.convertBatch():124
> org.apache.drill.exec.physical.impl.xsort.managed.BufferedBatches.add():90
> org.apache.drill.exec.physical.impl.xsort.managed.SortImpl.addBatch():265
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():421
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():357
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():302
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748 (state=,code=0)
> {code}
> Plan
> {code}
> 00-00Screen
> 00-01  Project(C_FIRST_NAME=[$0], C_BIRTH_COUNTRY=[$1], 
> C_BIRTH_YEAR=[$2], C_BIRTH_MONTH=[$3], C_BIRTH_DAY=[$4])
> 00-02SelectionVectorRemover
> 00-03  Limit(fetch=[10])
> 00-04Limit(fetch=[10])
> 00-05  SelectionVectorRemover
> 00-06Sort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
> 00-07  Scan(groupscan=[JsonTableGroupScan 
> [ScanSpec=JsonScanSpec 
> 

[jira] [Assigned] (DRILL-5869) Empty maps not handled

2017-10-24 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reassigned DRILL-5869:
--

Assignee: Hanumath Rao Maduri

> Empty maps not handled 
> ---
>
> Key: DRILL-5869
> URL: https://issues.apache.org/jira/browse/DRILL-5869
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Hanumath Rao Maduri
>
> Consider the following JSON:
> {code}
> {a:{}}
> {code}
> A query on the column 'a' throws an NPE:
> {code}
> select a from temp.json;
> {code}
> Stack trace -
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 0:0
> [Error Id: 7f81fa02-4b20-4401-9d18-bd901653d11d on pns182.qa.lab:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_144]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_144]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_144]
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.test.generated.ProjectorGen0.setup(ProjectorTemplate.java:91)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchemaFromInput(ProjectRecordBatch.java:497)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:505)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:82)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) 
> ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) 
> ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[na:1.8.0_144]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_144]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>  ~[hadoop-common-2.7.0-mapr-1607.jar:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   ... 4 common frames omitted
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5878) TableNotFound exception is being reported for a wrong storage plugin.

2017-10-16 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-5878:
--

 Summary: TableNotFound exception is being reported for a wrong 
storage plugin.
 Key: DRILL-5878
 URL: https://issues.apache.org/jira/browse/DRILL-5878
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 1.11.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
Priority: Minor
 Fix For: 1.12.0


Drill reports a TableNotFound exception when a query references a 
non-existent storage plugin. Consider the following query, where 
employee.json is queried using the cp plugin.
{code}
0: jdbc:drill:zk=local> select * from cp.`employee.json` limit 10;
| employee_id | full_name         | first_name | last_name | position_id | position_title         | store_id | department_id | birth_date | hire_date             | salary  | supervisor_id | education_level  | marital_status | gender | management_role   |
| 1           | Sheri Nowmer      | Sheri      | Nowmer    | 1           | President              | 0        | 1             | 1961-08-26 | 1994-12-01 00:00:00.0 | 80000.0 | 0             | Graduate Degree  | S              | F      | Senior Management |
| 2           | Derrick Whelply   | Derrick    | Whelply   | 2           | VP Country Manager     | 0        | 1             | 1915-07-03 | 1994-12-01 00:00:00.0 | 40000.0 | 1             | Graduate Degree  | M              | M      | Senior Management |
| 4           | Michael Spence    | Michael    | Spence    | 2           | VP Country Manager     | 0        | 1             | 1969-06-20 | 1998-01-01 00:00:00.0 | 40000.0 | 1             | Graduate Degree  | S              | M      | Senior Management |
| 5           | Maya Gutierrez    | Maya       | Gutierrez | 2           | VP Country Manager     | 0        | 1             | 1951-05-10 | 1998-01-01 00:00:00.0 | 35000.0 | 1             | Bachelors Degree | M              | F      | Senior Management |
| 6           | Roberta Damstra   | Roberta    | Damstra   | 3           | VP Information Systems | 0        | 2             | 1942-10-08 | 1994-12-01 00:00:00.0 | 25000.0 | 1             | Bachelors Degree | M              | F      | Senior Management |
| 7           | Rebecca Kanagaki  | Rebecca    | Kanagaki  | 4           | VP Human Resources     | 0        | 3             | 1949-03-27 | 1994-12-01 00:00:00.0 | 15000.0 | 1             | Bachelors Degree | M              | F      | Senior Management |
| 8           | Kim Brunner       | Kim        | Brunner   | 11          | Store Manager          | 9        | 11            | 1922-08-10 | 1998-01-01 00:00:00.0 | 10000.0 | 5             | Bachelors Degree | S              | F      | Store Management  |
| 9           | Brenda Blumberg   | Brenda     | Blumberg  | 11          | Store Manager          | 21       | 11            | 1979-06-23 | 1998-01-01 00:00:00.0 | 17000.0 | 5             | Graduate Degree  | M              | F      | Store Management  |
| 10          | Darren Stanz      | Darren     | Stanz     | 5           | VP Finance             | 0        | 5             | 1949-08-26 | 1994-12-01 00:00:00.0 | 50000.0 | 1             | Partial College  | M              | M      | Senior Management |
| 11          | Jonathan Murraiin | Jonathan   | Murraiin  | 11          | Store Manager          | 1        | 11            | 1967-06-20 | 1998-01-01 00:00:00.0 | 15000.0 | 5             | Graduate Degree  | S              | M      | Store Management  |
{code}

However, if cp1 is used instead of cp, Drill reports a TableNotFound 
exception instead of complaining about the unknown storage plugin.
{code}
0: jdbc:drill:zk=local> select * from cp1.`employee.json` limit 10;
Oct 16, 2017 1:40:02 PM org.apache.calcite.sql.validate.SqlValidatorException 

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 
'cp1.employee.json' not found
Oct 16, 2017 1:40:02 PM 

[jira] [Commented] (DRILL-5851) Empty table during a join operation with a non empty table produces cast exception

2017-10-06 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195442#comment-16195442
 ] 

Hanumath Rao Maduri commented on DRILL-5851:


Even though it looks like a corner case, it can affect readers that push 
data down to the sources.
Ideally, the HashJoin operator should handle these scenarios.

> Empty table during a join operation with a non empty table produces cast 
> exception 
> ---
>
> Key: DRILL-5851
> URL: https://issues.apache.org/jira/browse/DRILL-5851
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>
> A Hash Join between an empty table and a non-empty table throws an 
> exception: 
> {code} 
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
> between 1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right 
> type: INT. Add explicit casts to avoid this error
> {code}
> Here is an example query with which it is reproducible.
> {code}
> select * from cp.`sample-data/nation.parquet` nation left outer join 
> dfs.tmp.`2.csv` as two on two.a = nation.`N_COMMENT`;
> {code}
> The file 2.csv is completely empty (i.e., it contains not even header info).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5851) Empty table during a join operation with a non empty table produces cast exception

2017-10-06 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-5851:
--

 Summary: Empty table during a join operation with a non empty 
table produces cast exception 
 Key: DRILL-5851
 URL: https://issues.apache.org/jira/browse/DRILL-5851
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri


A Hash Join between an empty table and a non-empty table throws an 
exception: 
{code} 
Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
between 1. Numeric data
 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right 
type: INT. Add explicit casts to avoid this error
{code}

Here is an example query with which it is reproducible.

{code}
select * from cp.`sample-data/nation.parquet` nation left outer join 
dfs.tmp.`2.csv` as two on two.a = nation.`N_COMMENT`;
{code}

The file 2.csv is completely empty (i.e., it contains not even header info).
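
Until the operator handles this case, the error message itself points at a 
workaround: adding an explicit cast on the join key. Below is a minimal, 
untested sketch of that workaround through Drill's JDBC driver; the cast 
target and the connection URL follow the examples in this report.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EmptyCsvJoinWorkaround {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement();
         // Casting the CSV-side key to VARCHAR keeps both join keys the
         // same type even when the empty file yields an INT column.
         ResultSet rs = stmt.executeQuery(
             "select * from cp.`sample-data/nation.parquet` nation "
               + "left outer join dfs.tmp.`2.csv` as two "
               + "on cast(two.a as varchar(100)) = nation.`N_COMMENT`")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));  // first projected column
      }
    }
  }
}
{code}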





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)