date:20161221

[jira] [Commented] (DRILL-5068) Add a new system table for completed profiles

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769388#comment-15769388
 ] 

ASF GitHub Bot commented on DRILL-5068:
---

GitHub user zbdzzg reopened a pull request:

https://github.com/apache/drill/pull/668

DRILL-5068: Add a new system table for completed profiles

Add table "sys.profiles" for completed queries.

The following fields are added:

1. queryID (String)
2. time (Timestamp)
3. latency (long)
4. user (String)
5. query (String)
6. state (String)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zbdzzg/drill profile_query

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #668


commit e05a999dc8ace315966cbbdb72b3e52d3d956bbd
Author: hongze.zhz 
Date:   2016-12-22T07:47:42Z

DRILL-5068: Add a new system table for completed profiles




> Add a new system table for completed profiles
> -
>
> Key: DRILL-5068
> URL: https://issues.apache.org/jira/browse/DRILL-5068
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Information Schema
>Affects Versions: 1.8.0
> Environment: Fedora 25
> OpenJDK 8
> Firefox 50.0
>Reporter: Hongze Zhang
>Assignee: Hongze Zhang
> Fix For: Future
>
>
> Hi,
> Currently, the profile page in the UI is not detailed enough for some 
> complicated uses (e.g. showing all failed queries from the last three days); 
> we can only access the latest 100 query profiles on this page.
> We may sometimes need a dedicated system table for querying completed profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-5068) Add a new system table for completed profiles

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769381#comment-15769381
 ] 

ASF GitHub Bot commented on DRILL-5068:
---

GitHub user zbdzzg closed the pull request at:

https://github.com/apache/drill/pull/668



[jira] [Commented] (DRILL-4602) Avro files dont work if the union format is ["some-type", "null"]

2016-12-21 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769345#comment-15769345
 ] 

Khurram Faraaz commented on DRILL-4602:
---

[~chr1st1anh] you should create a pull request; someone here will review 
and merge your fix if all tests run clean.

> Avro files dont work if the union format is ["some-type", "null"]
> -
>
> Key: DRILL-4602
> URL: https://issues.apache.org/jira/browse/DRILL-4602
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Christian
>  Labels: easyfix, patch
> Fix For: Future
>
> Attachments: DRILL-4602.patch
>
>
> An Avro file generated by a different system (e.g. Spark) can have a slightly 
> different union format that Drill does not understand. For example, 
> ["some-type", "null"] causes an error, whereas ["null", "some-type"] 
> works.
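
One way to avoid the order dependence described above is to look for the non-null branch of a nullable union wherever it sits. The following is only a minimal sketch in plain Java: it models a union as a list of type names, and `UnionBranches` and `nonNullBranch` are hypothetical names that do not come from Drill or the Avro library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: models an Avro union as a list of type names.
// It is not part of Drill or the Avro library.
class UnionBranches {

    // Returns the non-null branch of a two-branch nullable union,
    // regardless of whether "null" is listed first or second.
    static String nonNullBranch(List<String> union) {
        if (union.size() != 2 || !union.contains("null")) {
            throw new IllegalArgumentException("not a nullable union: " + union);
        }
        return "null".equals(union.get(0)) ? union.get(1) : union.get(0);
    }

    public static void main(String[] args) {
        // Both orderings resolve to the same branch.
        System.out.println(nonNullBranch(Arrays.asList("null", "some-type")));
        System.out.println(nonNullBranch(Arrays.asList("some-type", "null")));
    }
}
```

A reader written this way treats both orderings identically; whether the attached patch takes this approach is not stated in the issue.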




[jira] [Commented] (DRILL-5068) Add a new system table for completed profiles

2016-12-21 Thread Hongze Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769033#comment-15769033
 ] 

Hongze Zhang commented on DRILL-5068:
-

[~khfaraaz]
Hi, is this useful for the next version of Drill? Thanks!


[jira] [Closed] (DRILL-5028) Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS.

2016-12-21 Thread Hongze Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongze Zhang closed DRILL-5028.
---
Resolution: Later

> Opening profiles page from web ui gets very slow when a lot of history files 
> have been stored in HDFS or Local FS.
> --
>
> Key: DRILL-5028
> URL: https://issues.apache.org/jira/browse/DRILL-5028
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Hongze Zhang
>Priority: Minor
> Fix For: Future
>
>
> We have a Drill cluster with 20+ nodes, and we store all history profiles in 
> HDFS. Without periodic cleanup of HDFS, the profiles page gets slower as 
> more queries are served.
> The code in LocalPersistentStore.java uses fs.list(false, basePath) to 
> fetch the latest 100 history profiles by default. I guess this operation 
> blocks the page load (millions of small files can be stored in basePath); 
> maybe we can find another way to reach the same goal.




[jira] [Closed] (DRILL-5054) Provide jquery-ui and jquery-dataTables locally

2016-12-21 Thread Hongze Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongze Zhang closed DRILL-5054.
---
Resolution: Later

> Provide jquery-ui and jquery-dataTables locally
> ---
>
> Key: DRILL-5054
> URL: https://issues.apache.org/jira/browse/DRILL-5054
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.8.0
> Environment: Fedora 24 / OpenJDK 8 / Firefox 50.0
>Reporter: Hongze Zhang
>Priority: Minor
> Fix For: Future
>
>
> Hi,
> Currently, Drill uses a CDN to serve the source files of jquery-ui and 
> jquery-dataTables. This is OK in most cases, but it does not work in an 
> isolated environment.
> This patch adds these files so that Drill works on an intranet.




[jira] [Assigned] (DRILL-5068) Add a new system table for completed profiles

2016-12-21 Thread Hongze Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongze Zhang reassigned DRILL-5068:
---

Assignee: Hongze Zhang


[jira] [Updated] (DRILL-5068) Add a new system table for completed profiles

2016-12-21 Thread Hongze Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongze Zhang updated DRILL-5068:

Assignee: (was: Sudheesh Katkam)
Reviewer: Khurram Faraaz


[jira] [Updated] (DRILL-5068) Add a new system table for completed profiles

2016-12-21 Thread Hongze Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongze Zhang updated DRILL-5068:

   Assignee: Sudheesh Katkam
Component/s: (was: Metadata)
 Storage - Information Schema


[jira] [Commented] (DRILL-5125) Provide option to use generic code for sv remover

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768855#comment-15768855
 ] 

ASF GitHub Bot commented on DRILL-5125:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/704

DRILL-5125: Provide option to use generic code for sv remover

Performance tests showed that, for queries with a large number of
columns, it is faster to use a “generic” implementation of the
selection vector remover “copier” than to use a generated version.

This PR provides "generic" versions of the SV2 and SV4 copiers
used by the selection vector remover. The generic forms are
enabled using a new boot-time config parameter that is disabled
by default (preserving the traditional generated code.)

The generic form relies on a "virtual function" (really, just a
plain Java function) defined in the base ValueVector class and
implemented by each concrete vector: both the pre-defined and
generated forms. This form "does the right thing" for the copy
operation so that we don't need to generate code just to handle
the method dispatch operation (which Java does quite well on its
own.)

A unit test validates that the generic form works by running
the existing SV remover tests with the generic option turned on.

See DRILL-5125 for details.

Add test
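
The generic approach described above can be sketched roughly as follows. This is a simplified illustration, not Drill's implementation: `Vec`, `IntVec`, `StrVec`, `copyEntry`, and `GenericCopierSketch` are hypothetical stand-ins, and Drill's real ValueVector API differs. The point is only that ordinary method dispatch on a base class lets one hand-written loop serve any number of columns without generating code per query.

```java
class GenericCopierSketch {

    // Hypothetical, simplified stand-in for a value vector base class.
    abstract static class Vec {
        // The "virtual function": copy one entry from a source vector
        // of the same concrete type into this vector.
        abstract void copyEntry(int toIndex, Vec from, int fromIndex);
    }

    static class IntVec extends Vec {
        final int[] data;
        IntVec(int size) { data = new int[size]; }
        @Override
        void copyEntry(int toIndex, Vec from, int fromIndex) {
            data[toIndex] = ((IntVec) from).data[fromIndex];
        }
    }

    static class StrVec extends Vec {
        final String[] data;
        StrVec(int size) { data = new String[size]; }
        @Override
        void copyEntry(int toIndex, Vec from, int fromIndex) {
            data[toIndex] = ((StrVec) from).data[fromIndex];
        }
    }

    // Copies the rows named by the selection vector from every input
    // column to the matching output column using plain method dispatch.
    static void copy(int[] sv2, Vec[] in, Vec[] out) {
        for (int row = 0; row < sv2.length; row++) {
            for (int col = 0; col < in.length; col++) {
                out[col].copyEntry(row, in[col], sv2[row]);
            }
        }
    }

    public static void main(String[] args) {
        IntVec ids = new IntVec(4);
        StrVec names = new StrVec(4);
        for (int i = 0; i < 4; i++) {
            ids.data[i] = i * 10;
            names.data[i] = "row" + i;
        }
        int[] sv2 = {1, 3}; // rows that survived the filter
        IntVec outIds = new IntVec(sv2.length);
        StrVec outNames = new StrVec(sv2.length);
        copy(sv2, new Vec[] {ids, names}, new Vec[] {outIds, outNames});
        System.out.println(outIds.data[0] + " " + outNames.data[1]);
    }
}
```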

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5125

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #704


commit ba3a38a403b140149d1605decefae088765ead56
Author: Paul Rogers 
Date:   2016-12-12T18:06:43Z

DRILL-5125: Provide option to use generic code for sv remover

Performance tests showed that, for queries with a large number of
columns, it is faster to use a “generic” implementation of the
selection vector remover “copier” than to use a generated version.

This PR provides "generic" versions of the SV2 and SV4 copiers
used by the selection vector remover. The generic forms are
enabled using a new boot-time config parameter that is disabled
by default (preserving the traditional generated code.)

The generic form relies on a "virtual function" (really, just a
plain Java function) defined in the base ValueVector class and
implemented by each concrete vector: both the pre-defined and
generated forms. This form "does the right thing" for the copy
operation so that we don't need to generate code just to handle
the method dispatch operation (which Java does quite well on its
own.)

A unit test validates that the generic form works by running
the existing SV remover tests with the generic option turned on.

See DRILL-5125 for details.

Add test




> Provide option to use generic code for sv remover
> -
>
> Key: DRILL-5125
> URL: https://issues.apache.org/jira/browse/DRILL-5125
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Consider a non-typical Drill query: one with 6000 rows but 243 fields. 
> Consider this query:
> {code}
> select * from (select *, row_number() over(order by somedate) as rn from 
> dfs.`/some/path/data.json`) where rn=10
> {code}
> This produces a query with the following structure:
> {code}
> 00-00    Screen
> 00-01      ProjectAllowDup(*=[$0], rn=[$1])
> 00-02        Project(T0¦¦*=[$0], w0$o0=[$2])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($2, 10)])
> 00-05              Window(window#0=[window(partition {} order by [1] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
> 00-06                SelectionVectorRemover
> 00-07                  Sort(sort0=[$1], dir0=[ASC])
> 00-08                    Project(T0¦¦*=[$0], validitydate=[$1])
> 00-09                      Scan(groupscan=...)
> {code}
> Instrumenting the code to measure compile time, two “long poles” stood out:
> {code}
> Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
> Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
> {code}
> Much of the initial run time of 5578 ms is taken up in compiling two classes 
> (2159 ms).
> The classes themselves are very simple: create member variables for 486 
> vectors (2 x column count), and call a method on each to do the copy. The 
> only type-specific work is the member variable

[jira] [Created] (DRILL-5152) Enhance the mock data source: better data, SQL access

2016-12-21 Thread Paul Rogers (JIRA)

Paul Rogers created DRILL-5152:
--

 Summary: Enhance the mock data source: better data, SQL access
 Key: DRILL-5152
 URL: https://issues.apache.org/jira/browse/DRILL-5152
 Project: Apache Drill
  Issue Type: Improvement
  Components: Tools, Build & Test
Affects Versions: 1.9.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor


Drill provides a mock data storage engine that generates random data. The mock 
engine is used in some older unit tests that need a volume of data, but that 
are not too particular about the details of the data.

The mock data source continues to have use even for modern tests. For example, 
the work in the external storage batch requires tests with varying amounts of 
data, but the exact form of the data is not important, just the quantity. For 
example, if we want to ensure that spilling happens at various trigger points, 
we need to read the right amount of data for that trigger.

The existing mock data source has two limitations:

1. It generates only "black/white" (alternating) values, which is awkward for 
use in sorting.
2. The mock generator is accessible only from a physical plan, but not from SQL 
queries.

This enhancement proposes to fix both limitations:

1. Generate a uniform, randomly distributed set of values.
2. Provide an encoding that lets a SQL query specify the data to be generated.

Example SQL query:
{code}
SELECT id_i, name_s50 FROM `mock`.employee_10K;
{code}

The above says to generate two fields: INTEGER (the "_i" suffix) and 
VARCHAR(50) (the "_s50" suffix); and to generate 10,000 rows (the "_10K" suffix 
on the table).
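
To make the proposed naming convention concrete, here is a minimal sketch of how such suffixes could be decoded. It covers only the three forms mentioned above ("_i", "_s50", "_10K"); the class `MockNameSketch` and its methods are hypothetical, and the encoding that Drill actually implements may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical decoder for the suffix convention described in the issue.
class MockNameSketch {
    private static final Pattern INT_COL = Pattern.compile(".*_i");
    private static final Pattern VARCHAR_COL = Pattern.compile(".*_s(\\d+)");
    private static final Pattern TABLE = Pattern.compile(".*_(\\d+)K");

    // Maps a column name such as "id_i" or "name_s50" to a SQL type name.
    static String columnType(String column) {
        Matcher m = VARCHAR_COL.matcher(column);
        if (m.matches()) {
            return "VARCHAR(" + m.group(1) + ")";
        }
        if (INT_COL.matcher(column).matches()) {
            return "INTEGER";
        }
        throw new IllegalArgumentException("unknown type suffix: " + column);
    }

    // Maps a table name such as "employee_10K" to a row count.
    static int rowCount(String table) {
        Matcher m = TABLE.matcher(table);
        if (!m.matches()) {
            throw new IllegalArgumentException("unknown row-count suffix: " + table);
        }
        return Integer.parseInt(m.group(1)) * 1000;
    }

    public static void main(String[] args) {
        System.out.println(columnType("id_i"));
        System.out.println(columnType("name_s50"));
        System.out.println(rowCount("employee_10K"));
    }
}
```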




[jira] [Commented] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768830#comment-15768830
 ] 

ASF GitHub Bot commented on DRILL-5104:
---

GitHub user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/703#discussion_r93558749
  
--- Diff: 
logical/src/main/java/org/apache/drill/common/logical/PlanProperties.java ---
@@ -112,8 +121,13 @@ public PlanPropertiesBuilder generator(Generator 
generator) {
   return this;
 }
 
+public PlanPropertiesBuilder generator(boolean hasResourcePlan) {
--- End diff --

Fixed.


> Foreman sets external sort memory allocation even for a physical plan
> -
>
> Key: DRILL-5104
> URL: https://issues.apache.org/jira/browse/DRILL-5104
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>
> Consider the (disabled) unit test 
> {{TestSimpleExternalSort.outOfMemoryExternalSort}} which uses the physical 
> plan {{xsort/oom_sort_test.json}} that contains a setting for the amount of 
> memory to allocate:
> {code}
>{
> ...
> pop:"external-sort",
> ...
> initialAllocation: 100,
> maxAllocation: 3000
> },
> {code}
> When run, the amount of memory is set to 715827882. The reason is that code 
> was added to {{Foreman}} to compute the memory to allocate to the external 
> sort:
> {code}
>   private void runPhysicalPlan(final PhysicalPlan plan) throws 
> ExecutionSetupException {
> validatePlan(plan);
> MemoryAllocationUtilities.setupSortMemoryAllocations(plan, queryContext);
> {code}
> The problem is that a physical plan should execute as provided to enable 
> detailed testing.
> To solve this problem, move the sort memory setup to the path taken by SQL 
> queries, but not via physical plans.
> This change is necessary to re-enable the previously-disabled external sort 
> tests.




[jira] [Commented] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768813#comment-15768813
 ] 

ASF GitHub Bot commented on DRILL-5104:
---

GitHub user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/703#discussion_r93556996
  
--- Diff: 
logical/src/main/java/org/apache/drill/common/logical/PlanProperties.java ---
@@ -112,8 +121,13 @@ public PlanPropertiesBuilder generator(Generator 
generator) {
   return this;
 }
 
+public PlanPropertiesBuilder generator(boolean hasResourcePlan) {
--- End diff --

The method should not be named **generator**; the name should indicate that 
the plan has a resource plan.



[jira] [Commented] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768720#comment-15768720
 ] 

ASF GitHub Bot commented on DRILL-5104:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/703

DRILL-5104: Foreman should not set sort memory for a physical plan

Physical plans include a plan for memory allocations. However, the code
path in Foreman replans external sort memory, even for a physical plan.
This makes it impossible to use a physical plan to test memory
configuration.

This change avoids changing memory settings in a physical plan, while
preserving the adjustments for logical plans or SQL queries.

Revised to put a property in the plan itself. Old plans, and those
generated from SQL, will have memory allocations applied. Plans
marked as already "resource management" planned will be used as-is.

Includes a unit test that demonstrates the new behavior.
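
The behavior described above amounts to a guard around the memory setup. The sketch below is only an illustration under assumed names: `Plan`, `hasResourcePlan`, `sortMaxAllocation`, and `setupSortMemory` are hypothetical and are not the identifiers used in the pull request. The numbers are the ones quoted in the issue description (a plan-specified maximum of 3000 and a computed value of 715827882).

```java
class SortMemorySketch {

    // Hypothetical, minimal model of a plan carrying a sort memory setting.
    static class Plan {
        final boolean hasResourcePlan; // true if memory was already planned
        long sortMaxAllocation;

        Plan(boolean hasResourcePlan, long sortMaxAllocation) {
            this.hasResourcePlan = hasResourcePlan;
            this.sortMaxAllocation = sortMaxAllocation;
        }
    }

    // Applies the computed allocation unless the plan is marked as already
    // resource-planned, in which case the plan is used as-is.
    static void setupSortMemory(Plan plan, long computedAllocation) {
        if (plan.hasResourcePlan) {
            return;
        }
        plan.sortMaxAllocation = computedAllocation;
    }

    public static void main(String[] args) {
        Plan handWritten = new Plan(true, 3000L);
        Plan fromSql = new Plan(false, 3000L);
        setupSortMemory(handWritten, 715827882L);
        setupSortMemory(fromSql, 715827882L);
        System.out.println(handWritten.sortMaxAllocation);
        System.out.println(fromSql.sortMaxAllocation);
    }
}
```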

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5104

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #703


commit 25a7f9ed45b97f9be5971fa979c1b408a6311d8e
Author: Paul Rogers 
Date:   2016-12-13T22:36:42Z

DRILL-5104: Foreman should not set external sort memory for a physical plan

Physical plans include a plan for memory allocations. However, the code
path in Foreman replans external sort memory, even for a physical plan.
This makes it impossible to use a physical plan to test memory
configuration.

This change avoids changing memory settings in a physical plan, while
preserving the adjustments for logical plans or SQL queries.

Revised to put a property in the plan itself. Old plans, and those
generated from SQL, will have memory allocations applied. Plans
marked as already "resource management" planned will be used as-is.

Includes a unit test that demonstrates the new behavior.





[jira] [Commented] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe

2016-12-21 Thread Chunhui Shi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768644#comment-15768644
 ] 

Chunhui Shi commented on DRILL-5151:


The fix is on the Calcite side.

> ConventionTraitDef.plannerConversionMap is not thread safe
> --
>
> Key: DRILL-5151
> URL: https://issues.apache.org/jira/browse/DRILL-5151
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> We use the static instance ConventionTraitDef.INSTANCE globally, and the 
> plannerConversionMap (a WeakHashMap) defined in the ConventionTraitDef class 
> is not thread-safe. The data in the map could become corrupted and cause an 
> infinite loop or other data errors.
>   
>   private final WeakHashMap
>   plannerConversionMap =
>   new WeakHashMap();




[jira] [Updated] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe

2016-12-21 Thread Chunhui Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5151:
---
Priority: Major  (was: Critical)


[jira] [Created] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe

2016-12-21 Thread Chunhui Shi (JIRA)

Chunhui Shi created DRILL-5151:
--

 Summary: ConventionTraitDef.plannerConversionMap is not thread safe
 Key: DRILL-5151
 URL: https://issues.apache.org/jira/browse/DRILL-5151
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Chunhui Shi
Assignee: Chunhui Shi
Priority: Critical


We use the static instance ConventionTraitDef.INSTANCE globally, and the 
plannerConversionMap (a WeakHashMap) defined in the ConventionTraitDef class is 
not thread-safe. The data in the map could become corrupted and cause an 
infinite loop or other data errors.
  
  private final WeakHashMap
  plannerConversionMap =
  new WeakHashMap();
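
For illustration only, one common way to make a shared WeakHashMap safe for concurrent use in plain Java is to wrap it with `Collections.synchronizedMap`. This sketch is not the fix applied in Calcite, which the issue does not describe; the class and method names are hypothetical.

```java
import java.lang.ref.Reference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

class SynchronizedWeakMapSketch {

    // Runs several threads that insert into one shared map and returns the
    // final size. Each individual call on the wrapper is synchronized;
    // iteration and compound operations would still need explicit locking.
    static int concurrentPuts(int threads, int perThread) {
        Map<Object, Integer> map =
            Collections.synchronizedMap(new WeakHashMap<Object, Integer>());

        // Keep strong references to the keys so that no entry is cleared
        // by the garbage collector while the demo runs.
        Object[] keys = new Object[threads * perThread];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = new Object();
        }

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int start = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = start; i < start + perThread; i++) {
                    map.put(keys[i], i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int size = map.size();
        Reference.reachabilityFence(keys); // keys stay reachable until here
        return size;
    }

    public static void main(String[] args) {
        System.out.println(concurrentPuts(4, 1000));
    }
}
```

With an unsynchronized WeakHashMap the same workload can lose entries or corrupt the internal table, which is the kind of failure the issue reports.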




[jira] [Commented] (DRILL-5150) JDBC connections cause drillbit leaks resources and eventually JVM crashes

2016-12-21 Thread Chun Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768354#comment-15768354
 ] 

Chun Chang commented on DRILL-5150:
---

Forgot to mention this happened with impersonation enabled. 

> JDBC connections cause drillbit leaks resources and eventually JVM crashes
> --
>
> Key: DRILL-5150
> URL: https://issues.apache.org/jira/browse/DRILL-5150
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.9.0
>Reporter: Chun Chang
> Attachments: hs_err_pid22724.log
>
>
> Stress-test JDBC connections by repeatedly connecting and disconnecting. Very 
> soon, the drillbit crashes due to resource leaks. This was observed with the 
> Apache Drill JDBC driver. Testing with a third-party driver did not cause the 
> crash. Will upload the JVM dump.




[jira] [Updated] (DRILL-5150) JDBC connections cause drillbit leaks resources and eventually JVM crashes

2016-12-21 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-5150:
--
Attachment: hs_err_pid22724.log


[jira] [Created] (DRILL-5150) JDBC connections cause drillbit leaks resources and eventually JVM crashes

2016-12-21 Thread Chun Chang (JIRA)

Chun Chang created DRILL-5150:
-

 Summary: JDBC connections cause drillbit leaks resources and 
eventually JVM crashes
 Key: DRILL-5150
 URL: https://issues.apache.org/jira/browse/DRILL-5150
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Affects Versions: 1.9.0
Reporter: Chun Chang


Stress-test JDBC connections by repeatedly connecting and disconnecting. Very 
soon, the drillbit crashes due to resource leaks. This was observed with the 
Apache Drill JDBC driver. Testing with a third-party driver did not cause the 
crash. Will upload the JVM dump.




[jira] [Created] (DRILL-5149) Planner Optimization : Filter should get pushed into the sub-query

2016-12-21 Thread Rahul Challapalli (JIRA)

Rahul Challapalli created DRILL-5149:


 Summary: Planner Optimization : Filter should get pushed into the 
sub-query
 Key: DRILL-5149
 URL: https://issues.apache.org/jira/browse/DRILL-5149
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The below plan can be optimized to push the filter into the subquery and also 
to eliminate redundant projects
{code}
explain plan for select * from (select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0]) d where d.columns[0] = '4041054511';

00-00    Screen : rowType = RecordType(ANY *): rowcount = 1.436392845E7, cumulative cost = {8.776360282950001E8 rows, 1.4059092422298168E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11452
00-01      Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.436392845E7, cumulative cost = {8.7619963545E8 rows, 1.4057656029453169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11451
00-02        SelectionVectorRemover : rowType = RecordType(ANY T18¦¦*): rowcount = 1.436392845E7, cumulative cost = {8.7619963545E8 rows, 1.4057656029453169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11450
00-03          Filter(condition=[=(ITEM(ITEM($0, 'columns'), 0), '4041054511')]) : rowType = RecordType(ANY T18¦¦*): rowcount = 1.436392845E7, cumulative cost = {8.61835707E8 rows, 1.4043292101003168E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11449
00-04            Project(T18¦¦*=[$0]) : rowType = RecordType(ANY T18¦¦*): rowcount = 9.5759523E7, cumulative cost = {7.66076184E8 rows, 1.3602798295203169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11448
00-05              SingleMergeExchange(sort0=[1 ASC]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {7.66076184E8 rows, 1.3602798295203169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11447
01-01                SelectionVectorRemover : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {6.70316661E8 rows, 1.2836722111203169E10 cpu, 0.0 io, 1.176693018624E12 network, 1.532152368E9 memory}, id = 11446
01-02                  Sort(sort0=[$1], dir0=[ASC]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {5.74557138E8 rows, 1.2740962588203169E10 cpu, 0.0 io, 1.176693018624E12 network, 1.532152368E9 memory}, id = 11445
01-03                    Project(T18¦¦*=[$0], EXPR$1=[$1]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {4.78797615E8 rows, 2.585507121E9 cpu, 0.0 io, 1.176693018624E12 network, 0.0 memory}, id = 11444
01-04                      HashToRandomExchange(dist0=[[$1]]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 9.5759523E7, cumulative cost = {4.78797615E8 rows, 2.585507121E9 cpu, 0.0 io, 1.176693018624E12 network, 0.0 memory}, id = 11443
02-01                        UnorderedMuxExchange : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 9.5759523E7, cumulative cost = {3.83038092E8 rows, 1.053354753E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11442
03-01                          Project(T18¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 9.5759523E7, cumulative cost = {2.87278569E8 rows, 9.5759523E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11441
03-02                            Project(T18¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {1.91519046E8 rows, 5.74557138E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11440
03-03                              Project(T18¦¦*=[$0], columns=[$1]) : rowType = RecordType(ANY T18¦¦*, ANY columns): rowcount = 9.5759523E7, cumulative cost = {9.5759523E7 rows, 1.91519046E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11439
03-04                                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/5kwidecolumns_500k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/5kwidecolumns_500k.tbl]]]) : rowType = (DrillRecordRow[*, columns]): rowcount = 9.5759523E7, cumulative cost = {9.5759523E7 rows, 1.91519046E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11438
{code}
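
Why the pushdown is safe here can be illustrated with a small sketch. This is not Drill planner code; `sort_then_filter` and `filter_then_sort` are hypothetical helpers that model the two plan shapes on in-memory rows, assuming an equality filter on the sort key as in the query above.

```python
def sort_then_filter(rows, key, value):
    """Plan shape from the report: sort every row, then apply the filter.

    Returns (result_rows, number_of_rows_sorted).
    """
    ordered = sorted(rows, key=lambda r: r[key])
    return [r for r in ordered if r[key] == value], len(rows)


def filter_then_sort(rows, key, value):
    """Plan shape after pushdown: filter first, sort only the survivors.

    Returns (result_rows, number_of_rows_sorted).
    """
    kept = [r for r in rows if r[key] == value]
    return sorted(kept, key=lambda r: r[key]), len(kept)
```

Both functions return the same rows, but the second one sorts only the rows that pass the filter, which is the saving the report asks the planner to find.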




[jira] [Created] (DRILL-5148) Replace hash-distribution with a simple round-robin distribution for a simple order by query

2016-12-21 Thread Rahul Challapalli (JIRA)

Rahul Challapalli created DRILL-5148:


 Summary: Replace hash-distribution with a simple round-robin 
distribution for a simple order by query
 Key: DRILL-5148
 URL: https://issues.apache.org/jira/browse/DRILL-5148
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators, Query Planning & 
Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The plan below indicates that we use hash-distribution to avoid data skew. 
However, in the case below a simple round-robin approach would be sufficient.

{code}
explain plan for select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0];
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T2¦¦*=[$0])
00-03          SingleMergeExchange(sort0=[1 ASC])
01-01            SelectionVectorRemover
01-02              Sort(sort0=[$1], dir0=[ASC])
01-03                Project(T2¦¦*=[$0], EXPR$1=[$1])
01-04                  HashToRandomExchange(dist0=[[$1]])
02-01                    UnorderedMuxExchange
03-01                      Project(T2¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)])
03-02                        Project(T2¦¦*=[$0], EXPR$1=[ITEM($1, 0)])
03-03                          Project(T2¦¦*=[$0], columns=[$1])
03-04                            Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/5kwidecolumns_500k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/5kwidecolumns_500k.tbl]]])
{code}
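
The argument can be illustrated with a small sketch. It is not Drill code: `hash_distribute`, `round_robin_distribute` and `sort_and_merge` are hypothetical helpers, and CRC32 merely stands in for whatever hash function the exchange uses. The point is that when each partition is sorted locally and a single ordered merge follows, the final order does not depend on how rows were distributed, while round-robin keeps partition sizes even when keys repeat.

```python
import heapq
import zlib


def hash_distribute(keys, n):
    """Send each key to the partition chosen by a hash of its value."""
    parts = [[] for _ in range(n)]
    for k in keys:
        parts[zlib.crc32(k.encode("utf-8")) % n].append(k)
    return parts


def round_robin_distribute(keys, n):
    """Send the i-th key to partition i mod n, ignoring its value."""
    parts = [[] for _ in range(n)]
    for i, k in enumerate(keys):
        parts[i % n].append(k)
    return parts


def sort_and_merge(parts):
    """Sort each partition locally, then do a single ordered merge."""
    return list(heapq.merge(*(sorted(p) for p in parts)))
```

With heavily repeated keys, the hash variant places all duplicates in one partition, whereas the round-robin variant spreads them evenly; the merged output is the same in both cases.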




[jira] [Assigned] (DRILL-5147) Doc update: Support impersonation through Web Console

2016-12-21 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens reassigned DRILL-5147:
-

Assignee: Bridget Bevens


[jira] [Created] (DRILL-5147) Doc update: Support impersonation through Web Console

2016-12-21 Thread Bridget Bevens (JIRA)

Bridget Bevens created DRILL-5147:
-

 Summary: Doc update: Support impersonation through Web Console
 Key: DRILL-5147
 URL: https://issues.apache.org/jira/browse/DRILL-5147
 Project: Apache Drill
  Issue Type: Task
  Components: Documentation
Reporter: Bridget Bevens
Priority: Minor


Maybe the doc should say that Drill supports impersonation through the Web 
Console. These clients use the Java client library, just like JDBC.

Note that *inbound* impersonation is not supported yet because Drill does not 
expose an “impersonation_target” field through the web login form.

Thank you,
Sudheesh

> On Dec 21, 2016, at 10:08 AM, Akihiko Kusanagi  wrote:
>
> Hi,
>
> The 'Impersonation Support' table in the following page says that
> impersonation is not supported with Drill Web Console or REST API.
> http://drill.apache.org/docs/configuring-user-impersonation/
>
> However, when authentication and impersonation are enabled, impersonation is
> in effect through Web UI.
>
> $ cat drill-override.conf
> ...
> drill.exec: {
> ...
> impersonation: {
>   enabled: true
> },
> ...
>
> Only mapr user has read permission for nation.parquet, and Drillbit is
> running as mapr user.
>
> $ hadoop fs -ls /sample-data
> ...
> drwx--   - mapr mapr   1210 2016-01-11 19:58 nation.parquet
> ...
>
> Then, log in as another user via Drill Web UI, and run this query:
>
> select * from dfs.`/sample-data/nation.parquet`
>
> This returns the following error, so it seems that impersonation is in
> effect.
>
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> IOException: 2049.177.8452826 /sample-data/nation.parquet (Input/output
> error) Fragment 0:0 [Error Id: 91684467-8a4f-4fb8-8ad7-6ee04b7f8f53 on
> node3:31010]
>
> When drill.exec.impersonation.enabled = false, the query above returns
> multiple rows.
>
> Is this expected behavior? Does the document need to be updated?
>
> Thanks,
> Aki




[jira] [Commented] (DRILL-5088) Error when reading DBRef column

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768134#comment-15768134
 ] 

ASF GitHub Bot commented on DRILL-5088:
---

GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/702

DRILL-5088: set default codec for toJson



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5088

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #702


commit c285a334eaeda810150ff162d1e1c0da342a37ff
Author: chunhui-shi 
Date:   2016-12-18T08:27:50Z

DRILL-5088: set default codec for toJson




> Error when reading DBRef column
> ---
>
> Key: DRILL-5088
> URL: https://issues.apache.org/jira/browse/DRILL-5088
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
> Environment: drill 1.9.0
> mongo 3.2
>Reporter: Guillaume Champion
>Assignee: Chunhui Shi
>
> In a Mongo database that uses DBRef, when a DBRef is inserted in the first 
> document of a collection, the Drill query fails:
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> {code}
> Simple example to reproduce:
> In mongo instance
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] 
> (state=,code=0)
> {code}
> If the first document doesn't contain the DBRef, Drill runs the query correctly:
> In a mongo instance :
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") });
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
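> +-------------------------------------+--------------------------------------------------------------+
> | _id                                 | account                                                      |
> +-------------------------------------+--------------------------------------------------------------+
> | {"$oid":"582081d96b69060001fd8939"} | {"$id":{}}                                                   |
> | {"$oid":"582081d96b69060001fd8938"} | {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}} |
> +-------------------------------------+--------------------------------------------------------------+
> 2 rows selected (0,563 seconds)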
> {code}




[jira] [Assigned] (DRILL-5088) Error when reading DBRef column

2016-12-21 Thread Chunhui Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-5088:
--

Assignee: Chunhui Shi


[jira] [Commented] (DRILL-5132) Context based dynamic parameterization of views

2016-12-21 Thread Sudheesh Katkam (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767899#comment-15767899
 ] 

Sudheesh Katkam commented on DRILL-5132:


Would UDFs provide a simpler solution?

For example,
{code}
CREATE VIEW my_salary AS SELECT a.salary FROM a_table AS a WHERE 
get_tenant_id(session_user()) = a.tenantId;
{code}

The get_tenant_id UDF could make a call to the external system to get the 
tenant id. Similar UDFs for other parameters.

> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
>  Issue Type: Wish
>  Components:  Server
>Reporter: Nagarajan Chinnasamy
>Priority: Critical
>  Labels: authentication, context, isolation, jdbcstorage, 
> multi-tenancy, session-context, session-parameter
>
> *Requirement*
> It's known that Views in SQL cannot have custom dynamic parameters/variables.  
> Please refer to [Justin 
> Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
> [this SO 
> question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
>  in handling dynamic parameterization of views. 
> [The PR #685|https://github.com/apache/drill/pull/685] 
> [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
> originated based on this requirement so that we could build views that can 
> dynamically filter records based on some dynamic values (like current 
> tenant-id, user role etc.) 
> *Since Drill's basic unit is a View... having such built-in support can bring 
> in dynamism into the whole game.*
> This feature can be utilized for:
> * *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
> Discriminator Column
> * *Data Protection in building Chained Views* with Custom Dynamic Filters
> To explain this further, if we assume that:
> # As and when the user connection is established, we populate session context 
> with session  parameters such as:
> #* Tenant ID of the currently logged in user
> #* Roles of the currently logged in user
> # We expose the session context information through context-based-functions 
> such as:
> #* *session_id* -- that returns unique id of the session
> #* *session_parameter('')* - that returns the value of the 
> session parameter
> then a view created with the following kind of query:
> {code}
> create or replace view dynamic_filter_view as select
>    a.field as a_field,
>    b.field as b_field
> from
>    a_table as a
> left join
>    b_table as b
> on
>    a.bId = b.Id
> where
>    session_parameter('tenantId')=a.tenantId
> {code}
> becomes a query that has built-in support for dynamic parameterization and 
> returns only the records of the tenant of the currently logged-in user. This 
> is a very useful feature in a shared multi-tenant environment where data is 
> isolated using the multi-tenant discriminator column 'tenantId'.
> When building chained views, this feature will be useful for filtering records 
> based on context-based parameters.
> This feature will be particularly useful for data isolation / data protection 
> with *jdbc storage plugins*, where Drill-authenticated credentials are not 
> passed to JDBC connection authentication. A jdbc storage has hard-coded, 
> shared credentials. Hence the responsibility for data isolation / data 
> protection lies with the Views themselves, and hence the need for built-in 
> support of context-based dynamic parameters in Views.
> *Design/Implementation Considerations:*
> * Session parameters can be obtained through authenticators so that custom 
> authenticators can return a HashMap of parameters obtained from external 
> systems.
> * Introduce SessionContext to hold sessionId and sessionParameters
> * Implement context based functions session_id and session_parameter()
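
The design considerations above could be modelled roughly as follows. This is a sketch under stated assumptions, not Drill's API: the `SessionContext` class, its fields and the `dynamic_filter_view` helper are hypothetical and only mirror the names used in the proposal, with the view's `where` clause reduced to an in-memory filter.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionContext:
    """Hypothetical holder for sessionId and sessionParameters."""
    session_id: str
    parameters: Dict[str, str] = field(default_factory=dict)

    def session_parameter(self, name: str) -> Optional[str]:
        """Return the value of a session parameter, or None if unset."""
        return self.parameters.get(name)


def dynamic_filter_view(ctx, a_table):
    """Model of: where session_parameter('tenantId') = a.tenantId."""
    tenant = ctx.session_parameter("tenantId")
    return [row for row in a_table if row["tenantId"] == tenant]
```

In this model a session without a `tenantId` parameter sees no rows, since no row's `tenantId` equals `None`; whether a real implementation should behave that way is a design decision the proposal leaves open.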




[jira] [Created] (DRILL-5146) Unnecessary spilling to disk by sort when we only have 5000 rows with one column

2016-12-21 Thread Rahul Challapalli (JIRA)

Rahul Challapalli created DRILL-5146:


 Summary: Unnecessary spilling to disk by sort when we only have 
5000 rows with one column
 Key: DRILL-5146
 URL: https://issues.apache.org/jira/browse/DRILL-5146
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The below query spills to disk for the sort. The dataset contains 5000 files 
and each file contains a single record. 
{code}
select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by 
columns[1];
{code}

Environment:
{code}
DRILL_MAX_DIRECT_MEMORY="16G"
DRILL_MAX_HEAP="4G"
{code}

I attached the dataset, logs, and the profile.




[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767686#comment-15767686
 ] 

ASF GitHub Bot commented on DRILL-5137:
---

Github user adityakishore commented on a diff in the pull request:

https://github.com/apache/drill/pull/700#discussion_r93487747
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
 ---
@@ -124,6 +124,11 @@ public HBaseRecordReader(Connection connection, 
HBaseSubScan.HBaseSubScanSpec su
 } else {
   rowKeyOnly = false;
   transformed.add(ROW_KEY_PATH);
+  /* DRILL-5137 - optimize count(*) queries on MapR-DB Binary tables */
+  if (isSkipQuery()) {
--- End diff --

Further optimization can be made by returning only a `count` vector in the 
`next()` call, similar to 
[this](https://github.com/apache/drill/blob/master/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java#L203-L204).


> Optimize count(*) queries on MapR-DB Binary Tables
> --
>
> Key: DRILL-5137
> URL: https://issues.apache.org/jira/browse/DRILL-5137
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HBase, Storage - MapRDB
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Smidth Panchamia
>
> This is related to DRILL-5065, but applies to MapR-DB Binary tables




[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767685#comment-15767685
 ] 

ASF GitHub Bot commented on DRILL-5137:
---

Github user adityakishore commented on a diff in the pull request:

https://github.com/apache/drill/pull/700#discussion_r93486944
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
 ---
@@ -124,6 +124,11 @@ public HBaseRecordReader(Connection connection, 
HBaseSubScan.HBaseSubScanSpec su
 } else {
   rowKeyOnly = false;
   transformed.add(ROW_KEY_PATH);
+  /* DRILL-5137 - optimize count(*) queries on MapR-DB Binary tables */
--- End diff --

This branches into the else part of
`if (!isStarQuery()) {`

Can you verify if a query can be both Star and Skip query at the same time 
when count(*) has been requested?



[jira] [Updated] (DRILL-5132) Context based dynamic parameterization of views

2016-12-21 Thread Nagarajan Chinnasamy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nagarajan Chinnasamy updated DRILL-5132:

Labels: authentication context isolation jdbcstorage multi-tenancy 
session-context session-parameter  (was: authentication context isolation 
jdbcstorage multi-tenancy)


[jira] [Commented] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767219#comment-15767219
 ] 

ASF GitHub Bot commented on DRILL-4963:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/701

DRILL-4963: Sync remote and local function registries before query execution

Lazy-init was performed only when a function was not found during Calcite 
parsing, but DRILL-4963 shows cases where Calcite parsing can pass 
(usually during function overloading) and the function is still not found. To 
handle such cases, we need to sync the remote and local function registries 
before query execution. To make this sync as lightweight as possible, we first 
compare the remote and local function registry versions and start looking for 
missing jars only when the versions do not match. The local function registry 
version is the remote function registry version with which the local function 
registry was last synchronized.

Changes:
1. Added a `consists` method to the PersistentStore interface, which returns 
true if the key exists in the store and false otherwise. This method is needed 
to return only the remote function registry version without its content (unlike 
the `get` method). We'll pull the remote function registry content only if the 
versions are different.
2. Added a check that the remote and local function registries are in sync 
before query execution, at the planning and execution stages.
3. Removed unused methods and changes connected with the lazy-init 
implementation on failure only.
4. Added debug messages for `CreateFunctionHandler` and 
`DropFunctionHandler`.
5. Updated unit tests to reflect the new changes.
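
The version check described above can be sketched as follows. This is an illustrative model only, not the code in the pull request: `RemoteRegistry`, `LocalRegistry`, `synced_version` and `sync` are hypothetical names, and an integer counter stands in for however the real registry version is stored.

```python
class RemoteRegistry:
    """Hypothetical stand-in for the remote function registry."""

    def __init__(self):
        self.version = 0
        self.jars = set()

    def register(self, jar):
        """Add a jar and bump the registry version."""
        self.jars.add(jar)
        self.version += 1


class LocalRegistry:
    """Hypothetical local registry that remembers the last synced version."""

    def __init__(self):
        self.synced_version = 0
        self.jars = set()

    def sync(self, remote):
        """Load missing jars only when the versions differ.

        Returns the sorted list of jars loaded by this call.
        """
        if remote.version == self.synced_version:
            return []  # cheap path: versions match, nothing to pull
        missing = sorted(remote.jars - self.jars)
        self.jars.update(missing)
        self.synced_version = remote.version
        return missing
```

Calling `sync` before every query costs only a version comparison in the common case where nothing has changed, which is the lightweight behaviour the description aims for.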





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-4963

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/701.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #701


commit 51ef6614a2c27cb6bb58fb0de875952f99e9b102
Author: Arina Ielchiieva 
Date:   2016-12-20T16:57:15Z

DRILL-4963: Sync remote and local function registries before query execution




> Issues when overloading Drill native functions with dynamic UDFs
> 
>
> Key: DRILL-4963
> URL: https://issues.apache.org/jira/browse/DRILL-4963
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
> Fix For: Future
>
> Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, 
> test_overloading-1.0-sources.jar, test_overloading-1.0.jar
>
>
> I created a jar file which overloads 3 Drill native functions 
> (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and 
> ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF.
> If I try to use my functions I get errors:
> {code:xml}
> SELECT CURRENT_DATE('test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR)
> SQL Query null
> {code:xml}
> SELECT ABS('test','test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR)
> SQL Query null
> {code:xml}
> SELECT LOG('test') FROM (VALUES(1));
> {code}
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator LOG('test').  Errors: 
> Error in expression at index -1.  Error: Missing function implementation: 
> castTINYINT(VARCHAR-REQUIRED).  Full expression: UNKNOWN EXPRESSION.
> But if I rerun all these queries after the "DrillRuntimeException", they run 
> correctly. It seems that Drill has not updated the function signatures before 
> that error. Also, if I add the jar as a usual UDF (copy the jar to 
> /drill_home/jars/3rdparty and restart drillbits), all queries run 
> correctly without errors.




[jira] [Commented] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs

2016-12-21 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766917#comment-15766917
 ] 

Arina Ielchiieva commented on DRILL-4963:
-

All these errors are connected with lazy-init during query execution. For 
example, for the current_date and abs functions, lazy-init does not happen since 
they pass Calcite validation, and then Drill determines that there is no 
matching function and throws a Function Error. Since we expected only the 
Calcite function-not-found exception, we did not catch the Drill function error 
and did not start lazy-init. For the log function the situation is a little 
different: there are many versions of the log function, and even though Drill 
didn't find an exactly matching function, it decides that it can cast the 
initial value to match a found function signature. The best way to solve this 
is to check whether the remote and local registries are in sync before query 
execution. 
To make this check as lightweight as possible, we store the remote function 
registry version locally and compare it with the actual remote function registry 
version. Only if the versions do not match will we look for missing jars.

> Issues when overloading Drill native functions with dynamic UDFs
> 
>
> Key: DRILL-4963
> URL: https://issues.apache.org/jira/browse/DRILL-4963
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
> Fix For: Future
>
> Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, 
> test_overloading-1.0-sources.jar, test_overloading-1.0.jar
>
>
> I created a jar file that overloads 3 Drill native functions 
> (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and 
> ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF.
> If I try to use my functions, I get errors:
> {code:xml}
> SELECT CURRENT_DATE('test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR)
> SQL Query null
> {code:xml}
> SELECT ABS('test','test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR)
> SQL Query null
> {code:xml}
> SELECT LOG('test') FROM (VALUES(1));
> {code}
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator LOG('test').  Errors: 
> Error in expression at index -1.  Error: Missing function implementation: 
> castTINYINT(VARCHAR-REQUIRED).  Full expression: UNKNOWN EXPRESSION.
> But if I rerun all these queries after the "DrillRuntimeException", they run 
> correctly. It seems that Drill had not updated the function signatures before 
> that error. Also, if I add the jar as a regular UDF (copy the jar to 
> /drill_home/jars/3rdparty and restart the drillbits), all queries run 
> correctly without errors.




[jira] [Updated] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs

2016-12-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4963:

Fix Version/s: Future

> Issues when overloading Drill native functions with dynamic UDFs
> 
>
> Key: DRILL-4963
> URL: https://issues.apache.org/jira/browse/DRILL-4963
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
> Fix For: Future
>
> Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, 
> test_overloading-1.0-sources.jar, test_overloading-1.0.jar
>
>

[jira] [Comment Edited] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs

2016-12-21 Thread Roman (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766723#comment-15766723
 ] 

Roman edited comment on DRILL-4963 at 12/21/16 10:42 AM:
-

Added jars "subquery_udf-1.0" from previous message.


was (Author: romankulyk):
Added jars

> Issues when overloading Drill native functions with dynamic UDFs
> 
>
> Key: DRILL-4963
> URL: https://issues.apache.org/jira/browse/DRILL-4963
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
> Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, 
> test_overloading-1.0-sources.jar, test_overloading-1.0.jar
>
>

[jira] [Updated] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs

2016-12-21 Thread Roman (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman updated DRILL-4963:
-
Attachment: subquery_udf-1.0-sources.jar
subquery_udf-1.0.jar

Added jars

> Issues when overloading Drill native functions with dynamic UDFs
> 
>
> Key: DRILL-4963
> URL: https://issues.apache.org/jira/browse/DRILL-4963
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
> Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, 
> test_overloading-1.0-sources.jar, test_overloading-1.0.jar
>
>

[jira] [Commented] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs

2016-12-21 Thread Roman (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766721#comment-15766721
 ] 

Roman commented on DRILL-4963:
--

Found a similar case with a different error.

Running the query for the first time:
{code:sql}
select subqueryudf(t1.first_name, t2.last_name) from cp.`employee.json` t1 
inner join (select last_name, subqueryudf(first_name, last_name) as full_name 
from cp.`employee.json`) t2 on subqueryudf(t1.first_name, 
t1.last_name)=t2.full_name order by t1.employee_id limit 1;
{code}
Error: VALIDATION ERROR: From line 1, column 248 to line 1, column 249: Table 
't1' not found

SQL Query null

And the second time:
{code:sql}
select subqueryudf(t1.first_name, t2.last_name) from cp.`employee.json` t1 
inner join (select last_name, subqueryudf(first_name, last_name) as full_name 
from cp.`employee.json`) t2 on subqueryudf(t1.first_name, 
t1.last_name)=t2.full_name order by t1.employee_id limit 1;
{code}
+---------------+
|    EXPR$0     |
+---------------+
| Sheri Nowmer  |
+---------------+
1 row selected (0.3 seconds)



> Issues when overloading Drill native functions with dynamic UDFs
> 
>
> Key: DRILL-4963
> URL: https://issues.apache.org/jira/browse/DRILL-4963
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
> Attachments: test_overloading-1.0-sources.jar, 
> test_overloading-1.0.jar
>
>

[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766641#comment-15766641
 ] 

ASF GitHub Bot commented on DRILL-5137:
---

GitHub user spanchamiamapr opened a pull request:

https://github.com/apache/drill/pull/700

DRILL-5137 - Optimize count(*) queries on MapR-DB Binary Tables

This diff uses the same optimization as that for the rowKeyOnly queries.
We use the FirstKeyOnlyFilter for count(*) queries.
This fix will optimize these queries for HBase tables, as well as MapR-DB 
Binary tables.
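
To illustrate why a first-key-only scan helps count(*), here is a plain-Java 
simulation. It is a sketch under assumptions: it does not use the HBase or 
Drill APIs, FirstKeyOnlyCountSketch and its methods are hypothetical names, 
and the table is an in-memory map. The point is only that one cell per row is 
enough to count rows, so the remaining cells need not be returned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation only; none of these names are HBase or Drill APIs.
class FirstKeyOnlyCountSketch {
    // rowKey -> (column -> value), with rows kept sorted by key.
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        for (int i = 0; i < 3; i++) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("cf:name", "name" + i);
            row.put("cf:city", "city" + i);
            row.put("cf:zip", "zip" + i);
            table.put("row" + i, row);
        }
        return table;
    }

    // Full scan: every cell of every row is returned to the client.
    static List<String> scanAllCells(Map<String, Map<String, String>> table) {
        List<String> cells = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            for (String column : row.getValue().keySet()) {
                cells.add(row.getKey() + "/" + column);
            }
        }
        return cells;
    }

    // First-key-only scan: one cell per row is enough to prove the row exists.
    static List<String> scanFirstCellOnly(Map<String, Map<String, String>> table) {
        List<String> cells = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            for (String column : row.getValue().keySet()) {
                cells.add(row.getKey() + "/" + column);
                break; // skip the remaining cells of this row
            }
        }
        return cells;
    }
}
```

Both scans yield the same row count, but the second returns one cell per row 
instead of all of them.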

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/spanchamiamapr/drill drill-5137

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #700


commit 52765e5420be0db5e12b68d161ac06ea1855d3a0
Author: Smidth Panchamia 
Date:   2016-12-21T09:53:21Z

DRILL-5137 - This diff uses the same optimization as that for the 
rowKeyOnly queries.
We use the FirstKeyOnlyFilter for count(*) queries.
This fix will optimize these queries for HBase tables, as well as MapR-DB 
Binary tables.




> Optimize count(*) queries on MapR-DB Binary Tables
> --
>
> Key: DRILL-5137
> URL: https://issues.apache.org/jira/browse/DRILL-5137
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HBase, Storage - MapRDB
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Smidth Panchamia
>
> This is related to DRILL-5065, but applies to MapR-DB Binary tables




[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766603#comment-15766603
 ] 

ASF GitHub Bot commented on DRILL-5137:
---

Github user spanchamiamapr commented on the issue:

https://github.com/apache/drill/pull/699
  
Closing this pull request since it also shows unnecessary changes.


> Optimize count(*) queries on MapR-DB Binary Tables
> --
>
> Key: DRILL-5137
> URL: https://issues.apache.org/jira/browse/DRILL-5137
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HBase, Storage - MapRDB
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Smidth Panchamia
>
> This is related to DRILL-5065, but applies to MapR-DB Binary tables




[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766604#comment-15766604
 ] 

ASF GitHub Bot commented on DRILL-5137:
---

Github user spanchamiamapr closed the pull request at:

https://github.com/apache/drill/pull/699


> Optimize count(*) queries on MapR-DB Binary Tables
> --
>
> Key: DRILL-5137
> URL: https://issues.apache.org/jira/browse/DRILL-5137
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HBase, Storage - MapRDB
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Smidth Panchamia
>
> This is related to DRILL-5065, but applies to MapR-DB Binary tables




[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766601#comment-15766601
 ] 

ASF GitHub Bot commented on DRILL-5137:
---

GitHub user spanchamiamapr opened a pull request:

https://github.com/apache/drill/pull/699

DRILL-5137 - Optimize count(*) queries for MapR-DB Binary Tables

This diff uses the same optimization as that for the rowKeyOnly queries.
We use the FirstKeyOnlyFilter for count(*) queries.
This fix will optimize these queries for HBase tables, as well as MapR-DB 
Binary tables.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/spanchamiamapr/drill md1230

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/699.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #699


commit 3d12a9526365bcc96c605fc30f3b6fbe4961b97d
Author: Smidth Panchamia 
Date:   2016-11-28T21:59:34Z

DRILL-5065 - Optimize count( * ) queries on MapR-DB JSON Tables
In MapR-DB v5.2.0, we enabled '_id' only projection for JSON tables.
Hence, we can now optimize the following queries:
a. count(*) by projecting only the '_id' column.
b. '_id' only projections, including count(_id)

commit 6b5923e383e2add4b3042248e5f389d32936a5b7
Author: Smidth Panchamia 
Date:   2016-11-29T07:48:36Z

Change the format plugin config parameter name.

commit 8897b8fb15eb76c4b89270456b6b4d86696e7b38
Author: Smidth Panchamia 
Date:   2016-11-28T21:59:34Z

DRILL-5065 - Optimize count( * ) queries on MapR-DB JSON Tables
In MapR-DB v5.2.0, we enabled '_id' only projection for JSON tables.
Hence, we can now optimize the following queries:
a. count(*) by projecting only the '_id' column.
b. '_id' only projections, including count(_id)

Change the format plugin config parameter name.

commit d1d05e8bd4b96fd1bd144c50ddf20466a4bfda16
Author: Smidth Panchamia 
Date:   2016-12-05T18:55:14Z

Merge branch 'master' of github.com:spanchamiamapr/drill

commit ab8e728d7ed3fcd8c27243462bea42b7d2ea29c5
Author: Smidth Panchamia 
Date:   2016-12-14T20:01:06Z

Merge remote-tracking branch 'upstream/master'

Conflicts:

contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBFormatPluginConfig.java

commit c84464f8fd099cb17a573faa4310a70eada3b49e
Author: Smidth Panchamia 
Date:   2016-12-20T07:10:17Z

Merge remote-tracking branch 'upstream/master'

commit 72bd06c744dd71ad9a1aa87c91dabcd89eb4fa36
Author: Smidth Panchamia 
Date:   2016-12-21T09:29:45Z

DRILL-5137 - Optimize count(*) queries on MapR-DB Binary tables




> Optimize count(*) queries on MapR-DB Binary Tables
> --
>
> Key: DRILL-5137
> URL: https://issues.apache.org/jira/browse/DRILL-5137
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HBase, Storage - MapRDB
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Smidth Panchamia
>
> This is related to DRILL-5065, but applies to MapR-DB Binary tables




[jira] [Updated] (DRILL-5140) CTAS that does SELECT over 5003 columns fails with CompileException

2016-12-21 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-5140:
--
Priority: Critical  (was: Major)

> CTAS that does SELECT over 5003 columns fails with CompileException
> ---
>
> Key: DRILL-5140
> URL: https://issues.apache.org/jira/browse/DRILL-5140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Priority: Critical
> Attachments: drill_5117.q, manyColumns.csv
>
>
> CTAS that does SELECT over 5003 columns fails with CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject...
> Drill 1.9.0 git commit ID : 4c1b420b
> CTAS statement and CSV data file are attached.
> I ran the test with and without setting the system option below; the test 
> failed in both cases.
> alter system set `exec.java_compiler`='JDK';
> The sqlline session just closes with the message below after the failing CTAS 
> is executed.
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> Stack trace from drillbit.log
> {noformat}
> 2016-12-20 12:02:16,016 [27a6e241-99b1-1f2a-8a91-394f8166e969:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[ProjectorGen45.java]', 
> Line 11, Column 8: ProjectorGen45.java:11: error: too many constants
> public class ProjectorGen45 {
>^ (compiler.err.limit.pool)
> Fragment 0:0
> [Error Id: ced84dce-669d-47c2-b5d2-5e0559dbd9fd on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[ProjectorGen45.java]', 
> Line 11, Column 8: ProjectorGen45.java:11: error: too many constants
> public class ProjectorGen45 {
>^ (compiler.err.limit.pool)
> Fragment 0:0
> [Error Id: ced84dce-669d-47c2-b5d2-5e0559dbd9fd on centos-01.qa.lab:31010]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293) [drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262) [drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.9.0.jar:1.9.0]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.exec.exception.SchemaChangeException: Failure while attempting to load generated class
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:487) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) ~[drill-ja

[jira] [Closed] (DRILL-4674) Allow casting to boolean the same literals as in Postgre

2016-12-21 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-4674.
-

> Allow casting to boolean the same literals as in Postgre
> 
>
> Key: DRILL-4674
> URL: https://issues.apache.org/jira/browse/DRILL-4674
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Drill does not return results when we try to cast 0 and 1 to boolean inside a 
> value constructor.
> Drill version : 1.7.0-SNAPSHOT  commit ID : 09b26277
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(cast(1 as boolean));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 1
> Fragment 0:0
> [Error Id: 35dcc4bb-0c5d-466f-8fb5-cf7f0a892155 on centos-02.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> values(cast(0 as boolean));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Whereas we get results on Postgres for the same query.
> {noformat}
> postgres=# values(cast(1 as boolean));
>  column1
> -
>  t
> (1 row)
> postgres=# values(cast(0 as boolean));
>  column1
> -
>  f
> (1 row)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-05-13 07:16:16,578 [28ca80bf-0af9-bc05-258b-6b5744739ed8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: 
> Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.IllegalArgumentException: Invalid value for boolean: 0
> at org.apache.drill.exec.test.generated.ProjectorGen9.doSetup(ProjectorTemplate.java:95) ~[na:na]
> at org.apache.drill.exec.test.generated.ProjectorGen9.setup(ProjectorTemplate.java:93) ~[na:na]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:444) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.

[jira] [Commented] (DRILL-4674) Allow casting to boolean the same literals as in Postgre

2016-12-21 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766475#comment-15766475
 ] 

Khurram Faraaz commented on DRILL-4674:
---

Verified, tests are added here
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Functional/case_expr/castTo*.q
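
For readers unfamiliar with the literals the issue title refers to: 
PostgreSQL accepts true/yes/on/1 and false/no/off/0 (case-insensitive, with 
surrounding whitespace ignored) as boolean input, along with unique prefixes 
such as t, f, y and n. A minimal sketch of such parsing follows. It is not the 
code Drill uses: BooleanLiteralSketch is a hypothetical helper, it handles 
only the single-letter prefixes rather than prefix matching in general, and 
the exact literal set Drill adopted is not stated in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative helper only; not the implementation used by Drill.
class BooleanLiteralSketch {
    private static final Set<String> TRUE_LITERALS =
        new HashSet<>(Arrays.asList("true", "t", "yes", "y", "on", "1"));
    private static final Set<String> FALSE_LITERALS =
        new HashSet<>(Arrays.asList("false", "f", "no", "n", "off", "0"));

    // Trims and lower-cases the input, then looks it up in the two literal sets.
    static boolean parse(String literal) {
        String normalized = literal.trim().toLowerCase(Locale.ROOT);
        if (TRUE_LITERALS.contains(normalized)) {
            return true;
        }
        if (FALSE_LITERALS.contains(normalized)) {
            return false;
        }
        throw new IllegalArgumentException("Invalid value for boolean: " + literal);
    }
}
```

With this helper, parse("1") and parse("0") succeed, which corresponds to the 
two casts that failed in the original report.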

> Allow casting to boolean the same literals as in Postgre
> 
>
> Key: DRILL-4674
> URL: https://issues.apache.org/jira/browse/DRILL-4674
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>

[jira] [Commented] (DRILL-5132) Context based dynamic parameterization of views

2016-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766463#comment-15766463
 ] 

ASF GitHub Bot commented on DRILL-5132:
---

Github user nagarajanchinnasamy commented on a diff in the pull request:

https://github.com/apache/drill/pull/685#discussion_r93394114
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java ---
@@ -255,11 +257,12 @@ void disableReadTimeout() {
   getChannel().pipeline().remove(BasicServer.TIMEOUT_HANDLER);
 }
 
-void setUser(final UserToBitHandshake inbound) throws IOException {
+void setUser(final UserToBitHandshake inbound, Map sessionParams) throws IOException {
--- End diff --

@sudheeshkatkam I created this ticket a few days back and have now updated it 
with clearer details. The ticket is: 
[DRILL-5132](https://issues.apache.org/jira/browse/DRILL-5132). Please let me know 
your views.


> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
>  Issue Type: Wish
>  Components:  Server
>Reporter: Nagarajan Chinnasamy
>Priority: Critical
>  Labels: authentication, context, isolation, jdbcstorage, 
> multi-tenancy
>
> *Requirement*
> It's known that Views in SQL cannot have custom dynamic parameters/variables.  
> Please refer to [Justin 
> Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
> [this SO 
> question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
>  in handling dynamic parameterization of views. 
> [The PR #685|https://github.com/apache/drill/pull/685] 
> [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
> originated based on this requirement so that we could build views that can 
> dynamically filter records based on some dynamic values (like current 
> tenant-id, user role etc.) 
> *Since Drill's basic unit is a View... having such built-in support can bring 
> in dynamism into the whole game.*
> This feature can be utilized for:
> * *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
> Discriminator Column
> * *Data Protection in building Chained Views* with Custom Dynamic Filters
> To explain this further, if we assume that:
> # As and when the user connection is established, we populate session context 
> with session parameters such as:
> #* Tenant ID of the currently logged in user
> #* Roles of the currently logged in user
> # We expose the session context information through context-based-functions 
> such as:
> #* *session_id* -- that returns unique id of the session
> #* *session_parameter('')* - that returns the value of the 
> session parameter
> then a view created with the following kind of query:
> {code}
> create or replace view dynamic_filter_view as select
>    a.field as a_field,
>    b.field as b_field
> from
>    a_table as a
> left join
>    b_table as b
> on
>    a.bId = b.Id
> where
>    session_parameter('tenantId')=a.tenantId
> {code}
> becomes a query that has built-in support for dynamic parameterization that 
> only returns records of the tenant of the currently logged in user. This is a 
> very useful feature in a shared-multi-tenant environment where data is 
> isolated using multi-tenant-discriminator column 'tenantId'.
> When building chained views this feature will be useful in filtering records 
> based on context based parameters.
> This feature will particularly be useful for data isolation / data protection 
> with *jdbc storage plugins* where drill-authenticated-credentials are not 
> passed to jdbc connection authentication. A jdbc storage has hard-coded, 
> shared credentials. Hence the responsibility of data isolation / data 
> protection lies with Views themselves. Hence, the need for built-in support 
> of context based dynamic parameters in Views.
> *Design/Implementation Considerations:*
> * Session parameters can be obtained through authenticators so that custom 
> authenticators can return a HashMap of parameters obtained from external 
> systems.
> * Introduce SessionContext to hold sessionId and sessionParameters
> * Implement context based functions session_id and session_parameter()




[jira] [Updated] (DRILL-5132) Context based dynamic parameterization of views

2016-12-21 Thread Nagarajan Chinnasamy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nagarajan Chinnasamy updated DRILL-5132:

Description: 
*Requirement*

It's known that Views in SQL cannot have custom dynamic parameters/variables.  
Please refer to [Justin 
Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
[this SO 
question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
 in handling dynamic parameterization of views. 

[The PR #685|https://github.com/apache/drill/pull/685] 
[DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
originated based on this requirement so that we could build views that can 
dynamically filter records based on some dynamic values (like current 
tenant-id, user role etc.) 

*Since Drill's basic unit is a View... having such built-in support can bring 
in dynamism into the whole game.*

This feature can be utilized for:

* *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
Discriminator Column
* *Data Protection in building Chained Views* with Custom Dynamic Filters

To explain this further, if we assume that:

# As and when the user connection is established, we populate session context 
with session parameters such as:
#* Tenant ID of the currently logged in user
#* Roles of the currently logged in user

# We expose the session context information through context-based-functions 
such as:
#* *session_id* -- that returns unique id of the session
#* *session_parameter('')* - that returns the value of the 
session parameter

then a view created with the following kind of query:

{code}
create or replace view dynamic_filter_view as select
   a.field as a_field,
   b.field as b_field
from
   a_table as a
left join
   b_table as b
on
   a.bId = b.Id
where
   session_parameter('tenantId')=a.tenantId
{code}

becomes a query that has built-in support for dynamic parameterization that 
only returns records of the tenant of the currently logged in user. This is a 
very useful feature in a shared-multi-tenant environment where data is isolated 
using multi-tenant-discriminator column 'tenantId'.

When building chained views this feature will be useful in filtering records 
based on context based parameters.

This feature will particularly be useful for data isolation / data protection 
with *jdbc storage plugins* where drill-authenticated-credentials are not 
passed to jdbc connection authentication. A jdbc storage has hard-coded, 
shared credentials. Hence the responsibility of data isolation / data 
protection lies with Views themselves. Hence, the need for built-in support of 
context based dynamic parameters in Views.

*Design/Implementation Considerations:*

* Session parameters can be obtained through authenticators so that custom 
authenticators can return a HashMap of parameters obtained from external 
systems.
* Introduce SessionContext to hold sessionId and sessionParameters
* Implement context based functions session_id and session_parameter()
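
As a rough illustration of the considerations above, the following Java sketch shows one possible shape for a session context. The class name SessionContextSketch, its methods, and the filterByTenant helper are hypothetical and are not part of Drill or of PR #685. The sketch only assumes that session parameters arrive as a string map, for example from a custom authenticator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; not Drill's actual API.
final class SessionContextSketch {
    private final String sessionId;
    private final Map<String, String> sessionParameters;

    SessionContextSketch(String sessionId, Map<String, String> sessionParameters) {
        this.sessionId = sessionId;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.sessionParameters =
            Collections.unmodifiableMap(new HashMap<>(sessionParameters));
    }

    // Counterpart of the proposed session_id function.
    String sessionId() {
        return sessionId;
    }

    // Counterpart of the proposed session_parameter() function;
    // returns null when the parameter is not set.
    String sessionParameter(String name) {
        return sessionParameters.get(name);
    }

    // Emulates "where session_parameter('tenantId') = a.tenantId"
    // over in-memory rows represented as column-name -> value maps.
    static List<Map<String, String>> filterByTenant(
            SessionContextSketch ctx, List<Map<String, String>> rows) {
        String tenant = ctx.sessionParameter("tenantId");
        List<Map<String, String>> visible = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (tenant != null && tenant.equals(row.get("tenantId"))) {
                visible.add(row);
            }
        }
        return visible;
    }
}
```

In this sketch a session without a tenantId parameter sees no rows at all, which is the conservative choice for data isolation; whether the real functions should behave that way is an open design question.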


  was:
It's known that Views in SQL cannot have custom dynamic parameters/variables.  
Please refer to [Justin 
Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
[this SO 
question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
 in handling dynamic parameterization of views. 

[The PR #685|https://github.com/apache/drill/pull/685] 
[DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
originated based on this requirement so that we could build views that can 
dynamically filter records based on some dynamic values (like current 
tenant-id, user role etc.) 

*Since Drill's basic unit is a View... having such built-in support can bring 
in dynamism into the whole game.*

This feature can be utilized for:

* *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
Discriminator Column
* *Data Protection in building Chained Views* with Custom Dynamic Filters

To explain this further, if we assume that:

# As and when the user connection is established, we populate session context 
with session parameters such as:
#* Tenant ID of the currently logged in user
#* Roles of the currently logged in user

# We expose the session context information through context-based-functions 
such as:
#* *session_id* -- that returns unique id of the session
#* *session_parameter('')* - that returns the value of the 
session parameter

then a view created with the following kind of query:

{code}
create or replace view dynamic_filter_view as select
   a.field as a_field,
   b.field as b_field
from
   a_table as a
left join
   b_table as b
on
   a.bId = b.Id
where
   session_parameter('tenantId')=a.tenantId
{code}

becomes a query that has built-in support for dynamic parameterization that 
only returns records of the tenant of the currently logged in user

[jira] [Issue Comment Deleted] (DRILL-5132) Context based dynamic parameterization of views

2016-12-21 Thread Nagarajan Chinnasamy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nagarajan Chinnasamy updated DRILL-5132:

Comment: was deleted

(was: This feature will particularly be useful for data isolation / data 
protection with *jdbc storage plugins* where drill-authenticated-credentials 
are not used for jdbc connection authentication (like in MapR-DB). A jdbc 
storage has hard-coded, shared credentials. Hence the responsibility of 
data isolation / data protection lies with Views themselves. Hence, the need 
for built-in support of context based dynamic filtering in Views.)

> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
>  Issue Type: Wish
>  Components:  Server
>Reporter: Nagarajan Chinnasamy
>Priority: Critical
>  Labels: authentication, context, isolation, jdbcstorage, 
> multi-tenancy
>
> It's known that Views in SQL cannot have custom dynamic parameters/variables.  
> Please refer to [Justin 
> Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
> [this SO 
> question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
>  in handling dynamic parameterization of views. 
> [The PR #685|https://github.com/apache/drill/pull/685] 
> [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
> originated based on this requirement so that we could build views that can 
> dynamically filter records based on some dynamic values (like current 
> tenant-id, user role etc.) 
> *Since Drill's basic unit is a View... having such built-in support can bring 
> in dynamism into the whole game.*
> This feature can be utilized for:
> * *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
> Discriminator Column
> * *Data Protection in building Chained Views* with Custom Dynamic Filters
> To explain this further, if we assume that:
> # As and when the user connection is established, we populate session context 
> with session parameters such as:
> #* Tenant ID of the currently logged in user
> #* Roles of the currently logged in user
> # We expose the session context information through context-based-functions 
> such as:
> #* *session_id* -- that returns unique id of the session
> #* *session_parameter('')* - that returns the value of the 
> session parameter
> then a view created with the following kind of query:
> {code}
> create or replace view dynamic_filter_view as select
>    a.field as a_field,
>    b.field as b_field
> from
>    a_table as a
> left join
>    b_table as b
> on
>    a.bId = b.Id
> where
>    session_parameter('tenantId')=a.tenantId
> {code}
> becomes a query that has built-in support for dynamic parameterization that 
> only returns records of the tenant of the currently logged in user. This is a 
> very useful feature in a shared-multi-tenant environment where data is 
> isolated using multi-tenant-discriminator column 'tenantId'.
> When building chained views this feature will be useful in filtering records 
> based on context based parameters.
> This feature will particularly be useful for data isolation / data protection 
> with *jdbc storage plugins* where drill-authenticated-credentials are not 
> passed to jdbc connection authentication. A jdbc storage has hard-coded, 
> shared credentials. Hence the responsibility of data isolation / data 
> protection lies with Views themselves. Hence, the need for built-in support 
> of context based dynamic parameters in Views.




[jira] [Issue Comment Deleted] (DRILL-5132) Context based dynamic parameterization of views

2016-12-21 Thread Nagarajan Chinnasamy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nagarajan Chinnasamy updated DRILL-5132:

Comment: was deleted

(was: Some of the thoughts on design considerations:

# authenticator must be given a chance to populate context values
#* Generally context values are loaded immediately after (or as the part of) 
authentication process
#* Custom authenticators can load custom context values as a result of 
authentication process
# If custom authenticators can add values to context, then we need a 
mechanism that keeps custom context variables unique so that they don't clash 
with pre-defined system context variables
# Revise the design of "context" class so that it can hold both system defined 
and custom defined variables.
# Change [DRILL-4956|https://issues.apache.org/jira/browse/DRILL-4956] and 
[DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043] that assume that 
UserSession is the place to generate session id (which is very much one of the 
context values) to accommodate externally generated session_id.
#* *session_id* can be provided by an external authenticator. Accommodating 
externally generated session_id (with unique prefix) will help better 
co-ordination with external systems that provide custom authentication and  
context values.)
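
One way to keep custom context variables from clashing with system-defined ones is to namespace the keys. The Java sketch below is a minimal illustration under that assumption. The class NamespacedContextSketch and the prefixes "drill.system." and "custom." are hypothetical names chosen for the example, not existing Drill identifiers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not Drill's actual API.
final class NamespacedContextSketch {
    // Assumed prefixes that separate the two kinds of variables.
    static final String SYSTEM_PREFIX = "drill.system.";
    static final String CUSTOM_PREFIX = "custom.";

    private final Map<String, String> values = new HashMap<>();

    // Values set by the server itself.
    void putSystem(String key, String value) {
        values.put(SYSTEM_PREFIX + key, value);
    }

    // Values supplied by a custom authenticator.
    void putCustom(String key, String value) {
        values.put(CUSTOM_PREFIX + key, value);
    }

    String getSystem(String key) {
        return values.get(SYSTEM_PREFIX + key);
    }

    String getCustom(String key) {
        return values.get(CUSTOM_PREFIX + key);
    }
}
```

With this layout an externally supplied session_id and a server-generated one can coexist, because they are stored under different prefixed keys.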

> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
>  Issue Type: Wish
>  Components:  Server
>Reporter: Nagarajan Chinnasamy
>Priority: Critical
>  Labels: authentication, context, isolation, jdbcstorage, 
> multi-tenancy
>
> It's known that Views in SQL cannot have custom dynamic parameters/variables.  
> Please refer to [Justin 
> Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
> [this SO 
> question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
>  in handling dynamic parameterization of views. 
> [The PR #685|https://github.com/apache/drill/pull/685] 
> [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
> originated based on this requirement so that we could build views that can 
> dynamically filter records based on some dynamic values (like current 
> tenant-id, user role etc.) 
> *Since Drill's basic unit is a View... having such built-in support can bring 
> in dynamism into the whole game.*
> This feature can be utilized for:
> * *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
> Discriminator Column
> * *Data Protection in building Chained Views* with Custom Dynamic Filters
> To explain this further, if we assume that:
> # As and when the user connection is established, we populate session context 
> with session parameters such as:
> #* Tenant ID of the currently logged in user
> #* Roles of the currently logged in user
> # We expose the session context information through context-based-functions 
> such as:
> #* *session_id* -- that returns unique id of the session
> #* *session_parameter('')* - that returns the value of the 
> session parameter
> then a view created with the following kind of query:
> {code}
> create or replace view dynamic_filter_view as select
>    a.field as a_field,
>    b.field as b_field
> from
>    a_table as a
> left join
>    b_table as b
> on
>    a.bId = b.Id
> where
>    session_parameter('tenantId')=a.tenantId
> {code}
> becomes a query that has built-in support for dynamic parameterization that 
> only returns records of the tenant of the currently logged in user. This is a 
> very useful feature in a shared-multi-tenant environment where data is 
> isolated using multi-tenant-discriminator column 'tenantId'.
> When building chained views this feature will be useful in filtering records 
> based on context based parameters.
> This feature will particularly be useful for data isolation / data protection 
> with *jdbc storage plugins* where drill-authenticated-credentials are not 
> passed to jdbc connection authentication. A jdbc storage has hard-coded, 
> shared credentials. Hence the responsibility of data isolation / data 
> protection lies with Views themselves. Hence, the need for built-in support 
> of context based dynamic parameters in Views.




[jira] [Updated] (DRILL-5132) Context based dynamic parameterization of views

2016-12-21 Thread Nagarajan Chinnasamy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nagarajan Chinnasamy updated DRILL-5132:

Description: 
It's known that Views in SQL cannot have custom dynamic parameters/variables.  
Please refer to [Justin 
Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
[this SO 
question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
 in handling dynamic parameterization of views. 

[The PR #685|https://github.com/apache/drill/pull/685] 
[DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
originated based on this requirement so that we could build views that can 
dynamically filter records based on some dynamic values (like current 
tenant-id, user role etc.) 

*Since Drill's basic unit is a View... having such built-in support can bring 
in dynamism into the whole game.*

This feature can be utilized for:

* *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
Discriminator Column
* *Data Protection in building Chained Views* with Custom Dynamic Filters

To explain this further, if we assume that:

# As and when the user connection is established, we populate session context 
with session parameters such as:
#* Tenant ID of the currently logged in user
#* Roles of the currently logged in user

# We expose the session context information through context-based-functions 
such as:
#* *session_id* -- that returns unique id of the session
#* *session_parameter('')* - that returns the value of the 
session parameter

then a view created with the following kind of query:

{code}
create or replace view dynamic_filter_view as select
   a.field as a_field,
   b.field as b_field
from
   a_table as a
left join
   b_table as b
on
   a.bId = b.Id
where
   session_parameter('tenantId')=a.tenantId
{code}

becomes a query that has built-in support for dynamic parameterization that 
only returns records of the tenant of the currently logged in user. This is a 
very useful feature in a shared-multi-tenant environment where data is isolated 
using multi-tenant-discriminator column 'tenantId'.

When building chained views this feature will be useful in filtering records 
based on context based parameters.

This feature will particularly be useful for data isolation / data protection 
with *jdbc storage plugins* where drill-authenticated-credentials are not 
passed to jdbc connection authentication. A jdbc storage has hard-coded, 
shared credentials. Hence the responsibility of data isolation / data 
protection lies with Views themselves. Hence, the need for built-in support of 
context based dynamic parameters in Views.



  was:
It's known that Views in SQL cannot have dynamic parameters/variables.  Please 
refer to [Justin 
Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
[this SO 
question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
 in handling dynamic parameterization of views. 

[The PR #685|https://github.com/apache/drill/pull/685] 
[DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
originated based on this requirement so that we could build views that can 
dynamically filter records based on some dynamic values (like current 
tenant-id, user role etc.) 

*Since Drill's basic unit is a View... having such built-in support can bring 
in dynamism into the whole game.*

This feature can be utilized for:

* *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
Discriminator Column
* *Data Protection in building Chained Views* with Custom Dynamic Filters

I will post further design details in the comments



> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
>  Issue Type: Wish
>  Components:  Server
>Reporter: Nagarajan Chinnasamy
>Priority: Critical
>  Labels: authentication, context, isolation, jdbcstorage, 
> multi-tenancy
>
> It's known that Views in SQL cannot have custom dynamic parameters/variables.  
> Please refer to [Justin 
> Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
> [this SO 
> question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
>  in handling dynamic parameterization of views. 
> [The PR #685|https://github.com/apache/drill/pull/685] 
> [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
> originated based on this requirement so that we could build views that can 
> dynamically filter records based on some dynamic values (like current 
> tenant-id, user role etc.) 
> *Since Drill's basic unit is a View... having such built-in

[jira] [Issue Comment Deleted] (DRILL-5132) Context based dynamic parameterization of views

2016-12-21 Thread Nagarajan Chinnasamy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nagarajan Chinnasamy updated DRILL-5132:

Comment: was deleted

(was: Let's say we have a *pre-defined documented place* where a *session based 
temporary table* named *context* is created with the following columns:

{code}
session_id, context_type, context_key, context_value
{code}

and say this context table is transparently populated with context based values 
*as and when a user connection (session) is established*

then a view created with the following kind of query:

{code}
create or replace view dynamic_filter_view as select
   a.field as a_field,
   b.field as b_field
from
   a_table as a
left join
   b_table as b
on
   a.bId = b.Id
inner join
   context c
on
   c.session_id=session_id() and
   c.context_type='custom' and
   c.context_key='tenantId' and
   c.context_value=a.tenantId
{code}

This becomes a query that has built-in support for dynamic parameterization 
that only exposes records of the current tenantId of the current context.

The purpose of context_type column is to inject system defined context values 
and custom context values.

Custom context values can be obtained through a *custom-context-provider* (like 
custom-authenticator)

System defined context_types can be *drill.system*, *drill.query* etc.

Does that sound elegant and sensible?? :))
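
The join against the context table can be approximated in memory as follows. This is only a sketch of the intended filtering semantics. The class ContextTableSketch, the ContextRow type, and the isVisible helper are hypothetical, and the column names are taken from the comment above.

```java
import java.util.List;

// Hypothetical in-memory stand-in for the proposed "context" table.
final class ContextTableSketch {

    // One row: session_id, context_type, context_key, context_value.
    static final class ContextRow {
        final String sessionId;
        final String contextType;
        final String contextKey;
        final String contextValue;

        ContextRow(String sessionId, String contextType,
                   String contextKey, String contextValue) {
            this.sessionId = sessionId;
            this.contextType = contextType;
            this.contextKey = contextKey;
            this.contextValue = contextValue;
        }
    }

    // Emulates the inner join conditions: a data row with the given
    // tenantId is visible only if a custom 'tenantId' context row of
    // the current session carries the same value.
    static boolean isVisible(List<ContextRow> context,
                             String currentSessionId,
                             String rowTenantId) {
        for (ContextRow c : context) {
            if (c.sessionId.equals(currentSessionId)
                    && "custom".equals(c.contextType)
                    && "tenantId".equals(c.contextKey)
                    && c.contextValue.equals(rowTenantId)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that the sketch tests for existence of a matching context row, whereas a real inner join would repeat a data row once per matching context row.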

> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
>  Issue Type: Wish
>  Components:  Server
>Reporter: Nagarajan Chinnasamy
>Priority: Critical
>  Labels: authentication, context, isolation, jdbcstorage, 
> multi-tenancy
>
> It's known that Views in SQL cannot have custom dynamic parameters/variables.  
> Please refer to [Justin 
> Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to 
> [this SO 
> question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql]
>  in handling dynamic parameterization of views. 
> [The PR #685|https://github.com/apache/drill/pull/685] 
> [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] 
> originated based on this requirement so that we could build views that can 
> dynamically filter records based on some dynamic values (like current 
> tenant-id, user role etc.) 
> *Since Drill's basic unit is a View... having such built-in support can bring 
> in dynamism into the whole game.*
> This feature can be utilized for:
> * *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant 
> Discriminator Column
> * *Data Protection in building Chained Views* with Custom Dynamic Filters
> To explain this further, if we assume that:
> # As and when the user connection is established, we populate session context 
> with session parameters such as:
> #* Tenant ID of the currently logged in user
> #* Roles of the currently logged in user
> # We expose the session context information through context-based-functions 
> such as:
> #* *session_id* -- that returns unique id of the session
> #* *session_parameter('')* - that returns the value of the 
> session parameter
> then a view created with the following kind of query:
> {code}
> create or replace view dynamic_filter_view as select
>    a.field as a_field,
>    b.field as b_field
> from
>    a_table as a
> left join
>    b_table as b
> on
>    a.bId = b.Id
> where
>    session_parameter('tenantId')=a.tenantId
> {code}
> becomes a query that has built-in support for dynamic parameterization that 
> only returns records of the tenant of the currently logged in user. This is a 
> very useful feature in a shared-multi-tenant environment where data is 
> isolated using multi-tenant-discriminator column 'tenantId'.
> When building chained views this feature will be useful in filtering records 
> based on context based parameters.
> This feature will particularly be useful for data isolation / data protection 
> with *jdbc storage plugins* where drill-authenticated-credentials are not 
> passed to jdbc connection authentication. A jdbc storage has hard-coded, 
> shared credentials. Hence the responsibility of data isolation / data 
> protection lies with Views themselves. Hence, the need for built-in support 
> of context based dynamic parameters in Views.



