[jira] [Commented] (DRILL-5068) Add a new system table for completed profiles
[ https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769388#comment-15769388 ] ASF GitHub Bot commented on DRILL-5068: --- GitHub user zbdzzg reopened a pull request: https://github.com/apache/drill/pull/668 DRILL-5068: Add a new system table for completed profiles Add table "sys.profiles" for completed queries. Following fields added: 1. queryID (String) 2. time (Timestamp) 3. latency (long) 4. user (String) 5. query (String) 6. state (String) You can merge this pull request into a Git repository by running: $ git pull https://github.com/zbdzzg/drill profile_query Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/668.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #668 commit e05a999dc8ace315966cbbdb72b3e52d3d956bbd Author: hongze.zhz Date: 2016-12-22T07:47:42Z DRILL-5068: Add a new system table for completed profiles > Add a new system table for completed profiles > - > > Key: DRILL-5068 > URL: https://issues.apache.org/jira/browse/DRILL-5068 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Information Schema >Affects Versions: 1.8.0 > Environment: Fedora 25 > OpenJDK 8 > Firefox 50.0 >Reporter: Hongze Zhang >Assignee: Hongze Zhang > Fix For: Future > > > Hi, > Currently the profile page on UI is still not detailed enough for some > complicated uses (eg. show all failed queries during these three days), we > can only access latest 100 query profiles on this page. > We may sometimes need a specific system table for querying completed profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5068) Add a new system table for completed profiles
[ https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769381#comment-15769381 ] ASF GitHub Bot commented on DRILL-5068: --- Github user zbdzzg closed the pull request at: https://github.com/apache/drill/pull/668 > Add a new system table for completed profiles > - > > Key: DRILL-5068 > URL: https://issues.apache.org/jira/browse/DRILL-5068 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Information Schema >Affects Versions: 1.8.0 > Environment: Fedora 25 > OpenJDK 8 > Firefox 50.0 >Reporter: Hongze Zhang >Assignee: Hongze Zhang > Fix For: Future > > > Hi, > Currently the profile page on UI is still not detailed enough for some > complicated uses (eg. show all failed queries during these three days), we > can only access latest 100 query profiles on this page. > We may sometimes need a specific system table for querying completed profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4602) Avro files dont work if the union format is ["some-type", "null"]
[ https://issues.apache.org/jira/browse/DRILL-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769345#comment-15769345 ] Khurram Faraaz commented on DRILL-4602: --- [~chr1st1anh] you should create a pull request, and someone here will review and merge your fix if all tests run clean. > Avro files dont work if the union format is ["some-type", "null"] > - > > Key: DRILL-4602 > URL: https://issues.apache.org/jira/browse/DRILL-4602 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Avro >Affects Versions: 1.6.0 >Reporter: Christian > Labels: easyfix, patch > Fix For: Future > > Attachments: DRILL-4602.patch > > > An Avro file generated by a different system (e.g. Spark) can have a slightly > different union format that Drill does not understand. For example, > ["some-type", "null"] causes an error, whereas ["null", "some-type"] still > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
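The ordering problem described in DRILL-4602 can be illustrated without the Avro library. The following is only a minimal sketch, assuming a union is modeled as a plain list of type names; `UnionBranches` and `nonNullBranch` are hypothetical names and are not taken from Drill or from the attached patch. The idea is to find the non-null branch by scanning the union rather than by assuming that "null" comes first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Drill or Avro API.
final class UnionBranches {

    // Returns the single non-null branch of a two-branch nullable union,
    // regardless of whether "null" is listed first or second.
    public static String nonNullBranch(List<String> union) {
        if (union.size() != 2 || !union.contains("null")) {
            throw new IllegalArgumentException("not a nullable two-branch union: " + union);
        }
        for (String branch : union) {
            if (!"null".equals(branch)) {
                return branch;
            }
        }
        // Reached only for ["null", "null"].
        throw new IllegalArgumentException("union has no non-null branch: " + union);
    }

    public static void main(String[] args) {
        // Both orderings resolve to the same branch.
        System.out.println(nonNullBranch(Arrays.asList("null", "some-type")));
        System.out.println(nonNullBranch(Arrays.asList("some-type", "null")));
    }
}
```

A position-independent lookup like this treats both union orderings identically, which is the behavior the issue asks for.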
[jira] [Commented] (DRILL-5068) Add a new system table for completed profiles
[ https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769033#comment-15769033 ] Hongze Zhang commented on DRILL-5068: - [~khfaraaz] Hi, is this useful for the next version of Drill? Thanks! > Add a new system table for completed profiles > - > > Key: DRILL-5068 > URL: https://issues.apache.org/jira/browse/DRILL-5068 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Information Schema >Affects Versions: 1.8.0 > Environment: Fedora 25 > OpenJDK 8 > Firefox 50.0 >Reporter: Hongze Zhang >Assignee: Hongze Zhang > Fix For: Future > > > Hi, > Currently the profile page on UI is still not detailed enough for some > complicated uses (eg. show all failed queries during these three days), we > can only access latest 100 query profiles on this page. > We may sometimes need a specific system table for querying completed profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (DRILL-5028) Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS.
[ https://issues.apache.org/jira/browse/DRILL-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongze Zhang closed DRILL-5028. --- Resolution: Later > Opening profiles page from web ui gets very slow when a lot of history files > have been stored in HDFS or Local FS. > -- > > Key: DRILL-5028 > URL: https://issues.apache.org/jira/browse/DRILL-5028 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: Hongze Zhang >Priority: Minor > Fix For: Future > > > We have a Drill cluster with 20+ nodes and we store all history profiles in > HDFS. Without periodic cleanup of HDFS, the profiles page gets > slower as more queries are served. > The code in LocalPersistentStore.java uses fs.list(false, basePath) to > fetch the latest 100 history profiles by default. I guess this operation > blocks the page loading (millions of small files can be stored in basePath); > maybe we can try some other way to reach the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (DRILL-5054) Provide jquery-ui and jquery-dataTables locally
[ https://issues.apache.org/jira/browse/DRILL-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongze Zhang closed DRILL-5054. --- Resolution: Later > Provide jquery-ui and jquery-dataTables locally > --- > > Key: DRILL-5054 > URL: https://issues.apache.org/jira/browse/DRILL-5054 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.8.0 > Environment: Fedora 24 / OpenJDK 8 / FireFox 50.0 >Reporter: Hongze Zhang >Priority: Minor > Fix For: Future > > > Hi, > Currently Drill uses CDN for serving source files of jquery-ui and > jquery-dataTables. This is OK for most cases, but not working in an isolated > environment. > This is a patch adding these files so that Drill will work fine in intranet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-5068) Add a new system table for completed profiles
[ https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongze Zhang reassigned DRILL-5068: --- Assignee: Hongze Zhang > Add a new system table for completed profiles > - > > Key: DRILL-5068 > URL: https://issues.apache.org/jira/browse/DRILL-5068 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Information Schema >Affects Versions: 1.8.0 > Environment: Fedora 25 > OpenJDK 8 > Firefox 50.0 >Reporter: Hongze Zhang >Assignee: Hongze Zhang > Fix For: Future > > > Hi, > Currently the profile page on UI is still not detailed enough for some > complicated uses (eg. show all failed queries during these three days), we > can only access latest 100 query profiles on this page. > We may sometimes need a specific system table for querying completed profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5068) Add a new system table for completed profiles
[ https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongze Zhang updated DRILL-5068: Assignee: (was: Sudheesh Katkam) Reviewer: Khurram Faraaz > Add a new system table for completed profiles > - > > Key: DRILL-5068 > URL: https://issues.apache.org/jira/browse/DRILL-5068 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Information Schema >Affects Versions: 1.8.0 > Environment: Fedora 25 > OpenJDK 8 > Firefox 50.0 >Reporter: Hongze Zhang > Fix For: Future > > > Hi, > Currently the profile page on UI is still not detailed enough for some > complicated uses (eg. show all failed queries during these three days), we > can only access latest 100 query profiles on this page. > We may sometimes need a specific system table for querying completed profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5068) Add a new system table for completed profiles
[ https://issues.apache.org/jira/browse/DRILL-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongze Zhang updated DRILL-5068: Assignee: Sudheesh Katkam Component/s: (was: Metadata) Storage - Information Schema > Add a new system table for completed profiles > - > > Key: DRILL-5068 > URL: https://issues.apache.org/jira/browse/DRILL-5068 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Information Schema >Affects Versions: 1.8.0 > Environment: Fedora 25 > OpenJDK 8 > Firefox 50.0 >Reporter: Hongze Zhang >Assignee: Sudheesh Katkam > Fix For: Future > > > Hi, > Currently the profile page on UI is still not detailed enough for some > complicated uses (eg. show all failed queries during these three days), we > can only access latest 100 query profiles on this page. > We may sometimes need a specific system table for querying completed profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5125) Provide option to use generic code for sv remover
[ https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768855#comment-15768855 ] ASF GitHub Bot commented on DRILL-5125: --- GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/704 DRILL-5125: Provide option to use generic code for sv remover Performance tests showed that, for queries with a large number of columns, it is faster to use a “generic” implementation of the selection vector remover “copier” than to use a generated version. This PR provides "generic" versions of the SV2 and SV4 copiers used by the selection vector remover. The generic forms are enabled using a new boot-time config parameter that is disabled by default (preserving the traditional generated code.) The generic form relies on a "virtual function" (really, just a plain Java function) defined in the base ValueVector class and implemented by each concrete vector: both the pre-defined and generated forms. This form "does the right thing" for the copy operation so that we don't need to generate code just to handle the method dispatch operation (which Java does quite well on its own.) A unit test validates that the generic form works by running the existing SV remover tests with the generic option turned on. See DRILL-5125 for details. 
Add test You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-5125 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/704.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #704 commit ba3a38a403b140149d1605decefae088765ead56 Author: Paul Rogers Date: 2016-12-12T18:06:43Z DRILL-5125: Provide option to use generic code for sv remover Performance tests showed that, for queries with a large number of columns, it is faster to use a “generic” implementation of the selection vector remover “copier” than to use a generated version. This PR provides "generic" versions of the SV2 and SV4 copiers used by the selection vector remover. The generic forms are enabled using a new boot-time config parameter that is disabled by default (preserving the traditional generated code.) The generic form relies on a "virtual function" (really, just a plain Java function) defined in the base ValueVector class and implemented by each concrete vector: both the pre-defined and generated forms. This form "does the right thing" for the copy operation so that we don't need to generate code just to handle the method dispatch operation (which Java does quite well on its own.) A unit test validates that the generic form works by running the existing SV remover tests with the generic option turned on. See DRILL-5125 for details. Add test > Provide option to use generic code for sv remover > - > > Key: DRILL-5125 > URL: https://issues.apache.org/jira/browse/DRILL-5125 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > Consider a non-typical Drill query: one with 6000 rows but 243 fields. 
> Consider this query: > {code} > select * from (select *, row_number() over(order by somedate) as rn from > dfs.`/some/path/data.json`) where rn=10 > {code} > This produces a query with the following structure: > {code} > 00-00Screen > 00-01 ProjectAllowDup(*=[$0], rn=[$1]) > 00-02Project(T0¦¦*=[$0], w0$o0=[$2]) > 00-03 SelectionVectorRemover > 00-04Filter(condition=[=($2, 10)]) > 00-05 Window(window#0=[window(partition {} order by [1] rows > between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])]) > 00-06SelectionVectorRemover > 00-07 Sort(sort0=[$1], dir0=[ASC]) > 00-08Project(T0¦¦*=[$0], validitydate=[$1]) > 00-09 Scan(groupscan=...) > {code} > Instrumenting, the code to measure compile time, two “long poles” stood out: > {code} > Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500 > Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659 > {code} > Much of the initial run time of 5578 ms is taken up in compiling two classes > (2159 ms). > The classes themselves are very simple: create member variables for 486 > vectors (2 x column count), and call a method on each to do the copy. The > only type-specific work is the member variable
[jira] [Created] (DRILL-5152) Enhance the mock data source: better data, SQL access
Paul Rogers created DRILL-5152: -- Summary: Enhance the mock data source: better data, SQL access Key: DRILL-5152 URL: https://issues.apache.org/jira/browse/DRILL-5152 Project: Apache Drill Issue Type: Improvement Components: Tools, Build & Test Affects Versions: 1.9.0 Reporter: Paul Rogers Assignee: Paul Rogers Priority: Minor Drill provides a mock data storage engine that generates random data. The mock engine is used in some older unit tests that need a volume of data, but that are not too particular about the details of the data. The mock data source continues to have use even for modern tests. For example, the work in the external storage batch requires tests with varying amounts of data, but the exact form of the data is not important, just the quantity. For example, if we want to ensure that spilling happens at various trigger points, we need to read the right amount of data for that trigger. The existing mock data source has two limitations: 1. It generates only "black/white" (alternating) values, which is awkward for use in sorting. 2. The mock generator is accessible only from a physical plan, but not from SQL queries. This enhancement proposes to fix both limitations: 1. Generate a uniform, randomly distributed set of values. 2. Provide an encoding that lets a SQL query specify the data to be generated. Example SQL query: {code} SELECT id_i, name_s50 FROM `mock`.employee_10K; {code} The above says to generate two fields: INTEGER (the "_i" suffix) and VARCHAR(50) (the "_s50") suffix; and to generate 10,000 rows (the "_10K" suffix on the table.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
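The naming convention in the example query above can be sketched as a small decoder. This is an illustrative sketch only: the class `MockNameSketch`, its methods, and the regular expressions are hypothetical and are not the mock storage engine's actual code. It covers just the three suffixes the issue mentions (`_i`, `_s<width>`, and `_<n>K`). Treating a table suffix without `K` as a literal row count is an assumption made here, not something the issue states.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical decoder for the suffix convention described in DRILL-5152.
final class MockNameSketch {

    // Column name: <name>_i  or  <name>_s<width>
    private static final Pattern COLUMN = Pattern.compile("(\\w+)_(i|s(\\d+))");
    // Table name: <name>_<count> with an optional K (thousands) multiplier
    private static final Pattern TABLE = Pattern.compile("(\\w+)_(\\d+)(K?)");

    // Maps a column suffix to the SQL type it requests.
    public static String columnType(String column) {
        Matcher m = COLUMN.matcher(column);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized column name: " + column);
        }
        return m.group(2).equals("i") ? "INTEGER" : "VARCHAR(" + m.group(3) + ")";
    }

    // Maps a table suffix to the number of rows to generate.
    public static long rowCount(String table) {
        Matcher m = TABLE.matcher(table);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized table name: " + table);
        }
        long n = Long.parseLong(m.group(2));
        // Assumption: no K suffix means the number is the row count itself.
        return m.group(3).isEmpty() ? n : n * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(columnType("id_i"));
        System.out.println(columnType("name_s50"));
        System.out.println(rowCount("employee_10K"));
    }
}
```

Encoding the schema in the identifiers keeps the SQL itself ordinary, so a mock table can be queried without any new syntax.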
[jira] [Commented] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan
[ https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768830#comment-15768830 ] ASF GitHub Bot commented on DRILL-5104: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/703#discussion_r93558749 --- Diff: logical/src/main/java/org/apache/drill/common/logical/PlanProperties.java --- @@ -112,8 +121,13 @@ public PlanPropertiesBuilder generator(Generator generator) { return this; } +public PlanPropertiesBuilder generator(boolean hasResourcePlan) { --- End diff -- Fixed. > Foreman sets external sort memory allocation even for a physical plan > - > > Key: DRILL-5104 > URL: https://issues.apache.org/jira/browse/DRILL-5104 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > > Consider the (disabled) unit test > {{TestSimpleExternalSort.outOfMemoryExternalSort}} which uses the physical > plan {{xsort/oom_sort_test.json}} that contains a setting for the amount of > memory to allocate: > {code} >{ > ... > pop:"external-sort", > ... > initialAllocation: 100, > maxAllocation: 3000 > }, > {code} > When run, the amount of memory is set to 715827882. The reason is that code > was added to {{Foreman}} to compute the memory to allocate to the external > sort: > {code} > private void runPhysicalPlan(final PhysicalPlan plan) throws > ExecutionSetupException { > validatePlan(plan); > MemoryAllocationUtilities.setupSortMemoryAllocations(plan, queryContext); > {code} > The problem is that a physical plan should execute as provided to enable > detailed testing. > To solve this problem, move the sort memory setup to the path taken by SQL > queries, but not via physical plans. > This change is necessary to re-enable the previously-disabled external sort > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan
[ https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768813#comment-15768813 ] ASF GitHub Bot commented on DRILL-5104: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/703#discussion_r93556996 --- Diff: logical/src/main/java/org/apache/drill/common/logical/PlanProperties.java --- @@ -112,8 +121,13 @@ public PlanPropertiesBuilder generator(Generator generator) { return this; } +public PlanPropertiesBuilder generator(boolean hasResourcePlan) { --- End diff -- The method's name should not be **generator** but something about having a resource plan > Foreman sets external sort memory allocation even for a physical plan > - > > Key: DRILL-5104 > URL: https://issues.apache.org/jira/browse/DRILL-5104 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > > Consider the (disabled) unit test > {{TestSimpleExternalSort.outOfMemoryExternalSort}} which uses the physical > plan {{xsort/oom_sort_test.json}} that contains a setting for the amount of > memory to allocate: > {code} >{ > ... > pop:"external-sort", > ... > initialAllocation: 100, > maxAllocation: 3000 > }, > {code} > When run, the amount of memory is set to 715827882. The reason is that code > was added to {{Foreman}} to compute the memory to allocate to the external > sort: > {code} > private void runPhysicalPlan(final PhysicalPlan plan) throws > ExecutionSetupException { > validatePlan(plan); > MemoryAllocationUtilities.setupSortMemoryAllocations(plan, queryContext); > {code} > The problem is that a physical plan should execute as provided to enable > detailed testing. > To solve this problem, move the sort memory setup to the path taken by SQL > queries, but not via physical plans. > This change is necessary to re-enable the previously-disabled external sort > tests. 
[jira] [Commented] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan
[ https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768720#comment-15768720 ] ASF GitHub Bot commented on DRILL-5104: --- GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/703 DRILL-5104: Foreman should not set sort memory for a physical plan Physical plans include a plan for memory allocations. However, the code path in Foreman replans external sort memory, even for a physical plan. This makes it impossible to use a physical plan to test memory configuration. This change avoids changing memory settings in a physical plan; while preserving the adjustments for logical plans or SQL queries. Revised to put a property in the plan itself. Old plans, and those generated from SQL, will have memory allocations applied. Plans marked as already "resource management" planned will be used as-is. Includes a unit test that demonstrates the new behavior. You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-5104 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/703.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #703 commit 25a7f9ed45b97f9be5971fa979c1b408a6311d8e Author: Paul Rogers Date: 2016-12-13T22:36:42Z DRILL-5104: Foreman should not set external sort memory for a physical plan Physical plans include a plan for memory allocations. However, the code path in Foreman replans external sort memory, even for a physical plan. This makes it impossible to use a physical plan to test memory configuration. This change avoids changing memory settings in a physical plan; while preserving the adjustments for logical plans or SQL queries. Revised to put a property in the plan itself. 
Old plans, and those generated from SQL, will have memory allocations applied. Plans marked as already "resource management" planned will be used as-is. Includes a unit test that demonstrates the new behavior. > Foreman sets external sort memory allocation even for a physical plan > - > > Key: DRILL-5104 > URL: https://issues.apache.org/jira/browse/DRILL-5104 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > > Consider the (disabled) unit test > {{TestSimpleExternalSort.outOfMemoryExternalSort}} which uses the physical > plan {{xsort/oom_sort_test.json}} that contains a setting for the amount of > memory to allocate: > {code} >{ > ... > pop:"external-sort", > ... > initialAllocation: 100, > maxAllocation: 3000 > }, > {code} > When run, the amount of memory is set to 715827882. The reason is that code > was added to {{Foreman}} to compute the memory to allocate to the external > sort: > {code} > private void runPhysicalPlan(final PhysicalPlan plan) throws > ExecutionSetupException { > validatePlan(plan); > MemoryAllocationUtilities.setupSortMemoryAllocations(plan, queryContext); > {code} > The problem is that a physical plan should execute as provided to enable > detailed testing. > To solve this problem, move the sort memory setup to the path taken by SQL > queries, but not via physical plans. > This change is necessary to re-enable the previously-disabled external sort > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe
[ https://issues.apache.org/jira/browse/DRILL-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768644#comment-15768644 ] Chunhui Shi commented on DRILL-5151: The fix is on calcite side. > ConventionTraitDef.plannerConversionMap is not thread safe > -- > > Key: DRILL-5151 > URL: https://issues.apache.org/jira/browse/DRILL-5151 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Chunhui Shi >Assignee: Chunhui Shi > > We are using static instance ConventionTraitDef.INSTANCE globally and > plannerConversionMap(class WeakHashMap) defined in ConventionTraitDef class > is not threadsafe. And the data in the map could corrupt and cause dead loop > or other data error. > > private final WeakHashMap > plannerConversionMap = > new WeakHashMap(); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe
[ https://issues.apache.org/jira/browse/DRILL-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunhui Shi updated DRILL-5151: --- Priority: Major (was: Critical) > ConventionTraitDef.plannerConversionMap is not thread safe > -- > > Key: DRILL-5151 > URL: https://issues.apache.org/jira/browse/DRILL-5151 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Chunhui Shi >Assignee: Chunhui Shi > > We are using static instance ConventionTraitDef.INSTANCE globally and > plannerConversionMap(class WeakHashMap) defined in ConventionTraitDef class > is not threadsafe. And the data in the map could corrupt and cause dead loop > or other data error. > > private final WeakHashMap > plannerConversionMap = > new WeakHashMap(); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe
Chunhui Shi created DRILL-5151: -- Summary: ConventionTraitDef.plannerConversionMap is not thread safe Key: DRILL-5151 URL: https://issues.apache.org/jira/browse/DRILL-5151 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Chunhui Shi Assignee: Chunhui Shi Priority: Critical We use the static instance ConventionTraitDef.INSTANCE globally, and plannerConversionMap (a WeakHashMap) defined in the ConventionTraitDef class is not thread-safe. The data in the map could become corrupted and cause an infinite loop or other data errors. private final WeakHashMap plannerConversionMap = new WeakHashMap(); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
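For context, one common way to guard a shared `WeakHashMap` is to wrap it with `Collections.synchronizedMap`. The sketch below shows only that general technique. It is not the fix adopted in Calcite, which this thread does not describe. The class name `SharedConversionCache`, the `Object` key, and the `String` value are placeholders chosen for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical cache illustrating a synchronized wrapper around WeakHashMap.
final class SharedConversionCache {

    // Every access goes through the wrapper's lock, so concurrent callers
    // cannot corrupt the underlying hash table.
    private final Map<Object, String> cache =
            Collections.synchronizedMap(new WeakHashMap<>());

    // Returns the cached value for the key, creating it on first use.
    public String lookup(Object planner) {
        return cache.computeIfAbsent(planner, p -> "conversion-data");
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SharedConversionCache shared = new SharedConversionCache();
        Object key = new Object(); // kept strongly reachable for the whole run
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    shared.lookup(key);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(shared.size());
    }
}
```

A synchronized wrapper serializes all access, which is simple but can become a contention point; whether that trade-off suits the planner is outside what this thread covers.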
[jira] [Commented] (DRILL-5150) JDBC connections cause drillbit leaks resources and eventually JVM crashes
[ https://issues.apache.org/jira/browse/DRILL-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768354#comment-15768354 ] Chun Chang commented on DRILL-5150: --- Forgot to mention this happened with impersonation enabled. > JDBC connections cause drillbit leaks resources and eventually JVM crashes > -- > > Key: DRILL-5150 > URL: https://issues.apache.org/jira/browse/DRILL-5150 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.9.0 >Reporter: Chun Chang > Attachments: hs_err_pid22724.log > > > Stress test JDBC connections by making connections and disconnect. Very soon, > drillbit will crash due to resource leaks. This was observed with Apache > DRILL JDBC driver. Testing with a third party driver did not cause the crash. > Will upload JVM dump. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5150) JDBC connections cause drillbit leaks resources and eventually JVM crashes
[ https://issues.apache.org/jira/browse/DRILL-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chang updated DRILL-5150: -- Attachment: hs_err_pid22724.log > JDBC connections cause drillbit leaks resources and eventually JVM crashes > -- > > Key: DRILL-5150 > URL: https://issues.apache.org/jira/browse/DRILL-5150 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.9.0 >Reporter: Chun Chang > Attachments: hs_err_pid22724.log > > > Stress test JDBC connections by making connections and disconnect. Very soon, > drillbit will crash due to resource leaks. This was observed with Apache > DRILL JDBC driver. Testing with a third party driver did not cause the crash. > Will upload JVM dump. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5150) JDBC connections cause drillbit leaks resources and eventually JVM crashes
Chun Chang created DRILL-5150: - Summary: JDBC connections cause drillbit leaks resources and eventually JVM crashes Key: DRILL-5150 URL: https://issues.apache.org/jira/browse/DRILL-5150 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Affects Versions: 1.9.0 Reporter: Chun Chang Stress-test JDBC connections by repeatedly connecting and disconnecting. Very soon, the drillbit crashes due to resource leaks. This was observed with the Apache Drill JDBC driver. Testing with a third-party driver did not cause the crash. Will upload the JVM dump. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5149) Planner Optimization : Filter should get pushed into the sub-query
Rahul Challapalli created DRILL-5149:

Summary: Planner Optimization : Filter should get pushed into the sub-query
Key: DRILL-5149
URL: https://issues.apache.org/jira/browse/DRILL-5149
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli

git.commit.id.abbrev=cf2b7c7

The below plan can be optimized to push the filter into the subquery and also to eliminate redundant projects

{code}
explain plan for select * from (select * from dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by columns[0]) d where d.columns[0] = '4041054511';

00-00    Screen : rowType = RecordType(ANY *): rowcount = 1.436392845E7, cumulative cost = {8.776360282950001E8 rows, 1.4059092422298168E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11452
00-01      Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.436392845E7, cumulative cost = {8.7619963545E8 rows, 1.4057656029453169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11451
00-02        SelectionVectorRemover : rowType = RecordType(ANY T18¦¦*): rowcount = 1.436392845E7, cumulative cost = {8.7619963545E8 rows, 1.4057656029453169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11450
00-03          Filter(condition=[=(ITEM(ITEM($0, 'columns'), 0), '4041054511')]) : rowType = RecordType(ANY T18¦¦*): rowcount = 1.436392845E7, cumulative cost = {8.61835707E8 rows, 1.4043292101003168E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11449
00-04            Project(T18¦¦*=[$0]) : rowType = RecordType(ANY T18¦¦*): rowcount = 9.5759523E7, cumulative cost = {7.66076184E8 rows, 1.3602798295203169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11448
00-05              SingleMergeExchange(sort0=[1 ASC]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {7.66076184E8 rows, 1.3602798295203169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11447
01-01                SelectionVectorRemover : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {6.70316661E8 rows, 1.2836722111203169E10 cpu, 0.0 io, 1.176693018624E12 network, 1.532152368E9 memory}, id = 11446
01-02                  Sort(sort0=[$1], dir0=[ASC]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {5.74557138E8 rows, 1.2740962588203169E10 cpu, 0.0 io, 1.176693018624E12 network, 1.532152368E9 memory}, id = 11445
01-03                    Project(T18¦¦*=[$0], EXPR$1=[$1]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {4.78797615E8 rows, 2.585507121E9 cpu, 0.0 io, 1.176693018624E12 network, 0.0 memory}, id = 11444
01-04                      HashToRandomExchange(dist0=[[$1]]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 9.5759523E7, cumulative cost = {4.78797615E8 rows, 2.585507121E9 cpu, 0.0 io, 1.176693018624E12 network, 0.0 memory}, id = 11443
02-01                        UnorderedMuxExchange : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 9.5759523E7, cumulative cost = {3.83038092E8 rows, 1.053354753E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11442
03-01                          Project(T18¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 9.5759523E7, cumulative cost = {2.87278569E8 rows, 9.5759523E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11441
03-02                            Project(T18¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {1.91519046E8 rows, 5.74557138E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11440
03-03                              Project(T18¦¦*=[$0], columns=[$1]) : rowType = RecordType(ANY T18¦¦*, ANY columns): rowcount = 9.5759523E7, cumulative cost = {9.5759523E7 rows, 1.91519046E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11439
03-04                                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/5kwidecolumns_500k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/5kwidecolumns_500k.tbl]]]) : rowType = (DrillRecordRow[*, columns]): rowcount = 9.5759523E7, cumulative cost = {9.5759523E7 rows, 1.91519046E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11438
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
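The rewrite requested in DRILL-5149 relies on an equality filter commuting with a sort: filtering first yields the same rows in the same order, while the sort handles far fewer rows. Below is a minimal Python sketch of that equivalence. It is not Drill planner code; the sample rows and both helper names are made up for illustration.

```python
# Illustrative only: plain Python lists stand in for Drill record batches.
# Each row is a list of column values; column 0 plays the role of columns[0].
rows = [["4041054511", "a"], ["17", "b"], ["4041054511", "c"], ["9", "d"]]


def sort_then_filter(data, value):
    """Shape of the reported plan: Sort sees every row, Filter runs last."""
    ordered = sorted(data, key=lambda r: r[0])
    return [r for r in ordered if r[0] == value]


def filter_then_sort(data, value):
    """Shape of the requested plan: Filter is pushed below the Sort."""
    kept = [r for r in data if r[0] == value]
    return sorted(kept, key=lambda r: r[0])
```

Both functions return the same list for the same input; the second one sorts only the rows that survive the filter, which is the saving the issue asks the planner to find.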
[jira] [Created] (DRILL-5148) Replace hash-distribution with a simple round-robin distribution for a simple order by query
Rahul Challapalli created DRILL-5148:

Summary: Replace hash-distribution with a simple round-robin distribution for a simple order by query
Key: DRILL-5148
URL: https://issues.apache.org/jira/browse/DRILL-5148
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators, Query Planning & Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli

git.commit.id.abbrev=cf2b7c7

The below plan indicates that we use hash-distribution to avoid data skew. However in the below case a simple round-robin approach would be sufficient

{code}
explain plan for select * from dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by columns[0];
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T2¦¦*=[$0])
00-03          SingleMergeExchange(sort0=[1 ASC])
01-01            SelectionVectorRemover
01-02              Sort(sort0=[$1], dir0=[ASC])
01-03                Project(T2¦¦*=[$0], EXPR$1=[$1])
01-04                  HashToRandomExchange(dist0=[[$1]])
02-01                    UnorderedMuxExchange
03-01                      Project(T2¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)])
03-02                        Project(T2¦¦*=[$0], EXPR$1=[ITEM($1, 0)])
03-03                          Project(T2¦¦*=[$0], columns=[$1])
03-04                            Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/5kwidecolumns_500k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/5kwidecolumns_500k.tbl]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
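To illustrate the point made in DRILL-5148, here is a small Python sketch that compares how many rows each receiving fragment gets under the two schemes. It only models row counts; it is not Drill's exchange implementation. The modulo "hash", the key values and the function names are assumptions chosen for the example.

```python
def hash_distribute(keys, n_receivers):
    """Count rows per receiver when every row is routed by its key.

    A toy hash (key modulo receiver count) stands in for hash32AsDouble.
    """
    counts = [0] * n_receivers
    for key in keys:
        counts[key % n_receivers] += 1
    return counts


def round_robin_distribute(keys, n_receivers):
    """Count rows per receiver when rows are dealt out in turn."""
    counts = [0] * n_receivers
    for position, _ in enumerate(keys):
        counts[position % n_receivers] += 1
    return counts


# 90 rows share one sort key, 10 rows have distinct keys.
keys = [7] * 90 + list(range(10))
```

With four receivers, the hash scheme sends all 90 equal keys to one receiver, while round-robin gives each receiver 25 rows. Since each fragment sorts locally and the results are merged afterwards, the sketch suggests why an even split would be enough for a plain ORDER BY.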
[jira] [Assigned] (DRILL-5147) Doc update: Support impersonation through Web Console
[ https://issues.apache.org/jira/browse/DRILL-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens reassigned DRILL-5147: - Assignee: Bridget Bevens > Doc update: Support impersonation through Web Console > - > > Key: DRILL-5147 > URL: https://issues.apache.org/jira/browse/DRILL-5147 > Project: Apache Drill > Issue Type: Task > Components: Documentation >Reporter: Bridget Bevens >Assignee: Bridget Bevens >Priority: Minor > > Maybe the doc should say that Drill supports impersonation through web > console. These clients use Java client library, just like JDBC. > Note that *inbound* impersonation is not supported yet because Drill does not > expose an “impersonation_target” field through the web login form. > Thank you, > Sudheesh > > On Dec 21, 2016, at 10:08 AM, Akihiko Kusanagi > > wrote: > > > > Hi, > > > > The 'Impersonation Support' table In the following page says that > > impersonation > > is not supported with Drill Web Console or REST API. > > http://drill.apache.org/docs/configuring-user-impersonation/ > > > > However, when authentication and impersonation are enabled, impersonation is > > in effect through Web UI. > > > > $ cat drill-override.conf > > ... > > drill.exec: { > > ... > > impersonation: { > > enabled: true > > }, > > ... > > > > Only mapr user has read permission for nation.parquet, and Drillbit is > > running as mapr user. > > > > $ hadoop fs -ls /sample-data > > ... > > drwx-- - mapr mapr 1210 2016-01-11 19:58 nation.parquet > > ... > > > > Then, login as the other user via Drill Web UI, and run this query: > > > > select * from dfs.`/sample-data/nation.parquet` > > > > This returns the following error, so it seems that impersonation is in > > effect. 
> > > > Query Failed: An Error Occurred > > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > > IOException: 2049.177.8452826 /sample-data/nation.parquet (Input/output > > error) Fragment 0:0 [Error Id: 91684467-8a4f-4fb8-8ad7-6ee04b7f8f53 on > > node3:31010] > > > > When drill.exec.impersonation.enabled = false, the query above returns > > multiple rows. > > > > Is this expected behavior? Does the document need to be updated? > > > > Thanks, > > Aki -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5147) Doc update: Support impersonation through Web Console
Bridget Bevens created DRILL-5147: - Summary: Doc update: Support impersonation through Web Console Key: DRILL-5147 URL: https://issues.apache.org/jira/browse/DRILL-5147 Project: Apache Drill Issue Type: Task Components: Documentation Reporter: Bridget Bevens Priority: Minor Maybe the doc should say that Drill supports impersonation through web console. These clients use Java client library, just like JDBC. Note that *inbound* impersonation is not supported yet because Drill does not expose an “impersonation_target” field through the web login form. Thank you, Sudheesh > On Dec 21, 2016, at 10:08 AM, Akihiko Kusanagi wrote: > > Hi, > > The 'Impersonation Support' table In the following page says that > impersonation > is not supported with Drill Web Console or REST API. > http://drill.apache.org/docs/configuring-user-impersonation/ > > However, when authentication and impersonation are enabled, impersonation is > in effect through Web UI. > > $ cat drill-override.conf > ... > drill.exec: { > ... > impersonation: { > enabled: true > }, > ... > > Only mapr user has read permission for nation.parquet, and Drillbit is > running as mapr user. > > $ hadoop fs -ls /sample-data > ... > drwx-- - mapr mapr 1210 2016-01-11 19:58 nation.parquet > ... > > Then, login as the other user via Drill Web UI, and run this query: > > select * from dfs.`/sample-data/nation.parquet` > > This returns the following error, so it seems that impersonation is in > effect. > > Query Failed: An Error Occurred > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IOException: 2049.177.8452826 /sample-data/nation.parquet (Input/output > error) Fragment 0:0 [Error Id: 91684467-8a4f-4fb8-8ad7-6ee04b7f8f53 on > node3:31010] > > When drill.exec.impersonation.enabled = false, the query above returns > multiple rows. > > Is this expected behavior? Does the document need to be updated? > > Thanks, > Aki -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5088) Error when reading DBRef column
[ https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768134#comment-15768134 ] ASF GitHub Bot commented on DRILL-5088: --- GitHub user chunhui-shi opened a pull request: https://github.com/apache/drill/pull/702 DRILL-5088: set default codec for toJson You can merge this pull request into a Git repository by running: $ git pull https://github.com/chunhui-shi/drill DRILL-5088 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/702.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #702 commit c285a334eaeda810150ff162d1e1c0da342a37ff Author: chunhui-shi Date: 2016-12-18T08:27:50Z DRILL-5088: set default codec for toJson > Error when reading DBRef column > --- > > Key: DRILL-5088 > URL: https://issues.apache.org/jira/browse/DRILL-5088 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types > Environment: drill 1.9.0 > mongo 3.2 >Reporter: Guillaume Champion >Assignee: Chunhui Shi > > In a mongo database with DBRef, when a DBRef is inserted in the first line of > a mongo's collection drill query failed : > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for > class com.mongodb.DBRef. > {code} > Simple example to reproduce: > In mongo instance > {code} > db.contact2.drop(); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" > : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) }); > {code} > In drill : > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for > class com.mongodb.DBRef. 
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] > (state=,code=0) > {code} > If the first line doesn't contain the DBRef, Drill queries the collection correctly: > In a mongo instance : > {code} > db.contact2.drop(); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") }); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" > : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) }); > {code} > In drill : > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > +--+---+ > | _id |account > | > +--+---+ > | {"$oid":"582081d96b69060001fd8939"} | {"$id":{}} > | > | {"$oid":"582081d96b69060001fd8938"} | > {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}} | > +--+---+ > 2 rows selected (0,563 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-5088) Error when reading DBRef column
[ https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunhui Shi reassigned DRILL-5088: -- Assignee: Chunhui Shi > Error when reading DBRef column > --- > > Key: DRILL-5088 > URL: https://issues.apache.org/jira/browse/DRILL-5088 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types > Environment: drill 1.9.0 > mongo 3.2 >Reporter: Guillaume Champion >Assignee: Chunhui Shi > > In a mongo database with DBRef, when a DBRef is inserted in the first line of > a mongo's collection drill query failed : > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for > class com.mongodb.DBRef. > {code} > Simple example to reproduce: > In mongo instance > {code} > db.contact2.drop(); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" > : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) }); > {code} > In drill : > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for > class com.mongodb.DBRef. 
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] > (state=,code=0) > {code} > If the first line doesn't contain the DBRef, Drill queries the collection correctly: > In a mongo instance : > {code} > db.contact2.drop(); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") }); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" > : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) }); > {code} > In drill : > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > +--+---+ > | _id |account > | > +--+---+ > | {"$oid":"582081d96b69060001fd8939"} | {"$id":{}} > | > | {"$oid":"582081d96b69060001fd8938"} | > {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}} | > +--+---+ > 2 rows selected (0,563 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5132) Context based dynamic parameterization of views
[ https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767899#comment-15767899 ] Sudheesh Katkam commented on DRILL-5132: Would UDFs provide a simpler solution? For example, {code} CREATE VIEW my_salary AS SELECT a.salary FROM a_table AS a WHERE get_tenant_id(session_user()) = a.tenantId; {code} The get_tenant_id UDF could make a call to the external system to get the tenant id. Similar UDFs for other parameters. > Context based dynamic parameterization of views > --- > > Key: DRILL-5132 > URL: https://issues.apache.org/jira/browse/DRILL-5132 > Project: Apache Drill > Issue Type: Wish > Components: Server >Reporter: Nagarajan Chinnasamy >Priority: Critical > Labels: authentication, context, isolation, jdbcstorage, > multi-tenancy, session-context, session-parameter > > *Requirement* > Its known that Views in SQL cannot have custom dynamic parameters/variables. > Please refer to [Justin > Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to > [this SO > question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql] > in handling dynamic parameterization of views. > [The PR #685|https://github.com/apache/drill/pull/685] > [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] > originated based on this requirement so that we could build views that can > dynamically filter records based on some dynamic values (like current > tenant-id, user role etc.) > *Since Drill's basic unit is a View... 
> having such built-in support can bring > in dynamism into the whole game.* > This feature can be utilized for: > * *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant > Discriminator Column > * *Data Protection in building Chained Views* with Custom Dynamic Filters > To explain this further, if we assume that: > # When the user connection is established, we populate session context > with session parameters such as: > #* Tenant ID of the currently logged in user > #* Roles of the currently logged in user > # We expose the session context information through context-based-functions > such as: > #* *session_id* -- that returns unique id of the session > #* *session_parameter('')* - that returns the value of the > session parameter > then a view created with the following kind of query: > {code} > create or replace view dynamic_filter_view as select >a.field as a_field, >b.field as b_field > from >a_table as a > left join >b_table as b > on >a.bId = b.Id > where >session_parameter('tenantId')=a.tenantId > {code} > becomes a query that has built-in support for dynamic parameterization that > only returns records of the tenant of the currently logged in user. This is a > very useful feature in a shared-multi-tenant environment where data is > isolated using multi-tenant-discriminator column 'tenantId'. > When building chained views this feature will be useful in filtering records > based on context based parameters. > This feature will particularly be useful for data isolation / data protection > with *jdbc storage plugins* where drill-authenticated-credentials are not > passed to jdbc connection authentication. A jdbc storage has hard-coded, > shared credentials. Hence the responsibility of data isolation / data > protection lies with Views themselves. Hence, the need for built-in support > of context based dynamic parameters in Views. 
> *Design/Implementation Considerations:* > * Session parameters can be obtained through authenticators so that custom > authenticators can return a HashMap of parameters obtained from external > systems. > * Introduce SessionContext to hold sessionId and sessionParameters > * Implement context based functions session_id and session_parameter() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
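The proposal in DRILL-5132 can be pictured with a short Python sketch: a session context carries per-connection parameters, and a view-like function filters rows by the caller's tenant. Nothing here is Drill API. `SessionContext`, `session_parameter` and `dynamic_filter_view` are hypothetical names that mirror the wording of the issue, and the rows are invented.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    """Hypothetical holder for the session id and session parameters."""
    session_id: str
    parameters: dict = field(default_factory=dict)


def session_parameter(ctx, name):
    """Return the value of a session parameter, or None if it is not set."""
    return ctx.parameters.get(name)


def dynamic_filter_view(ctx, a_table):
    """Mimic `where session_parameter('tenantId') = a.tenantId`."""
    tenant = session_parameter(ctx, "tenantId")
    if tenant is None:
        # Like a SQL comparison with NULL, an unset parameter matches no row.
        return []
    return [row for row in a_table if row["tenantId"] == tenant]
```

A session created with `SessionContext("s-1", {"tenantId": "t1"})` would see only rows whose `tenantId` is `t1`, and a session without the parameter would see none, which is the isolation behaviour the issue describes for shared JDBC credentials.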
[jira] [Created] (DRILL-5146) Unnecessary spilling to disk by sort when we only have 5000 rows with one column
Rahul Challapalli created DRILL-5146: Summary: Unnecessary spilling to disk by sort when we only have 5000 rows with one column Key: DRILL-5146 URL: https://issues.apache.org/jira/browse/DRILL-5146 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Rahul Challapalli git.commit.id.abbrev=cf2b7c7 The below query spills to disk for the sort. The dataset contains 5000 files and each file contains a single record. {code} select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by columns[1]; {code} Environment: {code} DRILL_MAX_DIRECT_MEMORY="16G" DRILL_MAX_HEAP="4G" {code} I attached the dataset, logs and the profile. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables
[ https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767686#comment-15767686 ] ASF GitHub Bot commented on DRILL-5137: --- Github user adityakishore commented on a diff in the pull request: https://github.com/apache/drill/pull/700#discussion_r93487747 --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java --- @@ -124,6 +124,11 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su } else { rowKeyOnly = false; transformed.add(ROW_KEY_PATH); + /* DRILL-5137 - optimize count(*) queries on MapR-DB Binary tables */ + if (isSkipQuery()) { --- End diff -- Further optimization can be made by returning only a `count` vector in the `next()` call, similar to [this](https://github.com/apache/drill/blob/master/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java#L203-L204). > Optimize count(*) queries on MapR-DB Binary Tables > -- > > Key: DRILL-5137 > URL: https://issues.apache.org/jira/browse/DRILL-5137 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - HBase, Storage - MapRDB >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Smidth Panchamia > > This is related to DRILL-5065, but applies to MapR-DB Binary tables -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables
[ https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767685#comment-15767685 ] ASF GitHub Bot commented on DRILL-5137: --- Github user adityakishore commented on a diff in the pull request: https://github.com/apache/drill/pull/700#discussion_r93486944 --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java --- @@ -124,6 +124,11 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su } else { rowKeyOnly = false; transformed.add(ROW_KEY_PATH); + /* DRILL-5137 - optimize count(*) queries on MapR-DB Binary tables */ --- End diff -- This branches into the else part of `if (!isStarQuery()) {` Can you verify if a query can be both Star and Skip query at the same time when count(*) has been requested? > Optimize count(*) queries on MapR-DB Binary Tables > -- > > Key: DRILL-5137 > URL: https://issues.apache.org/jira/browse/DRILL-5137 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - HBase, Storage - MapRDB >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Smidth Panchamia > > This is related to DRILL-5065, but applies to MapR-DB Binary tables -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5132) Context based dynamic parameterization of views
[ https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nagarajan Chinnasamy updated DRILL-5132: Labels: authentication context isolation jdbcstorage multi-tenancy session-context session-parameter (was: authentication context isolation jdbcstorage multi-tenancy) > Context based dynamic parameterization of views > --- > > Key: DRILL-5132 > URL: https://issues.apache.org/jira/browse/DRILL-5132 > Project: Apache Drill > Issue Type: Wish > Components: Server >Reporter: Nagarajan Chinnasamy >Priority: Critical > Labels: authentication, context, isolation, jdbcstorage, > multi-tenancy, session-context, session-parameter > > *Requirement* > Its known that Views in SQL cannot have custom dynamic parameters/variables. > Please refer to [Justin > Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to > [this SO > question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql] > in handling dynamic parameterization of views. > [The PR #685|https://github.com/apache/drill/pull/685] > [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] > originated based on this requirement so that we could build views that can > dynamically filter records based on some dynamic values (like current > tenant-id, user role etc.) > *Since Drill's basic unit is a View... 
> having such built-in support can bring > in dynamism into the whole game.* > This feature can be utilized for: > * *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant > Discriminator Column > * *Data Protection in building Chained Views* with Custom Dynamic Filters > To explain this further, if we assume that: > # When the user connection is established, we populate session context > with session parameters such as: > #* Tenant ID of the currently logged in user > #* Roles of the currently logged in user > # We expose the session context information through context-based-functions > such as: > #* *session_id* -- that returns unique id of the session > #* *session_parameter('')* - that returns the value of the > session parameter > then a view created with the following kind of query: > {code} > create or replace view dynamic_filter_view as select >a.field as a_field, >b.field as b_field > from >a_table as a > left join >b_table as b > on >a.bId = b.Id > where >session_parameter('tenantId')=a.tenantId > {code} > becomes a query that has built-in support for dynamic parameterization that > only returns records of the tenant of the currently logged in user. This is a > very useful feature in a shared-multi-tenant environment where data is > isolated using multi-tenant-discriminator column 'tenantId'. > When building chained views this feature will be useful in filtering records > based on context based parameters. > This feature will particularly be useful for data isolation / data protection > with *jdbc storage plugins* where drill-authenticated-credentials are not > passed to jdbc connection authentication. A jdbc storage has hard-coded, > shared credentials. Hence the responsibility of data isolation / data > protection lies with Views themselves. Hence, the need for built-in support > of context based dynamic parameters in Views. 
> *Design/Implementation Considerations:* > * Session parameters can be obtained through authenticators so that custom > authenticators can return a HashMap of parameters obtained from external > systems. > * Introduce SessionContext to hold sessionId and sessionParameters > * Implement context based functions session_id and session_parameter() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs
[ https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767219#comment-15767219 ] ASF GitHub Bot commented on DRILL-4963: --- GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/701 DRILL-4963: Sync remote and local function registries before query execution Lazy-init was performed only when a function was not found during Calcite parsing, but DRILL-4963 shows cases where Calcite parsing passes (usually with function overloading) and the function is still not found. To handle such cases, we need to sync the remote and local function registries before query execution. To make this sync as lightweight as possible, we first compare the remote and local function registry versions and start looking for missing jars only when the versions do not match. Here the local function registry version means the remote function registry version with which the local function registry was last synchronized. Changes: 1. Add `consists` method to the PersistentStore interface, which returns true if the key exists in the store and false otherwise. This method is needed to return only the remote function registry version without its content (unlike method `get`). We'll pull the remote function registry content only if the versions are different. 2. Added a check that the remote and local function registries are in sync before query execution at the planning and execution stages. 3. Removed unused methods and changes connected with the lazy-init implementation that ran on failure only. 4. Added additional debug messages for `CreateFunctionHandler` and `DropFunctionHandler`. 5. Updated unit tests to reflect the new changes. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/arina-ielchiieva/drill DRILL-4963 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/701.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #701 commit 51ef6614a2c27cb6bb58fb0de875952f99e9b102 Author: Arina Ielchiieva Date: 2016-12-20T16:57:15Z DRILL-4963: Sync remote and local function registries before query execution > Issues when overloading Drill native functions with dynamic UDFs > > > Key: DRILL-4963 > URL: https://issues.apache.org/jira/browse/DRILL-4963 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.9.0 >Reporter: Roman >Assignee: Arina Ielchiieva > Fix For: Future > > Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, > test_overloading-1.0-sources.jar, test_overloading-1.0.jar > > > I created a jar file which overloads 3 Drill native functions > (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and > ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF. > If I try to use my functions I get errors: > {code:xml} > SELECT CURRENT_DATE('test') FROM (VALUES(1)); > {code} > Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR) > SQL Query null > {code:xml} > SELECT ABS('test','test') FROM (VALUES(1)); > {code} > Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR) > SQL Query null > {code:xml} > SELECT LOG('test') FROM (VALUES(1)); > {code} > Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing > expression in constant expression evaluator LOG('test'). Errors: > Error in expression at index -1. Error: Missing function implementation: > castTINYINT(VARCHAR-REQUIRED). Full expression: UNKNOWN EXPRESSION. 
> But if I rerun all these queries after "DrillRuntimeException", they run > correctly. It seems that Drill had not updated the function signature before > that error. Also, if I add the jar as a regular UDF (copy the jar to > /drill_home/jars/3rdparty and restart the drillbits), all queries run > correctly without errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs
[ https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766917#comment-15766917 ] Arina Ielchiieva commented on DRILL-4963: - All these errors are connected with lazy-init during query execution. For example, for the current_date and abs functions, lazy-init does not happen because they pass Calcite validation; Drill then determines that there is no matching function and throws a Function Error. Since we expected only the Calcite function-not-found exception, we did not catch the Drill function error and did not start lazy-init. For the log function the situation is a little different: there are many versions of the log function, and even though Drill didn't find an exactly matching function, it decides that it can cast the initial value to match a found function signature. The best way to solve this is to check whether the remote and local registries are in sync before query execution. To make this check as lightweight as possible, we store the remote function registry version locally and compare it with the actual remote function registry version. Only if the versions do not match do we look for missing jars. > Issues when overloading Drill native functions with dynamic UDFs > > > Key: DRILL-4963 > URL: https://issues.apache.org/jira/browse/DRILL-4963 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.9.0 >Reporter: Roman >Assignee: Arina Ielchiieva > Fix For: Future > > Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, > test_overloading-1.0-sources.jar, test_overloading-1.0.jar > > > I created a jar file which overloads 3 Drill native functions > (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and > ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF. 
> If I try to use my functions I get errors: > {code:xml} > SELECT CURRENT_DATE('test') FROM (VALUES(1)); > {code} > Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR) > SQL Query null > {code:xml} > SELECT ABS('test','test') FROM (VALUES(1)); > {code} > Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR) > SQL Query null > {code:xml} > SELECT LOG('test') FROM (VALUES(1)); > {code} > Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing > expression in constant expression evaluator LOG('test'). Errors: > Error in expression at index -1. Error: Missing function implementation: > castTINYINT(VARCHAR-REQUIRED). Full expression: UNKNOWN EXPRESSION. > But if I rerun all these queries after "DrillRuntimeException", they run > correctly. It seems that Drill had not updated the function signature before > that error. Also, if I add the jar as a regular UDF (copy the jar to > /drill_home/jars/3rdparty and restart the drillbits), all queries run > correctly without errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
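The version check described in the DRILL-4963 comment above can be sketched as follows. This is a simplified Python model, not the Drill implementation: `RemoteRegistry`, `LocalRegistry` and `sync_if_needed` are hypothetical names, and the integer version counter is an assumption about how versions might be represented.

```python
class RemoteRegistry:
    """Stand-in for the shared (remote) function registry."""

    def __init__(self):
        self.version = 0
        self.jars = set()

    def register(self, jar_name):
        # Every registration bumps the version so that readers can notice it.
        self.jars.add(jar_name)
        self.version += 1


class LocalRegistry:
    """Stand-in for one drillbit's local function registry."""

    def __init__(self):
        self.synced_version = 0  # remote version seen at the last sync
        self.jars = set()

    def sync_if_needed(self, remote):
        """Run before each query; return the set of jars that were loaded."""
        if self.synced_version == remote.version:
            return set()  # cheap path: versions match, nothing to fetch
        missing = remote.jars - self.jars
        self.jars |= missing
        self.synced_version = remote.version
        return missing
```

Comparing a single version value keeps the common case cheap, and the jar lookup only runs after something has actually been registered remotely, which matches the trade-off the comment describes.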
[jira] [Updated] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs
[ https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4963: Fix Version/s: Future > Issues when overloading Drill native functions with dynamic UDFs > > > Key: DRILL-4963 > URL: https://issues.apache.org/jira/browse/DRILL-4963 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.9.0 >Reporter: Roman >Assignee: Arina Ielchiieva > Fix For: Future > > Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, > test_overloading-1.0-sources.jar, test_overloading-1.0.jar > > > I created a jar file which overloads 3 Drill native functions > (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and > ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF. > If I try to use my functions I get errors: > {code:xml} > SELECT CURRENT_DATE('test') FROM (VALUES(1)); > {code} > Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR) > SQL Query null > {code:xml} > SELECT ABS('test','test') FROM (VALUES(1)); > {code} > Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR) > SQL Query null > {code:xml} > SELECT LOG('test') FROM (VALUES(1)); > {code} > Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing > expression in constant expression evaluator LOG('test'). Errors: > Error in expression at index -1. Error: Missing function implementation: > castTINYINT(VARCHAR-REQUIRED). Full expression: UNKNOWN EXPRESSION. > But if I rerun all these queries after "DrillRuntimeException", they run > correctly. It seems that Drill had not updated the function signature before > that error. Also, if I add the jar as a regular UDF (copy the jar to > /drill_home/jars/3rdparty and restart the drillbits), all queries run > correctly without errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs
[ https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766723#comment-15766723 ] Roman edited comment on DRILL-4963 at 12/21/16 10:42 AM: - Added jars "subquery_udf-1.0" from previous message. was (Author: romankulyk): Added jars > Issues when overloading Drill native functions with dynamic UDFs > > > Key: DRILL-4963 > URL: https://issues.apache.org/jira/browse/DRILL-4963 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.9.0 >Reporter: Roman >Assignee: Arina Ielchiieva > Attachments: subquery_udf-1.0-sources.jar, subquery_udf-1.0.jar, > test_overloading-1.0-sources.jar, test_overloading-1.0.jar > > > I created a jar file which overloads 3 Drill native functions > (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and > ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF. > If I try to use my functions I get errors: > {code:xml} > SELECT CURRENT_DATE('test') FROM (VALUES(1)); > {code} > Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR) > SQL Query null > {code:xml} > SELECT ABS('test','test') FROM (VALUES(1)); > {code} > Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR) > SQL Query null > {code:xml} > SELECT LOG('test') FROM (VALUES(1)); > {code} > Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing > expression in constant expression evaluator LOG('test'). Errors: > Error in expression at index -1. Error: Missing function implementation: > castTINYINT(VARCHAR-REQUIRED). Full expression: UNKNOWN EXPRESSION. > But if I rerun all these queries after "DrillRuntimeException", they run > correctly. It seems that Drill had not updated the function signature before > that error. Also, if I add the jar as a regular UDF (copy the jar to > /drill_home/jars/3rdparty and restart the drillbits), all queries run > correctly without errors. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs
[ https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman updated DRILL-4963:
-------------------------
    Attachment: subquery_udf-1.0-sources.jar
                subquery_udf-1.0.jar

Added jars
[jira] [Commented] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs
[ https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766721#comment-15766721 ]

Roman commented on DRILL-4963:
------------------------------

Found a similar case with a different error.

Running the query for the first time:
{code:sql}
select subqueryudf(t1.first_name, t2.last_name) from cp.`employee.json` t1 inner join (select last_name, subqueryudf(first_name, last_name) as full_name from cp.`employee.json`) t2 on subqueryudf(t1.first_name, t1.last_name)=t2.full_name order by t1.employee_id limit 1;
{code}
Error: VALIDATION ERROR: From line 1, column 248 to line 1, column 249: Table 't1' not found
SQL Query null

Running it a second time:
{code:sql}
select subqueryudf(t1.first_name, t2.last_name) from cp.`employee.json` t1 inner join (select last_name, subqueryudf(first_name, last_name) as full_name from cp.`employee.json`) t2 on subqueryudf(t1.first_name, t1.last_name)=t2.full_name order by t1.employee_id limit 1;
{code}
+---------------+
|    EXPR$0     |
+---------------+
| Sheri Nowmer  |
+---------------+
1 row selected (0.3 seconds)
[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables
[ https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766641#comment-15766641 ]

ASF GitHub Bot commented on DRILL-5137:
---------------------------------------

GitHub user spanchamiamapr opened a pull request:

    https://github.com/apache/drill/pull/700

    DRILL-5137 - Optimize count(*) queries on MapR-DB Binary Tables

    This diff uses the same optimization as that for the rowKeyOnly queries. We use the FirstKeyOnlyFilter for count(*) queries. This fix will optimize these queries for HBase tables, as well as MapR-DB Binary tables.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/spanchamiamapr/drill drill-5137

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #700

commit 52765e5420be0db5e12b68d161ac06ea1855d3a0
Author: Smidth Panchamia
Date:   2016-12-21T09:53:21Z

    DRILL-5137 - This diff uses the same optimization as that for the rowKeyOnly queries. We use the FirstKeyOnlyFilter for count(*) queries. This fix will optimize these queries for HBase tables, as well as MapR-DB Binary tables.

> Optimize count(*) queries on MapR-DB Binary Tables
> --------------------------------------------------
>
> Key: DRILL-5137
> URL: https://issues.apache.org/jira/browse/DRILL-5137
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - HBase, Storage - MapRDB
> Affects Versions: 1.9.0
> Reporter: Abhishek Girish
> Assignee: Smidth Panchamia
>
> This is related to DRILL-5065, but applies to MapR-DB Binary tables
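The idea behind using a first-key-only filter for count(*) can be sketched in plain Java. This is only an illustration under stated assumptions: it does not call the HBase client API or any Drill code, and the class `FirstKeyOnlyCountSketch` and its methods (`scanAll`, `scanFirstKeyOnly`, `countRows`, `cellsReturned`, `sampleTable`) are hypothetical names. A table is modelled as a map from row key to the cells stored in that row; counting rows needs only one cell per row, so a scan that returns just the first cell gives the same count while returning less data.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the HBase client API: a table is a map from
// row key to the list of cell values stored in that row.
final class FirstKeyOnlyCountSketch {

    // Models a full scan: every cell of every row is returned.
    public static Map<String, List<String>> scanAll(Map<String, List<String>> table) {
        return table;
    }

    // Models a first-key-only scan: only the first cell of each row is returned.
    public static Map<String, List<String>> scanFirstKeyOnly(Map<String, List<String>> table) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : table.entrySet()) {
            List<String> cells = row.getValue();
            List<String> first = cells.isEmpty()
                    ? Collections.<String>emptyList()
                    : cells.subList(0, 1);
            result.put(row.getKey(), first);
        }
        return result;
    }

    // count(*) only needs the number of rows, not their contents.
    public static long countRows(Map<String, List<String>> scanned) {
        return scanned.size();
    }

    // Total number of cells a scan hands back to the caller.
    public static long cellsReturned(Map<String, List<String>> scanned) {
        long total = 0;
        for (List<String> cells : scanned.values()) {
            total += cells.size();
        }
        return total;
    }

    // Made-up sample data: three rows holding 3, 2 and 4 cells.
    public static Map<String, List<String>> sampleTable() {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("row1", Arrays.asList("a", "b", "c"));
        table.put("row2", Arrays.asList("d", "e"));
        table.put("row3", Arrays.asList("f", "g", "h", "i"));
        return table;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = sampleTable();
        System.out.println("rows, full scan:        " + countRows(scanAll(table)));
        System.out.println("rows, first-key-only:   " + countRows(scanFirstKeyOnly(table)));
        System.out.println("cells, full scan:       " + cellsReturned(scanAll(table)));
        System.out.println("cells, first-key-only:  " + cellsReturned(scanFirstKeyOnly(table)));
    }
}
```

Both scans report the same row count for the sample data, but the first-key-only variant returns one cell per row instead of all of them, which is the saving the pull request description refers to.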
[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables
[ https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766603#comment-15766603 ]

ASF GitHub Bot commented on DRILL-5137:
---------------------------------------

Github user spanchamiamapr commented on the issue:

    https://github.com/apache/drill/pull/699

    Closing this pull request since it is showing unnecessary changes too.
[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables
[ https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766604#comment-15766604 ]

ASF GitHub Bot commented on DRILL-5137:
---------------------------------------

Github user spanchamiamapr closed the pull request at:

    https://github.com/apache/drill/pull/699
[jira] [Commented] (DRILL-5137) Optimize count(*) queries on MapR-DB Binary Tables
[ https://issues.apache.org/jira/browse/DRILL-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766601#comment-15766601 ]

ASF GitHub Bot commented on DRILL-5137:
---------------------------------------

GitHub user spanchamiamapr opened a pull request:

    https://github.com/apache/drill/pull/699

    DRILL-5137 - Optimize count(*) queries for MapR-DB Binary Tables

    This diff uses the same optimization as that for the rowKeyOnly queries. We use the FirstKeyOnlyFilter for count(*) queries. This fix will optimize these queries for HBase tables, as well as MapR-DB Binary tables.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/spanchamiamapr/drill md1230

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/699.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #699

commit 3d12a9526365bcc96c605fc30f3b6fbe4961b97d
Author: Smidth Panchamia
Date:   2016-11-28T21:59:34Z

    DRILL-5065 - Optimize count( * ) queries on MapR-DB JSON Tables

    In MapR-DB v5.2.0, we enabled '_id' only projection for JSON tables. Hence, we can now optimize the following queries:
    a. count(*) by projecting only the '_id' column.
    b. '_id' only projections, including count(_id)

commit 6b5923e383e2add4b3042248e5f389d32936a5b7
Author: Smidth Panchamia
Date:   2016-11-29T07:48:36Z

    Change the format plugin config parameter name.

commit 8897b8fb15eb76c4b89270456b6b4d86696e7b38
Author: Smidth Panchamia
Date:   2016-11-28T21:59:34Z

    DRILL-5065 - Optimize count( * ) queries on MapR-DB JSON Tables

    In MapR-DB v5.2.0, we enabled '_id' only projection for JSON tables. Hence, we can now optimize the following queries:
    a. count(*) by projecting only the '_id' column.
    b. '_id' only projections, including count(_id)

    Change the format plugin config parameter name.

commit d1d05e8bd4b96fd1bd144c50ddf20466a4bfda16
Author: Smidth Panchamia
Date:   2016-12-05T18:55:14Z

    Merge branch 'master' of github.com:spanchamiamapr/drill

commit ab8e728d7ed3fcd8c27243462bea42b7d2ea29c5
Author: Smidth Panchamia
Date:   2016-12-14T20:01:06Z

    Merge remote-tracking branch 'upstream/master'

    Conflicts:
        contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBFormatPluginConfig.java

commit c84464f8fd099cb17a573faa4310a70eada3b49e
Author: Smidth Panchamia
Date:   2016-12-20T07:10:17Z

    Merge remote-tracking branch 'upstream/master'

commit 72bd06c744dd71ad9a1aa87c91dabcd89eb4fa36
Author: Smidth Panchamia
Date:   2016-12-21T09:29:45Z

    DRILL-5137 - Optimize count(*) queries on MapR-DB Binary tables

> Optimize count(*) queries on MapR-DB Binary Tables
> --------------------------------------------------
>
> Key: DRILL-5137
> URL: https://issues.apache.org/jira/browse/DRILL-5137
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - HBase, Storage - MapRDB
> Affects Versions: 1.9.0
> Reporter: Abhishek Girish
> Assignee: Smidth Panchamia
>
> This is related to DRILL-5065, but applies to MapR-DB Binary tables
[jira] [Updated] (DRILL-5140) CTAS that does SELECT over 5003 columns fails with CompileException
[ https://issues.apache.org/jira/browse/DRILL-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-5140:
----------------------------------
    Priority: Critical  (was: Major)

> CTAS that does SELECT over 5003 columns fails with CompileException
> -------------------------------------------------------------------
>
> Key: DRILL-5140
> URL: https://issues.apache.org/jira/browse/DRILL-5140
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.9.0
> Reporter: Khurram Faraaz
> Priority: Critical
> Attachments: drill_5117.q, manyColumns.csv
>
> CTAS that does SELECT over 5003 columns fails with CompileException: File 'org.apache.drill.exec.compile.DrillJavaFileObject...
> Drill 1.9.0 git commit ID : 4c1b420b
> The CTAS statement and the CSV data file are attached.
> I ran the test with and without setting the system option below; it failed in both cases.
> alter system set `exec.java_compiler`='JDK';
> The sqlline session just closes with the message below after the failing CTAS is executed.
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> Stack trace from drillbit.log
> {noformat}
> 2016-12-20 12:02:16,016 [27a6e241-99b1-1f2a-8a91-394f8166e969:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 'org.apache.drill.exec.compile.DrillJavaFileObject[ProjectorGen45.java]', Line 11, Column 8: ProjectorGen45.java:11: error: too many constants
> public class ProjectorGen45 {
>        ^ (compiler.err.limit.pool)
> Fragment 0:0
> [Error Id: ced84dce-669d-47c2-b5d2-5e0559dbd9fd on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: CompileException: File 'org.apache.drill.exec.compile.DrillJavaFileObject[ProjectorGen45.java]', Line 11, Column 8: ProjectorGen45.java:11: error: too many constants
> public class ProjectorGen45 {
>        ^ (compiler.err.limit.pool)
> Fragment 0:0
> [Error Id: ced84dce-669d-47c2-b5d2-5e0559dbd9fd on centos-01.qa.lab:31010]
>   at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293) [drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262) [drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.9.0.jar:1.9.0]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.exec.exception.SchemaChangeException: Failure while attempting to load generated class
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:487) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) ~[drill-ja
[jira] [Closed] (DRILL-4674) Allow casting to boolean the same literals as in Postgre
[ https://issues.apache.org/jira/browse/DRILL-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz closed DRILL-4674.
---------------------------------

> Allow casting to boolean the same literals as in Postgre
> ---------------------------------------------------------
>
> Key: DRILL-4674
> URL: https://issues.apache.org/jira/browse/DRILL-4674
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Data Types
> Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
> Reporter: Khurram Faraaz
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
> Fix For: 1.9.0
>
> Drill does not return results when we try to cast 0 and 1 to boolean inside a value constructor.
> Drill version : 1.7.0-SNAPSHOT commit ID : 09b26277
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(cast(1 as boolean));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 1
> Fragment 0:0
> [Error Id: 35dcc4bb-0c5d-466f-8fb5-cf7f0a892155 on centos-02.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> values(cast(0 as boolean));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010] (state=,code=0)
> {noformat}
> Whereas we get results on Postgres for the same query.
> {noformat}
> postgres=# values(cast(1 as boolean));
>  column1
> ---------
>  t
> (1 row)
> postgres=# values(cast(0 as boolean));
>  column1
> ---------
>  f
> (1 row)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-05-13 07:16:16,578 [28ca80bf-0af9-bc05-258b-6b5744739ed8:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010]
>   at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.IllegalArgumentException: Invalid value for boolean: 0
>   at org.apache.drill.exec.test.generated.ProjectorGen9.doSetup(ProjectorTemplate.java:95) ~[na:na]
>   at org.apache.drill.exec.test.generated.ProjectorGen9.setup(ProjectorTemplate.java:93) ~[na:na]
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:444) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.
[jira] [Commented] (DRILL-4674) Allow casting to boolean the same literals as in Postgre
[ https://issues.apache.org/jira/browse/DRILL-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766475#comment-15766475 ]

Khurram Faraaz commented on DRILL-4674:
---------------------------------------

Verified, tests are added here
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Functional/case_expr/castTo*.q
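The behaviour requested in DRILL-4674 (accepting the same boolean literals as Postgres) can be sketched as a small lookup. This is a minimal sketch under stated assumptions, not Drill's actual cast implementation: the class `BooleanLiteralSketch` and its `parse` method are hypothetical, and the literal sets below cover only the commonly documented Postgres spellings (Postgres also accepts other unique prefixes, which are not modelled here). The exception message mirrors the one shown in the report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Drill: maps Postgres-style boolean
// literals to Java booleans and rejects everything else.
final class BooleanLiteralSketch {

    private static final Set<String> TRUE_LITERALS =
            new HashSet<>(Arrays.asList("true", "t", "yes", "y", "on", "1"));

    private static final Set<String> FALSE_LITERALS =
            new HashSet<>(Arrays.asList("false", "f", "no", "n", "off", "0"));

    // Trims whitespace and ignores case before looking the literal up.
    public static boolean parse(String literal) {
        String normalized = literal.trim().toLowerCase(Locale.ROOT);
        if (TRUE_LITERALS.contains(normalized)) {
            return true;
        }
        if (FALSE_LITERALS.contains(normalized)) {
            return false;
        }
        throw new IllegalArgumentException("Invalid value for boolean: " + literal);
    }

    public static void main(String[] args) {
        System.out.println(parse("1"));
        System.out.println(parse("0"));
        System.out.println(parse(" YES "));
    }
}
```

With a table like this, `cast(1 as boolean)` and `cast(0 as boolean)` would resolve to true and false respectively, matching the Postgres output quoted in the issue, while unknown spellings still fail with the familiar error.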
[jira] [Commented] (DRILL-5132) Context based dynamic parameterization of views
[ https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766463#comment-15766463 ]

ASF GitHub Bot commented on DRILL-5132:
---------------------------------------

Github user nagarajanchinnasamy commented on a diff in the pull request:

    https://github.com/apache/drill/pull/685#discussion_r93394114

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java ---
    @@ -255,11 +257,12 @@ void disableReadTimeout() {
           getChannel().pipeline().remove(BasicServer.TIMEOUT_HANDLER);
         }

    -void setUser(final UserToBitHandshake inbound) throws IOException {
    +void setUser(final UserToBitHandshake inbound, Map sessionParams) throws IOException {
    --- End diff --

    @sudheeshkatkam I created this a few days back, but have now updated the details with more clarity. The ticket is: [DRILL-5132](https://issues.apache.org/jira/browse/DRILL-5132). Please let me know your views.

> Context based dynamic parameterization of views
> -----------------------------------------------
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
> Issue Type: Wish
> Components: Server
> Reporter: Nagarajan Chinnasamy
> Priority: Critical
> Labels: authentication, context, isolation, jdbcstorage, multi-tenancy
>
> *Requirement*
> It is known that views in SQL cannot have custom dynamic parameters/variables. Please refer to [Justin Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to [this SO question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql] on handling dynamic parameterization of views.
> [The PR #685|https://github.com/apache/drill/pull/685] [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] originated from this requirement, so that we could build views that dynamically filter records based on dynamic values (such as the current tenant ID, user role, etc.).
> *Since Drill's basic unit is a View... having such built-in support can bring in dynamism into the whole game.*
> This feature can be utilized for:
> * *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant Discriminator Column
> * *Data Protection in building Chained Views* with Custom Dynamic Filters
> To explain this further, if we assume that:
> # As and when the user connection is established, we populate the session context with session parameters such as:
> #* Tenant ID of the currently logged in user
> #* Roles of the currently logged in user
> # We expose the session context information through context-based functions such as:
> #* *session_id* -- that returns the unique id of the session
> #* *session_parameter('')* - that returns the value of the session parameter
> then a view created with the following kind of query:
> {code}
> create or replace view dynamic_filter_view as select
>    a.field as a_field,
>    b.field as b_field
> from
>    a_table as a
> left join
>    b_table as b
> on
>    a.bId = b.Id
> where
>    session_parameter('tenantId')=a.tenantId
> {code}
> becomes a query with built-in support for dynamic parameterization that returns only the records of the tenant of the currently logged in user. This is a very useful feature in a shared multi-tenant environment where data is isolated using the multi-tenant discriminator column 'tenantId'.
> When building chained views, this feature will be useful for filtering records based on context based parameters.
> This feature will be particularly useful for data isolation / data protection with *jdbc storage plugins*, where drill-authenticated-credentials are not passed to jdbc connection authentication. A jdbc storage has hard-coded, shared credentials. Hence the responsibility for data isolation / data protection lies with the Views themselves, and hence the need for built-in support of context based dynamic parameters in Views.
> *Design/Implementation Considerations:*
> * Session parameters can be obtained through authenticators so that custom authenticators can return a HashMap of parameters obtained from external systems.
> * Introduce SessionContext to hold sessionId and sessionParameters
> * Implement context based functions session_id and session_parameter()
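The design considerations above can be sketched as a small Java class. This is a sketch under stated assumptions, not code from PR #685 or from Drill: `SessionContextSketch`, `sessionParameter`, `filterByTenant` and `demoVisibleRowCount` are hypothetical names, rows are modelled as plain maps, and the filter only emulates the effect of `where session_parameter('tenantId')=a.tenantId` from the example view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical holder for a session id and per-session parameters,
// for example values returned by a custom authenticator.
final class SessionContextSketch {

    private final String sessionId = UUID.randomUUID().toString();
    private final Map<String, String> sessionParameters;

    public SessionContextSketch(Map<String, String> sessionParams) {
        // Defensive copy so later changes by the caller do not leak in.
        this.sessionParameters =
                Collections.unmodifiableMap(new HashMap<>(sessionParams));
    }

    // Counterpart of the proposed session_id function.
    public String sessionId() {
        return sessionId;
    }

    // Counterpart of the proposed session_parameter(name) function;
    // returns null when the parameter is not set.
    public String sessionParameter(String name) {
        return sessionParameters.get(name);
    }

    // Emulates: where session_parameter('tenantId') = a.tenantId
    public List<Map<String, String>> filterByTenant(List<Map<String, String>> rows) {
        String tenantId = sessionParameter("tenantId");
        List<Map<String, String>> visible = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (tenantId != null && tenantId.equals(row.get("tenantId"))) {
                visible.add(row);
            }
        }
        return visible;
    }

    private static Map<String, String> row(String tenantId, String field) {
        Map<String, String> r = new HashMap<>();
        r.put("tenantId", tenantId);
        r.put("field", field);
        return r;
    }

    // Made-up data: two rows for tenant t1 and one row for tenant t2.
    public static int demoVisibleRowCount(String tenantId) {
        Map<String, String> params = new HashMap<>();
        params.put("tenantId", tenantId);
        SessionContextSketch ctx = new SessionContextSketch(params);

        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("t1", "x"));
        rows.add(row("t2", "y"));
        rows.add(row("t1", "z"));
        return ctx.filterByTenant(rows).size();
    }

    public static void main(String[] args) {
        System.out.println("rows visible to t1: " + demoVisibleRowCount("t1"));
        System.out.println("rows visible to t2: " + demoVisibleRowCount("t2"));
    }
}
```

Keeping the parameters immutable after the connection is established is one possible design choice here: a view's filter then cannot be altered by the session it is meant to constrain.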
[jira] [Updated] (DRILL-5132) Context based dynamic parameterization of views
[ https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nagarajan Chinnasamy updated DRILL-5132:

Description:
*Requirement*

It's known that Views in SQL cannot have custom dynamic parameters/variables. Please refer to [Justin Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to [this SO question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql] in handling dynamic parameterization of views.

[The PR #685|https://github.com/apache/drill/pull/685] [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] originated based on this requirement so that we could build views that can dynamically filter records based on some dynamic values (like current tenant-id, user role etc.)

*Since Drill's basic unit is a View... having such built-in support can bring in dynamism into the whole game.*

This feature can be utilized for:

* *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant Discriminator Column
* *Data Protection in building Chained Views* with Custom Dynamic Filters

To explain this further, if we assume that:

# As and when the user connection is established, we populate session context with session parameters such as:
#* Tenant ID of the currently logged in user
#* Roles of the currently logged in user
# We expose the session context information through context-based-functions such as:
#* *session_id* -- that returns unique id of the session
#* *session_parameter('')* - that returns the value of the session parameter

then a view created with the following kind of query:

{code}
create or replace view dynamic_filter_view as
select
  a.field as a_field,
  b.field as b_field
from
  a_table as a
left join
  b_table as b
on
  a.bId = b.Id
where
  session_parameter('tenantId')=a.tenantId
{code}

becomes a query that has built-in support for dynamic parameterization that only returns records of the tenant of the currently logged in user. This is a very useful feature in a shared-multi-tenant environment where data is isolated using the multi-tenant-discriminator column 'tenantId'.

When building chained views this feature will be useful in filtering records based on context based parameters.

This feature will particularly be useful for data isolation / data protection with *jdbc storage plugins* where drill-authenticated-credentials are not passed to jdbc connection authentication. A jdbc storage has hard-coded, shared credentials. Hence the responsibility of data isolation / data protection lies with Views themselves. Hence, the need for built-in support of context based dynamic parameters in Views.

*Design/Implementation Considerations:*

* Session parameters can be obtained through authenticators so that custom authenticators can return a HashMap of parameters obtained from external systems.
* Introduce SessionContext to hold sessionId and sessionParameters
* Implement context based functions session_id and session_parameter()

was:
It's known that Views in SQL cannot have custom dynamic parameters/variables. Please refer to [Justin Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to [this SO question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql] in handling dynamic parameterization of views.

[The PR #685|https://github.com/apache/drill/pull/685] [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] originated based on this requirement so that we could build views that can dynamically filter records based on some dynamic values (like current tenant-id, user role etc.)

*Since Drill's basic unit is a View... having such built-in support can bring in dynamism into the whole game.*

This feature can be utilized for:

* *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant Discriminator Column
* *Data Protection in building Chained Views* with Custom Dynamic Filters

To explain this further, if we assume that:

# As and when the user connection is established, we populate session context with session parameters such as:
#* Tenant ID of the currently logged in user
#* Roles of the currently logged in user
# We expose the session context information through context-based-functions such as:
#* *session_id* -- that returns unique id of the session
#* *session_parameter('')* - that returns the value of the session parameter

then a view created with the following kind of query:

{code}
create or replace view dynamic_filter_view as
select
  a.field as a_field,
  b.field as b_field
from
  a_table as a
left join
  b_table as b
on
  a.bId = b.Id
where
  session_parameter('tenantId')=a.tenantId
{code}

becomes a query that has built-in support for dynamic parameterization that only returns records of the tenant of the currently logged in user
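The design considerations in the description (authenticators returning a map of parameters, a SessionContext holding the session id and parameters, and the functions session_id and session_parameter()) can be sketched roughly as below. This is only an illustration in Python, not Drill code: the class layout, `demo_authenticator`, `open_session` and the sample users are all assumptions made up for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SessionContext:
    """Hypothetical holder for a session id and its session parameters."""
    session_id: str
    session_parameters: Dict[str, str] = field(default_factory=dict)

    def session_parameter(self, name: str) -> Optional[str]:
        # Mirrors the proposed session_parameter() function: unknown names yield None.
        return self.session_parameters.get(name)


# Assumed authenticator shape: (user, password) -> map of session parameters.
Authenticator = Callable[[str, str], Dict[str, str]]


def demo_authenticator(user: str, password: str) -> Dict[str, str]:
    """Stand-in for a custom authenticator that would query an external system."""
    directory = {
        "alice": {"tenantId": "t1", "roles": "analyst"},
        "bob": {"tenantId": "t2", "roles": "admin"},
    }
    if user not in directory:
        raise PermissionError(f"unknown user: {user}")
    return dict(directory[user])


def open_session(user: str, password: str, authenticator: Authenticator) -> SessionContext:
    """Populate the session context once, when the connection is established."""
    parameters = authenticator(user, password)
    return SessionContext(session_id=str(uuid.uuid4()), session_parameters=parameters)
```

A caller would build the context once per connection, for example `ctx = open_session("alice", "secret", demo_authenticator)`, and then resolve `ctx.session_parameter("tenantId")` whenever a view filter needs it.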
[jira] [Issue Comment Deleted] (DRILL-5132) Context based dynamic parameterization of views
[ https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nagarajan Chinnasamy updated DRILL-5132:

Comment: was deleted

(was: This feature will particularly be useful for data isolation / data protection with *jdbc storage plugins* where drill-authenticated-credentials are not used for jdbc connection authentication (like in MapR-DB). A jdbc storage has hard-coded, shared credentials. Hence the responsibility of data isolation / data protection lies with Views themselves. Hence, the need for built-in support of context based dynamic filtering in Views.)

> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
> Issue Type: Wish
> Components: Server
> Reporter: Nagarajan Chinnasamy
> Priority: Critical
> Labels: authentication, context, isolation, jdbcstorage, multi-tenancy

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
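The tenant filter proposed for dynamic_filter_view can be tried out with a small simulation. The sketch below uses Python's sqlite3 module instead of Drill, so it only shows the filtering idea: `session_parameter` is registered as a per-connection SQL function, the sample rows are invented, and the view body is run as a plain query (with an added `order by` for stable output) rather than as a stored view.

```python
import sqlite3
from typing import Dict, List, Tuple

# Body of the proposed view, executed directly for this demonstration.
VIEW_QUERY = """
select a.field as a_field, b.field as b_field
from a_table as a
left join b_table as b on a.bId = b.Id
where session_parameter('tenantId') = a.tenantId
order by a_field
"""


def open_session(session_parameters: Dict[str, str]) -> sqlite3.Connection:
    """Open an in-memory database whose session_parameter() reads this session's values."""
    conn = sqlite3.connect(":memory:")
    # One-argument SQL function, visible only on this connection (the "session").
    conn.create_function("session_parameter", 1, lambda name: session_parameters.get(name))
    # Invented sample data shared by every demo session.
    conn.executescript("""
        create table a_table (field text, bId integer, tenantId text);
        create table b_table (Id integer, field text);
        insert into a_table values ('a1', 1, 't1'), ('a2', 2, 't2'), ('a3', 1, 't1');
        insert into b_table values (1, 'b1'), (2, 'b2');
    """)
    return conn


def visible_rows(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return the rows the current session is allowed to see."""
    return conn.execute(VIEW_QUERY).fetchall()
```

Two sessions opened with different `tenantId` values run the same query text but see different rows, and a session with no `tenantId` sees none, because a NULL comparison never matches.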
[jira] [Issue Comment Deleted] (DRILL-5132) Context based dynamic parameterization of views
[ https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nagarajan Chinnasamy updated DRILL-5132:

Comment: was deleted

(was: Some thoughts on design considerations:

# The authenticator must be given a chance to populate context values
#* Generally context values are loaded immediately after (or as part of) the authentication process
#* Custom authenticators can load custom context values as a result of the authentication process
# If custom authenticators can add values to the context, then we need a mechanism to keep the context variables unique so that they don't clash with pre-defined system context variables
# Revise the design of the "context" class so that it can hold both system defined and custom defined variables.
# Change [DRILL-4956|https://issues.apache.org/jira/browse/DRILL-4956] and [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043], which assume that UserSession is the place to generate the session id (which is very much one of the context values), to accommodate an externally generated session_id.
#* *session_id* can be provided by an external authenticator. Accommodating an externally generated session_id (with unique prefix) will help better co-ordination with external systems that provide custom authentication and context values.)

> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
> Issue Type: Wish
> Components: Server
> Reporter: Nagarajan Chinnasamy
> Priority: Critical
> Labels: authentication, context, isolation, jdbcstorage, multi-tenancy

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
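Two points from the deleted comment, keeping custom context values from clashing with system-defined ones and accepting an externally generated session_id with a unique prefix, could look roughly like the sketch below. The `system.` and `custom.` prefixes, the `ext` and `drill` session id prefixes, and the function name are assumptions chosen for illustration and do not come from Drill.

```python
import uuid
from typing import Dict, Optional

SYSTEM_PREFIX = "system."  # assumed namespace for system-defined values
CUSTOM_PREFIX = "custom."  # assumed namespace for authenticator-supplied values


def build_context(
    system_values: Dict[str, str],
    custom_values: Dict[str, str],
    external_session_id: Optional[str] = None,
    external_prefix: str = "ext",
) -> Dict[str, str]:
    """Merge system and custom values into one context without key clashes."""
    context: Dict[str, str] = {}
    for key, value in system_values.items():
        context[SYSTEM_PREFIX + key] = value
    for key, value in custom_values.items():
        context[CUSTOM_PREFIX + key] = value

    if external_session_id is not None:
        # Keep the externally generated id, but mark where it came from.
        session_id = f"{external_prefix}:{external_session_id}"
    else:
        # Fall back to a locally generated id.
        session_id = f"drill:{uuid.uuid4()}"
    context[SYSTEM_PREFIX + "session_id"] = session_id
    return context
```

With this layout a custom authenticator can return a key such as `user` without overwriting the system value of the same name, since the two end up as `custom.user` and `system.user`.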
[jira] [Updated] (DRILL-5132) Context based dynamic parameterization of views
[ https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nagarajan Chinnasamy updated DRILL-5132:

Description:
It's known that Views in SQL cannot have custom dynamic parameters/variables. Please refer to [Justin Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to [this SO question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql] in handling dynamic parameterization of views.

[The PR #685|https://github.com/apache/drill/pull/685] [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] originated based on this requirement so that we could build views that can dynamically filter records based on some dynamic values (like current tenant-id, user role etc.)

*Since Drill's basic unit is a View... having such built-in support can bring in dynamism into the whole game.*

This feature can be utilized for:

* *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant Discriminator Column
* *Data Protection in building Chained Views* with Custom Dynamic Filters

To explain this further, if we assume that:

# As and when the user connection is established, we populate session context with session parameters such as:
#* Tenant ID of the currently logged in user
#* Roles of the currently logged in user
# We expose the session context information through context-based-functions such as:
#* *session_id* -- that returns unique id of the session
#* *session_parameter('')* - that returns the value of the session parameter

then a view created with the following kind of query:

{code}
create or replace view dynamic_filter_view as
select
  a.field as a_field,
  b.field as b_field
from
  a_table as a
left join
  b_table as b
on
  a.bId = b.Id
where
  session_parameter('tenantId')=a.tenantId
{code}

becomes a query that has built-in support for dynamic parameterization that only returns records of the tenant of the currently logged in user. This is a very useful feature in a shared-multi-tenant environment where data is isolated using the multi-tenant-discriminator column 'tenantId'.

When building chained views this feature will be useful in filtering records based on context based parameters.

This feature will particularly be useful for data isolation / data protection with *jdbc storage plugins* where drill-authenticated-credentials are not passed to jdbc connection authentication. A jdbc storage has hard-coded, shared credentials. Hence the responsibility of data isolation / data protection lies with Views themselves. Hence, the need for built-in support of context based dynamic parameters in Views.

was:
It's known that Views in SQL cannot have dynamic parameters/variables. Please refer to [Justin Swanhart|http://stackoverflow.com/users/679236/justin-swanhart]'s response to [this SO question|http://stackoverflow.com/questions/2281890/can-i-create-view-with-parameter-in-mysql] in handling dynamic parameterization of views.

[The PR #685|https://github.com/apache/drill/pull/685] [DRILL-5043|https://issues.apache.org/jira/browse/DRILL-5043?filter=-2] originated based on this requirement so that we could build views that can dynamically filter records based on some dynamic values (like current tenant-id, user role etc.)

*Since Drill's basic unit is a View... having such built-in support can bring in dynamism into the whole game.*

This feature can be utilized for:

* *Data Isolation in Shared Multi-Tenant environments* based on Custom Tenant Discriminator Column
* *Data Protection in building Chained Views* with Custom Dynamic Filters

I will post further design details in the comments

> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
> Issue Type: Wish
> Components: Server
> Reporter: Nagarajan Chinnasamy
> Priority: Critical
> Labels: authentication, context, isolation, jdbcstorage, multi-tenancy
[jira] [Issue Comment Deleted] (DRILL-5132) Context based dynamic parameterization of views
[ https://issues.apache.org/jira/browse/DRILL-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nagarajan Chinnasamy updated DRILL-5132:

Comment: was deleted

(was: Let's say we have a *pre-defined documented place* where a *session based temporary table* named *context* is created with the following columns:

{code}
session_id, context_type, context_key, context_value
{code}

and say this context table is transparently populated with context based values *as and when a user connection (session) is established*, then a view created with the following kind of query:

{code}
create or replace view dynamic_filter_view as
select
  a.field as a_field,
  b.field as b_field
from
  a_table as a
left join
  b_table as b
on
  a.bId = b.Id
inner join
  context c
on
  c.session_id=session_id() and
  c.context_type='custom' and
  c.context_key='tenantId' and
  c.context_value=a.tenantId
{code}

becomes a query that has built-in support for dynamic parameterization that only exposes records of the current tenantId of the current context.

The purpose of the context_type column is to distinguish system defined context values from custom context values. Custom context values can be obtained through a *custom-context-provider* (like custom-authenticator). System defined context_types can be *drill.system*, *drill.query* etc.

Does that sound elegant and sensible?? :))

> Context based dynamic parameterization of views
> ---
>
> Key: DRILL-5132
> URL: https://issues.apache.org/jira/browse/DRILL-5132
> Project: Apache Drill
> Issue Type: Wish
> Components: Server
> Reporter: Nagarajan Chinnasamy
> Priority: Critical
> Labels: authentication, context, isolation, jdbcstorage, multi-tenancy

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
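The context-table variant from the deleted comment can also be simulated outside Drill. The sketch below uses Python's sqlite3 module with a per-connection temporary table named context and a zero-argument session_id() SQL function. The sample rows, the helper names and the added `order by` are assumptions for the demonstration, and the view body is run as a plain query rather than as a stored view.

```python
import sqlite3
import uuid
from typing import Dict, List, Tuple

# Body of the proposed view with the inner join on the session's context table.
VIEW_QUERY = """
select a.field as a_field, b.field as b_field
from a_table as a
left join b_table as b on a.bId = b.Id
inner join context c
   on c.session_id = session_id()
  and c.context_type = 'custom'
  and c.context_key = 'tenantId'
  and c.context_value = a.tenantId
order by a_field
"""


def open_session(custom_values: Dict[str, str]) -> sqlite3.Connection:
    """Open an in-memory database and populate this session's context table."""
    sid = str(uuid.uuid4())
    conn = sqlite3.connect(":memory:")
    # Zero-argument SQL function returning the id of this session.
    conn.create_function("session_id", 0, lambda: sid)
    conn.executescript("""
        create table a_table (field text, bId integer, tenantId text);
        create table b_table (Id integer, field text);
        insert into a_table values ('a1', 1, 't1'), ('a2', 2, 't2'), ('a3', 1, 't1');
        insert into b_table values (1, 'b1'), (2, 'b2');
        create temp table context (
            session_id text, context_type text, context_key text, context_value text
        );
    """)
    # Custom values would come from a custom context provider in the proposal.
    conn.executemany(
        "insert into context values (?, 'custom', ?, ?)",
        [(sid, key, value) for key, value in custom_values.items()],
    )
    return conn


def visible_rows(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return the rows exposed to the current session."""
    return conn.execute(VIEW_QUERY).fetchall()
```

Compared with a session_parameter() function, this variant keeps the filter in ordinary join syntax, at the cost of maintaining one context row per key for every session.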