[jira] [Updated] (DRILL-5699) Drill Web UI Page Source Has Links To External Sites
[ https://issues.apache.org/jira/browse/DRILL-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sindhuri Ramanarayan Rayavaram updated DRILL-5699: -- Labels: ready-to-commit (was: ) > Drill Web UI Page Source Has Links To External Sites > > > Key: DRILL-5699 > URL: https://issues.apache.org/jira/browse/DRILL-5699 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP > Reporter: Sindhuri Ramanarayan Rayavaram > Assignee: Sindhuri Ramanarayan Rayavaram > Priority: Minor > Labels: ready-to-commit > Fix For: 1.12.0 > > > Drill uses an external CDN for JavaScript and CSS files in the result page. When > there is no internet connection, this page fails to load. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-4211) Column aliases not pushed down to JDBC stores in some cases when Drill expects aliased columns to be returned.
[ https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Farkas updated DRILL-4211: -- Summary: Column aliases not pushed down to JDBC stores in some cases when Drill expects aliased columns to be returned. (was: Inconsistent results from a joined sql statement to postgres tables) > Column aliases not pushed down to JDBC stores in some cases when Drill > expects aliased columns to be returned. > -- > > Key: DRILL-4211 > URL: https://issues.apache.org/jira/browse/DRILL-4211 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.3.0, 1.11.0 > Environment: Postgres db stroage >Reporter: Robert Hamilton-Smith >Assignee: Timothy Farkas > Labels: newbie > > When making an sql statement that incorporates a join to a table and then a > self join to that table to get a parent value , Drill brings back > inconsistent results. > Here is the sql in postgres with correct output: > {code:sql} > select trx.categoryguid, > cat.categoryname, w1.categoryname as parentcat > from transactions trx > join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID) > join categories w1 on (cat.categoryparentguid = w1.categoryguid) > where cat.categoryparentguid IS NOT NULL; > {code} > Output: > ||categoryid||categoryname||parentcategory|| > |id1|restaurants|food| > |id1|restaurants|food| > |id2|Coffee Shops|food| > |id2|Coffee Shops|food| > When run in Drill with correct storage prefix: > {code:sql} > select trx.categoryguid, > cat.categoryname, w1.categoryname as parentcat > from db.schema.transactions trx > join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID) > join db.schema.wpfm_categories w1 on (cat.categoryparentguid = > w1.categoryguid) > where cat.categoryparentguid IS NOT NULL > {code} > Results are: > ||categoryid||categoryname||parentcategory|| > |id1|restaurants|null| > |id1|restaurants|null| > |id2|Coffee Shops|null| > |id2|Coffee 
Shops|null| > Physical plan is: > {code:sql} > 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) > categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = > {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293 > 00-01 Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : > rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, > VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, > 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292 > 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) > : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, > VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, > 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291 > 00-03 Jdbc(sql=[SELECT * > FROM "public"."transactions" > INNER JOIN (SELECT * > FROM "public"."categories" > WHERE "categoryparentguid" IS NOT NULL) AS "t" ON > "transactions"."categoryguid" = "t"."categoryguid" > INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" > = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) > transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) > transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) > transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, > VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, > VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) > transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) > transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) > transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) > transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, > VARCHAR(50) transactionorigpartyguid, VARCHAR(255) > transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) > transactionrecaccountguid, 
VARCHAR(256) transactionstatementdesc, DECIMAL(1, > 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) > transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) > transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) > transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, > VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, > TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) > transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, > VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) > categoryguid0, VARCHAR(50)
[jira] [Commented] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables
[ https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120887#comment-16120887 ] Timothy Farkas commented on DRILL-4211: --- There seem to be two things going wrong here: # Doing joins on two tables that have columns with the same names causes incorrect nulls in the results (DRILL-5713). # Column aliases are not pushed down to the JDBCReader, so the columns returned from Postgres have the original names. The Project operators in Drill then proceed to use the aliased column names to do operations on batches. The aliased names of course don't exist in the schema, so nulls are returned for the aliased column. I'll make this ticket focus on the aliasing issue, and I'll fix the column name conflict issue in DRILL-5713. > Inconsistent results from a joined sql statement to postgres tables > --- > > Key: DRILL-4211 > URL: https://issues.apache.org/jira/browse/DRILL-4211 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators > Affects Versions: 1.3.0, 1.11.0 > Environment: Postgres db storage > Reporter: Robert Hamilton-Smith > Assignee: Timothy Farkas > Labels: newbie > > When making a SQL statement that incorporates a join to a table and then a > self join to that table to get a parent value, Drill brings back > inconsistent results. 
> Here is the sql in postgres with correct output: > {code:sql} > select trx.categoryguid, > cat.categoryname, w1.categoryname as parentcat > from transactions trx > join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID) > join categories w1 on (cat.categoryparentguid = w1.categoryguid) > where cat.categoryparentguid IS NOT NULL; > {code} > Output: > ||categoryid||categoryname||parentcategory|| > |id1|restaurants|food| > |id1|restaurants|food| > |id2|Coffee Shops|food| > |id2|Coffee Shops|food| > When run in Drill with correct storage prefix: > {code:sql} > select trx.categoryguid, > cat.categoryname, w1.categoryname as parentcat > from db.schema.transactions trx > join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID) > join db.schema.wpfm_categories w1 on (cat.categoryparentguid = > w1.categoryguid) > where cat.categoryparentguid IS NOT NULL > {code} > Results are: > ||categoryid||categoryname||parentcategory|| > |id1|restaurants|null| > |id1|restaurants|null| > |id2|Coffee Shops|null| > |id2|Coffee Shops|null| > Physical plan is: > {code:sql} > 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) > categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = > {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293 > 00-01 Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : > rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, > VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, > 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292 > 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) > : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, > VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, > 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291 > 00-03 Jdbc(sql=[SELECT * > FROM "public"."transactions" > INNER JOIN (SELECT * > FROM "public"."categories" > WHERE 
"categoryparentguid" IS NOT NULL) AS "t" ON > "transactions"."categoryguid" = "t"."categoryguid" > INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" > = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) > transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) > transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) > transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, > VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, > VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) > transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) > transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) > transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) > transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, > VARCHAR(50) transactionorigpartyguid, VARCHAR(255) > transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) > transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, > 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) > transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) > transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) > transactionmatrulerefobjguid, VARCHAR(50)
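The aliasing failure described in the comment above can be sketched in miniature: if the pushed-down JDBC query returns columns under their original names, a downstream projection that looks values up by the alias finds nothing and emits null. This is an illustration only (plain maps standing in for record batches); the class and method names are hypothetical, not Drill code.

```java
import java.util.HashMap;
import java.util.Map;

public class AliasPushdownSketch {
    // A row as returned by the JDBC reader: original column names only,
    // because the alias was never pushed down into the generated SQL.
    static Map<String, Object> jdbcRow() {
        Map<String, Object> row = new HashMap<>();
        row.put("categoryname", "food"); // original Postgres column name
        return row;
    }

    // A downstream projection then asks for a column by name.
    static Object project(Map<String, Object> row, String name) {
        return row.get(name); // aliased name is a missing key -> null
    }

    public static void main(String[] args) {
        System.out.println(project(jdbcRow(), "categoryname")); // found
        System.out.println(project(jdbcRow(), "parentcat"));    // null, the bug
    }
}
```

Pushing the alias down (generating `"categoryname" AS "parentcat"` in the JDBC SQL) would make the returned schema match what the Project operator expects.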
[jira] [Updated] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables
[ https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Farkas updated DRILL-4211: -- Affects Version/s: 1.11.0 > Inconsistent results from a joined sql statement to postgres tables > --- > > Key: DRILL-4211 > URL: https://issues.apache.org/jira/browse/DRILL-4211 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.3.0, 1.11.0 > Environment: Postgres db stroage >Reporter: Robert Hamilton-Smith >Assignee: Timothy Farkas > Labels: newbie > > When making an sql statement that incorporates a join to a table and then a > self join to that table to get a parent value , Drill brings back > inconsistent results. > Here is the sql in postgres with correct output: > {code:sql} > select trx.categoryguid, > cat.categoryname, w1.categoryname as parentcat > from transactions trx > join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID) > join categories w1 on (cat.categoryparentguid = w1.categoryguid) > where cat.categoryparentguid IS NOT NULL; > {code} > Output: > ||categoryid||categoryname||parentcategory|| > |id1|restaurants|food| > |id1|restaurants|food| > |id2|Coffee Shops|food| > |id2|Coffee Shops|food| > When run in Drill with correct storage prefix: > {code:sql} > select trx.categoryguid, > cat.categoryname, w1.categoryname as parentcat > from db.schema.transactions trx > join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID) > join db.schema.wpfm_categories w1 on (cat.categoryparentguid = > w1.categoryguid) > where cat.categoryparentguid IS NOT NULL > {code} > Results are: > ||categoryid||categoryname||parentcategory|| > |id1|restaurants|null| > |id1|restaurants|null| > |id2|Coffee Shops|null| > |id2|Coffee Shops|null| > Physical plan is: > {code:sql} > 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) > categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = > {110.0 rows, 110.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293 > 00-01 Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : > rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, > VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, > 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292 > 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) > : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, > VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, > 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291 > 00-03 Jdbc(sql=[SELECT * > FROM "public"."transactions" > INNER JOIN (SELECT * > FROM "public"."categories" > WHERE "categoryparentguid" IS NOT NULL) AS "t" ON > "transactions"."categoryguid" = "t"."categoryguid" > INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" > = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) > transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) > transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) > transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, > VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, > VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) > transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) > transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) > transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) > transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, > VARCHAR(50) transactionorigpartyguid, VARCHAR(255) > transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) > transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, > 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) > transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) > 
transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) > transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, > VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, > TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) > transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, > VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) > categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, > VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) > partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, > DECIMAL(3, 0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) >
[jira] [Created] (DRILL-5713) Doing joins on tables that share column names in a JDBC store returns incorrect results
Timothy Farkas created DRILL-5713: - Summary: Doing joins on tables that share column names in a JDBC store returns incorrect results Key: DRILL-5713 URL: https://issues.apache.org/jira/browse/DRILL-5713 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.11.0 Environment: My Mac running the latest Drill in embedded mode. Reporter: Timothy Farkas Assignee: Timothy Farkas If there are two tables in Postgres that share column names, incorrect results are returned when a join is done between the two tables. For example, if we have two tables, categories and categories2, with the following contents:

+--------------+--------------------+--------------+
| categoryguid | categoryparentguid | categoryname |
+--------------+--------------------+--------------+
| id1          | null               | restaurants  |
| null         | id1                | food         |
| id2          | null               | Coffee Shops |
| null         | id2                | food         |
+--------------+--------------------+--------------+

then the following join query returns incorrectly named columns and incorrect null values:

select cat.categoryname, cat2.categoryname from postgres.public.categories cat join postgres.public.categories2 cat2 on (cat.categoryguid = cat2.categoryguid) where cat.categoryguid IS NOT NULL;

+--------------+---------------+
| categoryname | categoryname0 |
+--------------+---------------+
| restaurants  | null          |
| Coffee Shops | null          |
+--------------+---------------+

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
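The renamed column visible above (categoryname0), and the categoryguid0/categoryguid1 names in the physical plan of DRILL-4211, suggest duplicate column names are disambiguated with numeric suffixes. A minimal sketch of such a renaming scheme follows; this is an illustration of the naming pattern, not Drill's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnDedup {
    // First occurrence keeps its name; later duplicates get a counter
    // appended, yielding names like categoryguid, categoryguid0, categoryguid1.
    static List<String> dedup(List<String> names) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String n : names) {
            Integer count = seen.get(n);
            if (count == null) {
                seen.put(n, 0);
                out.add(n);
            } else {
                seen.put(n, count + 1);
                out.add(n + count);
            }
        }
        return out;
    }
}
```

The bug report implies the join logic then matches values against the wrong (renamed or original) name, so one side of the join comes back null.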
[jira] [Commented] (DRILL-5699) Drill Web UI Page Source Has Links To External Sites
[ https://issues.apache.org/jira/browse/DRILL-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120795#comment-16120795 ] ASF GitHub Bot commented on DRILL-5699: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/891 +1 > Drill Web UI Page Source Has Links To External Sites > > > Key: DRILL-5699 > URL: https://issues.apache.org/jira/browse/DRILL-5699 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP > Reporter: Sindhuri Ramanarayan Rayavaram > Assignee: Sindhuri Ramanarayan Rayavaram > Priority: Minor > Fix For: 1.12.0 > > > Drill uses an external CDN for JavaScript and CSS files in the result page. When > there is no internet connection, this page fails to load. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DRILL-5712) Update the pom files with dependency exclusions for commons-codec
Sindhuri Ramanarayan Rayavaram created DRILL-5712: - Summary: Update the pom files with dependency exclusions for commons-codec Key: DRILL-5712 URL: https://issues.apache.org/jira/browse/DRILL-5712 Project: Apache Drill Issue Type: Bug Reporter: Sindhuri Ramanarayan Rayavaram Assignee: Sindhuri Ramanarayan Rayavaram In java-exec, we are adding a dependency on commons-codec version 1.10. Other dependencies like hadoop-common, parquet-column, etc. pull in different versions of commons-codec. Exclusions should be added for commons-codec in these dependencies. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
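For illustration, a transitive-dependency exclusion of the kind this ticket proposes might look like the following in a pom.xml. The hadoop-common coordinates are an assumed example; the actual set of dependencies needing exclusions would be determined from `mvn dependency:tree`.

```xml
<!-- Hypothetical example: exclude the transitive commons-codec that
     hadoop-common would otherwise pull in, so the explicit 1.10
     declaration in java-exec wins. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <exclusions>
    <exclusion>
      <groupId>commons-codec</groupId>
      <artifactId>commons-codec</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```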
[jira] [Commented] (DRILL-5709) Provide a value vector method to convert a vector to nullable
[ https://issues.apache.org/jira/browse/DRILL-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120782#comment-16120782 ] ASF GitHub Bot commented on DRILL-5709: --- GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/901 DRILL-5709: Provide a value vector method to convert a vector to nullable Please see DRILL-5709 for an explanation and example. You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-5709 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/901.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #901 commit ced37d52d161daae333d6048150f92c181defa53 Author: Paul Rogers Date: 2017-08-09T03:04:24Z DRILL-5709: Provide a value vector method to convert a vector to nullable Please see DRILL-5709 for an explanation and example. > Provide a value vector method to convert a vector to nullable > - > > Key: DRILL-5709 > URL: https://issues.apache.org/jira/browse/DRILL-5709 > Project: Apache Drill > Issue Type: Improvement > Reporter: Paul Rogers > Assignee: Paul Rogers > Priority: Minor > Fix For: 1.12.0 > > > The hash agg spill work needs to convert a non-nullable scalar vector to the > nullable equivalent. For efficiency, the code wishes to simply transfer the > underlying data buffer(s), and create the required "bits" vector, rather than > generating code that does the transfer row-by-row. > The solution is to add a {{toNullable(ValueVector nullableVector)}} method to > the {{ValueVector}} class, then implement it where needed. > Since the target code only works with scalars (that is, no arrays, no maps, > no lists), the code only handles these cases, throwing an > {{UnsupportedOperationException}} in other cases. 
> Usage: > {code} > ValueVector nonNullableVector = // your non-nullable vector > MajorType type = MajorType.newBuilder(nonNullableVector.getType()) > .setMode(DataMode.OPTIONAL) > .build(); > MaterializedField field = MaterializedField.create(name, type); > ValueVector nullableVector = TypeHelper.getNewVector(field, > oContext.getAllocator()); > nonNullableVector.toNullable(nullableVector); > // Data is now in nullableVector > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
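Conceptually, the buffer-transfer approach the issue describes can be sketched as follows. Plain Java arrays stand in for Drill's direct-memory buffers, and the class names here are hypothetical, not the actual ValueVector API: the point is that the data buffer is handed over as-is and only a "bits" (validity) buffer is created, with every entry set to 1, since a non-nullable vector contains no nulls.

```java
import java.util.Arrays;

public class ToNullableSketch {
    // Stand-in for a nullable vector: a data buffer plus a validity buffer.
    static class Nullable {
        int[] data;
        byte[] bits; // 1 = value present, 0 = null
    }

    // Transfer the buffer and synthesize the bits vector; no per-row copy.
    static Nullable toNullable(int[] nonNullableData, int valueCount) {
        Nullable result = new Nullable();
        result.data = nonNullableData;         // ownership transfer, O(1)
        result.bits = new byte[valueCount];
        Arrays.fill(result.bits, (byte) 1);    // nothing was null
        return result;
    }
}
```

This is why the conversion is cheap compared to generated row-by-row copy code: the cost is one allocation and fill of the bits buffer, independent of the width of the data values.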
[jira] [Commented] (DRILL-5663) Drillbit fails to start when only keystore path is provided without keystore password.
[ https://issues.apache.org/jira/browse/DRILL-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120752#comment-16120752 ] ASF GitHub Bot commented on DRILL-5663: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/874#discussion_r132323524 --- Diff: distribution/src/resources/drill-override-example.conf --- @@ -93,6 +93,13 @@ drill.exec: { credentials: true } }, + # Below SSL parameters need to be set for custom transport layer settings. + ssl{ +keyStore: "/keystore.file", --- End diff -- The names in this example file are not consistent with the names used in ExecConstants: keyStore and trustStore instead of keyStorePath and trustStorePath. > Drillbit fails to start when only keystore path is provided without keystore > password. > -- > > Key: DRILL-5663 > URL: https://issues.apache.org/jira/browse/DRILL-5663 > Project: Apache Drill > Issue Type: Bug > Reporter: Sorabh Hamirwasia > Assignee: Sindhuri Ramanarayan Rayavaram > > When we configure a keystore path without a keystore password inside > drill-override.conf for the WebServer, the Drillbit fails to start. We should > explicitly check for either both being present or both being absent. If only > one of them is present, throw a startup exception for Drill. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
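The check the issue proposes, both settings present or both absent, could be sketched like this. The method name and exception choice are assumptions for illustration, not the actual Drill code:

```java
public class SslConfigCheck {
    // Fail fast at startup if exactly one of keystore path/password is set,
    // instead of letting the Drillbit die with an opaque error later.
    static void validate(String keyStorePath, String keyStorePassword) {
        boolean hasPath = keyStorePath != null && !keyStorePath.isEmpty();
        boolean hasPassword = keyStorePassword != null && !keyStorePassword.isEmpty();
        if (hasPath != hasPassword) {
            throw new IllegalStateException(
                "Both keystore path and keystore password must be set, or neither.");
        }
    }
}
```

The `hasPath != hasPassword` comparison is an exclusive-or: it accepts the both-set and both-unset configurations and rejects the two mixed ones.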
[jira] [Updated] (DRILL-5711) Incorrect operator profiles for queries on json files
[ https://issues.apache.org/jira/browse/DRILL-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Nagaraj Subramanya updated DRILL-5711: - Description: 1) Join query on two json files {code} select ps.ps_suppkey from dfs.`testData/json/part.json` as p, dfs.`testData/json/partsupp.json` as ps where p.p_partkey = ps.ps_partkey; {code} 2) Check the query profile a) JSON_SUB_SCAN type incorrectly ordered b) Missing SCREEN type Attached 1) Two json files 2) Snapshot of query profile and operator profile Commit id - 9d1d815737528251a7500621cc976b57e7f3be59 was: 1) Join query on two json files {code} select ps.ps_suppkey from dfs.`testData/json/part.json` as p, dfs.`testData/json/partsupp.json` as ps where p.p_partkey = ps.ps_partkey; {code} 2) Check the query profile a) JSON_SUB_SCAN type incorrectly ordered b) Missing SCREEN type Attached 1) Two json files 2) Snapshot of query profile and operator profile > Incorrect operator profiles for queries on json files > - > > Key: DRILL-5711 > URL: https://issues.apache.org/jira/browse/DRILL-5711 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON > Affects Versions: 1.11.0 > Reporter: Prasad Nagaraj Subramanya > Attachments: OperatorProfiles.png, part.json, partsupp.json, > QueryProfile.png > > > 1) Join query on two json files > {code} > select ps.ps_suppkey from dfs.`testData/json/part.json` as p, > dfs.`testData/json/partsupp.json` as ps where p.p_partkey = ps.ps_partkey; > {code} > 2) Check the query profile > a) JSON_SUB_SCAN type incorrectly ordered > b) Missing SCREEN type > Attached > 1) Two json files > 2) Snapshot of query profile and operator profile > Commit id - 9d1d815737528251a7500621cc976b57e7f3be59 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5698) Drill should start in embedded mode using java 1.8.0_144
[ https://issues.apache.org/jira/browse/DRILL-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120635#comment-16120635 ] ASF GitHub Bot commented on DRILL-5698: --- Github user darrenbrien closed the pull request at: https://github.com/apache/drill/pull/899 > Drill should start in embedded mode using java 1.8.0_144 > > > Key: DRILL-5698 > URL: https://issues.apache.org/jira/browse/DRILL-5698 > Project: Apache Drill > Issue Type: Bug > Reporter: Darren > Labels: ready-to-commit > Fix For: 1.12.0 > > > Currently the startup script > distribution/src/resources/drill-config.sh > prevents Drill from starting, as the regex incorrectly matches the 144 > portion of the version string. This is because the dots in the regex aren't escaped: > -"$JAVA" -version 2>&1 | grep "version" | egrep -e "1.4|1.5|1.6" > /dev/null > +"$JAVA" -version 2>&1 | grep "version" | egrep -e "1\.4|1\.5|1\.6" > > /dev/null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
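The bug is easy to reproduce outside the shell script: in a regex, an unescaped `.` matches any character, so the pattern `1.4` matches the `144` inside `1.8.0_144` and the script wrongly concludes Java is too old. A small Java demonstration (the script uses egrep, but the metacharacter semantics are the same):

```java
import java.util.regex.Pattern;

public class VersionRegexDemo {
    // Mirrors the script's "egrep -e PATTERN": does the version line
    // contain a match anywhere?
    static boolean looksTooOld(String versionLine, String pattern) {
        return Pattern.compile(pattern).matcher(versionLine).find();
    }

    public static void main(String[] args) {
        String line = "java version \"1.8.0_144\"";
        // Unescaped: "1.4" matches the "144" substring -> false positive.
        System.out.println(looksTooOld(line, "1.4|1.5|1.6"));       // true (bug)
        // Escaped dots match only literal "1.4", "1.5", "1.6".
        System.out.println(looksTooOld(line, "1\\.4|1\\.5|1\\.6")); // false (fixed)
    }
}
```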
[jira] [Updated] (DRILL-5711) Incorrect operator profiles for queries on json files
[ https://issues.apache.org/jira/browse/DRILL-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Nagaraj Subramanya updated DRILL-5711: - Description: 1) Join query on two json files {code} select ps.ps_suppkey from dfs.`testData/json/part.json` as p, dfs.`testData/json/partsupp.json` as ps where p.p_partkey = ps.ps_partkey; {code} 2) Check the query profile a) JSON_SUB_SCAN type incorrectly ordered b) Missing SCREEN type Attached 1) Two json files 2) Snapshot of query profile and operator profile was: 1) Join query on two json files {code} select ps.ps_suppkey from dfs.`testData/json/part` as p, dfs.`testData/json/partsupp` as ps where p.p_partkey = ps.ps_partkey; {code} 2) Check the query profile a) JSON_SUB_SCAN type incorrectly ordered b) Missing SCREEN type > Incorrect operator profiles for queries on json files > - > > Key: DRILL-5711 > URL: https://issues.apache.org/jira/browse/DRILL-5711 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON > Affects Versions: 1.11.0 > Reporter: Prasad Nagaraj Subramanya > Attachments: OperatorProfiles.png, part.json, partsupp.json, > QueryProfile.png > > > 1) Join query on two json files > {code} > select ps.ps_suppkey from dfs.`testData/json/part.json` as p, > dfs.`testData/json/partsupp.json` as ps where p.p_partkey = ps.ps_partkey; > {code} > 2) Check the query profile > a) JSON_SUB_SCAN type incorrectly ordered > b) Missing SCREEN type > Attached > 1) Two json files > 2) Snapshot of query profile and operator profile -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5711) Incorrect operator profiles for queries on json files
[ https://issues.apache.org/jira/browse/DRILL-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Nagaraj Subramanya updated DRILL-5711: - Attachment: OperatorProfiles.png QueryProfile.png part.json partsupp.json > Incorrect operator profiles for queries on json files > - > > Key: DRILL-5711 > URL: https://issues.apache.org/jira/browse/DRILL-5711 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya > Attachments: OperatorProfiles.png, part.json, partsupp.json, > QueryProfile.png > > > 1) Join query on two json files > {code} > select ps.ps_suppkey from dfs.`testData/json/part` as p, > dfs.`testData/json/partsupp` as ps where p.p_partkey = ps.ps_partkey; > {code} > 2) Check the query profile > a) JSON_SUB_SCAN type incorrectly ordered > b) Missing SCREEN type -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DRILL-5711) Incorrect operator profiles for queries on json files
Prasad Nagaraj Subramanya created DRILL-5711: Summary: Incorrect operator profiles for queries on json files Key: DRILL-5711 URL: https://issues.apache.org/jira/browse/DRILL-5711 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.11.0 Reporter: Prasad Nagaraj Subramanya 1) Join query on two json files {code} select ps.ps_suppkey from dfs.`testData/json/part` as p, dfs.`testData/json/partsupp` as ps where p.p_partkey = ps.ps_partkey; {code} 2) Check the query profile a) JSON_SUB_SCAN type incorrectly ordered b) Missing SCREEN type -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables
[ https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120601#comment-16120601 ] Timothy Farkas commented on DRILL-4211: --- I was not able to replicate the original query, but I constructed a similar query which appears to show the same erroneous behavior. All the code I used to produce this is in this repo [https://github.com/ilooner-mapr/DRILL-4211] *Postgres query* select trx.categoryguid, cat.categoryname, w1.categoryname as parentcat from trx join categories cat on (cat.categoryguid = trx.categoryguid) join categories w1 on (cat.categoryguid = w1.categoryguid) where w1.categoryguid IS NOT NULL;

 categoryguid | categoryname | parentcat
--------------+--------------+--------------
 id1          | restaurants  | restaurants
 id1          | restaurants  | restaurants
 id2          | Coffee Shops | Coffee Shops
 id2          | Coffee Shops | Coffee Shops

*Drill Query* select trx.categoryguid, cat.categoryname, w1.categoryname as parentcat from postgres.public.trx trx join postgres.public.categories cat on (cat.categoryguid = trx.categoryguid) join postgres.public.categories w1 on (cat.categoryguid = w1.categoryguid) where w1.categoryguid IS NOT NULL;

+--------------+--------------+-----------+
| categoryguid | categoryname | parentcat |
+--------------+--------------+-----------+
| id1          | restaurants  | null      |
| id1          | restaurants  | null      |
| id2          | Coffee Shops | null      |
| id2          | Coffee Shops | null      |
+--------------+--------------+-----------+

> Inconsistent results from a joined sql statement to postgres tables > --- > > Key: DRILL-4211 > URL: https://issues.apache.org/jira/browse/DRILL-4211 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators > Affects Versions: 1.3.0 > Environment: Postgres db storage > Reporter: Robert Hamilton-Smith > Assignee: Timothy Farkas > Labels: newbie > > When making a SQL statement that incorporates a join to a table and then a > self join to that table to get a parent value, Drill brings back > inconsistent results. 
> Here is the sql in postgres with correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food|
> |id1|restaurants|food|
> |id2|Coffee Shops|food|
> |id2|Coffee Shops|food|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code}
> 00-00 Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01 Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02 Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03 Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, VARCHAR(256) transactiondescription,
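The expected (Postgres) behavior in the comment above can be reproduced outside Drill. The sketch below uses Python's sqlite3 as a stand-in for Postgres, with hypothetical sample rows; table contents are invented for illustration, only the query shape comes from the report:

```python
import sqlite3

# Stand-in schema for the reporter's trx/categories tables (sample rows are
# hypothetical). The self-join on categories should yield a non-null
# parentcat for every row, matching the Postgres output in the comment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trx (categoryguid TEXT);
CREATE TABLE categories (categoryguid TEXT, categoryname TEXT);
INSERT INTO trx VALUES ('id1'), ('id1'), ('id2'), ('id2');
INSERT INTO categories VALUES ('id1', 'restaurants'), ('id2', 'Coffee Shops');
""")

rows = conn.execute("""
SELECT trx.categoryguid, cat.categoryname, w1.categoryname AS parentcat
FROM trx
JOIN categories cat ON cat.categoryguid = trx.categoryguid
JOIN categories w1  ON cat.categoryguid = w1.categoryguid
WHERE w1.categoryguid IS NOT NULL
""").fetchall()
```

Against Postgres (and sqlite3) every `parentcat` is populated; Drill returns null for the aliased column, which is what DRILL-4211 tracks as an alias-pushdown problem.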
[jira] [Updated] (DRILL-5703) Add Syntax Highlighting & Autocompletion to Query Form
[ https://issues.apache.org/jira/browse/DRILL-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-5703: - Flags: Patch Labels: UI (was: ) > Add Syntax Highlighting & Autocompletion to Query Form > -- > > Key: DRILL-5703 > URL: https://issues.apache.org/jira/browse/DRILL-5703 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.11.0 >Reporter: Charles Givre >Assignee: Charles Givre > Labels: UI > > The UI could really benefit from having syntax highlighting and > autocompletion in the query window as well as the form to update storage > plugins. This PR adds that capability to the query form using the Ace code > editor (https://ace.c9.io). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5703) Add Syntax Highlighting & Autocompletion to Query Form
[ https://issues.apache.org/jira/browse/DRILL-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120089#comment-16120089 ] Charles Givre commented on DRILL-5703: -- I closed the original PR and resubmitted a clean PR here: https://github.com/apache/drill/pull/897. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (DRILL-5703) Add Syntax Highlighting & Autocompletion to Query Form
[ https://issues.apache.org/jira/browse/DRILL-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre reassigned DRILL-5703: Assignee: Charles Givre -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (DRILL-5708) Add DNS decode function for PCAP storage
[ https://issues.apache.org/jira/browse/DRILL-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre reassigned DRILL-5708: Assignee: Charles Givre > Add DNS decode function for PCAP storage > > > Key: DRILL-5708 > URL: https://issues.apache.org/jira/browse/DRILL-5708 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Other >Reporter: Takeo Ogawara >Assignee: Charles Givre >Priority: Minor > > As described in DRILL-5432, it is very useful to analyze packet contents and > application layer protocols. To improve the PCAP analysis function, it would be > better to add a function to decode DNS queries and responses. This enables > classifying packets by FQDN and displaying user access trends. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
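The kind of decoding the request asks for can be sketched briefly. The function below is an illustrative Python sketch, not Drill's (Java) UDF API: it reads the QNAME labels of the first question in a raw DNS message, ignoring compression pointers and multi-question messages for brevity.

```python
def decode_dns_qname(payload, offset=12):
    # Walk the length-prefixed labels that follow the 12-byte DNS header,
    # stopping at the zero-length terminator, and join them into an FQDN.
    # Compression pointers (high bits 0b11) are not handled in this sketch.
    labels = []
    i = offset
    while payload[i] != 0:
        length = payload[i]
        labels.append(payload[i + 1:i + 1 + length].decode("ascii"))
        i += 1 + length
    return ".".join(labels)

# A hand-built standard query for example.com: header, then the question.
query = (b"\x12\x34\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
         b"\x07example\x03com\x00\x00\x01\x00\x01")
```

Grouping packets by the decoded name is then an ordinary aggregation, which is the "classify packets by FQDN" use case from the request.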
[jira] [Assigned] (DRILL-3827) Empty metadata file causes queries on the table to fail
[ https://issues.apache.org/jira/browse/DRILL-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka reassigned DRILL-3827: -- Assignee: Vitalii Diravka (was: Parth Chandra) > Empty metadata file causes queries on the table to fail > --- > > Key: DRILL-3827 > URL: https://issues.apache.org/jira/browse/DRILL-3827 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 >Reporter: Victoria Markman >Assignee: Vitalii Diravka >Priority: Critical > Fix For: Future > > > I ran into a situation where drill created an empty metadata file (which is a > separate issue and I will try to narrow it down. Suspicion is that this > happens when "refresh table metadata x" fails with "permission denied" error). > However, we need to guard against the situation where the metadata file is empty or > corrupted. We probably should skip reading it if we encounter an unexpected > result and continue with query planning without that information, in the same > fashion as a partition pruning failure. It's also important to log this > information somewhere, drillbit.log as a start. It would be really nice to > have a flag in the query profile that tells a user if we used the metadata file > for planning or not. It will help in debugging performance issues. > A very confusing exception is thrown if you have a zero-length metadata file in > the directory: > {code} > [Wed Sep 23 07:45:28] # ls -la > total 2 > drwxr-xr-x 2 root root 2 Sep 10 14:55 . > drwxr-xr-x 16 root root 35 Sep 15 12:54 ..
> -rwxr-xr-x 1 root root 483 Jul 1 11:29 0_0_0.parquet > -rwxr-xr-x 1 root root 0 Sep 10 14:55 .drill.parquet_metadata > 0: jdbc:drill:schema=dfs> select * from t1; > Error: SYSTEM ERROR: JsonMappingException: No content to map due to > end-of-input > at [Source: com.mapr.fs.MapRFsDataInputStream@342bd88d; line: 1, column: 1] > [Error Id: c97574f6-b3e8-4183-8557-c30df6ca675f on atsqa4-133.qa.lab:31010] > (state=,code=0) > {code} > Workaround is trivial, remove the file. Marking it as critical, since we > don't have any concurrency control in place and this file can get corrupted > as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
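The guard the report asks for — skip an empty or corrupt cache and fall back to planning without it — can be sketched as follows. This is an illustrative Python sketch of the behavior, not Drill's actual (Java) metadata-cache reader; `read_parquet_metadata` and the fallback contract are assumptions:

```python
import json
import logging

def read_parquet_metadata(path):
    # Sketch of the defensive read: return the parsed metadata cache, or
    # None so the planner can continue without it (as with partition
    # pruning failures). An empty, missing, or corrupt file is logged to
    # the equivalent of drillbit.log instead of surfacing a parse
    # exception (JsonMappingException in the report) to the user.
    try:
        with open(path) as f:
            text = f.read()
        if not text.strip():
            raise ValueError("zero-length metadata file")
        return json.loads(text)  # json.JSONDecodeError subclasses ValueError
    except (OSError, ValueError) as e:
        logging.warning("Ignoring unusable metadata cache %s: %s", path, e)
        return None
```

A `None` result would also be the natural place to set the "metadata file used for planning" flag the reporter wants in the query profile.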
[jira] [Assigned] (DRILL-3829) Metadata Caching : Drill should ignore a corrupted cache file
[ https://issues.apache.org/jira/browse/DRILL-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka reassigned DRILL-3829: -- Assignee: Vitalii Diravka > Metadata Caching : Drill should ignore a corrupted cache file > - > > Key: DRILL-3829 > URL: https://issues.apache.org/jira/browse/DRILL-3829 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Vitalii Diravka > Fix For: Future > > > git.commit.id.abbrev=3c89b30 > Drill should validate the cache file structure and ignore it if it detects > any corruption in its contents. > I placed an empty cache file in the directory and executed a count(*) query > on top of the directory. Below is what I got: > {code} > select count(*) from dfs.`/drill/testdata/metadata_caching/lineitem`; > Error: SYSTEM ERROR: JsonMappingException: No content to map due to > end-of-input > at [Source: com.mapr.fs.MapRFsDataInputStream@293240cd; line: 1, column: 1] > [Error Id: 88f77d37-aff3-4adc-bb0e-6c13b49e7776 on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > At the very least we should inform the user that the cache file has been corrupted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase
[ https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120020#comment-16120020 ] ASF GitHub Bot commented on DRILL-5691: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/889 @weijietong thanks for code formatting. > I wonder if the enhanced code does not break the current unit test, it will be ok. Can you please check this? Did all unit tests pass after the change? > multiple count distinct query planning error at physical phase > --- > > Key: DRILL-5691 > URL: https://issues.apache.org/jira/browse/DRILL-5691 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.9.0, 1.10.0 >Reporter: weijie.tong > > I materialized the count distinct query result in a cache, added a plugin > rule to translate the (Aggregate、Aggregate、Project、Scan) or > (Aggregate、Aggregate、Scan) to (Project、Scan) at the PARTITION_PRUNING phase. > Then, once users issue count distinct queries, they will be translated to query > the cache to get the result. > eg1: " select count(*),sum(a) ,count(distinct b) from t where dt=xx " > eg2:"select count(*),sum(a) ,count(distinct b) ,count(distinct c) from t > where dt=xxx " > eg3:"select count(distinct b), count(distinct c) from t where dt=xxx" > eg1 will be right and have a query result as I expected, but eg2 will be > wrong at the physical phase. The error info is here: > https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. > eg3 will also get a similar error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
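The rewrite the reporter describes — match an (Aggregate, Aggregate, Project, Scan) or (Aggregate, Aggregate, Scan) operator stack and replace it with a (Project, Scan) over the precomputed cache — can be sketched with hypothetical stand-ins; the real rule would match Calcite RelNode classes inside a Drill planner phase, and all names below are illustrative:

```python
def matches_count_distinct_pattern(stack):
    # True when the top-down operator stack fits one of the two shapes
    # the reporter's plugin rule handles. Each entry is a (kind, detail)
    # pair standing in for a RelNode.
    kinds = [kind for kind, _ in stack]
    return kinds in (["Aggregate", "Aggregate", "Project", "Scan"],
                     ["Aggregate", "Aggregate", "Scan"])

def rewrite(stack):
    # Replace a matching stack with a (Project, Scan) over the cache
    # table; non-matching stacks are returned unchanged.
    if not matches_count_distinct_pattern(stack):
        return stack
    return [("Project", "count_distinct_result"), ("Scan", "cache_table")]
```

The reported bug is that with two or more `count(distinct …)` aggregates the rewritten plan fails at the physical-planning phase, so the match condition alone is not the whole story.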
[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120019#comment-16120019 ] ASF GitHub Bot commented on DRILL-4735: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/900 @jinfengni addressed code review comments. Please review when possible.
> Count(dir0) on parquet returns 0 result
> ---------------------------------------
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
> Reporter: Krystal
> Assignee: Arina Ielchiieva
> Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 rows.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns the expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for the count(dir0) query:
> {code}
> 00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
> 00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1345
> 00-02 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1344
> 00-03 Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns = null, isStarQuery = false, isSkipQuery = false]]) : rowType = RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the same select:
> {code}
> 00-00 Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1623
> 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02 StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
> 00-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120013#comment-16120013 ] ASF GitHub Bot commented on DRILL-4735: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/882 New PR was opened - #900 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120010#comment-16120010 ] ASF GitHub Bot commented on DRILL-4735: --- GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/900 DRILL-4735: ConvertCountToDirectScan rule enhancements
1. ConvertCountToDirectScan rule will be applicable for 2 or more COUNT aggregates. To achieve this, DynamicPojoRecordReader was added, which accepts any number of columns, in contrast to PojoRecordReader, which depends on class fields. AbstractPojoRecordReader class was added to factor out common logic for these two readers.
2. ConvertCountToDirectScan will distinguish between missing, directory and implicit columns. For missing columns the count will be set to 0; for implicit columns, to the total record count, since implicit columns are based on files and there is no data without a file. If a directory column is encountered, the rule won't be applied.
3. MetadataDirectGroupScan class was introduced to indicate which files' statistics were used. Details in Jira [DRILL-4735](https://issues.apache.org/jira/browse/DRILL-4735).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/arina-ielchiieva/drill DRILL-4735
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/900.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #900
commit 031a5b04ae430dbae015d88089c6d10623b9e87a
Author: Arina Ielchiieva Date: 2017-07-20T16:26:44Z
DRILL-4735: ConvertCountToDirectScan rule enhancements
1. ConvertCountToDirectScan rule will be applicable for 2 or more COUNT aggregates. To achieve this, DynamicPojoRecordReader was added, which accepts any number of columns, in contrast to PojoRecordReader, which depends on class fields. AbstractPojoRecordReader class was added to factor out common logic for these two readers.
2. ConvertCountToDirectScan will distinguish between missing, directory and implicit columns. For missing columns the count will be set to 0; for implicit columns, to the total record count, since implicit columns are based on files and there is no data without a file. If a directory column is encountered, the rule won't be applied. CountsCollector class was introduced to encapsulate counts collection logic.
3. MetadataDirectGroupScan class was introduced to indicate to the user when metadata was used during calculation and for which files it was applied.
commit f3fa3dc2e2a876a21f1ce51b74dfd2544201f6f6
Author: Arina Ielchiieva Date: 2017-08-08T13:18:37Z
DRILL-4735: Changes after code review.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
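The column classification in point 2 of the PR description can be sketched as follows. The function name, the implicit-column list, and the plain-dict metadata are illustrative assumptions for this sketch, not Drill's actual API:

```python
def direct_scan_counts(count_columns, column_value_counts, total_records,
                       implicit_columns=("filename", "filepath", "suffix", "fqn")):
    """Sketch of the PR's classification (all names illustrative).

    - directory column (dir0, dir1, ...) -> None: the rewrite is not applied
    - implicit column -> total record count (implicit values exist per file)
    - known data column -> non-null value count taken from metadata
    - missing column -> 0
    """
    counts = {}
    for col in count_columns:
        if col.startswith("dir") and col[3:].isdigit():
            return None  # partition column: fall back to a regular scan
        if col in implicit_columns:
            counts[col] = total_records
        elif col in column_value_counts:
            counts[col] = column_value_counts[col]
        else:
            counts[col] = 0
    return counts
```

Returning `None` for `dir0`/`dir1` is what fixes the reported bug: before the change, the rewrite treated directory columns as missing and produced the bogus count of 0.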
[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119986#comment-16119986 ] ASF GitHub Bot commented on DRILL-4735: --- Github user arina-ielchiieva closed the pull request at: https://github.com/apache/drill/pull/882 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5698) Drill should start in embedded mode using java 1.8.0_144
[ https://issues.apache.org/jira/browse/DRILL-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119969#comment-16119969 ] ASF GitHub Bot commented on DRILL-5698: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/899 @darrenbrien please close this PR.
> Drill should start in embedded mode using java 1.8.0_144
> --------------------------------------------------------
>
> Key: DRILL-5698
> URL: https://issues.apache.org/jira/browse/DRILL-5698
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Darren
> Labels: ready-to-commit
> Fix For: 1.12.0
>
> Currently the startup script distribution/src/resources/drill-config.sh prevents Drill from starting, as the regex incorrectly captures the 144 portion of the version code. This is because the regex isn't escaped:
> -"$JAVA" -version 2>&1 | grep "version" | egrep -e "1.4|1.5|1.6" > /dev/null
> +"$JAVA" -version 2>&1 | grep "version" | egrep -e "1\.4|1\.5|1\.6" > /dev/null
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
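The effect of the unescaped dot is easy to demonstrate. The shell script uses `egrep`, but the pattern semantics are the same, so here is a sketch in Python's `re`:

```python
import re

# In the unescaped pattern from drill-config.sh, '.' matches any character,
# so "1.4" also matches the "144" in the Java 1.8.0_144 version string:
# '1', any char ('4'), '4'.
version_line = 'java version "1.8.0_144"'

broken  = re.compile(r"1.4|1.5|1.6")     # dot is a wildcard here
escaped = re.compile(r"1\.4|1\.5|1\.6")  # dot matches a literal period

assert broken.search(version_line)        # wrongly flags 1.8.0_144 as old Java
assert not escaped.search(version_line)   # fixed pattern does not match
```

With the unescaped pattern the old-Java check fires on Java 8 update 144, which is exactly why Drill refused to start in embedded mode.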
[jira] [Commented] (DRILL-5698) Drill should start in embedded mode using java 1.8.0_144
[ https://issues.apache.org/jira/browse/DRILL-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119690#comment-16119690 ] ASF GitHub Bot commented on DRILL-5698: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/890 There was no need to create a new PR. You could just amend your commit message and use `git push -f`. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5698) Drill should start in embedded mode using java 1.8.0_144
[ https://issues.apache.org/jira/browse/DRILL-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119685#comment-16119685 ] ASF GitHub Bot commented on DRILL-5698: --- Github user darrenbrien commented on the issue: https://github.com/apache/drill/pull/890 see https://github.com/apache/drill/pull/899 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5698) Drill should start in embedded mode using java 1.8.0_144
[ https://issues.apache.org/jira/browse/DRILL-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119678#comment-16119678 ] ASF GitHub Bot commented on DRILL-5698: --- GitHub user darrenbrien opened a pull request: https://github.com/apache/drill/pull/899 DRILL-5698: Escape version number period separator, this captures version numbers with 4, 5 or 6 in them, like 1.8.0_144
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/darrenbrien/drill Drill-5698
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/899.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #899
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5698) Drill should start in embedded mode using java 1.8.0_144
[ https://issues.apache.org/jira/browse/DRILL-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119624#comment-16119624 ] ASF GitHub Bot commented on DRILL-5698: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/890 @darrenbrien Can you please also reword the commit message? As Paul has mentioned, there is a convention here to have the Jira number at the beginning: `DRILL-5698: Escape version number period separator, this captures version numbers` -- This message was sent by Atlassian JIRA (v6.4.14#64029)