[jira] [Updated] (DRILL-4462) Slow JOIN Query On Remote MongoDB
[ https://issues.apache.org/jira/browse/DRILL-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rifat Mahmud updated DRILL-4462:
--------------------------------
    Attachment: fragmentprof.PNG

Fragment profile.

> Slow JOIN Query On Remote MongoDB
> ---------------------------------
>
> Key: DRILL-4462
> URL: https://issues.apache.org/jira/browse/DRILL-4462
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - CLI, Storage - MongoDB
> Affects Versions: 1.5.0
> Reporter: Rifat Mahmud
> Attachments: fragmentprof.PNG
>
> Regardless of the number of collections in the MongoDB database, a simple
> join query such as select * from t1, t2 where t1.a=t2.b takes around 27
> seconds from drill-embedded running on a single machine.
> Here are the profiles:
> https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
> https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (DRILL-4462) Slow JOIN Query On Remote MongoDB
[ https://issues.apache.org/jira/browse/DRILL-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rifat Mahmud updated DRILL-4462:
--------------------------------
    Description:
Regardless of the number of collections in the MongoDB database, a simple join query such as select * from t1, t2 where t1.a=t2.b takes around 27 seconds from drill-embedded running on a single machine.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0
A screenshot of the fragment profile is attached.

  was:
Regardless of the number of collections in the MongoDB database, a simple join query such as select * from t1, t2 where t1.a=t2.b takes around 27 seconds from drill-embedded running on a single machine.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0

> Slow JOIN Query On Remote MongoDB
> ---------------------------------
>
> Key: DRILL-4462
> URL: https://issues.apache.org/jira/browse/DRILL-4462
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - CLI, Storage - MongoDB
> Affects Versions: 1.5.0
> Reporter: Rifat Mahmud
> Attachments: fragmentprof.PNG
>
> Regardless of the number of collections in the MongoDB database, a simple
> join query such as select * from t1, t2 where t1.a=t2.b takes around 27
> seconds from drill-embedded running on a single machine.
> Here are the profiles:
> https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
> https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0
> A screenshot of the fragment profile is attached.
[jira] [Updated] (DRILL-4462) Slow JOIN Query On Remote MongoDB
[ https://issues.apache.org/jira/browse/DRILL-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rifat Mahmud updated DRILL-4462:
--------------------------------
    Description:
Regardless of the number of collections in the MongoDB database, a simple join query such as select * from t1, t2 where t1.a=t2.b takes around 27 seconds from drill-embedded running on a single machine.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0

  was:
Regardless of the number of collections in the MongoDB database, a simple join query such as select * from t1, t2 where t1.a=t2.b takes around 27 seconds.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0

> Slow JOIN Query On Remote MongoDB
> ---------------------------------
>
> Key: DRILL-4462
> URL: https://issues.apache.org/jira/browse/DRILL-4462
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - CLI, Storage - MongoDB
> Affects Versions: 1.5.0
> Reporter: Rifat Mahmud
>
> Regardless of the number of collections in the MongoDB database, a simple
> join query such as select * from t1, t2 where t1.a=t2.b takes around 27
> seconds from drill-embedded running on a single machine.
> Here are the profiles:
> https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
> https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0
[jira] [Updated] (DRILL-4462) Slow JOIN Query On Remote MongoDB
[ https://issues.apache.org/jira/browse/DRILL-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rifat Mahmud updated DRILL-4462:
--------------------------------
    Flags: Important

> Slow JOIN Query On Remote MongoDB
> ---------------------------------
>
> Key: DRILL-4462
> URL: https://issues.apache.org/jira/browse/DRILL-4462
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - CLI, Storage - MongoDB
> Affects Versions: 1.5.0
> Reporter: Rifat Mahmud
>
> Regardless of the number of collections in the MongoDB database, a simple
> join query such as select * from t1, t2 where t1.a=t2.b takes around 27
> seconds.
> Here are the profiles:
> https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
> https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0
[jira] [Created] (DRILL-4462) Slow JOIN Query On Remote MongoDB
Rifat Mahmud created DRILL-4462:
-----------------------------------

Summary: Slow JOIN Query On Remote MongoDB
Key: DRILL-4462
URL: https://issues.apache.org/jira/browse/DRILL-4462
Project: Apache Drill
Issue Type: Bug
Components: Client - CLI, Storage - MongoDB
Affects Versions: 1.5.0
Reporter: Rifat Mahmud

Regardless of the number of collections in the MongoDB database, a simple join query such as select * from t1, t2 where t1.a=t2.b takes around 27 seconds.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174839#comment-15174839 ]

ASF GitHub Bot commented on DRILL-4281:
---------------------------------------

Github user jacques-n commented on the pull request:
https://github.com/apache/drill/pull/400#issuecomment-191014420

I think it would be better to have a single delegation setting, expressed as JSON, that describes the entirety of what we need to express. I think that will be much clearer than a set of multiple separate properties.

> Drill should support inbound impersonation
> ------------------------------------------
>
> Key: DRILL-4281
> URL: https://issues.apache.org/jira/browse/DRILL-4281
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Keys Botzum
> Assignee: Sudheesh Katkam
> Labels: doc-impacting, security
>
> Today Drill supports impersonation *to* external sources. For example, I can
> authenticate to Drill as myself and then Drill will access HDFS using
> impersonation.
> In many scenarios we also need impersonation to Drill. For example, I might
> use some front-end tool (such as Tableau) and authenticate to it as myself.
> That tool (server version) then needs to access Drill to perform queries, and
> I want those queries to run as myself, not as the Tableau user. While in
> theory the intermediate tool could store the userid & password for every user
> of Drill, this isn't a scalable or very secure solution.
> Note that HS2 today does support inbound impersonation, as described here:
> https://issues.apache.org/jira/browse/HIVE-5155
> The above is not the best approach, as it is tied to the connection object,
> which is very coarse-grained and potentially expensive. It would be better if
> there were a call on the ODBC/JDBC driver to switch the identity on an
> existing connection. Most modern SQL databases (Oracle, DB2) support such a
> function.
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174702#comment-15174702 ]

ASF GitHub Bot commented on DRILL-4281:
---------------------------------------

GitHub user sudheeshkatkam opened a pull request:
https://github.com/apache/drill/pull/400

DRILL-4281: Support authorized users to delegate for other users

+ Need to make changes to [sqlline](https://github.com/mapr/sqlline) to pass down the _delegator_ connection property.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sudheeshkatkam/drill DRILL-4281

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/400.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #400

commit b27d7421e73b504773b24c205f06aa0f98bd0bb6
Author: Sudheesh Katkam
Date: 2016-03-02T00:23:57Z

    DRILL-4281: Support authorized users to delegate for other users

> Drill should support inbound impersonation
> ------------------------------------------
>
> Key: DRILL-4281
> URL: https://issues.apache.org/jira/browse/DRILL-4281
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Keys Botzum
> Assignee: Sudheesh Katkam
> Labels: doc-impacting, security
>
> Today Drill supports impersonation *to* external sources. For example, I can
> authenticate to Drill as myself and then Drill will access HDFS using
> impersonation.
> In many scenarios we also need impersonation to Drill. For example, I might
> use some front-end tool (such as Tableau) and authenticate to it as myself.
> That tool (server version) then needs to access Drill to perform queries, and
> I want those queries to run as myself, not as the Tableau user. While in
> theory the intermediate tool could store the userid & password for every user
> of Drill, this isn't a scalable or very secure solution.
> Note that HS2 today does support inbound impersonation, as described here:
> https://issues.apache.org/jira/browse/HIVE-5155
> The above is not the best approach, as it is tied to the connection object,
> which is very coarse-grained and potentially expensive. It would be better if
> there were a call on the ODBC/JDBC driver to switch the identity on an
> existing connection. Most modern SQL databases (Oracle, DB2) support such a
> function.
[jira] [Created] (DRILL-4461) Drill Custom Authentication Startup Exception
Bridget Bevens created DRILL-4461:
-------------------------------------

Summary: Drill Custom Authentication Startup Exception
Key: DRILL-4461
URL: https://issues.apache.org/jira/browse/DRILL-4461
Project: Apache Drill
Issue Type: Bug
Components: Documentation
Reporter: Bridget Bevens
Assignee: Bridget Bevens

On the page https://drill.apache.org/docs/configuring-user-authentication/ there is one step missing from the doc. For the custom classpath scanner to find the new class, put the following config code in a file named drill-module.conf at the root of the jar file containing the custom authentication class:

drill {
  classpath.scanning {
    packages += "myorg.drill.security"
  }
}

Could you please create a JIRA for changing the documentation?
Thanks,
Venki
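The packaging requirement above (drill-module.conf at the jar root, not inside a subdirectory) is easy to get wrong. As a sanity check, here is a small illustrative Python helper, not part of Drill, that verifies the file sits at the root of a given jar; the jar path and package name are assumptions for the example:

```python
import zipfile

def has_root_drill_module_conf(jar_path):
    """Return True if drill-module.conf sits at the root of the jar,
    which is where Drill's classpath scanner expects to find it."""
    with zipfile.ZipFile(jar_path) as jar:
        # Jar entries are stored with '/'-separated paths; an entry named
        # exactly "drill-module.conf" (no '/') is at the jar root.
        return "drill-module.conf" in jar.namelist()
```

Running this against the jar that holds the custom authenticator class (e.g. `has_root_drill_module_conf("myauth.jar")`, a hypothetical filename) distinguishes a correctly packaged jar from one where the config was accidentally nested.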
[jira] [Updated] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-4347:
------------------------------------
    Fix Version/s: 1.6.0

> Planning time for query64 from TPCDS test suite has increased 10 times
> compared to 1.4 release
> ----------------------------------------------------------------------
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Victoria Markman
> Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0,
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> WITH cs_ui
>   AS (SELECT cs_item_sk,
>              Sum(cs_ext_list_price) AS sale,
>              Sum(cr_refunded_cash + cr_reversed_charge
>                  + cr_store_credit) AS refund
>       FROM catalog_sales,
>            catalog_returns
>       WHERE cs_item_sk = cr_item_sk
>         AND cs_order_number = cr_order_number
>       GROUP BY cs_item_sk
>       HAVING Sum(cs_ext_list_price) > 2 * Sum(
>              cr_refunded_cash + cr_reversed_charge
>              + cr_store_credit)),
>   cross_sales
>   AS (SELECT i_product_name product_name,
>              i_item_sk item_sk,
>              s_store_name store_name,
>              s_zip store_zip,
>              ad1.ca_street_number b_street_number,
>              ad1.ca_street_name b_streen_name,
>              ad1.ca_city b_city,
>              ad1.ca_zip b_zip,
>              ad2.ca_street_number c_street_number,
>              ad2.ca_street_name c_street_name,
>              ad2.ca_city c_city,
>              ad2.ca_zip c_zip,
>              d1.d_year AS syear,
>              d2.d_year AS fsyear,
>              d3.d_year s2year,
>              Count(*) cnt,
>              Sum(ss_wholesale_cost) s1,
>              Sum(ss_list_price) s2,
>              Sum(ss_coupon_amt) s3
>       FROM store_sales,
>            store_returns,
>            cs_ui,
>            date_dim d1,
>            date_dim d2,
>            date_dim d3,
>            store,
>            customer,
>            customer_demographics cd1,
>            customer_demographics cd2,
>            promotion,
>            household_demographics hd1,
>            household_demographics hd2,
>            customer_address ad1,
>            customer_address ad2,
>            income_band ib1,
>            income_band ib2,
>            item
>       WHERE ss_store_sk = s_store_sk
>         AND ss_sold_date_sk = d1.d_date_sk
>         AND ss_customer_sk = c_customer_sk
>         AND ss_cdemo_sk = cd1.cd_demo_sk
>         AND ss_hdemo_sk = hd1.hd_demo_sk
>         AND ss_addr_sk = ad1.ca_address_sk
>         AND ss_item_sk = i_item_
[jira] [Commented] (DRILL-4460) Provide feature that allows fall back to sort aggregation
[ https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174345#comment-15174345 ]

Julian Hyde commented on DRILL-4460:
------------------------------------

An "external" algorithm is one that uses disk to complete if there is not enough memory. An "adaptive" algorithm is one that can start using memory and switch to external in the same run, without losing data. A "hybrid" algorithm is one that puts as much data as possible in memory and puts the rest in external, and therefore tends to degrade gracefully as input increases.

I wanted to point out that there are adaptive, external algorithms based on sort as well as hash. This paper describes adaptive hybrid hash join, but adaptive hybrid hash aggregation is similar (and in fact simpler): http://www.vldb.org/conf/1990/P186.PDF

To be clear, external hashing is not currently implemented in Drill.

> Provide feature that allows fall back to sort aggregation
> ---------------------------------------------------------
>
> Key: DRILL-4460
> URL: https://issues.apache.org/jira/browse/DRILL-4460
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.5.0
> Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a hash (in-memory) model
> for aggregations (set by planner.enable_hashagg = true as the default). This
> works well, but it is memory dependent, and an out-of-memory condition will
> cause a query failure. At that point, a user can run alter session set
> `planner.enable_hashagg` = false and run the query again. If memory is a
> challenge again, the sort-based approach will spill to disk, allowing the
> query to complete (more slowly).
> What I am requesting is a feature, off by default (so Drill's default
> behavior stays the same after this feature is added), that would allow a
> query that tried hash aggregation and failed due to running out of memory to
> restart with sort aggregation. Basically, to let the query succeed, it would
> try hash first, then fall back to sort. This would make for a better user
> experience in that the query would succeed. Perhaps a warning could be shown
> so the user understands that this occurred and can choose a sort-based query
> by default in the future.
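The requested fallback is essentially a catch-and-retry wrapper around query execution. A minimal sketch of that control flow in Python, assuming a hypothetical `run_query` executor hook (neither the function nor its `hashagg` parameter comes from Drill's API):

```python
def run_with_sort_fallback(run_query, sql):
    """Try hash aggregation first; if it fails for lack of memory,
    rerun the same query with sort-based (spill-to-disk) aggregation.
    `run_query(sql, hashagg=...)` is a hypothetical executor hook."""
    try:
        return run_query(sql, hashagg=True)
    except MemoryError:
        # Equivalent to: ALTER SESSION SET `planner.enable_hashagg` = false;
        # then rerunning the query -- slower, but able to spill to disk.
        return run_query(sql, hashagg=False)
```

The point of the sketch is the asymmetry the thread discusses: the fast path costs nothing when memory suffices, and the slow path only pays the rerun cost after an actual out-of-memory failure.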
[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174312#comment-15174312 ]

Edmon Begoli commented on DRILL-3149:
-------------------------------------

+1, for sure. This issue comes up very frequently, and it lowers the usability of Drill for some very common scenarios when working with delimited files that use \r\n. It stops us from querying large data sets because of sparse occurrences of \r\n; we have to pre-process large volumes to remove these terminators.

> TextReader should support multibyte line delimiters
> ---------------------------------------------------
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Jim Scott
> Priority: Minor
> Fix For: Future
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record
> delimiters.
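To illustrate what multibyte delimiter support entails, here is a sketch of record splitting on an arbitrary delimiter such as \r\n, including the case where the delimiter straddles two read buffers; this is illustrative Python, not Drill's TextReader:

```python
def read_records(chunks, delimiter=b"\r\n"):
    """Yield records from an iterable of byte chunks, splitting on a
    delimiter that may span a chunk boundary (e.g. b'\r' at the end of
    one chunk and b'\n' at the start of the next)."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        parts = buf.split(delimiter)
        # The last piece may be an incomplete record (or the first half
        # of a split delimiter); keep it buffered for the next chunk.
        buf = parts.pop()
        yield from parts
    if buf:
        yield buf  # trailing record with no final delimiter
```

The buffered-tail handling is the part a single-byte delimiter never needs, and it is exactly what makes multibyte terminators awkward for a streaming reader.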
[jira] [Commented] (DRILL-4460) Provide feature that allows fall back to sort aggregation
[ https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174283#comment-15174283 ]

John Omernik commented on DRILL-4460:
-------------------------------------

Would external hashing essentially allow the hash to be spilled to disk? I want to be clear about "switching": if there is a way to switch during the query, preserving the time and work already done with one method, that would be great. But I am thinking more crudely here: essentially catching the out-of-memory error, running alter session with hashagg = false, rerunning the query, then turning hash agg back on. It would be slow, to be sure, but from a user's perspective a query that works but takes time is better than one that fails (see prior work: the Apache Hive project).

To better summarize: as an admin with a bunch of users new to Drill, if they run a query with the defaults and it fails, they have to seek out how to fix it, or reach out to me or my team. If instead I had an option that allowed it to succeed, and at the same time showed them how to do it differently, the user experience is better (they learned what was happening) and my experience is better (they don't interrupt my vacation with questions) :)

I am not familiar with external hashing; what option controls that?

> Provide feature that allows fall back to sort aggregation
> ---------------------------------------------------------
>
> Key: DRILL-4460
> URL: https://issues.apache.org/jira/browse/DRILL-4460
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.5.0
> Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a hash (in-memory) model
> for aggregations (set by planner.enable_hashagg = true as the default). This
> works well, but it is memory dependent, and an out-of-memory condition will
> cause a query failure. At that point, a user can run alter session set
> `planner.enable_hashagg` = false and run the query again. If memory is a
> challenge again, the sort-based approach will spill to disk, allowing the
> query to complete (more slowly).
> What I am requesting is a feature, off by default (so Drill's default
> behavior stays the same after this feature is added), that would allow a
> query that tried hash aggregation and failed due to running out of memory to
> restart with sort aggregation. Basically, to let the query succeed, it would
> try hash first, then fall back to sort. This would make for a better user
> experience in that the query would succeed. Perhaps a warning could be shown
> so the user understands that this occurred and can choose a sort-based query
> by default in the future.
[jira] [Commented] (DRILL-4460) Provide feature that allows fall back to sort aggregation
[ https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174268#comment-15174268 ]

Julian Hyde commented on DRILL-4460:
------------------------------------

Falling back to external hashing would be another viable solution to the problem. It's a little more expensive to switch from in-memory hashing to external hashing when you discover that the data set is larger than you expected (hashing uses a different data structure for external data, whereas sorting uses essentially the same data structure).

> Provide feature that allows fall back to sort aggregation
> ---------------------------------------------------------
>
> Key: DRILL-4460
> URL: https://issues.apache.org/jira/browse/DRILL-4460
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.5.0
> Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a hash (in-memory) model
> for aggregations (set by planner.enable_hashagg = true as the default). This
> works well, but it is memory dependent, and an out-of-memory condition will
> cause a query failure. At that point, a user can run alter session set
> `planner.enable_hashagg` = false and run the query again. If memory is a
> challenge again, the sort-based approach will spill to disk, allowing the
> query to complete (more slowly).
> What I am requesting is a feature, off by default (so Drill's default
> behavior stays the same after this feature is added), that would allow a
> query that tried hash aggregation and failed due to running out of memory to
> restart with sort aggregation. Basically, to let the query succeed, it would
> try hash first, then fall back to sort. This would make for a better user
> experience in that the query would succeed. Perhaps a warning could be shown
> so the user understands that this occurred and can choose a sort-based query
> by default in the future.
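The "hybrid" behavior discussed in this thread can be sketched concretely: aggregate into an in-memory table until a budget is exceeded, spill the table to disk as a sorted run, and merge the runs at the end. The following is an illustrative Python sketch of an external group-by-SUM under those assumptions, not Drill code:

```python
import heapq
import itertools
import pickle
import tempfile

def external_sum_aggregate(pairs, max_groups_in_memory=1000):
    """Group-by-key SUM that degrades gracefully: aggregate in an
    in-memory dict, spill it to disk as a sorted run when it exceeds
    its budget, and merge all sorted runs at the end."""
    groups, runs = {}, []

    def spill():
        # Write the current partial aggregates as one sorted run.
        run = tempfile.TemporaryFile()
        pickle.dump(sorted(groups.items()), run)
        run.seek(0)
        runs.append(run)
        groups.clear()

    for key, value in pairs:
        groups[key] = groups.get(key, 0) + value
        if len(groups) > max_groups_in_memory:
            spill()

    if not runs:                    # everything fit in memory: fast path
        return dict(groups)
    if groups:
        spill()                     # flush the final partial table

    # Merge the sorted runs, combining partial sums for equal keys.
    merged = heapq.merge(*(pickle.load(r) for r in runs))
    result = {}
    for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
        result[key] = sum(v for _, v in group)
    return result
```

Note how the in-memory fast path and the spilling path share the same accumulation structure up to the spill, which is the graceful-degradation property the comment attributes to hybrid algorithms.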
[jira] [Updated] (DRILL-3623) Limit 0 should avoid execution when querying a known schema
[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-3623:
--------------------------------
    Labels: doc-impacting (was: )

> Limit 0 should avoid execution when querying a known schema
> -----------------------------------------------------------
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Storage - Hive
> Affects Versions: 1.1.0
> Environment: MapR cluster
> Reporter: Andries Engelbrecht
> Assignee: Sudheesh Katkam
> Labels: doc-impacting
> Fix For: Future
>
> Running select * from hive.table limit 0 does not return (it hangs).
> select * from hive.table limit 1 works fine.
> The Hive table is about 6 GB with 330 files, in Parquet with Snappy
> compression. Data types are int, bigint, string, and double.
> Querying the directory of Parquet files through the DFS plugin works fine:
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;
[jira] [Updated] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-4281:
--------------------------------
    Labels: doc-impacting security (was: security)

> Drill should support inbound impersonation
> ------------------------------------------
>
> Key: DRILL-4281
> URL: https://issues.apache.org/jira/browse/DRILL-4281
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Keys Botzum
> Assignee: Sudheesh Katkam
> Labels: doc-impacting, security
>
> Today Drill supports impersonation *to* external sources. For example, I can
> authenticate to Drill as myself and then Drill will access HDFS using
> impersonation.
> In many scenarios we also need impersonation to Drill. For example, I might
> use some front-end tool (such as Tableau) and authenticate to it as myself.
> That tool (server version) then needs to access Drill to perform queries, and
> I want those queries to run as myself, not as the Tableau user. While in
> theory the intermediate tool could store the userid & password for every user
> of Drill, this isn't a scalable or very secure solution.
> Note that HS2 today does support inbound impersonation, as described here:
> https://issues.apache.org/jira/browse/HIVE-5155
> The above is not the best approach, as it is tied to the connection object,
> which is very coarse-grained and potentially expensive. It would be better if
> there were a call on the ODBC/JDBC driver to switch the identity on an
> existing connection. Most modern SQL databases (Oracle, DB2) support such a
> function.
[jira] [Created] (DRILL-4460) Provide feature that allows fall back to sort aggregation
John Omernik created DRILL-4460:
---
Summary: Provide feature that allows fall back to sort aggregation
Key: DRILL-4460
URL: https://issues.apache.org/jira/browse/DRILL-4460
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Flow
Affects Versions: 1.5.0
Reporter: John Omernik

Currently, the default setting for Drill is to use a hash (in-memory) model for aggregations (set by planner.enable_hashagg = true as the default). This works well, but it is memory dependent, and an out-of-memory condition will cause a query failure. At that point, a user can run alter session set `planner.enable_hashagg` = false and run the query again. If memory is a challenge again, the sort-based approach will spill to disk, allowing the query to complete (more slowly).

What I am requesting is a feature, off by default (so Drill's default behavior stays the same after this feature is added), that would allow a query that tried hash aggregation and failed due to running out of memory to restart with sort aggregation. Basically, to let the query succeed, it would try hash first, then fall back to sort. This would make for a better user experience in that the query would succeed. Perhaps a warning could be shown so the user understands that this occurred and can choose a sort-based query by default in the future.
[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174170#comment-15174170 ]

John Omernik commented on DRILL-3149:
-------------------------------------

The ability to support \r\n (or any multi-character newline) is very important in heterogeneous data environments. +1 on this.

> TextReader should support multibyte line delimiters
> ---------------------------------------------------
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Jim Scott
> Priority: Minor
> Fix For: Future
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record
> delimiters.
[jira] [Updated] (DRILL-4458) JDBC plugin case sensitive table names
[ https://issues.apache.org/jira/browse/DRILL-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-4458:
----------------------------------
    Assignee: Taras Supyk

> JDBC plugin case sensitive table names
> --------------------------------------
>
> Key: DRILL-4458
> URL: https://issues.apache.org/jira/browse/DRILL-4458
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JDBC
> Affects Versions: 1.5.0
> Environment: Drill embedded mode on OSX, connecting to MS SQL Server
> Reporter: Paul Mogren
> Assignee: Taras Supyk
> Priority: Minor
>
> I just tried Drill with MS SQL Server and I found that Drill treats table
> names case-sensitively, contrary to
> https://drill.apache.org/docs/lexical-structure/ which indicates that
> table names are "case-insensitive unless enclosed in double quotation
> marks". This presents a problem for users and existing SQL scripts that
> expect table names to be case-insensitive.
> This works: select * from mysandbox.dbo.AD_Role
> This does not work: select * from mysandbox.dbo.ad_role
> Mailing list reference including stack trace:
> http://mail-archives.apache.org/mod_mbox/drill-user/201603.mbox/%3ccajrw0otv8n5ybmvu6w_efe4npgenrdk5grmh9jtbxu9xnni...@mail.gmail.com%3e
[jira] [Updated] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table
[ https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong updated DRILL-3688: Labels: doc-impacting (was: doc) > Drill should honor "skip.header.line.count" and "skip.footer.line.count" > attributes of Hive table > - > > Key: DRILL-3688 > URL: https://issues.apache.org/jira/browse/DRILL-3688 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.1.0 > Environment: 1.1 >Reporter: Hao Zhu >Assignee: Arina Ielchiieva > Labels: doc-impacting > Fix For: 1.6.0 > > > Currently Drill does not honor the "skip.header.line.count" attribute of Hive > table. > It may cause some other format conversion issue. > Reproduce: > 1. Create a Hive table > {code} > create table h1db.testheader(col0 string) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' > STORED AS TEXTFILE > tblproperties("skip.header.line.count"="1"); > {code} > 2. Prepare a sample data: > {code} > # cat test.data > col0 > 2015-01-01 > {code} > 3. Load sample data into Hive > {code} > LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader; > {code} > 4. Hive > {code} > hive> select * from h1db.testheader ; > OK > 2015-01-01 > Time taken: 0.254 seconds, Fetched: 1 row(s) > {code} > 5. 
Drill > {code} > > select * from hive.h1db.testheader ; > +-+ > |col0 | > +-+ > | col0| > | 2015-01-01 | > +-+ > 2 rows selected (0.257 seconds) > > select cast(col0 as date) from hive.h1db.testheader ; > Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must > be in the range [1,12] > Fragment 0:0 > [Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010] > (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be > in the range [1,12] > org.joda.time.field.FieldUtils.verifyValueBounds():236 > org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613 > org.joda.time.chrono.BasicChronology.getDateTimeMillis():159 > org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120 > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261 > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218 > org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67 > org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():147 > org.apache.drill.exec.physical.impl.BaseRootExec.next():83 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79 > org.apache.drill.exec.physical.impl.BaseRootExec.next():73 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1566 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():255 > org.apache.drill.common.SelfCleaningRunnable.run():38 > 
java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) > {code} > The "skip.footer.line.count" attribute should also be taken into account. > If "skip.header.line.count" or "skip.footer.line.count" has an incorrect value > in Hive, Drill should throw an appropriate exception. > Ex: Hive table property skip.header.line.count value 'someValue' is > non-numeric -- This message was sent by Atlassian JIRA (v6.3.4#6332)
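The requested behavior above (skip N header/footer lines, and reject non-numeric property values with a clear error) can be sketched as follows. This is an illustrative sketch only, not Drill's Hive reader; the function names `read_rows` and `parse_skip_count` are hypothetical.

```python
def read_rows(lines, skip_header=0, skip_footer=0):
    """Return data rows with header/footer lines removed, mimicking
    Hive's skip.header.line.count / skip.footer.line.count."""
    if skip_footer:
        return lines[skip_header:-skip_footer]
    return lines[skip_header:]

def parse_skip_count(props, key):
    """Validate a skip-count table property, raising a clear error
    for non-numeric values such as 'someValue'."""
    raw = props.get(key, "0")
    if not raw.isdigit():
        raise ValueError(
            f"Hive table property {key} value '{raw}' is non-numeric")
    return int(raw)
```

With the sample file from the reproduction (`col0` header line plus one data line) and `skip.header.line.count=1`, only `2015-01-01` survives, so the later `cast(col0 as date)` would no longer see the literal string `col0`.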
[jira] [Updated] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong updated DRILL-3745: Labels: doc-impacting (was: doc) > Hive CHAR not supported > --- > > Key: DRILL-3745 > URL: https://issues.apache.org/jira/browse/DRILL-3745 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Nathaniel Auvil >Assignee: Arina Ielchiieva > Labels: doc-impacting > > It doesn’t look like Drill 1.1.0 supports the Hive CHAR type? > In Hive: > create table development.foo > ( > bad CHAR(10) > ); > And then in sqlline: > > use `hive.development`; > > select * from foo; > Error: PARSE ERROR: Unsupported Hive data type CHAR. > Following Hive data types are supported in Drill INFORMATION_SCHEMA: > BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, > BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION > [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] > (state=,code=0) > This was originally found when getting failures trying to connect via JDBC > using SQuirreL. We have the Hive plugin enabled with tables using CHAR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table
[ https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong updated DRILL-3688: Labels: doc (was: ) > Drill should honor "skip.header.line.count" and "skip.footer.line.count" > attributes of Hive table > - > > Key: DRILL-3688 > URL: https://issues.apache.org/jira/browse/DRILL-3688 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.1.0 > Environment: 1.1 >Reporter: Hao Zhu >Assignee: Arina Ielchiieva > Labels: doc > Fix For: 1.6.0 > > > Currently Drill does not honor the "skip.header.line.count" attribute of Hive > table. > It may cause some other format conversion issue. > Reproduce: > 1. Create a Hive table > {code} > create table h1db.testheader(col0 string) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' > STORED AS TEXTFILE > tblproperties("skip.header.line.count"="1"); > {code} > 2. Prepare a sample data: > {code} > # cat test.data > col0 > 2015-01-01 > {code} > 3. Load sample data into Hive > {code} > LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader; > {code} > 4. Hive > {code} > hive> select * from h1db.testheader ; > OK > 2015-01-01 > Time taken: 0.254 seconds, Fetched: 1 row(s) > {code} > 5. 
Drill > {code} > > select * from hive.h1db.testheader ; > +-+ > |col0 | > +-+ > | col0| > | 2015-01-01 | > +-+ > 2 rows selected (0.257 seconds) > > select cast(col0 as date) from hive.h1db.testheader ; > Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must > be in the range [1,12] > Fragment 0:0 > [Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010] > (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be > in the range [1,12] > org.joda.time.field.FieldUtils.verifyValueBounds():236 > org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613 > org.joda.time.chrono.BasicChronology.getDateTimeMillis():159 > org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120 > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261 > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218 > org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67 > org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():147 > org.apache.drill.exec.physical.impl.BaseRootExec.next():83 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79 > org.apache.drill.exec.physical.impl.BaseRootExec.next():73 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1566 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():255 > org.apache.drill.common.SelfCleaningRunnable.run():38 > 
java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) > {code} > Also "skip.footer.line.count" should be taken into account. > If "skip.header.line.count" or "skip.footer.line.count" has incorrect value > in Hive, throw appropriate exception in Drill. > Ex: Hive table property skip.header.line.count value 'someValue' is > non-numeric -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong updated DRILL-3745: Labels: doc (was: ) > Hive CHAR not supported > --- > > Key: DRILL-3745 > URL: https://issues.apache.org/jira/browse/DRILL-3745 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Nathaniel Auvil >Assignee: Arina Ielchiieva > Labels: doc > > It doesn’t look like Drill 1.1.0 supports the Hive CHAR type? > In Hive: > create table development.foo > ( > bad CHAR(10) > ); > And then in sqlline: > > use `hive.development`; > > select * from foo; > Error: PARSE ERROR: Unsupported Hive data type CHAR. > Following Hive data types are supported in Drill INFORMATION_SCHEMA: > BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, > BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION > [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] > (state=,code=0) > This was originally found when getting failures trying to connect via JDBS > using Squirrel. We have the Hive plugin enabled with tables using CHAR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173942#comment-15173942 ] ASF GitHub Bot commented on DRILL-3745: --- Github user arina-ielchiieva commented on the pull request: https://github.com/apache/drill/pull/399#issuecomment-190782202 > Rename to needOIForDrillType or related to mean do we need to generate ObjectInspector in Drill for Drill type? Agree. Done. > Hive CHAR not supported > --- > > Key: DRILL-3745 > URL: https://issues.apache.org/jira/browse/DRILL-3745 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Nathaniel Auvil >Assignee: Arina Ielchiieva > > It doesn’t look like Drill 1.1.0 supports the Hive CHAR type? > In Hive: > create table development.foo > ( > bad CHAR(10) > ); > And then in sqlline: > > use `hive.development`; > > select * from foo; > Error: PARSE ERROR: Unsupported Hive data type CHAR. > Following Hive data types are supported in Drill INFORMATION_SCHEMA: > BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, > BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION > [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] > (state=,code=0) > This was originally found when getting failures trying to connect via JDBS > using Squirrel. We have the Hive plugin enabled with tables using CHAR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173862#comment-15173862 ] ASF GitHub Bot commented on DRILL-3745: --- Github user vkorukanti commented on the pull request: https://github.com/apache/drill/pull/399#issuecomment-190757236 LGTM, +1. > Hive CHAR not supported > --- > > Key: DRILL-3745 > URL: https://issues.apache.org/jira/browse/DRILL-3745 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Nathaniel Auvil >Assignee: Arina Ielchiieva > > It doesn’t look like Drill 1.1.0 supports the Hive CHAR type? > In Hive: > create table development.foo > ( > bad CHAR(10) > ); > And then in sqlline: > > use `hive.development`; > > select * from foo; > Error: PARSE ERROR: Unsupported Hive data type CHAR. > Following Hive data types are supported in Drill INFORMATION_SCHEMA: > BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, > BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION > [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] > (state=,code=0) > This was originally found when getting failures trying to connect via JDBS > using Squirrel. We have the Hive plugin enabled with tables using CHAR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4459) SchemaChangeException while querying hive json table
Vitalii Diravka created DRILL-4459: -- Summary: SchemaChangeException while querying hive json table Key: DRILL-4459 URL: https://issues.apache.org/jira/browse/DRILL-4459 Project: Apache Drill Issue Type: Bug Components: Functions - Drill, Functions - Hive Affects Versions: 1.4.0 Environment: MapR-Drill 1.4.0 Hive-1.2.0 Reporter: Vitalii Diravka Assignee: Vitalii Diravka Fix For: 1.6.0 Getting a SchemaChangeException while querying JSON documents stored in a Hive table. {noformat} Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castBIT(VAR16CHAR-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. {noformat} Minimum reproduce:
{noformat}
created sample json documents using the attached script (randomdata.sh)
hive> create table simplejson(json string);
hive> load data local inpath '/tmp/simple.json' into table simplejson;

now query it through Drill. Drill version:

select * from sys.version;
+-------------------------------------------+-----------------+----------------------------+--------------+----------------------------+
|                 commit_id                 | commit_message  |        commit_time         | build_email  |         build_time         |
+-------------------------------------------+-----------------+----------------------------+--------------+----------------------------+
| eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d  | DRILL-3901: Don't do early expansion of directory in the non-metadata-cache case because it already happens during ParquetGroupScan's metadata gathering operation.  | 07.10.2015 @ 17:12:57 UTC  | Unknown  | 07.10.2015 @ 17:36:16 UTC  |
+-------------------------------------------+-----------------+----------------------------+--------------+----------------------------+

0: jdbc:drill:zk=> select * from hive.`default`.simplejson where GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.
Errors:
Error in expression at index -1. Error: Missing function implementation: [castBIT(VAR16CHAR-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
Fragment 1:1
[Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] (state=,code=0)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
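The failure above amounts to the planner needing an implicit VAR16CHAR-to-BIT cast for which no implementation is registered. A toy sketch of that lookup-and-fail path (hypothetical names, not Drill's actual cast machinery):

```python
# Hypothetical cast registry: (from_type, to_type) -> conversion function.
CASTS = {
    ("VARCHAR", "VARCHAR"): lambda v: v,
    ("BIT", "BIT"): lambda v: v,
}

def materialize_cast(value, from_type, to_type):
    """Resolve an implicit cast; raise when no rule exists, mirroring
    'Missing function implementation: [castBIT(VAR16CHAR-OPTIONAL)]'."""
    fn = CASTS.get((from_type, to_type))
    if fn is None:
        raise LookupError(
            f"Missing function implementation: [cast{to_type}({from_type}-OPTIONAL)]")
    return fn(value)
```

The fix side (registering the missing cast, or resolving VAR16CHAR as VARCHAR) is what the ticket tracks; the sketch only shows why an unregistered type pair surfaces as this particular error.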
[jira] [Created] (DRILL-4458) JDBC plugin case sensitive table names
Paul Mogren created DRILL-4458: -- Summary: JDBC plugin case sensitive table names Key: DRILL-4458 URL: https://issues.apache.org/jira/browse/DRILL-4458 Project: Apache Drill Issue Type: Bug Components: Storage - JDBC Affects Versions: 1.5.0 Environment: Drill embedded mode on OSX, connecting to MS SQLServer Reporter: Paul Mogren Priority: Minor I just tried Drill with MS SQL Server and I found that Drill treats table names case-sensitively, contrary to https://drill.apache.org/docs/lexical-structure/ which indicates that table names are "case-insensitive unless enclosed in double quotation marks”. This presents a problem for users and existing SQL scripts that expect table names to be case-insensitive. This works: select * from mysandbox.dbo.AD_Role This does not work: select * from mysandbox.dbo.ad_role Mailing list reference including stack trace: http://mail-archives.apache.org/mod_mbox/drill-user/201603.mbox/%3ccajrw0otv8n5ybmvu6w_efe4npgenrdk5grmh9jtbxu9xnni...@mail.gmail.com%3e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
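The documented behavior (unquoted table names are case-insensitive) implies the JDBC plugin should match names against catalog metadata ignoring case. A minimal sketch of such a resolver, with hypothetical names and no claim about how Drill's JDBC storage plugin is actually structured:

```python
def resolve_table(requested, catalog_tables):
    """Resolve a table name case-insensitively against catalog metadata,
    so 'ad_role' finds the SQL Server table 'AD_Role'."""
    matches = [t for t in catalog_tables if t.lower() == requested.lower()]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"Table '{requested}' not found")
    # Two tables differing only in case cannot be disambiguated unquoted.
    raise KeyError(f"Table name '{requested}' is ambiguous: {matches}")
```

Note the ambiguity branch: on case-sensitive backends two distinct tables can differ only in case, which is why quoted identifiers must stay case-sensitive.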
[jira] [Created] (DRILL-4457) Difference in results returned by window function over BIGINT data
Khurram Faraaz created DRILL-4457: - Summary: Difference in results returned by window function over BIGINT data Key: DRILL-4457 URL: https://issues.apache.org/jira/browse/DRILL-4457 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.6.0 Environment: 4 node cluster Reporter: Khurram Faraaz Difference in results returned by a window function query over the same data in Drill vs. Postgres. Drill 1.6.0 commit ID 6d5f4983
{noformat}
Verification Failures:
/root/public_framework/drill-test-framework/framework/resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.q
Query:
SELECT FIRST_VALUE(c3) OVER(PARTITION BY c8 ORDER BY c1 RANGE BETWEEN CURRENT ROW AND CURRENT ROW) FROM `t_alltype.parquet`
Expected number of rows: 145
Actual number of rows from Drill: 145
Number of matching rows: 143
Number of rows missing: 2
Number of rows unexpected: 2
These rows are not expected (first 10):
36022570792
21011901540311080
These rows are missing (first 10):
null (2 time(s))
{noformat}
Here is the difference in results: Drill 1.6.0 returns 36022570792 whereas Postgres returns null, and likewise Drill returns 21011901540311080 whereas Postgres returns null.
{noformat}
[root@centos-01 drill-output]# diff -cb RBCRACR_RBCRACR_bgint_6.output_Tue_Mar_01_10\:36\:42_UTC_2016 ../resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.e
*** RBCRACR_RBCRACR_bgint_6.output_Tue_Mar_01_10:36:42_UTC_2016 2016-03-01 10:36:43.012382649 +
--- ../resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.e 2016-03-01 10:32:56.605677914 +
***************
*** 55,61 ****
  5424751352
  3734160392
  36022570792
! 36022570792
  584831936
  37102817894137256
  61958708627376736
--- 55,61 ----
  5424751352
  3734160392
  36022570792
! null
  584831936
  37102817894137256
  61958708627376736
***************
*** 64,70 ****
  29537626363643852
  52598911986023288
  21011901540311080
! 21011901540311080
  17990322900862228
  61608051272
  3136812789494
--- 64,70 ----
  29537626363643852
  52598911986023288
  21011901540311080
! null
  17990322900862228
  61608051272
  3136812789494
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
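With `RANGE BETWEEN CURRENT ROW AND CURRENT ROW`, the frame is the current row's peer group (rows with an equal ORDER BY value), and `FIRST_VALUE` must return the first peer's value even when that value is NULL, which is the Postgres result above. A toy model of that semantics (illustrative only, with made-up rows; not Drill's implementation):

```python
from itertools import groupby

def first_value_current_row_range(rows, part_key, order_key, value_key):
    """FIRST_VALUE(value) OVER (PARTITION BY part ORDER BY ord
    RANGE BETWEEN CURRENT ROW AND CURRENT ROW): the frame is the
    current row's peer group, so a NULL first-peer value is returned
    as NULL rather than skipped."""
    out = []
    rows = sorted(rows, key=lambda r: (r[part_key], r[order_key]))
    for _, part in groupby(rows, key=lambda r: r[part_key]):
        part = list(part)
        for r in part:
            peers = [p for p in part if p[order_key] == r[order_key]]
            out.append(peers[0][value_key])
    return out
```

If the first peer's c3 is NULL, both peer rows get NULL as their FIRST_VALUE, matching the two `null` rows Postgres emits where Drill emits the non-null peer value.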
[jira] [Created] (DRILL-4456) Hive translate function is not working
Arina Ielchiieva created DRILL-4456: --- Summary: Hive translate function is not working Key: DRILL-4456 URL: https://issues.apache.org/jira/browse/DRILL-4456 Project: Apache Drill Issue Type: Improvement Components: Functions - Hive Affects Versions: 1.5.0 Reporter: Arina Ielchiieva Fix For: Future In Hive "select translate(name, 'A', 'B') from users" works fine. But in Drill "select translate(name, 'A', 'B') from hive.`users`" returns the following error: org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: Encountered "," at line 1, column 22. Was expecting one of: "USING" ... "NOT" ... "IN" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "=" ... ">" ... "<" ... "<=" ... ">=" ... "<>" ... "+" ... "-" ... "*" ... "/" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "MULTISET" ... "[" ... "." ... "(" ... while parsing SQL query: select translate(name, 'A', 'B') from hive.users ^ [Error Id: ba21956b-3285-4544-b3b2-fab68b95be1f on localhost:31010] Root cause: Calcite follows the SQL standard. Per ISO/IEC 9075-2:2011(E), section 6.30: <character transliteration> ::= TRANSLATE ( <character value expression> USING <transliteration name> ) To fix: 1. add support for the translate(expression, from_string, to_string) alternative syntax 2. add a unit test in org.apache.drill.exec.fn.hive.TestInbuiltHiveUDFs Changes can be made directly in Calcite and then upgrade to the appropriate Calcite version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
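The three-argument form the ticket asks for maps each character of from_string to the character at the same position in to_string. For equal-length arguments, its semantics can be sketched with Python's built-in translation tables (an illustration of the expected behavior, not of the Calcite fix; Hive additionally deletes characters when to_string is shorter, which this sketch does not cover):

```python
def hive_translate(expr, from_string, to_string):
    """Hive-style translate(expr, from, to): replace each character of
    from_string with the same-position character of to_string.
    Standard SQL instead spells this TRANSLATE(expr USING map)."""
    return expr.translate(str.maketrans(from_string, to_string))
```

So `translate(name, 'A', 'B')` turns every 'A' in name into 'B', which is exactly what the failing Drill query intends.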
[jira] [Commented] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173426#comment-15173426 ] ASF GitHub Bot commented on DRILL-3745: --- GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/399 DRILL-3745: Hive CHAR not supported 1. Added Hive CHAR support in queries and in UDFs' output parameters. CHAR is trimmed first and then treated as VARCHAR. 2. Unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/arina-ielchiieva/drill DRILL-3745 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/399.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #399 commit 25f0513207dfdff0d993f49e60cf601abff19600 Author: Arina Ielchiieva Date: 2016-02-19T17:03:52Z DRILL-3745: Hive CHAR not supported > Hive CHAR not supported > --- > > Key: DRILL-3745 > URL: https://issues.apache.org/jira/browse/DRILL-3745 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Nathaniel Auvil >Assignee: Arina Ielchiieva > > It doesn’t look like Drill 1.1.0 supports the Hive CHAR type? > In Hive: > create table development.foo > ( > bad CHAR(10) > ); > And then in sqlline: > > use `hive.development`; > > select * from foo; > Error: PARSE ERROR: Unsupported Hive data type CHAR. > Following Hive data types are supported in Drill INFORMATION_SCHEMA: > BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, > BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION > [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] > (state=,code=0) > This was originally found when getting failures trying to connect via JDBC > using SQuirreL. We have the Hive plugin enabled with tables using CHAR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
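The PR's stated approach, trim the CHAR value first and then treat it as VARCHAR, can be sketched like this. The function name is hypothetical and this is only a model of the conversion, not the Java code in pull request #399:

```python
def char_to_varchar(value, length):
    """Model of 'CHAR is trimmed first and then treated as varchar':
    CHAR(n) values are blank-padded to n, so strip the trailing
    pad spaces before handing the value on as a plain VARCHAR."""
    padded = value.ljust(length)   # reproduce CHAR(n) blank padding
    return padded.rstrip(" ")
```

For the `bad CHAR(10)` column in the reproduction, a stored value of "abc" (padded to "abc       ") comes back as the VARCHAR "abc".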