[jira] [Updated] (DRILL-4462) Slow JOIN Query On Remote MongoDB

2016-03-01 Thread Rifat Mahmud (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rifat Mahmud updated DRILL-4462:

Attachment: fragmentprof.PNG

Fragment profile.

> Slow JOIN Query On Remote MongoDB
> -
>
> Key: DRILL-4462
> URL: https://issues.apache.org/jira/browse/DRILL-4462
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI, Storage - MongoDB
>Affects Versions: 1.5.0
>Reporter: Rifat Mahmud
> Attachments: fragmentprof.PNG
>
>
> Regardless of the number of collections in the MongoDB database, a simple join 
> query such as select * from t1, t2 where t1.a=t2.b takes around 27 
> seconds from drill-embedded running on a single machine.
> Here are the profiles:
> https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
> https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4462) Slow JOIN Query On Remote MongoDB

2016-03-01 Thread Rifat Mahmud (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rifat Mahmud updated DRILL-4462:

Description: 
Regardless of the number of collections in the MongoDB database, a simple join 
query such as select * from t1, t2 where t1.a=t2.b takes around 27 
seconds from drill-embedded running on a single machine.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0

A screenshot of the fragment profile has been attached.

  was:
Regardless of the number of collections in the MongoDB database, a simple join 
query such as select * from t1, t2 where t1.a=t2.b takes around 27 
seconds from drill-embedded running on a single machine.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0


> Slow JOIN Query On Remote MongoDB
> -
>
> Key: DRILL-4462
> URL: https://issues.apache.org/jira/browse/DRILL-4462
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI, Storage - MongoDB
>Affects Versions: 1.5.0
>Reporter: Rifat Mahmud
> Attachments: fragmentprof.PNG
>
>
> Regardless of the number of collections in the MongoDB database, a simple join 
> query such as select * from t1, t2 where t1.a=t2.b takes around 27 
> seconds from drill-embedded running on a single machine.
> Here are the profiles:
> https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
> https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0
> A screenshot of the fragment profile has been attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4462) Slow JOIN Query On Remote MongoDB

2016-03-01 Thread Rifat Mahmud (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rifat Mahmud updated DRILL-4462:

Description: 
Regardless of the number of collections in the MongoDB database, a simple join 
query such as select * from t1, t2 where t1.a=t2.b takes around 27 
seconds from drill-embedded running on a single machine.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0

  was:
Regardless of the number of collections in the MongoDB database, a simple join 
query such as select * from t1, t2 where t1.a=t2.b takes around 27 
seconds.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0


> Slow JOIN Query On Remote MongoDB
> -
>
> Key: DRILL-4462
> URL: https://issues.apache.org/jira/browse/DRILL-4462
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI, Storage - MongoDB
>Affects Versions: 1.5.0
>Reporter: Rifat Mahmud
>
> Regardless of the number of collections in the MongoDB database, a simple join 
> query such as select * from t1, t2 where t1.a=t2.b takes around 27 
> seconds from drill-embedded running on a single machine.
> Here are the profiles:
> https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
> https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4462) Slow JOIN Query On Remote MongoDB

2016-03-01 Thread Rifat Mahmud (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rifat Mahmud updated DRILL-4462:

Flags: Important

> Slow JOIN Query On Remote MongoDB
> -
>
> Key: DRILL-4462
> URL: https://issues.apache.org/jira/browse/DRILL-4462
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI, Storage - MongoDB
>Affects Versions: 1.5.0
>Reporter: Rifat Mahmud
>
> Regardless of the number of collections in the MongoDB database, a simple join 
> query such as select * from t1, t2 where t1.a=t2.b takes around 27 
> seconds.
> Here are the profiles:
> https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
> https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4462) Slow JOIN Query On Remote MongoDB

2016-03-01 Thread Rifat Mahmud (JIRA)
Rifat Mahmud created DRILL-4462:
---

 Summary: Slow JOIN Query On Remote MongoDB
 Key: DRILL-4462
 URL: https://issues.apache.org/jira/browse/DRILL-4462
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI, Storage - MongoDB
Affects Versions: 1.5.0
Reporter: Rifat Mahmud


Regardless of the number of collections in the MongoDB database, a simple join 
query such as select * from t1, t2 where t1.a=t2.b takes around 27 
seconds.
Here are the profiles:
https://drive.google.com/open?id=0B-J_8-KYz50mZ1NjSzlUUjR3Q0U
https://drive.google.com/open?id=0B-J_8-KYz50mcTFpRmxKOWdfak0
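
For reference, a minimal sketch of the kind of join that exhibits the slowdown, 
assuming a mongo storage plugin named `mongo` and a database `testdb`; the 
collection and column names t1, t2, a and b come from the query above, everything 
else is illustrative:
{code}
-- hypothetical plugin/database qualification; the ticket only shows t1, t2, a and b
SELECT *
FROM   mongo.testdb.`t1` t1
JOIN   mongo.testdb.`t2` t2
  ON   t1.a = t2.b;
{code}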



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation

2016-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174839#comment-15174839
 ] 

ASF GitHub Bot commented on DRILL-4281:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/400#issuecomment-191014420
  
I think it would be better to have a single delegation setting that is json 
that describes the entirety of what we need to express. I think that will be 
much clearer than a set of multiple separate properties. 


> Drill should support inbound impersonation
> --
>
> Key: DRILL-4281
> URL: https://issues.apache.org/jira/browse/DRILL-4281
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting, security
>
> Today Drill supports impersonation *to* external sources. For example, I can 
> authenticate to Drill as myself and then Drill will access HDFS using 
> impersonation.
> In many scenarios we also need impersonation to Drill. For example, I might 
> use some front end tool (such as Tableau) and authenticate to it as myself. 
> That tool (server version) then needs to access Drill to perform queries, and 
> I want those queries to run as myself, not as the Tableau user. While in 
> theory the intermediate tool could store the userid & password for every user 
> of Drill, this isn't a scalable or very secure solution.
> Note that HS2 today does support inbound impersonation, as described here:  
> https://issues.apache.org/jira/browse/HIVE-5155 
> The above is not the best approach as it is tied to the connection object, 
> which is very coarse-grained and potentially expensive. It would be better if 
> there were a call on the ODBC/JDBC driver to switch the identity on an existing 
> connection. Most modern SQL databases (Oracle, DB2) support such a function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation

2016-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174702#comment-15174702
 ] 

ASF GitHub Bot commented on DRILL-4281:
---

GitHub user sudheeshkatkam opened a pull request:

https://github.com/apache/drill/pull/400

DRILL-4281: Support authorized users to delegate for other users

+ Need to make changes to [sqlline](https://github.com/mapr/sqlline) to 
pass down _delegator_ connection property.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudheeshkatkam/drill DRILL-4281

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/400.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #400


commit b27d7421e73b504773b24c205f06aa0f98bd0bb6
Author: Sudheesh Katkam 
Date:   2016-03-02T00:23:57Z

DRILL-4281: Support authorized users to delegate for other users




> Drill should support inbound impersonation
> --
>
> Key: DRILL-4281
> URL: https://issues.apache.org/jira/browse/DRILL-4281
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting, security
>
> Today Drill supports impersonation *to* external sources. For example, I can 
> authenticate to Drill as myself and then Drill will access HDFS using 
> impersonation.
> In many scenarios we also need impersonation to Drill. For example, I might 
> use some front end tool (such as Tableau) and authenticate to it as myself. 
> That tool (server version) then needs to access Drill to perform queries, and 
> I want those queries to run as myself, not as the Tableau user. While in 
> theory the intermediate tool could store the userid & password for every user 
> of Drill, this isn't a scalable or very secure solution.
> Note that HS2 today does support inbound impersonation, as described here:  
> https://issues.apache.org/jira/browse/HIVE-5155 
> The above is not the best approach as it is tied to the connection object, 
> which is very coarse-grained and potentially expensive. It would be better if 
> there were a call on the ODBC/JDBC driver to switch the identity on an existing 
> connection. Most modern SQL databases (Oracle, DB2) support such a function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4461) Drill Custom Authentication Startup Exception

2016-03-01 Thread Bridget Bevens (JIRA)
Bridget Bevens created DRILL-4461:
-

 Summary: Drill Custom Authentication Startup Exception
 Key: DRILL-4461
 URL: https://issues.apache.org/jira/browse/DRILL-4461
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Bridget Bevens
Assignee: Bridget Bevens


On page: https://drill.apache.org/docs/configuring-user-authentication/

There is one step that is missing from the doc.

For the custom classpath scanner to find the new class, put the following
config code in a file named drill-module.conf at the root of the jar file
containing the custom authentication class.

drill {
  classpath.scanning {
    packages += "myorg.drill.security"
  }
}

Could you please create a JIRA for changing the documentation?

Thanks
Venki




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2016-03-01 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4347:

Fix Version/s: 1.6.0

> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . > i_item_sk  item_sk,
> . . . . . . . . . . . . > s_store_name   store_name,
> . . . . . . . . . . . . > s_zip  store_zip,
> . . . . . . . . . . . . > ad1.ca_street_number   
> b_street_number,
> . . . . . . . . . . . . > ad1.ca_street_name 
> b_streen_name,
> . . . . . . . . . . . . > ad1.ca_cityb_city,
> . . . . . . . . . . . . > ad1.ca_zip b_zip,
> . . . . . . . . . . . . > ad2.ca_street_number   
> c_street_number,
> . . . . . . . . . . . . > ad2.ca_street_name 
> c_street_name,
> . . . . . . . . . . . . > ad2.ca_cityc_city,
> . . . . . . . . . . . . > ad2.ca_zip c_zip,
> . . . . . . . . . . . . > d1.d_year  AS syear,
> . . . . . . . . . . . . > d2.d_year  AS fsyear,
> . . . . . . . . . . . . > d3.d_year  s2year,
> . . . . . . . . . . . . > Count(*)   cnt,
> . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . > Sum(ss_list_price) s2,
> . . . . . . . . . . . . > Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >  FROM   store_sales,
> . . . . . . . . . . . . > store_returns,
> . . . . . . . . . . . . > cs_ui,
> . . . . . . . . . . . . > date_dim d1,
> . . . . . . . . . . . . > date_dim d2,
> . . . . . . . . . . . . > date_dim d3,
> . . . . . . . . . . . . > store,
> . . . . . . . . . . . . > customer,
> . . . . . . . . . . . . > customer_demographics cd1,
> . . . . . . . . . . . . > customer_demographics cd2,
> . . . . . . . . . . . . > promotion,
> . . . . . . . . . . . . > household_demographics hd1,
> . . . . . . . . . . . . > household_demographics hd2,
> . . . . . . . . . . . . > customer_address ad1,
> . . . . . . . . . . . . > customer_address ad2,
> . . . . . . . . . . . . > income_band ib1,
> . . . . . . . . . . . . > income_band ib2,
> . . . . . . . . . . . . > item
> . . . . . . . . . . . . >  WHERE  ss_store_sk = s_store_sk
> . . . . . . . . . . . . > AND ss_sold_date_sk = d1.d_date_sk
> . . . . . . . . . . . . > AND ss_customer_sk = c_customer_sk
> . . . . . . . . . . . . > AND ss_cdemo_sk = cd1.cd_demo_sk
> . . . . . . . . . . . . > AND ss_hdemo_sk = hd1.hd_demo_sk
> . . . . . . . . . . . . > AND ss_addr_sk = ad1.ca_address_sk
> . . . . . . . . . . . . > AND ss_item_sk = i_item_

[jira] [Commented] (DRILL-4460) Provide feature that allows fall back to sort aggregation

2016-03-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174345#comment-15174345
 ] 

Julian Hyde commented on DRILL-4460:


An "external" algorithm is one that uses disk to complete if there is not 
enough memory. An "adaptive" algorithm is one that can start using memory and 
switch to external in the same run, without losing data. A "hybrid" algorithm 
is one that puts as much data as possible in memory and puts the rest in 
external, and therefore tends to gracefully degrade as input increases.

I wanted to point out that there are adaptive, external algorithms based on 
sort as well as hash. This paper describes adaptive hybrid hash join but 
adaptive hybrid hash aggregation is similar (and in fact simpler). 
http://www.vldb.org/conf/1990/P186.PDF

To be clear, external hashing is not currently implemented in Drill.

> Provide feature that allows fall back to sort aggregation
> -
>
> Key: DRILL-4460
> URL: https://issues.apache.org/jira/browse/DRILL-4460
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.5.0
>Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a Hash (in Memory) model 
> for aggregations (set by planner.enable_hashagg = true as default).  This 
> works well, but it's memory dependent and an out of memory condition will 
> cause a query failure.  At this point, a user can alter session set 
> `planner.enable_hashagg` = false and run the query again. If memory is a 
> challenge again, the sort based approach will spill to disk allowing the 
> query to complete (slower).
> What I am requesting is a feature, off by default (so Drill's default 
> behavior will be the same after this feature is added), that would allow a 
> query that tried hash aggregation and failed due to out of memory to be 
> restarted with sort aggregation.  Basically, to let the query succeed, it 
> would try hash first, then fall back to sort.  This would make for a better 
> user experience in that the query would succeed. Perhaps a warning could be 
> surfaced to the user so they understand that this occurred and can switch to 
> sort-based aggregation by default in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters

2016-03-01 Thread Edmon Begoli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174312#comment-15174312
 ] 

Edmon Begoli commented on DRILL-3149:
-

+1, for sure. This issue comes up very frequently and it lowers the usability 
of Drill for some very common scenarios when working with delimited files that 
use \r\n.
It stops us from querying large data sets because of sparse occurrences of 
\r\n; we have to pre-process large volumes to remove these terminators.

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Priority: Minor
> Fix For: Future
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4460) Provide feature that allows fall back to sort aggregation

2016-03-01 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174283#comment-15174283
 ] 

John Omernik commented on DRILL-4460:
-

Would external hashing essentially allow the hash to be spilled to disk?  I 
want to be clear when it comes to "switching": if there is a way to switch 
"during the query" and preserve the work already done with one method, that 
would be great. But I am thinking more crudely here: essentially catch the out 
of memory error, alter session to set hashagg = false, re-run the query, then 
turn hash agg back on. It would be slow, to be sure, but from a user 
perspective a query that works but takes time is better than one that fails 
(see previous work: the Apache Hive project).

To better summarize: as an admin with a bunch of users who are new to Drill, if 
they try to run a query with the defaults and it fails, they have to seek out 
how to fix it or reach out to me or my team. If instead I had an option that 
allowed the query to succeed, and at the same time showed them how to do it 
differently, the user experience is better (they learned what was happening) 
and my experience is better (they don't interrupt my vacation with questions) 
:) 

I am not familiar with external hashing; what option controls that? 

> Provide feature that allows fall back to sort aggregation
> -
>
> Key: DRILL-4460
> URL: https://issues.apache.org/jira/browse/DRILL-4460
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.5.0
>Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a Hash (in Memory) model 
> for aggregations (set by planner.enable_hashagg = true as default).  This 
> works well, but it's memory dependent and an out of memory condition will 
> cause a query failure.  At this point, a user can alter session set 
> `planner.enable_hashagg` = false and run the query again. If memory is a 
> challenge again, the sort based approach will spill to disk allowing the 
> query to complete (slower).
> What I am requesting is a feature, off by default (so Drill's default 
> behavior will be the same after this feature is added), that would allow a 
> query that tried hash aggregation and failed due to out of memory to be 
> restarted with sort aggregation.  Basically, to let the query succeed, it 
> would try hash first, then fall back to sort.  This would make for a better 
> user experience in that the query would succeed. Perhaps a warning could be 
> surfaced to the user so they understand that this occurred and can switch to 
> sort-based aggregation by default in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4460) Provide feature that allows fall back to sort aggregation

2016-03-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174268#comment-15174268
 ] 

Julian Hyde commented on DRILL-4460:


Falling back to external hashing would be another viable solution to the 
problem. It's a little more expensive to switch from memory hashing to external 
hashing when you discover that the data set is larger than you expected 
(hashing uses a different data structure for external data, whereas sorting 
uses essentially the same data structure).

> Provide feature that allows fall back to sort aggregation
> -
>
> Key: DRILL-4460
> URL: https://issues.apache.org/jira/browse/DRILL-4460
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.5.0
>Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a Hash (in Memory) model 
> for aggregations (set by planner.enable_hashagg = true as default).  This 
> works well, but it's memory dependent and an out of memory condition will 
> cause a query failure.  At this point, a user can alter session set 
> `planner.enable_hashagg` = false and run the query again. If memory is a 
> challenge again, the sort based approach will spill to disk allowing the 
> query to complete (slower).
> What I am requesting is a feature, off by default (so Drill's default 
> behavior will be the same after this feature is added), that would allow a 
> query that tried hash aggregation and failed due to out of memory to be 
> restarted with sort aggregation.  Basically, to let the query succeed, it 
> would try hash first, then fall back to sort.  This would make for a better 
> user experience in that the query would succeed. Perhaps a warning could be 
> surfaced to the user so they understand that this occurred and can switch to 
> sort-based aggregation by default in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-01 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-3623:

Labels: doc-impacting  (was: )

> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine.
> The Hive table is about 6GB, with 330 parquet files using snappy compression.
> Data types are int, bigint, string and double.
> Querying the directory of parquet files through the DFS plugin works fine:
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4281) Drill should support inbound impersonation

2016-03-01 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4281:

Labels: doc-impacting security  (was: security)

> Drill should support inbound impersonation
> --
>
> Key: DRILL-4281
> URL: https://issues.apache.org/jira/browse/DRILL-4281
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting, security
>
> Today Drill supports impersonation *to* external sources. For example, I can 
> authenticate to Drill as myself and then Drill will access HDFS using 
> impersonation.
> In many scenarios we also need impersonation to Drill. For example, I might 
> use some front end tool (such as Tableau) and authenticate to it as myself. 
> That tool (server version) then needs to access Drill to perform queries, and 
> I want those queries to run as myself, not as the Tableau user. While in 
> theory the intermediate tool could store the userid & password for every user 
> of Drill, this isn't a scalable or very secure solution.
> Note that HS2 today does support inbound impersonation, as described here:  
> https://issues.apache.org/jira/browse/HIVE-5155 
> The above is not the best approach as it is tied to the connection object, 
> which is very coarse-grained and potentially expensive. It would be better if 
> there were a call on the ODBC/JDBC driver to switch the identity on an existing 
> connection. Most modern SQL databases (Oracle, DB2) support such a function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4460) Provide feature that allows fall back to sort aggregation

2016-03-01 Thread John Omernik (JIRA)
John Omernik created DRILL-4460:
---

 Summary: Provide feature that allows fall back to sort aggregation
 Key: DRILL-4460
 URL: https://issues.apache.org/jira/browse/DRILL-4460
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Affects Versions: 1.5.0
Reporter: John Omernik


Currently, the default setting for Drill is to use a Hash (in Memory) model for 
aggregations (set by planner.enable_hashagg = true as default).  This works 
well, but it's memory dependent and an out of memory condition will cause a 
query failure.  At this point, a user can alter session set 
`planner.enable_hashagg` = false and run the query again. If memory is a 
challenge again, the sort based approach will spill to disk allowing the query 
to complete (slower).

What I am requesting is a feature, off by default (so Drill's default behavior 
will be the same after this feature is added), that would allow a query that 
tried hash aggregation and failed due to out of memory to be restarted with 
sort aggregation.  Basically, to let the query succeed, it would try hash 
first, then fall back to sort.  This would make for a better user experience 
in that the query would succeed. Perhaps a warning could be surfaced to the 
user so they understand that this occurred and can switch to sort-based 
aggregation by default in the future. 
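
For illustration, a minimal sketch of the manual fallback described above, using 
the existing session option; the table and column names (dfs.tmp.`large_table`, 
col1) are hypothetical:
{code}
-- hash aggregation is the default (planner.enable_hashagg = true)
SELECT col1, COUNT(*) FROM dfs.tmp.`large_table` GROUP BY col1;

-- if that fails with an out-of-memory error, fall back to the sort-based
-- (spill-to-disk) aggregation and re-run the same query
ALTER SESSION SET `planner.enable_hashagg` = false;
SELECT col1, COUNT(*) FROM dfs.tmp.`large_table` GROUP BY col1;

-- restore the default afterwards
ALTER SESSION SET `planner.enable_hashagg` = true;
{code}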



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters

2016-03-01 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174170#comment-15174170
 ] 

John Omernik commented on DRILL-3149:
-

The ability to support \r\n (or, more generally, multi-character line 
delimiters) is very important in heterogeneous data environments. +1 on this. 

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Priority: Minor
> Fix For: Future
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4458) JDBC plugin case sensitive table names

2016-03-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4458:
--
Assignee: Taras Supyk

> JDBC plugin case sensitive table names
> --
>
> Key: DRILL-4458
> URL: https://issues.apache.org/jira/browse/DRILL-4458
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
> Environment: Drill embedded mode on OSX, connecting to MS SQLServer
>Reporter: Paul Mogren
>Assignee: Taras Supyk
>Priority: Minor
>
> I just tried Drill with MS SQL Server and I found that Drill treats table
> names case-sensitively, contrary to
> https://drill.apache.org/docs/lexical-structure/ which indicates that
> table names are "case-insensitive unless enclosed in double quotation
> marks". This presents a problem for users and existing SQL scripts that
> expect table names to be case-insensitive.
> This works: select * from mysandbox.dbo.AD_Role
> This does not work: select * from mysandbox.dbo.ad_role
> Mailing list reference including stack trace: 
> http://mail-archives.apache.org/mod_mbox/drill-user/201603.mbox/%3ccajrw0otv8n5ybmvu6w_efe4npgenrdk5grmh9jtbxu9xnni...@mail.gmail.com%3e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table

2016-03-01 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-3688:

Labels: doc-impacting  (was: doc)

> Drill should honor "skip.header.line.count" and "skip.footer.line.count" 
> attributes of Hive table
> -
>
> Key: DRILL-3688
> URL: https://issues.apache.org/jira/browse/DRILL-3688
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: 1.1
>Reporter: Hao Zhu
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.6.0
>
>
> Currently Drill does not honor the "skip.header.line.count" attribute of a Hive 
> table.
> This may also cause format conversion issues.
> Reproduce:
> 1. Create a Hive table
> {code}
> create table h1db.testheader(col0 string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE
> tblproperties("skip.header.line.count"="1");
> {code}
> 2. Prepare a sample data:
> {code}
> # cat test.data
> col0
> 2015-01-01
> {code}
> 3. Load sample data into Hive
> {code}
> LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader;
> {code}
> 4. Hive
> {code}
> hive> select * from h1db.testheader ;
> OK
> 2015-01-01
> Time taken: 0.254 seconds, Fetched: 1 row(s)
> {code}
> 5. Drill
> {code}
> >  select * from hive.h1db.testheader ;
> +-+
> |col0 |
> +-+
> | col0|
> | 2015-01-01  |
> +-+
> 2 rows selected (0.257 seconds)
> > select cast(col0 as date) from hive.h1db.testheader ;
> Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must 
> be in the range [1,12]
> Fragment 0:0
> [Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010]
>   (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be 
> in the range [1,12]
> org.joda.time.field.FieldUtils.verifyValueBounds():236
> org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613
> org.joda.time.chrono.BasicChronology.getDateTimeMillis():159
> org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218
> org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67
> org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():147
> org.apache.drill.exec.physical.impl.BaseRootExec.next():83
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79
> org.apache.drill.exec.physical.impl.BaseRootExec.next():73
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1566
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():255
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> {code}
> Also "skip.footer.line.count" should be taken into account.
> If "skip.header.line.count" or "skip.footer.line.count" has an incorrect value 
> in Hive, Drill should throw an appropriate exception, e.g.:
> Hive table property skip.header.line.count value 'someValue' is 
> non-numeric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3745) Hive CHAR not supported

2016-03-01 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-3745:

Labels: doc-impacting  (was: doc)

> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using SQuirreL.  We have the Hive plugin enabled with tables using CHAR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table

2016-03-01 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-3688:

Labels: doc  (was: )

> Drill should honor "skip.header.line.count" and "skip.footer.line.count" 
> attributes of Hive table
> -
>
> Key: DRILL-3688
> URL: https://issues.apache.org/jira/browse/DRILL-3688
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: 1.1
>Reporter: Hao Zhu
>Assignee: Arina Ielchiieva
>  Labels: doc
> Fix For: 1.6.0
>
>
> Currently Drill does not honor the "skip.header.line.count" attribute of a Hive 
> table.
> This may also cause format conversion issues.
> Reproduce:
> 1. Create a Hive table
> {code}
> create table h1db.testheader(col0 string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE
> tblproperties("skip.header.line.count"="1");
> {code}
> 2. Prepare a sample data:
> {code}
> # cat test.data
> col0
> 2015-01-01
> {code}
> 3. Load sample data into Hive
> {code}
> LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader;
> {code}
> 4. Hive
> {code}
> hive> select * from h1db.testheader ;
> OK
> 2015-01-01
> Time taken: 0.254 seconds, Fetched: 1 row(s)
> {code}
> 5. Drill
> {code}
> >  select * from hive.h1db.testheader ;
> +-+
> |col0 |
> +-+
> | col0|
> | 2015-01-01  |
> +-+
> 2 rows selected (0.257 seconds)
> > select cast(col0 as date) from hive.h1db.testheader ;
> Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must 
> be in the range [1,12]
> Fragment 0:0
> [Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010]
>   (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be 
> in the range [1,12]
> org.joda.time.field.FieldUtils.verifyValueBounds():236
> org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613
> org.joda.time.chrono.BasicChronology.getDateTimeMillis():159
> org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218
> org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67
> org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():147
> org.apache.drill.exec.physical.impl.BaseRootExec.next():83
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79
> org.apache.drill.exec.physical.impl.BaseRootExec.next():73
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1566
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():255
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> {code}
> Also "skip.footer.line.count" should be taken into account.
> If "skip.header.line.count" or "skip.footer.line.count" has an incorrect value 
> in Hive, Drill should throw an appropriate exception, e.g.:
> Hive table property skip.header.line.count value 'someValue' is 
> non-numeric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3745) Hive CHAR not supported

2016-03-01 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-3745:

Labels: doc  (was: )

> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>  Labels: doc
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using SQuirreL.  We have the Hive plugin enabled with tables using CHAR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3745) Hive CHAR not supported

2016-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173942#comment-15173942
 ] 

ASF GitHub Bot commented on DRILL-3745:
---

Github user arina-ielchiieva commented on the pull request:

https://github.com/apache/drill/pull/399#issuecomment-190782202
  
> Rename to needOIForDrillType or related to mean do we need to generate 
ObjectInspector in Drill for Drill type?

Agree. Done.



> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using SQuirreL.  We have the Hive plugin enabled with tables using CHAR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3745) Hive CHAR not supported

2016-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173862#comment-15173862
 ] 

ASF GitHub Bot commented on DRILL-3745:
---

Github user vkorukanti commented on the pull request:

https://github.com/apache/drill/pull/399#issuecomment-190757236
  
LGTM, +1.


> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using SQuirreL.  We have the Hive plugin enabled with tables using CHAR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4459) SchemaChangeException while querying hive json table

2016-03-01 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4459:
--

 Summary: SchemaChangeException while querying hive json table
 Key: DRILL-4459
 URL: https://issues.apache.org/jira/browse/DRILL-4459
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill, Functions - Hive
Affects Versions: 1.4.0
 Environment: MapR-Drill 1.4.0
Hive-1.2.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.6.0


Getting a SchemaChangeException while querying JSON documents stored in a Hive 
table.
{noformat}
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index -1.  Error: Missing function implementation: 
[castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
{noformat}
Minimal reproduction:
{noformat}
Create sample JSON documents using the attached script (randomdata.sh).
hive>create table simplejson(json string);
hive>load data local inpath '/tmp/simple.json' into table simplejson;
Now query it through Drill.
Drill Version
select * from sys.version;
commit_id:      eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d
commit_message: DRILL-3901: Don't do early expansion of directory in the
                non-metadata-cache case because it already happens during
                ParquetGroupScan's metadata gathering operation.
commit_time:    07.10.2015 @ 17:12:57 UTC
build_email:    Unknown
build_time:     07.10.2015 @ 17:36:16 UTC

0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index -1.  Error: Missing function implementation: 
[castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 1:1

[Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
(state=,code=0)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4458) JDBC plugin case sensitive table names

2016-03-01 Thread Paul Mogren (JIRA)
Paul Mogren created DRILL-4458:
--

 Summary: JDBC plugin case sensitive table names
 Key: DRILL-4458
 URL: https://issues.apache.org/jira/browse/DRILL-4458
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JDBC
Affects Versions: 1.5.0
 Environment: Drill embedded mode on OSX, connecting to MS SQLServer
Reporter: Paul Mogren
Priority: Minor


I just tried Drill with MS SQL Server and I found that Drill treats table
names case-sensitively, contrary to
https://drill.apache.org/docs/lexical-structure/ which indicates that
table names are "case-insensitive unless enclosed in double quotation
marks". This presents a problem for users and existing SQL scripts that
expect table names to be case-insensitive.

This works: select * from mysandbox.dbo.AD_Role
This does not work: select * from mysandbox.dbo.ad_role

Mailing list reference including stack trace: 
http://mail-archives.apache.org/mod_mbox/drill-user/201603.mbox/%3ccajrw0otv8n5ybmvu6w_efe4npgenrdk5grmh9jtbxu9xnni...@mail.gmail.com%3e




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4457) Difference in results returned by window function over BIGINT data

2016-03-01 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4457:
-

 Summary: Difference in results returned by window function over 
BIGINT data
 Key: DRILL-4457
 URL: https://issues.apache.org/jira/browse/DRILL-4457
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.6.0
 Environment: 4 node cluster
Reporter: Khurram Faraaz


Difference in results returned by window function query over same data on Drill 
vs on Postgres.
Drill 1.6.0 commit ID 6d5f4983

{noformat}
Verification Failures:
/root/public_framework/drill-test-framework/framework/resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.q
Query:
SELECT FIRST_VALUE(c3) OVER(PARTITION BY c8 ORDER BY c1 RANGE BETWEEN CURRENT 
ROW AND CURRENT ROW) FROM `t_alltype.parquet`
 Expected number of rows: 145
Actual number of rows from Drill: 145
 Number of matching rows: 143
  Number of rows missing: 2
   Number of rows unexpected: 2

These rows are not expected (first 10):
36022570792
21011901540311080

These rows are missing (first 10):
null (2 time(s))
{noformat}

Here is the difference in results: Drill 1.6.0 returns 36022570792 where 
Postgres returns null, and Drill returns 21011901540311080 where Postgres 
returns null.

{noformat}
[root@centos-01 drill-output]# diff -cb 
RBCRACR_RBCRACR_bgint_6.output_Tue_Mar_01_10\:36\:42_UTC_2016 
../resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.e
*** RBCRACR_RBCRACR_bgint_6.output_Tue_Mar_01_10:36:42_UTC_2016 2016-03-01 
10:36:43.012382649 +
--- 
../resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.e  
2016-03-01 10:32:56.605677914 +
***
*** 55,61 
  5424751352
  3734160392
  36022570792
! 36022570792
  584831936
  37102817894137256
  61958708627376736
--- 55,61 
  5424751352
  3734160392
  36022570792
! null
  584831936
  37102817894137256
  61958708627376736
***
*** 64,70 
  29537626363643852
  52598911986023288
  21011901540311080
! 21011901540311080
  17990322900862228
  61608051272
  3136812789494
--- 64,70 
  29537626363643852
  52598911986023288
  21011901540311080
! null
  17990322900862228
  61608051272
  3136812789494
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4456) Hive translate function is not working

2016-03-01 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-4456:
---

 Summary: Hive translate function is not working
 Key: DRILL-4456
 URL: https://issues.apache.org/jira/browse/DRILL-4456
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Hive
Affects Versions: 1.5.0
Reporter: Arina Ielchiieva
 Fix For: Future


In Hive "select translate(name, 'A', 'B') from users" works fine.
But in Drill "select translate(name, 'A', 'B') from hive.`users`" returns the 
following error:

org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
Encountered "," at line 1, column 22. Was expecting one of: "USING" ... "NOT" 
... "IN" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "=" ... ">" ... "<" ... 
"<=" ... ">=" ... "<>" ... "+" ... "-" ... "*" ... "/" ... "||" ... "AND" ... 
"OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "MULTISET" ... "[" ... "." ... 
"(" ... while parsing SQL query: select translate(name, 'A', 'B') from 
hive.users ^ [Error Id: ba21956b-3285-4544-b3b2-fab68b95be1f on localhost:31010]

Root cause:
Calcite follows the standard SQL reference.
SQL reference,  ISO/IEC 9075-2:2011(E), section 6.30

<character transliteration> ::=
  TRANSLATE <left paren> <character value expression>
  USING <transliteration name> <right paren>

To fix:
1. add support for the translate(expression, from_string, to_string) alternative 
syntax
2. add unit test in org.apache.drill.exec.fn.hive.TestInbuiltHiveUDFs

Changes can be made directly in Calcite, followed by an upgrade to the 
appropriate Calcite version. 
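
For illustration, the two syntaxes side by side; hive.`users` and the name column 
are taken from the description above, while 'my_translit' is a hypothetical 
transliteration name used only to show the grammar difference:
{code}
-- Hive-style three-argument form, which currently fails to parse in Drill
select translate(name, 'A', 'B') from hive.`users`;

-- form defined by the SQL standard grammar quoted above (TRANSLATE ... USING)
select translate(name using my_translit) from hive.`users`;
{code}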



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3745) Hive CHAR not supported

2016-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173426#comment-15173426
 ] 

ASF GitHub Bot commented on DRILL-3745:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/399

DRILL-3745: Hive CHAR not supported

1. Added Hive Char support in queries and udf-s out parameter. Char is 
trimmed first and then treated as varchar.
2. Unit tests.
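
A small sketch of the behavior this change should enable, based on the ticket's 
development.foo table with its CHAR(10) column bad (the char_length call is only 
an illustration of the trim-then-treat-as-varchar behavior described above):
{code}
use `hive.development`;
select * from foo;                 -- should no longer fail with PARSE ERROR
select char_length(bad) from foo;  -- trailing pad spaces are trimmed before use
{code}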

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-3745

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/399.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #399


commit 25f0513207dfdff0d993f49e60cf601abff19600
Author: Arina Ielchiieva 
Date:   2016-02-19T17:03:52Z

DRILL-3745: Hive CHAR not supported




> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using SQuirreL.  We have the Hive plugin enabled with tables using CHAR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)