[jira] [Updated] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.

2017-08-18 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-5732:
--
Attachment: 2668b522-5833-8fd2-0b6d-e685197f0ae3.sys.drill

> Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
> -
>
> Key: DRILL-5732
> URL: https://issues.apache.org/jira/browse/DRILL-5732
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Robert Hou
> Assignee: Paul Rogers
> Attachments: 2668b522-5833-8fd2-0b6d-e685197f0ae3.sys.drill, drillbit.log
>
>
> git commit id:
> {noformat}
> | 1.12.0-SNAPSHOT  | e9065b55ea560e7f737d6fcb4948f9e945b9b14f  | DRILL-5660: Parquet metadata caching improvements  | 15.08.2017 @ 09:31:00 PDT  | r...@qa-node190.qa.lab  | 15.08.2017 @ 13:29:26 PDT  |
> {noformat}
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.memory.max_query_memory_per_node` = 104857600;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.width.max_per_query` = 1;
> select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), 
> max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), 
> max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), 
> max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), 
> max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), 
> min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), 
> max(cs_order_number), max(cs_quantity), max(cs_wholesale_cost), 
> max(cs_list_price), max(cs_sales_price), max(cs_ext_discount_amt), 
> min(cs_ext_sales_price), max(cs_ext_wholesale_cost), min(cs_ext_list_price), 
> min(cs_ext_tax), min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), 
> max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), 
> min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), 
> min(length(c_customer_id)), max(c_current_cdemo_sk), max(c_current_hdemo_sk), 
> min(c_current_addr_sk), min(c_first_shipto_date_sk), 
> min(c_first_sales_date_sk), min(length(c_salutation)), 
> min(length(c_first_name)), min(length(c_last_name)), 
> min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), 
> min(c_birth_year), max(c_last_review_date), c_email_address  from (select 
> cs_sold_date_sk+cs_sold_time_sk col1, * from 
> dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls 
> first) d where d.col1 > 2536816 and c_email_address is not null group by 
> c_email_address;
> ALTER SESSION SET `exec.sort.disable_managed` = true;
> alter session set `planner.disable_exchanges` = false;
> alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
> alter session set `planner.width.max_per_node` = 17;
> alter session set `planner.width.max_per_query` = 1000;
> {noformat}
> Here is the stack trace:
> {noformat}
> 2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records
> 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0
> 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in memory = 71964288
> 2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO  o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes ran out of memory while executing the query.
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
> batchGroups.size 1
> spilledBatchGroups.size 0
> allocated memory 71964288
> allocator limit 52428800
> [Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> 

[jira] [Updated] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.

2017-08-18 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-5732:
--
Attachment: drillbit.log

> Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
> -
>
> Key: DRILL-5732
> URL: https://issues.apache.org/jira/browse/DRILL-5732
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Robert Hou
> Assignee: Paul Rogers
> Attachments: drillbit.log
>
>

[jira] [Created] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.

2017-08-18 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5732:
-

Summary: Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
Key: DRILL-5732
URL: https://issues.apache.org/jira/browse/DRILL-5732
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Robert Hou
Assignee: Paul Rogers


git commit id:
{noformat}
| 1.12.0-SNAPSHOT  | e9065b55ea560e7f737d6fcb4948f9e945b9b14f  | DRILL-5660: Parquet metadata caching improvements  | 15.08.2017 @ 09:31:00 PDT  | r...@qa-node190.qa.lab  | 15.08.2017 @ 13:29:26 PDT  |
{noformat}

Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 104857600;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), 
max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), 
max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), 
max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), 
max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), 
min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), max(cs_order_number), 
max(cs_quantity), max(cs_wholesale_cost), max(cs_list_price), 
max(cs_sales_price), max(cs_ext_discount_amt), min(cs_ext_sales_price), 
max(cs_ext_wholesale_cost), min(cs_ext_list_price), min(cs_ext_tax), 
min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), 
max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), 
min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), 
min(length(c_customer_id)), max(c_current_cdemo_sk), max(c_current_hdemo_sk), 
min(c_current_addr_sk), min(c_first_shipto_date_sk), 
min(c_first_sales_date_sk), min(length(c_salutation)), 
min(length(c_first_name)), min(length(c_last_name)), 
min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), 
min(c_birth_year), max(c_last_review_date), c_email_address  from (select 
cs_sold_date_sk+cs_sold_time_sk col1, * from 
dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls 
first) d where d.col1 > 2536816 and c_email_address is not null group by 
c_email_address;
ALTER SESSION SET `exec.sort.disable_managed` = true;
alter session set `planner.disable_exchanges` = false;
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
alter session set `planner.width.max_per_node` = 17;
alter session set `planner.width.max_per_query` = 1000;
{noformat}

Here is the stack trace:
{noformat}
2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records
2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0
2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in memory = 71964288
2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO  o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes ran out of memory while executing the query.
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
batchGroups.size 1
spilledBatchGroups.size 0
allocated memory 71964288
allocator limit 52428800

[Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ]
at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 

[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133513#comment-16133513
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134035511
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java ---
@@ -57,22 +57,120 @@ private StringFunctions() {}
     @Output BitHolder out;
     @Workspace java.util.regex.Matcher matcher;
     @Workspace org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper;
+    @Workspace org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternInfo patternInfo;
 
     @Override
     public void setup() {
-      matcher = java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike( //
-          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start,  pattern.end,  pattern.buffer))).matcher("");
+      patternInfo = org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike(
+          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start, pattern.end, pattern.buffer));
       charSequenceWrapper = new org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
-      matcher.reset(charSequenceWrapper);
+
+      // Use java regex and compile pattern only if it is not a simple pattern.
+      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
+        java.lang.String javaPatternString = patternInfo.getJavaPatternString();
+        matcher = java.util.regex.Pattern.compile(javaPatternString).matcher("");
+        matcher.reset(charSequenceWrapper);
+      }
     }
 
     @Override
     public void eval() {
       charSequenceWrapper.setBuffer(input.start, input.end, input.buffer);
       // Reusing same charSequenceWrapper, no need to pass it in.
       // This saves one method call since reset(CharSequence) calls reset()
-      matcher.reset();
-      out.value = matcher.matches()? 1:0;
+
+      // Not a simple case. Just use Java regex.
+      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
+        matcher.reset();
+        out.value = matcher.matches() ? 1 : 0;
+      }
+
+      // This is a simple pattern that ends with a constant string i.e. %ABC
+      // Compare the characters starting from end.
+      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.ENDS_WITH) {
--- End diff --

Chains of ifs are rather old-school. Do a switch on the enum. And, do it in 
the pattern class so it can be unit tested.
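
A minimal sketch of the switch-on-enum suggestion, not the code from the pull request. `PatternKind` and `LikePattern` are hypothetical names, and plain `String` values stand in for Drill's buffers. The dispatch lives in the pattern class, so it can be exercised without the rest of Drill.

```java
import java.util.regex.Pattern;

// Hypothetical names for illustration only.
enum PatternKind { STARTS_WITH, ENDS_WITH, CONTAINS, NOT_SIMPLE }

final class LikePattern {
  private final PatternKind kind;
  private final String text;   // constant text for simple kinds, Java regex for NOT_SIMPLE
  private final Pattern regex; // compiled only when the regex path is needed

  LikePattern(PatternKind kind, String text) {
    this.kind = kind;
    this.text = text;
    this.regex = (kind == PatternKind.NOT_SIMPLE) ? Pattern.compile(text) : null;
  }

  // A single switch on the enum replaces the chain of ifs.
  boolean matches(String input) {
    switch (kind) {
      case STARTS_WITH: return input.startsWith(text);
      case ENDS_WITH:   return input.endsWith(text);
      case CONTAINS:    return input.contains(text);
      case NOT_SIMPLE:  return regex.matcher(input).matches();
      default:          throw new IllegalStateException("Unhandled kind: " + kind);
    }
  }
}
```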


> Improve performance of filter operator for pattern matching
> ---
>
> Key: DRILL-5697
> URL: https://issues.apache.org/jira/browse/DRILL-5697
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.11.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
>
> Queries that filter with the SQL LIKE operator use the Java regex library 
> for pattern matching. However, for patterns such as %abc (ends with abc), 
> abc% (starts with abc), and %abc% (contains abc), implementing the match 
> with simple code instead of the regex library was observed to give a good 
> performance boost (4-6x). The idea is to use special-case code for these 
> simple, common cases and fall back to the Java regex library for complicated 
> ones. That will provide a good performance benefit for the most common cases.
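
As a rough illustration of the special-case idea (a sketch, not Drill's implementation): for `%abc`, a backwards character comparison gives the same answer as the regex `.*abc` that the pattern is translated to. `EndsWithSketch` and its method are hypothetical names, and the inputs are ordinary strings without line breaks.

```java
import java.util.regex.Pattern;

final class EndsWithSketch {
  // True when input ends with suffix; compares from the last character backwards.
  static boolean endsWith(CharSequence input, CharSequence suffix) {
    int i = input.length() - 1;
    int j = suffix.length() - 1;
    if (j > i) {
      return false; // suffix is longer than the input
    }
    while (j >= 0) {
      if (input.charAt(i--) != suffix.charAt(j--)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Pattern regex = Pattern.compile(".*abc"); // what LIKE '%abc' becomes on the regex path
    for (String value : new String[] {"xyzabc", "abc", "abcx", "bc"}) {
      boolean simple = endsWith(value, "abc");
      boolean viaRegex = regex.matcher(value).matches();
      System.out.println(value + ": simple=" + simple + ", regex=" + viaRegex);
    }
  }
}
```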



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133508#comment-16133508
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134032343
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java ---
@@ -47,18 +47,55 @@
   "[:alnum:]", "\\p{Alnum}"
   };
 
+  // type of pattern string.
+  public enum sqlPatternType {
--- End diff --

Class name format: `SqlPatternType`



[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133514#comment-16133514
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134035089
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java ---
@@ -96,20 +145,46 @@ public static String sqlToRegexLike(
             || (nextChar == '%')
             || (nextChar == escapeChar)) {
           javaPattern.append(nextChar);
+          simplePattern.append(nextChar);
           i++;
         } else {
           throw invalidEscapeSequence(sqlPattern, i);
         }
       } else if (c == '_') {
+        // if we find _, it is not simple pattern, we are looking for only %
+        notSimple = true;
         javaPattern.append('.');
       } else if (c == '%') {
+        if (i == 0) {
+          // % at the start could potentially be one of the simple cases i.e. ENDS_WITH.
+          endsWith = true;
+        } else if (i == (len-1)) {
+          // % at the end could potentially be one of the simple cases i.e. STARTS_WITH
+          startsWith = true;
+        } else {
+          // If we find % anywhere other than start or end, it is not a simple case.
+          notSimple = true;
+        }
         javaPattern.append(".");
         javaPattern.append('*');
       } else {
         javaPattern.append(c);
+        simplePattern.append(c);
       }
     }
-    return javaPattern.toString();
+
+    if (!notSimple) {
--- End diff --

Yeah, the zillion-flags approach is too complex to follow. Really need a 
good-old state machine.
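
One way such a state machine could look, as a sketch under stated assumptions rather than the pull request's code: `LikeKind`, `LikeClassifier`, and the state names are hypothetical, escape characters are ignored, and anything not clearly simple is classified `COMPLEX` so that the regex path handles it.

```java
// Hypothetical names; escape characters are ignored to keep the sketch short.
enum LikeKind { CONSTANT, STARTS_WITH, ENDS_WITH, CONTAINS, COMPLEX }

final class LikeClassifier {
  // Scanner states, named after what has been consumed so far.
  private enum State { START, WILD, CONST, WILD_CONST, CONST_WILD, WILD_CONST_WILD, COMPLEX }

  private static State next(State state, char c) {
    if (c == '_') {
      return State.COMPLEX; // single-character wildcard: leave it to the regex
    }
    boolean wild = (c == '%');
    switch (state) {
      case START:      return wild ? State.WILD : State.CONST;
      case WILD:       return wild ? State.COMPLEX : State.WILD_CONST;
      case CONST:      return wild ? State.CONST_WILD : State.CONST;
      case WILD_CONST: return wild ? State.WILD_CONST_WILD : State.WILD_CONST;
      default:         return State.COMPLEX; // input after a trailing %, or already COMPLEX
    }
  }

  static LikeKind classify(String sqlPattern) {
    State state = State.START;
    for (int i = 0; i < sqlPattern.length() && state != State.COMPLEX; i++) {
      state = next(state, sqlPattern.charAt(i));
    }
    switch (state) {
      case CONST:           return LikeKind.CONSTANT;
      case CONST_WILD:      return LikeKind.STARTS_WITH;
      case WILD_CONST:      return LikeKind.ENDS_WITH;
      case WILD_CONST_WILD: return LikeKind.CONTAINS;
      default:              return LikeKind.COMPLEX; // includes "" and a lone "%"
    }
  }
}
```

Each character causes exactly one transition, so the flags `startsWith`, `endsWith`, and `notSimple` collapse into a single state variable.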



[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133516#comment-16133516
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134035728
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java ---
@@ -85,23 +183,118 @@ public void eval() {
     @Output BitHolder out;
     @Workspace java.util.regex.Matcher matcher;
     @Workspace org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper;
+    @Workspace org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternInfo patternInfo;
 
     @Override
     public void setup() {
-      matcher = java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike( //
+      patternInfo = org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike(
           org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start,  pattern.end,  pattern.buffer),
-          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(escape.start,  escape.end,  escape.buffer))).matcher("");
+          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(escape.start,  escape.end,  escape.buffer));
       charSequenceWrapper = new org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
-      matcher.reset(charSequenceWrapper);
+
+      // Use java regex and compile pattern only if it is not a simple pattern.
+      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
+        java.lang.String javaPatternString = patternInfo.getJavaPatternString();
+        matcher = java.util.regex.Pattern.compile(javaPatternString).matcher("");
+        matcher.reset(charSequenceWrapper);
+      }
     }
 
     @Override
     public void eval() {
       charSequenceWrapper.setBuffer(input.start, input.end, input.buffer);
       // Reusing same charSequenceWrapper, no need to pass it in.
       // This saves one method call since reset(CharSequence) calls reset()
-      matcher.reset();
-      out.value = matcher.matches()? 1:0;
+
+      // Not a simple case. Just use Java regex.
+      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
--- End diff --

We are doing a switch (actually, chain of ifs) per value. This is a tight 
inner loop. Far better to simply generate an instance of the proper class and 
call a single method to do the work.
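
A sketch of that shape, with hypothetical names throughout (`LikeExecutor`, `StartsWithExecutor`, `RegexExecutor`, `LikeLoopDemo`); it is not the pull request's code. The executor is chosen once, as `setup()` would do, and the per-value loop then makes a single call with no branching on pattern type.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical interface: one small class per pattern shape.
interface LikeExecutor {
  boolean match(CharSequence input);
}

final class StartsWithExecutor implements LikeExecutor {
  private final String prefix;

  StartsWithExecutor(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public boolean match(CharSequence input) {
    if (input.length() < prefix.length()) {
      return false;
    }
    for (int i = 0; i < prefix.length(); i++) {
      if (input.charAt(i) != prefix.charAt(i)) {
        return false;
      }
    }
    return true;
  }
}

final class RegexExecutor implements LikeExecutor {
  private final Matcher matcher; // reused across calls, so not thread-safe

  RegexExecutor(String javaRegex) {
    this.matcher = Pattern.compile(javaRegex).matcher("");
  }

  @Override
  public boolean match(CharSequence input) {
    return matcher.reset(input).matches();
  }
}

final class LikeLoopDemo {
  // The executor is picked before the loop, so the loop body does not branch on pattern type.
  static int countMatches(LikeExecutor executor, String[] values) {
    int count = 0;
    for (String value : values) {
      if (executor.match(value)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String[] values = {"abcdef", "xabc", "abc"};
    LikeExecutor executor = new StartsWithExecutor("abc"); // chosen once, as setup() would
    System.out.println(countMatches(executor, values));
  }
}
```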



[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133511#comment-16133511
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134035411
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
 ---
@@ -57,22 +57,120 @@ private StringFunctions() {}
 @Output BitHolder out;
 @Workspace java.util.regex.Matcher matcher;
 @Workspace org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper 
charSequenceWrapper;
+@Workspace 
org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternInfo patternInfo;
 
 @Override
 public void setup() {
-  matcher = 
java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike(
 //
-  
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start,
  pattern.end,  pattern.buffer))).matcher("");
+  patternInfo = 
org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike(
+  
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start,
 pattern.end, pattern.buffer));
   charSequenceWrapper = new 
org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
-  matcher.reset(charSequenceWrapper);
+
+  // Use java regex and compile pattern only if it is not a simple 
pattern.
+  if (patternInfo.getPatternType() == 
org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
--- End diff --

You have an enum to describe the cases, and a class to capture the info. 
That is the perfect place to encode the information about how to process 
each case. In particular, the pattern class should act as a factory for a 
pattern executor: it will create an instance of the class needed to do the 
work. That will also allow this stuff to be unit tested without needing all 
of Drill.
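
A sketch of how the enum and the pattern class could carry that knowledge, using constant-specific methods on the enum. All names here (`LikeType`, `LikePatternInfo`, `executorFor`, `createExecutor`) are hypothetical, and `Predicate<String>` stands in for whatever executor type the real code would use.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical enum: each constant knows how to build its own executor.
enum LikeType {
  STARTS_WITH {
    @Override
    Predicate<String> executorFor(String text) {
      return value -> value.startsWith(text);
    }
  },
  ENDS_WITH {
    @Override
    Predicate<String> executorFor(String text) {
      return value -> value.endsWith(text);
    }
  },
  CONTAINS {
    @Override
    Predicate<String> executorFor(String text) {
      return value -> value.contains(text);
    }
  },
  NOT_SIMPLE {
    @Override
    Predicate<String> executorFor(String text) {
      Pattern compiled = Pattern.compile(text); // compiled once, captured by the lambda
      return value -> compiled.matcher(value).matches();
    }
  };

  abstract Predicate<String> executorFor(String text);
}

// Hypothetical pattern class acting as the factory.
final class LikePatternInfo {
  private final LikeType type;
  private final String text; // constant text for simple types, Java regex for NOT_SIMPLE

  LikePatternInfo(LikeType type, String text) {
    this.type = type;
    this.text = text;
  }

  // Called once per pattern; the result is then applied to every value.
  Predicate<String> createExecutor() {
    return type.executorFor(text);
  }
}
```

Because nothing here depends on Drill classes, the factory and each executor can be checked with plain strings.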



[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133517#comment-16133517
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134035974
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java ---
@@ -157,6 +157,967 @@ public void testRegexpReplace() throws Exception {
   }
 
   @Test
+  public void testLikeStartsWith() throws Exception {
+
+// all ASCII.
+testBuilder()
--- End diff --

The regex parsing and execution code is becoming complex. Let's test it 
with a true unit test, not just a system-level test using a query. See the test 
frameworks available. We can also discuss in person.



[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133515#comment-16133515
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134034498
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java ---
@@ -76,12 +113,24 @@ public static String sqlToRegexLike(
   /**
* Translates a SQL LIKE pattern to Java regex pattern.
*/
-  public static String sqlToRegexLike(
+  public static sqlPatternInfo sqlToRegexLike(
   String sqlPattern,
   char escapeChar) {
 int i;
 final int len = sqlPattern.length();
 final StringBuilder javaPattern = new StringBuilder(len + len);
+final StringBuilder simplePattern = new StringBuilder(len);
+
+// Figure out the pattern type and build simplePatternString
+// as we are going through the sql pattern string
+// to build java regex pattern string. This is better instead of using
+// regex later for determining if a pattern is simple or not.
+// Saves CPU cycles.
+sqlPatternType patternType = sqlPatternType.NOT_SIMPLE;
+boolean startsWith = false;
+boolean endsWith = false;
+boolean notSimple = false;
--- End diff --

Or, since your enum represents terminal states, create a new enum with 
internal states. CONST_ONLY, WILDCARD, COMPLEX with transitions
```
initial: CONST_ONLY
all constant characters, CONST_ONLY --> CONST_ONLY
%, CONST_ONLY --> WILDCARD
any other special char, any state --> COMPLEX
%, WILDCARD --> COMPLEX
```
Or, even better, define a simple recursive descent parser in which states 
are encoded as methods rather than as state variables.
```
parseConstant() ...
- parseWildcard()
- parseComplex()

parseWildcard() ...
- parseComplex()

parseComplex() ...
```

Here, I'm ignoring the details of detecting abc, abc%, ab%c, %abc. These 
can also be represented as states with the resulting transitions.
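
A minimal sketch of the enum variant, following the transitions listed in the comment above. The names are hypothetical, escape handling is left out, and the behaviour for a constant character in the WILDCARD state (stay in WILDCARD) is an assumption, since the comment does not spell it out.

```java
/** Hypothetical sketch of the scanner states suggested in the review; not Drill code. */
final class LikeStateSketch {

  enum State { CONST_ONLY, WILDCARD, COMPLEX }

  /** Applies one transition for the character {@code c}. */
  static State next(State current, char c) {
    if (c == '_') {
      return State.COMPLEX;              // any other special char, any state --> COMPLEX
    }
    if (c == '%') {
      return current == State.CONST_ONLY
          ? State.WILDCARD               // %, CONST_ONLY --> WILDCARD
          : State.COMPLEX;               // %, WILDCARD --> COMPLEX
    }
    return current;                      // constant character: state unchanged (assumed)
  }

  /** Runs the state machine over a whole pattern. */
  static State scan(String sqlPattern) {
    State state = State.CONST_ONLY;
    for (int i = 0; i < sqlPattern.length(); i++) {
      state = next(state, sqlPattern.charAt(i));
    }
    return state;
  }
}
```

With exactly these transitions a pattern such as %abc% ends in COMPLEX, so the extra states mentioned in the comment would still be needed to recognise the "contains" case.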




[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133512#comment-16133512
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134032943
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java 
---
@@ -96,20 +145,46 @@ public static String sqlToRegexLike(
 || (nextChar == '%')
 || (nextChar == escapeChar)) {
   javaPattern.append(nextChar);
+  simplePattern.append(nextChar);
   i++;
 } else {
   throw invalidEscapeSequence(sqlPattern, i);
 }
   } else if (c == '_') {
+// if we find _, it is not simple pattern, we are looking for only %
+notSimple = true;
--- End diff --

`type = NOT_SIMPLE`




[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133510#comment-16133510
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134032668
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java 
---
@@ -76,12 +113,24 @@ public static String sqlToRegexLike(
   /**
* Translates a SQL LIKE pattern to Java regex pattern.
*/
-  public static String sqlToRegexLike(
+  public static sqlPatternInfo sqlToRegexLike(
   String sqlPattern,
   char escapeChar) {
 int i;
 final int len = sqlPattern.length();
 final StringBuilder javaPattern = new StringBuilder(len + len);
+final StringBuilder simplePattern = new StringBuilder(len);
+
+// Figure out the pattern type and build simplePatternString
+// as we are going through the sql pattern string
+// to build java regex pattern string. This is better instead of using
+// regex later for determining if a pattern is simple or not.
+// Saves CPU cycles.
+sqlPatternType patternType = sqlPatternType.NOT_SIMPLE;
+boolean startsWith = false;
+boolean endsWith = false;
+boolean notSimple = false;
--- End diff --

These are not independent states. Probably better to use your enum, with 
the initial value as null (unknown).




[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133509#comment-16133509
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r134034748
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java 
---
@@ -96,20 +145,46 @@ public static String sqlToRegexLike(
 || (nextChar == '%')
 || (nextChar == escapeChar)) {
   javaPattern.append(nextChar);
+  simplePattern.append(nextChar);
   i++;
 } else {
   throw invalidEscapeSequence(sqlPattern, i);
 }
   } else if (c == '_') {
+// if we find _, it is not simple pattern, we are looking for only %
+notSimple = true;
 javaPattern.append('.');
   } else if (c == '%') {
+if (i == 0) {
+  // % at the start could potentially be one of the simple cases i.e. ENDS_WITH.
+  endsWith = true;
+} else if (i == (len-1)) {
--- End diff --

A bit of a funky way to do this. Might as well actually wait until the end. 
This is why we need states (as an enum or via recursive descent). At end:

If all constants: CONST
If one wildcard: one of the simple cases
Otherwise: COMPLEX
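
As a rough sketch of deciding only at the end of the scan: the class below counts wildcards while walking the pattern and classifies afterwards. All names are hypothetical, escapes are ignored, and treating a single wildcard in the middle (ab%c) as COMPLEX is an assumption made to keep the example short.

```java
/** Hypothetical sketch: classify a LIKE pattern once, after the scan has finished. */
final class LikeClassifierSketch {

  enum Kind { CONSTANT, STARTS_WITH, ENDS_WITH, COMPLEX }

  static Kind classify(String sqlPattern) {
    int wildcards = 0;       // number of '%' seen
    int wildcardAt = -1;     // index of the most recent '%'
    boolean other = false;   // saw '_'

    for (int i = 0; i < sqlPattern.length(); i++) {
      char c = sqlPattern.charAt(i);
      if (c == '%') {
        wildcards++;
        wildcardAt = i;
      } else if (c == '_') {
        other = true;
      }
    }

    // The decision is made only here, once the whole pattern has been seen.
    if (other || wildcards > 1) {
      return Kind.COMPLEX;
    }
    if (wildcards == 0) {
      return Kind.CONSTANT;         // abc
    }
    if (wildcardAt == 0) {
      return Kind.ENDS_WITH;        // %abc
    }
    if (wildcardAt == sqlPattern.length() - 1) {
      return Kind.STARTS_WITH;      // abc%
    }
    return Kind.COMPLEX;            // ab%c: treated as complex in this sketch
  }
}
```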




[jira] [Commented] (DRILL-1162) 25 way join ended up with OOM

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133460#comment-16133460
 ] 

ASF GitHub Bot commented on DRILL-1162:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/905
  
@jinfengni, I think you are more familiar with this part of the code, can 
you take a look?


> 25 way join ended up with OOM
> -
>
> Key: DRILL-1162
> URL: https://issues.apache.org/jira/browse/DRILL-1162
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow, Query Planning & Optimization
>Reporter: Rahul Challapalli
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: Future
>
> Attachments: error.log, oom_error.log
>
>
> git.commit.id.abbrev=e5c2da0
> The below query results in 0 results being returned 
> {code:sql}
> select count(*) from `lineitem1.parquet` a 
> inner join `part.parquet` j on a.l_partkey = j.p_partkey 
> inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey 
> inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey 
> inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey 
> = m.ps_suppkey 
> inner join `customer.parquet` n on k.o_custkey = n.c_custkey 
> inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey 
> inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey 
> inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey 
> inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice 
> inner join `lineitem2.parquet` f on a.l_comment = f.l_comment 
> inner join `lineitem2.parquet` g on a.l_shipdate = g.l_shipdate 
> inner join `lineitem2.parquet` h on a.l_commitdate = h.l_commitdate 
> inner join `lineitem2.parquet` i on a.l_receiptdate = i.l_receiptdate 
> inner join `lineitem2.parquet` o on a.l_receiptdate = o.l_receiptdate 
> inner join `lineitem2.parquet` p on a.l_receiptdate = p.l_receiptdate 
> inner join `lineitem2.parquet` q on a.l_receiptdate = q.l_receiptdate 
> inner join `lineitem2.parquet` r on a.l_receiptdate = r.l_receiptdate 
> inner join `lineitem2.parquet` s on a.l_receiptdate = s.l_receiptdate 
> inner join `lineitem2.parquet` t on a.l_receiptdate = t.l_receiptdate 
> inner join `lineitem2.parquet` u on a.l_receiptdate = u.l_receiptdate 
> inner join `lineitem2.parquet` v on a.l_receiptdate = v.l_receiptdate 
> inner join `lineitem2.parquet` w on a.l_receiptdate = w.l_receiptdate 
> inner join `lineitem2.parquet` x on a.l_receiptdate = x.l_receiptdate;
> {code}
> However, when we remove the last 'inner join' and run the query, it returns 
> '716372534'. Since the last inner join is similar to the ones before it, it 
> should match some records and return the data appropriately.
> The logs indicated that it actually returned 0 results. Attached the log file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5712) Update the pom files with dependency exclusions for commons-codec

2017-08-18 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra updated DRILL-5712:
-
Labels:   (was: ready-to-commit)

> Update the pom files with dependency exclusions for commons-codec
> -
>
> Key: DRILL-5712
> URL: https://issues.apache.org/jira/browse/DRILL-5712
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Sindhuri Ramanarayan Rayavaram
>Assignee: Sindhuri Ramanarayan Rayavaram
>
> In java-exec, we are adding a dependency on commons-codec version 1.10. 
> Other dependencies, such as hadoop-common and parquet-column, are trying to 
> download different versions of commons-codec. Exclusions should be added for 
> commons-codec in these dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5731) regionsToScan is computed multiple times for MapR DB Json and Binary tables

2017-08-18 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-5731:
---

 Summary: regionsToScan is computed multiple times for MapR DB Json 
and Binary tables
 Key: DRILL-5731
 URL: https://issues.apache.org/jira/browse/DRILL-5731
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy


We are computing regionsToScan for both Json and binary tables in groupScan 
init, which gets called multiple times during planning. This is expensive and 
not needed. Instead, compute only when needed for parallelization assignment.
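
A minimal sketch of the "compute only when needed" idea, using a hypothetical stand-in class rather than Drill's actual group scan. The region lookup is passed in as a `Supplier`, so the constructor stays cheap and the lookup runs at most once.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a group scan; the names here are not Drill's. */
final class LazyRegionsSketch {
  private final Supplier<List<String>> regionLoader;  // the expensive lookup
  private List<String> regionsToScan;                  // null until first needed

  LazyRegionsSketch(Supplier<List<String>> regionLoader) {
    // Cheap: can run many times during planning without doing the lookup.
    this.regionLoader = regionLoader;
  }

  /** Computes the regions on first use and caches them for later calls. */
  synchronized List<String> getRegionsToScan() {
    if (regionsToScan == null) {
      regionsToScan = regionLoader.get();
    }
    return regionsToScan;
  }
}
```

Only the code path that actually needs the regions (parallelization assignment, per the description) would call `getRegionsToScan()`.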



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (DRILL-5704) Improve error message on client side when queries fail with "Failed to create schema tree." when Impersonation is enabled and logins are anonymous

2017-08-18 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia reopened DRILL-5704:
--

> Improve error message on client side when queries fail with "Failed to create 
> schema tree." when Impersonation is enabled and logins are anonymous
> --
>
> Key: DRILL-5704
> URL: https://issues.apache.org/jira/browse/DRILL-5704
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Reported by [~agirish]
> When a username is not specified, Drill sets the session user to anonymous 
> if impersonation is enabled. During query execution Drill tries to build the 
> schema tree and, as part of that, validates whether the user has access to the 
> workspace by calling the FileClient API listStatus, which verifies the user 
> against the OS users. Since impersonation is enabled here without 
> authentication and we don't specify any user in the connection string, Drill 
> uses the default user, "anonymous", and passes it to the workspace permission 
> check, which fails because the node doesn't have any valid user with that name.
> {code:java}
> Caused by: java.io.IOException: Error getting user info for current user, 
> anonymous
>..
>..
> at 
> org.apache.drill.exec.store.dfs.DrillFileSystem.listStatus(DrillFileSystem.java:523)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.accessible(WorkspaceSchemaFactory.java:157)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.<init>(FileSystemSchemaFactory.java:78)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.FileSystemSchemaFactory.registerSchemas(FileSystemSchemaFactory.java:65)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.FileSystemPlugin.registerSchemas(FileSystemPlugin.java:150)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.StoragePluginRegistryImpl$DrillSchemaFactory.registerSchemas(StoragePluginRegistryImpl.java:365)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.SchemaTreeProvider.createRootSchema(SchemaTreeProvider.java:72)
>  [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> ... 10 common frames omitted
> {code}
> # $DRILL_HOME/bin/sqlline -u "jdbc:drill:zk=localhost:5181" 
> sqlline> select * from sys.drillbits;
> User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: Failed to 
> create schema tree.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2017-08-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5377:
---
Fix Version/s: Future

> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> git.commit.id.abbrev=38ef562
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133099#comment-16133099
 ] 

ASF GitHub Bot commented on DRILL-5377:
---

GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/916

DRILL-5377: Five-digit year dates are displayed incorrectly via jdbc



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-5377

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/916.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #916


commit 02b533dde137b85fd500cf1290bfa8a1a3f2f4e6
Author: Vitalii Diravka 
Date:   2017-08-15T17:51:10Z

DRILL-5377: Five-digit year dates are displayed incorrectly via jdbc






[jira] [Updated] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2017-08-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5377:
---
Summary: Five-digit year dates are displayed incorrectly via jdbc  (was: 
Five-digit year dates aren't correct displayed via jdbc)



[jira] [Commented] (DRILL-5377) Five-digit year dates aren't correct displayed via jdbc

2017-08-18 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133082#comment-16133082
 ] 

Vitalii Diravka commented on DRILL-5377:


The root cause of the issue is that java.sql.Date cuts off five-digit years in 
its toString() method.

Old behaviour:
{code}
0: jdbc:drill:zk=local> select cast('11551-02-16' as date) as FUTURE_DATE, 
CURRENT_DATE from (VALUES(1));
+--------------+---------------+
| FUTURE_DATE  | CURRENT_DATE  |
+--------------+---------------+
| 551-02-16    | 2017-08-18    |
+--------------+---------------+
{code}

After fix:
{code}
0: jdbc:drill:zk=local> select cast('11551-02-16' as date) as FUTURE_DATE, 
CURRENT_DATE from (VALUES(1));
+--------------+---------------+
| FUTURE_DATE  | CURRENT_DATE  |
+--------------+---------------+
| 11551-02-16  | 2017-08-18    |
+--------------+---------------+
{code}
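
To illustrate the kind of formatting that avoids the truncation, here is a small sketch with a hypothetical helper, `formatDate`. It is not the change made in the pull request; it only shows a formatter that lets the year take as many digits as it needs.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical helper: renders a date without assuming a four-digit year. */
final class WideYearDateSketch {

  /** Formats as year-MM-dd; the year is padded to at least four digits, never cut. */
  static String formatDate(LocalDate date) {
    return String.format(Locale.ROOT, "%04d-%02d-%02d",
        date.getYear(), date.getMonthValue(), date.getDayOfMonth());
  }
}
```

For example, `WideYearDateSketch.formatDate(LocalDate.of(11551, 2, 16))` yields `11551-02-16`. Note that `LocalDate.toString()` is not a drop-in alternative here, because it prefixes years above 9999 with a plus sign.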



[jira] [Commented] (DRILL-1051) Casting timestamp as date gives wrong result for dates earlier than 1883

2017-08-18 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133080#comment-16133080
 ] 

Vitalii Diravka commented on DRILL-1051:


The issue is related to JDBC and the conversion of Joda DateTime instances to 
java.sql Date, Timestamp and Time instances for old dates: 
1. For dates earlier than the Gregorian cutover date (depends on the time zone, 
usually the year 1582)
https://www.timeanddate.com/calendar/julian-gregorian-switch.html
2. For LMT (Local Mean Time) dates (depends on the time zone, usually earlier 
than the year 1883)
https://www.timeanddate.com/time/change/usa/los-angeles?year=1883
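
Both effects can be observed with the JDK alone. The sketch below is illustrative and uses hypothetical names; it uses java.time in place of Joda for the proleptic ISO calendar and is not the fix from the pull request.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Illustrative only: shows the two calendar effects listed above using the JDK. */
final class OldDateEffectsSketch {

  /** UTC offset, in seconds, that the zone rules report for a local date-time. */
  static int offsetSeconds(String zone, LocalDateTime local) {
    return ZoneId.of(zone).getRules().getOffset(local).getTotalSeconds();
  }

  /**
   * Day of month that java.util.GregorianCalendar (Julian before the cutover)
   * reports for an instant given in the proleptic ISO calendar, both in UTC.
   */
  static int legacyDayOfMonth(LocalDateTime isoUtc) {
    GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.setTimeInMillis(isoUtc.toInstant(ZoneOffset.UTC).toEpochMilli());
    return cal.get(Calendar.DAY_OF_MONTH);
  }

  public static void main(String[] args) {
    LocalDateTime lmt = LocalDateTime.of(1883, 11, 16, 1, 32, 1);
    LocalDateTime modern = LocalDateTime.of(2017, 1, 15, 1, 32, 1);
    System.out.println(offsetSeconds("America/Los_Angeles", lmt));
    System.out.println(offsetSeconds("America/Los_Angeles", modern));
    System.out.println(legacyDayOfMonth(LocalDateTime.of(1581, 12, 1, 0, 0)));
  }
}
```

On a JDK with the standard time zone data, the 1883 offset should differ from the modern winter offset by 7 minutes 2 seconds, and the 1581 date should come back as day 21 rather than day 1, a ten-day Julian/Gregorian gap.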

Old behaviour:
{code}
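0: jdbc:drill:zk=local> select cast(to_timestamp('1581-12-01 23:32:01', 
'yyyy-MM-dd HH:mm:ss') as date) as `DATE`, to_timestamp('1581-12-01 23:32:01', 
'yyyy-MM-dd HH:mm:ss') as `TIMESTAMP`, cast(to_timestamp('1581-12-01 23:32:01', 
'yyyy-MM-dd HH:mm:ss') as time) as `TIME`, timeofday() from (VALUES(1));
+-------------+------------------------+-----------+----------------------------------------------+
|    DATE     |       TIMESTAMP        |   TIME    |                    EXPR$3                    |
+-------------+------------------------+-----------+----------------------------------------------+
| 1581-11-20  | 1581-11-21 23:24:59.0  | 23:32:01  | 2017-08-18 06:10:36.542 America/Los_Angeles  |
+-------------+------------------------+-----------+----------------------------------------------+

0: jdbc:drill:zk=local> select cast(to_timestamp('1883-11-16 01:32:01', 
'yyyy-MM-dd HH:mm:ss') as date) as `DATE`, to_timestamp('1883-11-16 01:32:01', 
'yyyy-MM-dd HH:mm:ss') as `TIMESTAMP`, cast(to_timestamp('1883-11-16 01:32:01', 
'yyyy-MM-dd HH:mm:ss') as time) as `TIME`, timeofday() from (VALUES(1));
+-------------+------------------------+-----------+----------------------------------------------+
|    DATE     |       TIMESTAMP        |   TIME    |                    EXPR$3                    |
+-------------+------------------------+-----------+----------------------------------------------+
| 1883-11-15  | 1883-11-16 01:24:59.0  | 01:32:01  | 2017-08-18 06:04:33.512 America/Los_Angeles  |
+-------------+------------------------+-----------+----------------------------------------------+
1 row selected (0.249 seconds)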

{code}

After fix:
{code}
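0: jdbc:drill:zk=local> select cast(to_timestamp('1581-12-01 23:32:01', 
'yyyy-MM-dd HH:mm:ss') as date) as `DATE`, to_timestamp('1581-12-01 23:32:01', 
'yyyy-MM-dd HH:mm:ss') as `TIMESTAMP`, cast(to_timestamp('1581-12-01 23:32:01', 
'yyyy-MM-dd HH:mm:ss') as time) as `TIME`, timeofday() from (VALUES(1));
+-------------+------------------------+-----------+----------------------------------------------+
|    DATE     |       TIMESTAMP        |   TIME    |                    EXPR$3                    |
+-------------+------------------------+-----------+----------------------------------------------+
| 1581-12-01  | 1581-12-01 23:32:01.0  | 23:32:01  | 2017-08-18 06:12:30.837 America/Los_Angeles  |
+-------------+------------------------+-----------+----------------------------------------------+

0: jdbc:drill:zk=local> select cast(to_timestamp('1883-11-16 01:32:01', 
'yyyy-MM-dd HH:mm:ss') as date) as `DATE`, to_timestamp('1883-11-16 01:32:01', 
'yyyy-MM-dd HH:mm:ss') as `TIMESTAMP`, cast(to_timestamp('1883-11-16 01:32:01', 
'yyyy-MM-dd HH:mm:ss') as time) as `TIME`, timeofday() from (VALUES(1));
+-------------+------------------------+-----------+----------------------------------------------+
|    DATE     |       TIMESTAMP        |   TIME    |                    EXPR$3                    |
+-------------+------------------------+-----------+----------------------------------------------+
| 1883-11-16  | 1883-11-16 01:32:01.0  | 01:32:01  | 2017-08-18 06:08:59.944 America/Los_Angeles  |
+-------------+------------------------+-----------+----------------------------------------------+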
{code}

> Casting timestamp as date gives wrong result for dates earlier than 1883
> 
>
> Key: DRILL-1051
> URL: https://issues.apache.org/jira/browse/DRILL-1051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: Future
>
>
> #Wed Jun 18 10:27:23 PDT 2014
> git.commit.id.abbrev=894037a
> It appears casting dates earlier than year 1797 gives wrong result:
> 0: jdbc:drill:schema=dfs> select cast(c_timestamp as varchar(20)), 
> cast(c_timestamp as date) from data where c_row <> 12;
> +---------------------+------------+
> |       EXPR$0        |   EXPR$1   |
> +---------------------+------------+
> | 1997-01-02 03:04:05 | 1997-01-02 |
> | 1997-01-02 00:00:00 | 1997-01-02 |
> | 2001-09-22 18:19:20 | 2001-09-22 |
> | 

[jira] [Commented] (DRILL-1051) Casting timestamp as date gives wrong result for dates earlier than 1883

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133042#comment-16133042
 ] 

ASF GitHub Bot commented on DRILL-1051:
---

GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/915

DRILL-1051: Casting timestamp as date gives wrong result for dates ea…

…rlier than 1883

- Fix DateAccessor's, TimestampAccessor's and TimeAccessor's converting 
joda time to java.sql

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-1051

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/915.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #915


commit b3cee2294b04e35e75e7b471df62a9233483c183
Author: Vitalii Diravka 
Date:   2017-08-15T17:51:10Z

DRILL-1051: Casting timestamp as date gives wrong result for dates earlier 
than 1883

- Fix DateAccessor's, TimestampAccessor's and TimeAccessor's converting 
joda time to java.sql




> Casting timestamp as date gives wrong result for dates earlier than 1883
> 
>
> Key: DRILL-1051
> URL: https://issues.apache.org/jira/browse/DRILL-1051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: Future
>
>
> #Wed Jun 18 10:27:23 PDT 2014
> git.commit.id.abbrev=894037a
> It appears casting dates earlier than year 1797 gives wrong result:
> 0: jdbc:drill:schema=dfs> select cast(c_timestamp as varchar(20)), 
> cast(c_timestamp as date) from data where c_row <> 12;
> +---------------------+------------+
> |       EXPR$0        |   EXPR$1   |
> +---------------------+------------+
> | 1997-01-02 03:04:05 | 1997-01-02 |
> | 1997-01-02 00:00:00 | 1997-01-02 |
> | 2001-09-22 18:19:20 | 2001-09-22 |
> | 1997-02-10 17:32:01 | 1997-02-10 |
> | 1997-02-10 17:32:00 | 1997-02-10 |
> | 1997-02-11 17:32:01 | 1997-02-11 |
> | 1997-02-12 17:32:01 | 1997-02-12 |
> | 1997-02-13 17:32:01 | 1997-02-13 |
> | 1997-02-14 17:32:01 | 1997-02-14 |
> | 1997-02-15 17:32:01 | 1997-02-15 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 0097-02-16 17:32:01 | 0097-02-17 |
> | 0597-02-16 17:32:01 | 0597-02-13 |
> | 1097-02-16 17:32:01 | 1097-02-09 |
> | 1697-02-16 17:32:01 | 1697-02-15 |
> | 1797-02-16 17:32:01 | 1797-02-15 |
> | 1897-02-16 17:32:01 | 1897-02-16 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 2097-02-16 17:32:01 | 2097-02-16 |
> | 1996-02-28 17:32:01 | 1996-02-28 |
> | 1996-02-29 17:32:01 | 1996-02-29 |
> | 1996-03-01 17:32:01 | 1996-03-01 |
> +---------------------+------------+
> 22 rows selected (0.201 seconds)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5377) Five-digit year dates aren't correct displayed via jdbc

2017-08-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5377:
---
Summary: Five-digit year dates aren't correct displayed via jdbc  (was: 
Five-digit year dates aren't correct displayed by jdbc)



[jira] [Comment Edited] (DRILL-5377) Five-digit year dates aren't correct displayed by jdbc

2017-08-18 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945457#comment-15945457
 ] 

Vitalii Diravka edited comment on DRILL-5377 at 8/18/17 1:49 PM:
-

Dates are displayed incorrectly only via JDBC, when the year has more than four 
digits. In Drill's web UI I got the following result: 
{code}
l_shipdate
15356-03-19T00:00:00.000-07:00
15356-03-21T00:00:00.000-07:00
15356-03-21T00:00:00.000-07:00
15356-03-23T00:00:00.000-07:00
15356-03-24T00:00:00.000-07:00
15356-03-24T00:00:00.000-07:00
15356-03-26T00:00:00.000-07:00
15356-03-26T00:00:00.000-07:00
15356-03-26T00:00:00.000-07:00
15356-03-26T00:00:00.000-07:00
{code}
It looks like an expected result. 


was (Author: vitalii):
sqlline usually cuts off years that have more than four digits. In Drill's web 
UI I got the following result: 
{code}
l_shipdate
15356-03-19T00:00:00.000-07:00
15356-03-21T00:00:00.000-07:00
15356-03-21T00:00:00.000-07:00
15356-03-23T00:00:00.000-07:00
15356-03-24T00:00:00.000-07:00
15356-03-24T00:00:00.000-07:00
15356-03-26T00:00:00.000-07:00
15356-03-26T00:00:00.000-07:00
15356-03-26T00:00:00.000-07:00
15356-03-26T00:00:00.000-07:00
{code}
It looks like an expected result. 



[jira] [Updated] (DRILL-5377) Five-digit year dates aren't correct displayed by jdbc

2017-08-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5377:
---
Summary: Five-digit year dates aren't correct displayed by jdbc  (was: 
Drill returns weird characters when parquet date auto-correction is turned off)

> Five-digit year dates aren't correct displayed by jdbc
> --
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.10.0
> Reporter: Rahul Challapalli
> Assignee: Vitalii Diravka
>
> git.commit.id.abbrev=38ef562
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}





[jira] [Updated] (DRILL-1051) Casting timestamp as date gives wrong result for dates earlier than 1883

2017-08-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-1051:
---
Summary: Casting timestamp as date gives wrong result for dates earlier 
than 1883  (was: casting timestamp as date gives wrong result for dates earlier 
than 1883)

> Casting timestamp as date gives wrong result for dates earlier than 1883
> 
>
> Key: DRILL-1051
> URL: https://issues.apache.org/jira/browse/DRILL-1051
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Reporter: Chun Chang
> Assignee: Vitalii Diravka
> Priority: Minor
> Fix For: Future
>
>
> #Wed Jun 18 10:27:23 PDT 2014
> git.commit.id.abbrev=894037a
> It appears casting dates earlier than year 1797 gives wrong result:
> 0: jdbc:drill:schema=dfs> select cast(c_timestamp as varchar(20)), 
> cast(c_timestamp as date) from data where c_row <> 12;
> +---------------------+------------+
> |       EXPR$0        |   EXPR$1   |
> +---------------------+------------+
> | 1997-01-02 03:04:05 | 1997-01-02 |
> | 1997-01-02 00:00:00 | 1997-01-02 |
> | 2001-09-22 18:19:20 | 2001-09-22 |
> | 1997-02-10 17:32:01 | 1997-02-10 |
> | 1997-02-10 17:32:00 | 1997-02-10 |
> | 1997-02-11 17:32:01 | 1997-02-11 |
> | 1997-02-12 17:32:01 | 1997-02-12 |
> | 1997-02-13 17:32:01 | 1997-02-13 |
> | 1997-02-14 17:32:01 | 1997-02-14 |
> | 1997-02-15 17:32:01 | 1997-02-15 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 0097-02-16 17:32:01 | 0097-02-17 |
> | 0597-02-16 17:32:01 | 0597-02-13 |
> | 1097-02-16 17:32:01 | 1097-02-09 |
> | 1697-02-16 17:32:01 | 1697-02-15 |
> | 1797-02-16 17:32:01 | 1797-02-15 |
> | 1897-02-16 17:32:01 | 1897-02-16 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 2097-02-16 17:32:01 | 2097-02-16 |
> | 1996-02-28 17:32:01 | 1996-02-28 |
> | 1996-02-29 17:32:01 | 1996-02-29 |
> | 1996-03-01 17:32:01 | 1996-03-01 |
> +---------------------+------------+
> 22 rows selected (0.201 seconds)





[jira] [Updated] (DRILL-1051) casting timestamp as date gives wrong result for dates earlier than 1883

2017-08-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-1051:
---
Summary: casting timestamp as date gives wrong result for dates earlier 
than 1883  (was: casting timestamp as date gives wrong result for dates earlier 
than 1797)

> casting timestamp as date gives wrong result for dates earlier than 1883
> 
>
> Key: DRILL-1051
> URL: https://issues.apache.org/jira/browse/DRILL-1051
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Reporter: Chun Chang
> Assignee: Vitalii Diravka
> Priority: Minor
> Fix For: Future
>
>
> #Wed Jun 18 10:27:23 PDT 2014
> git.commit.id.abbrev=894037a
> It appears casting dates earlier than year 1797 gives wrong result:
> 0: jdbc:drill:schema=dfs> select cast(c_timestamp as varchar(20)), 
> cast(c_timestamp as date) from data where c_row <> 12;
> +---------------------+------------+
> |       EXPR$0        |   EXPR$1   |
> +---------------------+------------+
> | 1997-01-02 03:04:05 | 1997-01-02 |
> | 1997-01-02 00:00:00 | 1997-01-02 |
> | 2001-09-22 18:19:20 | 2001-09-22 |
> | 1997-02-10 17:32:01 | 1997-02-10 |
> | 1997-02-10 17:32:00 | 1997-02-10 |
> | 1997-02-11 17:32:01 | 1997-02-11 |
> | 1997-02-12 17:32:01 | 1997-02-12 |
> | 1997-02-13 17:32:01 | 1997-02-13 |
> | 1997-02-14 17:32:01 | 1997-02-14 |
> | 1997-02-15 17:32:01 | 1997-02-15 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 0097-02-16 17:32:01 | 0097-02-17 |
> | 0597-02-16 17:32:01 | 0597-02-13 |
> | 1097-02-16 17:32:01 | 1097-02-09 |
> | 1697-02-16 17:32:01 | 1697-02-15 |
> | 1797-02-16 17:32:01 | 1797-02-15 |
> | 1897-02-16 17:32:01 | 1897-02-16 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 2097-02-16 17:32:01 | 2097-02-16 |
> | 1996-02-28 17:32:01 | 1996-02-28 |
> | 1996-02-29 17:32:01 | 1996-02-29 |
> | 1996-03-01 17:32:01 | 1996-03-01 |
> +---------------------+------------+
> 22 rows selected (0.201 seconds)


