[jira] [Updated] (DRILL-5736) SYSTEM ERROR: NullPointerException when selecting 88 parquet files(132MB)

2017-08-21 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated DRILL-5736:
-
Summary: SYSTEM ERROR: NullPointerException  when selecting 88 parquet 
files(132MB)  (was: SYSTEM ERROR NullPointerException  when selecting 88 
parquet files(132MB))

> SYSTEM ERROR: NullPointerException  when selecting 88 parquet files(132MB)
> --
>
> Key: DRILL-5736
> URL: https://issues.apache.org/jira/browse/DRILL-5736
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
> Environment: java version "1.8.0_40"
> Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
> Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
> System Software Overview:
>   System Version: OS X 10.11.4 (15E65)
>   Kernel Version: Darwin 15.4.0
>   Boot Volume: Macintosh HD
>   Boot Mode: Normal
>   Computer Name: smallworld's MacBook Pro
>   User Name: smallworld (smallworld)
>   Secure Virtual Memory: Enabled
>   System Integrity Protection: Enabled
>   Time since boot: 1:38
>Reporter: Chris
>  Labels: multiple, nullpointerexception, parquet
> Attachments: error_log20170822.txt, report1and2_error.png, 
> report1and2_struct.png, report1_ok.png, report2_ok.png
>
>
> *1. Parquet file structure to be queried.* There are three directories 
> /reports1, /reports2, and /reports1and2, as shown in the attached PNG files.
> 
> /reports1(36 subdirectories and 36 parquet files 
> included.)
> |__report1/0_0_0.parquet(1.5MB)
> |__report2/0_0_0.parquet(1.5MB)
> ...
> |__report36/0_0_0.parquet(1.5MB)
> /reports2(50 subdirectories and 50 parquet files 
> included.)
> |__report37/0_0_0.parquet(1.5MB)
> |__report38/0_0_0.parquet(1.5MB)
> ...
> |__report88/0_0_0.parquet(1.5MB)
> /reports1and2  (88 subdirectories and 88 parquet files, 
> merged from the two directories above.)
> |__report1/0_0_0.parquet(1.5MB)
> |__report2/0_0_0.parquet(1.5MB)
> ...
> |__report88/0_0_0.parquet(1.5MB)
> *2. Error when running SELECT queries.*
> *2.1 SELECT from reports1 succeeds.*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1/report_HCB*`;
> 61,994 rows selected (0.744 seconds)
> *2.2 SELECT from reports2 succeeds.*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports2/report_HCB*`;
> 85,452 rows selected (0.743 seconds)
> *{color:red}2.3 SELECT from reports1and2 fails.{color}*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1and2/report_HCB*`;
> {color:red}Error: SYSTEM ERROR: NullPointerException
> Fragment 1:1
> [Error Id: 54595882-3767-4b0a-91c4-671b16b86fdf on 192.168.0.13:31010] 
> (state=,code=0){color}
> *3. Error log*
> See attached error_log20170822.txt





[jira] [Updated] (DRILL-5736) SYSTEM ERROR: NullPointerException when selecting 88 subdirectories including 88 parquet files(132MB)

2017-08-21 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated DRILL-5736:
-
Summary: SYSTEM ERROR: NullPointerException  when selecting 88 
subdirectories including 88 parquet files(132MB)  (was: SYSTEM ERROR: 
NullPointerException  when selecting 88 parquet files(132MB))

> SYSTEM ERROR: NullPointerException  when selecting 88 subdirectories 
> including 88 parquet files(132MB)
> --
>
> Key: DRILL-5736
> URL: https://issues.apache.org/jira/browse/DRILL-5736
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
> Environment: java version "1.8.0_40"
> Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
> Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
> System Software Overview:
>   System Version: OS X 10.11.4 (15E65)
>   Kernel Version: Darwin 15.4.0
>   Boot Volume: Macintosh HD
>   Boot Mode: Normal
>   Computer Name: smallworld's MacBook Pro
>   User Name: smallworld (smallworld)
>   Secure Virtual Memory: Enabled
>   System Integrity Protection: Enabled
>   Time since boot: 1:38
>Reporter: Chris
>  Labels: multiple, nullpointerexception, parquet
> Attachments: error_log20170822.txt, report1and2_error.png, 
> report1and2_struct.png, report1_ok.png, report2_ok.png
>
>
> *1. Parquet file structure to be queried.* There are three directories 
> /reports1, /reports2, and /reports1and2, as shown in the attached PNG files.
> 
> /reports1(36 subdirectories and 36 parquet files 
> included.)
> |__report1/0_0_0.parquet(1.5MB)
> |__report2/0_0_0.parquet(1.5MB)
> ...
> |__report36/0_0_0.parquet(1.5MB)
> /reports2(50 subdirectories and 50 parquet files 
> included.)
> |__report37/0_0_0.parquet(1.5MB)
> |__report38/0_0_0.parquet(1.5MB)
> ...
> |__report88/0_0_0.parquet(1.5MB)
> /reports1and2  (88 subdirectories and 88 parquet files, 
> merged from the two directories above.)
> |__report1/0_0_0.parquet(1.5MB)
> |__report2/0_0_0.parquet(1.5MB)
> ...
> |__report88/0_0_0.parquet(1.5MB)
> *2. Error when running SELECT queries.*
> *2.1 SELECT from reports1 succeeds.*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1/report_HCB*`;
> 61,994 rows selected (0.744 seconds)
> *2.2 SELECT from reports2 succeeds.*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports2/report_HCB*`;
> 85,452 rows selected (0.743 seconds)
> *{color:red}2.3 SELECT from reports1and2 fails.{color}*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1and2/report_HCB*`;
> {color:red}Error: SYSTEM ERROR: NullPointerException
> Fragment 1:1
> [Error Id: 54595882-3767-4b0a-91c4-671b16b86fdf on 192.168.0.13:31010] 
> (state=,code=0){color}
> *3. Error log*
> See attached error_log20170822.txt





[jira] [Updated] (DRILL-5736) SYSTEM ERROR NullPointerException when selecting 88 parquet files(132MB)

2017-08-21 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated DRILL-5736:
-
Description: 
*1. Parquet file structure to be queried.* There are three directories 
/reports1, /reports2, and /reports1and2, as shown in the attached PNG files.

/reports1(36 subdirectories and 36 parquet files 
included.)
|__report1/0_0_0.parquet(1.5MB)
|__report2/0_0_0.parquet(1.5MB)
...
|__report36/0_0_0.parquet(1.5MB)


/reports2(50 subdirectories and 50 parquet files 
included.)
|__report37/0_0_0.parquet(1.5MB)
|__report38/0_0_0.parquet(1.5MB)
...
|__report88/0_0_0.parquet(1.5MB)


/reports1and2  (88 subdirectories and 88 parquet files, 
merged from the two directories above.)
|__report1/0_0_0.parquet(1.5MB)
|__report2/0_0_0.parquet(1.5MB)
...
|__report88/0_0_0.parquet(1.5MB)

*2. Error when running SELECT queries.*
*2.1 SELECT from reports1 succeeds.*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1/report_HCB*`;
61,994 rows selected (0.744 seconds)

*2.2 SELECT from reports2 succeeds.*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports2/report_HCB*`;
85,452 rows selected (0.743 seconds)

*{color:red}2.3 SELECT from reports1and2 fails.{color}*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1and2/report_HCB*`;
{color:red}Error: SYSTEM ERROR: NullPointerException

Fragment 1:1

[Error Id: 54595882-3767-4b0a-91c4-671b16b86fdf on 192.168.0.13:31010] 
(state=,code=0){color}


*3. Error log*

See attached error_log20170822.txt
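
A workaround worth trying in the meantime (an untested sketch; it assumes the 
NPE is specific to scanning the merged reports1and2 directory rather than to 
the data itself) is to union the two scans that succeed on their own:

-- Untested: combines the two working scans, avoiding reports1and2.
SELECT snp_id FROM db.`reports1/report_HCB*`
UNION ALL
SELECT snp_id FROM db.`reports2/report_HCB*`;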

  was:
*1. Parquet file structure to be queried.* There are three directories 
/reports1, /reports2, and /reports1and2.

/reports1(36 subdirectories and 36 parquet files 
included.)
|__report1/0_0_0.parquet(1.5MB)
|__report2/0_0_0.parquet(1.5MB)
...
|__report36/0_0_0.parquet(1.5MB)


/reports2(50 subdirectories and 50 parquet files 
included.)
|__report37/0_0_0.parquet(1.5MB)
|__report38/0_0_0.parquet(1.5MB)
...
|__report88/0_0_0.parquet(1.5MB)


/reports1and2  (88 subdirectories and 88 parquet files, 
merged from the two directories above.)
|__report1/0_0_0.parquet(1.5MB)
|__report2/0_0_0.parquet(1.5MB)
...
|__report88/0_0_0.parquet(1.5MB)

*2. Error when running SELECT queries.*
*2.1 SELECT from reports1 succeeds.*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1/report_HCB*`;
61,994 rows selected (0.744 seconds)

*2.2 SELECT from reports2 succeeds.*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports2/report_HCB*`;
85,452 rows selected (0.743 seconds)

*{color:red}2.3 SELECT from reports1and2 fails.{color}*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1and2/report_HCB*`;
{color:red}Error: SYSTEM ERROR: NullPointerException

Fragment 1:1

[Error Id: 54595882-3767-4b0a-91c4-671b16b86fdf on 192.168.0.13:31010] 
(state=,code=0){color}


*3. Error log*

2017-08-22 13:03:43,435 [266444d0-30ca-5bf6-5b85-4d147fcc9379:frag:1:1] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException

Fragment 1:1

[Error Id: a9e14b40-5ccb-424d-9c9b-fbc3cd397fe7 on 192.168.0.13:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
NullPointerException

Fragment 1:1

[Error Id: a9e14b40-5ccb-424d-9c9b-fbc3cd397fe7 on 192.168.0.13:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
 ~[drill-common-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
 [drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
 [drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0.jar:1.11.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_40]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_40]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation 
of [simple type, class org.apache.drill.exec.store.parquet.ParquetRowGroupScan] 
value failed (java.lang.NullPointerException): null
 at [Source: {
  "pop" : "single-sender",
  "@id" : 0,
  "receiver-major-fragment" : 0,
  "receiver-minor-fragment" : 0,
  "child" : {
"pop" : 

[jira] [Updated] (DRILL-5736) SYSTEM ERROR NullPointerException when selecting 88 parquet files(132MB)

2017-08-21 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated DRILL-5736:
-
Labels: multiple nullpointerexception parquet  (was: )

> SYSTEM ERROR NullPointerException  when selecting 88 parquet files(132MB)
> -
>
> Key: DRILL-5736
> URL: https://issues.apache.org/jira/browse/DRILL-5736
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
> Environment: java version "1.8.0_40"
> Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
> Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
> System Software Overview:
>   System Version: OS X 10.11.4 (15E65)
>   Kernel Version: Darwin 15.4.0
>   Boot Volume: Macintosh HD
>   Boot Mode: Normal
>   Computer Name: smallworld's MacBook Pro
>   User Name: smallworld (smallworld)
>   Secure Virtual Memory: Enabled
>   System Integrity Protection: Enabled
>   Time since boot: 1:38
>Reporter: Chris
>  Labels: multiple, nullpointerexception, parquet
>
> *1. Parquet file structure to be queried.* There are three directories 
> /reports1, /reports2, and /reports1and2.
> 
> /reports1(36 subdirectories and 36 parquet files 
> included.)
> |__report1/0_0_0.parquet(1.5MB)
> |__report2/0_0_0.parquet(1.5MB)
> ...
> |__report36/0_0_0.parquet(1.5MB)
> /reports2(50 subdirectories and 50 parquet files 
> included.)
> |__report37/0_0_0.parquet(1.5MB)
> |__report38/0_0_0.parquet(1.5MB)
> ...
> |__report88/0_0_0.parquet(1.5MB)
> /reports1and2  (88 subdirectories and 88 parquet files, 
> merged from the two directories above.)
> |__report1/0_0_0.parquet(1.5MB)
> |__report2/0_0_0.parquet(1.5MB)
> ...
> |__report88/0_0_0.parquet(1.5MB)
> *2. Error when running SELECT queries.*
> *2.1 SELECT from reports1 succeeds.*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1/report_HCB*`;
> 61,994 rows selected (0.744 seconds)
> *2.2 SELECT from reports2 succeeds.*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports2/report_HCB*`;
> 85,452 rows selected (0.743 seconds)
> *{color:red}2.3 SELECT from reports1and2 fails.{color}*
> 0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1and2/report_HCB*`;
> {color:red}Error: SYSTEM ERROR: NullPointerException
> Fragment 1:1
> [Error Id: 54595882-3767-4b0a-91c4-671b16b86fdf on 192.168.0.13:31010] 
> (state=,code=0){color}
> *3. Error log*
> 2017-08-22 13:03:43,435 [266444d0-30ca-5bf6-5b85-4d147fcc9379:frag:1:1] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 1:1
> [Error Id: a9e14b40-5ccb-424d-9c9b-fbc3cd397fe7 on 192.168.0.13:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 1:1
> [Error Id: a9e14b40-5ccb-424d-9c9b-fbc3cd397fe7 on 192.168.0.13:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
>  ~[drill-common-1.11.0.jar:1.11.0]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
>  [drill-java-exec-1.11.0.jar:1.11.0]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.11.0.jar:1.11.0]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
>  [drill-java-exec-1.11.0.jar:1.11.0]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0.jar:1.11.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_40]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_40]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation 
> of [simple type, class 
> org.apache.drill.exec.store.parquet.ParquetRowGroupScan] value failed 
> (java.lang.NullPointerException): null
>  at [Source: {
>   "pop" : "single-sender",
>   "@id" : 0,
>   "receiver-major-fragment" : 0,
>   "receiver-minor-fragment" : 0,
>   "child" : {
> "pop" : "parquet-row-group-scan",
> "@id" : 1,
> "userName" : "smallworld",
> "storage" : {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "workspaces" : {
> "root" : {
>   "location" : "/Users/sw/gdb_v1",
>   "writable" : true,
>   "defaultInputFormat" : null
> },
>   

[jira] [Created] (DRILL-5736) SYSTEM ERROR NullPointerException when selecting 88 parquet files(132MB)

2017-08-21 Thread Chris (JIRA)
Chris created DRILL-5736:


 Summary: SYSTEM ERROR NullPointerException  when selecting 88 
parquet files(132MB)
 Key: DRILL-5736
 URL: https://issues.apache.org/jira/browse/DRILL-5736
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.11.0
 Environment: java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)

System Software Overview:

  System Version: OS X 10.11.4 (15E65)
  Kernel Version: Darwin 15.4.0
  Boot Volume: Macintosh HD
  Boot Mode: Normal
  Computer Name: smallworld's MacBook Pro
  User Name: smallworld (smallworld)
  Secure Virtual Memory: Enabled
  System Integrity Protection: Enabled
  Time since boot: 1:38
Reporter: Chris


*1. Parquet file structure to be queried.* There are three directories 
/reports1, /reports2, and /reports1and2.
/reports1  (36 subdirectories and 36 parquet files 
included.)
|__report1
|  |__0_0_0.parquet(1.5MB)
|__report2
|  |__0_0_0.parquet(1.5MB)
...
|__report36
|  |__0_0_0.parquet(1.5MB)


/reports2  (50 subdirectories and 50 parquet files 
included.)
|__report37
|  |__0_0_0.parquet(1.5MB)
|__report38
|  |__0_0_0.parquet(1.5MB)
...
|__report88
|  |__0_0_0.parquet(1.5MB)


/reports1and2  (88 subdirectories and 88 parquet files, 
merged from the two directories above.)
|__report1
|  |__0_0_0.parquet(1.5MB)
|__report2
|  |__0_0_0.parquet(1.5MB)
...
|__report88
|  |__0_0_0.parquet(1.5MB)

*2. Error when running SELECT queries.*
*2.1 SELECT from reports1 succeeds.*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1/report_HCB*`;
61,994 rows selected (0.744 seconds)

*2.2 SELECT from reports2 succeeds.*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports2/report_HCB*`;
85,452 rows selected (0.743 seconds)

*{color:red}2.3 SELECT from reports1and2 fails.{color}*
0: jdbc:drill:zk=local> SELECT snp_id from db.`reports1and2/report_HCB*`;
{color:red}Error: SYSTEM ERROR: NullPointerException

Fragment 1:1

[Error Id: 54595882-3767-4b0a-91c4-671b16b86fdf on 192.168.0.13:31010] 
(state=,code=0){color}


*3. Error log*

2017-08-22 13:03:43,435 [266444d0-30ca-5bf6-5b85-4d147fcc9379:frag:1:1] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException

Fragment 1:1

[Error Id: a9e14b40-5ccb-424d-9c9b-fbc3cd397fe7 on 192.168.0.13:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
NullPointerException

Fragment 1:1

[Error Id: a9e14b40-5ccb-424d-9c9b-fbc3cd397fe7 on 192.168.0.13:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
 ~[drill-common-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
 [drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
 [drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0.jar:1.11.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_40]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_40]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation 
of [simple type, class org.apache.drill.exec.store.parquet.ParquetRowGroupScan] 
value failed (java.lang.NullPointerException): null
 at [Source: {
  "pop" : "single-sender",
  "@id" : 0,
  "receiver-major-fragment" : 0,
  "receiver-minor-fragment" : 0,
  "child" : {
"pop" : "parquet-row-group-scan",
"@id" : 1,
"userName" : "smallworld",
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "file:///",
  "config" : null,
  "workspaces" : {
"root" : {
  "location" : "/Users/sw/gdb_v1",
  "writable" : true,
  "defaultInputFormat" : null
},
"tmp" : {
  "location" : "/tmp",
  "writable" : true,
  "defaultInputFormat" : null
},
"test" : {
  "location" : "/Users/sw/gdb_v1/test",
  "writable" : true,
  "defaultInputFormat" : null
},
"db" : {
  "location" : "/Users/sw/gdb_v1/db",
  "writable" : true,
  "defaultInputFormat" : null
},
"tpr" : {
  

[jira] [Commented] (DRILL-5733) Unable to SELECT from parquet file with Hadoop 2.7.4

2017-08-21 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135947#comment-16135947
 ] 

Kunal Khatua commented on DRILL-5733:
-

Thanks, [~lammic].

[~arina], can you have someone take a quick look at this? 
It looks like HDFS-10673 
([commit|https://github.com/apache/hadoop/commit/a39a9fc46bb8536b68b91b41c2a0293c27683828]) 
affected this.

> Unable to SELECT from parquet file with Hadoop 2.7.4
> 
>
> Key: DRILL-5733
> URL: https://issues.apache.org/jira/browse/DRILL-5733
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Michele Lamarca
>Assignee: Arina Ielchiieva
>
> {{SELECT * FROM hdfs.`/user/drill/nation.parquet`;}} fails with Hadoop 2.7.4 
> with {noformat}
> 1/2  SELECT * FROM hdfs.`/user/drill/nation.parquet`;
> Error: SYSTEM ERROR: RemoteException: /user/drill/nation.parquet (is
> not a directory)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:272)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:215)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:199)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1752)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:100)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3820)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1012)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:855)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)
> {noformat}
> The query executes correctly with Hadoop 2.7.3, while it fails with:
> - Hadoop 2.7.4 with Drill 1.11 (default pom.xml)
> - Hadoop 2.7.4 with Drill 1.11 (with -Dhadoop.version=2.7.4)
> - Hadoop 2.8.0 with Drill 1.11 (default pom.xml)
> - Hadoop 3.0.0-alpha4 with Drill 1.11 (default pom.xml)
> It thus looks related to https://issues.apache.org/jira/browse/HDFS-10673.
> A temporary workaround is to query an enclosing directory, as 
> suggested by [~kkhatua] on the drill-user mailing list.
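> As a concrete sketch of the workaround (note that querying the directory 
> scans every file inside it, not just nation.parquet):
> {noformat}
> -- Fails on Hadoop 2.7.4 when pointing at the file directly:
> SELECT * FROM hdfs.`/user/drill/nation.parquet`;
> -- Query the enclosing directory instead:
> SELECT * FROM hdfs.`/user/drill`;
> {noformat}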
> Relevant stack trace from the drillbit log:
> {noformat}
> 2017-08-19 09:00:45,570 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 26681de9-2b48-2c3a-cc7c-2c7ceeb1beae: SELECT * FROM 
> hdfs.`/user/drill/nation.parquet`
> 2017-08-19 09:00:45,571 [UserServer-1] WARN  
> o.a.drill.exec.rpc.user.UserServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 7137ms.
> 2017-08-19 09:00:45,617 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 7 classes for 
> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
> 2017-08-19 09:00:45,618 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> org.apache.drill.common.logical.FormatPluginConfig took 0ms
> 2017-08-19 09:00:45,619 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> org.apache.drill.common.logical.FormatPluginConfig took 0ms
> 2017-08-19 09:00:45,619 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> org.apache.drill.common.logical.FormatPluginConfig took 0ms
> 2017-08-19 09:00:45,648 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 7 classes for 
> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
> 2017-08-19 09:00:45,649 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> 

[jira] [Assigned] (DRILL-5733) Unable to SELECT from parquet file with Hadoop 2.7.4

2017-08-21 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-5733:
---

Assignee: Arina Ielchiieva

> Unable to SELECT from parquet file with Hadoop 2.7.4
> 
>
> Key: DRILL-5733
> URL: https://issues.apache.org/jira/browse/DRILL-5733
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Michele Lamarca
>Assignee: Arina Ielchiieva
>
> {{SELECT * FROM hdfs.`/user/drill/nation.parquet`;}} fails with Hadoop 2.7.4 
> with {noformat}
> 1/2  SELECT * FROM hdfs.`/user/drill/nation.parquet`;
> Error: SYSTEM ERROR: RemoteException: /user/drill/nation.parquet (is
> not a directory)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:272)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:215)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:199)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1752)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:100)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3820)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1012)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:855)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)
> {noformat}
> The query executes correctly with Hadoop 2.7.3, while it fails with:
> - Hadoop 2.7.4 with Drill 1.11 (default pom.xml)
> - Hadoop 2.7.4 with Drill 1.11 (with -Dhadoop.version=2.7.4)
> - Hadoop 2.8.0 with Drill 1.11 (default pom.xml)
> - Hadoop 3.0.0-alpha4 with Drill 1.11 (default pom.xml)
> It thus looks related to https://issues.apache.org/jira/browse/HDFS-10673.
> A temporary workaround is to query an enclosing directory, as 
> suggested by [~kkhatua] on the drill-user mailing list.
> Relevant stack trace from the drillbit log:
> {noformat}
> 2017-08-19 09:00:45,570 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 26681de9-2b48-2c3a-cc7c-2c7ceeb1beae: SELECT * FROM 
> hdfs.`/user/drill/nation.parquet`
> 2017-08-19 09:00:45,571 [UserServer-1] WARN  
> o.a.drill.exec.rpc.user.UserServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 7137ms.
> 2017-08-19 09:00:45,617 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 7 classes for 
> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
> 2017-08-19 09:00:45,618 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> org.apache.drill.common.logical.FormatPluginConfig took 0ms
> 2017-08-19 09:00:45,619 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> org.apache.drill.common.logical.FormatPluginConfig took 0ms
> 2017-08-19 09:00:45,619 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> org.apache.drill.common.logical.FormatPluginConfig took 0ms
> 2017-08-19 09:00:45,648 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 7 classes for 
> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
> 2017-08-19 09:00:45,649 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> org.apache.drill.common.logical.FormatPluginConfig took 0ms
> 2017-08-19 09:00:45,649 [26681de9-2b48-2c3a-cc7c-2c7ceeb1beae:foreman] INFO  
> o.a.d.c.s.persistence.ScanResult - loading 8 classes for 
> 

[jira] [Commented] (DRILL-5725) Update Jackson version to 2.7.8

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135720#comment-16135720
 ] 

ASF GitHub Bot commented on DRILL-5725:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/908
  
A careful read of the [Maven dependency 
mechanism](https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html)
 shows that, in general, we can have conflicts. We would have a conflict if the 
Drill root pom.xml added Jackson 2.7.8, but some other project pulled in an 
earlier (or, eventually, later) version. Since the nearest definition wins, 
that project's dependency would prevail for that project, resulting in two 
copies of Jackson appearing in Drill's build. We've run into such problems 
multiple times.

But, since that is not the case here, this change is fine.

+1


> Update Jackson version to 2.7.8
> ---
>
> Key: DRILL-5725
> URL: https://issues.apache.org/jira/browse/DRILL-5725
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>
> Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update the 
> Jackson version to 2.7.8.
> All Jackson 2.7.x versions before 2.7.8 are subject to the [CVE-2016-7051 
> vulnerability|https://nvd.nist.gov/vuln/detail/CVE-2016-7051]. 
> The problem was in the {{jackson-dataformat-xml}} module 
> ([issue-211|https://github.com/FasterXML/jackson-dataformat-xml/issues/211]). 
> Drill does not use this module yet, but we want to update the version in case 
> we start using it.





[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135708#comment-16135708
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134298357
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
 }
   }
 
-  private void releaseAssets() {
-container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-for (final ValueVector v : mutator.fieldVectorMap().values()) {
-  v.clear();
-}
-  }
-
   @Override
   public IterOutcome next() {
 if (done) {
   return IterOutcome.NONE;
 }
 oContext.getStats().startProcessing();
 try {
-  try {
-injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
-
-currentReader.allocate(mutator.fieldVectorMap());
-  } catch (OutOfMemoryException e) {
-clearFieldVectorMap();
-throw UserException.memoryError(e).build(logger);
-  }
-  while ((recordCount = currentReader.next()) == 0) {
+  while (true) {
 try {
-  if (!readers.hasNext()) {
-// We're on the last reader, and it has no (more) rows.
-currentReader.close();
-releaseAssets();
-done = true;  // have any future call to next() return NONE
-
-if (mutator.isNewSchema()) {
-  // This last reader has a new schema (e.g., we have a 
zero-row
-  // file or other source).  (Note that some sources have a 
non-
-  // null/non-trivial schema even when there are no rows.)
+  injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
+  currentReader.allocate(mutator.fieldVectorMap());
+} catch (OutOfMemoryException e) {
+  clearFieldVectorMap();
+  throw UserException.memoryError(e).build(logger);
+}
 
-  container.buildSchema(SelectionVectorMode.NONE);
-  schema = container.getSchema();
+recordCount = currentReader.next();
+Preconditions.checkArgument(recordCount >= 0,
+"recordCount from RecordReader.next() should not be negative");
 
-  return IterOutcome.OK_NEW_SCHEMA;
-}
-return IterOutcome.NONE;
-  }
-  // At this point, the reader that hit its end is not the last 
reader.
+boolean isNewRegularSchema = mutator.isNewSchema();
+// We should skip the reader, when recordCount = 0 && ! 
isNewRegularSchema.
+// Add/set implicit column vectors, only when reader gets > 0 row, 
or
+// when reader gets 0 row but with a schema with new field added
+if (recordCount > 0 || isNewRegularSchema) {
+  addImplicitVectors();
+  populateImplicitVectors();
+}
 
-  // If all the files we have read so far are just empty, the 
schema is not useful
-  if (! hasReadNonEmptyFile) {
-container.clear();
-clearFieldVectorMap();
-mutator.clear();
-  }
+boolean isNewImplicitSchema = mutator.isNewSchema();
+for (VectorWrapper w : container) {
+  w.getValueVector().getMutator().setValueCount(recordCount);
+}
+final boolean isNewSchema = isNewRegularSchema || 
isNewImplicitSchema;
+oContext.getStats().batchReceived(0, recordCount, isNewSchema);
 
+if (recordCount == 0) {
   currentReader.close();
-  currentReader = readers.next();
-  implicitValues = implicitColumns.hasNext() ? 
implicitColumns.next() : null;
-  currentReader.setup(oContext, mutator);
-  try {
-currentReader.allocate(mutator.fieldVectorMap());
-  } catch (OutOfMemoryException e) {
-clearFieldVectorMap();
-throw UserException.memoryError(e).build(logger);
+  if (isNewSchema) {
+// current reader presents a new schema in mutator even though 
it has 0 row.
+// This could happen when data sources have a non-trivial 
schema with 0 row.
+container.buildSchema(SelectionVectorMode.NONE);
+schema = container.getSchema();
+if (readers.hasNext()) {
--- End diff --

This code is pretty convoluted. Would be 

[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135704#comment-16135704
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134297430
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
 }
   }
 
-  private void releaseAssets() {
-container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-for (final ValueVector v : mutator.fieldVectorMap().values()) {
-  v.clear();
-}
-  }
-
   @Override
   public IterOutcome next() {
 if (done) {
   return IterOutcome.NONE;
 }
 oContext.getStats().startProcessing();
 try {
-  try {
-injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
-
-currentReader.allocate(mutator.fieldVectorMap());
-  } catch (OutOfMemoryException e) {
-clearFieldVectorMap();
-throw UserException.memoryError(e).build(logger);
-  }
-  while ((recordCount = currentReader.next()) == 0) {
+  while (true) {
 try {
-  if (!readers.hasNext()) {
-// We're on the last reader, and it has no (more) rows.
-currentReader.close();
-releaseAssets();
-done = true;  // have any future call to next() return NONE
-
-if (mutator.isNewSchema()) {
-  // This last reader has a new schema (e.g., we have a 
zero-row
-  // file or other source).  (Note that some sources have a 
non-
-  // null/non-trivial schema even when there are no rows.)
+  injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
+  currentReader.allocate(mutator.fieldVectorMap());
+} catch (OutOfMemoryException e) {
+  clearFieldVectorMap();
+  throw UserException.memoryError(e).build(logger);
+}
 
-  container.buildSchema(SelectionVectorMode.NONE);
-  schema = container.getSchema();
+recordCount = currentReader.next();
+Preconditions.checkArgument(recordCount >= 0,
+"recordCount from RecordReader.next() should not be negative");
 
-  return IterOutcome.OK_NEW_SCHEMA;
-}
-return IterOutcome.NONE;
-  }
-  // At this point, the reader that hit its end is not the last 
reader.
+boolean isNewRegularSchema = mutator.isNewSchema();
+// We should skip the reader, when recordCount = 0 && ! 
isNewRegularSchema.
+// Add/set implicit column vectors, only when reader gets > 0 row, 
or
+// when reader gets 0 row but with a schema with new field added
+if (recordCount > 0 || isNewRegularSchema) {
+  addImplicitVectors();
+  populateImplicitVectors();
+}
 
-  // If all the files we have read so far are just empty, the 
schema is not useful
-  if (! hasReadNonEmptyFile) {
-container.clear();
-clearFieldVectorMap();
-mutator.clear();
-  }
+boolean isNewImplicitSchema = mutator.isNewSchema();
--- End diff --

The implicit schema will change only if the new file is in a different 
directory than the previous file. The implicit columns themselves are fixed 
(`filename`, etc.); only the `dir0`, `dir1` columns can change.

Does Drill allow combining files from different directory levels into a 
single scan? If so, don't we have a trivial schema change problem? If the scan 
decides to scan `a/b/c.csv` before, say, `a/b/d/e.csv`, then we get a trivial 
schema change on the second file when we add the `dir2` column. Better up-front 
analysis of the collection of paths would avoid this problem.

If we avoid the `dirx` problem, then the implicit schema is constant for 
all readers (the values of the columns, of course, differ), so the 
`isNewImplicitSchema` logic can be dropped.
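
A small SQL illustration of that `dirx` concern (hypothetical paths and column 
names; it assumes Drill's standard implicit directory columns):

-- Files at depth 2 under /data expose dir0 and dir1:
SELECT dir0, dir1, a FROM dfs.`/data`;
-- If the same scan also picks up the deeper file a/b/d/e.csv, a dir2 column
-- appears (null for the shallower files): the trivial schema change above.
SELECT dir0, dir1, dir2, a FROM dfs.`/data`;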


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failure caused by 
> empty batch. This JIRA is opened as an 

[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135702#comment-16135702
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134300698
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
 }
   }
 
-  private void releaseAssets() {
-container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-for (final ValueVector v : mutator.fieldVectorMap().values()) {
-  v.clear();
-}
-  }
-
   @Override
   public IterOutcome next() {
 if (done) {
   return IterOutcome.NONE;
 }
 oContext.getStats().startProcessing();
 try {
-  try {
-injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
-
-currentReader.allocate(mutator.fieldVectorMap());
-  } catch (OutOfMemoryException e) {
-clearFieldVectorMap();
-throw UserException.memoryError(e).build(logger);
-  }
-  while ((recordCount = currentReader.next()) == 0) {
+  while (true) {
 try {
-  if (!readers.hasNext()) {
-// We're on the last reader, and it has no (more) rows.
-currentReader.close();
-releaseAssets();
-done = true;  // have any future call to next() return NONE
-
-if (mutator.isNewSchema()) {
-  // This last reader has a new schema (e.g., we have a 
zero-row
-  // file or other source).  (Note that some sources have a 
non-
-  // null/non-trivial schema even when there are no rows.)
+  injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
--- End diff --

The code in `ScanBatch` changed significantly -- looks like a very good 
improvement.

However, I could not readily find unit tests that execute all the complex 
new code paths. Can you perhaps point out the tests? Otherwise, as a reviewer, 
I find myself acting as the unit tests; I must "mentally execute" the code 
paths for all scenarios I can imagine. This is slow and will lead to many, many 
comments as I try to think through each and every step. Plus, the situation is 
made harder by the fact that code is duplicated along various execution 
branches.


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failure caused by 
> empty batch. This JIRA is opened as an umbrella for all those related JIRAS ( 
> such as DRILL-4686, DRILL-4734, DRILL4476, DRILL-4255, etc).
>  





[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135711#comment-16135711
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134296622
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
 }
   }
 
-  private void releaseAssets() {
-container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-for (final ValueVector v : mutator.fieldVectorMap().values()) {
-  v.clear();
-}
-  }
-
   @Override
   public IterOutcome next() {
 if (done) {
   return IterOutcome.NONE;
 }
 oContext.getStats().startProcessing();
 try {
-  try {
-injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
-
-currentReader.allocate(mutator.fieldVectorMap());
-  } catch (OutOfMemoryException e) {
-clearFieldVectorMap();
-throw UserException.memoryError(e).build(logger);
-  }
-  while ((recordCount = currentReader.next()) == 0) {
+  while (true) {
 try {
-  if (!readers.hasNext()) {
-// We're on the last reader, and it has no (more) rows.
-currentReader.close();
-releaseAssets();
-done = true;  // have any future call to next() return NONE
-
-if (mutator.isNewSchema()) {
-  // This last reader has a new schema (e.g., we have a 
zero-row
-  // file or other source).  (Note that some sources have a 
non-
-  // null/non-trivial schema even when there are no rows.)
+  injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
+  currentReader.allocate(mutator.fieldVectorMap());
+} catch (OutOfMemoryException e) {
+  clearFieldVectorMap();
+  throw UserException.memoryError(e).build(logger);
+}
 
-  container.buildSchema(SelectionVectorMode.NONE);
-  schema = container.getSchema();
+recordCount = currentReader.next();
+Preconditions.checkArgument(recordCount >= 0,
+"recordCount from RecordReader.next() should not be negative");
 
-  return IterOutcome.OK_NEW_SCHEMA;
-}
-return IterOutcome.NONE;
-  }
-  // At this point, the reader that hit its end is not the last 
reader.
+boolean isNewRegularSchema = mutator.isNewSchema();
--- End diff --

`isNewRegularSchema` --> `isNewTableSchema`? This describes the table 
portion of the schema, as contrasted with the implicit part mentioned below.


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failure caused by 
> empty batch. This JIRA is opened as an umbrella for all those related JIRAS ( 
> such as DRILL-4686, DRILL-4734, DRILL4476, DRILL-4255, etc).
>  





[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135709#comment-16135709
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134287754
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/Project.java 
---
@@ -46,14 +56,18 @@ public Project(@JsonProperty("exprs") 
List exprs, @JsonProperty
 return exprs;
   }
 
+  public boolean isOutputProj() {
--- End diff --

Maybe a comment to explain the purpose of this attribute? To quote from the 
PR description:

> Add a new flag 'outputProj' to Project operator, to indicate if Project 
is for the query's final output. Such Project is added by TopProjectVisitor, to 
handle fast NONE when all the inputs to the query are empty
and are skipped.


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failure caused by 
> empty batch. This JIRA is opened as an umbrella for all those related JIRAS ( 
> such as DRILL-4686, DRILL-4734, DRILL4476, DRILL-4255, etc).
>  





[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135712#comment-16135712
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134298612
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
 }
   }
 
-  private void releaseAssets() {
-container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-for (final ValueVector v : mutator.fieldVectorMap().values()) {
-  v.clear();
-}
-  }
-
   @Override
   public IterOutcome next() {
 if (done) {
   return IterOutcome.NONE;
 }
 oContext.getStats().startProcessing();
 try {
-  try {
-injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
-
-currentReader.allocate(mutator.fieldVectorMap());
-  } catch (OutOfMemoryException e) {
-clearFieldVectorMap();
-throw UserException.memoryError(e).build(logger);
-  }
-  while ((recordCount = currentReader.next()) == 0) {
+  while (true) {
 try {
-  if (!readers.hasNext()) {
-// We're on the last reader, and it has no (more) rows.
-currentReader.close();
-releaseAssets();
-done = true;  // have any future call to next() return NONE
-
-if (mutator.isNewSchema()) {
-  // This last reader has a new schema (e.g., we have a 
zero-row
-  // file or other source).  (Note that some sources have a 
non-
-  // null/non-trivial schema even when there are no rows.)
+  injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
+  currentReader.allocate(mutator.fieldVectorMap());
+} catch (OutOfMemoryException e) {
+  clearFieldVectorMap();
+  throw UserException.memoryError(e).build(logger);
+}
 
-  container.buildSchema(SelectionVectorMode.NONE);
-  schema = container.getSchema();
+recordCount = currentReader.next();
+Preconditions.checkArgument(recordCount >= 0,
+"recordCount from RecordReader.next() should not be negative");
 
-  return IterOutcome.OK_NEW_SCHEMA;
-}
-return IterOutcome.NONE;
-  }
-  // At this point, the reader that hit its end is not the last 
reader.
+boolean isNewRegularSchema = mutator.isNewSchema();
+// We should skip the reader, when recordCount = 0 && ! 
isNewRegularSchema.
+// Add/set implicit column vectors, only when reader gets > 0 row, 
or
+// when reader gets 0 row but with a schema with new field added
+if (recordCount > 0 || isNewRegularSchema) {
+  addImplicitVectors();
+  populateImplicitVectors();
+}
 
-  // If all the files we have read so far are just empty, the 
schema is not useful
-  if (! hasReadNonEmptyFile) {
-container.clear();
-clearFieldVectorMap();
-mutator.clear();
-  }
+boolean isNewImplicitSchema = mutator.isNewSchema();
+for (VectorWrapper w : container) {
+  w.getValueVector().getMutator().setValueCount(recordCount);
+}
+final boolean isNewSchema = isNewRegularSchema || 
isNewImplicitSchema;
+oContext.getStats().batchReceived(0, recordCount, isNewSchema);
 
+if (recordCount == 0) {
   currentReader.close();
-  currentReader = readers.next();
-  implicitValues = implicitColumns.hasNext() ? 
implicitColumns.next() : null;
-  currentReader.setup(oContext, mutator);
-  try {
-currentReader.allocate(mutator.fieldVectorMap());
-  } catch (OutOfMemoryException e) {
-clearFieldVectorMap();
-throw UserException.memoryError(e).build(logger);
+  if (isNewSchema) {
+// current reader presents a new schema in mutator even though 
it has 0 row.
+// This could happen when data sources have a non-trivial 
schema with 0 row.
+container.buildSchema(SelectionVectorMode.NONE);
+schema = container.getSchema();
+if (readers.hasNext()) {
+  advanceNextReader();
+} else {
+   

[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135710#comment-16135710
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134305681
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
 }
   }
 
-  private void releaseAssets() {
-container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-for (final ValueVector v : mutator.fieldVectorMap().values()) {
-  v.clear();
-}
-  }
-
   @Override
   public IterOutcome next() {
 if (done) {
   return IterOutcome.NONE;
 }
 oContext.getStats().startProcessing();
 try {
-  try {
-injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
-
-currentReader.allocate(mutator.fieldVectorMap());
-  } catch (OutOfMemoryException e) {
-clearFieldVectorMap();
-throw UserException.memoryError(e).build(logger);
-  }
-  while ((recordCount = currentReader.next()) == 0) {
+  while (true) {
 try {
-  if (!readers.hasNext()) {
-// We're on the last reader, and it has no (more) rows.
-currentReader.close();
-releaseAssets();
-done = true;  // have any future call to next() return NONE
-
-if (mutator.isNewSchema()) {
-  // This last reader has a new schema (e.g., we have a 
zero-row
-  // file or other source).  (Note that some sources have a 
non-
-  // null/non-trivial schema even when there are no rows.)
+  injector.injectChecked(context.getExecutionControls(), 
"next-allocate", OutOfMemoryException.class);
--- End diff --

From the PR description:

> 1. Skip a RecordReader if it returns 0 rows && presents the same schema. A new 
schema (detected by calling Mutator.isNewSchema()) means either a new top-level 
field is added, or a field in a nested field is added, or an existing field's 
type is changed.

The code, however, adds an additional condition: if implicit fields change. 
(But, as noted below, that should never occur in practice.)

What happens on the first reader? There is no schema, so *any* schema is a 
new schema. Suppose the file is JSON and the schema is built on the fly. Does 
the code handle the case that we have no schema (first reader), and that reader 
adds no columns?

Or, according to the logic that the downstream wants to know the schema, 
even if there are no records, do we send an empty schema (schema with no 
columns) downstream, because that is an accurate representation of an empty 
JSON file?

What happens in the case of an empty JSON file following a non-empty file? 
In this case, do we consider the empty schema a schema change relative to the 
previous non-empty schema?

In short, can we generalize this first rule a bit?

> 2. Implicit columns are added and populated only when the input is not 
empty, i.e. the batch contains > 0 rows, or rowCount == 0 && new schema.

How does this interact with a scan batch that has only one file, and that 
file is empty? Would we return the empty schema downstream? With the implicit 
columns?

> 3. ScanBatch will return NONE directly (called "fast NONE") if all 
its RecordReaders have empty input and thus are skipped, instead of returning 
OK_NEW_SCHEMA first.


This is just a bit ambiguous. If the reader is JSON, then an empty file has 
an empty schema (for the reasons cited above).

But, if the input is CSV, then we *always* have a schema. If the file has 
column headers, then we know that the schema is, say, (a, b, c) because those 
are the headers. Or, if the file has no headers, the schema is always the 
`columns` array. So, should we send that schema downstream? If so, should it 
include the implicit columns?
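
For reference, a sketch of the two CSV shapes mentioned (header handling 
depends on the text format plugin's extractHeader option):

-- Without headers, each row arrives as the implicit `columns` array:
SELECT columns[0], columns[1] FROM dfs.`/data/foo.csv`;
-- With extractHeader enabled, the header names (a, b, c) become the schema:
SELECT a, b, c FROM dfs.`/data/foo.csv`;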

This, in fact, raises another issue (out of scope for this PR): if we 
return an empty batch with non-empty schema, we have no place to attach the 
implicit columns that will allow the user to figure out that, say, "foo.csv" is 
empty.

On the other hand, if we say that an empty CSV file has no schema, then we 
can skip that file. The same might be true of JSON. What about Parquet? We'd 
have a schema even if there are no rows. Same with JDBC. Should we return this 
schema, even if the data is empty?

Finally, do we need special handling for "null" files: a file that no 
longer exists on disk and so has a completely undefined 

[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135707#comment-16135707
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134301372
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
     }
   }
 
-  private void releaseAssets() {
-    container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-    for (final ValueVector v : mutator.fieldVectorMap().values()) {
-      v.clear();
-    }
-  }
-
   @Override
   public IterOutcome next() {
     if (done) {
       return IterOutcome.NONE;
     }
     oContext.getStats().startProcessing();
     try {
-      try {
-        injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
-
-        currentReader.allocate(mutator.fieldVectorMap());
-      } catch (OutOfMemoryException e) {
-        clearFieldVectorMap();
-        throw UserException.memoryError(e).build(logger);
-      }
-      while ((recordCount = currentReader.next()) == 0) {
+      while (true) {
         try {
-          if (!readers.hasNext()) {
-            // We're on the last reader, and it has no (more) rows.
-            currentReader.close();
-            releaseAssets();
-            done = true;  // have any future call to next() return NONE
-
-            if (mutator.isNewSchema()) {
-              // This last reader has a new schema (e.g., we have a zero-row
-              // file or other source).  (Note that some sources have a non-
-              // null/non-trivial schema even when there are no rows.)
+          injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
+          currentReader.allocate(mutator.fieldVectorMap());
+        } catch (OutOfMemoryException e) {
+          clearFieldVectorMap();
+          throw UserException.memoryError(e).build(logger);
+        }
 
-              container.buildSchema(SelectionVectorMode.NONE);
-              schema = container.getSchema();
+        recordCount = currentReader.next();
+        Preconditions.checkArgument(recordCount >= 0,
+            "recordCount from RecordReader.next() should not be negative");
 
-              return IterOutcome.OK_NEW_SCHEMA;
-            }
-            return IterOutcome.NONE;
-          }
-          // At this point, the reader that hit its end is not the last reader.
+        boolean isNewRegularSchema = mutator.isNewSchema();
+        // We should skip the reader, when recordCount = 0 && !isNewRegularSchema.
+        // Add/set implicit column vectors, only when reader gets > 0 row, or
+        // when reader gets 0 row but with a schema with new field added
+        if (recordCount > 0 || isNewRegularSchema) {
+          addImplicitVectors();
+          populateImplicitVectors();
+        }
 
-          // If all the files we have read so far are just empty, the schema is not useful
-          if (! hasReadNonEmptyFile) {
-            container.clear();
-            clearFieldVectorMap();
-            mutator.clear();
-          }
+        boolean isNewImplicitSchema = mutator.isNewSchema();
+        for (VectorWrapper w : container) {
+          w.getValueVector().getMutator().setValueCount(recordCount);
+        }
+        final boolean isNewSchema = isNewRegularSchema || isNewImplicitSchema;
+        oContext.getStats().batchReceived(0, recordCount, isNewSchema);
 
+        if (recordCount == 0) {
           currentReader.close();
-          currentReader = readers.next();
-          implicitValues = implicitColumns.hasNext() ? implicitColumns.next() : null;
-          currentReader.setup(oContext, mutator);
-          try {
-            currentReader.allocate(mutator.fieldVectorMap());
-          } catch (OutOfMemoryException e) {
-            clearFieldVectorMap();
-            throw UserException.memoryError(e).build(logger);
+          if (isNewSchema) {
+            // current reader presents a new schema in mutator even though it has 0 row.
--- End diff --

Thanks for the comments here and in the PR. I wonder: can the PR 
description be moved into a class-level Javadoc comment so that it is available 
to future readers? That would also make review easier, since the description 
would be close to the code. (A rough sketch follows below.)
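For instance, the class-level comment being asked for might look roughly like this; the wording is hypothetical and abbreviated, with the real content coming from the PR description:

```
/**
 * Record batch that drives a set of record readers for a scan.
 *
 * Empty-input semantics (sketch; the full rules come from the PR description):
 * 1. A batch is non-empty if rowCount > 0, or rowCount == 0 with a new schema.
 * 2. A reader with no rows and no new schema is skipped.
 * 3. If every reader is skipped, NONE is returned directly ("fast NONE")
 *    instead of OK_NEW_SCHEMA first.
 */
public class ScanBatch {
  // ...
}
```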


> Schema change problems caused by empty batch
> 

[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135703#comment-16135703
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134297885
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
     }
   }
 
-  private void releaseAssets() {
-    container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-    for (final ValueVector v : mutator.fieldVectorMap().values()) {
-      v.clear();
-    }
-  }
-
   @Override
   public IterOutcome next() {
     if (done) {
       return IterOutcome.NONE;
     }
     oContext.getStats().startProcessing();
     try {
-      try {
-        injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
-
-        currentReader.allocate(mutator.fieldVectorMap());
-      } catch (OutOfMemoryException e) {
-        clearFieldVectorMap();
-        throw UserException.memoryError(e).build(logger);
-      }
-      while ((recordCount = currentReader.next()) == 0) {
+      while (true) {
         try {
-          if (!readers.hasNext()) {
-            // We're on the last reader, and it has no (more) rows.
-            currentReader.close();
-            releaseAssets();
-            done = true;  // have any future call to next() return NONE
-
-            if (mutator.isNewSchema()) {
-              // This last reader has a new schema (e.g., we have a zero-row
-              // file or other source).  (Note that some sources have a non-
-              // null/non-trivial schema even when there are no rows.)
+          injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
+          currentReader.allocate(mutator.fieldVectorMap());
+        } catch (OutOfMemoryException e) {
+          clearFieldVectorMap();
+          throw UserException.memoryError(e).build(logger);
+        }
 
-              container.buildSchema(SelectionVectorMode.NONE);
-              schema = container.getSchema();
+        recordCount = currentReader.next();
+        Preconditions.checkArgument(recordCount >= 0,
+            "recordCount from RecordReader.next() should not be negative");
 
-              return IterOutcome.OK_NEW_SCHEMA;
-            }
-            return IterOutcome.NONE;
-          }
-          // At this point, the reader that hit its end is not the last reader.
+        boolean isNewRegularSchema = mutator.isNewSchema();
+        // We should skip the reader, when recordCount = 0 && !isNewRegularSchema.
+        // Add/set implicit column vectors, only when reader gets > 0 row, or
+        // when reader gets 0 row but with a schema with new field added
+        if (recordCount > 0 || isNewRegularSchema) {
+          addImplicitVectors();
+          populateImplicitVectors();
+        }
 
-          // If all the files we have read so far are just empty, the schema is not useful
-          if (! hasReadNonEmptyFile) {
-            container.clear();
-            clearFieldVectorMap();
-            mutator.clear();
-          }
+        boolean isNewImplicitSchema = mutator.isNewSchema();
+        for (VectorWrapper w : container) {
+          w.getValueVector().getMutator().setValueCount(recordCount);
+        }
+        final boolean isNewSchema = isNewRegularSchema || isNewImplicitSchema;
+        oContext.getStats().batchReceived(0, recordCount, isNewSchema);
 
+        if (recordCount == 0) {
          currentReader.close();
-          currentReader = readers.next();
-          implicitValues = implicitColumns.hasNext() ? implicitColumns.next() : null;
-          currentReader.setup(oContext, mutator);
-          try {
-            currentReader.allocate(mutator.fieldVectorMap());
-          } catch (OutOfMemoryException e) {
-            clearFieldVectorMap();
-            throw UserException.memoryError(e).build(logger);
+          if (isNewSchema) {
+            // current reader presents a new schema in mutator even though it has 0 row.
+            // This could happen when data sources have a non-trivial schema with 0 row.
+            container.buildSchema(SelectionVectorMode.NONE);
+            schema = container.getSchema();
+            if (readers.hasNext()) {
+              advanceNextReader();
+            } else {

[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135706#comment-16135706
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134299504
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -329,9 +326,11 @@ public TypedFieldId getValueVectorId(SchemaPath path) {
 
   @VisibleForTesting
   public static class Mutator implements OutputMutator {
-    /** Whether schema has changed since last inquiry (via {@link #isNewSchema}).  Is
-     *  true before first inquiry. */
-    private boolean schemaChanged = true;
+    /** Flag keeping track whether top-level schema has changed since last inquiry (via {@link #isNewSchema}).
+     * It's initialized to false, or reset to false after {@link #isNewSchema} or after {@link #clear}, until a new value vector
+     * or a value vector with different type is added to fieldVectorMap.
+     **/
+    private boolean schemaChanged;
--- End diff --

Using a flag is very messy. The new version uses a counter: an observer 
simply remembers the previous count and compares it against the current count. 
This allows multiple observers without negotiating over which one is responsible 
for resetting the flag. (A minimal sketch follows below.)
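A minimal, self-contained sketch of the counter idea; the class names are invented, and the real Mutator API differs:

```
// Invented names; illustrates only the counter-vs-flag idea.
class SchemaTracker {
  private int schemaVersion;

  void onSchemaChange() { schemaVersion++; }  // bumped on every change

  int version() { return schemaVersion; }     // never reset
}

class SchemaObserver {
  private int lastSeenVersion;

  // Each observer keeps its own last-seen version, so multiple observers
  // can coexist without fighting over who resets a shared flag.
  boolean sawNewSchema(SchemaTracker tracker) {
    boolean changed = tracker.version() != lastSeenVersion;
    lastSeenVersion = tracker.version();
    return changed;
  }
}
```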


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failure caused by 
> empty batch. This JIRA is opened as an umbrella for all those related JIRAS ( 
> such as DRILL-4686, DRILL-4734, DRILL4476, DRILL-4255, etc).
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135705#comment-16135705
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134299304
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -252,14 +235,28 @@ public IterOutcome next() {
     }
   }
 
+  private void releaseAssets() {
+    container.zeroVectors();
+  }
+
+  private void clearFieldVectorMap() {
+    for (final ValueVector v : mutator.fieldVectorMap().values()) {
+      v.clear();
+    }
+  }
+
+  private void advanceNextReader() throws ExecutionSetupException {
+    currentReader = readers.next();
+    implicitValues = implicitColumns.hasNext() ? implicitColumns.next() : null;
+    currentReader.setup(oContext, mutator);
--- End diff --

This seems somewhat unreliable. In the `next()` method above, we have, say, 
a reader with a new schema that returned 0 rows. We want to return the 
container, with that new schema, downstream.

Before we do, we set up the next reader, passing it the mutator. Suppose 
the reader decides to set up its schema in the mutator? Doesn't that add noise 
to the signal we want to send downstream?

Conversely, do readers know to *not* touch the mutator in `setup()` and 
instead defer schema setup to the first call to `next()`? Doesn't that make 
readers rather more complicated than they need to be?

Of course, I could be missing something. In that case, a bit of comment to 
explain the protocol would be greatly appreciated! (A sketch of one possible 
contract follows below.)
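One possible shape for such a protocol, sketched with invented stub types; this is an assumption about how the contract could read, not what the PR implements:

```
// Invented stub; stands in for Drill's OutputMutator.
interface MutatorStub {
  void declareSchema(String... columns);
}

class DeferredSchemaReader {
  private MutatorStub mutator;
  private boolean schemaDeclared;

  // setup() only stores references and must not touch the mutator, so
  // advancing to the next reader cannot disturb the batch about to be
  // sent downstream.
  void setup(MutatorStub mutator) {
    this.mutator = mutator;
  }

  // All schema work is deferred to the first call to next().
  int next() {
    if (!schemaDeclared) {
      mutator.declareSchema("a", "b", "c");  // hypothetical columns
      schemaDeclared = true;
    }
    return 0;  // no rows in this sketch
  }
}
```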


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failure caused by 
> empty batch. This JIRA is opened as an umbrella for all those related JIRAS ( 
> such as DRILL-4686, DRILL-4734, DRILL4476, DRILL-4255, etc).
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135701#comment-16135701
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134295508
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
     }
   }
 
-  private void releaseAssets() {
-    container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-    for (final ValueVector v : mutator.fieldVectorMap().values()) {
-      v.clear();
-    }
-  }
-
   @Override
   public IterOutcome next() {
     if (done) {
       return IterOutcome.NONE;
     }
     oContext.getStats().startProcessing();
     try {
-      try {
-        injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
-
-        currentReader.allocate(mutator.fieldVectorMap());
-      } catch (OutOfMemoryException e) {
-        clearFieldVectorMap();
-        throw UserException.memoryError(e).build(logger);
-      }
-      while ((recordCount = currentReader.next()) == 0) {
+      while (true) {
--- End diff --

The original logic allocates a batch once per call to `next()`. The new 
path allocates vectors once per pass through this loop. Is this desired? If we 
make a single pass through the loop, then all is fine. If we make two passes 
through the loop, which code releases the vectors allocated on the first pass? 
(A tiny illustration follows below.)


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failure caused by 
> empty batch. This JIRA is opened as an umbrella for all those related JIRAS ( 
> such as DRILL-4686, DRILL-4734, DRILL4476, DRILL-4255, etc).
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135700#comment-16135700
 ] 

ASF GitHub Bot commented on DRILL-5546:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r134287696
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/Project.java ---
@@ -35,9 +35,19 @@
   static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(Project.class);
 
   private final List<NamedExpression> exprs;
+  /**
+   * {@link org.apache.drill.exec.planner.physical.ProjectPrel for the meaning of flag 'outputProj'}
+   */
+  private boolean outputProj = false;
 
   @JsonCreator
-  public Project(@JsonProperty("exprs") List<NamedExpression> exprs, @JsonProperty("child") PhysicalOperator child) {
+  public Project(@JsonProperty("exprs") List<NamedExpression> exprs, @JsonProperty("child") PhysicalOperator child, @JsonProperty("outputproj") boolean outputProj) {
--- End diff --

`outputProj`?


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failure caused by 
> empty batch. This JIRA is opened as an umbrella for all those related JIRAS ( 
> such as DRILL-4686, DRILL-4734, DRILL4476, DRILL-4255, etc).
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5729) Fix Travis Checks

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135606#comment-16135606
 ] 

ASF GitHub Bot commented on DRILL-5729:
---

Github user ilooner-mapr commented on the issue:

https://github.com/apache/drill/pull/913
  
@vrozov Spoke with Parth. Could you take a look at this PR? There are other 
issues with the Travis build; for example, it doesn't even run the unit tests 
right now. But this is a first step to keep what is currently in the Travis 
build from failing.


> Fix Travis Checks
> -
>
> Key: DRILL-5729
> URL: https://issues.apache.org/jira/browse/DRILL-5729
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
> Fix For: 1.12.0
>
>
> Currently the Travis checks are failing. The failures are happening because 
> Travis recently switched their default build containers from Ubuntu Precise 
> to Ubuntu Trusty, and we do not explicitly define the dist we build on in our 
> travis.yml.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5721) Query with only root fragment and no non-root fragment hangs when Drillbit to Drillbit Control Connection has network issues

2017-08-21 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia reassigned DRILL-5721:


Assignee: Sorabh Hamirwasia

> Query with only root fragment and no non-root fragment hangs when Drillbit to 
> Drillbit Control Connection has network issues
> 
>
> Key: DRILL-5721
> URL: https://issues.apache.org/jira/browse/DRILL-5721
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> Recently I found an issue (thanks to [~knguyen] for creating this scenario) 
> related to fragment status reporting and would like some feedback on it. 
> When a client submits a query to the Foreman, it is planned by the Foreman and 
> the fragments are then scheduled to root and non-root nodes. The Foreman creates a 
> DrillbitStatusListener and a FragmentStatusListener to know about the health of 
> a Drillbit node and a fragment, respectively. Root and non-root fragments are 
> set up by the Foreman differently: 
> Root fragments are set up without any communication over the control channel 
> (since they are executed locally on the Foreman).
> Non-root fragments are set up by sending a control message 
> (REQ_INITIALIZE_FRAGMENTS_VALUE) over the wire. If there is a failure in sending 
> any such control message (e.g. due to network hiccups) during query setup, 
> then the query is failed and the client is notified. 
> Each fragment is executed on its node with the help of a Fragment Executor, which 
> has an instance of FragmentStatusReporter. FragmentStatusReporter updates the 
> status of a fragment to the Foreman node over a control tunnel or connection 
> using an RPC message (REQ_FRAGMENT_STATUS), both for root and non-root 
> fragments. 
> So a root fragment is set up locally without any RPC communication, whereas 
> its status is reported by the fragment executor over a control connection by 
> sending an RPC message. For a non-root fragment, both setup and status updates 
> happen via RPC messages over the control connection.
> *Issue 1:*
> For a simple query that has only one root fragment running on the Foreman node, 
> setup works fine. But if, as part of the status update, the fragment tries to 
> create a control connection and fails to establish it, the query hangs. The 
> root fragment completes execution but fails to tell the Foreman, so the Foreman 
> thinks the query is running forever. 
> *Proposed Solution:*
> Since the setup of a root fragment happens locally without an RPC message, we 
> can do the same for the status updates of root fragments. This avoids RPC 
> communication for status updates of fragments running locally on the Foreman 
> and hence resolves issue 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5726) Support Impersonation without authentication for REST API

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135489#comment-16135489
 ] 

ASF GitHub Bot commented on DRILL-5726:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/910
  
Had a chat with Arina on this; it is not the final PR, and she will be 
updating it with a new set of changes. @arina-ielchiieva - it would be great 
if you could add some context about the new set of changes to the JIRA.


> Support Impersonation without authentication for REST API
> -
>
> Key: DRILL-5726
> URL: https://issues.apache.org/jira/browse/DRILL-5726
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.12.0
>
> Attachments: login_page.JPG
>
>
> Today, if a user is not authenticated via the REST API, there is no way to 
> provide a user name for executing queries. Queries will by default be executed 
> as the "anonymous" user. This doesn't work when impersonation without 
> authentication is enabled on the Drill server side: since the anonymous user 
> doesn't exist, the query will fail. We need a way to provide a user name when 
> impersonation is enabled on the Drill side and a query is executed from the 
> REST API.
> _Implementation details:_
> When only impersonation is enabled, form-based authentication will be used.
> In the Web UI the user will be prompted to enter only a login; a session for 
> that user will then be created, and the user will be treated as admin. 
> Form-based authentication will cache user information, so the user won't need 
> to set a username each time he / she wants to execute a query. Log in / out 
> options will also be available. A screenshot of the login page is attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5717) date time test cases is not Local independent

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135314#comment-16135314
 ] 

ASF GitHub Bot commented on DRILL-5717:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/904#discussion_r134259055
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestDateFunctions.java ---
@@ -43,6 +46,11 @@
 public class TestDateFunctions extends PopUnitTestBase {
     static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestDateFunctions.class);
 
+    @BeforeClass
+    public static void setupLocal() {
+        Locale.setDefault(new Locale("en", "US"));
--- End diff --

This change also affects other unit tests that are executed after the 
tests in this class. 
To avoid this, each unit test that depends on the locale should:
1. preserve the current locale,
2. change the locale to "en",
3. execute the test (in the try block),
4. restore the locale (in the finally block).

As an example, see testConstantFolding_allTypes() in the TestConstantFolding 
class. (A sketch of the pattern follows below.)
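A runnable sketch of that save/set/restore pattern, shaped as JUnit 4 class-level hooks rather than a per-test try/finally; the class name is made up:

```
import java.util.Locale;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class LocaleSensitiveTest {
  private static Locale savedLocale;

  @BeforeClass
  public static void saveAndSetLocale() {
    savedLocale = Locale.getDefault();  // 1. preserve the current locale
    Locale.setDefault(Locale.US);       // 2. change the locale to "en"
  }

  @AfterClass
  public static void restoreLocale() {
    Locale.setDefault(savedLocale);     // 4. restore it for later tests
  }

  // 3. test methods here run under the fixed locale
}
```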


> date time test cases is not Local independent
> -
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>
> Some date-time test cases, such as JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. 
> We should make these test cases independent of the locale environment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5717) date time test cases is not Local independent

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135313#comment-16135313
 ] 

ASF GitHub Bot commented on DRILL-5717:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/904#discussion_r134262565
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java ---
@@ -30,6 +31,11 @@
 
 public class TestCastFunctions extends BaseTestQuery {
 
+  @BeforeClass
+  public static void setupLocal() {
+    System.setProperty("user.timezone","Etc/GMT");
--- End diff --

We should change the timezone in the same way as I proposed for the locale. 
(A sketch follows below.)
Also, since we are fixing test failures connected with the timezone, please 
update the JIRA and pull request titles.
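The timezone analogue could look like the sketch below; note that setting the `user.timezone` property after the JVM has already resolved a default may not take effect, so the sketch uses `TimeZone.setDefault` (class name made up):

```
import java.util.TimeZone;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class TimezoneSensitiveTest {
  private static TimeZone savedZone;

  @BeforeClass
  public static void saveAndSetZone() {
    savedZone = TimeZone.getDefault();                     // preserve
    TimeZone.setDefault(TimeZone.getTimeZone("Etc/GMT"));  // fix the zone
  }

  @AfterClass
  public static void restoreZone() {
    TimeZone.setDefault(savedZone);                        // restore
  }
}
```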


> date time test cases is not Local independent
> -
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>
> Some date-time test cases, such as JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. 
> We should make these test cases independent of the locale environment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5717) date time test cases is not Local independent

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135312#comment-16135312
 ] 

ASF GitHub Bot commented on DRILL-5717:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/904#discussion_r134260296
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewDateFunctions.java ---
@@ -51,9 +51,9 @@ public void testIsDate() throws Exception {
         .sqlQuery("select case when isdate(date1) then cast(date1 as date) else null end res1 from " + dateValues)
         .unOrdered()
         .baselineColumns("res1")
-        .baselineValues(new DateTime(Date.valueOf("1900-01-01").getTime()))
-        .baselineValues(new DateTime(Date.valueOf("3500-01-01").getTime()))
-        .baselineValues(new DateTime(Date.valueOf("2000-12-31").getTime()))
+        .baselineValues(new DateTime(1900,1,1,0,0))
+        .baselineValues(new DateTime(3500,1,1,0,0))
+        .baselineValues(new DateTime(2000,12,31,0,0))
--- End diff --

Please add spaces after the commas.


> date time test cases is not Local independent
> -
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>
> Some date-time test cases, such as JodaDateValidatorTest, are not locale 
> independent. This causes the test phase to fail for users in other locales. 
> We should make these test cases independent of the locale environment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135302#comment-16135302
 ] 

ASF GitHub Bot commented on DRILL-5377:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/916
  
@paul-rogers A similar manner is used for showing time millis in Drill 
([TimePrintMillis](https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/vector/accessor/sql/TimePrintMillis.java)).
But you are right, using a custom format for date-to-string conversion is the 
better decision. (A sketch follows below.)

Not only the test framework converts `Date` to `String`; 
[sqlline](https://github.com/julianhyde/sqlline/blob/master/src/main/java/sqlline/Rows.java#L183)
 does as well. So I am going to create an issue ticket for sqlline.
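A minimal sketch of the custom-format idea using Joda-Time (which Drill used at the time); the date value is made up, and this is not the actual fix in the PR:

```
import org.joda.time.LocalDate;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class FiveDigitYearDemo {
  public static void main(String[] args) {
    LocalDate d = new LocalDate(10356, 3, 19);  // a five-digit year
    // An explicit pattern prints all year digits, instead of relying on
    // java.sql.Date-style toString(), which mangles years beyond 4 digits.
    DateTimeFormatter fmt = DateTimeFormat.forPattern("yyyy-MM-dd");
    System.out.println(fmt.print(d));  // 10356-03-19
  }
}
```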


> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> git.commit.id.abbrev=38ef562
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135167#comment-16135167
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/889
  
Thanks @weijietong. LGTM.
@amansinha100 could you please do the final CR since you are familiar with 
this issue?


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count-distinct query result in a cache and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once users issue count-distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 works and returns the result I expected, but eg2 fails at the physical 
> phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 gets a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5735) UI options grouping and filtering & Metrics hints

2017-08-21 Thread Muhammad Gelbana (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muhammad Gelbana updated DRILL-5735:

Description: 
I'm thinking of some UI improvements that could make all the difference for 
users trying to optimize low-performing queries.

h2. Options
h3. Grouping
We can organize the options to be grouped by their scope of effect; this will 
help users easily locate the options they may need to tune.
h3. Filtering
Since there are a lot of options, we can add a filtering mechanism (e.g. string 
search or group\scope filtering) so the user can filter out the options he's 
not interested in. To provide more benefit than the grouping idea mentioned 
above, filtering may also include keywords, not just the option name, since 
the user may not be aware of the name of the option he's looking for.

h2. Metrics
I'm referring here to the metrics page and the query execution plan page that 
displays the overview section and major\minor fragment metrics. We can show 
hints for each metric, such as:
# What the metric represents, in more detail.
# Which option or scope of options to tune (increase? decrease?) to improve the 
performance reported by this metric.
# Maybe even provide a small dialog to quickly allow modifying the 
option(s) related to that metric.

  was:
I can think of some UI improvements that could make all the difference for 
users trying to optimize low-performing queries.

h2. Options
h3. Grouping
We can organize the options to be grouped by their scope of effect; this will 
help users easily locate the options they may need to tune.
h3. Filtering
Since there are a lot of options, we can add a filtering mechanism (e.g. string 
search or group\scope filtering) so the user can filter out the options he's 
not interested in. To provide more benefit than the grouping idea mentioned 
above, filtering may also include keywords, not just the option name, since 
the user may not be aware of the name of the option he's looking for.

h2. Metrics
I'm referring here to the metrics page and the query execution plan page that 
displays the overview section and major\minor fragment metrics. We can show 
hints for each metric, such as:
# What the metric represents, in more detail.
# Which option or scope of options to tune (increase? decrease?) to improve the 
performance reported by this metric.
# Maybe even provide a small dialog to quickly allow modifying the 
option(s) related to that metric.


> UI options grouping and filtering & Metrics hints
> -
>
> Key: DRILL-5735
> URL: https://issues.apache.org/jira/browse/DRILL-5735
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: Muhammad Gelbana
>
> I'm thinking of some UI improvements that could make all the difference for 
> users trying to optimize low-performing queries.
> h2. Options
> h3. Grouping
> We can organize the options to be grouped by their scope of effect; this will 
> help users easily locate the options they may need to tune.
> h3. Filtering
> Since there are a lot of options, we can add a filtering mechanism (e.g. string 
> search or group\scope filtering) so the user can filter out the options he's 
> not interested in. To provide more benefit than the grouping idea mentioned 
> above, filtering may also include keywords, not just the option name, 
> since the user may not be aware of the name of the option he's looking for.
> h2. Metrics
> I'm referring here to the metrics page and the query execution plan page that 
> displays the overview section and major\minor fragment metrics. We can show 
> hints for each metric, such as:
> # What the metric represents, in more detail.
> # Which option or scope of options to tune (increase? decrease?) to improve the 
> performance reported by this metric.
> # Maybe even provide a small dialog to quickly allow modifying the 
> option(s) related to that metric.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5735) UI options grouping and filtering & Metrics hints

2017-08-21 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5735:
---

 Summary: UI options grouping and filtering & Metrics hints
 Key: DRILL-5735
 URL: https://issues.apache.org/jira/browse/DRILL-5735
 Project: Apache Drill
  Issue Type: Improvement
  Components: Web Server
Affects Versions: 1.11.0, 1.10.0, 1.9.0
Reporter: Muhammad Gelbana


I can think of some UI improvements that could make all the difference for 
users trying to optimize low-performing queries.

h2. Options
h3. Grouping
We can organize the options to be grouped by their scope of effect; this will 
help users easily locate the options they may need to tune.
h3. Filtering
Since there are a lot of options, we can add a filtering mechanism (e.g. string 
search or group\scope filtering) so the user can filter out the options he's 
not interested in. To provide more benefit than the grouping idea mentioned 
above, filtering may also include keywords, not just the option name, since 
the user may not be aware of the name of the option he's looking for.

h2. Metrics
I'm referring here to the metrics page and the query execution plan page that 
displays the overview section and major\minor fragment metrics. We can show 
hints for each metric, such as:
# What the metric represents, in more detail.
# Which option or scope of options to tune (increase? decrease?) to improve the 
performance reported by this metric.
# Maybe even provide a small dialog to quickly allow modifying the 
option(s) related to that metric.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5725) Update Jackson version to 2.7.8

2017-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134844#comment-16134844
 ] 

ASF GitHub Bot commented on DRILL-5725:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/908
  
Maven uses a 'nearest wins' strategy to resolve conflicts, and since we specify 
the Jackson version directly in the pom file, there is no need to exclude it 
from other libraries. (An illustrative fragment follows below.)
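As an illustration, the pinning could look like this minimal pom fragment (hypothetical, not the actual Drill pom):

```
<properties>
  <jackson.version>2.7.8</jackson.version>
</properties>
<dependencies>
  <!-- Declared directly in the project pom, so nearest-wins resolution
       picks 2.7.8 over any transitive Jackson version. -->
  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>${jackson.version}</version>
  </dependency>
</dependencies>
```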
Therefore the result of the command `mvn dependency:tree | grep com.fasterxml.jackson` is
```
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  \- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.module:jackson-module-afterburner:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.7.8:compile
[INFO] |  |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.7.8:compile
[INFO] |  |  \- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.module:jackson-module-afterburner:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.7.8:compile
[INFO] |  |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.7.8:compile
[INFO] |  |  \- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.module:jackson-module-afterburner:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.7.8:compile
[INFO] |  |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.7.8:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] |  |  \- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.module:jackson-module-afterburner:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.7.8:compile
[INFO] |  |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.7.8:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-core:jar:2.7.8:compile
[INFO] |  |  \- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.module:jackson-module-afterburner:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.7.8:compile
[INFO] |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.7.8:compile
[INFO] |  |  +- com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.7.8:compile
[INFO] |  |  +-