[jira] [Updated] (DRILL-7436) Fix record count, vector structure issues in several operators

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7436:

Reviewer: Arina Ielchiieva

> Fix record count, vector structure issues in several operators
> --
>
> Key: DRILL-7436
> URL: https://issues.apache.org/jira/browse/DRILL-7436
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> This is the next in a continuing series of fixes to the container record 
> count, batch record count, and vector structure in several operators. This 
> batch represents the smallest change needed to add checking for the Filter 
> operator.
> In order to get Filter to pass checks, many of its upstream operators needed 
> to be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7436) Fix record count, vector structure issues in several operators

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7436:

Fix Version/s: 1.17.0

> Fix record count, vector structure issues in several operators
> --
>
> Key: DRILL-7436
> URL: https://issues.apache.org/jira/browse/DRILL-7436
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> This is the next in a continuing series of fixes to the container record 
> count, batch record count, and vector structure in several operators. This 
> batch represents the smallest change needed to add checking for the Filter 
> operator.
> In order to get Filter to pass checks, many of its upstream operators needed 
> to be fixed.





[jira] [Updated] (DRILL-6938) SQL get the wrong result after hashjoin and hashagg disabled

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6938:

Fix Version/s: (was: 1.17.0)

> SQL get the wrong result after hashjoin and hashagg disabled
> 
>
> Key: DRILL-6938
> URL: https://issues.apache.org/jira/browse/DRILL-6938
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Dony Dong
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>
> Hi Team
> After we disabled hashjoin and hashagg to work around an out-of-memory issue, 
> we got the wrong result.
> With these two parameters enabled, we get 8 rows. After we disable them, 
> it only returns 3 rows. It seems some MEM_ID values were excluded before the 
> group-by or some other step.
> select b.MEM_ID,count(distinct b.DEP_NO)
> from dfs.test.emp b
> where b.DEP_NO<>'-'
> and b.MEM_ID in ('68','412','852','117','657','816','135','751')
> and b.HIRE_DATE>'2014-06-01'
> group by b.MEM_ID
> order by 1;





[jira] [Updated] (DRILL-7234) Allow support for using Drill WebU through a Reverse Proxy server

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7234:

Fix Version/s: (was: 1.17.0)

> Allow support for using Drill WebU through a Reverse Proxy server
> -
>
> Key: DRILL-7234
> URL: https://issues.apache.org/jira/browse/DRILL-7234
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Priority: Critical
>  Labels: reverse-proxy
>
> Currently, Drill's WebUI has a lot of links and references going through the 
> root of the URL, 
> i.e. to access the profiles listing or submit a query, we'd need to 
> use the following URL format:
> {code}
> http://localhost:8047/profiles
> http://localhost:8047/query
> {code}
> With a reverse proxy, these pages need to be accessed by:
> {code}
> http://localhost:8047/x/y/z/profiles
> http://localhost:8047/x/y/z/query
> {code}
> However, the links within these pages do not include the *{{x/y/z/}}* part, as 
> a result of which visiting those links will fail.
> The WebServer should implement a mechanism that can detect this additional 
> layer and modify the links within the webpage accordingly.
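One common convention a proxy-aware web server can detect is the X-Forwarded-Prefix header set by the reverse proxy. As a rough illustration only (this is not Drill's implementation; the class and method names below are hypothetical), resolving a root-relative WebUI link against such a prefix might look like:

```java
// Hypothetical sketch: prepend a reverse-proxy prefix (e.g. from the
// X-Forwarded-Prefix header) to root-relative WebUI links like "/profiles".
public class ProxyPrefixResolver {

    /**
     * Returns the link rewritten under the proxy prefix. A null or empty
     * prefix (no proxy in front) leaves the link untouched.
     */
    public static String resolve(String forwardedPrefix, String link) {
        if (forwardedPrefix == null || forwardedPrefix.isEmpty()) {
            return link;
        }
        // Normalize: strip a trailing slash from the prefix so "/x/y/z/"
        // and "/x/y/z" both yield "/x/y/z/profiles", never "/x/y/z//profiles".
        String prefix = forwardedPrefix.endsWith("/")
            ? forwardedPrefix.substring(0, forwardedPrefix.length() - 1)
            : forwardedPrefix;
        return prefix + link;
    }

    public static void main(String[] args) {
        System.out.println(resolve("/x/y/z", "/profiles")); // /x/y/z/profiles
        System.out.println(resolve(null, "/query"));        // /query
    }
}
```

The real fix would also have to cover links emitted by the page templates, not just server-side redirects.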





[jira] [Updated] (DRILL-7416) Updates required to dependencies to resolve potential security vulnerabilities

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7416:

Fix Version/s: (was: 1.17.0)

> Updates required to dependencies to resolve potential security 
> vulnerabilities 
> ---
>
> Key: DRILL-7416
> URL: https://issues.apache.org/jira/browse/DRILL-7416
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Bradley Parker
>Assignee: Bradley Parker
>Priority: Critical
>  Labels: security
>
> After running an OWASP Dependency Check and ruling out false positives, I 
> have found 25 dependencies that should be updated to remove potential 
> vulnerabilities. They are listed alphabetically with their CVE information 
> below.
>  
> [CVSS 
> scores|https://en.wikipedia.org/wiki/Common_Vulnerability_Scoring_System] 
> represent the severity of a vulnerability on a scale of 1-10, 10 being 
> critical. 
> [CVEs|https://en.wikipedia.org/wiki/Common_Vulnerabilities_and_Exposures] are 
> public identifiers used to reference known vulnerabilities.
>  
> Package: avro-1.8.2
> Should be: 1.9.0 (*Existing item at* *DRILL-7302*)
> Max CVE (CVSS): CVE-2018-10237 (5.9)
> Complete CVE list: CVE-2018-10237
> Package: commons-beanutils-1.9.2
> Should be: 1.9.4
> Max CVE (CVSS): CVE-2019-10086 (7.3)
> Complete CVE list: CVE-2019-10086
> Package: commons-beanutils-core-1.8.0
> Should be: Moved to commons-beanutils
> Max CVE (CVSS): CVE-2014-0114 (7.5)
> Complete CVE list: CVE-2014-0114 
> (Deprecated; replaced by commons-beanutils)
> Package: converter-jackson
> Should be: 2.5.0
> Max CVE (CVSS): CVE-2018-1000850 (7.5)
> Complete CVE list: CVE-2018-1000850
> Package: derby-10.10.2.0
> Should be: 10.14.2.0
> Max CVE (CVSS): CVE-2015-1832 (9.1)
> Complete CVE list: CVE-2015-1832
> CVE-2018-1313
> Package: drill-hive-exec-shaded
> Should be: New release needed with updated Guava
> Max CVE (CVSS): CVE-2018-10237 (7.5)
> Complete CVE list: CVE-2018-10237
> Package: drill-java-exec
> Should be: New release needed with updated jQuery and Bootstrap
> Max CVE (CVSS): CVE-2019-11358 (6.1)
> Complete CVE list: CVE-2018-14040
> CVE-2018-14041 
> CVE-2018-14042
> CVE-2019-8331
> CVE-2019-11358
> Package: drill-shaded-guava-23
> Should be: New release needed with updated Guava
> Max CVE (CVSS): CVE-2018-10237 (5.9)
> Complete CVE list: CVE-2018-10237
> Package: guava-19.0
> Should be: 24.1.1
> Max CVE (CVSS): CVE-2018-10237 (5.9)
> Complete CVE list: CVE-2018-10237
> Package: hadoop-yarn-common-2.7.4
> Should be: 3.2.1
> Max CVE (CVSS): CVE-2019-11358 (6.1)
> Complete CVE list: CVE-2012-6708
> CVE-2015-9251
> CVE-2019-11358
> CVE-2010-5312
> CVE-2016-7103
> Package: hbase-http-2.1.1.jar 
> Should be: 2.1.4
> Max CVE (CVSS): CVE-2019-0212 (7.5)
> Complete CVE list: CVE-2019-0212
> Package: httpclient-4.2.5.jar
> Should be: 4.3.6
> Max CVE (CVSS): CVE-2014-3577  (5.8)
> Complete CVE list: CVE-2014-3577
> CVE-2015-5262
> Package: jackson-databind-2.9.5
> Should be: 2.10.0
> Max CVE (CVSS): CVE-2018-14721  (10)
> Complete CVE list: CVE-2019-17267
> CVE-2019-16943
> CVE-2019-16942
> CVE-2019-16335
> CVE-2019-14540
> CVE-2019-14439
> CVE-2019-14379
> CVE-2018-11307
> CVE-2019-12384
> CVE-2019-12814
> CVE-2019-12086
> CVE-2018-12023
> CVE-2018-12022
> CVE-2018-19362
> CVE-2018-19361
> CVE-2018-19360
> CVE-2018-14721
> CVE-2018-14720
> CVE-2018-14719
> CVE-2018-14718
> CVE-2018-1000873
> Package: jetty-server-9.3.25.v20180904.jar (*Existing DRILL-7135, but that's 
> to go to 9.4 and it's blocked, we should go to latest 9.3 in the meantime*)
> Should be: 9.3.27.v20190418
> Max CVE (CVSS): CVE-2017-9735 (7.5)
> Complete CVE list: CVE-2017-9735
> CVE-2019-10241
> CVE-2019-10247
> Package: Kafka 0.11.0.1
> Should be: 2.2.0 (*Existing item DRILL-6739*)
> Max CVE (CVSS): CVE-2018-17196 (8.8)
> Complete CVE list: CVE-2018-17196
> CVE-2018-1288
> CVE-2017-12610
> Package: kudu-client-1.3.0.jar 
> Should be: 1.10.0
> Max CVE (CVSS): CVE-2015-5237  (8.8)
> Complete CVE list: CVE-2018-10237
> CVE-2015-5237
> CVE-2019-16869 
> (Only a partial fix: no fix for netty CVE-2019-16869 (7.5); kudu 
> still needs to update its netty. This is not unexpected, as this CVE is 
> newer.)
> Package: libfb303-0.9.3.jar
> Should be: 0.12.0
> Max CVE (CVSS): CVE-2018-1320 (7.5)
> Complete CVE list: CVE-2018-1320 
> (Moved to libthrift)
> Package: okhttp-3.3.0
> Should be: 3.12.0
> Max CVE (CVSS): CVE-2018-20200 (5.9)
> Complete CVE list: CVE-2018-20200
> Package: protobuf-java-2.5.0
> Should be: 3.4.0
> Max CVE (CVSS): CVE-2015-5237  (8.8)
> Complete CVE list: CVE-2015-5237 
> Package: retrofit-2.1.0
> Should be: 2.5.0
> Max CVE (CVSS): CVE-2018-1000850 (7.5)
> Complete CVE list: CVE-2018-1000850
> Package: scala-library-2.11.0
> Should 

[jira] [Assigned] (DRILL-7234) Allow support for using Drill WebU through a Reverse Proxy server

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7234:
---

Assignee: (was: Kunal Khatua)

> Allow support for using Drill WebU through a Reverse Proxy server
> -
>
> Key: DRILL-7234
> URL: https://issues.apache.org/jira/browse/DRILL-7234
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Priority: Critical
>  Labels: reverse-proxy
> Fix For: 1.17.0
>
>
> Currently, Drill's WebUI has a lot of links and references going through the 
> root of the URL, 
> i.e. to access the profiles listing or submit a query, we'd need to 
> use the following URL format:
> {code}
> http://localhost:8047/profiles
> http://localhost:8047/query
> {code}
> With a reverse proxy, these pages need to be accessed by:
> {code}
> http://localhost:8047/x/y/z/profiles
> http://localhost:8047/x/y/z/query
> {code}
> However, the links within these pages do not include the *{{x/y/z/}}* part, as 
> a result of which visiting those links will fail.
> The WebServer should implement a mechanism that can detect this additional 
> layer and modify the links within the webpage accordingly.





[jira] [Assigned] (DRILL-7270) Fix non-https dependency urls and add checksum checks

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7270:
---

Assignee: (was: Dmytriy Grinchenko)

> Fix non-https dependency urls and add checksum checks
> -
>
> Key: DRILL-7270
> URL: https://issues.apache.org/jira/browse/DRILL-7270
> Project: Apache Drill
>  Issue Type: Task
>  Components: Security
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> Review any build scripts and configurations for insecure urls and make 
> appropriate fixes to use secure urls.
> Projects like Lucene do checksum whitelists of all their build dependencies, 
> and you may wish to consider that as a
> protection against threats beyond just MITM.
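The checksum-whitelist idea mentioned above can be sketched in a few lines (an illustration of the technique only, not Drill's or Lucene's actual build logic; the class name is made up): each dependency's bytes are hashed and compared against a pinned SHA-256 value committed to the repository.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch of a checksum whitelist: trust a downloaded
// dependency only if its SHA-256 matches the pinned value.
public class ChecksumWhitelist {

    /** Hex-encoded SHA-256 of the given bytes. */
    static String sha256Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Returns true only if the artifact matches its pinned checksum. */
    static boolean verify(byte[] artifact, String pinnedSha256Hex)
            throws NoSuchAlgorithmException {
        return sha256Hex(artifact).equalsIgnoreCase(pinnedSha256Hex);
    }
}
```

In a real build the pinned hashes live in a committed file and any mismatch fails the build, which defends against tampered mirrors as well as MITM.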





[jira] [Updated] (DRILL-7233) Format Plugin for HDF5

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7233:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +---+---+---+--+
> | path  | data_type | file_name | int_data |
> +---+---+---+--+
> | /dset | DATASET   | dset.h5   | [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +---+---+---+--+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +---------------------+
> |       int_data      |
> +---------------------+
> | [1,2,3,4,5,6]       |
> | [7,8,9,10,11,12]    |
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +---------------------+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . . . . . .semicolon> int_data[1] as col_1,
> . . . . . .semicolon> int_data[2] as col_2
> . . . . . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +-------+-------+-------+
> | col_0 | col_1 | col_2 |
> +-------+-------+-------+
> | 1     | 2     | 3     |
> | 7     | 8     | 9     |
> | 13    | 14    | 15    |
> | 19    | 20    | 21    |
> +-------+-------+-------+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query will return the result below:
> {{apache drill> SELECT * FROM table(dfs.test.`dset.h5` (type => 'hdf5', 
> defaultPath => '/dset'));
> +-----------+-----------+-----------+-----------+-----------+-----------+
> | int_col_0 | int_col_1 | int_col_2 | int_col_3 | int_col_4 | int_col_5 |
> +-----------+-----------+-----------+-----------+-----------+-----------+
> | 1         | 2         | 3         | 4         | 5         | 6         |
> | 7         | 8         | 9         | 10        | 11        | 12        |
> | 13        | 14        | 15        | 16        | 17        | 18        |
> | 19        | 20        | 21        | 22        | 23        | 24   

[jira] [Updated] (DRILL-7223) Make the timeout in TimedCallable a configurable boot time parameter

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7223:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Make the timeout in TimedCallable a configurable boot time parameter
> 
>
> Key: DRILL-7223
> URL: https://issues.apache.org/jira/browse/DRILL-7223
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Aman Sinha
>Assignee: Boaz Ben-Zvi
>Priority: Minor
> Fix For: 1.18.0
>
>
> The 
> [TimedCallable.TIMEOUT_PER_RUNNABLE_IN_MSECS|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java#L52]
>  is currently an internal Drill constant defined as 15 secs. It has been 
> there since the feature was introduced. Drill's TimedCallable implements the 
> Java concurrency Callable interface to create timed threads. It is used by 
> the REFRESH METADATA command, which creates multiple threads on the Foreman 
> node to gather Parquet metadata to build the metadata cache.
> Depending on the load on the system, or for a very large number of parquet 
> files (millions), it is possible to exceed this timeout. While the exact root 
> cause of exceeding the timeout is being investigated, it makes sense to make 
> this timeout a configurable parameter to aid with large-scale testing. This 
> JIRA is to make it a configurable bootstrapping option in the 
> drill-override file.
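As a rough sketch of the proposed change (the class name and the boot-option key below are hypothetical, and Drill's actual TimedCallable internals differ), a timeout read from configuration rather than a hard-coded constant can drive Java's timed invokeAll:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: run Callables in parallel under a timeout that
// comes from configuration instead of a compiled-in 15-second constant.
public class TimedTasks {

    public static <T> List<Future<T>> runAll(List<Callable<T>> tasks, int parallelism)
            throws InterruptedException {
        // Hypothetical boot option; falls back to the current 15 s default.
        long timeoutMs = Long.getLong("drill.exec.timed_callable.timeout_ms", 15_000L);
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // invokeAll cancels any task that has not finished within the timeout.
            return pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdown();
        }
    }
}
```

A boot-time option like this would let large-scale metadata-refresh tests raise the limit without rebuilding Drill.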





[jira] [Updated] (DRILL-7270) Fix non-https dependency urls and add checksum checks

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7270:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Fix non-https dependency urls and add checksum checks
> -
>
> Key: DRILL-7270
> URL: https://issues.apache.org/jira/browse/DRILL-7270
> Project: Apache Drill
>  Issue Type: Task
>  Components: Security
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.18.0
>
>
> Review any build scripts and configurations for insecure urls and make 
> appropriate fixes to use secure urls.
> Projects like Lucene do checksum whitelists of all their build dependencies, 
> and you may wish to consider that as a
> protection against threats beyond just MITM.





[jira] [Assigned] (DRILL-7210) Batch Sizing in HashPartitionSender

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7210:
---

Assignee: (was: Karthikeyan Manivannan)

> Batch Sizing in HashPartitionSender
> ---
>
> Key: DRILL-7210
> URL: https://issues.apache.org/jira/browse/DRILL-7210
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Karthikeyan Manivannan
>Priority: Major
>
> Jira to track changes required in HashPartitionSender for performing Batch 
> Sizing





[jira] [Updated] (DRILL-7284) reusing the hashCodes computed at exchange nodes

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7284:

Fix Version/s: (was: 1.17.0)
   1.18.0

> reusing the hashCodes computed at exchange nodes
> 
>
> Key: DRILL-7284
> URL: https://issues.apache.org/jira/browse/DRILL-7284
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Weijie Tong
>Assignee: Weijie Tong
>Priority: Major
> Fix For: 1.18.0
>
>
> For HashJoin or HashAggregate, we shuffle the input data according to the 
> hashCodes of the join conditions or group-by keys at the exchange nodes. This 
> hash-code computation is then redone at the HashJoin or HashAggregate 
> nodes. We could instead send the hashCodes computed at the exchange nodes to 
> the upper nodes, so the HashJoin or HashAggregate nodes would not need to do 
> the hash computation again.
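The idea can be illustrated with a toy sketch (not Drill code; all names are made up): the exchange side computes the key's hash exactly once and ships it alongside the row, and the join/aggregate consumer reuses the shipped value for bucketing instead of rehashing.

```java
// Toy illustration of hash-code reuse between an exchange and its consumer.
public class HashReuse {

    /** A row tagged with the hash of its key, as an exchange might emit it. */
    static final class HashedRow {
        final int keyHash;
        final String key;
        final Object value;

        HashedRow(int keyHash, String key, Object value) {
            this.keyHash = keyHash;
            this.key = key;
            this.value = value;
        }
    }

    /** Exchange side: compute the key hash once and send it with the row. */
    static HashedRow emitFromExchange(String key, Object value) {
        return new HashedRow(key.hashCode(), key, value);
    }

    /** Consumer side (HashJoin/HashAgg): reuse the shipped hash, never rehash. */
    static int bucketFor(HashedRow row, int numBuckets) {
        return Math.floorMod(row.keyHash, numBuckets);
    }
}
```

The saving is one hash computation per row per consumer, at the cost of shipping one extra int per row across the exchange.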





[jira] [Assigned] (DRILL-7099) Resource Management in Exchange Operators

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7099:
---

Assignee: (was: Karthikeyan Manivannan)

> Resource Management in Exchange Operators
> -
>
> Key: DRILL-7099
> URL: https://issues.apache.org/jira/browse/DRILL-7099
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.17.0
>
>
> This Jira will be used to track the changes required for implementing 
> Resource Management in Exchange operators.
> The design can be found here: 
> https://docs.google.com/document/d/1N9OXfCWcp68jsxYVmSt9tPgnZRV_zk8rwwFh0BxXZeE/edit?usp=sharing





[jira] [Updated] (DRILL-7099) Resource Management in Exchange Operators

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7099:

Fix Version/s: (was: 1.17.0)

> Resource Management in Exchange Operators
> -
>
> Key: DRILL-7099
> URL: https://issues.apache.org/jira/browse/DRILL-7099
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Karthikeyan Manivannan
>Priority: Major
>
> This Jira will be used to track the changes required for implementing 
> Resource Management in Exchange operators.
> The design can be found here: 
> https://docs.google.com/document/d/1N9OXfCWcp68jsxYVmSt9tPgnZRV_zk8rwwFh0BxXZeE/edit?usp=sharing





[jira] [Updated] (DRILL-7184) Set the IDs for the unique HTML tags in the Drill Web UI

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7184:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Set the IDs for the unique HTML tags in the Drill Web UI
> 
>
> Key: DRILL-7184
> URL: https://issues.apache.org/jira/browse/DRILL-7184
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Denys Ordynskiy
>Assignee: Denys Ordynskiy
>Priority: Major
> Fix For: 1.18.0
>
>
> Selenium web-page automation requires identifiers on HTML tags.
> We need to find all HTML tags that are useful for Drill Web UI automation 
> but lack IDs.





[jira] [Updated] (DRILL-7091) Query with EXISTS and correlated subquery fails with NPE in HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7091:

Reviewer: Vova Vysotskyi

> Query with EXISTS and correlated subquery fails with NPE in 
> HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl
> --
>
> Key: DRILL-7091
> URL: https://issues.apache.org/jira/browse/DRILL-7091
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Vova Vysotskyi
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Steps to reproduce:
> 1. Create view:
> {code:sql}
> create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
> {code}
> Run the following query:
> {code:sql}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey)
> {code}
> This query fails with NPE:
> {noformat}
> [Error Id: 9a592635-f792-4403-965c-bd2eece7e8fc on cv1:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:364)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize(HashJoinMemoryCalculatorImpl.java:267)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:959)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:525)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.test.generated.HashAggregatorGen2.doWork(HashAggTemplate.java:642)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:295)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.dr

[jira] [Updated] (DRILL-3290) Hive Storage: Add support for Hive complex types

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-3290:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Hive Storage: Add support for Hive complex types
> 
>
> Key: DRILL-3290
> URL: https://issues.apache.org/jira/browse/DRILL-3290
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Hive, Storage - Hive
>Reporter: Rahul Kumar Challapalli
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> Improve the hive storage plugin to add support for complex types in hive. 
> Below are the complex types hive supports
> {code}
> ARRAY
> MAP
> STRUCT
> UNIONTYPE
> {code}





[jira] [Updated] (DRILL-5028) Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS.

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5028:

Fix Version/s: (was: 1.17.0)

> Opening profiles page from web ui gets very slow when a lot of history files 
> have been stored in HDFS or Local FS.
> --
>
> Key: DRILL-5028
> URL: https://issues.apache.org/jira/browse/DRILL-5028
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Account Not Used
>Priority: Minor
>
> We have a Drill cluster with 20+ nodes and we store all history profiles in 
> hdfs. Without periodic cleanup of hdfs, the profiles page gets slower as 
> more queries are served.
> Code from LocalPersistentStore.java uses fs.list(false, basePath) to fetch 
> the latest 100 history profiles by default. I suspect this operation blocks 
> page loading (millions of small files can be stored in the basePath); maybe 
> we can try some other way to reach the same goal.
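One possible direction (an assumption on my part, not the actual LocalPersistentStore fix; the class name is hypothetical) is to stream the directory and keep only the newest N entries instead of materializing the full listing:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: return only the newest `limit` profile files from a directory,
// consuming the listing lazily rather than loading every entry up front.
public class NewestProfiles {

    static List<Path> newest(Path dir, int limit) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                // Sort newest-first by last-modified time. With millions of
                // files a bounded min-heap of size `limit` would avoid the
                // full sort entirely.
                .sorted(Comparator.comparingLong(
                        (Path p) -> -p.toFile().lastModified()))
                .limit(limit)
                .collect(Collectors.toList());
        }
    }
}
```

With millions of profiles the real win would come from avoiding a full directory scan at all, e.g. by bucketing profiles into dated subdirectories.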





[jira] [Assigned] (DRILL-5028) Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS.

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5028:
---

Assignee: (was: Kunal Khatua)

> Opening profiles page from web ui gets very slow when a lot of history files 
> have been stored in HDFS or Local FS.
> --
>
> Key: DRILL-5028
> URL: https://issues.apache.org/jira/browse/DRILL-5028
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Account Not Used
>Priority: Minor
> Fix For: 1.17.0
>
>
> We have a Drill cluster with 20+ nodes and we store all history profiles in 
> hdfs. Without periodic cleanup of hdfs, the profiles page gets slower as 
> more queries are served.
> Code from LocalPersistentStore.java uses fs.list(false, basePath) to fetch 
> the latest 100 history profiles by default. I suspect this operation blocks 
> page loading (millions of small files can be stored in the basePath); maybe 
> we can try some other way to reach the same goal.





[jira] [Assigned] (DRILL-2362) Drill should manage Query Profiling archiving

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-2362:
---

Assignee: (was: Kunal Khatua)

> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Priority: Major
> Fix For: 1.17.0
>
>
> We collect query profile information for analysis purposes, but we keep it 
> forever. At this time, for a few queries, it isn't a problem. But as users 
> start putting Drill into production, automated use via other applications 
> will make this grow quickly. We need to come up with a retention policy 
> mechanism, with suitable settings administrators can use, and implement it so 
> that this data can be cleaned up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-2362) Drill should manage Query Profiling archiving

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-2362:

Fix Version/s: (was: 1.17.0)

> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Priority: Major
>
> We collect query profile information for analysis purposes, but we keep it 
> forever. At this time, for a few queries, it isn't a problem. But as users 
> start putting Drill into production, automated use via other applications 
> will make this grow quickly. We need to come up with a retention policy 
> mechanism, with suitable settings administrators can use, and implement it so 
> that this data can be cleaned up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-5270) Improve loading of profiles listing in the WebUI

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5270:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Improve loading of profiles listing in the WebUI
> 
>
> Key: DRILL-5270
> URL: https://issues.apache.org/jira/browse/DRILL-5270
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently, as the number of profiles increases, we reload the same list of 
> profiles from the FS.
> An ideal improvement would be to detect if there are any new profiles and 
> only reload from the disk then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a 32 
> core server. With the caching, we can get it down to as much as a few 
> milliseconds.
> To decide whether the cache is invalid, we inspect the last modified time of 
> the directory to confirm whether a reload is needed.
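The mtime-based invalidation described above can be sketched as follows. This is an illustrative Java sketch only; the class and method names are hypothetical, not Drill's actual WebUI code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ProfileCache {
  private long cachedMtime = -1;
  private List<String> cachedNames = Collections.emptyList();

  // Reload the listing only when the directory's last-modified time changed.
  public List<String> list(Path profileDir) throws IOException {
    long mtime = Files.getLastModifiedTime(profileDir).toMillis();
    if (mtime != cachedMtime) {
      try (Stream<Path> s = Files.list(profileDir)) {
        cachedNames = s.map(p -> p.getFileName().toString())
                       .sorted()
                       .collect(Collectors.toList());
      }
      cachedMtime = mtime;
    }
    return cachedNames;  // cache hit: no filesystem listing needed
  }

  // Self-contained check; returns false on any I/O problem.
  static boolean demo() {
    try {
      Path dir = Files.createTempDirectory("profiles");
      Files.createFile(dir.resolve("a.json"));
      ProfileCache cache = new ProfileCache();
      return cache.list(dir).equals(List.of("a.json"))
          && cache.list(dir).equals(List.of("a.json"));  // second call is cached
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The second `list()` call serves the cached names, which is what turns a multi-second directory scan into a near-instant lookup.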



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6825) Applying different hash function according to data types and data size

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6825:

Fix Version/s: (was: 1.17.0)

> Applying different hash function according to data types and data size
> --
>
> Key: DRILL-6825
> URL: https://issues.apache.org/jira/browse/DRILL-6825
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Reporter: Weijie Tong
>Assignee: Weijie Tong
>Priority: Major
>
> Different hash functions perform differently depending on the data type and 
> data size. We should choose the right one to apply rather than always using 
> MurmurHash.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-4587) Document Drillbit launch options

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4587:

Fix Version/s: (was: 1.17.0)

> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Major
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many ways to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of four ways, depending on their 
> needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime 
> properties. All drillbits should use a copy of this file so that properties 
> set here apply to all drillbits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list 
> below. Use this to customize properties per drill-bit, such as for setting 
> port numbers. This option is useful when launching Drill from a tool such as 
> Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
> list below. This script is intended to be unique to each node and is another 
> way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration 
> options directly on the launch command as shown below. This option is also 
> useful when launching Drill from a tool such as YARN or Mesos. Options are of 
> the form:
> $ drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> $ drillbit.sh start -Ddrill.exec.http.port=8099 
> Properties are of three types:
> 1. Launch-only properties: those that can be set only through environment 
> variables (such as JAVA_HOME).
> 2. Drill startup properties, which can be set in the locations detailed below.
> 3. Drill runtime properties, which can be set in drill-override.conf and also 
> via SQL.
> Drill startup properties can be set in a number of locations. Those listed 
> later take precedence over those listed earlier.
> 1. drill-override.conf, as identified by DRILL_CONF_DIR or its default.
> 2. Set in the environment using DRILL_JAVA_OPTS or DRILL_DRILLBIT_JAVA_OPTS.
> 3. Set in drill-env.sh using the above two variables.
> 4. Set on the drillbit.sh command line as explained above. (Drill 1.7 and 
> later.)
> You can see the actual set of properties used (from items 2-3 above) by using 
> the "debug" command (Drill 1.7 or later):
> $ drillbit.sh debug
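As an illustration of option 3 above, a per-node drill-env.sh might look like the fragment below. The variable names are taken from the list above; the values are examples only, not recommendations.

```shell
# Example per-node overrides in $DRILL_HOME/conf/drill-env.sh.
# DRILL_JAVA_OPTS applies to all Drill processes launched on this node;
# DRILL_DRILLBIT_JAVA_OPTS applies to the drillbit only.
export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.exec.http.port=8099"
export DRILL_DRILLBIT_JAVA_OPTS="-XX:MaxDirectMemorySize=8G"
```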



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6543) Option for memory mgmt: Reserve allowance for non-buffered

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6543:

Fix Version/s: (was: 1.17.0)

> Option for memory mgmt: Reserve allowance for non-buffered
> --
>
> Key: DRILL-6543
> URL: https://issues.apache.org/jira/browse/DRILL-6543
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Major
>
> Introduce a new option to enforce/remind users to reserve some allowance when 
> budgeting their memory:
> The problem: When the "planner.memory.max_query_memory_per_node" (MQMPN) 
> option is set equal (or "nearly equal") to the allocated *Direct Memory*, an 
> OOM is still possible. The reason is that the memory used by the 
> "non-buffered" operators is not taken into account.
> For example, MQMPN == Direct-Memory == 100 MB. Run a query with 5 buffered 
> operators (e.g., 5 instances of a Hash-Join), so each gets "promised" 20 MB. 
> When other non-buffered operators (e.g., a Scanner, or a Sender) also grab 
> some of the Direct Memory, then less than 100 MB is left available. And if 
> all those 5 Hash-Joins are pushing their limits, then one HJ may have only 
> allocated 12MB so far, but on the next 1MB allocation it will hit an OOM 
> (from the JVM, as all the 100MB Direct memory is already used).
> A solution -- a new option to _*reserve*_ some of the Direct Memory for those 
> non-buffered operators (e.g., a default of 25%). This *allowance* may prevent many 
> cases like the example above. The new option would return an error 
> (when a query initiates) if the MQMPN is set too high. Note that this option 
> +can not+ address concurrent queries.
> This should also apply to the alternative to the MQMPN - the 
> {{"planner.memory.percent_per_query"}} option (PPQ). The PPQ does not 
> _*reserve*_ such memory (e.g., it can be set to 100%); only its documentation 
> clearly explains this issue (that doc suggests reserving a 50% allowance, as it 
> was written when the Hash-Join was non-buffered, i.e., before spill was 
> implemented).
> The memory given to the buffered operators is the higher of the MQMPN and 
> PPQ calculations. The new reserve option would verify that this figure 
> leaves room for the allowance.
>  
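The arithmetic behind the proposed allowance can be sketched in a few lines. This is a toy calculation only; the method name and the 25% default are assumptions drawn from the ticket text, not Drill code.

```java
public class MemoryBudget {

    // Memory (in MB) left for buffered operators after reserving
    // 'reservePct' of direct memory for non-buffered operators.
    static long maxBufferedMemory(long directMemoryMb, double reservePct) {
        return (long) (directMemoryMb * (1.0 - reservePct));
    }

    public static void main(String[] args) {
        // The ticket's example: 100 MB direct memory, 5 Hash-Joins.
        long budget = maxBufferedMemory(100, 0.25);
        System.out.println(budget + " MB shared by 5 buffered operators = "
            + (budget / 5) + " MB each");
    }
}
```

With such a reserve in place, setting MQMPN to the full 100 MB would be rejected when the query starts instead of failing later with an OOM.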



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6845) Eliminate duplicates for Semi Hash Join

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6845:

Fix Version/s: (was: 1.17.0)

> Eliminate duplicates for Semi Hash Join
> ---
>
> Key: DRILL-6845
> URL: https://issues.apache.org/jira/browse/DRILL-6845
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Minor
>
> Following DRILL-6735: The performance of the new Semi Hash Join may degrade 
> if the build side contains an excessive number of join-key duplicate rows; this 
> is mainly a result of the need to store all those rows first, before the hash 
> table is built.
>   Proposed solution: For Semi, the Hash Agg would create a Hash-Table 
> initially, and use it to eliminate key-duplicate rows as they arrive.
>   Proposed extra: That Hash-Table has an added cost (e.g. resizing), so 
> perform "runtime stats": check the initial number of incoming rows (e.g. 32k), 
> and if the fraction of duplicates is less than some threshold (e.g. 20%), 
> cancel that "early" hash table.
>  
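The sample-then-decide idea above can be sketched with a plain HashSet. This is a hypothetical illustration, not Drill's hash table; the sample size and 20% threshold are taken from the ticket text.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SemiJoinDedup {

    // Dedupe build-side keys, but only commit to the "early" hash table
    // if a sample shows enough duplicates to justify its cost.
    static List<Integer> dedupeIfWorthwhile(List<Integer> keys, int sampleSize,
                                            double minDupRatio) {
        int sample = Math.min(sampleSize, keys.size());
        if (sample == 0) {
            return keys;  // nothing to sample
        }
        Set<Integer> seen = new LinkedHashSet<>();
        for (int i = 0; i < sample; i++) {
            seen.add(keys.get(i));
        }
        double dupRatio = 1.0 - (double) seen.size() / sample;
        if (dupRatio < minDupRatio) {
            return keys;  // few duplicates: cancel the early hash table
        }
        for (int i = sample; i < keys.size(); i++) {
            seen.add(keys.get(i));
        }
        return new ArrayList<>(seen);  // key-duplicates eliminated
    }

    public static void main(String[] args) {
        System.out.println(dedupeIfWorthwhile(List.of(1, 1, 2, 2, 3), 5, 0.2));
    }
}
```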



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7244) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7244:
---

Assignee: Vova Vysotskyi  (was: Venkata Jyothsna Donapati)

> Run-time rowgroup pruning match() fails on casting a Long to an Integer
> ---
>
> Key: DRILL-7244
> URL: https://issues.apache.org/jira/browse/DRILL-7244
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Affects Versions: 1.17.0
>Reporter: Boaz Ben-Zvi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> See DRILL-7240, where a temporary workaround was created that skips pruning 
> (and logs a message) instead of failing with this error: 
> After a Parquet table is refreshed with selected "interesting" columns, a 
> query whose WHERE clause contains a condition on a "non-interesting" INT64 
> column fails during run-time pruning (calling match()) with:
> {noformat}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
> {noformat}
> A long term solution is to pass the whole (or the relevant part of the) 
> schema to the runtime, instead of just passing the "interesting" columns.
>  
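The failure is reproducible outside Drill in a few lines. The safe alternative shown (converting through Number) is a general Java idiom, not necessarily the fix chosen for this ticket.

```java
public class CastDemo {
    public static void main(String[] args) {
        Object boxed = Long.valueOf(42L);      // e.g. an INT64 column statistic

        try {
            Integer bad = (Integer) boxed;     // boxed Long is not an Integer
            System.out.println(bad);           // never reached
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the ticket");
        }

        // Converting through Number succeeds regardless of the boxed type.
        int ok = ((Number) boxed).intValue();
        System.out.println(ok);
    }
}
```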



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7028) Reduce the planning time of queries on large Parquet tables with large metadata cache files

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7028:

Fix Version/s: (was: 1.16.0)

> Reduce the planning time of queries on large Parquet tables with large 
> metadata cache files
> ---
>
> Key: DRILL-7028
> URL: https://issues.apache.org/jira/browse/DRILL-7028
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: performance
> Fix For: 1.17.0
>
>
> If the Parquet table has a large number of small files, the metadata cache 
> files grow large and the planner must read the large metadata cache file, 
> which leads to planning-time overhead. Most of the execution time is 
> spent in the planning phase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7203) Back button for failed query does not return on Query page

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7203:
---

Assignee: (was: Kunal Khatua)

>  Back button for failed query does not return on Query page
> ---
>
> Key: DRILL-7203
> URL: https://issues.apache.org/jira/browse/DRILL-7203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: back_button.JPG
>
>
> The Back button for a failed query returns to the page visited before the 
> Query page, not to the Query page itself.
> Steps: 
> 1. go to Logs page
> 2. go to Query page
> 3. execute query with incorrect syntax (ex: x)
> 4. error message will be displayed, Back button will be in left corner 
> (screenshot attached)
> 5. press Back button
> 6. user is redirected to Logs page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7203) Back button for failed query does not return on Query page

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7203:

Fix Version/s: (was: 1.17.0)

>  Back button for failed query does not return on Query page
> ---
>
> Key: DRILL-7203
> URL: https://issues.apache.org/jira/browse/DRILL-7203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Priority: Major
> Attachments: back_button.JPG
>
>
> The Back button for a failed query returns to the page visited before the 
> Query page, not to the Query page itself.
> Steps: 
> 1. go to Logs page
> 2. go to Query page
> 3. execute query with incorrect syntax (ex: x)
> 4. error message will be displayed, Back button will be in left corner 
> (screenshot attached)
> 5. press Back button
> 6. user is redirected to Logs page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7352) Introduce new checkstyle rules to make code style more consistent

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7352:

Fix Version/s: (was: 1.17.0)

> Introduce new checkstyle rules to make code style more consistent
> -
>
> Key: DRILL-7352
> URL: https://issues.apache.org/jira/browse/DRILL-7352
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vova Vysotskyi
>Priority: Major
>
> Source - https://checkstyle.sourceforge.io/checks.html
> List of rules to be enabled:
> * [LeftCurly|https://checkstyle.sourceforge.io/config_blocks.html#LeftCurly] 
> - force placement of a left curly brace at the end of the line.
> * 
> [RightCurly|https://checkstyle.sourceforge.io/config_blocks.html#RightCurly] 
> - force placement of a right curly brace
> * 
> [NewlineAtEndOfFile|https://checkstyle.sourceforge.io/config_misc.html#NewlineAtEndOfFile]
> * 
> [UnnecessaryParentheses|https://checkstyle.sourceforge.io/config_coding.html#UnnecessaryParentheses]
> * 
> [MethodParamPad|https://checkstyle.sourceforge.io/config_whitespace.html#MethodParamPad]
> * [InnerTypeLast 
> |https://checkstyle.sourceforge.io/config_design.html#InnerTypeLast]
> * 
> [MissingOverride|https://checkstyle.sourceforge.io/config_annotation.html#MissingOverride]
> * 
> [InvalidJavadocPosition|https://checkstyle.sourceforge.io/config_javadoc.html#InvalidJavadocPosition]
> * 
> [ArrayTypeStyle|https://checkstyle.sourceforge.io/config_misc.html#ArrayTypeStyle]
> * [UpperEll|https://checkstyle.sourceforge.io/config_misc.html#UpperEll]
> and others
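For illustration, enabling these checks would amount to a Checkstyle configuration fragment like the sketch below. The module names are real Checkstyle modules; where this fragment lands in Drill's existing checkstyle config file is an assumption. Note that NewlineAtEndOfFile sits at the Checker level rather than under TreeWalker.

```xml
<!-- Sketch only: placement within Drill's checkstyle config is assumed. -->
<module name="Checker">
  <module name="NewlineAtEndOfFile"/>
  <module name="TreeWalker">
    <module name="LeftCurly"/>              <!-- '{' ends the line -->
    <module name="RightCurly"/>
    <module name="UnnecessaryParentheses"/>
    <module name="MethodParamPad"/>
    <module name="InnerTypeLast"/>
    <module name="MissingOverride"/>
    <module name="InvalidJavadocPosition"/>
    <module name="ArrayTypeStyle"/>         <!-- String[] args, not String args[] -->
    <module name="UpperEll"/>               <!-- 42L, not 42l -->
  </module>
</module>
```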



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7340) Filter is not pushed to JDBC database when several databases are used in the query

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7340:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Filter is not pushed to JDBC database when several databases are used in the 
> query
> --
>
> Key: DRILL-7340
> URL: https://issues.apache.org/jira/browse/DRILL-7340
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.16.0
>Reporter: Vova Vysotskyi
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.18.0
>
>
> For the case when several databases are used in a query, some rules weren't 
> added to the rule set for one of the conventions. This is observed in queries 
> similar to the following:
> {code:sql}
> select * from mysql.`drill_mysql_test`.person t1
> INNER JOIN h2.drill_h2_test.person t2 on t1.person_id = t2.person_id where 
> t1.first_name = 'first_name_1' and t2.last_name = 'last_name_1'
> {code}
> Plan for this query is the following:
> {noformat}
> 00-00Screen
> 00-01  Project(person_id=[$0], first_name=[$1], last_name=[$2], 
> address=[$3], city=[$4], state=[$5], zip=[$6], json=[$7], bigint_field=[$8], 
> smallint_field=[$9], numeric_field=[$10], boolean_field=[$11], 
> double_field=[$12], float_field=[$13], real_field=[$14], time_field=[$15], 
> timestamp_field=[$16], date_field=[$17], datetime_field=[$18], 
> year_field=[$19], text_field=[$20], tiny_text_field=[$21], 
> medium_text_field=[$22], long_text_field=[$23], blob_field=[$24], 
> bit_field=[$25], enum_field=[$26], PERSON_ID0=[$27], FIRST_NAME0=[$28], 
> LAST_NAME0=[$29], ADDRESS0=[$30], CITY0=[$31], STATE0=[$32], ZIP0=[$33], 
> JSON0=[$34], BIGINT_FIELD0=[$35], SMALLINT_FIELD0=[$36], 
> NUMERIC_FIELD0=[$37], BOOLEAN_FIELD0=[$38], DOUBLE_FIELD0=[$39], 
> FLOAT_FIELD0=[$40], REAL_FIELD0=[$41], TIME_FIELD0=[$42], 
> TIMESTAMP_FIELD0=[$43], DATE_FIELD0=[$44], CLOB_FIELD=[$45])
> 00-02HashJoin(condition=[=($0, $27)], joinType=[inner], semi-join: 
> =[false])
> 00-03  Project(PERSON_ID0=[$0], FIRST_NAME0=[$1], LAST_NAME0=[$2], 
> ADDRESS0=[$3], CITY0=[$4], STATE0=[$5], ZIP0=[$6], JSON0=[$7], 
> BIGINT_FIELD0=[$8], SMALLINT_FIELD0=[$9], NUMERIC_FIELD0=[$10], 
> BOOLEAN_FIELD0=[$11], DOUBLE_FIELD0=[$12], FLOAT_FIELD0=[$13], 
> REAL_FIELD0=[$14], TIME_FIELD0=[$15], TIMESTAMP_FIELD0=[$16], 
> DATE_FIELD0=[$17], CLOB_FIELD=[$18])
> 00-05SelectionVectorRemover
> 00-06  Filter(condition=[=($2, 'last_name_1')])
> 00-07Jdbc(sql=[SELECT * FROM "TMP"."DRILL_H2_TEST"."PERSON" ])
> 00-04  Jdbc(sql=[SELECT * FROM `drill_mysql_test`.`person` WHERE 
> `first_name` = 'first_name_1' ])
> {noformat}
> {{DrillJdbcFilterRule}} wasn't applied for the H2 convention, so the Filter 
> wasn't pushed to the H2 database.
> This issue may be fixed by specifying {{JdbcConvention}} in the rule 
> descriptions of Drill's {{DrillJdbcFilterRule}} and {{DrillJdbcProjectRule}}; 
> other rules should be fixed in Calcite in the scope of CALCITE-3115.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7406) Update Calcite to 1.21.0

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7406:

Fix Version/s: 1.18.0

> Update Calcite to 1.21.0
> 
>
> Key: DRILL-7406
> URL: https://issues.apache.org/jira/browse/DRILL-7406
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.18.0
>
>
> DRILL-7340 should be fixed by this update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7344) Add GEO-IP Functions

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7344:

Fix Version/s: (was: 1.17.0)

> Add GEO-IP Functions
> 
>
> Key: DRILL-7344
> URL: https://issues.apache.org/jira/browse/DRILL-7344
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>
> Geolocating IP addresses is very useful for security data analysis. This 
> collection of UDFs enables Drill users to extract geolocation data from IP 
> addresses in their data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7366) Improve Null Handling for UDFs with Complex Output

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7366:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Improve Null Handling for UDFs with Complex Output
> --
>
> Key: DRILL-7366
> URL: https://issues.apache.org/jira/browse/DRILL-7366
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> If there is a UDF which has a complex field (Map or List) as output, Drill 
> does not allow the UDF to have nullable input, which creates additional 
> complexity when writing these kinds of UDFs. 
> I therefore would like to propose that two options be added to the 
> FunctionTemplate for null handling: {{EMPTY_LIST_IF_NULL}} and 
> {{EMPTY_MAP_IF_NULL}}, which would simplify UDF creation. I'm envisioning 
> that if either of these options were selected and the UDF receives any null 
> value as input, the UDF will return either an empty map or list.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7311) Partial fixes for empty batch bugs

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7311:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Partial fixes for empty batch bugs
> --
>
> Key: DRILL-7311
> URL: https://issues.apache.org/jira/browse/DRILL-7311
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> DRILL-7305 explains that multiple operators have serious bugs when presented 
> with empty batches. DRILL-7306 explains that the EVF (AKA "new scan 
> framework") was originally coded to emit an empty "fast schema" batch, but 
> that the feature was disabled because of the many empty-batch operator 
> failures.
> This ticket covers a set of partial fixes for empty-batch issues. This is the 
> result of work done to get the converted JSON reader to work with a "fast 
> schema." The JSON work, in the end, revealed that Drill has too many bugs to 
> enable fast schema, and so DRILL-7306 was implemented instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7325) Many operators do not set container record count

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7325:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Many operators do not set container record count
> 
>
> Key: DRILL-7325
> URL: https://issues.apache.org/jira/browse/DRILL-7325
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.18.0
>
>
> See DRILL-7324. The following are problems found because some operators fail 
> to set the record count for their containers.
> h4. Scan
> TestComplexTypeReader, on cluster setup, using the PojoRecordReader:
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from ScanBatch
> ScanBatch: Container record count not set
> Reason: ScanBatch never sets the record count of its container (this is a 
> generic issue, not specific to the PojoRecordReader).
> h4. Filter
> {{TestComplexTypeReader.testNonExistentFieldConverting()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from FilterRecordBatch
> FilterRecordBatch: Container record count not set
> {noformat}
> h4. Hash Join
> {{TestComplexTypeReader.test_array()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from HashJoinBatch
> HashJoinBatch: Container record count not set
> {noformat}
> Occurs on the first batch in which the hash join returns {{OK_NEW_SCHEMA}} 
> with no records.
> h4. Project
> {{TestCsvWithHeaders.testEmptyFile()}} (when the text reader returned empty, 
> schema-only batches):
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from ProjectRecordBatch
> ProjectRecordBatch: Container record count not set
> {noformat}
> Occurs in {{ProjectRecordBatch.handleNullInput()}}: it sets up the schema but 
> does not set the value count to 0.
> h4. Unordered Receiver
> {{TestCsvWithSchema.testMultiFileSchema()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from UnorderedReceiverBatch
> UnorderedReceiverBatch: Container record count not set
> {noformat}
> The problem is that {{RecordBatchLoader.load()}} does not set the container 
> record count.
> h4. Streaming Aggregate
> {{TestJsonReader.testSumWithTypeCase()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from StreamingAggBatch
> StreamingAggBatch: Container record count not set
> {noformat}
> The problem is that {{StreamingAggBatch.buildSchema()}} does not set the 
> container record count to 0.
> h4. Limit
> {{TestJsonReader.testDrill_1419()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from LimitRecordBatch
> LimitRecordBatch: Container record count not set
> {noformat}
> None of the paths in {{LimitRecordBatch.innerNext()}} set the container 
> record count.
> h4. Union All
> {{TestJsonReader.testKvgenWithUnionAll()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from UnionAllRecordBatch
> UnionAllRecordBatch: Container record count not set
> {noformat}
> When {{UnionAllRecordBatch}} calls 
> {{VectorAccessibleUtilities.setValueCount()}}, it did not also set the 
> container count.
> h4. Hash Aggregate
> {{TestJsonReader.drill_4479()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from HashAggBatch
> HashAggBatch: Container record count not set
> {noformat}
> Problem is that {{HashAggBatch.buildSchema()}} does not set the container 
> record count to 0 for the first, empty, batch sent for {{OK_NEW_SCHEMA.}}
> h4. And Many More
> It turns out that most operators fail to set one of the many row count 
> variables somewhere in their code path: maybe in the schema setup path, maybe 
> when building a batch along one of the many paths that operators follow. 
> Further, we have multiple row counts that must be set:
> * Values in each vector ({{setValueCount()}}).
> * Row count in the container ({{setRecordCount()}}), which must be the same 
> as the vector value count.
> * Row count in the operator (batch), which is the (possibly filtered) count 
> of records presented to downstream operators. It must be less than or equal 
> to the container row count (except for an SV4).
> * The SV2 record count, which is the number of entries in the SV2 and must be 
> the same as the batch row count (and less than or equal to the container row 
> count).
> * The SV2 actual batch record count, which must be the same as the container 
> row count.
> * The SV4 record count, which must be the same as the batch record count. 
>
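The relationships among these counts can be summarized in a toy invariant check. This is illustrative only; Drill's actual checks live in {{BatchValidator}} and related classes, and the parameter names here are stand-ins for the real batch state.

```java
public class CountInvariants {

    // True when the row counts listed above are mutually consistent.
    static boolean valid(int vectorValueCount, int containerCount,
                         int batchCount, int sv2Count) {
        return vectorValueCount == containerCount  // vectors match the container
            && batchCount <= containerCount        // batch count may be filtered down
            && sv2Count == batchCount;             // SV2 entries match the batch count
    }

    public static void main(String[] args) {
        System.out.println(valid(10, 10, 7, 7));   // consistent counts
        System.out.println(valid(10, 0, 0, 0));    // container count never set
    }
}
```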

[jira] [Updated] (DRILL-7299) Infinite exception loop in Sqlline after kill process

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7299:

Fix Version/s: (was: 1.17.0)

> Infinite exception loop in Sqlline after kill process
> -
>
> Key: DRILL-7299
> URL: https://issues.apache.org/jira/browse/DRILL-7299
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Vova Vysotskyi
>Priority: Major
>
> Tried killing Sqlline using the "kill" command. Ended up in an infinite loop 
> that repeatedly printed the following to the console:
> {noformat}
> java.lang.IllegalStateException
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:464)
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:445)
>   at sqlline.SqlLine.begin(SqlLine.java:537)
>   at sqlline.SqlLine.start(SqlLine.java:266)
>   at sqlline.SqlLine.main(SqlLine.java:205)
> java.lang.IllegalStateException
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:464)
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:445)
>   at sqlline.SqlLine.begin(SqlLine.java:537)
>   at sqlline.SqlLine.start(SqlLine.java:266)
>   at sqlline.SqlLine.main(SqlLine.java:205)
> ...
> {noformat}
> Using "kill -9" properly killed the process.
> Expected a simple "kill" ({{SIGTERM}}) to have done the job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-5046) Add documentation for directory based partition pruning

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5046:

Fix Version/s: (was: 1.17.0)

> Add documentation for directory based partition pruning
> ---
>
> Key: DRILL-5046
> URL: https://issues.apache.org/jira/browse/DRILL-5046
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.9.0
>Reporter: Rahul Kumar Challapalli
>Assignee: Bridget Bevens
>Priority: Major
>
> Drill's documentation for partition pruning should cover the two features below:
> 1. Directory-based partition pruning
> 2. Partition pruning based on auto-partitioned Parquet files
> The first one seems to be missing from our documentation. At the very least 
> we should cover:
> a. How we can leverage this feature to avoid full table scans
> b. How this feature works in-conjunction with metadata cache pruning
> c. A few examples which involve using wildcards for one of the sub-directories



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (DRILL-7299) Infinite exception loop in Sqlline after kill process

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-7299.
---
Resolution: Cannot Reproduce

> Infinite exception loop in Sqlline after kill process
> -
>
> Key: DRILL-7299
> URL: https://issues.apache.org/jira/browse/DRILL-7299
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Vova Vysotskyi
>Priority: Major
>
> Tried killing Sqlline using the "kill" command. Ended up in an infinite loop 
> that repeatedly printed the following to the console:
> {noformat}
> java.lang.IllegalStateException
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:464)
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:445)
>   at sqlline.SqlLine.begin(SqlLine.java:537)
>   at sqlline.SqlLine.start(SqlLine.java:266)
>   at sqlline.SqlLine.main(SqlLine.java:205)
> java.lang.IllegalStateException
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:464)
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:445)
>   at sqlline.SqlLine.begin(SqlLine.java:537)
>   at sqlline.SqlLine.start(SqlLine.java:266)
>   at sqlline.SqlLine.main(SqlLine.java:205)
> ...
> {noformat}
> Using "kill -9" properly killed the process.
> Expected a simple "kill" ({{SIGTERM}}) to have done the job.





[jira] [Updated] (DRILL-4667) Improve memory footprint of broadcast joins

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4667:

Fix Version/s: (was: 1.17.0)

> Improve memory footprint of broadcast joins
> ---
>
> Key: DRILL-4667
> URL: https://issues.apache.org/jira/browse/DRILL-4667
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Boaz Ben-Zvi
>Priority: Major
>
> For broadcast joins, Drill currently optimizes the data transfer across the 
> network for the broadcast table by sending a single copy to the receiving 
> node, which then distributes it to all minor fragments running on that 
> particular node. However, each minor fragment builds its own hash table (for 
> a hash join) using this broadcast table. We can substantially improve the 
> memory footprint by having a shared copy of the hash table among multiple 
> minor fragments on a node.





[jira] [Updated] (DRILL-6412) Clients are created for all storages even for disabled plugins

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6412:

Fix Version/s: (was: 1.17.0)

> Clients are created for all storages even for disabled plugins
> --
>
> Key: DRILL-6412
> URL: https://issues.apache.org/jira/browse/DRILL-6412
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Other
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Priority: Minor
>
> The storage plugin config for the Hive storage plugin is not shown when an 
> error occurs while instantiating the HiveMetaStoreClient:
> {code}
>  
> distribution/target/apache-drill-1.14.0-SNAPSHOT/apache-drill-1.14.0-SNAPSHOT/bin/drill-embedded
>  
> May 12, 2018 5:21:45 PM org.glassfish.jersey.server.ApplicationHandler 
> initialize
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 
> 01:25:26...
> apache drill 1.14.0-SNAPSHOT 
> "the only truly happy people are children, the creative minority and drill 
> users"
> 0: jdbc:drill:zk=local> 17:21:54.609 [qtp17064901-66] ERROR 
> o.a.h.h.metastore.RetryingHMSHandler - MetaException(message:Version 
> information not found in metastore. )
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7542)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
> at com.sun.proxy.$Proxy72.verifySchema(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:591)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:584)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:651)
> 
> at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient.createCloseableClientWithCaching(DrillHiveMetaStoreClient.java:136)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory.(HiveSchemaFactory.java:76)
> at 
> org.apache.drill.exec.store.hive.HiveStoragePlugin.(HiveStoragePlugin.java:69)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.drill.exec.store.StoragePluginRegistryImpl.create(StoragePluginRegistryImpl.java:345)
> at 
> org.apache.drill.exec.store.StoragePluginRegistryImpl.createOrUpdate(StoragePluginRegistryImpl.java:238)
> {code}
> When the drillbit starts and StoragePluginRegistryImpl creates the 
> HiveStoragePlugin, the DrillHiveMetaStoreClient is instantiated even when 
> the plugin is disabled in the plugin template configs and there is no Hive 
> on the machine.
> The solution is to check the plugin's status in the template: when the 
> plugin is disabled, there is no need to instantiate the client for that 
> storage.
> The workaround is to set "hive.metastore.schema.verification": "false"; in 
> this case the DrillHiveMetaStoreClient is created successfully (but, 
> properly, it should be created only when the plugin is enabled).
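For illustration, a hedged sketch of how the proposed fix (honor `enabled`) and the workaround would look together in a Hive storage plugin config. The metastore URI is a placeholder and the exact set of `configProps` is an assumption, not Drill's authoritative plugin template:

```json
{
  "type": "hive",
  "enabled": false,
  "configProps": {
    "hive.metastore.uris": "thrift://localhost:9083",
    "hive.metastore.schema.verification": "false"
  }
}
```

With the proposed fix, `"enabled": false` alone should prevent the client from being instantiated; the `schema.verification` property is only the interim workaround.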





[jira] [Updated] (DRILL-7133) Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7133:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Duplicate Corrupt PCAP Functionality in PCAP-NG Plugin
> --
>
> Key: DRILL-7133
> URL: https://issues.apache.org/jira/browse/DRILL-7133
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.18.0
>
>
> There was a JIRA (https://issues.apache.org/jira/browse/DRILL-7032) which 
> resulted in some improvements to the PCAP format plugin: it converted the 
> TCP flags to boolean format and also added an {{is_corrupt}} boolean field. 
> This field allows users to look for packets that are corrupt. 
> Unfortunately, this functionality was not duplicated in the PCAP-NG format 
> plugin, so this JIRA proposes to do that.





[jira] [Updated] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7129:

Fix Version/s: (was: 1.17.0)

> Join with more than 1 condition is not using stats to compute row count 
> estimate
> 
>
> Key: DRILL-7129
> URL: https://issues.apache.org/jira/browse/DRILL-7129
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Anisha Reddy
>Priority: Major
>
> Below are the details: 
>  
> {code:java}
> 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from 
> `table_stats/Tpch0.01/parquet/lineitem`; +-+ | EXPR$0 | +-+ | 
> 57068 | +-+ 1 row selected (0.179 seconds)
>  0: jdbc:drill:drillbit=10.10.101.108> select count(*) from 
> `table_stats/Tpch0.01/parquet/partsupp`; +-+ | EXPR$0 | +-+ | 
> 7474 | +-+ 1 row selected (0.171 seconds) 
> 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from 
> `table_stats/Tpch0.01/parquet/lineitem` l, 
> `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey 
> and l.l_suppkey = ps.ps_suppkey; +-+ | EXPR$0 | +-+ | 53401 | 
> +-+ 1 row selected (0.769 seconds)
>  0: jdbc:drill:drillbit=10.10.101.108> explain plan including all attributes 
> for select * from `table_stats/Tpch0.01/parquet/lineitem` l, 
> `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey 
> and l.l_suppkey = ps.ps_suppkey; 
> +--+--+
>  | text | json | 
> +--+--+
>  | 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): 
> rowcount = 57068.0, cumulative cost = {313468.8 rows, 2110446.8 cpu, 193626.0 
> io, 0.0 network, 197313.6 memory}, id = 107578 00-01 ProjectAllowDup(**=[$0], 
> **0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount 
> = 57068.0, cumulative cost = {307762.0 rows, 2104740.0 cpu, 193626.0 io, 0.0 
> network, 197313.6 memory}, id = 107577 00-02 Project(T10¦¦**=[$0], 
> T11¦¦**=[$3]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, DYNAMIC_STAR 
> T11¦¦**): rowcount = 57068.0, cumulative cost = {250694.0 rows, 1990604.0 
> cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107576 00-03 
> HashJoin(condition=[AND(=($1, $4), =($2, $5))], joinType=[inner], semi-join: 
> =[false]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY 
> l_suppkey, DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 
> 57068.0, cumulative cost = {193626.0 rows, 1876468.0 cpu, 193626.0 io, 0.0 
> network, 197313.6 memory}, id = 107575 00-05 Project(T10¦¦**=[$0], 
> l_partkey=[$1], l_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, 
> ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = 
> {114136.0 rows, 342408.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 
> 107572 00-07 Scan(table=[[dfs, drilltestdir, 
> table_stats/Tpch0.01/parquet/lineitem]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/lineitem]], 
> selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/lineitem, 
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, 
> `l_partkey`, `l_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY 
> l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {57068.0 
> rows, 171204.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107571 00-04 
> Project(T11¦¦**=[$0], ps_partkey=[$1], ps_suppkey=[$2]) : rowType = 
> RecordType(DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 
> 7474.0, cumulative cost = {14948.0 rows, 44844.0 cpu, 22422.0 io, 0.0 
> network, 0.0 memory}, id = 107574 00-06 Scan(table=[[dfs, drilltestdir, 
> table_stats/Tpch0.01/parquet/partsupp]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/partsupp]], 
> selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/partsupp, 
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, 
> `ps_partkey`, `ps_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY 
> ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {7474.0 
> rows, 22422.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107573
> {code}
> The NDVs are: l_partkey = 2000, ps_partkey = 1817, l_suppkey = 100, 
> ps_suppkey = 100.
> We see that such a join is just taking the max of the left side and the 
> right side table row counts instead of using the stats.
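For context, a textbook NDV-based estimate for this two-condition equi-join — a sketch of what a stats-aware planner would produce, not necessarily the exact formula Drill is expected to use — is:

```latex
\[
|L \bowtie R| \approx
  \frac{|L|\,|R|}
       {\max(\mathrm{ndv}(l\_partkey),\,\mathrm{ndv}(ps\_partkey)) \cdot
        \max(\mathrm{ndv}(l\_suppkey),\,\mathrm{ndv}(ps\_suppkey))}
  = \frac{57068 \cdot 7474}{2000 \cdot 100} \approx 2133
\]
```

which is far below the 57068 shown in the plan, i.e. the row count of the larger input.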

[jira] [Updated] (DRILL-6245) Clicking on anything redirects to main login page

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6245:

Fix Version/s: (was: 1.17.0)

> Clicking on anything redirects to main login page
> -
>
> Key: DRILL-6245
> URL: https://issues.apache.org/jira/browse/DRILL-6245
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>
> When the Drill Web UI is accessed over HTTPS and then over HTTP, the Web UI 
> keeps redirecting to the main login page whenever anything is clicked on the 
> index page. However, this works fine once the cookies are cleared.





[jira] [Updated] (DRILL-7135) Upgrade to Jetty 9.4

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7135:

Fix Version/s: (was: 1.17.0)

> Upgrade to Jetty 9.4
> 
>
> Key: DRILL-7135
> URL: https://issues.apache.org/jira/browse/DRILL-7135
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Vitalii Diravka
>Priority: Minor
>
> Initially DRILL-7051 updated Jetty to version 9.4 and DRILL-7081 updated 
> Jersey to version 2.28. These versions work fine for Drill with Hadoop 
> versions below 3.0.
>  Starting from Hadoop 3.0, Hadoop uses 
> [org.eclipse.jetty|https://github.com/apache/hadoop/blob/branch-3.0/hadoop-project/pom.xml#L38]
>  version 9.3, which conflicts with the newer Jetty and Jersey versions.
> Drill can update the Jetty and Jersey versions after HADOOP-14930 and 
> HBASE-19256 are resolved.
>  Alternatively, these libs could be shaded in Drill, but there is no real 
> reason to do that nowadays.
> See details in the 
> [#1681|https://github.com/apache/drill/pull/1681#discussion_r265904521] PR.
> _Notes_: 
> * For the Jersey update it is necessary to add 
> org.glassfish.jersey.inject:jersey-hk2 to Drill to solve all compilation 
> failures.
> * See doc for Jetty update: 
> https://www.eclipse.org/jetty/documentation/9.4.x/upgrading-jetty.html
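The jersey-hk2 note above can be sketched as a Maven dependency fragment. The version shown simply mirrors the Jersey 2.28 mentioned earlier; treat the exact version and its placement in Drill's pom.xml as assumptions:

```xml
<!-- Sketch of the jersey-hk2 injection module referenced in the note.
     Version 2.28 matches the Jersey version discussed above; the exact
     coordinates/version to use in Drill are an assumption. -->
<dependency>
  <groupId>org.glassfish.jersey.inject</groupId>
  <artifactId>jersey-hk2</artifactId>
  <version>2.28</version>
</dependency>
```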





[jira] [Assigned] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7129:
---

Assignee: (was: Gautam Parai)

> Join with more than 1 condition is not using stats to compute row count 
> estimate
> 
>
> Key: DRILL-7129
> URL: https://issues.apache.org/jira/browse/DRILL-7129
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Anisha Reddy
>Priority: Major
> Fix For: 1.17.0
>
>

[jira] [Updated] (DRILL-7112) Code Cleanup for HTTPD Format Plugin

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7112:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Code Cleanup for HTTPD Format Plugin
> 
>
> Key: DRILL-7112
> URL: https://issues.apache.org/jira/browse/DRILL-7112
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.15.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
> Fix For: 1.18.0
>
>
> Address code clean up issues cited in 
> https://github.com/apache/drill/pull/1635.





[jira] [Updated] (DRILL-7172) README files for steps describing building C++ Drill client (with protobuf) needs to be updated

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7172:

Fix Version/s: (was: 1.17.0)

> README files for steps describing building C++ Drill client (with protobuf) 
> needs to be updated
> ---
>
> Key: DRILL-7172
> URL: https://issues.apache.org/jira/browse/DRILL-7172
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Denys Ordynskiy
>Priority: Major
>
> During the 1.16.0 release, it was noticed that the steps (primarily library 
> versions) for rebuilding with protobuf-3.6.1 were outdated.
> e.g. the Boost library version for building is reported as 1.53 in one 
> place, whereas it is 1.60 in another. The steps worked on an Ubuntu setup, 
> but failed on CentOS 7.x.





[jira] [Updated] (DRILL-7282) Apache Drill using Outdated Version of many Libraries

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7282:

Fix Version/s: (was: 1.17.0)

> Apache Drill using Outdated Version of many Libraries
> -
>
> Key: DRILL-7282
> URL: https://issues.apache.org/jira/browse/DRILL-7282
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Ayush Sharma
>Priority: Major
>
> Apache Drill is using outdated versions of many libraries and should update 
> to the latest versions to avoid security issues in the future.
> Below is the list of libraries that need to be updated:
> commons-compiler-2.7.6 - Latest Version 3.0.12 -Jan 2019 
> commons-compress-1.4.1 - Latest Version 1.18 - Aug 2018
> janino-2.7.6 - Latest Version 3.0.12 - Jan 2019
> jersey-common-2.8 - Latest Version 2.28 - Jan 2019
> jersey-container-servlet-core-2.8 - Latest Version 2.28 - Jan 2019
> jersey-guava-2.8 - Latest Version 2.28 - Jan 2019
> jersey-media-multipart-2.8 - Latest Version 2.28 - Jan 2019
> jersey-mvc-2.8 - Latest Version 2.28 - Jan 2019
> jersey-mvc-freemarker-2.8 - Latest Version 2.28 - Jan 2019
> jersey-server-2.8 - Latest Version 2.28 - Jan 2019
> jline-2.10 - Latest Version 3.0.0.M1 - May 2016
> log4j-over-slf4j-1.7.6 - Latest Version 1.8.0-beta4 - Feb 2019
> logback-classic-1.2.3 - Latest Version 1.3.0-alpha4 Feb 2018
> logback-core-1.2.3 - Latest Version 1.3.0-alpha4 Feb 2018
> mimepull-1.9.3 - Latest Version 1.9.11 - Jan 2019
> protostuff-json-1.0.8 - Latest Version 1.1.3 - Nov 2017
> reflections-0.9.10 - Latest Version 0.9.11 - Mar 2017
>  





[jira] [Updated] (DRILL-7226) Compilation error on Windows when building from the release tarball sources

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7226:

Fix Version/s: (was: 1.17.0)

> Compilation error on Windows when building from the release tarball sources
> ---
>
> Key: DRILL-7226
> URL: https://issues.apache.org/jira/browse/DRILL-7226
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Denys Ordynskiy
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-impacting
> Attachments: 7z_version.png, broken_filenames.png, 
> tarball_building.log
>
>
> *Description:*
>  OS - Windows.
>  Downloaded tarball with sources for the 
> [1.15|http://home.apache.org/~vitalii/drill/releases/1.15.0/rc2/apache-drill-1.15.0-src.tar.gz]
>  or 
> [1.16|http://home.apache.org/~sorabh/drill/releases/1.16.0/rc2/apache-drill-1.16.0-src.tar.gz]
>  Drill release.
>  Extracted the sources.
>  Built sources using the following command:
> {noformat}
> mvn clean install -DskipTests -Pmapr
> {noformat}
> *Expected result:*
>  BUILD SUCCESS
> *Actual result:*
> {noformat}
> ...
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] 
> D:\src\rc2\apache-drill-1.16.0-src\protocol\src\main\java\org\apache\drill\exec\proto\beans\RecordBatchDef.java:[53,17]
>  error: cannot find symbol
>   symbol:   class SerializedField
>   location: class RecordBatchDef
> ...
> BUILD FAILURE
> {noformat}
> See "tarball_building.log"
> There are no errors when building sources on Windows from the GitHub release 
> [branch|https://github.com/sohami/drill/commits/drill-1.16.0].





[jira] [Updated] (DRILL-7175) Configuration of store.parquet.use_new_reader in drill-override.conf has no effect

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7175:

Fix Version/s: (was: 1.17.0)

> Configuration of store.parquet.use_new_reader in drill-override.conf has no 
> effect
> --
>
> Key: DRILL-7175
> URL: https://issues.apache.org/jira/browse/DRILL-7175
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: benj
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>
> As reported on the Drill user mailing list 
> ([http://mail-archives.apache.org/mod_mbox/drill-user/201904.mbox/%3ceec5a6bc-fa95-44ce-d2b6-c02c1bfd0...@laposte.net%3e]),
> it is possible to configure any Drill system option (sys.option) in 
> drill-override.conf.
>  Example with _drill.exec.storage.file.partition.column.label_:
> {code:java}
> drill.exec: {
>   cluster-id: "drillbits-test",
>   ...
> },
> drill.exec.options: {
>   drill.exec.storage.file.partition.column.label: "drill_dir",
>   ...
> }
> {code}
> But configuring the particular option *store.parquet.use_new_reader* in the 
> same way has absolutely no effect.
> It is unclear whether this is related to the "Not supported in this release" 
> description found for this option, but it seems at least strange that it can 
> be configured with ALTER SESSION/SYSTEM but not in drill-override.conf.
> As an additional point, I had some trouble discovering that the options must 
> go under *drill.exec.options:*, so I propose adding a sample option 
> configuration to drill-override-example.conf to avoid this confusion,
> even while keeping the note found in drill-module.conf if necessary:
> {code:json}
> # Users are not supposed to set these options in the drill-override.conf file.
> # Users should use ALTER SYSTEM and ALTER SESSION to set the options.
> {code}





[jira] [Updated] (DRILL-7224) Update example row set test

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7224:

Fix Version/s: (was: 1.17.0)

> Update example row set test
> ---
>
> Key: DRILL-7224
> URL: https://issues.apache.org/jira/browse/DRILL-7224
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The example row set test {{ExampleTest}} is a bit outdated. This PR will 
> update it.





[jira] [Updated] (DRILL-7173) Analyze table may fail when prefer_plain_java is set to true on codegen for resetValues

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7173:

Fix Version/s: (was: 1.17.0)

> Analyze table may fail when prefer_plain_java is set to true on codegen for 
> resetValues 
> 
>
> Key: DRILL-7173
> URL: https://issues.apache.org/jira/browse/DRILL-7173
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.15.0
> Environment: *prefer_plain_java: true*
>  
>Reporter: Boaz Ben-Zvi
>Priority: Minor
>
>   The *prefer_plain_java* compile option is useful for debugging generated 
> code (it can be set in drill-override.conf; the default value is false). 
> When set to true, some "analyze table" calls generate code that fails 
> because a SchemaChangeException is added that is not declared in the 
> streaming aggregation template.
> For example:
> {noformat}
> apache drill (dfs.tmp)> create table lineitem3 as select * from 
> cp.`tpch/lineitem.parquet`;
> +--+---+
> | Fragment | Number of records written |
> +--+---+
> | 0_0 | 60175 |
> +--+---+
> 1 row selected (2.06 seconds)
> apache drill (dfs.tmp)> analyze table lineitem3 compute statistics;
> Error: SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[StreamingAggregatorGen4.java]',
>  Line 7869, Column 20: StreamingAggregatorGen4.java:7869: error: 
> resetValues() in org.apache.drill.exec.test.generated.StreamingAggregatorGen4 
> cannot override resetValues() in 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggTemplate
>  public boolean resetValues()
>  ^
>  overridden method does not throw 
> org.apache.drill.exec.exception.SchemaChangeException 
> (compiler.err.override.meth.doesnt.throw)
> {noformat}
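The compiler error above is an instance of a general Java rule: an overriding method may not declare checked exceptions that the overridden method does not declare. A minimal, self-contained illustration (the class names here are hypothetical stand-ins, not Drill's actual generated classes):

```java
// Minimal illustration of the override rule behind the error above.
class Template {
    // Declares no checked exceptions, like the aggregate template's method.
    public boolean resetValues() { return true; }
}

class Generated extends Template {
    @Override
    public boolean resetValues() {
        // Adding "throws SomeCheckedException" here would fail to compile
        // with "overridden method does not throw ...", exactly as in the
        // generated StreamingAggregatorGen class.
        return false;
    }
}

public class Demo {
    public static void main(String[] args) {
        System.out.println(new Generated().resetValues());
    }
}
```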
>  
>  





[jira] [Updated] (DRILL-6945) Update INFORMATION_SCHEMA.SCHEMATA table description

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6945:

Fix Version/s: (was: 1.17.0)

> Update INFORMATION_SCHEMA.SCHEMATA table description
> 
>
> Key: DRILL-6945
> URL: https://issues.apache.org/jira/browse/DRILL-6945
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-impacting
>
> https://drill.apache.org/docs/querying-the-information-schema/
> Currently the documentation states that the SCHEMATA table contains only a 
> few columns:
> {noformat}
> The SCHEMATA table contains the CATALOG_NAME and SCHEMA_NAME columns. To 
> allow maximum flexibility inside BI tools, the only catalog that Drill 
> supports is DRILL.
> {noformat}
> In reality it contains far more columns (especially TYPE and IS_MUTABLE), 
> which should be considered for documentation:
> {noformat}
> drill (information_schema)>select * from schemata;
> +---+--+---++-+
> | CATALOG_NAME  | SCHEMA_NAME  | SCHEMA_OWNER  |  TYPE  | 
> IS_MUTABLE  |
> +---+--+---++-+
> | DRILL | cp.default   || file   | NO  
> |
> | DRILL | dfs.default  || file   | NO  
> |
> | DRILL | dfs.myschemainitcap  || file   | YES 
> |
> {noformat}





[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6923:

Fix Version/s: (was: 1.17.0)

> Show schemas uses default(user defined) schema first for resolving table from 
> information_schema
> 
>
> Key: DRILL-6923
> URL: https://issues.apache.org/jira/browse/DRILL-6923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.14.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Minor
>
> SHOW SCHEMAS tries to find the table `information_schema`.`schemata` in the 
> default (user-defined) schema, and only after that failed attempt does it 
> resolve the table successfully against the root schema. See the description 
> below for details, explained using an example with the Hive plugin. 
> *Abstract* 
> When Drill used with enabled Hive SQL Standard authorization, execution of 
> queries like,
> {code:sql}
> USE hive.db_general;
> SHOW SCHEMAS LIKE 'hive.%'; {code}
> results in error DrillRuntimeException: Failed to use the Hive authorization 
> components: Error getting object from metastore for Object 
> [type=TABLE_OR_VIEW, name=db_general.information_schema] . 
> *Details* 
> Consider a showSchemas() test similar to the one defined in 
> TestSqlStdBasedAuthorization: 
> {code:java}
> @Test
> public void showSchemas() throws Exception {
>   test("USE " + hivePluginName + "." + db_general);
>   testBuilder()
>   .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'")
>   .unOrdered()
>   .baselineColumns("SCHEMA_NAME")
>   .baselineValues("hive.db_general")
>   .baselineValues("hive.default")
>   .go();
> }
> {code}
> Currently, execution of such a test will produce the following stack trace: 
> {code:none}
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed 
> to use the Hive authorization components: Error getting object from metastore 
> for Object [type=TABLE_OR_VIEW, name=db_general.information_schema]
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149)
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134)
> at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214)
> at 
> org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63)
> at 
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
> at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288)
> at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
> at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
> at 
> org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284)
> at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918)
> at 
> org.apache.calcite.sql.validate.SqlValidator

[jira] [Updated] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7162:

Fix Version/s: (was: 1.17.0)

>  Apache Drill uses 3rd Party with Highest CVEs
> --
>
> Key: DRILL-7162
> URL: https://issues.apache.org/jira/browse/DRILL-7162
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Ayush Sharma
>Priority: Major
> Attachments: Jars.xlsx
>
>
> Apache Drill uses 3rd party libraries with 250+ CVEs.
> Most of the CVEs are in the older version of Jetty (9.1.x), whereas the 
> current version of Jetty is 9.4.x.
> Many of the other libraries are at end-of-life versions and are not patched 
> even in the latest release.
> This creates a security issue when Drill is used in production.
> We are able to replace many of the older libraries with the latest versions 
> that have no CVEs; however, some of them are not replaceable as-is and 
> would require changes in the source code.
> The Jetty version is the highest priority and needs migration to 9.4.x 
> immediately.
>  
> Please look into this issue with immediate priority, as it compromises the 
> security of any application utilizing Apache Drill.





[jira] [Updated] (DRILL-6836) Eliminate StreamingAggr for COUNT DISTINCT

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6836:

Fix Version/s: (was: 1.17.0)

> Eliminate StreamingAggr for COUNT DISTINCT
> --
>
> Key: DRILL-6836
> URL: https://issues.apache.org/jira/browse/DRILL-6836
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Minor
>
> The COUNT DISTINCT operation is often implemented with a Hash-Aggr operator 
> for the DISTINCT, and a Streaming-Aggr above to perform the COUNT.  That 
> Streaming-Aggr does the counting like any aggregation, counting each value, 
> batch after batch.
>   While very efficient, that counting work is basically not needed, as the 
> Hash-Aggr knows the number of distinct values (in the in-memory partitions).
>   Hence _a possible small performance improvement_ - eliminate the 
> Streaming-Aggr operator, and notify the Hash-Aggr to return a COUNT (these 
> are Planner changes). The Hash-Aggr operator would need to generate the 
> single Float8 column output schema, and output that batch with a single 
> value, just like the Streaming-Aggr did (likely without generating code).
>   In case of a spill, the Hash-Aggr still needs to read and process those 
> partitions, to get the exact distinct number.
>    The expected improvement is the elimination of the batch by batch output 
> from the Hash-Aggr, and the batch by batch, row by row processing of the 
> Streaming-Aggr.
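The counting shortcut described above can be sketched in plain Java (an illustration, not Drill's Hash-Aggr code): a partitioned hash aggregation already knows the distinct count as the sum of its in-memory partition sizes, so the downstream row-by-row COUNT pass adds no information.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class PartitionedDistinctCount {
    public static void main(String[] args) {
        int numPartitions = 4;
        List<HashSet<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashSet<>());
        }
        int[] values = {7, 3, 7, 1, 9, 3, 9, 9, 2};
        // DISTINCT phase: hash each value into a partition; duplicates collapse.
        for (int v : values) {
            partitions.get(Math.floorMod(Integer.hashCode(v), numPartitions)).add(v);
        }
        // The COUNT collapses to summing partition sizes -- no per-row pass.
        long distinct = partitions.stream().mapToLong(HashSet::size).sum();
        System.out.println(distinct); // 5 distinct values: 7, 3, 1, 9, 2
    }
}
```

As the description notes, spilled partitions would still need to be read back so their rows can be de-duplicated before being counted.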





[jira] [Updated] (DRILL-7141) Hash-Join (and Agg) should always spill to disk the least used partition

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7141:

Fix Version/s: (was: 1.17.0)

> Hash-Join (and Agg) should always spill to disk the least used partition
> 
>
> Key: DRILL-7141
> URL: https://issues.apache.org/jira/browse/DRILL-7141
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Boaz Ben-Zvi
>Priority: Major
>
> When the probe-side data for a hash join is skewed, it is preferable to have 
> the corresponding partition on the build side to be in memory. 
> Currently, with the spill-to-disk feature, the partition to spill to disk is 
> selected at random. This means that highly skewed probe-side data could 
> also spill for lack of a corresponding hash table partition in memory. 
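A minimal sketch of the proposed policy (hypothetical names and counts, not Drill's Hash-Join code): track how heavily the probe side uses each build-side partition and spill the least-used one, so hot partitions for skewed probe keys stay in memory.

```java
public class SpillPolicySketch {
    // Returns the index of the partition with the fewest probe-side hits.
    static int leastUsedPartition(long[] probeHits) {
        int victim = 0;
        for (int i = 1; i < probeHits.length; i++) {
            if (probeHits[i] < probeHits[victim]) {
                victim = i;
            }
        }
        return victim;
    }

    public static void main(String[] args) {
        // Skewed probe side: partition 2 is hot, partition 1 is cold.
        long[] probeHits = {120, 4, 9_500, 88};
        System.out.println("spill partition " + leastUsedPartition(probeHits));
        // prints spill partition 1
    }
}
```

In practice the build side often spills before any probe rows arrive, so the operator would need a usage estimate (for example, from prior spill cycles or statistics) rather than exact hit counts.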





[jira] [Updated] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6839:

Fix Version/s: (was: 1.17.0)

> Failed to plan (aggregate + Hash or NL join) when slice target is low 
> --
>
> Key: DRILL-6839
> URL: https://issues.apache.org/jira/browse/DRILL-6839
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> *Case 1.* When nested loop join is about to be used:
>  - Option "_planner.enable_nljoin_for_scalar_only_" is set to false
>  - Option "_planner.slice_target_" is set to low value for imitation of big 
> input tables
>  
> {code:java}
> @Category(SqlTest.class)
> public class CrossJoinTest extends ClusterTest {
>  @BeforeClass
>  public static void setUp() throws Exception {
>  startCluster(ClusterFixture.builder(dirTestWatcher));
>  }
>  @Test
>  public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
>try {
>  client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), 
> false);
>  client.alterSession(ExecConstants.SLICE_TARGET, 1);
>  queryBuilder().sql(
> "SELECT COUNT(l.nation_id) " +
> "FROM cp.`tpch/nation.parquet` l " +
> ", cp.`tpch/region.parquet` r")
>  .run();
>} finally {
> client.resetSession(ExecConstants.SLICE_TARGET);
> client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
>}
>  }
> }{code}
>  
> *Case 2.* When hash join is about to be used:
>  - Option "planner.enable_mergejoin" is set to false, so hash join will be 
> used instead
>  - Option "planner.slice_target" is set to low value for imitation of big 
> input tables
>  - Comment out //ruleList.add(HashJoinPrule.DIST_INSTANCE); in 
> PlannerPhase.getPhysicalRules method
> {code:java}
> @Category(SqlTest.class)
> public class CrossJoinTest extends ClusterTest {
>  @BeforeClass
>  public static void setUp() throws Exception {
>startCluster(ClusterFixture.builder(dirTestWatcher));
>  }
>  @Test
>  public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
>try {
> client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
> client.alterSession(ExecConstants.SLICE_TARGET, 1);
> queryBuilder().sql(
>   "SELECT COUNT(l.nation_id) " +
>   "FROM cp.`tpch/nation.parquet` l " +
>   "INNER JOIN cp.`tpch/region.parquet` r " +
>   "ON r.nation_id = l.nation_id")
> .run();
>} finally {
> client.resetSession(ExecConstants.SLICE_TARGET);
> client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
>}
>  }
> }
> {code}
>  
> *Workaround:* To avoid the exception we need to set the option 
> "_planner.enable_multiphase_agg_" to false. By doing this we avoid 
> unsuccessful attempts to create a two-phase aggregation plan in 
> StreamAggPrule and guarantee that the logical aggregate will be converted 
> to a physical one. 
>  





[jira] [Updated] (DRILL-6899) Fix timestamp issues in unit tests ignored with DRILL-6833

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6899:

Fix Version/s: (was: 1.17.0)

> Fix timestamp issues in unit tests ignored with DRILL-6833
> --
>
> Key: DRILL-6899
> URL: https://issues.apache.org/jira/browse/DRILL-6899
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> {{The following tests were disabled in the PR for DRILL-6833}}
> {{IndexPlanTest.testCastTimestampPlan() - Re-enable after the MapRDB format 
> plugin issue is fixed.}}
> {{IndexPlanTest.testRowkeyJoinPushdown_13() - Re-enable the testcase after 
> fixing the execution issue with HashJoin used as Rowkeyjoin.}}
> {{IndexPlanTest.testRowkeyJoinPushdown_12() - Remove the testcase since the 
> SemiJoin transformation makes the rowkeyjoinpushdown transformation invalid.}}





[jira] [Updated] (DRILL-6799) Enhance the Hash-Join Operator to perform Anti-Semi-Join

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6799:

Fix Version/s: (was: 1.17.0)

> Enhance the Hash-Join Operator to perform Anti-Semi-Join
> 
>
> Key: DRILL-6799
> URL: https://issues.apache.org/jira/browse/DRILL-6799
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Minor
>
> Similar to handling Semi-Join (see DRILL-6735), the Anti-Semi-Join can be 
> enhanced by eliminating the extra DISTINCT (i.e. Hash-Aggr) operator.
> Example (note the NOT IN):
> {code:sql}
> select c.c_first_name, c.c_last_name from dfs.`/data/json/s1/customer` c 
> where c.c_customer_sk NOT IN (select s.ss_customer_sk from 
> dfs.`/data/json/s1/store_sales` s) limit 4;
> {code}
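Ignoring NULL-key subtleties of NOT IN, the query above behaves like a hash anti-semi-join; a minimal sketch (illustrative only, not Drill's operator): build a hash set from the subquery side, which de-duplicates by construction so no separate DISTINCT pass is needed, then emit probe rows whose key is absent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashAntiSemiJoin {
    static List<int[]> antiJoin(List<int[]> probe, List<Integer> buildKeys, int keyCol) {
        Set<Integer> build = new HashSet<>(buildKeys); // duplicates collapse here
        List<int[]> out = new ArrayList<>();
        for (int[] row : probe) {
            if (!build.contains(row[keyCol])) {
                out.add(row); // keep rows with no match on the build side
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical customer rows: {customer_sk, some_value}.
        List<int[]> customers =
            List.of(new int[]{1, 100}, new int[]{2, 200}, new int[]{3, 300});
        List<Integer> storeSales = List.of(2, 2, 3); // duplicate customer keys
        List<int[]> result = antiJoin(customers, storeSales, 0);
        System.out.println(result.size()); // only customer 1 survives -> 1
    }
}
```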





[jira] [Updated] (DRILL-7274) Introduce ANALYZE TABLE statements

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7274:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Introduce ANALYZE TABLE statements
> --
>
> Key: DRILL-7274
> URL: https://issues.apache.org/jira/browse/DRILL-7274
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>






[jira] [Updated] (DRILL-7330) Implement metadata usage for text format plugin

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7330:

Fix Version/s: 1.18.0

> Implement metadata usage for text format plugin
> ---
>
> Key: DRILL-7330
> URL: https://issues.apache.org/jira/browse/DRILL-7330
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> 1. Change the current group scan to leverage Schema from Metastore;
> 2. Use stats for enabling additional logical planning rules for text format 
> plugin. It will enable such optimizations as limit, filter push and so on.





[jira] [Updated] (DRILL-6552) Drill Metadata management "Drill Metastore"

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6552:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Drill Metadata management "Drill Metastore"
> ---
>
> Key: DRILL-6552
> URL: https://issues.apache.org/jira/browse/DRILL-6552
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> It would be useful for Drill to have some sort of metastore which would 
> enable Drill to remember previously defined schemata so Drill doesn’t have to 
> do the same work over and over again.
> It would allow storing schema and statistics, which would accelerate query 
> validation, planning, and execution. It would also increase the stability 
> of Drill and help avoid different kinds of issues: "schema change 
> exceptions", "limit 0" optimization, and so on. 
> One of the main candidates is Hive Metastore.
> Starting from 3.0 version Hive Metastore can be the separate service from 
> Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> Optional enhancement is storing Drill's profiles, UDFs, plugins configs in 
> some kind of metastore as well.





[jira] [Updated] (DRILL-6695) Graceful shutdown removes spill directory before query finished

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6695:

Fix Version/s: (was: 1.17.0)

> Graceful shutdown removes spill directory before query finished 
> 
>
> Key: DRILL-6695
> URL: https://issues.apache.org/jira/browse/DRILL-6695
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Attachments: drillbit.log
>
>
> Ran the following query from sqlline:
> select a.columns[0], b.columns[1], a.columns[2], b.columns[3], a.columns[4] 
> from `test` a join `test` b on a.columns[0]=b.columns[0] and 
> a.columns[4]=b.columns[4] order by a.columns[0] limit 1000;
> While the query was running, a graceful shutdown was initiated from the 
> command line on the foreman node.  The query failed with the following 
> error message:
> Error: RESOURCE ERROR: Hash Join failed to open spill file: 
> /tmp/drill/spill/248a054a-ee63-e795-a44e-d9205df8e9b8_HashJoin_3-2-0/spill7_outer
> Fragment 3:0
> It looks like the spill directory gets deleted while the query is still 
> running when a graceful shutdown is initiated.
>  





[jira] [Updated] (DRILL-7229) Add scripts to the release folder in Drill Repo

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7229:

Reporter: Vova Vysotskyi  (was: Sorabh Hamirwasia)

> Add scripts to the release folder in Drill Repo
> ---
>
> Key: DRILL-7229
> URL: https://issues.apache.org/jira/browse/DRILL-7229
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Reporter: Vova Vysotskyi
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.17.0
>
>
> Move the release automation script into Drill repo.





[jira] [Updated] (DRILL-7230) Add README.md with instructions for release

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7230:

Fix Version/s: (was: 1.17.0)

> Add README.md with instructions for release
> ---
>
> Key: DRILL-7230
> URL: https://issues.apache.org/jira/browse/DRILL-7230
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>






[jira] [Updated] (DRILL-6615) To prevent the limit operator after topN operator if there is a single fragment

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6615:

Fix Version/s: (was: 1.17.0)

> To prevent the limit operator after topN operator if there is a single 
> fragment
> ---
>
> Key: DRILL-6615
> URL: https://issues.apache.org/jira/browse/DRILL-6615
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Kedar Sankar Behera
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Attachments: topnNlimit.pdf
>
>
> The Limit operator is added after the TopN operator, which is not needed if 
> there is only one fragment.
> For example: 
> {code}
> 00-00 Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY EXPR$2, 
> ANY EXPR$3): rowcount = 50.0, cumulative cost = \{116621.0 rows, 
> 2787624.201857771 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 7854 00-01 
> Project(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY EXPR$2, ANY EXPR$3): rowcount = 
> 50.0, cumulative cost = \{116616.0 rows, 2787619.201857771 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 7853 00-02 SelectionVectorRemover : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY ITEM, ANY ITEM1): rowcount = 50.0, 
> cumulative cost = \{116566.0 rows, 2787419.201857771 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 7852 00-03 Limit(fetch=[50]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY ITEM, ANY ITEM1): rowcount = 50.0, 
> cumulative cost = \{116516.0 rows, 2787369.201857771 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 7851 00-04 SelectionVectorRemover : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY ITEM, ANY ITEM1): rowcount = 
> 29116.0, cumulative cost = \{116466.0 rows, 2787169.201857771 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 7850 00-05 TopN(limit=[50]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY ITEM, ANY ITEM1): rowcount = 
> 29116.0, cumulative cost = \{87350.0 rows, 2758053.201857771 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 7849 00-06 LateralJoin(correlation=[$cor2], 
> joinType=[inner], requiredColumns=[\{0}], column excluded from output: 
> =[`c_orders`]) : rowType = RecordType(ANY c_custkey, ANY c_name, ANY ITEM, 
> ANY ITEM1): rowcount = 29116.0, cumulative cost = \{58234.0 rows, 786135.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 7848 00-08 
> Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/lateralUnnest/sf0dot01/json/customer, 
> numFiles=1, columns=[`c_orders`, `c_custkey`, `c_name`], 
> files=[maprfs:///drill/testdata/lateralUnnest/sf0dot01/json/customer/customer.json]]])
>  : rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 
> 29116.0, cumulative cost = \{29116.0 rows, 87348.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 7846 00-07 Project(ITEM=[ITEM($0, 'o_orderkey')], 
> ITEM1=[ITEM($0, 'o_totalprice')]) : rowType = RecordType(ANY ITEM, ANY 
> ITEM1): rowcount = 1.0, cumulative cost = \{2.0 rows, 3.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 7847 00-09 Unnest [srcOp=00-06] : rowType = 
> RecordType(ANY c_orders): rowcount = 1.0, cumulative cost = \{1.0 rows, 1.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 7700
> {code}
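The redundancy is easy to see in miniature (illustrative Java, not Drill's operators): TopN(limit=k) already emits at most k rows, so a Limit(fetch=k) applied to its output within the same single fragment changes nothing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNThenLimit {
    // Keep the k smallest values using a bounded max-heap.
    static List<Integer> topN(int[] input, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int v : input) {
            heap.add(v);
            if (heap.size() > k) {
                heap.poll(); // evict the current maximum
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(null);
        return out;
    }

    public static void main(String[] args) {
        int[] rows = {42, 7, 99, 13, 8, 70};
        List<Integer> sorted = topN(rows, 3);
        // Limit(fetch=3) after TopN(limit=3) changes nothing:
        List<Integer> limited = sorted.subList(0, Math.min(3, sorted.size()));
        System.out.println(sorted.equals(limited)); // prints true
    }
}
```

With multiple fragments the Limit is still needed, since each fragment's TopN emits up to k rows and the merged stream must be trimmed again.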





[jira] [Updated] (DRILL-6593) unordered receivers for broadcast senders don't report memory consumption

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6593:

Fix Version/s: (was: 1.17.0)

> unordered receivers for broadcast senders don't report memory consumption
> -
>
> Key: DRILL-6593
> URL: https://issues.apache.org/jira/browse/DRILL-6593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.14.0
> Environment: RHEL 7
>Reporter: Dechang Gu
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Attachments: TPCDS_78_3_id_24c43954-65b4-07b6-7e53-a37ad47fa963.json
>
>
> In my regression test on TPCDS SF100 dataset, query 78 profile shows the 
> following:
> {code}
> 05-xx-02  PROJECT 0.000s  0.001s  0.003s  0.022s  0.000s  0.000s  0.000s  
> 0.02%   0.00%   64,787,488  3MB 3MB
> 05-xx-03  HASH_JOIN   0.000s  0.000s  0.774s  1.002s  0.000s  0.000s  
> 0.000s  6.87%   0.32%   69,186,507  8MB 10MB
> 05-xx-04  UNORDERED_RECEIVER  0.000s  0.000s  0.000s  0.000s  0.000s  
> 0.000s  0.000s  0.00%   0.00%   4,382,940   -   -
> 05-xx-05  PROJECT 0.000s  0.001s  0.002s  0.015s  0.000s  0.000s  0.000s  
> 0.02%   0.00%   64,803,567  3MB 3MB
> 05-xx-06  SELECTION_VECTOR_REMOVER0.000s  0.000s  0.333s  0.566s  
> 0.000s  0.000s  0.000s  2.95%   0.14%   64,803,567  5MB 5MB
> {code}
> Note 05-xx-04 did not show memory usage.





[jira] [Updated] (DRILL-6501) Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6501:

Fix Version/s: (was: 1.17.0)

> Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed
> 
>
> Key: DRILL-6501
> URL: https://issues.apache.org/jira/browse/DRILL-6501
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DRILL-6212 is a temporary fix to alleviate issues due to CALCITE-2223. Once 
> CALCITE-2223 is fixed, this change needs to be reverted, which would require 
> DrillProjectMergeRule to go back to extending the ProjectMergeRule. 
> Please take a look at how CALCITE-2223 is eventually fixed (as of now it is 
> still not clear which fix is the way to go). Depending on the fix, we may 
> need additional work to integrate these changes.





[jira] [Updated] (DRILL-7220) Create a release package in Drill repo with automated scripts and instructions

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7220:

Reporter: Vova Vysotskyi  (was: Sorabh Hamirwasia)

> Create a release package in Drill repo with automated scripts and instructions
> --
>
> Key: DRILL-7220
> URL: https://issues.apache.org/jira/browse/DRILL-7220
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Vova Vysotskyi
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.17.0
>
>






[jira] [Updated] (DRILL-7209) Bundle parquet-tools Jar file with Apache Drill distribution

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7209:

Fix Version/s: (was: 1.17.0)

> Bundle parquet-tools Jar file with Apache Drill distribution
> 
>
> Key: DRILL-7209
> URL: https://issues.apache.org/jira/browse/DRILL-7209
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Metadata
>Reporter: Kunal Khatua
>Priority: Trivial
>
> It would be nice to have the parquet-tools JAR as part of the distribution, 
> so as to allow users to peek into the files' schema, etc.





[jira] [Updated] (DRILL-7221) Exclude debug files generated by maven debug option from jar

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7221:

Reporter: Vova Vysotskyi  (was: Sorabh Hamirwasia)

> Exclude debug files generated by maven debug option from jar
> 
>
> Key: DRILL-7221
> URL: https://issues.apache.org/jira/browse/DRILL-7221
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Reporter: Vova Vysotskyi
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.17.0
>
>
> The automated release script was using the -X debug option at the 
> release:prepare phase. This generated some debug files which were getting 
> packaged in the jars, because the patterns of these debug files were not 
> ignored in the exclude configuration of the maven-jar plugin. It would be 
> good to ignore these.
> *Debug files which were included:*
> *javac.sh*
> *org.codehaus.plexus.compiler.javac.JavacCompiler1256088670033285178arguments*
> *org.codehaus.plexus.compiler.javac.JavacCompiler1458111453480208588arguments*
> *org.codehaus.plexus.compiler.javac.JavacCompiler2392560589194600493arguments*
> *org.codehaus.plexus.compiler.javac.JavacCompiler4475905192586529595arguments*
> *org.codehaus.plexus.compiler.javac.JavacCompiler4524532450095901144arguments*
> *org.codehaus.plexus.compiler.javac.JavacCompiler4670895443631397937arguments*
> *org.codehaus.plexus.compiler.javac.JavacCompiler5215058338087807885arguments*
> *org.codehaus.plexus.compiler.javac.JavacCompiler7526103232425779297arguments*





[jira] [Assigned] (DRILL-7208) Drill commit is not shown if Drill is built from the 1.16.0-rc1 release sources.

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7208:
---

Assignee: Vova Vysotskyi

> Drill commit is not shown if Drill is built from the 1.16.0-rc1 release sources.
> -
>
> Key: DRILL-7208
> URL: https://issues.apache.org/jira/browse/DRILL-7208
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Anton Gozhiy
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> *Steps:*
>  # Download the rc1 sources tarball:
>  
> [apache-drill-1.16.0-src.tar.gz|http://home.apache.org/~sorabh/drill/releases/1.16.0/rc1/apache-drill-1.16.0-src.tar.gz]
>  # Unpack
>  # Build:
> {noformat}
> mvn clean install -DskipTests
> {noformat}
>  # Start Drill in embedded mode:
> {noformat}
> Linux:
> distribution/target/apache-drill-1.16.0/apache-drill-1.16.0/bin/drill-embedded
> Windows:
> distribution\target\apache-drill-1.16.0\apache-drill-1.16.0\bin\sqlline.bat 
> -u "jdbc:drill:zk=local"
> {noformat}
>  # Run the query:
> {code:sql}
> select * from sys.version;
> {code}
> *Expected result:*
>  Drill version, commit_id, commit_message, commit_time, build_email, 
> build_time should be correctly displayed.
> *Actual result:*
> {noformat}
> apache drill> select * from sys.version;
> +-+---++-+-++
> | version | commit_id | commit_message | commit_time | build_email | 
> build_time |
> +-+---++-+-++
> | 1.16.0  | Unknown   || | Unknown |  
>   |
> +-+---++-+-++
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7136:

Fix Version/s: (was: 1.17.0)

> Num_buckets for HashAgg in profile may be inaccurate
> 
>
> Key: DRILL-7136
> URL: https://issues.apache.org/jira/browse/DRILL-7136
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill
>
>
> I ran TPCH query 17 with sf 1000.  Here is the query:
> {noformat}
> select
>   sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem l,
>   part p
> where
>   p.p_partkey = l.l_partkey
>   and p.p_brand = 'Brand#13'
>   and p.p_container = 'JUMBO CAN'
>   and l.l_quantity < (
> select
>   0.2 * avg(l2.l_quantity)
> from
>   lineitem l2
> where
>   l2.l_partkey = p.p_partkey
>   );
> {noformat}
> One of the hash agg operators has resized 6 times.  It should have 4M 
> buckets.  But the profile shows it has 64K buckets.
> I have attached a sample profile.  In this profile, the hash agg operator is 
> (04-02).
> {noformat}
> Operator Metrics
> Minor FragmentNUM_BUCKETS NUM_ENTRIES NUM_RESIZING
> RESIZING_TIME_MSNUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB  
>   SPILL_CYCLE INPUT_BATCH_COUNT   AVG_INPUT_BATCH_BYTES   
> AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  
> AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT
> 04-00-02  65,536 748,746  6   364 1   
> 582 0   813 582,653 18  26,316,456  401 1,631,943 
>   25  26,176,350
> {noformat}
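The reported NUM_BUCKETS looks stale rather than arithmetically impossible: doubling the 64K initial buckets once per recorded resize lands exactly on the expected 4M. A quick sanity check (the doubling-per-resize assumption is ours; the profile only reports NUM_BUCKETS and NUM_RESIZING):

```python
# Hash tables commonly double their bucket count on each resize.
# Assumption (ours): HashAgg doubles per resize; the profile reports
# NUM_BUCKETS=65,536 and NUM_RESIZING=6 for operator 04-xx-02.
initial_buckets = 65_536
num_resizing = 6
expected_buckets = initial_buckets * 2 ** num_resizing
print(expected_buckets)  # 4194304, i.e. the ~4M buckets the reporter expects
```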



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7192) Drill limits rows when autoLimit is disabled

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7192:
---

Assignee: Vova Vysotskyi  (was: Kunal Khatua)

> Drill limits rows when autoLimit is disabled
> 
>
> Key: DRILL-7192
> URL: https://issues.apache.org/jira/browse/DRILL-7192
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> autoLimit for JDBC and REST clients was implemented in DRILL-7048.
> *Steps to reproduce the issue:*
>  1. Check that autoLimit is disabled; if not, disable it and restart Drill.
>  2. Submit any query and verify that the row count is correct, for example,
> {code:sql}
> SELECT * FROM cp.`employee.json`;
> {code}
> returns 1,155 rows
>  3. Enable autoLimit for the sqlLine client:
> {code:sql}
> !set rowLimit 10
> {code}
> 4. Submit the same query and verify that the result has 10 rows.
>  5. Disable autoLimit:
> {code:sql}
> !set rowLimit 0
> {code}
> 6. Submit the same query; this time, *it returns 10 rows instead of 
> 1,155*.
> The correct row count is returned only after creating a new connection.
> The same issue is also observed with the SQuirreL SQL client; Postgres, for 
> example, works correctly.
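The expected client-side semantics can be sketched as follows; a rowLimit of 0 must mean "no limit", not "keep the previous limit". This is a hypothetical model of the behavior, not Drill's actual client code:

```python
def apply_row_limit(rows, row_limit):
    """Hypothetical model of client-side autoLimit semantics.

    row_limit == 0 means "no limit". The bug reported here is that after
    `!set rowLimit 0` the previous non-zero limit kept being applied
    until a new connection was created.
    """
    if row_limit == 0:
        return list(rows)
    return list(rows)[:row_limit]

employees = range(1155)                      # stand-in for cp.`employee.json`
print(len(apply_row_limit(employees, 10)))   # 10
print(len(apply_row_limit(employees, 0)))    # 1155, not 10
```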



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7193) Integration changes of the Distributed RM queue configuration with Simple Parallelizer.

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7193:

Fix Version/s: (was: 1.17.0)

> Integration changes of the Distributed RM queue configuration with Simple 
> Parallelizer.
> ---
>
> Key: DRILL-7193
> URL: https://issues.apache.org/jira/browse/DRILL-7193
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>
> Refactoring fragment generation code for the RM to accommodate non-RM, 
> ZK-based queue RM, and Distributed RM.
> Calling the Distributed RM for queue selection based on memory requirements.
> Adjustment of the operator memory based on the memory limits of the selected 
> queue.
> Setting of the optimal memory allocation per operator in each minor fragment. 
> This shows up in the query profile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7192) Drill limits rows when autoLimit is disabled

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7192:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Drill limits rows when autoLimit is disabled
> 
>
> Key: DRILL-7192
> URL: https://issues.apache.org/jira/browse/DRILL-7192
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.18.0
>
>
> autoLimit for JDBC and REST clients was implemented in DRILL-7048.
> *Steps to reproduce the issue:*
>  1. Check that autoLimit is disabled; if not, disable it and restart Drill.
>  2. Submit any query and verify that the row count is correct, for example,
> {code:sql}
> SELECT * FROM cp.`employee.json`;
> {code}
> returns 1,155 rows
>  3. Enable autoLimit for the sqlLine client:
> {code:sql}
> !set rowLimit 10
> {code}
> 4. Submit the same query and verify that the result has 10 rows.
>  5. Disable autoLimit:
> {code:sql}
> !set rowLimit 0
> {code}
> 6. Submit the same query; this time, *it returns 10 rows instead of 
> 1,155*.
> The correct row count is returned only after creating a new connection.
> The same issue is also observed with the SQuirreL SQL client; Postgres, for 
> example, works correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7015) Improve documentation for PARTITION BY

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7015:

Fix Version/s: (was: 1.17.0)

> Improve documentation for PARTITION BY
> --
>
> Key: DRILL-7015
> URL: https://issues.apache.org/jira/browse/DRILL-7015
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Boaz Ben-Zvi
>Assignee: Bridget Bevens
>Priority: Minor
>
> The documentation for CREATE TABLE AS (CTAS) shows the syntax of the command, 
> without the optional PARTITION BY clause. That option is only mentioned later 
> under the usage notes.
> *+_Suggestion_+*: Add this optional clause to the syntax (same as for CREATE 
> TEMPORARY TABLE (CTTAS)), and mention that this option is only applicable 
> when storing in Parquet. 
> In the documentation for CREATE TEMPORARY TABLE (CTTAS), the comment says:
> {panel}
> An optional parameter that can *only* be used to create temporary tables with 
> the Parquet data format. 
> {panel}
> This can mistakenly be understood as "only for temporary tables". 
> *_+Suggestion+_*: erase the "to create temporary tables" part (not needed, as 
> it is implied from the context of this page).
> *_+Last suggestion+_*: In the documentation for the PARTITION BY clause, add 
> an example using the implicit column "filename" to demonstrate how the 
> partitioning column puts each distinct value into a separate file. For 
> example, add in the "Other Examples" section:
> {noformat}
> 0: jdbc:drill:zk=local> select distinct r_regionkey, filename from mytable1;
> +--++
> | r_regionkey  |filename|
> +--++
> | 2| 0_0_3.parquet  |
> | 1| 0_0_2.parquet  |
> | 0| 0_0_1.parquet  |
> | 3| 0_0_4.parquet  |
> | 4| 0_0_5.parquet  |
> +--++
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7191:

Fix Version/s: (was: 1.17.0)

> RM blobs persistence in Zookeeper for Distributed RM
> 
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning & Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Assignee: Sorabh Hamirwasia
>Priority: Major
>
> Changes to support storing a UUID for each Drillbit Service Instance locally, 
> to be used by the planner and execution layer. This UUID is used to uniquely 
> identify a Drillbit and register Drillbit information in the RM StateBlobs.
> Introduced a PersistentStore named ZookeeperTransactionalPersistenceStore 
> with transactional capabilities using Zookeeper Transactional APIs. This is 
> used for updating RM State blobs, as all the updates need to happen in a 
> transactional manner. Added RMStateBlobs definition and support for serde to 
> Zookeeper.
> Implementation of DistributedRM and its corresponding QueryRM APIs and state 
> management.
> Updated the state management of Query in Foreman so that the same Foreman 
> object can be submitted multiple times. Also introduced the concept of two 
> maps keeping track of waiting and running queries. This was done to support 
> the async admit protocol which will be needed with Distributed RM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6956) Maintain a single entry for Drill Version in the pom file

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6956:

Fix Version/s: (was: 1.17.0)

> Maintain a single entry for Drill Version in the pom file
> -
>
> Key: DRILL-6956
> URL: https://issues.apache.org/jira/browse/DRILL-6956
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>
> Currently, updating the version information for a Drill release involves 
> updating 30+ pom files.
> The right way would be to use the Multi Module Setup for Maven CI.
> https://maven.apache.org/maven-ci-friendly.html#Multi_Module_Setup
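The linked CI Friendly setup boils down to declaring the version once as `${revision}` in the root pom and inheriting it everywhere. A minimal sketch (artifact ids and the version value are illustrative):

```xml
<!-- Root pom: the version is defined in exactly one place. -->
<project>
  <groupId>org.apache.drill</groupId>
  <artifactId>drill-root</artifactId>
  <version>${revision}</version>
  <properties>
    <revision>1.17.0-SNAPSHOT</revision>
  </properties>
</project>
```

Child modules reference `${revision}` in their `<parent>` block and omit their own `<version>`; a release then needs only `-Drevision=1.17.0` or a single property change. The flatten-maven-plugin is required so that installed/deployed poms contain the resolved version.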



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6758) Hash Join should not return the join columns when they are not needed downstream

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6758:

Fix Version/s: (was: 1.17.0)

> Hash Join should not return the join columns when they are not needed 
> downstream
> 
>
> Key: DRILL-6758
> URL: https://issues.apache.org/jira/browse/DRILL-6758
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Minor
>
> Currently the Hash-Join operator returns all its (both sides) incoming 
> columns. In cases where the join columns are not used further downstream, 
> this is a waste (allocating vectors, copying each value, etc).
>   Suggestion: Have the planner pass this information to the Hash-Join 
> operator, to enable skipping the return of these columns.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6949) Query fails with "UNSUPPORTED_OPERATION ERROR: Hash-Join can not partition the inner data any further" when Semi join is enabled

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6949:

Fix Version/s: (was: 1.17.0)

> Query fails with "UNSUPPORTED_OPERATION ERROR: Hash-Join can not partition 
> the inner data any further" when Semi join is enabled
> 
>
> Key: DRILL-6949
> URL: https://issues.apache.org/jira/browse/DRILL-6949
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Ravi
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Attachments: 23cc1240-74ff-a0c0-8cd5-938fc136e4e2.sys.drill, 
> 23cc1369-0812-63ce-1861-872636571437.sys.drill
>
>
> The following query fails with *Error: UNSUPPORTED_OPERATION ERROR: 
> Hash-Join can not partition the inner data any further (probably due to too 
> many join-key duplicates)* on TPC-H SF100 data.
> {code:sql}
> set `exec.hashjoin.enable.runtime_filter` = true;
> set `exec.hashjoin.runtime_filter.max.waiting.time` = 1;
> set `planner.enable_broadcast_join` = false;
> select
>  count(*)
> from
>  lineitem l1
> where
>  l1.l_discount IN (
>  select
>  distinct(cast(l2.l_discount as double))
>  from
>  lineitem l2);
> reset `exec.hashjoin.enable.runtime_filter`;
> reset `exec.hashjoin.runtime_filter.max.waiting.time`;
> reset `planner.enable_broadcast_join`;
> {code}
> The subquery contains the *distinct* keyword and hence there should not be 
> duplicate values. 
> I suspect that the failure is caused by semijoin because the query succeeds 
> when semijoin is disabled explicitly.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6767) Simplify transfer of information from the planner to the operators

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6767:

Fix Version/s: (was: 1.17.0)

> Simplify transfer of information from the planner to the operators
> --
>
> Key: DRILL-6767
> URL: https://issues.apache.org/jira/browse/DRILL-6767
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Minor
>
> Currently little specific information known to the planner is passed to the 
> operators. For example, see the `joinType` parameter passed to the Join 
> operators (specifying whether this is a LEFT, RIGHT, INNER or FULL join). 
>  The relevant code passes this information explicitly via the constructors' 
> signature (e.g., see HashJoinPOP, AbstractJoinPop, etc), and uses specific 
> fields for this information, and affects all the test code using it, etc.
>  In the near future many more such "pieces of information" will possibly be 
> added to Drill, including:
>  (1) Is this a Semi (or Anti-Semi) join.
>  (2) `joinControl`
>  (3) `isRowKeyJoin`
>  (4) `isBroadcastJoin`
>  (5) Which join columns are not needed (DRILL-6758)
>  (6) Is this operator positioned between Lateral and UnNest.
>  (7) For Hash-Agg: Which phase (already implemented).
>  (8) For Hash-Agg: Perform COUNT  (DRILL-6836) 
> Each addition of such information would require a significant code change, 
> and add some code clutter.
> *Suggestion*: Instead pass a single object containing all the needed planner 
> information. So the next time another field is added, only that object needs 
> to be changed. (Ideally the whole plan could be passed, and then each 
> operator could poke and pick its needed fields)
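The suggestion amounts to a parameter-object pattern: bundle the planner-derived flags into one context object so that adding a flag changes one class instead of every operator constructor. A minimal sketch (all names are hypothetical, not Drill's actual API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JoinPlanInfo:
    """Hypothetical bundle of planner-derived join information."""
    join_type: str = "INNER"           # LEFT, RIGHT, INNER or FULL
    is_semi_join: bool = False
    is_row_key_join: bool = False
    is_broadcast_join: bool = False
    unneeded_join_columns: tuple = ()  # cf. DRILL-6758

def create_hash_join(plan_info: JoinPlanInfo):
    # The operator picks the fields it needs from the single object
    # instead of taking each flag as a separate constructor argument.
    return {"operator": "hash-join",
            "type": plan_info.join_type,
            "semi": plan_info.is_semi_join}

op = create_hash_join(JoinPlanInfo(join_type="LEFT"))
print(op["type"])  # LEFT
```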



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6728) DRILL-4864 - doc udfs for date, time, timestamp functions

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6728:

Fix Version/s: (was: 1.17.0)

> DRILL-4864 - doc udfs for date, time, timestamp functions
> -
>
> Key: DRILL-6728
> URL: https://issues.apache.org/jira/browse/DRILL-6728
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-6794) Document the JDBC properties required to retrieve result sets in batches while querying large tables

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6794:

Fix Version/s: (was: 1.17.0)

> Document the JDBC properties required to retrieve result sets in batches 
> while querying large tables
> 
>
> Key: DRILL-6794
> URL: https://issues.apache.org/jira/browse/DRILL-6794
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Rahul Raj
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-impacting
>
> Document the JDBC properties required to retrieve result sets in batches 
> while querying large tables
> Querying large tables using the JDBC plugin causes OOM because most JDBC 
> drivers cache the entire result set at the client by default.
> To avoid this, additional parameters need to be specified in the JDBC 
> connection string so that the driver fetches records in batches and reloads 
> when exhausted.
> For the Postgres driver, set autocommit mode to false: 
> jdbc:postgresql://url:port/schema?defaultAutoCommit=false
> Links
> [1] https://issues.apache.org/jira/browse/DRILL-4177
> [2] https://jdbc.postgresql.org/documentation/93/query.html#fetchsize-example
> [3] https://www.postgresql.org/docs/9.3/static/ecpg-sql-set-autocommit.html
> [4] https://jdbc.postgresql.org/documentation/head/ds-cpds.htm
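The memory difference between the two driver behaviors can be sketched generically: caching materializes every row up front, while a fetch-size cursor holds at most one batch at a time. This is an illustrative model only, not the Postgres driver's internals:

```python
def fetch_all(result_set):
    # Default behavior of many JDBC drivers: the whole result set is
    # materialized at the client before the first row is returned.
    return list(result_set)

def fetch_in_batches(result_set, fetch_size):
    # With autocommit off and a fetch size set, the driver holds only
    # one batch of rows in memory at a time.
    batch = []
    for row in result_set:
        batch.append(row)
        if len(batch) == fetch_size:
            yield batch
            batch = []
    if batch:
        yield batch

rows = iter(range(10))
batches = list(fetch_in_batches(rows, 4))
print([len(b) for b in batches])  # [4, 4, 2]
```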



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7359:

Fix Version/s: (was: 1.18.0)
   1.17.0

> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7359:

Fix Version/s: (was: 1.17.0)
   1.18.0

> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7391) Wrong result when doing left outer join on CSV table

2019-11-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966737#comment-16966737
 ] 

ASF GitHub Bot commented on DRILL-7391:
---

vvysotskyi commented on pull request #1890: DRILL-7391: Wrong result when doing 
left outer join on CSV table
URL: https://github.com/apache/drill/pull/1890
 
 
   Jira: [DRILL-7391](https://issues.apache.org/jira/browse/DRILL-7391).
   
   Actual fixes were done in Calcite 
([CALCITE-3390](https://issues.apache.org/jira/browse/CALCITE-3390), 
[CALCITE-3457](https://issues.apache.org/jira/browse/CALCITE-3457)), this PR 
just adds unit test and updates Calcite version to include these fixes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Wrong result when doing left outer join on CSV table
> 
>
> Key: DRILL-7391
> URL: https://issues.apache.org/jira/browse/DRILL-7391
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Aman Sinha
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: tt5.tar.gz, tt6.tar.gz
>
>
> The following query shows 1 row that is incorrect.  For the non-null rows, 
> both columns should have the same value.  This is on CSV sample data (I will 
> attach the files). 
> {noformat}
> apache drill (dfs.tmp)> select tt5.columns[0], tt6.columns[0] from tt5 left 
> outer join tt6  on tt5.columns[0] = tt6.columns[0];
> +++
> | EXPR$0 | EXPR$1 |
> +++
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 580| null   |
> |    | null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 580| null   |
> | 6767   | null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 6767   | null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 9669   | 1414   |  <--- Wrong result
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 580| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 409| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 555| null   |
> +++
> 75 rows selected 
> {noformat}
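For reference, correct left-outer-join semantics on the join key require that every non-null right-hand value equal the left-hand key it matched. A toy model, unrelated to Drill's hash-join implementation, makes the invariant explicit:

```python
def left_outer_join_keys(left_keys, right_keys):
    """Toy left outer join on a single key column: every left key is
    emitted; the right side is the same key if matched, else None."""
    right_set = set(right_keys)
    return [(k, k if k in right_set else None) for k in left_keys]

result = left_outer_join_keys(["9669", "1414", "455"], ["1414"])
print(result)  # [('9669', None), ('1414', '1414'), ('455', None)]
# The reported bug: the row ('9669', '1414') appeared, i.e. a non-null
# right value differing from the left key, which is impossible for a
# correct equi-join.
```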



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7391) Wrong result when doing left outer join on CSV table

2019-11-04 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7391:

Reviewer: Igor Guzenko

> Wrong result when doing left outer join on CSV table
> 
>
> Key: DRILL-7391
> URL: https://issues.apache.org/jira/browse/DRILL-7391
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Aman Sinha
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: tt5.tar.gz, tt6.tar.gz
>
>
> The following query shows 1 row that is incorrect.  For the non-null rows, 
> both columns should have the same value.  This is on CSV sample data (I will 
> attach the files). 
> {noformat}
> apache drill (dfs.tmp)> select tt5.columns[0], tt6.columns[0] from tt5 left 
> outer join tt6  on tt5.columns[0] = tt6.columns[0];
> +++
> | EXPR$0 | EXPR$1 |
> +++
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 580| null   |
> |    | null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 580| null   |
> | 6767   | null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 6767   | null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 9669   | 1414   |  <--- Wrong result
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 580| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 409| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 555| null   |
> +++
> 75 rows selected 
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7391) Wrong result when doing left outer join on CSV table

2019-11-04 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7391:

Labels: ready-to-commit  (was: )

> Wrong result when doing left outer join on CSV table
> 
>
> Key: DRILL-7391
> URL: https://issues.apache.org/jira/browse/DRILL-7391
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Aman Sinha
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
> Attachments: tt5.tar.gz, tt6.tar.gz
>
>
> The following query shows 1 row that is incorrect.  For the non-null rows, 
> both columns should have the same value.  This is on CSV sample data (I will 
> attach the files). 
> {noformat}
> apache drill (dfs.tmp)> select tt5.columns[0], tt6.columns[0] from tt5 left 
> outer join tt6  on tt5.columns[0] = tt6.columns[0];
> +++
> | EXPR$0 | EXPR$1 |
> +++
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 580| null   |
> |    | null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 580| null   |
> | 6767   | null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 6767   | null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 9669   | 1414   |  <--- Wrong result
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 580| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 409| null   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 455| null   |
> | 455| null   |
> | 555| null   |
> | 455| null   |
> | 555| null   |
> | 1414   | 1414   |
> | 455| null   |
> | 555| null   |
> | 555| null   |
> | 555| null   |
> +++
> 75 rows selected 
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7372) MethodAnalyzer consumes too much memory

2019-11-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966800#comment-16966800
 ] 

ASF GitHub Bot commented on DRILL-7372:
---

vvysotskyi commented on issue #1887: DRILL-7372: MethodAnalyzer consumes too 
much memory
URL: https://github.com/apache/drill/pull/1887#issuecomment-549433415
 
 
   @paul-rogers, unfortunately, I didn't see the results of these performance 
measurements, but I believe that there are some corner cases where our byte 
code analysis will bring some benefit, for example, when the generated method 
is too large for JVM optimizations and general time for query execution is much 
larger compared to the time required for producing scalar replacement and 
classes merging.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> MethodAnalyzer consumes too much memory
> ---
>
> Key: DRILL-7372
> URL: https://issues.apache.org/jira/browse/DRILL-7372
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> In the scope of DRILL-6524, logic was added for determining whether a variable 
> is assigned in a conditional block, to prevent incorrect scalar replacement in 
> such cases.
> But for some queries, this logic consumes too much memory; for example, for 
> the following query:
> {code:sql}
> SELECT *
> FROM cp.`employee.json`
> WHERE employee_id+0 < employee_id
>   OR employee_id+1 < employee_id
>   AND employee_id+2 < employee_id
>   OR employee_id+3 < employee_id
>   AND employee_id+4 < employee_id
>   OR employee_id+5 < employee_id
>   AND employee_id+6 < employee_id
>   OR employee_id+7 < employee_id
>   AND employee_id+8 < employee_id
>   OR employee_id+9 < employee_id
>   AND employee_id+10 < employee_id
>   OR employee_id+11 < employee_id
>   AND employee_id+12 < employee_id
>   OR employee_id+13 < employee_id
>   AND employee_id+14 < employee_id
>   OR employee_id+15 < employee_id
>   AND employee_id+16 < employee_id
>   OR employee_id+17 < employee_id
>   AND employee_id+18 < employee_id
>   OR employee_id+19 < employee_id
>   AND employee_id+20 < employee_id
>   OR employee_id+21 < employee_id
>   AND employee_id+22 < employee_id
>   OR employee_id+23 < employee_id
>   AND employee_id+24 < employee_id
>   OR employee_id+25 < employee_id
>   AND employee_id+26 < employee_id
>   OR employee_id+27 < employee_id
>   AND employee_id+28 < employee_id
>   OR employee_id+29 < employee_id
>   AND employee_id+30 < employee_id
>   OR employee_id+31 < employee_id
>   AND employee_id+32 < employee_id
>   OR employee_id+33 < employee_id
>   AND employee_id+34 < employee_id
>   OR employee_id+35 < employee_id
>   AND employee_id+36 < employee_id
>   OR employee_id+37 < employee_id
>   AND employee_id+38 < employee_id
>   OR employee_id+39 < employee_id
>   AND employee_id+40 < employee_id
>   OR employee_id+41 < employee_id
>   AND employee_id+42 < employee_id
>   OR employee_id+43 < employee_id
>   AND employee_id+44 < employee_id
>   OR employee_id+45 < employee_id
>   AND employee_id+46 < employee_id
>   OR employee_id+47 < employee_id
>   AND employee_id+48 < employee_id
>   OR employee_id+49 < employee_id
>   AND TRUE;
> {code}
> Drill consumes more than 6 GB of memory.
> One of the issues to fix is to replace {{Deque> 
> localVariablesSet;}} with {{Deque}}; this will reduce memory usage 
> significantly.
> Additionally, it should be investigated why these objects cannot be collected 
> by GC.
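The proposed memory fix, one primitive mask per conditional block instead of a boxed set of variable indices, can be sketched like this (names are hypothetical; Python ints stand in for Java BitSet/long masks):

```python
class ConditionalAssignTracker:
    """Hypothetical tracker of which local-variable slots are assigned
    inside the currently open conditional blocks. One integer bit mask
    per block replaces one Set object per block."""

    def __init__(self):
        self.masks = []               # stack of bit masks, one per block

    def enter_block(self):
        self.masks.append(0)

    def record_assignment(self, var_index):
        if self.masks:
            self.masks[-1] |= 1 << var_index

    def exit_block(self):
        mask = self.masks.pop()
        if self.masks:
            self.masks[-1] |= mask    # fold into the enclosing block
        return mask

t = ConditionalAssignTracker()
t.enter_block()
t.record_assignment(3)
t.enter_block()
t.record_assignment(7)
inner = t.exit_block()                # only var 7 assigned here
outer = t.exit_block()                # vars 3 and 7 assigned conditionally
print(bin(inner), bin(outer))         # 0b10000000 0b10001000
```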



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7416) Updates required to dependencies to resolve potential security vulnerabilities

2019-11-04 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966828#comment-16966828
 ] 

Charles Givre commented on DRILL-7416:
--

Hi [~bradpark], [~arina].  Is there any low hanging fruit that we could get 
into version 1.17?

> Updates required to dependencies to resolve potential security 
> vulnerabilities 
> ---
>
> Key: DRILL-7416
> URL: https://issues.apache.org/jira/browse/DRILL-7416
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Bradley Parker
>Assignee: Bradley Parker
>Priority: Critical
>  Labels: security
>
> After running an OWASP Dependency Check and ruling out false positives, I 
> have found 25 dependencies that should be updated to remove potential 
> vulnerabilities. They are listed alphabetically with their CVE information 
> below.
>  
> [CVSS 
> scores|https://en.wikipedia.org/wiki/Common_Vulnerability_Scoring_System] 
> represent the severity of a vulnerability on a scale of 1-10, 10 being 
> critical. 
> [CVEs|https://en.wikipedia.org/wiki/Common_Vulnerabilities_and_Exposures] are 
> public identifiers used to reference known vulnerabilities. 
>  
> Package: avro-1.8.2
> Should be: 1.9.0 (*Existing item at* *DRILL-7302*)
> Max CVE (CVSS): CVE-2018-10237 (5.9)
> Complete CVE list: CVE-2018-10237
> Package: commons-beanutils-1.9.2
> Should be: 1.9.4
> Max CVE (CVSS): CVE-2019-10086 (7.3)
> Complete CVE list: CVE-2019-10086
> Package: commons-beanutils-core-1.8.0
> Should be: Moved to commons-beanutils
> Max CVE (CVSS): CVE-2014-0114 (7.5)
> Complete CVE list: CVE-2014-0114. Deprecated, replaced by commons-beanutils
> Package: converter-jackson
> Should be: 2.5.0
> Max CVE (CVSS): CVE-2018-1000850 (7.5)
> Complete CVE list: CVE-2018-1000850
> Package: derby-10.10.2.0
> Should be: 10.14.2.0
> Max CVE (CVSS): CVE-2015-1832 (9.1)
> Complete CVE list: CVE-2015-1832
> CVE-2018-1313
> Package: drill-hive-exec-shaded
> Should be: New release needed with updated Guava
> Max CVE (CVSS): CVE-2018-10237 (7.5)
> Complete CVE list: CVE-2018-10237
> Package: drill-java-exec
> Should be: New release needed with updated jQuery and Bootstrap
> Max CVE (CVSS): CVE-2019-11358 (6.1)
> Complete CVE list: CVE-2018-14040
> CVE-2018-14041 
> CVE-2018-14042
> CVE-2019-8331
> CVE-2019-11358
> Package: drill-shaded-guava-23
> Should be: New release needed with updated Guava
> Max CVE (CVSS): CVE-2018-10237 (5.9)
> Complete CVE list: CVE-2018-10237
> Package: guava-19.0
> Should be: 24.1.1
> Max CVE (CVSS): CVE-2018-10237 (5.9)
> Complete CVE list: CVE-2018-10237
> Package: hadoop-yarn-common-2.7.4
> Should be: 3.2.1
> Max CVE (CVSS): CVE-2019-11358 (6.1)
> Complete CVE list: CVE-2012-6708
> CVE-2015-9251
> CVE-2019-11358
> CVE-2010-5312
> CVE-2016-7103
> Package: hbase-http-2.1.1.jar 
> Should be: 2.1.4
> Max CVE (CVSS): CVE-2019-0212 (7.5)
> Complete CVE list: CVE-2019-0212
> Package: httpclient-4.2.5.jar
> Should be: 4.3.6
> Max CVE (CVSS): CVE-2014-3577  (5.8)
> Complete CVE list: CVE-2014-3577
> CVE-2015-5262
> Package: jackson-databind-2.9.5
> Should be: 2.10.0
> Max CVE (CVSS): CVE-2018-14721  (10)
> Complete CVE list: CVE-2019-17267
> CVE-2019-16943
> CVE-2019-16942
> CVE-2019-16335
> CVE-2019-14540
> CVE-2019-14439
> CVE-2019-14379
> CVE-2018-11307
> CVE-2019-12384
> CVE-2019-12814
> CVE-2019-12086
> CVE-2018-12023
> CVE-2018-12022
> CVE-2018-19362
> CVE-2018-19361
> CVE-2018-19360
> CVE-2018-14721
> CVE-2018-14720
> CVE-2018-14719
> CVE-2018-14718
> CVE-2018-1000873
> Package: jetty-server-9.3.25.v20180904.jar (*Existing DRILL-7135, but that's 
> to go to 9.4 and it's blocked, we should go to latest 9.3 in the meantime*)
> Should be: 9.3.27.v20190418
> Max CVE (CVSS): CVE-2017-9735 (7.5)
> Complete CVE list: CVE-2017-9735
> CVE-2019-10241
> CVE-2019-10247
> Package: Kafka 0.11.0.1
> Should be: 2.2.0 (*Existing item DRILL-6739*)
> Max CVE (CVSS): CVE-2018-17196 (8.8)
> Complete CVE list: CVE-2018-17196
> CVE-2018-1288
> CVE-2017-12610
> Package: kudu-client-1.3.0.jar 
> Should be: 1.10.0
> Max CVE (CVSS): CVE-2015-5237  (8.8)
> Complete CVE list: CVE-2018-10237
> CVE-2015-5237
> CVE-2019-16869. Only a partial fix: no fix for netty CVE-2019-16869 (7.5); 
> kudu still needs to update its netty (this is not unexpected, as this CVE is 
> newer)
> Package: libfb303-0.9.3.jar
> Should be: 0.12.0
> Max CVE (CVSS): CVE-2018-1320 (7.5)
> Complete CVE list: CVE-2018-1320. Moved to libthrift
> Package: okhttp-3.3.0
> Should be: 3.12.0
> Max CVE (CVSS): CVE-2018-20200 (5.9)
> Complete CVE list: CVE-2018-20200
> Package: protobuf-java-2.5.0
> Should be: 3.4.0
> Max CVE (CVSS): CVE-2015-5237  (8.8)
> Complete CVE list: CVE-2015-5237 
> Package: retrofit-2.1.0
> Should be: 2.5.0
> Max CVE

[jira] [Commented] (DRILL-7244) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-11-04 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966877#comment-16966877
 ] 

Vova Vysotskyi commented on DRILL-7244:
---

[~ben-zvi], I cannot reproduce the class cast exception with the try-catch 
blocks removed. Could you please provide the steps for observing this issue?

> Run-time rowgroup pruning match() fails on casting a Long to an Integer
> ---
>
> Key: DRILL-7244
> URL: https://issues.apache.org/jira/browse/DRILL-7244
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Affects Versions: 1.17.0
>Reporter: Boaz Ben-Zvi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> See DRILL-7240, where a temporary workaround was created: skipping pruning 
> (and logging) instead of this failure. 
> After a Parquet table is refreshed with selected "interesting" columns, a 
> query whose WHERE clause contains a condition on a "non-interesting" INT64 
> column fails during run-time pruning (calling match()) with:
> {noformat}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
> {noformat}
> A long term solution is to pass the whole (or the relevant part of the) 
> schema to the runtime, instead of just passing the "interesting" columns.
>  
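A hedged sketch of the scenario described above (the table and column names 
are hypothetical, and the COLUMNS clause is assumed from the Drill 1.16 
metadata-refresh syntax; verify against your Drill version):

{code:sql}
-- Collect metadata only for the "interesting" column a; the INT64 column b
-- is left "non-interesting" (hypothetical table and columns).
REFRESH TABLE METADATA COLUMNS (a) dfs.tmp.`t`;

-- A WHERE condition on the non-interesting INT64 column b is the kind of
-- query that hit the ClassCastException during run-time pruning.
SELECT count(*) FROM dfs.tmp.`t` WHERE b = 1;
{code}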





[jira] [Commented] (DRILL-7416) Updates required to dependencies to resolve potential security vulnerabilities

2019-11-04 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966901#comment-16966901
 ] 

Arina Ielchiieva commented on DRILL-7416:
-

[~cgivre] If a PR is contributed in time for 1.17, it can certainly be added 
to the release. For now, since I don't see any activity, I have removed the 
1.17 tag to keep a clear picture of what is going to be done in 1.17.


[jira] [Commented] (DRILL-7416) Updates required to dependencies to resolve potential security vulnerabilities

2019-11-04 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966953#comment-16966953
 ] 

Vova Vysotskyi commented on DRILL-7416:
---

{{commons-beanutils}}/{{commons-beanutils-core}} come in as transitive 
dependencies, mostly from {{hadoop-common}}, so we need to verify that 
{{hadoop-common}} works correctly with the newer version.
{{converter-jackson}} - it should be checked that the opentsdb storage plugin 
works correctly with this version of the library. Perhaps the {{retrofit}} 
version should also be updated.
{{derby}} - used by {{hive-metastore}}, so we also need to verify that it 
works correctly with the newer version.
{{drill-shaded-guava-23}} - OK.
{{guava-19.0}} - we cannot update it, since most of the projects use much 
older versions that are only partially compatible with 19. For example, the 
current HBase version still uses Guava 11 and wouldn't work with versions 
newer than 19.
{{hadoop-yarn-common}} - should be updated in the scope of DRILL-6540.
{{jackson-databind}} - definitely should be updated, perhaps together with the 
other Jackson libraries.
{{httpclient}} - comes in as a transitive dependency, mostly from 
{{hadoop-common}}/{{hive}}/{{hbase}}, so we need to verify that these 
libraries work correctly with the newer version.
TBA


[jira] [Commented] (DRILL-7244) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-11-04 Thread Boaz Ben-Zvi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967118#comment-16967118
 ] 

Boaz Ben-Zvi commented on DRILL-7244:
-

I just retried the original example (see DRILL-7240) with recent code (~Oct 
20) and a debugger: a breakpoint was indeed hit inside the "catch 
(ClassCastException cce)" clause (line 204 of 
AbstractParquetScanBatchCreator.java), and the log does note this event 
correctly:
{noformat}
2019-11-04 16:23:15,378 [223f3f7f-bcd5-8632-94e9-842495cdfd7d:frag:0:0] INFO 
o.a.d.e.s.p.AbstractParquetScanBatchCreator - Finished parquet_runtime_pruning 
in 45444993 usec. Out of given 2 rowgroups, 0 were pruned. 
2019-11-04 16:23:15,379 [223f3f7f-bcd5-8632-94e9-842495cdfd7d:frag:0:0] INFO 
o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning skipped for 1 
out of 2 rowgroups due to: java.lang.Integer cannot be cast to 
java.lang.Long{noformat}
There are some newer code changes there (from DRILL-4517 and DRILL-7314), but 
they do not seem to matter for this issue.

 

  






[jira] [Updated] (DRILL-6949) Query fails with "UNSUPPORTED_OPERATION ERROR: Hash-Join can not partition the inner data any further" when Semi join is enabled

2019-11-04 Thread Boaz Ben-Zvi (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-6949:

Fix Version/s: Future

> Query fails with "UNSUPPORTED_OPERATION ERROR: Hash-Join can not partition 
> the inner data any further" when Semi join is enabled
> 
>
> Key: DRILL-6949
> URL: https://issues.apache.org/jira/browse/DRILL-6949
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Ravi
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: Future
>
> Attachments: 23cc1240-74ff-a0c0-8cd5-938fc136e4e2.sys.drill, 
> 23cc1369-0812-63ce-1861-872636571437.sys.drill
>
>
> The following query fails with *Error: UNSUPPORTED_OPERATION ERROR: 
> Hash-Join can not partition the inner data any further (probably due to too 
> many join-key duplicates)* on TPC-H SF100 data.
> {code:sql}
> set `exec.hashjoin.enable.runtime_filter` = true;
> set `exec.hashjoin.runtime_filter.max.waiting.time` = 1;
> set `planner.enable_broadcast_join` = false;
> select
>  count(*)
> from
>  lineitem l1
> where
>  l1.l_discount IN (
>  select
>  distinct(cast(l2.l_discount as double))
>  from
>  lineitem l2);
> reset `exec.hashjoin.enable.runtime_filter`;
> reset `exec.hashjoin.runtime_filter.max.waiting.time`;
> reset `planner.enable_broadcast_join`;
> {code}
> The subquery contains the *distinct* keyword, hence there should not be 
> duplicate values. 
> I suspect that the failure is caused by semijoin because the query succeeds 
> when semijoin is disabled explicitly.
>  
>  





[jira] [Commented] (DRILL-6949) Query fails with "UNSUPPORTED_OPERATION ERROR: Hash-Join can not partition the inner data any further" when Semi join is enabled

2019-11-04 Thread Boaz Ben-Zvi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967156#comment-16967156
 ] 

Boaz Ben-Zvi commented on DRILL-6949:
-

The two profiles clearly show that this is a fundamental issue: there are only 
561 distinct values of "l_discount" out of 600M rows. The new "efficient 
semi-join" was designed on the assumption that the build side has a relatively 
small number of duplicates. The new implementation saves the cost of the 
hash-aggregation, but then keeps *ALL* the build-side rows. In a case like 
this query, the build side of the hash-join balloons to a huge size, which 
causes massive spilling, leading to secondary/tertiary/... spills that try to 
subdivide the partitions by key; but those keys are duplicates, so the 
subdivision does not help (as the error message explains).

Such an extreme situation is not expected in real-world queries. The only 
workaround for such an unusual query, as [~agirish] suggested, is to disable 
the new semi-join, so that the Hash-Aggregate eliminates the duplicates.

A longer-term solution is to use statistics in the planner to detect such a 
case, so the planner can disable the new semi-join automatically.
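The suggested workaround can be sketched as follows (the option name 
{{planner.enable_semijoin}} is assumed from the Drill 1.15 semi-join feature; 
verify it against {{sys.options}} on your build before relying on it):

{code:sql}
-- Disable the semi-join rewrite so the planner keeps the Hash-Aggregate,
-- which deduplicates the build side before the Hash-Join (assumed option name).
set `planner.enable_semijoin` = false;

-- ... run the failing IN-subquery from the description ...

reset `planner.enable_semijoin`;
{code}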

 





