[jira] [Updated] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-23 Thread James Turton (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Turton updated DRILL-8283:

Description: 
Currently a malicious or merely unwitting user can crash their Drill foreman by 
sending
{code:java}
select * from dfs.huge_workspace limit 10
{code}
causing the query planner to recurse over every file in huge_workspace and 
culminating in
{code:java}
2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in Foreman.
java.lang.OutOfMemoryError: null
{code}
if there are enough files in huge_workspace. A SHOW FILES command can produce 
the same effect. This issue proposes a new BOOT option named 
drill.exec.storage.file.recursive_listing_max_size with a default value of, 
say, 10 000. If a file listing task exceeds this limit then the initiating 
operation is terminated with a UserException, preventing runaway resource usage.

  was:
Currently a malicious, or merely an unwitting user can crash their Drill 
foreman by sending
{code:java}
select * from dfs.huge_workspace limit 10
{code}
causing the query planner to recurse over every file in huge_workspace and 
culminating in
{code:java}
2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in 
Foreman.java.lang.OutOfMemoryError: null {code}
if there are enough files in huge_workspace. A SHOW FILES command can produce 
the same effect. This issue proposes a new BOOT option named 
drill.exec.storage.file.max_listing_size with a default value of, say 10 000. 
If a file listing task exceeds this limit then the current operation is 
terminated with a UserException and runaway resource usage is prevented.


> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say, 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException, preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-23 Thread James Turton (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Turton updated DRILL-8283:

Summary: Add a configurable recursive file listing size limit  (was: 
Implement a configurable file listing size limit)

> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious, or merely an unwitting user can crash their Drill 
> foreman by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.max_listing_size with a default value of, say 10 000. 
> If a file listing task exceeds this limit then the current operation is 
> terminated with a UserException and runaway resource usage is prevented.





[jira] [Closed] (DRILL-8284) Apache SQL Query failing while accessing the Json with complex data model

2022-08-23 Thread Charles Givre (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre closed DRILL-8284.

Resolution: Not A Bug

> Apache SQL Query failing while accessing the Json with complex data model
> -
>
> Key: DRILL-8284
> URL: https://issues.apache.org/jira/browse/DRILL-8284
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: SHUBHAM KUMAR
>Priority: Major
>
> Apache SQL Query failing while accessing the Json with complex data model. 
> Complex Json: 
> Map object inside another map object then Array Object. 
> Case1: When we have nested objects within array map, and map within map. 
> {"attributes": [
>                     {
>                         "name": "webBrandName",
>                         "value": {
>                             "en-US": "Smashbox"
>                         }
>                     },
>                     {
>                         "name": "startDate",
>                         "value": "2011-07-25T15:30:00.000Z"
>                     }
>                 ]
> }
> Case2: Having array with multiple map items with diff data types. eg. String 
> and Boolean both type. 
> {"attributes": [
>                     {
>                         "name": "startDate",
>                         "value": "2011-07-25T15:30:00.000Z"
>                     },
>                     {
>                         "name": "hasCBD",
>                         "value": false
>                     }
>                 ]
> }
> Query: 
> select flatten(attributes) as Var from dfs.`/filepath/filename.json`
>  
> Error: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IndexOutOfBoundsException: readerIndex: 0, writerIndex: 1764642048 (expected: 
> 0 <= readerIndex <= writerIndex <= capacity(0)) Fragment: 0:0 Please, refer 
> to logs for more information. [Error Id: c5a3b8fa-cad1-4c9a-8673-de5745e9170b 
> on GGNUWT461535L.ad.infosys.com:31010]
>  





[jira] [Commented] (DRILL-8284) Apache SQL Query failing while accessing the Json with complex data model

2022-08-23 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583946#comment-17583946
 ] 

Charles Givre commented on DRILL-8284:
--

[~shubhamsmvdu] This is normal behavior for Drill.  The issue you are 
encountering is a schema change exception on the `value` field.  In both cases, 
Drill first encounters one data type for the field and creates a vector for 
it, then in the next row encounters the same field with a different data type 
and throws an exception. 
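
The behavior described above can be illustrated with a toy model in plain Java. This is only a sketch of the "first type seen fixes the column type" idea, not Drill's vector code: the class `FirstSeenTypes`, its `SchemaChangeException`, and the method `inferTypes` are hypothetical names, and rows are modeled as simple maps with non-null values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from the Drill code base.
class FirstSeenTypes {

  // Hypothetical stand-in for a schema change error.
  static class SchemaChangeException extends RuntimeException {
    SchemaChangeException(String message) {
      super(message);
    }
  }

  // Records the Java type first seen for each field and rejects any later
  // row whose value for that field has a different type.
  // Assumes every value is non-null.
  static Map<String, Class<?>> inferTypes(List<Map<String, Object>> rows) {
    Map<String, Class<?>> types = new LinkedHashMap<>();
    for (Map<String, Object> row : rows) {
      for (Map.Entry<String, Object> field : row.entrySet()) {
        Class<?> seen = field.getValue().getClass();
        // putIfAbsent returns the previously fixed type, or null if new.
        Class<?> fixed = types.putIfAbsent(field.getKey(), seen);
        if (fixed != null && fixed != seen) {
          throw new SchemaChangeException(
              "Field '" + field.getKey() + "' was " + fixed.getSimpleName()
                  + " but is now " + seen.getSimpleName());
        }
      }
    }
    return types;
  }
}
```

Feeding this the two `attributes` entries from Case 2 raises the exception at the second entry, because `value` is first a String and then a Boolean.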

There are a few options:
 #  If you use the V1 JSON reader, you can enable the UNION data type, which 
allows heterogeneous data types.  We are working on enabling this for the V2 
JSON reader, but for the moment it is not supported there.  This option must 
be set at the system level.
 # Provide a schema:  You can provide a schema for the field `value` and set 
`mode` to JSON.  I'd have to dig up the documentation for this, but what this 
does is force the field to a string.  If JSON objects are encountered, those 
will be rendered as strings. 

I'm going to close this as this is expected behavior.  Please use GitHub issues 
or Slack to continue the conversation. 

> Apache SQL Query failing while accessing the Json with complex data model
> -
>
> Key: DRILL-8284
> URL: https://issues.apache.org/jira/browse/DRILL-8284
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: SHUBHAM KUMAR
>Priority: Major
>
> Apache SQL Query failing while accessing the Json with complex data model. 
> Complex Json: 
> Map object inside another map object then Array Object. 
> Case1: When we have nested objects within array map, and map within map. 
> {"attributes": [
>                     {
>                         "name": "webBrandName",
>                         "value": {
>                             "en-US": "Smashbox"
>                         }
>                     },
>                     {
>                         "name": "startDate",
>                         "value": "2011-07-25T15:30:00.000Z"
>                     }
>                 ]
> }
> Case2: Having array with multiple map items with diff data types. eg. String 
> and Boolean both type. 
> {"attributes": [
>                     {
>                         "name": "startDate",
>                         "value": "2011-07-25T15:30:00.000Z"
>                     },
>                     {
>                         "name": "hasCBD",
>                         "value": false
>                     }
>                 ]
> }
> Query: 
> select flatten(attributes) as Var from dfs.`/filepath/filename.json`
>  
> Error: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IndexOutOfBoundsException: readerIndex: 0, writerIndex: 1764642048 (expected: 
> 0 <= readerIndex <= writerIndex <= capacity(0)) Fragment: 0:0 Please, refer 
> to logs for more information. [Error Id: c5a3b8fa-cad1-4c9a-8673-de5745e9170b 
> on GGNUWT461535L.ad.infosys.com:31010]
>  





[jira] [Commented] (DRILL-8284) Apache SQL Query failing while accessing the Json with complex data model

2022-08-23 Thread SHUBHAM KUMAR (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583896#comment-17583896
 ] 

SHUBHAM KUMAR commented on DRILL-8284:
--

Json sample for case1:

{"attributes": [
                    {
                        "name": "webBrandName",
                        "value": {
                            "en-US": "Smashbox"
                        }
                    },
                    {
                        "name": "startDate",
                        "value": "2011-07-25T15:30:00.000Z"
                    }
                ]
}

Json sample for case2:

{"attributes": [
                    {
                        "name": "startDate",
                        "value": "2011-07-25T15:30:00.000Z"
                    },
                    {
                        "name": "hasCBD",
                        "value": false
                    }
                ]
}

 

 

> Apache SQL Query failing while accessing the Json with complex data model
> -
>
> Key: DRILL-8284
> URL: https://issues.apache.org/jira/browse/DRILL-8284
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: SHUBHAM KUMAR
>Priority: Major
>
> Apache SQL Query failing while accessing the Json with complex data model. 
> Complex Json: 
> Map object inside another map object then Array Object. 
> Case1: When we have nested objects within array map, and map within map. 
> {"attributes": [
>                     {
>                         "name": "webBrandName",
>                         "value": {
>                             "en-US": "Smashbox"
>                         }
>                     },
>                     {
>                         "name": "startDate",
>                         "value": "2011-07-25T15:30:00.000Z"
>                     }
>                 ]
> }
> Case2: Having array with multiple map items with diff data types. eg. String 
> and Boolean both type. 
> {"attributes": [
>                     {
>                         "name": "startDate",
>                         "value": "2011-07-25T15:30:00.000Z"
>                     },
>                     {
>                         "name": "hasCBD",
>                         "value": false
>                     }
>                 ]
> }
> Query: 
> select flatten(attributes) as Var from dfs.`/filepath/filename.json`
>  
> Error: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IndexOutOfBoundsException: readerIndex: 0, writerIndex: 1764642048 (expected: 
> 0 <= readerIndex <= writerIndex <= capacity(0)) Fragment: 0:0 Please, refer 
> to logs for more information. [Error Id: c5a3b8fa-cad1-4c9a-8673-de5745e9170b 
> on GGNUWT461535L.ad.infosys.com:31010]
>  





[jira] [Created] (DRILL-8284) Apache SQL Query failing while accessing the Json with complex data model

2022-08-23 Thread SHUBHAM KUMAR (Jira)
SHUBHAM KUMAR created DRILL-8284:


 Summary: Apache SQL Query failing while accessing the Json with 
complex data model
 Key: DRILL-8284
 URL: https://issues.apache.org/jira/browse/DRILL-8284
 Project: Apache Drill
  Issue Type: Bug
Reporter: SHUBHAM KUMAR


Apache SQL Query failing while accessing the Json with complex data model. 

Complex Json: 

Map object inside another map object then Array Object. 

Case1: When we have nested objects within array map, and map within map. 

{"attributes": [
                    {
                        "name": "webBrandName",
                        "value": {
                            "en-US": "Smashbox"
                        }
                    },
                    {
                        "name": "startDate",
                        "value": "2011-07-25T15:30:00.000Z"
                    }
                ]
}

Case2: An array containing multiple map items with different data types, e.g. 
both String and Boolean values. 

{"attributes": [
                    {
                        "name": "startDate",
                        "value": "2011-07-25T15:30:00.000Z"
                    },
                    {
                        "name": "hasCBD",
                        "value": false
                    }
                ]
}

Query: 

select flatten(attributes) as Var from dfs.`/filepath/filename.json`

 

Error: 

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
IndexOutOfBoundsException: readerIndex: 0, writerIndex: 1764642048 (expected: 0 
<= readerIndex <= writerIndex <= capacity(0)) Fragment: 0:0 Please, refer to 
logs for more information. [Error Id: c5a3b8fa-cad1-4c9a-8673-de5745e9170b on 
GGNUWT461535L.ad.infosys.com:31010]

 





[jira] [Created] (DRILL-8283) Implement a configurable file listing size limit

2022-08-23 Thread James Turton (Jira)
James Turton created DRILL-8283:
---

 Summary: Implement a configurable file listing size limit
 Key: DRILL-8283
 URL: https://issues.apache.org/jira/browse/DRILL-8283
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.20.2
Reporter: James Turton
Assignee: James Turton
 Fix For: 1.20.3


Currently a malicious, or merely an unwitting user can crash their Drill 
foreman by sending
{code:java}
select * from dfs.huge_workspace limit 10
{code}
causing the query planner to recurse over every file in huge_workspace and 
culminating in
{code:java}
2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in 
Foreman.java.lang.OutOfMemoryError: null {code}
if there are enough files in huge_workspace. A SHOW FILES command can produce 
the same effect. This issue proposes a new BOOT option named 
drill.exec.storage.file.max_listing_size with a default value of, say 10 000. 
If a file listing task exceeds this limit then the current operation is 
terminated with a UserException and runaway resource usage is prevented.
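
The proposed guard can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the Drill implementation: it uses `java.nio.file` rather than Drill's file system layer, the class `BoundedListing`, the method `listRecursively` and the exception `ListingLimitExceededException` are hypothetical names, and the limit is passed in as a parameter instead of being read from a BOOT option.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative only: none of these names come from the Drill code base.
class BoundedListing {

  // Hypothetical stand-in for the UserException mentioned in the issue.
  static class ListingLimitExceededException extends RuntimeException {
    ListingLimitExceededException(String message) {
      super(message);
    }
  }

  // Walks the tree under root iteratively and returns its non-directory
  // entries, aborting as soon as more than maxSize of them have been seen.
  static List<Path> listRecursively(Path root, int maxSize) throws IOException {
    List<Path> files = new ArrayList<>();
    Deque<Path> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      Path dir = pending.pop();
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
        for (Path entry : entries) {
          if (Files.isDirectory(entry)) {
            pending.push(entry);
          } else if (files.size() >= maxSize) {
            // Fail fast instead of accumulating an unbounded listing.
            throw new ListingLimitExceededException(
                "File listing under " + root + " exceeds the limit of " + maxSize);
          } else {
            files.add(entry);
          }
        }
      }
    }
    return files;
  }
}
```

The check happens while the listing is being built, so memory use is bounded by the limit rather than by the size of the workspace.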





[jira] [Commented] (DRILL-7856) Add lgtm badge to Drill and fix alerts

2022-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583648#comment-17583648
 ] 

ASF GitHub Bot commented on DRILL-7856:
---

cgivre closed pull request #2187: DRILL-7856 Add lgtm badge to Drill and fix 
alerts
URL: https://github.com/apache/drill/pull/2187




> Add lgtm badge to Drill and fix alerts
> --
>
> Key: DRILL-7856
> URL: https://issues.apache.org/jira/browse/DRILL-7856
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.18.0
>Reporter: Vitalii Diravka
>Priority: Trivial
>  Labels: badge, github
>
> Consider adding new badges to Drill github, for instance _lgtm_ badges (code 
> quality and alerts number):
> [https://lgtm.com/projects/g/apache/drill/context:java]
> As an example please check:
> [https://github.com/kaitoy/pcap4j]
> A separate ticket could cover reducing the number of alerts in the Drill 
> project:
> https://lgtm.com/projects/g/apache/drill/alerts/?mode=list





[jira] [Commented] (DRILL-7856) Add lgtm badge to Drill and fix alerts

2022-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583647#comment-17583647
 ] 

ASF GitHub Bot commented on DRILL-7856:
---

cgivre commented on PR #2187:
URL: https://github.com/apache/drill/pull/2187#issuecomment-1224126726

   LGTM is closing in Dec 2022.  
https://github.blog/2022-08-15-the-next-step-for-lgtm-com-github-code-scanning/




> Add lgtm badge to Drill and fix alerts
> --
>
> Key: DRILL-7856
> URL: https://issues.apache.org/jira/browse/DRILL-7856
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.18.0
>Reporter: Vitalii Diravka
>Priority: Trivial
>  Labels: badge, github
>
> Consider adding new badges to Drill github, for instance _lgtm_ badges (code 
> quality and alerts number):
> [https://lgtm.com/projects/g/apache/drill/context:java]
> As an example please check:
> [https://github.com/kaitoy/pcap4j]
> A separate ticket could cover reducing the number of alerts in the Drill 
> project:
> https://lgtm.com/projects/g/apache/drill/alerts/?mode=list





[jira] [Updated] (DRILL-8282) Upgrade to hadoop-common 3.2.4 due to CVE

2022-08-23 Thread James Turton (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Turton updated DRILL-8282:

Summary: Upgrade to hadoop-common 3.2.4 due to CVE   (was: upgrade to 
hadoop-common 3.2.4 due to cve )

> Upgrade to hadoop-common 3.2.4 due to CVE 
> --
>
> Key: DRILL-8282
> URL: https://issues.apache.org/jira/browse/DRILL-8282
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-8wm5-8h9c-47pc
> * this change requires some reload4j dependency changes too - see broken 
> build - https://github.com/apache/drill/pull/2628


