[ https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584463#comment-17584463 ]
ASF GitHub Bot commented on DRILL-8283:
---------------------------------------
vvysotskyi commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954278444
##########
exec/java-exec/src/main/resources/drill-module.conf:
##########
@@ -115,7 +115,8 @@ drill.exec: {
text: {
buffer.size: 262144,
batch.size: 4000
- }
+ },
+ recursive_listing_max_size: 10000
Review Comment:
Yes, the default value should be adjusted. In the big data world, thousands
of files is quite a small number. For non-Parquet files, FileStatus is small,
so listing them shouldn't put much pressure on memory. For Parquet files, it
would be good to provide a way to disable reading metadata during planning and
read it only during execution, to avoid issues with huge numbers of files.
> Add a configurable recursive file listing size limit
> ----------------------------------------------------
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.20.2
> Reporter: James Turton
> Assignee: James Turton
> Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred,
> exiting. Information message: Unable to handle out of memory condition in
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce
> the same effect. This issue proposes a new BOOT option named
> drill.exec.storage.file.recursive_listing_max_size with a default value of,
> say, 10 000. If a file listing task exceeds this limit, the initiating
> operation is terminated with a UserException, preventing runaway resource
> usage.
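
For illustration only, the limit described in the issue could be sketched as below. This is not the code from PR #2632: it walks a local directory with java.nio rather than Drill's file system layer, the class and method names (BoundedListing, listRecursively) are hypothetical, and ListingLimitExceededException is a stand-in for Drill's UserException. It also assumes the limit counts files only, not directories.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BoundedListing {

    /** Hypothetical stand-in for Drill's UserException. */
    static class ListingLimitExceededException extends RuntimeException {
        ListingLimitExceededException(String message) {
            super(message);
        }
    }

    /** Lists all files under root, failing fast once more than maxSize files are found. */
    static List<Path> listRecursively(Path root, int maxSize) throws IOException {
        List<Path> files = new ArrayList<>();
        collect(root, maxSize, files);
        return files;
    }

    private static void collect(Path dir, int maxSize, List<Path> files) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    collect(entry, maxSize, files);
                } else {
                    // Abort before the listing grows past the configured limit
                    // (assumption: only files count towards the limit).
                    if (files.size() >= maxSize) {
                        throw new ListingLimitExceededException(
                            "File listing exceeded the limit of " + maxSize + " files");
                    }
                    files.add(entry);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway directory tree to exercise the limit.
        Path root = Files.createTempDirectory("huge_workspace");
        Path sub = Files.createDirectories(root.resolve("sub"));
        for (int i = 0; i < 5; i++) {
            Files.createFile(sub.resolve("f" + i + ".csv"));
        }

        System.out.println("Listed " + listRecursively(root, 10).size() + " files");
        try {
            listRecursively(root, 3);
        } catch (ListingLimitExceededException e) {
            System.out.println("Aborted: " + e.getMessage());
        }
    }
}
```

The check runs before each file is added, so the walk stops as soon as the limit would be exceeded instead of materialising the full listing first. That early stop is what keeps memory use bounded.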
--
This message was sent by Atlassian Jira
(v8.20.10#820010)