georgew5656 opened a new pull request, #16512:
URL: https://github.com/apache/druid/pull/16512

   Currently it's possible to submit a very large task payload that OOMs the 
Overlord: if the request fails, the whole payload is logged in 
`SQLMetadataStorageActionHandler.insertEntryWithHandle`.
   
   If the request happens to be larger than `max_allowed_packet` for a MySQL 
metadata store, it will always fail. Since really large task payloads seem to 
cause Overlord instability in general, I think it makes sense to limit the size 
of task payloads at the task queue level.
   
   ### Description
   Add a new config that sets a limit for task payload sizes. Throw an exception 
if the limit is exceeded.
   
   The default limit of 60 MB is based on the 64 MB default value of 
max_allowed_packet in MySQL 8+.
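
As a rough illustration of the check described above (not the code in this PR), the sketch below measures a serialized payload and rejects it when it exceeds a limit. The class name, method name, and the reading of "60 MB" as 60 * 1024 * 1024 bytes are assumptions made for the example; the actual config name and exception type in `TaskQueue` may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Druid class.
final class TaskPayloadSizeCheck {
    // Assumed default: 60 MB, kept below MySQL 8's 64 MB max_allowed_packet default.
    static final long DEFAULT_MAX_PAYLOAD_BYTES = 60L * 1024 * 1024;

    private TaskPayloadSizeCheck() {
    }

    // Throws if the serialized task is larger than maxBytes.
    // The message reports sizes only, so the payload itself is never logged.
    static void validatePayloadSize(String serializedTask, long maxBytes) {
        long size = serializedTask.getBytes(StandardCharsets.UTF_8).length;
        if (size > maxBytes) {
            throw new IllegalArgumentException(
                "Task payload size [" + size + "] bytes exceeds the limit of ["
                    + maxBytes + "] bytes"
            );
        }
    }
}
```

Because the check operates on the serialized form, it would apply equally to tasks submitted over HTTP and to tasks added directly by supervisors, at the cost of serializing the payload once more in memory.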
   
   Ideally I would use the HTTP request's Content-Length header to 
calculate the size of the task payload rather than re-serializing it in memory, 
but we also call `taskQueue.add` directly from supervisors, so those calls would 
bypass the check. If others think this is acceptable and it would be better to 
check Content-Length in `OverlordResource`, I am fine with changing this logic.
   
   #### Release note
   - Adding a new guardrail for submitting tasks that are too large
   
   ##### Key changed/added classes in this PR
    * `TaskQueue`
   
   This PR has:
   
   - [X] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [X] added comments explaining the "why" and the intent of the code 
wherever it would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [X] been tested in a test Druid cluster.
   

