gianm commented on code in PR #19011:
URL: https://github.com/apache/druid/pull/19011#discussion_r2876782438


##########
server/src/main/java/org/apache/druid/server/QueryLifecycle.java:
##########
@@ -159,6 +164,8 @@ public <T> QueryResponse<T> runSimple(
   {
     initialize(query);
 
+    checkQueryBlocklist();

Review Comment:
   `runSimple` is not always used, so the blocklist won't always be checked if the call is placed here. Some usages of `QueryLifecycle` call the lifecycle methods individually.
   
   IMO it is best to put it in `doAuthorize`. That method is always called; it's the only place the state can transition to `AUTHORIZED`.
   
   It would also work to put it in `execute`.
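   Roughly (a sketch of the suggested placement only; the surrounding `doAuthorize` code here is abbreviated and approximate, not the actual source):

   ```java
   // Sketch: placing the check on the single path that reaches AUTHORIZED.
   // Method body abbreviated; adapt to the actual doAuthorize implementation.
   private Access doAuthorize(final Access authorizationResult)
   {
     checkQueryBlocklist(); // now enforced for every QueryLifecycle usage
     transition(State.AUTHORIZING, State.AUTHORIZED);
     // ... existing doAuthorize logic ...
   }
   ```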



##########
server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java:
##########
@@ -351,6 +360,17 @@ public Set<String> getTurboLoadingNodes()
     return turboLoadingNodes;
   }
 
+  /**
+   * List of query blocklist rules for dynamically blocking queries on brokers.
+   *
+   * @return List of query blocklist rules
+   */
+  @JsonProperty
+  public List<QueryBlocklistRule> getQueryBlocklist()

Review Comment:
   Please make this `@JsonInclude(NON_EMPTY)` so it is omitted from the serialized form of the config when not present. Keeps things cleaner.
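   i.e. something like (fragment only; the getter body is assumed):

   ```java
   // NON_EMPTY omits the field from serialized JSON when the list is
   // null or empty, so an unset blocklist doesn't clutter the config.
   @JsonProperty
   @JsonInclude(JsonInclude.Include.NON_EMPTY)
   public List<QueryBlocklistRule> getQueryBlocklist()
   {
     return queryBlocklist;
   }
   ```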



##########
docs/configuration/index.md:
##########
@@ -838,6 +838,73 @@ The following table shows the dynamic configuration 
properties for the Coordinat
 |`replicateAfterLoadTimeout`|Boolean flag for whether or not additional 
replication is needed for segments that have failed to load due to the expiry 
of `druid.coordinator.load.timeout`. If this is set to true, the Coordinator 
will attempt to replicate the failed segment on a different historical server. 
This helps improve the segment availability if there are a few slow Historicals 
in the cluster. However, the slow Historical may still load the segment later 
and the Coordinator may issue drop requests if the segment is 
over-replicated.|false|
 |`turboLoadingNodes`| Experimental. List of Historical servers to place in 
turbo loading mode. These servers use a larger thread-pool to load segments 
faster but at the cost of query performance. For servers specified in 
`turboLoadingNodes`, `druid.coordinator.loadqueuepeon.http.batchSize` is 
ignored and the coordinator uses the value of the respective 
`numLoadingThreads` instead.<br/>Please use this config with caution. All 
servers should eventually be removed from this list once the segment loading on 
the respective historicals is finished. |none|
 |`cloneServers`| Experimental. Map from target Historical server to source 
Historical server which should be cloned by the target. The target Historical 
does not participate in regular segment assignment or balancing. Instead, the 
Coordinator mirrors any segment assignment made to the source Historical onto 
the target Historical, so that the target becomes an exact copy of the source. 
Segments on the target Historical do not count towards replica counts either. 
If the source disappears, the target remains in the last known state of the 
source server until removed from the configuration. <br/>Use this config with 
caution. All servers should eventually be removed from this list once the 
desired state on the respective Historicals is achieved. |none|
+|`queryBlocklist`| List of rules to block queries based on datasource, query 
type, and/or query context parameters. Each rule defines criteria that are 
combined with AND logic. Blocked queries return an HTTP 403 error. See [Query 
blocklist rules](#query-blocklist-rules) for details.|none|
+
+##### Query blocklist rules
+
+Query blocklist rules allow you to block specific queries based on datasource, 
query type, and/or query context parameters. This feature is useful for 
preventing expensive or problematic queries from impacting cluster performance.

Review Comment:
   The documentation should mention that blocking is best-effort and may not be applied in certain cases (for example, if the Broker has recently started up or cannot contact the Coordinator).



##########
docs/configuration/index.md:
##########
@@ -838,6 +838,73 @@ The following table shows the dynamic configuration 
properties for the Coordinat
 |`replicateAfterLoadTimeout`|Boolean flag for whether or not additional 
replication is needed for segments that have failed to load due to the expiry 
of `druid.coordinator.load.timeout`. If this is set to true, the Coordinator 
will attempt to replicate the failed segment on a different historical server. 
This helps improve the segment availability if there are a few slow Historicals 
in the cluster. However, the slow Historical may still load the segment later 
and the Coordinator may issue drop requests if the segment is 
over-replicated.|false|
 |`turboLoadingNodes`| Experimental. List of Historical servers to place in 
turbo loading mode. These servers use a larger thread-pool to load segments 
faster but at the cost of query performance. For servers specified in 
`turboLoadingNodes`, `druid.coordinator.loadqueuepeon.http.batchSize` is 
ignored and the coordinator uses the value of the respective 
`numLoadingThreads` instead.<br/>Please use this config with caution. All 
servers should eventually be removed from this list once the segment loading on 
the respective historicals is finished. |none|
 |`cloneServers`| Experimental. Map from target Historical server to source 
Historical server which should be cloned by the target. The target Historical 
does not participate in regular segment assignment or balancing. Instead, the 
Coordinator mirrors any segment assignment made to the source Historical onto 
the target Historical, so that the target becomes an exact copy of the source. 
Segments on the target Historical do not count towards replica counts either. 
If the source disappears, the target remains in the last known state of the 
source server until removed from the configuration. <br/>Use this config with 
caution. All servers should eventually be removed from this list once the 
desired state on the respective Historicals is achieved. |none|
+|`queryBlocklist`| List of rules to block queries based on datasource, query 
type, and/or query context parameters. Each rule defines criteria that are 
combined with AND logic. Blocked queries return an HTTP 403 error. See [Query 
blocklist rules](#query-blocklist-rules) for details.|none|
+
+##### Query blocklist rules
+
+Query blocklist rules allow you to block specific queries based on datasource, 
query type, and/or query context parameters. This feature is useful for 
preventing expensive or problematic queries from impacting cluster performance.
+
+Each rule in the `queryBlocklist` array is a JSON object with the following 
properties:
+
+|Property|Description|Required|Default|
+|--------|-----------|--------|-------|
+|`ruleName`|Unique name identifying this blocklist rule. Used in error 
messages when queries are blocked.|Yes|N/A|
+|`dataSources`|List of datasource names to match. A query matches if it 
references any datasource in this list.|No|Matches all datasources|
+|`queryTypes`|List of query types to match (e.g., `scan`, `timeseries`, 
`groupBy`, `topN`). A query matches if its type is in this list.|No|Matches all 
query types|

Review Comment:
   I don't see a test for this `queryTypes` feature. Please add one.
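   Something along these lines; the stand-in class below just models the documented matching semantics (a null criterion matches everything), since I'm not referencing the PR's actual `QueryBlocklistRule` API:

   ```java
   import java.util.Set;

   // Stand-in for the rule's queryTypes matching semantics as documented:
   // a null criterion matches all query types; otherwise the query's type
   // must appear in the list. Names here are illustrative, not the PR's API.
   class RuleSketch
   {
     private final Set<String> queryTypes; // null => match all

     RuleSketch(Set<String> queryTypes)
     {
       this.queryTypes = queryTypes;
     }

     boolean matchesQueryType(String queryType)
     {
       return queryTypes == null || queryTypes.contains(queryType);
     }

     public static void main(String[] args)
     {
       RuleSketch blockGroupBy = new RuleSketch(Set.of("groupBy"));
       System.out.println(blockGroupBy.matchesQueryType("groupBy"));      // true
       System.out.println(blockGroupBy.matchesQueryType("timeseries"));  // false
       System.out.println(new RuleSketch(null).matchesQueryType("scan")); // true
     }
   }
   ```

   A real test should of course exercise the actual rule class against queries of each type.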



##########
server/src/main/java/org/apache/druid/server/QueryLifecycle.java:
##########
@@ -121,6 +124,7 @@ public QueryLifecycle(
       final DefaultQueryConfig defaultQueryConfig,
       final AuthConfig authConfig,
       final PolicyEnforcer policyEnforcer,
+      @Nullable final BrokerViewOfCoordinatorConfig 
brokerViewOfCoordinatorConfig,

Review Comment:
   Better to keep things targeted and only pass in the 
`List<QueryBlocklistRule>`. It makes usage and testing easier.
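   e.g. (sketch only; the other constructor parameters are elided):

   ```java
   public QueryLifecycle(
       // ... existing parameters ...
       final PolicyEnforcer policyEnforcer,
       @Nullable final List<QueryBlocklistRule> queryBlocklist,
       // ... remaining parameters ...
   )
   ```

   Tests can then pass an explicit list without having to construct a `BrokerViewOfCoordinatorConfig`.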



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

