[ 
https://issues.apache.org/jira/browse/SOLR-15252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325995#comment-17325995
 ] 

Chris M. Hostetter commented on SOLR-15252:
-------------------------------------------

A few thoughts in no particular order...
 * if we're going to be opinionated about people using a large value for 
{{rows}} because it leads to large allocation for the matches, then what we 
should actually be opinionated about is the size of "{{start + rows}}" since 
that's what really matters internally.
 ** {{start=0&rows=100001}} is no worse than {{start=100000&rows=1}}
 * any opinionated values we have in the code that produce warnings/failures 
should have some config option to override that opinion.
 * if we're going to log warnings that tell solr _administrators_ "your 
_clients_ are sending you queries that are hosing your system performance" 
there should be at least 2 distinct actions those administrators can take to 
act on this warning:
 ** override that opinion to suppress/ignore the warning
 ** block the bad behavior (and improve system performance)
 * In general i dislike the "log at most once a minute" concept ...
 ** for things that are checked once on startup, or once on core/plugin config, 
then i think a "log once" pattern (explaining why something configured isn't 
going to work the way you might expect) isn't too terrible – but for something 
request specific this feels like it will be very confusing for solr 
admins/users.
 ** If i'm an app developer watching the logs while i build my app and do some 
test queries, i might see a WARNing for one request, but then not again for a 
very similar request - or even for the exact same request - but then a few 
minutes later (because i might not run these "big rows" queries in steady 
state) i see the WARN again ... but then it still doesn't reproduce.
 ** even if the WARN messages include info like "this message will be logged at 
most once every 60 seconds" .. i may never notice them in my testing (because i 
only look at the logs "in between" when solr decides to emit the WARN) but then 
later when i "go live" with my deployment my logs are full and I'm cursing solr 
for not making this more obvious during small scale testing
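
To illustrate the {{start + rows}} point above, here is a minimal sketch in plain Java. It is not Solr or Lucene code: the class and method names ({{StartPlusRowsDemo}}, {{hitsToCollect}}, {{page}}) are hypothetical, and the bounded heap only stands in for whatever structure holds the top matches. It shows why a top-N collector has to hold {{start + rows}} entries before it can return one page, so both example requests cost the same.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Solr or Lucene.
class StartPlusRowsDemo {

    /** Number of hits a top-N collector must hold to serve one page. */
    static long hitsToCollect(int start, int rows) {
        return (long) start + rows; // long arithmetic avoids int overflow
    }

    /** Returns the scores ranked [start, start + rows), best first. */
    static double[] page(double[] scores, int start, int rows) {
        int n = (int) Math.min(hitsToCollect(start, rows), scores.length);
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap of the best n scores
        for (double s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (n > 0 && s > heap.peek()) {
                heap.poll(); // evict the current worst of the best n
                heap.add(s);
            }
        }
        // Drain the heap (ascending) into an array ordered best first.
        double[] top = new double[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll();
        }
        // Only now can the first `start` entries be skipped.
        int from = Math.min(start, top.length);
        return Arrays.copyOfRange(top, from, top.length);
    }
}
```

Under these assumptions, {{hitsToCollect(0, 100001)}} and {{hitsToCollect(100000, 1)}} are both 100001, even though the second request returns a single row.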

----
 

I think that in general if we want to be opinionated then those opinions should 
cause failures unless the solr-admins override our opinions.

if we want to "do something" to help address performance risks for "big rows 
values" – or more specifically "big start+rows combinations" we should:
 * support a configurable upper bound via some QueryComponent.init option – for 
sake of argument call it: {{maxStartPlusRows}}
 ** have a reasonable hardcoded default if config option is not set
 ** have ref-guide docs making it clear how users can declare a {{"query"}} 
component instance to override the implicit default instance (with its 
hardcoded default {{maxStartPlusRows}} )
 * QueryComponent.prepare should fail any request where the start+rows exceeds 
{{maxStartPlusRows}}
 * for the sake of completeness, we may also want to support independent 
{{maxStart}} and {{maxRows}} for deployments that want to say "regardless of 
how big your pages are, i never want an individual query trying to go deeper 
than page X" or "regardless of how deep you paginate, i never want an 
individual query trying to return more than X rows per page over the wire"
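
The checks proposed above could be sketched roughly as follows. This is a self-contained illustration, not Solr code: the {{PagingLimits}} class, its constructor arguments, and the 100,000 default are all hypothetical (the default is borrowed from the 100k figure in the issue description), and a real {{QueryComponent.prepare}} would presumably report the failure as a 400 error rather than an {{IllegalArgumentException}}.

```java
// Hypothetical sketch of the proposed limits; names and defaults are assumptions.
final class PagingLimits {

    /** Assumed hardcoded default used when no config option is set. */
    static final int DEFAULT_MAX_START_PLUS_ROWS = 100_000;

    private final int maxStartPlusRows;
    private final int maxStart;
    private final int maxRows;

    PagingLimits(int maxStartPlusRows, int maxStart, int maxRows) {
        this.maxStartPlusRows = maxStartPlusRows;
        this.maxStart = maxStart;
        this.maxRows = maxRows;
    }

    /** Limits with the assumed default and no independent start/rows caps. */
    static PagingLimits defaults() {
        return new PagingLimits(DEFAULT_MAX_START_PLUS_ROWS,
                                Integer.MAX_VALUE, Integer.MAX_VALUE);
    }

    /** Fails the request if any configured bound is exceeded. */
    void check(int start, int rows) {
        if (start < 0 || rows < 0) {
            throw new IllegalArgumentException("start and rows must be >= 0");
        }
        long sum = (long) start + rows; // long avoids overflow for huge rows values
        if (sum > maxStartPlusRows) {
            throw new IllegalArgumentException("start+rows=" + sum
                + " exceeds maxStartPlusRows=" + maxStartPlusRows);
        }
        if (start > maxStart) {
            throw new IllegalArgumentException("start=" + start
                + " exceeds maxStart=" + maxStart);
        }
        if (rows > maxRows) {
            throw new IllegalArgumentException("rows=" + rows
                + " exceeds maxRows=" + maxRows);
        }
    }
}
```

With {{PagingLimits.defaults()}}, {{check(0, 100000)}} passes, while {{check(0, 100001)}} and {{check(100000, 1)}} both fail, which matches the point that the two requests are equally expensive. An administrator who disagrees with the default would construct the limits with a larger bound.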

> Solr should log WARN log when a query requests huge rows number
> ---------------------------------------------------------------
>
>                 Key: SOLR-15252
>                 URL: https://issues.apache.org/jira/browse/SOLR-15252
>             Project: Solr
>          Issue Type: Improvement
>          Components: query
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have all seen it - clients that use Integer.MAX_VALUE or 10000000 as rows 
> parameter, to just make sure they get all possible results. And this of 
> course leads to high GC pauses since Lucene allocates an array up front to 
> hold results.
> Solr should either log WARN when it encounters a value above a certain 
> threshold, such as 100k (then you should use cursormark instead). Or it 
> should simply respond with 400 error and have a system property or query 
> parameter folks can use to override if they know what they are doing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
