Re: [PR] SOLR-18071 Support Stored Fields in Export Writer [solr]

via GitHub Fri, 23 Jan 2026 18:42:18 -0800


dsmiley commented on code in PR #4053:
URL: https://github.com/apache/solr/pull/4053#discussion_r2723464293



##########
solr/solr-ref-guide/modules/query-guide/pages/exporting-result-sets.adoc:
##########
@@ -23,6 +23,26 @@ This feature uses a stream sorting technique that begins to send records within
 
 The cases where this functionality may be useful include: session analysis, distributed merge joins, time series roll-ups, aggregations on high cardinality fields, fully distributed field collapsing, and sort-based stats.
 
+== Comparison with Cursors
+
+The `/export` handler offers several advantages over xref:pagination-of-results.adoc#fetching-a-large-number-of-sorted-results-cursors[cursor-based pagination] for streaming large result sets.
+
+With cursors, the query is re-executed for each page of results.
+In contrast, `/export` runs the filter query once and the resulting segment-level bitmasks are applied once per segment, after which the documents are simply iterated over.
+Additionally, the segments that existed when the stream was opened are held open for the duration of the export, eliminating the disappearing or duplicate document issues that can occur with cursors.
+The trade-off is that IndexReaders are kept around for longer periods of time.
+
+Another advantage of `/export` is significantly lower latency until the first document is returned, because the internal batch size is decoupled from the response message size.
+With cursors, you typically need to set the `rows` parameter to a high value (e.g., 100,000) to achieve decent throughput.
+However, this creates a "glugging" effect: when you request a large batch, Solr must build the entire payload and send it over the wire while your client waits.

Review Comment:
   bq. My recollection is that the full responses of each shard are merged together at once (not streamed) but I could be misunderstanding.
   
   Aaaah, OK, right, you have a good point here! I wasn't considering distributed search, only a single shard's contribution. Admittedly, comparing /export with cursors on an apples-to-apples basis, it'd be single-shard to single-shard. But the big picture changes things, so you're right about that in practice. I suppose there's some need for a streaming expression aggregator that can cursor-mark over the shards individually to avoid that cost. I looked for cursorMark usage in streaming expressions and I'm surprised to see none. There will be even less need for such a thing after this PR!
   
   Anyway, I don't think the pros/cons in this document need to go into technical/internal depth; it's so rare that the ref guide speaks of such geeky internal things (e.g. searchers).


