This is all really interesting, thanks for the detailed feedback (esp.
for trying out 9.1)! I don't think the 9.1 optimization I mentioned is
relevant here, and I should have said so up front: it only applies to
`*:*` with sort-by-score. Including the
`sort=[anything-other-than-score]` param means the sort order matters,
so the optimization can't be taken (it only applies when it's possible
to avoid sorting altogether), and sorting top-N for such a large
number of rows is definitely going to take some time.

I'm curious to know whether you see a difference on 9.1.1 if you omit
the `sort` param (defaulting to sort-by-score, which should allow the
sort-bypass optimization to be taken). But again, it doesn't look like
this accounts for the increased latency you're observing. I'm mainly
curious because it might help narrow down where to look for the cause
of the reported performance degradation in 9.1.1, which is definitely
worth looking into.
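
To make that comparison concrete, here is a rough Python sketch of the
two request bodies I have in mind. The parameters are copied from your
/select measurements; `BASE_PARAMS` and `select_body` are names I made
up for illustration, and I'm assuming nothing else about the request
needs to change.

```python
from urllib.parse import urlencode

# Parameters copied from the quoted /select measurements (wt=javabin run).
BASE_PARAMS = {"indent": "false", "wt": "javabin", "q": "*:*", "rows": "1000000"}


def select_body(sort=None):
    """Build a form-encoded POST body for /select.

    Leaving `sort` as None omits the param entirely, so Solr falls back
    to its default sort-by-score.
    """
    params = dict(BASE_PARAMS)
    if sort is not None:
        params["sort"] = sort
    return urlencode(params)


with_sort = select_body("id asc")  # explicit sort, as in the measurements
without_sort = select_body()       # no sort param at all
print(with_sort)
print(without_sort)
```

Either body can be passed to `wget --post-data` in place of the
hand-written string, with everything else left as it was.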

"string" values for javabin are basically written as direct bytes to an
OutputStream, whereas for the text response formats, each string
value's lucene-native utf8 representation is converted to utf16 and
then written back as text (which is then of course converted into utf8
again in the context of the response). I suspect that if you're really
chasing top performance, it'd be well worth finding (or even just
implementing!) a client that "speaks" javabin.
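
As a toy illustration of that extra work, the two paths look roughly
like this in Python. This is not Solr's actual code: the function names
are made up, and I'm ignoring any framing or escaping the real writers
do. The only point is that both paths end with the same bytes on the
wire, but the text path decodes and re-encodes every value to get
there.

```python
def write_direct(utf8_value: bytes, out: bytearray) -> None:
    """Binary-style path: append the stored UTF-8 bytes unchanged."""
    out.extend(utf8_value)


def write_via_text(utf8_value: bytes, out: bytearray) -> None:
    """Text-style path: UTF-8 -> UTF-16 code units -> UTF-8 again."""
    # The UTF-16 buffer stands in for a Java String / char[].
    utf16_units = utf8_value.decode("utf-8").encode("utf-16-le")
    out.extend(utf16_units.decode("utf-16-le").encode("utf-8"))


value = "héllo wörld".encode("utf-8")
direct, via_text = bytearray(), bytearray()
write_direct(value, direct)
write_via_text(value, via_text)
print(direct == via_text)
```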

Like Kevin, I'm also surprised by the difference between Jetty and
Nginx serving static files. I'm curious, does your Nginx config have
the `sendfile` directive enabled? ("By default, NGINX handles file
transmission itself and copies the file into the buffer before sending
it. Enabling the sendfile directive eliminates the step of copying the
data into the buffer and enables direct copying data from one file
descriptor to another.")

One thought: to further isolate "core" solr concerns from
serialization/network concerns, it could be useful to separate latency
according to before/after first bytes are received.
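
A minimal sketch of that split, in Python. `split_latency` and
`solr_chunks` are hypothetical helpers, not part of any Solr client.
The first timestamp is taken when the first body chunk arrives, so
it's only an approximation of "first bytes received", and I'm assuming
a plain form-encoded POST like the one in your wget commands.

```python
import time
from urllib.request import urlopen


def split_latency(open_stream, clock=time.perf_counter):
    """Return (seconds_to_first_chunk, seconds_for_the_rest, total_bytes).

    `open_stream` is a zero-argument callable that sends the request and
    returns an iterable of byte chunks.
    """
    start = clock()
    first = None
    total = 0
    for chunk in open_stream():
        if first is None and chunk:
            first = clock()  # first non-empty chunk of the body
        total += len(chunk)
    end = clock()
    if first is None:  # empty response: attribute everything to the wait
        first = end
    return first - start, end - first, total


def solr_chunks(url, body, size=1 << 16):
    """Build an opener that POSTs `body` to `url` and yields the response."""
    def opener():
        resp = urlopen(url, data=body.encode("ascii"))
        return iter(lambda: resp.read(size), b"")
    return opener
```

For example, `split_latency(solr_chunks(url, body))` with the /select
URL and POST body from your measurements would give the two numbers
separately. A long wait before the first chunk points at
search/sort/collection, while a slow tail points at serialization and
transfer.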



On Sun, Feb 26, 2023 at 8:11 PM Fikavec F <fika...@yandex.ru> wrote:
>
> David Smiley, sorry for my terminology; I’m used to calling a full data
> fetch in small parts from a DB table (collection) "scrolling". Of course,
> Solr cursors (cursorMark) are designed for this, and I use them. The large
> "rows" values in my examples (measurements) are needed to show the speed at
> which data from a 10 GB test collection is transmitted from Solr. When using
> cursors and passing through a 250+ gigabyte collection, data is transmitted
> at the same speed as with a single /select call, but with delays between
> scrolls. In practice, I noticed that regardless of the hardware
> performance (SAS disk vs RAM disk; 1 Gbps vs 10 Gbps network; a CPU with a
> 30% higher core frequency), the full data fetching time does not change and
> the data transfer speed stays around 350 megabits per second. At the same
> time, if instead of a select from Solr I download a file via Solr's Jetty
> from the folder ".../solr-webapp/webapp/test.bin", the speed quickly goes
> beyond a gigabit. And here it doesn't matter whether we take top-X or how
> big X is: it is still 4-8x slower than what Jetty and Solr are capable of,
> and that's not good. The slowdown seems to affect every response writer
> I've tested (json, xml, csv, python) except javabin, which suggests that
> the bottleneck is not deep in Solr but somewhere at the level of data
> transformation and transfer in the response writers (except javabin). I
> don't know where to look for the problem beyond the FastWriter output
> buffer, but I hope the specialists will succeed.
>
> Following Michael Gibney's remark, I tested the speed of Solr 9.1.1.
> Strangely, it turned out to be much slower than Solr 8.11.2, even when
> using /export, docValues and the javabin response writer:
>
> 1.50 Gb/s - HANDLER /select;  wt=javabin;  with stored=true docValues=false field
> 489 Mb/s  - HANDLER /select;  wt=csv;      with stored=true docValues=false field
> 459 Mb/s  - HANDLER /select;  wt=json;     with stored=true docValues=false field
> 433 Mb/s  - HANDLER /export;  wt=javabin;  works only on a docValues=true field
> 194 Mb/s  - HANDLER /select;  wt=json;     with stored=false docValues=true field
>
> All conditions are the same as before, except that Solr 9.1.1 is installed
> on the RAM disk (java --version - OpenJDK 64-Bit Server VM (build
> 11.0.17+8-post-Ubuntu-1ubuntu220.04); OS - Ubuntu 20.04.3 LTS;
> SOLR_JAVA_MEM="-Xms8g -Xmx8g"; running in cloud mode; other settings -
> defaults). It is not difficult to create a RAM disk and repeat the commands
> above on 127.0.0.1, or at least on a gigabit network, to see how far the
> speed falls short of a gigabit even when neither the disk nor the network
> is a bottleneck.
>
> All measurement results:
>
>
> <<<<----- SOLR 9.1.1 tests ----->>>>
> ---[ /select HANDLER ] ---
> /select HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' --post-data 
> 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' 
> http://192.168.220.135:8983/solr/test_collection/select
> 10.03G in 3m 8s
> 2023-02-26 19:09:45 (459 Mb/s) - ‘/dev/null’ saved [10772687921]
>
> /select HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' --post-data 
> 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' 
> http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in 57s
> 2023-02-26 19:26:55 (1.50 Gb/s) - ‘/dev/null’ saved [10749142324]
>
> /select HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' --post-data 
> 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' 
> http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in 2m 56s
> 2023-02-26 19:30:06 (489 Mb/s) - ‘/dev/null’ saved [10751204971]
>
> # 2. Experiments with docValues=true stored=false (for testing /export 
> HANDLER)
> ---[ /select HANDLER ] ---
> /select HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' --post-data 
> 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' 
> http://192.168.220.135:8983/solr/test_collection/select
> 10.03G in 7m 24s
> 2023-02-26 20:35:55 (194 Mb/s) - ‘/dev/null’ saved [10772687921]
>
> /select HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' --post-data 
> 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' 
> http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in 4m 13s
> 2023-02-26 20:45:00 (340 Mb/s) - ‘/dev/null’ saved [10749142324]
>
> /select HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' --post-data 
> 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' 
> http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in 7m 5s
> 2023-02-26 21:25:35 (202 Mb/s) - ‘/dev/null’ saved [10751204971]
>
> ---[ /export HANDLER ] ---
> /export HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' 
> "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=json&q=*%3A*&sort=id%20desc&fl=id,text_sn";
> 10.01G in 4m 7s
> 2023-02-26 21:32:40 (349 Mb/s) - ‘/dev/null’ saved [10751758398]
>
> /export HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' 
> "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=javabin&q=*%3A*&sort=id%20desc&fl=id,text_sn";
> 10.00G in 3m 18s
> 2023-02-26 21:37:38 (433 Mb/s) - ‘/dev/null’ saved [10742601804]
>
> /export HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null 
> --header='Accept-Encoding: ' 
> "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=csv&q=*%3A*&sort=id%20desc&fl=id,text_sn";
> 10.01G in 3m 59s
> 2023-02-26 21:42:33 (360 Mb/s) - ‘/dev/null’ saved [10751758398]
>
>
> Best Regards,
>
