On Sun, Mar 19, 2023 at 8:39 AM Fikavec F <[email protected]> wrote:
> I was able to create a collection with "solr.SimpleTextCodecFactory"
> codecFactory and Solr can process (return) only 2x more documents per second
> from it (214 410 documents per second vs 115 000 "solr.SchemaCodecFactory"
> with compression).
>
> I expected much much more, because this is a simple iteration and sending
> small fields to the output. Is this enough to make sure that the Solr limit
> of processing 115 000 documents per second is not due only to compression,
> but something else? Or is the speed of SimpleTextCodecFactory in this case
> not an indicator for correct testing and yet it is necessary to create my
> own codecFactory class without compression?
>
SimpleTextCodecFactory is more for demonstration and clear-text readability
of the data on disk. It is meant for educational purposes; it's a "toy" of
sorts. So it's impressive you doubled the performance of your use-case with it.
> I also tried to create a collection with a standard codec of 8 shards for
> the test, the documents iteration rate is the same about 115 000 small
> documents per second.
>
Interesting. This suggests to me the aggregation node is limiting things.
This is implemented by SearchHandler, and I don't imagine any simple
changes to make it operate differently.
I think Solr's "streaming expressions" capability (really a set of
capabilities) is much closer to this use-case but I looked around and I
think you'll probably have the same limitation with the "select"
expression. I was hoping it would send a "/select" to each shard with
distrib=false to bypass SearchHandler's distributed search, but it does not.
I could imagine improvements there.
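
A minimal sketch of the per-shard idea, for illustration only: the client
builds one "/select" URL per core with distrib=false, fetches each directly,
and merges the results itself instead of going through the aggregating node.
The core URLs and the helper names below are hypothetical; only the
distrib=false parameter comes from the discussion above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ShardUrls {
    // Builds a /select URL that asks one core to answer from its own index only.
    // coreUrl is a hypothetical per-core base URL, e.g. http://host:8983/solr/<core>.
    static String shardSelectUrl(String coreUrl, String query, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return coreUrl + "/select?q=" + q + "&rows=" + rows + "&distrib=false";
    }

    // One URL per core; the caller would fetch these (possibly in parallel)
    // and concatenate the documents on the client side.
    static List<String> shardSelectUrls(List<String> coreUrls, String query, int rows) {
        List<String> urls = new ArrayList<>();
        for (String coreUrl : coreUrls) {
            urls.add(shardSelectUrl(coreUrl, query, rows));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical core names; real ones depend on the collection layout.
        List<String> cores = List.of(
            "http://localhost:8983/solr/test_shard1_replica_n1",
            "http://localhost:8983/solr/test_shard2_replica_n2");
        for (String url : shardSelectUrls(cores, "*:*", 1000)) {
            System.out.println(url);
        }
    }
}
```

This only covers plain iteration over documents; anything that needs global
sorting or faceting would still have to be merged by the client.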
> P.S. As a <codecFactory class="my.Lucene87CodecWithNoFieldCompression"/>
> in solrconfig.xml, I currently can't connect even a simple codec layer:
>
> package my;
> import org.apache.lucene.codecs.FilterCodec;
> import org.apache.lucene.codecs.StoredFieldsFormat;
> import org.apache.lucene.codecs.lucene87.Lucene87Codec;
> import org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat;
>
> public final class Lucene87CodecWithNoFieldCompression extends FilterCodec {
>   private final StoredFieldsFormat storedFieldsFormat;
>
>   public Lucene87CodecWithNoFieldCompression() {
>     super("Lucene87CodecWithNoFieldCompression", new Lucene87Codec());
>     storedFieldsFormat = new Lucene87StoredFieldsFormat();
>   }
>
>   @Override
>   public StoredFieldsFormat storedFieldsFormat() {
>     return storedFieldsFormat;
>   }
>
>   @Override
>   public String toString() {
>     return getClass().getSimpleName();
>   }
> }
>
At a glance it looks good. What error do you get?
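
While gathering the error, a few things may be worth checking. These are
assumptions about your setup, since the error isn't shown. First, the class
named in <codecFactory> is expected to extend org.apache.solr.core.CodecFactory
and return the codec from getCodec(), so pointing it directly at a Codec
subclass is likely to fail when the core loads. Second, Lucene looks codecs up
by name through Java's service loader when it opens a segment, so the jar
holding the codec typically needs a service file naming it, roughly:

```
# file: META-INF/services/org.apache.lucene.codecs.Codec
my.Lucene87CodecWithNoFieldCompression
```

Separately, new Lucene87StoredFieldsFormat() defaults to the BEST_SPEED mode,
which still applies LZ4 compression, so this wrapper on its own would probably
not turn stored-field compression off.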
~ David