RE: Programmatic Basic Auth on CloudSolrClient

2021-03-05 Thread Subhajit Das

Hi Tomas,

Tried your suggestion. But the last suggestion (directly passing the HttpClient)
results in a NonRepeatableRequestException. And using the full set of builder
steps also didn't recognize the auth.

Anything I should look for?

Thanks,
Subhajit

From: Tomás Fernández Löbbe <tomasflo...@gmail.com>
Sent: 05 March 2021 04:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Programmatic Basic Auth on CloudSolrClient

Ah, right, now I remember that something like this was possible with the
"http1" version of the clients, which is why I created the Jira issues for
the http2 ones. Maybe you can even skip the "LBHttpSolrClient" step; I
believe you can just pass the HttpClient to the CloudSolrClient? You will
have to make sure to close all the clients that are created externally
when done, since the Solr client won't in this case.
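
A minimal sketch of that direct approach, assuming Solr 8.x SolrJ and Apache
HttpClient 4.x; hosts, credentials, and the collection name are illustrative:

import java.util.Collections;
import java.util.Optional;

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class DirectHttpClientSketch {
  public static void main(String[] args) throws Exception {
    BasicCredentialsProvider creds = new BasicCredentialsProvider();
    creds.setCredentials(AuthScope.ANY,
        new UsernamePasswordCredentials("solr-user", "solr-pass"));
    CloseableHttpClient httpClient = HttpClientBuilder.create()
        .setDefaultCredentialsProvider(creds)
        .build();
    CloudSolrClient solr = new CloudSolrClient.Builder(
            Collections.singletonList("zk1:2181"), Optional.empty())
        .withHttpClient(httpClient)
        .build();
    try {
      solr.query("collection1", new SolrQuery("*:*"));
    } finally {
      solr.close();       // close the Solr client first...
      httpClient.close(); // ...then the externally created HttpClient,
                          // which SolrJ will not close for us
    }
  }
}

Note that challenge/response credentials like these can fail with a
NonRepeatableRequestException on non-repeatable update bodies, as Subhajit
reports above; preemptive auth or per-request credentials may be needed in
that case.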

On Thu, Mar 4, 2021 at 1:22 PM Mark H. Wood  wrote:

> On Wed, Mar 03, 2021 at 10:34:50AM -0800, Tomás Fernández Löbbe wrote:
> > As far as I know the current OOTB options are system properties or
> > per-request (which would allow you to use different per collection, but
> > probably not ideal if you do different types of requests from different
> > parts of your code). A workaround (which I've used in the past) is to
> have
> > a custom client that overrides and sets the credentials in the "request"
> > method (you can put whatever logic there to identify which credentials to
> > use). I recently created
> https://issues.apache.org/jira/browse/SOLR-15154
> > and https://issues.apache.org/jira/browse/SOLR-15155 to try to address
> this
> > issue in future releases.
>
> I have not tried it, but could you not:
>
> 1. set up an HttpClient with an appropriate CredentialsProvider;
> 2. pass it to HttpSolrClient.Builder.withHttpClient();
> 3. pass that Builder to
> LBHttpSolrClient.Builder.withHttpSolrClientBuilder();
> 4. pass *that* Builder to
> CloudSolrClient.Builder.withLBHttpSolrClientBuilder();
>
> Now you have control of the CredentialsProvider and can have it return
> whatever credentials you wish, so long as you still have a reference
> to it.
>
> > On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das 
> wrote:
> >
> > >
> > > Hi There,
> > >
> > > Is there any way to programmatically set basic authentication
> credential
> > > on CloudSolrClient?
> > >
> > > The only documentation available is to use system property. This is not
> > > useful if two collection required two separate set of credentials and
> they
> > > are parallelly accessed.
> > > Thanks in advance.
> > >
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>



Re: Investigating Seeming Deadlock

2021-03-05 Thread Mike Drob
Were you having any OOM errors beforehand? If so, that could have caused
some GC of objects that other threads still expect to be reachable, leading
to these null monitors.
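
If it helps, a quick check for earlier OOMs could look like this (a sketch;
the paths assume a default Solr service install):

# Look for OutOfMemoryError in the Solr logs (default service-install path).
grep -i "OutOfMemoryError" /var/solr/logs/solr.log*

# Solr's OOM killer script, when triggered, leaves its own log file behind.
ls /var/solr/logs/solr_oom_killer-*.log 2>/dev/null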

On Fri, Mar 5, 2021 at 12:55 PM Stephen Lewis Bianamara <
stephen.bianam...@gmail.com> wrote:

> Hi SOLR Community,
>
> I'm investigating a node on solr 8.3.1 running in cloud mode which appears
> to have deadlocked, and I'm trying to figure out if this is a known issue
> or not, and looking for some guidance in understanding both (a) whether
> this is a resolved issue in future releases or needs a bug, and (b) how to
> lower the risk of recurrence until it is fixed.
>
> Here is what I've observed:
>
> - strace shows the main process waiting. A spot check on child processes
>   shows the same, though I have not yet gone through all of the threads
>   (there are over 100).
> - the server was not doing anything or busy, except for the JVM sitting at
>   constant memory usage. No resource (memory, swap, CPU, etc.) was limited
>   or showing active usage.
> - jcmd Thread.print shows some interesting info which suggests a deadlock
>   or another type of locking issue
>   - For example, this entry looks unusual because the thread appears to be
>     waiting on a null object monitor:
>
>     "Finalizer" #3 daemon prio=8 os_prio=0 cpu=11.11ms elapsed=11.11s tid=0x0100 nid=0x in Object.wait()  [0x1000]
>        java.lang.Thread.State: WAITING (on object monitor)
>          at java.lang.Object.wait(java.base@11.0.7/Native Method)
>          - waiting on 
>          at java.lang.ref.ReferenceQueue.remove(java.base@11.0.7/ReferenceQueue.java:155)
>          - waiting to re-lock in wait() <0x00020020> (a java.lang.ref.ReferenceQueue$Lock)
>          at java.lang.ref.ReferenceQueue.remove(java.base@11.0.7/ReferenceQueue.java:176)
>          at java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.7/Finalizer.java:170)
>
>   - I also see a lot of this. Some addresses occur multiple times, but one
>     in particular occurs 31 times. Maybe related?
>
>     "h2sc-1-thread-11" #110 prio=5 os_prio=0 cpu=54.29ms elapsed=11.11s tid=0x10010100 nid=0x waiting on condition  [0x10011000]
>        java.lang.Thread.State: WAITING (parking)
>          at jdk.internal.misc.Unsafe.park(java.base@11.0.7/Native Method)
>          - parking to wait for  <0x00030033>
>
> Can anyone help answer whether this is known or what I could look at next?
>
> Thanks!
> Stephen
>


Re: What controls field cache size and eviction rates?

2021-03-05 Thread Stephen Lewis Bianamara
Should say -- Can anyone confirm if it's right *still*, since the article
is 10 years old :)

On Fri, Mar 5, 2021 at 10:36 AM Stephen Lewis Bianamara <
stephen.bianam...@gmail.com> wrote:

> Hi SOLR Community,
>
> Just following up here with an update. I found this article which goes
> into depth on the field cache though stops short of discussing how it
> handles eviction. Can anyone confirm if this info is right?
>
> https://lucidworks.com/post/scaling-lucene-and-solr/
>
>
> Also, can anyone speak to how the field cache handles evictions?
>
> Best,
> Stephen
>
> On Wed, Feb 24, 2021 at 4:43 PM Stephen Lewis Bianamara <
> stephen.bianam...@gmail.com> wrote:
>
>> Hi SOLR Community,
>>
>> I've been trying to understand how the field cache in SOLR manages
>> its evictions, and neither the code nor the documentation makes it easy to
>> answer the simple question of when and how something gets evicted from the
>> field cache. This cache also doesn't show hit ratio, total hits, eviction
>> ratio, total evictions, etc... in the web UI.
>>
>> For example: I've observed that if I write one document and trigger a
>> query with a sort on the field, it will generate two entries in the field
>> cache. Then if I repush the document, the entries get removed, but will
>> otherwise stay there seemingly forever. If my query matches 2 docs, same
>> thing but with 4 entries (2 each). Then, if I rewrite one of the docs,
>> those two entries go away but not the two from the first one. This
>> obviously implies that there are implications to write throughput
>> performance based on this cache, so the fact that it is not configurable by
>> the user and doesn't have very clear documentation is a bit worrisome.
>>
>> Can someone here help out and explain how the field cache handles
>> evictions, or perhaps send me the documentation if I missed it?
>>
>>
>> Thanks!
>> Stephen
>>
>


Re: What controls field cache size and eviction rates?

2021-03-05 Thread Stephen Lewis Bianamara
Hi SOLR Community,

Just following up here with an update. I found this article which goes into
depth on the field cache though stops short of discussing how it handles
eviction. Can anyone confirm if this info is right?

https://lucidworks.com/post/scaling-lucene-and-solr/


Also, can anyone speak to how the field cache handles evictions?

Best,
Stephen

On Wed, Feb 24, 2021 at 4:43 PM Stephen Lewis Bianamara <
stephen.bianam...@gmail.com> wrote:

> Hi SOLR Community,
>
> I've been trying to understand how the field cache in SOLR manages
> its evictions, and neither the code nor the documentation makes it easy to
> answer the simple question of when and how something gets evicted from the
> field cache. This cache also doesn't show hit ratio, total hits, eviction
> ratio, total evictions, etc... in the web UI.
>
> For example: I've observed that if I write one document and trigger a
> query with a sort on the field, it will generate two entries in the field
> cache. Then if I repush the document, the entries get removed, but will
> otherwise stay there seemingly forever. If my query matches 2 docs, same
> thing but with 4 entries (2 each). Then, if I rewrite one of the docs,
> those two entries go away but not the two from the first one. This
> obviously implies that there are implications to write throughput
> performance based on this cache, so the fact that it is not configurable by
> the user and doesn't have very clear documentation is a bit worrisome.
>
> Can someone here help out and explain how the field cache handles
> evictions, or perhaps send me the documentation if I missed it?
>
>
> Thanks!
> Stephen
>


Re: Caffeine Cache Metrics Broken?

2021-03-05 Thread Stephen Lewis Bianamara
Thanks Shawn. Something seems different between the two, because Caffeine
Cache is seeing much higher volume per hour than our previous implementation
was. So I guess it is more likely something actually expected, due to a
change in what is getting kept/warmed. I'll look into this more and get back
to you if that doesn't end up making sense based on what I observe.

Thanks again,
Stephen

On Tue, Mar 2, 2021 at 6:35 PM Shawn Heisey  wrote:

> On 3/2/2021 3:47 PM, Stephen Lewis Bianamara wrote:
> > I'm investigating a weird behavior I've observed in the admin page for
> > caffeine cache metrics. It looks to me like on the older caches, warm-up
> > queries were not counted toward hit/miss ratios, which of course makes
> > sense, but on Caffeine cache it looks like they are. I'm using solr 8.3.
> >
> > Obviously this makes measuring its true impact a little tough. Is this by
> > any chance a known issue and already fixed in later versions?
>
> The earlier cache implementations are entirely native to Solr -- all the
> source code is included in the Solr codebase.
>
> Caffeine is a third-party cache implementation that has been integrated
> into Solr.  Some of the metrics might come directly from Caffeine, not
> Solr code.
>
> I would expect warming queries to be counted on any of the cache
> implementations.  One of the reasons that the warming capability exists
> is to pre-populate the caches before actual queries begin.  If warming
> queries are somehow excluded, then the cache metrics would not be correct.
>
> I looked into the code and did not find anything that would keep warming
> queries from affecting stats.  But it is always possible that I just
> didn't know what to look for.
>
> In the master branch (Solr 9.0), CaffeineCache is currently the only
> implementation available.
>
> Thanks,
> Shawn
>


Re: org.apache.solr.common.SolrException: this IndexWriter is closed

2021-03-05 Thread Dominique Bejean
Hi,
Are you using RAMDirectoryFactory without enough RAM?
Regards
Dominique
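
For reference, the directory implementation is set in solrconfig.xml; a
sketch of the stock on-disk configuration to compare against (finding
solr.RAMDirectoryFactory in its place would mean the whole index has to fit
on the JVM heap):

<!-- Stock solrconfig.xml setting; if this names solr.RAMDirectoryFactory
     instead, the entire index is held in heap memory. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>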

On Fri, Mar 5, 2021 at 16:18, 李世明  wrote:

> Hello:
>
> Have you encountered the following exception, which causes the index to
> stop accepting writes? Queries still work, though.
> Version:8.7.0
>
> org.apache.solr.common.SolrException: this IndexWriter is closed
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:234)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
> at
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.Server.handle(Server.java:500)
> at
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
> at
> org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
> at org.eclipse.jetty.server.HttpChannel.run(HttpChannel.java:335)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
> at
> org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:170)
> at
> org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:125)
> at
> org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:348)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
> at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: 

Re: new tlog files are not created per commit but adding into latest existing tlog file after replica reload

2021-03-04 Thread Michael Hu
Hi experts:

After I sent out the previous email, I issued a commit on that replica core
and observed the same "ClosedChannelException"; please refer to the "issuing
core commit" section below.

Then I issued a core reload, and the timestamp of the latest tlog file
changed; please refer to the "files under tlog directory" section below. Not
sure whether this information is useful or not.

Thank you!

--Michael Hu

--- beginning for issuing core commit ---

$ curl 
'http://localhost:8983/solr/myconection_myshard_replica_t7/update?commit=true'

{
  "responseHeader":{
    "status":500,
    "QTime":71},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.nio.channels.ClosedChannelException"],
    "msg":"java.nio.channels.ClosedChannelException",
    "trace":"org.apache.solr.common.SolrException:

--- end for issuing core commit ---

--- beginning for files under tlog directory ---
before core reload:

-rw-r--r-- 1 solr solr   47527321 Mar  4 20:14 tlog.877
-rw-r--r-- 1 solr solr   42614907 Mar  4 20:14 tlog.878
-rw-r--r-- 1 solr solr   37524663 Mar  4 20:14 tlog.879
-rw-r--r-- 1 solr solr   44067997 Mar  4 20:14 tlog.880
-rw-r--r-- 1 solr solr   33209784 Mar  4 20:15 tlog.881
-rw-r--r-- 1 solr solr   55435186 Mar  4 20:15 tlog.882
-rw-r--r-- 1 solr solr 2179991713 Mar  4 20:29 tlog.883


after core reload:

-rw-r--r-- 1 solr solr   47527321 Mar  4 20:14 tlog.877
-rw-r--r-- 1 solr solr   42614907 Mar  4 20:14 tlog.878
-rw-r--r-- 1 solr solr   37524663 Mar  4 20:14 tlog.879
-rw-r--r-- 1 solr solr   44067997 Mar  4 20:14 tlog.880
-rw-r--r-- 1 solr solr   33209784 Mar  4 20:15 tlog.881
-rw-r--r-- 1 solr solr   55435186 Mar  4 20:15 tlog.882
-rw-r--r-- 1 solr solr 2179991717 Mar  4 22:23 tlog.883


--- end for files under tlog directory ---



From: Michael Hu 
Sent: Thursday, March 4, 2021 1:58 PM
To: solr-user@lucene.apache.org 
Subject: new tlog files are not created per commit but adding into latest 
existing tlog file after replica reload

Hi experts:

I need some help and suggestions about an issue I am facing.

Solr info:
 - Solr 8.7
 - Solr cloud with tlog replica; replica size is 3 for my Solr collection

Issue:
 - before issuing the collection reload, I observed that a new tlog file was
created after every commit, and those tlog files were deleted after a while
(maybe after the index segments are merged?)
 - then I issued a collection reload using the collection API on my collection
at 20:15
 - after the leader replica reloaded, no new tlog files are created; instead
the latest tlog file keeps growing, and no tlog file is deleted after the
reload. The "files under tlog directory" section below is a snapshot of the
tlog files under the tlog directory of the leader replica. Again, I issued the
collection reload at 20:15, and since then tlog.883 keeps growing
 - I looked in the log file and found the error entries shown below under the
"log entries" section; the entry repeats continuously for every auto commit
after the reload. I hope it provides some information about the issue.

Please help and suggest what I may be doing incorrectly. Or, if this is a
known issue, is there a way I can fix or work around it?

Thank you so much!

--Michael Hu

--- beginning for files under tlog directory ---

-rw-r--r-- 1 solr solr   47527321 Mar  4 20:14 tlog.877
-rw-r--r-- 1 solr solr   42614907 Mar  4 20:14 tlog.878
-rw-r--r-- 1 solr solr   37524663 Mar  4 20:14 tlog.879
-rw-r--r-- 1 solr solr   44067997 Mar  4 20:14 tlog.880
-rw-r--r-- 1 solr solr   33209784 Mar  4 20:15 tlog.881
-rw-r--r-- 1 solr solr   55435186 Mar  4 20:15 tlog.882
-rw-r--r-- 1 solr solr 2179991713 Mar  4 20:29 tlog.883

--- end for files under tlog directory ---

--- beginning for log entries ---

2021-03-04 20:15:38.251 ERROR (commitScheduler-4327-thread-1) [c:mycollection s:myshard r:core_node10 x:mycolletion_myshard_replica_t7] o.a.s.u.CommitTracker auto commit error...:
org.apache.solr.common.SolrException: java.nio.channels.ClosedChannelException
        at org.apache.solr.update.TransactionLog.writeCommit(TransactionLog.java:503)
        at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:835)
        at org.apache.solr.update.UpdateLog.preCommit(UpdateLog.java:819)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:673)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:273)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)

  

Re: Programmatic Basic Auth on CloudSolrClient

2021-03-04 Thread Tomás Fernández Löbbe
Ah, right, now I remember that something like this was possible with the
"http1" version of the clients, which is why I created the Jira issues for
the http2 ones. Maybe you can even skip the "LBHttpSolrClient" step; I
believe you can just pass the HttpClient to the CloudSolrClient? You will
have to make sure to close all the clients that are created externally
when done, since the Solr client won't in this case.

On Thu, Mar 4, 2021 at 1:22 PM Mark H. Wood  wrote:

> On Wed, Mar 03, 2021 at 10:34:50AM -0800, Tomás Fernández Löbbe wrote:
> > As far as I know the current OOTB options are system properties or
> > per-request (which would allow you to use different per collection, but
> > probably not ideal if you do different types of requests from different
> > parts of your code). A workaround (which I've used in the past) is to
> have
> > a custom client that overrides and sets the credentials in the "request"
> > method (you can put whatever logic there to identify which credentials to
> > use). I recently created
> https://issues.apache.org/jira/browse/SOLR-15154
> > and https://issues.apache.org/jira/browse/SOLR-15155 to try to address
> this
> > issue in future releases.
>
> I have not tried it, but could you not:
>
> 1. set up an HttpClient with an appropriate CredentialsProvider;
> 2. pass it to HttpSolrClient.Builder.withHttpClient();
> 3. pass that Builder to
> LBHttpSolrClient.Builder.withHttpSolrClientBuilder();
> 4. pass *that* Builder to
> CloudSolrClient.Builder.withLBHttpSolrClientBuilder();
>
> Now you have control of the CredentialsProvider and can have it return
> whatever credentials you wish, so long as you still have a reference
> to it.
>
> > On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das 
> wrote:
> >
> > >
> > > Hi There,
> > >
> > > Is there any way to programmatically set basic authentication
> credential
> > > on CloudSolrClient?
> > >
> > > The only documentation available is to use system property. This is not
> > > useful if two collection required two separate set of credentials and
> they
> > > are parallelly accessed.
> > > Thanks in advance.
> > >
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>


Re: Programmatic Basic Auth on CloudSolrClient

2021-03-04 Thread Mark H. Wood
On Wed, Mar 03, 2021 at 10:34:50AM -0800, Tomás Fernández Löbbe wrote:
> As far as I know the current OOTB options are system properties or
> per-request (which would allow you to use different per collection, but
> probably not ideal if you do different types of requests from different
> parts of your code). A workaround (which I've used in the past) is to have
> a custom client that overrides and sets the credentials in the "request"
> method (you can put whatever logic there to identify which credentials to
> use). I recently created https://issues.apache.org/jira/browse/SOLR-15154
> and https://issues.apache.org/jira/browse/SOLR-15155 to try to address this
> issue in future releases.

I have not tried it, but could you not:

1. set up an HttpClient with an appropriate CredentialsProvider;
2. pass it to HttpSolrClient.Builder.withHttpClient();
3. pass that Builder to LBHttpSolrClient.Builder.withHttpSolrClientBuilder();
4. pass *that* Builder to CloudSolrClient.Builder.withLBHttpSolrClientBuilder();

Now you have control of the CredentialsProvider and can have it return
whatever credentials you wish, so long as you still have a reference
to it.
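
Spelled out in code, the chain might look like this; a sketch only, untested
like the steps above, with Solr 8.x SolrJ assumed and hosts and credentials
invented:

import java.util.Collections;
import java.util.Optional;

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;

public class BuilderChainSketch {
  public static void main(String[] args) throws Exception {
    // 1. an HttpClient with a CredentialsProvider we keep a reference to
    BasicCredentialsProvider creds = new BasicCredentialsProvider();
    creds.setCredentials(AuthScope.ANY,
        new UsernamePasswordCredentials("solr-user", "solr-pass"));
    HttpClient httpClient = HttpClientBuilder.create()
        .setDefaultCredentialsProvider(creds)
        .build();

    // 2. an HttpSolrClient.Builder wrapping that HttpClient
    HttpSolrClient.Builder httpBuilder =
        new HttpSolrClient.Builder().withHttpClient(httpClient);

    // 3. an LBHttpSolrClient.Builder wrapping the HttpSolrClient.Builder
    LBHttpSolrClient.Builder lbBuilder =
        new LBHttpSolrClient.Builder().withHttpSolrClientBuilder(httpBuilder);

    // 4. a CloudSolrClient.Builder wrapping *that* Builder
    try (CloudSolrClient solr = new CloudSolrClient.Builder(
            Collections.singletonList("zk1:2181"), Optional.empty())
        .withLBHttpSolrClientBuilder(lbBuilder)
        .build()) {
      // ... use the client; swapping credentials inside the provider works
      // because we still hold a reference to it
    }
  }
}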

> On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das  wrote:
> 
> >
> > Hi There,
> >
> > Is there any way to programmatically set basic authentication credential
> > on CloudSolrClient?
> >
> > The only documentation available is to use system property. This is not
> > useful if two collection required two separate set of credentials and they
> > are parallelly accessed.
> > Thanks in advance.
> >

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Get first value in a multivalued field

2021-03-04 Thread Walter Underwood
You can copy the field to another field, then use the 
FirstFieldValueUpdateProcessorFactory to limit that field to the first value. 
At least, that seems to be what that URP does. I have not used it.

https://solr.apache.org/guide/8_8/update-request-processors.html
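
A hedged solrconfig.xml sketch of that idea; the field names tags and
first_tag are invented for illustration, and CloneFieldUpdateProcessorFactory
stands in for the copyField step so everything happens inside one update
chain:

<updateRequestProcessorChain name="keep-first-value">
  <!-- copy the multivalued field inside the chain -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">tags</str>
    <str name="dest">first_tag</str>
  </processor>
  <!-- then keep only the first value of the clone -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">first_tag</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

As Walter says, this is untested; and since update processors run at index
time, it only affects documents indexed through the chain.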

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 4, 2021, at 11:42 AM, ufuk yılmaz  wrote:
> 
> Hi,
> 
> Is it possible in any way to get the first value in a multivalued field? 
> Using function queries, streaming expressions or any other way without 
> reindexing? (Stream decorators have array(), but no way to get a value at a 
> specific index?)
> 
> Another one, is it possible to match a regex to a text field and extract only 
> the matching part?
> 
> I tried very hard for this too but couldn’t find a way.
> 
> --ufuk
> 
> Sent from Mail for Windows 10
> 



Re: wordpress anyone?

2021-03-04 Thread dmitri maziuk

On 2021-03-03 10:24 PM, Gora Mohanty wrote:
> ... there does seem to be another plugin that is
> open-source, and hosted on Github: https://wordpress.org/plugins/solr-power/


I saw it, they lost me at

"you'll need access to a functioning Solr 3.6 instance for the plugin to 
work as expected. This plugin does not support other versions of Solr."


Dima



Re: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-04 Thread Ere Maijala

Hi,

Solr uses JIRA for issue tickets. You can find it here: 
https://issues.apache.org/jira/browse/SOLR


I'd suggest filing a new bug issue in the SOLR project (note that 
several other projects also use this JIRA installation). Here's an 
example of an existing highlighter issue for reference: 
https://issues.apache.org/jira/browse/SOLR-14019.


See also some brief documentation:

https://cwiki.apache.org/confluence/display/solr/HowToContribute#HowToContribute-JIRAtips(ourissue/bugtracker)

Regards,
Ere

Flowerday, Matthew J wrote on 1.3.2021 at 14.58:

Hi Ere

Pleased to be of service!

No, I have not filed a JIRA ticket. I am new to interacting with the Solr
Community and only beginning to 'find my legs'. I am not too sure what JIRA
is, I am afraid!

Regards

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| matthew.flower...@unisys.com
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX



THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is for use only by the intended recipient. If you received this
in error, please contact the sender and delete the e-mail and its
attachments from all devices.



-Original Message-
From: Ere Maijala 
Sent: 01 March 2021 12:53
To: solr-user@lucene.apache.org
Subject: Re: Potential Slow searching for unified highlighting on Solr
8.8.0/8.8.1

EXTERNAL EMAIL - Be cautious of all links and attachments.

Hi,

Whoa, thanks for the heads-up! You may just have saved me from a whole lot
of trouble. Did you file a JIRA ticket already?

Thanks,
Ere

Flowerday, Matthew J wrote on 1.3.2021 at 14.00:

Hi There

I just came across a situation where a unified highlighting search under
Solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times out.

I resolved it by a config change – but it can catch you out. Hence
this email.

With Solr 8.8.0 a new unified highlighting parameter
 was implemented which, if not set, defaults to 0.5.
This attempts to improve the highlighting so that highlighted text
does not appear right at the left. This works well, but if you have a
search result with numerous occurrences of the word in question within
the record, performance goes right down!

2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf]
o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select
params={hl.snippets=2=test=on=100=id,d
escription,specification,score=20=*=10&_=161440511913
4}
hits=57008 status=0 QTime=1414320

2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf]
o.a.s.s.HttpSolrCall Unable to write response, client closed
connection or we are shutting down =>
org.eclipse.jetty.io.EofException

at
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)

org.eclipse.jetty.io.EofException: null

at
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

at
org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

at
org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

When I set =0.25, results came back much quicker:

2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes]
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select
params={hl.weightMatches=false=on=id,description,specification,s
core=1=0.25=100=2=test
axAnalyzedChars=100=*=unified=9&_=
1614430061690}
hits=136939 status=0 QTime=87024

And  =0.1

2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes]
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select
params={hl.weightMatches=false=on=id,description,specification,s
core=1=0.1=100=2=test
xAnalyzedChars=100=*=unified=9&_=1
614430061690}
hits=136939 status=0 QTime=69033

And =0.0

2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes]
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select
params={hl.weightMatches=false=on=id,description,specification,s
core=1=0.0=100=2=test
xAnalyzedChars=100=*=unified=9&_=1
614430061690}
hits=136939 status=0 QTime=2841

I left our setting at 0.0 – this is presumably how it was in 7.7.1 (fully
left-aligned). I am not too sure how many times a word has to occur in a
record for performance to go right down – but if there are too many, it
can have a BIG impact.

I also noticed that setting =9 did not break out of
the query until it finished. Perhaps that is because the query itself
finished quickly and what took the time was the highlighting. It might be
an idea to get  to also cover any highlighting, so that the
query does not keep running until the jetty timeout is hit. The machine
ran one core at 100% for about 20 mins!

Hope this helps.

Regards

Matthew

*Matthew Flowerday*| Consultant | ULEAF

Unisys | 01908 774830 | matthew.flower...@unisys.com

Address Enigma | Wavendon Business Park |

Re: wordpress anyone?

2021-03-03 Thread Gora Mohanty
On Thu, 4 Mar 2021 at 01:50, dmitri maziuk  wrote:

> Hi all,
>
> does anyone use Solr with WP? It seems there is one for-pay-only
> offering and a few defunct projects from a decade ago... a great web
> search engine is particularly useful if it can actually be used in a
> client.
>
> So has anyone heard about any active WP integration projects other than
> wpsolr.com?
>

Haven't had occasion to use Wordpress, and Solr with it, for a while. Since
nobody else has replied, there does seem to be another plugin that is
open-source, and hosted on Github: https://wordpress.org/plugins/solr-power/
. Cannot comment as to how well it works. Alternatively, one could use a
PHP client library like Solarium.

Regards,
Gora


RE: Programmatic Basic Auth on CloudSolrClient

2021-03-03 Thread Subhajit Das
Thanks. This would be very helpful.

From: Tomás Fernández Löbbe <tomasflo...@gmail.com>
Sent: 04 March 2021 12:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Programmatic Basic Auth on CloudSolrClient

Maybe something like this (I omitted a lot of things you'll have to do,
like passing zk or the list of hosts):

static class CustomCloudSolrClient extends CloudSolrClient {

  protected CustomCloudSolrClient(CustomCloudSolrClientBuilder builder) {
    super(builder);
  }

  @Override
  public NamedList<Object> request(SolrRequest request, String collection)
      throws SolrServerException, IOException {
    // your logic here to figure out which credentials to use...
    String user = "user";
    String pass = "pass";
    request.setBasicAuthCredentials(user, pass);
    return super.request(request, collection);
  }
}

static class CustomCloudSolrClientBuilder extends CloudSolrClient.Builder {

  @Override
  public CloudSolrClient build() {
    return new CustomCloudSolrClient(this);
  }
}

public static void main(String[] args) {
  CloudSolrClient c = new CustomCloudSolrClientBuilder().build();
  ...
}

Do consider that the "request" method is called once per request; make sure
whatever logic you have there is not super expensive.

On Wed, Mar 3, 2021 at 10:48 AM Subhajit Das 
wrote:

> Hi Thomas,
>
> Thanks. Can you please also share a sample of code to configure the client
> with your workaround?
>
> From: Tomás Fernández Löbbe <tomasflo...@gmail.com>
> Sent: 04 March 2021 12:05 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Programmatic Basic Auth on CloudSolrClient
>
> As far as I know the current OOTB options are system properties or
> per-request (which would allow you to use different per collection, but
> probably not ideal if you do different types of requests from different
> parts of your code). A workaround (which I've used in the past) is to have
> a custom client that overrides and sets the credentials in the "request"
> method (you can put whatever logic there to identify which credentials to
> use). I recently created https://issues.apache.org/jira/browse/SOLR-15154
> and https://issues.apache.org/jira/browse/SOLR-15155 to try to address
> this
> issue in future releases.
>
> On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das 
> wrote:
>
> >
> > Hi There,
> >
> > Is there any way to programmatically set basic authentication credential
> > on CloudSolrClient?
> >
> > The only documentation available is to use system property. This is not
> > useful if two collection required two separate set of credentials and
> they
> > are parallelly accessed.
> > Thanks in advance.
> >
>
>



Re: Programmatic Basic Auth on CloudSolrClient

2021-03-03 Thread Tomás Fernández Löbbe
Maybe something like this (I omitted a lot of things you'll have to do,
like passing zk or the list of hosts):

static class CustomCloudSolrClient extends CloudSolrClient {

  protected CustomCloudSolrClient(CustomCloudSolrClientBuilder builder) {
    super(builder);
  }

  @Override
  public NamedList<Object> request(SolrRequest request, String collection)
      throws SolrServerException, IOException {
    // your logic here to figure out which credentials to use...
    String user = "user";
    String pass = "pass";
    request.setBasicAuthCredentials(user, pass);
    return super.request(request, collection);
  }
}

static class CustomCloudSolrClientBuilder extends CloudSolrClient.Builder {

  @Override
  public CloudSolrClient build() {
    return new CustomCloudSolrClient(this);
  }
}

public static void main(String[] args) {
  CloudSolrClient c = new CustomCloudSolrClientBuilder().build();
  ...
}

Do consider that the "request" method is called once per request; make sure
whatever logic you have there is not super expensive.

On Wed, Mar 3, 2021 at 10:48 AM Subhajit Das 
wrote:

> Hi Thomas,
>
> Thanks. Can you please also share a sample of code to configure the client
> with your workaround?
>
> From: Tomás Fernández Löbbe <tomasflo...@gmail.com>
> Sent: 04 March 2021 12:05 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Programmatic Basic Auth on CloudSolrClient
>
> As far as I know the current OOTB options are system properties or
> per-request (which would allow you to use different per collection, but
> probably not ideal if you do different types of requests from different
> parts of your code). A workaround (which I've used in the past) is to have
> a custom client that overrides and sets the credentials in the "request"
> method (you can put whatever logic there to identify which credentials to
> use). I recently created https://issues.apache.org/jira/browse/SOLR-15154
> and https://issues.apache.org/jira/browse/SOLR-15155 to try to address
> this
> issue in future releases.
>
> On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das 
> wrote:
>
> >
> > Hi There,
> >
> > Is there any way to programmatically set basic authentication credential
> > on CloudSolrClient?
> >
> > The only documentation available is to use system property. This is not
> > useful if two collection required two separate set of credentials and
> they
> > are parallelly accessed.
> > Thanks in advance.
> >
>
>


RE: Programmatic Basic Auth on CloudSolrClient

2021-03-03 Thread Subhajit Das
Hi Thomas,

Thanks. Can you please also share a sample of code to configure the client with 
your workaround?

From: Tomás Fernández Löbbe <tomasflo...@gmail.com>
Sent: 04 March 2021 12:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Programmatic Basic Auth on CloudSolrClient

As far as I know the current OOTB options are system properties or
per-request (which would allow you to use different per collection, but
probably not ideal if you do different types of requests from different
parts of your code). A workaround (which I've used in the past) is to have
a custom client that overrides and sets the credentials in the "request"
method (you can put whatever logic there to identify which credentials to
use). I recently created https://issues.apache.org/jira/browse/SOLR-15154
and https://issues.apache.org/jira/browse/SOLR-15155 to try to address this
issue in future releases.

On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das  wrote:

>
> Hi There,
>
> Is there any way to programmatically set basic authentication credential
> on CloudSolrClient?
>
> The only documentation available is to use system property. This is not
> useful if two collection required two separate set of credentials and they
> are parallelly accessed.
> Thanks in advance.
>



Re: NPE in QueryComponent.mergeIds when using timeAllowed and sorting SOLR 8.7

2021-03-03 Thread Tomás Fernández Löbbe
Patch looks good to me. Since it's a bugfix it can be committed to 8_8
branch and released on the next bugfix release, though I don't think it
should trigger one. In the meantime, if you can patch your environment and
confirm that it fixes your problem, that's a good comment to leave in
SOLR-14758. 

On Mon, Mar 1, 2021 at 3:12 PM Phill Campbell 
wrote:

> Anyone?
>
> > On Feb 24, 2021, at 7:47 AM, Phill Campbell
>  wrote:
> >
> > Last week I switched to Solr 8.7 from a “special” build of Solr 6.6
> >
> > The system has a timeout set for querying. I am now seeing this bug.
> >
> > https://issues.apache.org/jira/browse/SOLR-14758 <
> https://issues.apache.org/jira/browse/SOLR-14758>
> >
> > Max Query Time goes from 1.6 seconds to 20 seconds and affects the
> entire system for about 2 minutes as reported in New Relic.
> >
> > null:java.lang.NullPointerException
> >   at
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:935)
> >   at
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
> >   at
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
> >   at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:486)
> >   at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> >
> >
> > Can this be fixed in a patch for Solr 8.8? I do not want to have to go
> back to Solr 6 and reindex the system, that takes 2 days using 180 EMR
> instances.
> >
> > Please advise. Thank you.
>
>


Re: Programmatic Basic Auth on CloudSolrClient

2021-03-03 Thread Tomás Fernández Löbbe
As far as I know the current OOTB options are system properties or
per-request (which would allow you to use different per collection, but
probably not ideal if you do different types of requests from different
parts of your code). A workaround (which I've used in the past) is to have
a custom client that overrides and sets the credentials in the "request"
method (you can put whatever logic there to identify which credentials to
use). I recently created https://issues.apache.org/jira/browse/SOLR-15154
and https://issues.apache.org/jira/browse/SOLR-15155 to try to address this
issue in future releases.
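
For reference, the per-request option mentioned above looks roughly like
this in SolrJ (a sketch; the collection, the credentials, and the pre-built
cloudSolrClient are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

// Per-request basic auth; names are illustrative.
QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
req.setBasicAuthCredentials("user-for-collection1", "password1");
QueryResponse rsp = req.process(cloudSolrClient, "collection1");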

On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das  wrote:

>
> Hi There,
>
> Is there any way to programmatically set basic authentication credential
> on CloudSolrClient?
>
> The only documentation available is to use system property. This is not
> useful if two collection required two separate set of credentials and they
> are parallelly accessed.
> Thanks in advance.
>


RE: Filter by sibling ?

2021-03-02 Thread Manoj Mokashi
I tried passing a parent parser connected to the {!child} parser using query
params, and it seems to work!

q=type:C1 AND {!child of='type:PR' v=$statusqry}
statusqry={!parent which='type:PR'}type:C2

Note that my real query is not exactly this, so I haven't tried the exact
expression above.
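
Spelled out as a request, the approach would look something like this (a
sketch; the collection name and field values are illustrative, per the
caveat above):

# "mycoll" and the field values are invented, not from the thread
curl "http://localhost:8983/solr/mycoll/select" \
  --data-urlencode 'q=type:C1 AND {!child of="type:PR" v=$statusqry}' \
  --data-urlencode 'statusqry={!parent which="type:PR"}type:C2' \
  --data-urlencode 'fl=id,city,type'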

-Original Message-
From: Manoj Mokashi 
Sent: Wednesday, March 3, 2021 9:56 AM
To: solr-user@lucene.apache.org
Subject: RE: Filter by sibling ?

Ok. Will check. thanks !

-Original Message-
From: Joel Bernstein 
Sent: Tuesday, March 2, 2021 8:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Filter by sibling ?

Solr's graph expressions can do this type of thing. It allows you to walk the 
relationships in a graph with filters:

https://lucene.apache.org/solr/guide/8_6/graph-traversal.html



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Mar 2, 2021 at 9:00 AM Manoj Mokashi 
wrote:

> Hi,
>
> If I have a nested document structure, with say parent type:PR, child
> 1
> type:C1 and child2 type:C2,
> would it possible to fetch documents of type C1  that are children of
> parents that have child2 docs with a certain condition ?
> e.g. for
> { type:PR,
>   Title: "XXX",
>   Children1 : [ { type:C1, city:ABC} ],
>   Children2 : [ { type:C2, status:Done}] }
>
> Can I fetch type:C1 documents which are children of parent docs that
> have child C2 docs with status:Done ?
>
> Regards,
> manoj
>


RE: Filter by sibling ?

2021-03-02 Thread Manoj Mokashi
Ok. Will check. thanks !

-Original Message-
From: Joel Bernstein 
Sent: Tuesday, March 2, 2021 8:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Filter by sibling ?

Solr's graph expressions can do this type of thing. It allows you to walk the 
relationships in a graph with filters:

https://lucene.apache.org/solr/guide/8_6/graph-traversal.html



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Mar 2, 2021 at 9:00 AM Manoj Mokashi 
wrote:

> Hi,
>
> If I have a nested document structure, with say parent type:PR, child
> 1
> type:C1 and child2 type:C2,
> would it possible to fetch documents of type C1  that are children of
> parents that have child2 docs with a certain condition ?
> e.g. for
> { type:PR,
>   Title: "XXX",
>   Children1 : [ { type:C1, city:ABC} ],
>   Children2 : [ { type:C2, status:Done}] }
>
> Can I fetch type:C1 documents which are children of parent docs that
> have child C2 docs with status:Done ?
>
> Regards,
> manoj
>


Re: Caffeine Cache Metrics Broken?

2021-03-02 Thread Shawn Heisey

On 3/2/2021 3:47 PM, Stephen Lewis Bianamara wrote:
> I'm investigating a weird behavior I've observed in the admin page for
> caffeine cache metrics. It looks to me like on the older caches, warm-up
> queries were not counted toward hit/miss ratios, which of course makes
> sense, but on Caffeine cache it looks like they are. I'm using solr 8.3.
>
> Obviously this makes measuring its true impact a little tough. Is this by
> any chance a known issue and already fixed in later versions?


The earlier cache implementations are entirely native to Solr -- all the 
source code is included in the Solr codebase.


Caffeine is a third-party cache implementation that has been integrated 
into Solr.  Some of the metrics might come directly from Caffeine, not 
Solr code.


I would expect warming queries to be counted on any of the cache 
implementations.  One of the reasons that the warming capability exists 
is to pre-populate the caches before actual queries begin.  If warming 
queries are somehow excluded, then the cache metrics would not be correct.


I looked into the code and did not find anything that would keep warming 
queries from affecting stats.  But it is always possible that I just 
didn't know what to look for.


In the master branch (Solr 9.0), CaffeineCache is currently the only 
implementation available.


Thanks,
Shawn


RE: Idle timeout expired and Early Client Disconnect errors

2021-03-02 Thread ufuk yılmaz
I divided the query into 1000 pieces and removed the parallel stream clause;
it seems to be working without timeouts so far. If it times out again, I can
just divide it into even smaller pieces, I guess.

I tried to send all 1000 pieces in a "list" expression to be executed
linearly; it didn't work, but I was just curious whether it could handle such
a large query.

Now I'm just generating expression strings from Java code and sending them one
by one. I tried to use SolrJ for this, but encountered a weird problem where
even the simplest expression (echo) stops working after a few iterations in a
loop. I'm guessing the underlying HttpClient is not closing connections in a
timely manner, hitting the OS per-host connection limit. I asked a separate
question about this. I was following the example on lucidworks:
https://lucidworks.com/post/streaming-expressions-in-solrj/

I just modified my code to use regular REST calls via okhttp3. It's a shame
that I couldn't use SolrJ, since it truly streams every result one by one
continuously; my REST call just returns a single large response at the very
end of the stream.
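
For what it's worth, posting one generated expression to the /stream handler
with okhttp3 can be as small as this (a sketch; host, collection, and the
expression itself are illustrative):

import okhttp3.FormBody;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class StreamExpressionPost {
  public static void main(String[] args) throws Exception {
    OkHttpClient client = new OkHttpClient();
    // one generated piece of the larger expression, illustrative only
    String expr = "update(targetCollection, search(sourceCollection, "
        + "q=\"id:[0 TO 1000]\", fl=\"id\", sort=\"id asc\", qt=\"/export\"))";
    RequestBody body = new FormBody.Builder().add("expr", expr).build();
    Request request = new Request.Builder()
        .url("http://localhost:8983/solr/workerCollection/stream")
        .post(body)
        .build();
    try (Response response = client.newCall(request).execute()) {
      // the full JSON arrives only once the stream finishes server-side
      System.out.println(response.body().string());
    }
  }
}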

Thanks again for your help.

Sent from Mail for Windows 10

From: Joel Bernstein
Sent: 02 March 2021 00:19
To: solr-user@lucene.apache.org
Subject: Re: Idle timeout expired and Early Client Disconnect errors

Also the parallel function builds hash partitioning filters that could lead
to timeouts if they take too long to build. Try the query without the
parallel function if you're still getting timeouts when making the query
smaller.



Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 4:03 PM Joel Bernstein  wrote:

> The settings in your version are 30 seconds and 15 seconds for socket and
> connection timeouts.
>
> Typically timeouts occur because one or more shards in the query are idle
> beyond the timeout threshold. This happens because lot's of data is being
> read from other shards.
>
> Breaking the query into small parts would be a good strategy.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 3:30 PM ufuk yılmaz 
> wrote:
>
>> Hello Mr. Bernstein,
>>
>> I’m using version 8.4. So, if I understand correctly, I can’t increase
>> timeouts and they are bound to happen in such a large stream. Should I just
>> reduce the output of my search expressions?
>>
>> Maybe I can split my search results into ~100 parts and run the same
>> query 100 times in series. Each part would emit ~3M documents so they
>> should finish before timeout?
>>
>> Is this a reasonable solution?
>>
>> Btw how long is the default hard-coded timeout value? Because yesterday I
>> ran another query which took more than 1 hour without any timeouts and
>> finished successfully.
>>
>> Sent from Mail for Windows 10
>>
>> From: Joel Bernstein
>> Sent: 01 March 2021 23:03
>> To: solr-user@lucene.apache.org
>> Subject: Re: Idle timeout expired and Early Client Disconnect errors
>>
>> Oh wait, I misread your email. The idle timeout issue is configurable in:
>>
>> https://issues.apache.org/jira/browse/SOLR-14672
>>
>> This unfortunately missed the 8.8 release and will be 8.9.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:
>>
>> > What version are you using?
>> >
>> > Solr 8.7 has changes that caused these errors to hit the logs. These
>> used
>> > to be suppressed. This has been fixed in Solr 9.0 but it has not been
>> back
>> > ported to Solr 8.x.
>> >
>> > The errors are actually normal operational occurrences when doing joins
>> so
>> > should be suppressed in the logs and were before the specific release.
>> >
>> > It might make sense to do a release that specifically suppresses these
>> > errors without backporting the full Solr 9.0 changes which impact the
>> > memory footprint of export.
>> >
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> >
>> > On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz > >
>> > wrote:
>> >
>> >> Hello all,
>> >>
>> >> I’m running a large streaming expression and feeding the result to
>> update
>> >> expression.
>> >>
>> >>  update(targetCollection, ...long running stream here...,
>> >>
>> >> I tried sending the exact same query multiple times, it sometimes works
>> >> and indexes some results, then gives exception, other tim

RE: Default conjunction behaving differently after field type change

2021-03-02 Thread ufuk yılmaz
I changed the tokenizer class from KeywordTokenizerFactory to
WhitespaceTokenizerFactory for the query analyzer using the Schema API, and
it seems to have solved the problem.
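
A sketch of the Schema API call for that change; the field type name
string_ci follows the related schema thread, and the collection name is
illustrative:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "replace-field-type": {
    "name": "string_ci",
    "class": "solr.TextField",
    "indexAnalyzer": {
      "tokenizer": { "class": "solr.KeywordTokenizerFactory" },
      "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
    },
    "queryAnalyzer": {
      "tokenizer": { "class": "solr.WhitespaceTokenizerFactory" },
      "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
    }
  }
}' http://localhost:8983/solr/mycollection/schema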

Sent from Mail for Windows 10

From: ufuk yılmaz
Sent: 02 March 2021 20:47
To: solr-user@lucene.apache.org
Subject: Default conjunction behaving differently after field type change

Hello all,

From the Solr 8.4 (my version) documentation:

“The OR operator is the default conjunction operator. This means that if there 
is no Boolean operator between two terms, the OR operator is used. To search 
for documents that contain either "jakarta apache" or just "jakarta," use the 
query:

"jakarta apache" jakarta

or

"jakarta apache" OR jakarta”


I had a field type=”string” in my old schema:





I could use queries like:
username: (user1 user2 user3)

So it would find the documents of all 3 users (conjunction is OR)
-
Recently I changed the field definition in a new schema:


  
  
  
  




When I search with the same query:

username: (user1 user2 user3)

I get no results unless I change it to either:
username: (user1 OR user2 OR user3) //
username: (“user1” “user2” “user3”)


At first I thought the default conjunction operator had changed to AND, but
now it seems the standard query parser treats "user1 user2 user3" as a single
string containing spaces, I guess?

I couldn't find out how queries against the default "string" field are
analyzed; what difference could cause this behavior?

--ufuk yilmaz



Sent from Mail for Windows 10




RE: Schema API specifying different analysers for query and index

2021-03-02 Thread ufuk yılmaz
It worked! Thanks Mr. Rafalovitch. I just removed the "type": "query" keys
from the JSON and used indexAnalyzer and queryAnalyzer in place of the
analyzer JSON node.

Sent from Mail for Windows 10

From: Alexandre Rafalovitch
Sent: 03 March 2021 01:19
To: solr-user
Subject: Re: Schema API specifying different analysers for query and index

RefGuide gives this for Adding, I would hope the Replace would be similar:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type":{
 "name":"myNewTextField",
 "class":"solr.TextField",
 "indexAnalyzer":{
"tokenizer":{
   "class":"solr.PathHierarchyTokenizerFactory",
   "delimiter":"/" }},
 "queryAnalyzer":{
"tokenizer":{
   "class":"solr.KeywordTokenizerFactory" }}}
}' http://localhost:8983/solr/gettingstarted/schema

So, indexAnalyzer/queryAnalyzer, rather than array:
https://lucene.apache.org/solr/guide/8_8/schema-api.html#add-a-new-field-type

Hope this works,
Alex.
P.s. Also check whether you are using matching API and V1/V2 end point.

On Tue, 2 Mar 2021 at 15:25, ufuk yılmaz  wrote:
>
> Hello,
>
> I’m trying to change a field’s query analysers. The following works but it 
> replaces both index and query type analysers:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzer": {
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }
> }
> }
>
> I tried to change analyzer field to analyzers, to specify different analysers 
> for query and index, but it gave error:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzers": [{
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> },{
> "type": "index",
> "tokenizer": {
> "class": "solr.KeywordTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }]
> }
> }
>
> "errorMessages":["Plugin init failure for [schema.xml]
> "msg":"error processing commands",...
>
> How can I specify different analyzers for query and index type when using 
> schema api?
>
> Sent from Mail for Windows 10
>



Re: Schema API specifying different analysers for query and index

2021-03-02 Thread Alexandre Rafalovitch
RefGuide gives this for Adding, I would hope the Replace would be similar:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type":{
 "name":"myNewTextField",
 "class":"solr.TextField",
 "indexAnalyzer":{
"tokenizer":{
   "class":"solr.PathHierarchyTokenizerFactory",
   "delimiter":"/" }},
 "queryAnalyzer":{
"tokenizer":{
   "class":"solr.KeywordTokenizerFactory" }}}
}' http://localhost:8983/solr/gettingstarted/schema

So, indexAnalyzer/queryAnalyzer, rather than array:
https://lucene.apache.org/solr/guide/8_8/schema-api.html#add-a-new-field-type

Hope this works,
Alex.
P.s. Also check whether you are using matching API and V1/V2 end point.

On Tue, 2 Mar 2021 at 15:25, ufuk yılmaz  wrote:
>
> Hello,
>
> I’m trying to change a field’s query analysers. The following works but it 
> replaces both index and query type analysers:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzer": {
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }
> }
> }
>
> I tried to change analyzer field to analyzers, to specify different analysers 
> for query and index, but it gave error:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzers": [{
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> },{
> "type": "index",
> "tokenizer": {
> "class": "solr.KeywordTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }]
> }
> }
>
> "errorMessages":["Plugin init failure for [schema.xml]
> "msg":"error processing commands",...
>
> How can I specify different analyzers for query and index type when using 
> schema api?
>
> Sent from Mail for Windows 10
>


Re: Location of Solr 9 Branch

2021-03-02 Thread Houston Putman
Solr 9 is an unreleased major version, so it lives in *master*. Once the
release process starts for Solr 9, it will live at *branch_9x*, and *master*
will host Solr 10.
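
For example (assuming the combined lucene-solr repository, where development 
currently happens):

git clone https://github.com/apache/lucene-solr.git
cd lucene-solr
git checkout master   # unreleased Solr 9 development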

On Tue, Mar 2, 2021 at 3:49 PM Phill Campbell 
wrote:

> I have just begun investigating Solr source code. Where is the branch for
> Solr 9?
>
>
>


Re: Filter by sibling ?

2021-03-02 Thread Joel Bernstein
Solr's graph expressions can do this type of thing. It allows you to walk
the relationships in a graph with filters:

https://lucene.apache.org/solr/guide/8_6/graph-traversal.html
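
For this particular shape (C1 children of parents whose C2 children match a 
condition), the block join parsers are another option; a rough sketch, 
assuming the type field from the example below is indexed on every document:

q={!child of="type:PR" v=$pq}
pq={!parent which="type:PR"}(+type:C2 +status:Done)
fq=type:C1

The {!parent} query selects the PR parents that have a C2 child with 
status:Done, {!child} then returns all children of those parents, and the 
fq narrows the result to the C1 documents.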



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Mar 2, 2021 at 9:00 AM Manoj Mokashi 
wrote:

> Hi,
>
> If I have a nested document structure, with say parent type:PR, child 1
> type:C1 and child2 type:C2,
> would it be possible to fetch documents of type C1 that are children of
> parents that have child2 docs with a certain condition ?
> e.g. for
> { type:PR,
>   Title: "XXX",
>   Children1 : [ { type:C1, city:ABC} ],
>   Children2 : [ { type:C2, status:Done}]
> }
>
> Can I fetch type:C1 documents which are children of parent docs that have
> child C2 docs with status:Done ?
>
> Regards,
> manoj
>
> Confidentiality Notice
> 
> This email message, including any attachments, is for the sole use of the
> intended recipient and may contain confidential and privileged information.
> Any unauthorized view, use, disclosure or distribution is prohibited. If
> you are not the intended recipient, please contact the sender by reply
> email and destroy all copies of the original message. Anju Software, Inc.
> 4500 S. Lakeshore Drive, Suite 620, Tempe, AZ USA 85282.
>


Re: Partial update bug on solr 8.8.0

2021-03-02 Thread Mike Drob
This looks like a bug that is already fixed but not yet released in 8.9

https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-13034

On Tue, Mar 2, 2021 at 6:27 AM Mohsen Saboorian  wrote:

> Any idea about this post?
> https://stackoverflow.com/q/66335803/141438
>
> Regards.
>


Re: Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Martin Graney
Hi Alex

Thanks for the reply.
We are not using the 'copyField bucket' approach as it is inflexible. Our
textual fields are all multivalued dynamic fields, which allows us to craft
a list of `pf` (phrase fields) with associated weighting boosts that are
meant to be used in the search on a *per-collection* basis. This allows us
to have all of the textual fields indexed independently and then simply
change the query when we want to include/exclude a field from the search
without the need to reindex the entire collection. e/dismax makes this more
flexible approach possible.

I'll take a look at the ComplexQueryParser and see if it is a good fit.
We use a lot of the e/dismax params though, such as `bf` (boost functions),
`bq` (boost queries), and 'pf' (phrase fields), to influence the relevance
score.

FYI: We are using Solr 8.3.

On Tue, 2 Mar 2021 at 13:38, Alexandre Rafalovitch 
wrote:

> I admit to not fully understanding the examples, but ComplexQueryParser
> looks like something worth at least reviewing:
>
>
> https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser
>
> Also I did not see any references to trying to copyField and process same
> content in different ways. If copyField is not stored, the overhead is not
> as large.
>
> Regards,
> Alex
>
>
>
> On Tue., Mar. 2, 2021, 7:08 a.m. Martin Graney, 
> wrote:
>
> > Hi All
> >
> > I have been trying to implement multi word synonyms using `sow=false`
> into
> > a pre-existing system that applied pre-processing to the phrase to apply
> > wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.
> >
> > I got the synonyms expansion working perfectly, after discovering the
> > `preserveOriginal` filter param, but then I needed to re-implement the
> > existing wildcard behaviour.
> > I tried using the edge-ngram filter, but found that when searching for
> the
> > phrase `bread stick` on a field containing the word `breadstick` and
> > `q.op=AND` it returns no results, as the content `breadstick` does not
> > _start with_ `stick`. The previous wildcard behaviour would return all
> > documents that contain the substrings `bread` AND `stick`, which is the
> > desired behaviour.
> > I tried using the ngram filter, but this does not support the
> > `preserveOriginal`, and so loses a lot of relevance for exact matches,
> but
> > it also results in matches that are far too broad, creating 21 tokens
> from
> > `breadstick` for `minGramSize=3` and `maxGramSize=5` that in practice
> > essentially matches all of the documents. Which means that boosts applied
> > to other fields, such as 'in stock', push irrelevant documents to the
> top.
> >
> > Finally, I tried to strip out ngrams entirely and use subquery/LocalParam
> > syntax and local params, a solr feature that is not very well documented.
> > I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax
> > sow=false v=$plain}` to effectively create a union of results, one with
> > multi word synonyms support and one with wildcard support.
> > But then I had to implement the other edismax params and immediately
> > stumbled.
> > Each query in production normally has a slew of `bf` and `bq` params,
> and I
> > cannot see a way to pass these into the nested query using local
> variables.
> > If I have 3 different `bf` params how can I pass them into the local
> param
> > subqueries?
> >
> > Also, as the search in production is across multiple fields I found
> passing
> > `qf` to both subqueries using dereferencing failed, as the parser saw it
> as
> > a single field and threw a 'number format exception'.
> > i.e.
> > q={!edismax sow=true v=$tw qf=$tqf} OR {!edismax sow=false v=$tp qf=$tqf}
> > $tw=*bread* *stick*
> > $tp=bread stick
> > $tqf=title^2 description^0.5
> >
> > As you can guess, I have spent quite some time going down this rabbit
> hole
> > in my attempt to reproduce the existing desired functionality alongside
> > multiterm synonyms.
> > Is there a way to get multiterm synonyms working with substring matching
> > effectively?
> > I am sure there is a much simpler way that I am missing than all of my
> > attempts so far.
> >
> > Solr: 8.3
> >
> > Thanks
> > Martin Graney
> >
> > --
> >  <https://www.linkedin.com/company/sooqr-com/>
> >
>


-- 
Martin Graney
Lead Developer

http://sooqr.com <http://www.sooqr.com/>
http://twitter.com/sooqrcom

Office: +31 (0) 88 766 7700
Mobile: +31 (0) 64 660 8543

-- 
 <https://www.linkedin.com/company/sooqr-com/>


Re: Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Alexandre Rafalovitch
I admit to not fully understanding the examples, but ComplexQueryParser
looks like something worth at least reviewing:

https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser

Also I did not see any references to trying to copyField and process same
content in different ways. If copyField is not stored, the overhead is not
as large.

Regards,
Alex



On Tue., Mar. 2, 2021, 7:08 a.m. Martin Graney, 
wrote:

> Hi All
>
> I have been trying to implement multi word synonyms using `sow=false` into
> a pre-existing system that applied pre-processing to the phrase to apply
> wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.
>
> I got the synonyms expansion working perfectly, after discovering the
> `preserveOriginal` filter param, but then I needed to re-implement the
> existing wildcard behaviour.
> I tried using the edge-ngram filter, but found that when searching for the
> phrase `bread stick` on a field containing the word `breadstick` and
> `q.op=AND` it returns no results, as the content `breadstick` does not
> _start with_ `stick`. The previous wildcard behaviour would return all
> documents that contain the substrings `bread` AND `stick`, which is the
> desired behaviour.
> I tried using the ngram filter, but this does not support the
> `preserveOriginal`, and so loses a lot of relevance for exact matches, but
> it also results in matches that are far too broad, creating 21 tokens from
> `breadstick` for `minGramSize=3` and `maxGramSize=5` that in practice
> essentially matches all of the documents. Which means that boosts applied
> to other fields, such as 'in stock', push irrelevant documents to the top.
>
> Finally, I tried to strip out ngrams entirely and use subquery/LocalParam
> syntax and local params, a solr feature that is not very well documented.
> I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax
> sow=false v=$plain}` to effectively create a union of results, one with
> multi word synonyms support and one with wildcard support.
> But then I had to implement the other edismax params and immediately
> stumbled.
> Each query in production normally has a slew of `bf` and `bq` params, and I
> cannot see a way to pass these into the nested query using local variables.
> If I have 3 different `bf` params how can I pass them into the local param
> subqueries?
>
> Also, as the search in production is across multiple fields I found passing
> `qf` to both subqueries using dereferencing failed, as the parser saw it as
> a single field and threw a 'number format exception'.
> i.e.
> q={!edismax sow=true v=$tw qf=$tqf} OR {!edismax sow=false v=$tp qf=$tqf}
> $tw=*bread* *stick*
> $tp=bread stick
> $tqf=title^2 description^0.5
>
> As you can guess, I have spent quite some time going down this rabbit hole
> in my attempt to reproduce the existing desired functionality alongside
> multiterm synonyms.
> Is there a way to get multiterm synonyms working with substring matching
> effectively?
> I am sure there is a much simpler way that I am missing than all of my
> attempts so far.
>
> Solr: 8.3
>
> Thanks
> Martin Graney
>
> --
>  <https://www.linkedin.com/company/sooqr-com/>
>


Re: Solr wiki page update

2021-03-02 Thread Jan Høydahl
Vincent,

I added you as editor, please try editing that page again.

Jan

> 11. feb. 2021 kl. 17:43 skrev Vincent Brehin :
> 
> Hi community members,
> I work for Adelean  https://www.adelean.com/ , we are offering services
> around everything Search related, and especially Solr consulting and
> support. We are based in Paris and operate mainly in France.
> Is it possible to list our company on the support page (Support - SOLR -
> Apache Software Foundation
> ) ?
> Or give me the permission to edit it on confluence (my user:
> vincent.brehin) ?
> Thanks !
> Best Regards,
> 
> Vincent



Re: Zookeeper 3.4.5 with Solr 8.8.0

2021-03-01 Thread Shawn Heisey

On 3/1/2021 9:45 PM, Subhajit Das wrote:

That is not possible at this time.

Will it be OK if I remove the ZooKeeper dependencies (jars) from Solr and 
replace them with 3.5.5 jars?
Thanks in advance.


Maybe.  But I cannot say for sure.

I know that when we upgraded to ZK 3.5, some fairly significant code 
changes in Solr were required.  I did not see whether more changes were 
needed when we upgraded again.


It would not surprise me to learn that a jar swap won't work.  Upgrades 
are far more likely to work than downgrades.


Thanks,
Shawn


RE: Zookeeper 3.4.5 with Solr 8.8.0

2021-03-01 Thread Subhajit Das
Hi Shawn,

That is not possible at this time.

Will it be OK if I remove the ZooKeeper dependencies (jars) from Solr and 
replace them with 3.5.5 jars?
Thanks in advance.


From: Shawn Heisey 
Sent: Monday, March 1, 2021 11:17:24 PM
To: solr-user@lucene.apache.org ; 
u...@zookeeper.apache.org 
Subject: Re: Zookeeper 3.4.5 with Solr 8.8.0

On 3/1/2021 6:51 AM, Subhajit Das wrote:
> I noticed, that Solr 8.8.0 uses Zookeeper 3.6.2 client, while Solr 6.3.0 uses 
> Zookeeper 3.4.6 client. Is this a client bug or mismatch issue?
> If so, how to fix this?

The ZK project guarantees that each minor version (X.Y.Z, where Y is the
same) will work with the previous minor version or the next minor version.

3.4 and 3.6 are two minor versions apart, and thus compatibility cannot
be guaranteed.

See the "backward compatibility" matrix here:

https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

I think you'll need to upgrade your ZK server ensemble to fix it.

Thanks,
Shawn


Re: NPE in QueryComponent.mergeIds when using timeAllowed and sorting SOLR 8.7

2021-03-01 Thread Phill Campbell
Anyone?

> On Feb 24, 2021, at 7:47 AM, Phill Campbell  
> wrote:
> 
> Last week I switched to Solr 8.7 from a “special” build of Solr 6.6
> 
> The system has a timeout set for querying. I am now seeing this bug.
> 
> https://issues.apache.org/jira/browse/SOLR-14758 
> 
> 
> Max Query Time goes from 1.6 seconds to 20 seconds and affects the entire 
> system for about 2 minutes as reported in New Relic.
> 
> null:java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:935)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:486)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> 
> 
> Can this be fixed in a patch for Solr 8.8? I do not want to have to go back 
> to Solr 6 and reindex the system, that takes 2 days using 180 EMR instances.
> 
> Please advise. Thank you.



Re: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread Joel Bernstein
Also the parallel function builds hash partitioning filters that could lead
to timeouts if they take too long to build. Try the query without the
parallel function if you're still getting timeouts when making the query
smaller.
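
In other words, if the expression is currently wrapped like this (collection 
names and worker count are placeholders):

parallel(workerCollection,
    update(targetCollection, ...long running stream here...),
    workers="4",
    sort="id_str asc")

try running just the inner part on its own:

update(targetCollection, ...long running stream here...)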



Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 4:03 PM Joel Bernstein  wrote:

> The settings in your version are 30 seconds and 15 seconds for socket and
> connection timeouts.
>
> Typically timeouts occur because one or more shards in the query are idle
> beyond the timeout threshold. This happens because lots of data is being
> read from other shards.
>
> Breaking the query into small parts would be a good strategy.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 3:30 PM ufuk yılmaz 
> wrote:
>
>> Hello Mr. Bernstein,
>>
>> I’m using version 8.4. So, if I understand correctly, I can’t increase
>> timeouts and they are bound to happen in such a large stream. Should I just
>> reduce the output of my search expressions?
>>
>> Maybe I can split my search results into ~100 parts and run the same
>> query 100 times in series. Each part would emit ~3M documents so they
>> should finish before timeout?
>>
>> Is this a reasonable solution?
>>
>> Btw how long is the default hard-coded timeout value? Because yesterday I
>> ran another query which took more than 1 hour without any timeouts and
>> finished successfully.
>>
>> Sent from Mail for Windows 10
>>
>> From: Joel Bernstein
>> Sent: 01 March 2021 23:03
>> To: solr-user@lucene.apache.org
>> Subject: Re: Idle timeout expired and Early Client Disconnect errors
>>
>> Oh wait, I misread your email. The idle timeout issue is configurable in:
>>
>> https://issues.apache.org/jira/browse/SOLR-14672
>>
>> This unfortunately missed the 8.8 release and will be 8.9.
>>
>>
>>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:
>>
>> > What version are you using?
>> >
>> > Solr 8.7 has changes that caused these errors to hit the logs. These
>> used
>> > to be suppressed. This has been fixed in Solr 9.0 but it has not been
>> back
>> > ported to Solr 8.x.
>> >
>> > The errors are actually normal operational occurrences when doing joins
>> so
>> > should be suppressed in the logs and were before the specific release.
>> >
>> > It might make sense to do a release that specifically suppresses these
>> > errors without backporting the full Solr 9.0 changes which impact the
>> > memory footprint of export.
>> >
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> >
>> > On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz > >
>> > wrote:
>> >
>> >> Hello all,
>> >>
>> >> I’m running a large streaming expression and feeding the result to
>> update
>> >> expression.
>> >>
>> >>  update(targetCollection, ...long running stream here...,
>> >>
>> >> I tried sending the exact same query multiple times, it sometimes works
>> >> and indexes some results, then gives exception, other times fails with
>> an
>> >> exception after 2 minutes.
>> >>
>> >> Response is like:
>> >> "EXCEPTION":"java.util.concurrent.ExecutionException:
>> >> java.io.IOException: params distrib=false=4 and my long
>> >> stream expression
>> >>
>> >> Server log (short):
>> >> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1]
>> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> >> java.util.concurrent.TimeoutException: Idle timeout expired:
>> 12/12
>> >> ms
>> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> >> java.util.concurrent.TimeoutException: Idle timeout expired:
>> 12/12
>> >> ms
>> >>
>> >> I tried to increase the jetty idle timeout value on the node which
>> hosts
>> >> my target collection to something like an hour. It didn’t affect.
>> >>
>> >>
>> >> Server logs (long)
>> >> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
>> >> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
>> >> java.util.concur

Re: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread Joel Bernstein
The settings in your version are 30 seconds and 15 seconds for socket and
connection timeouts.

Typically timeouts occur because one or more shards in the query are idle
beyond the timeout threshold. This happens because lots of data is being
read from other shards.

Breaking the query into small parts would be a good strategy.




Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 3:30 PM ufuk yılmaz 
wrote:

> Hello Mr. Bernstein,
>
> I’m using version 8.4. So, if I understand correctly, I can’t increase
> timeouts and they are bound to happen in such a large stream. Should I just
> reduce the output of my search expressions?
>
> Maybe I can split my search results into ~100 parts and run the same query
> 100 times in series. Each part would emit ~3M documents so they should
> finish before timeout?
>
> Is this a reasonable solution?
>
> Btw how long is the default hard-coded timeout value? Because yesterday I
> ran another query which took more than 1 hour without any timeouts and
> finished successfully.
>
> Sent from Mail for Windows 10
>
> From: Joel Bernstein
> Sent: 01 March 2021 23:03
> To: solr-user@lucene.apache.org
> Subject: Re: Idle timeout expired and Early Client Disconnect errors
>
> Oh wait, I misread your email. The idle timeout issue is configurable in:
>
> https://issues.apache.org/jira/browse/SOLR-14672
>
> This unfortunately missed the 8.8 release and will be 8.9.
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:
>
> > What version are you using?
> >
> > Solr 8.7 has changes that caused these errors to hit the logs. These used
> > to be suppressed. This has been fixed in Solr 9.0 but it has not been
> back
> > ported to Solr 8.x.
> >
> > The errors are actually normal operational occurrences when doing joins
> so
> > should be suppressed in the logs and were before the specific release.
> >
> > It might make sense to do a release that specifically suppresses these
> > errors without backporting the full Solr 9.0 changes which impact the
> > memory footprint of export.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz  >
> > wrote:
> >
> >> Hello all,
> >>
> >> I’m running a large streaming expression and feeding the result to
> update
> >> expression.
> >>
> >>  update(targetCollection, ...long running stream here...,
> >>
> >> I tried sending the exact same query multiple times, it sometimes works
> >> and indexes some results, then gives exception, other times fails with
> an
> >> exception after 2 minutes.
> >>
> >> Response is like:
> >> "EXCEPTION":"java.util.concurrent.ExecutionException:
> >> java.io.IOException: params distrib=false=4 and my long
> >> stream expression
> >>
> >> Server log (short):
> >> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1]
> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
> >> java.util.concurrent.TimeoutException: Idle timeout expired:
> 12/12
> >> ms
> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
> >> java.util.concurrent.TimeoutException: Idle timeout expired:
> 12/12
> >> ms
> >>
> >> I tried to increase the jetty idle timeout value on the node which hosts
> >> my target collection to something like an hour. It didn’t affect.
> >>
> >>
> >> Server logs (long)
> >> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
> >> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
> >> java.util.concurrent.TimeoutException: Idle timeout expired: 1
> >> 2/12 ms
> >> solr-01|at
> >>
> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:235)
> >> solr-01|at
> >> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226)
> >> solr-01|at
> >> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:524)
> >> solr-01|at
> >>
> org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:134)
> >> solr-01|at
> >> java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
> >> solr-01|at
> >> java.base/sun.nio.cs.StreamEncoder.implWrite(Stream

RE: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread ufuk yılmaz
Hello Mr. Bernstein,

I’m using version 8.4. So, if I understand correctly, I can’t increase timeouts 
and they are bound to happen in such a large stream. Should I just reduce the 
output of my search expressions?

Maybe I can split my search results into ~100 parts and run the same query 100 
times in series. Each part would emit ~3M documents so they should finish 
before timeout?

Is this a reasonable solution?

Btw how long is the default hard-coded timeout value? Because yesterday I ran 
another query which took more than 1 hour without any timeouts and finished 
successfully.
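
A rough sketch of that partitioning idea, assuming an indexed id_str field 
(the range boundaries here are made up and would come from your own data):

update(targetCollection,
    search(myCollection,
        q="*:*",
        fq="id_str:[a TO f}",
        qt="/export",
        sort="id_str asc",
        fl="id_str"))

then repeat with fq="id_str:[f TO m}" and so on until the ranges cover the 
whole collection.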

Sent from Mail for Windows 10

From: Joel Bernstein
Sent: 01 March 2021 23:03
To: solr-user@lucene.apache.org
Subject: Re: Idle timeout expired and Early Client Disconnect errors

Oh wait, I misread your email. The idle timeout issue is configurable in:

https://issues.apache.org/jira/browse/SOLR-14672

This unfortunately missed the 8.8 release and will be 8.9.






Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:

> What version are you using?
>
> Solr 8.7 has changes that caused these errors to hit the logs. These used
> to be suppressed. This has been fixed in Solr 9.0 but it has not been back
> ported to Solr 8.x.
>
> The errors are actually normal operational occurrences when doing joins so
> should be suppressed in the logs and were before the specific release.
>
> It might make sense to do a release that specifically suppresses these
> errors without backporting the full Solr 9.0 changes which impact the
> memory footprint of export.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz 
> wrote:
>
>> Hello all,
>>
>> I’m running a large streaming expression and feeding the result to update
>> expression.
>>
>>  update(targetCollection, ...long running stream here...,
>>
>> I tried sending the exact same query multiple times, it sometimes works
>> and indexes some results, then gives exception, other times fails with an
>> exception after 2 minutes.
>>
>> Response is like:
>> "EXCEPTION":"java.util.concurrent.ExecutionException:
>> java.io.IOException: params distrib=false=4 and my long
>> stream expression
>>
>> Server log (short):
>> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1]
>> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
>> ms
>> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
>> ms
>>
>> I tried to increase the jetty idle timeout value on the node which hosts
>> my target collection to something like an hour. It didn’t affect.
>>
>>
>> Server logs (long)
>> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
>> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 1
>> 2/12 ms
>> solr-01|at
>> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:235)
>> solr-01|at
>> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226)
>> solr-01|at
>> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:524)
>> solr-01|at
>> org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:134)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:303)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:281)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
>> solr-01|at java.base/java.io
>> .OutputStreamWriter.write(OutputStreamWriter.java:211)
>> solr-01|at
>> org.apache.solr.common.util.FastWriter.flush(FastWriter.java:140)
>> solr-01|at
>> org.apache.solr.common.util.FastWriter.write(FastWriter.java:54)
>> solr-01|at
>> org.apache.solr.response.JSONWriter._writeChar(JSONWriter.java:173)
>> solr-01|at
>> org.apache.solr.common.util.JsonTextWriter.writeStr(JsonTextWriter.java:86)
>> solr-01|at
>> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:52)
>> solr-01|at
>> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
>> solr-01|at
&g

Re: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread Joel Bernstein
Oh wait, I misread your email. The idle timeout issue is configurable in:

https://issues.apache.org/jira/browse/SOLR-14672

This unfortunately missed the 8.8 release and will be 8.9.






Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:

> What version are you using?
>
> Solr 8.7 has changes that caused these errors to hit the logs. These used
> to be suppressed. This has been fixed in Solr 9.0 but it has not been back
> ported to Solr 8.x.
>
> The errors are actually normal operational occurrences when doing joins so
> should be suppressed in the logs and were before the specific release.
>
> It might make sense to do a release that specifically suppresses these
> errors without backporting the full Solr 9.0 changes which impact the
> memory footprint of export.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz 
> wrote:
>
>> Hello all,
>>
>> I’m running a large streaming expression and feeding the result to update
>> expression.
>>
>>  update(targetCollection, ...long running stream here...,
>>
>> I tried sending the exact same query multiple times, it sometimes works
>> and indexes some results, then gives exception, other times fails with an
>> exception after 2 minutes.
>>
>> Response is like:
>> "EXCEPTION":"java.util.concurrent.ExecutionException:
>> java.io.IOException: params distrib=false=4 and my long
>> stream expression
>>
>> Server log (short):
>> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1]
>> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
>> ms
>> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
>> ms
>>
>> I tried to increase the jetty idle timeout value on the node which hosts
>> my target collection to something like an hour. It didn’t affect.
>>
>>
>> Server logs (long)
>> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
>> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 1
>> 2/12 ms
>> solr-01|at
>> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:235)
>> solr-01|at
>> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226)
>> solr-01|at
>> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:524)
>> solr-01|at
>> org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:134)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:303)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:281)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
>> solr-01|at java.base/java.io
>> .OutputStreamWriter.write(OutputStreamWriter.java:211)
>> solr-01|at
>> org.apache.solr.common.util.FastWriter.flush(FastWriter.java:140)
>> solr-01|at
>> org.apache.solr.common.util.FastWriter.write(FastWriter.java:54)
>> solr-01|at
>> org.apache.solr.response.JSONWriter._writeChar(JSONWriter.java:173)
>> solr-01|at
>> org.apache.solr.common.util.JsonTextWriter.writeStr(JsonTextWriter.java:86)
>> solr-01|at
>> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:52)
>> solr-01|at
>> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
>> solr-01|at
>> org.apache.solr.common.util.JsonTextWriter$2.put(JsonTextWriter.java:176)
>> solr-01|at
>> org.apache.solr.common.MapWriter$EntryWriter.put(MapWriter.java:154)
>> solr-01|at
>> org.apache.solr.handler.export.StringFieldWriter.write(StringFieldWriter.java:77)
>> solr-01|at
>> org.apache.solr.handler.export.ExportWriter.writeDoc(ExportWriter.java:313)
>> solr-01|at
>> org.apache.solr.handler.export.ExportWriter.lambda$addDocsToItemWriter$4(ExportWriter.java:263)
>> --
>> solr-01|at org.eclipse.jetty.io
>> .FillInterest.fillable(FillInterest.java:103)
>> solr-01|at org.eclipse.jetty.io
>> .ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>> solr-01|at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>> solr-01|at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>> solr-01|at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>> solr-01|at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>> solr-01|at
>> 

Re: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread Joel Bernstein
What version are you using?

Solr 8.7 has changes that caused these errors to hit the logs. These used
to be suppressed. This has been fixed in Solr 9.0 but it has not been back
ported to Solr 8.x.

The errors are actually normal operational occurrences when doing joins so
should be suppressed in the logs and were before the specific release.

It might make sense to do a release that specifically suppresses these
errors without backporting the full Solr 9.0 changes which impact the
memory footprint of export.




Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz 
wrote:

> Hello all,
>
> I’m running a large streaming expression and feeding the result to update
> expression.
>
>  update(targetCollection, ...long running stream here...,
>
> I tried sending the exact same query multiple times, it sometimes works
> and indexes some results, then gives exception, other times fails with an
> exception after 2 minutes.
>
> Response is like:
> "EXCEPTION":"java.util.concurrent.ExecutionException: java.io.IOException:
> params distrib=false=4 and my long stream expression
>
> Server log (short):
> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
> timeout expired: 12/12 ms
> o.a.s.s.HttpSolrCall null:java.io.IOException:
> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
> ms
>
> I tried to increase the jetty idle timeout value on the node which hosts
> my target collection to something like an hour. It didn’t affect.
>
>
> Server logs (long)
> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
> java.util.concurrent.TimeoutException: Idle timeout expired: 1
> 2/12 ms
> solr-01|at
> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:235)
> solr-01|at
> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226)
> solr-01|at
> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:524)
> solr-01|at
> org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:134)
> solr-01|at
> java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
> solr-01|at
> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:303)
> solr-01|at
> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:281)
> solr-01|at
> java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> solr-01|at java.base/java.io
> .OutputStreamWriter.write(OutputStreamWriter.java:211)
> solr-01|at
> org.apache.solr.common.util.FastWriter.flush(FastWriter.java:140)
> solr-01|at
> org.apache.solr.common.util.FastWriter.write(FastWriter.java:54)
> solr-01|at
> org.apache.solr.response.JSONWriter._writeChar(JSONWriter.java:173)
> solr-01|at
> org.apache.solr.common.util.JsonTextWriter.writeStr(JsonTextWriter.java:86)
> solr-01|at
> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:52)
> solr-01|at
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
> solr-01|at
> org.apache.solr.common.util.JsonTextWriter$2.put(JsonTextWriter.java:176)
> solr-01|at
> org.apache.solr.common.MapWriter$EntryWriter.put(MapWriter.java:154)
> solr-01|at
> org.apache.solr.handler.export.StringFieldWriter.write(StringFieldWriter.java:77)
> solr-01|at
> org.apache.solr.handler.export.ExportWriter.writeDoc(ExportWriter.java:313)
> solr-01|at
> org.apache.solr.handler.export.ExportWriter.lambda$addDocsToItemWriter$4(ExportWriter.java:263)
> --
> solr-01|at org.eclipse.jetty.io
> .FillInterest.fillable(FillInterest.java:103)
> solr-01|at org.eclipse.jetty.io
> .ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> solr-01|at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> solr-01|at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> solr-01|at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> solr-01|at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> solr-01|at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
> solr-01|at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
> solr-01|at
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)
> solr-01|at java.base/java.lang.Thread.run(Thread.java:834)
> solr-01| Caused by: java.util.concurrent.TimeoutException: Idle
> timeout expired: 12/12 ms
> solr-01|at 

Re: Zookeeper 3.4.5 with Solr 8.8.0

2021-03-01 Thread Shawn Heisey

On 3/1/2021 6:51 AM, Subhajit Das wrote:

I noticed, that Solr 8.8.0 uses Zookeeper 3.6.2 client, while Solr 6.3.0 uses 
Zookeeper 3.4.6 client. Is this a client bug or mismatch issue?
If so, how to fix this?


The ZK project guarantees that each minor version (X.Y.Z, where Y is the 
same) will work with the previous minor version or the next minor version.


3.4 and 3.6 are two minor versions apart, and thus compatibility cannot 
be guaranteed.


See the "backward compatibility" matrix here:

https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

I think you'll need to upgrade your ZK server ensemble to fix it.

Thanks,
Shawn


RE: How to read tlog

2021-03-01 Thread Subhajit Das
Thanks for the reply.
Will try.

From: Gael Jourdan-Weil<mailto:gael.jourdan-w...@kelkoogroup.com>
Sent: 01 March 2021 05:48 PM
To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
Subject: RE: How to read tlog

Hello,

You can just use "cat" or "tail", even though the tlog is not a text file, its 
content can mostly be read using these commands.
You will have one document per line and should be able to see the fields 
content.

I don't know if there is a Solr command which would give a better display though.

Gaël

De : Subhajit Das 
Envoyé : samedi 27 février 2021 16:00
À : solr-user@lucene.apache.org 
Objet : How to read tlog


Hi There,

I faced an issue on a core in a multicore standalone instance.
Is there any way to read tlog contents as text files? This might help to 
resolve the issue.
Thanks in advance.



RE: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-01 Thread Flowerday, Matthew J
Hi Ere

Pleased to be of service!

No, I have not filed a JIRA ticket. I am new to interacting with the Solr
community and only beginning to 'find my legs'. I am not too sure what JIRA
is, I am afraid!

Regards

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| matthew.flower...@unisys.com 
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX



THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is for use only by the intended recipient. If you received this
in error, please contact the sender and delete the e-mail and its
attachments from all devices.
   

-Original Message-
From: Ere Maijala  
Sent: 01 March 2021 12:53
To: solr-user@lucene.apache.org
Subject: Re: Potential Slow searching for unified highlighting on Solr
8.8.0/8.8.1

EXTERNAL EMAIL - Be cautious of all links and attachments.

Hi,

Whoa, thanks for the heads-up! You may just have saved me from a whole lot
of trouble. Did you file a JIRA ticket already?

Thanks,
Ere

Flowerday, Matthew J kirjoitti 1.3.2021 klo 14.00:
> Hi There
>
> I just came across a situation where a unified highlighting search 
> under solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times
out.
> I resolved it by a config change – but it can catch you out. Hence 
> this email.
>
> With solr 8.8.0 a new unified highlighting parameter hl.fragAlignRatio
> was implemented which, if not set, defaults to 0.5.
> This attempts to improve the highlighting so that highlighted text 
> does not appear right at the left. This works well but if you have a 
> search result with numerous occurrences of the word in question within 
> the record, performance goes right down!
>
> 2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] 
> o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select 
> params={hl.snippets=2=test=on=100=id,d
> escription,specification,score=20=*=10&_=161440511913
> 4}
> hits=57008 status=0 QTime=1414320
>
> 2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf] 
> o.a.s.s.HttpSolrCall Unable to write response, client closed 
> connection or we are shutting down => 
> org.eclipse.jetty.io.EofException
>
>at
> org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
>
> org.eclipse.jetty.io.EofException: null
>
>at
> org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>
>at
> org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>
>at
> org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378)
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>
> when I set hl.fragAlignRatio=0.25 results came back much quicker
>
> 2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] 
> o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
> params={hl.weightMatches=false=on=id,description,specification,s
> core=1=0.25=100=2=test
> axAnalyzedChars=100=*=unified=9&_=
> 1614430061690}
> hits=136939 status=0 QTime=87024
>
> And hl.fragAlignRatio=0.1
>
> 2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] 
> o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
> params={hl.weightMatches=false=on=id,description,specification,s
> core=1=0.1=100=2=test
> xAnalyzedChars=100=*=unified=9&_=1
> 614430061690}
> hits=136939 status=0 QTime=69033
>
> And hl.fragAlignRatio=0.0
>
> 2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] 
> o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
> params={hl.weightMatches=false=on=id,description,specification,s
> core=1=0.0=100=2=test
> xAnalyzedChars=100=*=unified=9&_=1
> 614430061690}
> hits=136939 status=0 QTime=2841
>
> I left our setting at 0.0 – this is presumably how it was in 7.7.1 (fully 
> left aligned). I am not too sure as to how many times a word has to 
> occur in a record for performance to go right down – but if too many 
> it can have a BIG impact.
>
> I also noticed that setting timeAllowed=9 did not break out of 
> the query until it finished. Perhaps because the query finished 
> quickly and what took the time was the highlighting. It might be an 
> idea to get timeAllowed to also cover any highlighting so that the 
> query does not run until the jetty timeout is hit. The machine ran one 
> core at 100% for about
> 20 mins!
>
> Hope this helps.
>
> Regards
>
> Matthew
>
> *Matthew Flowerday*| Consultant | ULEAF
>
> Unisys | 01908 774830| matthew.flower...@unisys.com 
> <mailto:matthew.flower...@unisys.com>
>
> Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes |
> MK17 8LX
>

Re: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-01 Thread Ere Maijala

Hi,

Whoa, thanks for the heads-up! You may just have saved me from a whole 
lot of trouble. Did you file a JIRA ticket already?


Thanks,
Ere

Flowerday, Matthew J kirjoitti 1.3.2021 klo 14.00:

Hi There

I just came across a situation where a unified highlighting search under 
solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times out. 
I resolved it by a config change – but it can catch you out. Hence this 
email.


With solr 8.8.0 a new unified highlighting parameter hl.fragAlignRatio 
was implemented which, if not set, defaults to 0.5. This attempts to 
improve the highlighting so that highlighted text does not appear right 
at the left. This works well but if you have a search result with 
numerous occurrences of the word in question within the record, 
performance goes right down!


2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] 
o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select 
params={hl.snippets=2=test=on=100=id,description,specification,score=20=*=10&_=1614405119134} 
hits=57008 status=0 QTime=1414320


2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf] 
o.a.s.s.HttpSolrCall Unable to write response, client closed connection 
or we are shutting down => org.eclipse.jetty.io.EofException


   at 
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)


org.eclipse.jetty.io.EofException: null

   at 
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279) 
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]


   at 
org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) 
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]


   at 
org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378) 
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]


when I set hl.fragAlignRatio=0.25 results came back much quicker

2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] 
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
params={hl.weightMatches=false=on=id,description,specification,score=1=0.25=100=2=test=100=*=unified=9&_=1614430061690} 
hits=136939 status=0 QTime=87024


And hl.fragAlignRatio=0.1

2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] 
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
params={hl.weightMatches=false=on=id,description,specification,score=1=0.1=100=2=test=100=*=unified=9&_=1614430061690} 
hits=136939 status=0 QTime=69033


And hl.fragAlignRatio=0.0

2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] 
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
params={hl.weightMatches=false=on=id,description,specification,score=1=0.0=100=2=test=100=*=unified=9&_=1614430061690} 
hits=136939 status=0 QTime=2841


I left our setting at 0.0 – this is presumably how it was in 7.7.1 (fully 
left aligned). I am not too sure as to how many times a word has to 
occur in a record for performance to go right down – but if too many it 
can have a BIG impact.
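
For reference, the kind of request involved looks roughly like this (the 
archived log lines above lost their parameter separators, so the names 
here are reconstructed):

curl "http://localhost:8983/solr/uleaf/select?q=test&hl=on&hl.method=unified&hl.fl=*&hl.snippets=2&hl.fragAlignRatio=0.0&rows=100"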


I also noticed that setting timeAllowed=9 did not break out of the 
query until it finished. Perhaps because the query finished quickly and 
what took the time was the highlighting. It might be an idea to get 
timeAllowed to also cover any highlighting so that the query does not 
run until the jetty timeout is hit. The machine ran one core at 100% for 
about 20 mins!


Hope this helps.

Regards

Matthew

*Matthew Flowerday*| Consultant | ULEAF

Unisys | 01908 774830| matthew.flower...@unisys.com 



Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | 
MK17 8LX



THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
MATERIAL and is for use only by the intended recipient. If you received 
this in error, please contact the sender and delete the e-mail and its 
attachments from all devices.






--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


RE: How to read tlog

2021-03-01 Thread Gael Jourdan-Weil
Hello,

You can just use "cat" or "tail", even though the tlog is not a text file, its 
content can mostly be read using these commands.
You will have one document per line and should be able to see the fields 
content.

I don't know if there is a Solr command which would give a better display though.
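
For example (the core path and tlog file name below are made up; piping 
through strings filters out the binary noise):

cd server/solr/mycore/data/tlog
tail -n 20 tlog.0000000000000000042 | strings | head -n 40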

Gaël

De : Subhajit Das 
Envoyé : samedi 27 février 2021 16:00
À : solr-user@lucene.apache.org 
Objet : How to read tlog 
 

Hi There,

I faced an issue on a core in a multicore standalone instance.
Is there any way to read tlog contents as text files? This might help to 
resolve the issue.
Thanks in advance.

Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-27 Thread Joel Bernstein
Congratulations Jan!

Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Feb 22, 2021 at 2:41 AM Danilo Tomasoni  wrote:

> Congratulations Jan!
>
> Danilo Tomasoni
>
> Fondazione The Microsoft Research - University of Trento Centre for
> Computational and Systems Biology (COSBI)
> Piazza Manifattura 1,  38068 Rovereto (TN), Italy
> tomas...@cosbi.eu<
> https://webmail.cosbi.eu/owa/redir.aspx?C=VNXi3_8-qSZTBi-FPvMwmwSB3IhCOjY8nuCBIfcNIs_5SgD-zNPWCA..=mailto%3acalabro%40cosbi.eu
> >
> http://www.cosbi.eu<
> https://webmail.cosbi.eu/owa/redir.aspx?C=CkilyF54_imtLHzZqF1gCGvmYXjsnf4bzGynd8OXm__5SgD-zNPWCA..=http%3a%2f%2fwww.cosbi.eu%2f
> >
>
> As for the European General Data Protection Regulation 2016/679 on the
> protection of natural persons with regard to the processing of personal
> data, we inform you that all the data we possess are object of treatment in
> the respect of the normative provided for by the cited GDPR.
> It is your right to be informed on which of your data are used and how;
> you may ask for their correction, cancellation or you may oppose to their
> use by written request sent by recorded delivery to The Microsoft Research
> – University of Trento Centre for Computational and Systems Biology Scarl,
> Piazza Manifattura 1, 38068 Rovereto (TN), Italy.
> P Please don't print this e-mail unless you really need to
> 
> Da: Yonik Seeley 
> Inviato: domenica 21 febbraio 2021 05:51
> A: solr-user@lucene.apache.org 
> Cc: Lucene Dev 
> Oggetto: Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!
>
> [CAUTION: EXTERNAL SENDER]
> [Please check correspondence between Sender Display Name and Sender Email
> Address before clicking on any link or opening attachments]
>
>
> Congrats Jan! Go Solr!
> -Yonik
>
>
> On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta 
> wrote:
>
> > Hi everyone,
> >
> > I’d like to inform everyone that the newly formed Apache Solr PMC
> nominated
> > and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> > President. This decision was approved by the board in its February 2021
> > meeting.
> >
> > Congratulations Jan!
> >
> > --
> > Anshum Gupta
> >
>


Re: [ANNOUNCE] Apache Solr 8.8.1 released

2021-02-27 Thread Timothy Potter
Awesome! Thank you David and Tobias ;-)

On Sat, Feb 27, 2021 at 12:21 PM David Smiley  wrote:
>
> The corresponding docker image has been released as well:
> https://hub.docker.com/_/solr
> (credit to Tobias Kässmann for helping)
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Feb 23, 2021 at 10:39 AM Timothy Potter 
> wrote:
>
> > The Lucene PMC is pleased to announce the release of Apache Solr 8.8.1.
> >
> >
> > Solr is the popular, blazing fast, open source NoSQL search platform from
> > the Apache Lucene project. Its major features include powerful full-text
> > search, hit highlighting, faceted search, dynamic clustering, database
> > integration, rich document handling, and geospatial search. Solr is highly
> > scalable, providing fault tolerant distributed search and indexing, and
> > powers the search and navigation features of many of the world's largest
> > internet sites.
> >
> >
> > Solr 8.8.1 is available for immediate download at:
> >
> >
> >   
> >
> >
> > ### Solr 8.8.1 Release Highlights:
> >
> >
> > Fix for a SolrJ backwards compatibility issue when upgrading the server to
> > 8.8.0 without upgrading SolrJ to 8.8.0.
> >
> >
> > Please refer to the Upgrade Notes in the Solr Ref Guide for information on
> > upgrading from previous Solr versions:
> >
> >
> >   
> >
> >
> > Please read CHANGES.txt for a full list of bugfixes:
> >
> >
> >   
> >
> >
> > Solr 8.8.1 also includes bugfixes in the corresponding Apache Lucene
> > release:
> >
> >
> >   
> >
> >
> >
> > Note: The Apache Software Foundation uses an extensive mirroring network
> > for
> >
> > distributing releases. It is possible that the mirror you are using may not
> > have
> >
> > replicated the release yet. If that is the case, please try another mirror.
> >
> > This also applies to Maven access.
> >
> > 
> >


Re: [ANNOUNCE] Apache Solr 8.8.1 released

2021-02-27 Thread David Smiley
The corresponding docker image has been released as well:
https://hub.docker.com/_/solr
(credit to Tobias Kässmann for helping)

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Feb 23, 2021 at 10:39 AM Timothy Potter 
wrote:

> The Lucene PMC is pleased to announce the release of Apache Solr 8.8.1.
>
>
> Solr is the popular, blazing fast, open source NoSQL search platform from
> the Apache Lucene project. Its major features include powerful full-text
> search, hit highlighting, faceted search, dynamic clustering, database
> integration, rich document handling, and geospatial search. Solr is highly
> scalable, providing fault tolerant distributed search and indexing, and
> powers the search and navigation features of many of the world's largest
> internet sites.
>
>
> Solr 8.8.1 is available for immediate download at:
>
>
>   
>
>
> ### Solr 8.8.1 Release Highlights:
>
>
> Fix for a SolrJ backwards compatibility issue when upgrading the server to
> 8.8.0 without upgrading SolrJ to 8.8.0.
>
>
> Please refer to the Upgrade Notes in the Solr Ref Guide for information on
> upgrading from previous Solr versions:
>
>
>   
>
>
> Please read CHANGES.txt for a full list of bugfixes:
>
>
>   
>
>
> Solr 8.8.1 also includes bugfixes in the corresponding Apache Lucene
> release:
>
>
>   
>
>
>
> Note: The Apache Software Foundation uses an extensive mirroring network
> for
>
> distributing releases. It is possible that the mirror you are using may not
> have
>
> replicated the release yet. If that is the case, please try another mirror.
>
> This also applies to Maven access.
>
> 
>


RE: Select streaming expression, add a field to every tuple, replaceor raw not working

2021-02-26 Thread ufuk yılmaz
I tried to debug this to the best of my ability, and it seems the correct name 
for the “raw” evaluator is “val”.

Copied from StreamContext: val=class 
org.apache.solr.client.solrj.io.eval.RawValueEvaluator

I think there’s a small error in the stream evaluator documentation for 8.4

https://lucene.apache.org/solr/guide/8_4/stream-evaluator-reference.html

When I used “val” instead of “raw”, I got the expected response:

select(
search(
myCollection,
q="*:*",
qt="/export",
sort="id_str asc",
fl="id_str"
),
id_str,
val(abc) as text
)

{
  "result-set": {
"docs": [
  {
"id_str": "deneme123",
"text": "abc"
  },
  {
"EOF": true,
"RESPONSE_TIME": 70
  }
]
  }
}

--ufuk yilmaz


Sent from Mail for Windows 10

From: ufuk yılmaz
Sent: 26 February 2021 16:38
To: solr-user@lucene.apache.org
Subject: Select streaming expression, add a field to every tuple, replaceor raw 
not working

Hello all,

Solr version 8.4

I have a very simple select expression here. What I’m trying to do is to add a 
constant value to incoming tuples.

My collection has only 1 document. Id_str is of type String. Other fields are 
Solr generated.

{
"_version_":1692761378187640832,
"id_str":"experiment123",
"id":"18d658b13b6b072f"}]
  }

My streaming expression:

select(
search(
myCollection,
q="*:*",
qt="/export",
sort="id_str asc",
fl="id_str"
),
id_str,
raw(ttt) as text // Docs state that select works with any 
evaluator. “raw” here is a stream evaluator.
)

I also tried:

select(
search(
myCollection,
q="*:*",
qt="/export",
sort="id_str asc",
fl="id_str"
),
id_str,
replace(text, null, withValue=raw(ttt)) as text //replace is 
described in select expression documentation. I also tried withValue=ttt 
directly
)

No matter what I do, response only includes id_str field, without any error:

{
  "result-set":{
"docs":[{
"id_str":" experiment123"}
  ,{
"EOF":true,
"RESPONSE_TIME":45}]}}

I also tried wrapping text value with quotes, that didn’t work too.

What am I doing wrong?

--ufuk yilmaz

Sent from Mail for Windows 10




Re: Add plugins to Solr docker container

2021-02-25 Thread Prabhatika Vij
Hey Anil,

If you want to execute anything before Solr starts, you can do the
following:

mkdir initdb; echo "echo hi" > initdb/hi.sh
docker run -v $PWD/initdb:/docker-entrypoint-initdb.d solr

Using the above, you can have any script executed before Solr starts.

Source:
https://github.com/docker-solr/docker-solr/blob/master/8.8/scripts/run-initdb
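
For the authentication/authorization plugins specifically, one rough 
approach with standalone Solr is to mount a security.json into SOLR_HOME 
(which, if I remember right, is /var/solr/data in the official image):

docker run -d -p 8983:8983 \
  -v $PWD/security.json:/var/solr/data/security.json \
  solr:8.8

with a minimal security.json such as the ref guide example below (the hash 
is the well-known solr/SolrRocks sample credential, so change it):

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}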

Hope this helps. Please feel free to ask any further questions.

I am replying to this mailing list for the first time. If I am not
following any convention, please let me know.

Thank you,
Prabhatika

On Fri, Feb 26, 2021 at 11:32 AM anilkumar panditi <
anilkumar.pand...@gmail.com> wrote:

> Hi,
> I am a first-time user of Apache Solr. I have brought up Solr as a Docker
> container, and I am unable to install/enable some plugins
> (authentication, authorization, etc.).
> Could you please help with how to add these plugins to Solr running as a
> Docker container?
>
> Thanks
> Anil
>


RE: Handling Locales in Solr

2021-02-25 Thread Krönert Florian
Hi Markus,

thank you a lot for your response, that helps us very much.

We will try out the approach of separating the cores by topic only.

Kind Regards,
Florian 

-Original Message-
From: Markus Jelsma  
Sent: Mittwoch, 24. Februar 2021 12:27
To: solr-user@lucene.apache.org
Subject: Re: Handling Locales in Solr

Hello,

We put all our customers in the same core/collection because of this; it is not 
practical to manage hundreds of cores, including their small overhead.
Separate cores can be advantageous when it comes to relevance tuning, though: no 
skewed statistics because of other customers.

In your case, an unused core is probably slow because it is not in cached 
memory anymore, and/or it has to load from a slow drive.

With regards to the locales, I would probably separate the cores by topic only, 
and have different languages share the same collection/core.

Regards,
Markus



Op wo 24 feb. 2021 om 12:09 schreef Krönert Florian <
florian.kroen...@orbis.de>:

> Hi everyone,
>
>
>
> First up thanks for this group, I appreciate it very much for 
> exchanging opinions on how to use Solr.
>
>
>
> We built a Solr instance for one of our customers which is used for 
> searching data on his website.
>
> We need to search different data (kb articles, products and external
> links) in different locales.
>
>
>
> For our logic it seemed best to separate solr Cores by topic and 
> locale, so we have cores like this:
>
> kbarticle_de-de
>
> kbarticle_en-us
>
> …
>
> products_de-de
>
> products_en-us
>
> …
>
> links_de-de
>
> links_en-us
>
>
>
> First we had only 3 locales, but it grew pretty fast to 16 locales, so 
> that we’re having 48 solr cores by now already.
>
> There would have been different approaches for realizing this of 
> course, so we’re wondering whether we are using Solr not in the optimal way?
>
>
>
> We found out that when a search on a locale that was not used for some 
> time is started, it takes >10 seconds in many cases to execute the search.
>
>
>
> We then find logs like this, where it seems as if Solr needs to start 
> a searcher first, which takes time:
>
> 2021-02-20 04:33:42.634 INFO  (Thread-20674) [   ]
> o.a.s.s.SolrIndexSearcher Opening [Searcher@775f8595[kbarticles_en-gb]
> main]
>
> 2021-02-20 04:33:42.643 INFO  (searcherExecutor-26-thread-1) [   ]
> o.a.s.c.QuerySenderListener QuerySenderListener sending requests to 
> Searcher@775f8595[kbarticles_en-gb]
>
> …
>
>
>
> Is that an issue? It would be good to know whether our localization 
> approach causes issues with Solr and whether we should restructure our 
> core design.
>
> Any help would be very much appreciated.
>
>
>
> Kind Regards,
>
>
>
> *Florian Krönert*
> Senior Software Developer
>
>
> *ORBIS AG | *Planckstraße 10 | D-88677 Markdorf
>
> Phone: +49 7544 50398 21 | Mobile: +49 162 3065972 | E-Mail:
> florian.kroen...@orbis.de
> www.orbis.de
>
>
>
> Registered Seat: Saarbrücken
> Commercial Register Court: Amtsgericht Saarbrücken, HRB 12022 Board of 
> Management: Thomas Gard (Chairman), Michael Jung, Stefan Mailänder, 
> Frank Schmelzer Chairman of the Supervisory Board: Ulrich Holzer
>


Re: Solr Cloud Autoscaling Basics

2021-02-24 Thread yasoobhaider
Any pointers here would be appreciated :)



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Dynamic starting or stoping of zookeepers in a cluster

2021-02-24 Thread Shawn Heisey

On 2/24/2021 9:04 AM, DAVID MARTIN NIETO wrote:

If I'm not mistaken the number of zookeepers must be odd. With 3 zoos on 3 
different machines, if we temporarily lost one of the three machines, we would 
have only two running, an even number. Would it be advisable in this case to 
start a third instance on one of the 2 active machines, or with only two 
zookeepers would there be no blockages in their internal votes?


It does not HAVE to be an odd number.  But increasing the total by one 
doesn't add any additional fault tolerance, and exposes an additional 
point of failure.


If you have 3 servers, 2 of them have to be running to maintain quorum. 
 If you have 4 servers, 3 of them have to be running for the cluster to 
be fully operational.


So a 3-server cluster and a 4-server cluster can survive the failure of 
one machine.  This holds true for larger numbers as well -- with 5 
servers or with 6 servers, you can lose two and stay fully operational. 
 Having that extra server that makes the total even is just wasteful.
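
As an illustrative aside, the rule at work here is quorum = floor(n/2) + 1; a
throwaway shell loop makes the pattern visible:

for n in 3 4 5 6; do
  q=$(( n / 2 + 1 ))   # a majority must be running to maintain quorum
  echo "ensemble=$n quorum=$q tolerates=$(( n - q )) failure(s)"
done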


Thanks,
Shawn


RE: Dynamic starting or stoping of zookeepers in a cluster

2021-02-24 Thread DAVID MARTIN NIETO
One doubt about it:

In order to have a highly available zookeeper, you must have at least
three separate physical servers for ZK.  Running multiple zookeepers on
one physical machine gains you nothing ... because if the whole machine
fails, you lose all of those zookeepers.  If you have three physical
servers, one can fail with no problems.  If you have five separate
physical servers running ZK, then two of the machines can fail without
taking the cluster down.

If I'm not mistaken the number of zookeepers must be odd. With 3 zoos on 3 
different machines, if we temporarily lost one of the three machines, we would 
have only two running, an even number. Would it be advisable in this case to 
start a third instance on one of the 2 active machines, or with only two 
zookeepers would there be no blockages in their internal votes?

About the dynamic reconfiguration, many thanks. We have 8.2 but the zoos are on 
version 3.4.2; we're going to test with version 3.5 and the dynamic 
configuration of zookeepers to avoid this problem.

Many thanks.
Kind regards.



De: Joe Lerner 
Enviado: viernes, 19 de febrero de 2021 18:56
Para: solr-user@lucene.apache.org 
Asunto: Re: Dynamic starting or stoping of zookeepers in a cluster

This is solid information. *How about the application, which uses
SOLR/Zookeeper?*

Do we have to follow this guidance, to make the application ZK config aware:

https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html#ch_reconfig_rebalancing

Or could we leave it as is, as long as the ZK Ensemble has the same
IPs?

Thanks!

Joe




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Overriding Sort and boosting some docs to the top

2021-02-24 Thread Mark Robinson
Thanks Marcus for your response.

Best,
Mark

On Wed, Feb 24, 2021 at 4:50 PM Markus Jelsma 
wrote:

> I would stick to the query elevation component, it is pretty fast and
> easier to handle/configure elevation IDs, instead of using function queries
> for it. We have customers that set a dozen of documents for a given query
> and it works just fine.
>
> I also do not expect the function query variant to be more performant, but
> i am not sure. If it were, would it be measurable?
>
> Regards,
> Markus
>
> Op wo 24 feb. 2021 om 12:15 schreef Mark Robinson  >:
>
> > Thanks for the reply Markus!
> >
> > I did try it.
> > My question specifically was (repasting here):-
> >
> > Which is more recommended/ performant?
> >
> > Note:- Assume that I have hundreds of ids to boost like this.
> > Is there a difference to the answer if docs to be boosted after the sort
> is
> > less?
> >
> > Thanks!
> > Mark
> >
> > On Wed, Feb 24, 2021 at 4:41 PM Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello,
> > >
> > > You are probably looking for the elevation component, check it out:
> > >
> >
> https://lucene.apache.org/solr/guide/8_8/the-query-elevation-component.html
> > >
> > > Regards,
> > > Markus
> > >
> > > Op wo 24 feb. 2021 om 11:59 schreef Mark Robinson <
> > mark123lea...@gmail.com
> > > >:
> > >
> > > > Hi,
> > > >
> > > > I wanted to sort and then boost some docs to the top and these docs
> > > should
> > > > be my first set in the results and the following ones appearing
> > according
> > > > to my sort criteria.
> > > >
> > > > I understand that sort overrides bq hence bq may not be used in this
> > case
> > > >
> > > > - I brought my boost into sort using "query()" and achieved my goal.
> > > > - I tried sort and then elevate with forceElevation and that also
> > worked.
> > > >
> > > > My question is which is more recommended/ performant?
> > > >
> > > > Note:- Assume that I have hundreds of ids to boost like this.
> > > > Is there a difference to the answer if docs to be boosted after the
> > sort
> > > is
> > > > less?
> > > >
> > > > Could someone please share your thoughts/experience?
> > > >
> > > > Thanks!
> > > > Mark.
> > > >
> > >
> >
>


Re: Handling Locales in Solr

2021-02-24 Thread Markus Jelsma
Hello,

We put all our customers in the same core/collection because of this; it is
not practical to manage hundreds of cores, including their small overhead.
Separate cores can be advantageous when it comes to relevance tuning, though:
no skewed statistics because of other customers.

In your case, an unused core is probably slow because it is not in cached
memory anymore, and/or it has to load from a slow drive.

With regards to the locales, I would probably separate the cores by topic
only, and have different languages share the same collection/core.

Regards,
Markus



Op wo 24 feb. 2021 om 12:09 schreef Krönert Florian <
florian.kroen...@orbis.de>:

> Hi everyone,
>
>
>
> First up thanks for this group, I appreciate it very much for exchanging
> opinions on how to use Solr.
>
>
>
> We built a Solr instance for one of our customers which is used for
> searching data on his website.
>
> We need to search different data (kb articles, products and external
> links) in different locales.
>
>
>
> For our logic it seemed best to separate solr Cores by topic and locale,
> so we have cores like this:
>
> kbarticle_de-de
>
> kbarticle_en-us
>
> …
>
> products_de-de
>
> products_en-us
>
> …
>
> links_de-de
>
> links_en-us
>
>
>
> First we had only 3 locales, but it grew pretty fast to 16 locales, so
> that we’re having 48 solr cores by now already.
>
> There would have been different approaches for realizing this of course,
> so we’re wondering whether we are using Solr not in the optimal way?
>
>
>
> We found out that when a search on a locale that was not used for some
> time is started, it takes >10 seconds in many cases to execute the search.
>
>
>
> We then find logs like this, where it seems as if Solr needs to start a
> searcher first, which takes time:
>
> 2021-02-20 04:33:42.634 INFO  (Thread-20674) [   ]
> o.a.s.s.SolrIndexSearcher Opening [Searcher@775f8595[kbarticles_en-gb]
> main]
>
> 2021-02-20 04:33:42.643 INFO  (searcherExecutor-26-thread-1) [   ]
> o.a.s.c.QuerySenderListener QuerySenderListener sending requests to
> Searcher@775f8595[kbarticles_en-gb]
>
> …
>
>
>
> Is that an issue? It would be good to know whether our localization
> approach causes issues with Solr and whether we should restructure our core
> design.
>
> Any help would be very much appreciated.
>
>
>
> Kind Regards,
>
>
>
> *Florian Krönert*
> Senior Software Developer
>
> 
>
> *ORBIS AG | *Planckstraße 10 | D-88677 Markdorf
>
> Phone: +49 7544 50398 21 | Mobile: +49 162 3065972 | E-Mail:
> florian.kroen...@orbis.de
> www.orbis.de
>
>
> 
>
> Registered Seat: Saarbrücken
> Commercial Register Court: Amtsgericht Saarbrücken, HRB 12022
> Board of Management: Thomas Gard (Chairman), Michael Jung, Stefan
> Mailänder, Frank Schmelzer
> Chairman of the Supervisory Board: Ulrich Holzer
>


Re: Overriding Sort and boosting some docs to the top

2021-02-24 Thread Markus Jelsma
I would stick to the query elevation component, it is pretty fast and
easier to handle/configure elevation IDs, instead of using function queries
for it. We have customers that set a dozen of documents for a given query
and it works just fine.

I also do not expect the function query variant to be more performant, but
i am not sure. If it were, would it be measurable?

Regards,
Markus
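
(For reference, a hedged sketch of such a request; collection, query and IDs
are placeholders, and it assumes the /elevate handler wired to the Query
Elevation Component as in the sample solrconfig.xml:)

# keep results sorted by price, but force doc1 and doc2 to the top
curl "http://localhost:8983/solr/mycollection/elevate?q=laptop&sort=price+asc&elevateIds=doc1,doc2&forceElevation=true"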

Op wo 24 feb. 2021 om 12:15 schreef Mark Robinson :

> Thanks for the reply Markus!
>
> I did try it.
> My question specifically was (repasting here):-
>
> Which is more recommended/ performant?
>
> Note:- Assume that I have hundreds of ids to boost like this.
> Is there a difference to the answer if docs to be boosted after the sort is
> less?
>
> Thanks!
> Mark
>
> On Wed, Feb 24, 2021 at 4:41 PM Markus Jelsma 
> wrote:
>
> > Hello,
> >
> > You are probably looking for the elevation component, check it out:
> >
> https://lucene.apache.org/solr/guide/8_8/the-query-elevation-component.html
> >
> > Regards,
> > Markus
> >
> > Op wo 24 feb. 2021 om 11:59 schreef Mark Robinson <
> mark123lea...@gmail.com
> > >:
> >
> > > Hi,
> > >
> > > I wanted to sort and then boost some docs to the top and these docs
> > should
> > > be my first set in the results and the following ones appearing
> according
> > > to my sort criteria.
> > >
> > > I understand that sort overrides bq hence bq may not be used in this
> case
> > >
> > > - I brought my boost into sort using "query()" and achieved my goal.
> > > - I tried sort and then elevate with forceElevation and that also
> worked.
> > >
> > > My question is which is more recommended/ performant?
> > >
> > > Note:- Assume that I have hundreds of ids to boost like this.
> > > Is there a difference to the answer if docs to be boosted after the
> sort
> > is
> > > less?
> > >
> > > Could someone please share your thoughts/experience?
> > >
> > > Thanks!
> > > Mark.
> > >
> >
>


Re: Overriding Sort and boosting some docs to the top

2021-02-24 Thread Mark Robinson
Thanks for the reply Markus!

I did try it.
My question specifically was (repasting here):-

Which is more recommended/ performant?

Note:- Assume that I have hundreds of ids to boost like this.
Is there a difference to the answer if docs to be boosted after the sort is
less?

Thanks!
Mark

On Wed, Feb 24, 2021 at 4:41 PM Markus Jelsma 
wrote:

> Hello,
>
> You are probably looking for the elevation component, check it out:
> https://lucene.apache.org/solr/guide/8_8/the-query-elevation-component.html
>
> Regards,
> Markus
>
> Op wo 24 feb. 2021 om 11:59 schreef Mark Robinson  >:
>
> > Hi,
> >
> > I wanted to sort and then boost some docs to the top and these docs
> should
> > be my first set in the results and the following ones appearing according
> > to my sort criteria.
> >
> > I understand that sort overrides bq hence bq may not be used in this case
> >
> > - I brought my boost into sort using "query()" and achieved my goal.
> > - I tried sort and then elevate with forceElevation and that also worked.
> >
> > My question is which is more recommended/ performant?
> >
> > Note:- Assume that I have hundreds of ids to boost like this.
> > Is there a difference to the answer if docs to be boosted after the sort
> is
> > less?
> >
> > Could someone please share your thoughts/experience?
> >
> > Thanks!
> > Mark.
> >
>


Re: Overriding Sort and boosting some docs to the top

2021-02-24 Thread Markus Jelsma
Hello,

You are probably looking for the elevation component, check it out:
https://lucene.apache.org/solr/guide/8_8/the-query-elevation-component.html

Regards,
Markus

Op wo 24 feb. 2021 om 11:59 schreef Mark Robinson :

> Hi,
>
> I wanted to sort and then boost some docs to the top and these docs should
> be my first set in the results and the following ones appearing according
> to my sort criteria.
>
> I understand that sort overrides bq hence bq may not be used in this case
>
> - I brought my boost into sort using "query()" and achieved my goal.
> - I tried sort and then elevate with forceElevation and that also worked.
>
> My question is which is more recommended/ performant?
>
> Note:- Assume that I have hundreds of ids to boost like this.
> Is there a difference to the answer if docs to be boosted after the sort is
> less?
>
> Could someone please share your thoughts/experience?
>
> Thanks!
> Mark.
>


Re: Caffeine Cache and Filter Cache in 8.3

2021-02-23 Thread Stephen Lewis Bianamara
Thanks Shawn! This is great clarity, really appreciate it. I'll proceed to
performance testing of the Caffeine Cache.

Is there a Jira issue needed for tracking these two documentation updates
(here and here)?

On Mon, Feb 22, 2021 at 1:16 PM Shawn Heisey  wrote:

> On 2/22/2021 1:50 PM, Stephen Lewis Bianamara wrote:
>
> 
>
> > (a) At what version did the caffeine cache reach production stability?
> > (b) Is the caffeine cache, and really all implementations, able to be
> used
> > on any cache, or are the restrictions about which cache implementations
> may
> > be used for which cache? If the latter, can you provide some guidance?
>
> The caffeine-based cache was introduced in Solr 8.3.  It was considered
> viable for production from the time it was introduced.
>
> https://issues.apache.org/jira/browse/SOLR-8241
>
> Something was found and fixed in 8.5.  I do not know what the impact of
> that issue was:
>
> https://issues.apache.org/jira/browse/SOLR-14239
>
> The other cache implementations were deprecated at some point.  Those
> implementations have been removed from the master branch, but still
> exist in the code for 8.x versions.
>
> If you want to use one of the older implementations like FastLRUCache,
> you still can, and will be able to for all future 8.x versions.  When
> 9.0 is released at some future date, that will no longer be possible.
>
> The Caffeine-based implementation is probably the best option, but I do
> not have any concrete data to give you.
>
> Thanks,
> Shawn
>


Re: R: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-23 Thread dmitri maziuk

On 2021-02-23 1:53 AM, Danilo Tomasoni wrote:

Thank you all for the suggestions,
The OS is not Windows, it's CentOS. A colleague thinks that even on Linux, 
defragmenting can improve performance by about 2X because it keeps the data 
contiguous on disk.


You may want to check the filesystem you're using and read up on XFS vs 
EXT4.


FWIW we've had reasonable success with ZFS on Linux (look on github) 
binary drivers for centos 6 and, a bit less so: 7. With effectively 
RAID-10'ed HDDs and a regular SSD for read & write caching.


Either way, check with `df` first: if you're more than ~75% full, you 
need a bigger disk no matter what else you do.


Dima


Re: Is 8.8.x going be stabilized and finalized?

2021-02-22 Thread S G
Hey Subhajit,

Can you share briefly what issues are being seen with 8.7+ versions?
We are planning to move a big workload from 7.6 to 8.7 version.

We created a small load-testing tool for sanity-checking new Solr versions, and
it showed throughput decreasing much more on 8.7 than on Solr 7.6 as we
loaded more and more data into both versions.
So we are a bit concerned about whether we should make this move or not.
If 8.7 has some grave blockers (features or performance) known already,
then we will probably hold off on making the move.

Regards
SG

On Wed, Feb 17, 2021 at 11:58 AM Subhajit Das 
wrote:

> Hi Shawn,
>
> Nice to know that Solr will be considered top level project of Apache.
>
> I asked based on earlier 3 version patterns. Just hoping that 8.8 would be
> long term stable, kind of like 7.7.x line-up.
>
> Thanks for the clarification.
>
> Regards,
> Subhajit
>
> From: Shawn Heisey<mailto:apa...@elyograg.org>
> Sent: 17 February 2021 09:33 AM
> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> Subject: Re: Is 8.8.x going be stabilized and finalized?
>
> On 2/16/2021 7:57 PM, Subhajit Das wrote:
> > I am planning to use 8.8 line-up for production use.
> >
> > But recently, a lot of people are complaining on 8.7 and 8.8. Also,
> there is a clearly known issue on 8.8 as well.
> >
> > Following trends of earlier versions (5.x, 6.x and 7.x), will 8.8 will
> also be finalized?
> > For 5.x, 5.5.x was last version. For 6.x, 6.6.x was last version. For
> 7.x, 7.7.x was last version. It would match the pattern, it seems.
> > And 9.x is already planned and under development.
> > And it seems, we require some stability.
>
> All released versions are considered stable.  Sometimes problems are
> uncovered after release.  Sometimes BIG problems.  We try our very best
> to avoid bugs, but achieving that kind of perfection is nearly
> impossible for any software project.
>
> 8.8.0 is the most current release.  The 8.8.1 release is underway, but
> there's no way I can give you a concrete date.  The announcement MIGHT
> come in the next few days, but it's always possible it could get pushed
> back.  At this time, the changelog for 8.8.1 has five bugfixes
> mentioned.  It should be more stable than 8.8.0, but it's impossible for
> me to tell you whether you will have any problems with it.
>
> On the dev list, the project is discussing the start of work on the 9.0
> release, but that work has not yet begun.  Even if it started tomorrow,
> it would be several weeks, maybe even a few months, before 9.0 is
> actually released.  On top of the "normal" headaches involved in any new
> major version release, there are some other things going on that might
> further delay 9.0 and future 8.x versions:
>
> * Solr is being promoted from a subproject of Lucene to its own
> top-level project at Apache.  This involves a LOT of work.  Much of that
> work is administrative in nature, which is going to occupy us and take
> away from time that we might spend working on the code and new releases.
> * The build system for the master branch, which is currently versioned
> as 9.0.0-SNAPSHOT, was recently switched from Ant+Ivy to Gradle.  It's
> going to take some time to figure out all the fallout from that migration.
> * Some of the devs have been involved in an effort to greatly simplify
> and rewrite how SolrCloud does internal management of a cluster.  The
> intent is much better stability and better performance.  You might have
> seen public messages referring to a "reference implementation."  At this
> time, it is unclear how much of that work will make it into 9.0 and how
> much will be revealed in later releases.  We would like very much to
> include at least the first phase in 9.0 if we can.
>
>  From what I have seen over the last several years as one of the
> developers on this project, it is likely that 8.9 and possibly even 8.10
> and 8.11 will be released before we see 9.0.  Releases are NOT made on a
> specific schedule, so I cannot tell you which versions you will see or
> when they might happen.
>
> I am fully aware that despite typing quite a lot of text here, that I
> provided almost nothing in the way of concrete information that you can
> use.  Sorry about that.
>
> Thanks,
> Shawn
>
>


Re: Caffeine Cache and Filter Cache in 8.3

2021-02-22 Thread Shawn Heisey

On 2/22/2021 1:50 PM, Stephen Lewis Bianamara wrote:




(a) At what version did the caffeine cache reach production stability?
(b) Is the caffeine cache, and really all implementations, able to be used
on any cache, or are the restrictions about which cache implementations may
be used for which cache? If the latter, can you provide some guidance?


The caffeine-based cache was introduced in Solr 8.3.  It was considered 
viable for production from the time it was introduced.


https://issues.apache.org/jira/browse/SOLR-8241

Something was found and fixed in 8.5.  I do not know what the impact of 
that issue was:


https://issues.apache.org/jira/browse/SOLR-14239

The other cache implementations were deprecated at some point.  Those 
implementations have been removed from the master branch, but still 
exist in the code for 8.x versions.


If you want to use one of the older implementations like FastLRUCache, 
you still can, and will be able to for all future 8.x versions.  When 
9.0 is released at some future date, that will no longer be possible.


The Caffeine-based implementation is probably the best option, but I do 
not have any concrete data to give you.
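
(If you do switch, one hedged option is changing the cache class through the
Config API rather than hand-editing solrconfig.xml; this assumes
query.filterCache.class is in your version's editable-properties list, so
verify against the Config API docs:)

# point the filter cache at the Caffeine implementation, then reload the core
curl -X POST -H 'Content-Type: application/json' \
  http://localhost:8983/solr/mycollection/config \
  -d '{"set-property": {"query.filterCache.class": "solr.CaffeineCache"}}'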


Thanks,
Shawn


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Walter Underwood
True, but Windows does cache files. It has been a couple of decades since I ran 
search on Windows, but Ultraseek got large gains from setting some sort of 
system property to make it act like a file server and give file caching equal 
priority with program caching.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 22, 2021, at 9:22 AM, dmitri maziuk  wrote:
> 
> On 2021-02-22 11:18 AM, Shawn Heisey wrote:
> 
>> The OS automatically uses unallocated memory to cache data on the disk.   
>> Because memory is far faster than any disk, even SSD, it performs better.
> 
> Depends on the os, from "defragmenting solrdata folder" I suspect the OP is 
> on windows whose filesystems and memory management does not always work the 
> way the Unix textbook says.
> 
> Dima



Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread dmitri maziuk

On 2021-02-22 11:18 AM, Shawn Heisey wrote:

The OS automatically uses unallocated memory to cache data on the disk. 
  Because memory is far faster than any disk, even SSD, it performs better.


Depends on the os, from "defragmenting solrdata folder" I suspect the OP 
is on windows whose filesystems and memory management does not always 
work the way the Unix textbook says.


Dima


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Shawn Heisey

On 2/22/2021 12:52 AM, Danilo Tomasoni wrote:

we are running a solr instance with around 41 MLN documents on a SATA class 10 
disk with around 10.000 rpm.
We are experiencing very slow query responses (in the order of hours..) with an 
average of 205 segments.
We made a test with a normal pc and an SSD disk, and there the same solr 
instance with the same data and the same number of segments was around 45 times 
faster.
Force optimize was also tried to improve the performances, but it was very 
slow, so we abandoned it.

Since we still don't have enterprise server ssd disks, we are now wondering if 
in the meanwhile defragmenting the solrdata folder can help.
The idea is that due to many updates, each segment file is fragmented across 
different physical blocks.
Put in another way, each segment file is non-contiguous on disk, and this can 
slow-down the solr response.


The absolute best thing you can do to improve Solr performance is add 
memory.


The OS automatically uses unallocated memory to cache data on the disk. 
 Because memory is far faster than any disk, even SSD, it performs better.


I wrote a wiki page about it:

https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

If you have sufficient memory, the speed of your disks will have little 
effect on performance.  It's only in cases where there is not enough 
memory that disk performance will matter.


Thanks,
Shawn



Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Walter Underwood
A forced merge might improve speed 20%. Going from spinning disk to SSD
will improve speed 20X or more. Don’t waste your time even thinking about
forced merges.

You need to get SSDs.

The even bigger speedup is to get enough RAM that the OS can keep the 
Solr index files in file system buffers. Check how much space is used by
your indexes, then make sure that there is that much available RAM that
is not used by the OS or Solr JVM.
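
(A rough way to run that check, with the index path being an assumption for a
default install:)

du -sh /var/solr/data   # total on-disk index size
free -h                 # the "available" column approximates cacheable RAM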

Some people make the mistake of giving a huge heap to the JVM, thinking
this will improve caching. This almost always makes things worse, by 
using RAM that could be use for caching files. 8GB of heap is usually enough.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 21, 2021, at 11:52 PM, Danilo Tomasoni  wrote:
> 
> Hello all,
> we are running a solr instance with around 41 MLN documents on a SATA class 
> 10 disk with around 10.000 rpm.
> We are experiencing very slow query responses (in the order of hours..) with 
> an average of 205 segments.
> We made a test with a normal pc and an SSD disk, and there the same solr 
> instance with the same data and the same number of segments was around 45 
> times faster.
> Force optimize was also tried to improve the performances, but it was very 
> slow, so we abandoned it.
> 
> Since we still don't have enterprise server ssd disks, we are now wondering 
> if in the meanwhile defragmenting the solrdata folder can help.
> The idea is that due to many updates, each segment file is fragmented across 
> different physical blocks.
> Put in another way, each segment file is non-contiguous on disk, and this can 
> slow-down the solr response.
> 
> What do you suggest?
> Is this somewhat equivalent to force-optimize or it can be faster?
> 
> Thank you.
> Danilo
> 
> Danilo Tomasoni
> 
> Fondazione The Microsoft Research - University of Trento Centre for 
> Computational and Systems Biology (COSBI)
> Piazza Manifattura 1,  38068 Rovereto (TN), Italy
> tomas...@cosbi.eu
> http://www.cosbi.eu
> 
> As for the European General Data Protection Regulation 2016/679 on the 
> protection of natural persons with regard to the processing of personal data, 
> we inform you that all the data we possess are object of treatment in the 
> respect of the normative provided for by the cited GDPR.
> It is your right to be informed on which of your data are used and how; you 
> may ask for their correction, cancellation or you may oppose to their use by 
> written request sent by recorded delivery to The Microsoft Research – 
> University of Trento Centre for Computational and Systems Biology Scarl, 
> Piazza Manifattura 1, 38068 Rovereto (TN), Italy.
> Please don't print this e-mail unless you really need to



Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread dmitri maziuk

On 2021-02-22 1:52 AM, Danilo Tomasoni wrote:

Hello all,
we are running a solr instance with around 41 MLN documents on a SATA class 10 
disk with around 10.000 rpm.
We are experiencing very slow query responses (in the order of hours..) with an 
average of 205 segments.
We made a test with a normal pc and an SSD disk, and there the same solr 
instance with the same data and the same number of segments was around 45 times 
faster.


What is your actual hardware and OS, as opposed to "normal pc"?

Dima


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Dario Rigolin
Hi Danilo, in my experience SSD or a RAM disk is now the only way to
speed up queries. It depends on the storage footprint of your 41M docs.
If you don't have enterprise SSDs you can add a consumer SSD as a fast cache
(the Linux caching modules "flashcache / bcache" can use a cheap SSD as a
data cache while keeping your data safely stored on SATA disks).

I don't think you can increase performances without changing technology on
the storage system.

Regards.
Dario

Il giorno lun 22 feb 2021 alle ore 08:52 Danilo Tomasoni 
ha scritto:

> Hello all,
> we are running a solr instance with around 41 MLN documents on a SATA
> class 10 disk with around 10.000 rpm.
> We are experiencing very slow query responses (in the order of hours..)
> with an average of 205 segments.
> We made a test with a normal pc and an SSD disk, and there the same solr
> instance with the same data and the same number of segments was around 45
> times faster.
> Force optimize was also tried to improve the performances, but it was very
> slow, so we abandoned it.
>
> Since we still don't have enterprise server ssd disks, we are now
> wondering if in the meanwhile defragmenting the solrdata folder can help.
> The idea is that due to many updates, each segment file is fragmented
> across different physical blocks.
> Put in another way, each segment file is non-contiguous on disk, and this
> can slow-down the solr response.
>
> What do you suggest?
> Is this somewhat equivalent to force-optimize or it can be faster?
>
> Thank you.
> Danilo
>
> Danilo Tomasoni
>
> Fondazione The Microsoft Research - University of Trento Centre for
> Computational and Systems Biology (COSBI)
> Piazza Manifattura 1,  38068 Rovereto (TN), Italy
> tomas...@cosbi.eu
> http://www.cosbi.eu
>
> As for the European General Data Protection Regulation 2016/679 on the
> protection of natural persons with regard to the processing of personal
> data, we inform you that all the data we possess are object of treatment in
> the respect of the normative provided for by the cited GDPR.
> It is your right to be informed on which of your data are used and how;
> you may ask for their correction, cancellation or you may oppose to their
> use by written request sent by recorded delivery to The Microsoft Research
> – University of Trento Centre for Computational and Systems Biology Scarl,
> Piazza Manifattura 1, 38068 Rovereto (TN), Italy.
> Please don't print this e-mail unless you really need to
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: HTML sample.html not indexing in Solr 8.8

2021-02-21 Thread Shawn Heisey

On 2/21/2021 3:07 PM, cratervoid wrote:

Thanks Shawn, I copied the solrconfig.xml file from the gettingstarted
example on 7.7.3 installation to the 8.8.0 installation, restarted the
server and it now works. Comparing the two files it looks like as you said
this section was left out of the _default/solrconfig.xml file in version
8.8.0:


 
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

So those trying out the tutorial will need to add this section to get it to
work for sample.html.



This line from that config also is involved:

  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib"
       regex=".*\.jar" />


That loads the contrib jars needed for the ExtractingRequestHandler to 
work right.  There are a LOT of jars there.  Tika is a very heavyweight 
piece of software.


Thanks,
Shawn


Re: HTML sample.html not indexing in Solr 8.8

2021-02-21 Thread cratervoid
Thanks Shawn, I copied the solrconfig.xml file from the gettingstarted
example on 7.7.3 installation to the 8.8.0 installation, restarted the
server and it now works. Comparing the two files it looks like as you said
this section was left out of the _default/solrconfig.xml file in version
8.8.0:



<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

So those trying out the tutorial will need to add this section to get it to
work for sample.html.



On Sat, Feb 20, 2021 at 4:21 PM Shawn Heisey  wrote:

> On 2/20/2021 3:58 PM, cratervoid wrote:
> > SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
> >
> http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html
>
> The problem here is that the solrconfig.xml in use by the index named
> "gettingstarted" does not define a handler at /update/extract.
>
> Typically a handler defined at that URL path will utilize the extracting
> request handler class.  This handler uses Tika (another Apache project)
> to extract usable data from rich text formats like PDF, HTML, etc.
>
>
>   <requestHandler name="/update/extract"
>                   startup="lazy"
>                   class="solr.extraction.ExtractingRequestHandler" >
>     <lst name="defaults">
>       <str name="lowernames">true</str>
>       <str name="fmap.meta">ignored_</str>
>       <str name="fmap.content">_text_</str>
>     </lst>
>   </requestHandler>
>
> Note that using this handler will require adding some contrib jars to Solr.
>
> Tika can become very unstable because it deals with undocumented file
> formats, so we do not recommend using that handler in production.  If
> the functionality is important, Tika should be included in a program
> that's separate from Solr, so that if it crashes, it does not take Solr
> down with it.
>
> Thanks,
> Shawn
>


Re: HTML sample.html not indexing in Solr 8.8

2021-02-21 Thread cratervoid
Thanks Alex. I copied the solrconfig.xml over from 7.7.3 to the 8.8.0 conf
folder and restarted the server.  Now indexing works without erroring on
sample.html.  There is 1K difference between the 2 files so I'll diff them
to see what was left out of the 8.8 version.

On Sat, Feb 20, 2021 at 4:27 PM Alexandre Rafalovitch 
wrote:

> Most likely issue is that your core configuration (solrconfig.xml)
> does not have the request handler for that. The same config may have
> had that in 7.x, but changed since.
>
> More details:
> https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html
>
> Regards,
>Alex.
>
> On Sat, 20 Feb 2021 at 17:59, cratervoid  wrote:
> >
> > I am trying out indexing the exampledocs in the examples folder with the
> > SimplePostTool on windows 10 using solr 8.8.  All the documents index
> > except sample.html. For that file I get the errors below.  I then
> > downloaded solr 7.7.3 and indexed the exampledocs folder with no errors,
> > including sample.html.
> > ```
> > PS C:\solr-8.8.0> java -jar -Dc=gettingstarted -Dauto
> > example\exampledocs\post.jar example\exampledocs\sample.html
> > SimplePostTool version 5.0.0
> > Posting files to [base] url
> > http://localhost:8983/solr/gettingstarted/update...
> > Entering auto mode. File endings considered are
> >
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> > POSTing file sample.html (text/html) to [base]/extract
> > SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
> >
> http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html
> > SimplePostTool: WARNING: Response: 
> Error 404 Not Found
> HTTP ERROR 404 Not Found
> URI: /solr/gettingstarted/update/extract
> STATUS: 404
> MESSAGE: Not Found
> SERVLET: default
> > SimplePostTool: WARNING: IOException while reading response:
> > java.io.FileNotFoundException:
> >
> http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html
> > 1 files indexed.
> > COMMITting Solr index changes to
> > http://localhost:8983/solr/gettingstarted/update...
> > Time spent: 0:00:00.086
> > ```
> >
> > However the json and all other file types index with no problem. For
> > example:
> > ```
> > PS C:\solr-8.8.0> java -jar -Dc=gettingstarted -Dauto
> > example\exampledocs\post.jar example\exampledocs\books.json
> > SimplePostTool version 5.0.0
> > Posting files to [base] url
> > http://localhost:8983/solr/gettingstarted/update...
> > Entering auto mode. File endings considered are
> >
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> > POSTing file books.json (application/json) to [base]/json/docs
> > 1 files indexed.
> > COMMITting Solr index changes to
> > http://localhost:8983/solr/gettingstarted/update...
> > ```
> > Just following this tutorial:
> > https://lucene.apache.org/solr/guide/8_8/post-tool.html#post-tool-windows-support
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-20 Thread Yonik Seeley
Congrats Jan! Go Solr!
-Yonik


On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta  wrote:

> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-20 Thread Shalin Shekhar Mangar
Congratulations, Jan and thank you!

Thanks, Anshum for all your work as chair.

On Fri, Feb 19, 2021 at 12:26 AM Anshum Gupta 
wrote:

> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: HTML sample.html not indexing in Solr 8.8

2021-02-20 Thread Alexandre Rafalovitch
Most likely issue is that your core configuration (solrconfig.xml)
does not have the request handler for that. The same config may have
had that in 7.x, but changed since.

More details: 
https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html

Regards,
   Alex.

On Sat, 20 Feb 2021 at 17:59, cratervoid  wrote:
>
> I am trying out indexing the exampledocs in the examples folder with the
> SimplePostTool on windows 10 using solr 8.8.  All the documents index
> except sample.html. For that file I get the errors below.  I then
> downloaded solr 7.7.3 and indexed the exampledocs folder with no errors,
> including sample.html.
> ```
> PS C:\solr-8.8.0> java -jar -Dc=gettingstarted -Dauto
> example\exampledocs\post.jar example\exampledocs\sample.html
> SimplePostTool version 5.0.0
> Posting files to [base] url
> http://localhost:8983/solr/gettingstarted/update...
> Entering auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file sample.html (text/html) to [base]/extract
> SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
> http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html
> SimplePostTool: WARNING: Response: 
> Error 404 Not Found
> HTTP ERROR 404 Not Found
> URI: /solr/gettingstarted/update/extract
> STATUS: 404
> MESSAGE: Not Found
> SERVLET: default
> SimplePostTool: WARNING: IOException while reading response:
> java.io.FileNotFoundException:
> http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html
> 1 files indexed.
> COMMITting Solr index changes to
> http://localhost:8983/solr/gettingstarted/update...
> Time spent: 0:00:00.086
> ```
>
> However the json and all other file types index with no problem. For
> example:
> ```
> PS C:\solr-8.8.0> java -jar -Dc=gettingstarted -Dauto
> example\exampledocs\post.jar example\exampledocs\books.json
> SimplePostTool version 5.0.0
> Posting files to [base] url
> http://localhost:8983/solr/gettingstarted/update...
> Entering auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file books.json (application/json) to [base]/json/docs
> 1 files indexed.
> COMMITting Solr index changes to
> http://localhost:8983/solr/gettingstarted/update...
> ```
> Just following this tutorial:
> https://lucene.apache.org/solr/guide/8_8/post-tool.html#post-tool-windows-support


Re: HTML sample.html not indexing in Solr 8.8

2021-02-20 Thread Shawn Heisey

On 2/20/2021 3:58 PM, cratervoid wrote:

SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html


The problem here is that the solrconfig.xml in use by the index named 
"gettingstarted" does not define a handler at /update/extract.


Typically a handler defined at that URL path will utilize the extracting 
request handler class.  This handler uses Tika (another Apache project) 
to extract usable data from rich text formats like PDF, HTML, etc.


  
  

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

Note that using this handler will require adding some contrib jars to Solr.

Tika can become very unstable because it deals with undocumented file 
formats, so we do not recommend using that handler in production.  If 
the functionality is important, Tika should be included in a program 
that's separate from Solr, so that if it crashes, it does not take Solr 
down with it.


Thanks,
Shawn


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-20 Thread Lucky Sharma
Congratulations Jan

Regards,
Lucky Sharma

On Sat, 20 Feb 2021 at 8:07 PM, Karl Wright  wrote:

> Congratulations!
> Karl
>
> On Sat, Feb 20, 2021 at 6:28 AM Uwe Schindler  wrote:
>
>> Congrats Jan!
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> Achterdiek 19, D-28357 Bremen
>> 
>>
>> https://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Anshum Gupta 
>> *Sent:* Thursday, February 18, 2021 7:55 PM
>> *To:* Lucene Dev ; solr-user@lucene.apache.org
>> *Subject:* Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!
>>
>>
>>
>> Hi everyone,
>>
>>
>>
>> I’d like to inform everyone that the newly formed Apache Solr PMC
>> nominated and elected Jan Høydahl for the position of the Solr PMC Chair
>> and Vice President. This decision was approved by the board in its February
>> 2021 meeting.
>>
>>
>>
>> Congratulations Jan!
>>
>>
>>
>> --
>>
>> Anshum Gupta
>>
> --
Warm Regards,

Lucky Sharma
Contact No :+91 9821559918


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-20 Thread Karl Wright
Congratulations!
Karl

On Sat, Feb 20, 2021 at 6:28 AM Uwe Schindler  wrote:

> Congrats Jan!
>
>
>
> Uwe
>
>
>
> -
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>
> *From:* Anshum Gupta 
> *Sent:* Thursday, February 18, 2021 7:55 PM
> *To:* Lucene Dev ; solr-user@lucene.apache.org
> *Subject:* Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!
>
>
>
> Hi everyone,
>
>
>
> I’d like to inform everyone that the newly formed Apache Solr PMC
> nominated and elected Jan Høydahl for the position of the Solr PMC Chair
> and Vice President. This decision was approved by the board in its February
> 2021 meeting.
>
>
>
> Congratulations Jan!
>
>
>
> --
>
> Anshum Gupta
>


RE: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-20 Thread Uwe Schindler
Congrats Jan!

 

Uwe

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

  https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Anshum Gupta  
Sent: Thursday, February 18, 2021 7:55 PM
To: Lucene Dev ; solr-user@lucene.apache.org
Subject: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

 

Hi everyone,

 

I’d like to inform everyone that the newly formed Apache Solr PMC nominated and 
elected Jan Høydahl for the position of the Solr PMC Chair and Vice President. 
This decision was approved by the board in its February 2021 meeting.

 

Congratulations Jan! 

 

-- 

Anshum Gupta



RE: nodes() stream to infinite depth

2021-02-20 Thread Subhajit Das
Hi Joel,

This stream seems to be a workable alternative. Thanks for the suggestion.

The main issue here would be:

  1.  As I am traversing up a tree, I would need a dummy root node for all 
documents, since “to” is mandatory.
  2.  It emits one tuple per path, so for my requirement I will get only 
one data tuple. This removes the advantage of streaming one tuple per node.

So it seems that I have to use the {!graph...} syntax. As my collection has only 
one shard, it should behave the same, basically.

One noticeable advantage of using shortest path, though, would be:

  1.  The list inside the tuple is based on traversal order, unlike {!graph...}.

Though I can sort by date to get the same result; additionally I would get the 
full result, with all document fields.
Thanks for the suggestion though.
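
(For anyone following along, a hedged sketch of the shape such a call takes;
collection, field names and node values are placeholders, and the dummy root
discussed above serves as the mandatory "to" target:)

curl --data-urlencode 'expr=shortestPath(myCollection,
        from="startDocId",
        to="dummyRootId",
        edge="parentId=id",
        maxDepth="20")' \
  http://localhost:8983/solr/myCollection/stream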


From: Joel Bernstein<mailto:joels...@gmail.com>
Sent: 20 February 2021 01:20 AM
To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
Subject: Re: nodes() stream to infinite depth

You could see if this meets your needs:

https://lucene.apache.org/solr/guide/8_8/stream-source-reference.html#shortestpath





Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 19, 2021 at 2:45 PM Subhajit Das 
wrote:

> Hi Joel,
>
> Thanks for response. But, is there any way to simulate the same?
>
>
> From: Joel Bernstein<mailto:joels...@gmail.com>
> Sent: 20 February 2021 01:13 AM
> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> Subject: Re: nodes() stream to infinite depth
>
> Nodes is designed for a stepwise graph walk. It doesn't do a full
> traversal.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Feb 19, 2021 at 4:47 AM Subhajit Das 
> wrote:
>
> >
> > Hi,
> >
> > “{!graph ...}” goes to infinite depth by default. But “nodes()” stream
> > does not go to infinite depth.
> >
> > Is there any way to go to infinite depth?
> >
> >
>
>



Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Ritvik Sharma
Congratulations Jan !

On Sat, 20 Feb 2021 at 5:30 AM, Jan Høydahl  wrote:

> Thanks everyone!
>
> It's an exciting season for Solr, and I look forward to serving the
> project as chair.
> Also thanks to Anshum for an excellent job last term for Lucene.
>
> Jan
>
> > 18. feb. 2021 kl. 19:55 skrev Anshum Gupta :
> >
> > Hi everyone,
> >
> > I’d like to inform everyone that the newly formed Apache Solr PMC
> nominated
> > and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> > President. This decision was approved by the board in its February 2021
> > meeting.
> >
> > Congratulations Jan!
> >
> > --
> > Anshum Gupta
>
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Jan Høydahl
Thanks everyone!

It's an exciting season for Solr, and I look forward to serving the project as 
chair.
Also thanks to Anshum for an excellent job last term for Lucene.

Jan

> 18. feb. 2021 kl. 19:55 skrev Anshum Gupta :
> 
> Hi everyone,
> 
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
> 
> Congratulations Jan!
> 
> -- 
> Anshum Gupta



Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Aroop Ganguly
Congrats Jan and Best Wishes!

Also many thanks to Anshum for all his efforts in the last term!
Also a belated shout out to Cassandra and other chairs in the past for their 
much appreciated efforts for the community!

Regards
Aroop

> On Feb 18, 2021, at 10:55 AM, Anshum Gupta  wrote:
> 
> Hi everyone,
> 
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
> 
> Congratulations Jan!
> 
> -- 
> Anshum Gupta



Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Jason Gerlowski
Congrats!

On Fri, Feb 19, 2021 at 10:06 AM Divye  wrote:
>
> Congratulations Jan!
>
> Regards,
> Divye
>
> On Fri, 19 Feb, 2021, 00:26 Anshum Gupta,  wrote:
>
> > Hi everyone,
> >
> > I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> > and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> > President. This decision was approved by the board in its February 2021
> > meeting.
> >
> > Congratulations Jan!
> >
> > --
> > Anshum Gupta
> >


Re: nodes() stream to infinite depth

2021-02-19 Thread Joel Bernstein
You could see if this meets your needs:

https://lucene.apache.org/solr/guide/8_8/stream-source-reference.html#shortestpath





Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 19, 2021 at 2:45 PM Subhajit Das 
wrote:

> Hi Joel,
>
> Thanks for the response. But is there any way to simulate the same?
>
>
> From: Joel Bernstein<mailto:joels...@gmail.com>
> Sent: 20 February 2021 01:13 AM
> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> Subject: Re: nodes() stream to infinite depth
>
> Nodes is designed for a stepwise graph walk. It doesn't do a full
> traversal.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Feb 19, 2021 at 4:47 AM Subhajit Das 
> wrote:
>
> >
> > Hi,
> >
> > “{!graph ...}” goes to infinite depth by default. But “nodes()” stream
> > does not go to infinite depth.
> >
> > Is there any way to go to infinite depth?
> >
> >
>
>


RE: nodes() stream to infinite depth

2021-02-19 Thread Subhajit Das
Hi Joel,

Thanks for the response. But is there any way to simulate the same?


From: Joel Bernstein<mailto:joels...@gmail.com>
Sent: 20 February 2021 01:13 AM
To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
Subject: Re: nodes() stream to infinite depth

Nodes is designed for a stepwise graph walk. It doesn't do a full traversal.


Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 19, 2021 at 4:47 AM Subhajit Das 
wrote:

>
> Hi,
>
> “{!graph ...}” goes to infinite depth by default. But “nodes()” stream
> does not go to infinite depth.
>
> Is there any way to go to infinite depth?
>
>



Re: nodes() stream to infinite depth

2021-02-19 Thread Joel Bernstein
Nodes is designed for a stepwise graph walk. It doesn't do a full traversal.


Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 19, 2021 at 4:47 AM Subhajit Das 
wrote:

>
> Hi,
>
> “{!graph ...}” goes to infinite depth by default. But “nodes()” stream
> does not go to infinite depth.
>
> Is there any way to go to infinite depth?
>
>


Re: Dynamic starting or stoping of zookeepers in a cluster

2021-02-19 Thread Joe Lerner
This is solid information. *How about the application, which uses
SOLR/Zookeeper?*

Do we have to follow this guidance, to make the application ZK config aware:

https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html#ch_reconfig_rebalancing

  

Or could we leave it as is, as long as the ZK Ensemble has the same
IPs?

Thanks!

Joe




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Atomic Update (nested), Unified Highlighter and Lazy Field Loading => Invalid Index

2021-02-19 Thread David Smiley
Even if you could do an "fl" with the ability to exclude certain fields, it
begs the question of what goes into the document cache.  The doc cache is
doc oriented, not field oriented.  So there needs to be some sort of
stand-in value if you don't want to cache a value there, and that ends
up being LazyField if you have that feature enabled, or possibly wasted
space if you don't.  So I don't think the ability to
exclude fields in "fl" would obsolete enableLazyFieldLoading, which I think
you are implying?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Feb 19, 2021 at 10:10 AM Gus Heck  wrote:

> Actually I suspect it's there because the ability to exclude fields
> rather than include them is still pending...
> https://issues.apache.org/jira/browse/SOLR-3191
> See also
> https://issues.apache.org/jira/browse/SOLR-10367
> https://issues.apache.org/jira/browse/SOLR-9467
>
> All of these and lazy field loading are motivated by the case where you
> have a very large stored field and you sometimes don't want it, but do want
> everything else, and an explicit list of fields is not convenient (i.e. the
> field list would have to be hard coded in an application, or alternately
> require some sort of schema parsing to build a list of possible fields or
> other severe ugliness..)
>
> -Gus
>
> On Thu, Feb 18, 2021 at 8:42 AM David Smiley  wrote:
>
> > IMO enableLazyFieldLoading is a small optimization for most apps.  It
> saves
> > memory in the document cache at the expense of increased latency if your
> > usage pattern wants a field later that wasn't requested earlier.  You'd
> > probably need detailed metrics/benchmarks to observe a difference, and
> you
> > might reach a conclusion that enableLazyFieldLoading is best at "false"
> for
> > you irrespective of the bug.  I suspect it may have been developed for
> > particularly large document use-cases where you don't normally need some
> > large text fields for retrieval/highlighting.  For example imagine if you
> > stored the entire input data as JSON in a _json_ field or some-such.
> > Nowadays, I'd set large="true" on such a field, which is a much newer
> > option.
> >
> > I was able to tweak my test to have only alphabetic IDs, and the test
> still
> > failed.  I don't see how the ID's contents/format could cause any effect.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Thu, Feb 18, 2021 at 5:04 AM Nussbaum, Ronen <
> ronen.nussb...@verint.com
> > >
> > wrote:
> >
> > > You're right, I was able to reproduce it too without highlighting.
> > > Regarding the existing bug, I think there might be an additional issue
> > > here because it happens only when id field contains an underscore
> (didn't
> > > check for other special characters).
> > > Currently I have no other choice but to use
> enableLazyFieldLoading=false.
> > > I hope it wouldn't have a significant performance impact.
> > >
> > > -Original Message-
> > > From: David Smiley 
> > > Sent: Thursday, 18 February 2021 01:03
> > > To: solr-user 
> > > Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy Field
> > > Loading => Invalid Index
> > >
> > > I think the issue is this existing bug, but needs to refer to
> > > toSolrInputDocument instead of toSolrDoc:
> > > https://issues.apache.org/jira/browse/SOLR-13034
> > > Highlighting isn't involved; you just need to somehow get a document
> > > cached with lazy fields.  In a test I was able to do this simply by
> > doing a
> > > query that only returns the "id" field.  No highlighting.
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> > >
> > > On Wed, Feb 17, 2021 at 10:28 AM David Smiley 
> > wrote:
> > >
> > > > Thanks for more details.  I was able to reproduce this locally!  I
> > > > hacked a test to look similar to what you are doing.  BTW it's okay
> to
> > > > fill out a JIRA imperfectly; they can always be edited :-).  Once I
> > > > better understand the nature of the bug today, I'll file an issue and
> > > respond with it here.
> > > >
> > > > ~ David Smiley
> > > > Apache Lucene/Solr Search Developer
> > > > http://www.linkedin.com/in/davidwsmiley
> > > >
> >
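
For reference, the two knobs discussed above are set in the standard config
files. A minimal sketch, assuming a schema that stores the whole input as a
"_json_" field as in David's example (field name and type are placeholders):

    <!-- solrconfig.xml, in the <query> section: unrequested stored fields
         are represented by LazyField stand-ins in the document cache -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>

    <!-- managed-schema: the newer per-field alternative for very large
         stored values; large="true" requires stored="true" and
         multiValued="false" -->
    <field name="_json_" type="string" indexed="false" stored="true"
           large="true"/>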

Re: Elevation in dataDir in Solr Cloud

2021-02-19 Thread Mónica Marrero
Thank you! I just filed the bug in Jira:
https://issues.apache.org/jira/browse/SOLR-15170

About the workaround you mentioned, we ran a quick test on one server and
it apparently worked, but we did not check it properly in a cluster (we
decided that it is better not to go with this in production anyway). Just
out of curiosity, even if we could load the elevate.xml file from the data
folder in Cloud (or change it directly in ZK), does this make sense in
terms of performance? We have frequent commits and a big elevate.xml file.

We are considering your suggestion of using the elevation directly in the
queries (I have seen work to improve this by removing the requirement of
having an elevate.xml file at all). It seems to be straightforward to apply
in some cases, but not so much when you need normalization.
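
For simple cases the elevation can indeed be supplied per request, which
sidesteps the static elevate.xml entirely. A minimal SolrJ sketch, assuming
the request handler has the elevator search component configured and using
placeholder document ids:

    import org.apache.solr.client.solrj.SolrQuery;

    public class ElevateByIds {
      // Pins two (hypothetical) documents to the top of the results.
      public static SolrQuery elevated() {
        SolrQuery q = new SolrQuery("canon camera");
        q.set("enableElevation", true);
        q.set("forceElevation", true);      // keep them on top even when sorting
        q.set("elevateIds", "doc42,doc99"); // comma-separated ids to elevate
        return q;
      }
    }

This does not help with the normalization requirement mentioned above,
though, since elevateIds matches uniqueKey values exactly.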



Re: Atomic Update (nested), Unified Highlighter and Lazy Field Loading => Invalid Index

2021-02-19 Thread Gus Heck
Actually I suspect it's there because the ability to exclude fields
rather than include them is still pending...
https://issues.apache.org/jira/browse/SOLR-3191
See also
https://issues.apache.org/jira/browse/SOLR-10367
https://issues.apache.org/jira/browse/SOLR-9467

All of these and lazy field loading are motivated by the case where you
have a very large stored field and you sometimes don't want it, but do want
everything else, and an explicit list of fields is not convenient (i.e. the
field list would have to be hard coded in an application, or alternately
require some sort of schema parsing to build a list of possible fields or
other severe ugliness..)

-Gus

On Thu, Feb 18, 2021 at 8:42 AM David Smiley  wrote:

> IMO enableLazyFieldLoading is a small optimization for most apps.  It saves
> memory in the document cache at the expense of increased latency if your
> usage pattern wants a field later that wasn't requested earlier.  You'd
> probably need detailed metrics/benchmarks to observe a difference, and you
> might reach a conclusion that enableLazyFieldLoading is best at "false" for
> you irrespective of the bug.  I suspect it may have been developed for
> particularly large document use-cases where you don't normally need some
> large text fields for retrieval/highlighting.  For example imagine if you
> stored the entire input data as JSON in a _json_ field or some-such.
> Nowadays, I'd set large="true" on such a field, which is a much newer
> option.
>
> I was able to tweak my test to have only alphabetic IDs, and the test still
> failed.  I don't see how the ID's contents/format could cause any effect.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Thu, Feb 18, 2021 at 5:04 AM Nussbaum, Ronen  >
> wrote:
>
> > You're right, I was able to reproduce it too without highlighting.
> > Regarding the existing bug, I think there might be an additional issue
> > here because it happens only when id field contains an underscore (didn't
> > check for other special characters).
> > Currently I have no other choice but to use enableLazyFieldLoading=false.
> > I hope it wouldn't have a significant performance impact.
> >
> > -Original Message-
> > From: David Smiley 
> > Sent: Thursday, 18 February 2021 01:03
> > To: solr-user 
> > Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy Field
> > Loading => Invalid Index
> >
> > I think the issue is this existing bug, but needs to refer to
> > toSolrInputDocument instead of toSolrDoc:
> > https://issues.apache.org/jira/browse/SOLR-13034
> > Highlighting isn't involved; you just need to somehow get a document
> > cached with lazy fields.  In a test I was able to do this simply by
> doing a
> > query that only returns the "id" field.  No highlighting.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Wed, Feb 17, 2021 at 10:28 AM David Smiley 
> wrote:
> >
> > > Thanks for more details.  I was able to reproduce this locally!  I
> > > hacked a test to look similar to what you are doing.  BTW it's okay to
> > > fill out a JIRA imperfectly; they can always be edited :-).  Once I
> > > better understand the nature of the bug today, I'll file an issue and
> > respond with it here.
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> > >
> > > On Wed, Feb 17, 2021 at 6:36 AM Nussbaum, Ronen
> > > 
> > > wrote:
> > >
> > >> Hello David,
> > >>
> > >> Thank you for your reply.
> > >> It was very hard but finally I discovered how to reproduce it. I
> > >> thought of issuing an issue but wasn't sure about the components and
> > priority.
> > >> I used the "tech products" configset, with the following changes:
> >> 1. Added <fieldType name="_nest_path_" class="solr.NestPathField" />
> >> 2. Added <field name="text_en" type="text_en" indexed="true"
> >> stored="true" termVectors="true" termOffsets="true" termPositions="true"
> >> required="false" multiValued="true" />
> >> Then I inserted one document with a nested child, e.g.
> > >> {id:"abc_1", utterances:{id:"abc_1-1", text_en:"Solr is great"}}
> > >>
> > >> To reproduce:

Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Divye
Congratulations Jan!

Regards,
Divye

On Fri, 19 Feb, 2021, 00:26 Anshum Gupta,  wrote:

> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Gus Heck
Congratulations :)

On Fri, Feb 19, 2021 at 6:51 AM Juan Eduardo Hernandez <
juaneduard...@gmail.com> wrote:

> Congratulations Jan!!
>
> On Fri, 19 Feb 2021 at 5:56, Atita Arora () wrote:
>
> > Congratulations Jan!
> >
> > On Fri, Feb 19, 2021 at 9:41 AM Dawid Weiss 
> wrote:
> >
> > > Congratulations, Jan!
> > >
> > > On Thu, Feb 18, 2021 at 7:56 PM Anshum Gupta 
> > > wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I’d like to inform everyone that the newly formed Apache Solr PMC
> > > nominated and elected Jan Høydahl for the position of the Solr PMC
> Chair
> > > and Vice President. This decision was approved by the board in its
> > February
> > > 2021 meeting.
> > > >
> > > > Congratulations Jan!
> > > >
> > > > --
> > > > Anshum Gupta
> > >
> >
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Juan Eduardo Hernandez
Congratulations Jan!!

On Fri, 19 Feb 2021 at 5:56, Atita Arora () wrote:

> Congratulations Jan!
>
> On Fri, Feb 19, 2021 at 9:41 AM Dawid Weiss  wrote:
>
> > Congratulations, Jan!
> >
> > On Thu, Feb 18, 2021 at 7:56 PM Anshum Gupta 
> > wrote:
> > >
> > > Hi everyone,
> > >
> > > I’d like to inform everyone that the newly formed Apache Solr PMC
> > nominated and elected Jan Høydahl for the position of the Solr PMC Chair
> > and Vice President. This decision was approved by the board in its
> February
> > 2021 meeting.
> > >
> > > Congratulations Jan!
> > >
> > > --
> > > Anshum Gupta
> >
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Atita Arora
Congratulations Jan!

On Fri, Feb 19, 2021 at 9:41 AM Dawid Weiss  wrote:

> Congratulations, Jan!
>
> On Thu, Feb 18, 2021 at 7:56 PM Anshum Gupta 
> wrote:
> >
> > Hi everyone,
> >
> > I’d like to inform everyone that the newly formed Apache Solr PMC
> nominated and elected Jan Høydahl for the position of the Solr PMC Chair
> and Vice President. This decision was approved by the board in its February
> 2021 meeting.
> >
> > Congratulations Jan!
> >
> > --
> > Anshum Gupta
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Dawid Weiss
Congratulations, Jan!

On Thu, Feb 18, 2021 at 7:56 PM Anshum Gupta  wrote:
>
> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated 
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice 
> President. This decision was approved by the board in its February 2021 
> meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-18 Thread David Smiley
Congratulations Jan!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta  wrote:

> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-18 Thread Michael McCandless
Congratulations and thank you, Jan!  It is so exciting that Solr is now a
TLP!

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta  wrote:

> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC
> nominated and elected Jan Høydahl for the position of the Solr PMC Chair
> and Vice President. This decision was approved by the board in its February
> 2021 meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-18 Thread Steve Rowe
Congratulations Jan!

--
Steve

> On Feb 18, 2021, at 1:55 PM, Anshum Gupta  wrote:
> 
> Hi everyone,
> 
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
> 
> Congratulations Jan!
> 
> -- 
> Anshum Gupta



Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-18 Thread Michael Sokolov
Yes, Congratulations and a big thank you Jan!

On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta  wrote:
>
> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated 
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice 
> President. This decision was approved by the board in its February 2021 
> meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta


Re: Dynamic starting or stopping of zookeepers in a cluster

2021-02-18 Thread Shawn Heisey

On 2/18/2021 8:20 AM, DAVID MARTIN NIETO wrote:

We have a Solr cluster with 4 Solr servers and 5 ZooKeepers in HA mode.
We tested whether our cluster can maintain service with only half of the
cluster, in case of disaster or similar, and we have a problem with the
ZooKeeper config being static.

The start script of the 4 Solr servers lists the ip:port of the 5
ZooKeepers in the cluster, so when we "lose" half of the machines (we have
2 ZooKeepers on one machine and 3 on another) we lose, in the worst case,
3 of these 5 ZooKeepers. We can start a sixth ZooKeeper (so that 3 are
running with half the cluster down), but to add it to the Solr servers we
have to stop and restart them with a new ip:port list that includes it, and
that is not automatic or dynamic.


In order to have a highly available zookeeper, you must have at least 
three separate physical servers for ZK.  Running multiple zookeepers on 
one physical machine gains you nothing ... because if the whole machine 
fails, you lose all of those zookeepers.  If you have three physical 
servers, one can fail with no problems.  If you have five separate 
physical servers running ZK, then two of the machines can fail without 
taking the cluster down.



Does anybody know another configuration or workaround to have a dynamic
list of ZooKeepers, so some of them can be started or stopped without
changing the config and restarting the Solr servers?


The Zookeeper client was upgraded to 3.5 in Solr 8.2.0.

https://issues.apache.org/jira/browse/SOLR-8346

If you're running at least Solr 8.2.0, and your ZK servers are at least 
version 3.5, then ZK should support dynamic cluster reconfiguration. 
The ZK status page in the admin UI may have some problems after ZK 
undergoes a dynamic reconfiguration, but SolrCloud's core functionality 
should work fine.


Thanks,
Shawn
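
For completeness, ZK 3.5's dynamic reconfiguration can be driven from the
ZooKeeper admin API as well as from zkCli. A hedged sketch (placeholder
hosts and server ids; the ensemble must be started with
reconfigEnabled=true, and reconfig calls require appropriate ACLs):

    import org.apache.zookeeper.admin.ZooKeeperAdmin;
    import org.apache.zookeeper.data.Stat;

    public class SwapZkNode {
      public static void main(String[] args) throws Exception {
        // Connect to the existing ensemble.
        ZooKeeperAdmin admin = new ZooKeeperAdmin(
            "zk1:2181,zk2:2181,zk3:2181", 30_000, event -> {});
        try {
          // Add server.6 and retire server.5 in one atomic membership change.
          // Spec: server.<id>=<host>:<quorumPort>:<electionPort>;<clientPort>
          byte[] newConfig = admin.reconfigure(
              "server.6=zk6:2888:3888;2181", // joining servers
              "5",                           // leaving server ids
              null,                          // alternatively, a full new membership
              -1,                            // -1 = ignore current config version
              new Stat());
          System.out.println(new String(newConfig));
        } finally {
          admin.close();
        }
      }
    }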

