Re: Learning to Rank (LTR) with grouping

2018-04-23 Thread ilayaraja
By the way, I have applied the patch on top of Solr 7.2.1 and it worked well
for me, though the test cases were failing; I have yet to see why.

On another note, LTR with reRankDocs > page_size seems to create an issue.
For example, say my page_size=24 and reRankDocs=48.

For the first query with start=0, it returns 24 re-ranked results drawn from
the top 2 result pages.
Say an item (Y) from the second page is moved to the first page after
re-ranking, while an item (X) from the first page is moved out of the first
page.

For the second query with start=24, reRankDocs=48, it returns the second page
of results from the re-ranked results between the second and third pages,
which does not contain item X.

So eventually, I do not see item X on either the first page or the next page
of results. Isn't that so?

How do we solve this?
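
A sketch of the kind of request involved (the collection name, model name and
efi parameter below are placeholders, not from this thread): keeping
reRankDocs fixed and at least as large as the deepest start+rows you serve
means every page is cut from the same re-ranked window.

# Sketch only: page through one fixed re-ranked window instead of letting the
# window shift with each page; reRankDocs stays constant across pages.
curl "http://localhost:8983/solr/techproducts/select" \
  --data-urlencode "q=ipod" \
  --data-urlencode "rq={!ltr model=myLtrModel reRankDocs=96 efi.user_query=ipod}" \
  --data-urlencode "start=24" \
  --data-urlencode "rows=24"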



-
--Ilay
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: versions of documentation: suggestion for improvement

2018-04-23 Thread Erick Erickson
One thing I do is download the complete ref guide as a PDF file.The
"Other formats">>"Archived PDFs" will let you download them.

Then I can search just within that PDF. This has plusses and minuses,
but I thought I'd mention it.

Best,
Erick

On Mon, Apr 23, 2018 at 4:49 PM, Chris Hostetter
 wrote:
>
>
> There's been some discussion along the lines of doing some things like
> what you propose which were spun out of discussion in SOLR-10595 into the
> issue LUCENE-7924 ... but so far no one has attempted the
> tooling/scripting work needed to make it happen.
>
> Patches certainly welcome.
>
>
>
> : Date: Mon, 23 Apr 2018 09:55:35 +0200
> : From: Arturas Mazeika 
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: versions of documentation: suggestion for improvement
> :
> : Hi Solr-Team,
> :
> : If I google for specific features for solr, I usually get redirected to 6.6
> : version of the documentation, like this one:
> :
> : 
> https://lucene.apache.org/solr/guide/6_6/overview-of-documents-fields-and-schema-design.html
> :
> : Since I am playing with 7.2 version of solr, I almost always need to change
> : this manually through to:
> :
> : 
> https://lucene.apache.org/solr/guide/7_2/overview-of-documents-fields-and-schema-design.html
> :
> : (by clicking on the url, going to the number, and replacing two
> : characters). This is somewhat cumbersome (especially after the first dozen
> : of changes in urls. Suggestion:
> :
> : (1) Would it make sense to include other versions of the document as urls
> : on the page? See, e.g., the following documentation of postgres, where each
> : page has a pointer to the same page in different versions:
> :
> : https://www.postgresql.org/docs/9.6/static/sql-createtable.html
> :
> : (especially "This page in other versions: 9.3
> :  / 9.4
> :  / 9.5
> :  / *9.6* /
> : current
> :  (10
> : )" line on
> : the page)
> :
> : (2) Would it make sense in addition to include "current", pointing to the
> : latest current release?
> :
> : This would help to find solr relevant infos from search engines faster.
> :
> : Cheers,
> : Arturas
> :
>
> -Hoss
> http://www.lucidworks.com/


RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-04-23 Thread howed
Finally got back to looking at this, and found that the solution was to
switch to the unified highlighter, which doesn't seem to have the same
problem with my complex synonyms. This required some tweaking of the
highlighting parameters and my code, as it doesn't highlight exactly the
same as the default highlighter, but all is working now.
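
For anyone landing on this thread later, the switch boils down to request
parameters along these lines (the core and field names are placeholders, not
from this thread):

# Sketch only: select the unified highlighter per request with hl.method.
# The same parameters can instead be set as defaults in solrconfig.xml.
curl "http://localhost:8983/solr/mycore/select" \
  --data-urlencode "q=content:shipping" \
  --data-urlencode "hl=true" \
  --data-urlencode "hl.method=unified" \
  --data-urlencode "hl.fl=content"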

Thanks again for the assistance.

David



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: versions of documentation: suggestion for improvement

2018-04-23 Thread Chris Hostetter


There's been some discussion along the lines of doing some things like 
what you propose which were spun out of discussion in SOLR-10595 into the 
issue LUCENE-7924 ... but so far no one has attempted the 
tooling/scripting work needed to make it happen.

Patches certainly welcome.



: Date: Mon, 23 Apr 2018 09:55:35 +0200
: From: Arturas Mazeika 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: versions of documentation: suggestion for improvement
: 
: Hi Solr-Team,
: 
: If I google for specific features for solr, I usually get redirected to 6.6
: version of the documentation, like this one:
: 
: 
https://lucene.apache.org/solr/guide/6_6/overview-of-documents-fields-and-schema-design.html
: 
: Since I am playing with 7.2 version of solr, I almost always need to change
: this manually through to:
: 
: 
https://lucene.apache.org/solr/guide/7_2/overview-of-documents-fields-and-schema-design.html
: 
: (by clicking on the url, going to the number, and replacing two
: characters). This is somewhat cumbersome (especially after the first dozen
: of changes in urls. Suggestion:
: 
: (1) Would it make sense to include other versions of the document as urls
: on the page? See, e.g., the following documentation of postgres, where each
: page has a pointer to the same page in different versions:
: 
: https://www.postgresql.org/docs/9.6/static/sql-createtable.html
: 
: (especially "This page in other versions: 9.3
:  / 9.4
:  / 9.5
:  / *9.6* /
: current
:  (10
: )" line on
: the page)
: 
: (2) Would it make sense in addition to include "current", pointing to the
: latest current release?
: 
: This would help to find solr relevant infos from search engines faster.
: 
: Cheers,
: Arturas
: 

-Hoss
http://www.lucidworks.com/


Re: SolrCloud cluster does not accept new documents for indexing

2018-04-23 Thread Denis Demichev
I conducted another experiment today with local SSD drives, but this did
not seem to fix my problem.
Don't see any extensive I/O in this case:


Device:            tps    kB_read/s    kB_wrtn/s    kB_read      kB_wrtn

xvda              1.76        88.83         5.52    1256191        77996

xvdb             13.95       111.30     56663.93    1573961    801303364

xvdb - is the device where SolrCloud is installed and data files are kept.

What I see:
- There are 17 "Lucene Merge Thread #..." threads running. Some of them are
blocked, some of them are RUNNING.
- updateExecutor-N-thread-M threads are in parked mode and the number of docs
that I am able to submit is still low.
- Tried changing maxIndexingThreads, setting it to something high. This seems
to prolong the time during which the cluster accepts new indexing requests and
keeps CPU utilization a lot higher while the cluster is merging indexes.

Could anyone please point me in the right direction (documentation or Java
classes) where I can read about how data is passed from the updateExecutor
thread pool to the merge threads? I assume there is some internal blocking
queue or something similar.
I still cannot wrap my head around how Solr blocks incoming connections.
Non-merged indexes are not kept in memory, so I don't clearly understand why
Solr cannot keep writing index files to disk while other threads are merging
indexes (since merging is a continuous process anyway).
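
For reference, the stall described above is implemented by the merge
scheduler: ConcurrentMergeScheduler pauses incoming indexing threads once too
many merges are pending. A sketch of the relevant solrconfig.xml section (the
numbers are purely illustrative, not a recommendation):

<!-- Sketch only: ConcurrentMergeScheduler stalls writer threads when pending
     merges exceed maxMergeCount; maxThreadCount bounds concurrent merges. -->
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">6</int>
    <int name="maxThreadCount">2</int>
  </mergeScheduler>
</indexConfig>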

Does anyone use the SPM monitoring tool for this type of problem? Is it of any
use at all?


Thank you in advance.



Regards,
Denis


On Fri, Apr 20, 2018 at 1:28 PM Denis Demichev  wrote:

> Mikhail,
>
> Sure, I will keep everyone posted. Moving to non-HVM instance may take
> some time, so hopefully I will be able to share my observations in the next
> couple of days or so.
> Thanks again for all the help.
>
> Regards,
> Denis
>
>
> On Fri, Apr 20, 2018 at 6:02 AM Mikhail Khludnev  wrote:
>
>> Denis, please let me know what it ends up with. I'm really curious
>> regarding this case and AWS instance flavours. FWIW, since 7.4 we'll have
>> ioThrottle=false option.
>>
>> On Thu, Apr 19, 2018 at 11:06 PM, Denis Demichev 
>> wrote:
>>
>>> Mikhail, Erick,
>>>
>>> Thank you.
>>>
>>> What just occurred to me - we don't use local SSD but instead we're
>>> using EBS volumes.
>>> This was a wrong instance type that I looked at.
>>> Will try to set up a cluster with SSD nodes and retest.
>>>
>>> Regards,
>>> Denis
>>>
>>>
>>> On Thu, Apr 19, 2018 at 2:56 PM Mikhail Khludnev 
>>> wrote:
>>>
 I'm not sure it's the right context, but here is one guy showing a really
 low throttle boundary:

 https://issues.apache.org/jira/browse/SOLR-11200?focusedCommentId=16115348=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16115348


 On Thu, Apr 19, 2018 at 8:37 PM, Mikhail Khludnev 
 wrote:

> Threads are hanging on merge I/O throttling
>
> at 
> org.apache.lucene.index.MergePolicy$OneMergeProgress.pauseNanos(MergePolicy.java:150)
> at 
> org.apache.lucene.index.MergeRateLimiter.maybePause(MergeRateLimiter.java:148)
> at 
> org.apache.lucene.index.MergeRateLimiter.pause(MergeRateLimiter.java:93)
> at 
> org.apache.lucene.store.RateLimitedIndexOutput.checkRate(RateLimitedIndexOutput.java:78)
>
> It seems odd. Please confirm that you don't commit on every update
> request.
> The only way to monitor I/O throttling is to enable infoStream and read
> a lot of logs.
>
>
> On Thu, Apr 19, 2018 at 7:59 PM, Denis Demichev 
> wrote:
>
>> Erick,
>>
>> Thank you for your quick response.
>>
>> I/O bottleneck: Please see another screenshot attached, as you can
>> see disk r/w operations are pretty low or not significant.
>> iostat==
>> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
>> xvda       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>           12.52    0.00    0.00    0.00    0.00   87.48
>>
>> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
>> xvda       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>           12.51    0.00    0.00    0.00    0.00   87.49
>>
>> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
>> xvda       0.00    0.00   0.00   0.00    0.00    0.00

RE: Solr 6.6.2 Master/Slave SSL Replication Error

2018-04-23 Thread Kelly Rusk
Hello all,

I added the incorrect certificate and can clearly see the certificate in my 
keystore when I run the following command:

keytool -list -v -keystore D:\Solr\solr-6.6.2\server\etc\solr-ssl.keystore.pfx 
-storepass mypass

However, I can't remove it, as this command states "keytool error:
java.lang.Exception: Alias <MyCert> does not exist":

keytool -delete -alias "MyCert" -keystore 
D:\Solr\solr-6.6.2\server\etc\solr-ssl.keystore.pfx -storepass mypass

How can it show it in the store, but not delete it? If I try to import it 
again, it says it can't import because it already exists in the store!
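
One thing worth checking (a sketch, reusing the paths and password from above):
the alias actually recorded in a PKCS12 store can differ from the friendly name
you expect, so list just the entries and feed the exact alias string back to
-delete.

REM Sketch only: the listing prints the alias on each entry line; delete using
REM the alias exactly as it appears there.
keytool -list -keystore D:\Solr\solr-6.6.2\server\etc\solr-ssl.keystore.pfx ^
    -storetype PKCS12 -storepass mypass

keytool -delete -alias "alias-copied-from-the-listing" ^
    -keystore D:\Solr\solr-6.6.2\server\etc\solr-ssl.keystore.pfx ^
    -storetype PKCS12 -storepass mypass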

Thanks,

Kelly

-Original Message-
From: Kelly Rusk [mailto:kelly.r...@rackspace.com] 
Sent: Sunday, April 22, 2018 8:51 PM
To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
Subject: Re: Solr 6.6.2 Master/Slave SSL Replication Error

Makes perfect sense! Should I use keytool to import the certs? If so, do you
have an example you prefer, or should I just pull from the docs?

Regards,

Kelly
_
From: Shawn Heisey 
Sent: Sunday, April 22, 2018 8:40 PM
Subject: Re: Solr 6.6.2 Master/Slave SSL Replication Error
To: 


On 4/22/2018 6:27 PM, Kelly Rusk wrote:
> Thanks for the assistance. The Master Server has a self-signed Cert with its 
> machine name, and the Slave has a self-signed Cert with its machine name.
>
> They have identical configurations, and I created a keystore per server. 
> Should I import the self-signed Cert into each other's keystore? Or are you 
> stating that I need to copy the keystore over to the Slave instead of having 
> the one I created?

For the way you have it now, the trust store will need all of the certificates 
of all of the servers.  It's the remote certificate that must be validated, so 
having just the local certificate in the trust store doesn't do you any good.

A better option would be to have one certificate that covers all of the names 
you're using, and have all the servers set up identically.
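
A sketch of that single-certificate approach (the host names below are
placeholders; generate once, then copy the same keystore to both master and
slave):

REM Sketch only: one self-signed key pair whose SubjectAlternativeName covers
REM every host name used for replication, shared by all servers.
keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 ^
    -keystore solr-ssl.keystore.pfx -storetype PKCS12 ^
    -storepass mypass -keypass mypass ^
    -dname "CN=solr-master.example.com, OU=Search, O=Example" ^
    -ext "SAN=dns:solr-master.example.com,dns:solr-slave.example.com"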

Thanks,
Shawn






Re: SolrCloud DIH (Data Import Handler) MySQL 404

2018-04-23 Thread Mikhail Khludnev
this one was caused by repeating command params

curl
"http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-im
port=true=true=reload-config"


500 647 java.util.Arrays$ArrayList cannot be cast to java.lang.String
java.lang.ClassCastException: java.util.Arrays$ArrayList cannot be cast to java.lang.String
at
org.apache.solr.handler.dataimport.RequestInfo.init(RequestInfo.java
:52)
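
In other words (a sketch; the host and handler names come from the thread, the
extra parameter names are assumptions), each DIH command should go in its own
request carrying a single command parameter:

# Sketch only: reload-config and full-import as two separate requests,
# each with exactly one command parameter.
curl "http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=reload-config"
curl "http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-import&clean=true&commit=true"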


On Mon, Apr 23, 2018 at 5:30 PM, msaunier  wrote:

> I have add debug:
>
> curl
> "http://srv-formation-solr:8983/solr/arguments_test/test_
> dih?command=full-im
> port=true=true"
> 
> 
> 500 name="QTime">588 name="runtimeLib">true1 name="defaults"> name="config">DIH/indexation_events.xml name="command">full-import name="trace">java.lang.NullPointerException
> at
> org.apache.solr.handler.dataimport.DataImporter.
> doFullImport(DataImporter.ja
> va:420)
> at
> org.apache.solr.handler.dataimport.DataImporter.
> runCmd(DataImporter.java:474
> )
> at
> org.apache.solr.handler.dataimport.DataImportHandler.
> handleRequestBody(DataI
> mportHandler.java:180)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.
> java:173)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> at org.apache.solr.servlet.HttpSolrCall.call(
> HttpSolrCall.java:529)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:
> 361)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:
> 305)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler
> .java:1691)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143
> )
> at
> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java
> :226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java
> :1180)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at
> org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:
> 185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:
> 1112)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141
> )
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.
> handle(ContextHand
> lerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.
> java:119)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:1
> 34)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
> RewriteHandler.java:
> 335)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:1
> 34)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(
> HttpChannel.java:320)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:251)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConne
> ction.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(
> FillInterest.java:95)
> at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:
> 93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> executeProduceC
> onsume(ExecuteProduceConsume.java:303)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> produceConsume(
> ExecuteProduceConsume.java:148)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> ExecuteProd
> uceConsume.java:136)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:
> 671)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> QueuedThreadPool.java:5
> 89)
> at java.lang.Thread.run(Thread.java:748)
> 500
> 
>
> ###
> And reload config:
>
> curl
> "http://srv-formation-solr:8983/solr/arguments_test/test_
> dih?command=full-im
> port=true=true=reload-config"
> 
> 
> 500 name="QTime">647 name="msg">java.util.Arrays$ArrayList cannot be cast to
> java.lang.Stringjava.lang.ClassCastException:
> java.util.Arrays$ArrayList cannot be cast to java.lang.String
> at
> org.apache.solr.handler.dataimport.RequestInfo.
> init(RequestInfo.java
> :52)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.
> handleRequestBody(DataI
> mportHandler.java:128)
> at
> 

Re[2]: Optimize question

2018-04-23 Thread Scott M.
So basically I made the first mistake by optimizing? At this point, since it
seems I can't stop these optimizations from running, should I just drop all
the data and start fresh?
On Mon, Apr 23, 2018 at 01:23 PM, Erick Erickson  wrote:
No, it's not "optimizing on its own". At least it better not be.

As far as your index growing after optimize, that's the little
"gotcha" with optimize, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
 
(https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/)

This is being addressed in the 7.4 time frame (hopefully), see LUCENE-7976.

Best,
Erick

On Mon, Apr 23, 2018 at 10:13 AM, Scott M.  wrote:
I recently installed Solr 7.1 and configured it to work with Dovecot for 
full-text searching. It works great but after about 2 days of indexing, I've 
pressed the 'Optimize' button. At that point it had collected about 17 million 
documents and it was taking up about 60-70GB of space.

It completed once and the space dropped down to 30-45GB but since then it 
appears to be doing Optimize again on its own, regularly swelling up the total 
space used to double, then it shrinks again, stays a bit that way then it 
starts another optimize!

Logs show:
4/22/2018, 11:04:22 PM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with care.
4/23/2018, 3:18:35 AM
WARN true
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with care.
4/23/2018, 7:33:46 AM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with care.
4/23/2018, 9:48:32 AM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with care.
4/23/2018, 11:25:13 AM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with care.
4/23/2018, 1:00:42 PM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with care.
It's absolutely killing the computer this is running on. Now it just started 
another run...

In the logs all I see is entries like these, and it doesn't say anywhere 
optimize=true

2018-04-23 17:12:31.995 INFO  (qtp947679291-17200) [   x:dovecot] 
o.a.s.u.DirectUpdateHandler2 start 
commit{_version_=1598557836536709120,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=true,prepareCommit=false}
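
One way to track down where these come from (a sketch; the log path and the
cron line below are assumptions, not taken from this thread): an explicit
optimize arrives as a normal update request, so the log entries around each
"Starting optimize" warning should show which handler and parameters triggered
it. Dovecot fts_solr installations are also often paired with cron jobs that
issue the optimize explicitly; if such an entry exists, it would explain the
regular schedule.

# Sketch only: look at what was logged around each optimize warning
grep -n -i "optimize" /var/solr/logs/solr.log | less

# A cron entry of this shape (illustrative only) is a common companion to
# Dovecot's fts_solr plugin and would re-trigger the optimize on a schedule:
#   0 3 * * * curl -s "http://localhost:8983/solr/dovecot/update?optimize=true"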


Re[2]: Optimize question

2018-04-23 Thread Scott M.
I only have one core, 'dovecot'. This is a pretty standard config. How do I
stop it from doing all these optimizes? Is there an automatic process that
triggers them?
On Mon, Apr 23, 2018 at 01:25 PM, Shawn Heisey  wrote:
On 4/23/2018 11:13 AM, Scott M. wrote:
I recently installed Solr 7.1 and configured it to work with Dovecot for 
full-text searching. It works great but after about 2 days of indexing, I've 
pressed the 'Optimize' button. At that point it had collected about 17 million 
documents and it was taking up about 60-70GB of space.

It completed once and the space dropped down to 30-45GB but since then it 
appears to be doing Optimize again on its own, regularly swelling up the total 
space used to double, then it shrinks again, stays a bit that way then it 
starts another optimize!

Are you running in SolrCloud mode with multiple replicas and/or multiple 
shards?

If so, SolrCloud does optimize a little differently than standalone 
mode.  It will optimize every core in the entire collection, one at a 
time, regardless of which actual core receives the optimize request.  In 
standalone mode, only the specific core you run the command on will be 
optimized.

Thanks,
Shawn


Re: Optimize question

2018-04-23 Thread Shawn Heisey

On 4/23/2018 11:13 AM, Scott M. wrote:

I recently installed Solr 7.1 and configured it to work with Dovecot for 
full-text searching. It works great but after about 2 days of indexing, I've 
pressed the 'Optimize' button. At that point it had collected about 17 million 
documents and it was taking up about 60-70GB of space.

It completed once and the space dropped down to 30-45GB but since then it 
appears to be doing Optimize again on its own, regularly swelling up the total 
space used to double, then it shrinks again, stays a bit that way then it 
starts another optimize!


Are you running in SolrCloud mode with multiple replicas and/or multiple 
shards?


If so, SolrCloud does optimize a little differently than standalone 
mode.  It will optimize every core in the entire collection, one at a 
time, regardless of which actual core receives the optimize request.  In 
standalone mode, only the specific core you run the command on will be 
optimized.


Thanks,
Shawn



Re: Optimize question

2018-04-23 Thread Erick Erickson
No, it's not "optimizing on its own". At least it better not be.

As far as your index growing after optimize, that's the little
"gotcha" with optimize, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

This is being addressed in the 7.4 time frame (hopefully), see LUCENE-7976.

Best,
Erick

On Mon, Apr 23, 2018 at 10:13 AM, Scott M.  wrote:
> I recently installed Solr 7.1 and configured it to work with Dovecot for 
> full-text searching. It works great but after about 2 days of indexing, I've 
> pressed the 'Optimize' button. At that point it had collected about 17 
> million documents and it was taking up about 60-70GB of space.
>
> It completed once and the space dropped down to 30-45GB but since then it 
> appears to be doing Optimize again on its own, regularly swelling up the 
> total space used to double, then it shrinks again, stays a bit that way then 
> it starts another optimize!
>
> Logs show:
> 4/22/2018, 11:04:22 PM
> WARN false
> DirectUpdateHandler2
> Starting optimize... Reading and rewriting the entire index! Use with 
> care.
> 4/23/2018, 3:18:35 AM
> WARN true
> DirectUpdateHandler2
> Starting optimize... Reading and rewriting the entire index! Use with 
> care.
> 4/23/2018, 7:33:46 AM
> WARN false
> DirectUpdateHandler2
> Starting optimize... Reading and rewriting the entire index! Use with 
> care.
> 4/23/2018, 9:48:32 AM
> WARN false
> DirectUpdateHandler2
> Starting optimize... Reading and rewriting the entire index! Use with 
> care.
> 4/23/2018, 11:25:13 AM
> WARN false
> DirectUpdateHandler2
> Starting optimize... Reading and rewriting the entire index! Use with 
> care.
> 4/23/2018, 1:00:42 PM
> WARN false
> DirectUpdateHandler2
> Starting optimize... Reading and rewriting the entire index! Use with 
> care.
> It's absolutely killing the computer this is running on. Now it just started 
> another run...
>
> In the logs all I see is entries like these, and it doesn't say anywhere 
> optimize=true
>
> 2018-04-23 17:12:31.995 INFO  (qtp947679291-17200) [   x:dovecot] 
> o.a.s.u.DirectUpdateHandler2 start 
> commit{_version_=1598557836536709120,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=true,prepareCommit=false}


Optimize question

2018-04-23 Thread Scott M.
I recently installed Solr 7.1 and configured it to work with Dovecot for 
full-text searching. It works great but after about 2 days of indexing, I've 
pressed the 'Optimize' button. At that point it had collected about 17 million 
documents and it was taking up about 60-70GB of space. 

It completed once and the space dropped down to 30-45GB but since then it 
appears to be doing Optimize again on its own, regularly swelling up the total 
space used to double, then it shrinks again, stays a bit that way then it 
starts another optimize!

Logs show:
4/22/2018, 11:04:22 PM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with 
care.
4/23/2018, 3:18:35 AM
WARN true
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with 
care.
4/23/2018, 7:33:46 AM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with 
care.
4/23/2018, 9:48:32 AM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with 
care.
4/23/2018, 11:25:13 AM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with 
care.
4/23/2018, 1:00:42 PM
WARN false
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with 
care.
It's absolutely killing the computer this is running on. Now it just started 
another run...

In the logs all I see is entries like these, and it doesn't say anywhere 
optimize=true

2018-04-23 17:12:31.995 INFO  (qtp947679291-17200) [   x:dovecot] 
o.a.s.u.DirectUpdateHandler2 start 
commit{_version_=1598557836536709120,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=true,prepareCommit=false}


Re: SolrCloud DIH (Data Import Handler) MySQL 404

2018-04-23 Thread Shawn Heisey

On 4/23/2018 8:30 AM, msaunier wrote:

I have add debug:

curl
"http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-im
port=true=true"


500588true1DIH/indexation_events.xml

This is looking like a really nasty error that I cannot understand, 
possibly caused by an error in configuration.


Can you share your dataimport handler config (will likely be in 
solrconfig.xml) and the contents of DIH/indexation_events.xml?  There is 
probably a database password in that file, you'll want to redact that.
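
For comparison, a conventional (non-runtimeLib) DIH declaration usually looks
roughly like the following; the directory paths and file names are
placeholders, not taken from this thread:

<!-- Sketch only: load the DIH and JDBC driver jars with <lib>, and point the
     handler at a data-config file stored alongside solrconfig.xml. -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="/opt/solr/lib/" regex="mysql-connector-java-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>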


You should look at solr.log and see if there are other errors happening 
that didn't make it into the response.


Thanks,
Shawn



Re: SolrCloud DIH (Data Import Handler) MySQL 404

2018-04-23 Thread Shawn Heisey

On 4/23/2018 6:12 AM, msaunier wrote:

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

Solr 6.6 in Cloud.

##

COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

RESULT:


   
 
 Error 404 Not Found
   
   HTTP ERROR 404
 Problem accessing /solr/test_dih. Reason:
   Not Found
   



This looks like an incomplete URL.

What exactly is test_dih?  If it is the name of your collection, then 
you are missing the handler, which is usually "/dataimport". If 
"/test_dih" is the name of your handler, then you are missing the name 
of the core or the collection.


With SolrCloud, it's actually better to direct your request to a 
specific core for DIH, something like collection_shard1_replica1.  If 
you direct it to the collection you never know which core will actually 
end up with the request, and will have a hard time getting the status of 
the import if the status request ends up on a different core than the 
full-import command.


A correct full URL should look something like this:

http://host:port/solr/test_shard1_replica2/dataimport?command=full-import

Looking at later messages, you may have figured this out at least 
partially.  The exception in your second message looks really odd.  (and 
I really have no idea what you are talking about with an overlay)


Thanks,
Shawn



Re: Running an analyzer chain in an update request processor

2018-04-23 Thread Steve Rowe
Hi Walter,

I haven’t seen this before, but it looks like 
https://bugs.java.com/view_bug.do?bug_id=8071775
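
A workaround that is sometimes suggested for that bug (untested here) is to
hand getAttribute() a real java.lang.Class via the type object's .class
property, rather than passing the Nashorn StaticClass directly:

// Sketch only: obtain the java.lang.Class explicitly before calling
// getAttribute(), instead of passing the Nashorn type object itself.
var CharTermAttribute = Java.type(
    "org.apache.lucene.analysis.tokenattributes.CharTermAttribute");
var term_att = token_stream.getAttribute(CharTermAttribute.class);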

--
Steve
www.lucidworks.com

> On Apr 20, 2018, at 7:54 PM, Walter Underwood  wrote:
> 
> I’m back.
> 
> I think I’m following the steps in Eric Hatcher’s slides: 
> https://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks
> 
> With a few minor changes, like using getIndexAnalyzer() because getAnalyzer() 
> is gone. And I’ve pulled the subroutine code into the main processAdd 
> function.
> 
> Any ideas about the cause of this error?
> 
> java.lang.ClassCastException: Cannot cast 
> jdk.internal.dynalink.beans.StaticClass to java.lang.Class
>   at 
> java.lang.invoke.MethodHandleImpl.newClassCastException(MethodHandleImpl.java:361)
>   at 
> java.lang.invoke.MethodHandleImpl.castReference(MethodHandleImpl.java:356)
>   at 
> jdk.nashorn.internal.scripts.Script$Recompilation$37$104A$\^eval\_.processAdd(:15)
> 
> This is the code up through line 15:
> 
>// Generate minhashes using the "minhash" analyzer chain
>var analyzer = 
> req.getCore().getLatestSchema().getFieldTypeByName('minhash').getIndexAnalyzer();
>var hashes = [];
>var token_stream = analyzer.tokenStream(null, new 
> java.io.StringReader(question));
>var term_att = 
> token_stream.getAttribute(Packages.org.apache.lucene.analysis.tokenattributes.CharTermAttribute);
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Apr 7, 2018, at 9:50 AM, Walter Underwood  wrote:
>> 
>> As I think more about this, we should have a signature processor that uses 
>> minhash. The MD5 signature processor was really easy to use.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org 
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Apr 7, 2018, at 4:55 AM, Emir Arnautović >> > wrote:
>>> 
>>> Hi Walter,
>>> I did this sample processor for the purpose of having doc values on 
>>> analysed field: https://github.com/od-bits/solr-multivaluefield-processor 
>>>  
>>> >> >
>>> 
>>> (+ related blog: 
>>> http://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html 
>>>  
>>> >> >)
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ 
>>> 
>>> 
>>> 
>>> 
 On 6 Apr 2018, at 23:46, Walter Underwood > wrote:
 
 Is there an easy way to define an analyzer chain in schema.xml then run it 
 in an update request processor?
 
 I want to run a chain ending in the minhash token filter, then take those 
 minhashes, convert them to hex, and put them in a string field. I’d like 
 the values stored.
 
 It seems like this could all work in an update request processor. Grab the 
 text from one field, run it through the chain, format the output tokens 
 and add them to the field for hashes.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org 
 http://observer.wunderwood.org/  (my blog)
 
>>> 
>> 
> 



RE: SolrCloud DIH (Data Import Handler) MySQL 404

2018-04-23 Thread msaunier
I have add debug:

curl
"http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-im
port=true=true"


500 588 true 1 DIH/indexation_events.xml full-import java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.ja
va:420)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474
)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataI
mportHandler.java:180)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler
.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143
)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java
:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java
:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:
185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:
1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141
)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHand
lerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.
java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1
34)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:
335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1
34)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConne
ction.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:
93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceC
onsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(
ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProd
uceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:
671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:5
89)
at java.lang.Thread.run(Thread.java:748)
500


###
And reload config:

curl
"http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-im
port=true=true=reload-config"


500 647 java.util.Arrays$ArrayList cannot be cast to java.lang.String
java.lang.ClassCastException: java.util.Arrays$ArrayList cannot be cast to java.lang.String
at
org.apache.solr.handler.dataimport.RequestInfo.init(RequestInfo.java
:52)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataI
mportHandler.java:128)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler
.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143
)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java
:226)
   

RE: SolrCloud DIH (Data Import Handler) MySQL 404

2018-04-23 Thread msaunier
I have corrected the URL to: curl
http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-import

And changed the overlay config from
"/configs/arguments_test/DIH/indexation_events.xml" to
"DIH/indexation_events.xml"

But I have a new error:

Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to PropertyWriter implementation:ZKPropertiesWriter
at
org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImp
orter.java:330)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.ja
va:411)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474
)
at
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImport
er.java:457)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
at
org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImp
orter.java:326)
... 4 more

Regards,





-----Original Message-----
From: msaunier [mailto:msaun...@citya.com]
Sent: Monday, April 23, 2018 2:12 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud DIH (Data Import Handler) MySQL 404

Hello,

 

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

 

Solr 6.6 in Cloud.

 

##

COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

 

RESULT:



  



Error 404 Not Found

  

  HTTP ERROR 404

Problem accessing /solr/test_dih. Reason:

  Not Found

  



 

 

##

CONFIG:

1.  I have create with the command the .system collection

2.  I have post in the blob the DataImportHandler jar file and the MySQL
connector jar

3.  I have add data-import-handler and mysql-connector-java runtimeLib
on the configoverlay.json file with the API

4.  I have create the DIH folder on the cloud with zkcli.sh script

5.  I have push with zkcli the DIH .xml configuration file

 

CONFIGOVERLAY CONTENT :

{

  "runtimeLib":{

"mysql-connector-java":{

  "name":"mysql-connector-java",

  "version":1},

"data-import-handler":{

  "name":"data-import-handler",

  "version":1}},

  "requestHandler":{"/test_dih":{

  "name":"/test_dih",

  "class":"org.apache.solr.handler.dataimport.DataImportHandler",

  "runtimeLib":true,

  "version":1,

 
"defaults":{"config":"/configs/arguments_test/DIH/indexation_events.xml"}}}

}

 

 

Thanks for your help




SolrCloud DIH (Data Import Handler) MySQL 404

2018-04-23 Thread msaunier
Hello,

 

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

 

Solr 6.6 in Cloud.

 

##

COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

 

RESULT:



  



Error 404 Not Found

  

  HTTP ERROR 404

Problem accessing /solr/test_dih. Reason:

  Not Found

  



 

 

##

CONFIG:

1.  I have created the .system collection with the create command

2.  I have posted the DataImportHandler jar file and the MySQL connector jar
to the blob store

3.  I have added the data-import-handler and mysql-connector-java runtimeLib
entries to the configoverlay.json file with the API (see the sketch after
this list)

4.  I have created the DIH folder on the cloud with the zkcli.sh script

5.  I have pushed the DIH .xml configuration file with zkcli
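
For reference, steps 2 and 3 above usually correspond to commands along these
lines (the jar file name and version are placeholders, not taken from this
thread):

# Sketch only: upload the jar to the .system blob store, then register it as a
# runtimeLib for the collection through the Config API.
curl -X POST -H 'Content-Type: application/octet-stream' \
    --data-binary @solr-dataimporthandler-6.6.0.jar \
    "http://srv-formation-solr:8983/solr/.system/blob/data-import-handler"

curl "http://srv-formation-solr:8983/solr/arguments_test/config" \
    -H 'Content-Type: application/json' \
    -d '{"add-runtimelib": {"name": "data-import-handler", "version": 1}}'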

 

CONFIGOVERLAY CONTENT :

{

  "runtimeLib":{

"mysql-connector-java":{

  "name":"mysql-connector-java",

  "version":1},

"data-import-handler":{

  "name":"data-import-handler",

  "version":1}},

  "requestHandler":{"/test_dih":{

  "name":"/test_dih",

  "class":"org.apache.solr.handler.dataimport.DataImportHandler",

  "runtimeLib":true,

  "version":1,

 
"defaults":{"config":"/configs/arguments_test/DIH/indexation_events.xml"}}}

}

 

 

Thanks for your help



Regarding Solr Admin "LoadTermInfo" section

2018-04-23 Thread Bharat Mishra
I am facing an issue with regard to the "Load term info" section.
Using the Solr update collection API,
http://localhost:8983/solr/[my_core_name]/update?commit=true=[custom_query]
I have deleted records. When I search for those records in Solr there are
none, that is verified, but the "Load term info" section in the Solr admin UI
still shows the count of those deleted records against the field name. So my
concern is: are the records hard deleted, or does Solr just set a flag for
them so that they are excluded from search results but still reside on disk?
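
For what it's worth, Lucene deletes are indeed only flags until the affected
segments are merged away, which is why term statistics can still count deleted
documents. A sketch of forcing those segments to be merged (the core name is a
placeholder):

# Sketch only: a commit with expungeDeletes=true asks Solr to merge segments
# that contain deletions, after which the deleted docs' terms disappear.
curl "http://localhost:8983/solr/my_core_name/update?commit=true&expungeDeletes=true"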

-- 

*Bharat Mohan Mishra*
SSE
HighQ India Private Limited


versions of documentation: suggestion for improvement

2018-04-23 Thread Arturas Mazeika
Hi Solr-Team,

If I google for specific features for solr, I usually get redirected to 6.6
version of the documentation, like this one:

https://lucene.apache.org/solr/guide/6_6/overview-of-documents-fields-and-schema-design.html

Since I am playing with 7.2 version of solr, I almost always need to change
this manually through to:

https://lucene.apache.org/solr/guide/7_2/overview-of-documents-fields-and-schema-design.html

(by clicking on the URL, going to the version number, and replacing two
characters). This is somewhat cumbersome (especially after the first dozen
of URL changes). Suggestion:

(1) Would it make sense to include other versions of the document as urls
on the page? See, e.g., the following documentation of postgres, where each
page has a pointer to the same page in different versions:

https://www.postgresql.org/docs/9.6/static/sql-createtable.html

(especially "This page in other versions: 9.3
 / 9.4
 / 9.5
 / *9.6* /
current
 (10
)" line on
the page)

(2) Would it make sense, in addition, to include a "current" link pointing to
the latest release?

This would help users coming from search engines find the relevant Solr info
faster.

Cheers,
Arturas