Re: converting string to solr.TextField

2020-10-17 Thread Shawn Heisey

On 10/17/2020 6:23 AM, Vinay Rajput wrote:

> That said, one more time I want to come back to the same question: why
> can Solr/Lucene not handle this when we are updating all the documents?
> Let's take a couple of examples:
>
> *Ex 1:*
> Let's say I have only 10 documents in my index and all of them are in a
> single segment (Segment 1). Now, I change the schema (update the field
> type in this case) and reindex all of them.
> This is what (according to me) should happen internally:
>
> 1st update req: Solr will mark the 1st doc as deleted and index it again
> (might run the analyser chain based on config)
> 2nd update req: Solr will mark the 2nd doc as deleted and index it again
> (might run the analyser chain based on config)
> And so on...
> Based on the autoSoftCommit/autoCommit configuration, all new documents
> will be indexed and probably flushed to disk as part of a new segment
> (Segment 2)
>
> *Ex 2:*
> I see that it can be an issue if we think about reindexing millions of
> docs, because in that case merging can be triggered when indexing is
> halfway through, and since there are some live docs in the old segment
> (with the old config), things will blow up. Please correct me if I am
> wrong.


If you could guarantee a few things, you could be sure this will work. 
But it's a serious long shot.


The change in schema might be such that when Lucene tries to merge them, 
it fails because the data in the old segments is incompatible with the 
new segments.  If that happens, then you're sunk ... it won't work at all.


If the merges of old and new segments are successful, then you would 
have to optimize the index after you're done indexing to be SURE there 
were no old documents remaining.  Lucene calls that operation 
"ForceMerge".  This operation is disruptive and can take a very long time.

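For illustration, a minimal SolrJ sketch of that forceMerge step (the URL
and core name here are placeholders, not anything from this thread):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ForceMergeExample {
  public static void main(String[] args) throws Exception {
    // Placeholder core URL; point this at the index you just rebuilt.
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      // optimize() issues Lucene's forceMerge; maxSegments=1 rewrites the
      // whole index, which is why it is disruptive and can take so long.
      client.optimize(true, true, 1);
    }
  }
}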

You would also have to be sure there was no query activity until the 
update/merge is completely done.  Which probably means that you'd want 
to work on a copy of the index in another collection.  And if you're 
going to do that, you might as well start indexing from scratch into a 
new/empty collection.  That would also allow you to continue querying 
the old collection until the new one was ready.
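If you do rebuild into a new collection, a collection alias gives you that
atomic cutover. A minimal sketch, assuming hypothetical collection names
(queries go through an alias named "products", and products_v2 is the
freshly built collection):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class AliasCutoverExample {
  public static void main(String[] args) throws Exception {
    // Placeholder base URL; collection admin requests go to the Solr root,
    // not to an individual core.
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // Repoint the "products" alias at products_v2; clients querying the
      // alias switch to the new collection atomically.
      CollectionAdminRequest.createAlias("products", "products_v2").process(client);
    }
  }
}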


Thanks,
Shawn


Re: converting string to solr.TextField

2020-10-17 Thread Erick Erickson
Did you already read the long explanation about segment merging earlier
in this thread? If so, can you ask specific questions about the
information in it?

Best,
Erick

> On Oct 17, 2020, at 8:23 AM, Vinay Rajput  wrote:
> [quoted text snipped]


Re: converting string to solr.TextField

2020-10-17 Thread Walter Underwood
Because Solr is not updating documents. Solr is adding to indexes
of fields. You cannot add a TextField document to a StringField index.
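
To make that concrete, here is a minimal schema sketch (field and type
names are illustrative, not from this thread). A StrField indexes the whole
value as one untokenized term, while a TextField runs an analysis chain and
indexes many terms, so segments written under the two definitions hold
structurally different data for the same field:

<!-- Untokenized: the entire value becomes a single indexed term -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Tokenized: the value is analyzed into many indexed terms -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Switching this field's type between the two leaves old segments with
     index data the new definition cannot use -->
<field name="title" type="text_general" indexed="true" stored="true"/>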

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 17, 2020, at 5:23 AM, Vinay Rajput  wrote:
> [quoted text snipped]


Re: Index Replication Failure

2020-10-17 Thread Parshant Kumar
*Architecture is master->repeater->slave servers in hierarchy.*

*One of the below exceptions occurs whenever replication fails.*

1)
WARN : Error in fetching file: _4rnu_t.liv (downloaded 0 of 11505507
bytes)
java.io.EOFException: Unexpected end of ZLIB input stream
at
java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at
org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
at
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
at
org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:139)
at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:166)
at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1443)
at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1409)

2)
WARN : Error getting file length for [segments_568]
java.nio.file.NoSuchFileException:
/data/solr/search/application/core-conf/im-search/data/index.20200711012319226/segments_568
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
at
org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:615)
at
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:588)
at
org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:335)


3)
WARN : Error in fetching file: _4nji.nvd (downloaded 507510784 of 555377795
bytes)
org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
at
org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)
at
org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
at
org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
at
org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at
java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238)
at
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at
org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
at
org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:128)
at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:166)
at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1458)
at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1409)
at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1390)
at
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:872)
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:438)
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)


*The replication configuration of the master, repeater, and slaves is given below:*

 

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
  <lst name="slave">
    <str name="pollInterval">00:00:10</str>
  </lst>
</requestHandler>



*The commit configuration of the master, repeater, and slaves is given below:*

 
<autoCommit>
  <maxDocs>10</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>






On Sat, Oct 17, 2020 at 5:12 PM Erick Erickson 
wrote:

> None of your images made it through the mail server. You’ll
> have to put them somewhere and provide a link.
>
> > On Oct 17, 2020, at 5:17 AM, Parshant Kumar <
> parshant.ku...@indiamart.com.INVALID> wrote:
> > [quoted text snipped]

Re: converting string to solr.TextField

2020-10-17 Thread Vinay Rajput
Sorry to jump into this discussion. I also get confused whenever I see this
strange Solr/Lucene behaviour. Probably, as @Erick said in his talk last
year, this is how it has been designed to avoid many problems that are
hard/impossible to solve.

That said, one more time I want to come back to the same question: why
can Solr/Lucene not handle this when we are updating all the documents?
Let's take a couple of examples:

*Ex 1:*
Let's say I have only 10 documents in my index and all of them are in a
single segment (Segment 1). Now, I change the schema (update the field
type in this case) and reindex all of them.
This is what (according to me) should happen internally:

1st update req: Solr will mark the 1st doc as deleted and index it again
(might run the analyser chain based on config)
2nd update req: Solr will mark the 2nd doc as deleted and index it again
(might run the analyser chain based on config)
And so on...
Based on the autoSoftCommit/autoCommit configuration, all new documents
will be indexed and probably flushed to disk as part of a new segment
(Segment 2)


Now, whenever segment merging happens (during commit or later in time),
Lucene will create a new segment (Segment 3) and can discard all the docs
present in Segment 1, as there are no live docs in it. And there would
*NOT* be any situation where it has to decide between the old config and
the new config, as there is not even a single live document with the old
config. Isn't it?
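
To picture those update requests in code, here is a minimal SolrJ sketch
(the URL, core name, and fields are illustrative); every add() that reuses
an existing id marks the old copy deleted and indexes a fresh one:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReindexAllExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      for (int i = 1; i <= 10; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("title", "document " + i);
        // Reusing an existing id: the old copy is only marked deleted in
        // Segment 1; the new copy lands in a new segment.
        client.add(doc);
      }
      // The hard commit flushes the new segment (Segment 2 in the example).
      client.commit();
    }
  }
}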

*Ex 2:*
I see that it can be an issue if we think about reindexing millions of
docs, because in that case merging can be triggered when indexing is
halfway through, and since there are some live docs in the old segment
(with the old config), things will blow up. Please correct me if I am
wrong.

I am *NOT* a Solr/Lucene expert and have just started learning how things
work internally. In the above examples, I can be wrong in many places. Can
someone confirm whether scenarios like Ex 2 are the reason that even
re-indexing all documents doesn't help if some incompatible schema changes
are made? Any other insight would also be helpful.

Thanks,
Vinay

On Sat, Oct 17, 2020 at 5:48 AM Shawn Heisey  wrote:

> On 10/16/2020 2:36 PM, David Hastings wrote:
> > sorry, i was thinking just using the
> > *:*
> > method for clearing the index would leave them still
>
> In theory, if you delete all documents at the Solr level, Lucene will
> delete all the segment files on the next commit, because they are empty.
>   I have not confirmed with testing whether this actually happens.
>
> It is far safer to use a new index as Erick has said, or to delete the
> index directories completely and restart Solr ... so you KNOW the index
> has nothing in it.
>
> Thanks,
> Shawn
>
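
For reference, the delete-everything step discussed above is a
deleteByQuery; a minimal SolrJ sketch (URL and core name are placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class DeleteAllExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      client.deleteByQuery("*:*"); // marks every document as deleted
      client.commit(); // segments left with no live docs can then be dropped
    }
  }
}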


Re: Index Replication Failure

2020-10-17 Thread Erick Erickson
None of your images made it through the mail server. You’ll
have to put them somewhere and provide a link.

> On Oct 17, 2020, at 5:17 AM, Parshant Kumar 
>  wrote:
> [quoted text snipped]



Need help in understanding the below error message when running solr-exporter

2020-10-17 Thread yaswanth kumar
Using Solr 8.2; ZooKeeper 3.4; Solr mode: Cloud with multiple collections; Basic
Authentication: Enabled

I am trying to run the solr-exporter as follows:

export JAVA_OPTS="-Djavax.net.ssl.trustStore=etc/solr-keystore.jks
-Djavax.net.ssl.trustStorePassword=solrssl
-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory
-Dbasicauth=solrrocks:"

export
CLASSPATH_PREFIX="../../server/solr-webapp/webapp/WEB-INF/lib/commons-codec-1.11.jar"

/bin/solr-exporter -p 8085 -z localhost:2181/solr -f
./conf/solr-exporter-config.xml -n 16

and I am seeing the messages below; on the Grafana Solr dashboard I do see
panels coming in, but data is not populating on them.

Can someone help me figure out if I am missing something in terms of
configuration?

WARN  - 2020-10-17 11:17:59.687; org.apache.solr.prometheus.scraper.Async;
Error occurred during metrics collection =>
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
java.lang.NullPointerException
at
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
java.lang.NullPointerException
at
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
~[?:?]
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
~[?:?]
at
org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45)
[solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
org.apache.solr.prometheus.scraper.Async$$Lambda$190/.accept(Unknown
Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
[?:?]
at
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
[?:?]
at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
[?:?]
at
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:497) [?:?]
at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:487)
[?:?]
at
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
[?:?]
at
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
[?:?]
at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239) [?:?]
at
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) [?:?]
at
org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43)
[solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
org.apache.solr.prometheus.scraper.Async$$Lambda$165/.apply(Unknown
Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
[?:?]
at
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
[?:?]
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
[?:?]
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
[?:?]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:07]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown
Source) [solr-solrj-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at
org.apache.solr.prometheus.collector.SchedulerMetricsCollector.lambda$collectMetrics$0(SchedulerMetricsCollector.java:92)
~[solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
org.apache.solr.prometheus.collector.SchedulerMetricsCollector$$Lambda$163/.get(Unknown
Source) ~[?:?]
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
~[?:?]
... 5 more
Caused by: java.lang.NullPointerException
at
org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:112)
~[solr-prometheus-exporter-8.2.0.jar:8.2.0
31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
at
org.apache.solr.prometheus.scra

Re: Index Replication Failure

2020-10-17 Thread Parshant Kumar
Architecture image, in case it was not visible in the previous mail:

[image: image.png]


On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar 
wrote:

> [quoted text snipped]

-- 



Index Replication Failure

2020-10-17 Thread Parshant Kumar
Hi all,

We are having solr architecture as below.



*We are facing frequent replication failures between the master and
repeater server, as well as between the repeater and slave servers.*
On checking the logs, we found that one of the below exceptions occurred
every time replication failed.

1)
[image: image.png]
2)
[image: image.png]

3)
[image: image.png]

The replication configuration of the master, repeater, and slaves is given below:

[image: image.png]

The commit configuration of the master, repeater, and slaves is given below:

[image: image.png]

Replication between master and repeater occurs every 10 mins.
Replication between repeater and slave servers occurs every 15 mins
between 4 and 7 am, and after that every 3 hours.

Thanks,
Parshant Kumar

--