Re: converting string to solr.TextField
On 10/17/2020 6:23 AM, Vinay Rajput wrote:
> That said, one more time I want to come back to the same question: why
> Solr/Lucene can not handle this when we are updating all the documents?

If you could guarantee a few things, you could be sure this will work. But it's a serious long shot.

The change in schema might be such that when Lucene tries to merge the old and new segments, it fails because the data in the old segments is incompatible with the new segments. If that happens, then you're sunk ... it won't work at all.

If the merges of old and new segments are successful, then you would have to optimize the index after you're done indexing to be SURE there were no old documents remaining. Lucene calls that operation "forceMerge". This operation is disruptive and can take a very long time.

You would also have to be sure there was no query activity until the update/merge is completely done, which probably means that you'd want to work on a copy of the index in another collection. And if you're going to do that, you might as well start indexing from scratch into a new/empty collection. That would also allow you to continue querying the old collection until the new one is ready.

Thanks,
Shawn
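To make Shawn's two options concrete: at the HTTP level, Lucene's forceMerge is exposed through Solr's "optimize" update parameter, and a collection alias is the usual SolrCloud mechanism for cutting clients over to a freshly rebuilt collection. The sketch below only constructs the request URLs; the host, port, and collection names are made-up placeholders, and no live Solr instance is assumed:

```python
from urllib.parse import urlencode

# Placeholder base URL and collection names -- not from the thread.
base = "http://localhost:8983/solr"

# Option 1: after reindexing in place, forceMerge ("optimize") down to one
# segment so that no segment built under the old schema can survive.
optimize_url = f"{base}/mycollection/update?" + urlencode(
    {"optimize": "true", "maxSegments": 1, "waitSearcher": "true"}
)

# Option 2 (what Shawn recommends): index from scratch into a new, empty
# collection, then atomically repoint clients at it with an alias.
alias_url = f"{base}/admin/collections?" + urlencode(
    {"action": "CREATEALIAS", "name": "mycollection",
     "collections": "mycollection_v2"}
)

print(optimize_url)
print(alias_url)
```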
Re: converting string to solr.TextField
Did you read the long explanations about segment merging already given in this thread? If so, can you ask specific questions about the information in them?

Best,
Erick

> On Oct 17, 2020, at 8:23 AM, Vinay Rajput wrote:
>
> That said, one more time I want to come back to the same question: why
> Solr/Lucene can not handle this when we are updating all the documents?
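For reference, the delete-everything approach mentioned in this thread ("*:*") is the standard update command; a sketch of the request body, posted to a collection's /update handler with commit=true, would be:

```xml
<delete><query>*:*</query></delete>
```

As Shawn notes, even after this the segment files only go away on the next commit, so starting with a truly empty index directory is the safer route.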
Re: converting string to solr.TextField
Because Solr is not updating documents. Solr is adding to indexes of fields. You cannot add a TextField document to a StringField index.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 17, 2020, at 5:23 AM, Vinay Rajput wrote:
>
> That said, one more time I want to come back to the same question: why
> Solr/Lucene can not handle this when we are updating all the documents?
Re: Index Replication Failure
*Architecture is master -> repeater -> slave servers in hierarchy.*

*One of the below exceptions occurs whenever replication fails:*

1) WARN : Error in fetching file: _4rnu_t.liv (downloaded 0 of 11505507 bytes)
java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
    at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
    at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:139)
    at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:166)
    at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1443)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1409)

2) WARN : Error getting file length for [segments_568]
java.nio.file.NoSuchFileException: /data/solr/search/application/core-conf/im-search/data/index.20200711012319226/segments_568
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
    at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
    at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
    at java.nio.file.Files.readAttributes(Files.java:1737)
    at java.nio.file.Files.size(Files.java:2332)
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
    at org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:615)
    at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:588)
    at org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:335)

3) WARN : Error in fetching file: _4nji.nvd (downloaded 507510784 of 555377795 bytes)
org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
    at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)
    at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
    at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
    at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:128)
    at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:166)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1458)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1409)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1390)
    at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:872)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:438)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)

*The replication configuration of master, repeater, and slaves is given below:*

${enable.master:false} commit startup 00:00:10

*The commit configuration of master, repeater, and slaves is given below:*

10false

On Sat, Oct 17, 2020 at 5:12 PM Erick Erickson wrote:

> None of your images made it through the mail server. You’ll
> have to put them somewhere and provide a link.
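The XML markup of the posted configuration was stripped in transit; from the surviving values (${enable.master:false}, replicateAfter on commit and startup, a 00:00:10 poll interval), a ReplicationHandler config of roughly the following shape is likely what was meant. This is a typical shape only, not the poster's actual file, and the masterUrl host and core name are placeholders:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <!-- Placeholder URL: repeaters poll the master, slaves poll the repeater -->
    <str name="masterUrl">http://master-host:8983/solr/core-name</str>
    <str name="pollInterval">00:00:10</str>
  </lst>
</requestHandler>
```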
Re: converting string to solr.TextField
Sorry to jump into this discussion. I also get confused whenever I see this strange Solr/Lucene behaviour. Probably, as Erick said in his talk last year, this is how it has been designed, to avoid many problems that are hard or impossible to solve.

That said, one more time I want to come back to the same question: why can Solr/Lucene not handle this when we are updating all the documents? Let's take a couple of examples:

*Ex 1:*
Let's say I have only 10 documents in my index and all of them are in a single segment (Segment 1). Now, I change the schema (update a field type in this case) and reindex all of them. This is what (according to me) should happen internally:

1st update req: Solr will mark the 1st doc as deleted and index it again (might run the analyser chain based on config)
2nd update req: Solr will mark the 2nd doc as deleted and index it again (might run the analyser chain based on config)
And so on...
Based on the autoSoftCommit/autoCommit configuration, all new documents will be indexed and probably flushed to disk as part of a new segment (Segment 2).

Now, whenever segment merging happens (during commit or later in time), Lucene will create a new segment (Segment 3) and can discard all the docs present in Segment 1, as there are no live docs in it. And there would *NOT* be any situation where it has to decide between the old config and the new config, as there is not even a single live document with the old config. Isn't it?

*Ex 2:*
I see that it can be an issue if we think about reindexing millions of docs. Because in that case, merging can be triggered when indexing is halfway through, and since there are some live docs in the old segment (with the old config), things will blow up.

Please correct me if I am wrong.

I am *NOT* a Solr/Lucene expert and just started learning the way things work internally. In the above example, I can be wrong in many places. Can someone confirm whether scenarios like Ex-2 are the reason that even re-indexing all documents doesn't help if some incompatible schema changes are made? Any other insight would also be helpful.

Thanks,
Vinay

On Sat, Oct 17, 2020 at 5:48 AM Shawn Heisey wrote:

> On 10/16/2020 2:36 PM, David Hastings wrote:
> > sorry, i was thinking just using the
> > *:*
> > method for clearing the index would leave them still
>
> In theory, if you delete all documents at the Solr level, Lucene will
> delete all the segment files on the next commit, because they are empty.
> I have not confirmed with testing whether this actually happens.
>
> It is far safer to use a new index as Erick has said, or to delete the
> index directories completely and restart Solr ... so you KNOW the index
> has nothing in it.
>
> Thanks,
> Shawn
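Vinay's Ex 1 and Ex 2 can be sketched as a toy bookkeeping model. This is only an illustration of the delete-and-merge accounting, not Lucene's actual API: "segments" here are lists of (doc_id, schema_version, live) tuples.

```python
# Toy model of delete-then-merge bookkeeping during a full reindex.

def reindex(segments, doc_ids, new_version):
    """Mark docs as deleted in old segments and add them to a new segment."""
    for seg in segments:
        for i, (d, v, live) in enumerate(seg):
            if d in doc_ids:
                seg[i] = (d, v, False)  # tombstone; old data stays on disk
    segments.append([(d, new_version, True) for d in doc_ids])

def merge(segments):
    """A merge keeps only live docs; fully-deleted segments disappear."""
    merged = [t for seg in segments for t in seg if t[2]]
    return [merged] if merged else []

# Ex 1: all 10 docs are reindexed before any merge fires, so the old segment
# is all tombstones and only schema-version-2 docs survive the merge.
segs = [[(d, 1, True) for d in range(10)]]
reindex(segs, list(range(10)), new_version=2)
segs = merge(segs)
assert {v for seg in segs for (_, v, _) in seg} == {2}

# Ex 2: a merge fires when only half the docs are reindexed, so the merged
# segment mixes live version-1 and version-2 docs -- exactly the situation
# where incompatible field types blow up.
segs = [[(d, 1, True) for d in range(10)]]
reindex(segs, list(range(5)), new_version=2)
segs = merge(segs)
assert {v for seg in segs for (_, v, _) in seg} == {1, 2}
```

The second assertion is the crux of the thread: nothing stops a merge from running mid-reindex, so Lucene cannot assume a segment holds only one schema's data.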
Re: Index Replication Failure
None of your images made it through the mail server. You’ll have to put them somewhere and provide a link.

> On Oct 17, 2020, at 5:17 AM, Parshant Kumar wrote:
>
> Architecture image: If not visible in previous mail
Need help in understanding the below error message when running solr-exporter
Using Solr 8.2; ZooKeeper 3.4; Solr mode: Cloud with multiple collections; Basic Authentication: Enabled.

I am trying to run the exporter:

export JAVA_OPTS="-Djavax.net.ssl.trustStore=etc/solr-keystore.jks -Djavax.net.ssl.trustStorePassword=solrssl -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory -Dbasicauth=solrrocks:"
export CLASSPATH_PREFIX="../../server/solr-webapp/webapp/WEB-INF/lib/commons-codec-1.11.jar"
/bin/solr-exporter -p 8085 -z localhost:2181/solr -f ./conf/solr-exporter-config.xml -n 16

and I am seeing the messages below. On the Grafana Solr dashboard I do see the panels coming in, but data is not populating on them. Can someone help me if I am missing something in terms of configuration?

WARN - 2020-10-17 11:17:59.687; org.apache.solr.prometheus.scraper.Async; Error occurred during metrics collection => java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
    at org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45) [solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
    at org.apache.solr.prometheus.scraper.Async$$Lambda$190/.accept(Unknown Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) [?:?]
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) [?:?]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654) [?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:497) [?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:487) [?:?]
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) [?:?]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) [?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239) [?:?]
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) [?:?]
    at org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43) [solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
    at org.apache.solr.prometheus.scraper.Async$$Lambda$165/.apply(Unknown Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) [?:?]
    at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) [?:?]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) [?:?]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705) [?:?]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) [solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown Source) [solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.prometheus.collector.SchedulerMetricsCollector.lambda$collectMetrics$0(SchedulerMetricsCollector.java:92) ~[solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
    at org.apache.solr.prometheus.collector.SchedulerMetricsCollector$$Lambda$163/.get(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
    ... 5 more
Caused by: java.lang.NullPointerException
    at org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:112) ~[solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
    at org.apache.solr.prometheus.scra
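One detail worth double-checking in the command above: PreemptiveBasicAuthClientBuilderFactory reads the basicauth system property as user:password, and in the mail the value ends with a bare colon (the password was presumably redacted, but an actually-empty password would make every scrape request fail). A launch-script sketch of the same invocation, with the password shown as a placeholder and assuming the working directory is contrib/prometheus-exporter (where the relative CLASSPATH_PREFIX in the original command points), would be:

```shell
export JAVA_OPTS="-Djavax.net.ssl.trustStore=etc/solr-keystore.jks \
  -Djavax.net.ssl.trustStorePassword=solrssl \
  -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory \
  -Dbasicauth=solrrocks:YOUR_PASSWORD"
export CLASSPATH_PREFIX="../../server/solr-webapp/webapp/WEB-INF/lib/commons-codec-1.11.jar"
./bin/solr-exporter -p 8085 -z localhost:2181/solr \
  -f ./conf/solr-exporter-config.xml -n 16
```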
Re: Index Replication Failure
Architecture image: If not visible in previous mail
[image: image.png]

On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar wrote:
> Hi all,
>
> *We are facing the frequent replication failure between master to repeater
> server as well as between repeater to slave servers.*
Index Replication Failure
Hi all,

We have the Solr architecture shown below.

*We are facing frequent replication failures between the master and repeater server, as well as between the repeater and slave servers.*
On checking the logs, we found that one of the below exceptions occurred every time replication failed.

1) [image: image.png]
2) [image: image.png]
3) [image: image.png]

The replication configuration of master, repeater, and slaves is given below:
[image: image.png]

The commit configuration of master, repeater, and slaves is given below:
[image: image.png]

Replication between master and repeater occurs every 10 mins.
Replication between repeater and slave servers occurs every 15 mins between 4-7 am, and after that every 3 hours.

Thanks,
Parshant Kumar
--