Searching on multiple cores using MultiSearcher
Hi, At the Lucene level we have MultiSearcher to search a few indexes at the same time with the same query; at the Solr level can we perform such a search (assuming the cores use the same config/schema)? Here I do not mean searching across shards of the same collection, but across independent collections. Thanks very much for your help, Lisheng
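At the Solr level the closest analogue is distributed search: a query's shards parameter can list several independent cores (they need compatible schemas and unique document ids), and Solr merges the per-core results much as MultiSearcher does. A minimal sketch that only builds and prints the request; the host and core names are hypothetical:

```shell
#!/bin/sh
# Build a distributed-search request over two independent cores by listing
# both in the "shards" parameter (hypothetical host and core names).
CORES="localhost:8983/solr/core_a localhost:8983/solr/core_b"

# The shards parameter is a comma-separated list of host:port/path entries.
SHARDS=$(echo $CORES | tr ' ' ',')

# The request may be sent to either core; it fans out to every listed shard.
URL="http://localhost:8983/solr/core_a/select?q=title:test&shards=${SHARDS}"
echo "curl '${URL}'"
```

In real use you would execute the printed curl command instead of echoing it.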
Solr 4.3 opens a lot more files than Solr 3.6
Hi, After upgrading Solr from 3.6 to 4.3, we found that Solr opens a lot more files than 3.6 did (when a core is open). Since we have many cores (more than 2K, and still growing), we would like to reduce the number of open files. We already use shareSchema and sharedLib, we share SolrConfig across all cores, and we commented out autoSoftCommit in solrconfig.xml. In Solr 3.6 the IndexWriter seems to be opened only when an indexing request comes in and is closed immediately after the request is done, but in 4.3 the IndexWriter is kept open. Is there an easy way to go back to the 3.6 behavior (we do not need Near Real Time search)? Can we change the code to disable keeping the IndexWriter open (if there is no better way)? Any guidance on reducing open files would be very helpful. Thanks very much for your help, Lisheng
Usage of "luceneMatchVersion" when upgrading from solr 3.6 to solr 4.3
Hi, We are upgrading Solr from 3.6 to 4.3, but we have a large amount of indexed data and cannot afford to reindex it all at once. We hope Solr 4.3 can do the following: 1/ still search the data indexed under 3.6, and 2/ whenever a new document is indexed, convert it to the 4.3 format (this may not happen all at once). In this case, should we use LUCENE_36 or LUCENE_43 for luceneMatchVersion? (It is suggested that we should reindex all data if using LUCENE_43, so I think we should use LUCENE_36 since we cannot reindex everything at once; is that right?) Thanks very much for your help, Lisheng
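For reference, this setting is a single element in solrconfig.xml; a sketch of keeping it pinned at the 3.6 level until everything has been reindexed:

```xml
<!-- solrconfig.xml: keep version-dependent behavior (mainly analyzers)
     at the 3.6 level until all documents are reindexed under 4.3 -->
<luceneMatchVersion>LUCENE_36</luceneMatchVersion>
```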
RE: solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper
Thanks very much for all the help! -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Friday, July 12, 2013 7:31 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper On 7/12/2013 7:29 AM, Zhang, Lisheng wrote: > Sorry I might not have asked clearly, our issue is that we have > a few thousand collections (can be much more), so running that > command is rather tedious, is there a simpler way (all collections > share same schema/config)? When you create each collection with the Collections API (http calls), you tell it the name of a config set stored in zookeeper. You can give all your collections the same config set if you like. If you manually create collections with the CoreAdmin API instead, you must use the zkcli script included in Solr to link the collection to the config set, which can be done either before or after the collection is created. The zkcli script provides some automation for the java command that you were given by Furkan. Thanks, Shawn
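With thousands of collections sharing one config set, the linkconfig step Shawn mentions can be scripted. A dry-run sketch that prints one command per collection (host, classpath, and collection names are hypothetical):

```shell
#!/bin/sh
# Dry run: print one zkcli linkconfig command per collection, all pointing
# at the same uploaded config set. Host, paths, and collection names are
# hypothetical.
ZK_HOST="zk1:2181,zk2:2181,zk3:2181"
CONFNAME="shared_conf"

cmds=$(for coll in coll_0001 coll_0002 coll_0003; do
  echo "java -classpath .:/home/myuser/solr-war-lib/*" \
       "org.apache.solr.cloud.ZkCLI -cmd linkconfig" \
       "-collection $coll -confname $CONFNAME -zkhost $ZK_HOST"
done)
echo "$cmds"
```

In real use the loop would read the collection names from a file and run each command instead of echoing it.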
RE: solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper
Sorry, I might not have asked clearly: our issue is that we have a few thousand collections (possibly many more), so running that command for each one is rather tedious. Is there a simpler way (all collections share the same schema/config)? Thanks very much for your help, Lisheng -Original Message- From: Furkan KAMACI [mailto:furkankam...@gmail.com] Sent: Friday, July 12, 2013 1:17 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper If you have one collection you just need to define hostnames of Zookeeper ensembles and run that command once. 2013/7/11 Zhang, Lisheng > Hi, > > We are testing solr 4.3.0 in Tomcat (considering upgrading solr 3.6.1 to > 4.3.0), in WIKI page > for solrCloud in Tomcat: > > http://wiki.apache.org/solr/SolrCloudTomcat > > we need to link each collection explicitly: > > /// > 8) Link uploaded config with target collection > java -classpath .:/home/myuser/solr-war-lib/* org.apache.solr.cloud.ZkCLI > -cmd linkconfig -collection mycollection -confname ... > /// > > But our application has many cores (a few thousand which all share the same > schema/config), > is there a more convenient way? > > Thanks very much for helps, Lisheng >
RE: What happens in indexing request in solr cloud if Zookeepers are all dead?
Thanks very much for your clear explanation! -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, July 11, 2013 1:55 PM To: solr-user@lucene.apache.org Subject: Re: What happens in indexing request in solr cloud if Zookeepers are all dead? Sorry, no updates if no Zookeepers. There would be no way to assure that any node knows the proper configuration. Queries are a little safer using most recent configuration without zookeeper, but update consistency requires accurate configuration information. -- Jack Krupansky -Original Message- From: Zhang, Lisheng Sent: Thursday, July 11, 2013 2:59 PM To: solr-user@lucene.apache.org Subject: RE: What happens in indexing request in solr cloud if Zookeepers are all dead? Yes, I should not have used word master/slave for solr cloud! So if all Zookeepers are dead, could indexing requests be handled properly (could solr remember the setting for indexing)? Thanks very much for helps, Lisheng -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, July 11, 2013 10:46 AM To: solr-user@lucene.apache.org Subject: Re: What happens in indexing request in solr cloud if Zookeepers are all dead? There are no masters or slaves in SolrCloud - it is fully distributed and "master-free". Leaders are temporary and can vary over time. The basic idea for quorum is to prevent "split brain" - two (or more) distinct sets of nodes (zookeeper nodes, that is) each thinking they constitute the authoritative source for access to configuration information. The trick is to require (N/2)+1 nodes for quorum. For n=3, quorum would be (3/2)+1 = 1+1 = 2, so one node can be down. For n=1, quorum = (1/2)+1 = 0 + 1 = 1. For n=2, quorum would be (2/2)+1 = 1 + 1 = 2, so no nodes can be down. IOW, for n=2 no nodes can be down for the cluster to do updates. 
-- Jack Krupansky -Original Message- From: Zhang, Lisheng Sent: Thursday, July 11, 2013 9:28 AM To: solr-user@lucene.apache.org Subject: What happens in indexing request in solr cloud if Zookeepers are all dead? Hi, In solr cloud latest doc, it mentioned that if all Zookeepers are dead, distributed query still works because solr remembers the cluster state. How about the indexing request handling if all Zookeepers are dead, does solr needs Zookeeper to know which box is master and which is slave for indexing to work? Could solr remember master/slave relations without Zookeeper? Also doc said Zookeeper quorum needs to have a majority rule so that we must have 3 Zookeepers to handle the case one instance is crashed, what would happen if we have two instances in quorum and one instance is crashed (or quorum having 3 instances but two of them are crashed)? I felt the last one should take over? Thanks very much for helps, Lisheng
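Jack's quorum arithmetic above can be checked mechanically; a tiny sketch (plain arithmetic, not ZooKeeper code):

```shell
#!/bin/sh
# ZooKeeper quorum size for an ensemble of n nodes: floor(n/2) + 1.
# The ensemble stays writable only while at least that many nodes are up.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# n=3 tolerates one failure (quorum 2); n=2 tolerates none (quorum 2);
# n=1 needs its single node (quorum 1); n=5 tolerates two (quorum 3).
for n in 1 2 3 5; do
  echo "n=$n quorum=$(quorum $n)"
done
```

This makes it easy to see why a 2-node ensemble is no more fault-tolerant than a 1-node one, and why odd sizes are preferred.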
RE: What happens in indexing request in solr cloud if Zookeepers are all dead?
Yes, I should not have used word master/slave for solr cloud! So if all Zookeepers are dead, could indexing requests be handled properly (could solr remember the setting for indexing)? Thanks very much for helps, Lisheng -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, July 11, 2013 10:46 AM To: solr-user@lucene.apache.org Subject: Re: What happens in indexing request in solr cloud if Zookeepers are all dead? There are no masters or slaves in SolrCloud - it is fully distributed and "master-free". Leaders are temporary and can vary over time. The basic idea for quorum is to prevent "split brain" - two (or more) distinct sets of nodes (zookeeper nodes, that is) each thinking they constitute the authoritative source for access to configuration information. The trick is to require (N/2)+1 nodes for quorum. For n=3, quorum would be (3/2)+1 = 1+1 = 2, so one node can be down. For n=1, quorum = (1/2)+1 = 0 + 1 = 1. For n=2, quorum would be (2/2)+1 = 1 + 1 = 2, so no nodes can be down. IOW, for n=2 no nodes can be down for the cluster to do updates. -- Jack Krupansky -Original Message- From: Zhang, Lisheng Sent: Thursday, July 11, 2013 9:28 AM To: solr-user@lucene.apache.org Subject: What happens in indexing request in solr cloud if Zookeepers are all dead? Hi, In solr cloud latest doc, it mentioned that if all Zookeepers are dead, distributed query still works because solr remembers the cluster state. How about the indexing request handling if all Zookeepers are dead, does solr needs Zookeeper to know which box is master and which is slave for indexing to work? Could solr remember master/slave relations without Zookeeper? Also doc said Zookeeper quorum needs to have a majority rule so that we must have 3 Zookeepers to handle the case one instance is crashed, what would happen if we have two instances in quorum and one instance is crashed (or quorum having 3 instances but two of them are crashed)? 
I felt the last one should take over? Thanks very much for helps, Lisheng
Solr 4.3.0 memory usage is higher than Solr 3.6.1?
Hi, We are testing Solr 4.3.0 in Tomcat (considering upgrading from 3.6.1 to 4.3.0); we have many cores (a few thousand). We have noticed that Solr 4.3.0's memory usage is much higher than 3.6.1's (without using SolrCloud yet). With 2K cores, Solr 3.6.1 uses 1.5G but Solr 4.3.0 uses close to 3G of memory when Tomcat is first started. We use shareSchema and sharedLib, and we disabled searcher warm-up during startup. We are still debugging the issue; we would appreciate any guidance. Thanks very much for your help, Lisheng
What happens in indexing request in solr cloud if Zookeepers are all dead?
Hi, The latest SolrCloud doc mentions that if all Zookeepers are dead, distributed queries still work because Solr remembers the cluster state. What about handling indexing requests if all Zookeepers are dead: does Solr need Zookeeper to know which box is master and which is slave for indexing to work? Could Solr remember the master/slave relations without Zookeeper? Also, the doc says the Zookeeper quorum follows a majority rule, so we must have 3 Zookeepers to survive one crashed instance. What would happen if we have two instances in the quorum and one crashes (or a quorum of 3 instances with two of them crashed)? Shouldn't the remaining instance take over? Thanks very much for your help, Lisheng
solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper
Hi, We are testing Solr 4.3.0 in Tomcat (considering upgrading from 3.6.1 to 4.3.0). The WIKI page for SolrCloud in Tomcat, http://wiki.apache.org/solr/SolrCloudTomcat, says we need to link each collection explicitly: /// 8) Link uploaded config with target collection java -classpath .:/home/myuser/solr-war-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mycollection -confname ... /// But our application has many cores (a few thousand, all sharing the same schema/config); is there a more convenient way? Thanks very much for your help, Lisheng
RE: solr 4.3: write.lock is not removed
I did more testing and it seems this is still a bug (previous issue 3/): 1/ Create a core by CURL command with dataDir=; the core is created OK and later indexing also worked OK. 2/ But in solr.xml, dataDir is not defined in the element " dataDir=/data/new_collection_name In Solr 3.6.1 we do not need to define schema/config because the conf folder is not inside each collection. 1/ Indexing works OK but write.lock is not removed (we use "/update?commit=true..") 2/ After shutting down Tomcat, I saw write.lock is gone 3/ After restarting Tomcat, indexed data was created at the instanceDir/data level, with some warning messages. It seems that in solr.xml, dataDir is not defined? Thanks very much for your help, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Thursday, May 30, 2013 10:57 AM To: solr-user@lucene.apache.org Subject: RE: solr 4.3: write.lock is not removed Hi, We just use CURL from PHP code to submit indexing request, like: /update?commit=true.. This worked well in solr 3.6.1. I saw the link you showed and really appreciate (if no other choice I will change java source code but hope there is a better way..)? Thanks very much for helps, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: solr starting time takes too long
Hi Erick, Thanks very much for your help (I should have responded sooner): 1/ My problem in 3.6 turned out to be closely related to the fact that I did not share the schema; after using shareSchema, the start time was reduced by up to 80% (to my great surprise: previously I thought most of the burden was in solrconfig). 2/ I just upgraded to Solr 4.3, but somehow I did not see all the fixes mentioned in the WIKI (like shareConfig); I saw the resolution is "Won't fix". Do you plan to put the fix into the next release? Thanks and best regards, Lisheng -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, May 22, 2013 4:57 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long Zhang: In 3.6, there's really no choice except to load all the cores on startup. 10 minutes still seems excessive, do you perhaps have a heavy-weight firstSearcher query? Yes, soft commits are 4.x only, so that's not your problem. There's a shareSchema option that tries to only load 1 copy of the schema that should help, but that doesn't help with loading solrconfig.xml. Also in the 4.3+ world there's the option to lazily-load cores, see: http://wiki.apache.org/solr/LotsOfCores for the overview. Perhaps not an option, but I thought I'd mention it. But I'm afraid you're stuck. You might be able to run bigger hardware (perhaps you're memory-starved). Other than that, you may need to use more than one machine to get fast enough startup times. Best, Erick On Wed, May 22, 2013 at 3:27 AM, Zhang, Lisheng wrote: > Thanks very much for quick helps! I searched but it seems that > autoSoftCommit is solr 4x feature and we are still using 3.6.1? > > Best regards, Lisheng > > -Original Message- > From: Carlos Bonilla [mailto:carlosbonill...@gmail.com] > Sent: Wednesday, May 22, 2013 12:17 AM > To: solr-user@lucene.apache.org > Subject: Re: solr starting time takes too long > > > Hi Lisheng, > I had the same problem when I enabled the "autoSoftCommit" in > solrconfig.xml. 
If you have it enabled, disabling it could fix your problem, > > Cheers. > Carlos. > > > 2013/5/22 Zhang, Lisheng > >> >> Hi, >> >> We are using solr 3.6.1, our application has many cores (more than 1K), >> the problem is that solr starting took a long time (>10m). Examing log >> file and code we found that for each core we loaded many resources, but >> in our app, we are sure we are always using the same solrconfig.xml and >> schema.xml for all cores. While we can config schema.xml to be shared, >> we cannot share SolrConfig object. But looking inside SolrConfig code, >> we donot use any of the cache. >> >> Could we somehow change config (or source code) to share resource between >> cores to reduce solr starting time? >> >> Thanks very much for helps, Lisheng >>
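For reference, the shareSchema switch mentioned above is an attribute on the <cores> element of the legacy solr.xml format; a sketch (core names are hypothetical):

```xml
<!-- solr.xml (legacy multi-core format): share one parsed IndexSchema
     across all cores that reference identical schema.xml files -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core0001" instanceDir="core0001"/>
    <core name="core0002" instanceDir="core0002"/>
  </cores>
</solr>
```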
RE: solr 4.3: write.lock is not removed
Hi, Thanks very much for the explanation! Is there a way to configure the old behavior? I am asking because our app has many small cores, so we prefer to create/close the writer on the fly (otherwise we may run into memory issues quickly). We also do not need NRT for now. Thanks very much for your help, Lisheng -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, May 30, 2013 11:35 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed : I recently upgraded solr from 3.6.1 to 4.3, it works well, but I noticed that after finishing : indexing : : write.lock : : is NOT removed. Later if I index again it still works OK. Only after I shutdown Tomcat : then write.lock is removed. This behavior caused some problem like I could not use luke : to observe indexed data. IIRC, this was an intentional change. In older versions of Solr the IndexWriter was only opened if/when updates needed to be made, but that made it impossible to safely take advantage of some internal optimizations related to NRT IndexReader reloading, so the logic was modified to always keep the IndexWriter open as long as the SolrCore is loaded. In general, your past behavior of pointing luke at a live solr index could have also produced problems if updates came into solr while luke had the write lock active. -Hoss
RE: solr 4.3: write.lock is not removed
I did more tests and have more info: the basic setting is that we create the core from a PHP cURL API call where we define: schema config instanceDir= dataDir=/data/new_collection_name In Solr 3.6.1 we do not need to define schema/config because the conf folder is not inside each collection. 1/ Indexing works OK but write.lock is not removed (we use "/update?commit=true..") 2/ After shutting down Tomcat, I saw write.lock is gone 3/ After restarting Tomcat, indexed data was created at the instanceDir/data level, with some warning messages. It seems that in solr.xml, dataDir is not defined? Thanks very much for your help, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Thursday, May 30, 2013 10:57 AM To: solr-user@lucene.apache.org Subject: RE: solr 4.3: write.lock is not removed Hi, We just use CURL from PHP code to submit indexing request, like: /update?commit=true.. This worked well in solr 3.6.1. I saw the link you showed and really appreciate (if no other choice I will change java source code but hope there is a better way..)? Thanks very much for helps, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
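The creation call being described is the CoreAdmin CREATE command with an explicit dataDir. A sketch that only builds and prints the request (host and core name are hypothetical, mirroring the dataDir=/data/new_collection_name layout above):

```shell
#!/bin/sh
# Build the CoreAdmin CREATE request that passes an explicit dataDir
# (hypothetical host and core name; printed rather than executed).
SOLR="http://localhost:8983/solr"
NAME="new_collection_name"

URL="${SOLR}/admin/cores?action=CREATE&name=${NAME}"
URL="${URL}&instanceDir=${NAME}&dataDir=/data/${NAME}"
echo "curl '${URL}'"
```

If the dataDir parameter is dropped or ignored, Solr falls back to instanceDir/data, which matches the symptom described in step 3/ above.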
RE: solr 4.3: write.lock is not removed
Hi, We just use cURL from PHP code to submit indexing requests, like: /update?commit=true.. This worked well in Solr 3.6.1. I saw the link you shared and really appreciate it (if there is no other choice I will change the Java source code, but I hope there is a better way). Thanks very much for your help, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
solr 4.3: write.lock is not removed
Hi, I recently upgraded Solr from 3.6.1 to 4.3. It works well, but I noticed that after indexing finishes, write.lock is NOT removed. If I index again later it still works OK; only after I shut down Tomcat is write.lock removed. This behavior causes some problems, e.g. I could not use Luke to inspect the indexed data. I did not see any error/warning messages. Is this the designed behavior? Can I get the old behavior (write.lock removed after commit) through configuration? Thanks very much for your help, Lisheng
RE: solr starting time takes too long
Very sorry about hijacking an existing thread (I thought it would be OK if I just changed the title and content, but that was still wrong). It will never happen again. Lisheng -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, May 22, 2013 11:58 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long : Subject: solr starting time takes too long : In-Reply-To: <519c6cd6.90...@smartbit.be> : Thread-Topic: shard splitting https://people.apache.org/~hossman/#threadhijack -Hoss
RE: solr starting time takes too long
Thanks very much for the quick help! I searched, but it seems autoSoftCommit is a Solr 4.x feature and we are still using 3.6.1? Best regards, Lisheng -Original Message- From: Carlos Bonilla [mailto:carlosbonill...@gmail.com] Sent: Wednesday, May 22, 2013 12:17 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long Hi Lisheng, I had the same problem when I enabled the "autoSoftCommit" in solrconfig.xml. If you have it enabled, disabling it could fix your problem, Cheers. Carlos. 2013/5/22 Zhang, Lisheng > > Hi, > > We are using solr 3.6.1, our application has many cores (more than 1K), > the problem is that solr starting took a long time (>10m). Examining the log > file and code we found that for each core we loaded many resources, but > in our app, we are sure we are always using the same solrconfig.xml and > schema.xml for all cores. While we can config schema.xml to be shared, > we cannot share SolrConfig object. But looking inside SolrConfig code, > we do not use any of the cache. > > Could we somehow change config (or source code) to share resource between > cores to reduce solr starting time? > > Thanks very much for helps, Lisheng >
solr starting time takes too long
Hi, We are using Solr 3.6.1 and our application has many cores (more than 1K); the problem is that Solr takes a long time to start (>10m). Examining the log file and code, we found that many resources are loaded for each core, but in our app we are sure we always use the same solrconfig.xml and schema.xml for all cores. While we can configure schema.xml to be shared, we cannot share the SolrConfig object; yet looking inside the SolrConfig code, we do not use any of its caches. Could we somehow change the config (or source code) to share these resources between cores and reduce Solr's start time? Thanks very much for your help, Lisheng
RE: SolrCloud leader to replica
Hi Otis and Timothy, Thanks very much for your help; I will certainly test to make sure. What I mentioned before is a mere possibility, and you are likely correct: the small delay may not matter in reality (yes, we do paginate the same way and no issue has ever happened, even once). Solr is certainly enormously valuable to us and we really appreciate your help! Lisheng -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, April 11, 2013 5:27 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud leader to replica Hi, I think Timothy is right about what Lisheng is really after, which is consistency. I agree with what Timothy is implying here - chances of search being inconsistent are very, very small. I'm guessing Lisheng is trying to solve a problem he doesn't actually have yet? Also, think about a non-SolrCloud solution. What happens when a user pages through results? Typically that just re-runs the same query, but with a different page offset. What happens if between page 1 and page 2 the index changes and a searcher is reopened? Same sort of problem can happen, right? Yet, in a few hundred client engagements involving Solr or ElasticSearch I don't recall this ever being an issue. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 8:13 PM, Timothy Potter wrote: > Hmmm ... I was following this discussion but then got confused when Lisheng > said to change Solr to "compromise consistency in order to increase > availability" when your concern is "how long replica is behind leader". > Seems you want more consistency vs. less in this case? One of the reasons > behind Solr's leader election approach is to achieve low-latency eventual > consistency (Mark's term from the linked to discussion). > > Un-committed docs are only visible if you use real-time get, in which case > the request is served by the shard leader (or replica) from its update log. 
> I suppose there's a chance of a few millis between the leader having the > request in its tlog and the replica having the doc it its tlog but that > seems like the nature of the beast. Meaning that Solr never promised to be > 100% consistent at millisecond granularity in a distributed model - any > small time-window between what a leader has and replica are probably > network latency which you should solve outside of Solr. I suspect you could > direct all your real-time get requests to leaders only using some smart > client like CloudSolrServer if it mattered that much. > > Otherwise, all other queries require the document to be committed to be > visible. I suppose there is a very small window when a new searcher is open > on the leader and the new searcher is not yet open on the replica. However, > with soft-commits, that too seems like a milli or two based on network > latency. > > @Shawn - yes, I've actually seen this work in my cluster. We lose replicas > from time-to-time and indexing keeps on trucking. > > > > > > On Thu, Apr 11, 2013 at 4:51 PM, Zhang, Lisheng < > lisheng.zh...@broadvision.com> wrote: > >> Hi Otis, >> >> Thanks very much for helps, your explanation is very clear. >> >> My main concern is not the return status for indexing calls (although >> which is >> also important), my main concern is how long replica is behind the leader >> (or >> putting in your way, how consistent search picture is to client A and B). >> >> Our application requires clients see same result whether he hits leader or >> replica, so it seems we do have a problem here. If no better solution I may >> consider to change solr4 a little (I have not read solr4x fully yet) to >> compromise >> consistency (C) in order to increase availability (A), on a high level do >> you see >> serious problems in this approach (I am familiar with lucene/solr code to >> some >> extent)? 
>> >> Thanks and best regards, Lisheng >> >> -Original Message- >> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] >> Sent: Thursday, April 11, 2013 2:50 PM >> To: solr-user@lucene.apache.org >> Subject: Re: SolrCloud leader to replica >> >> >> But note that I misspoke, which I realized after re-reading the thread >> I pointed you to. Mark explains it nicely there: >> * the index call returns only when (and IF!) indexing to all replicas >> succeeds >> >> BUT, that should not be mixed with what search clients see! >> Just because the indexing client sees the all or nothing situation >> depending on whether indexing was successful on all replicas does NOT >> mean that search clients will always see a 100% consistent picture.
RE: SolrCloud leader to replica
Hi Otis, Thanks very much for your help; your explanation is very clear. My main concern is not the return status of indexing calls (although that is also important); my main concern is how far the replica is behind the leader (or, putting it your way, how consistent the search picture is for clients A and B). Our application requires that clients see the same result whether they hit the leader or a replica, so it seems we do have a problem here. If there is no better solution I may consider changing Solr 4 a little (I have not read the 4.x code fully yet) to compromise consistency (C) in order to increase availability (A); on a high level, do you see serious problems with this approach (I am familiar with the Lucene/Solr code to some extent)? Thanks and best regards, Lisheng -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, April 11, 2013 2:50 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud leader to replica But note that I misspoke, which I realized after re-reading the thread I pointed you to. Mark explains it nicely there: * the index call returns only when (and IF!) indexing to all replicas succeeds BUT, that should not be mixed with what search clients see! Just because the indexing client sees the all or nothing situation depending on whether indexing was successful on all replicas does NOT mean that search clients will always see a 100% consistent picture. Client A could hit the leader and see a newly indexed document, while client B could query the replica and not see that same document simply because the doc hasn't gotten there yet, or because soft commit hasn't happened just yet. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 4:39 PM, Zhang, Lisheng wrote: > Thanks very much for your helps! 
> > -Original Message- > From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] > Sent: Thursday, April 11, 2013 1:23 PM > To: solr-user@lucene.apache.org > Subject: Re: SolrCloud leader to replica > > > Yes, I *think* that is the case. Some distributed systems have the > option to return success to caller only after data has been > added/indexed to N other nodes, but I think Solr doesn't have this > yet. Somebody please correct me if I'm wrong. > > See: http://search-lucene.com/?q=eventually+consistent&fc_project=Solr > > Otis > -- > Solr & ElasticSearch Support > http://sematext.com/ > > > > > > On Thu, Apr 11, 2013 at 12:51 PM, Zhang, Lisheng > wrote: >> Hi Otis, >> >> Thanks very much for the quick help! We are considering to upgrade >> from solr 3.6 to 4x and use solrCloud, but we are concerned about >> performance related to replica? In this scenario it seems that the >> replica would be a few seconds beyond leader because replica would >> start indexing only afer leader finishes his? >> >> Thanks and best regards, Lisheng >> >> -Original Message- >> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] >> Sent: Thursday, April 11, 2013 8:11 AM >> To: solr-user@lucene.apache.org >> Subject: Re: SolrCloud leader to replica >> >> >> I believe it indexes locally on leader first. Otherwise one could end >> up with a situation where indexing to replica(s) succeeds and indexing >> to leader fails, which I suspect might create a mess. >> >> Otis >> -- >> Solr & ElasticSearch Support >> http://sematext.com/ >> >> >> >> >> >> On Thu, Apr 11, 2013 at 2:53 AM, Zhang, Lisheng >> wrote: >>> Hi, >>> >>> In solr 4x solrCloud, suppose we have only one shard and >>> two replica, when leader receives the indexing request, >>> does it immediately forward request to two replicas or >>> it first indexes request itself, then sends request to its >>> two replica? >>> >>> Thanks very much for helps, Lisheng >>> >>>
RE: SolrCloud leader to replica
Thanks very much for your helps! -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, April 11, 2013 1:23 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud leader to replica Yes, I *think* that is the case. Some distributed systems have the option to return success to caller only after data has been added/indexed to N other nodes, but I think Solr doesn't have this yet. Somebody please correct me if I'm wrong. See: http://search-lucene.com/?q=eventually+consistent&fc_project=Solr Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 12:51 PM, Zhang, Lisheng wrote: > Hi Otis, > > Thanks very much for the quick help! We are considering to upgrade > from solr 3.6 to 4x and use solrCloud, but we are concerned about > performance related to replica? In this scenario it seems that the > replica would be a few seconds beyond leader because replica would > start indexing only afer leader finishes his? > > Thanks and best regards, Lisheng > > -Original Message- > From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] > Sent: Thursday, April 11, 2013 8:11 AM > To: solr-user@lucene.apache.org > Subject: Re: SolrCloud leader to replica > > > I believe it indexes locally on leader first. Otherwise one could end > up with a situation where indexing to replica(s) succeeds and indexing > to leader fails, which I suspect might create a mess. > > Otis > -- > Solr & ElasticSearch Support > http://sematext.com/ > > > > > > On Thu, Apr 11, 2013 at 2:53 AM, Zhang, Lisheng > wrote: >> Hi, >> >> In solr 4x solrCloud, suppose we have only one shard and >> two replica, when leader receives the indexing request, >> does it immediately forward request to two replicas or >> it first indexes request itself, then sends request to its >> two replica? >> >> Thanks very much for helps, Lisheng >> >>
RE: SolrCloud leader to replica
Hi Otis, Thanks very much for the quick help! We are considering upgrading from solr 3.6 to 4x and using solrCloud, but we are concerned about performance related to replicas: in this scenario it seems that the replica would be a few seconds behind the leader, because the replica would start indexing only after the leader finishes its own? Thanks and best regards, Lisheng -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, April 11, 2013 8:11 AM To: solr-user@lucene.apache.org Subject: Re: SolrCloud leader to replica I believe it indexes locally on leader first. Otherwise one could end up with a situation where indexing to replica(s) succeeds and indexing to leader fails, which I suspect might create a mess. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 2:53 AM, Zhang, Lisheng wrote: > Hi, > > In solr 4x solrCloud, suppose we have only one shard and > two replicas: when the leader receives an indexing request, > does it immediately forward the request to the two replicas, > or does it first index the request itself and then send it to > its two replicas? > > Thanks very much for helps, Lisheng > >
SolrCloud leader to replica
Hi, In solr 4x solrCloud, suppose we have only one shard and two replicas: when the leader receives an indexing request, does it immediately forward the request to the two replicas, or does it first index the request itself and then send it to its two replicas? Thanks very much for helps, Lisheng
RE: Solr language-dependent sort
Hi, Thanks very much for the quick help! In our case we mainly need to sort a field based on a language defined at run time, but I understand that the principle is the same. Thanks and best regards, Lisheng -Original Message- From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL Sent: Monday, April 08, 2013 1:27 PM To: solr-user@lucene.apache.org Subject: Re: Solr language-dependent sort Hi Lisheng, We did something similar in Solr using a custom handler (but I think you could just build a custom QueryParser to do this), but you could do this in your application as well, ie, get the language and then rewrite your query to use the language-specific fields. Come to think of it, the QueryParser would probably be sufficiently general to qualify as a patch for custom functionality. -sujit On Apr 8, 2013, at 12:28 PM, Zhang, Lisheng wrote: > > Hi, > > I found that in solr we need to define a special fieldType for each > language (http://wiki.apache.org/solr/UnicodeCollation), then point > a field to this type. > > But in our application one field (like 'title') can be used by various > users for different languages (user1 uses it for English, user2 uses it for > Japanese ...), so it is even difficult for us to use dynamic fields; > we would prefer to pass in a parameter like > > language = 'en' > > at run time, so the solr API could use this parameter when calling the lucene API > to sort a field. This approach would be much more flexible (we programmed > this way when using lucene directly)? > > Thanks very much for helps, Lisheng
Solr language-dependent sort
Hi, I found that in solr we need to define a special fieldType for each language (http://wiki.apache.org/solr/UnicodeCollation), then point a field to this type. But in our application one field (like 'title') can be used by various users for different languages (user1 uses it for English, user2 uses it for Japanese ...), so it is even difficult for us to use dynamic fields. We would prefer to pass in a parameter like language = 'en' at run time, so the solr API could use this parameter when calling the lucene API to sort a field. This approach would be much more flexible (we programmed this way when using lucene directly)? Thanks very much for helps, Lisheng
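The application-side rewrite Sujit describes in the thread above (determine the language at query time, then point the sort at a language-specific field) can be sketched in a few lines. This is only an illustration of the idea; the `title_sort_en`-style field names are a hypothetical dynamic-field convention, not something defined in the thread.

```python
# Sketch of run-time language routing for sorting: instead of one hardwired
# collation fieldType, the client maps the request's language code to a
# per-language sort field and falls back to English for unknown codes.
def sort_param(base_field: str, language: str,
               supported=("en", "ja", "fr")) -> str:
    """Build a Solr-style sort parameter for the given run-time language."""
    lang = language if language in supported else "en"  # fallback language
    return f"{base_field}_sort_{lang} asc"

assert sort_param("title", "ja") == "title_sort_ja asc"
assert sort_param("title", "xx") == "title_sort_en asc"  # unknown -> fallback
```

The same mapping could live in a custom QueryParser on the Solr side, as Sujit suggests, or in the calling application.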
RE: Solr 3.6.1 ClassNotFound Exception
Hi Erick, Thanks! Actually this is another person's installation and I am helping to debug. My guess is that in solrconfig.xml the line <requestHandler startup="lazy" class="solr.extraction.ExtractingRequestHandler" .../> (or another similar line) somehow does not work; I will try to look into it more. Thanks very much for helps, Lisheng -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, March 17, 2013 7:26 AM To: solr-user@lucene.apache.org Subject: Re: Solr 3.6.1 ClassNotFound Exception Hmmm, you shouldn't have to go looking for this, it should just be there. My guess is that you have some kind of classpath issue. If you have access to a machine that has never seen Solr, or a VM, I'd try installing a fresh copy of Solr. If that works, then you can be pretty sure you've changed your environment (perhaps inadvertently). Best Erick On Sat, Mar 16, 2013 at 10:11 AM, Zhang, Lisheng < lisheng.zh...@broadvision.com> wrote: > > Hi, > > This is perhaps a trivial question but somehow I could not pin it down: > when trying to index a file (using solr 3.6.1) I got the error: > > Caused by: org.apache.solr.common.SolrException: Error loading class > 'solr.extraction.ExtractingRequestHandler' > > I know in solrconfig.xml we have defined > > /// > <requestHandler startup="lazy" > class="solr.extraction.ExtractingRequestHandler" .../> > /// > > and the jar file should be: > > /dist/apache-solr-cell-3.6.1.jar > > But the above jar file only has the class: > > jar tvf apache-solr-cell-3.6.1.jar | grep ExtractingRequestHandler > 5332 Tue Jul 17 12:45:40 PDT 2012 > org/apache/solr/handler/extraction/ExtractingRequestHandler.class > > Where can we find "solr.extraction.ExtractingRequestHandler" ? > > Thanks very much for helps, Lisheng >
Solr 3.6.1 ClassNotFound Exception
Hi, This is perhaps a trivial question but somehow I could not pin it down: when trying to index a file (using solr 3.6.1) I got the error: Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.extraction.ExtractingRequestHandler' I know in solrconfig.xml we have defined /// <requestHandler startup="lazy" class="solr.extraction.ExtractingRequestHandler" .../> /// and the jar file should be: /dist/apache-solr-cell-3.6.1.jar But the above jar file only has the class: jar tvf apache-solr-cell-3.6.1.jar | grep ExtractingRequestHandler 5332 Tue Jul 17 12:45:40 PDT 2012 org/apache/solr/handler/extraction/ExtractingRequestHandler.class Where can we find "solr.extraction.ExtractingRequestHandler" ? Thanks very much for helps, Lisheng
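To the question in the thread above: there is no class literally named `solr.extraction.ExtractingRequestHandler`. Solr's resource loader treats `solr.` as a shorthand prefix and tries the remainder of the name under a list of standard `org.apache.solr` subpackages, one of which resolves to the real class inside the solr-cell jar. A rough sketch of that resolution (this is an illustration, not Solr's actual source; the subpackage list here is abbreviated):

```python
# Candidate expansion for the "solr." class-name shorthand. With the solr-cell
# jar on the classpath, one of these candidates is the class that actually
# exists: org.apache.solr.handler.extraction.ExtractingRequestHandler.
SUBPACKAGES = ["", "analysis.", "schema.", "handler.", "search.", "update."]

def candidate_class_names(cname: str) -> list:
    """Expand a 'solr.'-prefixed shorthand into fully qualified candidates."""
    if not cname.startswith("solr."):
        return [cname]  # already fully qualified
    short = cname[len("solr."):]
    return ["org.apache.solr." + sub + short for sub in SUBPACKAGES]

names = candidate_class_names("solr.extraction.ExtractingRequestHandler")
assert "org.apache.solr.handler.extraction.ExtractingRequestHandler" in names
```

So the config value is fine as long as the jar is actually on the classpath, which matches Erick's diagnosis.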
lucene merge policy in solr
Hi, In earlier lucene versions, segments are merged periodically according to the merge policy; when a merge is triggered, an indexing request may take a longer time to finish (in my tests it could be delayed 10-30 seconds, depending on indexed data size). I read the solr 3.6 - 4.1 docs, and we have entries in solrconfig.xml to control segment merging. I am wondering if someone can give me a very high-level confirmation: in solr 3.6 - 4.1, can indexing also be delayed when a big merge happens, so that before the merge finishes we cannot index (since the collection is locked)? Thanks very much for helps, Lisheng
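The "occasional long indexing request" pattern asked about above falls out of mergeFactor-style tiered merging: most flushes are cheap, but every so often a flush triggers a cascade of merges. A toy model (this is not Lucene code, just an illustration of the policy's shape):

```python
# Toy mergeFactor model: each "request" flushes a size-1 segment; whenever
# merge_factor segments of the same size tier accumulate, they are merged
# into one larger segment, possibly cascading up through the tiers.
def add_segment(segments, merge_factor=10):
    segments.append(1)              # freshly flushed segment of size 1
    merged = 0                      # how many merge levels this request paid for
    while True:
        size = segments[-1]
        if sum(1 for s in segments if s == size) < merge_factor:
            return merged
        for _ in range(merge_factor):   # merge one full tier...
            segments.remove(size)
        segments.append(size * merge_factor)  # ...into a bigger segment
        merged += 1

segs = []
costly = [i for i in range(1, 1001) if add_segment(segs) > 1]
# only requests 100, 200, ..., 1000 trigger a cascading (multi-level) merge
assert len(costly) == 10
```

This is why a small fraction of indexing requests pay a disproportionate merge cost, which matches the 10-30 second delays described in the message.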
RE: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
Thanks very much, it worked perfectly !! Best regards, Lisheng -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, February 08, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true) (Sorry for my split message)... See the text_en_splitting field type for an example: ... -- Jack Krupansky -Original Message- From: Zhang, Lisheng Sent: Friday, February 08, 2013 3:20 PM To: solr-user@lucene.apache.org Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true) Hi, In our application we need to call method setAutoGeneratePhraseQueries(true) on lucene QueryParser, this is the way used to work in earlier versions and it seems to me that is the much natural way? But in current solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (if I read souce code correctly), but I donot want to do so because this will change the whole behavior of lucene, and I only want to change this query parser behavior, not other lucene features? Please guide me if there is a better way other than to change solr source code? Thanks very much for helps, Lisheng
RE: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
Thanks very much for your valuable help, it worked perfectly !!! Lisheng -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, February 08, 2013 12:54 PM To: solr-user@lucene.apache.org Subject: Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true) Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" to all of your "text" field types in your schema.xml. See the text_en_splitting field type for an example: ... -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Friday, February 08, 2013 3:51 PM To: solr-user@lucene.apache.org Subject: Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true) Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" to all of your "text" field types in your schema.xml. See the text_ -- Jack Krupansky -Original Message- From: Zhang, Lisheng Sent: Friday, February 08, 2013 3:20 PM To: solr-user@lucene.apache.org Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true) Hi, In our application we need to call method setAutoGeneratePhraseQueries(true) on lucene QueryParser, this is the way used to work in earlier versions and it seems to me that is the much natural way? But in current solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (if I read souce code correctly), but I donot want to do so because this will change the whole behavior of lucene, and I only want to change this query parser behavior, not other lucene features? Please guide me if there is a better way other than to change solr source code? Thanks very much for helps, Lisheng
Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
Hi, In our application we need to call the method setAutoGeneratePhraseQueries(true) on the lucene QueryParser; this is the way it used to work in earlier versions, and it seems to me the more natural way. But in current solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (if I read the source code correctly). I do not want to do that because it changes the whole behavior of lucene, and I only want to change this query parser behavior, not other lucene features. Is there a better way other than changing the solr source code? Thanks very much for helps, Lisheng
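What the `autoGeneratePhraseQueries` attribute from Jack's answer actually changes can be illustrated with a small sketch (this is not Solr's parser; the "wi-fi" tokenization is a hypothetical example of a WordDelimiterFilter-style split):

```python
# When a single whitespace-free query term analyzes into several tokens,
# the parser either wraps them in one phrase query (autoGeneratePhraseQueries
# = true) or leaves them as independent clauses (false, the newer default).
def to_query(tokens, auto_phrase):
    if len(tokens) > 1 and auto_phrase:
        return '"' + " ".join(tokens) + '"'   # phrase: tokens must be adjacent
    return " OR ".join(tokens)                # independent SHOULD clauses

tokens = ["wi", "fi"]  # what an analyzer might emit for the input "wi-fi"
assert to_query(tokens, auto_phrase=True) == '"wi fi"'
assert to_query(tokens, auto_phrase=False) == "wi OR fi"
```

Setting the attribute per fieldType in schema.xml, as Jack suggests, scopes the change to query parsing for those fields instead of reverting all of Lucene's version-dependent behavior via luceneMatchVersion.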
RE: Solr exception when parsing XML
Hi, Thanks very much for helps! I checked solr source code, what happened is that for XML text inside one element, solr does not call URLDecoder (but to pass CTRL character, I have to call urlencode from PHP). So either I try to remove CTRL character from PHP side, or I change solr XMLReader slightly to call URLDecoder on text. Thanks and best regards, Lisheng -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Wednesday, January 16, 2013 2:41 PM To: solr-user@lucene.apache.org Subject: RE: Solr exception when parsing XML In Apache Nutch we strip non-character code points with a simple method. Check the patch, the relevant part is easily ported to any language: https://issues.apache.org/jira/browse/NUTCH-1016 -Original message- > From:Zhang, Lisheng > Sent: Wed 16-Jan-2013 20:48 > To: solr-user@lucene.apache.org > Subject: RE: Solr exception when parsing XML > > Hi Alex, > > Thanks very much for helps! I switched to (I am using PHP in client side) > > createTextNode(urlencode($value)) > > so CTRL character problem is avoided, but I noticed that somehow solr did > not perform urldecode($value), so my initial value > > abc xyz > > becomes > > abc+xyz > > I have not fully read through solr code on this part, but guess maybe it > is a configuration issue (when using CDATA I donot have this issue)? > > Thanks and best regards, Lisheng > > -Original Message- > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] > Sent: Tuesday, January 15, 2013 12:56 PM > To: solr-user@lucene.apache.org > Subject: Re: Solr exception when parsing XML > > > Interesting point. Looks like CDATA is more limiting than I thought: > http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the > recommendation is to avoid CDATA and automatically encode characters such > as yours, as well as less/more and ampersand. > > Regards, >Alex. >
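The "remove the CTRL character on the client side" option mentioned above (the same idea as the NUTCH-1016 patch Markus points to) can be sketched as follows. The thread's clients are PHP/Java, so this Python version is only an illustration of the idea, using the XML 1.0 legal-character ranges:

```python
import re

# Strip characters that are not legal in XML 1.0 (e.g. chr(31)) before the
# document is serialized, so Solr's XMLLoader never sees them. Legal chars:
# tab, LF, CR, and the ranges 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF.
_ILLEGAL_XML = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_illegal_xml_chars(text: str) -> str:
    return _ILLEGAL_XML.sub("", text)

assert strip_illegal_xml_chars("word1" + chr(31) + "word2") == "word1word2"
assert strip_illegal_xml_chars("plain\ttext") == "plain\ttext"  # tab is legal
```

Unlike the urlencode workaround tried in the thread, this keeps ordinary text (including spaces) untouched.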
RE: Solr exception when parsing XML
Hi Alex, Thanks very much for the help! I switched to (I am using PHP on the client side) createTextNode(urlencode($value)) so the CTRL character problem is avoided, but I noticed that somehow solr did not perform urldecode($value), so my initial value abc xyz becomes abc+xyz I have not fully read through the solr code on this part, but I guess maybe it is a configuration issue (when using CDATA I do not have this issue)? Thanks and best regards, Lisheng -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Tuesday, January 15, 2013 12:56 PM To: solr-user@lucene.apache.org Subject: Re: Solr exception when parsing XML Interesting point. Looks like CDATA is more limiting than I thought: http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the recommendation is to avoid CDATA and automatically encode characters such as yours, as well as less/more and ampersand. Regards, Alex.
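The "abc xyz becomes abc+xyz" symptom above is not a Solr configuration issue: URL encoding turns spaces into `+`, and element text in the update XML is never URL-decoded by Solr, so the `+` is indexed literally. The same behavior can be reproduced with Python's stdlib (PHP's urlencode/urldecode behave the same way for spaces):

```python
from urllib.parse import quote_plus, unquote_plus

# quote_plus mimics PHP's urlencode: spaces become '+'. Since Solr's XML
# loader does not URL-decode element text, the '+' survives into the index.
encoded = quote_plus("abc xyz")
assert encoded == "abc+xyz"                 # what Solr receives in the element
assert unquote_plus(encoded) == "abc xyz"   # the decode step Solr never runs
```

This is why stripping the illegal characters instead of URL-encoding the whole value is the cleaner fix.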
Solr exception when parsing XML
Hi, I got a SolrException when submitting XML for indexing (using solr 3.6.1): Jan 15, 2013 10:22:42 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code 31)) at [row,col {unknown-source}]: [2,1169] at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81) Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 31)) ... at [row,col {unknown-source}]: [2,1169] at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660) at com.ctc.wstx.sr.BasicStreamReader.readCDataPrimary(BasicStreamReader.java:4240) at com.ctc.wstx.sr.BasicStreamReader.nextFromTreeCommentOrCData(BasicStreamReader.java:3280) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2824) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79) I checked the details; the data causing trouble is word1chr(31)word2 where both word1 and word2 are normal English characters and "chr(31)" is just the return value of the PHP function chr(31). Our XML is well constructed and the encoding/charset are well defined. The problem is due to chr(31); if I replace it with another UTF-8 character, indexing is OK. I checked the source code of com.ctc.wstx.sr.BasicStreamReader.java, and it seems that by design no CTRL character is allowed inside CDATA text, but I am puzzled how we could avoid CTRL characters in text in general (sure, it is not a common occurrence, but it can still happen)? Thanks very much for helps, Lisheng
RE: theory of sets
Hi, Just thought of this possibility: I think dynamic fields are a solr concept; on the lucene level all fields are the same, but at initial startup lucene should load all field information into memory (not the field data, but the schema). If we have too many fields (like *_my_fields, * => a1, a2, ...), does this take too much memory and slow down performance (even if very few fields are really used)? Best regards, Lisheng -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Monday, January 07, 2013 2:57 PM To: solr-user@lucene.apache.org Subject: Re: theory of sets Dynamic fields resulted in poor response times? How many fields did each document have? I can't see how a dynamic field should have any difference from any other field in terms of response time. Or are you querying across a large number of dynamic fields concurrently? I can imagine that slowing things down. Upayavira On Mon, Jan 7, 2013, at 05:18 PM, Uwe Reh wrote: > Hi Robi, > > thank you for the contribution. It's exciting to read that your index > isn't contaminated by the number of fields. I can't exclude other > mistakes, but my first experiences with extensive use of dynamic fields > have been very poor response times. > > Even though I found another solution, I should give the straightforward > solution a second chance. > > Uwe > > On 07.01.2013 17:40, Petersen, Robert wrote: > > Hi Uwe, > > > > We have hundreds of dynamic fields but since most of our docs only use some > > of them it doesn't seem to be a performance drag. They can be viewed as a > > sparse matrix of fields in your indexed docs. Then if you make the > > sortinfo_for_groupx an int then that could be used in a function query to > > perform your sorting. See http://wiki.apache.org/solr/FunctionQuery >
RE: File content indexing
Hi Erik, I really meant to send this message earlier, I read code and tested, your suggestion solved my problem, really appreciate! Thanks very much for helps, Lisheng -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Tuesday, September 18, 2012 5:04 PM To: solr-user@lucene.apache.org Subject: Re: File content indexing Solr Cell can already do this. See the stream.file parameter and content steam info on the wiki. Erik On Sep 18, 2012, at 19:56, "Zhang, Lisheng" wrote: > Hi, > > Sorry I just sent out an unfinished message! > > Reading Solr cell, we indexing a file by first upload it through HTTP to > solr, in my > experience it is rather expensive to pass a big file through HTTP. > > If the file is local, maybe the better way is to pass file path to solr so > that solr can > use java.io API to get file content, maybe this can be much faster? > > I am thinking to change solr a little to do, do you think this is a sensible > thing to > do (I know how to do, but not sure it can improve performance significantly)? > > Thanks very much for helps, Lisheng
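The `stream.file` approach Erik points to above avoids pushing the file body over HTTP: the request carries only the local path, and Solr reads the file itself. A sketch of the request construction (the host, core path, document id, and file path are made-up example values; `stream.file` and `literal.*` are the documented content-stream/Solr Cell parameters, and remote streaming must be enabled via `enableRemoteStreaming="true"` in solrconfig.xml):

```python
from urllib.parse import urlencode

# Build an extract request that passes a local file path instead of the file
# body. Solr opens the path with local I/O, so only the URL travels over HTTP.
def extract_url(solr_base: str, path: str, doc_id: str) -> str:
    params = {"stream.file": path, "literal.id": doc_id, "commit": "true"}
    return solr_base + "/update/extract?" + urlencode(params)

url = extract_url("http://localhost:8080/solr", "/data/docs/report.pdf", "doc1")
assert "stream.file=%2Fdata%2Fdocs%2Freport.pdf" in url
```

Note that this only helps when the file really is local to (or mounted on) the Solr host, which is the scenario described in the message.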
File content indexing
Hi, Sorry, I just sent out an unfinished message! Reading about Solr Cell, we index a file by first uploading it through HTTP to solr; in my experience it is rather expensive to pass a big file through HTTP. If the file is local, maybe the better way is to pass the file path to solr so that solr can use the java.io API to get the file content; maybe this can be much faster? I am thinking of changing solr a little to do this; do you think this is a sensible thing to do (I know how to do it, but am not sure it can improve performance significantly)? Thanks very much for helps, Lisheng
File content indexing
Hi, Reading Solr cell, we indexing a file by first upload it through HTTP to solr, in my experience it is rather expensive to pass a big file through HTTP. If the file is local, maybe the better way is to pass file path to solr so that solr can use java.io API to get file content, maybe this can be much faster? I am thinking to change solr a little to do
RE: In multi-core, special dataDir is not used?
Thanks very much for your quick guidance, which is very helpful! Lisheng -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, September 17, 2012 6:30 PM To: solr-user@lucene.apache.org Subject: Re: In multi-core, special dataDir is not used? : I can't reproduce the problem you are seeing -- can you please provide : more details.. Correction: i can reproduce this. This was in fact some odd behavior in the 1.x and 3.x lines that has been changed for 4.x in SOLR-1897. If you had no <dataDir/> in your solrconfig.xml, or if you had a *blank* <dataDir></dataDir>, then prior to 4.x the dataDir option specified when CREATEing a core would override the default -- but if you had any real path specified, then it would trump anything specified at runtime. The "workaround" i believe (but i haven't tested exhaustively) for 3.4-3.6.1 is not to specify a hardcoded dataDir in your solrconfig.xml, but instead specify a property with a "default" value for the dataDir that can be overridden, and then use that property when issuing the CREATE command, ie... <dataDir>${yourPropertyName:/some/default/path}</dataDir> ?action=CREATE&name=yourCoreName&instanceDir=yourCoreDir&property.yourPropertyName=/override/path -Hoss
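Hoss's workaround above boils down to one CoreAdmin CREATE call whose `property.*` parameter overrides the property used inside `<dataDir>`. A sketch of building that request (host, core name, and paths are example values taken from his message; `action`, `name`, `instanceDir`, and `property.<name>` are the CoreAdmin parameters he shows):

```python
from urllib.parse import urlencode

# Build the CoreAdmin CREATE URL for the dataDir-property workaround:
# solrconfig.xml declares <dataDir>${yourPropertyName:/some/default/path}</dataDir>
# and the CREATE request supplies the per-core override.
def create_core_url(admin_base: str, core: str,
                    instance_dir: str, data_dir: str) -> str:
    params = {
        "action": "CREATE",
        "name": core,
        "instanceDir": instance_dir,
        "property.yourPropertyName": data_dir,  # overrides the default path
    }
    return admin_base + "?" + urlencode(params)

url = create_core_url("http://localhost:8080/solr/admin/cores",
                      "whatever3", "whatever3", "/override/path")
assert "property.yourPropertyName=%2Foverride%2Fpath" in url
```

Cores created without the `property.yourPropertyName` parameter simply fall back to the default path declared in solrconfig.xml.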
In multi-core, special dataDir is not used?
Hi, I am using solr 3.6.1. I created a new core "whatever3" dynamically, and I see solr.xml updated as: ... But when I update data via "http://localhost:8080/solr/whatever3/update?commit=true", the data did not go to the newly specified dataDir (I can see from the log that core "whatever3" is apparently used). The only way to make it work is NOT to define dataDir in solrconfig.xml; is this by design, or did I miss something? Thanks very much for helps, Lisheng
RE: SolrCloud vs SolrReplication
Hi Erick, You mentioned that "it'll automatically use old-style replication to do the bulk synchronization" in solr cloud, so it uses HTTP for replication as in 3.6; does this mean the synchronization in solrCloud is not real time (has to have some delay)? Thanks very much for helps, Lisheng -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, September 08, 2012 1:44 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud vs SolrReplication See inline On Sat, Sep 8, 2012 at 1:09 AM, thaihai wrote: > Hi All, > > I'm a little bit confused about the new cloud functionalities. > > some questions: > > 1) is it possible to use the old-style solr replication in solr4 (meaning not > using solrcloud, not starting with zk params)? > Yes. If you use SolrCloud (the Zookeeper options), you don't need to set up replication. But if you don't use SolrCloud it's just like it was in 3.x. > 2) in our production environment we use solr 3.6 with solr replication. we > have 1 index server and 2 front (slave) servers. one webapp uses both > front servers for searching. another application pushes index requests to the > index server. the app has queueing, so we don't need HA here. > if we make index (schema) changes or need to scratch and reindex the whole > index we do the following scenario: > 1 remove replication for both front servers > 2 scratch index server > 3 reindex index server > 4 remove front 1 server from web app (at this point webapp uses only front 2 > for searches) > 5 scratch front 1 > 6 enable front 1 replication > 7 test front 1 server with searches over the lucene admin ui on front 1 > 8 if all correct, enable front 1 for web app > 9 do the same with the second slave as at point 4 > > so, my problem is to do the same functionality with solr cloud ? > > suppose i have a 2-shard cluster with replicas. how can i make a complete > reindex with no effect on the web app during the index process ?
and i > will check the rebuild before i approve the new index to the web app. ??? > > any ideas or tips ? > > sorry for the bad english > > I'm not entirely sure about this, meaning I haven't done it personally. But I think you can do this... Let's take the simple 2-shard case, each with a leader and replica. Take one machine out of each slice (or have two other machines you can use). Make your schema changes and re-index to these non-user-facing machines. These are now a complete new index of two shards. Now point your user traffic to these new indexes (they are SolrCloud machines). Now simply scratch your old machines and bring them up in the same cluster as the two new machines, and SolrCloud will automatically 1> assign them as replicas of your two shards appropriately 2> synchronize the index (actually, it'll automatically use old-style replication to do the bulk synchronization, you don't have to configure anything). 3> route searches to the new replicas as appropriate. You really have to forget most of what you know about Solr replication when moving to the Solr Cloud world, it's all magic ... Best Erick > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SolrCloud-vs-SolrReplication-tp4006327.html > Sent from the Solr - User mailing list archive at Nabble.com.
RE: Bulk Indexing
Hi, Previously I asked a similar question and I have not fully implemented it yet. My plan is: 1) use Solr only for search, not for indexing 2) have a separate java process to index (calling the lucene API directly; maybe it can call the Solr API, I need to check more details). As other people pointed out earlier, the problem with the above plan is that Solr does not know when to reload the IndexSearcher (namely the underlying IndexReader) after indexing is done, since the indexer and Solr are two separate processes. My plan is to have Solr not cache any IndexReader (each time when performing a search, just create a new IndexSearcher), because: 1) our app is made of many lucene indexed data folders (in Solr language, many cores), so caching IndexSearchers would be too expensive. 2) in my experience, search is still quite fast without caching (this is maybe partially due to the fact that our indexed data is not large, per folder). This is just my plan (not fully implemented yet). Best regards, Lisheng -Original Message- From: Sohail Aboobaker [mailto:sabooba...@gmail.com] Sent: Friday, July 27, 2012 6:56 AM To: solr-user@lucene.apache.org Subject: Bulk Indexing Hi, We have created a search service which is responsible for providing an interface between Solr and the rest of our application. It basically takes one document at a time and updates or adds it to the appropriate index. Now, in the application, we have processes that add products (our documents are based on products) in bulk using a data bulk load process. At this point, we use the same search service to add the documents in a loop. These can be up to 20,000 documents in one load. In a recent solr user discussion, it seems like this is a "no-no" strategy with red flags all around it. What are other alternatives? Thanks, Regards, Sohail Aboobaker.
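The usual alternative to Sohail's one-document-at-a-time loop is batching: group the 20,000 documents into fixed-size batches and hand each batch to the indexing service in a single add call. A minimal sketch (the `send_batch` function is a stand-in for whatever the real search service exposes, not an actual API from the thread):

```python
# Batch a bulk load so 20,000 documents go out as a handful of requests
# instead of 20,000 individual HTTP round trips.
def in_batches(docs, batch_size=1000):
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

sent = []
def send_batch(batch):          # placeholder for the real indexing call
    sent.append(len(batch))

for batch in in_batches([{"id": i} for i in range(20_000)], batch_size=1000):
    send_batch(batch)

assert sent == [1000] * 20      # 20 requests of 1000 docs each
```

Committing once at the end of the load (rather than per document or per batch) is the other half of the usual advice.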
RE: Bulk indexing data into solr
Hi, I really appreciate your quick help! 1) I want to let solr not cache any IndexReader (hopefully it is possible), because our app is made of many lucene folders and each of them is not very large; from my previous tests it seems that performance is fine if we just create a new IndexReader each time. Hopefully doing it this way we have no sync issue? 2) Our data is mainly in an RDB (currently in mySQL, and will move to Cassandra later). My main concern is that by using Solr we need to pass a rather large amount of data through the network layer via HTTP, which could be a problem? Best regards, Lisheng -Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, July 26, 2012 12:46 PM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr IIRC, a problem with such a scheme was discussed here about two months ago, but I can't remember the exact details. The scheme is generally correct. But you didn't tell how you let solr know that it needs to reread the new index generation after the indexer fsyncs segments.gen. btw, it might be a possible issue: https://lucene.apache.org/core/old_versioned_docs//versions/3_0_1/api/all/org/apache/lucene/index/IndexWriter.html#commit() Note that this operation calls Directory.sync on the index files. That call should not return until the file contents & metadata are on stable storage. For FSDirectory, this calls the OS's fsync. But, beware: some hardware devices may in fact cache writes even during fsync, and return before the bits are actually on stable storage, to give the appearance of faster performance. You should ensure that after segments.gen is fsync'ed, all other index files are fsynced for other processes too. Could you tell more about your data: what's the format? Are they located locally relative to the indexer? And why can't you use remote streaming via Solr's update handler, or an indexer client app with StreamingUpdateSolrServer?
On Thu, Jul 26, 2012 at 10:47 PM, Zhang, Lisheng < lisheng.zh...@broadvision.com> wrote: > Hi, > > I think at least before lucene 4.0 we can only allow one process/thread to > write on a lucene folder. Based on this fact my initial plan is: > > 1) There is one set of lucene index folders. > 2) The Solr server only performs queries on those folders > 3) Have a separate process (multi-threaded) to index those lucene folders > (each folder is a separate app). Only one thread will index one given lucene > folder. > > Thanks very much for helps, Lisheng > > > -Original Message- > From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] > Sent: Thursday, July 26, 2012 10:15 AM > To: solr-user@lucene.apache.org > Subject: Re: Bulk indexing data into solr > > > Coming back to your original question. I'm puzzled a little. > It's not clear where you want to call the Lucene API directly from. > If you mean that you have a standalone indexer which writes index files, then > stops, and these files become available to the Solr process, it will work. > Sharing an index between processes, or using EmbeddedServer, is looking for > problems (although Lucene has a Locks mechanism, which I'm not completely > familiar with). > I conclude that your data for indexing is collocated with the solr > server. In this case consider > http://wiki.apache.org/solr/ContentStream#RemoteStreaming > > Please give more details about your design. > > On Thu, Jul 26, 2012 at 1:22 PM, Zhang, Lisheng < > lisheng.zh...@broadvision.com> wrote: > > > > > Hi, > > > > I am starting to use solr, and now I need to index a rather large amount of > > data. It seems > > that calling solr to pass data through HTTP is rather inefficient, so I am > > thinking to still call the > > lucene API directly for bulk indexing but use solr for search; is this > > design OK?
> > > > Thanks very much for helps, Lisheng > > > > > > > -- > Sincerely yours > Mikhail Khludnev > Tech Lead > Grid Dynamics > > <http://www.griddynamics.com> > > -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics <http://www.griddynamics.com>
RE: Bulk indexing data into solr
Hi, I think at least before lucene 4.0 we can only allow one process/thread to write on a lucene folder. Based on this fact my initial plan is: 1) There is one set of lucene index folders. 2) The Solr server only performs queries on those folders 3) Have a separate process (multi-threaded) to index those lucene folders (each folder is a separate app). Only one thread will index one given lucene folder. Thanks very much for helps, Lisheng -Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, July 26, 2012 10:15 AM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr Coming back to your original question. I'm puzzled a little. It's not clear where you want to call the Lucene API directly from. If you mean that you have a standalone indexer which writes index files, then stops, and these files become available to the Solr process, it will work. Sharing an index between processes, or using EmbeddedServer, is looking for problems (although Lucene has a Locks mechanism, which I'm not completely familiar with). I conclude that your data for indexing is collocated with the solr server. In this case consider http://wiki.apache.org/solr/ContentStream#RemoteStreaming Please give more details about your design. On Thu, Jul 26, 2012 at 1:22 PM, Zhang, Lisheng < lisheng.zh...@broadvision.com> wrote: > > Hi, > > I am starting to use solr, and now I need to index a rather large amount of > data. It seems > that calling solr to pass data through HTTP is rather inefficient, so I am > thinking to still call the > lucene API directly for bulk indexing but use solr for search; is this > design OK? > > Thanks very much for helps, Lisheng > >
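The "only one thread writes a given lucene folder" constraint in the plan above can be enforced in the indexer process with a per-folder lock registry. This is an in-process sketch only (Lucene itself additionally guards each directory with a write-lock file, which throws if a second writer opens it rather than waiting):

```python
import threading

# One lock per index folder: at most one indexing thread touches a given
# Lucene directory at a time, matching the single-writer constraint.
_folder_locks = {}
_registry_lock = threading.Lock()

def lock_for(folder: str) -> threading.Lock:
    with _registry_lock:                      # guard the registry itself
        return _folder_locks.setdefault(folder, threading.Lock())

def index_into(folder: str, docs) -> int:
    with lock_for(folder):                    # single writer per folder
        return len(docs)                      # stand-in for IndexWriter work

assert index_into("/indexes/app1", ["a", "b"]) == 2
```

Different folders use different locks, so the multi-threaded indexer can still work on many folders concurrently, which is exactly the layout described in point 3.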
RE: Bulk indexing data into solr
Thanks very much, both your and Rafal's advice are very helpful! -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, July 26, 2012 8:47 AM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr On 7/26/2012 7:34 AM, Rafał Kuć wrote: > If you use Java (and I think you do, because you mention Lucene) you > should take a look at StreamingUpdateSolrServer. It not only allows > you to send data in batches, but also index using multiple threads. A caveat to what Rafał said: The streaming object has no error detection out of the box. It queues everything up internally and returns immediately. Behind the scenes, it uses multiple threads to send documents to Solr, but any errors encountered are simply sent to the logging mechanism, then ignored. When you use HttpSolrServer, all errors encountered will throw exceptions, but you have to wait for completion. If you need both concurrent capability and error detection, you would have to manage multiple indexing threads yourself. Apparently there is a method in the concurrent class that you can override and handle errors differently, though I have not seen how to write code so your program would know that an error occurred. I filed an issue with a patch to solve this, but some of the developers have come up with an idea that might be better. None of the ideas have been committed to the project. https://issues.apache.org/jira/browse/SOLR-3284 Just an FYI, the streaming class was renamed to ConcurrentUpdateSolrServer in Solr 4.0 Alpha. Both are available in 3.6.x. Thanks, Shawn
Bulk indexing data into solr
Hi,

I am starting to use Solr, and now I need to index a rather large amount of data. It seems that passing data to Solr over HTTP is rather inefficient, so I am thinking of still calling the Lucene API directly for bulk indexing but using Solr for search. Is this design OK?

Thanks very much for helps, Lisheng
RE: Could I use Solr to index multiple applications?
Yury and Shashi,

Thanks very much for helps! I am studying the options you pointed out (Solr multiple cores and ElasticSearch).

Best regards, Lisheng

-----Original Message-----
From: Yury Kats [mailto:yuryk...@yahoo.com]
Sent: Tuesday, July 17, 2012 7:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Could I use Solr to index multiple applications?

On 7/17/2012 9:26 PM, Zhang, Lisheng wrote:
> Thanks very much for quick help! Multicore sounds interesting.
> I roughly read the doc: we need to put each core name into the
> Solr config XML, so if we add another core and change the XML,
> do we need to restart Solr?

You can add/create cores on the fly, without restarting. See
http://wiki.apache.org/solr/CoreAdmin#CREATE
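The CoreAdmin CREATE call Yury links to is a plain HTTP request, so creating a core on the fly amounts to building one URL. A small sketch of that, with the host, port, and directory names being assumptions for illustration (your instance layout will differ):

```python
from urllib.parse import urlencode

def core_create_url(solr_base, name, instance_dir,
                    config="solrconfig.xml", schema="schema.xml"):
    """Build the CoreAdmin CREATE URL that adds a new core without
    restarting Solr. The actual call would be an HTTP GET of this URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "config": config,       # relative to the core's conf/ directory
        "schema": schema,
    }
    return solr_base.rstrip("/") + "/admin/cores?" + urlencode(params)

# Hypothetical usage: add a core named "core1" living in instanceDir "core1".
url = core_create_url("http://localhost:8983/solr", "core1", "core1")
```

Fetching the returned URL (for example with `urllib.request.urlopen`) registers the core immediately, which is what makes per-application cores practical without a restart.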
RE: Could I use Solr to index multiple applications?
Thanks very much for quick help! Multicore sounds interesting. I roughly read the doc: we need to put each core name into the Solr config XML, so if we add another core and change the XML, do we need to restart Solr?

Best regards, Lisheng

-----Original Message-----
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi Kant
Sent: Tuesday, July 17, 2012 5:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Could I use Solr to index multiple applications?

Look up multicore Solr. Another choice could be ElasticSearch, which IMO is more straightforward at managing multiple indexes.

On Tue, Jul 17, 2012 at 7:53 PM, Zhang, Lisheng wrote:
> Hi,
>
> We have an application where we index data into many different
> directories (each directory corresponds to a different Lucene
> IndexSearcher).
>
> Looking at the Solr config it seems that Solr expects only one indexed
> data directory. Can we use Solr for our application?
>
> Thanks very much for helps, Lisheng
Could I use Solr to index multiple applications?
Hi,

We have an application where we index data into many different directories (each directory corresponds to a different Lucene IndexSearcher).

Looking at the Solr config it seems that Solr expects only one indexed data directory. Can we use Solr for our application?

Thanks very much for helps, Lisheng