Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
Yes, I did this, and the words with umlauts went through the StopFilter. The ones without umlauts were correctly removed.

On Thu, Nov 8, 2012 at 2:22 AM, Lance Norskog goks...@gmail.com wrote:

You can debug this with the 'Analysis' page in the Solr UI. You pick 'text_general' and then enter words with umlauts in the text boxes for indexing and queries. Lance
Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
When I look at the text_de fieldType provided in the example schema, I can see:

  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/>
  <filter class="solr.GermanNormalizationFilterFactory"/>
  <filter class="solr.GermanLightStemFilterFactory"/>

I have tried this, and it removed the words with umlauts. It seems that is because of format="snowball". I hadn't used that, because I thought I had one word per line. But maybe some invisible characters got into my stopwords file and broke it. (The snowball format allows comments after '|' and multiple words per line; with the default format, a line like 'würde | would' is read as one single stopword and never matches.) Thanks. Daniel

On Thu, Nov 8, 2012 at 10:36 AM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Yes, I did this, and the words with umlauts went through the StopFilter. The ones without umlauts were correctly removed.
Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
I trust the 'file' command output, and since it reports UTF-8 Unicode I believe the encoding is correct. Don't know if this is the 'correct answer' for you ;) BTW: it works locally, but not with ZK. So it's maybe more a ZK issue, which somehow corrupts my file. Will check.

On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir rcm...@gmail.com wrote:

On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge daniel.brue...@googlemail.com wrote:
Hi, I am running a SolrCloud cluster with version 4.0.0. I have a stopwords file which is in the correct encoding.

What makes you think that? Note: "Because I can read it" is not the correct answer. Ensure any of your stopwords files etc. are in UTF-8. This is often different from the encoding your computer uses by default if you open a file, start typing in it, and press save.
Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
Weird: if I print the file contents in ZK with 'get', it returns, for example,

  w??rde | would
  w??rden | would

so the umlauts are not shown. Does anyone have an idea whether this is caused by ZooKeeper's CLI or by the file contents themselves? Thanks & regards.

On Thu, Nov 8, 2012 at 12:24 PM, Daniel Brügge daniel.brue...@googlemail.com wrote:

I trust the 'file' command output, and since it reports UTF-8 Unicode I believe the encoding is correct. BTW: it works locally, but not with ZK. So it's maybe more a ZK issue, which somehow corrupts my file. Will check.
Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
Ah, I have fixed it. It was necessary to import the files into Zookeeper with the file.encoding system property set to UTF-8. Then it worked. Hooray. :) e.g.

  java -Dfile.encoding=UTF-8 -Dbootstrap_confdir=/home/me/myconfdir -Dcollection.configName=config1 -DzkHost=zkhost:2181 -DnumShards=2 -Dsolr.solr.home=/home/me/solr -jar start.jar

On Thu, Nov 8, 2012 at 2:09 PM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Weird: if I print the file contents in ZK with 'get', it returns "w??rde | would" and "w??rden | would", so the umlauts are not shown. Does anyone have an idea whether this is caused by ZooKeeper's CLI or by the file contents themselves?
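[Editor's note: the pitfall behind this fix is easy to reproduce in plain Java. A minimal sketch (the file name is illustrative) contrasting the platform default charset, which -Dfile.encoding controls, with an explicit UTF-8 decoder:

  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Paths;

  public class EncodingCheck {
      public static void main(String[] args) throws Exception {
          byte[] bytes = Files.readAllBytes(Paths.get("my_stopwords.txt"));
          // Decodes with the JVM default (file.encoding); on a non-UTF-8
          // system this mangles the two bytes of 'ü'
          String platformDefault = new String(bytes);
          // Always correct for a UTF-8 file, regardless of -Dfile.encoding
          String utf8 = new String(bytes, StandardCharsets.UTF_8);
          System.out.println("decodings match: " + platformDefault.equals(utf8));
      }
  }

Any code path that reads the stopwords file without naming a charset inherits the JVM default, which is why forcing -Dfile.encoding=UTF-8 at import time made the difference.]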
SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
Hi,

I am running a SolrCloud cluster with version 4.0.0. I have a stopwords file which is in the correct encoding. It contains German umlauts, e.g. 'ü'. I am also running a standalone Zookeeper which holds this stopwords file. In my schema I am using the stopwords file in the standard way:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="my_stopwords.txt" enablePositionIncrements="true" />
    </analyzer>
  </fieldType>

When indexing, I noticed that all stopwords without umlauts are correctly removed, but the ones with umlauts remain.

Is this a problem with ZK or Solr?

Thanks & regards

Daniel
Solrcloud not reachable and after restart just a no servers hosting shard
Hi,

I am running Solrcloud 4.0-BETA, and during the weekend it 'crashed' somehow, so that it wasn't reachable. CPU load was 100%. After a restart I couldn't access the data; it just told me: "no servers hosting shard". Is there a way to get the data back?

Thanks & regards

Daniel
Re: querying using filter query and lots of possible values
Hi, thanks for this hint. Will check this out. Sounds promising. Daniel

On Sat, Jul 28, 2012 at 3:18 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: the list of IDs is constant for a longer time. I will take a look at
: this join thematic.
: Maybe another solution would be to really create a whole new
: collection or set of documents containing the aggregated documents
: (from the ids) from scratch and to execute queries on this collection.
: Then this would take some time, but maybe it's worth it because the
: querying will thank you.

Another avenue to consider...

http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/schema/ExternalFileField.html

...would allow you to map values in your source_id to some numeric values (many to many), and these numeric values would then be accessible in functions -- so you could use something like fq={!frange ...} to select all docs with value 67, where your external file field says that value 67 is mapped to the following thousand source_id values. The external file fields can then be modified at any time just by doing a commit on your index.

-Hoss
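[Editor's note: to make the suggestion concrete, a hedged SolrJ sketch of the query side. The field name group_ef and the value 67 are invented; the schema would need a matching ExternalFileField, backed by an external_group_ef file of key=value lines in the data directory:

  import org.apache.solr.client.solrj.SolrQuery;

  // Select every document whose external-file value for group_ef is 67;
  // the external file maps source_id keys to that group number, e.g.
  //   f47ac10b-58cc-4372-a567-0e02b2c3d479=67
  SolrQuery q = new SolrQuery("*:*");
  q.addFilterQuery("{!frange l=67 u=67}group_ef");

Changing which source_ids belong to group 67 then only means rewriting the external file and committing, not reindexing documents.]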
Deduplication in SolrCloud
Hi,

in my old Solr setup I have used the deduplication feature in the update chain with a couple of fields:

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">signature</str>
      <bool name="overwriteDupes">false</bool>
      <str name="fields">uuid,type,url,content_hash</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

This worked fine. When I now use it in my 2-shard SolrCloud setup while inserting 150,000 documents, I always get an error:

  INFO: end_commit_flush
  Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log
  SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
      at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)

I am inserting the documents via CSV import with curl, split into 50k chunks. Without the dedupe chain, the import finishes after 40 secs. The curl command writes to one of my shards.

Do you have an idea why this happens? Should I reduce the fields to one? I have read that not using the id as the dedupe field could be an issue. I have searched for deduplication with SolrCloud and I am wondering if it is already working correctly; see e.g.
http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html

Thanks & regards

Daniel
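[Editor's note: the chunked CSV import described above might look like the following. This is a hedged sketch: host, file name, and the /update/csv handler path are assumptions based on the stock example solrconfig of that era:

  # one 50k-row chunk per call; issue a commit after the last chunk
  curl "http://shard1:8983/solr/update/csv?commit=false" \
       -H "Content-type: text/csv; charset=utf-8" \
       --data-binary @documents-chunk-01.csv

Holding the commit until the end keeps each chunk cheap; the OOM reported above is about thread exhaustion on the server, not about the curl side.]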
querying using filter query and lots of possible values
Hi,

I am facing the following issue: I have a couple of million documents, which have a field called source_id. I want to retrieve all the documents whose source_id lies in a specific set of values. This set can be pretty big, for example a list of 200 to 2000 source ids.

I was thinking that a filter query could be used, like

  fq=source_id:(1 2 3 4 5 6 ...)

but this reminds me of SQL's WHERE IN (...), which was always a bit slow for a huge number of values.

Another solution that came to my mind was to assign all the documents I want to retrieve a new kind of filter id. So all the documents which I want to analyse get a new id. But I would need to update all the millions of documents for this and assign them a new id, which could take some time.

Can you think of a nicer way to solve this issue?

Regards & greetings

Daniel
Re: querying using filter query and lots of possible values
Hey Chantal,

thanks for your answer. The range queries would not work, because the values are not contiguous; they can be randomly distributed, with gaps. The above was just an example. Excluding is also not a solution, because the list of excluded ids would be even longer.

To be more specific: the IDs are not even integers, but UUIDs. And there are tens of thousands of them. And the document pool contains hundreds of millions of documents.

Thanks. Daniel

On Thu, Jul 26, 2012 at 6:22 PM, Chantal Ackermann c.ackerm...@it-agenten.com wrote:

Hi Daniel,

index the id into a field of type tint or tlong and use a range query (http://wiki.apache.org/solr/SolrQuerySyntax?highlight=%28rangequery%29):

  fq=id:[200 TO 2000]

If you want to exclude certain ids, it might be wiser to simply add an exclusion query in addition to the range query instead of listing all the single values. You will run into problems with too-long request URLs. If you cannot avoid long URLs, you might want to increase maxBooleanClauses (see http://wiki.apache.org/solr/SolrConfigXml/#The_Query_Section).

Cheers, Chantal
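[Editor's note: if the brute-force filter query is kept, the clause can at least be built and sent safely from SolrJ. A hedged sketch (field name from the question, UUIDs and server URL invented), using POST so tens of thousands of values don't overflow the URL length; maxBooleanClauses still applies:

  import java.util.Arrays;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrRequest;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;

  public class SourceIdFilter {
      public static void main(String[] args) throws Exception {
          SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
          List<String> ids = Arrays.asList(
              "f47ac10b-58cc-4372-a567-0e02b2c3d479",
              "9b2e7d42-1f7a-4c3e-8d0a-5a6b7c8d9e0f");
          // Build source_id:("uuid1" OR "uuid2" OR ...)
          StringBuilder fq = new StringBuilder("source_id:(");
          for (int i = 0; i < ids.size(); i++) {
              if (i > 0) fq.append(" OR ");
              fq.append('"').append(ids.get(i)).append('"');
          }
          fq.append(')');
          SolrQuery q = new SolrQuery("*:*");
          q.addFilterQuery(fq.toString());
          // POST keeps the huge clause out of the request URL
          System.out.println(
              server.query(q, SolrRequest.METHOD.POST).getResults().getNumFound());
      }
  }

Since fq results are cached in the filterCache, a repeated list amortizes well as long as the list itself is stable.]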
Is it possible or wise to query multiple cores in parallel in SolrCloud
Hi,

I am playing around with a SolrCloud setup (4 shards) and thousands of cores. I am thinking of executing queries on hundreds of cores at once, like a distributed query. Is this possible at all from the SolrCloud side? And is it wise?

Thanks & regards

Daniel
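[Editor's note: if the target cores are known up front, the classic distributed-search mechanism already fans one request across an explicit list of cores via the shards parameter. A hedged sketch (hosts and core names invented); whether this stays responsive across hundreds of cores is exactly the open question here:

  import org.apache.solr.client.solrj.SolrQuery;

  SolrQuery q = new SolrQuery("*:*");
  // comma-separated core addresses, given without the http:// prefix
  q.set("shards", "host1:8983/solr/core-a,host2:8983/solr/core-b");

The coordinating node issues one sub-request per listed core and merges the results, so the fan-out cost grows linearly with the number of cores.]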
Re: querying using filter query and lots of possible values
Thanks Alexandre,

the list of IDs is constant for a longer time. I will take a look at this join thematic. Maybe another solution would be to really create a whole new collection or set of documents containing the aggregated documents (from the ids) from scratch and to execute queries on this collection. Then this would take some time, but maybe it's worth it because the querying will thank you.

Daniel

On Thu, Jul 26, 2012 at 7:43 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

You can't update the original documents except by reindexing them, so no easy group-assignment option. If you create this 'collection' once but query it multiple times, you may be able to use the SOLR4 join, with the IDs being stored separately and joined on. Still not great, because performance is an issue when mapping on IDs: http://www.lucidimagination.com/blog/2012/06/20/solr-and-joins/ . If the list is some sort of combination of smaller lists, you could probably precompute (at index time) those fragments and do a compound query over them. But if you have to query every time and the list is different every time, that could be complicated.

Regards, Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
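[Editor's note: a hedged sketch of the Solr 4 join idea mentioned above. The field names doc_type, list_name, and member_id are invented: each ID list is indexed as its own small documents in the same index, and the join maps from them onto the main documents at query time, so only the list docs need reindexing when a list changes:

  import org.apache.solr.client.solrj.SolrQuery;

  // list docs:  { doc_type:idlist, list_name:campaign42, member_id:<uuid> }
  // main docs:  { source_id:<uuid>, ... }
  SolrQuery q = new SolrQuery("*:*");
  q.addFilterQuery(
      "{!join from=member_id to=source_id}doc_type:idlist AND list_name:campaign42");

As the linked article notes, joining over tens of thousands of keys has a real per-query cost, so this trades reindexing work for query-time work.]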
Re: querying using filter query and lots of possible values
Exactly. Creating a new index from the aggregated documents is the plan I described above. I don't really know how long this will take for each new index. Hopefully under an hour or so; that would be tolerable. Thanks. Daniel

On Thu, Jul 26, 2012 at 8:47 PM, Chantal Ackermann c.ackerm...@it-agenten.com wrote:

Hi Daniel,

depending on how you decide on the list of ids in the first place, you could also create a new index (core) and populate it with DIH, which would select only documents from your main index (core) in this range of ids. When updating, you could try a delta import. Of course, this is only worth the effort if that core would exist for some time - but you've written that the subset of ids is constant for a longer time.

Just another idea on top ;-)

Chantal
Re: separation of indexes to optimize facet queries without fulltext
Hi Chris,

thanks for the answer. The plan is that for lots of queries I just need faceted values and don't even do a fulltext search. On the other hand, I need the fulltext search for exactly one task in my application: searching documents and returning them. There no faceting at all is needed, only filtering with fields which I also use for the other queries. So if 95% of the queries don't use the fulltext, I thought it would make sense to split them.

Your suggestion to have one main master index and several slave indexes sounds promising. Is it possible to have this replication in SolrCloud, e.g. with different kinds of schemas?

Thanks. Daniel

On Thu, Jul 26, 2012 at 9:05 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: My thought was, that I could separate indexes. So for the facet queries
: where I don't need fulltext search (so also no indexed fulltext field)
: I can use a completely new setup of a sharded Solr which doesn't include
: the indexed fulltext, so the index is kept small containing just the few
: fields I have.

It's definitely doable -- one thing I'm not clear on is why, if your faceting queries don't care about the full text, you would need to leave those small fields in your full index ... is your plan to do faceting and drill-down using the smaller index, but then display docs resulting from those queries by using the same fq params when querying the full index? If so, then it should work; if not, you may not need those fields in that index.

In general there is nothing wrong with having multiple indexes to solve multiple use cases -- an index is usually an inverted denormalization of some structured source data, designed for fast queries/retrieval. If there are multiple distinct ways you want to query/retrieve data that don't lend themselves to the same denormalization, there's nothing wrong with multiple denormalizations.

Something else to consider is an approach I've used many times: having a single index, but using special-purpose replicas. You can have a master index that you update at the rate of change, one set of slaves that are used for one type of query pattern (faceting on X, Y, and Z, for example) and a different set of slaves that are used for a different query pattern (faceting on A, B, and C), so each set of slaves gets a higher cache hit rate than if the queries were randomized across all machines.

-Hoss
separation of indexes to optimize facet queries without fulltext
Hi,

I currently have one big sharded Solr setup storing a couple of million documents, with some 'small' fields and one fulltext field in each doc. The latter blows up the index.

My thought was that I could separate the indexes. For the facet queries where I don't need fulltext search (so also no indexed fulltext field) I could use a completely new setup of a sharded Solr which doesn't include the indexed fulltext, so the index is kept small, containing just the few fields I have.

And for the fulltext queries I would keep the current Solr configuration, which includes, as mentioned above, all the fields incl. the indexed fulltext field.

Is this a normal way of handling these requirements, i.e. having different kinds of Solr configurations for the different needs? The huge redundancy scares me a bit; I would have the fields twice.

Thanks in advance & greetings

Daniel
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
Will check later using different data dirs for the cores on each instance. But because each Solr sits in its own OpenVZ instance (virtual server, respectively), they should be totally separated - at least from my understanding of virtualization. Will check and get back here... Thanks.

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller markrmil...@gmail.com wrote:

That's an interesting data dir location: NativeFSLock@/home/myuser/data/index/write.lock

Where are the other data dirs located? Are you sharing one drive or something? It looks like something already has a writer lock - are you sure another Solr instance is not running somehow?
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
OK, I think I have found it. When starting the 4 Solr instances via start.jar, I always provided the data directory property:

  -Dsolr.data.dir=/home/myuser/data

After removing this it worked fine. What is weird is that all 4 instances are totally separated, so instance-2 should never conflict with instance-1; they could also be on totally different physical servers. Thanks. Daniel

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller markrmil...@gmail.com wrote:

That's an interesting data dir location: NativeFSLock@/home/myuser/data/index/write.lock. Where are the other data dirs located? Are you sharing one drive or something? It looks like something already has a writer lock - are you sure another Solr instance is not running somehow?
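[Editor's note: if a shared property like solr.data.dir cannot be avoided, an alternative is to pin each core to its own dataDir so two cores can never point their write.lock at the same directory. A sketch, assuming a 4.x-style solr.xml core registry; core names and paths are invented:

  <cores adminPath="/admin/cores">
    <core name="123" instanceDir="123/" dataDir="/home/myuser/data/123"/>
    <core name="456" instanceDir="456/" dataDir="/home/myuser/data/456"/>
  </cores>

The per-core dataDir attribute overrides the global solr.data.dir for that core only.]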
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
Aha, OK. That was new to me. Will check this. Thanks.

On Thu, Jun 14, 2012 at 3:52 PM, Yury Kats yuryk...@yahoo.com wrote:

On 6/14/2012 2:05 AM, Daniel Brügge wrote:
Will check later using different data dirs for the cores on each instance. But because each Solr sits in its own OpenVZ instance (virtual server, respectively), they should be totally separated. At least from my understanding of virtualization.

Depending on how your VMs are configured, their filesystems could be mapped to the same place in the host's filesystem. What you describe sounds like this is the case.
LockObtainFailedException after trying to create cores on second SolrCloud instance
Hi,

I am struggling with creating multiple collections on a 4-instance SolrCloud setup:

I have 4 virtual OpenVZ instances, where I have installed SolrCloud on each, and on one a standalone Zookeeper is also running.

Loading the Solr configuration into ZK works fine. Then I start up the 4 instances and everything is running smoothly.

After that I add one core with the name e.g. '123'. This core is correctly visible on the instance I used for creating it. It maps like: '123' -> shard1 -> virtual-instance-1.

After that I create a core with the same name '123' on the second instance. It is created, but an exception is thrown after a while, and the cluster state of the newly created core goes to 'recovering':

  "123":{"shard1":{
      "virtual-instance-1:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "leader":"true",
        "state":"active",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-1:8983_solr",
        "base_url":"http://virtual-instance-1:8983/solr"},
      "virtual-instance-2:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "state":"recovering",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-2:8983_solr",
        "base_url":"http://virtual-instance-2:8983/solr"}}},

The exception thrown on the first virtual instance is:

  Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
  SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
      at org.apache.lucene.store.Lock.obtain(Lock.java:84)
      at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:607)
      at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:58)
      at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
      at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
      at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
      at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
      at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
      at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
      at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
      at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
      at org.eclipse.jetty.server.Server.handle(Server.java:351)
      at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
      at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
      at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
      at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
      at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
      at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
      at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
      at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
BTW: I am running the Solr instances using -Xms512M -Xmx1024M, so not so little memory.

Daniel

On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Hi, I am struggling with creating multiple collections on a 4-instance SolrCloud setup: ...
Re: shard distribution of multiple collections in SolrCloud
Ok, thanks a lot, good to know. BTW: the speed of creating a collection is not the fastest - at least on this server I use (approx. a second) - but this is normal, right?

On Wed, May 23, 2012 at 9:28 PM, Mark Miller markrmil...@gmail.com wrote:

Yeah, currently you have to create the core on each node... we are working on a 'collections' API that will make this a simple one-call operation. We should have this soon.

- Mark

On May 23, 2012, at 2:36 PM, Daniel Brügge wrote:

Hi, I am creating several cores using the following script. I use this for testing SolrCloud and learning about the distribution of multiple collections. ...

- Mark Miller
lucidimagination.com
shard distribution of multiple collections in SolrCloud
Hi,

I am creating several cores using the following script. I use this for testing SolrCloud and learning about the distribution of multiple collections.

  max=500
  for ((i=2; i<=max; ++i)); do
    curl "http://solrinstance1:8983/solr/admin/cores?action=CREATE&name=collection$i&collection=collection$i&collection.configName=myconfig"
  done

I've set up a SolrCloud with 2 shards, each replicated by 2 other instances I start. When I first start the installation I have the default collection1 in place, which is sharded over shard1 and shard2 with 2 leader nodes and 2 nodes which replicate the leaders.

When I run the script above, which calls the CoreAdmin on one of the shards, all the collections are created on only this shard, without a replica. So e.g.

  "collection8":{"shard1":{
      "solrinstance1:8983_solr_collection8":{
        "shard":"shard1",
        "leader":"true",
        "state":"active",
        "core":"collection8",
        "collection":"collection8",
        "node_name":"solrinstance1:8983_solr",
        "base_url":"http://solrinstance1:8983/solr"}}}

I always thought that via Zookeeper these collections are sharded and replicated - or do I need to call the create-core action on each node? But then I need to know about these nodes, right?

Thanks & regards

Daniel
Re: CloudSolrServer not working with standalone Zookeeper
Ok, it seems that a Maven dependency on Zookeeper version 3.3 broke this. Now it connects to the ZK instance. Thanks.

On Mon, May 21, 2012 at 5:31 PM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Thanks for your feedback. I don't know. I've tried just now with the newest trunk version and the embedded ZK on port 9983. The logs of the zk-solr show:

  INFO: Accepted socket connection from /XXX.XXX.XXX.XXX:1055
  May 21, 2012 3:27:34 PM org.apache.zookeeper.server.NIOServerCnxn doIO
  WARNING: EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
  May 21, 2012 3:27:34 PM org.apache.zookeeper.server.NIOServerCnxn closeSock
  INFO: Closed socket connection for client /XXX.XXX.XXX.XXX:1055 (no session established for client)

So in my opinion it can definitely connect to the port, but it closes the connection after the defined timeout (here 1ms):

  Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper MYZKHOST.:9983 within 1 m

Hmm. I also thought that this trivial setup should work. Will check again.

Daniel

On Fri, May 18, 2012 at 4:23 PM, Mark Miller markrmil...@gmail.com wrote:

Seems something is stopping the connection from occurring? Tests are constantly running and doing this using an embedded ZK server - and I know more than a few people using an external ZK setup. I'd have to guess something in your env or URL is causing this?
CloudSolrServer not working with standalone Zookeeper
Hi,

I am just playing around with SolrCloud and have read in articles like
http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/
that it is sufficient to create the connection to the Zookeeper instance and not to the Solr instance.

When I try to connect to my standalone Zookeeper instance (not started within a Solr instance via -DzkRun), I get this error:

  Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper

I also get this error when I try to connect directly to one of the Solr instances. My code looks like this:

  solr = new CloudSolrServer("myzkhost:2181");
  ((CloudSolrServer) solr).setDefaultCollection("collection1");

I am working with the latest Solr trunk version (https://builds.apache.org/view/S-Z/view/Solr/job/Solr-trunk/1855/).

Do I need to start the Zookeeper inside Solr to get this working?

Thanks & regards

Daniel
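[Editor's note: for reference, a self-contained version of the snippet above. Host and collection names are placeholders, and it assumes the SolrJ jar plus a matching ZooKeeper client jar on the classpath, which the follow-up in this thread shows is the critical part:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;

  public class CloudPing {
      public static void main(String[] args) throws Exception {
          // zkHost points at the ZooKeeper ensemble, not at a Solr node
          CloudSolrServer solr = new CloudSolrServer("myzkhost:2181");
          solr.setDefaultCollection("collection1");
          solr.connect(); // fails with a timeout if ZK is unreachable
          long hits = solr.query(new SolrQuery("*:*")).getResults().getNumFound();
          System.out.println(hits + " documents found");
          solr.shutdown();
      }
  }

]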
Re: CloudSolrServer not working with standalone Zookeeper
OK, it's also not working with an internally started Zookeeper.

On Wed, May 16, 2012 at 8:29 PM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Hi, I am just playing around with SolrCloud and have read in articles like http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/ that it is sufficient to create the connection to the Zookeeper instance and not to the Solr instance. When I try to connect to my standalone Zookeeper instance (not started within a Solr instance via -DzkRun), I get this error: Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper ...
Filter facet_fields with Solr similar to stopwords
Hi,

I am using a solr.StopFilterFactory in the query analyzer of a text_general field (here: content). It works fine: when I query the field for a stopword, I get no results.

But I am also doing a facet.field=content call to get the words which are used in the text. What I am trying to achieve is to also filter the stopwords out of the facet_fields, but that is not working. It would only work if the stopwords were also applied during indexing of the text_general field, right? The problem here is that it's too much data to re-index every time I add a new stopword.

My current solution is to 'filter' with code after retrieving the facet_fields from Solr. But is there a Solr-based way to do this more niftily?

Thanks & regards

Daniel
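[Editor's note: the client-side filtering mentioned above could look like this in SolrJ. A hedged sketch; the stopword set and server URL are placeholders:

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FacetStopwordFilter {
      public static void main(String[] args) throws Exception {
          Set<String> stopwords = new HashSet<String>(Arrays.asList("und", "oder", "the"));
          SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
          SolrQuery q = new SolrQuery("*:*");
          q.setFacet(true);
          q.addFacetField("content");
          QueryResponse rsp = server.query(q);
          // drop stopwords from the facet counts after the fact
          for (FacetField.Count c : rsp.getFacetField("content").getValues()) {
              if (!stopwords.contains(c.getName())) {
                  System.out.println(c.getName() + ": " + c.getCount());
              }
          }
      }
  }

One caveat of this approach: if the stopwords occupy the top facet slots, requesting a small facet.limit may return fewer usable values than expected, so the limit should be padded accordingly.]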
Re: solr out of memory
Maybe the index is too big and you need to give the JVM more memory via the -Xmx parameter. See also
http://wiki.apache.org/solr/SolrPerformanceFactors#OutOfMemoryErrors

Daniel

On Tue, Mar 6, 2012 at 10:01 AM, C.Yunqin 345804...@qq.com wrote:

Sometimes when I search for a simple word, like id:chenm, Solr reports an error:

  SEVERE: java.lang.OutOfMemoryError: Java heap space

I do not know why; sometimes the query works fine. Does anyone have an idea? Thanks a lot.
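[Editor's note: for example, when starting Solr with the bundled Jetty; the heap values are illustrative and should be sized to the machine and the index:

  java -Xms512M -Xmx2048M -jar start.jar

]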
Re: Filter facet_fields with Solr similar to stopwords
OK, I've found this posting from 2009:
http://lucene.472066.n3.nabble.com/excluding-certain-terms-from-facet-counts-when-faceting-based-on-indexed-terms-of-a-field-td501104.html

But this facet.field={!terms=WORDTOEXCLUDE}content approach also only shows me the count of the word I want to exclude.

On Tue, Mar 6, 2012 at 11:33 AM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Hi, I am using a solr.StopFilterFactory in the query analyzer of a text_general field (here: content). It works fine: when I query the field for a stopword, I get no results. But I am also doing a facet.field=content call to get the words which are used in the text. What I am trying to achieve is to also filter the stopwords out of the facet_fields, but that is not working. ...
Re: Index-Analyzer on Master with StopFilterFactory and Query-Analyzer on Slave with StopFilterFactory
OK, thanks Erick. Then I won't touch it. I was just wondering if it would make sense. But on the other hand, the schema.xml is also replicated in my setup, so maybe it would really be confusing.

Thanks

Daniel

On Tue, Jan 31, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com wrote:

I think it would be easy to get confused about what was where, resulting in hard-to-track bugs because the config file wasn't what you were expecting.

I also don't understand why you think this is desirable. There might be an infinitesimal saving in memory, due to not instantiating one analysis chain, but I'm not even sure about that. The saving is so tiny that the increased risk of messing up seems far too high a price to pay.

Best
Erick
Index-Analyzer on Master with StopFilterFactory and Query-Analyzer on Slave with StopFilterFactory
Hi,

I am using a 'text_general' fieldType (class solr.TextField) in my schema. And I have a master/slave setup, where I index on the master and read from the slaves. In the text_general field I am using two analyzers: one for indexing and one for querying, both with stopword filters.

What I am wondering is whether it would make sense to have a different schema on the master than on the slaves: just the index analyzer in the master's schema and the query analyzer in the slaves' schema.

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt, stopwords_en.txt" enablePositionIncrements="true" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt, stopwords_en.txt" enablePositionIncrements="true" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

What do you think?

Thanks & best regards

Daniel
Re: Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?
Thanks Erick, I think I will then try to set up 6 slaves and configure them nicely.

Daniel

On Fri, Jan 20, 2012 at 5:38 PM, Erick Erickson erickerick...@gmail.com wrote:

There will be some increased pressure on your resources when replication happens to the slaves. That said, you can also allocate resources differently between the two. For instance, you do not need any memory for the RAMBuffer on the slaves, since you're not indexing. On the master, you don't need any caches to speak of (e.g. filterCache), because you're not doing any searching. And let's assume you're pushing structured documents, e.g. Word or PDF docs, to Solr. Those are resource-intensive things to parse and index, so they wouldn't compete with search requests on the slaves.

Having an index big enough that it requires sharding is almost a sure sign that trying to index and search on the same box containing the shard is going to cause trouble. Bottom line: you have many fewer options for tuning the search process. Usually people only have relatively small indexes on a single box for both searching and indexing.

Best
Erick
Re: Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?
Erick, yes, currently I have 6 shards, which accept writes and reads. Sometimes I delete data from all 6 and try to balance them, i.e. fill them up so they have approx. the same amount of data. So all 6 are 'in motion' somehow.

I would like the writing to take place more often than it does now, but after a write the querying slows down, so I reduce writing to every n hours. So I thought maybe it would make sense to add 6 slave shards. But what I don't know is whether the slave shards also suffer after a replication, so that querying there takes some time too. I had a master/slave setup before, but without sharding - only one big master and one slave. And after a replication it took a couple of minutes to get back to proper performance.

Daniel

On Fri, Jan 20, 2012 at 3:05 AM, Erick Erickson erickerick...@gmail.com wrote:

It's generally recommended that you do the indexing on the master and searches on the slaves. In that case, firstSearcher and newSearcher sections are irrelevant on the master and shouldn't be there.

I don't understand why you would need 5 more machines; are you sharding?

Best
Erick
Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?
Hi,

I am currently running multiple Solr instances and often write data to them. I also query them. Both work fine right now, because I don't have that many search requests.

For querying, I noticed that the firstSearcher and newSearcher static warming with one facet query really brings a performance boost. But the downside is that writing is now really slow.

Does it make sense at all to place firstSearcher and newSearcher listeners on a Solr server which gets lots of writes? Or is the better strategy to introduce some slave servers where these event listeners are configured, and keep them away from the master? The thing is that I would need 6 additional Solr slaves if I picked this approach. :)

What do you think? Thanks.

Daniel
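[Editor's note: for context, the kind of static warming being discussed is configured in solrconfig.xml roughly like this; a sketch, where the facet field name is an assumption:

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.field">category</str>
      </lst>
    </arr>
  </listener>

A matching newSearcher listener runs the same queries whenever a commit opens a new searcher, which is exactly why frequent writes pay the warming cost over and over.]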
Re: Can Apache Solr Handle TeraByte Large Data
Hi, it's definitely a problem to store 5TB in Solr without using sharding. I try to split data over Solr instances so that each index fits in the memory of its server. I ran into trouble with a Solr instance using a 50G index.

Daniel

On Jan 13, 2012, at 1:08 PM, mustafozbek wrote:

I have been an Apache Solr user for about a year. I have used Solr for simple search tools, but now I want to use Solr with 5TB of data. I assume that the 5TB of data will become 7TB once Solr indexes it, according to the filters I use. I will then add nearly 50MB of data per hour to the same index.

1. Are there any problems using a single Solr server with 5TB of data (without shards)?
   a. Can the Solr server answer queries in an acceptable time?
   b. What is the expected time for committing 50MB of data on a 7TB index?
   c. Is there an upper limit for index size?
2. What suggestions do you offer?
   a. How many shards should I use?
   b. Should I use Solr cores?
   c. What commit frequency do you recommend? (Is 1 hour OK?)
3. Are there any test results for this kind of large data?

There is no 5TB of data available yet; I just want to estimate what the result will be.

Note: you can assume that hardware resources are not a problem.