Re: edismax available in solr 3.1?
Is edismax available in Solr 3.1? I don't see any documentation about it. If it is, does it support prefix and fuzzy queries?

Yes and yes. See this snippet taken from CHANGES.txt:

New Features
* SOLR-1553: New dismax parser implementation (accessible as "edismax") that supports full lucene syntax, improved reserved char escaping, fielded queries, improved proximity boosting, and improved stopword handling. Note: status is experimental for now. (yonik)
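For anyone wanting to try it: the parser is selected per request with defType, so a query like q=lucen* roam~&defType=edismax&qf=title^2 body exercises both a prefix query (lucen*) and a fuzzy query (roam~). It can also be baked into a handler in solrconfig.xml; the sketch below is illustrative, and the handler name and qf fields are assumptions, not from this thread:

```xml
<!-- sketch: "/edismax-demo", "title" and "body" are placeholders -->
<requestHandler name="/edismax-demo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^2 body</str>
  </lst>
</requestHandler>
```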
Field Cache
Hi, I have read that the Lucene FieldCache is used in faceting and sorting. Is it also populated/used when only selected fields are retrieved, via the 'fl' parameter or the 'included fields in collapse' parameter? Is it also used for collapsing? -- Regards, Samarth
Whole unfiltered content in response document field
Hi, I have a question about the content of the document fields. My configuration is OK so far: I index a database with the DIH and have configured an index analyzer as follows:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
...
<fields>
  <field name="id" type="int" indexed="true" stored="true" required="true"/>
  <field name="text" type="text" indexed="true" stored="true"/>
</fields>

On the analysis view, my filters work properly: at the end of the filter chain I have only the tokens I am interested in. But when I search with Solr, I get back the whole content of the indexed database field in the response. The field contains stopwords, whitespace, uppercase letters and so on. I search for stopwords, and I can find them. I would expect to find in the response document only the filtered content of the field, not the original raw content that I indexed. Is this normal behaviour? Do I understand Solr correctly? Many thanks!
Re: uima fieldMappings and solr dynamicField
I've opened https://issues.apache.org/jira/browse/SOLR-2503 . Koji -- http://www.rondhuit.com/en/

(11/05/06 20:15), Koji Sekiguchi wrote:
Hello, I'd like to use dynamicField in the feature-to-field mapping of the UIMA update processor. It doesn't seem to be acceptable currently. Is it a bad idea in terms of how UIMA is meant to be used? If it is not so bad, I'd like to try a patch.

Background: because my UIMA annotator can generate many types of named entity from a text, I don't want to implement that many types, but just one type, NamedEntity:

<typeSystemDescription>
  <types>
    <typeDescription>
      <name>com.rondhuit.uima.next.NamedEntity</name>
      <description/>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>name</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>entity</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>

Sample extracted named entities:

name=PERSON, entity=Barack Obama
name=TITLE, entity=the President

Now, I'd like to map these named entities to Solr fields like this:

PERSON_S:Barack Obama
TITLE_S:the President

Because there can be many values of name (PERSON, TITLE, etc.), I'd like to use the dynamicField *_s, where * is replaced by the name feature of NamedEntity. I think this is a natural requirement from the Solr viewpoint, but I'm not sure whether my UIMA annotator implementation is the right approach. In other words, should I implement many types, one per entity type (e.g. PersonEntity, TitleEntity, ... instead of NamedEntity)? Thank you! Koji
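For context, the dynamicField the post refers to would be declared in schema.xml along these lines (a sketch; the "string" type name is an assumption taken from the stock example schema):

```xml
<!-- any field whose name ends in _s is indexed and stored as a plain string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

With that in place, documents could carry PERSON_S, TITLE_S, and any future entity name without each field being declared up front.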
Re: Whole unfiltered content in response document field
> [analyzer and field configuration quoted above]
> On the analysis view, my filters work properly: at the end of the filter chain I have only the tokens I am interested in. But when I search with Solr, I get back the whole content of the indexed database field in the response. The field contains stopwords, whitespace, uppercase letters and so on. I search for stopwords, and I can find them. I would expect to find in the response document only the filtered content of the field, not the original raw content that I indexed. Is this normal behaviour? Do I understand Solr right?

In the response, Solr shows the raw stored content. So you want to see the analyzed/indexed content of a document in the response?

Searching for and finding stopwords is not normal, though. Maybe you need to move the StopFilterFactory to after the WordDelimiterFilterFactory; some punctuation may cause this.
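The distinction the answer draws, stored value vs. indexed tokens, can be pictured outside Solr. The following is a toy Python imitation of the analysis chain, not Solr code, with a made-up stopword list:

```python
STOPWORDS = {"the", "is", "a"}  # toy list; Solr reads yours from stopwords.txt

def analyze(raw):
    """Roughly imitate whitespace-tokenize -> stopword-filter -> lowercase."""
    tokens = raw.split()                                        # WhitespaceTokenizerFactory
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]  # StopFilterFactory
    return [t.lower() for t in tokens]                          # LowerCaseFilterFactory

raw = "The Quick Fox is a Runner"
# What the index sees (what queries match against):
print(analyze(raw))  # ['quick', 'fox', 'runner']
# What a stored="true" field returns in the response: the raw string, untouched.
print(raw)           # The Quick Fox is a Runner
```

In other words, analysis affects only matching; stored="true" always echoes the original input back in the response.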
Re: Replication Clarification Please
I did not see answers... I am not an authority, but I will tell you what I think. Did you get some answers?

On 5/6/11 2:52 PM, Ravi Solr ravis...@gmail.com wrote:
Hello, pardon me if this has already been answered somewhere, and I apologize for a lengthy post. I was wondering if anybody could help me understand the replication internals a bit more. We have a single master-slave setup (Solr 1.4.1) with the configuration shown below. Our environment is quite commit heavy (hundreds of docs every 5 minutes); all indexing is done on the master and all searches go to the slave. We are seeing that the slave's replication performance gradually degrades, the speed drops to 1 kbps, and replication ultimately gets backed up. Once we reload the core on the slave it works fine for some time, and then it gets backed up again. We have mergeFactor set to 10, ramBufferSizeMB set to 32 MB, Solr itself runs with 2 GB of memory, and lockType is simple on both master and slave.

How big is your index? How many rows and GB? Every time you replicate, several caches are reset. So if you are constantly indexing, you need to be careful about how that performance impact adds up.

I am hoping that the following questions might help me understand the replication performance issue better (the replication configuration is given at the end of the email).

1. Does the slave get the whole index every time during replication, or just the delta since the last replication happened?

It depends. If you do an OPTIMIZE every time you index, then you will be sending the whole index down. If enough time passes that all 10 segments (your mergeFactor) have been cycled, I believe that might also trigger a whole-index copy, since you cycled all the segments. In that case I think you might want to increase the mergeFactor.

2. If there is a huge number of queries being done on the slave, will it affect replication? How can I improve the performance?
(see the replication details at the bottom of the page) It seems that might be one way you get the index.* directories; at least I see it more frequently when there is huge load while you are trying to replicate. You could replicate less frequently.

3. Will the segment names be the same on master and slave after replication? I see that they are different. Is this correct? If it is correct, how does the slave know what to fetch the next time, i.e. the delta?

Yes, they had better be. In the old days you could just rsync the data directory from master to slave and reload the core, and that worked fine.

4. When and why does the index.TIMESTAMP folder get created? I see this type of folder getting created only on the slave, and the slave instance is pointing to it. I would love to know all the conditions...

I believe it is supposed to replicate to index.*, then reload to point to it. But sometimes it gets stuck in index.* land and never goes back to a plain index directory. There are several bug fixes for this in 3.1.

5. Does the replication process copy both the index and index.TIMESTAMP folders?

I believe it is supposed to copy the segments (or the whole index/) from the master to index.* on the slave.

6. What happens if replication kicks off before the previous invocation has completed? Will the 2nd invocation block, or will it go through and cause more confusion?

That is not supposed to happen: if a replication is in process, it should not copy again until that one is complete. Try it; just delete data/*, restart Solr, force a replication, and while it is syncing, force it again. Does not seem to work for me.

7. If I have to prep a new master-slave combination, is it OK to copy the respective contents into the new master and slave and start Solr? Or do I have to wipe the new slave and let it replicate from its new master?

If you shut down the slave, copy the data/* directory, and restart, you should be fine. That is how we fix the data/ dir when there is corruption.

8.
Doing an 'ls | wc -l' on the index folder of master and slave gave 194 and 17968 respectively... the slave has a lot of segments_xxx files. Is this normal?

Several bugs were fixed in 3.1 for this one. Not a good thing; you are getting leftover segments or index.* directories.

MASTER:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>

SLAVE:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">master core url</str>
    <str name="pollInterval">00:03:00</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>

REPLICATION DETAILS FROM PAGE
Master: master core url
Poll Interval: 00:03:00
Local Index, Index Version:
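On question 1, the way the handler decides between a delta and a full copy can be pictured as a set difference over index file names. This is an illustrative sketch of the idea only, not Solr's actual code, and the file names are made up:

```python
def files_to_fetch(master_files, slave_files):
    # The slave pulls only the index files it does not already have.
    # After an optimize (or once every old segment has been merged away),
    # almost no file names overlap, so this degenerates into a full copy.
    return sorted(set(master_files) - set(slave_files))

# Normal commit: one new segment since the last poll -> small delta.
master = ["segments_5", "_4.cfs", "_5.cfs"]
slave = ["segments_4", "_3.cfs", "_4.cfs"]
print(files_to_fetch(master, slave))  # ['_5.cfs', 'segments_5']

# After an optimize: one fresh segment, nothing in common -> full copy.
print(files_to_fetch(["segments_6", "_6.cfs"], slave))  # ['_6.cfs', 'segments_6']
```

This is why the advice above distinguishes optimize-heavy setups (every poll moves the whole index) from plain commits (only new segments move).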