Does ContentStreamDataSource support delta import?
ContentStreamDataSource works fine with the full-import command, but I can't make it work with the delta-import command; I have to use full-import without clean instead. Does ContentStreamDataSource support delta import? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-ContentStreamDataSource-support-delta-import-tp3992008.html Sent from the Solr - User mailing list archive at Nabble.com.
NGram and full word
Hi, I have a question regarding the NGram filter and full word search. When I insert arkadicolson into Solr and search for arkadic, Solr will find a match. When searching for arkadicols, Solr will not find a match because the maxGramSize is set to 8. However, when searching for the full word arkadicolson, Solr will also not match. Is there a way to also match the full word in combination with NGram? Thanks!

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="8"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>

-- Smartbit bvba Hoogstraat 13 B-3670 Meeuwen T: +32 11 64 08 80 F: +32 89 46 81 10 W: http://www.smartbit.be E: ark...@smartbit.be
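A quick way to see why the full word misses: an NGram filter with minGramSize=3 and maxGramSize=8 never emits a token longer than 8 characters, so the 12-character query token arkadicolson has nothing to match against. A minimal sketch of the gram generation (plain Python mimicking the idea, not Solr's actual filter code):

```python
def ngrams(token, min_size=3, max_size=8):
    """Emit all character n-grams between min_size and max_size,
    roughly what solr.NGramFilterFactory produces for one token."""
    return [token[i:i + n]
            for n in range(min_size, max_size + 1)
            for i in range(len(token) - n + 1)]

grams = ngrams("arkadicolson")
print("arkadic" in grams)       # 7-gram, so it is indexed -> True
print("arkadicols" in grams)    # 10 chars > maxGramSize -> False
print("arkadicolson" in grams)  # the full 12-char word is also missing -> False
```

This is why the usual fix is to keep a second field that indexes the whole token (or raise maxGramSize) and query both fields.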
leaks in solr
Hi list, while monitoring my Solr 3.6.1 installation I noticed an increase of memory usage in the OldGen JVM heap on my slave. I decided to force a Full GC from jvisualvm and send an optimize to the already-optimized slave index. Normally this helps, as I have seen while monitoring this issue in the past. But not this time: the Full GC didn't free any memory. So I decided to take a heap dump and see what MemoryAnalyzer shows. The heap dump is about 23 GB in size.

1.) Report "Top consumers - Biggest Objects": Total: 12.3 GB
org.apache.lucene.search.FieldCacheImpl : 8.1 GB
class java.lang.ref.Finalizer : 2.1 GB
org.apache.solr.util.ConcurrentLRUCache : 1.5 GB
org.apache.lucene.index.ReadOnlySegmentReader : 622.5 MB
...
As you can see, Finalizer has already reached 2.1 GB!!!
* java.util.concurrent.ConcurrentHashMap$Segment[16] @ 0x37b056fd0
* segments java.util.concurrent.ConcurrentHashMap @ 0x39b02d268
* map org.apache.solr.util.ConcurrentLRUCache @ 0x398f33c30
* referent java.lang.ref.Finalizer @ 0x37affa810
* next java.lang.ref.Finalizer @ 0x37affa838
...
It seems to be org.apache.solr.util.ConcurrentLRUCache. The attributes are:
Type    | Name                | Value
boolean | isDestroyed         | true
ref     | cleanupThread       | null
ref     | evictionListener    | null
long    | oldestEntry         | 0
int     | acceptableWaterMark | 9500
ref     | stats               | org.apache.solr.util.ConcurrentLRUCache$Stats @ 0x37b074dc8
boolean | islive              | true
boolean | newThreadForCleanup | false
boolean | isCleaning          | false
ref     | markAndSweepLock    | java.util.concurrent.locks.ReentrantLock @ 0x39bf63978
int     | lowerWaterMark      | 9000
int     | upperWaterMark      | 1
ref     | map                 | java.util.concurrent.ConcurrentHashMap @ 0x39b02d268

2.) While searching for open files and their references I noticed that there are references to index files which have already been deleted from disk. E.g. recent index files are data/index/_2iqw.frq and data/index/_2iqx.frq.
But I also see references to data/index/_2hid.frq, which is quite old and was deleted long ago by earlier replications. I have to analyze this a bit deeper. So far my report; I'll go on analyzing this huge heap dump. If you need any other info, or even the heap dump itself, let me know. Regards, Bernd
Re: Strange behaviour with default request handler
"And when i search for soph, i only get Sophie in the results and not Sophia." Do you want your query q=soph to return both Sophie and Sophia? If that's the case, then you can use wildcard queries: q=soph*. Also, you didn't provide the definition of the field type "text". It seems that you have a stemming filter in your analysis chain. You can inspect how the tokens Sophie and Sophia are indexed using the solr/admin/analysis.jsp page.
Re: what is precisionStep and positionIncrementGap:
For precisionStep, see: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/NumericRangeQuery.html?is-external=true positionIncrementGap is for multiValued text fields; it is the space put between the last token of one entry and the first token of the next. E.g.:
<field name="mv">some stuff</field>
<field name="mv">more things</field>
Assume the two were in a single document you added, and assume the increment gap were 100. The token positions would be 0, 1, 101 and 102, so the phrase "stuff more" wouldn't match. Best, Erick On Tue, Jun 26, 2012 at 1:47 AM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Hi, in the schema.xml there will usually be a fieldType definition like this: <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> The precisionStep and positionIncrementGap attributes are not very clear to me. Could you please elaborate more on these 2? Thanks!
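The position arithmetic above can be sketched directly. This is a hypothetical helper (plain Python, not Lucene code — exact off-by-one details vary by Lucene version) that reproduces Erick's numbers: consecutive positions within one value, with the gap applied at each value boundary:

```python
def token_positions(values, gap=100):
    """Assign token positions across a multiValued field: positions are
    consecutive within one value, and the first token of each subsequent
    value lands `gap` positions after the previous token."""
    positions = []
    pos = -1
    for i, value in enumerate(values):
        for j, token in enumerate(value.split()):
            pos = pos + gap if (i > 0 and j == 0) else pos + 1
            positions.append((token, pos))
    return positions

print(token_positions(["some stuff", "more things"]))
# [('some', 0), ('stuff', 1), ('more', 101), ('things', 102)]
```

Since "stuff" is at position 1 and "more" at 101, a phrase query (which requires adjacent positions) cannot match across the two values, which is exactly the point of the gap.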
Re: Query Logic Question
I think you're assuming that this is Boolean logic. It's not; see: http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/ Best, Erick On Thu, Jun 28, 2012 at 9:27 AM, Rublex ruble...@hotmail.com wrote: Jack, Thank you, the *:* solution seems to work.
Re: Is it compulsory to define a tokenizer when defining field types in solr
Yes, it's mandatory to define at least one tokenizer (and only one tokenizer). If you need the whole input treated as one token, you can use KeywordTokenizerFactory. Best, Erick On Thu, Jun 28, 2012 at 11:10 AM, Kissue Kissue kissue...@gmail.com wrote: Hi, when defining a fieldType, is it compulsory to include a tokenizer in its definition? I have a field type defined as follows without a tokenizer:
<fieldType name="lowercase_pattern" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Using this field type, when I try to start up Solr it says the field is not recognised. But when I change it to the following, with a tokenizer included, it works:
<fieldType name="lowercase_pattern" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Thanks.
Wildcard searches with leading and ending wildcard
Hi all, I've been searching for an answer to this everywhere but I can never find an answer that fits my case, so I'll ask it myself. I'm on Solr 3.6. I use the ReversedWildcardFilterFactory on a field containing a telephone number, so there is only one word to be indexed: no phrases, no strange tokens. To be more exact:
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
I can check with Luke that two words are being indexed, one the reverse of the other. Perfect. I can run a query like Num:1234* that will match docs starting with 1234, and I can run a query like Num:*1234 that will match docs ending with 1234. But this is the question that everybody seems to be asking: can I run, in any way, a query that will match records that contain the value 1234? If I write Num:*1234*, this will match docs containing 1234 but also docs containing 4321, which is wrong. This means the query Num:*4321* and the query Num:*1234* return exactly the same result. Is this the wrong approach? Has anybody tried the N-gram solution to this problem? Thanks very much, Maurizio
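A conceptual sketch of what ReversedWildcardFilterFactory does may help explain the 4321 behaviour: with withOriginal=true it indexes each token twice, once forward and once reversed (behind a marker), so a leading-wildcard query can be rewritten into a cheap prefix query on the reversed term. A double-ended *1234* pattern can then also hit the reversed form of a document containing 4321. The sketch below is plain Python with a made-up marker character, a simplification and not Solr's actual implementation:

```python
MARKER = "\u0001"  # stand-in for the reversed-token marker

def index_terms(token):
    """Index both the original token and its marked reversal,
    roughly what withOriginal=true produces."""
    return {token, MARKER + token[::-1]}

def rewrite_leading_wildcard(query):
    """Rewrite *1234 into a prefix pattern on the reversed term."""
    assert query.startswith("*")
    return MARKER + query[1:][::-1] + "*"

terms = index_terms("5551234")
prefix = rewrite_leading_wildcard("*1234").rstrip("*")  # MARKER + "4321"
# A cheap prefix match against the reversed term finds the doc:
print(any(t.startswith(prefix) for t in terms))  # True
```

Because a double-ended wildcard cannot be reduced to a prefix query on either the forward or the reversed term, it gains nothing from this trick, which is the point Jack makes later in the thread.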
Replication Issue
Hi, I'm having trouble with replication on a brand new rollout of 3.6. Basically I've traced it to the slave always thinking the index it creates when it warms up is newer than what's on the master, no matter what I do... deleting the slave's index, committing or optimizing on the master, etc. I can see the replication request come in on the master, but nothing happens, presumably because of the Index Version discrepancy. The clocks of the two machines are within 3 seconds of one another, but I don't know if that's significant. Actually, I'm having trouble figuring out how Index Version is calculated at all, and before I dive into the source, I thought I'd ask here. My slave is saying Index Version 1340979968338, Generation 1, and my master says Index Version 1340052708476, Generation 83549. Anybody have any ideas? Thanks, Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com
Re: Strange spikes in query response times...any ideas where else to look?
Otis, Thanks for the response. We'll check out that tool and see how it goes. Regarding JMeter... you are exactly correct in that I was assuming 1 thread = 1 query per second. I thought we had set up some sort of throttling mechanism to ensure that... and clearly I was mistaken. By the math we are getting A LOT more QPS... and in a preliminary look those spikes look like they just might be correlated with high QPS. We are pursuing this line and my gut tells me this *is* the problem. Thanks for the info on the tool (which we will look at) and for the heads-up on the QPS. Peter Lee ProQuest Quoting Otis Gospodnetic otis_gospodne...@yahoo.com: Peter, these could be the JVM, or it could be index reopening and warmup queries. Grab SPM for Solr - http://sematext.com/spm - in 24-48h we'll release an agent that tracks and graphs errors and timings of each Solr search component, which may reveal interesting stuff. In the meantime, look at the graph with IO as well as the graph with caches. That's where I'd first look for signs. Re the users/threads question - if I understand correctly, this is the problem: "JMeter is set up to run 15 threads from a single test machine... but I noticed that the JMeter report is showing close to 47 queries per second." It sounds like you're equating # of threads with QPS, which isn't right. Imagine you had 10 threads and each query took 0.1 seconds (processed by a single CPU core) and the server had 10 CPU cores. That would mean that 1 thread could run 10 queries per second utilizing just 1 CPU core, and 10 threads would utilize all 10 CPU cores and give you 10x higher throughput - 10x10=100 QPS. So if you need to simulate just 2-5 QPS, just lower the number of threads. What that number should be depends on query complexity and hw resources (cores or IO).
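Otis's arithmetic generalizes to the usual closed-loop load-test formula: with no think time, throughput ≈ threads / per-query latency. A back-of-the-envelope sketch — note the 0.32 s latency figure is an inference from the reported 15 threads and ~47 QPS, not a number from the thread:

```python
def expected_qps(threads, latency_s, think_time_s=0.0):
    """Closed-loop throughput estimate: each thread completes one
    request every (latency + think time) seconds."""
    return threads / (latency_s + think_time_s)

print(expected_qps(10, 0.1))   # 100.0 -- Otis's 10-thread example
print(expected_qps(15, 0.32))  # ~47 QPS, consistent with the JMeter report
# To hold ~2-5 QPS with 15 threads, add think time rather than cutting cores:
print(expected_qps(15, 0.32, think_time_s=4.0))  # ~3.5 QPS
```

In JMeter terms, "think time" corresponds to a timer element between samplers; lowering the thread count, as Otis suggests, achieves the same throughput reduction.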
Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: s...@isshomefront.com s...@isshomefront.com To: solr-user@lucene.apache.org Sent: Thursday, June 28, 2012 9:20 PM Subject: RE: Strange spikes in query response times...any ideas where else to look? Michael, Thank you for responding...and for the excellent questions. 1) We have never seen this response time spike with a user-interactive search. However, in the span of about 40 minutes, which included about 82,000 queries, we only saw a handful of near-equally distributed spikes. We have tried sending queries from the admin tool while the test was running, but given those odds, I'm not surprised we've never hit on one of those few spikes we are seeing in the test results. 2) Good point and I should have mentioned this. We are using multiple methods to track these response times. a) Looking at the catalina.out file and plotting the response times recorded there (I think this is logging the QTime as seen by Solr). b) Looking at what JMeter is reporting as response times. In general, these are very close if not identical to what is being seen in the Catalina.out file. I have not run a line-by-line comparison, but putting the query response graphs next to each other shows them to be nearly (or possibly exactly) the same. Nothing looked out of the ordinary. 3) We are using multiple threads. Before your email I was looking at the results, doing some math, and double checking the reports from JMeter. I did notice that our throughput is much higher than we meant for it to be. JMeter is set up to run 15 threads from a single test machine...but I noticed that the JMeter report is showing close to 47 queries per second. We are only targeting TWO to FIVE queries per second. This is up next on our list of things to look at and how to control more effectively. 
We do have three separate machines set up for JMeter testing and we are investigating to see if perhaps all three of these machines are inadvertently being launched during the test at one time and overwhelming the server. This *might* be one facet of the problem. Agreed on that. Even as we investigate this last item regarding the number of users/threads, I wouldn't mind any other thoughts you or anyone else had to offer. We are checking on this user/threads issue, and for the sake of anyone else who finds this discussion useful I'll note what we find. Thanks again. Peter S. Lee ProQuest Quoting Michael Ryan mr...@moreover.com: A few questions... 1) Do you only see these spikes when running JMeter? I.e., do you ever see a spike when you manually run a query? 2) How are you measuring the response time? In my experience there are three different ways to measure query speed. Usually all of them will be approximately equal, but in some situations they can be quite different, and this difference can be a clue as to where the bottleneck is: 1) The response time as seen by the end user (in this case,
Re: How do we use HTMLStripCharFilterFactory
Thanks @Kiran... will do the things you have suggested and hope it works... thanks again. Rgds, Rohit
Re: Trying to avoid filtering on score, as I'm told that's bad
Thanks, this worked using:
qq={!func}sub(sum(geodist(pt1,30.271567,-97.741886),geodist(pt2,36.054889,-95.716187),product(1.609344,Dist)),1000)
sort=$qq asc
fq={!frange u=100}$qq
Using custom user-defined caches to store user app data while indexing
Hi, I'm trying to implement a custom UpdateRequestProcessorFactory class that works with the XSLT request handler for indexing. My UpdateRequestProcessorFactory has to examine some of the document fields and compare them against some regular expressions that are stored in an external MySQL database. Currently, my UpdateRequestProcessorFactory works by establishing a connection to the database and then retrieving the regular expressions for every new document that needs to be indexed. However, I would like to speed up this processing and store the regular expressions in memory. I tried to define a new user cache in solrconfig.xml (http://wiki.apache.org/solr/SolrCaching#User.2BAC8-Generic_Caches). As far as I understand, these caches can be used to store any user application data. But when I implement the UpdateRequestProcessorFactory, I cannot manage to access this cache. What would be the method to read/write into a user-defined Solr cache while indexing? How can I access the current SolrIndexSearcher from my code? Are there any other solutions that I should look at? Thanks! Iana
Why won't dismax create multiple DisjunctionMaxQueries when autoGeneratePhraseQueries is false?
Hi, I am trying to configure Solr for Chinese search and I've been having trouble getting the dismax query parser to behave correctly. In schema.xml, I'm using SmartChineseAnalyzer on my fulltext field with autoGeneratePhraseQueries="false". I've verified that it is correctly tokenizing Chinese words, and the query parser is in fact not generating phrase queries. But I can't figure out why dismax is only producing a single DisjunctionMaxQuery object for multiple Chinese terms, thereby producing an OR effect, which is not what I want. Here's an example of the parsed-query debug output that I get for a multiple-term English query:
<str name="rawquerystring">my friend</str>
<str name="querystring">my friend</str>
<str name="parsedquery">+((DisjunctionMaxQuery((t_field_keywords:unified_fulltext:my)~0.01) DisjunctionMaxQuery((t_field_keywords:unified_fulltext:friend)~0.01))~2)</str>
<str name="parsedquery_toString">+(((t_field_keywords:unified_fulltext:my)~0.01 (t_field_keywords:unified_fulltext:friend)~0.01)~2)</str>
This is exactly what I want to happen for Chinese queries. But for a Chinese query, you can see that I only get a single DisjunctionMaxQuery object:
<str name="rawquerystring">我的朋友</str>
<str name="querystring">我的朋友</str>
<str name="parsedquery">+DisjunctionMaxQuery(((t_field_keywords:unified_fulltext:我 t_field_keywords:unified_fulltext:的 t_field_keywords:unified_fulltext:朋友))~0.01)</str>
<str name="parsedquery_toString">+((t_field_keywords:unified_fulltext:我 t_field_keywords:unified_fulltext:的 t_field_keywords:unified_fulltext:朋友))~0.01</str>
The result of this is that an increase in the number of terms increases the number of results, instead of narrowing them as it should. I feel like this is so close to working... does anybody know what I need to do to get the query parser to behave correctly? Any help would be much appreciated! Joel
Solr - query
Hi, I am searching for a string using a wildcard and I would like to change my query from
http://localhost:/solr/addrinst/select?q="1234+BAY"&start=0&rows=10
to
http://localhost:/solr/addrinst/select?q="1234 BAY"&start=0&rows=10
My request handler is:
<requestHandler class="solr.SearchHandler" default="true" name="auto">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">id name Street_Addr</str>
    <str name="fl">id,name,Street_Addr</str>
  </lst>
</requestHandler>
Can someone give me a clue where I am going wrong? Thanks
Re: Is it compulsory to define a tokenizer when defining field types in solr
Thanks Erick for the clarification. Cheers! On Fri, Jun 29, 2012 at 2:08 PM, Erick Erickson erickerick...@gmail.com wrote: Yes, it's mandatory to define at least one tokenizer (and only one tokenizer). If you need the whole input treated as one token, you can use KeywordTokenizerFactory. Best Erick ...
Searching against stored wild cards
Hi, I want to know if it is in any way possible for me to do this in Solr: 1. Store this field in the Solr index - AB-CD-EF-* 2. Do a search for AB-CD-EF-GH and get back AB-CD-EF-* Thanks.
Re: Replication Issue
Clocks on the separate machines are irrelevant, so don't worry about that bit. The index version _starts out_ as a timestamp as I understand it, but from there on, when you change the index and commit, it should just bump up, NOT get a new timestamp.
1) It's strange that the version on the master didn't change when you committed. _Unless_ you didn't actually change the index. A commit doesn't do anything at all without some underlying change to the index, not even bump the index version, I don't think. But you should be seeing the very last digits change on commit _if_ there have been underlying changes.
2) It looks like you somehow changed the index on the slave at some point. Did you update the index there sometime independent of the master? Even though when you fire up the slave for the first time it gets a default timestamp of right now, it's changed to the version that corresponds to the master on the first replication.
3) Blowing away the index on the slave should have worked _if_ you removed the index directory. Just issuing a delete on *:* wouldn't do much. When I want to be absolutely, completely sure I've gotten rid of an index, I shut down the server and rm -rf solr_home/data/index (you can also just rm -rf solr_home/data). It's important that you remove the _directory_, not just the contents of solr_home/data/index.
Bottom line: I suspect something else happened in the meantime that changed the underlying slave timestamp and got you into this situation; perhaps you directly updated the slave index sometime? Best, Erick On Fri, Jun 29, 2012 at 12:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Nevermind, I realized that my master index was not tickling the index version number when a commit or optimize happened. I gave in and nuked and paved it, and now it seems fine. Is there any known reason why this would happen, so I can avoid this in the future? Thanks, Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com
Re: Searching against stored wild cards
Skip the asterisk and analyse your search terms as an ngram, maybe an edge-ngram, and then it'll match. You'd be querying for: A AB AB- AB-C AB-CD AB-CD- etc... Any of those terms would match your stored term. Upayavira On Fri, Jun 29, 2012, at 06:35 PM, Kissue Kissue wrote: Hi, I want to know if it is in any way possible for me to do this in Solr: 1. Store this field in the Solr index - AB-CD-EF-* 2. Do a search for AB-CD-EF-GH and get back AB-CD-EF-* Thanks.
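Upayavira's suggestion amounts to: store the pattern without the asterisk, and edge-ngram the query term so that one of its prefixes equals the stored value. A minimal sketch of the idea (plain Python; the stored value and query are taken from the question, the helper is hypothetical rather than Solr's analyzer):

```python
def edge_ngrams(text, min_size=1):
    """All leading prefixes of the query term, like an edge n-gram
    analyzer applied at query time."""
    return [text[:n] for n in range(min_size, len(text) + 1)]

stored = "AB-CD-EF-"               # the wildcard pattern with the asterisk dropped
query_grams = edge_ngrams("AB-CD-EF-GH")
print(stored in query_grams)       # True: the stored pattern is one of the prefixes
```

In Solr terms this would mean an EdgeNGramFilter in the query analyzer of that field; each emitted prefix then matches the stored term directly, with no wildcard needed.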
Re: Replication Issue
Ugh, after a mess of additional flailing around, it appears I just discovered that the Replicate Now form on the Replication Admin page does not work in the text-based browser 'links'. :( Running /replication?command=fetchindex with curl did the trick. Now everything is synced up. Thanks for your reply, Erick! Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Fri, Jun 29, 2012 at 1:51 PM, Erick Erickson erickerick...@gmail.com wrote: Clocks on the separate machines are irrelevant, so don't worry about that bit. ...
Re: Solr - query
I think quotes are legal in URL encoding, so you might get away with just putting a + between 1234 and BAY, or failing that, %20. Usually it's easier if you use a Solr client-side library to make these kinds of calls so that URL encoding isn't your problem, but I'm not sure if that's a route that's available to you. Michael Della Bitta P.S. I think I had Thai food once near 1234 Bay. :) Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Fri, Jun 29, 2012 at 1:19 PM, gopes saraladevi.ramamoor...@gmail.com wrote: Hi, I am searching for a string using a wildcard and I would like to change my query ...
RE: NGram and full word
With the help of this list, I solved a similar issue by altering my query as follows. Before (did not return full-word matches): q=searchTerm* After (returned full-word matches and wildcard matches as you would expect): q=searchTerm OR searchTerm* You can also boost the exact match by doing the following: q=searchTerm^2 OR searchTerm* Not sure if the NGram changes things or not, but it might be a starting point. Mike -----Original Message----- From: Arkadi Colson [mailto:ark...@smartbit.be] Sent: Friday, June 29, 2012 3:17 AM To: solr-user@lucene.apache.org Subject: NGram and full word Hi, I have a question regarding the NGram filter and full word search. ...
Re: NGram and full word
The search for the full word arkadicolson exceeds 8 characters, so that's why it's not working. The fix is to add another field that will tokenize into full words. The query would look like this: some_field_ngram:arkadicolson AND some_field_whole_word:arkadicolson
Re: Wildcard searches with leading and ending wildcard
I think a double-ended wildcard essentially defeats the whole point of the reverse wildcard filter, which is to improve performance by avoiding a leading wildcard. So, if your data is such that a leading wildcard is okay, just use normal wildcards to begin with. -- Jack Krupansky -----Original Message----- From: maurizio1976 Sent: Friday, June 29, 2012 8:21 AM To: solr-user@lucene.apache.org Subject: Wildcard searches with leading and ending wildcard Hi all, I've been searching for an answer to this everywhere but I can never find an answer that fits my case, so I'll ask it myself. ...